PUMICE: Processing-using-Memory Integration with a Scalar Pipeline for Symbiotic Execution

Abstract

Existing SIMD extensions in scalar CPUs (e.g., SSE and AVX) can leverage instruction-level parallelism (ILP) because of their tight integration with the CPU pipeline. However, the vectors they employ are quite short, which limits their ability to exploit data-level parallelism (DLP). Processing-using-memory (PUM) accelerators, on the other hand, can exploit massive amounts of DLP, as they typically perform computation on very long vectors (tens of thousands of elements) within the memory itself. Recent work demonstrates that these architectures achieve order-of-magnitude speedups over area-equivalent multicore CPUs with SIMD extensions for a variety of workloads. Still, PUM architectures are largely decoupled from the CPU itself, which limits their ability to tap the CPU’s ILP the way SIMD extensions do. In this paper, we propose PUMICE, a tightly integrated CPU-PUM architecture that simultaneously exploits DLP and ILP for very long vector operations. As a result of this tight integration, PUMICE delivers significant performance gains: our experimental results show speedups of up to 2.2× (1.4× on average) over a state-of-the-art decoupled approach.
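To make the DLP/ILP trade-off concrete, here is a toy cost model in Python. It is an illustrative sketch only, not the paper's methodology or PUMICE's actual mechanism. It assumes 32-bit elements, a 512-bit SIMD register (AVX-512, so 16 lanes), a hypothetical PUM vector length of 32,768 elements, and made-up cycle counts. The function names are hypothetical. The decoupled case is simplified as fully serialized, and the coupled case is idealized as perfect overlap, which assumes the scalar work does not depend on the PUM results.

```python
import math

# Toy cost model: all parameters are hypothetical, not measurements from the paper.
SIMD_LANES = 512 // 32     # an AVX-512 register holds 16 x 32-bit elements
PUM_VECTOR_LEN = 32_768    # assumed PUM vector length ("tens of thousands")


def vector_instructions(n_elements: int, lanes: int) -> int:
    """Vector instructions needed to cover n_elements, `lanes` elements at a time."""
    return math.ceil(n_elements / lanes)


def decoupled_cycles(pum_ops: int, pum_latency: int, scalar_cycles: int) -> int:
    """Simplified decoupled model: the CPU waits for the PUM unit, so both phases serialize."""
    return pum_ops * pum_latency + scalar_cycles


def coupled_cycles(pum_ops: int, pum_latency: int, scalar_cycles: int) -> int:
    """Idealized coupled model: independent scalar work overlaps in-flight PUM ops."""
    return max(pum_ops * pum_latency, scalar_cycles)


if __name__ == "__main__":
    n = 65_536
    pum_ops = vector_instructions(n, PUM_VECTOR_LEN)
    print("SIMD instructions:", vector_instructions(n, SIMD_LANES))  # 4096
    print("PUM operations:", pum_ops)                                 # 2
    print("decoupled cycles:", decoupled_cycles(pum_ops, 100, 150))  # 350
    print("coupled cycles:", coupled_cycles(pum_ops, 100, 150))      # 200
```

With these assumed numbers, the long PUM vectors reduce the vector instruction count by a factor of 2,048 (DLP), while overlapping independent scalar work hides part of the PUM latency (ILP). Real gains depend on the dependencies between the scalar and PUM instruction streams, which this model ignores.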

Cecilio C. Tamarit
ECE PhD Candidate
Fulbrighter

Cecilio C. Tamarit is a Fulbright Scholar from Spain and an Electrical and Computer Engineering PhD Candidate at Cornell University. He earned his Master’s and Bachelor’s degrees at Universidad Politécnica de Valencia. Computers and cultural exchange are among his many passions, and he enjoys sharing them with others. He has previously worked on several outreach and entrepreneurial tech projects.