PUMICE: Processing-using-Memory Integration with a Scalar Pipeline for Symbiotic Execution

Abstract

Existing SIMD extensions in scalar CPUs (e.g., SSE and AVX) can leverage instruction-level parallelism (ILP) because of their tight integration with the CPU pipeline. However, the vectors they employ are quite short, which limits their ability to exploit data-level parallelism (DLP). On the other hand, processing-using-memory (PUM) accelerators can exploit massive amounts of DLP, as they typically perform computation on very long vectors (tens of thousands of elements) within the memory itself. Recent work demonstrates that these architectures can achieve order-of-magnitude speedups over area-equivalent multicore CPUs with SIMD extensions for a variety of workloads. Still, PUM architectures are largely decoupled from the CPU itself, which limits their ability to tap the CPU’s ILP the way SIMD extensions do.

In this paper, we propose PUMICE, a tightly integrated CPU-PUM architecture that simultaneously exploits DLP and ILP for very long vector operations. As a result of this tight integration, PUMICE delivers significant performance gains: our experimental results show speedups of up to 2.2× (1.4× on average) over a state-of-the-art decoupled approach.

Type
Publication
Processing-using-Memory Integration with a Scalar Pipeline for Symbiotic Execution
Cecilio C. Tamarit
PhD Candidate
Fulbrighter

Cecilio C. Tamarit is a Fulbright scholar and a PhD candidate in Computer Engineering at Cornell University. He earned his bachelor's and master's degrees at the Universidad Politécnica de Valencia, where he also had the opportunity to work as a researcher in the Grupo de Arquitecturas Paralelas (GAP, the Parallel Architectures Group). Computing and cultural exchange are two of his many passions, and he enjoys sharing both with others. In the past, he has worked on multiple outreach and entrepreneurship projects of both a technological and a social nature.