Introduction to the Cell Multiprocessor

J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, and D. Shippy
IBM Systems and Technology Group
IBM Journal of Research and Development, Vol. 49, No. 4/5, p. 589 (Jul-Sep 2005)
Presented by John Ingalls
ECE 259, April 8, 2010

Design Summary: PPE
• ISA: 64-bit IBM Power Architecture with SIMD extensions.
• 1 PPE, 8 SPEs, 1 memory controller, and 1 I/O controller, all on a coherent on-chip bus (single address space).
• Power Processor Element (PPE): dual-issue, in-order, two-way simultaneous multithreading (SMT); 32 KB L1 instruction and 32 KB L1 data caches; 512 KB L2 cache with software-management hooks; 128-bit SIMD width; vector/SIMD instructions have their own issue queue, separate from scalar execution.
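
To make the "128-bit SIMD width" concrete, here is a small pure-Python model (not the VMX intrinsic API) of how one 128-bit register is split into lanes and how a single lane-wise add treats four 32-bit floats at once. The helper names `lanes` and `vec_add_f32` are hypothetical; big-endian packing is used because the Power Architecture is big-endian.

```python
import struct

REGISTER_BITS = 128                 # SIMD register width from the slide
REGISTER_BYTES = REGISTER_BITS // 8

def lanes(element_bits: int) -> int:
    """How many elements of `element_bits` width fit in one 128-bit register."""
    if REGISTER_BITS % element_bits != 0:
        raise ValueError("element width must divide 128")
    return REGISTER_BITS // element_bits

def vec_add_f32(a: bytes, b: bytes) -> bytes:
    """Model of one lane-wise SIMD add: four 32-bit floats per 128-bit register.

    Hypothetical helper; a pure-Python model, not a real intrinsic.
    """
    if len(a) != REGISTER_BYTES or len(b) != REGISTER_BYTES:
        raise ValueError("operands must be exactly 16 bytes")
    xs = struct.unpack(">4f", a)    # big-endian, four single-precision lanes
    ys = struct.unpack(">4f", b)
    return struct.pack(">4f", *(x + y for x, y in zip(xs, ys)))

a = struct.pack(">4f", 1.0, 2.0, 3.0, 4.0)
b = struct.pack(">4f", 0.5, 0.5, 0.5, 0.5)
print(struct.unpack(">4f", vec_add_f32(a, b)))  # (1.5, 2.5, 3.5, 4.5)
```

The same 16 bytes could equally be viewed as 8 halfwords or 16 bytes (`lanes(16)`, `lanes(8)`); only the instruction decides the lane width.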

Design Summary: SPE
• Synergistic Processor Element (SPE): in-order SIMD core, 128 bits wide like the PPE.
• Local Store (LS): 256 KB, single-ported; each access is either a 128-bit SIMD load/store, a 128-byte instruction fetch, or a DMA transfer.
• 128-entry register file, giving the compiler room for static instruction scheduling.
• Area efficient: only 15% of the SPE is control logic; the rest is execution units and Local Store.
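
Because the Local Store is small and filled by explicit DMA, SPE code commonly uses double buffering: while the SPE computes on one buffer, the next chunk is transferred into the other. This technique is not described on the slides; the sketch below is a sequential Python model of it under stated assumptions. The names `buffer_budget`, `process_double_buffered`, and the 128-byte rounding are hypothetical, and the "DMA" is a plain slice copy rather than a real asynchronous transfer.

```python
from typing import Callable, List, Sequence

LOCAL_STORE_BYTES = 256 * 1024   # SPE Local Store size from the slide
DMA_ALIGN = 128                  # assumption: round buffers to 128-byte transfers

def buffer_budget(reserved_bytes: int, n_buffers: int = 2) -> int:
    """Largest per-buffer size, rounded down to DMA_ALIGN, left in the Local
    Store after `reserved_bytes` are set aside for code, stack, and other data."""
    free = LOCAL_STORE_BYTES - reserved_bytes
    if free <= 0:
        raise ValueError("reserved space exceeds the Local Store")
    per_buffer = free // n_buffers
    return per_buffer - per_buffer % DMA_ALIGN

def process_double_buffered(main_memory: Sequence[int], chunk: int,
                            kernel: Callable[[List[int]], List[int]]) -> List[int]:
    """Stream `main_memory` through two Local Store buffers.

    On an SPE the fetch into the idle buffer would be issued asynchronously,
    overlapping with `kernel` running on the current buffer; here it is a copy.
    """
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    n_chunks = (len(main_memory) + chunk - 1) // chunk
    buffers: List[List[int]] = [[], []]
    out: List[int] = []
    if n_chunks == 0:
        return out
    buffers[0] = list(main_memory[:chunk])            # prefetch the first chunk
    for i in range(n_chunks):
        current, idle = i % 2, (i + 1) % 2
        if i + 1 < n_chunks:                          # start fetching the next chunk
            start = (i + 1) * chunk
            buffers[idle] = list(main_memory[start:start + chunk])
        out.extend(kernel(buffers[current]))          # compute on the current chunk
    return out

print(buffer_budget(64 * 1024))  # 98304 bytes per buffer
print(process_double_buffered(list(range(10)), 4, lambda xs: [2 * x for x in xs]))
```

With 64 KB reserved (an arbitrary example figure), each of the two buffers gets 96 KB; the alternating `current`/`idle` indices are what let transfer and compute overlap on real hardware.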

Other Features
• The I/O interface supports a direct connection to a second Cell, making it straightforward to build a cache-coherent multiprocessor.
• Native binary compatibility with Power ISA applications.
• Modular design, yet fully custom.
• Extensive test and monitoring circuitry.

Programming
• Challenges:
  • The SPE Local Store is software managed.
  • Each SPE supports one thread context, and context switches are expensive.
• Models:
  • Function Offload: the PPE calls a function that runs on an SPE.
  • Device Extension: an SPE is isolated and used like a device.
  • Compute Acceleration: SPEs compute in parallel and the PPE aggregates their results.
  • Streaming: each SPE is one stage in a software pipeline.
  • Shared-Memory Multiprocessor: conventional shared-memory programming.
  • Asymmetric Thread Runtime: Pthreads-style threads.
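
The Streaming model can be sketched in a few lines: each stage runs in its own thread (standing in for one SPE, so up to eight stages on Cell), FIFO queues stand in for the transfers between neighbouring SPEs, and the calling thread plays the PPE that feeds the first stage and drains the last. This is a structural illustration only; `run_pipeline` and `_DONE` are hypothetical names, and Python threads model the dataflow, not Cell's parallel speedup.

```python
import queue
import threading
from typing import Callable, Iterable, List

_DONE = object()  # sentinel that marks the end of the stream

def run_pipeline(items: Iterable[int], stages: List[Callable[[int], int]]) -> List[int]:
    """Streaming-model sketch: one thread per stage, linked by FIFO queues."""
    links = [queue.Queue() for _ in range(len(stages) + 1)]  # unbounded, so no deadlock

    def stage_worker(fn: Callable[[int], int],
                     inbox: queue.Queue, outbox: queue.Queue) -> None:
        while True:
            item = inbox.get()
            if item is _DONE:            # forward the sentinel downstream and stop
                outbox.put(_DONE)
                return
            outbox.put(fn(item))

    workers = [threading.Thread(target=stage_worker, args=(fn, links[i], links[i + 1]))
               for i, fn in enumerate(stages)]
    for w in workers:
        w.start()

    # The calling thread plays the PPE: feed the first stage, then drain the last.
    for item in items:
        links[0].put(item)
    links[0].put(_DONE)

    results: List[int] = []
    while (item := links[-1].get()) is not _DONE:
        results.append(item)
    for w in workers:
        w.join()
    return results

print(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 10]))  # [10, 20, 30, 40, 50]
```

Because every stage is a single FIFO consumer, output order matches input order. A stage that raises would stall the pipeline in this sketch; real SPE code would need its own error signalling.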

Good:
• The paper is easy to follow and does not overwhelm the reader with complex detail.
• Built and shipped on time by a joint venture of IBM, Sony, and Toshiba.
• Many applications in media and supercomputing.

Bad:
• The authors repeatedly present static limitations imposed by their programming models as advantages, such as explicitly managed caches.
• No hard performance data or comparison with competitors; only “anecdotal evidence” shows that Cell can be fully utilized.

Conclusion / Questions
• Keywords:
  • Heterogeneous multi-core SIMD processor.
  • Single address space across all cores on the chip.
  • 1x conventional PPE for control.
  • 8x SPEs for streaming SIMD: very fast and power efficient when fully utilized.
  • Several programming models are feasible.
• Questions:
  • How could the programming models be made easier?
  • What direction should this architecture grow in?
