Memory is the Network
Krste Asanovic
The Parallel Computing Laboratory, EECS Department, UC Berkeley
NoCS Panel, May 13, 2009
One Slide Version of Talk
• No one can afford to build application-specific chips, at least not in feature sizes that warrant a NoC
• All future chips are programmable and parallel: multicore, manycore, GPU, FPGA
• Usable programmable parallel systems are bottlenecked by the memory system, in both performance and energy efficiency
• The network is just a path to memory (on-chip and off-chip): work on the entire problem, not just the cabling
• Change the name to “International Symposium on Memory Systems”
(Large-scale networks between chips/boards/racks are still very interesting to think of as networks, but that’s different.)
[Figure: two panels, “Application-Specific” and “Programmable”, each showing an application and a chip]
Successful Parallel Programming Models
[Diagrams: one per model; the recoverable labels are “barrier”, “fork”, and “join”]

Actor Networks
• Producer-consumer easy
• Mutual exclusion easy
• No implicitly shared state
• Sharing state cumbersome
• Irregular computation hard
• Examples: Occam, Simulink, StreamIt, Clik, …

Data-Parallel/SPMD
• Producer-consumer easy
• Handled en masse
• Mutual exclusion easy
• Sharing state easy
• Irregular computation hard
• Examples: APL, NESL, Matlab, HPF, OpenMP, UPC, CAF, …

Shared-Memory Dynamic Threads/Transactions
• Producer-consumer hard
• Mutual exclusion hard
• Transactional mem. helps
• Sharing state easy
• Maybe too easy
• Examples: Pthreads, Cilk, Java, …
Memory is the Network-on-Chip from Software’s View
• Actors: messages are buffered in memory-resident channels until it is convenient to run the actor
• Data-Parallel: memory holds the arrays used to interchange data between parallel phases
• Transactional: memory holds a shared database accessed atomically
• Programming with data in flight on wires is too brittle for any large code (sorry Anant); need flexibility in when and where code gets executed
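The actor case above can be illustrated with a small sketch. This is not from the talk: it is a minimal Python model, assuming that `queue.Queue` stands in for a memory-resident channel and that threads stand in for actors. The names `producer`, `squarer`, `run_pipeline`, and `SENTINEL` are hypothetical. The point it tries to show is that messages sit in memory until the consuming actor is scheduled, so the producer and consumer need not run in lockstep.

```python
import queue
import threading

# Hypothetical end-of-stream marker (an assumption of this sketch).
SENTINEL = object()


def producer(channel, items):
    """Actor that writes each item into a memory-resident channel."""
    for item in items:
        channel.put(item)
    channel.put(SENTINEL)


def squarer(inbox, outbox):
    """Actor that squares each message whenever it gets to run."""
    while True:
        msg = inbox.get()  # blocks until a message is buffered
        if msg is SENTINEL:
            outbox.put(SENTINEL)
            return
        outbox.put(msg * msg)


def run_pipeline(items):
    """Connect two actors through buffered channels and collect the output."""
    first = queue.Queue()   # channel: producer -> squarer
    second = queue.Queue()  # channel: squarer -> caller
    actors = [
        threading.Thread(target=producer, args=(first, items)),
        threading.Thread(target=squarer, args=(first, second)),
    ]
    for actor in actors:
        actor.start()

    results = []
    while True:
        msg = second.get()
        if msg is SENTINEL:
            break
        results.append(msg)

    for actor in actors:
        actor.join()
    return results
```

Calling `run_pipeline([1, 2, 3, 4])` drains the second channel in FIFO order. Neither actor knows when or where the other runs, which is the flexibility the slide argues for.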
Fixed-function accelerators
• Any programmable chip will have a stack of fixed-function accelerators: crypto, codecs, radios, graphics
• But these won’t use a NoC internally, just place and route
• They’ll connect to the general-purpose portion through memory, for all the reasons given before
Research Directions
• Make memory a better communication channel
  • Richer software interface
  • Better synchronization primitives
    • E.g., atomic message enqueue/dequeue for actor channels
    • Atomic fetch-and-op for data-parallel apps
    • Transactional memory for concurrent apps
  • Better cache-coherence protocols
• Make memory go faster and with lower power
  • New device technologies (e.g., photonics)
  • New microarchitectures and network ideas
  • Must consider on-chip and off-chip to main memory at the same time
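To make the "atomic fetch-and-op for data-parallel apps" item concrete, here is a minimal Python sketch, not taken from the talk. It assumes a lock-guarded counter as a software stand-in for a hardware fetch-and-add, and uses it so that worker threads claim loop iterations dynamically. `AtomicCounter`, `fetch_and_add`, and `parallel_square` are hypothetical names chosen for illustration.

```python
import threading


class AtomicCounter:
    """Software stand-in for a hardware atomic fetch-and-add (an assumption)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, delta):
        """Atomically add delta and return the value held before the add."""
        with self._lock:
            old = self._value
            self._value += delta
            return old


def parallel_square(data, num_workers=4):
    """Square every element; workers claim indices with fetch-and-add."""
    out = [None] * len(data)
    next_index = AtomicCounter()

    def worker():
        while True:
            i = next_index.fetch_and_add(1)  # claim one unique index
            if i >= len(data):
                return
            out[i] = data[i] * data[i]

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return out
```

Because each call to `fetch_and_add` returns a distinct index, no two workers write the same output slot, and no other coordination is needed. A hardware primitive would presumably provide the same guarantee without the lock.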
[Figure: Cray X1, drawn as rows of 16 blocks labeled P, $, mem, and M, plus two IO blocks]