Intel Single Chip Cloud Computer (SCC) – An Overview by Karthik.V.M.
Motivations for SCC [1] • Many-core processor research • High-performance, power-efficient fabric • Fine-grain power management • Message-based programming support • Parallel programming research • Better support for scale-out model servers • OS, communication architecture • Scale-out programming model for client • Programming languages, runtimes Courtesy: Intel
SCC Feature Set • First silicon with 48 IA cores on a single die • Power envelope 125 W, cores @ 1 GHz, mesh @ 2 GHz • Message-passing architecture • No coherent shared memory • Proof of concept for a scalable many-core solution • Next-generation 2D mesh interconnect • Bisection bandwidth 1.5 Tb/s to 2 Tb/s, average power 6 W to 12 W • Fine-grain dynamic power management Courtesy: Intel
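The bisection bandwidth figure can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the mesh dimensions (6x4 tiles), the 16-byte link width, and the convention of counting both directions are assumptions that do not appear on the slide, and the function name is hypothetical.

```python
# Back-of-the-envelope bisection bandwidth for a 2D mesh.
# ASSUMPTIONS (not stated on the slide): 6x4 tile mesh, 16-byte links,
# one transfer per link per mesh clock, both directions counted.

def bisection_bandwidth_tbps(cols, rows, link_bytes, mesh_ghz):
    """Bandwidth in Tb/s across the narrowest cut that halves the mesh."""
    # Cutting across the longer dimension severs one link per row/column
    # of the shorter dimension.
    links_cut = min(cols, rows)
    directions = 2
    gbits_per_s = links_cut * directions * link_bytes * 8 * mesh_ghz
    return gbits_per_s / 1000.0


if __name__ == "__main__":
    print(f"{bisection_bandwidth_tbps(6, 4, 16, 2.0):.3f} Tb/s at a 2 GHz mesh clock")
    print(f"{bisection_bandwidth_tbps(6, 4, 16, 1.5):.3f} Tb/s at a 1.5 GHz mesh clock")
```

Under these assumptions a 2 GHz mesh clock gives roughly 2 Tb/s, and the lower end of the quoted range would be consistent with a mesh clock near 1.5 GHz.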
SCC system overview Courtesy: Intel
Die Architecture Courtesy: Intel
Voltage and Frequency islands Courtesy: Intel
Package and Test Board Courtesy: Intel
Core & Router Fmax Courtesy: Intel
SCC Platform Board Overview Courtesy: Intel
SCC Software • SCC-customized Linux • Cross compilers for the Pentium processor available for C++ and Fortran • Cross-compiled MPI2, including the iTAC trace analyzer, available • C++ programming framework “baremetal C” available for creating bare-metal apps, OS, etc. • Management Console PC software • sccGui Courtesy: Intel
Programmer's view of SCC Courtesy: Intel
RCCE [2][3] – A small library for many-core communication • Compact, lightweight communication • Research vehicle to see how message-passing APIs map to many cores • One can work close to the hardware (e.g., manipulate the message passing buffer, MPB) • The same program executes on all cores • Has MPI-style APIs and power management APIs • Two-level API: gory and non-gory • RCCE emulator Courtesy: Intel
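The "same program on all cores" model with MPI-style send and receive can be sketched as follows. RCCE itself is a C library and its calls are not reproduced here; this is a minimal Python simulation in which threads stand in for cores and queues stand in for the message passing buffers. All names (`NUM_UES`, `send`, `recv`, `app`, `run`) are hypothetical, and unlike a synchronous hardware transfer the `send` here does not wait for a matching receive.

```python
import queue
import threading

NUM_UES = 4  # hypothetical number of simulated cores; the SCC has 48

# One queue per (sender, receiver) pair stands in for the on-chip
# message passing buffers. This is an illustration, not the RCCE API.
channels = {(s, r): queue.Queue() for s in range(NUM_UES) for r in range(NUM_UES)}


def send(data, me, dest):
    """Deliver data from core `me` to core `dest`."""
    channels[(me, dest)].put(data)


def recv(me, source):
    """Block until a message from core `source` arrives at core `me`."""
    return channels[(source, me)].get()


def app(me, results):
    """Same program body on every core; behaviour branches on the core id."""
    if me == 0:
        # Core 0 gathers one value from every other core and sums them.
        results[me] = sum(recv(me, src) for src in range(1, NUM_UES))
    else:
        # Every other core sends the square of its own id to core 0.
        send(me * me, me, 0)


def run():
    results = [None] * NUM_UES
    threads = [threading.Thread(target=app, args=(ue, results)) for ue in range(NUM_UES)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run()[0])  # 1 + 4 + 9 = 14
```

Every thread runs the identical `app` function, and only the core id decides whether it sends or gathers, which is the essence of the single-program style described on the slide.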
Software Managed Cache Coherence • Implementing hardware-managed cache coherence is difficult • Limited power budget • High complexity and validation effort • Software-managed coherence • Scales with number of cores • Multiple apps running in separate coherency domains • Dynamically reconfigurable coherency domains • Most apps are RO-shared, few RW-shared Courtesy: Intel
Software Managed Cache Coherence (cont) • Shared virtual memory can be used to support coherency (like DSM) • Coherency is maintained by regions being owned exclusively • A region can then be handed over to another core for exclusive operation • Some regions are jointly accessible • No coherence traffic until ownership is changed • Consistency guaranteed only at release/acquire points Courtesy: Intel
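The ownership scheme described on this slide can be illustrated with a small model. This is a sketch under stated assumptions, not Intel's implementation: the classes `SharedRegion` and `Core` and the function `demo` are hypothetical, a Python list stands in for shared memory, and jointly accessible regions are not modelled.

```python
class SharedRegion:
    """A region of shared memory with at most one exclusive owner."""

    def __init__(self, size):
        self.memory = [0] * size  # stands in for shared off-chip memory
        self.owner = None         # id of the owning core, or None


class Core:
    """A core that works on private copies of the regions it owns."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.cached = {}  # id(region) -> private working copy

    def acquire(self, region):
        """Take exclusive ownership and refresh the private copy."""
        if region.owner is not None:
            raise RuntimeError("region is owned by another core")
        region.owner = self.core_id
        self.cached[id(region)] = list(region.memory)

    def write(self, region, index, value):
        """Update the private copy only; nothing is propagated yet."""
        if region.owner != self.core_id:
            raise PermissionError("write without ownership")
        self.cached[id(region)][index] = value

    def read(self, region, index):
        if region.owner != self.core_id:
            raise PermissionError("read without ownership")
        return self.cached[id(region)][index]

    def release(self, region):
        """Write the private copy back and give up ownership."""
        if region.owner != self.core_id:
            raise PermissionError("release without ownership")
        region.memory[:] = self.cached.pop(id(region))
        region.owner = None


def demo():
    region = SharedRegion(4)
    a, b = Core(0), Core(1)
    a.acquire(region)
    a.write(region, 0, 42)
    seen_before_release = region.memory[0]  # update is still private to core 0
    a.release(region)                       # hand-over point: data becomes visible
    b.acquire(region)
    seen_after_handover = b.read(region, 0)
    b.release(region)
    return seen_before_release, seen_after_handover


if __name__ == "__main__":
    print(demo())  # (0, 42)
```

The write by core 0 stays in its private copy until `release`, so shared memory sees no traffic while ownership is unchanged, and core 1 is only guaranteed to observe the new value after its own `acquire`.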
Separated Coherency Domains Courtesy: Intel
Multiple SCC Chips – Wider Coherency Courtesy: Intel
References [1] J. Howard et al., “A 48-core IA-32 message-passing processor with DVFS in 45nm CMOS,” in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2010 IEEE International, 7-11 Feb. 2010, pp. 108–109. [2] T. G. Mattson and R. F. Van der Wijngaart, “RCCE: a small library for many-core communication,” Intel Corporation, Tech. Rep., May 2010. [3] T. G. Mattson, M. Riepen, T. Lehnig, P. Brett, W. Haas, P. Kennedy, J. Howard, S. Vangal, N. Borkar, G. Ruhl, and S. Dighe, “The 48-core SCC processor: the programmer’s view,” in Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC ’10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 1–11. [Online]. Available: http://dx.doi.org/10.1109/SC.2010.53