“Single-chip Cloud Computer”: An experimental many-core processor from Intel Labs. Xiaocheng Zhou, Intel Labs China
What is Tera-scale? TIPS of compute power operating on terabytes of data. [Figure: performance (KIPS, MIPS, GIPS, TIPS) versus dataset size (kilobytes, megabytes, gigabytes, terabytes). Labeled workloads range from text and multimedia through 3D & video to RMS applications (entertainment, learning & travel, personal media creation and management, health), with eras marked single-core, multi-core, and tera-scale. Source: Electronic Visualization Lab, University of Illinois.] http://techresearch.intel.com/articles/Tera-Scale/1421.htm
Performance Scaling Challenges • Energy efficiency • Design complexity • Programming strategy • Emerging applications
Cloud Computing Today • Cloud datacenters: 1000s of networked computers; millions of threads & petabytes of data • Opportunity: lower power, higher density via integration; greater efficiency and better programmability • Example: Intel’s Open Cirrus testbed at Intel Labs Pittsburgh • Future: many-core processor?
Single-chip Cloud Computer (SCC) • Experimental many-core CPU on 45 nm Hi-K metal-gate silicon • 48 IA-compatible cores • Network of 2-core nodes mimics cloud computing at chip level • Fine-grained power management scales power from 25 W to 125 W • Supports proven, highly parallel “scale-out” programming models
Inside the SCC • 24 tiles, 24 routers, 48 IA cores • Dual-core tile: Core 1 and Core 2, each with its own L2 cache, plus a router and a message buffer • 2D mesh network • 4 integrated DDR3 memory controllers (64GB addressable) [Figure: the tiles connected by a 2D mesh of routers (R), with memory controllers (MC) attached to the mesh.]
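To make the tile layout concrete, here is a small Python sketch that maps a core number to its tile coordinates in the 6x4 mesh. The numbering scheme (cores 0 to 47, two consecutive cores per tile, tiles numbered row-major from the x=0, y=0 corner) is an assumption for illustration, and `core_location` is a hypothetical helper name.

```python
MESH_X, MESH_Y = 6, 4      # 6x4 mesh of tiles
CORES_PER_TILE = 2         # dual-core tiles
NUM_CORES = MESH_X * MESH_Y * CORES_PER_TILE  # 48


def core_location(core_id: int) -> tuple[int, int, int]:
    """Return (x, y, local_core) for a core.

    Assumes consecutive core IDs share a tile and tiles are numbered
    row-major; this numbering is illustrative, not taken from the slides.
    """
    if not 0 <= core_id < NUM_CORES:
        raise ValueError(f"core_id must be in [0, {NUM_CORES}), got {core_id}")
    tile = core_id // CORES_PER_TILE
    return tile % MESH_X, tile // MESH_X, core_id % CORES_PER_TILE


# Example: under this numbering the last core sits on the far corner tile.
print(core_location(47))  # (5, 3, 1)
```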
On-die Interconnect • Architecture: 6x4 2D mesh NoC; 16B-wide data links + 2B sideband; 8 virtual channels in 2 classes; fixed (X-Y) routing • Performance: target frequency 2GHz @ 1.1V; link bandwidth 64GB/s; 4-cycle latency • Power management: independent frequency & voltage control; sleep mode, clock gating, low-power RF
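Fixed X-Y routing is simple enough to sketch in Python. The code computes the router path between two tiles on the 6x4 mesh and a rough zero-load latency. Reading the slide's "4 cycle latency" as a per-hop cost is an assumption, and `xy_route` and `zero_load_latency` are hypothetical helper names.

```python
def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Dimension-ordered (X-Y) routing: travel fully along X, then along Y.

    Returns the list of routers visited, including source and destination.
    """
    x, y = src
    path = [(x, y)]
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step_x
        path.append((x, y))
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step_y
        path.append((x, y))
    return path


CYCLES_PER_HOP = 4  # assumption: the slide's "4 cycle latency" applied per router hop


def zero_load_latency(src: tuple[int, int], dst: tuple[int, int]) -> int:
    """Rough zero-load latency in mesh cycles (ignores serialization and contention)."""
    hops = len(xy_route(src, dst)) - 1
    return hops * CYCLES_PER_HOP


print(xy_route((0, 0), (2, 1)))           # [(0, 0), (1, 0), (2, 0), (2, 1)]
print(zero_load_latency((0, 0), (5, 3)))  # 8 hops * 4 cycles = 32
```

Because every packet finishes its X movement before turning into Y, no cyclic dependency among channels can form on a mesh, so fixed X-Y routing is deadlock-free; the trade-off is that it cannot steer around a congested link.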
Memory Architecture • Memory: up to 64GB DDR3 via 4 memory controllers @ 21.3GB/s; 16KB SRAM in each tile as Message Passing Buffer (MPB) • Caching: 32KB L1 per core (16KB I + 16KB D), 12MB L2 cache (256KB/core); no HW cache-coherent shared memory • Addressing: core physical to system physical addresses mapped in 16MB sections; memory-mapped configuration & control registers
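The per-core and per-tile figures can be cross-checked with a little arithmetic. This Python snippet only multiplies the slide's numbers; the even split of each tile's MPB between its two cores is an assumption, and `on_die_capacity` is a hypothetical name.

```python
CORES, TILES = 48, 24
L1_KB_PER_CORE = 16 + 16   # 16KB instruction + 16KB data
L2_KB_PER_CORE = 256
MPB_KB_PER_TILE = 16


def on_die_capacity() -> dict[str, float]:
    """Tally the on-die memory figures quoted on the slide."""
    return {
        "L1_total_KB": CORES * L1_KB_PER_CORE,         # 1536 KB
        "L2_total_MB": CORES * L2_KB_PER_CORE / 1024,  # 12.0 MB, matching the slide
        "MPB_total_KB": TILES * MPB_KB_PER_TILE,       # 384 KB of on-die message buffers
        "MPB_KB_per_core": MPB_KB_PER_TILE / 2,        # 8.0 KB if split evenly (assumption)
    }


print(on_die_capacity()["L2_total_MB"])  # 12.0
```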
Address Translation: From Core Address to System Address [Figure: each core physical address space is mapped through a lookup table (LUT), a physical-to-physical mapping, into the shared system physical address space.]
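A minimal Python model of the lookup, assuming 32-bit core physical addresses, so the top 8 bits index a 256-entry table of 16MB sections and the low 24 bits pass through as the offset. The `LutEntry` fields and the `translate` helper are hypothetical simplifications; the actual SCC LUT entry format is not described on this slide.

```python
from dataclasses import dataclass
from typing import Optional

SECTION_BITS = 24                       # 16MB sections: 2**24 bytes
LUT_ENTRIES = 1 << (32 - SECTION_BITS)  # 256 entries cover a 32-bit core address space
OFFSET_MASK = (1 << SECTION_BITS) - 1


@dataclass(frozen=True)
class LutEntry:
    # Hypothetical fields: real SCC LUT entries carry routing and mode bits not modeled here.
    target: str        # e.g. "MC0" or "MPB(2,1)"
    system_base: int   # system physical address of the 16MB section


def translate(lut: list[Optional[LutEntry]], core_addr: int) -> tuple[str, int]:
    """Map a 32-bit core physical address to (target, system physical address)."""
    if not 0 <= core_addr < 1 << 32:
        raise ValueError("core address must fit in 32 bits")
    entry = lut[core_addr >> SECTION_BITS]  # upper 8 bits select the LUT entry
    if entry is None:
        raise LookupError(f"no mapping for section {core_addr >> SECTION_BITS:#x}")
    return entry.target, entry.system_base + (core_addr & OFFSET_MASK)


# One core's table: core section 0x01 maps to system section 5 behind MC0,
# and core section 0xFF maps near the top of a 64GB (36-bit) system space.
lut: list[Optional[LutEntry]] = [None] * LUT_ENTRIES
lut[0x01] = LutEntry("MC0", 5 << SECTION_BITS)
lut[0xFF] = LutEntry("MC3", 0xF_0000_0000)
print(hex(translate(lut, 0x0100_1234)[1]))  # 0x5001234
```

Because each core has its own table, two cores can map the same system section (shared data) or different ones (private memory) without any hardware coherence.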
Message Passing on SCC • Regions of memory mapped to multiple cores: Message Passing Buffer (MPB) for small, fast messages; larger buffers in off-die memory • Message Passing Data Type (MPDT): reads/writes bypass the L2 cache and are tagged in L1 as MPDT; a new instruction selectively invalidates MPDT lines • Reads/writes to another core’s MPB stay on-die • Synchronization through special atomic register bits • Core-to-core asynchronous interrupts • High-level API for applications, “RCCE”: one-sided communication (Get, Put) plus Send/Recv; MPB allocation, synchronization
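One way a blocking send/receive can be layered on one-sided MPB access is sketched below in Python, with threads standing in for cores. Assumptions: each core owns an 8KB slice of its tile's 16KB MPB, the sender puts each chunk into its own MPB and the receiver gets it from there, and two semaphores stand in for the synchronization flags. The names `Core`, `send`, `recv`, and `transfer` are hypothetical and do not mirror RCCE's actual signatures.

```python
import threading

MPB_BYTES_PER_CORE = 8 * 1024  # assumption: a tile's 16KB MPB split evenly between its two cores


class Core:
    """A core's MPB slice plus the two flags used for a send/recv handshake."""

    def __init__(self, core_id: int) -> None:
        self.core_id = core_id
        self.mpb = bytearray(MPB_BYTES_PER_CORE)
        self.sent = threading.Semaphore(0)  # set by the sender: "a chunk is in my MPB"
        self.ack = threading.Semaphore(0)   # set by the receiver: "chunk copied out, MPB reusable"


def send(sender: Core, data: bytes) -> None:
    """Blocking send: stream data through the sender's MPB one chunk at a time."""
    for off in range(0, len(data), MPB_BYTES_PER_CORE):
        chunk = data[off:off + MPB_BYTES_PER_CORE]
        sender.mpb[:len(chunk)] = chunk  # one-sided put into the local MPB
        sender.sent.release()            # raise the "sent" flag
        sender.ack.acquire()             # wait until the receiver has drained the chunk


def recv(sender: Core, nbytes: int) -> bytes:
    """Blocking receive of nbytes from `sender`, pulling chunks from its MPB."""
    out = bytearray()
    while len(out) < nbytes:
        sender.sent.acquire()                           # wait for the "sent" flag
        n = min(MPB_BYTES_PER_CORE, nbytes - len(out))
        out += sender.mpb[:n]                           # one-sided get from the remote MPB
        sender.ack.release()                            # let the sender overwrite its MPB
    return bytes(out)


def transfer(data: bytes) -> bytes:
    """Run one send/recv pair on two threads and return what the receiver saw."""
    core0 = Core(0)
    received: list[bytes] = []
    receiver = threading.Thread(target=lambda: received.append(recv(core0, len(data))))
    receiver.start()
    send(core0, data)
    receiver.join()
    return received[0]


payload = bytes(range(256)) * 100  # 25,600 bytes: four MPB-sized chunks
print(transfer(payload) == payload)  # True
```

Because the same MPB slice is reused for every chunk, the `ack` flag is what keeps the sender from overwriting data the receiver has not yet copied out.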
Improving Energy Efficiency: Fine-grain, software-controlled power management • 8 voltage and 28 frequency islands • Each tile can run at a different frequency • 6 banks of four tiles can run at different voltages • Also independent V&F control for I/O network & MCs [Figure: the 24 tiles and their routers (R) grouped into six voltage domains (V1 to V6) of four tiles each, with per-tile frequency labels (Fn) and four memory controllers.]
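The tile-to-voltage-domain grouping and the payoff of lowering voltage can be sketched in Python. Two assumptions: the six voltage domains are 2x2 blocks of tiles numbered row-major (the slide only says six banks of four tiles), and dynamic power follows the standard CMOS approximation P ∝ V²·f, ignoring leakage. The function names are hypothetical, and the numbers in the example are a numeric illustration rather than SCC operating points.

```python
from collections import Counter

MESH_X, MESH_Y = 6, 4  # tile grid


def voltage_domain(x: int, y: int) -> int:
    """Index (0..5) of the assumed 2x2-tile voltage domain containing tile (x, y)."""
    if not (0 <= x < MESH_X and 0 <= y < MESH_Y):
        raise ValueError("tile outside the 6x4 mesh")
    return (y // 2) * (MESH_X // 2) + (x // 2)


def relative_dynamic_power(v: float, f: float, v_ref: float, f_ref: float) -> float:
    """Dynamic power relative to a reference point, using P ~ C * V^2 * f."""
    return (v / v_ref) ** 2 * (f / f_ref)


# Every domain should contain exactly four tiles.
sizes = Counter(voltage_domain(x, y) for x in range(MESH_X) for y in range(MESH_Y))
print(sorted(sizes.items()))  # [(0, 4), (1, 4), (2, 4), (3, 4), (4, 4), (5, 4)]

# Halving both voltage and frequency cuts dynamic power to one eighth.
print(relative_dynamic_power(0.5, 1.0, 1.0, 2.0))  # 0.125
```

The quadratic voltage term is why voltage islands matter: frequency alone scales dynamic power linearly, while lowering voltage (which also requires a lower frequency) compounds the savings.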
SCC “Chipset” • System Interface FPGA: connects to the SCC mesh interconnect; I/O capabilities like PCIe, Ethernet & SATA; bitstream loaded by the BMC • Board Management Controller (BMC): JTAG interface for clocking, power, etc.; USB stick holds the FPGA bitstream; network interface for user interaction via Telnet; status monitoring
Software Environment • SCC software: customized Linux; bare metal; RCCE communication & power management • Tools: selected Intel tools (e.g., icc, ifort, ...); Microsoft Research release of SCC extensions to Visual Studio • Management Console PC software: PCIe driver with integrated TCP/IP driver; programming API for communication with the SCC platform; GUI for interaction with the SCC platform; command-line tools for interaction with the SCC platform
RCCE Communication API • A compact, lightweight communication environment • SCC and RCCE were designed together, side by side: a true HW/SW co-design project • A research vehicle to understand how message-passing APIs map onto many-core chips • For experienced parallel programmers willing to work close to the hardware • Static SPMD execution model: identical UEs (units of execution) are created together when a program starts (a standard approach familiar to message-passing programmers)
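The static SPMD model can be illustrated with Python threads standing in for UEs: all UEs are created together at start-up, each runs the same function, and each learns its own rank and the total UE count. `spmd_run` and `partial_sum` are hypothetical stand-ins, not RCCE's API, and `threading.Barrier` plays the role of a collective barrier.

```python
import threading
from typing import Callable

NUM_UES = 48  # assumption: one UE per core on a fully booted SCC


def spmd_run(kernel: Callable[[int, int, threading.Barrier], int],
             num_ues: int = NUM_UES) -> list[int]:
    """Start num_ues identical UEs at once (static SPMD) and collect results by rank."""
    results = [0] * num_ues
    barrier = threading.Barrier(num_ues)

    def ue_main(rank: int) -> None:
        results[rank] = kernel(rank, num_ues, barrier)

    ues = [threading.Thread(target=ue_main, args=(rank,)) for rank in range(num_ues)]
    for ue in ues:
        ue.start()
    for ue in ues:
        ue.join()
    return results


def partial_sum(rank: int, num_ues: int, barrier: threading.Barrier, n: int = 1000) -> int:
    """Each UE sums a strided share of 0..n-1, then waits for all UEs at a barrier."""
    local = sum(range(rank, n, num_ues))
    barrier.wait()
    return local


print(sum(spmd_run(partial_sum)))  # 499500, same as sum(range(1000))
```

The UE count is fixed before any work starts; nothing in the model spawns UEs later, which is the "static" part of the execution model.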
RCCE Power Management API • RCCE power management emphasizes safe control: V/GHz are changed together within each 4-tile (8-core) power domain • A master core sets V + GHz for all cores in its domain • RCCE_istep_power(): steps V + GHz up or down, where GHz is the maximum for the selected voltage • RCCE_wait_power(): returns when the power change is done • RCCE_step_frequency(): steps only GHz up or down • Power management latencies: V changes have very high latency, O(million) cycles; GHz changes have low latency, O(few) cycles
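The coupling rules for voltage and frequency can be captured in a small Python state model of one power domain. The voltage levels, the frequency table, and the 1600 MHz reference clock divided by an integer divider are hypothetical values chosen for illustration, not SCC specifications. `PowerDomain` and its methods mimic the behavior described for `RCCE_istep_power()` and `RCCE_step_frequency()` but not their real signatures, and the asynchronous wait of `RCCE_wait_power()` is not modeled.

```python
from dataclasses import dataclass

REF_CLOCK_MHZ = 1600  # hypothetical reference clock; tile clock = REF_CLOCK_MHZ / divider
MAX_DIVIDER = 16      # hypothetical slowest setting
# Hypothetical table: (voltage in V, smallest divider, i.e. fastest clock, safe at that voltage)
VOLTAGE_LEVELS = [(0.7, 8), (0.8, 4), (0.9, 3), (1.1, 2)]


@dataclass
class PowerDomain:
    """One 4-tile (8-core) power domain, controlled by its master core."""
    level: int = 0     # index into VOLTAGE_LEVELS
    divider: int = 8   # starts at the fastest clock allowed at level 0

    @property
    def voltage(self) -> float:
        return VOLTAGE_LEVELS[self.level][0]

    @property
    def freq_mhz(self) -> float:
        return REF_CLOCK_MHZ / self.divider

    def istep_power(self, steps: int) -> None:
        """Step voltage by `steps` levels; frequency jumps to the maximum for the new voltage."""
        self.level = max(0, min(len(VOLTAGE_LEVELS) - 1, self.level + steps))
        self.divider = VOLTAGE_LEVELS[self.level][1]

    def step_frequency(self, steps: int) -> None:
        """Step frequency only (positive = faster), never above the current voltage's limit."""
        fastest = VOLTAGE_LEVELS[self.level][1]
        self.divider = max(fastest, min(MAX_DIVIDER, self.divider - steps))


domain = PowerDomain()
domain.istep_power(+1)     # 0.8 V at 400 MHz
domain.step_frequency(+1)  # already at the limit for 0.8 V: stays at 400 MHz
domain.step_frequency(-2)  # slows to 1600 / 6 MHz without touching voltage
print(domain.voltage, domain.divider)  # 0.8 6
```

Stepping voltage always re-selects the fastest safe frequency, while frequency steps are clamped so a domain never runs faster than its current voltage allows. Given the latencies above, a program would adjust frequency often and voltage rarely.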
sccGui for debugging • Modify config registers • Read system memory
sccBoot & sccReset • sccBoot: a command-line tool that boots Linux on selected cores and checks their status (“which cores are currently booted”) • sccReset: a command-line tool that resets selected SCC cores
sccKonsole • A regular Konsole with automatic login to selected cores • Enables broadcasting among shells
MARC - Many-core Application Research Community • Worldwide research partnership program with academia & industry • Providing access to SCC for many-core programming research • Overwhelming interest - ~200 research proposals received • SCC datacenter is online - Community website up and running • http://communities.intel.com/community/marc