Intelligent Interconnects: Key to Reducing the Challenges of Complex Heterogeneous Multiprocessor Design
Jeff Haight, Sonics, Inc., Mountain View, CA, USA, jhaight@sonicsinc.com
DATE 2006
SoC Architecture Trends
[Figure: System On Chip block diagram with CPU, DSP, MPEG, 3D GFX, Video I/O, Comm I/O, MAC, and DRAM Controller]
• Massive feature integration
  • Driven largely by Moore’s Law (supply) and convergence (demand)
  • Continued movement of complexity to software
• Distributed architectures
  • Higher scalability (and independence?)
• Multiple heterogeneous processors
  • CPU
  • DSP
  • Special purpose (MPEG, packet, …)
• Distributed DMA
  • Removes centralized DMA bottleneck
  • Simplifies driver software integration
Multicore SoC Architecture Options
• Multi-processor cluster
  • Many OSes know how to schedule, good for computing
  • Hard to scale past about 4 CPUs, bad for real-time
• Uniform distributed computing fabric
  • Highly scalable, locally efficient
  • Hard to program/schedule, uniformity hurts QoR
• Distributed heterogeneous processing subsystems
  • Per-subsystem mix of hardwired and programmable
  • Highest scalability, high QoR
  • Divide-and-conquer programming model
  • Can deploy cluster and/or fabric approaches in subsystems
Evolution to Multiprocessor Has Caused Economic Shift To Outsourcing Internal Interconnect Design
[Figure: Complexity of Intelligent Services (Low to High) plotted against Development Time. Single Processor Model labels: Choose Processor, Bus Development, S/W, Trial Chip, Performance. New Data Flow Centric Model labels: Choose Processors, Develop Intelligent Interconnect, S/W, Trial Chip, Performance. Callout: New Model Enables Parallel Tasks.]
Global Interconnect Responsibilities
• Routing
  • Getting requests, responses and data to the desired destination
• Access control
  • Managing contention for shared resources (ensuring QoS)
  • Ensuring requested access is allowed (security and protection)
• Error management
  • Detection, reporting, and SW recovery support
• Power management
  • Activity detection, clock and voltage removal support
• Connectivity
  • Protocol conversion
  • Data width / clock frequency conversion
• Spanning distance
  • Connecting endpoints at required frequency and latency
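The routing responsibility above can be pictured as an address-map lookup. The following is only a minimal sketch: the address ranges, the `Region` type, and the `route` function are hypothetical and are not taken from the slides or from any Sonics product.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    base: int    # first address covered by the region
    size: int    # region length in bytes
    target: str  # name of the destination block


# Hypothetical address map, invented for illustration only.
ADDRESS_MAP = [
    Region(0x0000_0000, 0x1000_0000, "DRAM Controller"),
    Region(0x1000_0000, 0x0010_0000, "SRAM"),
    Region(0x2000_0000, 0x0001_0000, "Comm I/O"),
]


def route(addr: int) -> Optional[str]:
    """Return the target whose region contains addr, or None if unmapped."""
    for region in ADDRESS_MAP:
        if region.base <= addr < region.base + region.size:
            return region.target
    # An unmapped address would be handed to error management.
    return None
```

A request to `0x1000_0004` would be steered to the SRAM entry, while an unmapped address returns `None` and would be treated as a bad-address error.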
Data Flow Design Challenges
Typical Bus Style Offerings Address Few of the Real Issues
[Figure: Bus Generator and On-Chip Bus at the center, surrounded by the design challenges: Perf. Verification, Virtual Prototyping, Parallel IP Creation, Arch. Modeling, Design Re-use, SW Development, Variable Clock Freq., Timing Closure, Voltage Isolation, Complex Memory Hierarchies, Power Management, Error Management, Signal Integrity, Access Security, High Peripheral Count, Data Width Conversion, Distributed Processing, Mixed Endianness, Guaranteed BW, QoS, Pipelining, Protocol Conversion]
SMART Interconnect Approach
Addresses the Total Global Interconnect Challenge
[Figure: Methodology & Automation, Scalable Fabrics, and Intelligent Agents at the center, surrounded by the design challenges: Perf. Verification, Virtual Prototyping, Parallel IP Creation, Arch. Modeling, Design Re-use, SW Development, Variable Clock Freq., Timing Closure, Voltage Isolation, Complex Memory Hierarchies, Power Management, Error Management, Signal Integrity, Access Security, High Peripheral Count, Data Width Conversion, Distributed Processing, Mixed Endianness, Guaranteed BW, QoS, Pipelining, Protocol Conversion]
SoC Design Reality….
[Figure: design flow across the phases Architectural Definition, Logic Design & Verification, Physical Design, and Fab, Assy, Test. Task labels: Define Architecture First, Architectural Verification ?!, Acquire outside IP, Develop internal IP, Modify IP Cores, Verify Modified IP Cores, Integrate IP, Create custom Interconnect, Create Control Logic / Arbitration, Create custom Test suites, Fix Hook-up Errors, Synthesize IP, Synthesize Full SOC, Re-synthesize, Place & Route, Timing Extraction]
Sonics Delivers Time-to-Market
[Figure: schedule comparison across the phases Architectural Definition, Logic Design & Verification, Physical Design, and Fab, Assy, Test. Rows: Today (Typical), First Design, Derivative Design(s). Duration labels: 20 Days, 30 Days, 10 Day. Callout: 12 to 18 month time & engineering savings!]
How is this possible?
• Socket-Based Design Methodology
• Highly Configurable Interconnect IP
Tile-based Heterogeneous Multicore SoCs
• Tile: a distributed, largely independent subsystem of an SoC, normally composed of:
  • Processing
  • Memory
  • I/O
• Tile processing can be performed in fixed or programmable logic, or in general-purpose or special-purpose programmable processors
• Key elements of tile-based platforms:
  • Socket-based design
  • Decoupled interconnect architectures
  • Communication constraint capture
  • Firmware packaging
[Figure: example SoC block diagrams; surviving labels: "(from OMAP 5910)" and "IVA"]
Sources: www.ti.com, www.arm.com, www.powervr.com
Multicore Architecture Advantages
[Figure: chart with the callout "What is needed". Source: Avner Goren, TI, EPF 2004]
An Intelligent Interconnect Company
[Figure: "SoCs Circa 2005" block diagram. Cores (Core 1, Core 2, … Core N, µP, DSP, I/O, Memory, AHB cores) attach to the Intelligent Internal Interconnect (Sonics SMART Interconnects). Callouts: AXI for Seamless Connections; AHB Legacy Support; APB Legacy Support; OCP Maximizes Flexibility; Sonics Adds Intelligent Data Flow Services.]
SonicsMX Basic Architecture
• Hybrid topologies
  • Full / partial cross-bar
  • Shared bus
• Fully split (dual) request / response
• Pipelined, multi-threaded, non-blocking fabric
• Distributed QoS arbiter
• Spans cycle, frequency, and data width boundaries
• Supports flexible thread merging tree topologies
[Figure: two SMX fabrics connecting CPU, DSP, and GFX through initiator (I) ports to ROM, SRAM, Flash Ctl., and DRAM Ctl. through target (T) ports]
The Intelligence is in the Agents
• Agents provide…
  • Protocol conversion
    • Agent adapts to IP core
  • Decoupling of IP cores from fabric
    • Provide local, isolated environment
  • Data flow services
• Proven technology
  • Over 100 million ICs shipped so far
• Agent data flow services
  • QoS-based arbitration
  • Power management
  • Access security
  • Error management
  • Burst, width, and command conversion
[Figure: INITIATOR SOCKETS (I) connect to Initiator Agents (IA), which connect through the Fabric to Target Agents (TA) and the TARGET SOCKETS (T)]
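As one illustration of the width conversion listed among the agent services, the sketch below splits a wide data word into narrower beats and reassembles it. The function names and the low-order-beat-first ordering are assumptions made for this example; the slides do not describe how the agents order or pack data.

```python
from typing import List


def split_beats(word: int, src_bits: int, dst_bits: int) -> List[int]:
    """Split a src_bits-wide word into dst_bits-wide beats, low-order beat first."""
    if src_bits % dst_bits != 0:
        raise ValueError("source width must be a multiple of destination width")
    mask = (1 << dst_bits) - 1
    return [(word >> (i * dst_bits)) & mask for i in range(src_bits // dst_bits)]


def merge_beats(beats: List[int], beat_bits: int) -> int:
    """Reassemble beats (low-order beat first) into one wide word."""
    word = 0
    for i, beat in enumerate(beats):
        word |= beat << (i * beat_bits)
    return word
```

For example, a 64-bit initiator word sent to a 32-bit target becomes two beats, and merging the beats on the response path recovers the original word.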
SonicsStudio™
• SonicsStudio
  • Capture SoC
  • Explore / validate
  • Synthesize
  • Modify physical layout
• SOCCreator
  • Simulation
  • Synthesis
  • Floorplanning
  • Timing analysis
• Output
  • Mapped netlist
  • SystemC model
Mobile Handset Example
[Figure: handset SoC block diagram. Blocks: CPU Tile, 2D/3D Graphics Tile, MPEG4 Codec Tile, DSP Tile, MP3, USB 2.0, LCD Controller, Camera Interface, DMA, Flash Controller, Embedded SRAM, and SDRAM Controller, connected through two SMX fabrics and an S3220 via initiator (I) and target (T) ports. Insets: Partial XBar Fabric and Shared Bus Fabric with P and T markers (labels 16 and 128); Agent detail with Simple Socket, Regs, Socket I/F, Decoupling Buffer, and Fabric I/F; Complex Socket detail with MMU, XRAM, YRAM, Inst. Cache, Data Cache, DMA, and DSP Core.]
SMX Internal Structure
• Exchanges
  • Cross-bar (XB)
  • Shared Bus (SL)
  • Extender (EL)
• Pipelining options
  • Registering at socket interface
  • Register points (RP) at agent-fabric edge
  • Pipeline points (PP) between exchanges
• Register Target (RT) to access SMX services
• Multiple socket support
QoS-based Arbitration
• Initiator data flow threads mapped to target threads by SMX fabric
  • E.g. 40 data flows sharing 8 DRAM threads in a digital video system
• Data flows sharing a target thread arbitrated using bandwidth weighting
• Independent threads assigned to QoS level (maintained throughout SMX)
• Non-blocking, multi-threaded fabric and target interfaces allow:
  • Higher priority requests to interleave with & respond before others
  • Guaranteed BW threads to minimize buffering / receive latency guarantees
  • Optimum DRAM efficiency
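To make bandwidth weighting concrete, here is a minimal sketch of several data flows sharing one target thread. It uses smooth weighted round-robin, a generic scheduling technique chosen purely for illustration; the slides do not say which algorithm SonicsMX implements, and the `WeightedArbiter` class and flow names are hypothetical.

```python
from typing import Dict, Iterable, Optional


class WeightedArbiter:
    """Smooth weighted round-robin over the flows sharing one target thread."""

    def __init__(self, weights: Dict[str, int]):
        self.weights = dict(weights)            # flow name -> bandwidth weight
        self.credit = {flow: 0 for flow in weights}

    def grant(self, requesting: Iterable[str]) -> Optional[str]:
        """Pick one requesting flow for this arbitration cycle."""
        wanted = set(requesting)
        active = [flow for flow in self.weights if flow in wanted]
        if not active:
            return None
        # Every requesting flow earns credit in proportion to its weight.
        for flow in active:
            self.credit[flow] += self.weights[flow]
        # The flow with the most credit wins and pays back the round total.
        winner = max(active, key=lambda flow: self.credit[flow])
        self.credit[winner] -= sum(self.weights[flow] for flow in active)
        return winner
```

With weights of 3 for a video flow and 1 for an audio flow, and both flows always requesting, the video flow receives three of every four grants.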
Design Flow With Smart Interconnects
[Figure: schedule comparison across the phases Architectural Definition, Logic Design & Verification, Physical Design, and Fab, Assy, Test. Rows: Today (Typical), First Design, Derivative Design(s). Duration labels: 20 Days, 30 Days, 10 Day. Callout: 12 to 18 month time & engineering savings!]
How is this possible?
• Socket-Based Design Methodology
• Highly Configurable Interconnect IP
Power Management
• Active status indication configurable on a per-socket basis
• Unit Power Manager (UPM) initiates power down request when idle for some time
  • Based on system-defined policy
  • Can monitor subset of cores & SMX
• UPM initiates power up request when attached master is active
  • Might be initiator or other SMX
• Supports chaining – Unit Power Managers can simply OR active flags for all incoming signaling
  • Simplifies design of APM
[Figure: System diagram with initiators I0 to I2, targets T0 to T2, and an APM. Two chained SMX fabrics, each with initiator agents (IA), target agents (TA), and a Unit Pwr Mgr. Signals: Active, Down_req, Down_ok.]
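The idle-timeout and chaining behavior above can be sketched as a small cycle-stepped model. This is a simplified illustration: the `UnitPowerManager` class and its `idle_threshold` parameter are hypothetical, and the Down_req / Down_ok handshake shown in the figure is collapsed into a single `powered` flag.

```python
from typing import Iterable


class UnitPowerManager:
    """Toy model of a UPM: OR the incoming active flags, power down when idle."""

    def __init__(self, idle_threshold: int):
        self.idle_threshold = idle_threshold  # idle cycles before power down (assumed policy)
        self.idle_cycles = 0
        self.powered = True

    def step(self, active_flags: Iterable[bool]) -> bool:
        """Advance one cycle; return the combined active flag for chaining."""
        active = any(active_flags)            # simple OR of all incoming flags
        if active:
            self.idle_cycles = 0
            self.powered = True               # stands in for a power-up request
        else:
            self.idle_cycles += 1
            if self.idle_cycles >= self.idle_threshold:
                self.powered = False          # stands in for a power-down request
        return active
```

Chaining falls out of the return value: the flag returned by one manager can be passed as an input to the next, for example `root.step([leaf.step(flags), other_flag])`.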
Access Security
• Optional multi-region firewall
  • Per-target, re-programmable
• Layered architecture supports rich set of security domains with variable region sizes
• Access permissions determined per role and access type
• Flexible security error caching and reporting
[Figure: Initiator Threads map to Initiator Roles, which map to Protection Regions. MAddr and MAddrSpace are matched in the L0 to L3 CAMs, each producing a valid flag and permissions; a priority stage selects the role permissions. The Init thread ID and MReqInfo role pass through a group ROM to select the role. Read permissions, write permissions, and MCmd yield role OK, read OK, and write OK, which combine into Access OK in front of the TARGET CORE.]
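A layered, per-role permission check of the kind described above might look like the following sketch. Several details are assumptions made for illustration and are not stated on the slide: that a higher level number takes priority when regions overlap, that an address matching no region is denied, and the `ProtectionRegion` and `access_ok` names themselves.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ProtectionRegion:
    level: int                # 0..3; higher level assumed to win on overlap
    base: int                 # first address of the region
    size: int                 # region length in bytes
    read_roles: Set[int] = field(default_factory=set)   # roles allowed to read
    write_roles: Set[int] = field(default_factory=set)  # roles allowed to write


def access_ok(regions: List[ProtectionRegion], addr: int, role: int, is_write: bool) -> bool:
    """Check one request against the highest-priority region that matches addr."""
    hits = [r for r in regions if r.base <= addr < r.base + r.size]
    if not hits:
        return False          # assumed default: deny unmapped addresses
    region = max(hits, key=lambda r: r.level)
    allowed = region.write_roles if is_write else region.read_roles
    return role in allowed
```

A small high-priority region nested inside a larger one can then restrict a sub-range, for example making a key store read-only for one role while the surrounding memory stays open to all roles.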
Error Management
• Detects a wide variety of error conditions
  • Bad addresses and illegal commands
  • Timeouts (initiator and target)
  • Security violations
• Aggregates errors (response and sideband) from IP cores
• Logs errors in agents for software interrogation
  • Status bits (i.e. have detected error, multiple errors)
  • Identification of initiator/address that caused error
• Reports errors as desired
  • In-band via error responses
  • Sideband via SError/MError, interrupts, etc.
• Supports IP core isolation and reset for error recovery
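The error logging described above (status bits plus the identity of the offending initiator and address) can be modeled as a small register set that software reads and clears. The sketch is hypothetical: the `AgentErrorLog` class, its field names, and the choice to keep only the first error's details are assumptions, not the actual agent register layout.

```python
from typing import Optional, Tuple

ErrorInfo = Tuple[str, str, int]  # (error kind, initiator name, address)


class AgentErrorLog:
    """Toy error log: one 'error' bit, one 'multiple errors' bit, first-error details."""

    def __init__(self) -> None:
        self.error = False
        self.multiple = False
        self.first: Optional[ErrorInfo] = None

    def record(self, kind: str, initiator: str, addr: int) -> None:
        """Log an error; later errors only set the 'multiple' bit."""
        if self.error:
            self.multiple = True
        else:
            self.error = True
            self.first = (kind, initiator, addr)

    def read_and_clear(self) -> Tuple[bool, bool, Optional[ErrorInfo]]:
        """Software interrogation: return the status, then reset the log."""
        snapshot = (self.error, self.multiple, self.first)
        self.error = False
        self.multiple = False
        self.first = None
        return snapshot
```

After a timeout followed by a security violation, software would read back both status bits set along with the details of the first (timeout) error.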
Multicore Mobile Handset Example
[Figure: handset SoC block diagram. Blocks: CPU Tile, 2D/3D Graphics Tile, MPEG4 Codec Tile, DSP Tile, MP3, USB 2.0, LCD Controller, Camera Interface, DMA, Flash Controller, Embedded SRAM, and SDRAM Controller, connected through two SMX fabrics and an S3220 via initiator (I) and target (T) ports. Insets: Partial XBar Fabric and Shared Bus Fabric with P and T markers (labels 16 and 128); Agent detail with Simple Socket, Regs, Socket I/F, Decoupling Buffer, and Fabric I/F; Complex Socket detail with MMU, XRAM, YRAM, Inst. Cache, Data Cache, DMA, and DSP Core.]
Summary
• Multicore SoC designs are already common
  • And are largely heterogeneous
• Current approaches abstract each CPU into a local tile
  • Increases independence of firmware for better scaling
• Intelligent interconnects are key to multicore SoCs
  • Non-blocking internal fabrics keep processors fed with data
  • Agent-based data flow services are key to managing heterogeneity
• SonicsMX demonstrates the value of centralized data flow services
Thank You
Over 100 million Sonics-enabled chips shipped
[Figure: customer logos: Nokia, Sony, Hughes Network Systems, Cisco, Samsung, Dell, Toshiba]
Questions? Thank You