
Intelligent Interconnects: Key to Reducing the Challenges of Complex Heterogeneous Multiprocessor Design

Jeff Haight, Sonics, Inc., Mountain View, CA, USA, jhaight@sonicsinc.com


Presentation Transcript


  1. Intelligent Interconnects: Key to Reducing the Challenges of Complex Heterogeneous Multiprocessor Design. Jeff Haight, Sonics, Inc., Mountain View, CA, USA, jhaight@sonicsinc.com. DATE 2006

  2. SoC Architecture Trends
     • Massive feature integration
       • Driven largely by Moore’s Law (supply) and convergence (demand)
     • Continued movement of complexity to software
     • Distributed architectures
       • Higher scalability (and independence?)
     • Multiple Heterogeneous Processors
       • CPU
       • DSP
       • Special purpose (MPEG, packet, …)
     • Distributed DMA
       • Removes centralized DMA bottleneck
       • Simplifies driver software integration
     [Figure: System On Chip block diagram with blocks CPU, MPEG, DSP, Video I/O, DRAM Controller, Comm I/O, 3D GFX, MAC.]

  3. Multicore SoC Architecture Options
     • Multi-processor cluster
       • Many OS’s know how to schedule, good for computing
       • Hard to scale past about 4 CPU’s, bad for real-time
     • Uniform distributed computing fabric
       • Highly scalable, locally efficient
       • Hard to program/schedule, uniformity hurts QoR
     • Distributed heterogeneous processing subsystems
       • Per-subsystem mix of hardwired and programmable
       • Highest scalability, high QoR
       • Divide-and-conquer programming model
       • Can deploy cluster and/or fabric approaches in subsystems

  4. Evolution to Multiprocessor Has Caused Economic Shift To Outsourcing Internal Interconnect Design
     [Figure: Complexity of Intelligent Services (Low to High) plotted against Development Time. Labels for the Single Processor Model: Choose Processor, Bus Development, S/W, Trial Chip, Performance. Labels for the New Data Flow Centric Model: Choose Processors, Develop Intelligent Interconnect, S/W, Trial Chip, Performance. Callout: New Model Enables Parallel Tasks.]

  5. Global Interconnect Responsibilities
     • Routing
       • Getting requests, responses and data to the desired destination
     • Access control
       • Managing contention for shared resources (ensuring QoS)
       • Ensuring requested access is allowed (security and protection)
     • Error management
       • Detection, reporting, and SW recovery support
     • Power management
       • Activity detection, clock and voltage removal support
     • Connectivity
       • Protocol conversion
       • Data width / clock frequency conversion
     • Spanning distance
       • Connecting endpoints at required frequency and latency
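The routing responsibility listed on slide 5 can be illustrated with a small address-decode sketch. This is only an illustration under assumed values: the `Region` type, the `ADDRESS_MAP` contents, and the target names are hypothetical and do not come from the presentation or from any Sonics product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    base: int    # first byte address of the region
    size: int    # region length in bytes
    target: str  # hypothetical target name


# Hypothetical address map; real maps are defined per SoC.
ADDRESS_MAP = [
    Region(0x0000_0000, 0x0010_0000, "rom"),
    Region(0x1000_0000, 0x0004_0000, "sram"),
    Region(0x8000_0000, 0x1000_0000, "dram"),
]


def route(addr: int) -> Optional[str]:
    """Return the target that owns addr, or None if no region matches."""
    for region in ADDRESS_MAP:
        if region.base <= addr < region.base + region.size:
            return region.target
    return None  # unmapped address: a candidate for error management
```

A request to an unmapped address returns `None` here, which is the kind of "bad address" condition the error management responsibility would have to detect and report.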

  6. Data Flow Design Challenges: Typical Bus Style Offerings Address Few of the Real Issues
     [Figure: an On-Chip Bus and Bus Generator shown against the surrounding design issues: Perf. Verification, Virtual Prototyping, Parallel IP Creation, Arch. Modeling, Design Re-use, SW Development, Variable Clock Freq., Timing Closure, Voltage Isolation, Complex Memory Hierarchies, Power Management, Error Management, Signal Integrity, Access Security, High Peripheral Count, Data Width Conversion, Distributed Processing, Mixed Endianness, Guaranteed BW QoS, Pipelining, Protocol Conversion.]

  7. SMART Interconnect Approach: Addresses the Total Global Interconnect Challenge
     [Figure: Methodology & Automation, Scalable Fabrics, and Intelligent Agents shown against the surrounding design issues: Perf. Verification, Virtual Prototyping, Parallel IP Creation, Arch. Modeling, Design Re-use, SW Development, Variable Clock Freq., Timing Closure, Voltage Isolation, Complex Memory Hierarchies, Power Management, Error Management, Signal Integrity, Access Security, High Peripheral Count, Data Width Conversion, Distributed Processing, Mixed Endianness, Guaranteed BW QoS, Pipelining, Protocol Conversion.]

  8. SoC Design Reality….
     [Figure: design flow across the phases Architectural Definition, Logic Design & Verification, Physical Design, and Fab, Assy, Test. Task labels: Define Architecture First, Architectural Verification ?!, Create Control Logic / Arbitration, Place & Route, Synthesize Full SOC, Acquire outside IP, Create custom Interconnect, Timing Extraction, Integrate IP, Modify IP Cores, Re-synthesize, Fix Hook-up Errors, Synthesize IP, Develop internal IP, Verify Modified IP Cores, Create custom Test suites.]

  9. Sonics Delivers Time-to-Market
     How is this possible?
     • Socket-Based Design Methodology
     • Highly Configurable Interconnect IP
     [Figure: schedule comparison across the phases Architectural Definition, Logic Design & Verification, Physical Design, and Fab, Assy, Test, for three cases: Today (Typical), First Design, and Derivative Design(s). Duration labels in the figure: 20 Days, 30 Days, 10 Day. Callout: 12 to 18 month time & engineering savings!]

  10. Tile-based Heterogeneous Multicore SoC’s
      • Tile: distributed, largely independent subsystem for a SoC, normally composed of:
        • Processing
        • Memory
        • I/O
      • Tile processing can be performed in fixed or programmable logic, or general-purpose or special-purpose programmable processors
      • Key elements of tile-based platforms:
        • Socket-based design
        • Decoupled interconnect architectures
        • Communication constraint capture
        • Firmware packaging

  11. [Figure: example SoC block diagrams; surviving labels: "(from OMAP 5910)", "IVA".] Sources: www.ti.com, www.arm.com, www.powervr.com

  12. Multicore Architecture Advantages
      [Figure: surviving labels: "What is needed"; attribution: Avner Goren, TI, EPF 2004.]

  13. An Intelligent Interconnect Company
      [Figure: Sonics SMART Interconnects as the Intelligent Internal Interconnect for SoCs Circa 2005, connecting Core 1, Core 2, Core N, µP, DSP, Core, Core, AHB Cores, I/O, and Memory. Interface callouts: AXI for Seamless Connections, AHB Legacy Support, APB Legacy Support, OCP Maximizes Flexibility. Caption: Sonics Adds Intelligent Data Flow Services.]

  14. SonicsMX Basic Architecture
      • Hybrid topologies
        • Full / partial cross-bar
        • Shared bus
      • Fully split (dual) request / response
      • Pipelined, multi-threaded, non-blocking fabric
      • Distributed QoS arbiter
      • Spans cycle, frequency, and data width boundaries
      • Supports flexible thread merging tree topologies
      [Figure: two SMX fabrics connecting CPU, DSP, and GFX to ROM, SRAM, Flash Ctl., and DRAM Ctl.; sockets are marked I and T.]

  15. The Intelligence is in the Agents
      • Agents provide…
        • Protocol conversion
          • Agent adapts to IP core
        • Decoupling of IP cores from fabric
          • Provide local, isolated environment
        • Data flow services
      • Proven technology
        • Over 100 million IC’s shipped so far
      • Agent data flow services
        • QoS-based arbitration
        • Power management
        • Access security
        • Error management
        • Burst, width, and command conversion
      [Figure: INITIATOR SOCKETS (I) attach to Initiator Agents (IA), which connect through the Fabric to Target Agents (TA) and TARGET SOCKETS (T).]

  16. SonicsStudio™
      • SonicsStudio
        • Capture SoC
        • Explore / validate
        • Synthesize
        • Modify physical layout
      • SOCCreator
        • Simulation
        • Synthesis
        • Floorplanning
        • Timing analysis
      • Output
        • mapped netlist
        • SystemC model

  17. Mobile Handset Example
      [Figure: handset SoC block diagram. Top-level blocks: CPU Tile, 2D/3D Graphics Tile, MPEG4 Codec Tile, DSP Tile, MP3, USB 2.0, DMA, LCD Controller, Camera Interface, Flash Controller, Embedded SRAM, SDRAM Controller, connected through two SMX fabrics and an S3220; sockets are marked I, T, and P. Insets: Partial XBar Fabric and Shared Bus Fabric (width labels 16 and 128); an Agent with Simple Socket, Regs, Socket I/F, Decoupling Buffer, and Fabric I/F; a Complex Socket for the DSP tile with MMU, XRAM, YRAM, Inst. Cache, Data Cache, DMA, and DSP Core.]

  18. SMX Internal Structure
      • Exchanges
        • Cross-bar (XB)
        • Shared Bus (SL)
        • Extender (EL)
      • Pipelining options
        • Registering at socket interface
        • Register points (RP) at agent-fabric edge
        • Pipeline points (PP) between exchanges
      • Register Target (RT) to access SMX services
      • Multiple socket support

  19. QoS-based Arbitration
      • Initiator data flow threads mapped to target threads by SMX fabric
        • E.g. 40 data flows sharing 8 DRAM threads in a digital video system
      • Data flows sharing a target thread arbitrated using bandwidth weighting
      • Independent threads assigned to QoS level (maintained throughout SMX)
      • Non-blocking, multi-threaded fabric and target interfaces allow:
        • Higher priority requests to interleave with & respond before others
        • Guaranteed BW threads to minimize buffering / receive latency guarantees
        • Optimum DRAM efficiency
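Slide 19 says that data flows sharing a target thread are arbitrated using bandwidth weighting, but it does not describe the algorithm. The sketch below swaps in one well-known scheme, smooth weighted round-robin, purely to show what weighted sharing of a single target thread can look like. The class name, flow names, and weights are hypothetical, and this is not presented as the SonicsMX arbiter.

```python
from typing import Dict, Iterable, Optional


class WeightedArbiter:
    """Smooth weighted round-robin among flows sharing one target thread."""

    def __init__(self, weights: Dict[str, int]):
        self.weights = dict(weights)                  # bandwidth weight per flow
        self.credit = {name: 0 for name in weights}   # running credit per flow

    def grant(self, requesting: Iterable[str]) -> Optional[str]:
        """Pick one requesting flow for this arbitration cycle."""
        wanted = set(requesting)
        active = [name for name in self.weights if name in wanted]
        if not active:
            return None
        total = 0
        for name in active:
            # Every requesting flow earns credit in proportion to its weight.
            self.credit[name] += self.weights[name]
            total += self.weights[name]
        winner = max(active, key=lambda name: self.credit[name])
        # The winner pays back the total so long-run grants track the weights.
        self.credit[winner] -= total
        return winner
```

With weights of 3 and 1 and both flows always requesting, the heavier flow receives three of every four grants, and the grants are interleaved rather than bunched together.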

  20. Design Flow With Smart Interconnects
      How is this possible?
      • Socket-Based Design Methodology
      • Highly Configurable Interconnect IP
      [Figure: schedule comparison across the phases Architectural Definition, Logic Design & Verification, Physical Design, and Fab, Assy, Test, for three cases: Today (Typical), First Design, and Derivative Design(s). Duration labels in the figure: 20 Days, 30 Days, 10 Day. Callout: 12 to 18 month time & engineering savings!]

  21. Power Management
      • Active status indication configurable on a per-socket basis
      • Unit Power Manager (UPM) initiates power down request when idle for some time
        • Based on system-defined policy
        • Can monitor subset of cores & SMX
      • UPM initiates power up request when attached master is active
        • Might be initiator or other SMX
      • Supports chaining: Unit Power Managers can simply OR active flags for all incoming signaling
        • Simplifies design of APM
      [Figure: a System with an APM, initiators I0 to I2, and targets T0 to T2 attached to two SMX fabrics through IA and TA agents. Each SMX has a Unit Pwr Mgr that collects Active signals and exchanges Down_req and Down_ok.]
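The Unit Power Manager behaviour on slide 21 (OR the incoming Active flags, request power down after an idle period, request power up when a master becomes active) can be sketched as a tiny state machine. This is a simplified model under stated assumptions: the idle threshold stands in for the "system-defined policy", the Down_ok handshake shown in the figure is omitted, and the class, method, and return-value names are hypothetical.

```python
from typing import Iterable


class UnitPowerManager:
    """Toy model of a UPM: idle counting plus OR-ed activity flags."""

    def __init__(self, idle_threshold: int):
        # Assumed policy: number of consecutive idle cycles before power down.
        self.idle_threshold = idle_threshold
        self.idle_cycles = 0
        self.powered = True
        self.active = False  # OR-ed flag, usable as input to a chained UPM

    def tick(self, active_flags: Iterable[bool]) -> str:
        """Advance one cycle; return 'down_req', 'up_req', or 'none'."""
        self.active = any(active_flags)  # OR of all incoming Active signals
        if self.active:
            self.idle_cycles = 0
            if not self.powered:
                self.powered = True
                return "up_req"
            return "none"
        self.idle_cycles += 1
        if self.powered and self.idle_cycles >= self.idle_threshold:
            self.powered = False
            return "down_req"
        return "none"
```

Because each manager exposes a single OR-ed `active` flag, a higher-level manager can treat a whole subsystem as one input, which is the chaining property the slide credits with simplifying the APM.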

  22. Access Security
      • Optional multi-region firewall
        • Per-target, re-programmable
      • Layered architecture supports rich set of security domains with variable region sizes
      • Access permissions determined per role and access type
      • Flexible security error caching and reporting
      [Figure: Initiator Threads map to Initiator Roles and Protection Regions. Datapath labels: MAddr, MAddrSpace; L0 to L3 CAM, each with permissions and valid outputs; priority; role permissions; MReqInfo; role; group ROM; Init thread ID; group; read permissions; write permissions; MCmd; role OK, read OK, write OK; Access OK; TARGET CORE.]
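The multi-region firewall on slide 22 decides access per role and per access type, with layered regions (L0 to L3). A minimal sketch of that kind of check follows. It makes two assumptions that the slide does not state: a higher-numbered level overrides a lower one when regions overlap, and an address that matches no region is denied. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ProtectionRegion:
    base: int
    size: int
    level: int  # assumption: higher level wins when regions overlap
    read_roles: Set[int] = field(default_factory=set)
    write_roles: Set[int] = field(default_factory=set)


def access_ok(regions: List[ProtectionRegion], addr: int,
              role: int, is_write: bool) -> bool:
    """Return True if the role may perform this access at addr."""
    matches = [r for r in regions if r.base <= addr < r.base + r.size]
    if not matches:
        return False  # assumption: deny when no region matches
    region = max(matches, key=lambda r: r.level)
    allowed = region.write_roles if is_write else region.read_roles
    return role in allowed
```

For example, a large level-0 region can grant two roles read and write access while a small level-1 region inside it restricts a secure buffer to read-only access by a single role.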

  23. Error Management
      • Detects a wide variety of error conditions
        • Bad addresses and illegal commands
        • Timeouts (initiator and target)
        • Security violations
      • Aggregates errors (response and sideband) from IP cores
      • Logs errors in agents for software interrogation
        • Status bits (i.e. have detected error, multiple errors)
        • Identification of initiator/address that caused error
      • Reports errors as desired
        • In-band via error responses
        • Sideband via SError/MError, interrupts, etc.
      • Supports IP core isolation and reset for error recovery
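The error logging described on slide 23 (status bits for "error detected" and "multiple errors", plus identification of the initiator and address) can be modelled as a small log record. The sketch assumes that the details of the first error are retained and later errors only set the "multiple" bit; the slide does not say which error is kept. Field and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorLog:
    """Toy model of a per-agent error log read by software."""

    error: bool = False      # an error has been detected
    multiple: bool = False   # more than one error since the last clear
    initiator: Optional[int] = None
    address: Optional[int] = None
    code: Optional[str] = None

    def record(self, initiator: int, address: int, code: str) -> None:
        if self.error:
            # Assumption: keep the first error's details, flag the overflow.
            self.multiple = True
            return
        self.error = True
        self.initiator = initiator
        self.address = address
        self.code = code

    def clear(self) -> None:
        """Software acknowledges the error and re-arms the log."""
        self.error = False
        self.multiple = False
        self.initiator = None
        self.address = None
        self.code = None
```

Software would typically read the log after an error response or interrupt, act on the recorded initiator and address, and then call `clear()` before resuming.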

  24. Multicore Mobile Handset Example
      [Figure: repeat of the handset SoC block diagram from slide 17. Top-level blocks: CPU Tile, 2D/3D Graphics Tile, MPEG4 Codec Tile, DSP Tile, MP3, USB 2.0, DMA, LCD Controller, Camera Interface, Flash Controller, Embedded SRAM, SDRAM Controller, connected through two SMX fabrics and an S3220. Insets: Partial XBar Fabric, Shared Bus Fabric, Agent with Simple Socket, and Complex Socket for the DSP tile.]

  25. Summary
      • Multicore SoC designs are already common
        • And are largely heterogeneous
      • Current approaches abstract each CPU into local tile
        • Increases independence of firmware for better scaling
      • Intelligent interconnects are key to multicore SoC’s
        • Non-blocking internal fabrics keep processors fed with data
        • Agent-based data flow services key to managing heterogeneity
      • SonicsMX demonstrates the value of centralized data flow services

  26. Thank You
      Over 100 million Sonics enabled chips shipped
      Company names shown: Nokia, Sony, Hughes Network Systems, Cisco, Samsung, Dell, Toshiba

  27. Questions? Thank You
