Explore the innovative approach of active decoupling in multicore SoC design for improved speed, power, and area efficiency. Learn about advanced interconnect fabrics and intelligent agents revolutionizing system-on-a-chip performance.
Intelligent Interconnects for Multicore SoC’s • Drew Wingard, CTO, Sonics, Inc. • OCIN06: December 6, 2006
Agenda • SoC Background • Interconnect Architecture • Application Areas OCIN06: Intelligent Interconnects for Multicore SoC’s
The Goal Create a system-on-a-chip, comprising ten million gates, that satisfies rapidly-evolving market requirements for: • Speed • Power • Area • Application Performance • Time to Market Using the minimum resources with maximum predictability
SoC Architecture Trends [Figure: System On Chip block diagram with CPU, MPEG, DSP, Video I/O, DRAM Controller, Comm I/O, 3D GFX, and MAC blocks] • Massive feature integration • Driven largely by Moore’s Law (supply) and convergence (demand) • Continued movement of complexity to software • Distributed architectures • Higher scalability (and independence?) • Multiple processors • CPU • DSP • Special purpose (MPEG, packet, …) • Distributed DMA • Removes centralized DMA bottleneck • Simplifies driver software integration
Why SoC’s Are Difficult • Improvement by feature integration has been practiced for decades • Why is SoC any different? • Benefits of an SoC • Higher performance • Smaller footprint • Lower power • Lower cost • Size, power, and cost benefits derive from sharing • Control software on CPU (interrupts, etc.) • Memory resources (on-chip and off-chip) – single external DRAM • Building predictable systems with so much sharing is hard • Dozens to hundreds of interrupt sources • Memory bandwidth bottlenecks + lots of real-time traffic stress Unified Memory Architecture (UMA)
Limits of Tightly Coupled Design • Tightly coupled design has been dominant • Assumes largely synchronous, instantaneous and free communication • Widely practiced, and supported in design flows • BUT • Delivering clocks is problematic • Wire delay is dominant • Routing area can cost more than gates • Too many constraints, from too many blocks • Cannot afford lowest common denominator design
Our Approach: Active Decoupling • Separation • Abstraction • Optimization • Independence [Figure: CPU, DMA, and DRAM Controller cores, each shown as core function plus communication, connected through master/slave sockets, bus adapters, and agents to the internal fabric of the SMART Interconnect network; data widths of 16 and 128 are marked]
Layers of Decoupling [Figure: layer stack, listed from higher to lower abstraction: Performance, Identification, Transfer Protocol, Signaling, Electrical] • Degree of blocking • QoS • Addressing • Source identification • Bursting, pipelining • Threading • Data, address, event widths • Handshaking / flow control • Signal timing and capacitance • Clock frequency
Evolution of Design Abstractions [Figure: level of abstraction vs. time, with paired labels Gates / Wires, Blocks / Buses, Functional Cores / Interconnect Cores, and Tiles / Networks; annotations “Most designs are here” and “Need to be here!”] Abstraction minimizes the number of objects that the designer must manage
Abstraction Enables Higher Complexity [Figure: log(gate count) vs. development time & cost, with regions The Past, Present, and Future; labels include Blocks / Buses, Cores / Interconnects, Tiles / Networks, 5M+ gates, 100M+ gates, and Technology Push]
The OCP Socket • Owned/advanced by OCP-IP (www.ocpip.org) • Over 150 member companies • Interconnect-neutral • Defines data flow, control flow, test signaling • Highly configurable to match core needs • Simple Master/Slave request/response protocols • With many options • Fully synchronous, point-to-point • Optional pipelining, bursting, threading • Flexible handshaking and other flow control
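The simple master/slave request/response idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the OCP specification: the command and response names loosely follow OCP naming, while the classes `MemorySlave` and `Master`, the numeric encodings, and the word-addressed memory are assumptions made for the example. Pipelining, bursting, and threading are left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Cmd(Enum):
    """Master commands (names loosely follow OCP; encodings are illustrative)."""
    IDLE = 0
    WRITE = 1
    READ = 2


class Resp(Enum):
    """Slave responses (names loosely follow OCP; encodings are illustrative)."""
    NULL = 0
    DVA = 1  # data valid / accept
    ERR = 3


@dataclass
class Request:
    cmd: Cmd
    addr: int = 0
    data: Optional[int] = None


@dataclass
class Response:
    resp: Resp
    data: Optional[int] = None


class MemorySlave:
    """Hypothetical slave: a small word-addressed memory behind a socket."""

    def __init__(self, size_words: int) -> None:
        self.size_words = size_words
        self.mem: Dict[int, int] = {}

    def handle(self, req: Request) -> Response:
        if req.cmd is Cmd.IDLE:
            return Response(Resp.NULL)
        if not 0 <= req.addr < self.size_words:
            return Response(Resp.ERR)
        if req.cmd is Cmd.WRITE:
            self.mem[req.addr] = req.data
            return Response(Resp.DVA)
        # READ: unwritten locations read back as zero in this sketch.
        return Response(Resp.DVA, self.mem.get(req.addr, 0))


class Master:
    """Hypothetical master: issues one request and waits for its response."""

    def __init__(self, slave: MemorySlave) -> None:
        self.slave = slave  # point-to-point: exactly one slave per socket

    def write(self, addr: int, data: int) -> Response:
        return self.slave.handle(Request(Cmd.WRITE, addr, data))

    def read(self, addr: int) -> Response:
        return self.slave.handle(Request(Cmd.READ, addr))
```

Because the master only sees requests and responses, the slave behind the socket could be swapped for an interconnect agent without changing the master, which is the interconnect-neutral property the slide describes.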
Agenda • SoC Background • Interconnect Architecture • Advanced Fabrics • Intelligent Agents • Application Areas
SoC Data Flow Requirements • Connect dozens of initiators to several memories • While meeting a wide range of latency and throughput requirements • Connect processors and DMA to control ports and peripherals • With predictably low latencies • Example SoC requirements (interconnect view):
Interconnect Fabric Options • SoC data flow requirements must be satisfied by internal interconnect fabric • Big challenge in current SoC designs! • Choices in interconnect fabric design • Unified vs. split transactions • Shared vs. separate physical links • Combinational vs. pipelined • Single vs. multiple outstanding transactions (transaction pipelining) • In-order vs. out-of-order completion and response • Blocking vs. non-blocking flow control
Blocking vs. Non-blocking Flow Control • Sharing in SoC’s creates many opportunities for contention • Arbitration determines who wins • Flow control determines when the winner gets to go • Blocking flow control systems allow resource shortages along some paths to prevent other paths from progressing • Non-blocking flow control systems ensure that points of sharing never stall if any data flow could progress • Locally non-blocking flow control (aka virtual channels) improves efficiency and allows more resource sharing • End-to-end non-blocking flow control offers greater predictability • Provides basis for QoS guarantees [Callout: Our Approach]
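The difference between blocking and non-blocking flow control at a single point of sharing can be shown with a toy model. The sketch below is an assumption-laden illustration, not Sonics’ implementation: packets are `(destination, sequence)` tuples, the shared link carries one packet per cycle, `stalled` maps a cycle number to the set of destinations that cannot accept data in that cycle, and the non-blocking arbiter uses a fixed visiting order rather than a real QoS policy.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

Packet = Tuple[str, int]  # (destination name, sequence number)


def run_blocking(packets: List[Packet],
                 stalled: Dict[int, Set[str]],
                 cycles: int) -> List[Packet]:
    """Single shared FIFO: only the head may go, so a stalled head blocks all."""
    fifo: Deque[Packet] = deque(packets)
    delivered: List[Packet] = []
    for cycle in range(cycles):
        if fifo and fifo[0][0] not in stalled.get(cycle, set()):
            delivered.append(fifo.popleft())
    return delivered


def run_non_blocking(packets: List[Packet],
                     stalled: Dict[int, Set[str]],
                     cycles: int) -> List[Packet]:
    """One queue per destination: the link idles only if no flow can progress."""
    queues: Dict[str, Deque[Packet]] = {}
    for pkt in packets:
        queues.setdefault(pkt[0], deque()).append(pkt)
    order = list(queues)  # fixed visiting order (first-seen destination first)
    delivered: List[Packet] = []
    for cycle in range(cycles):
        busy = stalled.get(cycle, set())
        for dest in order:
            if queues[dest] and dest not in busy:
                delivered.append(queues[dest].popleft())
                break  # the shared link carries one packet per cycle
    return delivered


# Hypothetical scenario: DRAM cannot accept data during cycles 0 to 2.
PACKETS = [("dram", 0), ("sram", 0), ("sram", 1)]
STALLS = {0: {"dram"}, 1: {"dram"}, 2: {"dram"}}
```

With these inputs and four cycles, the blocking model delivers only the DRAM packet (in the last cycle), while the non-blocking model delivers both SRAM packets during the stall and the DRAM packet afterwards.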
SonicsMX Basic Architecture • Hybrid topologies • Full / partial cross-bar • Shared bus • Pipelined, multi-threaded, non-blocking fabric • Fully split (dual) request / response • Distributed QoS arbiter • Spans cycle, frequency, and data width boundaries • Supports flexible thread merging tree topologies [Figure: two SMX fabrics connecting initiator (I) and target (T) sockets for CPU, DSP, GFX, ROM, SRAM, Flash Ctl., and DRAM Ctl.]
Global Interconnect Responsibilities • Routing • Getting requests, responses and data to the desired destination • Access control • Managing contention for shared resources (ensuring QoS) • Ensuring requested access is allowed (security and protection) • Error management • Detection, reporting, and SW recovery support • Power management • Activity detection, clock and voltage removal support • Connectivity • Protocol conversion • Data width / clock frequency conversion • Spanning distance • Connecting endpoints at required frequency and latency
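Routing, access control, and error detection from this slide can be combined in one small address-decode sketch. Everything concrete here is hypothetical: the `Region` type, the address map, the initiator names, and the status strings are invented for illustration and do not describe any real chip or Sonics product.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    """One target in a hypothetical address map."""
    name: str
    base: int
    size: int
    allowed: FrozenSet[str]  # initiators permitted to access this target

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


# Illustrative address map; all values are assumptions for the example.
ADDRESS_MAP: List[Region] = [
    Region("sram", 0x0000_0000, 0x0001_0000, frozenset({"cpu", "dsp", "dma"})),
    Region("secure_regs", 0x4000_0000, 0x0000_1000, frozenset({"cpu"})),
    Region("dram", 0x8000_0000, 0x1000_0000, frozenset({"cpu", "dsp", "dma"})),
]


def route(initiator: str, addr: int,
          address_map: List[Region] = ADDRESS_MAP) -> Tuple[str, Optional[str]]:
    """Return (status, target): routing first, then the protection check."""
    for region in address_map:
        if region.contains(addr):
            if initiator not in region.allowed:
                return ("denied", region.name)  # access control failure
            return ("ok", region.name)
    return ("decode_error", None)  # no target claims this address
```

For example, `route("cpu", 0x8000_0010)` resolves to the DRAM target, while a DMA access to `0x4000_0000` is decoded but refused, giving software a specific error to report.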
The Intelligence is in the Agents • Agents provide… • Protocol conversion • Agent adapts to IP core • Decoupling of IP cores from fabric • Provide local, isolated environment • Layered services • Proven technology • Over 100 million IC’s shipped so far • Agent services • Power management • Security management • Error management • QoS • Burst, width, and command conversion [Figure: initiator sockets (I) connect through Initiator Agents (IA) to the fabric; Target Agents (TA) connect the fabric to target sockets (T)]
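As a rough illustration of the width conversion service listed above, the sketch below splits one wide data word into narrower beats and reassembles it. The function names and the choice to send the least significant beat first are assumptions for the example; a real agent would also handle byte enables, bursts, and endianness.

```python
from typing import List


def split_word(value: int, wide_bits: int, narrow_bits: int) -> List[int]:
    """Split one wide word into narrow beats, least significant beat first."""
    if wide_bits % narrow_bits != 0:
        raise ValueError("wide width must be a multiple of the narrow width")
    if not 0 <= value < (1 << wide_bits):
        raise ValueError("value does not fit in the wide width")
    mask = (1 << narrow_bits) - 1
    return [(value >> shift) & mask for shift in range(0, wide_bits, narrow_bits)]


def merge_words(beats: List[int], narrow_bits: int) -> int:
    """Reassemble narrow beats (least significant first) into one wide word."""
    value = 0
    for index, beat in enumerate(beats):
        value |= beat << (index * narrow_bits)
    return value
```

A 128-bit word crossing to a 16-bit socket becomes eight beats in this model, and `merge_words(split_word(x, 128, 16), 16)` returns `x` unchanged.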
Sonics Interconnect Products
Interconnect Performance • Many standard peak measurements are pretty meaningless when considering SoC interconnects • Bandwidth is cheap (wires are cheaper than pins) • Usable bandwidth mostly dependent on attached cores… • … and application scenario • Sonics’ products engineered to span very wide range of operating points • Latencies: 0-10 clock cycles • Bandwidth: limited only by targets • Frequency: keep up with DRAM • IP core count: 10-100+ • No process-specific limitations • Standard ASIC-style design flow
Product Deliverables • Configuration tools • Aid in capture, checking, generation, and verification of Sonics’ IP • Configured Register Transfer Language (RTL) code • “Virtual hardware” input to logic synthesis and place & route tools • Configured verification environment • Tests customer configuration of Sonics’ IP • Logic synthesis scripts & floorplan interface • Bridges to physical design domain • Electronic System Level (ESL) models (in SystemC) • Helps architectural exploration and firmware development • Analysis tools and models • Aids customer in building and analyzing prototypes of SoC
Design Issues Addressed By Our Approach • Perf. Verification • Virtual Prototyping • Parallel IP Creation • Arch. Modeling • Methodology & Automation • Design Re-use • SW Development • Variable Clock Freq. • Timing Closure • Voltage Isolation • Scalable Fabrics • Intelligent Agents • Complex Memory Hierarchies • Power Management • Error Management • Signal Integrity • Access Security • High Peripheral Count • Data Width Conversion • Distributed Processing • Mixed Endianness • Guaranteed BW • QoS • Pipelining • Protocol Conversion
Design Timeline Benefits of Our Approach [Figure: two IP core / SoC timelines running from Start to Tapeout. Tightly Coupled timeline labels: First Integration, Emulator?, “Hello World”, Core Interaction, Re-verified. Sonics timeline labels: Design, Functional Verif., Synth./Timing Verif., Integration, Performance Verif., “Hello World”, Core Interaction, Re-verified]
Agenda • SoC Background • Interconnect Architecture • Application Areas
Inside a Tile-based Multicore SoC • Example: TI OMAP 2420 • Heterogeneous processors • General-purpose CPU (e.g. ARM 11) • DSP (e.g. TI C5x) • 2D/3D graphics (e.g. Imagination PowerVR MBX) • Video accelerator (e.g. MPEG4 codec) • Multiple levels of shared memory • External DRAM and Flash • Internal SRAM and ROM • Very complex peripheral subsystems • Security management (content, service, stability) • Aggressive power management
[Figure labels: “IVA”, “(from OMAP 5910)”] Sources: www.ti.com, www.arm.com, www.powervr.com
Multicore Architecture Advantages [Figure annotation: “What is needed”] Avner Goren, TI, EPF 2004
Application Processor Interconnect Example [Figure: CPU Tile, DSP Tile, 2D/3D Graphics Tile, and MPEG4 Codec Tile, plus MP3, USB 2.0, DMA, LCD Controller, and Camera Interface, connected as initiators (I) and targets (T) through two SMX fabrics and an S3220 to the Flash Controller, Embedded SRAM, and SDRAM Controller. Insets show a partial crossbar fabric and a shared bus fabric, an agent (socket I/F, regs, decoupling buffer, fabric I/F; widths 16 and 128), a simple socket, and a complex socket for a DSP core with MMU, instruction cache, data cache, XRAM, YRAM, and DMA]
SoC Application Requirements (1/2)
SoC Application Requirements (2/2)
Challenges With Textbook NoC • SoC applications do not offer lots of network-level concurrency • Many are dominated by a single DRAM target • NoC packetization and serialization overhead too high • Communication (wires) vs. computation (router, NI) cost trade-offs different for SoC • NoC latency is unacceptable for current processors • Should improve as reality sets in…
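The packetization and serialization point can be made concrete with simple arithmetic. The helper functions and all numbers below are assumptions chosen only to illustrate the mechanism; they are not measurements from the talk.

```python
def transfer_cycles(payload_bits: int, header_bits: int, link_bits: int) -> int:
    """Cycles to push one packet over a link, rounding up to whole cycles."""
    total_bits = payload_bits + header_bits
    return -(-total_bits // link_bits)  # integer ceiling division


def link_efficiency(payload_bits: int, header_bits: int) -> float:
    """Fraction of transferred bits that are useful payload."""
    return payload_bits / (payload_bits + header_bits)


# Assumed example: a 64-bit read response.
# Packetized NoC: 32-bit header, serialized over a 16-bit link.
NOC_CYCLES = transfer_cycles(payload_bits=64, header_bits=32, link_bits=16)
# Wide on-chip path: no header, 64 wires in parallel.
WIDE_CYCLES = transfer_cycles(payload_bits=64, header_bits=0, link_bits=64)
```

Under these assumed numbers the packetized transfer occupies the link for 6 cycles with two thirds of the bits being payload, against 1 cycle on the wide path, which is the kind of gap a latency-sensitive processor notices.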
Research Opportunities • Sonics is interested in facilitating research in the following areas: • NoC benchmarking (via OCP-IP) • Distributed, heterogeneous cache and I/O coherence • Performance constraint capture • Static performance analysis (SPA) • Implications of 3D packaging on SoC architecture • Interested? • Please contact me: wingard@sonicsinc.com
Summary • Active decoupling is essential for managing heterogeneous designs • Non-blocking interconnect fabrics offer higher efficiency and better QoS • Intelligent agents centralize key services • High volume multicore SoC applications need advanced on-chip interconnects now • Please contact me for references Thank You!