Using Multichannel DRAM Subsystems to Create Scalable Architecture for Video SOCs
Alex Chao
March 18, 2009
Video SoCs: Growing Fast in Complexity
• Video SoCs face growing complexity and need much more memory bandwidth
• More and more features
  • Advanced trick mode, 2D/3D GFX, Security (DRM)
• HD is now the standard resolution
• Latest and greatest algorithms
  • State-of-the-art video compression standards: H.264, VC-1, AVS
  • Image quality improvements: multi-scaling, noise reduction, alpha blending, multi-plane video composing
• Features and performance place a heavy burden on memory subsystems
• Increasing software burden requires more platform stability across architecture generations, product lines and product derivatives
Multichannel DRAM Subsystems for Video SOCs
Example of a Video SoC (current generation)
[Block diagram: basic software stack, OSD, transport demux, video back-end, H.264 MP @ L3 decoder and host CPU, connected through interconnects to peripherals and a single memory subsystem]
Bandwidth shown: 1.5 GB/s – 2 GB/s
Example of a Video SoC (next generation)
• Dual HD stream decoding
[Block diagram: full software stack, two H.264 HiP @ L4.1 decoders, 2D/3D GFX, transport demux, display processing, host CPU, audio DSP and video out, connected through interconnects to peripherals and two memory subsystems (#1 and #2)]
Bandwidth shown: 8 GB/s – 11 GB/s
Concurrency in Video SoCs
• Video SoCs process lots of data in parallel, but communicate…
[Diagram, built up over three slides: transport demux, graphic engines, audio decode, H.264 decode and video out, with DRAM added as a shared block]
DRAM Evolution: DRAM Burst Sizes
[Chart, 2003 to 2009. Left axis: DRAM words (BL) or DDR width (bytes), with series DDR1 BL, DDR2 BL, DDR3 BL and DDR width. Right axis: minimum DRAM burst (bytes), annotated at 8 bytes and 64 bytes]
• Optimal DDR3 burst size exceeds 32 bytes
• DDR3 transition reduces DRAM efficiency
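The arithmetic behind the burst-size chart can be sketched as follows. This is a minimal illustration, not data from the slide: the function names and the 32-byte request size are assumptions, while the DDR3 burst length of 8 words is the standard value. It shows that the minimum burst is the burst length times the bus width, and that a request smaller than the minimum burst wastes part of the transfer.

```python
def min_burst_bytes(burst_length_words: int, bus_width_bits: int) -> int:
    """Smallest transfer a DRAM channel performs: burst length x data-bus width."""
    return burst_length_words * bus_width_bits // 8


def burst_efficiency(request_bytes: int, burst_bytes: int) -> float:
    """Fraction of the transferred bytes that the requester actually asked for."""
    bursts = -(-request_bytes // burst_bytes)  # ceiling division
    return request_bytes / (bursts * burst_bytes)


# Assumed configurations for illustration (DDR3 uses a burst length of 8).
wide_channel = min_burst_bytes(burst_length_words=8, bus_width_bits=64)    # 64 bytes
narrow_channel = min_burst_bytes(burst_length_words=8, bus_width_bits=32)  # 32 bytes

# A hypothetical 32-byte request wastes half of a 64-byte burst,
# but fills a 32-byte burst completely.
print(burst_efficiency(32, wide_channel))    # 0.5
print(burst_efficiency(32, narrow_channel))  # 1.0
```

Under these assumptions, narrower channels bring the minimum burst back in line with small requests, which is the motivation for splitting one wide memory interface into several channels.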
Multichannel Optimizes DRAM Efficiency
• From single to multichannel: regain lost efficiency
[Diagram: single-channel and multichannel configurations with CPU, DSP and MME cores on SonicsMX / SonicsSX SMART Interconnect, MemMax schedulers, memory interfaces labeled 32, 32 and 64, and DDR3 devices labeled 64Mx16x2 and 64Mx16x4]
Source: Customer (HDTV) System Dataflow
Multichannel Is Not Easy!
Major issues:
• Load balancing
  • Must balance memory traffic evenly among channels
• Maintaining throughput
  • Multiple channels cause throughput/ordering problems for pipelined memories
This means software and IP cores must manage multiple memory regions and be multi-channel-aware
[Diagram: address-space layout (Region 1, Hole 1, Region 2, Region 3, Hole 2) shown in four columns: application view, 2 channels without interleaving, 2 channels interleaved, 4 channels interleaved]
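The difference between the non-interleaved and interleaved layouts can be sketched as two address decoders. This is a minimal illustration under stated assumptions: the function names, the 64-byte interleave granule and the simple modulo mapping are hypothetical and are not taken from the slides or from any Sonics product.

```python
def decode_no_interleave(addr: int, channel_size: int) -> tuple[int, int]:
    """Each channel owns one contiguous block of the address space."""
    return addr // channel_size, addr % channel_size


def decode_interleaved(addr: int, n_channels: int, granule: int) -> tuple[int, int]:
    """Consecutive granules rotate across the channels."""
    block = addr // granule
    channel = block % n_channels
    local_addr = (block // n_channels) * granule + addr % granule
    return channel, local_addr


GRANULE = 64  # assumed interleave granule in bytes

# Without interleaving, neighbouring addresses stay on one channel.
print(decode_no_interleave(0, 1 << 20), decode_no_interleave(64, 1 << 20))
# With interleaving, neighbouring granules alternate between channels.
print(decode_interleaved(0, 2, GRANULE), decode_interleaved(64, 2, GRANULE))
```

In the non-interleaved case, software has to place buffers deliberately to spread traffic. With interleaving, a single linear buffer is spread over every channel without the application knowing the channel count.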
Architecture Challenges
• Maximum memory efficiency and memory performance can be achieved with symmetric and balanced memory channels
• Asymmetric and/or unbalanced channels often lead to overdesign in order to achieve the performance requirements
• Slight architecture modifications require rebalancing of channels
  • Software, address map, product specification changes
• Developing new applications means load rebalancing
  • Time-consuming and risky
A shared/balanced memory resource avoids overdesign
Seamless Multichannel Transition
• Interleaved Multi-channel Technology (IMT): core technology of SonicsSX
[Diagram: application view versus physical organization]
Automatic Load Balancing with High Efficiency
[Chart: number of words (0 to 8000) over time for Channel_0 and Channel_1]
• Well-balanced channels deliver high memory performance
• Automatic load balancing achieved with Sonics’ IMT
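Why interleaving balances the channels without software involvement can be illustrated with a small traffic count. This is a toy sketch, not a model of IMT: the 64-byte granule, the 1 MiB per-channel region, the 4-byte word and the linear 64 KiB stream are all assumed values chosen for illustration.

```python
def words_per_channel(addresses, n_channels, granule, channel_size, interleaved):
    """Count how many word accesses land on each channel."""
    counts = [0] * n_channels
    for addr in addresses:
        if interleaved:
            channel = (addr // granule) % n_channels
        else:
            channel = addr // channel_size
        counts[channel] += 1
    return counts


WORD = 4                               # assumed word size in bytes
stream = range(0, 64 * 1024, WORD)     # one linear 64 KiB buffer, word by word

plain = words_per_channel(stream, 2, 64, 1 << 20, interleaved=False)
balanced = words_per_channel(stream, 2, 64, 1 << 20, interleaved=True)

print(plain)     # [16384, 0]: the whole buffer sits on channel 0
print(balanced)  # [8192, 8192]: the same traffic splits evenly
```

The same buffer and the same access pattern load one channel completely in the first case and both channels equally in the second, which is the behaviour the chart reports for Channel_0 and Channel_1.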
2D Bursts, Address Tiling & Multichannel
• Two-dimensional block bursts
  • 2D transaction using a single read/write command
  • Popular for HD video and graphics
• Address tiling
  • Rearrange DRAM address organization to exploit 2D locality
  • Avoids page misses
• Channels divide buffer into columns
  • SonicsSX splits 2D bursts that cross channel edges
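Splitting a 2D burst at channel edges can be sketched as below. This is a hypothetical illustration: the slides do not describe how SonicsSX performs the split, so the function name, the tuple layout and the assumption that column stripes of a fixed byte width rotate across channels are all mine.

```python
def split_2d_burst(x, y, width, height, column_bytes, n_channels):
    """Split a 2D block (x, y, width, height; x and width in bytes) at column edges.

    Returns a list of (channel, x, y, width, height) pieces, one per column
    stripe the block touches. Assumes stripes rotate across the channels.
    """
    pieces = []
    end = x + width
    while x < end:
        column = x // column_bytes
        edge = (column + 1) * column_bytes    # first byte of the next stripe
        piece_width = min(end, edge) - x
        pieces.append((column % n_channels, x, y, piece_width, height))
        x += piece_width
    return pieces


# A 32-byte-wide, 16-line block starting 16 bytes before a 64-byte column edge
# is split into one piece per channel.
print(split_2d_burst(x=48, y=10, width=32, height=16, column_bytes=64, n_channels=2))
# [(0, 48, 10, 16, 16), (1, 64, 10, 16, 16)]
```

Each piece stays a 2D block of the original height, so an initiator can still issue one command for the whole block while each channel sees only the part that falls in its columns.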
Multichannel Interleaving
• Interleaving support requires splitting traffic and delivery to the proper channel
• Option 1: Splitting in memory scheduler/controller
  • Creates performance bottleneck
  • Hard to scale past two channels
• Option 2: Splitting in the interconnect (Sonics’ IMT approach)
  • Fully distributed architecture enables scalability
  • Network overlaps channel accesses to maximize throughput
  • Optimized protocols eliminate reorder buffer area
  • Isolating channels from IP cores makes it transparent to software and other hardware
High Performance, Area Optimized and Scalable
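The "splitting traffic and delivery to the proper channel" step can be sketched for a linear burst. This is a simplified illustration with assumed names and an assumed 64-byte granule; it shows only where the split points fall, not how either option above implements them in hardware.

```python
def split_linear_burst(addr, length, granule, n_channels):
    """Cut a linear burst at interleave boundaries.

    Returns a list of (channel, start_address, byte_count) sub-bursts.
    """
    parts = []
    end = addr + length
    while addr < end:
        boundary = (addr // granule + 1) * granule   # next interleave boundary
        chunk = min(end, boundary) - addr
        parts.append(((addr // granule) % n_channels, addr, chunk))
        addr += chunk
    return parts


# A 128-byte burst starting mid-granule touches both channels.
print(split_linear_burst(addr=96, length=128, granule=64, n_channels=2))
# [(1, 96, 32), (0, 128, 64), (1, 192, 32)]
```

Because the sub-bursts target different channels, they can be in flight at the same time. The responses still have to be returned to the initiator in order, which is where the throughput and ordering concerns raised earlier in the talk come from.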
Ideal Solution to Memory Problem for Video SoCs
• Must be built on an architecture that provides predictability, guarantees QoS, leverages multithreading and exploits concurrency
• Automatic load balancing and channel management to provide scalable memory performance
  • This approach works for any number of channels of DRAM
• Solution should be transparent to hardware and software
  • Decoupling of IP cores and software from the memory subsystem configuration
SonicsSX w/IMT + MemMax memory scheduler
Thank you!
alex@sonicsinc.com