Design of a 125 µW, Fully-Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications
Tsu-Ming Liu¹, Ching-Che Chung¹, Chen-Yi Lee¹, Ting-An Lin², and Sheng-Zen Wang²
¹ National Chiao-Tung University, Hsin-Chu, Taiwan
² MediaTek Inc., Hsin-Chu, Taiwan
2006/7/26
Outline
• Introduction
• System Specification
• Improved Memory Hierarchy
• Low-Power Architectures
• Design Flow
• Measured Results
• Conclusion
Motivation
• Low power demands
  • The power consumption of existing solutions is still too high for portable devices.
  • The memory system is a critical factor in the power budget.
• High speed requirements
  • H.264/AVC requires high-speed modules to handle the extensive accesses between memory and logic.
[Figure: H.264/AVC core power profiling: SRAM 70%, Misc. 30%]
Design Contributions
• To reduce power consumption
  • We exploit the memory hierarchy to reduce memory power consumption.
  • We develop low-power architectures that lower the working frequency at the cost of only a few additional buffers and one additional logic unit.
  • Beyond the architecture-level power reduction, an efficient design flow further reduces power dissipation.
Target Specification
• Dual Standard
  • H.264/AVC Baseline Profile, Level 4
  • MPEG-2 Simple Profile, Main Level
• High Quality Decoding (30 fps, 4:2:0)
System Block Diagram
[Figure: system block diagram. Blocks: System Bus, Syntax Parser, Entropy Decoder, 4x4/8x8 IDCT, Intra/Inter Prediction, In/Post-Loop Filter, Display Engine, Display I/F, SDRAM I/F with 8 MB SDRAM, Slice Pixel SRAM, Line-Pixel-Lookahead unit]
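Improved Memory Hierarchy
• Proposed three-level memory hierarchy
  • 3rd level: SDRAM, behind the I/O interface
  • 2nd level: Slice SRAM with the LPL unit; the Slice SRAM stores rows of pixels
  • 1st level: pipeline registers feeding intra prediction and motion compensation
[Figure: three-level memory hierarchy diagram; remaining labels: 24, 16, 32-b bypass, request]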
Improved Memory Hierarchy
• Line-Pixel-Lookahead (LPL) Unit
  • We exploit an LPL unit to eliminate redundant data and thereby reduce memory space.
[Figure: Slice SRAM of 153.6 kb without the LPL unit versus 19.2 kb with the LPL unit; panel labels: Horizontal, Horizontal-Up]
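The two Slice SRAM sizes on this slide can be turned into a reduction factor with a few lines of arithmetic. The sketch below assumes that 153.6 kb is the size without the LPL unit and 19.2 kb the size with it, which is how the figure labels read; the constant and function names are illustrative only.

```python
# Slice SRAM sizes quoted on the slide, in kilobits.
# Assumption: the larger figure is the design without the LPL unit.
SRAM_WITHOUT_LPL_KB = 153.6
SRAM_WITH_LPL_KB = 19.2


def reduction(before: float, after: float) -> tuple[float, float]:
    """Return (shrink factor, fraction of storage saved) for a size change."""
    factor = before / after
    saved = 1.0 - after / before
    return factor, saved


factor, saved = reduction(SRAM_WITHOUT_LPL_KB, SRAM_WITH_LPL_KB)
print(f"{factor:.1f}x smaller, {saved:.1%} of the Slice SRAM removed")
```

Under that reading, the LPL unit shrinks the second-level memory by a factor of eight.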
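Improved Memory Hierarchy
• Memory Power Consumption
[Figure: bar chart of memory power consumption (mW, axis ticks 20 to 60), split into SRAM power and DRAM power, for three configurations: without memory hierarchy, 3-level memory hierarchy, and 3-level memory hierarchy + LPL scheme. Percentages annotated in the chart: 44%, 51%, 11%]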
Low-Power Architectures
• Motion Compensation (MC)
  • We exploit data reuse within the interpolation window by allocating content buffers.
[Figure: 4x4 sub-blocks numbered 0 to 7 fetched from SDRAM through 6x9 content buffers; annotation: 1% cost of MC]
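To see why reuse inside the interpolation window matters, the sketch below counts reference pixels for a group of 4x4 sub-blocks. It relies on the 6-tap H.264 luma filter, which needs a 9x9 reference window per 4x4 sub-block. Everything else is an assumption made for illustration: all sub-blocks share one motion vector, the eight sub-blocks are laid out 4 wide by 2 high, and the buffer captures every overlapping pixel. It does not model the 6x9 content buffers of the actual design.

```python
TAPS = 6           # H.264 luma half-sample filter length
BLK = 4            # sub-block width and height in pixels
EXTRA = TAPS - 1   # extra rows/columns the filter needs around a block


def window(bx: int, by: int) -> set[tuple[int, int]]:
    """Reference pixel coordinates needed by the 4x4 sub-block at (bx, by)."""
    x0, y0 = bx * BLK - 2, by * BLK - 2   # filter reaches 2 pixels left/up
    return {(x, y)
            for x in range(x0, x0 + BLK + EXTRA)
            for y in range(y0, y0 + BLK + EXTRA)}


def fetch_counts(cols: int, rows: int) -> tuple[int, int]:
    """Pixels fetched without reuse vs. with ideal reuse (hypothetical model)."""
    windows = [window(bx, by) for by in range(rows) for bx in range(cols)]
    no_reuse = sum(len(w) for w in windows)        # every window fetched in full
    ideal_reuse = len(set().union(*windows))       # each pixel fetched once
    return no_reuse, ideal_reuse


no_reuse, ideal = fetch_counts(cols=4, rows=2)     # assumed 4-by-2 layout
print(no_reuse, ideal, f"{1 - ideal / no_reuse:.0%} fewer fetches")
```

The ideal-reuse figure is an upper bound on the saving; a real buffer of limited size would land somewhere between the two counts.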
Low-Power Architectures
• Deblocking Filter (DF)
  • We reduce the access overhead of different filtering directions by developing novel filtering orders.
[Figure: two numbered filtering orders over the 4x4 sub-blocks, each drawn with its SRAM; annotation: 50% access reduction]
Low-Power Architectures
• A lower working frequency is sufficient to meet our design specification.
[Figure: cycles per macroblock (MB) and working frequency of the pipelined stage: 920 cycles/MB at 242 MHz (preliminary), 580 cycles/MB at 152 MHz, 380 cycles/MB at 100 MHz (this work); annotations: Improved MC, Improved DF]
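The cycle counts and clock frequencies on this slide are tied together by a fixed macroblock throughput: required frequency = cycles per MB × MBs per second. The sketch below checks that relationship using only the slide's numbers. Pairing the three cycle counts with the three frequencies in the order listed, and the stage labels, are assumptions.

```python
# (label, cycles per macroblock, working frequency in MHz) as listed on the slide.
# Assumption: values pair up in the order they appear; labels are illustrative.
STAGES = [
    ("preliminary", 920, 242),
    ("improved MC", 580, 152),
    ("improved MC + DF", 380, 100),
]


def implied_mb_rate(cycles_per_mb: int, freq_mhz: float) -> float:
    """Macroblocks per second sustained at the given clock frequency."""
    return freq_mhz * 1e6 / cycles_per_mb


rates = [implied_mb_rate(c, f) for _, c, f in STAGES]
for (label, c, f), r in zip(STAGES, rates):
    print(f"{label:18s} {c:4d} cycles/MB @ {f:3d} MHz -> {r / 1e3:.0f} kMB/s")

# If the throughput target is the same in all three cases, the rates should agree.
spread = (max(rates) - min(rates)) / min(rates)
print(f"spread between stages: {spread:.2%}")
```

The three implied throughputs agree to within one percent, which is consistent with the frequency reduction coming entirely from the lower cycle count per macroblock.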
Design Flow
• A design flow for this video decoder
• Phase 1: design loop (architectural design): System SPEC → C/C++ Model → RTL Description
  • 73% power reduction
    1. Improved Memory Hierarchy (memory size: C)
    2. Motion Compensation (working frequency: f)
    3. Deblocking Filter (working frequency: f)
• Phase 2: timing/SI closure loop: Synthesis (RTL Compiler) → P&R (SoC Encounter)
  • Further 8.2% power reduction
    1. Physical wire-load model (timing closure)
    2. Low-power synthesis
    3. Timing-aware and SI-prevention routing
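The C and f annotations on this slide most likely refer to the standard first-order model of CMOS dynamic power. It is given here as background, not as something stated in the slides:

```latex
P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f
```

Here \alpha is the switching activity factor, C the switched capacitance, V_{DD} the supply voltage, and f the clock frequency. Under this model a smaller memory lowers C, fewer cycles per macroblock allow a lower f, and any supply-voltage reduction acts quadratically.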
Measured Results
[Figure: chip photo, 3.9 mm × 3.9 mm]
Measured Results
• Chip Summary
Measured Results
• Power Measurement
Measured Results
• Power Measurement
  • Measured accuracy:
  • Voltage scaling
[Figure: two plots against supply voltage (1.8, 1.6, 1.4, 1.2, 1.0 V). Left: maximum working frequency (MHz), with labels 112 MHz, 31 MHz, and 1.15 MHz for QCIF@15fps. Right: H.264 core power (µW), with labels 225 µW and 125 µW]
Conclusion
• An MPEG-2 SP@ML and H.264/AVC BL@L4 video decoder is developed to meet dual-standard requirements.
• Substantial power savings are attained through both the improved memory hierarchy and the low-power architectures, and power is reduced further through EDA tools.
• Sub-mW power consumption can be achieved when decoding MPEG-2 or H.264/AVC video sequences in real time for mobile applications at a 1 V operating voltage.