Algorithm and Architecture Design of Power-Oriented H.264/AVC Baseline Profile Encoder for Portable Devices. Yu-Han Chen, Tung-Chien Chen, Chuan-Yung Tsai, Sung-Fang Tsai, and Liang-Gee Chen, Fellow, IEEE. IEEE CSVT 2009.
Outline • Introduction • Integer motion estimation • Fractional motion estimation • Parameterized power-scalable encoding system • Flexible system architecture • Implementation results • Conclusion
Introduction [Figure: power-aware encoder, battery capacity vs. lifetime] • A power-aware encoder can adjust its power consumption in response to different conditions, e.g., the user's preferences and the battery state.
Introduction • This paper provides multiple operating configurations between points C and D, so the encoder can adapt to different environmental conditions. [Figure: operating points of a power-aware encoder]
Integer motion estimation • Integrates low-power design techniques at both the algorithm level and the architecture level. • Hardware-oriented fast algorithm • Improves data reuse capability. • Content-aware algorithm • Achieves a good tradeoff between coding performance and computational complexity.
Hardware-oriented fast algorithm • Parallel-VBS-IME algorithm • Computes the matching costs of all block sizes for the same MV simultaneously. • Intra-candidate data reuse • Computes the 4×4 block costs first; the costs of larger block sizes are obtained immediately by summing the corresponding 4×4 costs. • Inter-candidate data reuse • For two horizontally neighboring candidates of a 16×16 block, 16×15 reference pixels overlap and can be shared.
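A minimal Python sketch of the intra-candidate data reuse described above, assuming the seven H.264 partition sizes and plain SAD as the matching cost; the function names are hypothetical. All 41 partition costs of one candidate MV are derived from the 16 4×4 SADs, so the absolute differences are computed only once per candidate.

```python
def sad_4x4_grid(cur, ref):
    """Return the 16 SADs of the 4x4 sub-blocks of a 16x16 block pair as a 4x4 grid."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid


def vbs_costs(grid):
    """Build every partition cost (keys are 'WxH') by summing 4x4 SADs."""
    costs = {"4x4": [grid[r][c] for r in range(4) for c in range(4)]}
    costs["8x4"] = [grid[r][c] + grid[r][c + 1] for r in range(4) for c in (0, 2)]
    costs["4x8"] = [grid[r][c] + grid[r + 1][c] for r in (0, 2) for c in range(4)]
    costs["8x8"] = [grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]
                    for r in (0, 2) for c in (0, 2)]
    tl, tr, bl, br = costs["8x8"]  # top-left, top-right, bottom-left, bottom-right
    costs["16x8"] = [tl + tr, bl + br]
    costs["8x16"] = [tl + bl, tr + br]
    costs["16x16"] = [tl + tr + bl + br]
    return costs


# Usage with synthetic 16x16 blocks (arbitrary test data).
cur = [[(3 * x + y) % 256 for x in range(16)] for y in range(16)]
ref = [[(5 * x + 2 * y) % 256 for x in range(16)] for y in range(16)]
costs = vbs_costs(sad_4x4_grid(cur, ref))
print(sum(len(v) for v in costs.values()))  # 41 partitions
```

Every larger cost is a sum of already available 4×4 sums, which is why all block sizes can be evaluated in parallel for the same MV.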
Hardware-oriented fast algorithm • Parallel-VBS-FSS • Well suited to inter-candidate data reuse. • Parallel-VBS-IME is adopted. [Figure: the search moves to the locally best candidate and stops when the locally best candidate is at the center]
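The behavior in the figure (move to the locally best candidate; stop when it is at the center) can be sketched as an iterative local search. This is a simplified version with a one-pixel step over the 3×3 neighborhood, not the exact step pattern of parallel-VBS-FSS; `cost` stands in for the matching cost of a candidate MV and, like `search_range`, is an assumption of the sketch. Because consecutive centers are neighbors, their reference windows largely overlap, which is what makes this kind of search friendly to inter-candidate data reuse.

```python
def local_search(cost, start, search_range=16, max_iters=64):
    """Move to the best point of the 3x3 neighbourhood until the center is locally best."""
    center = start
    for _ in range(max_iters):
        cands = [(center[0] + dx, center[1] + dy)
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
        # Keep only candidates inside the +/- search_range window.
        cands = [c for c in cands
                 if abs(c[0]) <= search_range and abs(c[1]) <= search_range]
        best = min(cands, key=cost)
        if cost(best) >= cost(center):  # locally best is at the center: stop
            break
        center = best
    return center


# Usage with a toy convex cost whose minimum is at MV (5, -3).
toy_cost = lambda mv: (mv[0] - 5) ** 2 + (mv[1] + 3) ** 2
print(local_search(toy_cost, (0, 0)))  # (5, -3)
```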
Content-aware algorithm • If motion activity is high • More initial candidates are set to find accurate MVs. • Multi-iteration parallel-VBS-FSS algorithm [Figure: six initial candidates within the predicted motion window (PMW), inside the search window]
Content-aware algorithm • Six initial candidates • (0,0) • MV predictor • Median MV of the left, up, and up-right blocks. • The remaining four are used to find good matches in complex-motion regions.
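A sketch of the MV predictor as the component-wise median of the left, up, and up-right neighbor MVs. It omits the H.264 special cases (unavailable neighbors, directional prediction for 16×8 and 8×16 partitions). The slide does not specify the remaining four initial candidates, so they are passed in through a hypothetical `extra` argument.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return max(min(a, b), min(max(a, b), c))


def mv_predictor(left, up, up_right):
    """Component-wise median of the three neighbouring MVs, each given as (x, y)."""
    return (median3(left[0], up[0], up_right[0]),
            median3(left[1], up[1], up_right[1]))


def initial_candidates(left, up, up_right, extra):
    """(0, 0), the MV predictor, and four caller-supplied candidates (hypothetical)."""
    if len(extra) != 4:
        raise ValueError("expected four extra candidates")
    return [(0, 0), mv_predictor(left, up, up_right)] + list(extra)


print(mv_predictor((2, -1), (4, 3), (-1, 0)))  # (2, 0)
```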
Content-aware algorithm • Content-adaptive strategy • The PMW is adaptively shrunk according to the neighboring motion activity.
Parallel architecture and memory organization • The search candidate conditionally moves vertically or horizontally, so flexible memory access is needed to support efficient data reuse. [Figure: memory access for A2-D2 or B0-B3, with the outputs realigned by rotating right by one, two, or three positions]
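One common way to let the same memory serve both horizontal and vertical candidate moves is a skewed bank interleave. It is shown here as an assumption, since the slide does not give the exact mapping: pixel (x, y) goes to bank (x + y) mod 4, so any four consecutive pixels in a row or in a column fall into four different banks and can be read in one cycle. The bank outputs then come out rotated, and a rotate-right by 0 to 3 positions restores pixel order, similar to the rotate-right labels in the figure. All names are hypothetical.

```python
NUM_BANKS = 4


def bank_of(x, y):
    """Skewed interleave (assumed): neighbours in a row or a column hit different banks."""
    return (x + y) % NUM_BANKS


def build_banks(frame):
    """Store each pixel in its bank at address (y, x // 4), which is unique within a bank."""
    banks = [{} for _ in range(NUM_BANKS)]
    for y, row in enumerate(frame):
        for x, pix in enumerate(row):
            banks[bank_of(x, y)][(y, x // 4)] = pix
    return banks


def rotate_right(v, k):
    k %= len(v)
    return v[-k:] + v[:-k] if k else list(v)


def read4(banks, coords):
    """Read four pixels (one per bank) and restore their original order."""
    used = [bank_of(x, y) for x, y in coords]
    assert len(set(used)) == NUM_BANKS, "bank conflict"
    bank_out = [None] * NUM_BANKS
    for (x, y), b in zip(coords, used):
        bank_out[b] = banks[b][(y, x // 4)]
    # Pixel i sits in bank (used[0] + i) % 4, so rotate right by -used[0] (mod 4).
    return rotate_right(bank_out, -used[0])


frame = [[10 * y + x for x in range(8)] for y in range(8)]
banks = build_banks(frame)
print(read4(banks, [(x, 2) for x in range(1, 5)]))  # row: [21, 22, 23, 24]
print(read4(banks, [(3, y) for y in range(4)]))     # column: [3, 13, 23, 33]
```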
Parallel architecture and memory organization [Figure: IME datapath. (1) Reference and current frames; (2) reference MBs, stored with two-directional random access for inter-candidate data reuse, and the current MB; (3) 16×16 array; (4) computation of the absolute difference values; (5) SAD computation with intra-candidate data reuse]
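The inter-candidate data reuse in this datapath can be illustrated with a hypothetical shift-register model: when the candidate moves one pixel to the right, the 16×16 reference window shifts left and only one new column of 16 pixels is loaded, instead of refetching all 256 pixels. The function names are illustrative and not taken from the paper.

```python
def fetch_block(ref, x0, y0, n=16):
    """Direct fetch of an n x n reference block (the no-reuse baseline)."""
    return [row[x0:x0 + n] for row in ref[y0:y0 + n]]


def shift_in_column(window, new_col):
    """Drop the leftmost column and append one new column on the right."""
    return [row[1:] + [p] for row, p in zip(window, new_col)]


ref = [[(7 * x + 13 * y) % 256 for x in range(32)] for y in range(32)]
window = fetch_block(ref, 0, 0)
loaded = 16 * 16                       # initial full load
for x0 in range(1, 9):                 # move the candidate 8 pixels to the right
    new_col = [ref[y][x0 + 15] for y in range(16)]
    window = shift_in_column(window, new_col)
    loaded += len(new_col)             # only 16 new pixels per move
    assert window == fetch_block(ref, x0, 0)
print(loaded, 9 * 16 * 16)  # 384 2304 (with reuse vs. without)
```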
Fractional motion estimation • Advanced mode pre-decision algorithm • The N best modes (N = 0 to 7) are pre-decided after IME at integer-pixel precision. • Only the N best modes are refined to quarter-pixel precision. • Reduces computation. • Hardware-oriented one-pass algorithm • The half-pixel and quarter-pixel candidates are processed simultaneously so that they share the accessed memory data, reducing memory access by 50%.
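A minimal sketch of the pre-decision step, assuming each mode has already been assigned a single integer-pixel cost by IME (for example, summed over its sub-blocks). The exact criterion of the advanced pre-decision algorithm is not reproduced here; this sketch simply keeps the N lowest-cost modes, and the costs are made-up numbers. With N = 7 every H.264 inter partition size is refined, and with N = 0 no mode would be refined.

```python
MODES = ["16x16", "16x8", "8x16", "8x8", "8x4", "4x8", "4x4"]


def predecide_modes(ime_costs, n):
    """Return the n modes with the lowest integer-pixel cost (0 <= n <= 7)."""
    if not 0 <= n <= len(MODES):
        raise ValueError("n must be between 0 and 7")
    return sorted(MODES, key=lambda m: ime_costs[m])[:n]


# Hypothetical IME costs for one macroblock.
costs = {"16x16": 900, "16x8": 870, "8x16": 880, "8x8": 850,
         "8x4": 940, "4x8": 930, "4x4": 990}
print(predecide_modes(costs, 2))  # ['8x8', '16x8']
```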
Fractional motion estimation • Hardware-oriented one-pass algorithm [Figure: integer-pixel, half-pixel, and quarter-pixel candidate positions. Two-step algorithm: 17 candidates; one-pass algorithm: 25 candidates]
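The candidate counts can be reproduced under an assumed layout, with offsets in quarter-pixel units around the best integer-pixel MV. The two-step search checks the center and its 8 half-pixel neighbors, then 8 quarter-pixel neighbors of the best half-pixel point (17 in total). The one-pass search checks the whole ±2 quarter-pixel window at once (25 candidates: 1 integer, 8 half, 16 quarter). The function names are hypothetical.

```python
def two_step_candidates(best_half):
    """Step 1: center + 8 half-pel points; step 2: 8 quarter-pel points
    around best_half (all offsets in quarter-pixel units)."""
    step1 = [(2 * dx, 2 * dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    step2 = [(best_half[0] + dx, best_half[1] + dy)
             for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)]
    return step1, step2


def one_pass_candidates():
    """Every quarter-pel offset in the +/-2 window, evaluated in a single pass."""
    return [(dx, dy) for dy in range(-2, 3) for dx in range(-2, 3)]


s1, s2 = two_step_candidates(best_half=(2, 0))
cands = one_pass_candidates()
half = [c for c in cands if c[0] % 2 == 0 and c[1] % 2 == 0 and c != (0, 0)]
print(len(s1) + len(s2), len(cands), len(half))  # 17 25 8
```

In the two-step version, step 2 depends on the result of step 1, so the reference data must be accessed twice; the one-pass version has no such dependency.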
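Fractional motion estimation • Q is a 4 × 4 block of a quarter-pixel candidate, bilinearly interpolated from two 4 × 4 blocks (A and B) of half-pixel candidates. • Because the Hadamard transform (HT) is linear, the transformed residual of Q can be approximated by the average of the transformed residuals of A and B. • The data-processing power for the HT of all quarter-pixel candidates is therefore saved.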
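A short check of the linearity argument, assuming the 4×4 Hadamard transform used for SATD and exact (unrounded) bilinear averaging Q = (A + B) / 2. Under that assumption, the transformed residual of Q equals the average of the transformed residuals of A and B, so Q needs no HT of its own; with the integer rounding used in practice, the equality becomes an approximation. The SATD values themselves cannot be averaged, because the absolute value is not linear; the coefficients must be averaged first. The block values are arbitrary test data.

```python
H = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def hadamard(x):
    """2-D 4x4 Hadamard transform H * X * H^T (H is symmetric, so H^T == H)."""
    return matmul(matmul(H, x), H)


def residual(c, p):
    return [[c[i][j] - p[i][j] for j in range(4)] for i in range(4)]


def satd(coeffs):
    return sum(abs(v) for row in coeffs for v in row)


C = [[10 + 4 * i + j for j in range(4)] for i in range(4)]             # current block
A = [[C[i][j] + (i - j) for j in range(4)] for i in range(4)]          # half-pel block A
B = [[C[i][j] + 2 * (j % 2) - 1 for j in range(4)] for i in range(4)]  # half-pel block B
Q = [[(A[i][j] + B[i][j]) / 2 for j in range(4)] for i in range(4)]    # exact bilinear Q

hA, hB = hadamard(residual(C, A)), hadamard(residual(C, B))
hQ_reused = [[(hA[i][j] + hB[i][j]) / 2 for j in range(4)] for i in range(4)]
hQ_direct = hadamard(residual(C, Q))
print(hQ_reused == hQ_direct)  # True
print(satd(hQ_reused))         # SATD of Q without a separate HT
```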
Fractional motion estimation [Figure: quality comparison; about 0.06 dB drop with the same memory access]
Fractional motion estimation • Parallel architecture • Generates the half-pixel reference data from the integer-pixel reference data. • Generates the quarter-pixel reference data from the half-pixel reference data.
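For reference, this is the H.264 luma interpolation that such an architecture implements. A half-pixel sample between two integer pixels uses the 6-tap filter (1, -5, 20, 20, -5, 1) with rounding and a right shift by 5, clipped to 8 bits. A quarter-pixel sample is the rounded average of its two nearest integer- or half-pixel samples. This 1-D sketch covers horizontal positions only; the center half-pixel sample, which filters intermediate values in both directions, is omitted.

```python
def clip8(v):
    """Clip to the 8-bit pixel range."""
    return max(0, min(255, v))


def half_pel(row, i):
    """Half-pixel sample between row[i] and row[i + 1] (6-tap filter, 1-D)."""
    if i < 2 or i + 4 > len(row):
        raise ValueError("need two integer pixels on the left and three on the right")
    e, f, g, h, k, m = row[i - 2:i + 4]
    return clip8((e - 5 * f + 20 * g + 20 * h - 5 * k + m + 16) >> 5)


def quarter_pel(a, b):
    """Quarter-pixel sample: rounded average of two neighbouring samples."""
    return (a + b + 1) >> 1


row = [10, 20, 30, 40, 50, 60, 70, 80]
b = half_pel(row, 3)              # between 40 and 50
print(b, quarter_pel(row[3], b))  # 45 43
```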
Parameterized power-scalable encoding system • Power-scalable parameters are provided in the IME, FME, intra prediction (IP), and deblocking (DB) engines. • They flexibly control the power consumption of the whole encoding system.
Parameterized power-scalable encoding system • Parameter settings: (1) 4, (2) 4, (3) 2 + 2 = 4, (4) 2 • Power modes: 4 × 4 × 4 × 2 = 128
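The 128 power modes are the Cartesian product of the per-engine settings. The sketch assumes, as a labeling convenience, that items (1) to (4) follow the engine order IME, FME, IP, DB given for the power-scalable parameters; the slide itself only shows the numbers.

```python
from itertools import product

# Assumed mapping of slide items (1)-(4) to engines; the setting counts are from the slide.
SETTINGS = {"IME": 4, "FME": 4, "IP": 2 + 2, "DB": 2}

power_modes = [dict(zip(SETTINGS, combo))
               for combo in product(*(range(n) for n in SETTINGS.values()))]
print(len(power_modes))  # 128
print(power_modes[0])    # {'IME': 0, 'FME': 0, 'IP': 0, 'DB': 0}
```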
Implementation results • The curve with the highest power consumption shows the best coding performance. • On average, a 2.69% bit-rate increase and a 0.12 dB quality drop.
Implementation results [Table: comparison with Huang's H.264/AVC encoder and Lin's low-power MPEG-4 encoder. Annotated differences: two reference frames reduced to one reference frame; multi-iteration IME and FME; power scalability of IP and DB]
Conclusion A low-power and power-aware H.264/AVC video encoder has been proposed. Its power efficiency was co-optimized at the algorithm, architecture, and circuit levels. It provides competitive power efficiency for D1 (720×480) 30 frames/s video encoding and the best power configurations compared with previous state-of-the-art designs.