Single Reference Frame Multiple Current Macroblocks Scheme for Multiple Reference. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY. Tung-Chien Chen, Chuan-Yung Tsai, Yu-Wen Huang, and Liang-Gee Chen, Fellow, IEEE.
Outline • Introduction • Fundamentals and Problem Statement • Proposed Data Reuse Scheme • Proposed Framework for SRMC Scheme • Simulation Results and Performance Evaluation
Introduction • H.264/AVC can save 25%-45% and 50%-70% of the bitrate compared with MPEG-4 advanced simple profile and MPEG-2, respectively, but at the cost of higher computation and memory bandwidth. • Inter prediction occupies over 95% of the computational resources, mainly because of multiple reference frames motion estimation (MRF-ME).
Introduction • We propose a new frame-level DR (data reuse) scheme. With frame-level rescheduling, the data of one loaded SW (search window) can be reused by multiple current MBs in different original frames for MRF-ME, so the system bandwidth and local memory size are greatly reduced.
Fundamentals and Problem Statement A. Inter Prediction in H.264/AVC • For variable block size ME (VBS-ME), there are 41 different blocks within one MB, which gives rise to a large number of possible combinations. • Lagrangian mode decision: The Lagrangian cost function considers both the distortion and the rate. • Distortion: sum of absolute differences (SAD) • Rate: the number of bits required to code the reference frame number and the motion vectors (MVs).
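The two ideas on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the names `PARTITIONS`, `sad`, and `lagrangian_cost`, the multiplier value `lam`, and the toy 2x2 blocks are all assumptions made for the example. The partition table follows the standard H.264/AVC block sizes, and the cost is written in the usual form J = SAD + lambda * R.

```python
# Number of blocks of each size inside one 16x16 macroblock (standard
# H.264/AVC partitions). The counts add up to the 41 blocks on the slide.
PARTITIONS = {
    "16x16": 1, "16x8": 2, "8x16": 2, "8x8": 4,
    "8x4": 8, "4x8": 8, "4x4": 16,
}
TOTAL_BLOCKS = sum(PARTITIONS.values())

def sad(cur_block, ref_block):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(
        abs(c - r)
        for cur_row, ref_row in zip(cur_block, ref_block)
        for c, r in zip(cur_row, ref_row)
    )

def lagrangian_cost(cur_block, ref_block, rate_bits, lam):
    """J = SAD + lam * R, where R counts the bits spent on the reference
    frame number and the MV. The value of lam is a hypothetical input."""
    return sad(cur_block, ref_block) + lam * rate_bits

# Toy 2x2 blocks, only to show the arithmetic.
cur = [[10, 12], [14, 16]]
ref = [[11, 12], [10, 20]]
cost = lagrangian_cost(cur, ref, rate_bits=6, lam=4)
```

A candidate with a slightly larger SAD can still win if its MV and reference index are cheaper to code, which is why the rate term matters for the mode decision.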
Fundamentals and Problem Statement B. Conventional Data Reuse Scheme and Problem Statement • The bandwidth between system memory and ME core is very heavy if all required pixels are loaded from system memory. • A common solution is to design local buffers to store reusable data.
Fundamentals and Problem Statement B. Conventional Data Reuse Scheme and Problem Statement • When the ME of MB-a is finished and MB-b is about to be processed, only the reference pixels in region D are loaded to replace region A in the local memory.
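The saving from this block-level reuse can be estimated with a short Python sketch. It assumes a 16x16 MB, a square SW of side 16 + 2 * search_range, and horizontally adjacent MBs whose SWs overlap in all but one 16-pixel-wide strip (the strip playing the role of region D). These dimensions and the function names are assumptions for illustration, not figures taken from the slides.

```python
MB_SIZE = 16  # macroblock width and height in pixels

def sw_side(search_range):
    """Side length of a square search window for a +/- search_range search."""
    return MB_SIZE + 2 * search_range

def pixels_loaded_no_reuse(search_range):
    """Reference pixels fetched per MB when the whole SW is reloaded."""
    side = sw_side(search_range)
    return side * side

def pixels_loaded_with_reuse(search_range):
    """Reference pixels fetched per MB when only the new strip is loaded
    and the overlapping part of the SW stays in the local buffer."""
    return MB_SIZE * sw_side(search_range)

full = pixels_loaded_no_reuse(16)      # 48 * 48 pixels
strip = pixels_loaded_with_reuse(16)   # 16 * 48 pixels
```

Under these assumptions, a 16-pel search range needs one third of the reference traffic per MB once the overlapping region is kept on chip.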
Fundamentals and Problem Statement B. Conventional Data Reuse Scheme and Problem Statement • With four reference frames, there are four SW memories, and each SW memory is loaded and updated independently.
Fundamentals and Problem Statement B. Conventional Data Reuse Scheme and Problem Statement • The hardware cost is almost proportional to the maximum reference frame number. • The higher the system bandwidth requirement, the higher the power consumption. • The larger the memory size, the larger the silicon area and cost.
Proposed Data Reuse Scheme A. Frame-Level Data Reuse • In the MRSC (multiple reference frames single current macroblock) scheme, one current MB is loaded only once, and one reference SW is loaded several times. In the SRMC (single reference frame multiple current macroblocks) scheme, one current MB is loaded several times while one reference SW is loaded only once. • Since the SW is much larger than one MB, both the bandwidth and the memory size can be greatly reduced.
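A rough per-MB traffic model makes the trade-off concrete. The sketch below assumes steady state, a 16x16 MB, block-level reuse so that each SW update costs one 16-pixel-wide strip, and it ignores the traffic of intermediate mode decision results. The function names and the model itself are illustrative assumptions and are not the equations from the paper.

```python
MB_PIXELS = 16 * 16  # pixels in one current macroblock

def sw_strip_pixels(search_range, mb_size=16):
    """Pixels fetched to update one SW by one MB step (block-level reuse)."""
    return mb_size * (mb_size + 2 * search_range)

def mrsc_pixels_per_mb(num_refs, search_range):
    """MRSC: the current MB is loaded once, one SW strip per reference frame."""
    return MB_PIXELS + num_refs * sw_strip_pixels(search_range)

def srmc_pixels_per_mb(num_refs, search_range):
    """SRMC: the current MB is loaded once per reference frame it searches,
    while each SW strip is loaded once and shared by num_refs current MBs."""
    return num_refs * MB_PIXELS + sw_strip_pixels(search_range)

mrsc = mrsc_pixels_per_mb(num_refs=4, search_range=16)
srmc = srmc_pixels_per_mb(num_refs=4, search_range=16)
```

Because the SW strip is several times larger than an MB, swapping which of the two is reloaded lowers the total, and the gap widens as the search range grows.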
Proposed Data Reuse Scheme B. Frame-Level Rescheduling • It is assumed that there are six MBs in each frame and five P-frames to be coded. The maximum reference frame number is four.
Proposed Data Reuse Scheme B. Frame-Level Rescheduling • The first, second, third, and fourth ME cubes in one column represent the step-1, step-2, step-3, and step-4 searching processes in Fig. 3.
Proposed Data Reuse Scheme B. Frame-Level Rescheduling • The first, second, third, and fourth ME cubes in one vertical column represent the step-1, step-2, step-3, and step-4 searching processes in Fig. 4.
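The rescheduling can be illustrated by enumerating the ME tasks in both orders for the example on these slides (six MBs per frame, five P-frames, at most four reference frames). The frame numbering is an assumption: frame 0 stands for the frame preceding the first P-frame, and P-frames are numbered 1 to 5. Each task is a hypothetical tuple (current frame, MB index, reference frame).

```python
def mrsc_schedule(num_p_frames, mbs_per_frame, max_refs):
    """MRSC order: finish every reference frame for one current MB
    before moving to the next current MB."""
    tasks = []
    for cur in range(1, num_p_frames + 1):
        for mb in range(mbs_per_frame):
            # Nearest reference first, going back at most max_refs frames.
            for ref in range(cur - 1, max(cur - max_refs, 0) - 1, -1):
                tasks.append((cur, mb, ref))
    return tasks

def srmc_schedule(num_p_frames, mbs_per_frame, max_refs):
    """SRMC order: load one SW, then serve the co-located current MBs
    of every later frame that uses this frame as a reference."""
    tasks = []
    for ref in range(num_p_frames):
        for mb in range(mbs_per_frame):
            for cur in range(ref + 1, min(ref + max_refs, num_p_frames) + 1):
                tasks.append((cur, mb, ref))
    return tasks

mrsc = mrsc_schedule(num_p_frames=5, mbs_per_frame=6, max_refs=4)
srmc = srmc_schedule(num_p_frames=5, mbs_per_frame=6, max_refs=4)

# In MRSC every task updates its own SW memory; in SRMC one SW load
# per (reference frame, MB position) is shared by all tasks that use it.
mrsc_sw_loads = len(mrsc)
srmc_sw_loads = len({(ref, mb) for _, mb, ref in srmc})
```

Both orders cover exactly the same set of searches; only the sequence changes, and with it the number of times a search window has to be fetched.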
Proposed Framework for SRMC Scheme A. Mode Decision for SRMC Scheme • Inaccurate mode decisions arise once block-level data reuse with parallel hardware is combined with the frame-level rescheduling of the proposed SRMC scheme.
Proposed Framework for SRMC Scheme A. Mode Decision for SRMC Scheme • In the reference software (JM8.5), the Lagrangian cost function takes MV costs into consideration. • The MV of each block is generally predicted by the median of the MVs from the left, top, and top-right neighboring blocks. • The exact MVPs of variable blocks are replaced by the median of the MVs of the left, top, and top-right MBs in order to facilitate parallel processing with block-level data reuse.
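Median MV prediction is taken per component, which a short Python sketch can show. The helper names `median3` and `predict_mv` are hypothetical, MVs are modeled as (x, y) integer tuples, and the special cases of the standard (unavailable neighbors, directional prediction for 16x8 and 8x16 partitions) are left out.

```python
def median3(a, b, c):
    """Median of three values."""
    return sorted((a, b, c))[1]

def predict_mv(left, top, top_right):
    """Component-wise median of three neighboring MVs, each an (x, y) tuple."""
    return (
        median3(left[0], top[0], top_right[0]),
        median3(left[1], top[1], top_right[1]),
    )

# Neighboring MVs are made-up values for illustration.
mvp = predict_mv(left=(2, -1), top=(5, 3), top_right=(-4, 0))
```

Using the MVs of the neighboring MBs instead of the neighboring blocks means the predictor is known before any block of the current MB is searched, so all variable blocks can be evaluated in parallel at the price of an approximate MV cost.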
Proposed Framework for SRMC Scheme A. Mode Decision for SRMC Scheme
Proposed Framework for SRMC Scheme A. Mode Decision for SRMC Scheme • The mode decision flow is divided into partial mode decision (PMD) and final mode decision (FMD), as shown in Table II. • The MVs and the distortion costs of these suboptimal PMD results are written to the external memory. • After the PMD results of all reference frames have been generated for a given current MB, the FMD, running on the system RISC, decides the best combination of variable blocks in different reference frames.
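The split between PMD and FMD can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of Table II: `pmd_results[ref][mode]` is a hypothetical structure holding one (mv, cost) pair per block, every block is allowed to pick its reference frame independently (the standard restricts this for sub-8x8 blocks), and the cost values are invented.

```python
def final_mode_decision(pmd_results):
    """Pick the partition mode and per-block reference frames with the
    lowest total cost, given PMD results gathered per reference frame."""
    any_ref = next(iter(pmd_results.values()))
    best = None
    for mode, blocks in any_ref.items():
        total = 0
        choice = []
        for blk in range(len(blocks)):
            # For this block, take the reference frame with the lowest cost.
            ref, (mv, cost) = min(
                ((r, res[mode][blk]) for r, res in pmd_results.items()),
                key=lambda item: item[1][1],
            )
            choice.append((ref, mv))
            total += cost
        if best is None or total < best[1]:
            best = (mode, total, choice)
    return best

# PMD output for two reference frames and two modes (made-up numbers).
pmd = {
    0: {"16x16": [((1, 0), 100)], "16x8": [((1, 0), 40), ((2, 0), 70)]},
    1: {"16x16": [((0, 0), 90)], "16x8": [((0, 1), 55), ((3, 0), 30)]},
}
mode, total_cost, per_block = final_mode_decision(pmd)
```

Only MVs and costs cross the memory interface between the two stages, which keeps the extra traffic small compared with pixel data.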
Proposed Framework for SRMC Scheme B. Architecture Design • The ME core computes the distortion values of the candidates, and the PMD engine decides the MVs of variable blocks on-line according to the estimated MVPs. • The PMD results are buffered in system memory, and then the RISC performs FMD.
Proposed Framework for SRMC Scheme B. Architecture Design • The SW in frame t-4 is loaded into the SW buffer first. • Then, the ME task of the current MB in frame t-3 is performed. • Once its PMD results are written out, the RISC performs the FMD of this current MB. • At the same time, the current MBs at the same location in the following frames t-2, t-1, and t are processed one after another.
Simulation Results and Performance Evaluation A. Simulation Results • Four sequences: Foreman, Mobile, Akiyo, and Stefan. • The encoding parameters are Baseline profile, IPPP… structure, four reference frames, 16-pel search range, and low complexity mode decision.
Simulation Results and Performance Evaluation A. Simulation Results
Simulation Results and Performance Evaluation B. Performance Evaluation • MRSC: [equation missing in source] • SRMC: [equation missing in source] • The PMD result data include the MVs and the matching costs of variable blocks, so they are relatively small.
Simulation Results and Performance Evaluation B. Performance Evaluation • The [term missing in source] increases with a larger search range. • Therefore, the proposed SRMC scheme performs better for videos with larger frame sizes, which inherently require a larger search range.
Conclusion • With frame-level rescheduling, the procedures for multiple current MBs in different original frames can simultaneously use the data of a single SW. • In the proposed framework, the SRMC scheme reduces external system bandwidth by 63% and internal memory size by 75% for HDTV specifications.