H.264/AVC Baseline Profile Decoder Complexity Analysis. Michael Horowitz , Anthony Joch, Faouzi Kossentini, and Antti Hallapuro IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, JULY 2003. Outline. Introduction H.264/AVC decoder overview Storage requirements Time complexity
H.264/AVC Baseline Profile Decoder Complexity Analysis Michael Horowitz, Anthony Joch, Faouzi Kossentini, and Antti Hallapuro IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, JULY 2003
Outline • Introduction • H.264/AVC decoder overview • Storage requirements • Time complexity • Comparative analysis • Experimental analysis • Conclusion
Introduction • To estimate the computational complexity of an H.264/AVC baseline decoder, it is important to understand its two major components: • Time complexity • Storage (space) complexity
Introduction • Time complexity • Time complexity is measured by the approximate number of operations required to execute a specific implementation of an algorithm. • Storage complexity • Storage complexity is measured by the approximate amount of memory required to implement an algorithm.
Introduction • Develop and validate a methodology for estimating decoder time complexity. • Study the relationship between decoder time complexity and encoder characteristics, source content, resolution and bit rate
H.264/AVC decoder overview • H.264/AVC decoding process consists of two primary paths: • the generation of the predicted video blocks • the decoding of the coded residual blocks
H.264/AVC decoder overview • The decoding process first parses and decodes the entropy-coded bitstream • UVLC (complexity of CAVLC = 2 x UVLC) • Depending on the coding mode (I or P) of each macroblock, the predicted macroblock is generated either temporally (inter-coding) or spatially (intra-coding).
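The entropy-decoding step can be illustrated with a minimal sketch, assuming that UVLC here refers to the unsigned Exp-Golomb code words used for H.264/AVC syntax elements. The function name `read_ue` and the use of a '0'/'1' string as the bitstream are illustrative choices, not taken from the slides.

```python
def read_ue(bits, pos=0):
    """Decode one unsigned Exp-Golomb code word from a string of '0'/'1'.

    Returns (value, next_pos). The string-based bitstream is a
    simplification chosen for readability (assumption, not from the slides).
    """
    leading_zeros = 0
    # Count the zero prefix that tells us how many suffix bits follow.
    while pos < len(bits) and bits[pos] == "0":
        leading_zeros += 1
        pos += 1
    if pos >= len(bits):
        raise ValueError("bitstream ended inside a code word prefix")
    pos += 1  # skip the '1' that terminates the prefix
    suffix = bits[pos:pos + leading_zeros]
    if len(suffix) != leading_zeros:
        raise ValueError("bitstream ended inside a code word suffix")
    offset = int(suffix, 2) if suffix else 0
    value = (1 << leading_zeros) - 1 + offset
    return value, pos + leading_zeros


if __name__ == "__main__":
    for word in ("1", "010", "011", "00100", "00111"):
        print(word, read_ue(word)[0])
```

Each code word costs one scan for the prefix plus one fixed-length read, which is part of why this kind of parsing is cheap relative to context-adaptive schemes.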
H.264/AVC decoder overview • Inter-coded MB: • Block sizes from 16x16 down to 4x4 • Quarter-sample motion vector accuracy • Motion vectors are coded differentially using either median or directional prediction • Multiple reference frames
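Differential motion vector coding with median prediction can be sketched as follows. This is a simplified illustration: it assumes all three neighbouring vectors (called A, B, and C here) are available and ignores the directional-prediction cases; the function names are hypothetical. Vectors are in quarter-sample units, so the low two bits of each component select the fractional position.

```python
def median3(a, b, c):
    """Median of three integers."""
    return sorted((a, b, c))[1]


def predict_mv(mv_a, mv_b, mv_c):
    """Component-wise median of three neighbouring motion vectors."""
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))


def reconstruct_mv(mvd, mv_a, mv_b, mv_c):
    """Add the decoded difference (mvd) to the median prediction."""
    pred_x, pred_y = predict_mv(mv_a, mv_b, mv_c)
    return (pred_x + mvd[0], pred_y + mvd[1])


def split_quarter_sample(component):
    """Split a quarter-sample component into (integer, fractional) parts.

    The fractional part is 0..3; a non-zero value means the decoder
    must interpolate rather than copy reference samples directly.
    """
    return component >> 2, component & 3


if __name__ == "__main__":
    mv = reconstruct_mv((1, -1), (4, -2), (8, 0), (6, 10))
    print(mv, [split_quarter_sample(c) for c in mv])
```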
H.264/AVC decoder overview • Intra-coded MB • 16x16 or 4x4 intra-coding mode • 9 possible modes for 4x4, 4 possible modes for 16x16 (e.g., DC, vertical, horizontal, ...) • Decoding of residual • Inverse transform • Deblocking filter
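The three intra modes named above can be sketched for a 4x4 block as follows. This is a minimal illustration that assumes both the row of samples above the block (`top`) and the column to its left (`left`) are available; the function name and the string mode labels are hypothetical, and the remaining directional modes are omitted.

```python
def intra4x4_predict(mode, top, left):
    """Predict a 4x4 block from its neighbours.

    top  : 4 reconstructed samples directly above the block
    left : 4 reconstructed samples directly to the left of the block
    Returns a list of 4 rows, each with 4 predicted samples.
    """
    if mode == "vertical":
        # Every row repeats the samples above the block.
        return [list(top) for _ in range(4)]
    if mode == "horizontal":
        # Every row is filled with the sample to its left.
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":
        # Rounded mean of the eight neighbouring samples.
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unsupported mode: %r" % (mode,))


if __name__ == "__main__":
    for m in ("vertical", "horizontal", "dc"):
        print(m, intra4x4_predict(m, [10, 20, 30, 40], [1, 2, 3, 4]))
```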
Storage requirements • The storage required by an H.264/AVC baseline decoder is divided into: • Memory that is needed for the whole frame • Memory that is needed for one line of macroblocks • Memory that is needed for a single macroblock • Memory that is needed for constant data
Storage requirements • Frame buffers dominate the storage requirements, particularly for high-resolution video • 95% for QCIF, 98% for CIF
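To give a sense of scale for the frame buffers, the sketch below computes the size of one frame, assuming 8-bit samples and 4:2:0 chroma subsampling. The number of frames held in memory (`num_frames`) is a free parameter here, since the slides do not state it, and the function names are illustrative.

```python
# Standard picture dimensions in luma samples (width, height).
RESOLUTIONS = {"QCIF": (176, 144), "CIF": (352, 288)}


def frame_bytes(width, height):
    """Bytes for one 8-bit 4:2:0 frame: full-size luma plus two
    chroma planes subsampled by two in each direction."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma


def frame_buffer_bytes(width, height, num_frames):
    """Total frame-buffer storage for num_frames frames
    (num_frames is an assumption supplied by the caller)."""
    return num_frames * frame_bytes(width, height)


if __name__ == "__main__":
    for name, (w, h) in RESOLUTIONS.items():
        print(name, frame_bytes(w, h), "bytes per frame")
```

Because this term grows with picture area while per-macroblock and constant data do not, its share of total storage rises with resolution, which is consistent with the QCIF and CIF percentages quoted above.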
Time complexity • Table Descriptions • Decoder Subfunction Tables • Operation Count Table • Execution Subunit Table
Time complexity • Analysis Methodology • First: compute the number of cycles required to execute a particular subfunction on a chosen hardware platform • Second: derive the cycle count estimate by multiplying the result of the first step by the frequency with which the subfunction was used.
Time complexity • Example: 4x4 inverse transform and reconstruct on TriMedia • Two cases: • inverse transform and reconstruct • inverse transform only (no nonzero coefficient)
Time complexity • Case 1 : inverse transform and reconstruct
Time complexity • Case 2 : inverse transform
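For reference, the subfunction being counted can be sketched as below. This is a minimal version of the 4x4 inverse integer transform followed by reconstruction, assuming already dequantized coefficients and 8-bit samples; it is written for clarity, not as the optimized TriMedia implementation analysed in the slides, and the function names are illustrative.

```python
def _butterfly(d0, d1, d2, d3):
    """One 1-D inverse transform stage using only adds and shifts."""
    e0 = d0 + d2
    e1 = d0 - d2
    e2 = (d1 >> 1) - d3
    e3 = d1 + (d3 >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def inverse_transform_4x4(coeffs):
    """Apply the 1-D stage to rows, then columns, then round and scale."""
    rows = [_butterfly(*row) for row in coeffs]
    # cols[c][r] holds the value at row r, column c after the second pass.
    cols = [_butterfly(*(rows[r][c] for r in range(4))) for c in range(4)]
    return [[(cols[c][r] + 32) >> 6 for c in range(4)] for r in range(4)]


def reconstruct(pred, residual):
    """Add the residual to the prediction and clip to the 8-bit range."""
    return [[min(255, max(0, pred[r][c] + residual[r][c]))
             for c in range(4)] for r in range(4)]


if __name__ == "__main__":
    coeffs = [[64, 0, 0, 0]] + [[0, 0, 0, 0] for _ in range(3)]
    residual = inverse_transform_4x4(coeffs)
    print(reconstruct([[100] * 4 for _ in range(4)], residual))
```

The absence of multiplications in `_butterfly` is what makes a per-operation cycle count for this subfunction straightforward to tabulate.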
Time complexity • Number of 4x4 inverse transforms: 42165.7 (28266.6 luminance and 13899.1 chrominance; Mobile, QP=21) • 42165.7 x 39 = 1644462.3 • 242954.3 x 16 = 3887268.8 • 1644462.3 + 3887268.8 = 5531731.1
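The two-step methodology reduces to a weighted sum, which the sketch below reproduces with the numbers from this slide. The per-use costs 39 and 16 and the counts are read directly from the arithmetic above; the slide does not label what the second count (242954.3) refers to, so it is used here as given. The function name is illustrative.

```python
def estimate_cycles(cases):
    """Sum of (usage count x cycles per use) over all cases."""
    return sum(count * cycles for count, cycles in cases)


# (usage count, cycles per use), as they appear on the slide.
CASES = [
    (42165.7, 39),   # blocks that need the 4x4 inverse transform
    (242954.3, 16),  # second case; not labelled on the slide
]

if __name__ == "__main__":
    total = estimate_cycles(CASES)
    print("estimated cycles: %.1f" % total)
```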
Comparative analysis • Measure the cycles of the 4x4 inverse transform and reconstruct on the P3 • Using the proposed method: 6.9 million cycles • Using the VTune Performance Analyzer: 28.23 million cycles • The ratio: 28.23 / 6.9 = 4.05
Comparative analysis • The ratio is due to: • The operation count table contains data for fundamental operations only. Overhead operations such as loop overhead, flow control, and boundary condition handling are not included. • The estimate assumes that the software is designed so that the overhead due to instruction cache misses is negligible, hardware register counts are not exceeded, and operation latency is hidden.
Comparative analysis • An analysis summary for the P3 shows that the theoretical estimates are approximately 2 to 6 times lower than the experimental results. • The specific factor depends mainly on the characteristics of the subfunction, such as the regularity of its operations and the amount of overhead.
Experimental analysis • Our experimental analysis shows that the time complexity of the decoder and its major subfunctions is strongly dependent on the average bit rate of the coded bitstream • The optimality of the motion estimation and mode decision processes in the source encoder does not have a significant impact on decoder complexity.
Experimental analysis • One of the most important pieces of information in the complexity analysis is the distribution of time complexity among subsystems • Loop filtering: 33% • Interpolation: 25% • Entropy decoding: 13% • Inverse transforms and reconstruction: 13%
Experimental analysis • Factors that can affect the complexity of each subfunction • Inverse quantization, transforms, and reconstruction • the number of blocks and macroblocks that contain nonzero coefficients • Bitstream parsing and entropy decoding • bit rate: more time is spent decoding coefficients at higher bit rates
Experimental analysis • Interpolation :
Experimental analysis • Loop filter • The most important factor in this variability is the percentage of edges that might be filtered due to the boundary strength determined for that edge.
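How boundary strength decides whether an edge is filtered can be sketched roughly as follows. This is a simplified rendering of the decision for progressive baseline content with one motion vector and one reference per block; the dictionary keys (`intra`, `coded`, `ref`, `mv`) and the function name are hypothetical. An edge with strength 0 is skipped entirely, so the share of edges with non-zero strength drives loop filter cost.

```python
def boundary_strength(p, q, is_mb_edge):
    """Return a strength 0..4 for the edge between blocks p and q.

    p, q : dicts with keys
        'intra' - block belongs to an intra-coded macroblock
        'coded' - block contains nonzero coefficients
        'ref'   - identifier of the reference frame used
        'mv'    - motion vector (x, y) in quarter-sample units
    """
    if p["intra"] or q["intra"]:
        # Strongest filtering on macroblock edges next to intra blocks.
        return 4 if is_mb_edge else 3
    if p["coded"] or q["coded"]:
        return 2
    if p["ref"] != q["ref"]:
        return 1
    dx = abs(p["mv"][0] - q["mv"][0])
    dy = abs(p["mv"][1] - q["mv"][1])
    if dx >= 4 or dy >= 4:  # differs by one full sample or more
        return 1
    return 0  # edge is not filtered


if __name__ == "__main__":
    a = {"intra": False, "coded": False, "ref": 0, "mv": (4, 0)}
    b = {"intra": False, "coded": False, "ref": 0, "mv": (5, 0)}
    print(boundary_strength(a, b, is_mb_edge=True))
```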
Conclusions • Study the computational complexity of the H.264/AVC baseline decoder using both theoretical and experimental methods