Low Complexity H.264 Encoder using Machine Learning
THEJASWINI PURUSHOTHAM
Electrical Engineering Graduate Student, The University of Texas at Arlington
Advisor: Dr. K. R. Rao, EE Dept., UTA
8 September 2010
Agenda
• Introduction
• H.264/AVC
• Machine learning
• C4.5
• Weka
• Thesis approach
• Results
• Conclusions
Video compression and standardization
[Figure: timeline of video coding standards and their applications, from MPEG-1 (1992), MPEG-2 and H.263 (1994), and MPEG-4 (1999) for video conferencing, mobile phones and handheld PCs, through H.264 (2003), VC-1 and SVC (2005) for mobile TV and HDTV, to H.265/HEVC (NGVC, 2010), with increasing coding efficiency, network awareness and complexity]
• Importance of video
• Need for standardization: ensures interoperability
• Need for compression: high bandwidth requirements; removes inherent redundancy
Motivation for the research
Motivation for a low complexity H.264 encoder
• H.264 can achieve considerably higher coding efficiency than previous standards.
• Motion estimation, the in-loop deblocking filter, sub-pel interpolation and mode decision bring in the complexity.
• The high computational complexity of H.264 and the real-time requirements of video systems are the main challenges.
Overview of H.264/AVC
Design Features Highlights
• Features for enhancement of prediction
  • Directional spatial prediction for intra coding: 9 intra 4x4 modes + 4 intra 16x16 modes + 9 intra 8x8 modes
  • Variable block-size motion compensation with small block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
  • Quarter-sample-accurate motion compensation
  • Multiple reference picture motion compensation
  • In-the-loop deblocking filtering to remove blocky artifacts
• Features for improved coding efficiency
  • Small block-size transform: 4x4 and 8x8 integer DCT (the 4x4 core transform is sketched after this list)
  • Exact-match inverse transform
  • Short word-length transform
  • Hierarchical block transform
  • Arithmetic entropy coding
  • Context-adaptive entropy coding
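For background (standard H.264 material, not specific to this thesis): the 4x4 integer transform mentioned above is an integer approximation of the DCT whose core forward matrix, with the scaling folded into the quantizer, is

    Cf = |  1   1   1   1 |
         |  2   1  -1  -2 |
         |  1  -1  -1   1 |
         |  1  -2   2  -1 |

so that a 4x4 residual block X is transformed as Y = Cf · X · Cf^T using only additions, subtractions and shifts, which is what makes the exact-match inverse transform possible.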
H.264 - Encoder
H.264 - Decoder
Overview of machine learning
Machine learning is a subfield of artificial intelligence concerned with the design and development of algorithms and techniques that allow computers to learn. The machine learning method used in this thesis extracts rules and patterns from large data sets. The major focus of machine learning research is to extract information from data automatically, by computational and statistical methods.
C4.5 classifier
C4.5 was developed by Ross Quinlan. C4.5 (known as J48 in Weka) is a system that constructs classifiers, which are among the most commonly used tools in data mining. Such a system takes as input a collection of cases, each belonging to one of a small number of classes and described by its values for a fixed set of attributes. From these cases, C4.5 builds a classifier that predicts the class to which a new case belongs. C4.5 uses the information gain of the data attributes to decide how to split the data (the standard definitions are recalled below).
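For reference (standard C4.5 background rather than material from the thesis), the split criterion works as follows: for a training set S and a candidate attribute A whose values v partition S into subsets S_v,

    H(S) = - Σ_i p_i · log2(p_i)                      (entropy; p_i = fraction of cases in class i)
    Gain(S, A) = H(S) - Σ_v (|S_v| / |S|) · H(S_v)    (information gain)
    GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)    (C4.5 normalizes the gain by the split information)

with SplitInfo(S, A) = - Σ_v (|S_v| / |S|) · log2(|S_v| / |S|). The attribute with the best score is chosen at each node of the tree.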
Illustration of C4.5 classification
Decision tree
WEKA
Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well suited for developing new machine learning schemes [25]. A minimal example of calling the J48 classifier from Java is sketched below.
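As an illustration only (this is a minimal sketch, not code from the thesis; the file name training.arff is a placeholder), a J48 tree can be built from Java with the Weka API roughly as follows:

    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class TrainModeTree {
        public static void main(String[] args) throws Exception {
            // Load the training set (macroblock statistics plus mode labels) from an ARFF file
            Instances data = new DataSource("training.arff").getDataSet();
            // Assume the last attribute is the class, i.e. the mode chosen by the JM encoder
            data.setClassIndex(data.numAttributes() - 1);

            J48 tree = new J48();        // Weka's implementation of C4.5
            tree.buildClassifier(data);  // learn the decision tree
            System.out.println(tree);    // print the tree; its rules can be transcribed into if-else code
        }
    }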
Complexity in the H.264 encoder
Figure 1: Multi-frame Motion Estimation.
Motion estimation is the most computationally expensive process in H.264. For example, assuming full search (FS) and P block types, Q reference frames and a search range of MxN, MxNxPxQ computations are needed (see the illustrative numbers below).
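To put illustrative numbers on this (the figures below are assumptions chosen for the example, not results from the thesis): with a ±16-pel search window, i.e. M = N = 33 candidate positions per dimension, P = 7 inter block types (16x16 down to 4x4) and Q = 5 reference frames, full search performs about 33 x 33 x 7 x 5 = 38,115 block-matching evaluations per macroblock, which is why reducing the cost of the mode decision pays off.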
Approach in this thesis
Approach
J48 (C4.5) analysis is used to reduce the complexity of the mode decision. The statistics for each 16x16 macroblock of the first four frames of the video sequence are calculated. The statistics are the mean, the variance, the variance of means for all the sub-macroblock sizes in the macroblock, the mean of the adjacent macroblocks, the variance of the adjacent macroblocks, and the variance of means for all the sub-macroblock sizes in the adjacent blocks. (A sketch of how such statistics can be computed is shown below.)
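The following is a minimal, illustrative sketch of how statistics of this kind can be computed (the thesis computes them inside the JM 16.2 encoder; here the 16x16 luma macroblock is assumed to be available as an int array):

    public final class MbStats {
        // Mean of a rectangular block of luma samples starting at (r0, c0) with size h x w
        static double mean(int[][] luma, int r0, int c0, int h, int w) {
            double sum = 0;
            for (int r = r0; r < r0 + h; r++)
                for (int c = c0; c < c0 + w; c++)
                    sum += luma[r][c];
            return sum / (h * w);
        }

        // Variance of the same block
        static double variance(int[][] luma, int r0, int c0, int h, int w) {
            double m = mean(luma, r0, c0, h, w), sum = 0;
            for (int r = r0; r < r0 + h; r++)
                for (int c = c0; c < c0 + w; c++)
                    sum += (luma[r][c] - m) * (luma[r][c] - m);
            return sum / (h * w);
        }

        // Variance of the means of the four 8x8 sub-macroblocks of a 16x16 macroblock
        static double varianceOfSubMeans(int[][] mb) {
            double[] m = {
                mean(mb, 0, 0, 8, 8), mean(mb, 0, 8, 8, 8),
                mean(mb, 8, 0, 8, 8), mean(mb, 8, 8, 8, 8)
            };
            double avg = (m[0] + m[1] + m[2] + m[3]) / 4.0, v = 0;
            for (double x : m) v += (x - avg) * (x - avg);
            return v / 4.0;
        }
    }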
Figure 2: Flow chart of the process followed to achieve the low complexity encoder.
The modes for the same first four frames of the video sequences are determined with the H.264 encoder in the JM 16.2 software. These modes and the computed statistics are given together as attributes for training in the WEKA tool; this is an offline process. The WEKA tool uses the C4.5 (J48) classifier algorithm to build the mode decision tree. A universal tree that can give reasonably accurate mode decisions for any video sequence is developed.
Different combinations of video sequences are used for training the mode decision trees and later for testing them. Table 1 summarizes the results. The attributes most commonly used for mode decision across all the entries in the table are selected to build the universal mode decision tree. This tree is implemented in the form of if-else statements in the motion estimation block of JM 16.2 (see the sketch below); hence the mode decision process is reduced to a set of if-else tests.
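Purely to illustrate the idea (the attribute names, thresholds and structure below are hypothetical, not the tree actually learned by J48 in the thesis), the learned tree ends up hard-coded as nested if-else statements along these lines:

    public final class SubMbModeTree {
        // Placeholder labels standing in for the JM sub-partition modes
        static final int MODE_8x8 = 0, MODE_8x4 = 1, MODE_4x8 = 2, MODE_4x4 = 3;

        // Hypothetical hard-coded decision tree for the sub-macroblock mode of one 8x8 block;
        // the thresholds are invented for illustration
        static int subMacroblockMode(double mean8x8, double variance8x8, double varOfMeans4x4) {
            if (variance8x8 < 50.0) {
                return MODE_8x8;            // smooth block: keep the largest sub-partition
            } else if (varOfMeans4x4 < 120.0) {
                return (mean8x8 < 100.0) ? MODE_8x4 : MODE_4x8;
            } else {
                return MODE_4x4;            // highly textured block: smallest partitions
            }
        }
    }

In the thesis, equivalent if-else code replaces the sub-macroblock mode decision inside the JM 16.2 motion estimation, so no exhaustive rate-distortion search over these modes is needed.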
Attributes in the thesis
The metrics used in the decision trees are the mean, variance, variance of means, residual absolute sum, residual mean, residual variance, residual variance of means and means of variance. These metrics were calculated for the main MB shapes: 16x16, 8x8 and 4x4.
Decision Tree for mode decision
Table 1: Classification rule accuracy
Table 1 summarizes the WEKA tool results, i.e., the accuracy with which the classification rules determine the modes.
Table 2: Results obtained using JM 16.2 and JM using machine learning for 4 frames.
Table 3: Speed up in encoding time and motion estimation time for 4 frames using machine learning compared to JM 16.2 encoder.
Motion estimation time for 4 frames for sequences in Table 3.
Table 4: Comparison of compressed file sizes for four frames for sequences in Table 2.
Compressed file sizes using machine learning for four frames for sequences in Table 4.
Table 5: Comparison of PSNR and MSE for four frames.
Comparison of PSNR and MSE for four frames in Table 5.
Table 6: SSIM comparison for four frames.
Comparison of SSIM for four frames in Table 6.
CONCLUSIONS
It was observed that a single universal mode decision tree failed in terms of fidelity of the video when all the ME/MC modes were used in the machine learning algorithm. This thesis therefore uses only the sub-macroblock modes, i.e., the 8x8, 8x4, 4x8 and 4x4 modes, for the machine learning. The function 'submacroblock_mode_decision' in JM 16.2 was replaced by the if-else statements. The results are tabulated in Tables 7 through 11. From Table 8, the average speed-up in the encoding time is 28.5% and the average speed-up in the motion estimation time is 42.846%. From Table 9, the average decrease in compressed file size is 0.36%. From Table 11, the average decrease in SSIM is less than 0.0107%. When 100 frames are encoded, the average speed-up in the encoding time is 8.5%, the average speed-up in the motion estimation time is 18.346%, and the average decrease in SSIM is less than 0.0109%.
REFERENCES
[1] JM reference software: http://iphome.hhi.de/suehring/tml/
[2] Soon-kak Kwon, A. Tamhankar and K. R. Rao, "Overview of H.264 / MPEG-4 Part 10", J. Visual Communication and Image Representation, vol. 17, pp. 186-216, April 2006.
[3] H.264 overview: http://www.vcodex.com/files/h264_overview_orig.pdf
[4] JM reference software documentation manual: http://iphome.hhi.de/suehring/tml/JM%20Reference%20Software%20Manual%20(JVT-AE010).pdf
[5] G. A. Davidson, et al., "ATSC video and audio coding", Proceedings of the IEEE, vol. 94, pp. 60-76, Jan. 2006.
[6] CIF and QCIF formats: http://www.birds-eye.net/definition/c/cif-common_intermediate_format.shtml
[7] M. Fieldler, "Implementation of basic H.264/AVC Decoder", seminar paper, Chemnitz University of Technology, June 2004.
[8] A. Puri, X. Chen and A. Luthra, "Video coding using the H.264/MPEG-4 AVC compression standard", Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.
[9] T. Wiegand, et al., "Overview of the H.264/AVC video coding standard", IEEE Trans. CSVT, vol. 13, pp. 560-576, July 2003.
[10] T. Wiegand and G. J. Sullivan, "The H.264 video coding standard", IEEE Signal Processing Magazine, vol. 24, pp. 148-153, March 2007.
[11] D. Marpe, T. Wiegand and G. J. Sullivan, "The H.264/MPEG-4 AVC standard and its applications", IEEE Communications Magazine, vol. 44, pp. 134-143, Aug. 2006.
[12] R. Schäfer, T. Wiegand and H. Schwarz, "The emerging H.264/AVC standard", EBU Technical Review, Jan. 2003.
[13] Video test sequences (YUV 4:2:0): http://trace.eas.asu.edu/yuv/index.html
[14] Z. Wang, et al., "Image quality assessment: From error visibility to structural similarity", IEEE Trans. on Image Processing, vol. 13, pp. 600-612, Apr. 2004.
[15] Z. Wang, L. Lu and A. C. Bovik, "Video quality assessment based on structural distortion measurement", Signal Processing: Image Communication, Special Issue on Objective Video Quality Metrics, vol. 19, pp. 122-124, Jan. 2004.
[16] Z. Wang, H. R. Sheikh and A. C. Bovik, "Objective video quality assessment", in The Handbook of Video Databases: Design and Applications (B. Furht and O. Marques, eds.), pp. 1041-1078, CRC Press, Sept. 2003.
[17] T. K. Tan, G. Sullivan and T. Wedi, "Recommended simulation conditions for coding efficiency experiments", ITU-T SC16/Q6, 34th VCEG Meeting, Antalya, Turkey, Jan. 2008, Doc. VCEG-AH10r3.
[18] P. Carrillo, H. Kalva and T. Pin, "Low complexity H.264 video encoding", Applications of Digital Image Processing, Proc. of SPIE, vol. 7443, 74430A, Sept. 2009.
[19] G. Sullivan and T. Wiegand, "Video compression - From concepts to the H.264/AVC standard", Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[20] H.264 codec reference: http://www.apple.com/quicktime/technologies/h264/
[21] D. Kumar, P. Shastry and A. Basu, "Overview of the H.264 / AVC", 8th Texas Instruments Developer Conference India, 30 Nov. - 1 Dec. 2005, Bangalore.
[22] Motion prediction: http://wiki.multimedia.cx/index.php?title=Motion_Prediction
[23] Zhi-Yi Mai, et al., "A new rate-distortion optimization using structural information in H.264 I-frame encoder", ACIVS 2005, LNCS 3708, pp. 435-441, 2005.
[24] Z. Wang and A. C. Bovik, Modern Image Quality Assessment, Synthesis Lectures on Image, Video and Multimedia Processing, Morgan and Claypool, 2006.
[25] WEKA tool download: http://www.cs.waikato.ac.nz/ml/weka/
[26] I. Richardson, The H.264 Advanced Video Compression Standard, Wiley, 2006.
[27] I. E. Richardson, The H.264 Advanced Video Compression Standard, 2nd edition, Wiley, 2010.
[28] JM reference software download: http://iphome.hhi.de/suehring/tml/download/
[29] Video sequences: http://trace.eas.asu.edu/yuv/index.html
[30] E. Peixoto, R. L. de Queiroz and D. Mukherjee, "Mobile video communications using a Wyner-Ziv transcoder", Proc. SPIE 6822, VCIP, 68220R, Jan. 2008.
[31] A. Aaron, D. Varodayan and B. Girod, "Wyner-Ziv residual coding of video", Proc. International Picture Coding Symposium, Beijing, P. R. China, April 2006.
THANK YOU
H.264 - Profiles
Design Features Highlights
• Features for enhancement of prediction
  • Directional spatial prediction for intra coding
  • Variable block-size motion compensation with small block sizes
  • Quarter-sample-accurate motion compensation
  • Motion vectors over picture boundaries
  • Multiple reference picture motion compensation
  • Decoupling of referencing order from display order
  • Decoupling of picture representation methods from picture referencing capability
  • Weighted prediction
  • Improved "skipped" and "direct" motion inference
  • In-the-loop deblocking filtering
Features for improved coding efficiency
• Small block-size transform
• Exact-match inverse transform
• Short word-length transform
• Hierarchical block transform
• Arithmetic entropy coding
• Context-adaptive entropy coding
Features for robustness to data errors/losses
• Parameter set structure
• NAL unit syntax structure
• Flexible slice size
• Flexible macroblock ordering (FMO)
• Arbitrary slice ordering (ASO)
• Redundant pictures
• Data partitioning
• SP/SI synchronization/switching pictures
Directional spatial prediction for intra coding
Intra prediction predicts the texture in the current block using pixel samples from neighboring blocks. Intra prediction for 4x4 blocks (9 modes) and 16x16 blocks (4 modes) is supported in all H.264 profiles. Intra prediction for 8x8 blocks (9 modes) is supported in the High profiles.
Luma prediction modes in H.264
Variable block-size motion compensation
• Partitioning is done in 2 stages
  • In the 1st stage, one of the first 4 modes is determined: 16x16, 16x8, 8x16, 8x8
  • If mode 4 (8x8) is chosen, each 8x8 block is further partitioned into smaller blocks: 8x4, 4x8, 4x4
• At most 16 motion vectors may be transmitted for a 16x16 macroblock
• Sub-pixel accuracy
• Large computational complexity to determine the modes, but efficient encoding