Low Complexity H.264 Encoder using Machine Learning
THEJASWINI PURUSHOTHAM
Electrical Engineering Graduate Student, The University of Texas at Arlington
Advisor: Dr. K. R. Rao, EE Dept., UTA
8 September 2010
Agenda
• Introduction
• H.264/AVC
• Machine learning
• C4.5
• Weka
• Thesis approach
• Results
• Conclusions
Video compression and standardization
[Figure: timeline of video coding standards and their applications, from MPEG-1 (1992), MPEG-2 and H.263 (1994), and MPEG-4 (1999) for video conferencing, mobile phones and handheld PCs, through H.264 (2003), VC-1 and SVC (2005) for mobile TV and HDTV, to H.265/HEVC (NGVC, 2010), with increasing coding efficiency, network awareness and complexity]
• Importance of video
• Need for standardization: ensures interoperability
• Need for compression: high bandwidth requirements; removes inherent redundancy
Motivation for the research
Motivation for a low complexity H.264 encoder
• H.264 can achieve considerably higher coding efficiency than previous standards.
• Motion estimation, the in-loop deblocking filter, sub-pel interpolation and mode decision bring in the complexity.
• The high computational complexity of H.264 and the real-time requirements of video systems are the main challenges.
Overview of H.264/AVC
Design Features Highlights
• Features for enhancement of prediction
  • Directional spatial prediction for intra coding: 9 intra 4x4 modes + 4 intra 16x16 modes + 9 intra 8x8 modes
  • Variable block-size motion compensation with small block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
  • Quarter-sample-accurate motion compensation
  • Multiple reference picture motion compensation
  • In-the-loop deblocking filtering to remove blocky artifacts
• Features for improved coding efficiency
  • Small block-size transform: 4x4 and 8x8 integer DCT (the 4x4 core transform is sketched after this list)
  • Exact-match inverse transform
  • Short word-length transform
  • Hierarchical block transform
  • Arithmetic entropy coding
  • Context-adaptive entropy coding
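For background (standard H.264 material, not specific to this thesis): the 4x4 integer transform mentioned above is an integer approximation of the DCT whose core forward matrix, with the scaling folded into the quantizer, is

    Cf = |  1   1   1   1 |
         |  2   1  -1  -2 |
         |  1  -1  -1   1 |
         |  1  -2   2  -1 |

so that a 4x4 residual block X is transformed as Y = Cf · X · Cf^T using only additions, subtractions and shifts, which is what makes the exact-match inverse transform possible.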
H.264 - Encoder
H.264 - Decoder
Overview of machine learning
Machine learning is a subfield of artificial intelligence concerned with the design and development of algorithms and techniques that allow computers to learn. The machine learning method used in this thesis extracts rules and patterns from large data sets. The major focus of machine learning research is to extract information from data automatically, by computational and statistical methods.
C4.5 classifier
C4.5 was developed by Ross Quinlan. C4.5 (known as J48 in Weka) is a system that constructs classifiers, which are among the most commonly used tools in data mining. Such a system takes as input a collection of cases, each belonging to one of a small number of classes and described by its values for a fixed set of attributes. From these cases, C4.5 builds a classifier that predicts the class to which a new case belongs. C4.5 uses the information gain of the data attributes to decide how to split the data (the standard definitions are recalled below).
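For reference (standard C4.5 background rather than material from the thesis), the split criterion works as follows: for a training set S and a candidate attribute A whose values v partition S into subsets S_v,

    H(S) = - Σ_i p_i · log2(p_i)                      (entropy; p_i = fraction of cases in class i)
    Gain(S, A) = H(S) - Σ_v (|S_v| / |S|) · H(S_v)    (information gain)
    GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)    (C4.5 normalizes the gain by the split information)

with SplitInfo(S, A) = - Σ_v (|S_v| / |S|) · log2(|S_v| / |S|). The attribute with the best score is chosen at each node of the tree.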
Illustration of C4.5 classification
Decision tree
WEKA
Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well suited for developing new machine learning schemes [25]. A minimal example of calling the J48 classifier from Java is sketched below.
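As an illustration only (this is a minimal sketch, not code from the thesis; the file name training.arff is a placeholder), a J48 tree can be built from Java with the Weka API roughly as follows:

    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class TrainModeTree {
        public static void main(String[] args) throws Exception {
            // Load the training set (macroblock statistics plus mode labels) from an ARFF file
            Instances data = new DataSource("training.arff").getDataSet();
            // Assume the last attribute is the class, i.e. the mode chosen by the JM encoder
            data.setClassIndex(data.numAttributes() - 1);

            J48 tree = new J48();        // Weka's implementation of C4.5
            tree.buildClassifier(data);  // learn the decision tree
            System.out.println(tree);    // print the tree; its rules can be transcribed into if-else code
        }
    }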
Complexity in the H.264 encoder
Figure 1: Multi-frame Motion Estimation.
Motion estimation is the most computationally expensive process in H.264. For example, assuming full search (FS) and P block types, Q reference frames and a search range of MxN, MxNxPxQ computations are needed (see the illustrative numbers below).
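To put illustrative numbers on this (the figures below are assumptions chosen for the example, not results from the thesis): with a ±16-pel search window, i.e. M = N = 33 candidate positions per dimension, P = 7 inter block types (16x16 down to 4x4) and Q = 5 reference frames, full search performs about 33 x 33 x 7 x 5 = 38,115 block-matching evaluations per macroblock, which is why reducing the cost of the mode decision pays off.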
Approach in this thesis
Approach
J48 (C4.5) analysis is used to reduce the complexity of the mode decision. The statistics for each 16x16 macroblock of the first four frames of the video sequence are calculated. The statistics are the mean, the variance, the variance of means for all the sub-macroblock sizes in the macroblock, the mean of the adjacent macroblocks, the variance of the adjacent macroblocks, and the variance of means for all the sub-macroblock sizes in the adjacent blocks. (A sketch of how such statistics can be computed is shown below.)
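The following is a minimal, illustrative sketch of how statistics of this kind can be computed (the thesis computes them inside the JM 16.2 encoder; here the 16x16 luma macroblock is assumed to be available as an int array):

    public final class MbStats {
        // Mean of a rectangular block of luma samples starting at (r0, c0) with size h x w
        static double mean(int[][] luma, int r0, int c0, int h, int w) {
            double sum = 0;
            for (int r = r0; r < r0 + h; r++)
                for (int c = c0; c < c0 + w; c++)
                    sum += luma[r][c];
            return sum / (h * w);
        }

        // Variance of the same block
        static double variance(int[][] luma, int r0, int c0, int h, int w) {
            double m = mean(luma, r0, c0, h, w), sum = 0;
            for (int r = r0; r < r0 + h; r++)
                for (int c = c0; c < c0 + w; c++)
                    sum += (luma[r][c] - m) * (luma[r][c] - m);
            return sum / (h * w);
        }

        // Variance of the means of the four 8x8 sub-macroblocks of a 16x16 macroblock
        static double varianceOfSubMeans(int[][] mb) {
            double[] m = {
                mean(mb, 0, 0, 8, 8), mean(mb, 0, 8, 8, 8),
                mean(mb, 8, 0, 8, 8), mean(mb, 8, 8, 8, 8)
            };
            double avg = (m[0] + m[1] + m[2] + m[3]) / 4.0, v = 0;
            for (double x : m) v += (x - avg) * (x - avg);
            return v / 4.0;
        }
    }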
Figure 2: Flow chart of the process followed to achieve the low complexity encoder.
The modes for the same first four frames of the video sequences are determined with the H.264 encoder in the JM 16.2 software. These modes and the computed statistics are given together as attributes for training in the WEKA tool; this is an offline process. The WEKA tool uses the C4.5 (J48) classifier algorithm to build the mode decision tree. A universal tree that can give reasonably accurate mode decisions for any video sequence is developed.
Different combinations of video sequences are used for training the mode decision trees and later for testing them. Table 1 summarizes the results. The attributes most commonly used for mode decision across all the entries in the table are selected to build the universal mode decision tree. This tree is implemented in the form of if-else statements in the motion estimation block of JM 16.2 (see the sketch below); hence the mode decision process is reduced to a set of if-else tests.
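Purely to illustrate the idea (the attribute names, thresholds and structure below are hypothetical, not the tree actually learned by J48 in the thesis), the learned tree ends up hard-coded as nested if-else statements along these lines:

    public final class SubMbModeTree {
        // Placeholder labels standing in for the JM sub-partition modes
        static final int MODE_8x8 = 0, MODE_8x4 = 1, MODE_4x8 = 2, MODE_4x4 = 3;

        // Hypothetical hard-coded decision tree for the sub-macroblock mode of one 8x8 block;
        // the thresholds are invented for illustration
        static int subMacroblockMode(double mean8x8, double variance8x8, double varOfMeans4x4) {
            if (variance8x8 < 50.0) {
                return MODE_8x8;            // smooth block: keep the largest sub-partition
            } else if (varOfMeans4x4 < 120.0) {
                return (mean8x8 < 100.0) ? MODE_8x4 : MODE_4x8;
            } else {
                return MODE_4x4;            // highly textured block: smallest partitions
            }
        }
    }

In the thesis, equivalent if-else code replaces the sub-macroblock mode decision inside the JM 16.2 motion estimation, so no exhaustive rate-distortion search over these modes is needed.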
Attributes in the thesis
The metrics used in the decision trees are the mean, variance, variance of means, residual absolute sum, residual mean, residual variance, residual variance of means and means of variance. These metrics were calculated for the main MB shapes: 16x16, 8x8 and 4x4.
Decision Tree for mode decision
Table 1: Classification rule accuracy
Table 1 summarizes the WEKA tool results, i.e., the accuracy with which the classification rules determine the modes.
Table 2: Results obtained using JM 16.2 and JM using machine learning for 4 frames.
Table 3: Speed up in encoding time and motion estimation time for 4 frames using machine learning compared to JM 16.2 encoder.
Motion estimation time for 4 frames for sequences in Table 3.
Table 4: Comparison of compressed file sizes for four frames for sequences in Table 2.
Compressed file sizes using machine learning for four frames for sequences in Table 4.
Table 5: Comparison of PSNR and MSE for four frames.
Comparison of PSNR and MSE for four frames in Table 5.
Table 6: SSIM comparison for four frames.
Comparison of SSIM for four frames in Table 6.
CONCLUSIONS
It was observed that a single universal mode decision tree failed in terms of fidelity of the video when all the ME/MC modes were used in the machine learning algorithm. This thesis therefore uses only the sub-macroblock modes, i.e., the 8x8, 8x4, 4x8 and 4x4 modes, for the machine learning. The function 'submacroblock_mode_decision' in JM 16.2 was replaced by the if-else statements. The results are tabulated in Tables 7 through 11. From Table 8, the average speed-up in the encoding time is 28.5% and the average speed-up in the motion estimation time is 42.846%. From Table 9, the average decrease in compressed file size is 0.36%. From Table 11, the average decrease in SSIM is less than 0.0107%. When 100 frames are encoded, the average speed-up in the encoding time is 8.5%, the average speed-up in the motion estimation time is 18.346%, and the average decrease in SSIM is less than 0.0109%.
REFERENCES
[1] JM reference software: http://iphome.hhi.de/suehring/tml/
[2] Soon-kak Kwon, A. Tamhankar and K. R. Rao, "Overview of H.264 / MPEG-4 Part 10", J. Visual Communication and Image Representation, vol. 17, pp. 186-216, April 2006.
[3] H.264 overview: http://www.vcodex.com/files/h264_overview_orig.pdf
[4] JM reference software documentation manual: http://iphome.hhi.de/suehring/tml/JM%20Reference%20Software%20Manual%20(JVT-AE010).pdf
[5] G. A. Davidson, et al., "ATSC video and audio coding", Proceedings of the IEEE, vol. 94, pp. 60-76, Jan. 2006.
[6] CIF and QCIF formats: http://www.birds-eye.net/definition/c/cif-common_intermediate_format.shtml
[7] M. Fieldler, "Implementation of basic H.264/AVC Decoder", seminar paper, Chemnitz University of Technology, June 2004.
[8] A. Puri, X. Chen and A. Luthra, "Video coding using the H.264/MPEG-4 AVC compression standard", Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.
[9] T. Wiegand, et al., "Overview of the H.264/AVC video coding standard", IEEE Trans. CSVT, vol. 13, pp. 560-576, July 2003.
[10] T. Wiegand and G. J. Sullivan, "The H.264 video coding standard", IEEE Signal Processing Magazine, vol. 24, pp. 148-153, March 2007.
[11] D. Marpe, T. Wiegand and G. J. Sullivan, "The H.264/MPEG-4 AVC standard and its applications", IEEE Communications Magazine, vol. 44, pp. 134-143, Aug. 2006.
[12] R. Schäfer, T. Wiegand and H. Schwarz, "The emerging H.264/AVC standard", EBU Technical Review, Jan. 2003.
[13] Video test sequences (YUV 4:2:0): http://trace.eas.asu.edu/yuv/index.html
[14] Z. Wang, et al., "Image quality assessment: From error visibility to structural similarity", IEEE Trans. on Image Processing, vol. 13, pp. 600-612, Apr. 2004.
[15] Z. Wang, L. Lu and A. C. Bovik, "Video quality assessment based on structural distortion measurement", Signal Processing: Image Communication, Special Issue on Objective Video Quality Metrics, vol. 19, pp. 122-124, Jan. 2004.
[16] Z. Wang, H. R. Sheikh and A. C. Bovik, "Objective video quality assessment", in The Handbook of Video Databases: Design and Applications (B. Furht and O. Marques, eds.), pp. 1041-1078, CRC Press, Sept. 2003.
[17] T. K. Tan, G. Sullivan and T. Wedi, "Recommended simulation conditions for coding efficiency experiments", ITU-T SC16/Q6, 34th VCEG Meeting, Antalya, Turkey, Jan. 2008, Doc. VCEG-AH10r3.
[18] P. Carrillo, H. Kalva and T. Pin, "Low complexity H.264 video encoding", Applications of Digital Image Processing, Proc. of SPIE, vol. 7443, 74430A, Sept. 2009.
[19] G. Sullivan and T. Wiegand, "Video compression - From concepts to the H.264/AVC standard", Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[20] H.264 codec reference: http://www.apple.com/quicktime/technologies/h264/
[21] D. Kumar, P. Shastry and A. Basu, "Overview of the H.264 / AVC", 8th Texas Instruments Developer Conference India, 30 Nov. - 1 Dec. 2005, Bangalore.
[22] Motion prediction: http://wiki.multimedia.cx/index.php?title=Motion_Prediction
[23] Zhi-Yi Mai, et al., "A new rate-distortion optimization using structural information in H.264 I-frame encoder", ACIVS 2005, LNCS 3708, pp. 435-441, 2005.
[24] Z. Wang and A. C. Bovik, Modern Image Quality Assessment, Synthesis Lectures on Image, Video and Multimedia Processing, Morgan and Claypool, 2006.
[25] WEKA tool download: http://www.cs.waikato.ac.nz/ml/weka/
[26] I. Richardson, The H.264 Advanced Video Compression Standard, Wiley, 2006.
[27] I. E. Richardson, The H.264 Advanced Video Compression Standard, 2nd edition, Wiley, 2010.
[28] JM reference software download: http://iphome.hhi.de/suehring/tml/download/
[29] Video sequences: http://trace.eas.asu.edu/yuv/index.html
[30] E. Peixoto, R. L. de Queiroz and D. Mukherjee, "Mobile video communications using a Wyner-Ziv transcoder", Proc. SPIE 6822, VCIP, 68220R, Jan. 2008.
[31] A. Aaron, D. Varodayan and B. Girod, "Wyner-Ziv residual coding of video", Proc. International Picture Coding Symposium, Beijing, P. R. China, April 2006.
THANK YOU
H.264 - Profiles
Design Features Highlights
• Features for enhancement of prediction
  • Directional spatial prediction for intra coding
  • Variable block-size motion compensation with small block sizes
  • Quarter-sample-accurate motion compensation
  • Motion vectors over picture boundaries
  • Multiple reference picture motion compensation
  • Decoupling of referencing order from display order
  • Decoupling of picture representation methods from picture referencing capability
  • Weighted prediction
  • Improved "skipped" and "direct" motion inference
  • In-the-loop deblocking filtering
Features for improved coding efficiency
• Small block-size transform
• Exact-match inverse transform
• Short word-length transform
• Hierarchical block transform
• Arithmetic entropy coding
• Context-adaptive entropy coding
Features for robustness to data errors/losses
• Parameter set structure
• NAL unit syntax structure
• Flexible slice size
• Flexible macroblock ordering (FMO)
• Arbitrary slice ordering (ASO)
• Redundant pictures
• Data partitioning
• SP/SI synchronization/switching pictures
Directional spatial prediction for intra coding
Intra prediction predicts the texture in the current block using pixel samples from neighboring blocks. Intra prediction for 4x4 blocks (9 modes) and 16x16 blocks (4 modes) is supported in all H.264 profiles. Intra prediction for 8x8 blocks (9 modes) is supported in the High profiles.
Luma prediction modes in H.264
Variable block-size motion compensation
• Partitioning is done in 2 stages
  • In the 1st stage, one of the first 4 modes is determined: 16x16, 16x8, 8x16, 8x8
  • If mode 4 (8x8) is chosen, each 8x8 block is further partitioned into smaller blocks: 8x4, 4x8, 4x4
• At most 16 motion vectors may be transmitted for a 16x16 macroblock
• Sub-pixel accuracy
• Large computational complexity to determine the modes, but efficient encoding