Rate-distortion Optimization for MP3 and AAC Audio Coding with Complete Decoder Compatibility
Jingming Xu
Multimedia Communications Lab
University of Waterloo
Outline
• Introduction and motivation
• MP3, AAC, and Two-nested-loop Search
• Rate-distortion optimization for MP3
• Rate-distortion optimization for AAC
• Conclusions and Future Research
Introduction
• Audio coding - different from universal data compression
  • Long-term correlations
  • Multi-channel correlations
  • Subject to natural noise
  • Subjective perceptual quality judgement
• Audio coding methods - for both lossy and lossless coding
  • Linear prediction
  • Time-frequency mapping (DCT, FFT, MDCT, etc.)
  • Parameter coding
  • ….
Introduction (2)
• MPEG - the most successful audio coding standard series so far
  • MPEG-1 (1992) - T/F mapping based, 3 layers with increasing complexity
  • MPEG-2 BC (1994) - backward compatible with MPEG-1, with multi-channel and sampling frequency extensions
  • MPEG-2 AAC (1997) - introduces more coding tools and gives up backward compatibility to improve quality
  • MPEG-4 AAC (1999) - inherited from MPEG-2 AAC, with TwinVQ and bitrate scalability extensions
MPEG-1 Layer 3 and MPEG-2 BC Layer 3 define the popular “MP3”.
Introduction (3)
• Motivations
  • MP3 and AAC leave the design of the structured encoding blocks open for performance enhancement.
  • The state-of-the-art MP3 and AAC quantization and entropy coding scheme, Two-nested-loop Search (TNLS), is essentially incapable of exploiting the full standard-constrained flexibility for the best rate-distortion tradeoff.
  • MP3 and AAC are hugely successful in the digital audio industry.
Introduction (4)
• Quality evaluation of compressed audio
  • Most widely used objective measure - noise-to-mask ratio
  • Most widely used subjective measure - ITU listening test (ITU-R Recommendation BS.1116)
    • Triple stimulus (A, B, C) with hidden reference, double blind
    • 5-grade impairment score scale
MP3 and AAC audio coding standards
• Encoding process
• Window switching
• Stereo coding
• Pre-processing in AAC: gain control, prediction, noise shaping and substitution, etc.
MP3 and AAC audio coding standards (2)
• Quantization and entropy coding in MP3
  • Scale factor bands and non-uniform quantization
  • scale_factor values are encoded with a fixed number of bits in the side information and a variable number of bits in the main_data stream
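The non-uniform quantizer mentioned above can be illustrated with a minimal sketch. It assumes the power-law form used in MP3/AAC encoders (a 3/4 power on the encoder side, a 4/3 power on the decoder side) and collapses the global gain and per-band scale factor into one hypothetical `scale` argument with quarter-octave spacing; the function names are illustrative and not taken from the slides.

```python
import math

def step_size(scale):
    """Assumed step size: each unit of `scale` multiplies the step by 2**(1/4)."""
    return 2.0 ** (scale / 4.0)

def quantize(x, step):
    """Hard-decision power-law quantizer: sign(x) * floor((|x|/step)**0.75 + 0.4054)."""
    mag = math.floor((abs(x) / step) ** 0.75 + 0.4054)
    return mag if x >= 0 else -mag

def dequantize(q, step):
    """Decoder-side reconstruction: sign(q) * |q|**(4/3) * step."""
    mag = abs(q) ** (4.0 / 3.0) * step
    return mag if q >= 0 else -mag

if __name__ == "__main__":
    step = step_size(0)
    q = quantize(8.0, step)
    print(q, dequantize(q, step))
```

Because of the power law, the reconstruction error grows with the magnitude of the coefficient, which is why the step size has to be tuned per scale factor band.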
MP3 and AAC audio coding standards (3)
• Quantization and entropy coding in MP3
  • Huffman coding
    • 34 fixed Huffman codebooks
    • Huffman coding region division: each region is coded with a different codebook that best matches the statistics of that region (big_value, count_1, zero, ….)
MP3 and AAC audio coding standards (4)
• Quantization and entropy coding in AAC
  • Non-uniform quantizer: same as in MP3
  • scale_factor values are differentially encoded, relative to that of the preceding band, with a fixed Huffman codebook
  • Huffman coding
    • 12 fixed Huffman codebooks
    • Huffman coding region division: section boundaries can only lie on scale factor band boundaries
    • For each section, the length of the section in scale factor bands and the index of the codebook used for that section are transmitted with a fixed number of bits.
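The differential coding of scale factors can be sketched as follows. This is only an illustration: `start_value` (the reference for the first band) and the `max_delta` limit are assumptions introduced here, and the Huffman codeword lookup is omitted.

```python
def differential_scale_factors(scale_factors, start_value, max_delta=60):
    """Return the band-to-band differences that would be entropy coded.

    start_value and max_delta are illustrative assumptions, not slide content.
    """
    deltas = []
    prev = start_value
    for sf in scale_factors:
        delta = sf - prev
        if abs(delta) > max_delta:
            raise ValueError("scale factor jump exceeds the assumed codable range")
        deltas.append(delta)
        prev = sf
    return deltas

if __name__ == "__main__":
    print(differential_scale_factors([100, 102, 101, 101], start_value=100))
```

Since only differences are coded, the cost of a scale factor depends on its neighbour, which is the inter-band dependency the later AAC slides exploit.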
Two-nested-loop Search algorithm
[Figure: flowchart of the Two-nested-loop Search, showing the outer loop and the inner loop]
Two-nested-loop Search algorithm (2)
• Problems in TNLS
  • Quantization, scale factor adaptation and Huffman coding are considered separately.
  • Has no convergence guarantee
  • Does not aim to minimize the overall distortion
  • Disregards the inter-band correlations of scale factors and Huffman codebook selection in AAC
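For orientation, the loop structure criticized above can be sketched in a few lines. This is a toy version under stated assumptions, not the reference encoder: `toy_bits` is a hypothetical rate proxy replacing the Huffman tables, `band_step` is an assumed step-size rule, and the stopping rule is simplified.

```python
import math

def quantize(x, step):
    mag = math.floor((abs(x) / step) ** 0.75 + 0.4054)
    return mag if x >= 0 else -mag

def dequantize(q, step):
    mag = abs(q) ** (4.0 / 3.0) * step
    return mag if q >= 0 else -mag

def toy_bits(q):
    """Hypothetical rate proxy; a real encoder would look up Huffman tables."""
    return 1 if q == 0 else 2 * abs(q).bit_length() + 1

def band_step(gain, sf):
    """Assumed rule: the global gain coarsens the step, the band scale factor refines it."""
    return 2.0 ** ((gain - 2 * sf) / 4.0)

def inner_loop(bands, sfs, bit_budget, max_gain=255):
    """Rate loop: raise the global gain until the frame fits the bit budget."""
    for gain in range(max_gain + 1):
        quant = [[quantize(x, band_step(gain, sf)) for x in band]
                 for band, sf in zip(bands, sfs)]
        bits = sum(toy_bits(q) for band in quant for q in band)
        if bits <= bit_budget:
            break
    return gain, quant, bits

def two_nested_loop_search(bands, allowed_noise, bit_budget, max_outer=16):
    """Distortion loop: amplify bands whose noise exceeds the allowed level."""
    sfs = [0] * len(bands)
    result = None
    for _ in range(max_outer):
        gain, quant, bits = inner_loop(bands, sfs, bit_budget)
        result = (gain, list(sfs), quant, bits)
        noise = [sum((x - dequantize(q, band_step(gain, sf))) ** 2
                     for x, q in zip(band, qband))
                 for band, qband, sf in zip(bands, quant, sfs)]
        bad = [b for b, n in enumerate(noise) if n > allowed_noise[b]]
        if not bad or len(bad) == len(bands):
            break  # simplified stop: nothing violated, or everything violated
        for b in bad:
            sfs[b] += 1
    return result
```

Note how the rate is checked only in the inner loop and the distortion only in the outer loop; nothing in this structure minimizes a joint cost, and amplifying one band can push another over its threshold on the next pass.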
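Rate-distortion optimization for MP3
• Problem formulation
  • Lagrangian RD cost minimization:
    $$\min_{y,\,s,\,r,\,h} J_\lambda(y, s, r, h) = D\big(x, Q^{-1}(y, s)\big) + \lambda\, R(y, s, r, h)$$
    • $y$ - quantized coefficients
    • $s$ - scale factors
    • $r$ - Huffman coding region division
    • $h$ - Huffman codebook selection
    • $Q^{-1}$ - non-uniform de-quantizer defined in MP3
    • $D$ - noise-to-mask ratio
    Here $x$ is the input spectrum of the frame, $R$ is the rate in bits, and $\lambda$ is the fixed slope (Lagrange multiplier).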
Rate-distortion optimization for MP3 (2)
• Problem formulation
  • Soft-decision quantization
    In conventional hard-decision quantization, the quantized spectrum $y$ is solely determined by the given spectrum $x$ and scale factors $s$, i.e., $y = Q(x, s)$. In soft-decision quantization, however, $y$ is treated as a free coding parameter and selected so that the actual RD cost is minimized. Therefore, in general, $y \neq Q(x, s)$.
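The difference between the two quantization modes can be made concrete for a single coefficient. The sketch below is an assumption-laden illustration: squared error stands in for the noise-to-mask ratio, `toy_bits` is a hypothetical rate proxy, and only three candidate levels are examined rather than a full graph search.

```python
import math

def quantize(x, step):
    mag = math.floor((abs(x) / step) ** 0.75 + 0.4054)
    return mag if x >= 0 else -mag

def dequantize(q, step):
    mag = abs(q) ** (4.0 / 3.0) * step
    return mag if q >= 0 else -mag

def toy_bits(q):
    """Hypothetical rate proxy standing in for Huffman code lengths."""
    return 1 if q == 0 else 2 * abs(q).bit_length() + 1

def rd_cost(x, q, step, lam):
    """Lagrangian cost D + lam * R for one coefficient (squared error as D)."""
    return (x - dequantize(q, step)) ** 2 + lam * toy_bits(q)

def soft_quantize(x, step, lam):
    """Pick the cheapest of: the hard decision, one level toward zero, and zero."""
    hard = quantize(x, step)
    toward_zero = hard - 1 if hard > 0 else hard + 1 if hard < 0 else 0
    candidates = [hard, toward_zero, 0]
    return min(candidates, key=lambda q: rd_cost(x, q, step, lam))

if __name__ == "__main__":
    for lam in (0.0, 1.0):
        print(lam, quantize(1.2, 1.0), soft_quantize(1.2, 1.0, lam))
```

With a slope of zero the soft decision coincides with the hard one in this example; as the slope grows, a level that costs fewer bits wins even though it reconstructs the coefficient less accurately.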
Rate-distortion optimization for MP3 (3)
• Fixed-slope graph-based iterative RD optimization
  • Step 1: Initialize a set of scale factors $s_0$ from the given frame of spectrum $x$, together with a Huffman codebook (HCB) selection $h_0$. Set $t = 0$, and specify a tolerance $\epsilon$ as the convergence criterion.
  • Step 2: Given the scale factors $s_t$ and HCB selection $h_t$ for any $t \ge 0$, find the optimal quantized spectrum $y_t$ and HCB region division $r_t$ through a standard-constrained graph, where $y_t$ and $r_t$ achieve the minimum of the Lagrangian RD cost, $\min_{y,\,r} J_\lambda(y, s_t, r, h_t)$. Denote this minimum by $J_t$.
Rate-distortion optimization for MP3 (4)
[Figure: Graph Search for MP3 Quantized Spectrum and Region Division]
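Rate-distortion optimization for MP3 (5)
• Fixed-slope graph-based iterative RD optimization
  • Step 3: Given the quantized spectrum $y_t$, region division $r_t$ and HCB selection $h_t$, update the scale factors $s_t$ to $s_{t+1}$, so that $s_{t+1}$ achieves the minimum of the Lagrangian RD cost, $\min_{s} J_\lambda(y_t, s, r_t, h_t)$.
  • Step 4: Given $y_t$, $s_{t+1}$ and $r_t$, update the HCB selection $h_t$ to $h_{t+1}$, so that $h_{t+1}$ achieves the minimum $\min_{h} J_\lambda(y_t, s_{t+1}, r_t, h)$.
  • Step 5: Repeat Steps 2, 3 and 4 for $t = 0, 1, 2, \dots$ until $J_t - J_{t+1} \le \epsilon$, where $J_t$ is the RD cost found in Step 2 of iteration $t$ and $\epsilon$ is the tolerance; then output $y_{t+1}$, $s_{t+1}$, $r_{t+1}$ and $h_{t+1}$.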
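The alternating structure of the steps above can be sketched on a heavily simplified problem. The assumptions are significant: one band with a single scale factor, squared error instead of the noise-to-mask ratio, a hypothetical `toy_bits` rate proxy, a per-coefficient candidate search in place of the graph search, and no region division or codebook selection. The point is only to show why the cost never increases from one iteration to the next.

```python
import math

def quantize(x, step):
    mag = math.floor((abs(x) / step) ** 0.75 + 0.4054)
    return mag if x >= 0 else -mag

def dequantize(q, step):
    mag = abs(q) ** (4.0 / 3.0) * step
    return mag if q >= 0 else -mag

def toy_bits(q):
    """Hypothetical rate proxy standing in for Huffman code lengths."""
    return 1 if q == 0 else 2 * abs(q).bit_length() + 1

def coeff_cost(x, q, step, lam):
    return (x - dequantize(q, step)) ** 2 + lam * toy_bits(q)

def cost(xs, ys, s, lam):
    step = 2.0 ** (s / 4.0)
    return sum(coeff_cost(x, y, step, lam) for x, y in zip(xs, ys))

def update_y(xs, ys_prev, s, lam):
    """Soft decision per coefficient; the previous value stays a candidate."""
    step = 2.0 ** (s / 4.0)
    out = []
    for x, y_prev in zip(xs, ys_prev):
        hard = quantize(x, step)
        toward_zero = hard - 1 if hard > 0 else hard + 1 if hard < 0 else 0
        cands = [y_prev, hard, toward_zero, 0]
        out.append(min(cands, key=lambda q: coeff_cost(x, q, step, lam)))
    return out

def optimize(xs, lam, s0=0, eps=1e-9, max_iter=50):
    s = s0
    ys = [quantize(x, 2.0 ** (s / 4.0)) for x in xs]
    history, prev = [], math.inf
    for _ in range(max_iter):
        ys = update_y(xs, ys, s, lam)                           # fix s, update y
        s = min(range(64), key=lambda c: cost(xs, ys, c, lam))  # fix y, update s
        j = cost(xs, ys, s, lam)
        history.append(j)
        if prev - j <= eps:                                     # tolerance test
            break
        prev = j
    return ys, s, history
```

Each update keeps the current value among its candidates, so the recorded cost sequence is non-increasing and bounded below by zero, which is what makes the tolerance test a valid stopping rule.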
Rate-distortion optimization for MP3 (6)
• Simulation results: ANMR (implementation based on ISO MP3 reference codec)
[Plots: violin.wav, spme50_1.wav]
Rate-distortion optimization for MP3 (7)
• Simulation results: ANMR (implementation based on LAME 3.96.1, best-quality mode)
[Plots: violin.wav, spme50_1.wav]
Rate-distortion optimization for MP3 (8)
• Simulation results: ITU listening test (80 kb/s)
Rate-distortion optimization for MP3 (9)
• Remarks
  • The iteration process may only achieve local optimality, so a well-chosen initial state is preferable when aiming for the best possible RD performance.
  • The fixed-slope graph-based iterative algorithm we proposed provides a feasible solution to the problems in TNLS.
  • One can adaptively adjust the value of $\lambda$ to meet rate or distortion constraints in real audio compression applications.
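One common way to adjust the slope toward a rate constraint is bisection, sketched below. It is an assumption that the encoder is wrapped as a callable `rate_at(lam)` returning the bits spent at slope `lam`, and that this rate does not increase as the slope grows; the slides do not specify how the adjustment is done.

```python
def find_slope(rate_at, bit_budget, lo=0.0, hi=1e6, iters=60):
    """Bisection for (approximately) the smallest slope whose rate fits the budget.

    rate_at is a hypothetical callable assumed to be non-increasing in the slope.
    """
    if rate_at(hi) > bit_budget:
        raise ValueError("budget not reachable even at the largest slope")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rate_at(mid) > bit_budget:
            lo = mid   # too many bits: the slope must grow
        else:
            hi = mid   # budget met: try a smaller slope
    return hi

if __name__ == "__main__":
    # Toy rate curve used only for demonstration.
    print(find_slope(lambda lam: 100.0 / (1.0 + lam), 20.0))
```

The returned value always satisfies the budget, because `hi` is only ever moved to slopes that were checked to fit.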
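Rate-distortion optimization for AAC
• Problem formulation
  • Lagrangian RD cost minimization:
    $$\min_{y,\,s,\,h} J_\lambda(y, s, h) = D\big(x, Q^{-1}(y, s)\big) + \lambda\, R(y, s, h)$$
    • $s$ - scale factor sequence
    • $h$ - Huffman codebook index sequence
    Here $x$ is the input spectrum, $y$ the quantized coefficients, $Q^{-1}$ the de-quantizer, $D$ the distortion, $R$ the rate, and $\lambda$ the fixed slope.
  • First-order inter-band dependency -> dynamic programming (Viterbi algorithm)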
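Rate-distortion optimization for AAC (2)
• Fixed-slope trellis-based RD optimization
  • Step 1: Build the trellis structure. For each state $(i, j, k)$ in the trellis, with scale factor band index $i = 0, 1, \dots, N-1$, scale factor value index $j = 0, 1, \dots, M-1$ and Huffman codebook index $k = 0, 1, \dots, K-1$, find the best quantized coefficients $y_i$ of that band to minimize its decomposed RD cost.
  • Step 2: Find the optimal path through the trellis with the Viterbi algorithm.
  • Step 3: Backtrack the optimal quantized coefficients $y$, scale factors $s$ and Huffman codebook indices $h$ as the final output.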
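Steps 2 and 3 rely on a standard Viterbi search, which can be sketched generically. In this illustration a state is a plain integer; in the AAC setting it would presumably encode a (scale factor, codebook) pair, `node_cost[i][s]` would hold the decomposed RD cost found in Step 1, and `trans_cost(p, s)` would carry the slope-weighted bits for the differential scale factor and section side information. Those mappings are assumptions made here, not details from the slides.

```python
def viterbi(node_cost, trans_cost):
    """Minimum-cost path through a trellis.

    node_cost[i][s]  : cost of being in state s at stage (band) i
    trans_cost(p, s) : cost of moving from state p to state s
    Returns (total_cost, path), where path[i] is the chosen state at stage i.
    """
    n_states = len(node_cost[0])
    best = list(node_cost[0])        # best cost of any path ending in each state
    back = []                        # back-pointers, one list per later stage
    for stage in node_cost[1:]:
        new_best, ptr = [], []
        for s in range(n_states):
            p = min(range(n_states), key=lambda q: best[q] + trans_cost(q, s))
            new_best.append(best[p] + trans_cost(p, s) + stage[s])
            ptr.append(p)
        best = new_best
        back.append(ptr)
    last = min(range(n_states), key=lambda s: best[s])
    path = [last]
    for ptr in reversed(back):       # backtrack from the best final state
        path.append(ptr[path[-1]])
    path.reverse()
    return best[last], path

if __name__ == "__main__":
    costs = [[0, 5], [5, 0], [0, 5]]
    print(viterbi(costs, lambda p, s: abs(p - s)))
    print(viterbi(costs, lambda p, s: 10 * abs(p - s)))
```

The work grows linearly with the number of bands and quadratically with the number of states per band, instead of exponentially as in an exhaustive search over all sequences.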
Rate-distortion optimization for AAC (3)
[Figure: Trellis Structure for AAC Quantization and Entropy Coding]
Rate-distortion optimization for AAC (4)
• Simulation results: ANMR
  • Implementation based on ISO AAC reference codec
  • Also compared with Aggarwal’s approach (Steps 2, 3 only)
[Plots: violin.wav, spme50_1.wav]
Rate-distortion optimization for AAC (5)
• Simulation results: ITU listening test (64 kb/s)
Rate-distortion optimization for AAC (6)
• Remarks
  • The fixed-slope trellis-based algorithm we proposed achieves the globally optimal RD performance within the quantization and entropy coding stage under the AAC standard constraints.
  • Joint design of the pre-processing decisions with our proposed optimization can theoretically achieve the globally optimal performance over the entire standard-constrained parameter space, but with computational complexity exponential in the number of bands per frame.
Conclusions and Future Research
• Conclusions
  • The fixed-slope approach converts the encoding problem into a search problem over a constrained space, which permits the implementation of an efficient sequential search algorithm.
  • The soft-decision quantization principle completes our RD optimization frameworks and brings a significant performance enhancement.
  • Substantial performance improvement over the state-of-the-art encoders is achieved with complete decoder compatibility in each case.
Conclusions and Future Research (2)
• Future research
  • Real-time implementations
  • Extension to scalable AAC
  • Joint pre-processing and optimization for AAC
  • Optimal lossy audio compression without syntax constraints
    • Optimal settings for transform (e.g. block lengths), quantization (e.g. step sizes) and prediction
    • Joint design of quantization and entropy coding
    • ….