Rate-distortion Optimization for MP3 and AAC Audio Coding with Complete Decoder Compatibility

Jingming Xu
Multimedia Communications Lab, University of Waterloo

Outline
• Introduction and motivation
• MP3, AAC, and Two-nested-loop Search
• Rate-distortion optimization for MP3
• Rate-distortion optimization for AAC
• Conclusions and future research

Introduction
• Audio coding differs from universal data compression:
  • Long-term correlations
  • Multi-channel correlations
  • Subject to natural noise
  • Subjective perceptual quality judgement
• Audio coding methods, for both lossy and lossless coding:
  • Linear prediction
  • Time-frequency mapping (DCT, FFT, MDCT, etc.)
  • Parameter coding
  • …

Introduction (2)
• MPEG: the most successful audio coding standard series so far
  • MPEG-1 (1992): T/F mapping based, 3 layers of increasing complexity
  • MPEG-2 BC (1994): backward compatible with MPEG-1, with multi-channel and sampling frequency extensions
  • MPEG-2 AAC (1997): introduces more coding tools and gives up backward compatibility to improve quality
  • MPEG-4 AAC (1999): inherited from MPEG-2 AAC, with TwinVQ and bitrate scalability extensions
• MPEG-1 Layer 3 and MPEG-2 BC Layer 3 define the popular “MP3”

Introduction (3)
• Motivations
  • MP3 and AAC standardize only the bitstream syntax and the decoder, leaving the design of the encoding blocks open for performance enhancement.
  • The state-of-the-art MP3 and AAC quantization and entropy coding scheme, Two-nested-loop Search (TNLS), is inherently unable to exploit the full flexibility allowed by the standards to achieve the best rate-distortion tradeoff.
  • MP3 and AAC have achieved huge success in the digital audio industry.

Introduction (4)
• Quality evaluation of compressed audio
  • Most widely used objective measure: noise-to-mask ratio (NMR)
  • Most widely used subjective measure: ITU listening test (ITU-R Recommendation BS.1116)
    • Three stimuli A, B, C with a hidden reference, double-blind
    • 5-grade impairment scale
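The noise-to-mask ratio compares the coding noise in each band with the masking threshold in that band. A minimal sketch, assuming per-band noise energies and masking thresholds are already available from some psychoacoustic model; averaging the per-band ratios linearly before converting to dB is one common convention and an assumption here, since evaluation tools differ in how they pool bands and frames. The function name `nmr_db` is hypothetical.

```python
import math

def nmr_db(noise, mask):
    """Noise-to-mask ratio of one frame in dB.

    noise[b]: energy of the coding noise in band b
    mask[b]:  masking threshold (allowed noise energy) in band b
    Assumption: per-band ratios are averaged linearly, then converted to dB.
    """
    if len(noise) != len(mask) or not noise:
        raise ValueError("noise and mask must be non-empty and of equal length")
    mean_ratio = sum(n / m for n, m in zip(noise, mask)) / len(noise)
    if mean_ratio == 0.0:
        return float("-inf")  # noiseless frame
    return 10.0 * math.log10(mean_ratio)

# Noise exactly at the masking threshold gives 0 dB.
print(nmr_db([1.0, 2.0], [1.0, 2.0]))
```

Values below 0 dB mean the noise lies, on average, below the masking threshold.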

MP3 and AAC audio coding standards
• Encoding process
• Window switching
• Stereo coding
• Pre-processing in AAC: gain control, prediction, temporal noise shaping, perceptual noise substitution, etc.

MP3 and AAC audio coding standards (2)
• Quantization and entropy coding in MP3
  • Scale factor bands and non-uniform quantization
  • scale_factor values are encoded with a fixed number of bits in the side information and a variable number of bits in the main_data stream
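The non-uniform quantizer can be sketched as follows. This is a simplified illustration, assuming a single combined step parameter in quarter-log2 units (the standard derives the effective step from global_gain, the scale factors, and related side information, which this sketch folds together); the rounding offset 0.0946 is the value commonly used in MP3 encoders. The 3/4 power compresses large amplitudes on the encoder side, and the decoder's 4/3 power undoes it. The function names are hypothetical.

```python
import math

ROUND_OFFSET = 0.0946  # rounding offset commonly used by MP3 encoders

def quantize(x, step):
    """Power-law quantization of one spectral coefficient.

    step: effective step exponent in quarter-log2 units
          (assumption: global_gain and scale factor already combined).
    """
    mag = (abs(x) / 2 ** (step / 4)) ** 0.75
    q = int(mag - ROUND_OFFSET + 0.5)  # nint(mag - 0.0946); argument is >= 0.4054
    return q if x >= 0 else -q

def dequantize(q, step):
    """Decoder-side reconstruction: sign(q) * |q|^(4/3) * 2^(step/4)."""
    return math.copysign(abs(q) ** (4 / 3), q) * 2 ** (step / 4)

q = quantize(1000.0, 0)
print(q, dequantize(q, 0))
```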

MP3 and AAC audio coding standards (3)
• Quantization and entropy coding in MP3
  • Huffman coding
    • 34 fixed Huffman codebooks
    • Huffman coding region division into big_value, count_1, zero, … regions; each region is coded with the codebook that best matches its statistics.
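The region division can be illustrated by scanning a granule from the high-frequency end: trailing zero pairs form the zero region, the preceding quadruples whose values all have magnitude at most 1 form the count_1 region, and the remaining pairs form the big_value region. This is a simplified sketch with a hypothetical function name; it assumes an even-length granule and does not model the further split of the big_value region into subregions with their own codebooks.

```python
def mp3_regions(q):
    """Split quantized values into big_value, count_1 and zero regions.

    Simplified sketch: returns (big_value_pairs, count1_quads, zero_count).
    Assumes len(q) is even (576 for an MP3 long-block granule).
    """
    end = len(q)
    # Zero region: trailing zeros, consumed in pairs.
    while end >= 2 and q[end - 1] == 0 and q[end - 2] == 0:
        end -= 2
    # count_1 region: quadruples whose magnitudes are all <= 1.
    start = end
    while start >= 4 and all(abs(v) <= 1 for v in q[start - 4:start]):
        start -= 4
    # Everything below `start` is coded as big_value pairs.
    return start // 2, (end - start) // 4, len(q) - end

print(mp3_regions([5, 3, 2, 0, 1, 0, 1, 1, 0, 0, 0, 0]))
```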

MP3 and AAC audio coding standards (4)
• Quantization and entropy coding in AAC
  • Non-uniform quantizer: same as in MP3
  • Each scale_factor value is differentially encoded relative to that of the preceding band, using a fixed Huffman codebook
  • Huffman coding
    • 12 fixed Huffman codebooks
    • Huffman coding region division: section boundaries can only be at scale factor band boundaries
    • For each section, the section length (in scale factor bands) and the index of the codebook used for that section are transmitted with a fixed number of bits
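To make the sectioning rule concrete, the sketch below merges consecutive scale factor bands that share a codebook into sections and counts the resulting side-information bits. It assumes long-window frames, where the AAC section syntax uses 4 bits for the codebook index and 5-bit length increments with 31 as the escape value (short windows use 3-bit increments); the function name is hypothetical.

```python
def aac_sections(band_cbs, len_bits=5, cb_bits=4):
    """Group consecutive bands with the same codebook into sections.

    band_cbs: Huffman codebook index chosen for each scale factor band.
    Returns (sections, side_bits), where sections is a list of
    (codebook, length_in_bands) tuples.
    """
    sections = []
    for cb in band_cbs:
        if sections and sections[-1][0] == cb:
            sections[-1][1] += 1
        else:
            sections.append([cb, 1])
    esc = (1 << len_bits) - 1  # escape value: 31 for long windows
    side_bits = 0
    for _, length in sections:
        # Length is sent as a run of escape values followed by the remainder.
        side_bits += cb_bits + len_bits * (length // esc + 1)
    return [tuple(s) for s in sections], side_bits

print(aac_sections([1, 1, 2, 2, 2, 0]))
```

Because every extra section costs at least 9 bits here, splitting the spectrum into many short sections can cost more than using a slightly worse codebook over a longer run; this is the kind of inter-band dependency a band-by-band codebook choice ignores.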

Two-nested-loop Search algorithm
[Figure: TNLS flowchart with an Outer Loop and an Inner Loop]
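The two loops can be sketched in simplified form: the inner (rate) loop coarsens a global step until the frame fits the bit budget, and the outer (distortion) loop amplifies the scale factors of bands whose noise exceeds the allowed level, then reruns the inner loop. This is a minimal sketch under stated assumptions, not the ISO reference encoder: the bit-count function is a hypothetical proxy for Huffman table lookup, scale factor amplification is modelled as a 2^(1/2) step per unit, and the stopping rule is simplified.

```python
import math

def quantize(x, step):
    # Simplified MP3-style power-law quantizer; step in quarter-log2 units.
    q = int((abs(x) / 2 ** (step / 4)) ** 0.75 + 0.4054)  # nint(mag - 0.0946)
    return q if x >= 0 else -q

def dequantize(q, step):
    return math.copysign(abs(q) ** (4 / 3), q) * 2 ** (step / 4)

def count_bits(q):
    # Hypothetical proxy for Huffman bit counting (not a real MP3/AAC table).
    return sum(1 + 2 * abs(v).bit_length() for v in q)

def tnls(x, bands, allowed_noise, budget, max_outer=20):
    """Simplified two-nested-loop search.

    bands: contiguous (start, end) index ranges covering x in order.
    allowed_noise[b]: permitted noise energy in band b (masking threshold).
    budget: available bits; must be >= len(x) so the inner loop terminates.
    """
    scalefac = [0] * len(bands)
    global_step = 0
    for _ in range(max_outer):
        # Inner (rate) loop: coarsen the global step until the bits fit.
        while True:
            steps = [global_step - 2 * scalefac[b]
                     for b, (lo, hi) in enumerate(bands) for _i in range(lo, hi)]
            q = [quantize(v, s) for v, s in zip(x, steps)]
            if count_bits(q) <= budget:
                break
            global_step += 1
        # Outer (distortion) loop: find bands whose noise exceeds the allowance.
        noisy = [b for b, (lo, hi) in enumerate(bands)
                 if sum((x[i] - dequantize(q[i], steps[i])) ** 2
                        for i in range(lo, hi)) > allowed_noise[b]]
        if not noisy or len(noisy) == len(bands):
            break  # simplified stopping rule
        for b in noisy:
            scalefac[b] += 1  # amplify: finer effective step in that band
    return q, scalefac, global_step

x = [100.0, -50.0, 20.0, 3.0, 1.0, 0.5]
q, sf, g = tnls(x, [(0, 3), (3, 6)], [1.0, 0.1], budget=30)
```

Each loop reacts only to its own criterion (bits or per-band noise), and neither evaluates a joint rate-distortion cost.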

Two-nested-loop Search algorithm (2)
• Problems in TNLS
  • Quantization, scale factor adaptation, and Huffman coding are considered separately.
  • No convergence guarantee.
  • Does not aim at minimizing the overall distortion.
  • Disregards the inter-band correlations of scale factors and Huffman codebook selection in AAC.

Rate-distortion optimization for MP3
• Problem formulation
  • Lagrangian RD cost minimization, J = D + λ·R, jointly over:
    - quantized coefficients
    - scale factors
    - Huffman coding region division
    - Huffman codebook selection
  • Distortion is measured on the output of the non-uniform de-quantizer defined in MP3, in terms of the noise-to-mask ratio
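Written out, the objective takes roughly the following form. The notation is assumed for this sketch: q, s, r, h denote the quantized coefficients, scale factors, region division and codebook selection; Q^{-1} is the MP3 de-quantizer; M_b is the masking threshold of band b. The NMR-weighted squared error shown is one plausible distortion measure, not necessarily the exact definition used in this work.

```latex
\min_{q,\,s,\,r,\,h}\; J(q,s,r,h) \;=\; D\bigl(x,\hat{x}\bigr) \;+\; \lambda\, R(q,s,r,h),
\qquad \hat{x} = \mathcal{Q}^{-1}(q,s),
\qquad D(x,\hat{x}) = \sum_{b} \frac{1}{M_b} \sum_{i \in b} \bigl(x_i - \hat{x}_i\bigr)^2
```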

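Rate-distortion optimization for MP3 (2)
• Problem formulation
  • Soft-decision quantization: in conventional hard-decision quantization, the quantized coefficients are solely determined by the given scale factors (and the spectrum). In soft-decision quantization, the quantized coefficients are treated as flexible coding parameters and selected so that the actual RD cost is minimized. The quantized spectrum is therefore optimized jointly with the other coding parameters instead of being derived directly from the scale factors.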
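The difference between the two decisions can be shown for a single coefficient. A minimal sketch, assuming a hypothetical per-value bit cost (`rate_proxy`) in place of the real Huffman tables and a small candidate set {0, hard-decision value, one step toward zero}: the hard decision is a fixed function of x and the step, while the soft decision picks the candidate with the lowest distortion + λ·rate.

```python
import math

def hard_decision(x, step):
    # Conventional quantizer: q is a fixed function of x and the step.
    q = int((abs(x) / 2 ** (step / 4)) ** 0.75 + 0.4054)  # nint(mag - 0.0946)
    return q if x >= 0 else -q

def dequantize(q, step):
    return math.copysign(abs(q) ** (4 / 3), q) * 2 ** (step / 4)

def rate_proxy(q):
    # Hypothetical bit cost of a quantized value (stand-in for Huffman tables).
    return 1 + 2 * abs(q).bit_length()

def soft_decision(x, step, lam):
    """Pick q among nearby candidates to minimize distortion + lam * rate."""
    qh = hard_decision(x, step)
    toward_zero = qh - 1 if qh > 0 else qh + 1 if qh < 0 else 0
    candidates = sorted({0, qh, toward_zero})
    return min(candidates,
               key=lambda q: (x - dequantize(q, step)) ** 2 + lam * rate_proxy(q))

print(hard_decision(3.0, 0), soft_decision(3.0, 0, 0.0), soft_decision(3.0, 0, 1e6))
```

With `lam = 0` the choice reduces to the lowest-distortion candidate; as `lam` grows, cheaper values closer to zero win.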

Rate-distortion optimization for MP3 (3)
• Fixed-slope graph-based iterative RD optimization
  • Step 1: Initialize a set of scale factors from the given frame of spectrum, together with a Huffman codebook (HCB) selection. Set t = 0, and specify a tolerance as the convergence criterion.
  • Step 2: Given the scale factors and HCB selection at iteration t ≥ 0, find the optimal quantized spectrum and HCB region division through a standard-constrained graph, such that together they achieve the minimum RD cost. Record the resulting minimum RD cost for iteration t.

Rate-distortion optimization for MP3 (4)
[Figure: Graph search for MP3 quantized spectrum and region division]
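The graph search of Step 2 can be illustrated with a much-simplified shortest-path formulation. Nodes are (coefficient position, region) pairs, regions may only advance from big_value to count_1 to zero along the spectrum, and every node picks its quantized value by soft decision. This is a sketch under stated assumptions, not the graph used in this work: the per-region bit costs are hypothetical, coefficients are treated individually rather than in the pairs and quadruples MP3 actually codes, and the subregions of big_value are ignored. All function names are hypothetical.

```python
import math

def hard_decision(x, step):
    q = int((abs(x) / 2 ** (step / 4)) ** 0.75 + 0.4054)  # nint(mag - 0.0946)
    return q if x >= 0 else -q

def dequantize(q, step):
    return math.copysign(abs(q) ** (4 / 3), q) * 2 ** (step / 4)

def rate(region, q):
    # Hypothetical bit costs per region (stand-ins for the real Huffman tables).
    if region == 0:                  # big_value
        return 2 + 2 * abs(q).bit_length()
    if region == 1:                  # count_1: only |q| <= 1 allowed
        return 1 + (q != 0)
    return 0                         # zero region: nothing transmitted

def candidates(region, x, step):
    if region == 2:
        return [0]
    if region == 1:
        return [-1, 0, 1]
    qh = hard_decision(x, step)
    toward_zero = qh - 1 if qh > 0 else qh + 1 if qh < 0 else 0
    return sorted({0, qh, toward_zero})

def graph_search(x, step, lam):
    """Shortest path over nodes (position i, region r) for non-empty x;
    regions only advance 0 -> 1 -> 2. Returns (cost, regions, q)."""
    n, R = len(x), 3
    cost = [[0.0] * R for _ in range(n)]
    back = [[None] * R for _ in range(n)]     # (previous region, chosen q)
    for i, xi in enumerate(x):
        for r in range(R):
            # Soft decision: best q for this coefficient if it lies in region r.
            local, q_best = min(((xi - dequantize(q, step)) ** 2 + lam * rate(r, q), q)
                                for q in candidates(r, xi, step))
            if i == 0:
                prev_cost, prev_r = 0.0, None
            else:
                prev_cost, prev_r = min((cost[i - 1][p], p) for p in range(r + 1))
            cost[i][r] = prev_cost + local
            back[i][r] = (prev_r, q_best)
    # Backtrack from the cheapest final node.
    total, r = min((cost[n - 1][s], s) for s in range(R))
    regions, qs = [], []
    for i in range(n - 1, -1, -1):
        prev_r, q = back[i][r]
        regions.append(r)
        qs.append(q)
        r = prev_r
    return total, regions[::-1], qs[::-1]

total, regions, qs = graph_search([40.0, 10.0, 1.2, 0.9, 0.05, 0.02], 0, 0.1)
```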

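Rate-distortion optimization for MP3 (5)
• Fixed-slope graph-based iterative RD optimization
  • Step 3: Given the quantized spectrum, the region division, and the HCB selection, update the scale factors to the values that minimize the RD cost.
  • Step 4: Given the quantized spectrum, the region division, and the updated scale factors, update the HCB selection to the one that minimizes the RD cost.
  • Step 5: Repeat Steps 2, 3 and 4 for t = 0, 1, 2, … until the decrease in RD cost between successive iterations falls below the tolerance; then output the final quantized spectrum, scale factors, and Huffman coding parameters (region division and HCB selection).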
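The control flow of Steps 2 to 5 is an alternating (block-coordinate) minimization. Each step minimizes the same Lagrangian cost over one group of variables with the others fixed, so the cost never increases, and the loop stops once the decrease falls below the tolerance. The sketch below shows only that control flow. `iterative_rd` and the toy three-variable cost are hypothetical stand-ins for the MP3 sub-problems (graph search, scale factor update, codebook update).

```python
def iterative_rd(state, updates, cost, tol=1e-9, max_iter=100):
    """Apply the update steps in turn until the cost stops decreasing.

    Each update must return a state whose cost is not higher than its input,
    e.g. an exact minimization over one group of variables with the rest fixed.
    """
    history = [cost(state)]
    for _ in range(max_iter):
        for update in updates:
            state = update(state)
        history.append(cost(state))
        if history[-2] - history[-1] <= tol:
            break
    return state, history

# Toy problem (hypothetical, not an audio cost): three integers in 0..7.
def toy_cost(s):
    a, b, c = s
    return (a - 2 * b) ** 2 + (b - c + 1) ** 2 + 0.1 * (a + b + c)

def coordinate_update(k):
    # Exhaustive search over variable k with the other two held fixed.
    def update(s):
        return min((s[:k] + (v,) + s[k + 1:] for v in range(8)), key=toy_cost)
    return update

state, history = iterative_rd((7, 0, 7), [coordinate_update(k) for k in range(3)], toy_cost)
```

Because every update is an exact minimization over a finite set that includes the current value, the recorded costs form a non-increasing sequence; the point reached can still depend on the initial state.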

Rate-distortion optimization for MP3 (6)
• Simulation results: ANMR (implementation based on the ISO MP3 reference codec)
[Figure panels: violin.wav, spme50_1.wav]

Rate-distortion optimization for MP3 (7)
• Simulation results: ANMR (implementation based on LAME 3.96.1, best-quality mode)
[Figure panels: violin.wav, spme50_1.wav]

Rate-distortion optimization for MP3 (8)
• Simulation results: ITU listening test (80 kb/s)

Rate-distortion optimization for MP3 (9)
• Remarks
  • The iterative process may only reach a local optimum, so a well-chosen initial state is preferable when targeting the best possible RD performance.
  • The proposed fixed-slope graph-based iterative algorithm provides a feasible solution to the problems of TNLS.
  • The Lagrangian multiplier λ can be adjusted adaptively to meet rate or distortion constraints in real audio compression applications.
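Adjusting λ to meet a rate target can be done by bisection, because for an exact Lagrangian minimization over a fixed finite set of choices the resulting rate is non-increasing in λ. A minimal sketch, assuming each coefficient (or band) offers a hypothetical list of (rate, distortion) options; `encode` and `lambda_for_rate` are illustrative names, not part of any codec.

```python
def encode(options, lam):
    """For every item, pick the (rate, distortion) option minimizing d + lam * r.

    options: hypothetical per-item lists of (rate_bits, distortion) pairs.
    Returns (total_rate, total_distortion).
    """
    rate = dist = 0.0
    for opts in options:
        r, d = min(opts, key=lambda o: o[1] + lam * o[0])
        rate += r
        dist += d
    return rate, dist

def lambda_for_rate(options, target_rate, lam_hi=1e6, iters=60):
    """Bisection on lambda: smallest lambda (to tolerance) meeting the rate target."""
    lam_lo = 0.0
    if encode(options, lam_hi)[0] > target_rate:
        raise ValueError("target rate not reachable even at lam_hi")
    for _ in range(iters):
        mid = 0.5 * (lam_lo + lam_hi)
        if encode(options, mid)[0] <= target_rate:
            lam_hi = mid   # feasible: try a smaller slope (more bits, less distortion)
        else:
            lam_lo = mid
    return lam_hi, encode(options, lam_hi)

options = [[(0, 10.0), (2, 4.0), (5, 1.0), (8, 0.1)],
           [(0, 3.0), (3, 0.5), (6, 0.0)]]
lam, (rate, dist) = lambda_for_rate(options, 8)
```

A distortion target can be handled the same way by bisecting on the distortion instead, which is non-decreasing in λ.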

Rate-distortion optimization for AAC
• Problem formulation
  • Lagrangian RD cost minimization over:
    - scale factor sequence
    - Huffman codebook index sequence
  • First-order inter-band dependency -> dynamic programming (Viterbi algorithm)
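One way to see why dynamic programming applies is to write the cost band by band. In the notation assumed below, band i has scale factor s_i, Huffman codebook index h_i and quantized values q_i. R_sf is the bit cost of the differentially coded scale factor, R_sec the section side-information cost, and s_0, h_0 denote starting reference values (an assumption). Because the coupling terms involve only adjacent bands, the minimization over whole sequences can be solved stage by stage with the Viterbi algorithm.

```latex
J = \sum_{i=1}^{N} \Bigl[ D_i(q_i, s_i) + \lambda\, R_i(q_i, h_i)
    + \lambda\, R_{\mathrm{sf}}(s_i - s_{i-1})
    + \lambda\, R_{\mathrm{sec}}(h_{i-1}, h_i) \Bigr]
```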

Rate-distortion optimization for AAC (2)
• Fixed-slope trellis-based RD optimization
  • Step 1: Build the trellis structure. For each state in the trellis, indexed by scale factor band, scale factor value, and Huffman codebook index, find the best quantized values for that band to minimize its decomposed RD cost.
  • Step 2: Find the optimal path through the trellis with the Viterbi algorithm.
  • Step 3: Backtrack the optimal scale factors, Huffman codebook indices, and quantized values as the final output.

Rate-distortion optimization for AAC (3)
[Figure: Trellis structure for AAC quantization and entropy coding]
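Steps 2 and 3 can be sketched as a standard Viterbi search with one trellis stage per scale factor band and one state per (scale factor, codebook) pair. This arrangement is assumed from the problem formulation rather than read off the figure. The local cost table is taken as already computed in Step 1. The bit-cost functions `sf_bits` and `sec_bits` are hypothetical stand-ins for the scale factor codebook and section syntax, and the first band's scale factor cost is ignored for simplicity.

```python
def viterbi(local, lam, sf_bits, sec_bits):
    """Minimum-cost path through a trellis of (scale factor, codebook) states.

    local[i][s][h]: decomposed RD cost of band i in state (s, h), assumed to be
                    precomputed in Step 1 with the best quantized values.
    sf_bits(d):     bit cost of a scale factor difference d.
    sec_bits(p, h): section side-info cost when codebook h follows p
                    (p is None for the first band).
    Returns (min_cost, [(s, h) for each band]).
    """
    n = len(local)
    states = [(s, h) for s in range(len(local[0])) for h in range(len(local[0][0]))]
    cost = {st: local[0][st[0]][st[1]] + lam * sec_bits(None, st[1]) for st in states}
    back = []
    for i in range(1, n):
        new_cost, ptr = {}, {}
        for s, h in states:
            def via(p):
                # Cost of reaching (s, h) in band i from state p in band i - 1.
                return cost[p] + lam * (sf_bits(s - p[0]) + sec_bits(p[1], h))
            best_prev = min(states, key=via)
            new_cost[(s, h)] = via(best_prev) + local[i][s][h]
            ptr[(s, h)] = best_prev
        cost = new_cost
        back.append(ptr)
    # Step 3: backtrack from the cheapest final state.
    end = min(states, key=lambda st: cost[st])
    path = [end]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return cost[end], path[::-1]

def sf_bits(d):
    return 1 + 2 * abs(d)           # hypothetical scale factor difference cost

def sec_bits(prev_h, h):
    return 0 if prev_h == h else 9  # new section: 4-bit codebook + 5-bit length

local = [[[float((i * 7 + s * 3 + h * 5) % 11) for h in range(2)] for s in range(3)]
         for i in range(3)]
best_cost, path = viterbi(local, 0.5, sf_bits, sec_bits)
```

The work per band grows with the square of the number of states, instead of exponentially with the number of bands as in an exhaustive search over all sequences.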

Rate-distortion optimization for AAC (4)
• Simulation results: ANMR
  • Implementation based on the ISO AAC reference codec
  • Also compared with Aggarwal’s approach (Steps 2 and 3 only)
[Figure panels: violin.wav, spme50_1.wav]

Rate-distortion optimization for AAC (5)
• Simulation results: ITU listening test (64 kb/s)

Rate-distortion optimization for AAC (6)
• Remarks
  • The proposed fixed-slope trellis-based algorithm achieves the globally optimal RD performance within the quantization and entropy coding stage under the AAC standard constraints.
  • Jointly designing the pre-processing decisions with the proposed optimization could in theory achieve the global optimum over the entire standard-constrained parameter space, but with computational complexity exponential in the number of bands per frame.

Conclusions and Future Research
• Conclusions
  • The fixed-slope approach converts the encoding problem into a search over a constrained space, which permits efficient sequential search algorithms.
  • Soft-decision quantization completes our RD optimization frameworks and brings significant performance gains.
  • In each case, substantial performance improvement over state-of-the-art encoders is achieved with complete decoder compatibility.

Conclusions and Future Research (2)
• Future research
  • Real-time implementations
  • Extension to scalable AAC
  • Joint pre-processing and optimization for AAC
  • Optimal lossy audio compression without syntax constraints
    • Optimal settings for transform (e.g., block lengths), quantization (e.g., step sizes), and prediction
    • Joint design of quantization and entropy coding
  • …

Questions?
