Embedded Audio Coder Jin Li
Outline • Introduction • Embedded audio coder - Algorithm • MLT with window switching • Quantizer • Entropy coder • Bitstream assembly • Modular software design • Experimental results & demos • Conclusion
Introduction – Audio Compression • Audio Waveform → Bitstream
EAC vs. Other Compression • Existing audio compression schemes • MP3, AAC, MPEG-4 Audio, WMA, Real Audio, … • Why research a new audio codec?
Media vs. File Compression • File compression • Every bit is important, has to be compressed losslessly • Media compression • Exact bit/value is not important, distortion is tolerable • Amount of media is huge, high compression ratio is required • Media needs adaptation
Key Features of EAC • Not only good compression performance • But also flexible bitstream syntax • The compressed bitstream may be manipulated for • Different bitrate • Different # of audio channels • Different audio sampling rate • Versatile • Lossless • Low delay • Streaming/storage application
EAC Encoder • Audio → Encoder → Master Bitstream + Companion File
Parser • Master Bitstream + Companion File → Parser → Application Bitstream • Except for the header, the application bitstream is a subset of the master bitstream (parsing is fast) • The application bitstream may be changed according to the required bitrate, # of audio channels, and audio sampling rate
EAC Decoder • Bitstream → Decoder → .wav file or Speaker (DirectSound)
Framework – Encoder • Audio is split into L+R (or mono) and L-R • Each channel passes through Transform → Entropy coder → Bitstream Assembly • The assembled outputs form the Bitstream
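The L+R / L-R split in the framework can be illustrated with a minimal sketch. The function names `to_mid_side` and `from_mid_side` are hypothetical, and the slide does not say whether the coder scales the sum or difference, so plain integer sums and differences are assumed here.

```python
def to_mid_side(left, right):
    """Return the (L+R, L-R) channel pair, sample by sample."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Recover (L, R). L+R and L-R always share parity, so // 2 is exact."""
    left = [(m + s) // 2 for m, s in zip(mid, side)]
    right = [(m - s) // 2 for m, s in zip(mid, side)]
    return left, right


# Usage: a round trip on a few integer PCM samples.
L = [3, -2, 7]
R = [1, -5, 7]
M, S = to_mid_side(L, R)
restored = from_mid_side(M, S)
```

When the two channels are similar, the L-R channel is close to zero and costs few bits, which is presumably why the two channels are coded separately.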
Audio Transform • Input: audio sample • Output: transform coefficient • Goal: convert audio from time domain to frequency domain • Compact energy • Better match with psychoacoustic characteristics • Enable audio sampling rate change
Lossy vs Lossless Mode • Lossy mode: Audio → MLT(SW) → Quantization • Lossless mode: Audio → Reversible MLT(SW)
MLT - Modulated Lapped Transform (figure panels: Frequency Domain Response, Spatial (time-domain) Response)
MLT with Window Switching • Features • Basic window size 2048 • Short window size 256 • Switching criterion • A frame (2048 samples) is switched to short windows if and only if • Its energy is bigger than a certain threshold • The energy within the 8 subframes (256 samples each) differs by more than Ta • There are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter subframe by Tb
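The switching criterion can be sketched as follows. This is only an illustration under stated assumptions: the slide does not say whether "differs by more than Ta" and "greater by Tb" are ratios or differences, so ratios are assumed, and the names `use_short_windows`, `t_energy`, `t_a` and `t_b` are hypothetical.

```python
def use_short_windows(frame, t_energy, t_a, t_b, n_sub=8):
    """Return True if the frame should be coded with short windows.

    Assumed reading of the three conditions (all must hold):
      1. total frame energy exceeds t_energy;
      2. max/min subframe energy ratio exceeds t_a;
      3. some subframe has more than t_b times the energy of the next one.
    """
    sub_len = len(frame) // n_sub
    energies = [
        sum(x * x for x in frame[i * sub_len:(i + 1) * sub_len])
        for i in range(n_sub)
    ]
    if sum(energies) <= t_energy:
        return False
    eps = 1e-12  # avoids division by zero on silent subframes
    if max(energies) / (min(energies) + eps) <= t_a:
        return False
    return any(
        energies[i] / (energies[i + 1] + eps) > t_b
        for i in range(n_sub - 1)
    )


# Usage: a steady tone-like frame versus a frame with a burst in subframe 0.
steady = [1.0] * 2048
burst = [10.0] * 256 + [0.1] * (2048 - 256)
silent = [0.0] * 2048
```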
Band Separation • Audio (44.1kHz sampling) → MLT with window switching → Band separation • Band edges: 0, 0.125π, 0.25π, 0.5π, π
Synthesis (Half Sampling) • Bands with edges 0, 0.125π, 0.25π, 0.5π → MLT with window switching → Audio (22.05kHz sampling)
Synthesis (Quarter Sampling) • Bands with edges 0, 0.125π, 0.25π → MLT with window switching → Audio (11.025kHz sampling)
Quantizer • Input: coefficient • Output: quantized coefficient • Goal: convert coefficient from float to integer • Reduce signal levels • Fast implementation of entropy coding
Quantizer • Scalar quantizer with a deadzone (figure: axis with 0 and d marked) • Output: quantized magnitude and sign
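A deadzone scalar quantizer can be sketched as below. The step-size name `delta`, the floor-based rule and the midpoint reconstruction are assumptions chosen for illustration; the slide only states that the quantizer is scalar, has a deadzone, and outputs a magnitude and a sign.

```python
import math


def quantize(x, delta):
    """Map a float coefficient to (magnitude, sign).

    Values with |x| < delta fall into the deadzone and get magnitude 0.
    """
    magnitude = int(math.floor(abs(x) / delta))
    sign = -1 if x < 0 else 1
    return magnitude, sign


def dequantize(magnitude, sign, delta):
    """Reconstruct at the middle of the quantization bin (assumed rule)."""
    if magnitude == 0:
        return 0.0
    return sign * (magnitude + 0.5) * delta


# Usage: a small value is zeroed, a larger one keeps its sign and level.
small = quantize(0.9, 1.0)
large = quantize(-2.3, 1.0)
approx = dequantize(large[0], large[1], 1.0)
```

Separating magnitude and sign matches the later bit-plane slides, where the magnitude bits and the sign are coded as distinct symbols.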
Key to Achieve Lossless • Break the MLT into small steps • Make every step reversible • Definition of reversible transform • Integer input, integer output • The transform should have a determinant of 1 (does not expand data volume)
MLT Framework • Forward MLT (lapped transform): Window → Pre-Rotate → Complex FFT → Post Rotation, where Pre-Rotate, Complex FFT and Post Rotation together form the DCT IV • Inverse MLT: Post Rotation⁻¹ → Complex FFT⁻¹ → Pre-Rotate⁻¹ → Window⁻¹
Window Operation (figure): the sample pair x(-n-1), x(n) → Complex Rotate
Pre-Rotation (figure): inputs xw(0)..xw(7) are taken in pairs and complex-rotated to give xp(0)..xp(7) • Complex Rotate –π/32 • Complex Rotate –5π/32 • Complex Rotate –9π/32 • Complex Rotate –13π/32
FFT (4 Point Complex) (figure): xp(0)..xp(7) are read as four complex inputs xc(0)..xc(3) • Butterflies with sign changes and one twiddle factor e^(-jπ/2) produce yc(0)..yc(3), i.e. yp(0)..yp(7)
Post-Rotation (figure): yp(0)..yp(7) are taken in pairs and conjugate-rotated to give y(0)..y(7) • Conjugate Rotate –0 • Conjugate Rotate –π/8 • Conjugate Rotate –2π/8 • Conjugate Rotate –3π/8
Reversible MLT • Make the following operation reversible • Butterfly operation • Complex rotation • Conjugate rotation
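One well-known way to make a rotation reversible on integers is to factor it into three lifting (shear) steps, each with determinant 1, and round inside each step. The sketch below assumes that technique; the slides do not say which factorization the coder actually uses, and the function names are hypothetical.

```python
import math


def _lift_params(theta):
    # Rotation = shear(c0) * shear(c1) * shear(c0), valid when sin(theta) != 0.
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    return c0, c1


def rotate_forward(x, y, theta):
    """Integer-to-integer approximation of rotating (x, y) by theta."""
    if math.sin(theta) == 0.0 and math.cos(theta) > 0:
        return x, y  # rotation by 0 is the identity
    c0, c1 = _lift_params(theta)
    x = x + round(c0 * y)
    y = y + round(c1 * x)
    x = x + round(c0 * y)
    return x, y


def rotate_inverse(x, y, theta):
    """Undo rotate_forward exactly by reversing the three steps."""
    if math.sin(theta) == 0.0 and math.cos(theta) > 0:
        return x, y
    c0, c1 = _lift_params(theta)
    x = x - round(c0 * y)
    y = y - round(c1 * x)
    x = x - round(c0 * y)
    return x, y


# Usage: rotate an integer pair by -pi/32 and recover it bit-exactly.
theta = -math.pi / 32
fx, fy = rotate_forward(1000, -250, theta)
back = rotate_inverse(fx, fy, theta)
exact_x = 1000 * math.cos(theta) + 250 * math.sin(theta)
```

Each step changes only one variable by a rounded function of the other, so the inverse can recompute the same rounded quantity and subtract it, regardless of rounding error.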
Entropy Coder • Input: • quantized coefficients • Output: • embedded coded bitstream with R-D performance curve • Goal: • Compression • Embedded bitstream for future manipulation
Frame Grouping (figure: frames grouped into time slots 1 to 8)
Entropy Coder • Output: Bitstream and R-D curve (distortion D versus rate R)
Entropy Coder • Embedded coding • Implicit psychoacoustic masking • Context modeling • Arithmetic coding • Implementation concerns
A block of coefficients
 45    0    0    0    0    0    0    0
-74  -13    0    0    3    0    4    0
 21    0    4    0    0    3    5    0
 14    0   23   23    0    0    0    0
 -4    5    0    0    0    1   -1    0
-18    0    0   19   -4   33    0   -1
  4    0   23    0    0    0    1    0
 -1    0    0    0    0    0    0    0
Bits of Coefficients
     coefficient  b1 b2 b3 b4 b5 b6 b7  Sign
w0       45        0  1  0  1  1  0  1   +
w1      -74        1  0  0  1  0  1  0   -
w2       21        0  0  1  0  1  0  1   +
w3       14        0  0  0  1  1  1  0   +
w4       -4        0  0  0  0  1  0  0   -
w5      -18        0  0  1  0  0  1  0   -
w6        4        0  0  0  0  1  0  0   +
w7       -1        0  0  0  0  0  0  1   -
Conventional Coding (figure): the same bit table, coded one coefficient at a time with all bits b1..b7 and the sign • First: w0 • Second: w1 • Third: w2 • Values shown after three steps: 46, -74, 22; the remaining coefficients are still 0
Embedded Coding (figure): the same bit table, coded one bit plane at a time across w0..w7 • First: b1 • Second: b2 • Third: b3 • After three passes each coefficient is known to lie in a value range, e.g. w0 in 32..47 (value 40), w1 in -79..-64 (value -72), w2 in 16..31 (value 24); coefficients that are still insignificant have value 0
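The difference between the two coding orders can be made concrete with a small sketch of bit-plane extraction. The helper names `bit_planes` and `known_range` are hypothetical, and the sketch assumes that a coefficient's sign becomes known as soon as its first nonzero bit is decoded.

```python
def bit_planes(coeffs, n_bits=7):
    """Return bit planes b1..bn (most significant first) of |coefficient|."""
    return [
        [(abs(c) >> b) & 1 for c in coeffs]
        for b in range(n_bits - 1, -1, -1)
    ]


def known_range(coeff, passes, n_bits=7):
    """Interval a decoder can infer after `passes` bit planes."""
    shift = n_bits - passes
    prefix = abs(coeff) >> shift
    if prefix == 0:
        # Still insignificant: magnitude below 2**shift, sign unknown.
        bound = (1 << shift) - 1
        return (-bound, bound)
    lo = prefix << shift
    hi = lo + (1 << shift) - 1
    return (lo, hi) if coeff >= 0 else (-hi, -lo)


# Usage: the first column of the coefficient block from the slides.
column = [45, -74, 21, 14, -4, -18, 4, -1]
planes = bit_planes(column)
ranges = [known_range(c, 3) for c in column[:3]]
```

Truncating the stream after any plane still leaves every coefficient with a usable range, which is what makes the bitstream embedded.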
Audio Masking (figure, plotted against frequency over a critical band and its neighboring band) • Labels: Signal, Masking Threshold, Maximum Mask, Noise Level, Signal-to-mask ratio, Noise-to-mask ratio
Psychoacoustic Masking • Traditional approach (explicit masking, all existing approaches) • Calculate the mask • Transmit the mask • Modify transform coefficients (or coding approach) according to the masking • Encode the transform coefficients • Note • Mask modifies the coding content
Implicit Psychoacoustic Masking • Key • Mask modifies the coding order, the content is the same • Implicit masking • Calculate the static masking (Fletcher-Munson threshold) • Encode the MSB of the transform coefficients • Calculate the mask based on the MSB of the coefficients • Modify coding order • Encode the next most important part of the coefficients • Repeat the process
Embedded Coding with Implicit Psychoacoustic Masking (figure, first pass): bit table b1..b7 and sign for w0..w7, with coefficients marked significant or insignificant • w1 becomes significant (value -96, range -127..-64) • The other coded coefficients are insignificant (value 0, range -63..63) • Two coefficients are held back by the mask (value 0, range -127..127)
Embedded Coding with Implicit Psychoacoustic Masking (figure, first and second passes) • w0 becomes significant (value 48, range 32..63) • w1 stays at value -96, range -127..-64 • The other coded coefficients are insignificant (value 0, range -31..31) • The two masked coefficients are one pass behind (value 0, range -63..63)
Context Modeling • Context • Zero coding • Significant statuses of neighbor coefficients • Refinement • Whether it is the 1st refinement pass • Significant statuses of neighbor coefficients • Sign • Neighbor signs
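A zero-coding context driven by neighbor significance can be sketched as below. This is a simplified stand-in: the neighborhood (two coefficients on each side in frequency) and the use of a plain count as the context index are assumptions, and the context numbers used by the actual coder are not given in the slides.

```python
def zero_coding_context(significant, i, reach=2):
    """Context index for coefficient i: count of significant neighbors.

    `significant` is a list of booleans, one per coefficient, saying whether
    a nonzero bit has already been coded for that coefficient.
    """
    lo = max(0, i - reach)
    hi = min(len(significant), i + reach + 1)
    return sum(1 for j in range(lo, hi) if j != i and significant[j])


# Usage: coefficients 0 and 3 are already significant.
status = [True, False, False, True, False, False]
contexts = [zero_coding_context(status, i) for i in range(len(status))]
```

The arithmetic coder then keeps a separate probability estimate per context, so bits surrounded by significant neighbors are coded with a different model than bits in quiet regions.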
After Implicit Psychoacoustic Masking & Context Modeling (figure: the block of coefficients shown earlier) • To be encoded (automatically generated) • Bit: 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 …… • Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ……
Arithmetic Coding – Illustration (QM Coder used) • What is arithmetic coding • (figure) The symbols S0=0, S1=1, S2=0, with probabilities P0, P1, P2, successively subdivide the interval from 0 to 1 and leave the interval A • Coding result: the shortest binary bitstream (0.100 in the figure) which ensures that the interval (B, C), from B=0.100 0000000 to C=0.100 1111111, satisfies (B, C) ⊂ A
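The interval idea can be sketched with exact fractions. This is an idealized illustration, not the QM coder (which uses fixed-precision approximations and adaptive probability estimates); the function names, the convention that symbol 0 takes the lower part of the interval, and the probabilities in the usage lines are all assumptions.

```python
from fractions import Fraction
import math


def narrow(symbols, p_zero):
    """Shrink [0, 1) once per binary symbol; p_zero[i] is P(symbol i == 0)."""
    low, width = Fraction(0), Fraction(1)
    for s, p in zip(symbols, p_zero):
        if s == 0:
            width = width * p
        else:
            low = low + width * p
            width = width * (1 - p)
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string b such that every number 0.b... lies in [low, high)."""
    k = 1
    while True:
        scale = 1 << k
        n = math.ceil(low * scale)  # smallest k-bit fraction >= low
        if Fraction(n + 1, scale) <= high:
            return format(n, "0{}b".format(k))
        k += 1


# Usage: the symbol sequence 0, 1, 0 from the slide with assumed probabilities.
half = Fraction(1, 2)
interval = narrow([0, 1, 0], [half, half, half])
code_uniform = shortest_code(*interval)
skewed = narrow([0, 0, 0], [Fraction(3, 4)] * 3)
code_skewed = shortest_code(*skewed)
```

With uniform probabilities the code is as long as the input, while the skewed case spends two bits on three likely symbols, which is where the compression comes from.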
Entropy Coder (Summary) • Output: Bitstream and R-D curve (distortion D versus rate R)
Speed-Up Issues • Context Modeling • Use stored context • Update context when a coefficient becomes significant • Implicit Masking • Fast calculation of energy in a critical band • Lookup table converts energy to mask • R-D curve calculation • Lookup table calculation of distortion • Context entropy coder • QM coder • Run-length Rice coder
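The run-length Rice option can be illustrated with a textbook Golomb-Rice code applied to zero-run lengths. This is a sketch under assumptions: the slides do not describe the coder's parameter adaptation or bit conventions, so a fixed parameter `k`, a unary prefix of ones terminated by a zero, and the helper names below are all hypothetical.

```python
def zero_runs(bits):
    """Split a bit sequence into lengths of zero runs ended by a 1.

    Returns (runs, trailing), where trailing counts zeros after the last 1.
    """
    runs, run = [], 0
    for b in bits:
        if b == 0:
            run += 1
        else:
            runs.append(run)
            run = 0
    return runs, run


def rice_encode(n, k):
    """Rice code of n >= 0: unary quotient, then k-bit remainder."""
    bits = "1" * (n >> k) + "0"
    if k:
        bits += format(n & ((1 << k) - 1), "0{}b".format(k))
    return bits


def rice_decode(bits, k):
    """Decode a single Rice codeword produced by rice_encode."""
    q = bits.index("0")
    r = int(bits[q + 1:q + 1 + k], 2) if k else 0
    return (q << k) | r


# Usage: the start of the "Bit:" sequence from the context-modeling slide.
runs, trailing = zero_runs([0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0])
codewords = [rice_encode(r, 2) for r in runs]
```

Compared with a binary arithmetic coder, this needs only shifts and table-free bit writes, which fits the slide's focus on speed.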
Bitstream Assembly • Input: • Bitstream • R-D curve • Output: • Assembled bitstream • Companion file