A Tutorial on MPEG/Audio Compression By Davis Pan Presented by Adam Horne and Rich Finley
Section 1: Outline • What is MPEG/audio compression? • Features and Applications
What is MPEG/audio compression? • MPEG/audio is the international standard for digital compression of high-fidelity audio. • MPEG/audio is the result of over 3 years of research by the Moving Picture Experts Group (MPEG). • The MPEG/audio standard was adopted by the International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC) in 1992.
What is MPEG/audio compression? • The standard defines strict rules to ensure inter-operability. • Defines the syntax for encoding and decoding. • Tests accuracy of decoding. • Ensures any fully compliant decoder can decode any MPEG/audio file with a predictable result.
Features and Applications • Compresses without regard to the source of the audio data. • Removes the parts of the audio signal that are imperceptible to the human ear, so the resulting distortion is inaudible. • Compression is lossy, but losses are perceptually irrelevant.
Features and Applications • Audio sampling rates of 32, 44.1, or 48 kHz. • One or two audio channels in one of 4 modes: • Monophonic mode for a single audio channel • Dual-monophonic mode for two independent audio channels • Stereo mode for stereo channels that share bits between them • Joint stereo mode, which exploits the correlations between the two channels and the irrelevancy of phase differences between the channels
Features and Applications • Several predefined bit rates from 32 to 224 kbits/s per channel. Compression factors range from 2.7 to 24. • Three different layers of compression: • Layer I is the simplest and is best suited to bit rates above 128 kbits/s per channel. • Layer II is more complex and targets bit rates around 128 kbits/s per channel. • Layer III is the most complex, offers the best audio quality, and targets bit rates around 64 kbits/s per channel.
Features and Applications • An optional Cyclic Redundancy Check (CRC) error detection code is supported. • Ancillary data may be included in the bitstream. • Allows random access, fast forward, and rewind.
Section 2: Outline • The Polyphase Filter Bank • Psychoacoustics • Layer Coding Options • Bit Allocation • Stereo Redundancy Coding
Section 2 Overview The key to MPEG/audio compression is quantization. Although quantization is lossy, this algorithm can give "transparent", perceptually lossless compression. Studies show that audio can be compressed up to a 6:1 ratio before people can hear the loss. During compression, the signal-to-mask ratios are used to decide how to apportion the total number of code bits available for the quantization of the subband signals, in order to minimize the audibility of the quantization noise.
Compression Map [Figure: block diagram with the blocks Audio Input Stream, Filter Bank, Psychoacoustic Model (producing the Signal-to-Mask Ratios), Bit or Noise Allocation Block (minimal audibility of the quantization noise), Re-Format, and Coded Bit Stream.]
The Polyphase Filter Bank The filter bank divides the audio signal into 32 equal-width frequency subbands. The design of the filter bank is a good compromise with three notable concessions. • The equal widths of the subbands do not accurately reflect the human auditory system's frequency-dependent behavior. • The filter bank and its inverse are not lossless transformations. Even without quantization, the inverse transformation cannot perfectly recover the original signal. However, by design the error introduced by the filter bank is small and inaudible. • Adjacent filter bands have a significant frequency overlap. A signal at a single frequency can affect two adjacent filter bank outputs.
Equation for the filter bank outputs

$$s_t[i] = \sum_{k=0}^{63} \sum_{j=0}^{7} M[i][k] \, \bigl( C[k+64j] \, x[k+64j] \bigr)$$

where $i$ is the subband index and ranges from 0 to 31, $s_t[i]$ is the filter output sample for subband $i$ at time $t$, where $t$ is an integer multiple of 32 audio sample intervals, $C[n]$ is one of 512 coefficients of the analysis window defined in the standard, $x[n]$ is an audio input sample read from a 512 sample buffer, and $M[i][k] = \cos\left[\frac{(2i+1)(k-16)\pi}{64}\right]$ are the analysis matrix coefficients.
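The filter bank equation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the window passed in below is an all-ones placeholder, not the 512 coefficients C[n] defined in the standard. Because M[i][k] does not depend on j, the sketch first sums the windowed samples over j and then applies the analysis matrix, which gives the same result as the double sum.

```python
import math

NUM_SUBBANDS = 32
WINDOW_LENGTH = 512


def analysis_matrix():
    """M[i][k] = cos((2i + 1)(k - 16) * pi / 64), for i in 0..31 and k in 0..63."""
    return [
        [math.cos((2 * i + 1) * (k - 16) * math.pi / 64) for k in range(64)]
        for i in range(NUM_SUBBANDS)
    ]


def filter_bank_outputs(x, window):
    """Return the 32 subband samples for one 512-sample input buffer x."""
    if len(x) != WINDOW_LENGTH or len(window) != WINDOW_LENGTH:
        raise ValueError("x and window must both hold 512 values")
    # Inner sum over j: window the buffer and fold it into 64 partial sums.
    partial = [
        sum(window[k + 64 * j] * x[k + 64 * j] for j in range(8))
        for k in range(64)
    ]
    # Outer sum over k: apply the analysis matrix to the partial sums.
    m = analysis_matrix()
    return [
        sum(m[i][k] * partial[k] for k in range(64))
        for i in range(NUM_SUBBANDS)
    ]


# Placeholder window (assumption): all ones, NOT the standard's coefficients.
placeholder_window = [1.0] * WINDOW_LENGTH
impulse = [0.0] * WINDOW_LENGTH
impulse[16] = 1.0  # k = 16 makes every cosine term equal cos(0)
outputs = filter_bank_outputs(impulse, placeholder_window)
```

With a real encoder the buffer would be refreshed with 32 new samples before each call, so one call yields one output sample per subband.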
Psychoacoustics Definition • Psychoacoustics is the study of human perception of sound. • Psychoacoustics deals with relations between perception of sound and physical properties of sound waves.* *http://sound.eti.pg.gda.pl/SRS/psychoacoust.html
Psychoacoustics The MPEG/audio algorithm compresses the audio data in large part by removing the acoustically irrelevant parts of the audio signal. That is, it takes advantage of the human auditory system's inability to hear quantization noise under conditions of auditory masking.
The Psychoacoustic Model • The psychoacoustic model analyzes the audio signal and computes the amount of noise masking that is available as a function of frequency.
Example of Psychoacoustic Model Steps • Time align audio data. • Convert audio to a frequency domain representation. • Process spectral values in groupings related to critical band widths. • Separate spectral values into tonal and non-tonal components. • Apply a spreading function. • Set a lower bound for the threshold values. • Find the masking threshold for each subband. • Calculate the signal-to-mask ratio.
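The last step of the list above can be illustrated with a short Python sketch. It assumes that each subband's signal energy and masking threshold are already available as linear (not decibel) values, and the function name is hypothetical; the earlier steps of the model are not shown.

```python
import math


def signal_to_mask_ratios_db(signal_energy, mask_threshold):
    """Signal-to-mask ratio per subband, in dB, from linear energy values."""
    if len(signal_energy) != len(mask_threshold):
        raise ValueError("one masking threshold is needed per subband")
    return [
        10.0 * math.log10(s / m)
        for s, m in zip(signal_energy, mask_threshold)
    ]


# Two hypothetical subbands: one well above its mask, one exactly at it.
smr_db = signal_to_mask_ratios_db([100.0, 1.0], [1.0, 1.0])
```

A subband with a high ratio needs more code bits to keep its quantization noise below the mask, while a ratio at or below 0 dB suggests the subband's signal is itself masked.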
Layer Coding Options • The MPEG/audio standard has 3 distinct layers for compression. • Layer I forms the most basic algorithm, while Layer II and Layer III are enhancements that use some elements found in Layer I.
Layer 1 Coding Options The Layer I algorithm codes audio in frames of 384 audio samples. It does so by grouping together 12 samples from each of the 32 subbands, as shown in figure 18. Besides the code for audio data, each frame contains a header, an optional Cyclic Redundancy Code (CRC) error check word, and possibly ancillary data.
Layer 2 Coding Options • The Layer II algorithm codes the audio data in larger groups and imposes some restrictions on the possible bit allocations for values from the middle and higher subbands. • It also represents the bit allocation, the scale factor values, and the quantized samples with a more compact code. Layer II gets better audio quality by saving bits in these areas so more code bits are available to represent the quantized subband values. • The Layer II encoder forms frames of 1152 samples per audio channel. Whereas Layer I codes data in single groups of 12 samples for each subband, Layer II codes data in 3 groups of 12 samples for each subband.
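The frame sizes quoted for Layer I and Layer II follow from simple arithmetic, worked through in the Python sketch below. The constant and function names are illustrative only; the 48 kHz rate is one of the three sampling rates listed earlier.

```python
SUBBANDS = 32
SAMPLES_PER_GROUP = 12


def samples_per_frame(groups_per_subband):
    """Audio samples per channel covered by one frame."""
    return groups_per_subband * SAMPLES_PER_GROUP * SUBBANDS


def frame_duration_ms(groups_per_subband, sampling_rate_hz):
    """Length of one frame in milliseconds at the given sampling rate."""
    return 1000.0 * samples_per_frame(groups_per_subband) / sampling_rate_hz


layer1_samples = samples_per_frame(1)   # 1 group of 12 samples per subband
layer2_samples = samples_per_frame(3)   # 3 groups of 12 samples per subband
layer1_ms = frame_duration_ms(1, 48000)
layer2_ms = frame_duration_ms(3, 48000)
```

At 48 kHz a Layer I frame covers 8 ms of audio and a Layer II frame covers 24 ms, so Layer II spreads its header and side information over three times as many samples.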
Layer 3 Coding Options • The Layer III algorithm is a much more refined approach derived from the ASPEC and OCF algorithms. • Layer III compensates for some of the filter bank deficiencies by processing the filter outputs with a Modified Discrete Cosine Transform (MDCT). • The Layer III encoder can partially cancel some of the aliasing caused by the polyphase filter bank once the subband components are subdivided in frequency.
Layer 3 Coding Advantages • Alias reduction. Layer III specifies a method of processing the MDCT values to remove some artifacts caused by the overlapping bands of the polyphase filter bank. • Nonuniform quantization. The Layer III quantizer raises its input to the 3/4 power before quantization to provide a more consistent signal-to-noise ratio over the range of quantizer values. The requantizer in the MPEG/audio decoder relinearizes the values by raising its output to the 4/3 power.
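The 3/4 and 4/3 power relationship can be sketched as follows. This is a simplified illustration, not the quantizer specified in the standard: the single `step` parameter and the plain rounding are assumptions, and the function names are hypothetical.

```python
import math


def quantize(value, step):
    """Compress the magnitude with a 3/4 power law, then round to an integer."""
    magnitude = round((abs(value) / step) ** 0.75)
    return int(math.copysign(magnitude, value))


def requantize(index, step):
    """Decoder side: relinearize by raising the magnitude to the 4/3 power."""
    return math.copysign(abs(index) ** (4.0 / 3.0) * step, index)


index = quantize(16.0, 1.0)             # 16 ** 0.75 == 8
reconstructed = requantize(index, 1.0)  # 8 ** (4/3) is approximately 16
```

Because of the power law, the spacing between reconstruction levels grows with magnitude, so large values are quantized more coarsely in absolute terms than small ones.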
Layer 3 Coding Advantages (Continued) • Scalefactor bands. Unlike Layers I and II, where there can be a different scalefactor for each subband, Layer III uses scalefactor bands. • Entropy coding of data values. Layer III uses variable-length Huffman codes to encode the quantized samples to get better data compression. After quantization, the encoder orders the 576 (32 subbands * 18 MDCT coefficients/subband) quantized MDCT coefficients in a predetermined order. • Use of a "bit reservoir". The design of the Layer III bitstream better fits the encoder's time-varying demand on code bits. As with Layer II, Layer III processes the audio data in frames of 1152 samples.
Bit Allocation • The bit allocation process determines the number of code bits to be allocated to each subband based on information from the psychoacoustic model. For Layers I and II, this process starts by computing the mask-to-noise ratio as given by the following equation: $\mathrm{MNR}_{dB} = \mathrm{SNR}_{dB} - \mathrm{SMR}_{dB}$, where: • $\mathrm{MNR}_{dB}$ is the mask-to-noise ratio, • $\mathrm{SNR}_{dB}$ is the signal-to-noise ratio, and • $\mathrm{SMR}_{dB}$ is the signal-to-mask ratio from the psychoacoustic model. • All values are in decibels.
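One way to use the mask-to-noise ratio is a greedy loop that keeps giving a bit to the subband whose ratio is currently lowest. The Python sketch below illustrates that idea under several assumptions that do not come from the slides: each extra bit is taken to add about 6.02 dB of SNR (the standard uses its own tables), each bit costs one unit of a hypothetical budget, and scale factor and header bits are ignored.

```python
def mask_to_noise_db(snr_db, smr_db):
    """MNR = SNR - SMR, with all values in decibels."""
    return snr_db - smr_db


def allocate_bits(smr_db, total_bits, max_bits=15, snr_per_bit_db=6.02):
    """Greedy sketch: repeatedly give one bit to the subband with the lowest MNR."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        candidates = [b for b in range(len(bits)) if bits[b] < max_bits]
        if not candidates:
            break  # every subband already has the maximum number of bits
        worst = min(
            candidates,
            key=lambda b: mask_to_noise_db(bits[b] * snr_per_bit_db, smr_db[b]),
        )
        bits[worst] += 1
    return bits


# Two hypothetical subbands with SMRs of 12 dB and 0 dB, and 3 bits to spend.
allocation = allocate_bits([12.0, 0.0], 3)
```

In this toy case the first subband receives two bits before its ratio rises above that of the second subband, which then receives the remaining bit.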
Bit Allocation (Continued) The Layer III encoder uses noise allocation. The encoder iteratively varies the quantizers in an orderly way, quantizes the spectral values, counts the number of Huffman code bits required to code the audio data and calculates the resulting noise. If, after quantization, there are still scalefactor bands with more than the allowed distortion, the encoder amplifies the values in those scalefactor bands and effectively decreases the quantizer step size for those bands.
Bit Allocation (Continued) After this, the process repeats. The process stops if a time-limit is reached or any of these three conditions is true: 1. None of the scalefactor bands have more than the allowed distortion. 2. The next iteration would cause the amplification for any of the bands to exceed the maximum allowed value. 3. The next iteration would require all the scalefactor bands to be amplified.
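The stopping logic of this loop can be sketched as a toy model in Python. Everything here is an assumption made for illustration: distortion is measured as the mean squared error of a uniform rounding quantizer, each amplification step scales a band by the square root of 2, an iteration cap stands in for the time limit, and the Huffman bit counting is left out entirely.

```python
def band_distortion(values, step, amp_steps):
    """Mean squared quantization error of one band after amplification."""
    gain = 2.0 ** (amp_steps / 2.0)  # assumed gain per amplification step
    error = 0.0
    for v in values:
        reconstructed = round(v * gain / step) * step / gain
        error += (v - reconstructed) ** 2
    return error / len(values)


def noise_allocation(bands, allowed, step=1.0, max_amp=4, max_iterations=50):
    """Amplify bands with too much distortion until a stop condition holds."""
    amp = [0] * len(bands)
    for _ in range(max_iterations):  # stands in for the time limit
        over = [
            b for b, values in enumerate(bands)
            if band_distortion(values, step, amp[b]) > allowed[b]
        ]
        if not over:
            return amp, "within allowed distortion"       # condition 1
        if any(amp[b] + 1 > max_amp for b in over):
            return amp, "amplification limit"             # condition 2
        if len(over) == len(bands):
            return amp, "all bands need amplification"    # condition 3
        for b in over:
            amp[b] += 1  # effectively a smaller step size for this band
    return amp, "iteration limit"


result = noise_allocation([[0.3], [1.0]], [0.01, 0.01])
```

Amplifying a band before quantization has the same effect as shrinking its quantizer step, which is why the distortion of the first band falls as its amplification count rises.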
Stereo Redundancy Coding • Two types of stereo redundancy coding are supported: • Intensity stereo coding: Supported by all layers. • Middle/Side (MS) stereo coding: Supported only by Layer III. • Both take advantage of another property of the human auditory system: above 2 kHz and within each critical band, perception of stereo imaging is based more on the temporal envelope than the temporal fine structure.
Intensity Stereo Coding • Upper frequency subband outputs are represented as a single signal instead of independent left and right channels. • The decoder reconstructs the channels based on the signal and scale factors for the left and right channels. • The spectral shape of the channels is the same within each subband, but the magnitude is different.
Middle/Side (MS) Stereo Coding • Certain frequency ranges are encoded as: • Middle channel: Sum of left and right channels • Side channel: Difference of left and right channels • The side channel is compressed further using specially tuned threshold values.
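The sum and difference transform is easy to show in Python. The slide does not state a normalization, so the factor of 1/2 below is an assumption chosen so that decoding is a plain addition and subtraction; the function names are hypothetical.

```python
def ms_encode(left, right):
    """Turn left/right samples into middle (sum) and side (difference) signals."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover the left and right samples from the middle and side signals."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


mid, side = ms_encode([1.0, 0.5], [0.5, 0.5])
left, right = ms_decode(mid, side)
```

When the two channels are similar, the side signal stays close to zero, which is what makes it cheap to code.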
Section 3: Future MPEG/audio Standards: Phase 2 • MPEG-2 audio became an international standard in November 1994. • This further extends the original MPEG/audio standard.
Extensions in MPEG-2 • Multichannel audio support: up to 5 high-fidelity channels plus 1 low-frequency enhancement channel (5.1 channels), usable for High Definition Television and digital movies. • Multilingual audio support: up to 7 additional channels for commentary.
Extensions in MPEG-2 (cont.) • Lower compressed audio bit rates: supports bit rates as low as 8 kbits/s. • Lower audio sampling rates: accommodates 16, 22.05, and 24 kHz rates as well as the 32, 44.1, and 48 kHz rates used in MPEG-1. Commentary channels can have a sampling rate half that of the high-fidelity channels.
MPEG-2 Compatibility • MPEG-2 decoders can decode MPEG-1 audio streams. • MPEG-1 decoders can decode the two main channels of an MPEG-2 audio stream. Weighted versions of the 5.1 channels are combined into the left and right channels used by the MPEG-1 decoder. Additional decoding information is stored in auxiliary data streams.
Future Work • The MPEG group is working on a new standard that is not backward compatible and that allows better compression for multichannel audio encoding.