MPEG Audio Compression by V. Loumos
Introduction • Moving Picture Experts Group (MPEG) • International Organization for Standardization (ISO) • First international standard for high-fidelity audio compression • Part of a multi-part standard for • Video compression • Audio compression • Audio, video and data synchronization at an aggregate rate of 1.5 Mbit/sec
MPEG Audio • Physically Lossy compression algorithm • Perceptually lossless, transparent algorithm • Exploits perceptual properties of human ear • Psychoacoustic modeling
Medium quality audio compression • Code Excited Linear Prediction (CELP), for speech coding • μ-law • Adaptive Differential Pulse Code Modulation (ADPCM)
The MPEG Audio standard • Ensures inter-operability • Defines coded bitstream syntax • Defines decoding process • Guarantees decoder’s accuracy
MPEG audio acceptance • Wide acceptance • Large number of MPEG audio codecs produced • Stand-alone, Mobile phone add-ons etc
MPEG audio features • No assumptions about the nature of the audio source • Exploitation of human auditory system perceptual limitations • Removal of perceptually irrelevant parts of audio signal
MPEG audio sampling rates • 32 kHz • 44.1 kHz • 48 kHz
MPEG audio supports • One or two audio channels in • a monophonic mode for a single audio channel • a dual monophonic mode for two independent audio channels • a stereo mode with sharing of bits • a joint stereo mode based on the correlation or the phase difference between channels
MPEG audio supports • Several predefined fixed bit rates ranging from 32 to 224 kbits/sec per channel • Free bit rate other than the predefined rates
MPEG audio offers • Three independent layers of compression • A wide range of tradeoffs between codec complexity and compressed audio quality
MPEG Audio Layer I • Simplest coding • Suitable for bit rates above 128 kbits/sec per channel • Philips Digital Compact Cassette
MPEG Audio Layer II • Intermediate complexity • Bit rates around 128 kbits/sec per channel • Digital Audio Broadcasting (DAB) • Synchronized Video and Audio on CD-ROM • Full motion CD-I • Video-CD
MPEG Audio Layer III • Most complex coding • Best audio quality • Bit rates around 64 kbits/sec per channel • Suitable for audio over ISDN
MPEG Audio extras • All three layers allow single chip real-time decoder implementation • Optional Cyclic Redundancy Check (CRC) error detection • Ancillary data may be included in the bit stream
Overview • Quantization is the key to MPEG audio compression • Transparent, perceptually lossless compression • Original and 6-to-1 compressed audio clips are indistinguishable to listeners • Clips: stereo, 16 bit/sample, sampled at 48 kHz, compressed to 256 kbits/sec
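The 6-to-1 figure follows directly from the numbers on this slide. A small Python check, with function and variable names chosen here purely for illustration:

```python
def pcm_bit_rate(channels: int, bits_per_sample: int, sample_rate_hz: int) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return channels * bits_per_sample * sample_rate_hz


# Values quoted on the slide: stereo, 16 bit/sample, 48 kHz, coded at 256 kbits/sec.
uncompressed_bps = pcm_bit_rate(channels=2, bits_per_sample=16, sample_rate_hz=48_000)
compressed_bps = 256_000

compression_ratio = uncompressed_bps / compressed_bps
print(uncompressed_bps)   # 1536000
print(compression_ratio)  # 6.0
```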
The Polyphase Filter Bank • Key component common to all layers • Divides the audio signal into 32 equal-width frequency subbands • The filters provide good time and reasonable frequency resolution • Critical bands associated with psychoacoustic models
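Because the 32 subbands are equal in width, their edges depend only on the sampling rate. The sketch below assumes the subbands evenly split the range from 0 Hz to half the sampling rate; the function name is illustrative and not taken from the standard.

```python
NUM_SUBBANDS = 32  # fixed by the polyphase filter bank


def subband_edges(sample_rate_hz: float) -> list[tuple[float, float]]:
    """Return (low, high) edges in Hz for each of the 32 equal-width subbands."""
    nyquist = sample_rate_hz / 2
    width = nyquist / NUM_SUBBANDS
    return [(k * width, (k + 1) * width) for k in range(NUM_SUBBANDS)]


# The three sampling rates supported by MPEG audio.
for rate in (32_000, 44_100, 48_000):
    edges = subband_edges(rate)
    width = edges[0][1] - edges[0][0]
    print(f"{rate} Hz sampling: {width:.2f} Hz per subband")
```

At 48 kHz sampling each subband is 750 Hz wide, and at 32 kHz it is 500 Hz wide.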
Psychoacoustics • The aim is to remove acoustically irrelevant parts of the audio signal • The human auditory system is unable to hear quantization noise under conditions of auditory masking • Masking occurs whenever a strong signal makes weaker audio signals at nearby frequencies imperceptible
Critical bands • The human auditory system has a limited, frequency dependent resolution • This frequency dependence is expressed in the form of critical band widths, less than 100 Hz for low and more than 4 kHz for high frequencies • The human ear blurs the various signal components inside a critical band
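To get a feel for how critical band width grows with frequency, the sketch below uses the Zwicker-Terhardt analytic fit. This formula is an assumption brought in for illustration; it is not part of the MPEG standard or of these slides, and it bottoms out near 100 Hz, so it only roughly matches the figures quoted above. The 750 Hz constant is the width of one equal subband at 48 kHz sampling.

```python
def critical_bandwidth_hz(freq_hz: float) -> float:
    """Approximate critical bandwidth in Hz (Zwicker-Terhardt fit, an assumption here)."""
    f_khz = freq_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


SUBBAND_WIDTH_HZ = 750.0  # one of 32 equal subbands at 48 kHz sampling

for f in (100, 1_000, 4_000, 16_000):
    cb = critical_bandwidth_hz(f)
    print(f"{f:>6} Hz: critical band ~{cb:.0f} Hz, "
          f"{SUBBAND_WIDTH_HZ / cb:.2f} critical bands per subband")
```

Under this fit a single 750 Hz subband spans several critical bands at low frequencies and only a fraction of one at high frequencies.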
Noise masking threshold • Human ear resolving power is frequency dependent • Noise masking threshold, at any frequency, depends only on the signal energy within a limited bandwidth neighborhood of that frequency
The Psychoacoustic Model • Analyzes the audio signal and computes the amount of noise masking as a function of frequency • The encoder decides how best to represent the input signal with a minimum number of bits
Basic Steps • Time align audio data • Convert audio to frequency domain representation • Process spectral values into tonal and non-tonal components • Apply a spreading function • Set a lower bound for threshold values • Find the threshold values for each subband • Calculate the signal to mask ratio
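The last step, the signal-to-mask ratio (SMR), is a per-subband difference in decibels between signal level and masking threshold. The sketch below covers only that final step and assumes the earlier steps have already produced a threshold for each subband. The function names and the sample power values are hypothetical.

```python
import math


def to_db(power: float) -> float:
    """Convert linear power to decibels, with a small floor to avoid log(0)."""
    return 10.0 * math.log10(max(power, 1e-12))


def signal_to_mask_ratios(signal_power, mask_threshold_power):
    """SMR per subband, in dB: signal level minus masking threshold."""
    return [to_db(s) - to_db(m) for s, m in zip(signal_power, mask_threshold_power)]


# Hypothetical per-subband values (linear power), made up for illustration.
signal_power = [1000.0, 100.0, 1.0]
mask_power = [10.0, 100.0, 10.0]

smr = signal_to_mask_ratios(signal_power, mask_power)
for band, value in enumerate(smr):
    print(f"subband {band}: SMR = {value:+.1f} dB")
```

A large positive SMR marks a subband whose signal stands well above the mask and therefore needs more bits. A subband with an SMR at or below zero is already masked and can be coded coarsely.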
MPEG Layer III coding • Based on the Layer I and II filter bank • Compensation of filter deficiencies by processing outputs with a Modified Discrete Cosine Transform (MDCT)
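As a rough illustration of the transform itself, here is a direct, unoptimized MDCT that maps 2N input samples to N coefficients. It is a sketch of the textbook definition only: the windowing, block switching and fast algorithms used by a real Layer III encoder are left out, and the test signal is made up.

```python
import math


def mdct(block):
    """Direct O(N^2) MDCT: 2N input samples -> N coefficients."""
    two_n = len(block)
    if two_n % 2:
        raise ValueError("block length must be even")
    n = two_n // 2
    return [
        sum(
            block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(two_n)
        )
        for k in range(n)
    ]


# A Layer III long block takes 36 subband samples and yields 18 coefficients.
# The input here is an arbitrary sine, and no window is applied.
samples = [math.sin(2 * math.pi * 3 * i / 36) for i in range(36)]
coeffs = mdct(samples)
print(len(coeffs))  # 18
```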
Layer III enhancements • Alias reduction • Non-uniform quantization • Scalefactor bands • Entropy coding of data values • Use of a “bit reservoir”
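Non-uniform quantization can be sketched with a power-law quantizer, where magnitudes are raised to the power 3/4 before rounding and to the power 4/3 on reconstruction. This is a simplified illustration: the `step` parameter is a stand-in chosen here, and the exact rounding offset and step-size derivation of the standard are omitted.

```python
def quantize(value: float, step: float) -> int:
    """Power-law quantizer: compress the magnitude with x**0.75, then round."""
    magnitude = (abs(value) / step) ** 0.75
    level = int(magnitude + 0.5)  # round half up
    return level if value >= 0 else -level


def dequantize(level: int, step: float) -> float:
    """Inverse mapping: expand the magnitude with x**(4/3)."""
    magnitude = abs(level) ** (4.0 / 3.0) * step
    return magnitude if level >= 0 else -magnitude


# Reconstruction levels spread out as values grow, so large values
# are represented more coarsely than small ones.
for x in (1.0, 10.0, 100.0, 1000.0):
    q = quantize(x, step=1.0)
    print(f"x={x:7.1f}  level={q:4d}  reconstructed={dequantize(q, 1.0):8.2f}")
```

The widening spacing between reconstruction levels means quantization noise grows with signal magnitude, which is the sense in which the quantizer is non-uniform.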