Advances in Speech and Audio Compression

Allen Gersho (1994) • Presented by Bin Yu

Agenda • Introduction • Human vocal tract model • Speech coder classification • LPAS Speech Coding • Standardization

Human Speech Production System • Air flow forced from the lungs through the vocal tract • Vocal tract acts as a filter with resonances (called formants) • Filter introduces short-term correlations between samples • Speech sound classes • Voiced sounds • Vocal cord vibration • Long-term periodicity (pitch) • Unvoiced sounds • Constriction in the vocal tract • No long-term periodicity • Plosive sounds • Sudden release of air pressure built up behind a closure in the mouth
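The source-filter view on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: a single two-pole resonator stands in for one formant, an impulse train stands in for vocal cord vibration, and white noise stands in for turbulent unvoiced excitation. The function names, the 8 kHz sampling rate and the formant values used below are illustrative assumptions.

```python
import math
import random


def all_pole_filter(excitation, a):
    """Filter excitation through 1/A(z) with A(z) = 1 - sum_k a[k-1] * z^-k.

    Difference equation: y[n] = x[n] + sum_{k=1..p} a[k-1] * y[n-k].
    """
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, ak in enumerate(a, start=1):
            if n >= k:
                acc += ak * y[n - k]
        y.append(acc)
    return y


def resonator_coeffs(freq_hz, bandwidth_hz, fs):
    """Coefficients of a two-pole resonator modelling a single formant."""
    r = math.exp(-math.pi * bandwidth_hz / fs)  # pole radius (< 1, so stable)
    theta = 2.0 * math.pi * freq_hz / fs          # pole angle
    return [2.0 * r * math.cos(theta), -r * r]


def voiced_excitation(n_samples, pitch_period):
    """Impulse train: stands in for periodic vocal cord vibration."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]


def unvoiced_excitation(n_samples, seed=0):
    """Gaussian white noise: stands in for turbulence at a constriction."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
```

For instance, all_pole_filter(voiced_excitation(400, 80), resonator_coeffs(500, 100, 8000)) gives 50 ms of a buzzy, vowel-like signal with a 100 Hz pitch (80 samples at 8 kHz) and one resonance near 500 Hz; swapping in unvoiced_excitation(400) gives a noisy, fricative-like signal through the same filter.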

Speech Coder Classification • Waveform coders • Pulse Code Modulation (PCM) • Sample the input waveform • Quantize each sample • Differential PCM (DPCM) • Sample the input waveform • Encode the difference between adjacent samples • Adaptive DPCM (ADPCM) • Adapt the quantizer step size to the speech statistics • High quality, high bitrate
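As a rough sketch of how DPCM and ADPCM differ (an illustration under stated assumptions, not a standard codec such as G.726): the encoder below predicts each sample as the previous reconstructed sample, quantizes the prediction error with a uniform quantizer, and optionally adapts the step size. The step multipliers (1.6 and 0.9), the code range and all function names are hypothetical choices.

```python
def quantize(value, step, max_code):
    """Uniform mid-tread quantizer returning an integer code in [-max_code, max_code]."""
    code = round(value / step)
    return max(-max_code, min(max_code, code))


def adapt_step(step, code, max_code, grow=1.6, shrink=0.9,
               min_step=1e-3, max_step=1e3):
    """Hypothetical backward step-size rule: expand after large codes, shrink after small ones."""
    factor = grow if abs(code) > max_code // 2 else shrink
    return min(max_step, max(min_step, step * factor))


def dpcm_encode(samples, step=1.0, max_code=7, adaptive=False):
    """Return (codes, reconstruction); predictor = previous reconstructed sample."""
    codes, recon = [], []
    prediction = 0.0
    for x in samples:
        code = quantize(x - prediction, step, max_code)
        prediction += code * step          # the decoder computes the same value
        codes.append(code)
        recon.append(prediction)
        if adaptive:
            step = adapt_step(step, code, max_code)
    return codes, recon


def dpcm_decode(codes, step=1.0, max_code=7, adaptive=False):
    """Mirror the encoder's predictor and step updates using only the codes."""
    out = []
    prediction = 0.0
    for code in codes:
        prediction += code * step
        out.append(prediction)
        if adaptive:
            step = adapt_step(step, code, max_code)
    return out
```

Because the decoder repeats the encoder's predictor and step-size updates from the transmitted codes alone, no extra side information about the step size has to be sent (backward adaptation).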

Speech Coder Classification (contd.) • Vocoders (source coders) • Linear prediction model of the human speech production system • Transmit model parameters rather than the waveform • Medium quality, low bitrate
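Linear prediction coefficients are commonly computed per frame with the autocorrelation method and the Levinson-Durbin recursion. The sketch below is one such implementation in plain Python, offered as an illustration rather than the method of any particular coder. It assumes the predictor convention x[n] ≈ a1·x[n-1] + ... + ap·x[n-p], so that A(z) = 1 - (a1·z^-1 + ... + ap·z^-p); the windowing that real coders apply before the autocorrelation is omitted, and the function names are hypothetical.

```python
def autocorrelation(frame, order):
    """r[k] = sum_n frame[n] * frame[n + k] for k = 0..order (no windowing)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for a[1..p] in x[n] ~ sum_k a[k] * x[n-k].

    Returns (coefficients, final prediction error energy).
    """
    a = [0.0] * (order + 1)   # a[0] is unused; a[1..order] are the predictor taps
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:        # silent frame or numerical breakdown: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err         # i-th reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc(frame, order):
    """LP coefficients of one frame via the autocorrelation method."""
    return levinson_durbin(autocorrelation(frame, order), order)
```

For example, the autocorrelation sequence [1, 0.5, 0.25] of a first-order process with coefficient 0.5 gives coefficients [0.5, 0.0] and residual energy 0.75 at order 2: the second coefficient correctly comes out as zero.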

LPAS – Linear-Prediction-based Analysis-by-Synthesis • Hybrid of waveform coders and vocoders • Uses the vocoder's linear prediction model • Carefully selects the excitation signal so that the synthesized output matches the original waveform • High quality, low bitrate!

LPAS – Linear-Prediction-based Analysis-by-Synthesis • How it works • Segment speech into frames (typically 20 ms long) • Find the LP filter parameters for each frame • Find the excitation that minimizes the error between the original and the synthesized speech • Perceptual weighting • Error is weighted so that accuracy is higher in spectral regions where speech energy is low (where coding noise is more audible) • Transmit the filter parameters and the excitation signal • Using vector quantization
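The analysis-by-synthesis loop can be sketched as follows. This is a simplified illustration under several assumptions: 8 kHz sampling (so a 20 ms frame is 160 samples), a small list of candidate excitations supplied by the caller, a gain fitted in closed form for each candidate, and plain squared error. Perceptual weighting and the synthesis filter's memory carried over from the previous frame are deliberately left out, and all names are hypothetical.

```python
FS = 8000                         # assumed sampling rate (narrowband speech)
FRAME_LEN = FS * 20 // 1000       # 20 ms frame -> 160 samples


def split_frames(signal, frame_len=FRAME_LEN):
    """Non-overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def synthesize(excitation, a):
    """All-pole synthesis 1/A(z), zero initial state: y[n] = x[n] + sum a[k-1] y[n-k]."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, ak in enumerate(a, start=1):
            if n >= k:
                acc += ak * y[n - k]
        y.append(acc)
    return y


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def analysis_by_synthesis(target, a, candidates):
    """Return (index, gain, squared_error) of the best candidate excitation.

    Each candidate is passed through the synthesis filter, scaled by its
    least-squares gain, and compared with the target speech frame.
    """
    best = None
    for index, c in enumerate(candidates):
        y = synthesize(c, a)
        energy = dot(y, y)
        if energy == 0.0:
            continue
        gain = dot(target, y) / energy
        err = sum((t - gain * s) ** 2 for t, s in zip(target, y))
        if best is None or err < best[2]:
            best = (index, gain, err)
    return best
```

With filter a = [0.5], a target produced by a pulse of height 2 at position 1, and candidates "unit pulse at 0" and "unit pulse at 1", the search returns index 1 with gain 2.0 and zero error. The key point is that candidates are judged on the synthesized output, not on the excitation itself.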

Vector Quantization • Example • Key challenge • Given a source distribution, how to select the codebook (*) and partitions (---) that give the smallest average distortion • Solution • Divide and conquer: split each codeword in two and re-optimize • Two codes → four → eight …
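The codeword-doubling procedure on this slide appears to describe the splitting approach of the Linde-Buzo-Gray (LBG) algorithm: start from the centroid of the training set, split every codeword into two slightly perturbed copies, then refine with Lloyd iterations (nearest-neighbour partition, then centroid update). The sketch below assumes squared-error distortion, a codebook size that is a power of two, an additive perturbation eps and a fixed number of refinement iterations; these parameters and the function names are illustrative choices.

```python
def sq_dist(u, v):
    return sum((p - q) ** 2 for p, q in zip(u, v))


def centroid(vectors):
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def nearest(vector, codebook):
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def lloyd(training, codebook, iterations):
    """Alternate nearest-neighbour partitioning and centroid updates."""
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in training:
            cells[nearest(v, codebook)].append(v)
        # An empty cell keeps its old codeword (a simple, assumed fallback).
        codebook = [centroid(cell) if cell else cw
                    for cell, cw in zip(cells, codebook)]
    return codebook


def lbg(training, size, eps=0.01, iterations=20):
    """Grow a codebook 1 -> 2 -> 4 -> ... by splitting, refining after each split."""
    codebook = [centroid(training)]
    while len(codebook) < size:
        codebook = [[x + s for x in cw] for cw in codebook for s in (eps, -eps)]
        codebook = lloyd(training, codebook, iterations)
    return codebook
```

With 1-D training vectors [0.0], [0.1], [0.9], [1.0], [10.0], [10.1], [10.9], [11.0], a size-4 codebook settles on the four pair means 0.05, 0.95, 10.05 and 10.95.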

LPAS Classification • Three classes • Multi-Pulse Excited (MPE) • Regular-Pulse Excited (RPE) • Code-Excited Linear Predictive (CELP) • The classes differ in how the excitation signal is represented

Multi-Pulse Excited (MPE) Codec • Excitation is given by a fixed number of pulses • Pulse positions and amplitudes are computed to minimize the error and transmitted to the decoder • Finding the jointly optimal pulses is possible in theory but computationally impractical • Suboptimal searches (e.g., one pulse at a time) are used instead • Typically about 4 pulses per 5 ms
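A common suboptimal strategy places pulses one at a time: for each candidate position, filter a unit pulse through the synthesis filter, fit its amplitude, keep the position that removes the most error, and subtract its contribution before searching for the next pulse. The sketch below assumes the synthesis filter's impulse response h is already known and truncated to the frame; amplitudes of earlier pulses are not re-optimized (some MPE variants do this). Function names are hypothetical.

```python
def shifted_response(h, m, n):
    """Synthesis-filter response to a unit pulse at position m, truncated to n samples."""
    return [h[i - m] if 0 <= i - m < len(h) else 0.0 for i in range(n)]


def multipulse_search(target, h, num_pulses):
    """Greedy (pulse-by-pulse) MPE search.

    Returns a list of (position, amplitude) pairs and the remaining error signal.
    """
    n = len(target)
    residual = list(target)
    pulses = []
    for _ in range(num_pulses):
        best = None
        for m in range(n):
            y = shifted_response(h, m, n)
            energy = sum(v * v for v in y)
            if energy == 0.0:
                continue
            corr = sum(r * v for r, v in zip(residual, y))
            score = corr * corr / energy  # error reduction if this pulse is added
            if best is None or score > best[0]:
                best = (score, m, corr / energy, y)
        if best is None:
            break
        _, m, amp, y = best
        pulses.append((m, amp))
        residual = [r - amp * v for r, v in zip(residual, y)]
    return pulses, residual
```

If the target was produced by pulses of amplitude 2 at position 1 and -1 at position 5 through h = [1, 0.5, 0.25, 0.125], two greedy steps recover positions [1, 5] with amplitudes 2 and -1 and leave a zero residual.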

Regular-Pulse Excited (RPE) • Multiple pulses, as in MPE • Pulses regularly spaced at a fixed interval • Only the first pulse's position and all pulse amplitudes need to be transmitted • Bits saved on positions allow more pulses, giving better quality at the same bitrate • Around 10 pulses per 5 ms
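To make the RPE bookkeeping concrete, the sketch below uses a pulse grid with a fixed spacing and chooses, among the possible grid offsets, the one capturing the most energy. As a simplifying assumption it works directly on a residual-like target and takes the pulse amplitudes straight from the target samples, instead of fitting them through the synthesis filter as a full analysis-by-synthesis coder would. Only the offset and the amplitude list would need to be transmitted; the function names are hypothetical.

```python
def rpe_encode(target, spacing):
    """Pick the grid offset (0..spacing-1) whose regularly spaced samples carry the most energy.

    Returns (offset, amplitudes); only these would be transmitted.
    """
    best_offset, best_energy = 0, -1.0
    for offset in range(spacing):
        energy = sum(v * v for v in target[offset::spacing])
        if energy > best_energy:
            best_offset, best_energy = offset, energy
    return best_offset, list(target[best_offset::spacing])


def rpe_decode(offset, amplitudes, length, spacing):
    """Rebuild the sparse excitation: pulses at offset, offset + spacing, ..."""
    excitation = [0.0] * length
    for k, amp in enumerate(amplitudes):
        excitation[offset + k * spacing] = amp
    return excitation
```

Compared with MPE, the position information shrinks from one index per pulse to a single offset, which is where the slide's "more pulses at the same bitrate" comes from.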

Code-Excited Linear Predictive (CELP) • Excitation is given by • An entry from a large vector quantizer codebook • A gain term scaling its amplitude • Key challenge • Searching the codebook for the best excitation entry in real time • Solution: structure the codebook for fast search (e.g., as a tree) • Performance • Good quality at 4.8 kbps or lower
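The search cost becomes clearer from the error criterion. Assuming t is the (weighted) target vector for a subframe, H is the lower-triangular convolution matrix built from the synthesis filter's impulse response, and c_i is the i-th codebook entry with gain g (notation introduced here, not on the slide), minimizing the squared error gives the optimal gain and a gain-free selection criterion:

```latex
\begin{aligned}
\mathbf{y}_i &= \mathbf{H}\,\mathbf{c}_i \\
E_i(g) &= \lVert \mathbf{t} - g\,\mathbf{y}_i \rVert^2
        = \lVert \mathbf{t} \rVert^2 - 2g\,\langle \mathbf{t}, \mathbf{y}_i \rangle + g^2 \lVert \mathbf{y}_i \rVert^2 \\
\frac{\partial E_i}{\partial g} = 0
  &\;\Rightarrow\; g_i^\ast = \frac{\langle \mathbf{t}, \mathbf{y}_i \rangle}{\lVert \mathbf{y}_i \rVert^2} \\
E_i(g_i^\ast) &= \lVert \mathbf{t} \rVert^2 - \frac{\langle \mathbf{t}, \mathbf{y}_i \rangle^2}{\lVert \mathbf{y}_i \rVert^2} \\
i^\ast &= \arg\max_i \frac{\langle \mathbf{t}, \mathbf{y}_i \rangle^2}{\lVert \mathbf{y}_i \rVert^2}
\end{aligned}
```

So every codebook entry must be filtered (y_i = H c_i) and correlated with the target once per subframe. With a large codebook, this exhaustive filtering is what makes real-time search hard and motivates structured codebooks.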

Further Improvements on CELP • Representation of pitch period • Adaptive long-term prediction + short-term adjustment • Coding of LP filter • Vector quantization of the filter representation • Multimode coding • Dynamic bit allocation among excitation, LP filter and pitch
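Adaptive long-term prediction is often implemented as an adaptive codebook: candidate vectors are segments of the past excitation delayed by a pitch lag, repeated periodically when the lag is shorter than the subframe. The sketch below searches integer lags for the best normalized correlation. For brevity it compares candidates with the target directly in the excitation domain (a real coder would filter each candidate through the weighted synthesis filter), and it assumes max_lag does not exceed the length of the stored past excitation; the lag range and function names are hypothetical.

```python
def adaptive_codebook_vector(past_excitation, lag, length):
    """Past excitation delayed by `lag`, repeated periodically if lag < length."""
    start = len(past_excitation) - lag   # requires lag <= len(past_excitation)
    v = []
    for n in range(length):
        v.append(past_excitation[start + n] if n < lag else v[n - lag])
    return v


def pitch_search(target, past_excitation, min_lag, max_lag):
    """Return (lag, gain) maximizing normalized correlation with the target, or None."""
    best = None
    for lag in range(min_lag, max_lag + 1):
        v = adaptive_codebook_vector(past_excitation, lag, len(target))
        energy = sum(x * x for x in v)
        if energy == 0.0:
            continue
        corr = sum(t * x for t, x in zip(target, v))
        score = corr * corr / energy
        if best is None or score > best[0]:   # ties keep the shortest lag
            best = (score, lag, corr / energy)
    if best is None:
        return None
    return best[1], best[2]
```

With a past excitation that has one pulse every 5 samples and a target continuing that pattern at 0.8 times the amplitude, the search returns lag 5 and gain 0.8. Lag 10 scores exactly the same here, which illustrates why pitch multiples need care in practice.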

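Standardization • ITU G.711 • PCM (A-law and μ-law) • ITU G.722 • Sub-band ADPCM (wideband) • ITU G.728 • Low-Delay CELP • GSM • Algebraic CELP (Enhanced Full Rate and AMR codecs; the original Full Rate codec uses RPE-LTP) • MPEG-4 • MPEG-4 CELP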
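The μ-law curve behind G.711 can be sketched with its continuous companding formula (μ = 255). Note the assumption: G.711 itself specifies a segmented, piecewise-linear approximation of this curve with a fixed 8-bit code table, so the functions below illustrate the principle rather than produce bit-exact G.711 codes. The plain 256-level uniform quantization of the companded value and the function names are hypothetical choices.

```python
import math

MU = 255.0  # mu value of G.711 mu-law


def mu_law_compress(x, mu=MU):
    """Continuous mu-law curve mapping [-1, 1] onto [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def encode_8bit(x):
    """Compand, then quantize uniformly to 256 levels (illustrative, not G.711's code table)."""
    y = mu_law_compress(max(-1.0, min(1.0, x)))
    return round((y + 1.0) / 2.0 * 255)


def decode_8bit(code):
    """Map the 8-bit code back to the companded domain, then expand."""
    y = code / 255 * 2.0 - 1.0
    return mu_law_expand(y)
```

With this sketch, an input of 0.01 is reconstructed to within 0.001, whereas a plain 8-bit uniform quantizer over [-1, 1] has a step of about 0.0078; spending finer steps on small amplitudes is why companded PCM suits speech.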
