Advances in Speech and Audio Compression

Advances in Speech and Audio Compression. Allen Gersho (1994) Presented by Bin Yu. Agenda. Introduction Human vocal tract model Speech coder classification LPAS Speech Coding Standardization. Human Speech Production System. Air flow forced from lungs to vocal tract


Presentation Transcript


  1. Advances in Speech and Audio Compression Allen Gersho (1994) Presented by Bin Yu

  2. Agenda • Introduction • Human vocal tract model • Speech coder classification • LPAS Speech Coding • Standardization

  3. Human Speech Production System • Air flow forced from lungs to vocal tract • Short-term correlations • Filter with resonances (called formants) • Speech sound classes • Voiced sounds • Vocal cord vibration • Long-term periodicity • Unvoiced sounds • Constriction in the vocal tract • No long-term periodicity • Plosive sounds • Release of air pressure built up behind a closure in the mouth

  4. Speech Coder Classification • Waveform coders • Pulse Code Modulation (PCM) • Sample input waveform • Quantization • Differential PCM • Sample input waveform • Encode difference between adjacent samples • Adaptive DPCM • Adapt step size for quantization based on speech statistics • High quality, high bitrate
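The DPCM idea on this slide can be sketched in a few lines. This is a minimal illustration, not a standard codec: the function names and the fixed `step` are assumptions, and the encoder predicts from the decoder's reconstruction of the previous sample (rather than the raw previous sample) so that quantization errors do not accumulate.

```python
def dpcm_encode(samples, step):
    """Quantize the difference between each sample and the previous reconstruction."""
    codes = []
    prediction = 0  # what the decoder will have reconstructed so far
    for s in samples:
        code = round((s - prediction) / step)  # quantized difference
        codes.append(code)
        prediction += code * step  # mirror the decoder's state
    return codes


def dpcm_decode(codes, step):
    """Rebuild the waveform by accumulating the dequantized differences."""
    out = []
    prediction = 0
    for code in codes:
        prediction += code * step
        out.append(prediction)
    return out


samples = [0, 3, 8, 12, 13, 11, 6, 0]
codes = dpcm_encode(samples, step=2)
decoded = dpcm_decode(codes, step=2)
```

Because the differences are small compared with the samples themselves, the codes need fewer bits than plain PCM. ADPCM would additionally adapt `step` over time.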

  5. Speech Coder Classification (contd.) • Vocoders (source coders) • Linear prediction model for human voice system • Medium quality, low bitrate

  6. LPAS – Linear-Prediction-based Analysis-by-Synthesis • Hybrid method • Vocoder’s linear prediction model • Careful selection of excitation signal to reconstruct original waveform • High quality, low bitrate!

  7. LPAS – Linear-Prediction-based Analysis-by-Synthesis • How it works • Segment speech into frames (typically 20 ms long) • Find the filter parameters for each frame • Find the excitation that minimizes the prediction error • Perceptual weighting • More accuracy where speech energy is low • Transmit the filter parameters and excitation signal • Vector quantization
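The "find filter parameters for each frame" step is commonly done with the autocorrelation method and the Levinson-Durbin recursion. The sketch below assumes that approach (the slides do not name a specific method); the function names and the synthetic test frame are illustrative only.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return predictor coefficients a[1..order] and the residual energy,
    for the predictor x[n] ~ sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break  # frame is silent or perfectly predictable
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this stage
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


# Synthetic frame: a decaying exponential, which a first-order
# predictor with coefficient 0.9 predicts almost exactly.
frame = [0.9 ** n for n in range(200)]
r = autocorrelation(frame, 2)
coeffs, residual_energy = levinson_durbin(r, 2)
```

For real speech a predictor order of around 10 is typical for 8 kHz sampling; the order of 2 here only keeps the example easy to check by hand.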

  8. Vector Quantization • Example • Key challenge • Given a source distribution, how to select codebook (*) and partitions (---) to result in smallest average distortion • Solution: • Divide and conquer • Two codes → four → eight …
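The "two codes, then four, then eight" strategy resembles codebook training by splitting (as in the LBG algorithm). The following is a minimal sketch under that assumption: each codeword is split into two slightly perturbed copies, then refined with nearest-neighbour assignment and centroid updates. All names, the perturbation `eps`, and the toy data are illustrative.

```python
def nearest(codebook, v):
    """Index of the codeword closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)))


def centroid(vectors):
    n = len(vectors)
    return tuple(sum(c) / n for c in zip(*vectors))


def lloyd(codebook, data, iterations=20):
    """Alternate between partitioning the data and recomputing centroids."""
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in data:
            cells[nearest(codebook, v)].append(v)
        # Keep the old codeword if its cell is empty.
        codebook = [centroid(c) if c else cw for c, cw in zip(cells, codebook)]
    return codebook


def train_by_splitting(data, size, eps=0.01):
    """Grow the codebook 1 -> 2 -> 4 -> ... until it has `size` entries
    (`size` is assumed to be a power of two)."""
    codebook = [centroid(data)]
    while len(codebook) < size:
        codebook = [tuple(x + s for x in cw)
                    for cw in codebook for s in (eps, -eps)]
        codebook = lloyd(codebook, data)
    return codebook


# Toy source: four tight clusters of 2-D vectors.
centers = [(0, 0), (0, 10), (13, 0), (13, 10)]
offsets = [(-0.5, -0.5), (0.5, 0.5), (-0.5, 0.5), (0.5, -0.5)]
data = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
codebook = train_by_splitting(data, size=4)
```

The procedure only finds a local optimum, so results on real speech parameters depend on the training data and the perturbation.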

  9. LPAS Classification • Three classes • Multi-Pulse Excited (MPE) • Regular-Pulse Excited (RPE) • Code-Excited Linear Predictive (CELP) • Difference lies in representation of excitation signal

  10. Multi-Pulse Excited (MPE) Codec • Excitation is given by a fixed number of pulses • Position and amplitude of the pulses are computed to minimize error and transmitted to decoder • Finding the optimal pulse set is theoretically possible but not practical • Suboptimal estimates are used instead • Typically about 4 pulses per 5 ms are used

  11. Regular-Pulse Excited (RPE) • Multiple pulses used like in MPE • Regularly spaced at a fixed period • Only needs to transmit the first pulse's position and all pulses' amplitudes • More pulses are allowed for better quality at the same bitrate • Around 10 pulses per 5 ms
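A simplified view of the RPE idea: because the pulses sit on a regular grid, the encoder only has to choose the grid's starting position and send the amplitudes. The sketch below picks the grid phase that captures the most residual energy. This energy criterion is an assumption made for brevity; a real coder would minimize a perceptually weighted error, and the function names are hypothetical.

```python
def rpe_select(residual, spacing):
    """Pick the grid phase (position of the first pulse) whose samples
    carry the most energy, and return it with the pulse amplitudes."""
    best_phase = max(range(spacing),
                     key=lambda p: sum(x * x for x in residual[p::spacing]))
    return best_phase, residual[best_phase::spacing]


def rpe_reconstruct(phase, amplitudes, spacing, length):
    """Decoder side: place the amplitudes back on the regular grid."""
    excitation = [0.0] * length
    for i, amp in enumerate(amplitudes):
        excitation[phase + i * spacing] = amp
    return excitation


residual = [0.1, 2.0, -0.1, 0.0, -1.5, 0.2, 0.1, 1.0, 0.0, 0.0, -0.5, 0.1]
phase, amplitudes = rpe_select(residual, spacing=3)
excitation = rpe_reconstruct(phase, amplitudes, spacing=3, length=len(residual))
```

Only `phase` and `amplitudes` need to be transmitted, which is why RPE can afford more pulses than MPE at the same bitrate.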

  12. Code-Excited Linear Predictive (CELP) • Excitation is given by • An entry from a large vector quantizer codebook • A gain term for its power (amplitude) • Key challenge • Searching for the right excitation entries in real time • Solution: restructure the codebook so that it is optimized for searching (such as a tree) • Performance • 4.8 kbps or lower bitrate with good quality
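The analysis-by-synthesis search in CELP can be illustrated with an exhaustive codebook search: each candidate entry is passed through the synthesis filter, the best gain is computed in closed form, and the entry with the smallest remaining error wins. This sketch omits perceptual weighting and the long-term (pitch) predictor, and the tiny codebook, filter coefficient, and function names are assumptions for illustration.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter: y[n] = e[n] + sum_k lpc[k-1] * y[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        y = e + sum(a * out[n - k]
                    for k, a in enumerate(lpc, start=1) if n - k >= 0)
        out.append(y)
    return out


def celp_search(target, codebook, lpc):
    """Return (index, gain) minimizing ||target - gain * synth(code)||^2.

    For a fixed entry the optimal gain is <t, y> / <y, y>, so the best
    entry is the one that maximizes <t, y>^2 / <y, y>.
    """
    best = None
    for index, code in enumerate(codebook):
        y = synthesize(code, lpc)
        energy = sum(v * v for v in y)
        if energy == 0:
            continue  # an all-zero entry cannot match anything
        corr = sum(t * v for t, v in zip(target, y))
        score = corr * corr / energy
        if best is None or score > best[0]:
            best = (score, index, corr / energy)
    _, index, gain = best
    return index, gain


codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 0, 0], [1, 1, 1, 1]]
lpc = [0.5]
# Build a target that entry 2 with gain 2.5 reproduces exactly.
target = [2.5 * v for v in synthesize(codebook[2], lpc)]
index, gain = celp_search(target, codebook, lpc)
```

The cost of this loop grows linearly with the codebook size, which is the real-time search problem the slide refers to; structured codebooks reduce the number of entries that must be synthesized.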

  13. Further Improvements on CELP • Representation of pitch period • Adaptive Long-term prediction + short-term adjustment • Coding of LP filter • Vector quantization of filter representation • Multimode coding • Dynamic bit allocation between excitation, LP filter and pitch

  14. Standardization • ITU G.711 • PCM (A-law and μ-law) • ITU G.722 • ADPCM • ITU G.728 • Low-Delay CELP • GSM • Algebraic CELP • MPEG-4 • MPEG-4 CELP
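As a small illustration of the μ-law companding mentioned for G.711, the sketch below implements the continuous μ-law curve with μ = 255 for inputs normalized to [-1, 1]. Note the assumption: G.711 itself specifies a piecewise-linear 8-bit approximation of this curve, which is not reproduced here, and the function names are illustrative.

```python
import math

MU = 255  # value used for mu-law in North American and Japanese telephony


def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1], expanding small amplitudes."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


quiet = mu_law_compress(0.01)  # a small input is mapped well above 0.01
restored = mu_law_expand(mu_law_compress(-0.3))
```

Quantizing the compressed value uniformly gives finer resolution to quiet samples, which is where the ear is most sensitive to quantization noise.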

  15. References • Human voice model • http://cnx.rice.edu/content/m0049/latest/ • Speech Compression • http://www.data-compression.com/speech.shtml • Speech coding tutorial • http://www-mobile.ecs.soton.ac.uk/speech_codecs/ • Standard codecs • http://www.ittiam.com/pages/products/g711.htm • Spanias, A. S., "Speech coding: a tutorial review", 1994
