
Audio Codecs




Presentation Transcript


  1. Audio Codecs Dan Mechanic CS W4995

  2. Why are there different codecs? Each codec tries to strike the best balance between: • Fast processing • Good compression • Quality (accurate) decoding

  3. The best balance can depend on the application. Music: WAV (typically uncompressed PCM) compromises on compression • lossless • ~1.4 Mbps for CD audio (16 bits × 44,100 samples/s × 2 channels = 1,411,200 bit/s) • Sacrifice: compression. AAC compromises on fast processing • technically lossy, but still high-quality decoding • achieved via sophisticated compression algorithms, e.g. at 320 kbps • Sacrifice: processing. Compact Disc: 16-bit, 44.1 kHz
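The ~1.4 Mbps figure for WAV follows directly from the CD format: bits per sample times samples per second times channels. A minimal sketch of that arithmetic (the helper name `pcm_bitrate` is hypothetical), with the slide's 320 kbps AAC rate for comparison:

```python
# Uncompressed PCM bitrate = bits per sample x samples per second x channels.
def pcm_bitrate(bits_per_sample: int, sample_rate_hz: int, channels: int) -> int:
    """Return the bitrate, in bits per second, of uncompressed PCM audio."""
    return bits_per_sample * sample_rate_hz * channels


cd_rate = pcm_bitrate(16, 44_100, 2)   # CD audio: 16-bit, 44.1 kHz, stereo
aac_rate = 320_000                     # the 320 kbps AAC figure from the slide

print(cd_rate)                         # 1411200 bits/s, i.e. ~1.4 Mbps
print(round(cd_rate / aac_rate, 2))    # 4.41: the AAC stream is ~4.4x smaller
```

The same helper gives G.711's rate: `pcm_bitrate(8, 8000, 1)` is 64,000 bit/s.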


  5. Why are there different codecs? Standards • Recommendations from the ITU (International Telecommunication Union) Existing Technologies • G.711 was created in the early 1970s for PSTN lines, using 8-bit samples at 8,000 samples per second (64 kbit/s) • Today G.711 can be a good choice for VoIP because it sounds like a traditional landline and has low latency (less processing at the media gateways) Patents End-User Expectations

  6. Other constraints… Nyquist Theorem - “When converting from an analog signal to digital (or otherwise sampling a signal at discrete intervals), the sampling frequency must be greater than twice the highest frequency of the input signal in order to be able to reconstruct the original perfectly from the sampled version.” source: http://www.fact-index.com/n/ny/nyquist_shannon_sampling_theorem.html
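One way to see why the limit matters is aliasing: a tone above half the sampling rate produces exactly the same samples as a lower-frequency tone, so the decoder cannot tell them apart. A small sketch, assuming 8,000 samples per second as in G.711 (the function name is hypothetical):

```python
import math


def sample_cosine(freq_hz: float, rate_hz: float, count: int) -> list[float]:
    """Sample cos(2*pi*f*t) at t = n / rate_hz for n = 0 .. count-1."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]


RATE = 8000  # telephone-style sampling rate; the Nyquist limit is RATE / 2 = 4000 Hz

too_high = sample_cosine(5000, RATE, 32)  # violates the theorem (5000 Hz > 4000 Hz)
alias = sample_cosine(3000, RATE, 32)     # its alias: 8000 - 5000 = 3000 Hz

# The two sample sequences are identical: the 5 kHz tone "becomes" a 3 kHz tone.
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(too_high, alias)))  # True
```

This is why telephone-band codecs sampling at 8 kHz must filter out content above 4 kHz before sampling.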

  7. What methods do codecs meant for speech use? • Many, many codecs… • but only a handful of methodologies.

  8. Pulse Code Modulation image source: http://en.wikipedia.org/wiki/Pulse-code_modulation
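PCM comes down to two steps: sample the signal at regular intervals, then round each sample to one of a fixed number of levels. A minimal sketch of the rounding step, assuming samples already normalized to [-1.0, 1.0] (function names are hypothetical):

```python
import math


def pcm_encode(signal: list[float], bits: int) -> list[int]:
    """Uniformly quantize samples in [-1.0, 1.0] to signed `bits`-bit integers."""
    max_code = 2 ** (bits - 1) - 1
    return [max(-max_code, min(max_code, round(x * max_code))) for x in signal]


def pcm_decode(codes: list[int], bits: int) -> list[float]:
    """Map integer codes back to floats in [-1.0, 1.0]."""
    max_code = 2 ** (bits - 1) - 1
    return [c / max_code for c in codes]


# 20 ms of a 440 Hz tone sampled at 8 kHz (160 samples), quantized to 8 bits.
RATE, BITS = 8000, 8
tone = [0.8 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(160)]
codes = pcm_encode(tone, BITS)
decoded = pcm_decode(codes, BITS)
worst = max(abs(a - b) for a, b in zip(tone, decoded))
print(worst <= 0.5 / 127 + 1e-12)  # True: error is at most half a quantization step
```

Every sample costs the full 8 bits no matter how predictable it is, which is what the following slides try to improve on.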

  9. Pulse Code Modulation can require a high bitrate. G.711 uses “companding” algorithms to reduce it. • Compression (at the encoder): squeeze the dynamic range, reducing audio peaks and raising the floor, so quiet sounds get more of the available codes • Expansion (at the decoder): undo the mapping to restore the original dynamic range • In practice this is a logarithmic mapping of a 13- or 14-bit linear sample to an 8-bit code

  10. μ-law and A-law algorithms μ-law • Used in North America and Japan • Maps 14-bit linear samples to 8 bits A-law • Used in Europe • Maps 13-bit linear samples to 8 bits
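A sketch of the two companding curves, assuming the textbook continuous formulas with the standard parameters μ = 255 and A = 87.6. Real G.711 encoders use a segmented, piecewise-linear approximation of these curves on integer samples, so this sketch will not reproduce G.711 codes bit for bit. The function names are hypothetical, and only the μ-law expander is included because the demo needs it. The demo quantizes a quiet sample to 8 bits with and without μ-law compression:

```python
import math

MU = 255.0   # standard mu-law parameter
A = 87.6     # standard A-law parameter


def mulaw_compress(x: float) -> float:
    """Continuous mu-law curve: maps [-1, 1] to [-1, 1], boosting small values."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)


def alaw_compress(x: float) -> float:
    """Continuous A-law curve: linear below 1/A, logarithmic above it."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))
    return math.copysign(y, x)


def to_8bit(y: float) -> int:
    """Quantize a value in [-1, 1] to a signed 8-bit code in -127..127."""
    return max(-127, min(127, round(y * 127)))


quiet = 0.001
uniform = to_8bit(quiet) / 127                                  # plain 8-bit PCM
companded = mulaw_expand(to_8bit(mulaw_compress(quiet)) / 127)  # mu-law, then back
print(abs(uniform - quiet), abs(companded - quiet))  # mu-law error is far smaller
```

Plain 8-bit PCM rounds this quiet sample to silence, while the logarithmic curve spends enough codes near zero to keep it.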

  11. Differential Pulse Code Modulation • Waveforms change fairly predictably from one sample to the next • We can look at the previous sample and predict the value of the next one. • If the encoder and decoder agree on the prediction algorithm, only the difference between the prediction and the actual value needs to be transmitted.
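A minimal DPCM sketch, assuming the simplest possible predictor (the previous reconstructed sample) and a fixed quantizer step; both are illustrative choices, and the function names are hypothetical. The encoder predicts from what the decoder will reconstruct, not from the original samples, so the two stay in lockstep and quantization errors do not accumulate:

```python
import math


def dpcm_encode(samples: list[float], step: float) -> list[int]:
    """Send only the quantized difference between each sample and its prediction."""
    codes, prediction = [], 0.0
    for x in samples:
        q = round((x - prediction) / step)  # quantized prediction error
        codes.append(q)
        prediction += q * step              # the same update the decoder performs
    return codes


def dpcm_decode(codes: list[int], step: float) -> list[float]:
    """Rebuild the waveform by adding each difference to the running prediction."""
    out, prediction = [], 0.0
    for q in codes:
        prediction += q * step
        out.append(prediction)
    return out


STEP = 0.01
wave = [math.sin(2 * math.pi * n / 50) for n in range(100)]  # slowly varying signal
codes = dpcm_encode(wave, STEP)
rebuilt = dpcm_decode(codes, STEP)

print(max(abs(c) for c in codes))          # the differences stay small...
print(max(round(x / STEP) for x in wave))  # ...compared with plain PCM at the same step
print(max(abs(a - b) for a, b in zip(wave, rebuilt)) <= STEP / 2 + 1e-9)  # True
```

Smaller codes need fewer bits, which is where the saving over PCM comes from.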

  12. Differential Pulse Code Modulation image from “Speech Compression” by Mark Handley: www.cs.columbia.edu/~hgs/teaching/ais/slides/04-speech-coding.pdf

  13. Adaptive Differential Pulse Code Modulation • Next-sample prediction algorithms can be dynamic, to more accurately represent the waveform being encoded/decoded. • Vary the predictor (and the quantizer step size) to adapt to the changing characteristics of the audio being recorded. • G.721 uses the previous 8 samples and quantizes the difference to 4 bits per sample (4 bits × 8,000 samples/s = 32 kbit/s)
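A simplified sketch of the adaptive idea with 4-bit codes. This is not the G.721 algorithm: the predictor here is just the previous reconstructed sample rather than G.721's multi-sample adaptive predictor, and the step-size rule (multiply by 1.6 after large codes, by 0.8 after small ones) and its limits are made-up illustrative values. What it does show is the key property: the encoder and decoder apply the same adaptation rule to the same codes, so the step size never has to be transmitted.

```python
import math

INITIAL_STEP, MIN_STEP, MAX_STEP = 0.01, 0.001, 1.0  # hypothetical values


def adapt_step(step: float, code: int) -> float:
    """Hypothetical rule: widen the step after large codes, narrow it after small ones."""
    if abs(code) >= 6:
        step *= 1.6
    elif abs(code) <= 1:
        step *= 0.8
    return min(max(step, MIN_STEP), MAX_STEP)


def adpcm_encode(samples: list[float]) -> list[int]:
    codes, prediction, step = [], 0.0, INITIAL_STEP
    for x in samples:
        q = max(-8, min(7, round((x - prediction) / step)))  # 4-bit signed code
        codes.append(q)
        prediction += q * step
        step = adapt_step(step, q)  # the decoder applies the identical rule
    return codes


def adpcm_decode(codes: list[int]) -> list[float]:
    out, prediction, step = [], 0.0, INITIAL_STEP
    for q in codes:
        prediction += q * step
        out.append(prediction)
        step = adapt_step(step, q)
    return out


signal = [0.8 * math.sin(2 * math.pi * n / 40) for n in range(400)]
codes = adpcm_encode(signal)
decoded = adpcm_decode(codes)
mean_error = sum(abs(a - b) for a, b in zip(signal, decoded)) / len(signal)
print(min(codes), max(codes))  # every code fits in 4 bits (-8..7)
print(mean_error)              # stays small even though each sample costs only 4 bits
```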

  14. Sub-Band Differential Pulse Code Modulation “not all frequencies are created equal” • Lower frequencies (50 Hz-3.5 kHz) are important for understanding speech, and are more sensitive to quantization errors. • Higher frequencies (3.5 kHz-7 kHz) help convey emotion and identify the speaker

  15. Sub-Band Differential Pulse Code Modulation “not all frequencies are created equal” …so don’t treat them the same • Sample the input at 16 kHz and split it into a lower band and a higher band, each down-sampled to 8 kHz • Spend more bits on the lower band (6 bits/sample, 48 kbit/s) than on the less important higher band (2 bits/sample, 16 kbit/s) • Mux these together to get 64 kbit/s: the same bitrate as G.711, but better (wideband) decoding quality, at the price of more processing • G.722
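A toy sketch of the sub-band idea. G.722 itself splits the bands with quadrature mirror filters and codes each band with ADPCM; this sketch swaps in the much cruder Haar split (pairwise averages and differences, which separate the bands only roughly) and plain uniform quantization, so treat it as an illustration of the structure and the bit budget, not of G.722. All names and the 300 Hz test tone are hypothetical.

```python
import math


def split_bands(x: list[float]) -> tuple[list[float], list[float]]:
    """Toy two-band analysis (Haar filters): each band keeps half as many samples."""
    pairs = range(len(x) // 2)
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in pairs]   # local average: low band
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in pairs]  # local detail: high band
    return low, high


def merge_bands(low: list[float], high: list[float]) -> list[float]:
    """Exact inverse of split_bands."""
    out: list[float] = []
    for lo, hi in zip(low, high):
        out.extend([lo + hi, lo - hi])
    return out


def quantize(values: list[float], bits: int) -> list[float]:
    """Uniform mid-tread quantizer for values in [-1, 1] (illustrative only)."""
    levels = 2 ** (bits - 1) - 1
    return [max(-levels, min(levels, round(v * levels))) / levels for v in values]


INPUT_RATE = 16_000          # wideband input: 16,000 samples per second
LOW_BITS, HIGH_BITS = 6, 2   # more bits where errors are more audible
band_rate = INPUT_RATE // 2  # each band is down-sampled to 8,000 samples/s
print(band_rate * (LOW_BITS + HIGH_BITS))  # 64000 bits/s in total

speech_like = [0.5 * math.sin(2 * math.pi * 300 * n / INPUT_RATE) for n in range(320)]
low, high = split_bands(speech_like)
decoded = merge_bands(quantize(low, LOW_BITS), quantize(high, HIGH_BITS))
print(len(decoded))  # 320: back to 16 kHz after merging the two 8 kHz bands
```

For a low-pitched input nearly all the energy lands in the low band, which is why the high band can get away with so few bits.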

  16. Linear Predictive: Source-Filter Speech Model • An algorithm that models speech. Image source: http://mtg.upf.edu/~xserra/cursos/TDP/referencies/Park-LPC-tutorial.pdf

  17. Linear Predictive: Based on a simple model of human speech • Buzzer: your glottis or vocal cords, provides pitch • Tube: builds resonance and gives rise to ‘formants’ • Hiss and pops: tongue, lips and throat make sibilants and plosives (“s”, “k”, “p”)

  18. Linear Predictive: Formants • Peaks in the frequency spectrum caused by acoustic resonance. Image: frequency response of a typical vowel sound. Source: http://mtg.upf.edu/~xserra/cursos/TDP/referencies/Park-LPC-tutorial.pdf

  19. Linear Predictive Encoding • Operates on a short frame of sound (around 20 ms) • Remove the formants, leaving a ‘residue’ sound (the buzz), and determine the pitch of the residue • Determine whether the sound is voiced or unvoiced • voiced: tonal, e.g. “m”, “v” • unvoiced: sibilants and plosives, e.g. “s”, “k” • The formant filter is represented by a series of linear predictive coefficients
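The slides don't say how the coefficients are computed. A common approach, assumed here, is the autocorrelation method solved with the Levinson-Durbin recursion; inverse-filtering the frame with the resulting predictor removes the formants and leaves the residue. A minimal sketch using the convention x[n] ≈ Σ a[k]·x[n−k], with a hypothetical two-tone test frame standing in for a vowel (names are also hypothetical):

```python
import math


def autocorrelation(frame: list[float], max_lag: int) -> list[float]:
    """r[k] = sum over n of frame[n] * frame[n + k], for k = 0 .. max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: list[float], order: int) -> tuple[list[float], float]:
    """Find a[1..order] so that x[n] is predicted as sum_k a[k] * x[n - k].

    Returns the coefficients and the remaining prediction-error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused so that a[k] matches the math
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:  # silent (or perfectly predicted) frame: nothing left to model
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error  # reflection coeff.
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


def lpc_residual(frame: list[float], coeffs: list[float]) -> list[float]:
    """Inverse filter: subtract the prediction, leaving the 'residue'."""
    return [
        x - sum(c * frame[n - k] for k, c in enumerate(coeffs, start=1) if n >= k)
        for n, x in enumerate(frame)
    ]


# One 20 ms frame at 8 kHz (160 samples) of a two-tone, vowel-like test signal.
RATE, ORDER = 8000, 10
frame = [math.sin(2 * math.pi * 700 * n / RATE) + 0.5 * math.sin(2 * math.pi * 1200 * n / RATE)
         for n in range(160)]
coeffs, _ = levinson_durbin(autocorrelation(frame, ORDER), ORDER)
residual = lpc_residual(frame, coeffs)
ratio = sum(e * e for e in residual) / sum(x * x for x in frame)
print(f"residual energy: {ratio:.1%} of the frame energy")
```

Only the `ORDER` coefficients (plus pitch, gain and the voiced/unvoiced flag) need to be sent per frame, instead of 160 samples. A 10th-order predictor is a common choice for 8 kHz speech.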

  20. Linear Predictive Decoding. Image source: www.cs.columbia.edu/~hgs/teaching/ais/slides/04-speech-coding.pdf “Speech Compression” Mark Handley
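On the decoder side the source-filter model runs in reverse: build an excitation (an impulse train at the pitch period for voiced sounds, white noise for unvoiced ones) and pass it through the all-pole filter defined by the LP coefficients. A minimal sketch; the 2nd-order coefficients and the 80-sample pitch period are hypothetical values, and the encoder's inverse filter is repeated so the block runs on its own and can show that synthesis undoes it:

```python
import math
import random


def lpc_residual(frame: list[float], coeffs: list[float]) -> list[float]:
    """Encoder-side inverse filter: e[n] = x[n] - sum_k a[k] * x[n-k]."""
    return [
        x - sum(a * frame[n - k] for k, a in enumerate(coeffs, start=1) if n >= k)
        for n, x in enumerate(frame)
    ]


def lpc_synthesize(coeffs: list[float], excitation: list[float]) -> list[float]:
    """Decoder-side all-pole filter: y[n] = e[n] + sum_k a[k] * y[n-k]."""
    out: list[float] = []
    for n, e in enumerate(excitation):
        out.append(e + sum(a * out[n - k] for k, a in enumerate(coeffs, start=1) if n >= k))
    return out


def make_excitation(length: int, voiced: bool, pitch_period: int = 80, seed: int = 0) -> list[float]:
    """Voiced: an impulse train (the 'buzz'); unvoiced: white noise (the 'hiss')."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


COEFFS = [0.9, -0.2]  # hypothetical 2nd-order filter (poles at 0.5 and 0.4, so stable)

# One 20 ms frame at 8 kHz: an 80-sample pitch period is a 100 Hz buzz.
vowel_like = lpc_synthesize(COEFFS, make_excitation(160, voiced=True))
hiss_like = lpc_synthesize(COEFFS, make_excitation(160, voiced=False))

# Feeding the true residue back through the filter recovers the original exactly.
original = [math.sin(0.3 * n) for n in range(50)]
rebuilt = lpc_synthesize(COEFFS, lpc_residual(original, COEFFS))
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(original, rebuilt)))  # True
```

A real LPC vocoder does not send the residue itself; it sends only the pitch, gain and voicing decision, and the decoder substitutes the synthetic buzz or hiss, which is why plain LPC speech sounds robotic.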

  21. Linear Predictive Encoding What’s the limitation? Speech production is not actually this simple. For some sounds, the nasal passages create a ‘side branch’ off our tube, adding anti-resonances that the simple tube model does not capture.

  22. Code Excited Linear Prediction (CELP) • Instead of sending the excitation (residue) sample by sample, encoder and decoder agree on a ‘codebook’ of excitation signals, and the encoder sends a reference to the code it is using. • No need for a codebook entry for every pitch: a delayed copy of past excitation can supply the pitch, with longer delays giving lower pitches. • Speex (open source, patent-free)
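A simplified sketch of the codebook search. It uses analysis-by-synthesis: every candidate excitation is run through the LPC synthesis filter, scaled by its least-squares gain, and compared with the target, and only the winning index and gain are sent. Real CELP coders add perceptual weighting, an adaptive (pitch) codebook and structured codebooks; all of that is omitted here, and the random 16-entry codebook, filter coefficients and function names are hypothetical.

```python
import random


def synthesize(coeffs: list[float], excitation: list[float]) -> list[float]:
    """All-pole LPC synthesis filter: y[n] = e[n] + sum_k a[k] * y[n-k]."""
    out: list[float] = []
    for n, e in enumerate(excitation):
        out.append(e + sum(a * out[n - k] for k, a in enumerate(coeffs, start=1) if n >= k))
    return out


def make_codebook(size: int, length: int, seed: int = 0) -> list[list[float]]:
    """A fixed random codebook that encoder and decoder can both regenerate from `seed`."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(size)]


def search_codebook(target: list[float], codebook: list[list[float]],
                    coeffs: list[float]) -> tuple[int, float, float]:
    """Try every entry; keep the one whose synthesized output, scaled by its best gain,
    is closest to the target. Returns (index, gain, squared error)."""
    best_index, best_gain, best_error = -1, 0.0, float("inf")
    for index, vector in enumerate(codebook):
        y = synthesize(coeffs, vector)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        gain = sum(t * v for t, v in zip(target, y)) / energy  # least-squares gain
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if error < best_error:
            best_index, best_gain, best_error = index, gain, error
    return best_index, best_gain, best_error


COEFFS = [0.9, -0.2]  # hypothetical LPC filter for this frame
codebook = make_codebook(size=16, length=40, seed=1)
target = [2.0 * v for v in synthesize(COEFFS, codebook[5])]  # a frame the codebook can match

index, gain, error = search_codebook(target, codebook, COEFFS)
print(index, round(gain, 6))  # 5 2.0: index and gain are sent along with the LPC coefficients
```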

  23. Linear Predictive - Other Variants • Regular-Pulse Excitation Long-Term Predictor (GSM) • Low Delay Code Excited Linear Prediction (G.728) • Conjugate-Structure Algebraic Code Excited Linear Prediction (G.729)

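  24. Many codecs follow these two basic models: waveform coding (PCM and its differential and adaptive variants) and source-filter coding (linear prediction)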

  25. References • http://www.cs.columbia.edu/~hgs/audio/codecs.html • http://www.fact-index.com/p/pu/pulse_code_modulation_1.html • http://www.fact-index.com/n/ny/nyquist_shannon_sampling_theorem.html • http://en.wikipedia.org/wiki/Pulse-code_modulation • http://www1.cs.columbia.edu/~sedwards/classes/2004/4840/reports/manic.pdf • http://www-mobile.ecs.soton.ac.uk/speech_codecs/standards/adpcm.html • Mark Handley, “Speech Compression”: http://www.cs.columbia.edu/~hgs/teaching/ais/slides/04-speech-coding.pdf • Speak & Spell image: http://www.myspace.com/growing_up_is_hard_2_do • Dr. Sung-won Park, “A good introduction to LPC”, Texas A&M University-Kingsville • http://en.wikipedia.org/wiki/G.711 • ITU-T Recommendation G.711 • http://en.wikipedia.org/wiki/%CE%9C-law_algorithm • Sound files: www.Data-Compression.com • http://www.otolith.com/otolith/olt/lpc.html
