Audio Coding

Digitization [Figure: analog signal → signal encoder (sampling, quantization) → digital data (processing, storage) → signal decoder]

Overview of Today • Sampling Techniques • PCM • Linear • μ-Law • Generic Coding Techniques • DPCM • ADPCM • Psychoacoustic Coding • MPEG-1 • Speech Specific Techniques • Vocoding

Encoder Design • Bandlimiting filter • Smooths the analog signal • Analog-to-digital converter (ADC) • Samples and quantizes the analog signal

Bandlimiting filter • Passes only frequency components up to half the sampling rate (the Nyquist frequency).

Analog to digital converter

Sampling • Pulse Amplitude Modulation (PAM) • Each sample’s amplitude is represented by 1 ________ value • Sampling theory (_________) • If input signal has ________ frequency (bandwidth) f, sampling frequency must be at least ____ • With a _____-pass filter to interpolate between samples, the input signal can be fully reconstructed

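PCM • Pulse Code Modulation (PCM) • Each sample’s amplitude represented by an ________ code-word • Each bit of resolution adds __ dB of dynamic range • Number of bits required depends on the amount of noise that is tolerated: $n = \frac{\mathrm{SNR} - 4.77}{6.02}$ (SNR in dB) [Figure: waveform drawn against quantization levels labeled with 4-bit code words (0100 down to 0000, then 1001 to 1100); the gap between the waveform and the nearest level is the quantization error (“noise”)]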
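The relation between word length and tolerated noise can be tried out numerically. The sketch below assumes the slide's formula reads n = (SNR − 4.77) / 6.02 with SNR in dB; the function names are illustrative, and the 4.77 dB constant depends on how signal level is defined, so other texts quote a different offset.

```python
import math

def snr_for_bits(bits):
    """SNR in dB reachable with `bits` per sample, using the slide's constants."""
    return 6.02 * bits + 4.77

def bits_for_snr(snr_db):
    """Smallest whole number of bits whose SNR meets the target (at least 1)."""
    return max(1, math.ceil((snr_db - 4.77) / 6.02))

# Example: a 96 dB target needs 16 bits under this formula.
needed = bits_for_snr(96)
```

Rounding up is a design choice here: a fractional bit count cannot be stored, so the next larger word length is taken.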

Linear PCM • Quantization levels are _________ spaced. • ___ bit samples provide plenty of dynamic range. • Compact discs do this.
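A linear quantizer is short enough to write out. This is a minimal sketch, assuming input samples normalized to [-1, 1] and a symmetric signed code range; the function names are illustrative and the word length is left as a parameter.

```python
def quantize_linear(x, bits):
    """Map x in [-1, 1] to a signed integer code with a constant step size."""
    levels = 2 ** (bits - 1) - 1          # largest positive code
    x = max(-1.0, min(1.0, x))            # clip out-of-range input
    return int(round(x * levels))

def dequantize_linear(code, bits):
    """Map a signed integer code back to an amplitude in [-1, 1]."""
    levels = 2 ** (bits - 1) - 1
    return code / levels

# The reconstruction error never exceeds half a quantization step.
bits = 16
error = abs(dequantize_linear(quantize_linear(0.3, bits), bits) - 0.3)
```

Because the step size is the same at every amplitude, the absolute error bound is the same for quiet and loud samples, which is the property that non-linear companding later trades away.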

Under Sampling • Sample rate is below the Nyquist rate • Alias components are added to the original signal and cause distortion • The bandlimiting filter is therefore also called an antialiasing filter
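The effect of undersampling can be seen with two tones. The sketch below assumes an 8 kHz sampling rate and a 7 kHz input tone (both chosen only for illustration): the 7 kHz tone produces exactly the same samples as a 1 kHz tone, so after sampling the two cannot be told apart.

```python
import math

def sample_tone(freq_hz, rate_hz, count):
    """Return `count` samples of a unit-amplitude cosine at freq_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

RATE_HZ = 8000                               # assumed sampling rate
high = sample_tone(7000, RATE_HZ, 16)        # above half the sampling rate
alias = sample_tone(1000, RATE_HZ, 16)       # its alias at 8000 - 7000 Hz

# Largest difference between the two sample sequences (floating-point noise only).
max_gap = max(abs(a - b) for a, b in zip(high, alias))
```

This is why the filter has to act before the sampler: once the samples are taken, the alias is indistinguishable from a genuine 1 kHz component.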

Quantization intervals

Associated waveform set

μ-Law companding (ITU Rec. G.711) • Non-linear quantization of the signal’s amplitude • Quantization step-size decreases logarithmically with signal ______ • Low-amplitude samples represented with ______ accuracy than high-amplitude samples • Humans are less sensitive to changes in “____” sounds than “_____” sounds

μ-Law companding • Provides __-bit quality (dynamic range) with an _-bit encoding • Used in North American & Japanese ISDN voice service • Simple to compute encoding (x normalized to [-1, 1]): $f(x) = 127 \times \operatorname{sign}(x) \times \frac{\ln(1 + \mu |x|)}{\ln(1 + \mu)}$
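The companding formula can be evaluated directly. The sketch below assumes μ = 255 (the value G.711 uses) and implements the continuous formula from the slide, f(x) = 127 · sign(x) · ln(1 + μ|x|) / ln(1 + μ); deployed G.711 codecs use a segmented piecewise-linear approximation instead, and the function names here are illustrative.

```python
import math

MU = 255  # assumed companding constant (G.711 mu-law value)

def mu_law_encode(x, mu=MU):
    """Compress x in [-1, 1] to a signed code in [-127, 127]."""
    x = max(-1.0, min(1.0, x))
    sign = -1 if x < 0 else 1
    return int(round(127 * sign * math.log(1 + mu * abs(x)) / math.log(1 + mu)))

def mu_law_decode(code, mu=MU):
    """Expand a signed code back to an amplitude in [-1, 1]."""
    sign = -1 if code < 0 else 1
    y = abs(code) / 127
    return sign * ((1 + mu) ** y - 1) / mu

# Small amplitudes get proportionally more codes than large ones.
quiet_code = mu_law_encode(0.01)
loud_code = mu_law_encode(0.5)
```

An amplitude of 0.01 already lands about a quarter of the way up the code range, which shows how the logarithm spends most of the code words on quiet samples.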

Difference Encoding • Differential-PCM (DPCM) • Exploit _________ redundancy in samples • ___________ between 2 x-bit samples can be represented with significantly fewer than x-bits • Transmit the difference (rather than the ________) [Figure: waveform drawn against quantization levels labeled with 4-bit code words]
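Difference encoding can be sketched in a few lines. This is a minimal illustration, assuming integer samples, a predictor that is simply the previous sample, and a starting value of 0 so that the first difference carries the first sample in full; the function names are illustrative.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    diffs = []
    prev = 0
    for s in samples:
        diffs.append(s - prev)
        prev = s
    return diffs

def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    out = []
    prev = 0
    for d in diffs:
        prev += d
        out.append(prev)
    return out

samples = [100, 102, 105, 104, 101]
diffs = dpcm_encode(samples)      # [100, 2, 3, -1, -3]
```

After the first value, every difference in this slowly varying example fits in a few bits even though the samples themselves need seven or more.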

DPCM Working Principle [Figure: block diagram; surviving label: “Previous sampling value”]

Slope Overload Problem • Differences in high frequency signals near the ___________ frequency cannot be represented with a smaller number of bits! • Error introduced leads to severe distortion in the ______ frequencies [Figure: “Slope Overload”, a waveform drawn against quantization levels labeled with 4-bit code words]
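Slope overload shows up as soon as the differences are squeezed into a fixed number of bits. The sketch below assumes 4-bit signed differences (range -8 to 7) with a step size of 1 and feeds in an abrupt jump; the names are illustrative.

```python
def dpcm_encode_clamped(samples, bits=4):
    """DPCM where each difference is clamped to a signed `bits`-bit range."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    codes = []
    prev = 0                       # tracks what the decoder will reconstruct
    for s in samples:
        d = max(lo, min(hi, s - prev))
        codes.append(d)
        prev += d
    return codes

def dpcm_decode_clamped(codes):
    """Accumulate the transmitted differences."""
    out = []
    prev = 0
    for d in codes:
        prev += d
        out.append(prev)
    return out

signal = [0] + [100] * 20          # abrupt jump from 0 to 100
recon = dpcm_decode_clamped(dpcm_encode_clamped(signal))
```

The reconstruction can climb by at most 7 per sample, so it needs 15 samples to reach the new level; during that time the decoded waveform is a ramp rather than a step.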

Adaptive DPCM (ADPCM) • Use a larger step-size to encode differences between ______-frequency samples & a smaller step-size for differences between ____-frequency samples • Use ________ sample values to estimate changes in the signal in the near future

ADPCM • To ensure differences are always small... • Adaptively change the ____-size (quanta) • (Adaptively) attempt to _____ next sample value [Figure: ADPCM encoder block diagram with a y-bit PCM sample as input and an x-bit ADPCM “difference” as output; blocks: difference quantizer, step-size adjuster, dequantizer, and predictor producing predicted PCM sample n+1]
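The adaptation loop can be sketched as follows. This is a toy illustration, not the G.726 or IMA ADPCM algorithm: it assumes the predictor is just the previous reconstructed value, and a hypothetical adaptation rule that doubles the step after a near-full-scale code and halves it after a near-zero code.

```python
def _adapt(step, code, qmax, min_step=1.0, max_step=2048.0):
    """Hypothetical rule: grow the step after large codes, shrink after small."""
    if abs(code) >= qmax - 1:
        step *= 2.0
    elif abs(code) <= 1:
        step /= 2.0
    return min(max(step, min_step), max_step)

def adpcm_encode(samples, bits=4, step=4.0):
    """Quantize (sample - prediction) with a step size that adapts over time."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    predicted = 0.0                # mirrors the decoder's reconstruction
    codes = []
    for s in samples:
        code = int(round((s - predicted) / step))
        code = max(qmin, min(qmax, code))
        codes.append(code)
        predicted += code * step   # dequantize and update the prediction
        step = _adapt(step, code, qmax)
    return codes

def adpcm_decode(codes, bits=4, step=4.0):
    """Repeat the encoder's dequantize and adapt steps from the codes alone."""
    qmax = 2 ** (bits - 1) - 1
    value = 0.0
    out = []
    for code in codes:
        value += code * step
        out.append(value)
        step = _adapt(step, code, qmax)
    return out

codes = adpcm_encode([1000.0] * 20)
decoded = adpcm_decode(codes)
```

Because the step size is derived only from codes that have already been sent, the decoder can track it without any side information. On this jump to 1000 the step doubles for the first few samples and then shrinks again once the reconstruction has caught up.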

Psychoacoustic Fundamentals • Absolute threshold of hearing • Critical band frequency analysis • Frequency masking • Temporal masking

Absolute Threshold of Hearing • Human perception of sound is a function of ________ and signal __________ • (MPEG exploits this relationship.) • Sampled segments of the source audio waveform are analyzed but only those features _____________ to the ear are transmitted. • Psychoacoustic model is used to identify _________ masking and ________ masking and eliminate them from the transmitted signal. [Figure: sound level (dB, 0 to 100) vs. frequency (kHz, 0.02 to 20); the threshold curve separates the audible region from the inaudible one and marks the maximum allowable energy level for coding distortion]
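The shape of the threshold curve can be approximated numerically. The sketch below is not taken from the slides: it assumes Terhardt's commonly quoted closed-form approximation of the threshold in quiet (in dB SPL, frequency in Hz), and the function names are illustrative.

```python
import math

def threshold_db_spl(freq_hz):
    """Approximate absolute threshold of hearing in dB SPL (Terhardt's formula)."""
    f = freq_hz / 1000.0           # the formula works in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def is_audible(level_db_spl, freq_hz):
    """True if a lone tone at this level lies above the threshold in quiet."""
    return level_db_spl > threshold_db_spl(freq_hz)

# The ear is most sensitive around 3 to 4 kHz and far less so at the extremes.
mid_band = threshold_db_spl(3300)
low_band = threshold_db_spl(100)
high_band = threshold_db_spl(15000)
```

A coder can use such a curve as a floor: quantization noise that stays below it in a given band is not expected to be heard, even with no masking tone present.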

Auditory Masking • The presence of tones at certain frequencies makes us unable to perceive tones at other “_________” frequencies • Humans cannot distinguish between tones within _____ Hz at low frequencies and _____kHz at high frequencies [Figure: sound level (dB, 0 to 100) vs. frequency (kHz, 0.02 to 20), showing a masking tone, a masked tone, and the audible and inaudible regions]

MPEG Encoder Block Diagram [Figure: PCM audio samples (32, 44.1, 48 kHz) → mapping → quantizer and coding → frame packing → encoded bitstream; other blocks shown: psychoacoustic model, ancillary data]

Vo-coding • Concept: Develop a __________ model of the vocal cords & throat • Derive/compute _____ parameters for a short interval and transmit to the decoder • Use the parameters to _______ speech at the decoder • So what is a good model? • A “buzzer” in a “tube”! • The buzzer is characterized by its _________ & _______ • The tube is characterized by its ___________s

Vocoding - Basic Concepts • Formant: frequency maxima & minima in the spectrum of the speech signal • Vocoders code • _____ • Period • _________, and • signaling vocal tract _________ parameters • Voiced sounds: m, v, and l. • Unvoiced sounds: f and s. [Figure: speech spectrum, amplitude (0 to 75) vs. frequency (kHz)]

“Buzzer” and “Tube” Model • Vocoding principles: • voice = _________s + buzz ______ & intensity • voice – estimated ________s = “residue” • Linear Predictive Coding (LPC) • A sample is represented as a linear combination of ___ previous ________s: $y(n) = \sum_{k=1}^{p} a_k \, y(n-k) + G \, x(n)$ [Figure: speech example labeled “yadda yadda yadda”]
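The LPC recurrence y(n) = Σ a_k y(n − k) + G x(n), with k running from 1 to p, can be run directly as a synthesis filter. The sketch below assumes that samples before n = 0 are zero and uses an impulse train as a stand-in for the “buzzer”; the function names and the example coefficients are illustrative, not values from any standard.

```python
def lpc_synthesize(coeffs, gain, excitation):
    """Compute y(n) = sum_k coeffs[k-1] * y(n-k) + gain * x(n)."""
    out = []
    for n, x in enumerate(excitation):
        acc = gain * x
        for k, a_k in enumerate(coeffs, start=1):
            if n - k >= 0:         # samples before the start count as zero
                acc += a_k * out[n - k]
        out.append(acc)
    return out

def impulse_train(length, period):
    """Crude 'buzzer': one unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]

# One coefficient (p = 1): each pulse decays by half per sample.
decay = lpc_synthesize([0.5], 1.0, [1.0, 0.0, 0.0, 0.0])
voiced = lpc_synthesize([0.5], 1.0, impulse_train(6, 3))
```

In this picture the coefficients play the role of the “tube”, and the excitation's period and gain play the role of the “buzzer”, which is why only those few numbers need to be transmitted per interval.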

LPC • Decoder artificially generates speech via _________ synthesis • A mathematical simulation of the _______ as a series of bandpass filters • Encoder codes & transmit filter _______, pitch period, gain factor, & nature of excitation

LPC Schematic

LPC Related Standards • Standards: • Regular Pulse Excited Linear Predictive Coder (RPE-LPC) • Digital cellular standard GSM 06.10 (___ kbps) • Code Excited Linear Predictive Coder (CELP) • US Federal Standard 1016 (_____ kbps) • Uses waveform templates to improve sound quality. • Linear Predictive Coder (LPC) • US Federal Standard 1015 (______ kbps) • Sounds very synthetic; used primarily in military applications with very limited bandwidth.

Networking Concerns • Audio bandwidth is actually quite small. • But human sensitivity to loss and noise is quite ________. • Networking concerns: • _______ concealment • ________ control • Especially for telephony applications.
