Audio Coding
Digitization Processing • [Figure: analog signal → signal encoder (sampling, quantization) → digital data (storage) → signal decoder → analog signal]
Overview of Today Sampling Techniques • PCM • Linear • μ-Law • DPCM • ADPCM • MPEG-1 • Vocoding Generic Coding Techniques Psychoacoustic Coding Speech Specific Techniques
Encoder Design • Bandlimiting filter • Smooths analog signals • Analog-to-digital converter (ADC) • Samples and quantizes analog signals
Bandlimiting filter • Passes only frequency components up to half the sampling rate (the Nyquist frequency).
Sampling • Pulse Amplitude Modulation (PAM) • Each sample’s amplitude is represented by 1 analog value • Sampling theory (Nyquist) • If input signal has maximum frequency (bandwidth) f, sampling frequency must be at least 2f • With a low-pass filter to interpolate between samples, the input signal can be fully reconstructed
PCM • Pulse Code Modulation (PCM) • Each sample’s amplitude represented by an integer code-word • Each bit of resolution adds 6 dB of dynamic range: $n = (\mathrm{SNR} - 4.77) / 6.02$ bits for a target SNR in dB • Number of bits required depends on the amount of noise that is tolerated • [Figure: sampled waveform on a 4-bit quantization grid (levels 0100 … 0000, 1001 … 1100), showing quantization error (“noise”)]
Linear PCM • Quantization levels are uniformly spaced. • 16-bit samples provide plenty of dynamic range. • Compact Discs do this.
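The roughly 6 dB per bit rule for uniformly spaced levels can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names, the test tone and the choice of mean signal power are all assumptions made for illustration. The constant offset in the SNR formula depends on how signal power is defined (peak versus mean), so the sketch only looks at the slope per bit.

```python
import math

def quantize_linear(x, bits):
    """Map x in [-1, 1] to a signed integer code-word with uniform steps."""
    levels = 2 ** (bits - 1) - 1           # e.g. 32767 for 16 bits
    x = max(-1.0, min(1.0, x))             # clip to the valid input range
    return round(x * levels)

def dequantize_linear(code, bits):
    """Map an integer code-word back to an amplitude in [-1, 1]."""
    return code / (2 ** (bits - 1) - 1)

def measured_snr_db(bits, n=10000, freq=0.01234):
    """SNR (mean signal power / mean error power) of a quantized full-scale sine.

    freq is in cycles per sample; the value is an arbitrary assumption.
    """
    signal = [math.sin(2 * math.pi * freq * i) for i in range(n)]
    decoded = [dequantize_linear(quantize_linear(s, bits), bits) for s in signal]
    p_signal = sum(s * s for s in signal) / n
    p_noise = sum((s - d) ** 2 for s, d in zip(signal, decoded)) / n
    return 10 * math.log10(p_signal / p_noise)

for bits in (8, 12, 16):
    print(bits, "bits:", round(measured_snr_db(bits), 1), "dB")
```

Going from 8 to 16 bits should raise the measured SNR by about 48 dB, which is where the 16-bit "plenty of dynamic range" claim comes from.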
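Undersampling • Sample rate is below the Nyquist rate • Frequency components above half the sample rate fold back (alias) to lower frequencies, are added to the original signal and cause distortion • The low-pass (bandlimiting) filter is therefore also called an antialiasing filter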
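The folding effect can be made concrete with a small sketch. The helper names and the 8 kHz / 5 kHz figures are assumptions chosen for illustration: a 5 kHz cosine sampled at 8 kHz produces exactly the same samples as a 3 kHz cosine, so after sampling the two cannot be told apart.

```python
import math

def alias_frequency(f_signal, f_sample):
    """Apparent frequency of a sinusoid of f_signal Hz after sampling at f_sample Hz."""
    f = f_signal % f_sample                # sampling cannot tell f from f + k * f_sample
    return min(f, f_sample - f)            # fold into the range [0, f_sample / 2]

def sample_cosine(freq, f_sample, count):
    """Return `count` samples of a unit cosine at `freq` Hz."""
    return [math.cos(2 * math.pi * freq * n / f_sample) for n in range(count)]

print(alias_frequency(5000, 8000))         # 3000

above = sample_cosine(5000, 8000, 16)      # above half the sample rate
below = sample_cosine(3000, 8000, 16)      # its alias
print(max(abs(a - b) for a, b in zip(above, below)))  # close to zero
```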
μ-Law companding (ITU Rec. G.711) • Non-linear quantization of the signal’s amplitude • Quantization step-size decreases logarithmically with signal level • Low-amplitude samples represented with greater accuracy than high-amplitude samples • Humans are less sensitive to changes in “loud” sounds than “quiet” sounds
μ-Law companding • Provides 14-bit quality (dynamic range) with an 8-bit encoding • Used in North American & Japanese ISDN voice service • Simple to compute encoding (x normalized to [-1, 1]): $f(x) = 127 \times \operatorname{sign}(x) \times \frac{\ln(1 + \mu |x|)}{\ln(1 + \mu)}$
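The companding curve can be tried out directly. This is a sketch of the continuous formula only: the value μ = 255 and the function names are assumptions, and real G.711 codecs use a piecewise-linear segment approximation with a specific bit layout rather than this direct computation.

```python
import math

MU = 255  # assumption: the value commonly quoted for G.711 mu-law

def mulaw_compress(x, mu=MU):
    """f(x) = 127 * sign(x) * ln(1 + mu|x|) / ln(1 + mu), for x in [-1, 1]."""
    sign = -1 if x < 0 else 1
    return round(127 * sign * math.log(1 + mu * abs(x)) / math.log(1 + mu))

def mulaw_expand(code, mu=MU):
    """Inverse mapping from a code in [-127, 127] back to an amplitude."""
    sign = -1 if code < 0 else 1
    return sign * ((1 + mu) ** (abs(code) / 127) - 1) / mu

for x in (0.01, 0.1, 0.9):
    code = mulaw_compress(x)
    print(x, code, abs(mulaw_expand(code) - x))   # round-trip error grows with |x|
```

The round-trip error is much smaller for quiet samples than for loud ones, which is the point of spending the 8 bits non-linearly.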
Difference Encoding • Differential PCM (DPCM) • Exploit temporal redundancy in samples • Difference between 2 x-bit samples can be represented with significantly fewer than x bits • Transmit the difference (rather than the sample) • [Figure: sampled waveform on a 4-bit quantization grid (levels 0100 … 0000, 1001 … 1100)]
DPCM Working Principle • [Figure: each sample is encoded as its difference from the previous sampling value]
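A minimal DPCM sketch, assuming integer PCM samples and a 4-bit difference field (both assumptions for illustration; the function names are hypothetical). The encoder tracks the value the decoder will reconstruct, so clamping errors do not accumulate silently.

```python
def dpcm_encode(samples, diff_bits=4):
    """Encode integer PCM samples as clamped differences from the previous decoded value."""
    lo, hi = -(2 ** (diff_bits - 1)), 2 ** (diff_bits - 1) - 1
    previous = 0
    diffs = []
    for s in samples:
        d = max(lo, min(hi, s - previous))  # clamp to what diff_bits can hold
        diffs.append(d)
        previous += d                       # what the decoder will reconstruct
    return diffs

def dpcm_decode(diffs):
    """Rebuild samples by accumulating the transmitted differences."""
    previous = 0
    out = []
    for d in diffs:
        previous += d
        out.append(previous)
    return out

slow = [0, 3, 6, 8, 9, 8, 5, 1]            # small sample-to-sample changes
fast = [0, 100, 0, 100]                    # changes too large for 4 bits

print(dpcm_decode(dpcm_encode(slow)))      # reproduces `slow` exactly
print(dpcm_decode(dpcm_encode(fast)))      # [0, 7, 0, 7]: cannot keep up
```

The second case shows what happens when consecutive samples differ by more than the difference field can express: the decoded signal lags far behind the input.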
Slope Overload Problem • Differences in high-frequency signals near the Nyquist frequency cannot be represented with a smaller number of bits! • Error introduced leads to severe distortion in the higher frequencies • [Figure: “slope overload” shown on a 4-bit quantization grid (levels 0100 … 0000, 1001 … 1100)]
Adaptive DPCM (ADPCM) • Use a larger step-size to encode differences between high-frequency samples & a smaller step-size for differences between low-frequency samples • Use previous sample values to estimate changes in the signal in the near future
ADPCM • To ensure differences are always small... • Adaptively change the step-size (quanta) • (Adaptively) attempt to predict next sample value • [Block diagram: the predicted PCM sample n+1 is subtracted from the y-bit PCM sample; the Difference Quantizer turns the result into the x-bit ADPCM “difference”; a Step-Size Adjuster, Dequantizer and Predictor form the feedback loop]
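The two ideas (adapt the step-size, predict the next sample) can be sketched as follows. This is a toy under stated assumptions, not IMA ADPCM or G.726: the predictor is simply the previous reconstructed sample, and the adaptation rule in `adapt` (thresholds 6 and 1, doubling and halving, limits 1 and 4096) is hypothetical.

```python
def adapt(step, code):
    """Hypothetical rule: grow the step after large codes, shrink it after small ones."""
    if abs(code) >= 6:
        step *= 2.0
    elif abs(code) <= 1:
        step *= 0.5
    return max(1.0, min(step, 4096.0))

def adpcm_encode(samples, bits=4):
    """Quantize (sample - prediction) / step into a signed code of `bits` bits."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    predicted, step, codes = 0.0, 1.0, []
    for s in samples:
        code = max(lo, min(hi, round((s - predicted) / step)))
        codes.append(code)
        predicted += code * step           # dequantize exactly as the decoder will
        step = adapt(step, code)           # same rule runs on both sides
    return codes

def adpcm_decode(codes):
    """Mirror of the encoder's feedback loop; needs only the codes."""
    predicted, step, out = 0.0, 1.0, []
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = adapt(step, code)
    return out

jump = [0] + [100] * 11                    # a sudden change in level
codes = adpcm_encode(jump)
print(codes)
print(adpcm_decode(codes))
```

Because the step-size is derived from the codes already sent, the decoder can repeat the adaptation without any side information. With a fixed step of 1 and the same 4-bit codes, reaching 100 would take about 15 samples; here the growing step closes the gap within a few.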
Psychoacoustic Fundamentals • Absolute threshold of hearing • Critical band frequency analysis • Frequency masking • Temporal masking
Absolute Threshold of Hearing • Human perception of sound is a function of frequency and signal strength • (MPEG exploits this relationship.) • Sampled segments of the source audio waveform are analyzed, but only those features perceptible to the ear are transmitted. • A psychoacoustic model is used to identify frequency masking and temporal masking and to eliminate the masked components from the transmitted signal. • [Figure: sound level (dB, 0 to 100) versus frequency (kHz, 0.02 to 20); the threshold curve separates the audible region from the inaudible one and marks the maximum allowable energy level for coding distortion]
Auditory Masking • The presence of tones at certain frequencies makes us unable to perceive tones at other “nearby” frequencies • Humans cannot distinguish between tones within _____ Hz at low frequencies and _____ kHz at high frequencies • [Figure: sound level (dB, 0 to 100) versus frequency (kHz, 0.02 to 20); a masking tone pushes a nearby masked tone into the inaudible region]
MPEG Encoder Block Diagram • [Block diagram: PCM audio samples (32, 44.1, 48 kHz) → Mapping → Quantizer and Coding → Frame Packing → encoded bitstream; a Psychoacoustic Model controls the Quantizer and Coding stage; ancillary data enters at Frame Packing]
Vocoding • Concept: Develop a mathematical model of the vocal cords & throat • Derive/compute model parameters for a short interval and transmit to the decoder • Use the parameters to synthesize speech at the decoder • So what is a good model? • A “buzzer” in a “tube”! • The buzzer is characterized by its intensity & pitch • The tube is characterized by its formants
Vocoding - Basic Concepts • Formant: frequency maxima & minima in the spectrum of the speech signal • Vocoders code • _____ • Period • _________, and • signaling vocal tract _________ parameters • Voiced sounds: m, v, and l • Unvoiced sounds: f and s • [Figure: speech spectrum, amplitude (0 to 75) versus frequency (kHz)]
“Buzzer” and “Tube” Model • Vocoding principles: • voice = formants + buzz pitch & intensity • voice – estimated formants = “residue” • Linear Predictive Coding (LPC) • A sample is represented as a linear combination of p previous samples: $y(n) = \sum_{k=1}^{p} a_k\, y(n-k) + G\, x(n)$, where G is the gain factor and x(n) the excitation
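The LPC recurrence translates almost directly into code. The sketch below is illustrative only: the coefficient values, the pulse period and all function names are assumptions, and it does not show how an encoder would estimate the a_k from real speech. It runs a "buzzer" (impulse train) through the "tube" (the recursive filter), then recovers the residue by subtracting the predicted part.

```python
def lpc_synthesize(coeffs, gain, excitation):
    """y(n) = sum_{k=1..p} a_k * y(n - k) + G * x(n), with y(n) = 0 for n < 0."""
    p = len(coeffs)
    y = []
    for n, x in enumerate(excitation):
        acc = gain * x
        for k in range(1, p + 1):
            if n - k >= 0:
                acc += coeffs[k - 1] * y[n - k]
        y.append(acc)
    return y

def lpc_residual(coeffs, y):
    """Residue: each sample minus its prediction from the p previous samples."""
    out = []
    for n in range(len(y)):
        predicted = sum(a * y[n - k]
                        for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(y[n] - predicted)
    return out

def impulse_train(length, period):
    """A crude 'buzzer': one unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]

coeffs = [1.2, -0.8]                       # hypothetical a_1, a_2 (a stable filter)
buzz = impulse_train(40, 10)
voice = lpc_synthesize(coeffs, 2.0, buzz)
residue = lpc_residual(coeffs, voice)      # approximately 2.0 * buzz
print([round(r, 6) for r in residue[:12]])
```

When the coefficients match the filter that produced the signal, the residue reduces to the scaled excitation, which is why only the coefficients, gain and excitation description need to be transmitted.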
LPC • Decoder artificially generates speech via formant synthesis • A mathematical simulation of the vocal tract as a series of bandpass filters • Encoder codes & transmits filter coefficients, pitch period, gain factor, & nature of excitation
LPC Related Standards • Regular Pulse Excited Linear Predictive Coder (RPE-LPC) • Digital cellular standard GSM 06.10 (13 kbps) • Code Excited Linear Predictive Coder (CELP) • US Federal Standard 1016 (4.8 kbps) • Waveform-template based to improve sound quality • Linear Predictive Coder (LPC) • US Federal Standard 1015 (2.4 kbps) • Sounds very synthetic; used primarily in military applications with very limited bandwidth
Networking Concerns • Audio bandwidth is actually quite small. • But human sensitivity to loss and noise is quite high. • Networking concerns: • _______ concealment • ________ control • Especially for telephony applications.