Linear Predictive Coding for Speech Compression
Dev Ghosh
ECE 463
9 March 2006
Overview
• General Model for Speech Synthesis
• Channel Vocoder
• Linear Predictive Coder (LPC-10)
• Code Excited Linear Prediction (CELP)
• Novel Application
  • Sub-band adaptive filtering based on cochlear model
Model for Speech Synthesis
• Speech is produced by forcing air through the vocal cords, larynx, pharynx, mouth and nose
• At the transmitter, speech is divided into segments
• Each segment is analyzed to determine the excitation signal and the parameters of the vocal tract filter
[Figure: Excitation source → Vocal tract filter → Speech]
Channel Vocoder - analysis
• Each segment of input speech is analyzed by a bank of (bandpass) analysis filters
• Energy at the output of each filter is estimated 50 times a second and transmitted to the receiver
• Decision is made whether the segment is
  • voiced (e.g. /a/, /e/, /o/) or
  • unvoiced (e.g. /s/, /f/)
• Estimate of the pitch period (period of the fundamental harmonic) is determined
Channel Vocoder - synthesis
• Vocal tract filter is implemented by a bank of (bandpass) synthesis filters
• For voiced segments, a periodic pulse generator is the input
• For unvoiced segments, a pseudonoise source is the input
• Period of the pulse generator is determined by the pitch estimate
• Input is scaled by the output of the energy estimate
• First approach to speech compression
Linear Predictive Coder
• Models the vocal tract as a single linear filter: $y_n = \sum_{i=1}^{M} a_i y_{n-i} + G\epsilon_n$
• Output: $y_n$, input: $\epsilon_n$, gain: $G$, filter order: $M$
• Input is random noise (unvoiced) or a periodic pulse train (voiced)
• LPC-10 is a standard (2.4 kbit/s, 8000 samples/s)
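The filter equation on this slide can be sketched directly as a recursion. This is a minimal illustration, not the LPC-10 decoder: the function names, the zero initial state, and the Gaussian noise source are assumptions made for the example.

```python
import random


def lpc_synthesize(coeffs, gain, excitation):
    """Run y[n] = sum_i a_i * y[n-i] + G * eps[n], taking y[n] = 0 for n < 0."""
    y = []
    for n, eps in enumerate(excitation):
        # coeffs[i] holds a_(i+1), which multiplies y[n - 1 - i].
        past = sum(a * y[n - 1 - i] for i, a in enumerate(coeffs) if n - 1 - i >= 0)
        y.append(past + gain * eps)
    return y


def pulse_train(length, period):
    """Voiced excitation: one unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def white_noise(length, seed=0):
    """Unvoiced excitation: Gaussian noise (an assumed choice of noise source)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]
```

A voiced frame would then be `lpc_synthesize(a, G, pulse_train(180, pitch_period))` and an unvoiced frame `lpc_synthesize(a, G, white_noise(180))`, where the frame length of 180 samples is only an example value.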
LPC - Voiced/Unvoiced Decision
• Voiced speech has more energy and lower frequency than unvoiced speech
• Speech segment is lowpass filtered; energy at the output relative to background noise is used to decide whether the segment is voiced
• Zero crossings are counted to determine frequency
• Continuity criterion: voicing decisions of neighboring frames are taken into account
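The decision rules above might be combined roughly as follows. This is a sketch under stated assumptions: the lowpass filter is omitted, the two thresholds (`energy_ratio`, `max_zc_rate`) are hypothetical values chosen for illustration, and the continuity rule shown (flip an isolated frame that disagrees with both neighbors) is only one simple way to use neighboring decisions.

```python
def frame_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossings(frame):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def is_voiced(frame, noise_energy, energy_ratio=4.0, max_zc_rate=0.25):
    """Voiced if energy is well above the noise floor and the crossing rate is low.

    Both thresholds are hypothetical, not values from the LPC-10 standard.
    """
    zc_rate = zero_crossings(frame) / (len(frame) - 1)
    return frame_energy(frame) > energy_ratio * noise_energy and zc_rate < max_zc_rate


def smooth_decisions(decisions):
    """Continuity rule: flip a frame whose two neighbors agree with each other but not with it."""
    out = list(decisions)
    for i in range(1, len(decisions) - 1):
        if decisions[i - 1] == decisions[i + 1] != decisions[i]:
            out[i] = decisions[i - 1]
    return out
```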
LPC - Estimating Pitch Period
• Extracting pitch from a short, noisy segment is difficult
• One approach is to find the lag that maximizes the autocorrelation
  • Periodicity isn't strong enough to give a clear peak
  • A threshold can't be used because the maximum value is not known in advance
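For reference, the autocorrelation approach can be sketched as a search for the lag with the largest unnormalized autocorrelation. The function names and the lag search range are assumptions for the example; on real speech the peak is much less clear-cut than on a clean periodic test signal, which is the difficulty this slide points out.

```python
def autocorrelation(y, lag):
    """Unnormalized autocorrelation of the segment at the given lag."""
    return sum(y[i] * y[i - lag] for i in range(lag, len(y)))


def pitch_by_autocorrelation(y, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(y, lag))
```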
LPC - Estimating Pitch Period
• LPC-10 uses the average magnitude difference function (AMDF): $\mathrm{AMDF}(P) = \frac{1}{N}\sum_{i} |y_i - y_{i-P}|$, summed over the $N$ samples of the segment
• If $\{y_n\}$ is periodic with period $P_0$, samples $P_0$ apart have values close to each other and the AMDF has a minimum at $P_0$
• AMDF is periodic for voiced segments and roughly flat for unvoiced segments
• AMDF is at a minimum when $P$ is the pitch period; spurious minima in unvoiced segments are shallow
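A minimal sketch of an AMDF pitch search follows. The lag range, the segment start at `max_lag` (so that every `y[i - lag]` is a valid index), and the function names are assumptions for illustration; the candidate lags actually used by LPC-10 are not given on the slide.

```python
def amdf(y, lag, start, length):
    """Average magnitude difference over `length` samples beginning at index `start`."""
    return sum(abs(y[i] - y[i - lag]) for i in range(start, start + length)) / length


def estimate_pitch_period(y, min_lag, max_lag, length):
    """Return the lag in [min_lag, max_lag] that minimizes the AMDF."""
    start = max_lag  # keeps i - lag >= 0 for every candidate lag
    scores = {lag: amdf(y, lag, start, length) for lag in range(min_lag, max_lag + 1)}
    return min(scores, key=scores.get)
```

For an exactly periodic input the AMDF drops to zero at the period, so the minimum is unambiguous; for real voiced speech the minimum is deep but nonzero.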
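LPC - Obtaining the Vocal Tract Filter
• At the transmitter, we want the filter coeffs that best match the segment in a mean squared error sense: $e_n^2 = \left(y_n - \sum_{i=1}^{M} a_i y_{n-i} - G\epsilon_n\right)^2$
• Autocorrelation approach assumes $\{y_n\}$ is stationary, giving $A = R^{-1}P$ ($R$: autocorrelation matrix, $P$: autocorrelation vector, $A$: coefficient vector)
• Recursive solution uses the Levinson-Durbin algorithm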
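The Levinson-Durbin recursion solves $RA = P$ without forming $R^{-1}$. The sketch below assumes the predictor convention $y_n \approx \sum_i a_i y_{n-i}$ used on these slides and a nonzero `r[0]`; note that the sign convention for reflection coefficients differs between references, so the `reflection` values here may be the negatives of those in another text.

```python
def autocorr(y, max_lag):
    """Autocorrelation estimates r[0..max_lag], treating samples outside the segment as zero."""
    n = len(y)
    return [sum(y[i] * y[i - lag] for i in range(lag, n)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for a_1..a_order.

    Returns (coefficients, reflection coefficients, final prediction error).
    """
    a = [0.0] * (order + 1)  # a[0] is unused so that a[i] matches a_i
    reflection = []
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err
        reflection.append(k)
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], reflection, err
```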
LPC - Obtaining the Vocal Tract Filter
• Covariance approach discards the stationarity assumption (not valid for speech signals)
• $c_{ij} = E[y_{n-i}\,y_{n-j}]$ yields $CA = S$
LPC - Obtaining the Vocal Tract Filter
• $c_{ij}$ are estimated as $c_{ij} = \sum_{n} y_{n-i}\,y_{n-j}$, summed over the samples of the segment
• No longer assume values of $y_n$ outside of the segment are zero
• Cholesky decomposition is required to solve for the coeffs
• Reflection coeffs are used to update the voicing decision
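The covariance method can be sketched as building the $c_{ij}$ sums and solving $CA = S$ with a Cholesky factorization. This is an illustrative pure-Python version: the function names and the `start`/`length` segment parameters are assumptions, $S$ is taken to be the vector of $c_{i0}$ values, and the matrix is assumed to be positive definite.

```python
import math


def covariance_terms(y, order, start, length):
    """c[i][j] = sum of y[n-i] * y[n-j] over the segment; requires start >= order."""
    return [[sum(y[n - i] * y[n - j] for n in range(start, start + length))
             for j in range(order + 1)] for i in range(order + 1)]


def cholesky(m):
    """Lower-triangular L with L * L^T = m, for a symmetric positive definite m."""
    size = len(m)
    low = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_cholesky(m, b):
    """Solve m x = b by forward then backward substitution on the Cholesky factor."""
    low = cholesky(m)
    size = len(b)
    z = [0.0] * size
    for i in range(size):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * size
    for i in reversed(range(size)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, size))) / low[i][i]
    return x


def lpc_covariance(y, order, start, length):
    """Filter coefficients a_1..a_order from C A = S with S_i = c[i][0]."""
    c = covariance_terms(y, order, start, length)
    big_c = [row[1:] for row in c[1:]]
    s = [c[i][0] for i in range(1, order + 1)]
    return solve_cholesky(big_c, s)
```

Unlike the autocorrelation method, the sums here reach back to samples before `start`, which is how the method avoids assuming the signal is zero outside the segment.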
LPC - Transmitting Parameters
• Tenth-order filter used for voiced speech and fourth-order for unvoiced speech
• Vocal tract filter is sensitive to errors in reflection coeffs with magnitude close to one
• $g_i = \frac{1+k_i}{1-k_i}$ are quantized and sent instead of $k_i$
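The mapping between $k_i$ and $g_i$ and its inverse are short enough to write out. The function names are hypothetical and no quantizer is shown; the point of the sketch is that equal steps in $k$ near one become much larger steps in $g$, so a quantizer applied to $g$ resolves that region more finely.

```python
def reflection_to_g(k):
    """g = (1 + k) / (1 - k), defined for |k| < 1."""
    return (1.0 + k) / (1.0 - k)


def g_to_reflection(g):
    """Inverse mapping k = (g - 1) / (g + 1), applied at the receiver."""
    return (g - 1.0) / (g + 1.0)
```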
Code Excited Linear Prediction
• Single pulse per pitch period leads to a buzzy twang
• A variety of excitation signals is allowed
• For each segment, the encoder finds the excitation vector that generates synthesized speech that best matches the speech being coded
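The search described in the last bullet is an analysis-by-synthesis loop: each candidate excitation is passed through the vocal tract filter and compared with the target segment. The sketch below is a simplified assumption of how that could look, using a plain squared-error criterion with a per-vector optimal gain; practical CELP coders add perceptual weighting and an adaptive codebook, which are omitted here, and all names are hypothetical.

```python
def synthesize(coeffs, excitation):
    """All-pole synthesis y[n] = eps[n] + sum_i a_i * y[n-i], zero initial state."""
    y = []
    for n, eps in enumerate(excitation):
        y.append(eps + sum(a * y[n - 1 - i] for i, a in enumerate(coeffs) if n - 1 - i >= 0))
    return y


def search_codebook(target, codebook, coeffs):
    """Return (index, gain, squared error) of the best-matching codebook vector.

    Returns None if every candidate synthesizes to an all-zero signal.
    """
    best = None
    for index, vector in enumerate(codebook):
        synth = synthesize(coeffs, vector)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue
        # Least-squares gain for this candidate, then the resulting error.
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best
```

Only the winning index and gain would need to be transmitted, since the receiver holds the same codebook.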
Sub-band adaptive filtering
• Multi-channel speech enhancement system
• The greater the number of sub-bands used, the faster the convergence of the overall system
Cochlear Modelling
• Sub-band filters are distributed logarithmically in frequency to approximate the distribution of filters in the cochlea
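Logarithmic spacing means that adjacent band edges differ by a constant ratio rather than a constant number of hertz. A minimal sketch, assuming hypothetical lower and upper frequency limits and a hypothetical function name:

```python
def log_band_edges(f_low, f_high, bands):
    """Band edges from f_low to f_high with a constant ratio between neighbors."""
    ratio = (f_high / f_low) ** (1.0 / bands)
    return [f_low * ratio ** i for i in range(bands + 1)]
```

For example, four bands between 100 Hz and 1600 Hz give edges near 100, 200, 400, 800 and 1600 Hz, so each band is one octave wide.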
Adaptive Noise Cancellation
• LMS algorithm is used to model the differential transfer function between noise signals in a number of sub-bands
• Lower power and shorter filters are used in each sub-band
• Convergence is equal across all bands if power is distributed equally and filter lengths are the same
• Otherwise, convergence is dominated by the sub-band with the greatest power
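The LMS update for a single band can be sketched as below; a sub-band system would run one such filter per band. This is an illustration under assumptions: the names, the step size, and the `2 * step` scaling of the update are choices made for the example, and the primary input is assumed to be speech plus a filtered copy of the reference noise.

```python
def lms_cancel(primary, reference, taps, step):
    """Adaptive noise cancellation with the LMS algorithm for one band.

    primary:   signal containing noise correlated with `reference`
    reference: noise-only signal
    Returns (error signal, final weights); the error signal is the cleaned output.
    """
    weights = [0.0] * taps
    cleaned = []
    for n in range(len(primary)):
        # Most recent `taps` reference samples, zero before the start of the signal.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        estimate = sum(w * xk for w, xk in zip(weights, x))
        error = primary[n] - estimate
        weights = [w + 2.0 * step * error * xk for w, xk in zip(weights, x)]
        cleaned.append(error)
    return cleaned, weights
```

The step size must be small relative to the input power for the update to remain stable, which is one reason the per-band power and filter length affect convergence speed.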