Speech Recognition Chapter 3. Speech Front-Ends. Linear Prediction Analysis Linear-Prediction Based Processing Cepstral Analysis Auditory signal Processing. Linear Prediction Analysis. Introduction Linear Prediction Model Linear Prediction Coefficients Computation
Speech Front-Ends • Linear Prediction Analysis • Linear-Prediction Based Processing • Cepstral Analysis • Auditory signal Processing
Linear Prediction Analysis • Introduction • Linear Prediction Model • Linear Prediction Coefficients Computation • Linear Prediction for Automatic Speech Recognition • Linear Prediction in Speech Processing • How good is the LP Model.
Signal Processing Front End • Converts the speech waveform into some type of parametric representation. • [Diagram: speech signal s_k → signal processing front end (filterbank, or linear prediction front end producing linear prediction coefficients) → observation sequence O = o(1) o(2) ... o(T)]
Introduction • In short intervals, it provides a good model of the speech signal. • Mathematically precise and simple. • Easy to implement in software or hardware. • Works well for recognition applications. • It also has applications in formant and pitch estimation, speech coding and synthesis.
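Linear Prediction Model • Basic idea: a speech sample is approximated as a linear combination of the previous M samples, $s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \dots + a_M s(n-M)$ • $a_1, \dots, a_M$ are called LP (Linear Prediction) coefficients. • By including the excitation signal, we obtain: $s(n) = \sum_{i=1}^{M} a_i s(n-i) + G u(n)$ • where $u(n)$ is the normalised excitation and $G$ is the gain of the excitation.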
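The model on this slide can be illustrated with a minimal sketch. It assumes the usual form in which each sample is a weighted sum of the previous samples plus a scaled excitation. The function names, the coefficient values and the impulse excitation are hypothetical choices for illustration, not taken from the slides.

```python
def synthesize(a, excitation, gain=1.0):
    """Generate s(n) = sum_i a[i-1] * s(n-i) + gain * u(n), with zero initial state."""
    s = []
    for n, u in enumerate(excitation):
        past = sum(a[i] * s[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        s.append(past + gain * u)
    return s


def prediction_error(a, s):
    """Return e(n) = s(n) - sum_i a[i-1] * s(n-i); samples before n = 0 count as zero."""
    e = []
    for n in range(len(s)):
        pred = sum(a[i] * s[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        e.append(s[n] - pred)
    return e


a = [1.3, -0.4]          # hypothetical stable coefficients (poles at 0.8 and 0.5)
u = [1.0] + [0.0] * 9    # unit impulse used as normalised excitation
s = synthesize(a, u, gain=0.5)
e = prediction_error(a, s)   # with the true coefficients, e(n) equals gain * u(n)
```

When the coefficients used for prediction match those that generated the signal, the prediction error is just the scaled excitation, which is the property the later slides on error signals rely on.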
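In the z-domain (Sec. 1.1.4, p. 15, Deller): $S(z) = \sum_{i=1}^{M} a_i z^{-i} S(z) + G U(z)$ • leading to the transfer function (Fig. 3.27) $H(z) = \frac{S(z)}{G U(z)} = \frac{1}{1 - \sum_{i=1}^{M} a_i z^{-i}} = \frac{1}{A(z)}$, where $a_i$ are the LP coefficients, $U(z)$ is the normalised excitation and $G$ is its gain.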
The LP model retains the spectral magnitude, but it is minimum phase (Sec. 1.1.7, Deller). • However, in practice, phase is not very important for speech perception. • Observation: H(z) also models the glottal filter G(z) and the lip radiation R(z).
Linear Prediction Coefficients Computation • Introduction • Methodologies
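Linear Prediction Coefficients Computation • LP coefficients can be obtained by solving the following system of equations (see Sec. 3.3.2 for the proof): $\sum_{k=1}^{M} a_k \phi(i,k) = \phi(i,0)$, $i = 1, \dots, M$, where $\phi(i,k) = \sum_{n} s(n-i)\, s(n-k)$.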
Methodologies • Autocorrelation Method • Covariance Method (not commonly used in speech recognition)
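Autocorrelation Method • Assumptions: each frame is independent (Fig. 3.29). • Solution (Juang, Sec. 3.3.3, pp. 105-106): $\sum_{k=1}^{M} a_k\, r(|i-k|) = r(i)$, $i = 1, \dots, M$, where $r(k) = \sum_{n=0}^{N-1-k} s(n)\, s(n+k)$ (2) • These equations are known as the Yule-Walker equations.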
Features • Symmetric. • Diagonal elements are the same. • Toeplitz matrix.
This matrix is known as a Toeplitz matrix. A linear system with this matrix can be solved very efficiently. • Examples (Fig. 3.32 and 3.33) • Example (Fig. 3.34) • Example (Fig. 3.35) • Example (Fig. 3.36)
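As a small illustration of the autocorrelation method, the sketch below computes the short-time autocorrelation of one frame and builds the symmetric Toeplitz system whose solution gives the LP coefficients. It assumes the common definition r(k) = sum of s(n) s(n+k) over the frame. The function names and the four-sample frame are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """r(k) = sum_{n=0}^{N-1-k} s(n) * s(n+k) for k = 0..max_lag."""
    N = len(frame)
    return [sum(frame[n] * frame[n + k] for n in range(N - k))
            for k in range(max_lag + 1)]


def toeplitz_system(r, order):
    """Build R[i][k] = r(|i-k|) and the right-hand side r(1..order)."""
    R = [[r[abs(i - k)] for k in range(order)] for i in range(order)]
    rhs = [r[i + 1] for i in range(order)]
    return R, rhs


frame = [1.0, 2.0, 3.0, 4.0]    # toy frame, for illustration only
r = autocorrelation(frame, 2)   # [30.0, 20.0, 11.0]
R, rhs = toeplitz_system(r, 2)  # R = [[30, 20], [20, 30]], rhs = [20, 11]
```

Every diagonal of R holds a single value and R equals its transpose, which is the structure that fast solvers exploit.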
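Linear Prediction for Automatic Speech Recognition • Preemphasis: flattens the spectrum. • Frame blocking. • Windowing: to minimise signal discontinuity. • Autocorrelation analysis: equation (2), usually M=8. • LPC analysis: Durbin algorithm. • LPC parameter conversion: to cepstral coefficients. • Parameter weighting: to minimise noise sensitivity. • Temporal derivative: incorporates signal dynamics.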
Preemphasis • The transfer function of the glottis can be modelled as follows: • The radiation effect can be modelled as follows:
Hence, to obtain the transfer function of the vocal tract, the other pole must be cancelled as follows:
Preemphasis should be done only for sonorant sounds. • This process can be automated by means of the autocorrelation function of the signal.
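The slide's formula for automating preemphasis did not survive extraction. The sketch below is therefore only an assumption: it uses a first-order filter y(n) = s(n) - mu s(n-1) and the widely used adaptive rule mu = r(1)/r(0), which gives strong preemphasis for highly correlated (sonorant) frames and little for noise-like frames. Function names are hypothetical.

```python
def preemphasize(signal, mu):
    """First-order preemphasis y(n) = s(n) - mu * s(n-1), with s(-1) taken as zero."""
    return [signal[n] - (mu * signal[n - 1] if n > 0 else 0.0)
            for n in range(len(signal))]


def adaptive_coefficient(frame):
    """Assumed rule: mu = r(1) / r(0), with r the autocorrelation of the frame."""
    r0 = sum(x * x for x in frame)
    r1 = sum(frame[n] * frame[n + 1] for n in range(len(frame) - 1))
    return r1 / r0 if r0 > 0.0 else 0.0


frame = [1.0, 1.0, 1.0, 1.0]        # strongly correlated toy frame
mu = adaptive_coefficient(frame)    # r(1)/r(0) = 3/4
y = preemphasize(frame, mu)         # [1.0, 0.25, 0.25, 0.25]
```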
Frames of N samples, with a frame shift of M samples.
Minimize signal discontinuities at the edges of the frames. • A typical window is the Hamming window.
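Frame blocking and windowing can be sketched as follows. The sketch assumes frames of N samples taken every M samples and the standard Hamming window w(n) = 0.54 - 0.46 cos(2 pi n / (N - 1)). The function names and the toy signal are hypothetical, and incomplete trailing frames are simply dropped.

```python
import math


def frame_blocks(signal, N, M):
    """Split the signal into frames of N samples; consecutive frames start M samples apart."""
    return [signal[start:start + N]
            for start in range(0, len(signal) - N + 1, M)]


def hamming(N):
    """Hamming window w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1)), n = 0..N-1 (N > 1)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]


def window_frame(frame):
    """Taper one frame so that its edges go smoothly towards small values."""
    return [x * w for x, w in zip(frame, hamming(len(frame)))]


frames = frame_blocks(list(range(10)), N=4, M=3)   # three overlapping frames
w = hamming(5)                                     # edges 0.08, centre 1.0
```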
LPC Analysis • Converts the autocorrelation coefficients into an LPC “parameter set”. • LPC parameter set: • LPC coefficients • Reflection (PARCOR) coefficients • Log area ratio coefficients • The formal method to obtain the LPC parameter set is known as Durbin’s method.
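A minimal sketch of Durbin's recursion is given below. It assumes the predictor convention s(n) ≈ sum of a_i s(n-i) and takes the autocorrelations r(0..order) as input. The function name and the example values are hypothetical, and r(0) is assumed to be positive.

```python
def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations from autocorrelations r[0..order].

    Returns (a, k, E): predictor coefficients a[0..order-1], reflection
    (PARCOR) coefficients k[0..order-1], and the final prediction error E.
    """
    a = []
    k_list = []
    E = r[0]                                   # zeroth-order error energy
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / E                            # reflection coefficient of stage i
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        k_list.append(k)
        E *= (1.0 - k * k)                     # error shrinks at every stage
    return a, k_list, E


a, k, E = levinson_durbin([30.0, 20.0, 11.0], order=2)
# a is approximately [0.76, -0.14]; E is approximately 16.34
```

The recursion costs on the order of M squared operations instead of the M cubed of a general linear solver, and the reflection coefficients come out as a by-product.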
LPC Parameter Conversion • Conversion to cepstral coefficients. • Robust feature set for speech recognition. • Algorithm:
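The conversion algorithm itself is missing from the extracted slide. A sketch of the standard recursion is given here, assuming the all-pole model 1 / (1 - sum of a_k z^-k): c(m) = a(m) + sum over k of (k/m) c(k) a(m-k), with a(m) taken as zero beyond the model order. The function name is hypothetical and the gain term c(0) is not computed.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c(1..n_ceps) of the model 1 / (1 - sum_k a[k-1] z^-k)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)            # c[0] (log-gain term) is left unused here
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c[1:]


# Single pole at 0.5: the exact cepstrum is 0.5**m / m.
c = lpc_to_cepstrum([0.5], 3)
```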
Parameter weighting • Low-order cepstral coefficients are highly sensitive to noise.
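The slide does not state which weighting function is used. One common choice, assumed here purely for illustration, is the raised-sine lifter w(m) = 1 + (Q/2) sin(pi m / Q) for m = 1..Q, which de-emphasises the coefficients at both ends of the cepstral vector. Function names are hypothetical.

```python
import math


def lifter_weights(Q):
    """Raised-sine weights w(m) = 1 + (Q/2) * sin(pi * m / Q), m = 1..Q (assumed form)."""
    return [1.0 + (Q / 2.0) * math.sin(math.pi * m / Q) for m in range(1, Q + 1)]


def weight_cepstrum(c):
    """Apply the lifter to one cepstral vector c(1..Q)."""
    return [cm * wm for cm, wm in zip(c, lifter_weights(len(c)))]


w = lifter_weights(12)   # peaks at m = 6 with value 7, falls back to about 1 at m = 12
```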
Temporal Cepstral Derivative • First- or second-order derivatives are enough. • It can be approximated as follows:
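The approximation formula is missing from the extracted slide. The sketch below assumes the usual regression form, in which the derivative at frame t is the sum over k = -K..K of k c(t+k), normalised by the sum of k squared. The window half-width K, the edge handling (repeating the first and last frames) and the function name are assumptions.

```python
def delta_cepstrum(frames, K=2):
    """Approximate the time derivative of each cepstral coefficient.

    frames: list of cepstral vectors, one per frame. Returns one delta vector per frame.
    """
    T = len(frames)
    norm = sum(k * k for k in range(-K, K + 1))
    deltas = []
    for t in range(T):
        d = [0.0] * len(frames[t])
        for k in range(-K, K + 1):
            tt = min(max(t + k, 0), T - 1)      # repeat edge frames at the boundaries
            for m, value in enumerate(frames[tt]):
                d[m] += k * value
        deltas.append([x / norm for x in d])
    return deltas


ramp = [[float(t)] for t in range(6)]   # a coefficient that grows by 1 per frame
deltas = delta_cepstrum(ramp, K=2)      # interior frames give a slope of 1.0
```

Applying the same function to the delta vectors gives a second-order derivative.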
Hamming windowed: large prediction errors, since speech is predicted from previous samples arbitrarily set to zero.
Unvoiced signals are not position sensitive. They do not show special effects at the frame edges.
Observe the “whitening” phenomenon in the error spectrum.
Observe the periodicity of the error waveform, a behaviour taken as the basis for pitch estimators.
• Observe that a sharp decrease in the prediction error is obtained for small values of M (M=1...4). • Observe that the unvoiced signal has a higher RMS error.
Observe the ability of the all-pole model to match the spectrum.
Linear Prediction in Speech Processing • LPC for Vocal Tract Shape Estimation • LPC for Pitch Detection • LPC for Formant prediction
LPC for Vocal Tract Shape Estimation • [Block diagram; recoverable labels: free of glottis and radiation effects; to minimise signal discontinuity; parameter calculation; vocal tract shape estimation; to minimise noise sensitivity; to cepstral coefficients]
Parameter Calculation • Durbin’s Method (as in speech recognition) • If this method is used, the autocorrelation analysis must be performed first. • Lattice Filter
Lattice Filter • The reflection coefficients are obtained directly from the signal, avoiding the autocorrelation analysis. • Methods: • Itakura-Saito (PARCOR) • Burg • New forms • Advantage: • Easier to implement in hardware. • Disadvantage: • Needs around 5 times more computation.
Itakura-Saito (PARCOR) • The sums in the coefficient formula accumulate over time (n). • It can be shown that the PARCOR coefficients obtained by the Itakura-Saito method are exactly the same as the reflection coefficients obtained by the Levinson-Durbin algorithm. • Example
Burg where Example
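Burg's formula is missing from the extracted slide. The following sketch assumes the standard form, in which each reflection coefficient is k_m = 2 sum f(n) b(n-1) / sum (f(n)^2 + b(n-1)^2), with f and b the forward and backward prediction errors of the previous lattice stage. The sign convention matches the predictor s(n) ≈ sum of a_i s(n-i), and the function name and test signal are hypothetical.

```python
import math


def burg(signal, order):
    """Estimate reflection coefficients directly from the signal with a lattice.

    Returns (a, ks): predictor coefficients and reflection coefficients.
    """
    N = len(signal)
    f = list(signal)            # forward error of the previous stage, f(n)
    b = list(signal)            # backward error of the previous stage, b(n)
    a, ks = [], []
    for m in range(1, order + 1):
        num = den = 0.0
        for n in range(m, N):
            num += f[n] * b[n - 1]
            den += f[n] * f[n] + b[n - 1] * b[n - 1]
        k = 2.0 * num / den if den > 0.0 else 0.0
        new_f, new_b = f[:], b[:]
        for n in range(m, N):   # lattice update; entries below n = m are not used again
            new_f[n] = f[n] - k * b[n - 1]
            new_b[n] = b[n - 1] - k * f[n]
        f, b = new_f, new_b
        a = [a[j] - k * a[m - 2 - j] for j in range(m - 1)] + [k]
        ks.append(k)
    return a, ks


omega = 2.0 * math.pi / 16.0
tone = [math.cos(omega * n) for n in range(64)]
a, ks = burg(tone, order=2)     # a[0] is close to 2*cos(omega), ks[1] is close to -1
```

By construction each reflection coefficient has magnitude at most 1, so the resulting all-pole filter is stable without any autocorrelation analysis or windowing.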
Example Itakura-Saito Burg
New Forms • Strobach, New forms of Levinson and Schur algorithms, IEEE Signal Processing Magazine, pp. 12-36, 1991.
Vocal Tract Shape Estimation • From: • We obtain: • Therefore, by setting the lip area to an arbitrary value, we can obtain the vocal tract configuration relative to the initial condition. • This technique has been successfully used to train deaf persons.
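The relation between reflection coefficients and tube areas is missing from the extracted slide, and sign conventions differ between textbooks. The sketch below assumes the convention in which the log area ratio is log((1 - k_m) / (1 + k_m)), so that consecutive section areas satisfy A(m+1) / A(m) = (1 - k_m) / (1 + k_m), and it assumes the first section is the lips. Function names and the example coefficients are hypothetical.

```python
import math


def log_area_ratios(k):
    """g(m) = log((1 - k_m) / (1 + k_m)) for each reflection coefficient, |k_m| < 1."""
    return [math.log((1.0 - km) / (1.0 + km)) for km in k]


def relative_areas(k, lip_area=1.0):
    """Section areas relative to an arbitrary lip area, under the assumed convention."""
    areas = [lip_area]
    for km in k:
        areas.append(areas[-1] * (1.0 - km) / (1.0 + km))
    return areas


areas = relative_areas([0.0, 0.5, -0.5], lip_area=1.0)   # [1.0, 1.0, 1/3, 1.0]
g = log_area_ratios([0.0, 0.5])                          # [0.0, log(1/3)]
```

Only area ratios are determined by the reflection coefficients, which is why one area has to be fixed arbitrarily.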
LPC for Pitch Detection • [Block diagram: speech sampled at 10 kHz → LPF 800 Hz → downsampler 5:1 → LPC analysis → inverse filtering A(z) → autocorrelation → peak finding → V/U decision or pitch]
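Only the last stages of this chain (autocorrelation, peak finding and the voiced/unvoiced decision) are sketched below, applied to an inverse-filtered signal that is assumed to be already available. The search range, the decision threshold and the function name are hypothetical values chosen for illustration.

```python
def pitch_from_residual(residual, fs, fmin=50.0, fmax=400.0, threshold=0.4):
    """Return the pitch in Hz, or None if the frame is judged unvoiced.

    residual: inverse-filtered (prediction error) samples at sampling rate fs.
    """
    r0 = sum(x * x for x in residual)
    if r0 <= 0.0:
        return None
    lo = max(1, int(fs / fmax))                       # shortest pitch period in samples
    hi = min(len(residual) - 1, int(fs / fmin))       # longest pitch period in samples
    best_lag, best_val = None, 0.0
    for lag in range(lo, hi + 1):
        r = sum(residual[n] * residual[n + lag] for n in range(len(residual) - lag))
        if r / r0 > best_val:                         # keep the highest normalised peak
            best_lag, best_val = lag, r / r0
    if best_lag is None or best_val < threshold:
        return None                                   # weak peak: treat as unvoiced
    return fs / best_lag


# Pulse train with a 20-sample period at 2 kHz (10 kHz downsampled 5:1).
pulses = [1.0 if n % 20 == 0 else 0.0 for n in range(200)]
f0 = pitch_from_residual(pulses, fs=2000.0)           # 100.0
silence = pitch_from_residual([0.0] * 200, fs=2000.0) # None
```

Working on the downsampled residual keeps the autocorrelation cheap, and the residual's flat spectrum makes the pitch peak stand out from formant structure.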