Speech Recognition in Adverse Environments • Juan Arturo Nolazco-Flores • Dpto. de Ciencias Computacionales, ITESM, campus Monterrey
Talk Overview • Introduction • Parallel Model Combination (PMC) • SS-PMC • Comments and Conclusions
Introduction • Problem: • Automatic Speech Recognition (ASR) performance is highly degraded when speech is corrupted by noise (additive noise, convolutional noise, etc.). • Fact: • To be usable in real conditions, speech recognisers must tackle this problem. • Knowledge: • ASR can be improved by either: • Enhancing the speech before recognition. • Training the models in the same environment in which the ASR is going to be used.
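Recognition using a CD-HMM Recogniser • A model is needed for each unit of recognition (M1, M2, ..., MQ). • The input data is scored against every model, giving the probability of each model. • The model with the highest probability gives the recognised word.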
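The recognition rule on this slide (score the input against every model, pick the most probable) can be sketched as below. This is a minimal illustration, not the recogniser used in the talk: the two-state toy models, their parameters, and the names `toy_model` and `recognise` are hypothetical, and each state is assumed to have a single diagonal-covariance Gaussian.

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at one frame x."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def hmm_log_likelihood(frames, log_pi, log_A, means, variances):
    """Forward algorithm in the log domain: returns log P(frames | model)."""
    n_states = len(log_pi)
    # log_b[t, s] = log output density of state s for frame t
    log_b = np.array([[log_gauss_diag(x, means[s], variances[s])
                       for s in range(n_states)] for x in frames])
    alpha = log_pi + log_b[0]
    for t in range(1, len(frames)):
        # alpha_t(j) = logsum_i(alpha_{t-1}(i) + log A[i, j]) + log b_j(x_t)
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_b[t]
    return np.logaddexp.reduce(alpha)

def recognise(frames, models):
    """Score the frames against every model and return the best word."""
    scores = {word: hmm_log_likelihood(frames, *params)
              for word, params in models.items()}
    return max(scores, key=scores.get), scores

def toy_model(level):
    """Hypothetical 2-state model whose state means sit around `level`."""
    log_pi = np.log(np.array([0.9, 0.1]))
    log_A = np.log(np.array([[0.7, 0.3], [0.1, 0.9]]))
    means = np.array([[level], [level + 1.0]])
    variances = np.ones((2, 1))
    return log_pi, log_A, means, variances

models = {"one": toy_model(0.0), "two": toy_model(5.0)}
frames = np.array([[5.1], [4.9], [6.2], [5.8]])  # 4 frames, 1 feature each
word, scores = recognise(frames, models)
```

Working in the log domain avoids numerical underflow when the utterance has many frames.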
Enhancing Speech • Features: • Models are trained with clean speech. • Corrupted speech is enhanced before recognition. • There are a number of well-studied techniques: • Subtracting a noise estimate obtained during non-speech activity. • Adaptive noise cancelling (ANC). • Successful for low to medium SNR (> 0 dB).
Problems: • Enhancers are not perfect, therefore: • the speech is distorted, and • there is residual noise.
Training models in the same environment • ASR systems which use this technique can deal with low to high SNR (> 0 dB). • For example, for an isolated digit recognition task where the digits are corrupted by helicopter (Lynx) noise, you can get the following performance: [results table not preserved] • For TIMIT: [results not preserved] • Problem: • There are many possible environments, so training for each one is not practical.
However, using continuous HMMs it is possible to combine the clean speech model and the noise model to obtain a noisy speech model. • Techniques: • Model Decomposition • Parallel Model Combination
Parallel Model Combination (PMC) • Introduction • Scheme • Diagram
Introduction • PMC is an artificial way to simulate training the system in the adverse environment in which it is going to work. • The clean speech CHMM and the noise CHMM (estimated from the noise before the word is uttered) are combined to obtain models adapted to the adverse environment. • The combination is based on the assumption that the pdfs of the state distributions are completely defined by their means and variances.
Scheme • For simplicity, it is convenient to combine these models in a linear domain. • Problem: • High performance speech recognition is obtained in a non-linear domain (i.e. mel-cepstral domain). • Solution: • Transform coefficients to a linear domain.
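Diagram • Clean speech HMM: C->log, then exp(), into the linear domain. • Noise HMM: C->log, then exp(), into the linear domain. • The two models are added (+) in the linear domain. • log(), then C(), map the result back, giving the PMC HMM. • Simulates training in noise.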
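The chain in the diagram (cepstrum to log-spectrum, exp() to the linear domain, addition, then log() and C() back) can be sketched for a single Gaussian state as follows. This is an illustration under stated assumptions, not the talk's implementation: it uses the log-normal approximation for the exp()/log() mappings, static features only, a square orthonormal DCT as the transform C, and a hypothetical `gain` term for the speech level.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (square, so its inverse is its transpose)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def log_to_linear(mu_log, cov_log):
    """exp(): Gaussian in the log domain -> log-normal moments (linear domain)."""
    mu_lin = np.exp(mu_log + 0.5 * np.diag(cov_log))
    cov_lin = np.outer(mu_lin, mu_lin) * (np.exp(cov_log) - 1.0)
    return mu_lin, cov_lin

def linear_to_log(mu_lin, cov_lin):
    """log(): linear-domain moments -> Gaussian in the log domain."""
    cov_log = np.log(cov_lin / np.outer(mu_lin, mu_lin) + 1.0)
    mu_log = np.log(mu_lin) - 0.5 * np.diag(cov_log)
    return mu_log, cov_log

def pmc_combine(speech, noise, gain=1.0):
    """Combine one speech state and one noise state.

    speech, noise: (cepstral mean, cepstral covariance) pairs.
    Returns the cepstral mean and covariance of the noisy-speech state.
    """
    n = len(speech[0])
    C = dct_matrix(n)
    C_inv = C.T
    linear = []
    for mu_c, cov_c in (speech, noise):
        # C -> log: undo the cosine transform, then exp() to the linear domain
        mu_l, cov_l = C_inv @ mu_c, C_inv @ cov_c @ C_inv.T
        linear.append(log_to_linear(mu_l, cov_l))
    (mu_s, cov_s), (mu_n, cov_n) = linear
    # Additive noise: the models are summed in the linear domain
    mu = gain * mu_s + mu_n
    cov = gain ** 2 * cov_s + cov_n
    # log() and C(): back to the cepstral domain
    mu_l, cov_l = linear_to_log(mu, cov)
    return C @ mu_l, C @ cov_l @ C.T
```

In a full system this would be applied to every state (and mixture component) of every speech model, with the noise model estimated from the signal before the word is uttered.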
SS-PMC • Introduction • Hypothesis Proof • SS Combination Development • Diagram • Results
Introduction • How can we improve recognition performance in highly adverse environments (SNR < 0 dB)? • PMC is not a solution for highly adverse environments (upper-bound conditions).
On the other hand, we know that the enhancer returns "cleaner", but distorted, speech. • Therefore the question is: • Is it possible to improve recognition performance if the models are trained with this "cleaner" speech?
Hypothesis • Training HMMs with enhanced speech makes the HMM learn both the speech distortion and the residual noise. • If we show that this hypothesis is true, we can be confident that indeed we can improve recognition performance.
In order to prove this hypothesis: • An enhancement scheme was selected. • Models were trained with the enhanced speech. • Recognition was performed under the same conditions. • The recognition performance obtained in this experiment is compared with the recognition performance obtained when the models were trained in the same environment.
Hypothesis Proof • Introduction • Spectral Subtraction definition • Experiments and results • Conclusions
Introduction • Since it is a simple (and successful) scheme, Spectral Subtraction (SS) was selected.
Spectral Subtraction Definition • Before filterbank • After filterbank.
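The slide's equations are not preserved, so the following is only a generic sketch of power spectral subtraction, not the talk's exact definition. It assumes an over-subtraction factor `alpha` and a spectral floor `beta` (the floor is an assumption added here to keep the result positive), and a noise estimate averaged over non-speech frames. The same function can be applied to FFT bins (before the filterbank) or to filterbank outputs (after the filterbank).

```python
import numpy as np

def spectral_subtraction(power_spec, noise_frames, alpha=1.0, beta=0.01):
    """Subtract an estimated noise power spectrum from each frame.

    power_spec:   array (frames, bins) of noisy-speech power values.
    noise_frames: array (frames, bins) taken from non-speech activity.
    alpha:        over-subtraction factor (assumed parameter).
    beta:         floor as a fraction of the noisy power (assumed parameter).
    """
    noise_est = noise_frames.mean(axis=0)      # average noise power per bin
    cleaned = power_spec - alpha * noise_est   # subtraction step
    floor = beta * power_spec                  # prevents negative power
    return np.maximum(cleaned, floor)

# Toy data: constant noise power of 2 in every bin
noise_frames = np.full((5, 4), 2.0)
noisy = np.array([[10.0, 10.0, 10.0, 10.0],   # speech plus noise
                  [2.0, 2.0, 2.0, 2.0]])      # noise only
enhanced = spectral_subtraction(noisy, noise_frames, alpha=1.0, beta=0.1)
```

The floor is one source of the residual noise and distortion that models trained on enhanced speech would have to learn.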
Experiments and Results • CHMMs were trained with speech enhanced by SS. • Recognition was performed on speech enhanced by SS under the same conditions.
Example 1 • Task: isolated digit recognition • Training: using enhanced speech • Noise: helicopter • Database: NOISEX-92 • Real noise is artificially added to clean speech, so that no Lombard effect can bias recognition performance.
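Adding recorded noise to clean speech at a chosen SNR can be sketched as follows. This is a minimal illustration, assuming the SNR is defined on average power over the whole utterance; the function name and the looping of short noise recordings are assumptions, not details from the talk.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + noise has the requested SNR in dB.

    Returns the noisy signal and the scaled noise that was added.
    """
    noise = np.resize(noise, clean.shape)  # repeat or cut noise to fit
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve 10*log10(p_clean / (scale**2 * p_noise)) = snr_db for scale
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    scaled_noise = scale * noise
    return clean + scaled_noise, scaled_noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 0.01 * np.arange(1000))  # stand-in for speech
noise = rng.standard_normal(300)                    # stand-in for real noise
noisy, scaled_noise = add_noise_at_snr(clean, noise, snr_db=-5.0)
measured_snr = 10 * np.log10(np.mean(clean ** 2) / np.mean(scaled_noise ** 2))
```

Because the noise is added after recording, the speaker never hears it, which is why no Lombard effect enters the data.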
Results • bMSS: Training Models in Noise (PMC). These values represent the upper bound of the ASR system.
bPSS: Training Models in Noise (PMC)
Example 2: • Vocabulary: 30 words (numbers, e.g. "dos mil quinientos dólares", two thousand five hundred dollars).
Example 3: • TIMIT
Conclusions • The hypothesis was proven to be true. • A new research area is open: • Try these experiments using other databases. • How can we combine CHMMs so that we do not need to train for all enhancement conditions? • Are all enhancement techniques suited for CHMM combination?
Now we know that ASR can be improved by either: • Enhancing the speech before recognition. • Training CHMMs in the same environment in which the ASR is going to be used. • Training CHMMs with the same enhancement technique that is used to get "cleaner" speech at recognition. • Advantage: • Training with a better enhancement technique means potentially better recognition performance.
SS Model Combination • Introduction • Spectral Subtraction Scheme
Introduction • It was shown that when CHMMs are trained and tested under the same enhancement condition, recognition performance improves. • How can we combine CHMMs without having to train for each enhancement and noise condition? • Observation: For CHMMs, the states' pdfs are completely defined by their means and variances.
Spectral Subtraction Scheme • Assume that Y and YD can be modelled as parametric distributions with means E[Y] and E[YD] and variances V[Y] and V[YD]. • It can be shown that these parameters are distorted as follows: [equations and plot of the pdf of Y not preserved]
Proof: [derivation not preserved; it re-arranges the SS expression, introduces a term A(a,P(Y)), and assumes that Y is lognormal]
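Since the closed-form expressions are not preserved, the effect of SS on the mean and variance can at least be illustrated numerically. The sketch below rests on assumptions that are not confirmed by the slides: YD is taken to be the output of a floored spectral subtraction, max(Y - alpha*N, beta*Y), with a fixed noise estimate N, and Y is drawn from a lognormal distribution. All parameter names are hypothetical.

```python
import numpy as np

def ss_distorted_moments(mu_log, sigma_log, noise_power,
                         alpha=1.0, beta=0.01, n_samples=200_000, seed=0):
    """Monte Carlo estimate of (E[Y], V[Y]) and (E[YD], V[YD]).

    Y  ~ lognormal(mu_log, sigma_log)                  (assumption)
    YD = max(Y - alpha * noise_power, beta * Y)        (assumed SS rule)
    """
    rng = np.random.default_rng(seed)
    y = rng.lognormal(mean=mu_log, sigma=sigma_log, size=n_samples)
    yd = np.maximum(y - alpha * noise_power, beta * y)
    return (y.mean(), y.var()), (yd.mean(), yd.var())

# With alpha = 0 nothing is subtracted, so the moments are unchanged
(m0, v0), (md0, vd0) = ss_distorted_moments(0.0, 0.5, 0.5, alpha=0.0)
# With alpha = 1 the mean of the enhanced value is pulled down
(m1, v1), (md1, vd1) = ss_distorted_moments(0.0, 0.5, 0.5, alpha=1.0)
```

A model combination scheme needs these distorted moments in closed form, so that the CHMM parameters can be adapted without retraining on enhanced speech.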
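Diagram • Clean speech HMM: C->log, then exp(), followed by the SS adaptation calculations. • Noise HMM: C->log, then exp(). • PMC: the models are added (+) in the linear domain. • log(), then C(), map the result back, giving the SS-PMC HMM. • Speech is pre-processed using SS.
Results • Conditions compared: no compensation scheme; Spectral Subtraction; PMC; Spectral Subtraction and Parallel Model Combination (SS-PMC). [results plot not preserved]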
Comments and Conclusions • Training and recognition with the same speech enhancement scheme have not been tried before, so a new area of research is now open. • How can we combine CHMMs so that we do not need to train for all enhancement conditions? • Are all enhancement techniques suited for CHMM combination? • We showed how to combine clean speech and noise CHMMs for the SS scheme. • The equations for CHMM combination, when the SS scheme is used, turned out to be straightforward.
We expect that training with a better enhancement technique will also give better recognition performance. • Future work: • Develop equations and experiments for other enhancement techniques. • Obtain the optimal alpha for the SS scheme.