PROPAGATION OF STATISTICAL INFORMATION THROUGH NON-LINEAR FEATURE EXTRACTIONS FOR ROBUST SPEECH RECOGNITION Overview: • Introduction: Automatic speech recognition. • Problem: Imperfect noise suppression. • Proposed solution: Uncertainty propagation. • Tests & results. • Conclusions. R. F. Astudillo, D. Kolossa and R. Orglmeister - TU-Berlin
Automatic Speech Recognizer (ASR) • Feature extraction transforms the signal into a domain more suitable for recognition. • The speech recognizer models abstract speech components such as phonemes or triphones and generates a transcription. • Most speech recognition applications need noise suppression as a preprocessing step.
Feature Extraction • Non-linear transformations that imitate the way humans process speech. • Robust against inter-speaker and intra-speaker variability. • Mel-cepstral and RASTA-PLP transformations.
Speech Recognition • Statistical models are used to model speech. • Hidden Markov models with mixtures of multivariate Gaussians for the emitting states. • Example: Mel-cepstral features.
Noise Suppression • Most methods obtain an estimate of the short-time Fourier transform (STFT) of the clean signal. • MMSE-LSA Bayesian estimation [Ephraim1985] is one of the most widely used. • Leaves residual noise. • Introduces artifacts in speech. • Problem: Imperfect estimation.
Solution: Modeling Uncertainty of Estimation • We model each element of the STFT as a complex Gaussian random variable. • Mean set equal to the estimated clean value. • A variance parameter controls the uncertainty.
Propagation of Uncertainty • We propagate the first- and second-order moments of the distributions. • Correlations between features appear (covariance). • The resulting uncertainty is combined with the statistical model parameters for robust speech recognition.
Approaches to Uncertainty Propagation • Analytic solutions: imply complex calculations; specific to each transformation. • Pseudo-Monte Carlo: unscented transform [Julier1996]; inefficient for a high number of dimensions (e.g. STFT, 256 dim./frame). • ► Piecewise propagation: efficient combination of both methods; valid for many feature extractions (e.g. MELSPEC, MFCC, RASTA-PLP).
Piecewise Uncertainty Propagation • Exemplified with the Mel-cepstral transformation: • Modulus extraction (non-linear). • Filterbank (linear). • Logarithm (non-linear). • Discrete cosine transform (linear). • Delta and acceleration coefficients (linear).
Propagation through Modulus • By integrating out the phase of a complex Gaussian distribution we obtain the Rice distribution. • Mean and variance can be calculated as: • where L is the Laguerre polynomial of order 1/2.
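The Rice moments named on this slide can be checked numerically. The sketch below is a minimal, dependency-free illustration and not the authors' implementation. It assumes each STFT bin is a circularly symmetric complex Gaussian with mean `mean` and variance `lam` (both names are hypothetical), so that the modulus is Rice distributed with ν = |mean| and σ² = lam/2. The Bessel functions are summed as plain power series, which overflow at very high SNR (|mean|²/lam in the thousands); a numerical library with scaled Bessel functions would be the safer choice in practice.

```python
import math


def _bessel_i(order, z):
    """Modified Bessel function I_order(z) for z >= 0, summed as a power series."""
    half = z / 2.0
    term = half ** order / math.factorial(order)  # k = 0 term of the series
    total = term
    k = 0
    while True:
        k += 1
        term *= half * half / (k * (k + order))  # ratio of consecutive terms
        total += term
        if term <= 1e-17 * total:
            return total


def laguerre_half(x):
    """Laguerre polynomial L_{1/2}(x), evaluated for x <= 0."""
    h = -x / 2.0
    return math.exp(x / 2.0) * ((1.0 - x) * _bessel_i(0, h) - x * _bessel_i(1, h))


def modulus_moments(mean, lam):
    """Mean and variance of |X| for X ~ complex Gaussian(mean, lam) (assumed model)."""
    nu_sq = abs(mean) ** 2
    # Rice mean: sigma * sqrt(pi / 2) * L_{1/2}(-nu^2 / (2 sigma^2)), sigma^2 = lam / 2
    mod_mean = 0.5 * math.sqrt(math.pi * lam) * laguerre_half(-nu_sq / lam)
    # Second moment of |X| is nu^2 + 2 sigma^2 = nu^2 + lam
    mod_var = nu_sq + lam - mod_mean ** 2
    return mod_mean, mod_var


if __name__ == "__main__":
    print(modulus_moments(3 + 4j, 1.0))  # moderate SNR: mean slightly above |3+4j| = 5
    print(modulus_moments(0j, 1.0))      # pure noise: Rayleigh case
```

With a zero mean the result reduces to the Rayleigh case, which is a convenient sanity check for the formula.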
Propagation through Filterbank • Each filter output m is a weighted sum of frequency moduli. • It can be expressed as a matrix multiplication. • Mean and variance can be calculated as:
Full Covariance and Other Linear Transformations • The covariance after the filterbank is no longer diagonal. • Additional computational cost. • DCT, delta and acceleration coefficients can be computed similarly.
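For any linear stage y = W x, the first two moments propagate exactly as mean_y = W mean_x and cov_y = W cov_x Wᵀ. The sketch below illustrates this with a toy two-filter bank; the matrix `weights` and all numbers are made up for illustration and are not taken from the slides. Plain lists are used to keep the example dependency-free; an array library would be the usual choice.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def propagate_linear(weights, mean, cov):
    """Propagate mean and covariance through y = weights @ x."""
    mean_out = [sum(w * m for w, m in zip(row, mean)) for row in weights]
    weights_t = [list(col) for col in zip(*weights)]
    cov_out = matmul(matmul(weights, cov), weights_t)
    return mean_out, cov_out


if __name__ == "__main__":
    # Two overlapping filters over three frequency bins (hypothetical values).
    weights = [[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5]]
    mean = [1.0, 2.0, 3.0]
    cov = [[0.1, 0.0, 0.0],
           [0.0, 0.2, 0.0],
           [0.0, 0.0, 0.3]]  # diagonal input covariance
    mean_out, cov_out = propagate_linear(weights, mean, cov)
    print(mean_out)
    print(cov_out)  # off-diagonal terms are non-zero because the filters share a bin
```

The same function covers the DCT and the delta and acceleration coefficients, since each is a fixed matrix applied to the features.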
Propagation through Logarithm • Non-linear transformation. • The distribution after the filterbank is difficult to model (covariance not diagonal). • The dimensionality of the Mel features is much smaller than that of the STFT features. • ► The unscented transform can be applied efficiently.
Unscented Transform • Only 2n+1 points must be propagated. • Points on the √(n+κ)-th covariance contour and the mean. • n = feature dimension. • Example for n = 2.
Unscented Transform II • Mean and covariance are calculated using weighted averages: • The parameter κ allows higher moments of the distribution to be considered.
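The two unscented transform slides can be summarized in a short sketch. It follows the standard formulation from [Julier1996] with 2n+1 sigma points, a scaling parameter `kappa`, and weights κ/(n+κ) for the mean point and 1/(2(n+κ)) for the others. The function names, the Cholesky factor as matrix square root, and the example numbers are assumptions made for illustration, not details from the slides.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def unscented_transform(func, mean, cov, kappa=1.0):
    """Approximate mean and covariance of func(x) for x ~ (mean, cov)."""
    n = len(mean)
    scale = n + kappa
    low = cholesky([[scale * c for c in row] for row in cov])
    # Sigma points: the mean plus and minus each column of the matrix square root.
    points = [list(mean)]
    for j in range(n):
        col = [low[i][j] for i in range(n)]
        points.append([m + c for m, c in zip(mean, col)])
        points.append([m - c for m, c in zip(mean, col)])
    weights = [kappa / scale] + [0.5 / scale] * (2 * n)
    ys = [func(p) for p in points]
    d = len(ys[0])
    mean_out = [sum(w * y[i] for w, y in zip(weights, ys)) for i in range(d)]
    cov_out = [[sum(w * (y[i] - mean_out[i]) * (y[j] - mean_out[j])
                    for w, y in zip(weights, ys))
                for j in range(d)] for i in range(d)]
    return mean_out, cov_out


if __name__ == "__main__":
    # Element-wise logarithm of two correlated Mel-filter outputs (made-up values).
    log_mean, log_cov = unscented_transform(
        lambda x: [math.log(v) for v in x],
        [10.0, 20.0],
        [[1.0, 0.5], [0.5, 2.0]],
    )
    print(log_mean)
    print(log_cov)
```

For the logarithm, every sigma point must stay positive, so the spread √(n+κ) times the standard deviation has to remain below the mean in each dimension; the sketch does not guard against this.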
Use of Uncertainty • After propagation of uncertainty, missing feature techniques or uncertainty decoding may be applied. • These techniques combine uncertainty and model information to ignore or reestimate noisy features.
Use of Uncertainty II • Modified imputation [Kolossa2005] showed the best performance. • It reestimates features for state q by maximizing the probability: • Assuming multivariate Gaussian distributions for the uncertainty and the model:
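When both the feature uncertainty and the state model are Gaussian, the maximum of their product has a closed form: a precision-weighted average of the observed feature and the state mean. The sketch below shows this for the simplified case of diagonal covariances and a single Gaussian per state. That simplification, and all names and numbers, are assumptions for illustration; the slides refer to multivariate Gaussians and mixture models.

```python
def modified_imputation(feat_mean, feat_var, state_mean, state_var):
    """Reestimate each feature as the maximizer of N(x; feat) * N(x; state).

    All arguments are per-dimension lists (diagonal covariances assumed).
    At least one of the two variances must be non-zero in every dimension.
    """
    return [
        (sv * fm + fv * sm) / (sv + fv)
        for fm, fv, sm, sv in zip(feat_mean, feat_var, state_mean, state_var)
    ]


if __name__ == "__main__":
    feat_mean = [1.0, 1.0, 1.0]   # propagated feature means (made-up values)
    feat_var = [0.0, 1.0, 100.0]  # certain, moderately uncertain, very uncertain
    state_mean = [3.0, 3.0, 3.0]  # mean of state q
    state_var = [1.0, 1.0, 1.0]
    print(modified_imputation(feat_mean, feat_var, state_mean, state_var))
```

A feature with zero uncertainty is left unchanged, while a very uncertain feature is pulled almost entirely toward the state mean, which matches the intent of ignoring or reestimating noisy features.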
Recognition Tests • TI-DIGITS database. • 200 files (20 different speakers). • Best, second best results.
Conclusions • The use of uncertainty in the Mel-cepstral domain helps to compensate for imperfect estimation during noise suppression. • Piecewise uncertainty propagation is valid for multiple feature extractions. • Better estimation of the uncertainty should improve the results.
Thank You! Some literature: [Ephraim1985] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Transactions on Acoustics, Speech, and Signal Processing 33, 443–445 (1985). [Julier1996] S. Julier and J. Uhlmann, "A general method for approximating nonlinear transformations of probability distributions," Tech. rep., University of Oxford, UK (1996). [Kolossa2005] D. Kolossa, A. Klimas and R. Orglmeister, "Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques," IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005, pp. 82–85.