Robust Speech Recognition V. Barreaud, LORIA
Mismatch Between Training and Testing • mismatch between training and testing conditions degrades recognition scores • causes of mismatch • Speech Variation • Inter-Speaker Variation
Robust Approaches • three categories • noise resistant features (Speech var.) • speech enhancement (Speech var. + Inter-speaker var.) • model adaptation for noise (Speech var. + Inter-speaker var.) [Diagram: recognition system — speaker A's speech is encoded as features to train the models; speaker B's speech is encoded and decoded against those models at test time, producing a word sequence]
Contents • Overview • Noise resistant features • Speech enhancement • Model adaptation • Stochastic Matching • Our current work
Noise resistant features • Acoustic representation • Emphasis on evidence less affected by noise • Auditory-system-inspired models • Filter banks, loudness curve, lateral inhibition • Slow variation removal • Cepstrum Mean Normalization, time derivatives (sketched below) • Linear Discriminant Analysis • Searches for the best parameterization
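Cepstral Mean Normalization and time derivatives are simple to illustrate. A minimal sketch (not from the original slides), assuming MFCC frames are already available as a NumPy array of shape (T, D):

```python
import numpy as np

def cepstral_mean_normalization(cepstra):
    """Remove slowly varying channel effects by subtracting the
    per-utterance mean from each cepstral coefficient.

    cepstra: array of shape (T, D) -- T frames of D cepstral coefficients.
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)

def delta_features(cepstra, window=2):
    """Append first-order time derivatives (delta features), which also
    de-emphasize slowly varying components of the signal."""
    T, D = cepstra.shape
    padded = np.pad(cepstra, ((window, window), (0, 0)), mode="edge")
    num = sum(k * (padded[window + k:window + k + T] -
                   padded[window - k:window - k + T])
              for k in range(1, window + 1))
    denom = 2 * sum(k * k for k in range(1, window + 1))
    return np.hstack([cepstra, num / denom])
```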
Speech enhancement • Parameter mapping • stereo data • observation subspace • Bayesian estimation • stochastic modeling of speech and noise • Template-based estimation • restriction to a subspace • output is noise-free • various templates and combination methods • Spectral Subtraction (sketched below) • noise and speech uncorrelated • slowly varying noise
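Spectral subtraction follows directly from the two assumptions listed under it (noise uncorrelated with speech, slowly varying noise). A minimal sketch, not the author's implementation, assuming the noise spectrum can be estimated from leading noise-only frames:

```python
import numpy as np

def spectral_subtraction(noisy_stft, noise_frames=10, floor=0.01, oversub=1.0):
    """Basic magnitude spectral subtraction.

    noisy_stft   : complex STFT of shape (frames, bins).
    noise_frames : number of leading frames assumed to contain noise only.
    floor        : spectral floor to limit "musical noise" artifacts.
    oversub      : over-subtraction factor.
    """
    mag = np.abs(noisy_stft)
    phase = np.angle(noisy_stft)
    noise_mag = mag[:noise_frames].mean(axis=0)           # noise estimate (slowly varying)
    clean_mag = mag - oversub * noise_mag                 # subtract the noise magnitude
    clean_mag = np.maximum(clean_mag, floor * noise_mag)  # flooring
    return clean_mag * np.exp(1j * phase)                 # reuse the noisy phase
```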
Model Adaptation for Noise • HMM decomposition or PMC (Parallel Model Combination) • the Viterbi algorithm searches a combined N×M-state HMM • noise and speech recognized simultaneously • complex noises can be handled • State-dependent Wiener filtering (sketched below) • Wiener filtering in the spectral domain faces non-stationarity • HMMs divide speech into quasi-stationary segments • Wiener filters specific to each state • Discriminative training • the classical technique trains models independently • error-corrective training • minimum classification error training • Training data contamination • training set corrupted with noisy speech • depends on the test environment • lower discriminative scores
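State-dependent Wiener filtering can be sketched as follows; the per-state clean-speech power spectrum and the noise power spectrum (the names `state_speech_psd` and `noise_psd` are hypothetical) are assumed to be supplied by the HMM state and a noise estimator:

```python
import numpy as np

def state_wiener_gain(state_speech_psd, noise_psd):
    """Wiener gain H = S / (S + N) for one HMM state: the state supplies
    the clean-speech power spectrum S of its quasi-stationary segment,
    N is the noise power spectrum."""
    return state_speech_psd / (state_speech_psd + noise_psd)

def filter_segment(noisy_stft_segment, state_speech_psd, noise_psd):
    """Apply the state-specific Wiener filter to the frames aligned with
    that state (one filter per quasi-stationary segment)."""
    gain = state_wiener_gain(state_speech_psd, noise_psd)
    return noisy_stft_segment * gain  # gain broadcasts over the frames
```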
Stochastic Matching : Introduction • General framework • in feature space • in model space
Stochastic Matching : General framework • HMM models Λ_X trained in the training space X • Y = {y_1, …, y_T}, the observation sequence in the testing space • a feature-space transform X = F_ν(Y) and a model-space transform Λ_Y = G_η(Λ_X) relate the two spaces (stated below)
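The objective itself is not reproduced in the slide text; a hedged restatement of the usual stochastic-matching objective (the formulation of Sankar and Lee, which this framework appears to follow), where the word sequence and the distortion parameters are estimated jointly by maximum likelihood:

```latex
% Feature-space view: map the test observations back to the training space,
%   X = F_\nu(Y);   model-space view: map the models, \Lambda_Y = G_\eta(\Lambda_X).
% Joint ML estimation of the word sequence W and the distortion parameters:
(W', \nu') = \arg\max_{W,\nu}\; p\big(F_\nu(Y), W \mid \Lambda_X, \nu\big)
\qquad\text{or}\qquad
(W', \eta') = \arg\max_{W,\eta}\; p\big(Y, W \mid G_\eta(\Lambda_X)\big)
```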
Stochastic Matching : In Feature Space • Estimation step : auxiliary function (sketched below) • Maximization step
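The E and M steps are not written out on the slide; a hedged sketch of the usual EM auxiliary function for a feature-space transform F_ν (the Jacobian term is dropped, since it vanishes for a simple bias), with γ_t(n, m) the posterior of state n and mixture component m at time t:

```latex
% E-step: auxiliary function over state sequences S and mixture-component sequences C
Q(\nu' \mid \nu) \;=\; \sum_{S}\sum_{C} p(S, C \mid Y, \Lambda_X, \nu)\,
                       \log p\big(F_{\nu'}(Y), S, C \mid \Lambda_X\big)

% For diagonal-covariance Gaussian mixtures this reduces, up to constants, to
Q(\nu' \mid \nu) \;\simeq\; -\tfrac{1}{2}\sum_{t}\sum_{n}\sum_{m}
    \gamma_t(n,m)\sum_{i}
    \frac{\big(f_{\nu'}(y_t)_i - \mu_{n,m,i}\big)^2}{\sigma_{n,m,i}^2}

% M-step: \nu' = \arg\max_{\nu'} Q(\nu' \mid \nu)
```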
Stochastic Matching : In Feature Space (2) • Simple distortion function • Computation of the simple bias (sketched below)
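Assuming the common choice of a single additive cepstral bias, f_b(y_t) = y_t − b, the M-step has a closed form: the bias is the variance-weighted average of the residuals. A hedged sketch:

```latex
% Simple distortion function: one additive bias in the cepstral domain
f_b(y_t) \;=\; y_t - b

% Closed-form M-step for each component i of the bias
b_i \;=\; \frac{\displaystyle\sum_{t,n,m} \gamma_t(n,m)\,
                \frac{y_{t,i}-\mu_{n,m,i}}{\sigma_{n,m,i}^2}}
               {\displaystyle\sum_{t,n,m} \frac{\gamma_t(n,m)}{\sigma_{n,m,i}^2}}
```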
Stochastic Matching : In Model Space • random additive bias sequence B = {b_1, …, b_T}, independent of speech, modeled as a stochastic process with mean μ_b and diagonal covariance Σ_b (sketched below)
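The corresponding compensation of the models is not written out on the slide; a hedged sketch, assuming the standard additive combination of the bias statistics with each Gaussian of Λ_X:

```latex
% Model-space compensation: each Gaussian (n, m) of \Lambda_X is shifted by the bias statistics
\mu_{n,m}' \;=\; \mu_{n,m} + \mu_b, \qquad
\Sigma_{n,m}' \;=\; \Sigma_{n,m} + \Sigma_b
```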
On-Line Frame-Synchronous Noise Compensation • Builds on the stochastic matching method • Transformation parameter estimated along with the optimal path • Uses forward probabilities [Diagram: each observation y_t is transformed into z_t with the current bias b_t and recognized; the recognition result feeds the computation of the next bias b_{t+1}]
Theoretical framework and issue • On-line frame-synchronous estimation: risk of a cascade of errors • Classical Stochastic Matching (a sketch of the loop follows): 1. Initialize the bias for the first frame, b_0 = 0 2. Compute the forward probabilities and then the bias b 3. Transform the next frame with b 4. Go to the next frame
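A minimal sketch of the frame-synchronous loop above, assuming diagonal-covariance Gaussian state models; the helper names (`gaussian_pdf`, the `.mean`/`.var` fields of the states) are hypothetical, and this illustrates the idea rather than reproducing the author's implementation:

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Diagonal-covariance Gaussian density."""
    return np.exp(-0.5 * np.sum((x - mean) ** 2 / var)) / np.sqrt(np.prod(2 * np.pi * var))

def online_bias_compensation(frames, states, transitions, initial):
    """Frame-synchronous bias compensation (sketch).

    frames      : array (T, D) of cepstral observation vectors y_t.
    states      : list of per-state Gaussians with .mean (D,) and .var (D,)  (hypothetical API).
    transitions : (N, N) transition matrix; initial: (N,) initial state probabilities.

    Each frame is compensated with the current bias estimate, then used to
    update the forward probabilities, which in turn refresh the bias for
    the next frame.
    """
    D = frames.shape[1]
    bias = np.zeros(D)                        # b_0 = 0
    num, den = np.zeros(D), np.zeros(D)       # running bias statistics
    compensated, alpha, first = [], None, True

    for y in frames:
        z = y - bias                          # transform the current frame with the current bias
        compensated.append(z)

        # forward update with the compensated frame
        likes = np.array([gaussian_pdf(z, s.mean, s.var) for s in states])
        alpha = (initial if first else alpha @ transitions) * likes
        first = False
        alpha /= alpha.sum()                  # normalized forward probabilities

        # accumulate variance-weighted residuals, weighted by the forward probabilities
        for w, s in zip(alpha, states):
            num += w * (y - s.mean) / s.var
            den += w / s.var
        bias = num / den                      # bias used for the next frame

    return np.array(compensated)
```

The point of the design is that the bias applied to frame t is computed only from frames 1..t-1, which is what makes the method frame-synchronous but also exposes it to the cascade-of-errors issue mentioned above.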
Viterbi Hypothesis vs Linear Combination • The Viterbi hypothesis takes into account only the « most probable » state and Gaussian component • The linear combination combines the contributions of all states and components (contrast written out below) [Diagram: trellis of states between frames t and t+1]
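One hedged way to write the contrast, using γ_t(n, m) for the per-frame state/component posteriors obtained from the forward probabilities (variance weighting omitted for brevity):

```latex
% Viterbi hypothesis: only the most probable state/component contributes
b_t \;\propto\; y_t - \mu_{\hat n, \hat m},
\qquad (\hat n, \hat m) = \arg\max_{n,m}\, \gamma_t(n,m)

% Linear combination: all states and components contribute, weighted by their posteriors
b_t \;\propto\; \sum_{n,m} \gamma_t(n,m)\,\big(y_t - \mu_{n,m}\big)
```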
Experiments • Phone numbers recorded in a running car • Forced Align • transcription + optimum path • Free Align • optimum path only • Wild Align • no data
Perspectives • Error recovery problem • a forgetting process • a model of the distortion function • environmental clues • A more elaborate transform