HIWIRE MEETING, Nancy, July 6-7, 2006. José C. Segura, Ángel de la Torre
Schedule
• Non-linear feature normalization for mobile platform
  • Integration scheme
  • Results and discussion
• Rapid speaker adaptation
  • Combination of adaptation at signal level and acoustic model level
  • Results and discussion
• Assessment of two non-linear techniques for feature normalization
  • Non-linear parametric equalization
  • Model based feature compensation (VTS)
• New improvements in robust VAD
  • Model based VAD
Non-linear Parametric Equalization
• Feature normalization
• Motivation of PEQ:
  • Limitation of linear methods:
    • Cepstral Mean Normalization
    • Cepstral Mean and Variance Normalization
  • Limitation of non-linear methods (HEQ, OSEQ):
    • Dependence on the speech/non-speech ratio
    • Estimation problems
• Parametric Equalization (PEQ):
  • Two-Gaussian model (speech / non-speech)
  • Training of clean Gaussians; estimation of noisy Gaussians
  • Non-linear transformation: combination of two linear transformations (one for speech, one for non-speech)
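The slide does not give the PEQ formulas, so the following is only a sketch under stated assumptions: it works on one cepstral coefficient at a time, re-estimates the two noisy Gaussians on the utterance with EM initialised from the clean Gaussians, and combines the two class-specific linear maps (noisy Gaussian c onto clean Gaussian c) with weights P(class|y). The names `fit_noisy`, `peq`, `class_posteriors` and the iteration count are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    """One class of the two-Gaussian model for a single cepstral coefficient."""
    weight: float  # a-priori probability of the class
    mean: float
    var: float


def log_pdf(y, g):
    return -0.5 * (math.log(2 * math.pi * g.var) + (y - g.mean) ** 2 / g.var)


def class_posteriors(y, model):
    """P(class | y) for each Gaussian in `model`, computed in the log domain."""
    logs = [math.log(g.weight) + log_pdf(y, g) for g in model]
    top = max(logs)
    w = [math.exp(l - top) for l in logs]
    s = sum(w)
    return [wi / s for wi in w]


def fit_noisy(ys, clean, iters=10):
    """EM re-estimation of the noisy Gaussians on one utterance,
    initialised from the clean Gaussians (hypothetical choice)."""
    model = [Gaussian(g.weight, g.mean, g.var) for g in clean]
    for _ in range(iters):
        resp = [class_posteriors(y, model) for y in ys]  # E-step
        for c, g in enumerate(model):                    # M-step
            r = [rt[c] for rt in resp]
            n = sum(r) + 1e-12
            g.weight = n / len(ys)
            g.mean = sum(ri * y for ri, y in zip(r, ys)) / n
            g.var = max(sum(ri * (y - g.mean) ** 2 for ri, y in zip(r, ys)) / n, 1e-6)
    return model


def peq(y, clean, noisy):
    """Posterior-weighted combination of two linear maps, each sending
    noisy Gaussian c onto clean Gaussian c."""
    post = class_posteriors(y, noisy)
    return sum(
        p * (gc.mean + math.sqrt(gc.var / gn.var) * (y - gn.mean))
        for p, gc, gn in zip(post, clean, noisy)
    )
```

In use, `fit_noisy` would run once per utterance and coefficient, and `peq` would then be applied to every frame. Near a class mean the map is essentially that class's linear transform, and between the two classes it blends smoothly, which is what makes the overall transformation non-linear.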
Non-linear Parametric Equalization
• Aurora-2 results:
• Aurora-4 results:
Non-linear Parametric Equalization
• Additional problem of non-linear transformations:
  • Once the transformation is estimated, it is an “instantaneous” transformation: each frame is mapped independently
  • Temporal correlations are not exploited
• Temporal Smoothing (TES):
  • Each equalized cepstral coefficient is filtered in time with an ARMA filter that restores the autocorrelation of clean data
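How the TES coefficients are designed to match the clean autocorrelation is not given on the slide, so the sketch below only shows the filtering step: each equalized coefficient track is passed through a causal ARMA filter along time, and a lag-k autocorrelation helper lets one compare the result against clean data. The coefficients `b`, `a` used in the usage note are hypothetical.

```python
def arma_filter(x, b, a):
    """Causal ARMA filter with zero initial state:
    a[0]*y[t] = sum_i b[i]*x[t-i] - sum_{j>=1} a[j]*y[t-j]."""
    y = []
    for t in range(len(x)):
        acc = sum(b[i] * x[t - i] for i in range(len(b)) if t - i >= 0)
        acc -= sum(a[j] * y[t - j] for j in range(1, len(a)) if t - j >= 0)
        y.append(acc / a[0])
    return y


def lag_autocorr(x, k):
    """Normalized lag-k autocorrelation of a sequence (mean removed)."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t - k] - m) for t in range(k, n))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def tes_smooth(frames, b, a):
    """Filter each coefficient track of `frames` (list of per-frame lists) in time."""
    n_coef = len(frames[0])
    tracks = [arma_filter([f[c] for f in frames], b, a) for c in range(n_coef)]
    return [[tracks[c][t] for c in range(n_coef)] for t in range(len(frames))]
```

For example, `tes_smooth(frames, [0.5], [1.0, -0.5])` applies a first-order recursive smoother with unit gain at DC, so the mean of each track is preserved after the initial transient while its lag-1 autocorrelation increases.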
Non-linear Parametric Equalization with TES
• Aurora-2 results:
• Aurora-4 results:
Model Based Feature Compensation (VTS)
• VTS feature normalization:
  • Performed in the log-FBE (log filter-bank energy) domain, before the DCT
  • Based on a Gaussian mixture model trained with clean speech
  • Allows both feature compensation and uncertainty estimation
• Summary of the VTS (vector Taylor series) approach:
  • Given the noise conditions, VTS derives a noisy Gaussian from each clean Gaussian
  • The noisy Gaussian mixture model allows the computation of the probabilities P(k|y)
  • An estimate of the clean speech x can then be obtained
  • An estimate of its uncertainty can also be obtained
Model Based Feature Compensation (VTS)
• Step 1: Estimation of a noisy Gaussian from a clean Gaussian, where the functions g0, f0 and h0 are evaluated at the mean of the clean Gaussian and at the mean of the noise.
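The Step 1 equations did not survive, so this sketch uses the textbook first-order VTS for additive noise in the log-FBE domain, y = x + log(1 + exp(n - x)), linearized at the clean and noise means with diagonal covariances. Treating this as equivalent to the slide's g0, f0 and h0 is an assumption, and the channel term is omitted; `vts_noisy_gaussian` is a hypothetical name.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def vts_noisy_gaussian(mu_x, var_x, mu_n, var_n):
    """First-order VTS per log-FBE channel (assumed additive-noise model):
    y = x + log(1 + exp(n - x)), linearized at (mu_x, mu_n)."""
    mu_y, var_y = [], []
    for mx, vx, mn, vn in zip(mu_x, var_x, mu_n, var_n):
        dy_dx = sigmoid(mx - mn)  # 1 / (1 + exp(n - x)) at the means
        dy_dn = 1.0 - dy_dx       # 1 / (1 + exp(x - n)) at the means
        mu_y.append(mx + log1pexp(mn - mx))
        var_y.append(dy_dx ** 2 * vx + dy_dn ** 2 * vn)
    return mu_y, var_y
```

When the noise mean is far below the clean mean, the noisy Gaussian collapses back to the clean one. When the two means are equal, the mean shifts by log 2 and both Jacobian terms equal 0.5.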
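Model Based Feature Compensation (VTS)
• Step 2: Estimation of P(k|y):
  $$P(k|y) = \frac{P(k)\,\mathcal{N}_k(y)}{\sum_{k'} P(k')\,\mathcal{N}_{k'}(y)}$$
  where $\mathcal{N}_k(y)$ is the k-th (noisy) Gaussian evaluated at the noisy speech y, and P(k) is the a-priori probability of the Gaussian.
• Step 3: Estimation of the clean speech x from y and P(k|y).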
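Step 2 is Bayes rule over the noisy GMM, which the sketch below computes in the log domain to avoid underflow. The Step 3 formula is missing from the slide, so here it is replaced by a common first-order VTS estimate, x̂ = y − Σ_k P(k|y)(μ_{y,k} − μ_{x,k}). That choice is an assumption, and the function names are hypothetical.

```python
import math


def log_gauss_diag(y, mu, var):
    """Log of a diagonal-covariance Gaussian evaluated at vector y."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (yi - m) ** 2 / v
                      for yi, m, v in zip(y, mu, var))


def posteriors(y, priors, mus_y, vars_y):
    """Step 2: P(k|y) from the noisy GMM (Bayes rule, log-sum-exp for stability)."""
    logs = [math.log(p) + log_gauss_diag(y, m, v)
            for p, m, v in zip(priors, mus_y, vars_y)]
    top = max(logs)
    w = [math.exp(l - top) for l in logs]
    s = sum(w)
    return [wi / s for wi in w]


def estimate_clean(y, post, mus_x, mus_y):
    """Step 3 (assumed form): x_hat = y - sum_k P(k|y) * (mu_y,k - mu_x,k)."""
    return [yi - sum(p * (my[d] - mx[d]) for p, mx, my in zip(post, mus_x, mus_y))
            for d, yi in enumerate(y)]
```

The noisy means and variances passed to `posteriors` would come from the Step 1 adaptation of each clean Gaussian.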
Model Based Feature Compensation (VTS)
• Step 4: Estimation of uncertainty: assuming a small noise variance, and starting from the clean-speech estimate of Step 3, the uncertainty of the clean speech can be estimated.
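The Step 4 equations are also missing. One plausible reading of "small noise variance" is that each Gaussian k yields an almost deterministic clean-speech estimate x̂_k, so the uncertainty reduces to the spread of those per-Gaussian estimates around x̂ under P(k|y). The sketch below implements that reading, and it is an assumption, not the slide's formula. The per-Gaussian estimate reuses the assumed form x̂_k = y − (μ_{y,k} − μ_{x,k}), and the function name is hypothetical.

```python
def clean_speech_uncertainty(y, post, mus_x, mus_y):
    """Return (x_hat, var) per dimension, where var is the posterior-weighted
    spread of per-Gaussian estimates x_hat_k = y - (mu_y,k - mu_x,k)."""
    dims = range(len(y))
    per_k = [[y[d] - (my[d] - mx[d]) for d in dims] for mx, my in zip(mus_x, mus_y)]
    x_hat = [sum(p * xk[d] for p, xk in zip(post, per_k)) for d in dims]
    var = [sum(p * (xk[d] - x_hat[d]) ** 2 for p, xk in zip(post, per_k)) for d in dims]
    return x_hat, var
```

Under this reading, the uncertainty vanishes when a single Gaussian dominates P(k|y) and grows when several Gaussians with different compensations compete.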
Model Based Feature Compensation (VTS)
• Aurora-2 results:
• Some considerations about VTS:
  • Computational load
  • Better than HEQ, PEQ, etc., but only valid for additive noise or channel distortion
  • Estimation of the noise is critical
  • The formulation involves some approximations
  • Uncertainty: small improvement (insertions, substitutions, deletions)
  • Alternative: model-based compensation based on numerical integration of pdfs
Model-based VAD
• Fundamentals of model-based VAD:
  • Gaussian mixture model in the log-FBE domain, trained with clean speech
  • VTS provides a noisy version of the GMM
  • From the noisy GMM, P(k|y) can be estimated for each observation y and each Gaussian k
  • The probability P(V|k) that the k-th Gaussian corresponds to speech can be estimated a priori from the training data
  • Then the probability P(V|y) that the noisy observation y is speech is given by:
    $$P(V|y) = \sum_k P(V|k)\,P(k|y)$$
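A minimal sketch of the decision rule above follows. It assumes that P(V|k) is estimated as the share of each Gaussian's posterior mass that falls on speech-labelled training frames, and that the frame decision compares P(V|y) with a fixed threshold. Both the estimator and the 0.5 threshold are assumptions, as are the function names and the 0.5 fallback for an unused Gaussian.

```python
def speech_prob_per_gaussian(train_posts, labels):
    """Estimate P(V|k): train_posts[t][k] = P(k|x_t) on clean training frames,
    labels[t] is True for speech frames (assumed estimator)."""
    n_gauss = len(train_posts[0])
    num = [0.0] * n_gauss
    den = [0.0] * n_gauss
    for post, is_speech in zip(train_posts, labels):
        for k, p in enumerate(post):
            den[k] += p
            if is_speech:
                num[k] += p
    # hypothetical fallback of 0.5 for a Gaussian that never fires
    return [n / d if d > 0 else 0.5 for n, d in zip(num, den)]


def speech_posterior(post_y, p_v_given_k):
    """P(V|y) = sum_k P(V|k) * P(k|y)."""
    return sum(p * v for p, v in zip(post_y, p_v_given_k))


def vad_decision(post_y, p_v_given_k, threshold=0.5):
    """Frame-level speech/non-speech decision (threshold is hypothetical)."""
    return speech_posterior(post_y, p_v_given_k) > threshold
```

Here `post_y` would be the P(k|y) computed from the VTS-adapted GMM, so the same posteriors serve both feature compensation and the VAD.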
Model-based VAD
• Some considerations about model-based VAD:
  • The VAD decision relies on a Gaussian mixture model trained with clean speech (based on the speech events observed in the training database)
  • It is not based on energy
  • It is based on observations in the log-FBE domain
  • VTS adapts the Gaussian mixture to noisy conditions, so the performance of the VAD is expected to be stable over a wide range of SNRs
  • Computational load
Model-based VAD
• Model-based VAD for different SNRs:
Model-based VAD
• Comparison with other VADs: speech hit rate (HR1) and non-speech hit rate (HR0) evaluated on AURORA-2
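The two hit rates used in the comparison can be computed from frame-level reference labels and VAD decisions as sketched below. HR1 is the fraction of speech frames detected as speech and HR0 the fraction of non-speech frames detected as non-speech. The function name and the NaN convention for an empty class are illustrative choices.

```python
def hit_rates(reference, decision):
    """Return (HR0, HR1) from per-frame booleans (True = speech)."""
    speech = [d for r, d in zip(reference, decision) if r]
    non_speech = [d for r, d in zip(reference, decision) if not r]
    hr1 = sum(1 for d in speech if d) / len(speech) if speech else float("nan")
    hr0 = sum(1 for d in non_speech if not d) / len(non_speech) if non_speech else float("nan")
    return hr0, hr1
```

Reporting both rates matters because a VAD can trivially maximize one of them, for instance by labelling every frame as speech, at the expense of the other.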
Model-based VAD
• Aurora-2 recognition results (word accuracy, WAcc):
  • Baseline: 60.5% (no VAD, no Wiener filtering (WF), no frame dropping (FD))
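Word accuracy is the usual ASR score, WAcc = (N − S − D − I) / N, where N is the number of reference words and S, D, I count substitutions, deletions and insertions. Unlike percent correct, it penalizes insertions, which is where a VAD with frame dropping is generally expected to help. The function below is a small illustration with a hypothetical name.

```python
def word_accuracy(n_ref, subs, dels, ins):
    """Word accuracy in percent: 100 * (N - S - D - I) / N (can be negative)."""
    return 100.0 * (n_ref - subs - dels - ins) / n_ref
```

For example, 100 reference words with 5 substitutions, 3 deletions and 2 insertions give a WAcc of 90%.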