HIWIRE MEETING, Athens, November 3-4, 2005. José C. Segura, Ángel de la Torre.
Schedule
• HIWIRE database evaluations
• Non-linear feature normalization
  • ECDF segmental implementation
  • Parametric equalization
• Robust VAD
  • Bispectrum-based VAD
• Model-based feature compensation
  • VTS results on AURORA4
  • Including uncertainty caused by noise
HIWIRE database evaluations
• PARAMETERS: MFCC_0_D_A_Z (39 components)
• MODELS:
  • TIMIT: 46 phone models / 3 states / 128 Gaussians (17,664 Gaussians in total)
  • WSJ16k: 16,825 triphones / 3,608 tied states / 6 Gaussians (21,648 Gaussians in total)
  • WSJ16kFon: 40 phone models / 3 states / 128 Gaussians (15,360 Gaussians in total)
• ADAPTATION:
  • MLLR: 32 regression classes / 50 adaptation utterances
• GRAMMAR:
  • LORIA & Word-Loop
• MODIFICATIONS: Some transcriptions have been modified to match the grammar definition
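Transcription modifications
BEGIN {
    lista = LISTA;   # LISTA is presumably supplied from outside the script; not used below
    nfrase = 0;      # sentence counter
}
{
    linea = $0;
    # Join hyphenated words with underscores so they match the grammar tokens
    gsub("-", "_", linea);
    # Split compounds that the grammar treats as separate words
    gsub("Due_to_", "Due_to ", linea);
    gsub("Mayday_Mayday", "Mayday Mayday", linea);
    gsub("Pan_Pan", "Pan Pan", linea);
    # Merge number phrases that the grammar treats as single tokens
    gsub("three hundred twenty", "three_hundred_twenty", linea);
    gsub("one hundred sixty", "one_hundred_sixty", linea);
    # Emit the normalized transcription in lower case
    printf("%s\n", tolower(linea));
    nfrase = nfrase + 1;
}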
HIWIRE database results
• Results without adaptation (WER)
• Results with MLLR (WER)
ECDF segmental implementation
• ECDF segmental implementation
  • Provided LOQUENDO with a reference C implementation of the segmental Gaussian transformation, to be tested within the LOQUENDO recognizer
• Current work
  • Nonlinear feature transformation with a clean reference, to avoid the problem of system retraining
Parametric Equalization (1)
PARAMETRIC NONLINEAR FEATURE EQUALIZATION FOR ROBUST SPEECH RECOGNITION (submitted to ICASSP'06)
• HEQ limitations
  • Influence of the relative amount of silence in utterances
• With a parametric model, a more robust equalization can be obtained
Parametric Equalization (2)
• Class-dependent linear equalization
• Soft-decision VAD (two-class Gaussian classifier on C0)
• Nonlinear interpolation
Parametric Equalization (4)
• In comparison with HEQ, PEQ transformations are smoother
• For C0, a monotonic transformation is obtained
• For other coefficients, the interpolated transformation is not monotonic
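The class-dependent scheme outlined on these slides can be sketched for a single cepstral coefficient. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`speech_posterior`, `class_linear`, `peq`), the equal class priors, and the per-class (mean, std) statistics are all hypothetical. A two-class Gaussian classifier on C0 yields a soft speech/silence posterior, each class gets its own linear mapping onto the reference statistics, and the two mappings are interpolated with the posterior.

```python
import math

def log_gauss(v, mean, std):
    """Log-density of a 1-D Gaussian."""
    z = (v - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)

def speech_posterior(c0, c0_sil, c0_spe, prior_speech=0.5):
    """Soft-decision VAD: posterior of the speech class given C0.
    c0_sil and c0_spe are (mean, std) pairs; equal priors are an assumption."""
    log_ratio = (math.log(prior_speech / (1.0 - prior_speech))
                 + log_gauss(c0, *c0_spe) - log_gauss(c0, *c0_sil))
    # Logistic function of the log-odds, written to avoid overflow
    if log_ratio >= 0.0:
        return 1.0 / (1.0 + math.exp(-log_ratio))
    e = math.exp(log_ratio)
    return e / (1.0 + e)

def class_linear(y, obs, ref):
    """Linear map taking the observed class (mean, std) onto the reference one."""
    (m_obs, s_obs), (m_ref, s_ref) = obs, ref
    return m_ref + (s_ref / s_obs) * (y - m_obs)

def peq(y, p_speech, obs, ref):
    """Interpolate the silence and speech mappings with the soft VAD posterior."""
    sil = class_linear(y, obs["sil"], ref["sil"])
    spe = class_linear(y, obs["spe"], ref["spe"])
    return (1.0 - p_speech) * sil + p_speech * spe

# Hypothetical C0 statistics: observed utterance vs. clean reference
obs = {"sil": (2.0, 1.0), "spe": (10.0, 2.0)}
ref = {"sil": (0.0, 1.0), "spe": (12.0, 3.0)}
p = speech_posterior(9.0, obs["sil"], obs["spe"])
equalized = peq(9.0, p, obs, ref)
```

Because the posterior itself depends on the feature value, the interpolated mapping is nonlinear even though each class mapping is linear.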
Parametric Equalization (5)
• BASE
  • MFCC_0_D_A_Z (39 components)
• HEQ
  • Quantile-based CDF transformation
  • Clean reference
  • Implemented over MFCC_0 / CMS, with regressions computed after HEQ
• AFE
  • Standard implementation
• PEQ
  • Clean reference
  • Implemented over MFCC_0 / CMS, with regressions computed after PEQ
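For readers unfamiliar with the HEQ baseline, the idea of a CDF transformation can be sketched as follows. Note the swap: this is a simple rank-based (order-statistic) equalization to a Gaussian reference, not the quantile-based variant with a clean reference that the slide names. The function name `heq_to_gaussian` and the N(0, 1) reference are assumptions made for the example.

```python
from statistics import NormalDist

def heq_to_gaussian(values, ref_mean=0.0, ref_std=1.0):
    """Rank-based histogram equalization of one feature component.

    Each value is replaced by the reference-Gaussian quantile of its
    empirical CDF estimate, so the output histogram follows the reference.
    Equal input values receive distinct ranks (and distinct outputs).
    """
    n = len(values)
    ref = NormalDist(ref_mean, ref_std)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n stays strictly inside (0, 1), as inv_cdf requires
        out[i] = ref.inv_cdf((rank + 0.5) / n)
    return out

# One cepstral coefficient over a short (hypothetical) utterance
equalized = heq_to_gaussian([5.0, 1.0, 3.0])
```

The mapping is monotonic by construction, which is the property the PEQ slides compare against.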
Parametric Equalization (6)
• Current work
  • Development of an on-line version
  • Relax the diagonal covariance assumption
  • Investigate the normalization of dynamic features
  • Use a more detailed model of speech frames (i.e. more than one Gaussian)
Bispectrum-based VAD (1)
• Motivations:
  • Ability of higher-order statistics to detect signals in noise
  • Polyspectra methods rely on a priori knowledge of the input processes
• Issues to be addressed:
  • Computationally expensive
  • Variance of the bispectrum estimators is much higher than that of power spectral estimators for identical data record size
• Solution: Integrated bispectrum
  • J. K. Tugnait, “Detection of non-Gaussian signals using integrated polyspectrum,” IEEE Trans. on Signal Processing, vol. 42, no. 11, pp. 3137–3149, 1994.
  • Computationally efficient, reduced-variance statistical test based on the integrated polyspectra
  • Detection of an unknown random, stationary, non-Gaussian signal in Gaussian noise
Bispectrum-based VAD (2)
• Integrated bispectrum:
  • Defined as a cross spectrum between the signal and its square; it is therefore a function of a single frequency variable
• Benefits:
  • Its computation as a cross spectrum leads to significant computational savings
  • The variance of the estimator is of the same order as that of the power spectrum estimator
• Properties:
  • The bispectrum of a Gaussian process is identically zero, so its integrated bispectrum is zero as well
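The definition above (a cross spectrum between the signal and its square) can be illustrated with a block-averaged estimate. This is a minimal sketch under assumptions, not the estimator used in the slides: the conjugation convention, the 1/N normalization, the per-block removal of the mean of the squared signal, and the zero-mean input are all choices made here, and a naive DFT is used only to keep the example self-contained.

```python
import cmath
import math

def dft(block):
    """Naive O(N^2) discrete Fourier transform (fine for short blocks)."""
    n = len(block)
    return [sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def integrated_bispectrum(signal, block_len):
    """Block-averaged cross spectrum between x(t) and x(t)^2 - mean(x^2).

    Assumes a zero-mean input. Returns one complex value per frequency bin.
    """
    n_blocks = len(signal) // block_len
    if n_blocks == 0:
        raise ValueError("signal is shorter than one block")
    acc = [0j] * block_len
    for b in range(n_blocks):
        x = signal[b * block_len:(b + 1) * block_len]
        mean_sq = sum(v * v for v in x) / block_len
        y = [v * v - mean_sq for v in x]          # centered squared signal
        spec_x, spec_y = dft(x), dft(y)
        for k in range(block_len):
            acc[k] += spec_x[k].conjugate() * spec_y[k] / block_len
    return [a / n_blocks for a in acc]

# A single sinusoid has no component shared with its square, while two
# harmonically related sinusoids (quadratic phase coupling) do.
N, k0 = 32, 3
tone = [math.cos(2 * math.pi * k0 * t / N) for t in range(2 * N)]
coupled = [math.cos(2 * math.pi * k0 * t / N) + math.cos(2 * math.pi * 2 * k0 * t / N)
           for t in range(2 * N)]
ib_tone = integrated_bispectrum(tone, N)
ib_coupled = integrated_bispectrum(coupled, N)
```

Only two DFTs per block are needed, which is the computational saving over a full two-dimensional bispectrum estimate that the slide refers to.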
Bispectrum-based VAD (3)
• Two alternatives explored for formulating the decision rule:
  • Estimation by block averaging
  • MO-LRT (multiple-observation likelihood ratio test)
    • Given a set of N = 2m+1 consecutive observations:
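The multiple-observation idea can be sketched as a sliding sum of per-frame log-likelihood ratios over a window of N = 2m+1 frames centered on the current one. The slide's own formula is not preserved in this transcript, so the following is an assumption-laden illustration: the per-frame ratios are taken as given inputs (how they are derived from the integrated bispectrum is not shown), the window is truncated at the utterance edges, and the name `mo_lrt_decisions` and the zero threshold are hypothetical.

```python
def mo_lrt_decisions(frame_llrs, m, threshold=0.0):
    """Speech/non-speech decision per frame from a window of 2m+1 frames.

    frame_llrs: per-frame log-likelihood ratios log p(y|speech)/p(y|noise).
    The window is shortened at the start and end of the sequence (assumption).
    """
    n = len(frame_llrs)
    decisions = []
    for center in range(n):
        lo = max(0, center - m)
        hi = min(n, center + m + 1)
        decisions.append(sum(frame_llrs[lo:hi]) > threshold)
    return decisions

llrs = [-1.0, -1.0, 3.0, 3.0, 3.0, -1.0, -1.0]
decisions = mo_lrt_decisions(llrs, m=1)
# decisions == [False, True, True, True, True, True, False]
```

In this toy sequence the frames adjacent to the speech segment are also flagged as speech, which shows how pooling neighbouring observations extends the detected region by up to m frames on each side.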
Bispectrum-based VAD (4)
• Likelihoods
• Variances
Schedule
• Model-based feature compensation
  • VTS: results on AURORA4
    • VTS formulation
    • VTS vs non-linear feature normalization procedures
    • VTS results on AURORA 4
  • Including uncertainty caused by noise
    • Including uncertainty in noise compensation
    • Wiener filtering + uncertainty: results on Aurora 2
    • Wiener filtering + uncertainty: results on Aurora 4
    • VTS + uncertainty: formulation
    • Numerical integration of probabilities: formulation
VTS formulation
• VTS: Vector Taylor Series approach to remove additive (and channel) noise
• References:
  • P. J. Moreno, “Speech recognition in noisy environments,” Ph.D. thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania, Apr. 1996.
  • A. de la Torre, “Técnicas de mejora de la representación en los sistemas de reconocimiento automático del habla” [Techniques for improving the representation in automatic speech recognition systems], Ph.D. thesis, University of Granada, Spain, Apr. 1999.
VTS formulation
• VTS provides an estimate of the clean speech in a statistical framework:
• Log-FBO domain, additive noise assumed:
• Effect of noise described using the “correction function” g():
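The equations of this slide are not preserved in the transcript. Under the stated assumption that noise is additive before the logarithm, the relation between clean speech x, noise n and noisy speech y in the log-FBO domain follows directly. Writing the argument of g() as n - x is a convention assumed here; the original slide may use a different one.

```latex
\begin{align}
e^{y} &= e^{x} + e^{n} \\
y &= \log\left(e^{x} + e^{n}\right)
   = x + \log\left(1 + e^{\,n - x}\right)
   = x + g(n - x) \\
g(z) &= \log\left(1 + e^{z}\right)
\end{align}
```

With this convention g(n - x) tends to 0 when the speech dominates (y ≈ x) and to n - x when the noise dominates (y ≈ n).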
VTS formulation
• Auxiliary functions f() and h() (1st and 2nd derivatives):
• VTS provides an estimate of the noisy-speech Gaussian given the clean-speech and noise Gaussians:
• Noisy-speech Gaussian obtained with the expected values:
VTS formulation
• Noisy-speech Gaussian: formulas:
• Models for noise and clean speech:
VTS formulation
• The model for clean speech provides the model for noisy speech, and also P(k|y) (posterior probability of each Gaussian):
• Estimation of clean speech:
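The chain described across the VTS formulation slides (correction function, its derivatives, noisy-speech Gaussians, posteriors P(k|y), clean-speech estimate) can be sketched for one log-FBO channel. The slide formulas are not preserved, so everything below is an assumption rather than the authors' exact method: y = x + g(n - x) with g(z) = log(1 + exp(z)), f() and h() taken as the first and second derivatives of y with respect to x, a second-order expansion for the mean and a first-order one for the variance, independent x and n, and a clean-speech estimate that subtracts the posterior-weighted correction evaluated at the Gaussian means.

```python
import math

def g(z):
    """Correction function g(z) = log(1 + exp(z)), evaluated stably."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def f(z):
    """dy/dx for y = x + g(n - x), with z = n - x: f(z) = 1 / (1 + exp(z))."""
    if z >= 0.0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def h(z):
    """d2y/dx2: h(z) = exp(z) / (1 + exp(z))**2 = f(z) * (1 - f(z))."""
    fz = f(z)
    return fz * (1.0 - fz)

def noisy_gaussian(mu_x, var_x, mu_n, var_n):
    """Mean and variance of y, expanding around (mu_x, mu_n)."""
    z = mu_n - mu_x
    mean = mu_x + g(z) + 0.5 * h(z) * (var_x + var_n)   # second-order mean
    var = f(z) ** 2 * var_x + (1.0 - f(z)) ** 2 * var_n  # first-order variance
    return mean, var

def log_gauss(v, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)

def estimate_clean(y, weights, mus_x, vars_x, mu_n, var_n):
    """Return (clean-speech estimate, posteriors P(k|y)) for a scalar y."""
    log_post = []
    for w, mu, var in zip(weights, mus_x, vars_x):
        mu_y, var_y = noisy_gaussian(mu, var, mu_n, var_n)
        log_post.append(math.log(w) + log_gauss(y, mu_y, var_y))
    top = max(log_post)                       # normalize in the log domain
    post = [math.exp(lp - top) for lp in log_post]
    total = sum(post)
    post = [p / total for p in post]
    correction = sum(p * g(mu_n - mu) for p, mu in zip(post, mus_x))
    return y - correction, post

# Hypothetical two-Gaussian clean-speech model and single-Gaussian noise model
x_hat, post = estimate_clean(5.0, [0.5, 0.5], [2.0, 8.0], [1.0, 1.0], 4.0, 0.5)
```

Since g() is always positive, the estimate never exceeds the observation, and the correction vanishes when every clean-speech Gaussian lies well above the noise mean.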
VTS vs non-linear feature normalization
• VTS:
  • Statistical framework:
    • Model for noise in log-FBO domain: 1 Gaussian PDF
    • Model for clean speech in log-FBO domain: Gaussian mixture
    • Noise assumed to be additive in FBO domain
  • Accurate description of the noise process → accurate compensation
• Non-linear feature normalization:
  • No a priori assumption
  • Component-by-component
  • More flexible, less accurate
Including uncertainty in noise compensation
• Noise is a random process: we do not know n, only p(n)
• Then, from an observation y we cannot find x, only p(x|y,x,n)
• Usually, compensation procedures provide E[x|y,x,n]
• What about the uncertainty of x?
• Mean and variance of x:
Including uncertainty in noise compensation
• An approach for the estimation of the variance:
• Evaluation of HMM Gaussians:
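One common way to account for the uncertainty of the estimate when evaluating an HMM Gaussian is to integrate the Gaussian over the distribution of the clean feature, which for a Gaussian uncertainty amounts to adding the two variances. Whether the slides use exactly this rule is an assumption, since the formulas are not preserved; the function names below are hypothetical. The sketch compares the closed form with a direct numerical integration.

```python
import math

def gauss_pdf(v, mean, var):
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def likelihood_with_uncertainty(x_hat, var_hat, mu, var):
    """Closed form: the HMM Gaussian evaluated at the estimate, variances added."""
    return gauss_pdf(x_hat, mu, var + var_hat)

def likelihood_numeric(x_hat, var_hat, mu, var, n_steps=20000):
    """Same quantity as a trapezoidal integral over the clean feature x:
    integral of N(x; mu, var) * N(x; x_hat, var_hat) dx."""
    span = 12.0 * math.sqrt(max(var, var_hat))
    lo = min(mu, x_hat) - span
    hi = max(mu, x_hat) + span
    dx = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        x = lo + i * dx
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * gauss_pdf(x, mu, var) * gauss_pdf(x, x_hat, var_hat)
    return total * dx

closed = likelihood_with_uncertainty(1.0, 0.5, 0.0, 1.0)
numeric = likelihood_numeric(1.0, 0.5, 0.0, 1.0)
```

A reliable estimate (small `var_hat`) leaves the Gaussian almost unchanged, while an unreliable one flattens it, so uncertain frames contribute less to discriminating between models.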
Wiener filtering + uncertainty: results on AURORA 2
• Preliminary results with Wiener filtering:
• Results on AURORA 2 with Wiener filtering + uncertainty
VTS + uncertainty: formulation
• VTS-based estimation of clean speech:
• VTS-based estimation of variance:
Numerical integration of probabilities: formulation
• Computation of expected values:
• Numerical integration of expected values: