HIWIRE MEETING Athens, November 3-4, 2005

HIWIRE MEETING Athens, November 3-4, 2005 José C. Segura, Ángel de la Torre

Schedule • HIWIRE database evaluations • Non-linear feature normalization • ECDF segmental implementation • Parametric equalization • Robust VAD • Bispectrum-based VAD • Model-based feature compensation • VTS results on AURORA4 • Including uncertainty caused by noise

HIWIRE database evaluations • PARAMETERS: MFCC_0_D_A_Z (39 components) • MODELS: • TIMIT: 46 phone models / 3 states / 128 Gaussians (17,664 Gaussians in total) • WSJ16k: 16,825 triphones / 3,608 tied states / 6 Gaussians (21,648 Gaussians in total) • WSJ16kFon: 40 phone models / 3 states / 128 Gaussians (15,360 Gaussians in total) • ADAPTATION: • MLLR: 32 regression classes / 50 adaptation utterances • GRAMMAR: • LORIA & Word-Loop • MODIFICATIONS: Some transcriptions have been modified to match the grammar definition
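The Gaussian totals quoted for each model set follow from multiplying the number of states by the number of Gaussians per state. A quick check, with the figures taken from the slide and the dictionary layout chosen only for illustration:

```python
# Total Gaussians = (number of states) x (Gaussians per state).
# The figures come from the slide; the data layout is illustrative.
models = {
    "TIMIT":     {"states": 46 * 3, "mix": 128},  # 46 phone models, 3 states each
    "WSJ16k":    {"states": 3608,   "mix": 6},    # 3,608 tied states
    "WSJ16kFon": {"states": 40 * 3, "mix": 128},  # 40 phone models, 3 states each
}

totals = {name: m["states"] * m["mix"] for name, m in models.items()}
for name, total in totals.items():
    print(f"{name}: {total} Gaussians")
```

The three acoustic models are therefore of comparable size, between roughly 15,000 and 22,000 Gaussians.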

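Transcription modifications (awk script)

    BEGIN {
        lista = LISTA;   # copied from the variable LISTA; not used below
        nfrase = 0;      # sentence counter
    }
    {
        linea = $0;
        # Replace hyphens with underscores
        gsub("-", "_", linea);
        # Split these compounds into separate words
        gsub("Due_to_", "Due_to ", linea);
        gsub("Mayday_Mayday", "Mayday Mayday", linea);
        gsub("Pan_Pan", "Pan Pan", linea);
        # Join these number phrases into single tokens
        gsub("three hundred twenty", "three_hundred_twenty", linea);
        gsub("one hundred sixty", "one_hundred_sixty", linea);
        # Print the sentence in lower case
        printf("%s\n", tolower(linea));
        nfrase = nfrase + 1;
    }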

HIWIRE database results • Results without adaptation (WER) • Results with MLLR (WER)

Schedule • HIWIRE database evaluations • Non-linear feature normalization • ECDF segmental implementation • Parametric equalization • Robust VAD • Bispectrum-based VAD • Model-based feature compensation • VTS results on AURORA4 • Including uncertainty caused by noise

ECDF segmental implementation • Provided LOQUENDO with a reference C implementation of the segmental Gaussian transformation, to be tested within the LOQUENDO recognizer • Current work • Nonlinear feature transformation with a clean reference, to avoid the problem of system retraining

Parametric Equalization (1) PARAMETRIC NONLINEAR FEATURE EQUALIZATION FOR ROBUST SPEECH RECOGNITION (submitted ICASSP’06) • HEQ limitations • Influence of relative amount of silence in utterances • With a parametric model, a more robust equalization can be obtained

Parametric Equalization (2) • Class-dependent linear equalization • Soft-decision VAD (two-class Gaussian classifier on C0) • Nonlinear interpolation

Parametric Equalization (3)

Parametric Equalization (4) • In comparison with HEQ, PEQ transformations are smoother • For C0 a monotonic transformation is obtained • For other coefficients, the interpolated transformation is not monotonic
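The three PEQ ingredients (class-dependent linear equalization, a soft-decision VAD from a two-class Gaussian classifier on C0, and interpolation between the two linear maps) can be sketched as follows. This is a minimal one-dimensional illustration, not the authors' implementation: the function names, the equal class priors and all numeric statistics are hypothetical.

```python
import math

def gauss_pdf(x, mean, std):
    """Univariate Gaussian density."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def speech_posterior(c0, sil, spe, prior_speech=0.5):
    """Soft VAD: P(speech | C0) from a two-class Gaussian model on C0.
    sil and spe are (mean, std) pairs; the equal prior is an assumption."""
    ps = prior_speech * gauss_pdf(c0, *spe)
    pn = (1.0 - prior_speech) * gauss_pdf(c0, *sil)
    return ps / (ps + pn)

def linear_eq(x, src, ref):
    """Linear map taking the (mean, std) pair src onto the pair ref."""
    return ref[0] + (x - src[0]) * ref[1] / src[1]

def peq(x, p_speech, src, ref):
    """Blend the two class-dependent linear maps with the soft VAD weight."""
    return (p_speech * linear_eq(x, src["spe"], ref["spe"])
            + (1.0 - p_speech) * linear_eq(x, src["sil"], ref["sil"]))

# Hypothetical statistics for C0 and for one cepstral coefficient.
c0_model = {"sil": (0.0, 1.0), "spe": (4.0, 1.0)}
noisy = {"sil": (1.0, 2.0), "spe": (3.0, 1.5)}
clean = {"sil": (0.0, 1.0), "spe": (2.0, 1.0)}

p = speech_posterior(3.5, c0_model["sil"], c0_model["spe"])
print(peq(3.0, p, noisy, clean))
```

Because the interpolation weight depends on C0 rather than on the coefficient being equalized, the overall map is nonlinear, and for coefficients other than C0 nothing forces it to be monotonic.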

Parametric Equalization (5) • BASE • MFCC_0_D_A_Z (39 component) • HEQ • Quantile based CDF-transformation • Clean reference • Implemented over MFCC_0 / CMS and regressions computed after HEQ • AFE • Standard implementation • PEQ • Clean reference • Implemented over MFCC_0 / CMS and regressions computed after PEQ
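For comparison, the CDF-matching idea behind HEQ can be sketched in a few lines. This is a rank-based variant rather than the quantile-based one used in the experiments, and the standard normal reference is an assumption (the slides only state that a clean reference is used). The function name is hypothetical.

```python
from statistics import NormalDist

def heq_to_gaussian(values):
    """Map each value through the empirical CDF of the utterance and then
    through the inverse CDF of the reference (assumed N(0, 1) here)."""
    ref = NormalDist(0.0, 1.0)
    n = len(values)
    # Indices of the samples in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        p = (rank + 0.5) / n          # empirical CDF, kept strictly inside (0, 1)
        out[i] = ref.inv_cdf(p)
    return out

equalized = heq_to_gaussian([3.0, 1.0, 2.0])
print(equalized)
```

Tied input values receive distinct ranks here, which is a simplification. The transformation depends on the whole utterance, including its proportion of silence frames.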

Parametric Equalization (6) • Current work • Development of an on-line version • Relax the diagonal covariance assumption • Investigate the normalization of dynamic features • Using a more detailed model of speech frames • (i.e., more than one Gaussian)

Schedule • HIWIRE database evaluations • Non-linear feature normalization • ECDF segmental implementation (LOQ) • Parametric equalization • Robust VAD • Bispectrum-based VAD • Model-based feature compensation • VTS results on AURORA4 • Including uncertainty caused by noise

Bispectrum-based VAD (1) • Motivations: • Ability of higher order statistics to detect signals in noise • Polyspectra methods rely on an a priori knowledge of the input processes • Issues to be addressed: • Computationally expensive • Variance of the bispectrum estimators is much higher than that of power spectral estimators for identical data record size • Solution: Integrated bispectrum • J. K. Tugnait, “Detection of non-Gaussian signals using integrated polyspectrum,” IEEE Trans. on Signal Processing, vol. 42, no. 11, pp. 3137–3149, 1994. • Computationally efficient and reduced variance statistical test based on the integrated polyspectra • Detection of an unknown random, stationary, non-Gaussian signal in Gaussian noise

Bispectrum-based VAD (2) • Integrated bispectrum: • Defined as a cross spectrum between the signal and its square; it is therefore a function of a single frequency variable • Benefits: • Its computation as a cross spectrum leads to significant computational savings • The variance of the estimator is of the same order as that of the power spectrum estimator • Properties • The bispectrum of a Gaussian process is identically zero, and so is its integrated bispectrum
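The cross-spectrum definition can be illustrated with a single-frame estimate. This is only a sketch: the naive O(N²) DFT, the 1/N scaling, the conjugation convention and the function names are assumptions, and a practical estimator would average over several blocks to reduce variance.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (adequate for a short frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def integrated_bispectrum(frame):
    """Single-frame cross spectrum between the zero-mean squared signal
    and the signal itself: a function of one frequency index only."""
    n = len(frame)
    sq = [v * v for v in frame]
    mean_sq = sum(sq) / n
    y = [v - mean_sq for v in sq]          # zero-mean squared signal
    X = dft(frame)
    Y = dft(y)
    return [Y[k] * X[k].conjugate() / n for k in range(n)]

ib_flat = integrated_bispectrum([1.0, -1.0, 1.0, -1.0])   # constant square
ib_impulse = integrated_bispectrum([1.0, 0.0, 0.0, 0.0])
print([round(abs(v), 3) for v in ib_impulse])
```

Only two transforms of length N are needed per frame, which is where the computational saving over a full two-dimensional bispectrum estimate comes from.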

Bispectrum-based VAD (3) • Two alternatives explored for formulating the decision rule: • Estimation by block averaging: • MO-LRT • Given a set of N = 2m+1 consecutive observations:

Bispectrum-based VAD (4) • Likelihoods • Variances
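The likelihood and variance expressions of these slides were not recovered, but the multiple-observation structure of the test can still be sketched. The example below assumes that a per-frame log-likelihood ratio is already available and that the frames are treated as independent, so the decision for frame l sums the ratios over frames l-m to l+m. The function name, the truncation of the window at the signal edges and the numbers are hypothetical.

```python
def mo_lrt(frame_llr, m, threshold):
    """Multiple-observation LRT: decide speech for frame l when the summed
    log-likelihood ratio over the window [l-m, l+m] exceeds the threshold.
    Windows are truncated at the edges (an assumption of this sketch)."""
    n = len(frame_llr)
    decisions = []
    for l in range(n):
        lo = max(0, l - m)
        hi = min(n, l + m + 1)
        decisions.append(sum(frame_llr[lo:hi]) > threshold)
    return decisions

# A single strong frame surrounded by weak evidence for non-speech.
decisions = mo_lrt([-1.0, -1.0, 5.0, -1.0, -1.0], m=1, threshold=0.0)
print(decisions)
```

Using N = 2m+1 observations extends the speech decision to the neighbours of a strongly detected frame, which smooths the decision and delays it by m frames.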

Bispectrum-based VAD results (1)

Schedule • HIWIRE database evaluations • Non-linear feature normalization • ECDF segmental implementation (LOQ) • Parametric equalization • Robust VAD • Bispectrum-based VAD • Model-based feature compensation • VTS results on AURORA4 • Including uncertainty caused by noise

Schedule • Model-based feature compensation • VTS: results on AURORA 4 • VTS formulation • VTS vs non-linear feature normalization procedures • VTS results on AURORA 4 • Including uncertainty caused by noise • Including uncertainty in noise compensation • Wiener filtering + uncertainty: results on AURORA 2 • Wiener filtering + uncertainty: results on AURORA 4 • VTS + uncertainty: formulation • Numerical integration of probabilities: formulation

VTS formulation • VTS: Vector Taylor Series approach to remove additive (and channel) noise • References: • P. J. Moreno, “Speech recognition in noisy environments,” Ph.D. thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania, Apr. 1996. • A. de la Torre, “Técnicas de mejora de la representación en los sistemas de reconocimiento automático del habla” [Techniques for improving the representation in automatic speech recognition systems], Ph.D. thesis, University of Granada, Spain, Apr. 1999.

VTS formulation • VTS provides an estimation of the clean speech in a statistical framework: • Log-FBO domain, assumed additive noise: • Effect of noise described using the “correction function” g():
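The equations of this slide were not recovered. For orientation, the usual form of the additive-noise relation in the log filter-bank domain is given below. The symbols x, n and y (clean speech, noise and noisy speech in one channel) and the argument convention of g are assumptions and may differ from the notation on the original slide.

```latex
% Per filter-bank channel; x, n, y are log-domain values of clean speech,
% noise and noisy speech. The argument convention of g is an assumption.
\begin{align}
  e^{y} &= e^{x} + e^{n} \\
  y &= \log\left(e^{x} + e^{n}\right) = x + g(n - x),
  \qquad g(d) = \log\left(1 + e^{d}\right)
\end{align}
```

With this convention g tends to zero when the noise is far below the speech level, so y is close to x, and g(d) is close to d when the noise dominates, so y is close to n.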

VTS formulation • Auxiliary functions f() and h(): 1st and 2nd derivatives: • VTS provides estimation of noisy-speech Gaussian given the clean-speech and the noise Gaussians: • Noisy-speech Gaussian obtained with the expected values:

VTS formulation • Noisy-speech Gaussian: formulas: • Models for noise and clean speech:
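The formulas of these slides were not recovered, so the following is a one-dimensional sketch of a standard second-order Taylor approximation rather than the authors' exact expressions. It assumes y = x + g(n - x) with g(d) = log(1 + exp(d)), independent Gaussian x and n, and hypothetical function names.

```python
import math

def g(d):
    """Correction function log(1 + exp(d)), with d = n - x (assumed convention)."""
    return math.log1p(math.exp(d))

def f(d):
    """First derivative of g."""
    return 1.0 / (1.0 + math.exp(-d))

def h(d):
    """Second derivative of g."""
    return f(d) * (1.0 - f(d))

def noisy_gaussian(mu_x, var_x, mu_n, var_n):
    """Mean and variance of y = x + g(n - x), expanded around (mu_x, mu_n).
    The mean keeps the second-order term; the variance is first order."""
    d = mu_n - mu_x
    mu_y = mu_x + g(d) + 0.5 * h(d) * (var_x + var_n)
    var_y = (1.0 - f(d)) ** 2 * var_x + f(d) ** 2 * var_n
    return mu_y, var_y

print(noisy_gaussian(0.0, 2.0, 0.0, 1.0))    # speech and noise at the same level
print(noisy_gaussian(10.0, 2.0, -50.0, 1.0)) # noise far below speech
```

Applying this to every Gaussian of the clean-speech mixture yields a noisy-speech mixture with the same weights. The sketch does not guard against overflow of exp() for very large arguments.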

VTS formulation • Model for clean speech provides the model for noisy speech, and also P(k|y) (posterior probability of each Gaussian): • Estimation of clean speech:
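A one-dimensional sketch of this estimation step is given below, assuming the relation y = x + g(n - x) with g(d) = log(1 + exp(d)) and a first-order noisy-speech Gaussian per mixture component. The clean-speech estimate subtracts the posterior-weighted correction from the observation. This is a common form of the VTS estimator and not necessarily the exact expression on the slide; all names and numbers are hypothetical.

```python
import math

def g(d):
    return math.log1p(math.exp(d))        # correction function

def f(d):
    return 1.0 / (1.0 + math.exp(-d))     # its first derivative

def log_gauss(y, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)

def vts_estimate(y, gmm, mu_n, var_n):
    """gmm: list of (weight, mean, variance) for clean speech.
    Returns the clean-speech estimate and the posteriors P(k|y)."""
    logs, shifts = [], []
    for w, mu_x, var_x in gmm:
        d = mu_n - mu_x
        mu_y = mu_x + g(d)                                   # noisy mean
        var_y = (1.0 - f(d)) ** 2 * var_x + f(d) ** 2 * var_n
        logs.append(math.log(w) + log_gauss(y, mu_y, var_y))
        shifts.append(g(d))                                  # E[y - x | k]
    top = max(logs)                                          # for stability
    post = [math.exp(v - top) for v in logs]
    total = sum(post)
    post = [p / total for p in post]
    x_hat = y - sum(p * s for p, s in zip(post, shifts))
    return x_hat, post

gmm = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]
x_hat, post = vts_estimate(10.0, gmm, mu_n=-40.0, var_n=1.0)
print(x_hat, post)
```

In this example the noise is negligible, so the estimate stays at the observation and nearly all posterior mass falls on the second Gaussian.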

VTS vs non-linear feature normalization • VTS: • Statistical framework: • Model for noise in log-FBO domain: 1 Gaussian PDF • Model for clean speech in log-FBO domain: Gaussian mixture • Noise assumed to be additive in FBO domain • Accurate description of the noise process → ACCURATE COMPENSATION • Non-linear feature normalization: • No a priori assumption • Component-by-component → MORE FLEXIBLE, LESS ACCURATE

VTS results on AURORA 4

Including uncertainty in noise compensation • Noise is a random process: we do not know n, but p(n) • Then, from an observation y we cannot find x, but p(x|y,x,n) • Usually, compensation procedures provide E[x|y,x,n] • What about the uncertainty of x? • Mean and variance of x:

Including uncertainty in noise compensation

Including uncertainty in noise compensation • An approach for the estimation of the variance: • Evaluation of HMM Gaussians:
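The expression used on the slide was not recovered. A common way to take the estimation uncertainty into account when evaluating a diagonal-covariance HMM Gaussian is to add the variance of the clean-speech estimate to the model variance, which is the approach sketched here for one feature component. The function name and the numbers are hypothetical.

```python
import math

def log_gauss_with_uncertainty(x_hat, var_hat, mean, var):
    """Log-likelihood of the estimate x_hat under N(mean, var) when x_hat
    itself carries an estimation variance var_hat (variances are added)."""
    total = var + var_hat
    return -0.5 * (math.log(2.0 * math.pi * total) + (x_hat - mean) ** 2 / total)

# An unreliable frame that lies far from the model mean is penalized less.
certain = log_gauss_with_uncertainty(5.0, 0.0, 0.0, 1.0)
uncertain = log_gauss_with_uncertainty(5.0, 3.0, 0.0, 1.0)
print(certain, uncertain)
```

With var_hat = 0 the expression reduces to the ordinary Gaussian evaluation, so reliable frames are scored exactly as before.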

Wiener filtering + uncertainty: results on AURORA 2 • Preliminary results with Wiener filtering • Results on AURORA 2 with Wiener filtering + uncertainty

Wiener filter + uncertainty: results on AURORA 4

VTS + uncertainty: formulation • VTS based estimation of clean speech: • VTS based estimation of variance:

Numerical integration of probabilities: formulation • Computation of expected values: • Numerical integration of expected values:
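The integrands of this slide were not recovered. As a generic illustration, the mean and variance of x under a posterior density can be obtained by numerical integration on a grid. The sketch below uses the trapezoidal rule on an unnormalized one-dimensional density; the function name, the grid limits and the test density are hypothetical.

```python
import math

def posterior_moments(density, lo, hi, steps=2000):
    """Mean and variance of x under an unnormalized density on [lo, hi],
    computed with the trapezoidal rule (the grid spacing cancels out)."""
    dx = (hi - lo) / steps
    z = m1 = m2 = 0.0
    for i in range(steps + 1):
        x = lo + i * dx
        w = 0.5 if i in (0, steps) else 1.0   # trapezoid end-point weights
        p = w * density(x)
        z += p            # normalization constant
        m1 += p * x       # first moment
        m2 += p * x * x   # second moment
    mean = m1 / z
    return mean, m2 / z - mean * mean

# Check against a case with a known answer: a Gaussian with mean 1, variance 4.
mean, var = posterior_moments(lambda x: math.exp(-0.5 * (x - 1.0) ** 2 / 4.0),
                              lo=-30.0, hi=32.0)
print(mean, var)
```

The same routine applies to any posterior p(x|y) that can be evaluated pointwise up to a constant, at the cost of one density evaluation per grid point and per feature component.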
