HIWIRE MEETING Granada, June 9-10, 2005

GSTC UGR HIWIRE MEETING Granada, June 9-10, 2005 JOSÉ C. SEGURA, LUZ GARCÍA, JAVIER RAMÍREZ

Schedule
• Non-linear feature normalization
  • ECDF segmental implementation
  • Progressive equalization
  • 2-class normalization
• Non-linear speaker adaptation/independence
  • Non-linear feature normalization
  • Non-linear model adaptation
• VAD and technique combination
  • MO-LRT
  • Bi-spectrum based VAD
  • Combined Front-End

ECDF-based nonlinear transformation (1) • CDF-matching nonlinear transformation • In previous work we modeled the CDFs using histograms

ECDF-based nonlinear transformation (2) • An alternative algorithm based on order statistics • It is faster: it only requires sorting and table indexing • Results are almost identical to those obtained with histograms
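The sorting-and-table-indexing idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `ecdf_equalize`, the standard Gaussian reference, and the `(r - 0.5) / n` ECDF estimate are all assumptions made for the example.

```python
import numpy as np
from statistics import NormalDist


def ecdf_equalize(x):
    """Map a 1-D feature sequence onto a standard Gaussian reference.

    Uses order statistics: each sample is replaced by the reference
    quantile that corresponds to its rank in the sequence.
    """
    x = np.asarray(x, dtype=float)
    n = x.size

    # Sort once, then recover the 0-based rank of every sample.
    order = np.argsort(x, kind="stable")
    ranks = np.empty(n, dtype=int)
    ranks[order] = np.arange(n)

    # Table of reference quantiles, one entry per rank.
    # The 0.5 offset (an assumption) keeps the CDF value inside (0, 1).
    ref = NormalDist()  # zero mean, unit variance
    table = np.array([ref.inv_cdf((r + 0.5) / n) for r in range(n)])

    # Table indexing: look up the quantile for each sample's rank.
    return table[ranks]
```

Calling `ecdf_equalize(c1)` on one cepstral coefficient of an utterance returns a sequence with the same ordering as the input but an approximately Gaussian distribution.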

ECDF segmental implementation • Based on a sliding window • José C. Segura, M. Carmen Benítez, Ángel de la Torre, Antonio J. Rubio, Javier Ramírez, “Cepstral Domain Segmental Nonlinear Feature Transformations for Robust Speech Recognition”, IEEE Signal Processing Letters, Vol. 11, No. 5, pp. 517-520, 2004
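A sliding-window version might look like the sketch below. It assumes a symmetric window of `2 * half_win + 1` frames and a standard Gaussian reference; the name `segmental_ecdf`, the edge handling (truncated windows), and the default of 60 frames (taken from the 60-frame delay quoted in the results slide) are illustrative choices rather than details confirmed by the slides.

```python
import numpy as np
from statistics import NormalDist


def segmental_ecdf(x, half_win=60):
    """Equalize each frame using only the frames in a window around it."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ref = NormalDist()  # Gaussian reference distribution (assumed)
    y = np.empty(n)

    for t in range(n):
        # Window centred on frame t, truncated at the utterance edges.
        lo = max(0, t - half_win)
        hi = min(n, t + half_win + 1)
        window = x[lo:hi]

        # 1-based rank of the centre frame inside its window.
        r = 1 + np.count_nonzero(window < x[t])

        # Reference quantile for that rank; the 0.5 offset keeps p in (0, 1).
        y[t] = ref.inv_cdf((r - 0.5) / window.size)

    return y
```

Only the rank of the centre frame is needed per window, so the algorithmic delay is `half_win` frames rather than the whole utterance.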

Progressive normalization • Not all MFCCs are equally discriminative • HEQ introduces some distortion • Normalizing only up to a certain MFCC gives the best performance

ECDF-based normalization results

2-class normalization (1) • A first approach to parametric non-linear equalization • PDFs are modeled as two-class Gaussian mixtures for each MFCC • In practice, the two classes are speech-like and noise-like • EM is run on each sentence to estimate the Gaussian classes [Figure labels: Test01, Test02, C0, C1]
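The per-sentence EM step can be illustrated with a plain two-component, one-dimensional Gaussian mixture fit. This is a generic EM sketch under stated assumptions (median-split initialization, fixed iteration count, small variance floor); the function name `em_two_gaussians` is hypothetical and nothing here is claimed to match the authors' exact procedure.

```python
import numpy as np


def em_two_gaussians(x, n_iter=50):
    """Fit a two-Gaussian mixture to one coefficient of one sentence.

    Returns (weights, means, variances), each an array of length 2.
    Assumes x has at least two distinct values.
    """
    x = np.asarray(x, dtype=float)

    # Initialization (assumption): split the samples at the median.
    med = np.median(x)
    mu = np.array([x[x <= med].mean(), x[x > med].mean()])
    var = np.full(2, x.var() + 1e-6)
    w = np.array([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: posterior probability of each class for each frame,
        # computed in the log domain for numerical stability.
        d = x[:, None] - mu[None, :]
        logp = -0.5 * (np.log(2 * np.pi * var) + d ** 2 / var) + np.log(w)
        logp -= logp.max(axis=1, keepdims=True)
        post = np.exp(logp)
        post /= post.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means and variances.
        nk = post.sum(axis=0) + 1e-12
        w = nk / x.size
        mu = (post * x[:, None]).sum(axis=0) / nk
        var = (post * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6

    return w, mu, var
```

With speech-like and noise-like frames forming two reasonably separated clusters, the two returned components can be read as the two classes used by the normalization.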

2-class normalization (2) • Nonlinear parametric transformation [Figure: equalization of C1 between Test02 (Car) and Test01 (Clean) of WSJ0 data]

2-class normalization results

ECDF Features Normalization • HEQ as a non-linear speaker normalization technique using ECDF

ECDF Norm. for SA

ECDF Models Adaptation: 2 approaches
• Pure equalization (“HEQ MOD”): new Gaussian distributions
  - shift on the means: $X \to X_{HEQ}$
  - scale factor on the variances
• Equalization mixed with a linear transformation (“HEQ PLIN”)
  - LT: $X_A = MX + B$
  - $M'$, $B'$ chosen such that $D(X_A, X_{HEQ}) = \lVert M'X + B' - X_{HEQ} \rVert^2$ is minimum
[Diagram labels: Speaker Specific Features, Speaker Independent Features]
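The “HEQ PLIN” criterion is an ordinary least-squares problem, so one way to obtain M' and B' is sketched below. The function name `fit_linear_to_heq` and the array layout (one frame per row) are assumptions for illustration; the slides do not say how the minimization was actually carried out.

```python
import numpy as np


def fit_linear_to_heq(X, X_heq):
    """Find M, B minimizing sum_t || M x_t + B - x_heq_t ||^2.

    X      : (T, D) original feature vectors, one frame per row
    X_heq  : (T, D) equalized feature vectors for the same frames
    Returns M with shape (D, D) and B with shape (D,).
    """
    X = np.asarray(X, dtype=float)
    X_heq = np.asarray(X_heq, dtype=float)

    # Append a column of ones so the bias B is estimated jointly with M.
    A = np.hstack([X, np.ones((X.shape[0], 1))])

    # Solve A @ W ~= X_heq in the least-squares sense; W is (D + 1, D).
    W, *_ = np.linalg.lstsq(A, X_heq, rcond=None)

    M = W[:-1].T  # rows of W act on row vectors, so transpose for M @ x
    B = W[-1]
    return M, B
```

The fitted pair can then be applied as `X @ M.T + B`, which approximates the equalized features with a purely linear transformation.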

Models Adaptation

SA methods. Comparison

Future Work 1/2 • SA models using MLLR are not robust against noise → Feature Normalization + MLLR

Future Work 2/2 • Non-linear feature normalization and model adaptation: further experiments with more complex tasks on the WSJ1 database (spoke3 and spoke4)

Previous work on VAD
• Voice activity detection:
  • Kullback-Leibler divergence
    • J. Ramírez, J. C. Segura, C. Benítez, A. de la Torre, A. Rubio, “A New Kullback-Leibler VAD for Robust Speech Recognition”, IEEE Signal Processing Letters, Vol. 11, No. 2, pp. 666-669, Feb. 2004
  • Long-term spectral divergence
    • J. Ramírez, J. C. Segura, C. Benítez, A. de la Torre, A. Rubio, “Efficient Voice Activity Detection Algorithms Using Long-Term Speech Information”, Speech Communication, Vol. 42/3-4, pp. 271-287, 2004
  • Subband SNR estimation using OS filters
    • J. Ramírez, J. C. Segura, C. Benítez, A. de la Torre, A. Rubio, “An Effective Subband OSF-based VAD with Noise Reduction for Robust Speech Recognition”, to appear in IEEE Transactions on Speech and Audio Processing, 2005/2006
  • Multiple observation likelihood ratio test
    • J. Ramírez, J. C. Segura, C. Benítez, L. García, A. Rubio, “Statistical Voice Activity Detection using a Multiple Observation Likelihood Ratio Test”, to appear in IEEE Signal Processing Letters

Likelihood ratio test (LRT)
• Generalization of Sohn’s VAD:
  • J. Sohn, N. S. Kim, W. Sung, “A statistical model-based voice activity detection”, IEEE Signal Processing Letters, vol. 6 (1), pp. 1-3, 1999.
• Two hypotheses are considered:
  • H0: y = n (absence of speech, silence)
  • H1: y = s + n (speech presence)
• Optimum decision rule (Bayes classifier):
• l-frame observation vector:
• LRT evaluation → adequate signal model
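For reference, the Bayes decision rule for two hypotheses and the per-bin likelihood ratio of the Gaussian model used by Sohn et al. are commonly written as below. The notation is assumed (ξ_k as a priori SNR, γ_k as a posteriori SNR, λ_s and λ_n as speech and noise variances in bin k) and may differ from the slide's own symbols.

```latex
% Bayes-optimal decision rule for the two hypotheses H_0 and H_1
L(\mathbf{y}) = \frac{p(\mathbf{y} \mid H_1)}{p(\mathbf{y} \mid H_0)}
  \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \frac{P[H_0]}{P[H_1]}

% Per-bin likelihood ratio under a complex Gaussian model of the DFT coefficients
\Lambda_k = \frac{1}{1+\xi_k} \exp\!\left( \frac{\gamma_k\,\xi_k}{1+\xi_k} \right),
\qquad \xi_k = \frac{\lambda_s(k)}{\lambda_n(k)},
\quad \gamma_k = \frac{|Y_k|^2}{\lambda_n(k)}
```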

Multiple observation likelihood ratio test • MO-LRT (multiple observation LRT): • Given a set of N= 2m+1 consecutive observations: • LRT: • Under statistical independence: • Recursive Log-LRT:
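The recursive log-LRT can be sketched as a running sum over the 2m+1 frames: under statistical independence the joint log-likelihood ratio is the sum of the per-frame log ratios, so moving the window by one frame only requires dropping the oldest term and adding the newest. The function name `mo_lrt` and the choice to return values only for frames with a complete window are assumptions for this example; the per-frame log ratios `phi` are taken as given.

```python
import numpy as np


def mo_lrt(phi, m):
    """Multiple-observation log-LRT over windows of N = 2m + 1 frames.

    phi : per-frame log-likelihood ratios ln[p(y_k | H1) / p(y_k | H0)]
    m   : half-window length, so the decision for frame l uses l-m .. l+m
    Returns an array whose entry i is the statistic for centre frame i + m.
    """
    phi = np.asarray(phi, dtype=float)
    n_win = 2 * m + 1
    if phi.size < n_win:
        return np.empty(0)

    ell = np.empty(phi.size - n_win + 1)

    # First window: direct sum of the per-frame log ratios.
    ell[0] = phi[:n_win].sum()

    # Recursive update: remove the frame leaving the window,
    # add the frame entering it.
    for i in range(1, ell.size):
        ell[i] = ell[i - 1] - phi[i - 1] + phi[i + n_win - 1]

    return ell
```

A frame would then be labeled as speech when its statistic exceeds a threshold, with the decision delayed by m frames.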

Analysis: Optimum delay [Figure panels: Probability distributions; Classification errors; axis: m] • Increasing m (the number of observations): • Reduces the overlap between the distributions • Misclassification errors: reduced for speech, with a moderate increase for non-speech

Analysis: Optimum delay • ROC analysis: AURORA 3 Spanish (High-Ch1, 5 dB) [Figure: ROC curves, MO-LRT and Sohn’s VAD]

Speech recognition experiments [Block diagram labels: Frame dropping (FD), Wiener Filtering (WF), MFCC, HTK, Noise estimation, VAD] • AURORA 2: Average Wacc (%) for CT and MCT

Speech recognition experiments AURORA 3: Spanish SpeechDat-Car

Work in progress
• Statistical tests in the bispectrum domain:
  • J. M. Górriz, et al., “Voice Activity Detection Based on HOS”, 8th International Work-Conference on Artificial Neural Networks (IWANN’2005)
  • J. M. Górriz, et al., “Statistical Tests for Voice Activity Detection”, Non-linear Speech Processing (NOLISP’2005), 2005
  • J. M. Górriz, et al., “Bispectra analysis-based VAD for robust speech recognition”, First International Work-Conference on the Interplay Between Natural and Artificial Computation (IWINAC’2005)
• Bispectrum LRT (application of MO-LRT on the bispectra)
  • J. M. Górriz, et al., “An Improved MO-LRT VAD Based on a Bispectra Gaussian Model”, submitted to Electronics Letters

GSTC-UGR speech recognition results
[Block diagram labels: Segmental ECDF (Gaussian ref.), Progressive, Noise reduction, Frame dropping, HTK, LTSE VAD]
• LTSE VAD:
  • J. Ramírez, et al., “Efficient Voice Activity Detection Algorithms Using Long-Term Speech Information”, Speech Communication, Vol. 42/3-4, pp. 271-287, 2004
• Segmental ECDF: 60-frame delay
  • J. C. Segura, et al., “Cepstral Domain Segmental Nonlinear Feature Transformations for Robust Speech Recognition”, IEEE Signal Processing Letters, Vol. 11, No. 5, pp. 517-520, 2004
• Progressive:
  • Log-E + up to the 4th cepstral coefficient

GSTC-UGR speech recognition results • WER relative improvements: 12% (MCT), 59% (CT) • WER relative improvements: 60% (WM), 46% (MM), 73% (HM)

GSTC-UGR speech recognition results • AURORA 4 WER (%) (clean training experiments) • WER relative improvements: 20% (test sets 1:7), 17% (test sets 8:14)
