
HIWIRE MEETING Granada, June 9-10, 2005



Presentation Transcript


  1. GSTC UGR HIWIRE MEETING Granada, June 9-10, 2005 JOSÉ C. SEGURA, LUZ GARCÍA, JAVIER RAMÍREZ

  2. Schedule • Non-linear feature normalization • ECDF segmental implementation • Progressive equalization • 2-class normalization • Non-linear speaker adaptation/independence • Non-linear feature normalization • Non-linear model adaptation • VAD and technique combination • MO-LRT • Bi-spectrum based VAD • Combined Front-End

  3. Schedule • Non-linear feature normalization • ECDF segmental implementation • Progressive equalization • 2-class normalization • Non-linear speaker adaptation/independence • Non-linear feature normalization • Non-linear model adaptation • VAD and technique combination • MO-LRT • Bi-spectrum based VAD • Combined Front-End

  4. ECDF-based nonlinear transformation (1) • CDF-matching nonlinear transformation • In previous work, we modeled the CDFs using histograms (see the sketch below)
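
To make the CDF-matching idea concrete, here is a minimal Python sketch of histogram-based equalization to a standard Gaussian reference; the function name, the bin count and the choice of a standard normal reference are illustrative assumptions, not the exact configuration used in the experiments.

```python
import numpy as np
from scipy.stats import norm

def heq_histogram(x, n_bins=100):
    """Histogram equalization of one feature stream to a standard
    normal reference (a minimal sketch of CDF matching)."""
    # Empirical CDF from a histogram of the current sentence/segment.
    counts, edges = np.histogram(x, bins=n_bins)
    cdf = np.cumsum(counts) / counts.sum()
    # Map each frame: value -> empirical CDF -> inverse reference CDF.
    bin_idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
    p = np.clip(cdf[bin_idx], 1e-6, 1 - 1e-6)   # avoid infinite tails
    return norm.ppf(p)                           # Gaussian reference
```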

  5. ECDF-based nonlinear transformation (2) • An alternative algorithm based on order statistics • It is faster, requiring only sorting and table indexing • Results are almost identical to those obtained with histograms
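
A minimal sketch of the order-statistics variant: the rank of each frame directly gives its empirical CDF value, so only a sort and an index lookup are needed. The function name and the mid-rank ECDF estimate are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm

def heq_order_statistics(x):
    """ECDF via order statistics: the rank of each sample defines its
    empirical CDF value, so only sorting and indexing are required."""
    T = len(x)
    ranks = np.empty(T)
    ranks[np.argsort(x)] = np.arange(1, T + 1)   # rank of each frame
    p = (ranks - 0.5) / T                        # ECDF estimate in (0, 1)
    return norm.ppf(p)                           # map to Gaussian reference
```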

  6. ECDF Segmental implementation • Based on a sliding window • José C. Segura, M. Carmen Benítez, Ángel de la Torre, Antonio J. Rubio, Javier Ramírez, "Cepstral Domain Segmental Nonlinear Feature Transformations for Robust Speech Recognition", IEEE Signal Processing Letters, Vol. 11, No. 5, pp. 517-520, 2004
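
A naive sketch of a sliding-window (segmental) version that recomputes the local ECDF for every frame; the published algorithm is more efficient, so this is only illustrative, and the window length and edge handling are assumptions.

```python
import numpy as np
from scipy.stats import norm

def heq_segmental(x, half_window=50):
    """Sliding-window ECDF equalization: each frame is normalized with
    the ECDF computed over its local window (naive, non-recursive sketch)."""
    T = len(x)
    y = np.empty(T)
    for t in range(T):
        lo, hi = max(0, t - half_window), min(T, t + half_window + 1)
        w = x[lo:hi]
        # Rank of frame t within its window -> local ECDF -> Gaussian reference.
        p = (np.sum(w <= x[t]) - 0.5) / len(w)
        y[t] = norm.ppf(np.clip(p, 1e-6, 1 - 1e-6))
    return y
```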

  7. Progressive normalization • Since not all MFCCs offer equal discrimination • and HEQ introduces a certain distortion • normalizing only up to a certain MFCC gives the best performance
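
A sketch of progressive normalization as described above: only log-E and the first few cepstral coefficients are equalized, while higher-order coefficients are left untouched. The column layout of the feature matrix and the default cut-off are assumptions; `equalize` can be any per-stream HEQ function, e.g. the order-statistics sketch earlier.

```python
import numpy as np

def progressive_heq(features, equalize, up_to=4):
    """Equalize only some streams of a (frames x dims) feature matrix.
    Assumed layout: column 0 = log-E, columns 1.. = c1, c2, ...
    `equalize` maps one feature stream to its equalized version."""
    out = features.copy()
    for j in range(up_to + 1):        # log-E plus c1..c4 by default
        out[:, j] = equalize(features[:, j])
    return out                        # higher-order coefficients unchanged
```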

  8. ECDF-based normalization results

  9. 2-class normalization (1) • A first approach to parametric non-linear equalization • The PDFs are modeled as two-Gaussian class mixtures for each MFCC • In practice, the two classes behave as speech-like and noise-like classes • EM is applied to each sentence to obtain the Gaussian classes • [Figure: distributions of C0 and C1 for Test01 and Test02]
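
A sketch of the per-coefficient two-Gaussian fit with EM, here via scikit-learn's GaussianMixture; treating the low-mean component as the noise-like class is an assumption for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_two_class(x):
    """Fit a 2-Gaussian mixture to one cepstral coefficient of a sentence;
    the two components play the role of noise-like and speech-like classes."""
    gmm = GaussianMixture(n_components=2, covariance_type='full',
                          max_iter=100).fit(x.reshape(-1, 1))
    # Order components so that index 0 is the low-mean (noise-like) class.
    order = np.argsort(gmm.means_.ravel())
    weights = gmm.weights_[order]
    means = gmm.means_.ravel()[order]
    stds = np.sqrt(gmm.covariances_.ravel()[order])
    return weights, means, stds
```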

  10. 2-class normalization (2) • Nonlinear parametric transformation • [Figure: equalization of C1 between Test02 (Car) and Test01 (Clean) of the WSJ0 data]

  11. 2-class normalization results

  12. Schedule • Non-linear feature normalization • ECDF segmental implementation • Progressive equalization • 2-class normalization • Non-linear speaker adaptation/independence • Non-linear feature normalization • Non-linear model adaptation • VAD and technique combination • MO-LRT • Bi-spectrum based VAD • Combined Front-End

  13. ECDF Feature Normalization • HEQ as a non-linear speaker normalization technique using the ECDF

  14. ECDF Norm. for SA

  15. ECDF Model Adaptation • Two approaches • Pure equalization ("HEQ MOD"): new Gaussian distributions, with a shift applied to the means (X -> X_HEQ) and a scale factor applied to the variances • Equalization combined with a linear transformation ("HEQ PLIN"): a linear transform X_A = M·X + B, with M', B' chosen so that D(X_A, X_HEQ) = ||M'·X + B' - X_HEQ||² is minimized • [Diagram: mapping from speaker-specific features to speaker-independent features]
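
The "HEQ PLIN" fit above is an ordinary least-squares problem; here is a sketch under the assumption that features are stored as (frames x dimensions) arrays, with names chosen only for illustration.

```python
import numpy as np

def fit_plin(X, X_heq):
    """Least-squares fit of M', B' minimizing ||M'X + B' - X_HEQ||^2.
    X, X_heq: (T, D) arrays of original and HEQ-equalized features."""
    T = X.shape[0]
    X_aug = np.hstack([X, np.ones((T, 1))])          # append a bias column
    W, *_ = np.linalg.lstsq(X_aug, X_heq, rcond=None)
    M = W[:-1].T                                      # (D, D) transform matrix
    B = W[-1]                                         # (D,) bias vector
    return M, B

# Adapted features would then be X_A = X @ M.T + B.
```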

  16. Model Adaptation

  17. SA methods: comparison

  18. Future Work 1/2 • SA models obtained with MLLR are not robust against noise -> combine Feature Normalization with MLLR

  19. Future Work 2/2 • Non-linear Feature Normalization and Model Adaptation: development of further experiments with more complex tasks on the WSJ1 database (Spoke 3 and Spoke 4)

  20. Schedule • Non-linear feature normalization • ECDF segmental implementation • Progressive equalization • 2-class normalization • Non-linear speaker adaptation/independence • Non-linear feature normalization • Non-linear model adaptation • VAD and technique combination • MO-LRT • Bi-spectrum based VAD • Combined Front-End

  21. Previous work on VAD • Voice activity detection: • Kullback-Leibler divergence: J. Ramírez, J. C. Segura, C. Benítez, A. de la Torre, A. Rubio, "A New Kullback-Leibler VAD for Robust Speech Recognition", IEEE Signal Processing Letters, Vol. 11, No. 2, pp. 266-269, Feb. 2004 • Long-term spectral divergence: J. Ramírez, J. C. Segura, C. Benítez, A. de la Torre, A. Rubio, "Efficient Voice Activity Detection Algorithms Using Long-Term Speech Information", Speech Communication, Vol. 42, No. 3-4, pp. 271-287, 2004 • Subband SNR estimation using OS filters: J. Ramírez, J. C. Segura, C. Benítez, A. de la Torre, A. Rubio, "An Effective Subband OSF-based VAD with Noise Reduction for Robust Speech Recognition", to appear in IEEE Transactions on Speech and Audio Processing, 2005/2006 • Multiple observation likelihood ratio test: J. Ramírez, J. C. Segura, C. Benítez, L. García, A. Rubio, "Statistical Voice Activity Detection using a Multiple Observation Likelihood Ratio Test", to appear in IEEE Signal Processing Letters

  22. Likelihood ratio test • Generalization of Sohn's VAD: J. Sohn, N. S. Kim, W. Sung, "A Statistical Model-Based Voice Activity Detection", IEEE Signal Processing Letters, Vol. 6, No. 1, pp. 1-3, 1999 • Two hypotheses are considered: • H0: y = n (absence of speech, silence) • H1: y = s + n (speech present) • Optimum decision rule (Bayes classifier) applied to the observation vector of frame l • Evaluating the LRT requires an adequate statistical signal model (LRT: likelihood ratio test)
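
For reference, a sketch of the per-frame log-likelihood ratio under the complex-Gaussian spectral model used in Sohn's VAD, written in terms of the a priori SNR xi and a posteriori SNR gamma per frequency bin; the variable names and the zero threshold are illustrative assumptions.

```python
import numpy as np

def frame_log_lr(gamma, xi):
    """Per-frame log-likelihood ratio under the complex-Gaussian spectral
    model: gamma = a posteriori SNR, xi = a priori SNR, per frequency bin."""
    log_lr_bins = gamma * xi / (1.0 + xi) - np.log1p(xi)
    return np.mean(log_lr_bins)      # log of the geometric mean of per-bin LRs

def sohn_vad(gamma, xi, threshold=0.0):
    """Decide H1 (speech) when the averaged log-LR exceeds the threshold."""
    return frame_log_lr(gamma, xi) > threshold
```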

  23. Multiple observation likelihood ratio test • MO-LRT (multiple observation LRT) • Given a set of N = 2m+1 consecutive observations around the decision frame • Under statistical independence, the multiple-observation likelihood ratio factorizes, so the log-LRT becomes the sum of the per-frame log-likelihood ratios • Recursive log-LRT: the window sum is updated frame by frame by adding the newest term and dropping the oldest (see the sketch below)
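
A sketch of the MO-LRT decision computed from precomputed per-frame log-likelihood ratios, using the recursive window-sum update described above; the threshold and edge handling are assumptions.

```python
import numpy as np

def mo_lrt(frame_log_lrs, m, threshold=0.0):
    """MO-LRT decisions from per-frame log-likelihood ratios.
    Frame l is labelled speech if the sum of log-LRs over frames
    l-m .. l+m exceeds the threshold (recursive sliding-sum update)."""
    phi = np.asarray(frame_log_lrs, dtype=float)
    T = len(phi)
    decisions = np.zeros(T, dtype=bool)
    # Initialize the window sum for the first decision frame l = m.
    window_sum = phi[:2 * m + 1].sum()
    for l in range(m, T - m):
        decisions[l] = window_sum > threshold
        if l + m + 1 < T:
            # Recursive update: add the newest frame, drop the oldest.
            window_sum += phi[l + m + 1] - phi[l - m]
    return decisions
```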

  24. Analysis: optimum delay (m) • [Figure: probability distributions and classification errors as a function of m] • Increasing m (the number of observations): • reduces the overlap between the distributions • misclassification errors: reduced for speech vs. a moderate increase for non-speech

  25. Analysis: optimum delay • ROC analysis on AURORA 3 Spanish (High-Ch1, 5 dB) • Comparison of MO-LRT and Sohn's VAD

  26. Speech recognition experiments • [Diagram: front-end with VAD-driven noise estimation, Wiener filtering (WF), frame dropping (FD), MFCC extraction and HTK back-end] • AURORA 2: average word accuracy, Wacc (%), for clean training (CT) and multi-condition training (MCT)

  27. Speech recognition experiments AURORA 3: Spanish SpeechDat-Car

  28. Work in progress • Statistical tests in the bispectrum domain: • J. M. Górriz, et al., “Voice Activity Detection Based on HOS”, 8th International Work-Conference on Artificial Neural Networks (IWANN'2005) • J. M. Górriz, et al., “Statistical Tests for Voice Activity Detection”, Non-linear Speech Processing (NOLISP’2005), 2005. • J. M. Górriz, et al., “Bispectra analysis-based VAD for robust speech recognition”, First International Work-Conference on the Interplay Between Natural and Artificial Computation (IWINAC’2005) • Bispectrum LRT (application of MO-LRT on the bispectra) • J. M. Górriz, et al, “An Improved MO-LRT VAD Based on a Bispectra Gaussian Model”, Submitted to Electronics Letters.

  29. GSTC-UGR speech recognition results • [Diagram: combined front-end with LTSE VAD, noise reduction, frame dropping, segmental ECDF (Gaussian reference), progressive normalization and HTK back-end] • LTSE VAD: J. Ramírez, et al., "Efficient Voice Activity Detection Algorithms Using Long-Term Speech Information", Speech Communication, Vol. 42, No. 3-4, pp. 271-287, 2004 • Segmental ECDF: 60-frame delay; J. C. Segura, et al., "Cepstral Domain Segmental Nonlinear Feature Transformations for Robust Speech Recognition", IEEE Signal Processing Letters, Vol. 11, No. 5, pp. 517-520, 2004 • Progressive: log-E plus cepstral coefficients up to the 4th

  30. GSTC-UGR speech recognition results • WER relative improvements: 12% (MCT), 59% (CT) • WER relative improvements: 60% (WM), 46% (MM), 73% (HM)

  31. GSTC-UGR speech recognition results • AURORA 4 WER (%) (clean training experiments) • WER relative improvements: 20% (test sets 1-7), 17% (test sets 8-14)

  32. GSTC UGR HIWIRE MEETING Granada, June 9-10, 2005 JOSÉ C. SEGURA, LUZ GARCÍA, JAVIER RAMÍREZ
