LORIA: Irina Illina, Dominique Fohr. Chania Meeting, May 9-10, 2007
Missing Data: previous approach • Hypothesis: some coefficients of the feature vector are masked by noise • Marginalization: replace p(Y|M) by an integration over the masked coefficients • Approach presented before: y = x + n (additive case, since we work in the spectral domain) • Two cases: • If SNR > 0: x > n, so y/2 < x < y • If SNR < 0: x < n, so 0 < x < y/2 • (Figure: the interval [0, y] split at y/2, showing where x lies in each case)
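For reference, a hedged sketch of how these bounds enter the decoder as bounded marginalization; the split into reliable/unreliable coefficients and the local SNR estimate are notational assumptions, not taken verbatim from the slides.

```latex
p(Y \mid M) \;\approx\;
\prod_{i \in \mathcal{R}} p(y_i \mid M)
\prod_{i \in \mathcal{U}} \int_{x_i^{\mathrm{low}}}^{x_i^{\mathrm{high}}} p(x_i \mid M)\, dx_i ,
\qquad
[\,x_i^{\mathrm{low}},\, x_i^{\mathrm{high}}\,] =
\begin{cases}
[\,y_i/2,\; y_i\,] & \text{if } \widehat{\mathrm{SNR}}_i > 0 ,\\
[\,0,\; y_i/2\,]   & \text{if } \widehat{\mathrm{SNR}}_i < 0 .
\end{cases}
```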
WP1: Missing Data • Modification of the approach presented before • Better approximation of the marginalization interval
Missing Data: new approach • Choose the integration limits as a function of the estimated mask • The marginalization interval becomes smaller
Proposed masks (figure: noisy speech spectrum Y and clean speech spectrum X) • Each time-frequency unit is a scalar in [0;1] giving the relative contribution of speech energy in the observed signal. • This differs from an SNR-based mask, where each unit gives the probability that the corresponding pixel is missing.
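A minimal sketch of computing such a soft mask from parallel clean/noisy spectra (oracle condition); the function and variable names are illustrative, not from the HIWIRE code.

```python
import numpy as np

def soft_masks(clean_spec, noisy_spec, eps=1e-10):
    """Illustrative oracle mask: relative contribution of speech energy
    in each time-frequency unit, clipped to [0, 1].

    clean_spec, noisy_spec: arrays of shape (n_frames, n_bands) holding
    the clean (X) and noisy (Y) spectral energies.
    """
    mask = clean_spec / np.maximum(noisy_spec, eps)
    return np.clip(mask, 0.0, 1.0)
```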
Proposed masks (figure: four mask prototypes, Cluster 1 to Cluster 4) • Each cluster k is represented by: • a mean vector μk = (μ1,k, …, μN,k) • a diagonal covariance matrix Σk = diag(σ1,k, …, σN,k) • Clusters can be seen as pdfs of the contribution of speech energy in the noisy observed signal. • We propose to consider these clusters as potential missing-data masks for any noisy input frame
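The slides do not say which clustering algorithm produces these prototypes; the sketch below assumes a simple k-means over oracle-mask frames and then summarizes each cluster by a mean vector and diagonal variances, as described above.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_mask_clusters(mask_frames, n_clusters=4, seed=0):
    """Cluster oracle-mask frames (n_frames x n_bands, values in [0, 1])
    into K prototype masks; each cluster is summarized by a mean vector
    and a diagonal covariance (here plain per-band variances)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(mask_frames)
    labels = km.labels_
    means, variances = [], []
    for k in range(n_clusters):
        frames_k = mask_frames[labels == k]
        means.append(frames_k.mean(axis=0))
        variances.append(frames_k.var(axis=0) + 1e-6)  # diagonal covariance
    return np.array(means), np.array(variances), labels
```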
Missing data: training • For each mask k, a GMM is trained on the noisy frames Y aligned with Mk • An ergodic HMM is built from these GMMs (one state per mask)
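A hedged sketch of this training step, assuming the frame-to-mask alignment comes from the clustering above and using scikit-learn GMMs; the flat ergodic transition matrix is a simplification of whatever transition estimation the real system uses.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_mask_gmms(noisy_frames, mask_labels, n_clusters=4, n_mix=8):
    """For each mask cluster k, fit a GMM on the noisy frames Y that were
    aligned with that mask during training."""
    gmms = []
    for k in range(n_clusters):
        gmm = GaussianMixture(n_components=n_mix, covariance_type="diag",
                              random_state=0)
        gmm.fit(noisy_frames[mask_labels == k])
        gmms.append(gmm)
    # Ergodic HMM over the K mask states: a uniform (flat) transition
    # matrix is assumed here as a placeholder.
    trans = np.full((n_clusters, n_clusters), 1.0 / n_clusters)
    return gmms, trans
```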
Missing data: recognition • Use the ergodic HMM to find the mask k for each frame • Each frame y(t) -> one state -> mask Mk • Use μi,k and σi,k of Mk to define the marginalization interval: [μi,k - 2σi,k, μi,k + 2σi,k] • Marginalization: integrate the state pdf over this interval (a hedged sketch follows below)
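A sketch of the recognition-side computation under two explicit assumptions: mask selection is done per frame by maximum GMM likelihood (instead of full Viterbi decoding of the ergodic HMM), and the mask interval [μ - 2σ, μ + 2σ], which lives in the [0, 1] energy-ratio domain, is mapped to bounds on the clean coefficient by multiplying by the observed y.

```python
import numpy as np
from scipy.stats import norm

def select_mask(frame_y, gmms):
    """Per-frame mask selection: pick the mask GMM with highest likelihood
    (a simpler approximation of the ergodic-HMM decoding on the slide)."""
    scores = [g.score(frame_y.reshape(1, -1)) for g in gmms]
    return int(np.argmax(scores))

def marginalized_loglik(frame_y, mask_mean, mask_std, state_mean, state_std):
    """Bounded marginalization for one HMM state with a diagonal Gaussian.
    Assumption: the mask interval (a fraction of the observed energy) is
    turned into bounds on the clean coefficient by multiplying by y."""
    lo = np.clip(mask_mean - 2 * mask_std, 0.0, 1.0) * frame_y
    hi = np.clip(mask_mean + 2 * mask_std, 0.0, 1.0) * frame_y
    prob = norm.cdf(hi, state_mean, state_std) - norm.cdf(lo, state_mean, state_std)
    return np.sum(np.log(np.maximum(prob, 1e-12)))
```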
Missing Data: experiments • Parameterization: spectral domain, 12 Mel bands + Δ + ΔΔ • Training: HMM models on clean Aurora4 + adaptation with the first 50 HIWIRE clean sentences • Mk: trained on noisy HIWIRE (first 50 sentences), LN+MN+HN+clean • Test: noisy HIWIRE (last 50 sentences)
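As a side note, a minimal front-end sketch for such Mel-band features, assuming a librosa-based pipeline; the FFT size, hop length, and exact delta configuration are guesses rather than the HIWIRE settings.

```python
import numpy as np
import librosa

def mel_band_features(wav_path, n_mels=12):
    """Illustrative front-end: 12 log Mel filterbank energies per frame
    plus delta coefficients; frame/hop sizes are assumptions."""
    audio, sr = librosa.load(wav_path, sr=None)
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_fft=512,
                                         hop_length=160, n_mels=n_mels)
    feats = np.log(mel + 1e-10).T              # (n_frames, n_mels)
    delta = librosa.feature.delta(feats.T).T   # delta coefficients
    return np.hstack([feats, delta])
```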
Visualisation of the marginalisation intervals on an example (figure: one spectral coefficient for the word « standby »; panels: new method vs. previous method, clean and LN conditions)
Visualisation of the marginalisation intervals on an example (figure: new method vs. previous method, MN and HN conditions)
WER evaluation (figure: WER of the new method vs. the previous method)
WER-based evaluation • Comparison with ETSI AFE (figure: WER of the new method vs. ETSI AFE)
Results (figure: WER %, previous vs. new method, including an oracle condition) • Oracle: X/Y -> Mk -> marginalisation
New method: high-noise problem • The true value falls outside the marginalization interval
Conclusion • A better approximation of the marginalization interval gives better recognition results, especially in the LN and MN conditions • But mask estimation must be improved in the MN and HN conditions
WP2: Non-native speech recognition • Previous work • 2 sets of models: • TIMIT HMM models • Native (Fr, It, Gr, Sp) HMM models • Confusion rules • Integration of the rules in HMM • New study: • Different sets of models
Different sets of models • TIMIT models (canonical English models) • Native models, L = {Fr, It, Sp, Gr} • MLLR-adapted models: TIMIT HMM adapted on HIWIRE_L • MAP-adapted models: TIMIT HMM adapted on HIWIRE_L • Re-estimated models: TIMIT HMM + Baum-Welch iterations using HIWIRE_L
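For reference, a sketch of the standard MAP mean update that this kind of adaptation typically relies on; the exact HIWIRE/HTK recipe and the value of the prior weight τ are not given in the slides.

```python
import numpy as np

def map_adapt_means(prior_means, frames, posteriors, tau=10.0):
    """Standard MAP update of Gaussian means (sketch, not the exact recipe):
    mu_hat = (tau * mu_prior + sum_t gamma_t * x_t) / (tau + sum_t gamma_t).

    prior_means: (n_gauss, dim) means of the TIMIT models
    frames:      (n_frames, dim) adaptation data (HIWIRE_L)
    posteriors:  (n_frames, n_gauss) occupation probabilities gamma_t
    tau:         prior weight controlling how far the means can move
    """
    occ = posteriors.sum(axis=0)          # (n_gauss,)
    acc = posteriors.T @ frames           # (n_gauss, dim)
    return (tau * prior_means + acc) / (tau + occ)[:, None]
```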
Experimental conditions • Adaptation and re-estimation: • Cross-validation (leave one out): • All speakers except one for adaptation or re-estimation • The remaining speaker for testing
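A small illustrative helper for the leave-one-out splits over speakers (names are hypothetical).

```python
def leave_one_speaker_out(speakers):
    """Cross-validation splits: for each test speaker, all other speakers
    are used for adaptation / re-estimation."""
    for test_spk in speakers:
        train_spks = [s for s in speakers if s != test_spk]
        yield train_spks, test_spk

# Usage sketch (adapt, evaluate, data_for are hypothetical helpers):
# for train_spks, test_spk in leave_one_speaker_out(hiwire_speakers):
#     model = adapt(timit_hmm, data_for(train_spks))
#     evaluate(model, data_for(test_spk))
```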
Results (figure: WER for HMM TIMIT, TIMIT + native, MLLR adaptation with HIWIRE, MAP adaptation with HIWIRE, and retraining on HIWIRE, under both the HIWIRE grammar and a word-loop grammar)
Results with confusion rules integrated in HMM (HIWIRE grammar), WER % / SER % per model set:
• Baseline: 7.2 / 14.6
• 5.3 / 10.2
• 5.8 / 11.8
• 4.8 / 10.9
• 3.5 / 8.1
• 2.8 / 6.4
• 2.8 / 6.5
• 2.1 / 5.0
• Best result (2.1 / 5.0) obtained with TIMIT HMM models (canonical English) + retrained models
Results with speaker adaptation • Using the best system of the previous slide (confusion rules integrated in TIMIT HMM + re-estimation), we add a speaker adaptation step: • first 50 sentences per speaker for adaptation • MAP adaptation • HIWIRE grammar • WER: 1.4% • SER: 3.2%
Conclusion • Different sets of models have been tested • Baseline results: WER 7.2%, SER 14.6% • Best result obtained with confusion rules integrated in TIMIT HMM + re-estimation + MAP speaker adaptation: WER 1.4%, SER 3.2%
Extracted rules and modified HMM structure (figure: example of acoustic model modification for English phone /t/; extracted confusion rules map English phones to French phones, and the English /t/ model is placed in parallel with the corresponding French models)
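A purely hypothetical sketch of how extracted confusion rules could be stored and used to pick the models placed in parallel for a phone; the real rules and the HMM structure modification are performed on the acoustic model files, and the phone entries below are examples, not the rules extracted in this work.

```python
# Hypothetical rule table: English phone -> French phone models it is
# confused with (example entries only).
confusion_rules = {
    "en:/t/": ["fr:/t/", "fr:/k/"],
}

def parallel_models(english_phone, rules):
    """Models to place in parallel in the modified HMM: the canonical
    English model plus the French models given by the confusion rules."""
    return [english_phone] + rules.get(english_phone, [])

# parallel_models("en:/t/", confusion_rules) -> ["en:/t/", "fr:/t/", "fr:/k/"]
```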