Non-Native Speech Recognition Using Confusion-Based Acoustic Model Integration


Non-native Speech
• Languages have different pronunciation spaces, and speakers are used to uttering and recognizing the phones of their native language. As a result, non-native speakers make pronunciation errors and replace phones with others.
• Read speech or “inter-language words”: errors made by non-native speakers may depend on the spelling of the words, so the graphemes (characters) should be taken into account.

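Pronunciation modeling (1/2)
• Fully automated, data-driven process
• Needs HMM models of the spoken language (SL) and of the speaker's native language (NL)
• Needs a non-native speech database
• Processing chain:
  • Non-native database + SL HMMs → phonetic alignment
  • Non-native database + NL HMMs → phonetic recognition
  • Phonetic alignment + phonetic recognition → confusion rules
  • Confusion rules → modify the HMM models of the SL ASR system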
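The pairing of the two transcriptions can be pictured as follows: the phonetic alignment gives time-stamped SL phones, the phonetic recognition gives time-stamped NL phones, and each SL phone is associated with the NL phones that overlap it in time. The Python sketch below is only an illustration under stated assumptions: segments are `(label, start, end)` tuples, an NL phone is attached to the SL phone that contains its midpoint, and the function name `associate` and the timings are hypothetical. The actual association criterion used in this work is not given on the slide.

```python
def associate(sl_segments, nl_segments):
    """Pair each SL phone with the NL phones whose midpoint falls inside it.

    Both arguments are lists of (label, start, end) tuples, times in seconds.
    The midpoint criterion is an assumption made for this sketch.
    """
    pairs = []
    for sl_label, sl_start, sl_end in sl_segments:
        nl_seq = tuple(
            nl_label
            for nl_label, nl_start, nl_end in nl_segments
            if sl_start <= (nl_start + nl_end) / 2 < sl_end
        )
        pairs.append((sl_label, nl_seq))
    return pairs


# Hypothetical timings: the English diphthong [aI] realized as [a] [i].
sl = [("aI", 0.00, 0.20), ("t", 0.20, 0.30)]
nl = [("a", 0.00, 0.11), ("i", 0.11, 0.20), ("t", 0.20, 0.30)]
pairs = associate(sl, nl)
print(pairs)
```

Each resulting pair is one observation of how an SL phone was actually pronounced, which is the raw material for the confusion rules.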

Pronunciation modeling (2/2)
• English diphthong [aI]
• Confusion rules when the NL is Italian, Spanish or Greek:
  • [aI] → [a] [i], P = 0.6
  • [aI] → [a] [e], P = 0.4
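Rule probabilities of this kind can be read as relative frequencies over the observed (SL phone, NL phone sequence) pairs. A minimal Python sketch, assuming the pairs are already available as a list; the function name `estimate_rules`, the optional `min_prob` pruning threshold and the example counts are assumptions for illustration, not details taken from the slides.

```python
from collections import Counter, defaultdict


def estimate_rules(pairs, min_prob=0.0):
    """Estimate P(NL phone sequence | SL phone) as a relative frequency.

    pairs: iterable of (sl_phone, nl_phone_sequence).
    min_prob: hypothetical pruning threshold; rules below it are dropped.
    """
    counts = defaultdict(Counter)
    for sl_phone, nl_seq in pairs:
        counts[sl_phone][tuple(nl_seq)] += 1

    rules = {}
    for sl_phone, seq_counts in counts.items():
        total = sum(seq_counts.values())
        rules[sl_phone] = {
            seq: n / total
            for seq, n in seq_counts.items()
            if n / total >= min_prob
        }
    return rules


# Made-up counts chosen to reproduce the probabilities of the [aI] example.
observations = [("aI", ("a", "i"))] * 6 + [("aI", ("a", "e"))] * 4
rules = estimate_rules(observations)
for nl_seq, prob in rules["aI"].items():
    print("[aI] ->", " ".join(f"[{p}]" for p in nl_seq), f"P = {prob:.1f}")
```

Each retained rule would then correspond to an alternative path added to the SL phone model, weighted by its probability.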

Graphemic Constraints (1/2)
• Matching between graphemes and phones
• Example 1:
  • APPROACH /ah p r ow ch/
  • APPROACH (ah, A) (p, PP) (r, R) (ow, OA) (ch, CH)
• Example 2:
  • POSITION /p ah z ih sh ah n/
  • POSITION (p, P) (ah, O) (z, S) (ih, I) (sh, TI) (ah, O) (n, N)
• New lexicon generation: link phones to graphemes
• Confusion rules extraction
  • Rules implicitly include the graphemic constraints
  • (English phone, grapheme) → list of NL phones
  • ex: (ah, A) → a, (ah, O) → o
• Recognition
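The two examples can be checked mechanically: the graphemes of an alignment must spell the word, and each (phone, grapheme) pair becomes the key under which confusion rules are looked up. The Python sketch below uses the alignments and the two rules from the slide; the function names and the fallback of keeping the English phone when no rule exists are assumptions made for illustration.

```python
def phone_grapheme_units(word, aligned):
    """Return the (phone, grapheme) units of a word after a sanity check.

    aligned: list of (phone, grapheme) pairs; the graphemes must spell `word`.
    """
    spelled = "".join(grapheme for _, grapheme in aligned)
    if spelled != word:
        raise ValueError(f"graphemes spell {spelled!r}, expected {word!r}")
    return list(aligned)


def nl_candidates(units, rules):
    """Look up NL phones per unit; keep the SL phone if no rule applies
    (this fallback is an assumption of the sketch)."""
    return [rules.get(unit, [unit[0]]) for unit in units]


APPROACH = [("ah", "A"), ("p", "PP"), ("r", "R"), ("ow", "OA"), ("ch", "CH")]
POSITION = [("p", "P"), ("ah", "O"), ("z", "S"), ("ih", "I"),
            ("sh", "TI"), ("ah", "O"), ("n", "N")]

# The two example rules: the same phone maps differently per grapheme.
RULES = {("ah", "A"): ["a"], ("ah", "O"): ["o"]}

approach = nl_candidates(phone_grapheme_units("APPROACH", APPROACH), RULES)
position = nl_candidates(phone_grapheme_units("POSITION", POSITION), RULES)
print(approach)
print(position)
```

The point of keying rules on the pair rather than on the phone alone is visible here: /ah/ written A and /ah/ written O receive different NL candidates.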

Graphemic Constraints (2/2)
• Extract the phone-grapheme associations
  • Phonetic dictionary → training → trained discrete HMM system
  • Phonetic dictionary + trained discrete HMM system → forced alignment → phone-grapheme associations
• Applying the graphemic constraints
  • Target lexicon + trained discrete HMM system + phone-grapheme associations → forced alignment
  • Result: modified target lexicon, which includes the phone-grapheme associations

Experiments (1/3)
• HIWIRE non-native database
• 31 French, 20 Italian, 20 Greek and 10 Spanish speakers
• 100 sentences per speaker, THALES grammar
• First 50 sentences for development, last 50 for testing
• 13 MFCC + Δ + ΔΔ, 128 Gaussian mixtures
• “Pronunciation modeling” (PM) for each NL
• Tests of the baseline vs. PM, MLLR
• THALES grammar and word-loop grammar

Experiments (2/3)
• Using the THALES grammar (WER and SER, in %)

System                                        French        Italian       Spanish       Greek         Average
                                              WER    SER    WER    SER    WER    SER    WER    SER    WER    SER
Baseline                                      6.0   12.8   10.5   19.6    7.0   14.9    5.8   13.2    7.3   15.1
Baseline + MLLR                               4.3    8.9    7.3   13.6    5.1   11.1    3.6    9.4    5.1   10.8
Phonetic confusion                            4.4   10.2    6.9   14.1    5.1   11.8    2.9    7.5    4.8   10.9
Phonetic confusion + MLLR                     3.1    7.2    4.9   11.5    3.4    8.0    2.3    6.5    3.4    8.3
Phonetic confusion + graphemic constraints    4.9   11.3    8.2   15.9    6.2   13.6    6.0   15.1    6.3   14.0
Phonetic conf. + graph. const. + MLLR         3.7    8.5    6.5   14.1    4.8    9.8    4.8   12.7    5.0   11.3

Experiments (3/3)
• Using a “word-loop” grammar (WER and SER, in %)

System                                        French        Italian       Spanish       Greek         Average
                                              WER    SER    WER    SER    WER    SER    WER    SER    WER    SER
Baseline                                     37.7   47.9   45.5   52.0   39.9   53.5   36.7   40.0   40.0   50.7
Baseline + MLLR                              28.4   39.4   34.9   46.5   32.3   48.3   28.5   31.0   32.2   42.7
Phonetic confusion                           27.3   42.1   31.3   46.2   29.5   44.5   20.3   35.1   27.1   42.0
Phonetic confusion + MLLR                    23.0   36.6   25.2   40.6   24.7   40.1   18.1   31.3   22.8   37.2
Phonetic confusion + graphemic constraints   26.2   41.9   30.5   45.5   31.3   46.5   24.3   43.0   28.1   44.2
Phonetic conf. + graph. const. + MLLR        23.0   36.6   25.6   41.2   25.9   39.6   21.8   38.5   24.1   39.0
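For reference, WER (word error rate) and SER (sentence error rate) are conventionally computed as sketched below in Python: WER is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and SER is the share of sentences containing at least one error. This is the usual textbook definition, not necessarily the exact scoring tool used for these experiments, and the example sentences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two word lists (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer_ser(references, hypotheses):
    """Return (WER, SER) in percent over parallel lists of sentences."""
    word_errors = ref_words = wrong_sentences = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        distance = edit_distance(ref_tokens, hyp_tokens)
        word_errors += distance
        ref_words += len(ref_tokens)
        wrong_sentences += distance > 0
    return (100.0 * word_errors / ref_words,
            100.0 * wrong_sentences / len(references))


# Hypothetical command sentences, not taken from the HIWIRE corpus.
refs = ["select heading two one zero", "climb to flight level"]
hyps = ["select heading two nine zero", "climb to flight level"]
wer, ser = wer_ser(refs, hyps)
print(f"WER = {wer:.1f} %, SER = {ser:.1f} %")
```

A single substituted word counts once in the WER but makes the whole sentence wrong, which is why SER is always at least as sensitive as WER for short command sentences.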

Conclusion
• Fully automated, multilingual method for non-native speech recognition
• Performs slightly better than MLLR
• Phonetic confusion + MLLR gives even better results
• Graphemic constraints did not improve the results: to be investigated further
• 9 more French speakers recorded
• Future work: automatic detection of the speaker's native language

Publications
• “Fully Automated Non-Native Speech Recognition Using Confusion-Based Acoustic Model Integration”. In Proc. Eurospeech/Interspeech, Lisbon, Portugal, September 2005.
• “Fully Automated Non-Native Speech Recognition Using Confusion-Based Acoustic Model Integration and Graphemic Constraints”. In Proc. ICASSP, Toulouse, France, May 2006.
• “Reconnaissance de parole non native fondée sur l'utilisation de confusion phonétique et de contraintes graphèmiques”. In Proc. JEP06, Saint-Malo, France, June 2006.
• “Multilingual Non-Native Speech Recognition using Phonetic Confusion-Based Acoustic Model Modification and Graphemic Constraints”. In Proc. ICSLP, Pittsburgh, USA, September 2006.
• Journal article for SpeechCom in preparation.
