Non-Native Speech Recognition Using Confusion-Based Acoustic Model Integration
Non-native Speech
• Languages have different pronunciation spaces, and speakers are used to uttering and recognizing the phones of their native language, so non-native speakers make pronunciation errors and replace some phones with others
• In read speech, or for “inter-language words”, the errors made by non-native speakers may depend on how the words are written, so the graphemes (characters) should be taken into account
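Pronunciation modeling (1/2)
• Fully automated, data-driven process
• Requires HMM models of the spoken language (SL, here English) and of the speaker's native language (NL), plus a non-native speech database
• Pipeline: the non-native database is phonetically aligned with the SL HMMs and phonetically recognized with the NL HMMs; comparing the two transcriptions yields confusion rules, which are used to modify the SL HMM models of the ASR system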
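The rule-extraction step can be pictured as counting, for each SL phone in the alignment, which NL phone sequence the phonetic recognizer produced over the same stretch of speech. A minimal sketch under stated assumptions: the segment format, the midpoint-overlap criterion used to pair SL and NL phones, and the `min_prob` pruning threshold are hypothetical choices, not details given on the slides.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# (phone, start time in s, end time in s); hypothetical segment format
Segment = Tuple[str, float, float]
Rules = Dict[str, Dict[Tuple[str, ...], float]]


def associate(sl_segments: List[Segment],
              nl_segments: List[Segment]) -> List[Tuple[str, Tuple[str, ...]]]:
    """Pair each aligned SL phone with the NL phones recognized over its span.

    Assumed criterion: an NL phone belongs to an SL phone if its midpoint
    falls inside the SL phone's [start, end) interval.
    """
    pairs = []
    for sl_phone, s_start, s_end in sl_segments:
        nl_seq = tuple(p for p, n_start, n_end in nl_segments
                       if s_start <= (n_start + n_end) / 2 < s_end)
        if nl_seq:
            pairs.append((sl_phone, nl_seq))
    return pairs


def extract_rules(pairs: List[Tuple[str, Tuple[str, ...]]],
                  min_prob: float = 0.05) -> Rules:
    """Estimate P(NL phone sequence | SL phone) by relative frequency.

    Sequences below the (hypothetical) min_prob threshold are pruned and
    the remaining probabilities are renormalized to sum to one.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sl_phone, nl_seq in pairs:
        counts[sl_phone][nl_seq] += 1
    rules: Rules = {}
    for sl_phone, seq_counts in counts.items():
        total = sum(seq_counts.values())
        kept = {seq: n / total for seq, n in seq_counts.items()
                if n / total >= min_prob}
        norm = sum(kept.values())
        rules[sl_phone] = {seq: p / norm for seq, p in kept.items()}
    return rules


# Toy counts chosen to reproduce the [aI] example: 3 x [a][i], 2 x [a][e]
toy_pairs = [("aI", ("a", "i"))] * 3 + [("aI", ("a", "e"))] * 2
print(extract_rules(toy_pairs)["aI"])  # {('a', 'i'): 0.6, ('a', 'e'): 0.4}
```

Relative frequency is the simplest estimator for such rules; pruning rare sequences keeps the modified models from growing with alignment noise.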
Pronunciation modeling (2/2)
• Example: confusion rules for the English diphthong [aI] when the NL is Italian, Spanish or Greek
  [aI] → [a] [i]   P = 0.6
  [aI] → [a] [e]   P = 0.4
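One way to picture the model modification is that each SL phone model gains alternative paths made of the NL phone HMMs named by its confusion rules. The sketch below only builds the list of parallel paths and their entry weights; the interpolation weight `lambda_sl` between the original SL path and the rule paths, and the `SL:`/`NL:` naming, are hypothetical, since the slides do not say how the weights are set.

```python
from typing import Dict, List, Tuple

Rules = Dict[str, Dict[Tuple[str, ...], float]]
Path = Tuple[Tuple[str, ...], float]  # (sequence of HMM names, entry weight)


def build_alternative_paths(sl_phone: str, rules: Rules,
                            lambda_sl: float = 0.5) -> List[Path]:
    """List the parallel paths of the modified model for one SL phone.

    The original SL phone HMM keeps weight lambda_sl; each confusion rule
    adds a path of NL phone HMMs weighted by (1 - lambda_sl) * P(rule).
    lambda_sl and the "SL:"/"NL:" prefixes are hypothetical conventions.
    """
    alternatives = rules.get(sl_phone, {})
    if not alternatives:
        # No rule learned for this phone: keep the SL model unchanged.
        return [(("SL:" + sl_phone,), 1.0)]
    paths: List[Path] = [(("SL:" + sl_phone,), lambda_sl)]
    # Most probable confusions first.
    for nl_seq, prob in sorted(alternatives.items(), key=lambda kv: -kv[1]):
        paths.append((tuple("NL:" + p for p in nl_seq),
                      (1.0 - lambda_sl) * prob))
    return paths


rules = {"aI": {("a", "i"): 0.6, ("a", "e"): 0.4}}
for models, weight in build_alternative_paths("aI", rules):
    print(" ".join(models), round(weight, 2))
# SL:aI 0.5
# NL:a NL:i 0.3
# NL:a NL:e 0.2
```

In a decoder, such parallel paths would let the Viterbi search follow whichever realization (native or non-native) best matches the speaker.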
Graphemic Constraints (1/2)
• Matching between graphemes and phones
  • Example 1: APPROACH /ah p r ow ch/ → (ah, A) (p, PP) (r, R) (ow, OA) (ch, CH)
  • Example 2: POSITION /p ah z ih sh ah n/ → (p, P) (ah, O) (z, S) (ih, I) (sh, TI) (ah, O) (n, N)
• New lexicon generation: link phones to graphemes
• Confusion rule extraction: the rules implicitly include the graphemic constraints
  • (English phone, grapheme) → list of NL phones
  • e.g. (ah, A) → a, (ah, O) → o
• Recognition
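The graphemically annotated lexicon and the grapheme-conditioned rules can be represented as plain lookups keyed by (phone, grapheme) pairs. This sketch uses only the two example words and the two example rules listed above; the dictionary and function names are hypothetical.

```python
from typing import Dict, List, Tuple

PhoneGrapheme = Tuple[str, str]  # (English phone, grapheme)

# Graphemically annotated lexicon, using the two entries from the slide.
LEXICON: Dict[str, List[PhoneGrapheme]] = {
    "APPROACH": [("ah", "A"), ("p", "PP"), ("r", "R"), ("ow", "OA"), ("ch", "CH")],
    "POSITION": [("p", "P"), ("ah", "O"), ("z", "S"), ("ih", "I"),
                 ("sh", "TI"), ("ah", "O"), ("n", "N")],
}

# Confusion rules keyed by (phone, grapheme); only the slide's two examples.
RULES: Dict[PhoneGrapheme, List[str]] = {
    ("ah", "A"): ["a"],
    ("ah", "O"): ["o"],
}


def spells_word(word: str) -> bool:
    """Check that the graphemes of an entry concatenate back to the word."""
    return "".join(g for _, g in LEXICON[word]) == word


def nl_candidates(word: str) -> List[Tuple[str, List[str]]]:
    """For each phone of `word`, the NL phones its (phone, grapheme) rule allows."""
    return [(phone, RULES.get((phone, grapheme), []))
            for phone, grapheme in LEXICON[word]]


print(nl_candidates("APPROACH")[0])  # ('ah', ['a'])
print(nl_candidates("POSITION")[1])  # ('ah', ['o'])
```

The same English phone /ah/ thus maps to different NL phones depending on the letters that spell it, which is exactly what keying the rules on the pair (rather than on the phone alone) buys.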
Graphemic Constraints (2/2)
• Extracting the phone-grapheme associations: a discrete HMM system is trained on the phonetic dictionary, and forced alignment with the trained system yields the phone-grapheme associations
• Applying the graphemic constraints: forced alignment of the target lexicon, using the trained discrete HMM system and the phone-grapheme associations, produces a modified target lexicon that includes the phone-grapheme associations
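The slides obtain phone-grapheme associations by forced alignment with a trained discrete HMM system. As a simplified stand-in (not the method on the slides), the sketch below uses dynamic programming to split a word's letters into one contiguous chunk per phone, maximizing the sum of log-probabilities from a phone-to-letter-chunk table. The table values, the maximum chunk length of 3 and the floor probability are invented for illustration; in the described system this information would come from the trained HMMs.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical P(letter chunk | phone) values, for illustration only.
TABLE: Dict[Tuple[str, str], float] = {
    ("ah", "A"): 0.5, ("ah", "O"): 0.4, ("p", "P"): 0.5, ("p", "PP"): 0.4,
    ("r", "R"): 0.9, ("ow", "O"): 0.4, ("ow", "OA"): 0.3, ("ch", "CH"): 0.8,
    ("z", "S"): 0.6, ("ih", "I"): 0.7, ("sh", "TI"): 0.5, ("n", "N"): 0.9,
}


def align(phones: List[str], word: str,
          table: Dict[Tuple[str, str], float],
          max_len: int = 3, floor: float = 1e-4) -> List[Tuple[str, str]]:
    """Split `word` into one contiguous letter chunk per phone (Viterbi-style DP)."""
    n, m = len(phones), len(word)
    neg_inf = float("-inf")
    # best[i][j]: best log-score for aligning phones[:i] with word[:j]
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    back = [[0] * (m + 1) for _ in range(n + 1)]  # length of the last chunk
    best[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for k in range(1, min(max_len, j) + 1):
                prev = best[i - 1][j - k]
                if prev == neg_inf:
                    continue
                chunk = word[j - k:j]
                score = prev + math.log(table.get((phones[i - 1], chunk), floor))
                if score > best[i][j]:
                    best[i][j], back[i][j] = score, k
    if best[n][m] == neg_inf:
        raise ValueError("no segmentation with chunks of at most max_len letters")
    # Trace back the chunk lengths from the end of the word.
    pairs, j = [], m
    for i in range(n, 0, -1):
        k = back[i][j]
        pairs.append((phones[i - 1], word[j - k:j]))
        j -= k
    return pairs[::-1]


print(align(["ah", "p", "r", "ow", "ch"], "APPROACH", TABLE))
# [('ah', 'A'), ('p', 'PP'), ('r', 'R'), ('ow', 'OA'), ('ch', 'CH')]
```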
Experiments (1/3)
• HIWIRE non-native database: 31 French, 20 Italian, 20 Greek and 10 Spanish speakers
• 100 sentences per speaker (THALES grammar): the first 50 for development, the last 50 for testing
• Features: 13 MFCC + Δ + ΔΔ; 128 Gaussian mixtures
• “Pronunciation modeling” (PM) carried out for each NL
• Tests: baseline vs. PM vs. MLLR
• Grammars: THALES grammar and word-loop grammar
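The feature set above has 13 MFCCs plus their first (Δ) and second (ΔΔ) derivatives, i.e. 39 coefficients per frame. A minimal NumPy sketch of appending the dynamic features, assuming the common regression formula over ±2 frames with the edge frames repeated (the slide does not give the window size):

```python
import numpy as np


def deltas(feats: np.ndarray, n_win: int = 2) -> np.ndarray:
    """Regression deltas: d_t = sum_n n (c_{t+n} - c_{t-n}) / (2 sum_n n^2)."""
    num_frames = feats.shape[0]
    # Repeat the first/last frame so every frame has a full window.
    padded = np.pad(feats, ((n_win, n_win), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, n_win + 1))
    out = np.zeros(feats.shape, dtype=float)
    for n in range(1, n_win + 1):
        ahead = padded[n_win + n:n_win + n + num_frames]   # c_{t+n}
        behind = padded[n_win - n:n_win - n + num_frames]  # c_{t-n}
        out += n * (ahead - behind)
    return out / denom


def add_dynamic_features(mfcc: np.ndarray, n_win: int = 2) -> np.ndarray:
    """Stack static, delta and delta-delta coefficients: 13 -> 39 per frame."""
    d = deltas(mfcc, n_win)
    dd = deltas(d, n_win)
    return np.hstack([mfcc, d, dd])


frames = np.random.default_rng(0).normal(size=(200, 13))  # stand-in for real MFCCs
print(add_dynamic_features(frames).shape)  # (200, 39)
```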
Experiments (2/3) • using THALES grammar
WER / SER (%) | French | Italian | Spanish | Greek | Average
Baseline | 6.0 / 12.8 | 10.5 / 19.6 | 7.0 / 14.9 | 5.8 / 13.2 | 7.3 / 15.1
Baseline + MLLR | 4.3 / 8.9 | 7.3 / 13.6 | 5.1 / 11.1 | 3.6 / 9.4 | 5.1 / 10.8
Phonetic confusion | 4.4 / 10.2 | 6.9 / 14.1 | 5.1 / 11.8 | 2.9 / 7.5 | 4.8 / 10.9
Phonetic confusion + MLLR | 3.1 / 7.2 | 4.9 / 11.5 | 3.4 / 8.0 | 2.3 / 6.5 | 3.4 / 8.3
Phonetic confusion + graphemic constraints | 4.9 / 11.3 | 8.2 / 15.9 | 6.2 / 13.6 | 6.0 / 15.1 | 6.3 / 14.0
Phonetic conf. + graph. const. + MLLR | 3.7 / 8.5 | 6.5 / 14.1 | 4.8 / 9.8 | 4.8 / 12.7 | 5.0 / 11.3
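The gains in the THALES-grammar table can be expressed as relative WER reductions with respect to the baseline. A small sketch, assuming the Average column is the unweighted mean over the four native languages (this reproduces the reported averages); note that the reductions are computed on the rounded table values, so they are approximate.

```python
from statistics import mean

# Average WER (%) with the THALES grammar, as read from the table.
AVG_WER = {
    "Baseline": 7.3,
    "Baseline + MLLR": 5.1,
    "Phonetic confusion": 4.8,
    "Phonetic confusion + MLLR": 3.4,
}

# Per-language WER (%) for phonetic confusion + MLLR, from the same table.
PC_MLLR_WER = {"French": 3.1, "Italian": 4.9, "Spanish": 3.4, "Greek": 2.3}


def relative_reduction(reference: float, system: float) -> float:
    """Relative error-rate reduction of `system` versus `reference`, in percent."""
    return 100.0 * (reference - system) / reference


# Assumption: the Average column is the unweighted mean over the four languages.
print(f"{mean(list(PC_MLLR_WER.values())):.1f}")  # 3.4

for name, wer in AVG_WER.items():
    if name != "Baseline":
        reduction = relative_reduction(AVG_WER["Baseline"], wer)
        print(f"{name}: {reduction:.1f}% relative to baseline")
```

This prints relative reductions of about 30.1% (MLLR), 34.2% (phonetic confusion) and 53.4% (phonetic confusion + MLLR).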
Experiments (3/3) • using a “word-loop” grammar
WER / SER (%) | French | Italian | Spanish | Greek | Average
Phonetic confusion | 27.3 / 42.1 | 31.3 / 46.2 | 29.5 / 44.5 | 20.3 / 35.1 | 27.1 / 42.0
Phonetic confusion + MLLR | 23.0 / 36.6 | 25.2 / 40.6 | 24.7 / 40.1 | 18.1 / 31.3 | 22.8 / 37.2
Phonetic confusion + graphemic constraints | 26.2 / 41.9 | 30.5 / 45.5 | 31.3 / 46.5 | 24.3 / 43.0 | 28.1 / 44.2
Phonetic conf. + graph. const. + MLLR | 23.0 / 36.6 | 25.6 / 41.2 | 25.9 / 39.6 | 21.8 / 38.5 | 24.1 / 39.0
Baseline and Baseline + MLLR (values in extracted order, cell layout unclear): 28.4, 37.7, 39.4, 47.9, 45.5, 34.9, 46.5, 52.0, 32.3, 39.9, 53.5, 48.3, 28.5, 36.7, 40.0, 31.0, 40.0, 32.2, 50.7, 42.7
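Both tables report word error rate (WER) and sentence error rate (SER). Under the usual definitions, WER is the total number of word substitutions, deletions and insertions divided by the number of reference words, and SER is the fraction of sentences containing at least one error. A minimal sketch; the example sentences are invented and are not taken from the HIWIRE data.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions (Levenshtein)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of a reference word
                         cur[j - 1] + 1,          # insertion of a hypothesis word
                         prev[j - 1] + (r != h))  # substitution, or match at cost 0
        prev = cur
    return prev[-1]


def wer_ser(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (WER %, SER %) over (reference, hypothesis) sentence pairs."""
    errors = words = wrong_sentences = 0
    for ref, hyp in pairs:
        r, h = ref.split(), hyp.split()
        e = edit_distance(r, h)
        errors += e
        words += len(r)
        wrong_sentences += e > 0  # a sentence counts once, however many errors
    return 100.0 * errors / words, 100.0 * wrong_sentences / len(pairs)


# Invented example: one correct sentence, one with a deleted word.
sample = [
    ("turn left heading two seven zero", "turn left heading two seven zero"),
    ("climb to flight level one two zero", "climb flight level one two zero"),
]
print(wer_ser(sample))
```

Here one deletion over 13 reference words gives a WER of about 7.7%, while one erroneous sentence out of two gives an SER of 50%, which is why SER is consistently higher than WER in both tables.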
Conclusion
• Fully automated, multilingual method for non-native speech recognition
• Performs slightly better than MLLR
• Phonetic confusion + MLLR gives even better results
• Graphemic constraints did not lead to improvements; this requires further investigation
• 9 more French speakers recorded
• Future work: automatic detection of the speaker's native language
Publications
• “Fully Automated Non-Native Speech Recognition Using Confusion-Based Acoustic Model Integration”. In Proc. Eurospeech/Interspeech, Lisboa, Portugal, September 2005.
• “Fully Automated Non-Native Speech Recognition Using Confusion-Based Acoustic Model Integration and Graphemic Constraints”. In Proc. ICASSP, Toulouse, France, May 2006.
• “Reconnaissance de parole non native fondée sur l'utilisation de confusion phonétique et de contraintes graphèmiques” [Non-native speech recognition based on phonetic confusion and graphemic constraints]. In Proc. JEP06, Saint-Malo, France, June 2006.
• “Multilingual Non-Native Speech Recognition using Phonetic Confusion-Based Acoustic Model Modification and Graphemic Constraints”. In Proc. ICSLP, Pittsburgh, USA, September 2006.
• Journal article for Speech Communication (SpeechCom) in preparation.