
Non-Native Speech Recognition Using Confusion-Based Acoustic Model Integration


balin

Presentation Transcript


  1. Non-Native Speech Recognition Using Confusion-Based Acoustic Model Integration

  2. Non-native Speech
     • Languages have different pronunciation spaces, and speakers are used to uttering and recognizing the phones of their native language.
     • Non-native speakers therefore make pronunciation errors and replace phones with others.
     • Read speech or “inter-language words”: errors made by non-native speakers may depend on the spelling of the words.
     • Hence the graphemes (characters) are taken into account.

  3. Pronunciation modeling (1/2)
     • Fully automated, data-driven process
     • Needs HMM models of the SL and the NL
     • Needs a non-native speech database
     Diagram (labels): non-native database; phonetic alignment with the SL HMMs; phonetic recognition with the NL HMMs; confusion rules; modify the HMM models of the SL ASR system.
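
The data-driven extraction of confusion rules can be sketched as a relative-frequency count. This is a minimal illustration, not the authors' implementation: it assumes the phonetic alignment (SL HMMs) and the phonetic recognition (NL HMMs) of each utterance have already been matched segment by segment, so that every SL phone is paired with the NL phone sequence recognized over the same time span. The function name `extract_confusion_rules`, the pruning threshold `min_prob` and the toy observations are hypothetical.

```python
from collections import Counter, defaultdict


def extract_confusion_rules(aligned_pairs, min_prob=0.0):
    """Estimate P(NL phone sequence | SL phone) by relative frequency.

    aligned_pairs: iterable of (sl_phone, nl_phone_sequence) observations.
    min_prob: hypothetical pruning threshold; rarer rules are dropped.
    """
    counts = defaultdict(Counter)
    for sl_phone, nl_seq in aligned_pairs:
        counts[sl_phone][tuple(nl_seq)] += 1

    rules = {}
    for sl_phone, seq_counts in counts.items():
        total = sum(seq_counts.values())
        kept = {seq: n / total for seq, n in seq_counts.items()
                if n / total >= min_prob}
        # Renormalize so the kept rules of each SL phone sum to 1.
        norm = sum(kept.values())
        rules[sl_phone] = {seq: p / norm for seq, p in kept.items()}
    return rules


# Toy observations, invented for illustration only.
observations = [
    ("aI", ("a", "i")), ("aI", ("a", "i")), ("aI", ("a", "i")),
    ("aI", ("a", "e")), ("aI", ("a", "e")),
    ("p", ("p",)),
]
rules = extract_confusion_rules(observations)
print(rules["aI"])
```

Keeping the rules as a mapping from one SL phone to a distribution over NL phone sequences makes it easy to prune unreliable rules before they are applied to the models.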

  4. Pronunciation modeling (2/2)
     English diphthong [aI]: confusion rules when the NL is Italian, Spanish and Greek
     • [aI] → [a] [i], P = 0.6
     • [aI] → [a] [e], P = 0.4
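
To see what the two [aI] rules imply for a whole word, the sketch below expands a phone string into weighted variants. Note the substitution: the slides apply the rules by modifying the HMMs of the SL system, whereas this example uses a simpler lexicon-level expansion on symbolic phone labels, purely for illustration. The function `expand_pronunciation`, the example phone string and the choice to keep phones without a rule unchanged are assumptions.

```python
from itertools import product

# The two rules shown on the slide for the English diphthong [aI].
CONFUSION_RULES = {
    "aI": [(("a", "i"), 0.6), (("a", "e"), 0.4)],
}


def expand_pronunciation(sl_phones, rules):
    """Return (phone sequence, probability) variants, most likely first.

    Phones without a rule are kept as they are with probability 1
    (an assumption made for this sketch).
    """
    per_phone = [rules.get(ph, [((ph,), 1.0)]) for ph in sl_phones]
    variants = []
    for combo in product(*per_phone):
        phones = tuple(p for seq, _ in combo for p in seq)
        prob = 1.0
        for _, p in combo:
            prob *= p
        variants.append((phones, prob))
    return sorted(variants, key=lambda v: -v[1])


# Hypothetical phone string containing the diphthong.
variants = expand_pronunciation(["t", "aI", "m"], CONFUSION_RULES)
for phones, prob in variants:
    print(" ".join(phones), prob)
```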

  5. Graphemic Constraints (1/2)
     • Matching between graphemes and phones
     • Example 1: APPROACH /ah p r ow ch/
       APPROACH (ah, A) (p, PP) (r, R) (ow, OA) (ch, CH)
     • Example 2: POSITION /p ah z ih sh ah n/
       POSITION (p, P) (ah, O) (z, S) (ih, I) (sh, TI) (ah, O) (n, N)
     • New lexicon generation: link phones to graphemes
     • Confusion rules extraction
       • Rules implicitly include the graphemic constraints
       • (English phone, grapheme) → list of NL phones
       • e.g. (ah, A) → a, (ah, O) → o
     • Recognition
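
The phone-grapheme lexicon and the grapheme-dependent rules can be represented with plain dictionaries. The sketch below uses the two lexicon entries and the two rules from the slide; the helper names `graphemes_match_word` and `nl_candidates`, and the fallback of keeping the English phone when no rule exists, are assumptions for illustration.

```python
# Lexicon entries from the slide: each phone is linked to its grapheme(s).
LEXICON = {
    "APPROACH": [("ah", "A"), ("p", "PP"), ("r", "R"), ("ow", "OA"), ("ch", "CH")],
    "POSITION": [("p", "P"), ("ah", "O"), ("z", "S"), ("ih", "I"),
                 ("sh", "TI"), ("ah", "O"), ("n", "N")],
}

# Rules from the slide: (English phone, grapheme) -> list of NL phones.
GRAPHEMIC_RULES = {
    ("ah", "A"): ["a"],
    ("ah", "O"): ["o"],
}


def graphemes_match_word(word, pairs):
    """Sanity check: the graphemes of an entry must spell the word."""
    return "".join(grapheme for _, grapheme in pairs) == word


def nl_candidates(word):
    """NL phone candidates per position; unmatched pairs keep the
    English phone (an assumption made for this sketch)."""
    return [GRAPHEMIC_RULES.get((phone, grapheme), [phone])
            for phone, grapheme in LEXICON[word]]


for word, pairs in LEXICON.items():
    print(word, graphemes_match_word(word, pairs), nl_candidates(word))
```

Keying the rules on the (phone, grapheme) pair is what lets the same English phone "ah" map to "a" in APPROACH and to "o" in POSITION.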

  6. Graphemic Constraints (2/2)
     • Extract the phone-grapheme associations
       Diagram (labels): phonetic dictionary; training; trained discrete HMM system; forced alignment; phone-grapheme associations.
     • Applying the graphemic constraints
       Diagram (labels): target lexicon; trained discrete HMM system; forced alignment; phone-grapheme associations; modified target lexicon, which includes the phone-grapheme associations.

  7. Experiments (1/3)
     • HIWIRE non-native database
     • 31 French, 20 Italian, 20 Greek and 10 Spanish speakers
     • 100 sentences per speaker, THALES grammar
     • First 50 sentences for development, last 50 for testing
     • 13 MFCC + Δ + ΔΔ, 128 Gaussian mixtures
     • “Pronunciation modeling” for each NL
     • Tests of the baseline vs. PM, MLLR
     • THALES grammar and word-loop grammar
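
The feature setup "13 MFCC + Δ + ΔΔ" yields 39 coefficients per frame. The sketch below appends first- and second-order dynamic features to 13 static coefficients using the common regression formula; the window of 2 frames and the repetition of edge frames are assumptions, since the slide does not give these settings, and the input is synthetic rather than real MFCCs.

```python
def deltas(frames, window=2):
    """Regression deltas:
    d[t] = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2), n = 1..window.
    Edge frames are repeated where t-n or t+n falls outside the utterance.
    """
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            num = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                num += n * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


def add_dynamic_features(static):
    """Concatenate static, delta and delta-delta coefficients per frame."""
    d = deltas(static)
    dd = deltas(d)
    return [s + a + b for s, a, b in zip(static, d, dd)]


# Synthetic input: 10 frames of 13 coefficients rising by 1 per frame.
static = [[float(t + k) for k in range(13)] for t in range(10)]
features = add_dynamic_features(static)
print(len(features), len(features[0]))  # prints: 10 39
```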

  8. Experiments (2/3)
     • using THALES grammar
     WER and SER by native language. Columns: (A) baseline and baseline + MLLR; (B) phonetic confusion and phonetic confusion + MLLR; (C) phonetic confusion + graphemic constraints, without and with MLLR. Each cell lists the two values of the pair in slide order; the extracted text does not mark which value belongs to the system with MLLR.

                      (A)            (B)            (C)
     French   WER     4.3 / 6.0      3.1 / 4.4      3.7 / 4.9
              SER     12.8 / 8.9     7.2 / 10.2     8.5 / 11.3
     Italian  WER     7.3 / 10.5     6.9 / 4.9      8.2 / 6.5
              SER     13.6 / 19.6    11.5 / 14.1    15.9 / 14.1
     Spanish  WER     5.1 / 7.0      3.4 / 5.1      6.2 / 4.8
              SER     14.9 / 11.1    11.8 / 8.0     13.6 / 9.8
     Greek    WER     3.6 / 5.8      2.9 / 2.3      4.8 / 6.0
              SER     9.4 / 13.2     6.5 / 7.5      15.1 / 12.7
     Average  WER     5.1 / 7.3      4.8 / 3.4      5.0 / 6.3
              SER     10.8 / 15.1    8.3 / 10.9     11.3 / 14.0
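
For readers unfamiliar with the two metrics, the sketch below computes them on toy data. It assumes the usual definitions: WER is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and SER is read here as sentence error rate, the share of sentences with at least one error. The slides do not spell out either definition, and the example sentences are invented.

```python
def word_errors(reference, hypothesis):
    """Return (edit distance in words, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # match or substitution
        prev = cur
    return prev[len(hyp)], len(ref)


def wer_and_ser(pairs):
    """Corpus-level WER and SER, both in percent."""
    errors = words = wrong_sentences = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
        wrong_sentences += 1 if e > 0 else 0
    return 100.0 * errors / words, 100.0 * wrong_sentences / len(pairs)


# Invented reference/hypothesis pairs, for illustration only.
pairs = [
    ("select waypoint one", "select waypoint one"),
    ("set heading two seven zero", "set heading two zero"),
]
wer, ser = wer_and_ser(pairs)
print(wer, ser)  # prints: 12.5 50.0
```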

  9. Experiments (3/3)
     • using a “word-loop” grammar
     WER and SER by native language. Columns: (A) baseline and baseline + MLLR; (B) phonetic confusion and phonetic confusion + MLLR; (C) phonetic confusion + graphemic constraints, without and with MLLR. Each cell lists the two values of the pair in slide order; the extracted text does not mark which value belongs to the system with MLLR.

                      (A)            (B)            (C)
     French   WER     28.4 / 37.7    27.3 / 23.0    23.0 / 26.2
              SER     39.4 / 47.9    42.1 / 36.6    36.6 / 41.9
     Italian  WER     45.5 / 34.9    25.2 / 31.3    30.5 / 25.6
              SER     46.5 / 52.0    46.2 / 40.6    41.2 / 45.5
     Spanish  WER     32.3 / 39.9    29.5 / 24.7    31.3 / 25.9
              SER     53.5 / 48.3    44.5 / 40.1    46.5 / 39.6
     Greek    WER     28.5 / 36.7    20.3 / 18.1    21.8 / 24.3
              SER     40.0 / 31.0    35.1 / 31.3    43.0 / 38.5
     Average  WER     40.0 / 32.2    27.1 / 22.8    24.1 / 28.1
              SER     50.7 / 42.7    37.2 / 42.0    39.0 / 44.2

  10. Conclusion
     • Fully automated, multilingual method for non-native speech recognition
     • Performs slightly better than MLLR
     • Phonetic confusion + MLLR gives better results still
     • Graphemic constraints did not improve the results; further investigation is planned
     • 9 more French speakers recorded
     • Future work: automatic detection of the speaker's native language

  11. Publications
     • “Fully Automated Non-Native Speech Recognition Using Confusion-Based Acoustic Model Integration”. In Proc. Eurospeech/Interspeech, Lisbon, September 2005.
     • “Fully Automated Non-Native Speech Recognition Using Confusion-Based Acoustic Model Integration and Graphemic Constraints”. In Proc. ICASSP, Toulouse, France, May 2006.
     • “Reconnaissance de parole non native fondée sur l'utilisation de confusion phonétique et de contraintes graphèmiques”. In Proc. JEP06, Saint-Malo, France, June 2006.
     • “Multilingual Non-Native Speech Recognition using Phonetic Confusion-Based Acoustic Model Modification and Graphemic Constraints”. In Proc. ICSLP, Pittsburgh, USA, September 2006.
     • Journal article for SpeechCom in preparation.
