
Analysis of Model Adaptation on Non-Native Speech for Multiple Accent Speech Recognition

This research paper examines the impact of model adaptation on non-native speech recognition for multiple accents. It explores baseline native speech modeling, phonological rules, adaptation on non-native speech, and the selection of variants. The study concludes that adaptation on non-native speech provides significant improvement for each type of modeling.


Presentation Transcript


  1. Analysis of Model Adaptation on Non-Native Speech for Multiple Accent Speech Recognition • D. Jouvet & K. Bartkova, France Télécom - R&D

  2. Overview • Multiple foreign accent speech corpus • Baseline native speech modeling and results • Modeling non-native speech variants: phonological rules; units trained on foreign data; selection of variants • Adaptation on non-native speech: on all types of foreign accents; only on subsets of foreign accents • Conclusion

  3. Multiple Foreign Accent Speech Corpus • 83 French words and expressions collected over telephone

  4. Baseline Modeling and Results: Using Native Speech Models • Modeling: MFCC, HMM, Gaussian mixtures, context-dependent models • Baseline M1.A1: native French acoustic units only (model M1), trained on a large French speech corpus (acoustic parameters A1) • Large dispersion of recognition performance across speaker language groups (error rates: 6% for German speakers … 12% for English & Spanish speakers)

  5. Modeling Non-Native Speech Variants: Variants Derived through Phonological Rules • Vowel aperture: open / close variants allowed: e ⇨ (e + ɛ) • Possible denasalization of nasal sounds: ɛ̃ ⇨ (ɛ̃ + ɛN), where N = n, m or ŋ • Difficulty pronouncing the front rounded vowel /y/ (⇨ /u/) & the semi-vowel /Y/ (⇨ /w/) • Application of rules ⇨ Model M2 • Significant improvement for many (but not all) language groups, and better results overall
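The rules on this slide can be read as a way of expanding one native pronunciation into several candidate variants. The sketch below illustrates that idea only; the rule table, the phone symbols, and the function name `expand_variants` are assumptions made for this example, not the authors' implementation.

```python
from itertools import product

# Nasal vowel written with an explicit combining tilde to avoid
# Unicode normalization surprises (assumption: phones are plain strings).
NASAL_E = "\u025b\u0303"  # ɛ̃

# Hypothetical rule table: each phone maps to the realizations it may take.
# Every realization is a tuple because one phone can become two (ɛ + N).
RULES = {
    "e": [("e",), ("ɛ",)],                                   # open / close
    NASAL_E: [(NASAL_E,), ("ɛ", "n"), ("ɛ", "m"), ("ɛ", "ŋ")],  # denasalization
    "y": [("y",), ("u",)],                                   # /y/ realized as /u/
    "Y": [("Y",), ("w",)],                                   # semi-vowel as /w/
}


def expand_variants(phones):
    """Return every pronunciation obtained by applying the rules per phone."""
    options = [RULES.get(p, [(p,)]) for p in phones]
    variants = []
    for combo in product(*options):
        # Flatten the per-phone tuples into one phone sequence.
        variants.append(tuple(ph for part in combo for ph in part))
    return variants
```

Calling `expand_variants(["t", "y"])` yields the native sequence and one variant with /u/. In a real recognizer the variants would more likely be encoded as parallel paths in the lexicon or model graph than enumerated explicitly.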

  6. Modeling Non-Native Speech Variants: Adding Units Trained on Foreign Data • Foreign standard units (⇨ Model M3): standard training, e.g. German units trained from German words uttered by German speakers: φ_de_DE; for each French unit, the corresponding foreign units are added for recognition (e.g. e_fr_FR + e_sp_SP, e_uk_UK, e_de_DE) • French units adapted on foreign data (⇨ Model M4): mapping between French and foreign units for training, for example Paris_uk: p_uk . a_uk . r_uk . i_uk . s_uk ⇨ p_fr . a_fr . r_fr . i_fr . s_fr; hence, here, French units adapted on English speech material: φ_fr_UK (e.g. e_fr_FR + e_fr_SP, e_fr_UK, e_fr_DE)
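The unit labels on this slide appear to follow a `phone_unitLanguage_dataLanguage` pattern (for instance, φ_fr_UK for a French unit adapted on English speech material). The sketch below builds the candidate unit lists for one phoneme under that reading. The helper names, the language lists, and the assumption that a foreign unit shares the symbol of its French counterpart are all illustrative, not taken from the source.

```python
def unit(phone, unit_lang, data_lang):
    """Build a unit label such as 'e_fr_UK' (assumed naming pattern)."""
    return f"{phone}_{unit_lang}_{data_lang}"


def m3_units(phone, foreign=(("sp", "SP"), ("uk", "UK"), ("de", "DE"))):
    """French unit plus standard foreign units (foreign unit, foreign data).

    Simplification: the foreign unit is assumed to use the same phone symbol.
    """
    return [unit(phone, "fr", "FR")] + [unit(phone, u, d) for u, d in foreign]


def m4_units(phone, data_langs=("SP", "UK", "DE")):
    """French unit plus French units adapted on foreign speech data."""
    return [unit(phone, "fr", "FR")] + [unit(phone, "fr", d) for d in data_langs]
```

Under these assumptions, `m3_units("e")` and `m4_units("e")` give the two sets of parallel units that a recognizer could choose between for the phoneme /e/.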

  7. Modeling Non-Native Speech Variants: Adding Units Trained on Foreign Data • Adding "standard foreign units" vs "French units adapted on foreign data" • Better results are obtained when adding French units adapted on foreign data • Improvement on non-native speech, even for languages that do not correspond to the added units

  8. Modeling Non-Native Speech Variants: Adding a Selection of Foreign Adapted Units • Instead of keeping all variants (units) added for each phoneme, only the most frequent ones are kept (model M5) (statistics from forced alignments on the adaptation set) • Performance degradation on French speakers (due to added units) is smaller • Improvement on language groups associated with the added units is smaller • Better results on other language groups
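The selection step described on this slide amounts to counting how often each variant is chosen in forced alignments and keeping the most frequent ones per phoneme. A minimal sketch of that counting follows; the input format, the function name `select_variants`, and the `top_k` cutoff are assumptions, since the slide does not state the actual selection criterion.

```python
from collections import Counter, defaultdict


def select_variants(aligned_units, top_k=2):
    """Keep the top_k most frequently aligned units for each phoneme.

    aligned_units: iterable of (phoneme, unit_label) pairs, assumed to come
    from forced alignments of the adaptation set.
    """
    counts = defaultdict(Counter)
    for phoneme, unit_label in aligned_units:
        counts[phoneme][unit_label] += 1
    # most_common sorts by descending count; ties keep first-seen order.
    return {
        phoneme: [label for label, _ in counter.most_common(top_k)]
        for phoneme, counter in counts.items()
    }
```

A fixed `top_k` is only one possible criterion; a relative frequency threshold would be an equally plausible reading of "most frequent".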

  9. Adaptation on Non-native Speech • Adaptation set: about the same size as the test set; exhibits similar non-native accents (same countries) • Non-native speech adaptation corpus: French words pronounced by foreign speakers, … • Generic models M1.A1 & M2.A1 (French native units, without / with phonological rules) ⇨ accent adapted models M1.A5 & M2.A5 • Generic model M3.A1 (French native units & standard foreign units) ⇨ accent adapted model M3.A5 • Generic models M4.A1 & M5.A1 (French native units & French units adapted on foreign data) ⇨ accent adapted models M4.A5 & M5.A5

  10. Adaptation on Non-native Speech: Adaptation Using All Types of Accents • The behavior of the various modeling variants after adaptation on all accents is similar to the behavior obtained with the generic models

  11. Adaptation on Non-native Speech: Impact of Types of Accents (1) • Experiments using the best model (model M5) • Reference results with generic parameters (model M5.A1) • Adaptation using data from French speakers only (model M5.A2), which corresponds to task and context adaptation • Adaptation using data from a limited set of accents: Spanish, English and German speakers only (model M5.A3) • Adaptation using data from other types of accents: Italian, Portuguese, … and Asian speakers only (model M5.A4) • Results after adaptation using all types of accents (model M5.A5)

  12. Adaptation on Non-native Speech: Impact of Types of Accents (2) • Adaptation on French speakers only (M5.A2) improves results on almost all accented data • Best results are obtained with adaptation on all types of accents (M5.A5)

  13. Adaptation on Non-native Speech: Impact of Types of Accents (3) • After adaptation on only a few types of accents: Es, En, De (i.e. model M5.A3) • Large improvement achieved on all accented data, including accents that are not present in the adaptation set

  14. Conclusion • Non-native speech recognition benefits from variants, obtained by applying phonological rules and introducing units trained on foreign data • Selection of variants is beneficial • Adaptation on non-native speech provides important improvement for each type of modeling, and variants are still useful • Adaptation on speech data representing a limited set of foreign accents is also beneficial for other types of accents
