Analysis of Model Adaptation on Non-Native Speech for Multiple Accent Speech Recognition

D. Jouvet & K. Bartkova
France Télécom - R&D

Overview
• Multiple foreign accent speech corpus
• Baseline native speech modeling and results
• Modeling non-native speech variants
    • Phonological rules
    • Units trained on foreign data
    • Selection of variants
• Adaptation on non-native speech
    • On all types of foreign accents
    • Only on subsets of foreign accents
• Conclusion

Multiple Foreign Accent Speech Corpus
• 83 French words and expressions collected over the telephone

Baseline Modeling and Results: Using Native Speech Models
• Modeling: MFCC, HMM, Gaussian mixtures, context-dependent models
• Baseline M1.A1: native French acoustic units only (model M1), trained on a large French speech corpus (acoustic parameters A1)
• Large dispersion of recognition performance across speaker language groups (error rates: 6% for German speakers … 12% for English & Spanish speakers)
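The per-group error rates quoted above can be computed along the following lines. This is a minimal sketch: the function name `error_rate_by_group`, the tuple layout, and the toy data are assumptions for illustration, not the authors' scoring tool or their results. It also assumes each utterance is scored as one whole word or expression.

```python
from collections import defaultdict

def error_rate_by_group(results):
    """Compute the recognition error rate per speaker language group.

    results: iterable of (speaker_group, reference, hypothesis) tuples,
    one per utterance (hypothetical layout).
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, reference, hypothesis in results:
        totals[group] += 1
        if reference != hypothesis:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

# Toy data, invented purely to show the call; not taken from the corpus.
toy_results = [
    ("German", "oui", "oui"),
    ("German", "non", "non"),
    ("English", "oui", "non"),
    ("English", "non", "non"),
]
rates = error_rate_by_group(toy_results)
```

Comparing the resulting dictionary across groups is what exposes the dispersion between language groups.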

Modeling Non-Native Speech Variants: Variants Derived through Phonological Rules
• Vowel aperture: both open and close variants allowed: e ⇨ (e + ɛ)
• Possible denasalization of nasal sounds: ɛ̃ ⇨ (ɛ̃ + ɛN), where N = n, m or ŋ
• Difficulty pronouncing the front rounded vowel /y/ (⇨ /u/) and the semi-vowel /Y/ (⇨ /w/)
• Application of rules ⇨ Model M2
• Significant improvement for many language groups (not all), and better results overall
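As an illustration of how such rules can expand a pronunciation lexicon, here is a minimal sketch. The ASCII phoneme labels (`E` for ɛ, `E~` for ɛ̃, `N` for ŋ), the `RULES` table layout, and the `expand` function are assumptions made for this example. The slides do not say how model M2 encodes the variants.

```python
from itertools import product

# Hypothetical rule table: each phoneme maps to the realizations allowed
# for non-native speakers. Each realization is a tuple of phonemes, so a
# rule can replace one phoneme by a sequence (denasalization).
RULES = {
    "e": [("e",), ("E",)],                               # e => (e + E)
    "E~": [("E~",), ("E", "n"), ("E", "m"), ("E", "N")],  # E~ => (E~ + E N)
    "y": [("y",), ("u",)],                               # /y/ => /u/
    "Y": [("Y",), ("w",)],                               # semi-vowel /Y/ => /w/
}

def expand(pronunciation):
    """Return every variant of a phoneme sequence allowed by RULES.

    The canonical pronunciation always comes first because each rule
    lists the unchanged phoneme as its first realization.
    """
    options = [RULES.get(phoneme, [(phoneme,)]) for phoneme in pronunciation]
    variants = []
    for choice in product(*options):
        variants.append(tuple(p for group in choice for p in group))
    return variants

variants_ty = expand(["t", "y"])
variants_nasal = expand(["v", "E~"])
```

A real recognizer would more likely store these alternatives as parallel paths in the lexicon graph than as a flat list, since the number of variants grows multiplicatively with the number of affected phonemes.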

Modeling Non-Native Speech Variants: Adding Units Trained on Foreign Data
• Foreign standard units ⇨ Model M3
    • Standard training, e.g. German units trained from German words uttered by German speakers: φ_de_DE
    • For each French unit, the corresponding foreign units are added for recognition
    • Example for /e/: e_fr_FR plus e_sp_SP, e_uk_UK, e_de_DE
• French units adapted on foreign data ⇨ Model M4
    • Mapping between French and foreign units for training, for example Paris_uk: p_uk . a_uk . r_uk . i_uk . s_uk mapped to p_fr . a_fr . r_fr . i_fr . s_fr
    • Hence, here, French units adapted on English speech material: φ_fr_UK
    • Example for /e/: e_fr_FR plus e_fr_SP, e_fr_UK, e_fr_DE
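The difference between the two unit inventories can be sketched as follows, using the naming pattern phoneme_unitLanguage_dataLanguage seen on the slide (for example e_fr_UK). The `FOREIGN` table, the `unit_name` and `variants` functions, and the assumption that every foreign language has a unit with the same symbol as the French one are illustrative simplifications, not the authors' tooling.

```python
# Hypothetical table: training-data language -> label of its native unit set.
FOREIGN = {"SP": "sp", "UK": "uk", "DE": "de"}

def unit_name(phoneme, unit_lang, data_lang):
    """Build a unit label such as 'e_fr_UK' (French unit, UK training data)."""
    return f"{phoneme}_{unit_lang}_{data_lang}"

def variants(phoneme, model):
    """List the acoustic units allowed for one French phoneme.

    'M3' adds standard foreign units (foreign unit, foreign data).
    'M4' adds French units adapted on foreign data.
    """
    units = [unit_name(phoneme, "fr", "FR")]  # native French unit, always kept
    for data_lang, native_lang in FOREIGN.items():
        if model == "M3":
            units.append(unit_name(phoneme, native_lang, data_lang))
        elif model == "M4":
            units.append(unit_name(phoneme, "fr", data_lang))
        else:
            raise ValueError(f"unsupported model: {model}")
    return units

m3_units = variants("e", "M3")
m4_units = variants("e", "M4")
```

In both cases the recognizer chooses among the listed units for each phoneme. Only the way the extra units were trained differs.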

Modeling Non-Native Speech Variants: Adding Units Trained on Foreign Data
• Adding "standard foreign units" vs. "French units adapted on foreign data"
• Better results are obtained when adding French units adapted on foreign data
    • Improvement on non-native speech
    • Even for languages that do not correspond to the added units

Modeling Non-Native Speech Variants: Adding a Selection of Foreign Adapted Units
• Instead of keeping all variants (units) added for each phoneme, only the most frequent ones are kept (model M5); statistics are obtained from forced alignments on the adaptation set
• The performance degradation on French speakers (due to added units) is smaller
• The improvement on language groups associated with the added units is smaller
• Better results on other language groups
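The selection step can be sketched as a frequency count over forced-alignment output. The function name `select_variants`, the `(phoneme, unit)` pair layout, and the `keep` threshold are assumptions for illustration. The slides do not state how many variants per phoneme were retained, and the toy counts below are invented.

```python
from collections import Counter, defaultdict

def select_variants(aligned_units, keep=2):
    """Keep only the most frequently chosen units for each phoneme.

    aligned_units: iterable of (phoneme, unit) pairs, one per aligned
    segment, as a forced alignment on the adaptation set might produce.
    keep: hypothetical number of units retained per phoneme.
    """
    counts = defaultdict(Counter)
    for phoneme, unit in aligned_units:
        counts[phoneme][unit] += 1
    return {
        phoneme: [unit for unit, _ in counter.most_common(keep)]
        for phoneme, counter in counts.items()
    }

# Toy alignment statistics, invented for the example.
alignment = (
    [("e", "e_fr_FR")] * 5
    + [("e", "e_fr_UK")] * 3
    + [("e", "e_fr_DE")] * 1
)
selected = select_variants(alignment, keep=2)
```

Pruning rarely used units shrinks the search space, which is consistent with the smaller degradation on French speakers reported on the slide.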

Adaptation on Non-native Speech
• Adaptation set: about the same size as the test set; exhibits similar non-native accents (same countries)
• Non-native speech adaptation corpus: French words pronounced by foreign speakers, …
• Generic models M1.A1 & M2.A1 (French native units, without / with phonological rules) ⇨ accent-adapted models M1.A5 & M2.A5
• Generic model M3.A1 (French native units & standard foreign units) ⇨ accent-adapted model M3.A5
• Generic models M4.A1 & M5.A1 (French native units & French units adapted on foreign data) ⇨ accent-adapted models M4.A5 & M5.A5

Adaptation on Non-native Speech: Adaptation Using All Types of Accents
• The behavior of the various modeling variants after adaptation on all accents is similar to the behavior obtained with the generic models

Adaptation on Non-native Speech: Impact of Types of Accents (1)
• Experiments using the best model (model M5)
• Reference results with generic parameters (model M5.A1)
• Adaptation using data from French speakers only (model M5.A2); this corresponds to task and context adaptation
• Adaptation using data from a limited set of accents: Spanish, English and German speakers only (model M5.A3)
• Adaptation using data from other types of accents: Italian, Portuguese, … and Asian speakers only (model M5.A4)
• Results after adaptation using all types of accents (model M5.A5)
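The five adaptation conditions amount to different filters over the same adaptation corpus. The sketch below assumes utterances are tagged with the speaker's native language. The `SELECTORS` table and the `adaptation_subset` function are hypothetical. Treating A4 as "everything not in A2 or A3" and A5 as "the whole adaptation corpus" are assumptions, since the slide elides the full language list.

```python
FRENCH = {"French"}
LIMITED = {"Spanish", "English", "German"}

# One predicate per adaptation condition (A1 = generic, no adaptation data).
SELECTORS = {
    "A1": lambda lang: False,
    "A2": lambda lang: lang in FRENCH,
    "A3": lambda lang: lang in LIMITED,
    "A4": lambda lang: lang not in (FRENCH | LIMITED),  # assumed complement
    "A5": lambda lang: True,                             # assumed: whole corpus
}

def adaptation_subset(utterances, condition):
    """Return the utterances used to adapt the model under one condition.

    utterances: list of (speaker_language, utterance_id) pairs.
    condition: one of 'A1' ... 'A5'.
    """
    keep = SELECTORS[condition]
    return [utt for utt in utterances if keep(utt[0])]

# Toy corpus, invented for the example.
corpus = [("French", "u1"), ("Spanish", "u2"), ("Italian", "u3")]
subset_a3 = adaptation_subset(corpus, "A3")
subset_a4 = adaptation_subset(corpus, "A4")
```

Keeping the conditions as data makes it easy to run the same adaptation procedure once per condition and compare the resulting M5.A1 to M5.A5 models on a common test set.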

Adaptation on Non-native Speech: Impact of Types of Accents (2)
• Adaptation on French speakers only (M5.A2) improves results on almost all accented data
• Best results are obtained with adaptation on all types of accents (M5.A5)

Adaptation on Non-native Speech: Impact of Types of Accents (3)
• After adaptation on only a few types of accents: Es, En, De (i.e. model M5.A3)
• Large improvement achieved on all accented data, including accents that are not present in the adaptation set

Conclusion
• Non-native speech recognition benefits from pronunciation variants
    • Application of phonological rules and introduction of units trained on foreign data
    • Selection of variants is beneficial
• Adaptation on non-native speech provides a substantial improvement for each type of modeling, and variants are still useful
• Adaptation on speech data representing a limited set of foreign accents is also beneficial for other types of accents
