Enhancing Speech Recognition for Non-Native Users in Let’s Go! Dialogue System

Learn how to improve speech recognition for non-native users in Let’s Go! Spoken Dialogue System, focusing on linguistic mismatch and adaptive lexical entrainment. Discover the challenges, data collection, and performance evaluation insights.

Presentation Transcript


  1. Non-Native Users in the Let’s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch • Antoine Raux & Maxine Eskenazi • Language Technologies Institute, Carnegie Mellon University

  2. Background • Speech-enabled systems use models of the user’s language • Such models are tailored for native speech • Great loss of performance for non-native users who don’t follow typical native patterns

  3. Previous Work on Non-Native Speech Recognition • Assumes knowledge about/data from a specific non-native population • Often based on read speech • Focuses on acoustic mismatch: • Acoustic adaptation • Multilingual acoustic models

  4. Linguistic Particularities of Non-Native Speakers • Non-native speakers might use different lexical and syntactic constructs • Non-native speakers are in a dynamic process of L2 acquisition

  5. Outline of the Talk • Baseline system and data collection • Study of non-native/native mismatch and effect of additional non-native data • Adaptive lexical entrainment

  6. The CMU Let’s Go!! System: Bus Schedule Information for the Pittsburgh Area • ASR: Sphinx II • Parsing: Phoenix • Hub: Galaxy • Dialogue Management: RavenClaw • NLG: Rosetta • Speech Synthesis: Festival

  7. Data Collection • Baseline system accessible since February 2003 • Experiments with scenarios • Publicized the phone number inside CMU in Fall 2003

  8. Data Collection Web Page

  9. Data • Directed experiments: 134 calls • 17 non-native speakers (5 from India, 7 from Japan, 5 others) • Spontaneous: 30 calls • Total: 1768 utterances • Evaluation Data: • Non-Native: 449 utterances • Native: 452 utterances

  10. Speech Recognition Baseline • Acoustic Models: • semi-continuous HMMs (codebook size: 256) • 4000 tied states • trained on CMU Communicator data • Language Model: • class-based backoff 3-gram • trained on 3074 utterances from native calls

  11. Speech Recognition Results • Word Error Rate: (figures not preserved in this transcript) • Causes of discrepancy: • Acoustic mismatch (accent) • Linguistic mismatch (word choice, syntax)
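For readers unfamiliar with the metric on this slide, word error rate is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the scoring tool used in the talk; the function name `word_error_rate` and the lowercasing/whitespace tokenization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length.

    Tokenization (lowercase, split on whitespace) is an assumption
    made for this sketch, not something stated in the slides.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For instance, scoring the hypothesis "I want to go the airport" against the reference "I want to go to the airport" counts one deleted word out of seven reference words.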

  12. Language Model Performance • Evaluation on transcripts • Initial model: 3074 native utterances

  13. Language Model Performance • Adding non-native data: 3074 native + 1308 non-native utterances • Chart legend: initial (native) model, mixed model

  14. Natural Language Understanding • Grammar manually written incrementally, as the system was being developed • Initially built with native speakers in mind • Phoenix: robust parser (less sensitive to non-standard expressions)

  15. Grammar Coverage • Initial grammar: manually written for native utterances

  16. Grammar Coverage • Grammar designed to accept some non-native patterns: • “reach” = “arrive” • “What is the next bus?” = “When is the next bus?”

  17. Relative Improvement due to Additional Data

  18. Effect of Additional Data on Speech Recognition

  19. Adaptive Lexical Entrainment • “If you can’t adapt the system, adapt the user” • System should use the same expressions it expects from the user • But non-native speakers might not master all target expressions • Use expressions that are close to the non-native speaker’s language • Use prosody to stress incorrect words

  20. Adaptive Lexical Entrainment: Example • U: I want to go the airport • S: Did you mean: I want to go TO the airport?

  21-26. Adaptive Lexical Entrainment: Algorithm (one diagram built up over six slides) • Diagram components: ASR Hypothesis, Target Prompts, DP-based Alignment, Prompt Selection, Emphasis, Confirmation Prompt • ASR hypothesis: “I want to go the airport” • Target prompt shown: “I’d like to go to the airport” • Selected prompt: “I want to go to the airport” • Confirmation prompt: “Did you mean: I want to go to the airport?”
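The algorithm slides name the components but not their details, so the following is only a rough sketch of how they could fit together: align the ASR hypothesis against each target prompt with dynamic programming, select the closest prompt, and emphasize the prompt words the hypothesis did not match. Everything specific here is an assumption for illustration: the function names `align` and `confirmation_prompt`, the unit edit costs, the whitespace tokenization, and the use of uppercase as a stand-in for prosodic emphasis.

```python
from typing import List, Optional, Tuple


def align(hyp: List[str], target: List[str]) -> Tuple[int, List[bool]]:
    """DP alignment with unit costs (an assumption of this sketch).

    Returns the edit distance and, for each target word, True if the
    hypothesis did not supply an identical word at that position.
    """
    n, m = len(hyp), len(target)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if hyp[i - 1] == target[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + mismatch,
                             dist[i - 1][j] + 1,
                             dist[i][j - 1] + 1)

    # Backtrace: target words default to "needs emphasis" and are
    # cleared only when aligned to an identical hypothesis word.
    emphasize = [True] * m
    i, j = n, m
    while i > 0 and j > 0:
        mismatch = 0 if hyp[i - 1] == target[j - 1] else 1
        if dist[i][j] == dist[i - 1][j - 1] + mismatch:
            emphasize[j - 1] = bool(mismatch)
            i, j = i - 1, j - 1
        elif dist[i][j] == dist[i - 1][j] + 1:
            i -= 1  # hypothesis word with no counterpart in the target
        else:
            j -= 1  # target word missing from the hypothesis
    return dist[n][m], emphasize


def confirmation_prompt(asr_hypothesis: str,
                        target_prompts: List[str]) -> Optional[str]:
    """Pick the closest target prompt; return None on a perfect match."""
    hyp = asr_hypothesis.lower().split()
    best: Optional[Tuple[int, List[str], List[bool]]] = None
    for prompt in target_prompts:
        words = prompt.split()
        distance, flags = align(hyp, [w.lower() for w in words])
        if best is None or distance < best[0]:
            best = (distance, words, flags)
    if best is None or best[0] == 0:
        return None  # nothing to correct, so no entrainment prompt
    _, words, flags = best
    rendered = [w.upper() if flag else w for w, flag in zip(words, flags)]
    return "Did you mean: " + " ".join(rendered) + "?"
```

With the hypothesis "I want to go the airport" and the two prompts from the slides as targets, this sketch selects "I want to go to the airport" and uppercases only the missing "to". Returning `None` on an exact match mirrors the trigger rule stated in the study slide, where entrainment fires whenever the best target does not perfectly match the ASR output.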

  27. Adaptive Lexical Entrainment: Study • 24 dialogues, 12.4 turns/dialogue, 2.5 entrainment prompts/dialogue • Trigger entrainment whenever the best target does not perfectly match the ASR hypothesis • Too simple? 40.7% false positives • 64.4% of prompts are triggered by misrecognitions

  28. Conclusion • LM and Grammar perform significantly worse on non-native speech • Including non-native data reduces the gap in linguistic metrics but not WER • Future work: • Improve Speech Recognition • More data for LM • Acoustic/Lexical adaptation • Integrate adaptive lexical entrainment in the dialog management framework

  29. Example Dialogue • S: Welcome to the Let’s Go!! bus information system. What can I do for you? • U: Excuse me. I’m in the place which uh where the Forbes Avenue and Murray Avenue are crossing and I want to go to the place mm Forbes where the Forbes Avenue and Craig Street are crossing. • S: I heard “SIX”. Is that correct? • U: Which bus uh should I take? Uh sorry uh maybe 61C bus I have… What time the next 61C bus will come? • S: The time is currently 5:37 PM. Where are you leaving from? …

  30. Example Dialogue • S: Welcome to the Let’s Go!! bus information system. What can I do for you? • U: I want to go to downtown now I’m at Fifth and Bigelow. • S: Sorry, did you say “I want to go to downtown”? • U: Which number bus can I take and when will the next bus come here? • S: ... What can I do for you? …
