Non-Native Users in the Let’s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch Antoine Raux & Maxine Eskenazi, Language Technologies Institute, Carnegie Mellon University
Background • Speech-enabled systems use models of the user’s language • Such models are tailored for native speech • Large performance loss for non-native users who do not follow typical native patterns
Previous Work on Non-Native Speech Recognition • Assumes knowledge about/data from a specific non-native population • Often based on read speech • Focuses on acoustic mismatch: • Acoustic adaptation • Multilingual acoustic models
Linguistic Particularities of Non-Native Speakers • Non-native speakers might use different lexical and syntactic constructs • Non-native speakers are in a dynamic process of L2 acquisition
Outline of the Talk • Baseline system and data collection • Study of non-native/native mismatch and effect of additional non-native data • Adaptive lexical entrainment
The CMU Let’s Go!! System: Bus Schedule Information for the Pittsburgh Area • ASR: Sphinx II • Parsing: Phoenix • Hub: Galaxy • Dialogue Management: RavenClaw • Speech Synthesis: Festival • NLG: Rosetta
Data Collection • Baseline system accessible since February 2003 • Experiments with scenarios • Publicized the phone number inside CMU in Fall 2003
Data • Directed experiments: 134 calls • 17 non-native speakers (5 from India, 7 from Japan, 5 others) • Spontaneous: 30 calls • Total: 1768 utterances • Evaluation Data: • Non-Native: 449 utterances • Native: 452 utterances
Speech Recognition Baseline • Acoustic Models: • semi-continuous HMMs (codebook size: 256) • 4000 tied states • trained on CMU Communicator data • Language Model: • class-based backoff 3-gram • trained on 3074 utterances from native calls
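The class-based backoff n-gram described above can be sketched roughly as follows. This is a toy stand-in, not the system’s actual model: the class inventory, the stupid-backoff-style weighting, and the add-one unigram smoothing are all simplifying assumptions.

```python
import math
from collections import Counter

# Hypothetical word->class map; the real system's classes (stops, routes,
# times, etc.) are assumptions here.
CLASSES = {"airport": "[PLACE]", "downtown": "[PLACE]", "61c": "[ROUTE]"}

def to_classes(sent):
    return [CLASSES.get(w.lower(), w.lower()) for w in sent.split()]

class BackoffTrigramLM:
    def __init__(self, sents, alpha=0.4):
        self.alpha = alpha  # backoff weight (a simplification of true Katz backoff)
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        for s in sents:
            toks = ["<s>", "<s>"] + to_classes(s) + ["</s>"]
            for i in range(len(toks)):
                self.uni[toks[i]] += 1
                if i >= 1:
                    self.bi[(toks[i-1], toks[i])] += 1
                if i >= 2:
                    self.tri[(toks[i-2], toks[i-1], toks[i])] += 1
        self.total = sum(self.uni.values())

    def prob(self, w2, w1, w):
        # Trigram if seen, else back off to bigram, else smoothed unigram.
        if self.tri[(w2, w1, w)]:
            return self.tri[(w2, w1, w)] / self.bi[(w2, w1)]
        if self.bi[(w1, w)]:
            return self.alpha * self.bi[(w1, w)] / self.uni[w1]
        return self.alpha ** 2 * (self.uni[w] + 1) / (self.total + len(self.uni))

    def perplexity(self, sent):
        toks = ["<s>", "<s>"] + to_classes(sent) + ["</s>"]
        logp = sum(math.log(self.prob(toks[i-2], toks[i-1], toks[i]))
                   for i in range(2, len(toks)))
        return math.exp(-logp / (len(toks) - 2))
```

Class-based models like this let the LM generalize across place and route names it has rarely seen in context, which matters with only ~3000 training utterances.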
Speech Recognition Results • Word Error Rate (figures shown in a chart, not transcribed) • Causes of discrepancy: • Acoustic mismatch (accent) • Linguistic mismatch (word choice, syntax)
Language Model Performance • Evaluation on transcripts • Initial model: 3074 native utterances
Language Model Performance • Adding non-native data: 3074 native + 1308 non-native utterances (chart compares the initial native model with the mixed model)
Natural Language Understanding • Grammar manually written incrementally, as the system was being developed • Initially built with native speakers in mind • Phoenix: robust parser (less sensitive to non-standard expressions)
Grammar Coverage • Initial grammar: manually written for native utterances
Grammar Coverage • Grammar designed to accept some non-native patterns: • “reach” = “arrive” • “What is the next bus?” = “When is the next bus?”
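Robust slot-based parsing of the kind Phoenix performs can be illustrated with a toy stand-in: each slot has a pattern, and words that match no slot are simply skipped instead of causing a parse failure. The slot names and regex patterns below are illustrative assumptions, not the system’s actual grammar; note how “reach” is accepted alongside “arrive”.

```python
import re

# Illustrative slot patterns, including the non-native equivalences
# mentioned above ("reach" = "arrive", "what is" = "when is").
SLOTS = {
    "query_arrival": r"\b(arrive|reach)\b",
    "query_next_bus": r"\b(what|when) is the next bus\b",
    "place": r"\b(airport|downtown|forbes avenue|craig street)\b",
}

def robust_parse(utterance):
    """Extract whatever slots match; ignore everything else (fillers,
    fragments, unknown words) rather than rejecting the utterance."""
    text = utterance.lower()
    parsed = {}
    for slot, pattern in SLOTS.items():
        m = re.search(pattern, text)
        if m:
            parsed[slot] = m.group(0)
    return parsed
```

Because unmatched material is ignored, a disfluent utterance like “uh when is the next bus to the airport please” still yields usable slots, which is why a robust parser is less sensitive to non-standard (including non-native) expressions.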
Adaptive Lexical Entrainment • “If you can’t adapt the system, adapt the user” • System should use the same expressions it expects from the user • But non-native speakers might not master all target expressions • Use expressions that are close to the non-native speaker’s language • Use prosody to stress incorrect words
Adaptive Lexical Entrainment: Example • User: “I want to go the airport” • System: “Did you mean: I want to go TO the airport?”
Adaptive Lexical Entrainment: Algorithm • The ASR hypothesis (“I want to go the airport”) is aligned against a set of target prompts (e.g. “I’d like to go to the airport”, “I want to go to the airport”) using DP-based alignment • Prompt selection picks the closest target (“I want to go to the airport”) • Emphasis is placed on the differing words, yielding the confirmation prompt: “Did you mean: I want to go to the airport?”
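The DP-based alignment, prompt selection, and emphasis steps can be sketched as below. This is a minimal reconstruction from the slides, not the authors’ implementation: word-level edit distance stands in for the alignment, and uppercase marks the words that would receive prosodic emphasis.

```python
def align(hyp, tgt):
    """Edit-distance DP over word sequences; returns the cost and the set
    of target-word positions that differ from the hypothesis."""
    n, m = len(hyp), len(tgt)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i
    for j in range(1, m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = D[i-1][j-1] + (hyp[i-1] != tgt[j-1])
            D[i][j] = min(sub, D[i-1][j] + 1, D[i][j-1] + 1)
    # Backtrace: collect target positions that were substituted or inserted.
    i, j, diff = n, m, set()
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i-1][j-1] + (hyp[i-1] != tgt[j-1]):
            if hyp[i-1] != tgt[j-1]:
                diff.add(j - 1)
            i, j = i - 1, j - 1
        elif j > 0 and D[i][j] == D[i][j-1] + 1:
            diff.add(j - 1)  # word present in target but not hypothesis
            j -= 1
        else:
            i -= 1
    return D[n][m], diff

def entrainment_prompt(asr_hyp, targets):
    """Select the closest target prompt; emphasize (uppercase) the words
    that differ from the ASR hypothesis. Returns None on an exact match."""
    hyp = asr_hyp.lower().split()
    best = min(targets, key=lambda t: align(hyp, t.lower().split())[0])
    cost, diff = align(hyp, best.lower().split())
    if cost == 0:
        return None  # no mismatch, no entrainment prompt needed
    words = [w.upper() if k in diff else w for k, w in enumerate(best.split())]
    return "Did you mean: " + " ".join(words) + "?"
```

On the slide’s example, the hypothesis “I want to go the airport” selects the target “I want to go to the airport” and emphasizes the missing “to”, reproducing the confirmation prompt shown above.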
Adaptive Lexical Entrainment: Study • 24 dialogues, 12.4 turns/dialogue, 2.5 entrainment prompts/dialogue • Entrainment triggered whenever the best target does not perfectly match the ASR hypothesis • Too simple? 40.7% false positives • 64.4% of prompts are triggered by misrecognitions
Conclusion • LM and grammar perform significantly worse on non-native speech • Including non-native data reduces the gap in linguistic metrics but not WER • Future work: • Improve speech recognition • More data for the LM • Acoustic/lexical adaptation • Integrate adaptive lexical entrainment into the dialogue management framework
Example Dialogue S: Welcome to the Let’s Go!! bus information system. What can I do for you? U: Excuse me. I’m in the place which uh where the Forbes Avenue and Murray Avenue are crossing and I want to go to the place mm Forbes where the Forbes Avenue and Craig Street are crossing. S: I heard “SIX”. Is that correct? U: Which bus uh should I take? Uh sorry uh maybe 61C bus I have… What time the next 61C bus will come? S: The time is currently 5:37 PM. Where are you leaving from? …
Example Dialogue S: Welcome to the Let’s Go!! bus information system. What can I do for you? U: I want to go to downtown now I’m at Fifth and Bigelow. S: Sorry, did you say “I want to go to downtown”? U: Which number bus can I take and when will the next bus come here? S: ... What can I do for you? …