
Sorry, I didn’t catch that …

Non-understandings and recovery in spoken dialog systems. Part II: Sources & impact of non-understandings, Performance of various recovery strategies. Dan Bohus, Sphinx Lunch Talk, Carnegie Mellon University, March 2005.

Presentation Transcript


  1. Sorry, I didn’t catch that … Non-understandings and recovery in spoken dialog systems Part II: Sources & impact of non-understandings, Performance of various recovery strategies Dan Bohus Sphinx Lunch Talk Carnegie Mellon University, March 2005

  2. S: What city are you leaving from?
      U: Urbana Champaign [OKAY IN THAT SAME PAY]
      NON-understanding
      • System cannot extract any meaningful information from the user’s turn
      Non-understandings
      • How can we prevent non-understandings?
      • How can we recover from them?
        • Detection
        • Set of recovery strategies
        • Policy for choosing between them
      review: sources : impact : strategy performance

  3. Issues under investigation
      • Data Collection
      • Detection / Diagnosis
        • What are the main causes (sources) of non-understandings?
        • What is their impact on global performance?
        • Can we diagnose non-understandings at run-time?
        • Can we optimize the rejection process in a more principled way?
      • Set of recovery strategies
        • What is the relative performance of different recovery strategies?
        • Can we refine current strategies and find new ones?
      • Policy for choosing between them
        • Can we improve performance by making smarter choices?
        • If so, can we learn how to make these smarter choices?

  4. Data Collection: Experimental Design
      • Subjects interact over the telephone with RoomLine
      • Each subject performed 10 scenario-based tasks
      • Between-subjects experiment, 2 groups:
        • Control: system uses a random (uniform) policy for engaging the non-understanding recovery strategies
        • Wizard: policy is determined at runtime by a human (wizard)
      • 46 subjects, balanced for gender and native/non-native speaker status
      • 449 sessions; 8278 user turns
      • Sessions transcribed & annotated

  5. Non-understanding Strategies
      S: For when do you need the room?
      U: [non-understanding]
      1. MOVE-ON (MOVE): Sorry, I didn’t catch that. For which day do you need the room?
      2. YOU CAN SAY (YCS): Sorry, I didn’t catch that. For when do you need the conference room? You can say something like tomorrow at 10 am …
      3. TERSE YOU CAN SAY (TYCS): Sorry, I didn’t catch that. You can say something like tomorrow at 10 am …
      4. FULL HELP (HELP): Sorry, I didn’t catch that. I am currently trying to make a conference room reservation for you. Right now I need to know the date and time for when you need the reservation. You can say something like tomorrow at 10 am …
      5. ASK REPEAT (AREP): Could you please repeat that?
      6. ASK REPHRASE (ARPH): Could you please try to rephrase that?
      7. NOTIFY (NTFY): Sorry, I didn’t catch that ...
      8. YIELD TURN (YLD): …
      9. REPROMPT (RP): For when do you need the conference room?
      10. DETAILED REPROMPT (DRP): Right now I need to know the date and time for when you need the reservation …
      Strategy groups shown on the slide: MOVE-ON, HELP, REPEAT, NOTIFY, REPROMPT
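The Control condition from slide 4 picks among these strategies with a random (uniform) policy. A minimal sketch of that policy follows, assuming every one of the ten strategies is eligible at every non-understanding (the real system may have restricted the set by dialog state). The `Strategy` enum and `uniform_policy` function are hypothetical names, not RoomLine code.

```python
import random
from collections import Counter
from enum import Enum


class Strategy(Enum):
    """The ten non-understanding recovery strategies, keyed by the slide's abbreviations."""
    MOVE = "MOVE-ON"
    YCS = "YOU CAN SAY"
    TYCS = "TERSE YOU CAN SAY"
    HELP = "FULL HELP"
    AREP = "ASK REPEAT"
    ARPH = "ASK REPHRASE"
    NTFY = "NOTIFY"
    YLD = "YIELD TURN"
    RP = "REPROMPT"
    DRP = "DETAILED REPROMPT"


def uniform_policy(rng: random.Random) -> Strategy:
    """Control-condition policy (assumed): every strategy is equally likely each time."""
    return rng.choice(list(Strategy))


if __name__ == "__main__":
    rng = random.Random(2005)  # fixed seed so the simulation is reproducible
    n = 10_000
    counts = Counter(uniform_policy(rng) for _ in range(n))
    for strategy in Strategy:
        print(f"{strategy.name:<5} {counts[strategy] / n:.3f}")
```

In the Wizard condition, `uniform_policy` would be replaced by the human wizard's choice at runtime; the rest of the dialog loop would be unchanged.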

  6. Issues under Investigation (outline as on slide 3)

  7. Communication [Clark, Horvitz, Paek]
      [Figure: communication between User and System modeled at four levels (Conversation, Intention, Signal, Channel), with components goal, interpretation, semantics, parsing, text, recognition, audio channel and end-pointing]

  8. Modeling and Breakdowns
      [Figure: the four-level User/System model (Conversation, Intention, Signal, Channel levels) from slide 7]

  9. “Location” & “types” of errors
      [Figure: the four-level User/System model annotated with error types: out-of-domain, out-of-application, false rejections, out-of-grammar, out-of-relevance, ASR errors (accents, noises), end-pointer errors]

  10. % of non-understandings
      • Out-of-domain: 0.14%
      • Out-of-application: 12.89%
      • False rejections: 18.59%
      • Out-of-grammar: 8.02%
      • Out-of-relevance: 3.21%
      • ASR errors (accents, noises): 56.05%
      • End-pointer errors: 3.91%
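Percentages like these come from tallying the source label annotated on each non-understanding. A minimal sketch of that tally, assuming exactly one source label per non-understanding turn; note that the slide's seven figures sum to 102.81%, so the original annotation may have allowed overlapping labels, which this sketch does not model. The label strings and `source_distribution` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def source_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Percentage of non-understandings per source label (one label per turn assumed)."""
    labels = list(labels)
    if not labels:
        return {}
    counts = Counter(labels)
    total = len(labels)
    # most_common() orders sources from most to least frequent
    return {source: 100.0 * n / total for source, n in counts.most_common()}


if __name__ == "__main__":
    # Hypothetical annotations, not the RoomLine corpus
    annotated = ["asr-error"] * 6 + ["false-rejection"] * 2 + ["out-of-application", "out-of-grammar"]
    for source, pct in source_distribution(annotated).items():
        print(f"{source:<20} {pct:5.1f}%")
```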

  11. Out-of-application (13% of non-understandings)
      • 2 main classes, about equally split
        • Requests for nonexistent task functionality
          • “A room Monday or Tuesday”
          • “do you have anything anytime Thursday afternoon?”
        • Requests for nonexistent “meta” functionality
          • Corrections:
            • “Can I change the date”
            • “You got the time wrong”
            • “Wrong day”
      • Q: How to better convey system boundaries?
      • Q: Extend system language for corrections?

  12. Out-of-grammar (8% of non-understandings)
      • Imperfect grammar coverage
        • “Doesn’t matter” vs. “It doesn’t matter”
        • “Internet connection” vs. “Network connection”
        • “Vaguely” vs. “So so” / “Generally” / etc.
      • Q: Bring users into the grammar?
        • Carefully craft & use the “You Can Say” prompts
      • Q: Extend the grammar?
        • Online & in an unsupervised fashion?
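The coverage gaps above can be illustrated with a toy exact-match grammar: a phrasing variant that is missing from the grammar yields no concept (a non-understanding) until it is added. This is a minimal sketch under stated assumptions: the `GRAMMAR` table, its concept values and the direction of each gap are hypothetical, exact lookup is a drastic simplification of a real grammar-based parser, and `extend` adds phrases by hand rather than learning them online or unsupervised as the slide asks.

```python
import re
from typing import Dict, Optional

# Hypothetical grammar fragment: surface phrase -> canonical concept value.
GRAMMAR: Dict[str, str] = {
    "it doesn't matter": "ANY",
    "network connection": "NETWORK",
}


def normalize(utterance: str) -> str:
    """Lower-case, unify curly apostrophes, drop other punctuation, collapse whitespace."""
    text = utterance.lower().replace("\u2019", "'")
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return " ".join(text.split())


def parse(utterance: str, grammar: Dict[str, str]) -> Optional[str]:
    """Exact-match lookup; None stands for a non-understanding (out-of-grammar input)."""
    return grammar.get(normalize(utterance))


def extend(grammar: Dict[str, str], phrase: str, concept: str) -> Dict[str, str]:
    """Return a copy of the grammar with one new surface phrase for an existing concept."""
    extended = dict(grammar)
    extended[normalize(phrase)] = concept
    return extended


if __name__ == "__main__":
    print(parse("Doesn’t matter", GRAMMAR))           # out of grammar
    bigger = extend(GRAMMAR, "doesn't matter", "ANY")
    print(parse("Doesn’t matter!", bigger))           # now covered
```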

  13. Grammaticality: Summary
      • It’s important: 25% of non-understandings
      • Stems (about equally) from:
        • Requests for nonexistent task functionality
        • Requests for nonexistent meta/correction functionality
        • Lack of grammar coverage
      • Solutions
        • Offline: enlarge grammar, include correction language
        • Online
          • Carefully design “You Can Say”
          • All You Can Say [Collagen / USI]
          • Unsupervised learning of new grammar expressions

  14. All You Can Say
      • How much of the system functionality is actually used? [under work]
      • Certain “task” and “meta” aspects of functionality are very rarely or never used
      [Figure: User vs. System functionality]

  15. % of non-understandings (figures as on slide 10)

  16. Issues under Investigation (outline as on slide 3)

  17. Impact on system performance
      • Logistic regression model
        • Task success predicted from % non-understandings per session
        • Native speakers are more likely to succeed at the same non-understanding rate
        • (So are participants in the wizard condition)
      • 2nd model (also uses misunderstandings)
        • Task success predicted from % non-understandings + % misunderstandings
        • Better fit
        • Adding native-speaker information does not improve the model
        • Non-understandings are, on average, half as costly as misunderstandings
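The second model can be sketched as a logistic regression of per-session task success on the two error rates. This is a minimal sketch, not the analysis behind the slide: it fits by plain batch gradient ascent in pure Python instead of a statistics package, and `hypothetical_sessions` returns made-up data, so the fitted numbers say nothing about RoomLine. Reading each error type's "cost" as the magnitude of its coefficient is an interpretation; under it, "half as costly" corresponds to `b_non ≈ b_mis / 2`.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """P(task success | x) under weights w = [intercept, w_1, ..., w_k]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 5000) -> List[float]:
    """Maximum-likelihood logistic regression by plain batch gradient ascent."""
    k = len(X[0])
    w = [0.0] * (k + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - predict(w, xi)  # per-session gradient term of the log-likelihood
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def hypothetical_sessions() -> Tuple[List[Tuple[float, float]], List[int]]:
    """Made-up data: 4 sessions per (non-u rate, mis-u rate) cell; value = number of successes."""
    successes = {
        (0.0, 0.0): 4, (0.0, 0.2): 3, (0.0, 0.4): 1,
        (0.2, 0.0): 3, (0.2, 0.2): 3, (0.2, 0.4): 1,
        (0.4, 0.0): 3, (0.4, 0.2): 2, (0.4, 0.4): 0,
    }
    X, y = [], []
    for cell, s in successes.items():
        X.extend([cell] * 4)
        y.extend([1] * s + [0] * (4 - s))
    return X, y


if __name__ == "__main__":
    X, y = hypothetical_sessions()
    b0, b_non, b_mis = fit_logistic(X, y)
    print(f"intercept={b0:.2f}  non-u={b_non:.2f}  mis-u={b_mis:.2f}")
    # Relative cost of a non-understanding vs. a misunderstanding, read off the coefficients
    print(f"non-u / mis-u cost ratio: {b_non / b_mis:.2f}")
```

Adding a native-speaker indicator would simply be a third feature column; comparing log-likelihoods with and without it is one way to check whether it improves the fit.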

  18. Issues under Investigation (outline as on slide 3)

  19. Issues under Investigation (outline as on slide 3)

  20. Non-understanding Strategies (as on slide 5)

  21. How to evaluate performance?
      • Recovery
        • Next turn is okay (neither a non-understanding nor a misunderstanding)
      • Finer-grained recovery
        • Next-turn CER (concept error rate)
        • Next-turn concept transfer (dialog cost)
      • Time (+ recovery)??
        • Time lost: 0 if the next turn is okay, otherwise the time lost
        • Time to recovery (has some problems)
      • [More under construction]
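The binary recovery metric and the "time lost" metric can be sketched over a list of recovery episodes. This is a minimal sketch with hypothetical types: the slide does not define how time lost is measured, so `next_turn_seconds` (the duration of the turn after the recovery prompt) is an assumed proxy, and `RecoveryEpisode`, `TurnOutcome` and the helper functions are illustrative names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class TurnOutcome(Enum):
    OK = "ok"
    NON_UNDERSTANDING = "non-understanding"
    MISUNDERSTANDING = "misunderstanding"


@dataclass
class RecoveryEpisode:
    strategy: str               # e.g. "AREP", "MOVE"
    next_turn: TurnOutcome      # outcome of the user turn that follows the recovery prompt
    next_turn_seconds: float    # hypothetical: duration of that next turn


def recovered(ep: RecoveryEpisode) -> bool:
    """Binary recovery: the next turn is neither a non- nor a misunderstanding."""
    return ep.next_turn is TurnOutcome.OK


def time_lost(ep: RecoveryEpisode) -> float:
    """0 if the next turn is okay, otherwise the time spent on it (assumed proxy)."""
    return 0.0 if recovered(ep) else ep.next_turn_seconds


def _episodes_for(episodes: List[RecoveryEpisode], strategy: str) -> List[RecoveryEpisode]:
    eps = [e for e in episodes if e.strategy == strategy]
    if not eps:
        raise ValueError(f"no episodes for strategy {strategy!r}")
    return eps


def recovery_rate(episodes: List[RecoveryEpisode], strategy: str) -> float:
    eps = _episodes_for(episodes, strategy)
    return sum(recovered(e) for e in eps) / len(eps)


def mean_time_lost(episodes: List[RecoveryEpisode], strategy: str) -> float:
    eps = _episodes_for(episodes, strategy)
    return sum(time_lost(e) for e in eps) / len(eps)
```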

  22. Which strategies are better?

  23. Which strategies are better?
      • Recovery performance ranked list, based on pair-wise t-tests:
      • CER evaluation shows similar results
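A ranked list backed by pair-wise tests can be sketched as: sort strategies by mean recovery, then test every pair. This is a minimal sketch under several assumptions the slide does not confirm: the tests are run on per-episode binary recovery outcomes, Welch's t statistic is used, the p-value comes from a normal approximation (reasonable only for large samples), and no multiple-comparison correction is applied. All function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, variance
from typing import Dict, List, Tuple


def welch_t(a: List[int], b: List[int]) -> float:
    """Welch's t statistic for the difference in mean recovery between two strategies."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return 0.0 if mean(a) == mean(b) else math.copysign(math.inf, mean(a) - mean(b))
    return (mean(a) - mean(b)) / se


def two_sided_p(t: float) -> float:
    """Two-sided p-value from a normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def rank_strategies(outcomes: Dict[str, List[int]]) -> List[Tuple[str, float]]:
    """Strategies sorted by mean recovery rate, best first."""
    return sorted(((s, mean(v)) for s, v in outcomes.items()),
                  key=lambda pair: pair[1], reverse=True)


def pairwise_p_values(outcomes: Dict[str, List[int]]) -> Dict[Tuple[str, str], float]:
    """p-value for every unordered pair of strategies (no multiple-comparison correction)."""
    return {(s1, s2): two_sided_p(welch_t(outcomes[s1], outcomes[s2]))
            for s1, s2 in combinations(sorted(outcomes), 2)}
```

Each list holds 1 for a recovered episode and 0 otherwise; a strategy is placed strictly above its neighbor in the ranking only when their pair's p-value is below the chosen threshold.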

  24. Which strategies are better?
      MoveOn ≥ Help > Signal
      * p = 0.1089

  25. What is the Impact on User Response?
      • Labeled user responses in 5 classes [same tagging scheme as Shin, Choularton]:
        • Answer (1st)
        • Repeat
        • Rephrase
        • Change
        • Contradict
        • Other
        • Hang-up

  26. What is the Impact on User Response?
      • Same response classes as on slide 25 [tagging scheme of Shin, Choularton]
      • Percentages shown on the slide: 17.95%, 44.30%, 30.70%, 3.63%, 3.13%

  27. Comparing with other systems

  28. What responses are the best?
      • Recovery as a function of response type (classes as on slide 25)
      • Recovery rates shown on the slide: 45.45%, 39.33%, 63.29%, 19.05%
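Recovery as a function of response type is a conditional rate: group the recovery episodes by the labeled class of the user's response and compute the share that recovered. A minimal sketch, assuming each episode is available as a (response class, recovered) pair; the input format and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def recovery_by_response_type(labeled: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Map each user-response class to the % of its episodes that recovered, best first."""
    totals: Dict[str, int] = defaultdict(int)
    recovered: Dict[str, int] = defaultdict(int)
    for response_type, ok in labeled:
        totals[response_type] += 1
        recovered[response_type] += int(ok)
    rates = {r: 100.0 * recovered[r] / totals[r] for r in totals}
    return dict(sorted(rates.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Hypothetical labels, not the RoomLine data
    episodes = [("Repeat", True), ("Repeat", False), ("Rephrase", True),
                ("Rephrase", True), ("Rephrase", False), ("Change", True), ("Other", False)]
    for response_type, pct in recovery_by_response_type(episodes).items():
        print(f"{response_type:<10} {pct:6.2f}%")
```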

  29. More to come …
      • Per-strategy analysis
      • Barge-in & impact on recovery

  30. Issues under Investigation (outline as on slide 3)

  31. Refining the current set of strategies
      • Introduce more alternative dialog plans
        • Opportunities for Move-On
      • “You Can Say”
        • Carefully tune the prompts
        • Smarter barge-in control
        • “All You Can Say”
      • “Speak shorter”
        • Anecdotal evidence; to be corroborated by analysis
      • “Speak louder / go to a quieter place”
        • Not so much in these experiments, but evidence from Let’s Go!
      • More prevention measures
        • If a user is having trouble, the system can give the YCS prompts without waiting for a non-understanding to happen
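The last prevention measure (offering "You Can Say" help before the next non-understanding) needs some trigger for "a user is having trouble". A minimal sketch of one possible trigger, assuming a sliding window over the most recent user turns and a fixed threshold on their non-understanding share; the class name, the window size of 4 and the threshold of 0.5 are hypothetical choices, not values from the talk.

```python
from collections import deque
from typing import Deque


class PreventiveHelp:
    """Hypothetical trigger for proactive "You Can Say" (YCS) hints.

    Tracks whether each of the last `window` user turns was a non-understanding and
    signals that a YCS hint should be added once their share reaches `threshold`.
    """

    def __init__(self, window: int = 4, threshold: float = 0.5) -> None:
        self.recent: Deque[bool] = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, was_non_understanding: bool) -> None:
        self.recent.append(was_non_understanding)

    def should_offer_ycs(self) -> bool:
        if not self.recent:
            return False
        return sum(self.recent) / len(self.recent) >= self.threshold

    def prompt(self, base_prompt: str, ycs_hint: str) -> str:
        """Append the hint to the regular prompt only when the user seems to be struggling."""
        return f"{base_prompt} {ycs_hint}" if self.should_offer_ycs() else base_prompt
```

A fixed window keeps the trigger responsive: once the user's recent turns are understood again, the extra hint drops out of the prompt automatically.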

  32. Thank You!!
