sorry, I didn’t catch that! – an investigation of non-understandings and recovery strategies. Dan Bohus www.cs.cmu.edu/~dbohus Alexander I. Rudnicky www.cs.cmu.edu/~air Computer Science Department Carnegie Mellon University Pittsburgh, PA, 15213.
systems often do not understand correctly • non-understandings and misunderstandings • MIS-understanding: system extracts incorrect information from the user’s turn. S: What city are you leaving from? U: Birmingham [BERLIN PM] • NON-understanding: system cannot extract any meaningful information from the user’s turn. S: What city are you leaving from? U: Urbana Champaign [OKAY IN THAT SAME PAY]
systems often do not understand correctly • NON-understanding: system cannot extract any meaningful information from the user’s turn. S: What city are you leaving from? U: Urbana Champaign [OKAY IN THAT SAME PAY] • detection: typically trivial, although diagnosis is not • strategies: large space of strategies; tradeoffs between them not well understood • policy (knowing how to engage the strategies): simple heuristics: “incremental prompting”
questions under investigation • what are the main causes of non-understandings? • how large is their impact on performance? • how do various recovery strategies compare to each other? • what are the relationships between strategies and user behaviors? • data • can we improve global dialog performance by using a smarter policy? • if yes, can we learn a better policy from data?
data collection • Roomline • phone-based, mixed-initiative system • conference room reservations • experimental design • control group: uninformed recovery policy • wizard group: recovery policy implemented by wizard • 46 participants, first-time users • tasks & experimental procedure • up to 10 scenario-driven interactions
non-understanding recovery strategies
S: For when do you need the conference room?
1. ASK REPEAT: Could you please repeat that?
2. ASK REPHRASE: Could you please try to rephrase that?
3. NOTIFY (NTFY): Sorry, I didn’t catch that ...
4. YIELD TURN (YLD): …
5. REPROMPT (RP): For when do you need the conference room?
6. DETAILED REPROMPT (DRP): Right now I need to know the date and time for when you need the reservation …
7. MOVE-ON: Sorry, I didn’t catch that. For which day you need the room?
8. YOU CAN SAY (YCS): Sorry, I didn’t catch that. For when do you need the conference room? You can say something like tomorrow at 10 am …
9. TERSE YOU CAN SAY (TYCS): Sorry, I didn’t catch that. You can say something like tomorrow at 10 am …
10. FULL HELP (HELP): Sorry, I didn’t catch that. I am currently trying to make a conference room reservation for you. Right now I need to know the date and time for when you need the reservation. You can say something like tomorrow at 10 am …
corpus statistics • 449 sessions • 8278 user turns • utterances transcribed and checked • manual annotations • misunderstandings • correct concept values at each turn • sources of understanding errors • user response-types to recovery strategies
causes of non-understandings • [Diagram: user-side and system-side processing stages (Goal, Interpretation, Semantics, Parsing, Text, Recognition, Audio channel, End-pointing), grouped into conversation, intention, signal, and channel levels]
causes of non-understandings • out-of-application (conversation level) 16% • out-of-grammar (intention level) 16% • ASR error (signal level) 62% • endpointer error (channel level)
modeling impact on performance • logistic regression • $P(\text{Task Success}) = \dfrac{1}{1 + e^{-(\alpha + \beta \cdot F_{\text{NON}})}}$
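The logistic model on this slide can be sketched in a few lines. This is an illustration only: it assumes FNON denotes the fraction of non-understood turns in a session, the session data below is made up, and the plain gradient-ascent fit is a stand-in for whatever statistical package was actually used.

```python
import math

def p_task_success(f_non, alpha, beta):
    """Logistic model: P(success) = 1 / (1 + exp(-(alpha + beta * f_non)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * f_non)))

def fit_logistic(xs, ys, lr=0.5, steps=20000):
    """Fit alpha and beta by gradient ascent on the mean log-likelihood."""
    alpha, beta = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - p_task_success(x, alpha, beta)  # observed minus predicted
            grad_a += err
            grad_b += err * x
        alpha += lr * grad_a / n
        beta += lr * grad_b / n
    return alpha, beta

# Made-up sessions: fraction of non-understood turns and task outcome (1 = success).
f_non = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.50, 0.60, 0.70]
success = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]

alpha, beta = fit_logistic(f_non, success)
print(f"alpha={alpha:.2f}, beta={beta:.2f}")
print(f"P(success | FNON=0.1) = {p_task_success(0.1, alpha, beta):.2f}")
print(f"P(success | FNON=0.6) = {p_task_success(0.6, alpha, beta):.2f}")
```

A negative fitted β means that each additional increment in non-understanding frequency lowers the predicted probability of task success, which is the quantity the slide uses to assess impact.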
strategy performance – recovery rate • overall logistic ANOVA • significant differences in mean recovery rates • [Bar chart: recovery rate by strategy: Help, Yield, Notify, MoveOn, RePrompt, AskRepeat, YouCanSay, AskRephrase, TerseYouCanSay, DetailedReprompt] • all pairs comparison (corrected using FDR)
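Comparing all pairs of ten strategies means 45 tests, so the p-values need a multiple-comparison correction. The slide says only "FDR"; the sketch below assumes the Benjamini-Hochberg step-up procedure, which is the most common FDR method, and uses made-up p-values.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per p-value: True if rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k whose sorted p-value satisfies p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max (step-up rule).
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected

# Made-up p-values for ten pairwise strategy comparisons.
pairwise_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pairwise_p, q=0.05))
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected when a larger p-value further down the sorted list meets its threshold.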
user response types • tagging scheme by Shin • also used by Choularton, Raux • 5 categories • repeat • rephrase • contradict • change • other
response types after non-understanding • [Bar chart: share of each user response type (contradict, change, other, rephrase, repeat), axis from 0% to 50%, for three systems: Communicator (Shin et al.), Pizza (Choularton & Dale), Roomline (this study)]
user response types by strategy • [Stacked bar chart: percentage of Repeat, Rephrase, Change, and Other responses (0% to 100%) for each strategy: Help, Yield, Notify, MoveOn, RePrompt, AskRepeat, YouCanSay, AskRephrase, TerseYouCanSay, DetailedReprompt]
summary • sources of non-understandings: ASR, but also “language” errors → more shaping strategies … • impact on performance: regression model allows better quantitative assessment • strategy comparison: help, “move-on” → further investigate “move-on” • user responses: margin for improving control over user responses • can we improve global dialog performance by using a smarter policy? yes • can we learn a better policy from data? preliminary results promising …
[Figure 3. Misunderstandings and non-understandings before and after rejections. Panels: before rejection mechanism, after rejection mechanism; regions marked: false rejections, correct rejections]
strategy performance assessment • recovery rate • recovery utility • weighted sum of correctly and incorrectly acquired concepts • weights are determined in a data-driven fashion • recovery efficiency • also takes time to recovery into account
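The first two metrics can be sketched as a per-strategy aggregation. This is a rough illustration: it counts a recovery when the turn following the strategy is correctly understood, the weights `W_CORRECT` and `W_INCORRECT` are hypothetical placeholders (the slide says the real weights are determined from data), and the episode records are made up.

```python
from collections import defaultdict

# Hypothetical weights; not the data-driven values used in the study.
W_CORRECT = 1.0
W_INCORRECT = -2.0

def summarize(episodes):
    """episodes: iterable of (strategy, recovered, n_correct, n_incorrect) tuples."""
    totals = defaultdict(lambda: {"n": 0, "recovered": 0, "utility": 0.0})
    for strategy, recovered, n_correct, n_incorrect in episodes:
        t = totals[strategy]
        t["n"] += 1
        t["recovered"] += int(recovered)
        # Recovery utility: weighted sum of correctly and incorrectly acquired concepts.
        t["utility"] += W_CORRECT * n_correct + W_INCORRECT * n_incorrect
    return {
        s: {"recovery_rate": t["recovered"] / t["n"],
            "mean_utility": t["utility"] / t["n"]}
        for s, t in totals.items()
    }

# Made-up recovery episodes, one per use of a strategy.
episodes = [
    ("MoveOn", True, 1, 0),
    ("MoveOn", True, 2, 0),
    ("MoveOn", False, 0, 1),
    ("MoveOn", True, 1, 0),
    ("AskRepeat", False, 0, 0),
    ("AskRepeat", True, 1, 0),
]
stats = summarize(episodes)
for name, s in stats.items():
    print(name, s)
```

Recovery efficiency would additionally need the time taken to recover for each episode, which this sketch does not model.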
experimental design: scenarios • 10 scenarios, fixed order • presented graphically (explained during briefing)
strategy pair-wise comparison • recovery performance ranked list, based on pair-wise t-tests: • CER evaluation shows similar results
impact of recovery rate on performance • recovery = next turn is correctly understood • $P(\text{Task Success}) = \dfrac{1}{1 + e^{-(\alpha + \beta \cdot \text{RecoveryRate})}}$