This presentation explores the problem of belief updating in spoken dialog systems. It discusses approaches for handling understanding errors and presents strategies for detecting and recovering from misunderstandings. It also addresses the challenge of constructing accurate beliefs by integrating information over multiple turns.
Belief Updating in Spoken Dialog Systems. Dan Bohus (dbohus@cs.cmu.edu, www.cs.cmu.edu/~dbohus), Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
problem: spoken language interfaces lack robustness when faced with understanding errors • stems mostly from speech recognition • spans most domains and interaction types
more concretely …
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
S: Sorry, I'm not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [FOR MINUTE SINCE HEY]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
S: traveling in the afternoon. Okay, what day would you be departing Chicago?
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm, arrives Seoul at …
non- and misunderstandings: the same dialog, annotated. The first two user turns ("Urbana Champaign", decoded as [OKAY IN THAT SAME PAY] and [FOR MINUTE SINCE HEY]) are NON-understandings: the system obtains no usable interpretation and asks again. The "Huntsville" → [SEOUL] and "traveling to Birmingham" → [BERLIN P_M] turns are MIS-understandings: the system obtains an interpretation, but an incorrect one.
approaches for increasing robustness • fix recognition • gracefully handle errors through interaction, which requires: • detecting the problems • developing a set of recovery strategies • knowing how to choose between them (policy)
six not-so-easy pieces … error handling decomposes into detection, recovery strategies, and policy, each for misunderstandings and for non-understandings
belief updating • construct more accurate beliefs by integrating information over multiple turns (the misunderstandings / detection piece)
S: Where would you like to go?
U: Huntsville [SEOUL / 0.65]   →   destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M / 0.60]   →   destination = {?}
belief updating: problem statement • given: an initial belief Pinitial(C) over concept C, a system action SA, and a user response R • construct an updated belief: Pupdated(C) ← f (Pinitial(C), SA, R)
destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: [THE TRAVELING TO BERLIN P_M / 0.60]   →   destination = {?}
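To make the problem statement concrete, here is a minimal Python sketch of the update interface; the container types and field names (Belief, SystemAction, UserResponse) are illustrative assumptions, not part of the original system.

```python
from dataclasses import dataclass
from typing import Dict

# A belief over a concept C: hypothesized value -> probability,
# e.g. {"seoul": 0.65} for the destination concept.
Belief = Dict[str, float]

@dataclass
class SystemAction:
    kind: str      # e.g. "explicit_confirm" or "implicit_confirm" (hypothetical labels)
    concept: str   # concept the action refers to, e.g. "destination"
    value: str     # hypothesis being confirmed, e.g. "seoul"

@dataclass
class UserResponse:
    decoded: str       # recognition result, e.g. "THE TRAVELING TO BERLIN P_M"
    confidence: float  # semantic confidence score of the decoded response

def update_belief(initial: Belief, action: SystemAction, response: UserResponse) -> Belief:
    """Placeholder for the update P_updated(C) <- f(P_initial(C), SA, R)."""
    raise NotImplementedError
```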
outline • related work • a restricted version • data • user response analysis • experiments and results • some caveats and future work
confidence annotation + heuristic updates • confidence annotation • traditionally focused on word-level errors [Chase, Cox, Bansal, Ravishankar] • more recently: semantic confidence annotation [Walker, San-Segundo, Bohus] • machine learning approach; results fairly good, but not perfect • heuristic updates • explicit confirmation: no → don't trust; yes → trust • implicit confirmation: no → don't trust; otherwise → trust • suboptimal for several reasons
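As a point of reference, a minimal sketch of the heuristic rules listed above; rendering "trust" and "don't trust" as confidence 1.0 and 0.0, and leaving the belief unchanged for other responses to explicit confirmations, are assumptions made here for illustration.

```python
def heuristic_update(initial_confidence: float, action_kind: str, response_type: str) -> float:
    """Heuristic belief update after a confirmation action.
    response_type is one of "yes", "no", "other" (from the decoded user response)."""
    if action_kind == "explicit_confirm":
        if response_type == "no":
            return 0.0                 # no -> don't trust the confirmed value
        if response_type == "yes":
            return 1.0                 # yes -> trust the confirmed value
        return initial_confidence      # other responses: leave the belief unchanged (assumption)
    if action_kind == "implicit_confirm":
        return 0.0 if response_type == "no" else 1.0  # no -> don't trust; otherwise -> trust
    return initial_confidence
```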
correction detection • detect if the user is trying to correct the system [Litman, Swerts, Hirschberg, Krahmer, Levow] • machine learning approach • features from different knowledge sources in the system • results fairly good, but not perfect
integration • confidence annotation and correction detection are useful tools • but separately, neither solves the problem • bring the two together in a unified approach that accurately tracks beliefs
belief updating: general form • given: an initial belief Pinitial(C) over concept C, a system action SA, and a user response R • construct an updated belief: Pupdated(C) ← f (Pinitial(C), SA, R)
restricted version: 2 simplifications • simplification 1: compact belief • the system is unlikely to "hear" more than 3 or 4 values for a concept (in our data: max = 3 values, and only 6.9% of cases have >1 value), so represent the belief by the confidence score of the top hypothesis rather than by multiple recognition results • simplification 2: updates only after confirmation actions • reduced problem: ConfTopupdated(C) ← f (ConfTopinitial(C), SA, R)
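A minimal sketch of the two simplifications under the assumptions above: the belief is collapsed to the confidence of its top hypothesis, and the update becomes a scalar-to-scalar function applied only after confirmation actions. Function names are illustrative.

```python
from typing import Dict

def compact_belief(belief: Dict[str, float]) -> float:
    """Simplification 1: keep only the confidence score of the top hypothesis."""
    return max(belief.values()) if belief else 0.0

def update_conf_top(conf_top_initial: float, system_action, user_response) -> float:
    """Reduced problem: ConfTop_updated(C) <- f(ConfTop_initial(C), SA, R);
    in the experiments this f is learned (one model per confirmation action)."""
    raise NotImplementedError
```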
data • collected with RoomLine • a phone-based mixed-initiative spoken dialog system • conference room reservation • search and negotiation • explicit and implicit confirmations • confidence threshold model (+ some exploration) • unplanned implicit confirmations, e.g. "I found 10 rooms for Friday between 1 and 3 p.m. Would you like a small room or a large one?"
corpus • user study • 46 participants (naïve users) • 10 scenario-based interactions each • compensated per task success • corpus • 449 sessions, 8848 user turns • orthographically transcribed • rich annotation: correct concepts, corrections, etc.
user response types • following Krahmer and Swerts • study on a Dutch train-table information system • 3 user response types • YES: yes, right, that's right, correct, etc. • NO: no, wrong, etc. • OTHER • cross-tabulated against the correctness of confirmations
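A rough illustration of this YES / NO / OTHER labeling, assuming a simple marker-based rule over the decoded response; the marker lists extend the examples on the slide and the actual annotation was richer.

```python
# Marker lists partly assumed; the slide gives "yes, right, that's right, correct, etc."
# and "no, wrong, etc." as examples.
YES_MARKERS = ("yes", "right", "that's right", "correct")
NO_MARKERS = ("no", "wrong")

def response_type(decoded: str) -> str:
    """Coarse YES / NO / OTHER labeling of a response to a confirmation,
    in the spirit of Krahmer & Swerts."""
    text = decoded.lower().strip()
    if any(text == m or text.startswith(m + " ") for m in NO_MARKERS):
        return "NO"
    if any(text == m or text.startswith(m + " ") for m in YES_MARKERS):
        return "YES"
    return "OTHER"
```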
~10% user responses to explicit confirmations • from transcripts [numbers in brackets from Krahmer&Swerts] • from decoded
other responses to explicit confirmations • ~70% of users repeat the correct value • ~15% of users don't address the question • attempt to shift the conversation focus
user responses to implicit confirmations • from transcripts [numbers in brackets from Krahmer&Swerts] • from decoded
ignoring errors in implicit confirmations • users often correct later (40% of 118 cases) • users interact strategically • they correct only if the error is essential
machine learning approach • need good probability outputs: low cross-entropy between model predictions and reality (cross-entropy = negative average log posterior) • logistic regression • sample efficient • stepwise approach → feature selection • one logistic model tree for each action • root splits on response type
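A hedged sketch of this setup using scikit-learn: a logistic regression (trained per system action) predicting whether the confirmed value is correct, evaluated here with classification error as the "hard error" and cross-entropy as the "soft error"; the stepwise feature selection and the logistic model tree structure are omitted, and the mapping of hard/soft error to these metrics is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def train_update_model(X_train, y_train, X_test, y_test):
    """Fit a model that outputs P(confirmed value is correct | features) and
    report hard error (classification error) and soft error (cross-entropy)."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    probs = model.predict_proba(X_test)[:, 1]               # P(value is correct)
    hard_error = float(np.mean((probs >= 0.5) != y_test))   # fraction of wrong decisions
    soft_error = log_loss(y_test, probs)                    # negative average log posterior
    return model, hard_error, soft_error
```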
features. target. • initial situation • initial confidence score • concept identity, dialog state, turn number • system action • other actions performed in parallel • features of the user response • acoustic / prosodic features • lexical features • grammatical features • dialog-level features • target: was the value correct?
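An illustrative feature dictionary mirroring the groups above; all field and feature names are assumptions, and the real feature set (notably the acoustic/prosodic features) is considerably richer.

```python
def extract_features(initial_confidence, concept_id, dialog_state, turn_number,
                     action_kind, parallel_actions, response):
    """Assemble one example; `response` is assumed to expose the fields used below."""
    return {
        # initial situation
        "initial_confidence": initial_confidence,
        "concept_id": concept_id,
        "dialog_state": dialog_state,
        "turn_number": turn_number,
        # system action
        "action_kind": action_kind,
        "num_parallel_actions": len(parallel_actions),
        # user response: acoustic, lexical, grammatical, dialog-level
        "barge_in": response["barge_in"],
        "num_words": len(response["decoded"].split()),
        "repeated_grammar_slots": response["repeated_grammar_slots"],
        "expectation_match": response["expectation_match"],
    }
```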
baselines • initial baseline • accuracy of system beliefs before the update • heuristic baseline • accuracy of the heuristic rule currently used in the system • oracle baseline • accuracy if we knew exactly when the user is correcting the system
results: explicit confirmation (hard error % and soft error; charts not reproduced)
results: implicit confirmation (hard error % and soft error; charts not reproduced)
results: unplanned implicit confirmation (hard error % and soft error; charts not reproduced)
informative features • initial confidence score • prosody features • barge-in • expectation match • repeated grammar slots • concept id
eliminate simplification 1 • current restricted version • belief = confidence score of the top hypothesis • only 6.9% of cases had more than 1 hypothesis • extend to • N hypotheses + 1 (other), where N is a small integer (2 or 3) • approach: multinomial generalized linear model • use information from multiple recognition hypotheses
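A hedged sketch of this extension, assuming a multinomial logistic regression (one instance of a multinomial generalized linear model) over the top-N hypotheses plus an "other" outcome; all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_multinomial_update(X_train, y_train):
    """y_train in {0, ..., N-1} indexes which recognition hypothesis is correct;
    the label N stands for 'other' (none of the listed hypotheses is correct).
    With more than two classes, the default lbfgs solver fits a multinomial (softmax) model."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return model

def updated_belief(model, features, hypotheses):
    """Return the updated distribution over the hypotheses plus 'other'."""
    probs = model.predict_proba(np.asarray(features).reshape(1, -1))[0]
    return dict(zip(list(hypotheses) + ["other"], probs))
```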
eliminate simplification 2 • current restricted version: only updates following system confirmation actions • but users might correct the system at any point • extend to: updates after all system actions
shameless self promotion (other pieces of the grid: detection, strategies, and policy for misunderstandings and non-understandings) • rejection threshold adaptation • impact of non-understandings on performance [Interspeech-05] • comparative analysis of 10 recovery strategies [SIGdial-05] • wizard experiment: towards learning non-understanding recovery policies [SIGdial-05]
shameless CMU promotion • Ananlada (Moss) Chotimongkol • automatic concept and task structure acquisition • Antoine Raux • turn-taking, conversation micro-management • Jahanzeb Sherwani • multimodal personal information management • Satanjeev Banerjee • meeting understanding • Stefanie Tomko • universal speech interface • Thomas Harris • multi-participant dialog • DoD / Young Researchers’ Roundtable
a more subtle caveat • the training data was collected while the system ran the confidence annotator + the heuristic update rules • at run time the system will run the confidence annotator + the learned model, so the data distribution shifts • this is always a problem when learning from interaction with the world • hopefully, the distribution shift will not cause a large degradation in performance • this remains to be validated empirically • maybe a bootstrap approach?