20 likes | 174 Views
Error Handling in the RavenClaw Dialog Management Framework Dan Bohus, Alexander I. Rudnicky Computer Science Department, Carnegie Mellon University. ( infrastructure & architecture ). ( current research ). belief updating. RavenClaw dialog management. 3. 1.
E N D
Error Handling in the RavenClaw Dialog Management Framework Dan Bohus, Alexander I. Rudnicky Computer Science Department, Carnegie Mellon University ( infrastructure & architecture ) ( current research ) belief updating RavenClaw dialog management 3 1 Bohus and Rudnicky - “Constructing Accurate Beliefs in Spoken Dialog Systems”, in ASRU-2005 • RavenClaw:dialog management framework for complex, task-oriented domains • Dialog Task Specification (DTS): hierarchical plan which captures the domain-specific dialog control logic; • Dialog Engine “executes” a given DTS. • platform for research • error handling [this poster] • multi-participant dialog (Thomas Harris) • turn-taking (Antoine Raux) systems built demo: Roomline • problem:confidence scores provide an initial assessment for the reliability of the information obtained from the user. However, a system should leverage information available in subsequent user responses in order to updateand improve the accuracyof its beliefs. • goal:bridge confidence annotation and correction detection in a unified framework for belief updating in task oriented spoken dialog system • approach: • machine learning (generalized linear models) • integrate features from multiple knowledge sources in the system • work with compressed beliefs • top hypothesis + other [ ASRU-2005 paper] • k hypotheses + other [ work in progress] sample problem • conference room reservations • live schedules for 13 rooms in 2 buildings on campus • size, location, a/v equipment • recognition: sphinx-2 [3-gram] • parsing: phoenix • synthesis: cepstral theta • information access • Let’s Go! Bus Information, RoomLine • guidance through procedures • LARRI, IPA • taskable agent • Vera • command-and-control • TeamTalk S: where would you like to fly from? U: [Boston/0.45]; [Austin/0.30] S: sorry, did you say you wanted to fly from Boston? U: [No/0.37] + [Aspen / 0.7] Updated belief = ? [Boston/?; Austin/?; Aspen/?] results: data driven approach significantlyoutperforms common heuristics error rates 30% 30% initial RavenClaw error handling architecture 20% 20% 2 heuristic 10% 10% proposed • goal:task-independent, adaptive and scalable error handling architecture • approach: • error handling strategies and error handling decision process are decoupled from the dialog task → reusability, uniformity, plug-and-play strategies, lessens development effort • error handling decision process implemented in a distributed fashion • local concepterror handling decision process (handles potential misunderstandings) • local request error handling decision process (handles non-understandings) • currently implemented as POMDPs (for concepts) and MDPs (for request agents) updates followingexplicit confirmation updates followingimplicit confirmation oracle misunderstanding recovery strategies learning policies for recovering from non-understandings 4 Explicit Confirmation Did you say you wanted a room on Friday? Implicit Confirmation a room on Friday … for what time? Bohus and Rudnicky - “Sorry, I didn’t Catch That! – an Investigation of Non-understanding Errors and Recovery Strategies”, in SIGdial-2005 • question:can dialog performance be improved by using a better, more informed policy for engaging non-understanding recovery strategies? • approach: a between-groups experiment • control group: system chooses a non-understanding strategy randomly • (i.e. in an uninformed fashion); • wizard group: a human wizard chooses which strategy should be used whenever a non-understanding happens; • 23 participants in each condition • first-time users, balanced by gender x native language • each attempted a maximum of 10 scenario-based interactions • evaluated global dialog performance (task success) and various local non-understanding recovery performance metrics (see side panel) results: wizard policy outperforms uninformed recovery policy on a number of global and local metrics * 80% * * 100% non-understanding recovery strategies wizard policy 80% 60% uninformed policy 60% RoomLine 40% 40% 20% AskRepeat Can you please repeat that? AskRephrase Could you please try to rephrase that? Reprompt Would you like a small or a large room? DetailedReprompt Sorry, I’m not sure I understood you correctly. Right now I need to know if you would prefer a small or a large room. Notify Sorry, I didn’t catch that … Yield Ø MoveOn Sorry, I did’t catch that. Once choice would be Wean Hall 7220. Would you like a reserva- tion for this room? YouCanSay Sorry, I didn’t catch that. Right now I’m trying to find out if you would prefer a small room or a large one. You can say ‘I want a small room’ or ‘I want a large room’. If the size of the room doesn’t matter to you, just say ‘I don’t care’. TerseYouCanSay Full-Help Sorry, I didn’t catch that. So far I found 5 rooms matching your constraints. Right now I’m trying to find out if you would prefer a small room or a large one. You can say ‘I want a small room’ or ‘I want a large room’. If the size of the room doesn’t matter to you, just say ‘I don’t care’. 20% i:Welcome GetQuery x: DoQuery DiscussResults non-natives natives non-natives natives avg. task success rate avg. recovery WER * 1 * 5 date end_time 4 1 r:GetDate r:GetEndTime 3 concept error handling MDP 0 2 concept error handling MDP -1 start_time 1 r:GetStartTime non-natives natives non-natives natives avg. recovery conceptutility avg. recovery efficiency request error handling MDP request error handling MDP predicting likelihood of success • question:can we learn a better policy from data? • decision theoretic approach: • learn to predict likelihood of success for each strategy • use features available at runtime • stepwise logistic regression (good class posterior probabilities) • compute expected utility for each strategy • choose strategy with maximum expected utility • preliminary results promising, a new experiment needed for validation for 5 of 10 strategies models perform better than a majority baseline, on both soft and hard error dialog task specification majority baseline error → cross-validation error dialog engine start_time: [start_time] [time] date: [date] start_time: [start_time] [time] end_time: [end_time] [time] error handling decision process ExplConf(start_time) date: [date] start_time: [start_time] [time] end_time: [end_time] [time] location: [location] network: [with_network] → true [without_network] → false GetStartTime GetQuery error handling strategies RoomLine rejection threshold optimization transfer of confidence annotators across domains expectation agenda dialog stack 5 6 System: User: Parse: System: For when do you need the room? Let’s try two to four p.m. [time](two) [end_time](four) Did you say you wanted the room starting at two p.m.? work in progress, in collaboration with Antoine Raux Bohus and Rudnicky - “A Principled Approach for Rejection Threshold Optimization in Spoken Dialog Systems”, to be presented at Interspeech • error handling strategies are implemented as library dialog agents • new strategies can be plugged in as they are developed • data-driven approach for tuning state-specific rejection thresholds in a spoken dialog system • migrate (adapt) a confidence annotator trained with data from domain A to domain B, without any labeled data in the new domain (B).