Error Handling in the RavenClaw Dialog Management Framework
Dan Bohus, Alexander I. Rudnicky
Computer Science Department, Carnegie Mellon University
1. RavenClaw dialog management ( infrastructure & architecture )

• RavenClaw: a dialog management framework for complex, task-oriented domains
• Dialog Task Specification (DTS): a hierarchical plan which captures the domain-specific dialog control logic
• Dialog Engine: "executes" a given DTS
• platform for research:
  • error handling [this poster]
  • multi-participant dialog (Thomas Harris)
  • turn-taking (Antoine Raux)

systems built
• information access: Let's Go! Bus Information, RoomLine
• guidance through procedures: LARRI, IPA
• taskable agent: Vera
• command-and-control: TeamTalk

demo: RoomLine
• conference room reservations: live schedules for 13 rooms in 2 buildings on campus (size, location, a/v equipment)
• recognition: Sphinx-2 [3-gram]; parsing: Phoenix; synthesis: Cepstral Theta

3. Belief updating ( current research )

Bohus and Rudnicky, "Constructing Accurate Beliefs in Spoken Dialog Systems", ASRU-2005
• problem: confidence scores provide an initial assessment of the reliability of the information obtained from the user; however, a system should leverage the information available in subsequent user responses to update and improve the accuracy of its beliefs
• goal: bridge confidence annotation and correction detection in a unified framework for belief updating in task-oriented spoken dialog systems
• approach:
  • machine learning (generalized linear models)
  • integrate features from multiple knowledge sources in the system
  • work with compressed beliefs: top hypothesis + other [ASRU-2005 paper]; k hypotheses + other [work in progress]

sample problem
S: Where would you like to fly from?
U: [Boston/0.45]; [Austin/0.30]
S: Sorry, did you say you wanted to fly from Boston?
U: [No/0.37] + [Aspen/0.7]
Updated belief = ? [Boston/?; Austin/?; Aspen/?]
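The belief-updating step in the sample problem above can be sketched as a single generalized-linear-model update over the compressed belief (top hypothesis + other). The feature names, weights, and bias below are hypothetical placeholders chosen for illustration, not the trained model from the ASRU-2005 paper:

```python
import math

def update_top_hypothesis_belief(features, weights, bias):
    """One belief-updating step for a compressed belief (top
    hypothesis + 'other'), in the spirit of the poster's
    generalized-linear-model approach.  All parameters here are
    hypothetical placeholders, not the trained ASRU-2005 model."""
    # linear score over features drawn from multiple knowledge sources
    score = bias + sum(w * features[name] for name, w in weights.items())
    # logistic link: map the score to an updated probability
    return 1.0 / (1.0 + math.exp(-score))

# illustrative use: belief in "Boston" after the explicit confirmation
features = {
    "initial_confidence": 0.45,  # recognizer confidence for the top hypothesis
    "user_said_no": 1.0,         # the [No/0.37] in the confirmation response
    "new_value_heard": 1.0,      # a competing value ([Aspen]) was recognized
}
weights = {"initial_confidence": 4.0, "user_said_no": -3.0, "new_value_heard": -2.0}
updated = update_top_hypothesis_belief(features, weights, bias=1.0)
# the "no" answer plus a competing hypothesis should push the belief down
assert 0.0 < updated < 0.45
```

In the real framework the weights are learned from labeled dialog data, which is what lets the data-driven approach outperform hand-written heuristics.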
results: the data-driven approach significantly outperforms common heuristics
[figure: error rates of the heuristic vs. the proposed approach, for updates following explicit confirmation and updates following implicit confirmation, compared against an oracle]

2. RavenClaw error handling architecture

• goal: a task-independent, adaptive, and scalable error handling architecture
• approach:
  • error handling strategies and the error handling decision process are decoupled from the dialog task → reusability, uniformity, plug-and-play strategies, lessened development effort
  • the error handling decision process is implemented in a distributed fashion:
    • local concept error handling decision processes (handle potential misunderstandings)
    • local request error handling decision processes (handle non-understandings)
  • currently implemented as POMDPs (for concepts) and MDPs (for request agents)

misunderstanding recovery strategies
• Explicit Confirmation: "Did you say you wanted a room on Friday?"
• Implicit Confirmation: "a room on Friday … for what time?"

4. Learning policies for recovering from non-understandings

Bohus and Rudnicky, "Sorry, I Didn't Catch That! – An Investigation of Non-understanding Errors and Recovery Strategies", SIGdial-2005
• question: can dialog performance be improved by using a better, more informed policy for engaging non-understanding recovery strategies?
• approach: a between-groups experiment
  • control group: the system chooses a non-understanding recovery strategy randomly (i.e. in an uninformed fashion)
  • wizard group: a human wizard chooses which strategy should be used whenever a non-understanding happens
  • 23 participants in each condition; first-time users, balanced by gender x native language; each attempted a maximum of 10 scenario-based interactions
  • evaluated global dialog performance (task success) and various local non-understanding recovery performance metrics (see side panel)

results: the wizard policy outperforms the uninformed recovery policy on a number of global and local metrics
[figure: wizard policy vs. uninformed policy on avg. task success rate, avg. recovery WER, avg. recovery concept utility, and avg. recovery efficiency, for native and non-native speakers]

non-understanding recovery strategies (RoomLine)
• AskRepeat: "Can you please repeat that?"
• AskRephrase: "Could you please try to rephrase that?"
• Reprompt: "Would you like a small or a large room?"
• DetailedReprompt: "Sorry, I'm not sure I understood you correctly. Right now I need to know if you would prefer a small or a large room."
• Notify: "Sorry, I didn't catch that …"
• Yield: Ø
• MoveOn: "Sorry, I didn't catch that. One choice would be Wean Hall 7220. Would you like a reservation for this room?"
• YouCanSay: "Sorry, I didn't catch that. Right now I'm trying to find out if you would prefer a small room or a large one. You can say 'I want a small room' or 'I want a large room'. If the size of the room doesn't matter to you, just say 'I don't care'."
• TerseYouCanSay
• Full-Help: "Sorry, I didn't catch that. So far I found 5 rooms matching your constraints. Right now I'm trying to find out if you would prefer a small room or a large one. You can say 'I want a small room' or 'I want a large room'. If the size of the room doesn't matter to you, just say 'I don't care'."

[figure: RoomLine dialog task tree (Welcome, GetQuery, DoQuery, DiscussResults; GetDate, GetStartTime, GetEndTime) with concept error handling MDPs attached to the date, start_time, and end_time concepts, and request error handling MDPs attached to the request agents]
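As a minimal sketch of the experiment's control condition, the uninformed policy amounts to a uniform random choice over the strategy library (prompts abbreviated; this is an illustration, not the RavenClaw implementation):

```python
import random

# The ten non-understanding recovery strategies from the RoomLine
# experiment, mapped to abbreviated example prompts.
STRATEGIES = {
    "AskRepeat": "Can you please repeat that?",
    "AskRephrase": "Could you please try to rephrase that?",
    "Reprompt": "Would you like a small or a large room?",
    "DetailedReprompt": "Sorry, I'm not sure I understood you correctly. ...",
    "Notify": "Sorry, I didn't catch that ...",
    "Yield": "",  # remain silent
    "MoveOn": "Sorry, I didn't catch that. One choice would be ...",
    "YouCanSay": "Sorry, I didn't catch that. ... You can say ...",
    "TerseYouCanSay": "...",
    "Full-Help": "Sorry, I didn't catch that. So far I found ...",
}

def uninformed_policy(rng=random):
    """Control-group policy: whenever a non-understanding occurs,
    pick a recovery strategy uniformly at random."""
    return rng.choice(sorted(STRATEGIES))
```

In the wizard condition this random choice is simply replaced by a human decision; the research question is whether a learned policy can close the gap between the two.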
predicting likelihood of success
• question: can we learn a better policy from data?
• decision-theoretic approach:
  • learn to predict the likelihood of success for each strategy, using features available at runtime
  • stepwise logistic regression (good class posterior probabilities)
  • compute the expected utility for each strategy; choose the strategy with maximum expected utility
• preliminary results are promising; a new experiment is needed for validation
• for 5 of 10 strategies, the models perform better than a majority baseline, on both soft and hard error
[figure: majority baseline error → cross-validation error, per strategy]

[figure: RavenClaw architecture: the dialog task specification (GetQuery, GetStartTime; grammar mappings such as date: [date], start_time: [start_time] [time], end_time: [end_time] [time], location: [location], network: [with_network] → true, [without_network] → false) and the dialog engine (expectation agenda, dialog stack), with the error handling decision process selecting strategies such as ExplConf(start_time) from the library of error handling strategies]
• error handling strategies are implemented as library dialog agents
• new strategies can be plugged in as they are developed

example (RoomLine)
System: For when do you need the room?
User: Let's try two to four p.m.
Parse: [time](two) [end_time](four)
System: Did you say you wanted the room starting at two p.m.?

5. Rejection threshold optimization

Bohus and Rudnicky, "A Principled Approach for Rejection Threshold Optimization in Spoken Dialog Systems", to be presented at Interspeech
• a data-driven approach for tuning state-specific rejection thresholds in a spoken dialog system

6. Transfer of confidence annotators across domains

work in progress, in collaboration with Antoine Raux
• migrate (adapt) a confidence annotator trained with data from domain A to domain B, without any labeled data in the new domain (B)
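The decision-theoretic selection described under "predicting likelihood of success" can be sketched as follows; the runtime features, per-strategy model parameters, and the ±1 utilities are hypothetical placeholders, not values learned in the experiments:

```python
import math

def p_success(features, weights, bias):
    """Predicted likelihood that a recovery strategy succeeds, via a
    logistic model (a stand-in for the poster's stepwise logistic
    regression, which yields good class posterior probabilities)."""
    score = bias + sum(w * features[k] for k, w in weights.items())
    return 1.0 / (1.0 + math.exp(-score))

def choose_strategy(features, models, u_success=1.0, u_failure=-1.0):
    """Compute the expected utility of each strategy and return the
    maximizer, as in the decision-theoretic approach."""
    def expected_utility(name):
        weights, bias = models[name]
        p = p_success(features, weights, bias)
        return p * u_success + (1.0 - p) * u_failure
    return max(models, key=expected_utility)

# hypothetical runtime features and per-strategy (weights, bias) models
features = {"asr_confidence": 0.2, "utterance_length": 0.5}
models = {
    "AskRepeat":   ({"asr_confidence": 2.0, "utterance_length": -1.0}, 0.0),
    "AskRephrase": ({"asr_confidence": 1.0, "utterance_length": 1.0}, -0.5),
    "MoveOn":      ({"asr_confidence": -1.0, "utterance_length": 0.0}, 0.8),
}
best = choose_strategy(features, models)  # → "MoveOn" for these toy numbers
```

With a low recognition confidence, the toy models give MoveOn the highest success probability and hence the highest expected utility, which mirrors the intuition that repeating a failing question is often less useful than moving the task forward.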