A Scalable Reinforcement Learning Approach to Error Handling in Spoken Language Interfaces
Dan Bohus
www.cs.cmu.edu/~dbohus | dbohus@cs.cmu.edu
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
problem
spoken language interfaces lack robustness when faced with understanding errors.
more concretely …
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
S: Sorry, I'm not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [FOR MINUTE SINCE HEY]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
S: traveling in the afternoon. Okay, what day would you be departing Chicago?
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm arrives Seoul at ………
problem source
• stems mostly from speech recognition
• spans most domains and interaction types
• exacerbated by operating conditions:
  • spontaneous speech
  • medium / large vocabularies
  • large, varied, and changing user populations
speech recognition impact
• typical word-error-rates:
  • 10–20% for native speakers (novice users)
  • 40% and above for non-native speakers
• significant negative impact on performance [Walker, Sanders]
[figure: task success vs. word-error-rate]
approaches for increasing robustness
• fix recognition
• gracefully handle errors through interaction:
  • detect the problems
  • develop a set of recovery strategies
  • know how to choose between them (policy)
outline
• a closer look at the problem
• RL in spoken dialog systems
• current challenges
• a proposed RL approach for error handling
non- and misunderstandings
The example dialog above illustrates both error types: NON-understandings, where the system obtains no usable hypothesis (the rejected "Urbana Champaign" turns), and MIS-understandings, where it confidently extracts the wrong value (e.g., "Huntsville" recognized as Seoul, "Birmingham" as Berlin).
six not-so-easy pieces

misunderstandings
• detection: recognition or semantic confidence scores
• strategies: explicit confirmation ("Did you say 10am?"), implicit confirmation ("Starting at 10am… until what time?"), accept, reject
• policy: confidence threshold model (thresholds on the 0–1 confidence score separate the reject / explicit / implicit / accept regions)

non-understandings
• detection: typically trivial [some exceptions may apply]
• strategies: notify ("Sorry, I didn't catch that …"), ask repeat ("Can you repeat that?"), ask rephrase ("Can you rephrase that?"), give help ("You can say something like 'at 10 a.m.'"), move on [MoveOn]
• policy: handcrafted heuristics (first notify, then ask repeat, then give help, then give up)
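To make the two baseline policies above concrete, here is a minimal Python sketch of a confidence-threshold model for misunderstandings and a handcrafted escalation heuristic for non-understandings; the threshold values and function names are illustrative assumptions, not values from the slides.

```python
# Minimal sketch of the two handcrafted baselines described above.
# Threshold values and names are illustrative assumptions.

REJECT_THRESHOLD = 0.3   # below this, treat the hypothesis as unusable
CONFIRM_THRESHOLD = 0.8  # below this (but above reject), confirm before accepting

def misunderstanding_policy(confidence: float) -> str:
    """Confidence-threshold model: map a recognition/semantic confidence
    score in [0, 1] to an error-handling strategy."""
    if confidence < REJECT_THRESHOLD:
        return "reject"
    if confidence < CONFIRM_THRESHOLD:
        return "explicit_confirm"     # e.g. "Did you say 10am?"
    return "accept"                   # or implicit confirm: "Starting at 10am..."

# Handcrafted non-understanding policy: escalate with each consecutive failure.
NONUNDERSTANDING_LADDER = [
    "notify",        # "Sorry, I didn't catch that."
    "ask_repeat",    # "Can you repeat that?"
    "give_help",     # "You can say something like 'at 10 a.m.'"
    "give_up",       # move on / hand off
]

def nonunderstanding_policy(consecutive_failures: int) -> str:
    index = min(consecutive_failures, len(NONUNDERSTANDING_LADDER) - 1)
    return NONUNDERSTANDING_LADDER[index]
```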
outline
• a closer look at the problem
• RL in spoken dialog systems
• current challenges
• a proposed RL approach for error handling
spoken dialog system architecture
[figure: pipeline of components — Speech Recognition → Language Understanding → Dialog Manager (connected to the Domain Back-end) → Language Generation → Speech Synthesis]
reinforcement learning in dialog systems
• debate over design choices
• learn choices using reinforcement learning
• agent interacting with an environment:
  • noisy inputs
  • temporal / sequential aspect
  • task success / failure
[figure: the same pipeline, with the Dialog Manager as the learning agent — it receives noisy semantic input from Language Understanding and emits actions (semantic output) to Language Generation]
NJFun
• "Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System" [Singh, Litman, Kearns, Walker]
• provides information about "fun things to do in New Jersey"
• slot-filling dialog:
  • type-of-activity
  • location
  • time
• provides information from a database
NJFun as an MDP
• define state-space
• define action-space
• define reward structure
• collect data for training & learn policy
• evaluate learned policy
NJFun as an MDP: state-space
• internal system state: 14 variables
• state for RL → vector of 7 variables:
  • greet: has the system greeted the user
  • attribute: which attribute the system is currently querying
  • confidence: recognition confidence level (binned)
  • value: whether a value has been obtained for the current attribute
  • tries: how many times the current attribute has been asked
  • grammar: whether a non-restrictive or restrictive grammar was used
  • history: whether there was trouble on previous attributes
• 62 different states (see the sketch below)
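As a rough illustration (not code from NJFun), the 7-variable state could be encoded as follows; the value ranges shown are assumptions based on the descriptions above.

```python
# Illustrative encoding of the 7-variable NJFun state vector; variable names
# follow the slide, but the value sets shown here are an assumption.
from dataclasses import dataclass

@dataclass(frozen=True)
class NJFunState:
    greet: int       # 0/1: has the system greeted the user
    attribute: int   # 1..3: activity, location, or time
    confidence: int  # binned recognition confidence (e.g. 0=low, 1=medium, 2=high)
    value: int       # 0/1: has a value been obtained for the current attribute
    tries: int       # how many times the current attribute has been asked
    grammar: int     # 0 = non-restrictive, 1 = restrictive grammar
    history: int     # 0/1: was there trouble on previous attributes

# Only a small subset of the full cross-product is reachable in practice,
# which is how the space reduces to the 62 states used for learning.
```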
NJFun as an MDP: actions & rewards
• type of initiative (3 types):
  • system initiative
  • mixed initiative
  • user initiative
• confirmation strategy (2 types):
  • explicit confirmation
  • no confirmation
• resulting MDP has only 2 action choices per state
• reward: binary task success
NJFun as an MDP: learning a policy
• training data: 311 complete dialogs
• collected using an exploratory policy
• learned the policy using value iteration (sketched below)
• learned policy:
  • begin with user initiative
  • back off to mixed or system initiative when re-asking for an attribute
  • the specific type of back-off differs across attributes
  • confirm when confidence is low
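A minimal sketch of the learning step: estimate transition probabilities and expected rewards from the exploratory dialogs, then run value iteration over the resulting MDP. The discount factor and array layout here are illustrative assumptions, not NJFun's actual settings.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """P: transition probabilities, shape (S, A, S); R: expected rewards, shape (S, A).
    Returns the greedy policy as a length-S array of action indices."""
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        # Q(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) * V(s')
        Q = R + gamma * np.einsum("sat,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return Q.argmax(axis=1)  # greedy choice over the 2 action candidates per state
```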
NJFun as an MDP: evaluation
• evaluated the learned policy on 124 test dialogs
• task success rate: 52% → 64%
• weak task completion: 1.72 → 2.18
• subjective evaluation: no significant improvements, but a move-to-the-mean effect
• learned policy better than hand-crafted policies, when policies were comparatively evaluated on the learned MDP
outline
• a closer look at the problem
• RL in spoken dialog systems
• current challenges
• a proposed RL approach for error handling
challenge 1: scalability
• contrast NJFun with RoomLine:
  • conference room reservation and scheduling
  • mixed-initiative, task-oriented interaction
  • system obtains a list of rooms matching initial constraints
  • system negotiates with the user to identify the room that best matches their needs
  • 37 concepts (slots), 25 questions that can be asked
• another example: LARRI
• the full-blown MDP is intractable
• not clear how to do state abstraction
challenge 2: reusability
• the underlying MDP is system-specific
• MDP design still requires a lot of human expertise
• a new MDP for each system → new training & new evaluation
• are we really saving time & expertise?
• maybe we're asking for too much?
addressing the scalability problem
• approach 1: user models / simulations
  • costly to obtain real data → simulate (a toy simulation loop is sketched after this list)
  • simplistic simulators [Eckert, Levin]
  • more complex, task-specific simulators [Scheffler & Young]
  • real-world evaluation becomes paramount
• approach 2: value function approximation
  • data-driven state abstraction / state aggregation [Denecke]
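As an illustration of approach 1, a toy simulation loop might look like the following; the slot names, error model, and success criterion are all invented for the example.

```python
import random

SLOTS = ["date_time", "location", "network"]

def simulated_user_turn(asked_slot: str, word_error_rate: float = 0.3):
    """Return (true_value, recognized_value, confidence) for one user turn."""
    true_value = f"{asked_slot}_value"
    if random.random() < word_error_rate:
        # recognition error: corrupted hypothesis with (usually) lower confidence
        return true_value, f"{asked_slot}_garbled", random.uniform(0.0, 0.6)
    return true_value, true_value, random.uniform(0.5, 1.0)

def run_episode(policy):
    """Roll out one simulated dialog and return (trajectory, task_success)."""
    trajectory, correct = [], 0
    for slot in SLOTS:
        truth, hypothesis, confidence = simulated_user_turn(slot)
        action = policy(confidence)            # e.g. accept / confirm / reject
        trajectory.append((slot, hypothesis, confidence, action))
        if action == "accept" and hypothesis == truth:
            correct += 1
    return trajectory, correct == len(SLOTS)

# usage: a naive threshold policy, rolled out for one synthetic dialog
trajectory, success = run_episode(lambda c: "accept" if c > 0.5 else "explicit_confirm")
```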
outline
• a closer look at the problem
• RL in spoken dialog systems
• current challenges
• a proposed RL approach for error handling
reinforcement learning in dialog systems
• focus RL only on the difficult decisions!
[figure: the same pipeline, with the Dialog Manager receiving semantic input and producing actions / semantic output]
task-decoupled approach
• decouple:
  • error handling decisions → use reinforcement learning
  • domain-specific dialog control decisions → use your favorite DM framework
• advantages:
  • reduces the size of the learning problem
  • favors reusability of learned policies
  • lessens system authoring effort
RoomLine in RavenClaw
[figure: the RavenClaw domain-independent dialogue engine (dialogue stack, expectation agenda, error handling decision process) operating over a dialogue task specification — a tree of dialog agents (Welcome, Login with AskRegistered / AskName / GreetUser, GetQuery with DateTime / Location / Properties (Network, Projector, Whiteboard), GetResults, DiscussResults) and their concepts (registered, user_name, query.date_time, query.location, query.network, results); the error handling decision process uses error indicators to select strategies such as ExplicitConfirm on a concept]
decision process architecture
[figure: one Concept-MDP per concept (e.g., user_name, registered) and one Topic-MDP per dialog agent (e.g., RoomLine, Login, Welcome, GreetUser, AskRegistered, AskName), each proposing actions such as Explicit Confirm or No Action; a gating mechanism selects which action is taken]
• small-size models
• parameters can be tied across models
• accommodates dynamic task generation
• favors reusability of policies
• initial policies can be easily handcrafted
• independence assumption
(a rough sketch of the gating mechanism follows below)
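A rough sketch of how per-concept MDPs with tied parameters and a gating mechanism could fit together; the class names, scoring scheme, and example policy table are assumptions for illustration only.

```python
# Sketch: one small MDP per concept, parameters shared (tied) across concepts,
# and a gating mechanism letting one winning action through each turn.

class ConceptMDP:
    ACTIONS = ["no_action", "explicit_confirm", "implicit_confirm"]

    def __init__(self, concept: str, shared_policy: dict):
        self.concept = concept
        self.policy = shared_policy        # tied parameters: the same table is reused

    def propose(self, belief_state: str):
        """Return (action, value) proposed for this concept's current belief state."""
        return self.policy.get(belief_state, ("no_action", 0.0))

def gate(proposals):
    """Gating mechanism: pick the single highest-value proposal across MDPs."""
    concept, (action, value) = max(proposals.items(), key=lambda kv: kv[1][1])
    return concept, action

# Example: two concepts share one handcrafted initial policy.
shared = {"low_conf": ("explicit_confirm", 1.0), "high_conf": ("no_action", 0.1)}
mdps = [ConceptMDP("user_name", shared), ConceptMDP("registered", shared)]
beliefs = {"user_name": "low_conf", "registered": "high_conf"}
proposals = {m.concept: m.propose(beliefs[m.concept]) for m in mdps}
print(gate(proposals))   # -> ('user_name', 'explicit_confirm')
```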
reward structure & learning • Rewards based on any dialogue performance metric • Atypical, multi-agent reinforcement learning setting Local rewards Global, post-gate rewards Reward Action Action Gating Mechanism Gating Mechanism Reward Reward Reward MDP MDP MDP MDP MDP MDP • Multiple, standard RL problems • Risk solving local problems, but not the global one a closer look : RL in spoken dialog systems : current challenges : RL for error handling
conclusion
• reinforcement learning is a very appealing approach for dialog control
• in practical systems, scalability is a big issue
• how can we leverage the knowledge we have?
  • state-space design
  • solutions that account for / handle sparse data
  • bounds on policies
  • hierarchical models
Structure of Individual MDPs
• Concept MDPs
  • state-space: belief indicators (low / medium / high confidence: LC, MC, HC)
  • action-space: concept-scoped system actions (ExplConf, ImplConf, NoAct)
• Topic MDPs
  • state-space: non-understanding, dialogue-on-track indicators
  • action-space: non-understanding actions, topic-level actions
[figure: the Concept-MDP transition structure over the LC / MC / HC belief states, with ExplConf, ImplConf, and NoAct available in each state]
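For concreteness, the two state and action spaces could be enumerated as below; the LC/MC/HC belief indicators and the ExplConf/ImplConf/NoAct actions come from the slide, while the specific topic-level state and action labels are assumptions added for illustration.

```python
# Concept MDPs: belief indicators as states, concept-scoped actions.
CONCEPT_MDP_STATES = ["LC", "MC", "HC"]          # low / medium / high confidence
CONCEPT_MDP_ACTIONS = ["ExplConf", "ImplConf", "NoAct"]

# Topic MDPs: non-understanding and dialogue-on-track indicators as states;
# the concrete labels below are illustrative, not taken from the slides.
TOPIC_MDP_STATES = ["non_understanding", "dialogue_on_track", "dialogue_off_track"]
TOPIC_MDP_ACTIONS = [
    "AskRepeat", "AskRephrase", "GiveHelp", "MoveOn",   # non-understanding actions
    "Restart", "GiveUp",                                # topic-level actions
]
```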