340 likes | 530 Views
online supervised learning of non-understanding recovery policies. Dan Bohus www.cs.cmu.edu/~dbohus dbohus@cs.cmu.edu Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213. with thanks to: Alex Rudnicky Brian Langner Antoine Raux Alan Black Maxine Eskenazi. ?.
E N D
online supervised learning of non-understanding recovery policies Dan Bohus www.cs.cmu.edu/~dbohus dbohus@cs.cmu.edu Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213 with thanks to: Alex Rudnicky Brian Langner Antoine Raux Alan Black Maxine Eskenazi
? ? ? S: S: • Did you say Berlin? • from Berlin … where to? • Sorry, I didn’t catch that … • Can you repeat that? • Can you rephrase that? • Where are you flying from? • Please tell me the name of the city you are leaving from … • Could you please go to a quieter place? • Sorry, I didn’t catch that … tell me the state first … understanding-errors in spoken dialog MIS-understanding NON-understanding System constructs an incorrect semantic representation of the user’s turn System fails to construct a semantic representation of the user’s turn S: Where are you flying from? U: Birmingham [BERLIN PM] S: Where are you flying from? U: Urbana Champaign [OKAY IN THAT SAME PAY]
recovery strategies • large set of strategies (“strategy” = 1-step action) • tradeoffs not well understood • some strategies are more appropriate at certain times • OOV -> ask repeat is not a good idea • door slam -> ask repeat might work well • Sorry, I didn’t catch that … • Can you repeat that? • Can you rephrase that? • Where are you flying from? • Please tell me the name of the city you are leaving from … • Could you please go to a quieter place? • Sorry, I didn’t catch that … tell me the state first … S:
recovery policy • “policy” = method for choosing between strategies • difficult to handcraft • especially over a large set of recovery strategies • common approaches • heuristic • “three strikes and you’re out” [Balentine] • 1st non-understanding: ask user to repeat • 2nd non-understanding: provide more help, including examples • 3rd non-understanding: transfer to an operator
this talk … … an online, supervised method for learning a non-understanding recovery policy from data
overview • introduction • approach • experimental setup • results • discussion
overview • introduction • approach • experimental setup • results • discussion
intuition … … if we knew the probability of success for each strategy in the current situation, we could easily construct a policy S: Where are you flying from? U: [OKAY IN THAT SAME PAY] Urbana Champaign S: • Sorry, I didn’t catch that … • Can you repeat that? • Can you rephrase that? • Where are you flying from? • Please tell me the name of the city you are leaving from … • Could you please go to a quieter place? • Sorry, I didn’t catch that … tell me the state first … 32% 15% 20% 30% 45% 25% 43%
two step approach step 1: learn to estimate probability of success for each strategy, in a given situation step 2: use these estimates to choose between strategies (and hence build a policy)
learning predictors for strategy success • supervised learning: logistic regression • target: strategy recovery successfully or not • “success” = next turn is correctly understood • labeled semi-automatically • features: describe current situation • extracted from different knowledge sources • recognition features • language understanding features • dialog-level features [state, history]
logistic regression • well-calibrated class-posterior probabilities • predictions reflect empirical probability of success • x% of cases where P(S|F)=x are indeed successful • sample efficient • one model per strategy, so data will be sparse • stepwise construction • automatic feature selection • provide confidence bounds • very useful for online learning
two step approach step 1: learn to estimate probability of success for each strategy, in a given situation step 2: use these estimates to choose between strategies (and hence build a policy)
policy learning • choose strategy most likely to succeed 1 0 S1 S2 S3 S4 • BUT: • we want to learn online • we have to deal with the exploration / exploitation tradeoff
highest-upper-bound learning • choose strategy with highest-upper-bound • proposed by [Kaelbling 93] • empirically shown to do well in various problems • intuition 1 1 0 0 S1 S2 S3 S4 S1 S2 S3 S4 exploration exploitation
highest-upper-bound learning • choose strategy with highest upper bound • proposed by [Kaelbling 93] • empirically shown to do well in various problems • intuition 1 1 0 0 S1 S2 S3 S4 S1 S2 S3 S4 exploration exploitation
highest-upper-bound learning • choose strategy with highest upper bound • proposed by [Kaelbling 93] • empirically shown to do well in various problems • intuition 1 1 0 0 S1 S2 S3 S4 S1 S2 S3 S4 exploration exploitation
highest-upper-bound learning • choose strategy with highest upper bound • proposed by [Kaelbling 93] • empirically shown to do well in various problems • intuition 1 1 0 0 S1 S2 S3 S4 S1 S2 S3 S4 exploration exploitation
highest-upper-bound learning • choose strategy with highest upper bound • proposed by [Kaelbling 93] • empirically shown to do well in various problems • intuition 1 1 0 0 S1 S2 S3 S4 S1 S2 S3 S4 exploration exploitation
overview • introduction • approach • experimental setup • results • discussion
system • Let’s Go! Public bus information system • connected to PAT customer service line during non-business hours • ~30-50 calls / night
constraints • constraints • don’t AREP more than twice in a row • don’t ARPH if #words <= 3 • don’t ASA unless #words > 5 • don’t ASO unless (4 nonu in a row) and (ratio.nonu > 50%) • don’t GUP unless (dialog > 30 turns) and (ratio.nonu > 80%) • capture expert knowledge; ensure system doesn’t use an unreasonable policy • 4.2/11 strategies available on average • min=1, max=9
features • current non-understanding • recognition, lexical, grammar, timing info • current non-understanding segment • length, which strategies already taken • current dialog state and history • encoded dialog states • “how good things have been going”
learning • baseline period [2 weeks, 3/11 -> 3/25, 2006] • system randomly chose a strategy, while obeying constraints • in effect, a heuristic / stochastic policy • learning period [5 weeks, 3/26 -> 5/5, 2006] • each morning labeled data from previous night • retrained likelihood of success predictors • installed in the system for the next night
overview • introduction • approach • experimental setup • results • discussion
results • average non-understanding recovery rate (ANNR) • improvement: 33.6% 37.8% (p=0.03) (12.5%rel) • fitted learning curve: A = 0.3385 B = 0.0470 C = 0.5566 D = -11.44
MOVE ASA HLP IT RP HLP_R SLL ARPH AREP policy evolution • MOVE, HLP, ASA engaged more often • AREP, ARPH engaged less often
overview • introduction • approach • experimental setup • results • discussion
are the predictors learning anything? • AREP(653), IT(273), SLL(300) • no informative features • ARPH(674), MOVE(1514) • 1 informative feature (#prev.nonu, #words) • ASA(637), RP(2532), HLP(3698), HLP_R(989) • 4 or more informative features in the model • dialog state (especially explicit confirm states) • dialog history
more features, more (specific) strategies • more features would be useful • day-of-week • clustered dialog states • ? (any ideas?) ? • more strategies / variants • approach might be able to filter out bad versions • more specific strategies, features • ask short answers worked well … • speak less loud didn’t … (why?)
“noise” in the experiment • ~15-20% of responses following non-understandings are non-user-responses • transient noises • secondary speech • primary speech not directed to the system • this might affect training, in a future experiment we want to eliminate that
unsupervised learning • supervised version • “success” = next turn is correctly understood[i.e. no misunderstanding, no non-understanding] • unsupervised version • “success” = next turn is not a non-understanding • “success” = confidence score of next turn • training labels automatically available • performance improvements might still be possible