
online supervised learning of non-understanding recovery policies


Presentation Transcript


  1. online supervised learning of non-understanding recovery policies Dan Bohus www.cs.cmu.edu/~dbohus dbohus@cs.cmu.edu Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213 with thanks to: Alex Rudnicky Brian Langner Antoine Raux Alan Black Maxine Eskenazi

  2. understanding errors in spoken dialog • MIS-understanding: the system constructs an incorrect semantic representation of the user’s turn. S: Where are you flying from? U: Birmingham [BERLIN PM] S: Did you say Berlin? / from Berlin … where to? • NON-understanding: the system fails to construct a semantic representation of the user’s turn. S: Where are you flying from? U: Urbana Champaign [OKAY IN THAT SAME PAY] S: Sorry, I didn’t catch that … / Can you repeat that? / Can you rephrase that? / Where are you flying from? / Please tell me the name of the city you are leaving from … / Could you please go to a quieter place? / Sorry, I didn’t catch that … tell me the state first …

  3. recovery strategies • large set of strategies (“strategy” = 1-step action) • tradeoffs not well understood • some strategies are more appropriate at certain times • OOV -> ask repeat is not a good idea • door slam -> ask repeat might work well S: • Sorry, I didn’t catch that … • Can you repeat that? • Can you rephrase that? • Where are you flying from? • Please tell me the name of the city you are leaving from … • Could you please go to a quieter place? • Sorry, I didn’t catch that … tell me the state first …

  4. recovery policy • “policy” = method for choosing between strategies • difficult to handcraft • especially over a large set of recovery strategies • common approaches • heuristic • “three strikes and you’re out” [Balentine] • 1st non-understanding: ask user to repeat • 2nd non-understanding: provide more help, including examples • 3rd non-understanding: transfer to an operator
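For reference, here is a minimal sketch of the “three strikes and you’re out” heuristic in Python; the function name and strategy labels are illustrative, not part of any deployed system.

```python
# Minimal sketch of the "three strikes and you're out" heuristic policy.
# Function name and strategy labels are illustrative only.
def three_strikes_policy(consecutive_nonunderstandings: int) -> str:
    if consecutive_nonunderstandings <= 1:
        return "ask_repeat"            # 1st non-understanding: ask the user to repeat
    if consecutive_nonunderstandings == 2:
        return "help_with_examples"    # 2nd: provide more help, including examples
    return "transfer_to_operator"      # 3rd: transfer to an operator
```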

  5. this talk … … an online, supervised method for learning a non-understanding recovery policy from data

  6. overview • introduction • approach • experimental setup • results • discussion

  7. overview • introduction • approach • experimental setup • results • discussion

  8. intuition … … if we knew the probability of success for each strategy in the current situation, we could easily construct a policy S: Where are you flying from? U: Urbana Champaign [OKAY IN THAT SAME PAY] S: • Sorry, I didn’t catch that … • Can you repeat that? • Can you rephrase that? • Where are you flying from? • Please tell me the name of the city you are leaving from … • Could you please go to a quieter place? • Sorry, I didn’t catch that … tell me the state first … [figure: each candidate strategy annotated with an estimated probability of success, e.g. 32%, 15%, 20%, 30%, 45%, 25%, 43%]

  9. two step approach step 1: learn to estimate probability of success for each strategy, in a given situation step 2: use these estimates to choose between strategies (and hence build a policy)

  10. learning predictors for strategy success • supervised learning: logistic regression • target: whether the strategy recovered successfully or not • “success” = next turn is correctly understood • labeled semi-automatically • features: describe current situation • extracted from different knowledge sources • recognition features • language understanding features • dialog-level features [state, history]
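A minimal sketch of this step, assuming scikit-learn’s LogisticRegression as the learner; the stepwise feature selection mentioned on the next slide is omitted here, and the per-strategy training matrices are hypothetical inputs.

```python
# Sketch: one logistic-regression success predictor per recovery strategy.
# X_by_strategy[s] holds situation feature vectors for turns where strategy s was
# taken; y_by_strategy[s] holds 1/0 labels ("next turn correctly understood").
from sklearn.linear_model import LogisticRegression

def train_success_predictors(X_by_strategy, y_by_strategy):
    predictors = {}
    for strategy, X in X_by_strategy.items():
        model = LogisticRegression(max_iter=1000)
        model.fit(X, y_by_strategy[strategy])
        predictors[strategy] = model
    return predictors

# Estimated P(success | current situation) for, e.g., the help strategy:
#   p = predictors["HLP"].predict_proba([situation_features])[0, 1]
```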

  11. logistic regression • well-calibrated class-posterior probabilities • predictions reflect empirical probability of success • x% of cases where P(S|F)=x are indeed successful • sample efficient • one model per strategy, so data will be sparse • stepwise construction • automatic feature selection • provide confidence bounds • very useful for online learning
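One way to obtain the confidence bounds mentioned above is via the estimated covariance of the regression coefficients: compute a bound on the logit scale and map it back through the sigmoid. A minimal sketch, assuming a fitted statsmodels Logit result; the slides do not show the exact bound computation used in the system.

```python
# Approximate upper confidence bound on P(success | features) from a fitted
# logistic regression, using the covariance of the estimated coefficients.
import numpy as np

def upper_bound_probability(result, x, z=1.96):
    x = np.asarray(x, dtype=float)                    # feature vector, including intercept term
    beta = np.asarray(result.params, dtype=float)
    cov = np.asarray(result.cov_params(), dtype=float)
    eta = float(x @ beta)                             # linear predictor x'beta
    se = float(np.sqrt(x @ cov @ x))                  # standard error of x'beta
    return 1.0 / (1.0 + np.exp(-(eta + z * se)))      # bound mapped back to [0, 1]
```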

  12. two step approach step 1: learn to estimate probability of success for each strategy, in a given situation step 2: use these estimates to choose between strategies (and hence build a policy)

  13. policy learning • choose strategy most likely to succeed [chart: estimated probability of success (0–1) for strategies S1–S4] • BUT: • we want to learn online • we have to deal with the exploration / exploitation tradeoff

  14. highest-upper-bound learning • choose strategy with the highest upper bound • proposed by [Kaelbling 93] • empirically shown to do well in various problems • intuition: [charts: estimated success probabilities (0–1) with confidence intervals for strategies S1–S4, contrasting exploration and exploitation]
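A minimal sketch of the selection rule, assuming the upper bounds have already been computed (e.g., as in the earlier sketch) for the strategies allowed in the current situation:

```python
# Highest-upper-bound strategy selection (after [Kaelbling 93]).
# `upper_bounds` maps each currently available strategy to the upper confidence
# bound on its estimated probability of success.
def choose_strategy(upper_bounds: dict) -> str:
    # A wide interval (uncertain strategy) or a high mean (known-good strategy)
    # can both win, which yields the exploration / exploitation behavior.
    return max(upper_bounds, key=upper_bounds.get)
```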


  19. overview • introduction • approach • experimental setup • results • discussion

  20. system • Let’s Go! Public bus information system • connected to PAT customer service line during non-business hours • ~30-50 calls / night

  21. strategies [table of the 11 recovery strategies; the acronyms used later in the talk are AREP, ARPH, ASA, ASO, GUP, HLP, HLP_R, IT, MOVE, RP, SLL]

  22. constraints • don’t AREP more than twice in a row • don’t ARPH if #words <= 3 • don’t ASA unless #words > 5 • don’t ASO unless (4 nonu in a row) and (ratio.nonu > 50%) • don’t GUP unless (dialog > 30 turns) and (ratio.nonu > 80%) • constraints capture expert knowledge and ensure the system doesn’t use an unreasonable policy • on average 4.2 of the 11 strategies were available (min=1, max=9)
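A sketch of these availability constraints as a filter; the situation fields and thresholds follow the bullets above, but the field names are assumptions, and strategies not listed here would simply always remain available.

```python
# Sketch of the hand-coded availability constraints; field and strategy names
# are illustrative. Unconstrained strategies would always be added to `allowed`.
def constrained_strategies(sit: dict) -> list:
    allowed = []
    if sit["consecutive_AREP"] < 2:        # don't AREP more than twice in a row
        allowed.append("AREP")
    if sit["num_words"] > 3:               # don't ARPH if #words <= 3
        allowed.append("ARPH")
    if sit["num_words"] > 5:               # don't ASA unless #words > 5
        allowed.append("ASA")
    if sit["consecutive_nonu"] >= 4 and sit["ratio_nonu"] > 0.5:
        allowed.append("ASO")              # ASO only after 4 nonu in a row, ratio > 50%
    if sit["num_turns"] > 30 and sit["ratio_nonu"] > 0.8:
        allowed.append("GUP")              # GUP only if dialog > 30 turns, ratio > 80%
    return allowed
```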

  23. features • current non-understanding • recognition, lexical, grammar, timing info • current non-understanding segment • length, which strategies already taken • current dialog state and history • encoded dialog states • “how good things have been going”
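For concreteness, an illustrative feature vector for one non-understanding; the field names and values are assumptions patterned on the categories above, not the system’s actual feature schema.

```python
# Illustrative only: one non-understanding situation as a feature dictionary.
situation_features = {
    # current non-understanding: recognition, lexical, grammar, timing info
    "asr_confidence": 0.21, "num_words": 4, "parsed_by_grammar": False, "turn_duration_s": 1.8,
    # current non-understanding segment: length, which strategies already taken
    "segment_length": 2, "tried_AREP": True, "tried_HLP": False,
    # dialog state and history: encoded state, "how good things have been going"
    "dialog_state": "request_departure_place", "num_turns": 17, "ratio_nonu_so_far": 0.29,
}
```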

  24. learning • baseline period [2 weeks, 3/11 -> 3/25, 2006] • system randomly chose a strategy, while obeying constraints • in effect, a heuristic / stochastic policy • learning period [5 weeks, 3/26 -> 5/5, 2006] • each morning labeled data from previous night • retrained likelihood of success predictors • installed in the system for the next night
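The daily cycle of the learning period, sketched as a loop; `label_fn`, `train_fn`, and `deploy_fn` are hypothetical callables standing in for the semi-automatic labeling, predictor retraining, and system update steps described above.

```python
# Sketch of the daily online-learning cycle (learning period only).
def nightly_update(corpus, label_fn, train_fn, deploy_fn):
    corpus = corpus + label_fn()     # each morning: label data from the previous night
    predictors = train_fn(corpus)    # retrain the likelihood-of-success predictors
    deploy_fn(predictors)            # install them in the system for the next night
    return corpus, predictors
```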

  25. 2 strategies eliminated

  26. overview • introduction • approach • experimental setup • results • discussion

  27. results • average non-understanding recovery rate (ANRR) • improvement: 33.6% → 37.8% (p=0.03), a 12.5% relative gain • fitted learning curve coefficients: A = 0.3385 B = 0.0470 C = 0.5566 D = -11.44

  28. policy evolution [chart: per-strategy usage over the learning period for MOVE, ASA, HLP, IT, RP, HLP_R, SLL, ARPH, AREP] • MOVE, HLP, ASA engaged more often • AREP, ARPH engaged less often

  29. overview • introduction • approach • experimental setup • results • discussion

  30. are the predictors learning anything? • AREP(653), IT(273), SLL(300) • no informative features • ARPH(674), MOVE(1514) • 1 informative feature (#prev.nonu, #words) • ASA(637), RP(2532), HLP(3698), HLP_R(989) • 4 or more informative features in the model • dialog state (especially explicit confirm states) • dialog history

  31. more features, more (specific) strategies • more features would be useful • day-of-week • clustered dialog states • ? (any ideas?) ? • more strategies / variants • approach might be able to filter out bad versions • more specific strategies, features • ask short answers worked well … • speak less loud didn’t … (why?)

  32. “noise” in the experiment • ~15-20% of responses following non-understandings are not user responses • transient noises • secondary speech • primary speech not directed to the system • this might affect training; in a future experiment we want to eliminate these cases

  33. unsupervised learning • supervised version • “success” = next turn is correctly understood [i.e. no misunderstanding, no non-understanding] • unsupervised version • “success” = next turn is not a non-understanding • “success” = confidence score of next turn • training labels automatically available • performance improvements might still be possible
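A sketch of how the automatic labels for the unsupervised variants might be derived from the next user turn; the field names are assumptions about what the dialog logs would provide.

```python
# Illustrative automatic labels for the unsupervised variants.
def unsupervised_success_label(next_turn: dict, soft: bool = False):
    if soft:
        return next_turn["confidence_score"]              # "success" = confidence score of next turn
    return 0 if next_turn["is_nonunderstanding"] else 1   # "success" = next turn not a non-understanding
```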

  34. thank you!
