
a “k-hypotheses + other” belief updating model

a “k-hypotheses + other” belief updating model. Dan Bohus Alex Rudnicky Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213. acknowledgements Tim Paek Eric Horvitz Microsoft Research. motivation. spoken language interfaces are still very brittle.


Presentation Transcript


  1. a “k-hypotheses + other” belief updating model
     Dan Bohus, Alex Rudnicky (Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213)
     acknowledgements: Tim Paek, Eric Horvitz (Microsoft Research)

  2. motivation: spoken language interfaces are still very brittle [Parade, Sunday, March 26]

  3. misunderstandings (recognized input in brackets, followed by its confidence score; "said:" gives what the user actually said)
     S: What city are you leaving from?
     U: [CHICAGO] / 0.72 (said: Chicago)
     S: traveling from Chicago. Where would you like to go?
     U: [SEOUL] / 0.65 (said: Huntsville)
     S: traveling to Seoul… What day did you need to travel?
     U: [THE TRAVELING to berlin P_M] / 0.35 (said: no no I’m traveling to Birmingham)
     S: traveling in the afternoon… okay, what day would you be departing Chicago?
     U: [AT THE TENTH OF AUGUST] / 0.58 (said: the tenth of August)
     S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
     U: [flight destination mr WEEKEND] / 0.28 (said: my destination is Birmingham)
     S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm arrives Seoul at ……
     resulting belief after [SEOUL]: arrival = {Seoul / 0.65}

  4. misunderstandings (belief state after each user turn shown after the arrow)
     S: What city are you leaving from?
     U: [CHICAGO] / 0.72 → arrival = { … }, departure = { … }
     S: traveling from Chicago. Where would you like to go?
     U: [SEOUL] / 0.65 → arrival = {Seoul / 0.65}, departure = { … }
     S: traveling to Seoul… What day did you need to travel?
     U: [THE TRAVELING to berlin P_M] / 0.35 → arrival = ?, departure = { … }
     S: traveling in the afternoon… okay, what day would you be departing Chicago?
     U: [AT THE TENTH OF AUGUST] / 0.58 → arrival = { … }, departure = { … }
     S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
     U: [flight destination mr WEEKEND] / 0.28 → arrival = { … }, departure = { … }
     S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm arrives Seoul at ……
     the update in question: arrival = {Seoul / 0.65} → f → arrival = ?

  5. belief updating: problem statement
     S: traveling to Seoul… What day did you need to travel?
     U: [THE TRAVELING to berlin P_M]
     arrival = {Seoul / 0.65} → f → arrival = ?
     • given
       • an initial belief B_initial(C) over concept C
       • a system action SA(C)
       • a user response R
     • construct an updated belief
       • B_updated(C) ← f(B_initial(C), SA(C), R)

  6. outline
     • introduction
     • current solutions
     • approach
     • experimental results
     • effects on global performance
     • conclusion and future work
     intro : current solutions : approach : experimental results : global performance : conclusion

  7. current solutions
     • confidence scores / detecting misunderstandings [Cox, Chase, Bansal, Hazen, Ravishankar, Walker, San-Segundo, Bohus]
     • detecting corrections [Litman, Swerts, Hirschberg, Krahmer, Levow]
     S: traveling from Chicago. Where would you like to go? (the preceding [CHICAGO] turn had confidence 0.72)
     U: [SEOUL] / 0.65
     S: traveling to Seoul… what day did you need to travel?
     U: [THE TRAVELING to berlin P_M] / 0.35
     arrival = {Seoul / 0.65} → f → arrival = ?
     • track single values
     • use simple heuristic belief updating rules
       • explicit confirmations: yes / no
       • implicit confirmations: new values overwrite old values
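
The heuristic rules listed on this slide can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name, the action labels, the choice to set confidence to 1.0 after a "yes", and the choice to discard the value after a "no" are all assumptions made for the example.

```python
from typing import Optional, Tuple

# A single tracked value with its confidence, or None when nothing is believed.
Belief = Optional[Tuple[str, float]]


def heuristic_update(initial: Belief, system_action: str,
                     answer: Optional[str] = None,
                     heard: Belief = None) -> Belief:
    """Single-value heuristic belief update (hypothetical sketch)."""
    if system_action == "explicit_confirm":
        if answer == "yes" and initial is not None:
            # Assumption: a confirmed value is fully trusted.
            return (initial[0], 1.0)
        if answer == "no":
            # Assumption: a rejected value is discarded.
            return None
        return initial
    # Implicit confirmation (or plain request): a newly heard value
    # overwrites the old one; otherwise the old value is kept.
    if heard is not None:
        return heard
    return initial
```

Because only one value is tracked, a single misrecognized response can silently replace a correct value, which is the weakness the rest of the talk addresses.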

  8. outline
     • introduction
     • current solutions
     • approach
     • experimental results
     • effects on global performance
     • conclusion and future work

  9. belief updating: problem statement
     S: traveling to Seoul… What day did you need to travel?
     U: [THE TRAVELING to berlin P_M] / 0.35
     arrival = {Seoul / 0.65} → f → arrival = ?
     • given
       • an initial belief B_initial(C) over concept C
       • a system action SA(C)
       • a user response R
     • construct an updated belief
       • B_updated(C) ← f(B_initial(C), SA(C), R)

  10. belief representation: B_updated(C) ← f(B_initial(C), SA(C), R)
      • probability distribution over the set of possible values
        • e.g. departure: YUMA, AZ; ALPINE, TX; ALPENA, MI; ALBANY, NY; ABILENE, TX; ALLIANCE, NE; ABERDEEN, TX; ALLAKAKET, AK; ALLENTOWN, PA; ALEXANDRIA, LA; ALBUQUERQUE, NM
      • however
        • system “hears” only a small number of conflicting values for a concept throughout a session
        • max = 3 conflicting values heard

  11. belief representation: B_updated(C) ← f(B_initial(C), SA(C), R)
      • compressed belief representation
        • k hypotheses + other
        • dynamically add and drop hypotheses
        • remember m hypotheses, add n new ones (m + n = k)
      • B…(C) is a multinomial variable of degree k+1
      example: departure_city [k=3, m=2, n=1]
        hypothesis slots as shown on the slide, in order: {Austin, Houston, Boston, other}; {Boston, Austin, other}; new: Ø, Aspen; {Boston, Aspen, other}
        S: Did you say you were flying from Austin?
        U: [NO ASPEN]
        S: flying from Aspen… what is your destination?
        U: [NO NO I DIDN’T THAT THAT]
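
The "remember m, add n" bookkeeping on this slide can be sketched as below. This is only an illustration under stated assumptions: the function name is hypothetical, the probabilities in the usage note are invented, and the rule "remember the m most probable hypotheses" is an assumption about how hypotheses are chosen. The probabilities of the resulting slots would come from the learned update model, not from this function.

```python
from typing import Dict, List


def select_hypotheses(belief: Dict[str, float], heard: List[str],
                      m: int = 2, n: int = 1) -> List[str]:
    """Pick the k = m + n hypothesis slots (plus "other") for the next update."""
    # Named hypotheses only; "other" is always kept as the catch-all slot.
    named = {value: p for value, p in belief.items() if value != "other"}
    # Assumption: remember the m most probable existing hypotheses.
    remembered = sorted(named, key=named.get, reverse=True)[:m]
    # Add up to n values from the user response that are not already tracked.
    new = [value for value in heard if value not in remembered][:n]
    return remembered + new + ["other"]
```

For example, with an invented belief `{"Austin": 0.5, "Boston": 0.25, "Houston": 0.15, "other": 0.1}` and the response `[NO ASPEN]`, calling `select_hypotheses(belief, ["Aspen"])` keeps Austin and Boston, drops Houston, and adds Aspen.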

  12. system action: B_updated(C) ← f(B_initial(C), SA(C), R)

  13. user response: B_updated(C) ← f(B_initial(C), SA(C), R)

  14. approach: B_updated(C) ← f(B_initial(C), SA(C), R)
      • multinomial regression problem
        • multinomial generalized linear model
        • sample efficient
        • stepwise approach → feature selection
      • one separate model for each system action
        • B_updated(C) ← f_SA(C)(B_initial(C), R)
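
A multinomial logistic model of this kind, with one weight set per system action, might look roughly like the sketch below. All names, features, and weights here are hypothetical and chosen only for illustration; the sketch also assumes that the first slot acts as the reference class with its score fixed at 0, which the slide does not state.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Turn arbitrary scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def update_belief(weights: Dict[str, Dict[str, float]],
                  features: Dict[str, float],
                  slots: List[str]) -> Dict[str, float]:
    """Score each slot as a weighted sum of features, then normalize.

    Slots without weights (here, the reference slot) get a score of 0.
    """
    scores = []
    for slot in slots:
        w = weights.get(slot, {})
        scores.append(sum(w.get(name, 0.0) * value
                          for name, value in features.items()))
    return dict(zip(slots, softmax(scores)))


# One separate (hypothetical) model per system action.
MODELS = {
    "explicit_confirm": {
        "new_1": {"bias": -1.0, "answer_no": 3.0},
        "other": {"bias": -2.0, "answer_no": 1.0},
    },
}

SLOTS = ["initial_top", "new_1", "other"]
after_no = update_belief(MODELS["explicit_confirm"],
                         {"bias": 1.0, "answer_no": 1.0}, SLOTS)
after_yes = update_belief(MODELS["explicit_confirm"],
                          {"bias": 1.0, "answer_no": 0.0}, SLOTS)
```

With these made-up weights, a "no" response shifts most of the probability mass to the newly heard hypothesis, while any other response leaves the initial top hypothesis most likely.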

  15. outline
      • introduction
      • current solutions
      • approach
      • experimental results
      • effects on global performance
      • conclusion and future work

  16. data
      • RoomLine
        • conference room reservations
        • explicit and implicit confirmations
      • user study
        • 46 participants
        • 10 scenario-based interactions each
      • corpus
        • 449 sessions, 8848 user turns
        • transcribed & annotated
          • misunderstandings, corrections, correct concept values

  17. model performance
      [figure: error rates in four panels (explicit confirm, implicit confirm, request, no action), one bar per condition]
      • initial baseline (i) [error before update]
      • heuristic baseline (h) [error after heuristic update]
      • correction baseline (c) [error if we had perfect correction detection]
      • model (M) [k=2, all features]
      bar values as extracted (assignment to individual bars not recoverable):
      • explicit confirm / implicit confirm panels (bars i, h, M, c): 30.8, 30.3, 26.0, 21.5, 16.1, 15.0, 6.2, 5.0
      • request / no action panels (bars i, h, M): 98.2, 79.7, 44.8, 9.5, 5.7, 14.8

  18. outline
      • introduction
      • current solutions
      • approach
      • experimental results
      • effects on global performance
      • conclusion and future work

  19. a new user study …
      • implemented models in the system
      • 2nd, between-subjects experiment
        • control: using heuristic update rules
        • treatment: using belief updating models
      • 40 participants, non-native users
        • improvements more likely at high word-error-rates

  20. effect on task success
      • logistic ANOVA on task success (p=0.009)
        logit(TaskSuccess) ← 2.09 - 0.05∙WER + 0.69∙Condition
      [figure: probability of task success (0% to 100%) vs. word error rate (0% to 100%), treatment and control curves; marked points: control 78% at 16% word error rate; treatment 78% and control 64% at 30% word error rate]
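
As a quick check of the fitted model on this slide, the sketch below evaluates it at the two word error rates marked on the plot. It assumes that WER is expressed in percentage points and that Condition is coded 1 for treatment and 0 for control; neither coding is stated on the slide, and the function name is made up for illustration.

```python
import math


def p_task_success(wer_percent: float, treatment: bool) -> float:
    """Probability of task success under the logistic model on the slide."""
    condition = 1.0 if treatment else 0.0  # assumed coding
    logit = 2.09 - 0.05 * wer_percent + 0.69 * condition
    return 1.0 / (1.0 + math.exp(-logit))


control_30 = p_task_success(30, treatment=False)
treatment_30 = p_task_success(30, treatment=True)
control_16 = p_task_success(16, treatment=False)

# WER change that has the same effect on the logit as the treatment.
wer_equivalent = 0.69 / 0.05
```

Under these assumptions the model gives about 64% for control and 78% for treatment at 30% WER, and about 78% for control at 16% WER, in line with the points on the plot. The ratio of the two coefficients suggests the treatment is worth roughly 14 percentage points of word error rate in terms of task success.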

  21. how about efficiency?
      • ANOVA on task duration for successful tasks (p=0.0003)
        Duration ← -0.21 + 0.013∙WER - 0.106∙Condition
      • significant improvement
      • equivalent to 7.9% absolute reduction in word-error

  22. outline
      • introduction
      • current solutions
      • approach
      • experimental results
      • effects on global performance
      • conclusion and future work

  23. summary
      U: [CHICAGO] / 0.72
      S: traveling from Chicago. Where would you like to go?
      U: [SEOUL] / 0.65
      S: traveling to Seoul… What day did you need to travel?
      U: [THE TRAVELING to berlin P_M] / 0.35
      S: traveling in the afternoon. Okay, what day would you be departing Chicago?
      beliefs: departure = { … }, arrival = {Seoul / 0.65} → f → arrival = ?
      • approach for constructing accurate beliefs
      • integrate information across multiple turns
      • significant gains in task success and efficiency

  24. other advantages
      • learns from data
        • tuned to the domain in which it operates
      • sample efficient / scalable
        • local one-turn optimization, concepts are independent
        • RoomLine operates with 29 concepts
        • cardinality: 2 → several hundreds
      • portable
        • decoupled from dialog task specification
        • no assumptions about dialog management

  25. future work
      • integrate information from n-best list
      • integrate other high-level knowledge
        • domain-specific constraints
        • inter-concept dependencies
      • investigate technique in other domains

  26. thank you! questions …

  27. improvements at different WER
      [figure: absolute improvement in task success vs. word-error-rate]

  28. user study
      • 10 scenarios, fixed order
      • presented graphically (explained during briefing)
      • participants compensated per task success

  29. informative features
      • priors and confusability
      • initial confidence scores
      • concept identity
      • barge-in
      • expectation match
      • repeated grammar slots

  30. Models (k=2, runtime features)
      # The model for the explicit confirm action
      LR_MODEL(EC)                  new_1   other
      k                         =  -15.96    3.61
      answer_type[YES]          =  -12.67   -5.90
      answer_type[NO]           =    4.55    3.15
      answer_type[OTHER]        =    1.20   -0.75
      concept_id(equip)         =    6.96    4.42
      i_th_confusability        =   -3.67   -4.80
      ih_diff_lexical_one_word  =  -15.99   -1.17
      lexw1[SMALL]              =   17.63   20.26
      response_new_hyps_in_selh =   18.85    0.41
      END

  31. Models (k=2, runtime features)
      # The model for the implicit confirm action
      LR_MODEL(IC)                  new_1   other
      mark_confirm              =    0.31   -1.74
      mark_disconfirm           =    3.39    1.57
      i_th_conf                 =    0.39   -3.63
      i_th_confusability        =   -4.17   -4.54
      k                         =  -16.83    3.75
      lex[THREE]                =   -2.25   -2.68
      response_new_hyps_in_selh =   20.88    1.70
      turn_number               =    0.01    0.03
      END

  32. Models (k=2, runtime features)
      # The model for the request action
      LR_MODEL(REQ)                          new_1   other
      k                                  =   -0.78    3.56
      barge_in                           =   -2.07   -1.40
      concept_id(date)                   =   11.29    9.80
      concept_id(user_name)              =    1.93  -13.91
      dialog_state[RequestSpecificTimes] =   13.29   14.26
      ih_diff_lexical                    =   -1.54    0.17
      initial_num_hyps_>_0               =  -21.70   -2.71
      total_num_parses                   =   -1.06   -0.40
      ur_selh_new_1_conf                 =    4.09    1.76
      ur_selh_new_1_confusability        =    5.81    1.70
      ur_selh_new_1_prior                =    0.67    0.98
      ur_selh_new_1_prior_>_1            =   -1.00   -6.38
      END
