a “k-hypotheses + other” belief updating model
Dan Bohus, Alex Rudnicky
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
acknowledgements: Tim Paek, Eric Horvitz (Microsoft Research)
motivation
• spoken language interfaces are still very brittle [Parade, Sunday, March 26]
misunderstandings
(recognized user input in brackets, followed by its confidence score; what the user actually said in parentheses)
S: What city are you leaving from?
U: [CHICAGO] / 0.72 (Chicago)
S: traveling from Chicago. Where would you like to go?
U: [SEOUL] / 0.65 (Huntsville)
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35 (no no I’m traveling to Birmingham)
S: traveling in the afternoon… okay, what day would you be departing Chicago?
U: [AT THE TENTH OF AUGUST] / 0.58 (the tenth of August)
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: [flight destination mr WEEKEND] / 0.28 (my destination is Birmingham)
S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm arrives Seoul at ……
arrival = {Seoul / 0.65}
misunderstandings
[same dialog as on the previous slide, annotated with the beliefs held after each user turn]
• turn 1 (/ 0.72): arrival = { … }, departure = { … }
• turn 2 (/ 0.65): departure = { … }, arrival = {Seoul / 0.65}
• turn 3 (/ 0.35): departure = { … }, arrival = ? (to be computed by f)
• turn 4 (/ 0.58): arrival = { … }, departure = { … }
• turn 5 (/ 0.28): arrival = { … }, departure = { … }
belief updating: problem statement
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
arrival = {Seoul / 0.65} → f → arrival = ?
• given
  • an initial belief B_initial(C) over concept C
  • a system action SA(C)
  • a user response R
• construct an updated belief
  • B_updated(C) ← f(B_initial(C), SA(C), R)
outline
• introduction
• current solutions
• approach
• experimental results
• effects on global performance
• conclusion and future work
intro : current solutions : approach : experimental results : global performance : conclusion
current solutions
• confidence scores / detecting misunderstandings [Cox, Chase, Bansal, Hazen, Ravishankar, Walker, San-Segundo, Bohus]
• detecting corrections [Litman, Swerts, Hirschberg, Krahmer, Levow]
S: traveling from Chicago. Where would you like to go?
U: [SEOUL] / 0.65
S: traveling to Seoul… what day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
arrival = {Seoul / 0.65} → f → arrival = ?
• track single values
• use simple heuristic belief updating rules
  • explicit confirmations
    • yes / no
  • implicit confirmations
    • new values overwrite old values
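The heuristic rules listed on this slide can be sketched roughly as follows. This is only an illustration: the function name `heuristic_update`, the action labels and the layout of the `response` dictionary are hypothetical, and the handling of cases the slide does not mention (for example an unusable answer to an explicit confirmation) is an assumption.

```python
from typing import Optional, Tuple

# A single tracked value with its confidence, or None when nothing is known.
Belief = Optional[Tuple[str, float]]


def heuristic_update(belief: Belief, action: str, response: dict) -> Belief:
    """Single-value heuristic update (illustrative only).

    response keys are hypothetical: 'answer' in {'yes', 'no', None},
    'new_value' / 'new_conf' for a value heard in this turn.
    """
    new_value = response.get("new_value")
    new_conf = response.get("new_conf", 0.0)

    if action == "explicit_confirm":
        if response.get("answer") == "yes":
            return belief        # value confirmed: keep it
        if response.get("answer") == "no":
            return None          # value rejected: drop it
        return belief            # assumption: no usable answer, no change

    if action == "implicit_confirm":
        if new_value is not None:
            return (new_value, new_conf)  # new values overwrite old values
        return belief

    # Assumption: for any other action, take the new value if one was heard.
    if new_value is not None:
        return (new_value, new_conf)
    return belief


# A low-confidence misrecognition overwrites the earlier value under this rule.
updated = heuristic_update(
    ("Seoul", 0.65), "implicit_confirm", {"new_value": "Berlin", "new_conf": 0.35}
)
```

Because only one value is tracked, whatever was heard last wins regardless of its confidence, which is the limitation the k-hypotheses representation is meant to address.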
belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
• probability distribution over the set of possible values
  • departure: ABERDEEN, TX; ABILENE, TX; ALBANY, NY; ALBUQUERQUE, NM; ALEXANDRIA, LA; ALLAKAKET, AK; ALLENTOWN, PA; ALLIANCE, NE; ALPENA, MI; ALPINE, TX; … YUMA, AZ
• however
  • system “hears” only a small number of conflicting values for a concept throughout a session
  • max = 3 conflicting values heard
belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
• compressed belief representation
  • k hypotheses + other
  • dynamically add and drop hypotheses
  • remember m hypotheses, add n new ones (m+n=k)
• B…(C) is a multinomial variable of degree k+1
example: departure_city [k=3, m=2, n=1]
S: Did you say you were flying from Austin?
U: [NO ASPEN]
S: flying from Aspen… what is your destination?
U: [NO NO I DIDN’T THAT THAT]
[figure: hypothesis labels shown, in order: Austin, Houston, other, Boston → Boston, Austin, other, Ø, Aspen → Boston, Aspen, other]
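A minimal sketch of the "remember m, add n" bookkeeping described on this slide, assuming a belief is stored as a dictionary from value to probability with a reserved `other` key. The function name, the probabilities in the usage example, and the choice to fold dropped mass into `other` and to start new hypotheses at zero mass are all assumptions made for illustration; in the model itself the updated probabilities come from the learned function f.

```python
from typing import Dict, Iterable

OTHER = "other"


def update_hypothesis_set(
    belief: Dict[str, float],
    new_values: Iterable[str],
    m: int,
    n: int,
) -> Dict[str, float]:
    """Keep the m most probable named hypotheses and add up to n new ones.

    Assumption: mass of dropped hypotheses is folded into 'other', and new
    hypotheses start at 0.0 (a learned model would assign their mass).
    """
    named = {v: p for v, p in belief.items() if v != OTHER}
    ranked = sorted(named.items(), key=lambda kv: kv[1], reverse=True)
    kept = dict(ranked[:m])
    dropped_mass = sum(p for v, p in named.items() if v not in kept)

    result = dict(kept)
    added = 0
    for value in new_values:
        if added == n:
            break
        if value not in result:
            result[value] = 0.0
            added += 1
    result[OTHER] = belief.get(OTHER, 0.0) + dropped_mass
    return result


# Hypothetical numbers for the departure_city example (k=3, m=2, n=1).
before = {"Austin": 0.5, "Boston": 0.3, "Houston": 0.1, OTHER: 0.1}
after = update_hypothesis_set(before, ["Aspen"], m=2, n=1)
```

The result always has at most k named hypotheses plus `other`, so the belief stays a multinomial of degree k+1 however many values the concept can take.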
system action
B_updated(C) ← f(B_initial(C), SA(C), R)
user response
B_updated(C) ← f(B_initial(C), SA(C), R)
approach
B_updated(C) ← f(B_initial(C), SA(C), R)
• multinomial regression problem
• multinomial generalized linear model
  • sample efficient
  • stepwise approach: feature selection
• one separate model for each system action
  • B_updated(C) ← f_SA(C)(B_initial(C), R)
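As a rough illustration of the approach on this slide, a multinomial logistic model can be written as a softmax over linear scores, with one set of coefficients per system action. Everything concrete here is hypothetical: the feature names, the weights, the `MODELS` table, and the use of a `reference` outcome with a fixed score of 0 are assumptions for the sketch, not the fitted RoomLine models.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multinomial_update(
    features: Dict[str, float],
    weights: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Multinomial logistic model with one reference outcome.

    weights maps each non-reference outcome to {feature: coefficient};
    the reference outcome has an implicit linear score of 0.
    """
    outcomes = ["reference"] + list(weights)
    scores = [0.0] + [
        sum(coef * features.get(name, 0.0) for name, coef in w.items())
        for w in weights.values()
    ]
    return dict(zip(outcomes, softmax(scores)))


# One model per system action (hypothetical weights).
MODELS = {
    "explicit_confirm": {
        "new_1": {"bias": -1.0, "answer_no": 2.0},
        "other": {"bias": -0.5, "answer_no": 1.5},
    },
}

belief = multinomial_update(
    {"bias": 1.0, "answer_no": 1.0}, MODELS["explicit_confirm"]
)
```

Selecting the coefficient table by system action corresponds to the f_SA(C) form on the slide: the action picks the model, and the initial belief and user response supply the features.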
data
• RoomLine
  • conference room reservations
  • explicit and implicit confirmations
• user study
  • 46 participants
  • 10 scenario-based interactions each
• corpus
  • 449 sessions, 8848 user turns
  • transcribed & annotated
    • misunderstandings, corrections, correct concept values
model performance
Model (M) [k=2, all features]
[figure: error rates (%) in four panels: explicit confirm, implicit confirm, request, no action]
• initial baseline (i) [error before update]
• heuristic baseline (h) [error after heuristic update]
• model (M)
• correction baseline (c) [error if we had perfect correction detection], shown for explicit confirm and implicit confirm only
[bar values as extracted; their assignment to individual bars is not recoverable: explicit / implicit confirm: 30.8, 30.3, 26.0, 21.5, 16.1, 15.0, 6.2, 5.0; request / no action: 98.2, 79.7, 44.8, 9.5, 5.7, 14.8]
a new user study …
• implemented models in the system
• 2nd, between-subjects experiment
  • control: using heuristic update rules
  • treatment: using belief updating models
• 40 participants, non-native users
  • improvements more likely at high word-error-rates
effect on task success
• logistic ANOVA on task success, p=0.009
  logit(TaskSuccess) ← 2.09 - 0.05∙WER + 0.69∙Condition
[figure: probability of task success (0% to 100%) vs. word error rate (0% to 100%), one curve each for treatment and control; marked points: treatment 78% and control 64% at 30% word error rate, control 78% at 16% word error rate]
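The fitted equation on this slide can be evaluated directly to reproduce the percentages marked in the figure. The sketch below assumes WER is expressed in percentage points and Condition is coded 1 for treatment and 0 for control; the slide does not state the coding, but this choice matches the marked values. The function name is hypothetical.

```python
import math


def task_success_probability(wer_percent: float, treatment: bool) -> float:
    """Evaluate logit(TaskSuccess) = 2.09 - 0.05*WER + 0.69*Condition."""
    condition = 1.0 if treatment else 0.0
    logit = 2.09 - 0.05 * wer_percent + 0.69 * condition
    return 1.0 / (1.0 + math.exp(-logit))


control_30 = task_success_probability(30, treatment=False)
treatment_30 = task_success_probability(30, treatment=True)
control_16 = task_success_probability(16, treatment=False)

# Prints: 64% 78% 78%
print(f"{control_30:.0%} {treatment_30:.0%} {control_16:.0%}")
```

Read this way, the treatment condition at 30% word error rate reaches about the same predicted task success as the control condition at 16% word error rate.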
how about efficiency?
• ANOVA on task duration for successful tasks
  Duration ← -0.21 + 0.013∙WER - 0.106∙Condition
• significant improvement (p=0.0003)
• equivalent to 7.9% absolute reduction in word-error
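One way to read the word-error equivalence is as the ratio of the Condition coefficient to the WER coefficient: the number of WER points whose effect on duration equals the treatment effect. The sketch below uses the rounded coefficients printed on the slide, so it lands near, not exactly on, the 7.9% quoted; attributing the gap to coefficient rounding is an assumption.

```python
WER_COEF = 0.013         # change in Duration per point of WER (from the slide)
CONDITION_COEF = -0.106  # change in Duration under the treatment condition

# WER reduction (in points) with the same effect on duration as the treatment.
wer_equivalent = -CONDITION_COEF / WER_COEF

# Prints: 8.2
print(f"{wer_equivalent:.1f}")
```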
summary
U: [CHICAGO] / 0.72
S: traveling from Chicago. Where would you like to go?
U: [SEOUL] / 0.65
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
S: traveling in the afternoon. Okay, what day would you be departing Chicago?
beliefs after each turn: departure = { … }, arrival = { … }
arrival = {Seoul / 0.65} → f → arrival = ?
• approach for constructing accurate beliefs
• integrate information across multiple turns
• significant gains in task success and efficiency
other advantages
• learns from data
  • tuned to the domain in which it operates
• sample efficient / scalable
  • local one-turn optimization, concepts are independent
  • RoomLine operates with 29 concepts
  • cardinality: from 2 to several hundred
• portable
  • decoupled from dialog task specification
  • no assumptions about dialog management
future work
• integrate information from n-best list
• integrate other high-level knowledge
  • domain-specific constraints
  • inter-concept dependencies
• investigate technique in other domains
improvements at different WER
[figure: absolute improvement in task success vs. word-error-rate]
user study
• 10 scenarios, fixed order
• presented graphically (explained during briefing)
• participants compensated per task success
informative features
• priors and confusability
• initial confidence scores
• concept identity
• barge-in
• expectation match
• repeated grammar slots
Models (k=2, runtime features)
# The model for the explicit confirm action
                              new_1    other
LR_MODEL(EC)
k                         = -15.96     3.61
answer_type[YES]          = -12.67    -5.90
answer_type[NO]           =   4.55     3.15
answer_type[OTHER]        =   1.20    -0.75
concept_id(equip)         =   6.96     4.42
i_th_confusability        =  -3.67    -4.80
ih_diff_lexical_one_word  = -15.99    -1.17
lexw1[SMALL]              =  17.63    20.26
response_new_hyps_in_selh =  18.85     0.41
END
Models (k=2, runtime features)
# The model for the implicit confirm action
                              new_1    other
LR_MODEL(IC)
mark_confirm              =   0.31    -1.74
mark_disconfirm           =   3.39     1.57
i_th_conf                 =   0.39    -3.63
i_th_confusability        =  -4.17    -4.54
k                         = -16.83     3.75
lex[THREE]                =  -2.25    -2.68
response_new_hyps_in_selh =  20.88     1.70
turn_number               =   0.01     0.03
END
Models (k=2, runtime features)
# The model for the request action
                                        new_1    other
LR_MODEL(REQ)
k                                   =  -0.78     3.56
barge_in                            =  -2.07    -1.40
concept_id(date)                    =  11.29     9.80
concept_id(user_name)               =   1.93   -13.91
dialog_state[RequestSpecificTimes]  =  13.29    14.26
ih_diff_lexical                     =  -1.54     0.17
initial_num_hyps_>_0                = -21.70    -2.71
total_num_parses                    =  -1.06    -0.40
ur_selh_new_1_conf                  =   4.09     1.76
ur_selh_new_1_confusability         =   5.81     1.70
ur_selh_new_1_prior                 =   0.67     0.98
ur_selh_new_1_prior_>_1             =  -1.00    -6.38
END