
Statistical Spoken Dialogue System Talk 2 – Belief tracking

Statistical Spoken Dialogue System Talk 2 – Belief tracking. CLARA Workshop. Presented by Blaise Thomson, Cambridge University Engineering Department, brmt2@eng.cam.ac.uk, http://mi.eng.cam.ac.uk/~brmt2.



Presentation Transcript

  1. Statistical Spoken Dialogue System Talk 2 – Belief tracking • CLARA Workshop • Presented by Blaise Thomson, Cambridge University Engineering Department, brmt2@eng.cam.ac.uk, http://mi.eng.cam.ac.uk/~brmt2

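  2. Human-machine spoken dialogue • Typical structure of a spoken dialogue system (Figure: the user's speech waveforms go to the Recognizer, which outputs words such as "I want a restaurant"; the Semantic Decoder turns these into a user dialog act such as inform(type=restaurant); the Dialog Manager chooses a system dialog act such as request(food); the Message Generator turns this into words such as "What kind of food do you want?"; and the Synthesizer produces waveforms for the user.)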

  3. Spoken Dialogue Systems – State of the art

  4. Outline • Introduction • An example user model (spoken dialogue model) • The Partially Observable Markov Decision Process (POMDP) • POMDP models for dialogue systems • POMDP models for off-line experiments • POMDP models for simulating • Inference • Belief propagation (Fixed parameters) • Expectation Propagation (Learning parameters) • Optimisations • Results

  5. Intro – An example user model? • Partially Observable Markov Decision Process (POMDP) • Probabilistic model of what the user will say • Variables: • Dialogue state, s_t (e.g. the user wants a restaurant) • System action, a_t (e.g. “What type of food?”) • Observation of what was said, o_t (e.g. an N-best semantic list) • Assumes an Input-Output Hidden Markov structure (Figure: state chain s_1, s_2, ..., s_T with system actions a_1, ..., a_T as inputs and observations o_1, ..., o_T as outputs.)

  6. Intro – Simplifying the POMDP user model • Typically split the dialogue state, s_t (Figure: the input-output chain of actions a_1, ..., a_T, states s_1, ..., s_T and observations o_1, ..., o_T.)

  7. Intro – Simplifying the POMDP user model • Typically split the dialogue state, s_t, into: • True user goal, g_t • True user act, u_t (Figure: chain with actions a_1, ..., a_T, goals g_1, ..., g_T, user acts u_1, ..., u_T and observations o_1, ..., o_T.)
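
With the state split this way, the model for a whole dialogue can be written as a product of per-turn factors. A minimal sketch, assuming (as the later slides do) that the goal at turn t depends on the previous goal and the system action, that the user act depends on the goal and the system action, and that the observation depends only on the user act; the factor for t = 1 is read as the goal prior p(g_1 | a_1):

```latex
p(g_{1:T}, u_{1:T}, o_{1:T} \mid a_{1:T})
  = \prod_{t=1}^{T} p(g_t \mid g_{t-1}, a_t)\, p(u_t \mid g_t, a_t)\, p(o_t \mid u_t)
```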

  8. Intro – Simplifying the POMDP user model • Further split the goal, g_t, into sub-goals g_{t,c} • e.g. User wants a Chinese restaurant → food=Chinese, type=restaurant (Figure: g_t split into sub-goals g_{t,type}, g_{t,area}, g_{t,stars} and g_{t,food}.)

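  9. Intro – Simplifying the POMDP user model (Figure: two-turn network with goals G, G’ and sub-goals g_type, g_food, g’_type, g’_food; system actions a, a’; user acts U, U’ with sub-acts u_type, u_food, u’_type, u’_food; and observations o, o’.)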

  10. Intro – POMDP models for dialogue systems • How can I help you? • I’m looking for a beer [0.5] • I’m looking for a bar [0.4] • Sorry, what did you say? • bar [0.3] • bye [0.3] • When decisions are based on probabilistic user goals: Partially Observable Markov Decision Processes (POMDPs) (Figure: beliefs over Beer, Bar and Bye after each turn.)

  11. Intro – POMDP models for dialogue systems

  12. Intro – belief model for dialogue systems • Choose actions according to beliefs in the goal instead of the most likely hypothesis • More robust – some key reasons: • Full hypothesis list • User model (Figure: beliefs over Beer, Bar and Bye, with the system action confirm(beer).)
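
To see why keeping the full hypothesis list helps, here is a minimal sketch of a belief update over the beer/bar/bye dialogue from slide 10. It assumes, for illustration only, that the goal does not change between the two turns, that each N-best confidence can be used directly as p(o | g), and that a goal missing from a turn's list gets a hypothetical floor likelihood FLOOR; none of these choices come from the talk.

```python
# Assumptions for illustration only (not from the talk):
#  - the user goal stays the same across both turns;
#  - each N-best confidence is used directly as p(o | g) for that goal;
#  - a goal missing from a turn's N-best list gets a small floor likelihood.
GOALS = ["beer", "bar", "bye"]
FLOOR = 0.05  # hypothetical likelihood for goals not on the N-best list


def update(belief, nbest):
    """Bayes update of the goal belief with one turn's N-best list."""
    unnorm = {g: belief[g] * nbest.get(g, FLOOR) for g in GOALS}
    total = sum(unnorm.values())
    return {g: p / total for g, p in unnorm.items()}


turns = [
    {"beer": 0.5, "bar": 0.4},  # "I'm looking for a beer" / "... a bar"
    {"bar": 0.3, "bye": 0.3},   # "bar" / "bye"
]
belief = {g: 1 / len(GOALS) for g in GOALS}  # start completely unsure
for nbest in turns:
    belief = update(belief, nbest)
print(belief)
```

Under these assumptions bar ends with a belief of 0.75, even though beer led the first turn and bar only tied with bye in the second.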

  13. Intro – POMDP models for off-line experiments • How can I help you? • I’m looking for a beer • I’m looking for a bar • Sorry, what did you say? • bar • bye (Figure: scores [0.5] [0.4] [0.2] [0.7] and [0.3] [0.3] [0.5] [0.1], shown against beliefs over Beer, Bar and Bye for each turn.)

  14. Intro – POMDP models for simulation • Often useful to be able to simulate how people behave: • For reinforcement learning • For testing a given system • In theory, simply generate from the POMDP user model (Figure: goal G with g_type = restaurant and g_food = Chinese; given the system action silence(), the user act U (u_type, u_food) is generated as inform(type=restaurant).)
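
A minimal sketch of "generating from the user model" by ancestral sampling: first a goal, then a user act given the goal and the system action, then a noisy observation of that act. The slot values, the probabilities P_MENTION and P_CONFUSE, and all function names are hypothetical; only the inform(...) act notation follows the slides.

```python
import random

# Hypothetical parameters, not taken from the talk.
P_MENTION = 0.7   # chance an unrequested goal slot is mentioned
P_CONFUSE = 0.2   # chance the simulated recogniser substitutes a value
SLOT_VALUES = {"type": ["restaurant", "bar", "hotel"],
               "food": ["Chinese", "Indian", "Italian"]}


def sample_goal(rng):
    """Sample a goal g from a uniform prior over each slot's values."""
    return {slot: rng.choice(vals) for slot, vals in SLOT_VALUES.items()}


def sample_user_act(rng, goal, system_act):
    """Sample a user act u given g and a: answer a request, else mention slots."""
    if system_act.startswith("request(") and system_act.endswith(")"):
        slots = [system_act[len("request("):-1]]
    else:
        slots = [s for s in goal if rng.random() < P_MENTION] or [next(iter(goal))]
    return {s: goal[s] for s in slots}


def sample_observation(rng, user_act):
    """Sample an observation o by occasionally swapping in a wrong value."""
    obs = {}
    for slot, val in user_act.items():
        if rng.random() < P_CONFUSE:
            val = rng.choice([v for v in SLOT_VALUES[slot] if v != val])
        obs[slot] = val
    return obs


def as_act(pairs):
    return "inform(" + ",".join(f"{s}={v}" for s, v in pairs.items()) + ")"


rng = random.Random(0)
goal = sample_goal(rng)
u = sample_user_act(rng, goal, "silence()")
o = sample_observation(rng, u)
print(goal, as_act(u), as_act(o))
```

A real user simulator would learn or hand-craft these conditional distributions; the point here is only the order of sampling, which mirrors the order of the variables in the model.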

  15. An example – voicemail • We have a voicemail system with 2 possible user goals: • g = SAVE: the user wants to save • g = DEL: the user wants to delete • In each turn until we save or delete, we observe one of two things: • o = OSAVE: the user said save • o = ODEL: the user said delete • We assume that the goal can change between turns, and for the moment we only look at two turns • We start by being completely unsure what the user wants

  16. An example – exercise • Observation probability: P(o | g) • If we observe the user saying they want to save, what is the probability that they want to save? i.e. P(g1 | o1 = OSAVE) • Use Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
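
The slide's table of observation probabilities did not survive in the transcript, so the numbers below are hypothetical. The sketch applies Bayes' theorem directly: multiply P(o | g) by the uniform prior ("completely unsure") and divide by P(o).

```python
# Hypothetical observation model P(o | g): the slide's table did not survive.
P_OBS = {
    "SAVE": {"OSAVE": 0.8, "ODEL": 0.2},
    "DEL": {"OSAVE": 0.3, "ODEL": 0.7},
}
# "We start by being completely unsure": a uniform prior over goals.
PRIOR = {"SAVE": 0.5, "DEL": 0.5}


def posterior(obs):
    """P(g1 | o1 = obs) via Bayes' theorem."""
    joint = {g: P_OBS[g][obs] * PRIOR[g] for g in PRIOR}  # P(o | g) P(g)
    evidence = sum(joint.values())                        # P(o)
    return {g: p / evidence for g, p in joint.items()}


print(posterior("OSAVE"))
```

With these numbers P(g1 = SAVE | o1 = OSAVE) = 0.4 / 0.55 = 8/11, about 0.73.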

  17. An example – exercise • Observation probability: P(o | g) • Transition probability: P(g’ | g) • If we observe the user saying they want to save and then saying they want to delete, what is the probability that they want to save in the second turn? i.e. what is P(g2 | o1 = OSAVE, o2 = ODEL)?

  18. An example – answer
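
The worked answer on this slide did not survive extraction. A sketch of the calculation with hypothetical observation and transition probabilities: the joint P(g2, o1, o2) is a sum over the hidden first-turn goal g1, and normalising it gives the second-turn belief.

```python
# Hypothetical observation model P(o | g) and transition model P(g' | g);
# the goal usually persists from one turn to the next.
P_OBS = {
    "SAVE": {"OSAVE": 0.8, "ODEL": 0.2},
    "DEL": {"OSAVE": 0.3, "ODEL": 0.7},
}
P_TRANS = {
    "SAVE": {"SAVE": 0.9, "DEL": 0.1},
    "DEL": {"SAVE": 0.1, "DEL": 0.9},
}
PRIOR = {"SAVE": 0.5, "DEL": 0.5}
GOALS = list(PRIOR)


def second_turn_posterior(o1, o2):
    """P(g2 | o1, o2), summing the joint over the hidden first-turn goal g1."""
    joint = {
        g2: sum(PRIOR[g1] * P_OBS[g1][o1] * P_TRANS[g1][g2] for g1 in GOALS)
        * P_OBS[g2][o2]
        for g2 in GOALS
    }
    total = sum(joint.values())  # P(o1, o2)
    return {g: p / total for g, p in joint.items()}


print(second_turn_posterior("OSAVE", "ODEL"))
```

With these numbers the belief that the user wants to save in the second turn is 0.075 / 0.1975 = 30/79, about 0.38.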

  19. An example – expanding further • In general we will want to compute probabilities conditional on the observations (we will call this the data D). • This always becomes a marginal of the joint distribution with the observation probabilities fixed. • These sums can be computed much more efficiently using dynamic programming
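
A sketch of the difference, in the voicemail setting with hypothetical probabilities (the talk's tables are not in the transcript). The brute-force function sums the joint over every goal sequence, which costs O(|G|^T); the forward recursion reorganises the same sums turn by turn in O(T |G|^2). Both return P(g_T | D).

```python
from itertools import product

# Hypothetical model parameters for the voicemail example.
P_OBS = {"SAVE": {"OSAVE": 0.8, "ODEL": 0.2}, "DEL": {"OSAVE": 0.3, "ODEL": 0.7}}
P_TRANS = {"SAVE": {"SAVE": 0.9, "DEL": 0.1}, "DEL": {"SAVE": 0.1, "DEL": 0.9}}
PRIOR = {"SAVE": 0.5, "DEL": 0.5}
GOALS = list(PRIOR)


def brute_force_last(obs):
    """P(g_T | D) by summing the full joint over every goal sequence."""
    totals = dict.fromkeys(GOALS, 0.0)
    for seq in product(GOALS, repeat=len(obs)):
        p = PRIOR[seq[0]] * P_OBS[seq[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= P_TRANS[seq[t - 1]][seq[t]] * P_OBS[seq[t]][obs[t]]
        totals[seq[-1]] += p
    z = sum(totals.values())
    return {g: v / z for g, v in totals.items()}


def forward_last(obs):
    """The same marginal by dynamic programming (the forward recursion)."""
    alpha = {g: PRIOR[g] * P_OBS[g][obs[0]] for g in GOALS}
    for o in obs[1:]:
        alpha = {g2: sum(alpha[g1] * P_TRANS[g1][g2] for g1 in GOALS) * P_OBS[g2][o]
                 for g2 in GOALS}
    z = sum(alpha.values())
    return {g: v / z for g, v in alpha.items()}


observations = ["OSAVE", "ODEL", "ODEL", "OSAVE"]
print(brute_force_last(observations), forward_last(observations))
```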

  20. Belief Propagation • Interested in the marginals p(x|D) • Assume the network is a tree with observations above and below x: D = {Da, Db} (Figure: node x with observations Da on one side and Db on the other.)
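
The step from the tree assumption to a product of two terms can be written out with Bayes' theorem. The second equality uses the tree structure: once x is known, Db carries no further information from Da.

```latex
p(x \mid D_a, D_b)
  = \frac{p(D_b \mid x, D_a)\, p(x \mid D_a)}{p(D_b \mid D_a)}
  = \frac{p(D_b \mid x)\, p(x \mid D_a)}{p(D_b \mid D_a)}
  \propto p(x \mid D_a)\, p(D_b \mid x)
```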

  21. Belief Propagation • When we split Db = {Dc, Dd}, the terms p(x|Da), p(Dc|x) and p(Dd|x) are called the messages into x. • We have one message for every probability factor connected to x (Figure: node x connected to Da, Dc and Dd.)
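
Splitting Db in the same way gives one term per branch. Assuming Dc and Dd lie in different branches of the tree, so that they are independent given x:

```latex
p(D_b \mid x) = p(D_c, D_d \mid x) = p(D_c \mid x)\, p(D_d \mid x),
\qquad
p(x \mid D) \propto p(x \mid D_a)\, p(D_c \mid x)\, p(D_d \mid x)
```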

  22. Belief Propagation - message passing (Figure: a factor between nodes a and b, with observations Da beyond a and Db beyond b.)

  23. Belief Propagation - message passing (Figure: nodes a, b and c with observations Da, Db and Dc.)

  24. Belief Propagation • We can do the same thing repeatedly. • Start at one end and keep computing p(x|Da) • Then start at the other end and keep computing p(Db|x) • To get a marginal, simply multiply these

  25. Belief Propagation – our example • Write probabilities as vectors with SAVE on top (Figure: chain g1 → g2 with observations o1 and o2.)
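
A sketch of the two-sided message passing on the voicemail chain, with vectors ordered [SAVE, DEL] and hypothetical probabilities (the slide's numbers are not in the transcript). Forward messages carry the data from the start of the chain, backward messages carry the data from the end, and each marginal is their normalised product.

```python
import numpy as np

# Vectors are ordered [SAVE, DEL] ("SAVE on top"); all numbers are hypothetical.
prior = np.array([0.5, 0.5])
like = {"OSAVE": np.array([0.8, 0.3]),   # P(o | g) as a vector over g
        "ODEL": np.array([0.2, 0.7])}
trans = np.array([[0.9, 0.1],            # trans[i, j] = P(g_next = j | g = i)
                  [0.1, 0.9]])

obs = ["OSAVE", "ODEL"]

# Forward messages: proportional to p(g_t | data up to and including turn t).
fwd = [prior * like[obs[0]]]
for o in obs[1:]:
    fwd.append((fwd[-1] @ trans) * like[o])

# Backward messages: p(data after turn t | g_t); nothing follows the last turn.
bwd = [np.ones(2)]
for o in reversed(obs[1:]):
    bwd.insert(0, trans @ (like[o] * bwd[0]))

# Each marginal p(g_t | D) is the normalised product of the two messages.
marginals = [f * b / np.sum(f * b) for f, b in zip(fwd, bwd)]
print(marginals)
```

Under these numbers the smoothed belief that the first-turn goal was SAVE is about 0.51, and the second-turn belief is 30/79 for SAVE.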

  26. Parameter Learning – The problem (Figure: the same two-turn factorized network as slide 9, with goals, user acts, system actions and observations.)

  27. Parameter Learning – The problem • For every (action, goal, goal) triple there is a parameter • The parameters are a probability table of P(g_t | g_{t-1}, a_t) • The goals are all hidden and factorized, and there are many of them • Need to tie parameters • Must allow for factorized hidden variables (Figure: transition factor linking g_{t-1}, a_t and g_t.)
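
A rough count shows why tying is needed. The slot and action sizes below are hypothetical, not those of the talk's systems; the count is one table entry per (action, goal, goal) triple, as on the slide, and the tied alternative is one illustrative scheme with a single probability of change per slot.

```python
# Hypothetical sizes, chosen only to show the scale of the problem.
N_ACTIONS = 20
SLOT_SIZES = {"type": 3, "area": 10, "food": 30, "stars": 5}


def table_entries(n_actions, n_values):
    """One entry of P(g_t | g_{t-1}, a_t) per (action, goal, goal) triple."""
    return n_actions * n_values * n_values


untied = sum(table_entries(N_ACTIONS, n) for n in SLOT_SIZES.values())
# One hypothetical tying scheme: a single probability of change per slot.
tied = len(SLOT_SIZES)
print(untied, tied)
```

Even with these modest sizes the untied tables have 20680 entries, while the tied scheme has 4 parameters to learn.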

  28. Parameter Learning – Some options • Hand-craft • Roy et al, Zhang et al, Young et al, Thomson et al, Bui et al • Annotate user goal and use Maximum Likelihood • Williams et al, Kim et al, Henderson & Lemon • Isn’t always possible • Expectation Maximisation • Doshi & Roy (7 states), Syed et al (no goal changes) • Uses an unfactorised state • Intractable • Expectation Propagation (EP) • Allows parameter tying (details in paper) • Handles factorized hidden variables • Handles large state spaces • Doesn’t require any annotations (including of the user act), though it does use the semantic decoder output

  29. Belief Propagation as message passing (Figure: a factor between nodes a and b, with Da above a and Db below b.) • Message from outside the factor – q\(a): input message from above a • Message from this factor to b – q*(b) • Message from outside the factor – q\(b): product of input messages below b • Message from this factor to a – q*(a)

  30. Belief Propagation as message passing • Think in terms of approximations from each probability factor (Figure: factor between a and b, with messages q\(a), q*(a), q*(b), q\(b) and data Da, Db.) • Message from outside network – q\(a) = p(a|Da) • Message from this factor – q*(b) = p(b|Da) • Message from outside network – q\(b) = p(Db|b) • Message from this factor – q*(a) = p(Db|a)
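
A sketch of the four messages around a single factor p(b | a) between two binary nodes, with hypothetical numbers. It checks that multiplying each node's incoming and outgoing messages reproduces the marginals obtained by summing the full joint.

```python
import numpy as np

# Hypothetical numbers for a single factor p(b | a) between two binary nodes.
q_in_a = np.array([0.7, 0.3])          # q\(a) = p(a | Da), message from above a
factor = np.array([[0.9, 0.1],         # factor[i, j] = p(b = j | a = i)
                   [0.2, 0.8]])
q_in_b = np.array([0.4, 0.6])          # q\(b) = p(Db | b), message from below b

q_out_b = q_in_a @ factor              # q*(b) = sum_a p(a | Da) p(b | a) = p(b | Da)
q_out_a = factor @ q_in_b              # q*(a) = sum_b p(b | a) p(Db | b) = p(Db | a)

# The marginal of each node is the normalised product of its two messages.
post_a = q_in_a * q_out_a / np.sum(q_in_a * q_out_a)
post_b = q_out_b * q_in_b / np.sum(q_out_b * q_in_b)

# Check against brute-force marginals of the joint p(a | Da) p(b | a) p(Db | b).
joint = q_in_a[:, None] * factor * q_in_b[None, :]
assert np.allclose(post_a, joint.sum(axis=1) / joint.sum())
assert np.allclose(post_b, joint.sum(axis=0) / joint.sum())
print(post_a, post_b)
```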

  31. Belief Propagation – Unknown parameters? • Imagine we have a discrete choice for the parameters θ • Integrate over our estimate from the rest of the network: p(b|a) = Σ_θ p(b|a,θ) q\(θ) • To estimate θ, we want to sum over a and b: q*(θ) ∝ Σ_{a,b} q\(a) p(b|a,θ) q\(b)

  32. Belief Propagation – Unknown parameters? • But we actually have continuous parameters θ • Integrate over our estimate from the rest of the network: p(b|a) = ∫ p(b|a,θ) q\(θ) dθ • To estimate θ, we want to sum over a and b: q*(θ) ∝ Σ_{a,b} q\(a) p(b|a,θ) q\(b)

  33. Expectation Propagation • This doesn’t make sense – θ is a probability! • Multiplying by q\(θ) gives: q\(θ) Σ_{a,b} q\(a) p(b|a,θ) q\(b) • Choose q*(θ) to minimize the KL divergence with this • If we restrict ourselves to Dirichlet distributions, we need to find the Dirichlet that best matches a mixture of Dirichlets
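
A sketch of the projection step, simplified to one row θ = P(b | a) of the table for a single known value of a, so that the tilted distribution is q\(θ) Σ_b c_b θ_b with c_b the incoming weight for b. Since θ_b Dir(θ; α) = (α_b / α_0) Dir(θ; α + e_b), this is a mixture of Dirichlets. Exactly minimising the KL divergence would mean matching E[log θ], which needs an iterative solve; this sketch instead uses the simpler moment-matching approximation (match the mean and the summed second moments). The function name and the example numbers are hypothetical.

```python
import numpy as np


def project_mixture(alpha, c):
    r"""Moment-matched Dirichlet for the tilted distribution
    q\(theta) * sum_b c[b] * theta[b], where q\(theta) = Dir(alpha).

    Uses theta_b * Dir(theta; alpha) = (alpha_b / alpha_0) * Dir(theta; alpha + e_b),
    so the tilted distribution is a mixture of Dirichlets.
    """
    alpha = np.asarray(alpha, dtype=float)
    c = np.asarray(c, dtype=float)
    a0 = alpha.sum()
    w = c * alpha / a0
    w = w / w.sum()                                # mixture weights
    comps = alpha[None, :] + np.eye(alpha.size)    # row k holds alpha + e_k
    # Mixture moments: every component has total concentration a0 + 1.
    mean = w @ (comps / (a0 + 1))
    second = w @ (comps * (comps + 1) / ((a0 + 1) * (a0 + 2)))
    # For Dir(s * m): sum_i E[theta_i^2] = (s * sum_i m_i^2 + 1) / (s + 1).
    # Solve for the concentration s that matches the mixture's value.
    s2, m2 = second.sum(), np.sum(mean ** 2)
    s = (1.0 - s2) / (s2 - m2)
    return s * mean


# Prior Dir(1, 1) over one row theta = P(b | a); the evidence favours b = 0.
print(project_mixture([1.0, 1.0], [0.9, 0.1]))
```

When the evidence is certain (c = [1, 0]) the mixture collapses to a single Dirichlet and the projection returns it exactly, here Dir(2, 1). In an EP loop, the factor's new message would then be this projection divided by q\(θ), which for Dirichlets amounts to subtracting parameters.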

  34. Expectation Propagation – Example (Figure: network with parameter node θ, goals g, g’ and sub-goals g_type, g’_type, system actions a, a’, user acts u, u’ and observations o, o’.)

  35. Expectation Propagation – Example (Figure: the same network as slide 34.)

  36. Expectation Propagation – Example (Figure: as slide 35, with observation probabilities p(o|inform(type=bar)) [0.5] and p(o|inform(type=hotel)) [0.2].)

  37. Expectation Propagation – Example (Figure: as slide 36, with user act beliefs inform(type=bar) [0.5] and inform(type=hotel) [0.2].)

  38. Expectation Propagation – Example (Figure: as slide 37, with the factor term p(u=bar|g)0.4 * p(u=hotel|g)0.1 and goal beliefs type=bar [0.45] and type=hotel [0.18].)

  39. Expectation Propagation – Example (Figure: as slide 38, with further goal beliefs type=bar [0.44] and type=hotel [0.17].)

  40. Expectation Propagation – Example (Figure: as slide 39, with second-turn observation probabilities p(o|inform(type=bar)) [0.6] and p(o|inform(type=rest)) [0.3].)

  41. Expectation Propagation – Example (Figure: the same values as slide 40.)

  42. Expectation Propagation – Optimisation 1 • In dialogue systems, most of the values are equally likely • We can use this to reduce computations: • Compute the q distributions only once • Multiply instead of summing the same value repeatedly (Figure: for the utterance “Twee stars please”, probabilities over the number of stars, 1 to 5.)
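
A minimal sketch of the grouping idea, assuming a hypothetical representation in which a slot distribution stores explicit probabilities only for the few values that differ, plus one shared probability for all the rest. Multiplying two such distributions touches only the explicit values, and the identical remainder is handled with one multiplication by its count. The observation message for “Twee stars please” is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GroupedDist:
    """Distribution over a slot's n_values values, storing only the values whose
    probability differs from a shared 'rest' probability (hypothetical layout)."""
    n_values: int
    rest: float
    explicit: dict = field(default_factory=dict)

    def prob(self, value):
        return self.explicit.get(value, self.rest)


def grouped_product(p, q):
    """Normalised pointwise product of two GroupedDists over the same slot."""
    keys = set(p.explicit) | set(q.explicit)
    explicit = {k: p.prob(k) * q.prob(k) for k in keys}
    n_rest = p.n_values - len(keys)
    rest = p.rest * q.rest
    # One multiplication by n_rest replaces summing the same term n_rest times.
    total = sum(explicit.values()) + n_rest * rest
    return GroupedDist(p.n_values, rest / total, {k: v / total for k, v in explicit.items()})


stars = range(1, 6)
prior = GroupedDist(5, 0.2)                        # uniform over 1-5 stars
obs = GroupedDist(5, 0.2 / 3, {2: 0.4, 3: 0.4})    # hypothetical "Twee stars please"
post = grouped_product(prior, obs)
print({s: round(post.prob(s), 3) for s in stars})
```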

  43. Expectation Propagation – Optimisation 2 • For each value, assume the transition probability to most other values is the same (mostly a constant factor) • e.g. a constant probability of change • The reduced number of parameters means we can speed up learning too!
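
With a constant probability of change, the goal-transition step no longer needs the full N × N matrix. A sketch, assuming the goal stays put with probability 1 − p_change and otherwise moves to each of the other N − 1 values with equal probability: the predicted belief of each value is then its own mass scaled by 1 − p_change plus a share of everyone else's mass, which costs O(N) instead of O(N^2).

```python
import numpy as np


def predict_constant_change(belief, p_change):
    """One goal-transition step when P(g' = j | g = i) is 1 - p_change for j == i
    and p_change / (N - 1) for every j != i. Costs O(N) instead of O(N^2)."""
    belief = np.asarray(belief, dtype=float)
    n = belief.size
    return (1.0 - p_change) * belief + p_change / (n - 1) * (belief.sum() - belief)


belief = np.array([0.5, 0.2, 0.2, 0.1])
print(predict_constant_change(belief, 0.3))
```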

  44. Results – Computation times (Figure: computation times with no optimisation, with grouping, with constant change, and with both.)

  45. Results – Simulated re-ranking • Train on 1000 simulated dialogues • Re-rank simulated semantics on 1000 dialogues • Oracle accuracy is 93.5% • TAcc – Semantic accuracy of the top hypothesis • NCE – Normalized Cross Entropy Score (Confidence scores) • ICE – Item Cross Entropy Score (Accuracy + Confidence)
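
The slide does not define the metrics, so here is a sketch of the widely used NIST-style normalised cross entropy, which may differ in detail from the variant used in these experiments (ICE is not sketched). It compares the cross entropy of the confidence scores against a baseline that always predicts the overall accuracy; 0 means no better than that baseline, and higher is better. Confidences must lie strictly between 0 and 1.

```python
import math


def normalized_cross_entropy(confidences, correct):
    """NIST-style NCE of confidence scores against correctness labels."""
    n = len(confidences)
    n_c = sum(correct)
    if n_c in (0, n):
        raise ValueError("NCE is undefined when all or none are correct")
    p_c = n_c / n  # baseline: always predict the overall accuracy
    h_max = -(n_c * math.log(p_c) + (n - n_c) * math.log(1 - p_c))
    h_conf = -sum(math.log(p) if ok else math.log(1 - p)
                  for p, ok in zip(confidences, correct))
    return (h_max - h_conf) / h_max


print(normalized_cross_entropy([0.9, 0.8, 0.3], [True, True, False]))
```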

  46. Results – Data re-ranking • Train on Mar09 TownInfo trial data (720 dialogues) • Test on Feb08 TownInfo trial data (648 dialogues) • Oracle accuracy is 79.2%

  47. Results – Simulated dialogue management • Use reinforcement learning (Natural Actor Critic algorithm) to train two systems: • One uses hand-crafted parameters • One uses parameters learned from 1000 simulated dialogues

  48. Results – Live evaluations (control) • Tested in the Spoken Dialogue Challenge • Provide bus timetables in Pittsburgh • 800 road names (pairs represent a stop); the system is required to get the departure place, the destination and the time • All parameters of the Cambridge system were hand-crafted

  49. Results – Live evaluations (control) (Figure: estimated success rate for CAM and BASELINE, with successful and failed dialogues for each system, plotted against WER.)

  50. Summary • POMDP models are an effective model of dialogue: • For use in dialogue systems • For re-ranking semantic hypotheses off-line • Expectation Propagation allows parameter learning for complex models, without annotations of dialogue state • Experiments show: • EP gives improvements in re-ranked hypotheses • EP gives improvements in simulated dialogue management performance • Probabilistic belief gives improvements in live dialogue management performance
