Outline of Talk
• Introduction: what is a cognitive user interface?
• Example: a simple gesture-driven interface.
• Human decision-making and planning.
• Partially Observable MDPs – an intractable solution?
• Scaling up: statistical spoken dialogue systems.
• Conclusions and future work.
What is a cognitive user interface?
• Capable of reasoning and inference
• Able to optimize communicative goals
• Able to adapt to changing environments
• Able to learn from experience
An interface which supports intelligent, efficient and robust interaction between a human and a machine.
Example: A Simple Gesture-Driven User Interface
A photo sorter.
[Figure: swipe gestures mapped to three commands: Scroll Forward, Scroll Backward, Delete Photo.]
Interpreting the Input
[Figure: P(angle) plotted against swipe angle for the Forwards, Backwards and Delete gestures, with decision boundaries between the classes.]
Pattern Classification
[Figure: P(angle) plotted against angle, divided into the regions G=forwards, G=delete and G=backwards, with the confidence Conf(G=backwards) marked.]
Flowchart-based Decision Making
[Flowchart: Gesture? → backwards → Confidence? If >= Threshold: Move back. If < Threshold: Do Nothing.]
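The flowchart logic can be sketched in a few lines of Python. This is only an illustration: the function name `flowchart_policy`, the threshold value of 0.7, and the actions for the forwards and delete gestures are assumptions, since the slide shows only the backwards branch.

```python
def flowchart_policy(gesture, confidence, threshold=0.7):
    """Act on the recognized gesture only if the classifier is confident enough.

    The threshold value and the forwards/delete branches are illustrative
    assumptions; the slide shows only the backwards branch.
    """
    actions = {
        "forwards": "move forward",
        "backwards": "move back",
        "delete": "delete photo",
    }
    if confidence >= threshold:
        return actions[gesture]
    # Below the threshold the input is simply ignored.
    return "do nothing"


print(flowchart_policy("backwards", 0.9))
print(flowchart_policy("backwards", 0.3))
```

Note that the decision uses only the single best gesture and a fixed threshold: nothing is remembered between inputs and no cost is attached to a wrong action.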
What is missing?
• No modeling of uncertainty
• No tracking of belief in the user’s required goal
• No quantifiable objectives, hence sub-optimal decision making
Modeling Uncertainty and Inference – Bayes’ Rule
Reverend Thomas Bayes (1702-1761)
[Figure: Bayesian network in which the old belief b(s) over states s is updated to the new belief b’(s) by inference via Bayes’ rule, given the data and the action (e.g. move back).]
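A minimal sketch of a single Bayes' rule update over a discrete set of user goals, b’(s) ∝ P(data | s) b(s). The goal labels and the likelihood values are illustrative assumptions, and the sketch assumes the goal does not change between inputs.

```python
def bayes_update(belief, likelihood):
    """Return the new belief b'(s), proportional to P(data | s) * b(s).

    belief:     dict mapping each state s to the old belief b(s)
    likelihood: dict mapping each state s to P(data | s)
    """
    unnormalized = {s: likelihood[s] * p for s, p in belief.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("the data has zero probability under every state")
    return {s: v / total for s, v in unnormalized.items()}


# Illustrative numbers only: a uniform old belief and a swipe whose angle
# is best explained by the scroll-backward goal.
old_belief = {"scroll-forward": 1 / 3, "scroll-backward": 1 / 3, "delete-photo": 1 / 3}
swipe_likelihood = {"scroll-forward": 0.1, "scroll-backward": 0.6, "delete-photo": 0.3}
new_belief = bayes_update(old_belief, swipe_likelihood)
```

Repeating the update as further data arrives lets the belief sharpen over several ambiguous inputs instead of discarding each one that falls below a threshold.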
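Optimizing Decisions – Bellman’s Equation
Richard E Bellman (1920-1984)
[Figure: a sequence of states s1 … sT with beliefs b1 … bT, observations o1 … oT and actions a1 … aT; the total reward is the sum of the rewards at each step. The policy is optimized by reinforcement learning.]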
Optimizing the Photo-Sorter
User’s Goal (states): { scroll-forward, scroll-backward, delete-photo }
System Action: { go-forward, go-back, do-delete, do-nothing }
Rewards: +1, 0, +1, -20, +5; all other: -1
[Figure: swipe gestures for Scroll Forward, Scroll Backward and Delete Photo, annotated with the rewards.]
Iteratively optimize policy to maximize rewards …
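To show how a belief and a reward function combine into a decision, here is a sketch that picks the action with the highest expected immediate reward. The assignment of the reward values to particular goal/action pairs is an assumption, because the slide's layout does not survive in this transcript. It is also a one-step greedy choice, not the iterative policy optimization the slide refers to.

```python
ACTIONS = ["go-forward", "go-back", "do-delete", "do-nothing"]

# Hypothetical assignment of the slide's reward values to (goal, action) pairs.
REWARD = {
    ("scroll-forward", "go-forward"): 1.0,
    ("scroll-backward", "go-back"): 1.0,
    ("delete-photo", "do-delete"): 5.0,
    ("scroll-forward", "do-delete"): -20.0,
    ("scroll-backward", "do-delete"): -20.0,
}


def reward(goal, action):
    """Reward for taking `action` when the user's true goal is `goal`."""
    if action == "do-nothing":
        return 0.0
    return REWARD.get((goal, action), -1.0)  # "all other: -1"


def expected_reward(belief, action):
    """Average the reward over the belief in each possible goal."""
    return sum(p * reward(goal, action) for goal, p in belief.items())


def greedy_action(belief):
    """Choose the action with the highest expected immediate reward."""
    return max(ACTIONS, key=lambda a: expected_reward(belief, a))


unsure = {"scroll-forward": 0.1, "scroll-backward": 0.3, "delete-photo": 0.6}
confident = {"scroll-forward": 0.02, "scroll-backward": 0.03, "delete-photo": 0.95}
print(greedy_action(unsure), greedy_action(confident))
```

Under these assumed rewards, a 60% belief in delete-photo is not enough to justify deleting, because a wrong deletion is so costly; at 95% it is. A fixed confidence threshold cannot express this trade-off.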
Performance on the Photo-Sorting Task
[Figure: reward plotted against effective error rate (0% to 50%) for three systems: Adapted Policy and Model, Fixed Policy and Model, and Flow-charted Policy. The training point is marked.]
Is Human Decision Making Bayesian? Humans have brains so that they can move. So how do humans plan movement? ….
A Simple Planning Task
[Figure: the prior and the observation in the planning task.]
Kording and Wolpert (Nature, 427, 2004)
Models for Estimating Target Location
[Figure: prior, observation and posterior probability plotted against lateral shift (cm); three panels (Prior ignored, Bayesian, Min Error Mapping) plotting deviation from target against true lateral shift.]
Kording and Wolpert (Nature, 427, 2004)
Bayesian Model Selection in Human Vision
[Figure: Inventory (not visible to subjects); Train: “Watch these!”; Test: “Which is more familiar?”]
Orban, Fiser, Aslin, Lengyel (Proc Nat. Academy Science, 105, 2008)
Partially Observable Markov Decision Processes
• Belief represented by distributions over states and updated from observations by Bayesian inference
• Objectives defined by the accumulation of rewards
• Policy which maps beliefs into actions and which can be optimized by reinforcement learning
• Principled approach to handling uncertainty and planning
• Humans appear to use similar mechanisms
So what is the problem?
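For reference, the three ingredients listed above can be written in the standard textbook POMDP form. The notation is an assumption rather than taken from the slides: η is a normalizing constant, r(s, a) is the reward function, γ is a discount factor, and π is the policy.

```latex
% Belief update by Bayesian inference after taking action a and observing o'
b'(s') = \eta \, P(o' \mid s', a) \sum_{s} P(s' \mid s, a) \, b(s)

% Expected immediate reward of action a under belief b
r(b, a) = \sum_{s} b(s) \, r(s, a)

% Bellman equation for the value of following policy \pi from belief b
V^{\pi}(b) = r(b, \pi(b)) + \gamma \sum_{o'} P(o' \mid b, \pi(b)) \, V^{\pi}(b')
```

Both sums range over the full state or observation set, which is why the size of those sets matters so much in practice.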
Scaling-up
Applying the POMDP framework in real world user interfaces is not straightforward:
• The state and action sets are often very large.
• Real-time belief update is intractable.
• The policy mapping from beliefs to actions is extremely complex.
• Exact policy optimization is intractable.
Spoken Dialog Systems (SDS)
[Figure: SDS pipeline. User → Waveforms → Recognizer → Words → Semantic Decoder → Dialog Acts → Dialog Control (with Database) → Message Generator → Synthesizer → User.]
Example: the system asks “Is that near the tower?”, i.e. confirm(near=tower); the user replies “No, it is near the castle.”, i.e. negate(near=castle).
Architecture of the Hidden Information State System
[Figure: User → Speech Understanding → Belief Update → belief b(s) over states s (POMDP) → Summary Space → Dialog Policy → Heuristic Mapping → Speech Generation → User.]
Two key ideas:
• States are grouped into equivalence classes called partitions and belief updating is applied to partitions rather than states
• Belief space is mapped into a much simpler summary space for policy implementation and optimization
Williams and Young (CSL 2007); Young et al (ICASSP 2007)
The HIS Belief Space
Each state is composed of three factors: User Goal × User Act × Dialog History = HIS Belief Space
User goals are grouped into partitions, e.g.
find(venue(hotel,area=east))
find(venue(bar,area=east))
find(venue(hotel,area=west))
….
find(venue)
[Figure: dialog history states: Initial, User Request, User Informed, System Informed, Grounded, Denied, Queried.]
Belief update is limited to the most likely members of this set.
Young et al (CSL 2009)
Master <-> Summary State Mapping
Master space is mapped into a reduced summary space.
[Figure: master space hypotheses, e.g. find(venue(hotel,area=east,near=Museum)), find(venue(bar,area=east,near=Museum)), find(venue(hotel,area=east)), find(venue(hotel,area=west)), find(venue(hotel)), etc., are mapped to the summary features P(top), P(Nxt), T12Same, TPStatus, THStatus, TUserAct and LastSA, followed by VQ. The policy selects an act type (Greet, Bold Request, Tentative Request, Confirm, Offer, Inform, etc.), e.g. confirm( ), which the heuristic mapping expands into a full action, e.g. confirm(area=east).]
Learning with a Simulated User
Learning by interaction with real users is expensive/impractical. A solution is to use a simulated user, trained on real data.
[Figure: a user simulator (trained on a dialog corpus, includes ASR error model) interacts with the system (Belief Update → Summary Space → Dialog Policy → Heuristic Mapping); the dialog policy is trained by Q-Learning, with random actions for exploration.]
Schatzmann et al (Knowledge Eng Review 2006)
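The training loop can be illustrated with tabular Q-learning against a toy simulator. Everything here is a hypothetical stand-in: `ToySimulatedUser`, its two actions, its rewards and its error rate are invented for illustration, and the state is directly observed rather than being a point in the HIS summary space.

```python
import random


class ToySimulatedUser:
    """Hypothetical simulator: two slots must be filled before an offer succeeds."""

    def __init__(self, error_rate=0.2, seed=0):
        self.error_rate = error_rate  # stands in for an ASR error model
        self.rng = random.Random(seed)
        self.filled = 0

    def reset(self):
        self.filled = 0
        return self.filled

    def step(self, action):
        """Return (next_state, reward, done)."""
        if action == "offer":
            return self.filled, (20.0 if self.filled == 2 else -10.0), True
        # action == "ask": a slot is filled unless a simulated error occurs
        if self.filled < 2 and self.rng.random() >= self.error_rate:
            self.filled += 1
        return self.filled, -1.0, False


def q_learning(sim, actions, episodes=2000, alpha=0.1, gamma=0.95,
               epsilon=0.1, max_turns=50, seed=1):
    """Tabular Q-learning with epsilon-greedy (random action) exploration."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state, done, turns = sim.reset(), False, 0
        while not done and turns < max_turns:
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q.get((state, a), 0.0))
            nxt, r, done = sim.step(action)
            best_next = 0.0 if done else max(q.get((nxt, a), 0.0) for a in actions)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (r + gamma * best_next - old)
            state, turns = nxt, turns + 1
    return q


ACTIONS = ["ask", "offer"]
q_table = q_learning(ToySimulatedUser(), ACTIONS)
policy = {s: max(ACTIONS, key=lambda a: q_table.get((s, a), 0.0)) for s in range(3)}
```

In this toy problem the learned policy asks until both slots are filled and then offers. The point of the simulator is that the thousands of training dialogs this takes cost nothing.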
HIS Performance in Noise
[Figure: success rate (%) plotted against error rate (%, 0 to 45) with a simulated user, comparing the HIS, MDP and hand-crafted (HDC) systems.]
Representing Beliefs
Beliefs in a spoken dialog system entail a large number of so-called slot variables. E.g. for tourist information:
P(venue, location, pricerange, foodtype, music, …)
Cardinality is huge and we cannot handle the full joint distribution.
In the HIS system, we threshold the joint distribution and just record the high probability values. The partitions marginalize out all the unknowns.
P(venue=bar, location=central, music=jazz) = 0.32
P(venue=bar, location=central, music=blues) = 0.27
P(venue=bar, location=east, music=jazz) = 0.11
etc
But this is approximate, and belief update now depends on the assumption that the underlying user goal does not change.
An alternative is to model beliefs directly using dynamic Bayesian nets …
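The thresholding step can be sketched as follows, using the three probabilities from the slide. The function name `prune_beliefs`, the threshold of 0.2, and the idea of lumping all unrecorded probability mass into a single remainder value are illustrative assumptions, not a description of the HIS implementation.

```python
def prune_beliefs(joint, threshold):
    """Keep only hypotheses whose probability is at least `threshold`.

    Returns the recorded hypotheses and the remaining probability mass,
    which covers every hypothesis that is not stored explicitly.
    """
    kept = {h: p for h, p in joint.items() if p >= threshold}
    remainder = 1.0 - sum(kept.values())
    return kept, remainder


# (venue, location, music) hypotheses with the probabilities from the slide
joint = {
    ("bar", "central", "jazz"): 0.32,
    ("bar", "central", "blues"): 0.27,
    ("bar", "east", "jazz"): 0.11,
}
kept, remainder = prune_beliefs(joint, threshold=0.2)
print(sorted(kept.values(), reverse=True), round(remainder, 2))
```

With a threshold of 0.2 only the two most likely hypotheses are recorded and the rest of the mass is treated as a single lump, which is what makes the representation approximate.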
Modeling Belief with Dynamic Bayesian Networks (DBNs)
Decompose state into DBN, retaining only essential conditional dependencies.
[Figure: two-slice DBN (time t and time t+1) with system action a, user goal g, user act u, history h and observation o. The nodes g, u and h are each split per slot into a type node (e.g. restaurant) and a food node (e.g. chinese).]
Thomson et al (ICASSP, 2008)
Factor Graph for the Full Tourist Information System
• Factor graphs are very large, even with minimal dependency modeling.
• Hence:
  • need very efficient belief updating
  • need to define policies directly on full belief networks
Bayesian Update of Dialog State (BUDS) System
Belief update depends on message passing.
[Figure: a factor f connected to the variables x1 … xM and x; the message is a sum over all combinations of variable values.]
Grouping possible values into partitions greatly simplifies these summations.
[Figure: P(food) over values such as Fr and It, grouped into partitions Z1, Z2 and Z3.]
Thomson et al (CSL 2009)
Belief Propagation Times
[Figure: time plotted against network branching factor for Standard LBP, LBP with Grouping, and LBP with Grouping & Const Prob of Change.]
Policy Optimization in the BUDS System
Summary space now depends on forming a simple characterization of each individual slot.
Define policy as a parametric function and optimize wrt θ using the Natural Actor Critic algorithm.
Each action dependent basis function is separated out into slot-based components, e.g.
[Figure: a basis function built from an action indicator function and a slot belief quantization. Quantization points over (1st, 2nd, Rest): (1.0, 0.0, 0.0), (0.8, 0.2, 0.0), (0.6, 0.4, 0.0), (0.4, 0.4, 0.2), (0.3, 0.3, 0.4). Example indicator vector: 0 1 0 0 0.]
Thomson et al (ICASSP, 2008)
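A sketch of the two pieces named above: quantizing a slot belief to the nearest of the five grid points, and a parametric policy over the resulting features. The softmax form of the policy, the squared-distance rule for choosing the nearest grid point, and all function and action names are assumptions. The Natural Actor Critic update of θ is not shown.

```python
import math

# Quantization points over (1st, 2nd, Rest), as listed on the slide.
GRID = [(1.0, 0.0, 0.0), (0.8, 0.2, 0.0), (0.6, 0.4, 0.0),
        (0.4, 0.4, 0.2), (0.3, 0.3, 0.4)]


def quantize_slot_belief(probs):
    """Map a slot's value probabilities to a one-hot vector over GRID."""
    ranked = sorted(probs, reverse=True)
    first = ranked[0] if ranked else 0.0
    second = ranked[1] if len(ranked) > 1 else 0.0
    point = (first, second, max(0.0, 1.0 - first - second))
    # Nearest grid point by squared Euclidean distance (an assumed rule).
    dists = [sum((p - g) ** 2 for p, g in zip(point, grid_pt)) for grid_pt in GRID]
    nearest = dists.index(min(dists))
    return [1.0 if i == nearest else 0.0 for i in range(len(GRID))]


def softmax_policy(theta, features):
    """pi(a | b; theta) proportional to exp(theta . phi_a(b)).

    features: dict mapping each action a to its feature vector phi_a(b)
    """
    scores = {a: sum(t * x for t, x in zip(theta, phi)) for a, phi in features.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {a: math.exp(s - top) for a, s in scores.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


phi = quantize_slot_belief([0.75, 0.2, 0.03, 0.02])
# Hypothetical two-action example: only "confirm" uses the slot features.
features = {"confirm": phi, "request": [0.0] * len(GRID)}
probs = softmax_policy([0.0, 2.0, 0.0, 0.0, 0.0], features)
```

Here the belief (0.75, 0.2, 0.05) falls nearest to the grid point (0.8, 0.2, 0.0), giving the indicator vector 0 1 0 0 0, and the positive weight on that component makes "confirm" the more probable action.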
BUDS Performance in Noise
[Figure: average reward plotted against error rate (%) with a simulated user, comparing BUDS and MDP.]
Conclusions
• Future generations of intelligent systems and agents will need robust, adaptive, cognitive human-computer interfaces
• Bayesian belief tracking and automatic strategy optimization provide the mathematical foundations
• Human evolution seems to have come to the same conclusion
• Early results are promising but research is needed:
  • to develop scalable solutions which can handle very large networks in real time
  • to incorporate more detailed linguistic capabilities
  • to understand how to integrate different modalities: speech, gesture, emotion, etc
  • to understand how to migrate these approaches into industrial systems.
Credits
EU FP7 Project: Computational Learning in Adaptive Systems for Spoken Conversation
Spoken Dialogue Management using Partially Observable Markov Decision Processes
Past and Present Members of the CUED Dialogue Systems Group: Milica Gasic, Filip Jurcicek, Simon Keizer, Fabrice Lefevre, Francois Mairesse, Jost Schatzmann, Matt Stuttle, Blaise Thomson, Karl Weilhammer, Jason Williams, Hui Ye, Kai Yu
Colleagues in the CUED Information Engineering Division: Bill Byrne, Mark Gales, Zoubin Ghahramani, Mate Lengyel, Daniel Wolpert, Phil Woodland