Grounding in Conversational Systems

Dan Bohus • January 2003 • Dialogs on Dialogs Reading Group • Carnegie Mellon University

Overview • Early grounding theories • Discourse Contributions - Clark & Schaefer • Conversational acts – Traum • A Computational Framework (Horvitz, Paek) • Principles • Systems • Grounding in RavenClaw

Clark & Schaefer • In discourse, humans collaborate to establish/maintain mutual ground • Discourse is structured in contributions • Contribution : Presentation + Acceptance • Grounding criterion: “A and B mutually believe that the partners have understood what A said to a criterion sufficient for the current purposes”

Clark & Schaefer (2) • Evidence of understanding: • Display • Demonstration • Acknowledgement • Initiating the next relevant contribution • Continued attention • Display/Demonstration order challenged…

Clark & Schaefer (3) • Infinite recursion avoided by the Strength of Evidence Principle • 4 possible states of (non-)understanding • L did not notice S’s utterance • L notices it but does not hear it correctly • L hears it correctly but does not understand it • L understands it

Traum • Conversational acts, an extension of speech act theory • Turn-taking • Grounding • Initiate, Continue, Cancel, ReqAck, Ack, ReqRepair, Repair • Core speech acts • Argumentational acts • Eliminates infinite recursion: acknowledgements do not need further acknowledgements

Traum (2) • In later work, a computational model is introduced (figure not reproduced here) • Finally, Brennan (& Clark) • another computational formulation • studies the different types of grounding behaviors in different media

Criticisms • These models are by and large descriptive • They cannot be used to determine the next best action for reaching the grounding criterion, so they cannot guide a system’s behavior • They do not quantify or make use of the uncertainty in contributions • They are insensitive to differences in channels, content, populations, etc. • Decision theory to the rescue!

Decision Theory • Action under uncertainty • Given a set of states S = {s}, evidence e, and a set of actions A = {a}, where: • P(s|e) is a probabilistic model of the state conditioned on the evidence • U(a,s) is the utility of taking action a when in state s • Take the action that maximizes the expected utility: • EU(a|e) = Σ_s U(a,s) · P(s|e)
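The expected-utility rule on this slide can be sketched in a few lines of Python. This is only an illustration: the state names (s1, s2), action names (a1, a2), and all numeric values are made up for the example and do not come from any of the systems discussed.

```python
def expected_utility(action, belief, utility):
    """EU(a|e) = sum over states s of U(a,s) * P(s|e).

    belief:  dict mapping state -> P(s|e)
    utility: dict mapping (action, state) -> U(a,s)
    """
    return sum(p * utility[(action, state)] for state, p in belief.items())


def best_action(actions, belief, utility):
    """Return the action with the highest expected utility, plus all scores."""
    scores = {a: expected_utility(a, belief, utility) for a in actions}
    return max(scores, key=scores.get), scores


# Hypothetical numbers, chosen only to exercise the functions.
belief = {"s1": 0.8, "s2": 0.2}
utility = {
    ("a1", "s1"): 10, ("a1", "s2"): -20,
    ("a2", "s1"): 4,  ("a2", "s2"): 2,
}
chosen, scores = best_action(["a1", "a2"], belief, utility)
```

With these numbers the riskier action a1 still wins, because the belief in s1 is high enough to outweigh its large penalty in s2.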

Conversation under Uncertainty • Conversation = action under uncertainty • Example: I want to fly to Pittsburgh … • States = {grounded, not_grounded} • Not directly observable, but describable by a probabilistic model • P(g | e) = P(Pittsburgh | e) … confidence annotation • Actions = {explicit_confirm, implicit_confirm, continue_dialog} • Utilities: • U(ec,g) < U(ic,g) < U(cd,g) • U(ec,ng) > U(ic,ng) > U(cd,ng)

I want to fly to Pittsburgh (2) • States: • NotGrounded (ng) • Grounded (g) • Actions: • ExplicitConfirm (ec) • ImplicitConfirm (ic) • ContinueDialog (cd) • Utilities: • U(ec,g) < U(ic,g) < U(cd,g) • U(ec,ng) > U(ic,ng) > U(cd,ng) • [Figure: utility lines for ec, ic, and cd drawn between ng and g, with thresholds t1 and t2]
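With only two states, each action's expected utility is a straight line in p = P(g|e), so the best action changes at the points where two lines cross. The sketch below computes those crossover points, which presumably correspond to the thresholds t1 and t2 on the slide. The utility values are hypothetical: they were picked only to satisfy the two orderings above and are not taken from the talk.

```python
# Hypothetical utilities that respect the slide's orderings:
#   U(ec,g) < U(ic,g) < U(cd,g)   and   U(ec,ng) > U(ic,ng) > U(cd,ng)
U = {
    ("ec", "g"): 6,  ("ec", "ng"): 4,
    ("ic", "g"): 8,  ("ic", "ng"): 0,
    ("cd", "g"): 10, ("cd", "ng"): -10,
}
ACTIONS = ["ec", "ic", "cd"]


def eu(action, p_grounded):
    """Expected utility of an action given p = P(g|e)."""
    return p_grounded * U[(action, "g")] + (1 - p_grounded) * U[(action, "ng")]


def crossover(a, b):
    """Belief p at which actions a and b have equal expected utility."""
    slope_a = U[(a, "g")] - U[(a, "ng")]
    slope_b = U[(b, "g")] - U[(b, "ng")]
    return (U[(b, "ng")] - U[(a, "ng")]) / (slope_a - slope_b)


def choose(p_grounded):
    """Pick the action with maximal expected utility at this belief."""
    return max(ACTIONS, key=lambda a: eu(a, p_grounded))


lower = crossover("ec", "ic")   # below this belief, explicit confirmation wins
upper = crossover("ic", "cd")   # above this belief, continuing the dialog wins
```

For these particular numbers the policy reduces to two comparisons of the confidence score against `lower` and `upper`. Different utilities move the thresholds, and for some choices implicit confirmation is never optimal.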

Overview • Early grounding theories • Discourse Contributions - Clark & Schaefer • Conversational acts – Traum • A Computational Framework (Horvitz, Paek) • Principles • Systems • DeepListener • Bayesian Receptionist (Quartet architecture) • Presenter (Quartet architecture) • Grounding in RavenClaw

DeepListener - Domain • Domain • Provides spoken command-and-control functionality for LookOut • Respond to offers of assistance (Yes/No) • Small domain, but illustrates the core ideas very well

DeepListener - States • States: 5 possible “intentions” of the user • Acknowledgement • Negation • Reflection • Unrecognized Signal • No Signal • State model P(S|E): a temporal Bayesian network • E = user’s actions, content, ASR results and reliability, plus evidence from time t-1
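To give a feel for how a temporal model carries belief from one turn to the next, here is a generic discrete Bayes filter over the five intentions. This is a simplified stand-in, not DeepListener's actual network: the transition and likelihood numbers are invented, and a single likelihood table stands in for all the evidence variables.

```python
STATES = ["acknowledgement", "negation", "reflection",
          "unrecognized_signal", "no_signal"]


def filter_step(prior, transition, likelihood):
    """One predict-and-update step of a discrete Bayes filter.

    prior:      dict state -> P(S_{t-1} | evidence so far)
    transition: dict prev_state -> dict state -> P(S_t | S_{t-1})
    likelihood: dict state -> P(current evidence | S_t)
    """
    # Predict: push the previous belief through the transition model.
    predicted = {s: sum(prior[r] * transition[r][s] for r in STATES)
                 for s in STATES}
    # Update: weight by how well each state explains the new evidence.
    unnormalized = {s: predicted[s] * likelihood[s] for s in STATES}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence has zero probability under every state")
    return {s: v / total for s, v in unnormalized.items()}


# Hypothetical model: intentions tend to persist from turn to turn.
transition = {r: {s: (0.6 if r == s else 0.1) for s in STATES} for r in STATES}
prior = {s: 0.2 for s in STATES}
# Hypothetical likelihoods for an utterance recognized as "yes".
likelihood = {"acknowledgement": 0.7, "negation": 0.05, "reflection": 0.1,
              "unrecognized_signal": 0.1, "no_signal": 0.05}

posterior = filter_step(prior, transition, likelihood)
```

The posterior from one turn becomes the prior for the next, which is how evidence from time t-1 influences the current estimate.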

DeepListener - Actions • Actions: • Execute the service • Repeat • Note a hesitation and try again • Was that meant for me? • Try to get the user’s attention • Apologize for the interruption and forego the service • Troubleshoot the overall dialog

DeepListener - Utilities • Utilities • Elicited through psychological experiments • Elicited through slidebars • Works when you have 2 or 3 grounding actions and a small, clear state-space design, but what about when the problem gets more complex? • Example (paper)

Bayesian Receptionist, Presenter • Bayesian Receptionist – performs the tasks of a receptionist at a MS front desk • “I’m here to see Rashid” • “Bathroom?” • “Beam me to 25 please” • … 32 goals • Presenter – command & control interface to PowerPoint presentations. • Both based on Quartet architecture

Quartet • Uses decision theory (DT) and Bayesian networks (BN) to ensure grounding at 4 different levels: • Signal • Channel • Intention • Conversation • The actual dialog management (DM) task is encapsulated in the same framework at the Intention level • Different domains = different intention levels

Quartet – Signal & Channel • At each level, infer a distribution over possible states. Key variables: • Signal level – signal identified (low/med/hi) • Channel level – user’s focus of attention • Maintenance module integrates Signal & Channel levels -> Maintenance Status: • Channel x Signal: NoChannel, NoSignal, ChannelButNoSignal, SignalButNoChannel, Signal

Quartet – Intention Level • Domain is mostly goal inference • Hierarchical decomposition into levels, where lower levels refine the goals into more specific needs • Use BN to model p(goal | e) at each level • Psychological studies to identify key variables and utilities • Visual cues • Linguistic variables, both syntactic and semantic

Quartet – Intention Level (2) • To move between levels, compare the probability of the goal to… • p-progress (above: do it) • p-guess (above: seek confirmation; below: seek more information via VOI) • p-backtrack (used on return nodes) • Use Value-Of-Information (VOI) analysis to determine which variable should be queried next.
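The threshold test and the VOI computation can be sketched as follows. This is a rough illustration under stated assumptions, not Quartet's implementation: the goal names, action names, threshold values, and probabilities are all hypothetical, p-backtrack is left out because the slide only says it applies on return nodes, and the cost of asking a question is ignored.

```python
def next_move(p_goal, p_progress, p_guess):
    """Compare the top goal's probability against the two thresholds."""
    if p_goal >= p_progress:
        return "progress"            # confident enough: do it
    if p_goal >= p_guess:
        return "confirm"             # plausible: seek confirmation
    return "gather_information"      # too uncertain: query a variable


def max_eu(belief, utility, actions):
    """Expected utility of the best action under the given belief."""
    return max(sum(p * utility[(a, g)] for g, p in belief.items())
               for a in actions)


def value_of_information(belief, cond, utility, actions):
    """VOI(X) = sum_x P(x) * max_a EU(a|x)  -  max_a EU(a).

    cond[g][x] = P(X = x | goal g) for the candidate variable X.
    """
    outcomes = list(next(iter(cond.values())))
    expected = 0.0
    for x in outcomes:
        p_x = sum(belief[g] * cond[g][x] for g in belief)
        if p_x == 0:
            continue
        posterior = {g: belief[g] * cond[g][x] / p_x for g in belief}
        expected += p_x * max_eu(posterior, utility, actions)
    return expected - max_eu(belief, utility, actions)


# Hypothetical two-goal problem: utility 1 for serving the right goal.
belief = {"g1": 0.5, "g2": 0.5}
actions = ["do_g1", "do_g2"]
utility = {("do_g1", "g1"): 1, ("do_g1", "g2"): 0,
           ("do_g2", "g1"): 0, ("do_g2", "g2"): 1}
candidates = {
    "informative":   {"g1": {"yes": 0.9, "no": 0.1},
                      "g2": {"yes": 0.1, "no": 0.9}},
    "uninformative": {"g1": {"yes": 0.5, "no": 0.5},
                      "g2": {"yes": 0.5, "no": 0.5}},
}
voi = {name: value_of_information(belief, cond, utility, actions)
       for name, cond in candidates.items()}
query = max(voi, key=voi.get)   # variable to ask about next
```

A variable whose answer cannot change the chosen action has zero VOI, so the system would never spend a turn asking about it.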

Comments on Intention level • What is the size of the learning problem? (How many BNs are needed?) How much training data is needed? • Not very clear: • how to deal with attribute/value pairs with rich value ranges (e.g. which bus station?) • how to deal with richer dialog mechanisms (beyond C&C applications) • focus shifts, mixed initiative • providing help

Quartet – Conversation Level • See image. Use Intention and Maintenance Status to infer: • Grounding: diagnoses mutual understanding • Okay, ChannelFailure, IntentionFailure, ConversationFailure • Activity goal: measures if the user is engaged or not in an activity with the system • Compute expected utility for each action (utilities elicited through psychological studies)

Bayesian Receptionist, Presenter • Runtime behavior (section 3) • Presenter • The Signal & Channel levels allow continuous listening to be treated uniformly within the same framework • Experiments show that it performs better than random, but significantly worse than humans • Then again, the experiments were not very fair, since they were performed only at that level (i.e. no engaging in dialog allowed)

My Research … • Deal with misunderstandings • Use probabilistic modeling and decision theory to make grounding decisions (but not task decisions) • I want a room tomorrow morning (0.73) • States: time correctly understood/not • Grounding Actions: no_action, expl_conf, impl_conf, reject • Utilities: try to learn them by relating the actions to an overall dialog/grounding metric

RavenClaw: Dialog Task / Grounding • [Figure: Dialog Task tree with nodes RoomLine, Login, RoomLine, Bye, GetQuery, ExecuteQuery, DiscussResults; Grounding Level with a Grounding Model that takes the State (how well are things going) and selects the Optimal action from the Strategies/Grounding Actions]

States and Actions • Actions Strategies.xls • States (have to keep it small!!!) • Single “state-space” model • What are the variables? Which are observable and which are stochastically modeled? • Multiple “state-space” models • First 5 strategies: state = amount of grounding on each concept • What should state be for the rest? What are the indicators? Which are fully observable and which are not? • How to combine decisions from different spaces

Utilities • Learn them! How ? • Idea 1: POMDPs, maybe this small they are tractable • Idea 2: Regression to some overall dialog metric • What should that be? • (hmm) amount of non-null grounding actions taken • … • …
