A Decision-Theoretic Model of Assistance - Evaluation, Extensions and Open Problems Sriraam Natarajan, Kshitij Judah, Prasad Tadepalli and Alan Fern School of EECS, Oregon State University
Outline • Introduction • Decision-Theoretic Model • Experiment with folder predictor • Incorporating Relational Hierarchies • Open Problems • Conclusion
Motivation • Several assistant systems have been proposed to • Assist users in their daily tasks • Reduce their cognitive load • Examples: CALO (CALO 2003), COACH (Boger et al. 2005), etc. • Problems with previous work • Fine-tuned to particular application domains • Utilize specialized technologies • Lack an overarching framework
[Animated diagram: the interaction model. Starting from initial state W1, the user and the assistant (each with its own action set) take turns acting in a shared environment, moving through states W2, ..., W9 until the user's goal is achieved. The assistant's goal: minimize the number of actions the user must take.]
Introduction • Decision-Theoretic Model • Experiment with folder predictor • Incorporating Relational Hierarchies • Open Problems • Conclusion
Markov Decision Process • MDP – (S, A, T, R, I) • Policy (π) – mapping from S to A • V(π) = E[Σ_{t=1}^{T} r_t], where T is the length of the episode • Optimal policy: π* = argmax_π V(π) • A Partially Observable Markov Decision Process (POMDP) adds: • O, the set of observations • µ(o|s), a distribution over observations o ∈ O given the current state s
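The definitions above map directly onto code. Below is a minimal sketch (names such as `Mdp`, `sample_next`, and `value` are illustrative, not from the paper) of an episodic MDP and a Monte Carlo estimate of V(π) = E[Σ_{t=1}^{T} r_t]:

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

State = str
Action = str

@dataclass
class Mdp:
    states: List[State]
    actions: List[Action]
    # transition[s][a] is a distribution over next states: {s_next: P(s_next | s, a)}
    transition: Dict[State, Dict[Action, Dict[State, float]]]
    reward: Callable[[State, Action], float]
    initial: State

def sample_next(dist: Dict[State, float]) -> State:
    states, probs = zip(*dist.items())
    return random.choices(states, weights=probs)[0]

def value(mdp: Mdp, policy: Callable[[State], Action],
          episode_length: int, n_episodes: int = 1000) -> float:
    """Monte Carlo estimate of V(pi) = E[sum_{t=1..T} r_t]."""
    total = 0.0
    for _ in range(n_episodes):
        s = mdp.initial
        for _ in range(episode_length):
            a = policy(s)
            total += mdp.reward(s, a)
            s = sample_next(mdp.transition[s][a])
    return total / n_episodes
```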
Decision-Theoretic Model • Assistant: history-dependent stochastic policy π′(a|w, O) • Observable: world states and the agent's actions • Hidden: the agent's goals • An episode begins at state w with goal g • C(w, g, π, π′): cost of the episode • Objective: compute the π′ that minimizes E[C(I, G0, π, π′)]
Assistant POMDP • Given the MDP <W, A, A′, T, C, I>, the goal distribution G0, and the user policy π, the assistant POMDP is defined as:
• State space: W × G
• Action set: A′
• Transition function T′:
T′((w, g), a′, (w′, g′)) = 0 if g ≠ g′
= T(w, a′, w′) if a′ ≠ noop
= P(T(w, π(w, g)) = w′) if a′ = noop
• Cost model C′:
C′((w, g), a′) = C(w, a′) if a′ ≠ noop
= E[C(w, a)] if a′ = noop, where a is distributed according to π
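A hedged sketch of this construction, assuming a base transition function `T(w, a, w_next)`, a cost function `C(w, a)`, and a `user_policy(w, g)` that returns a distribution over user actions (these interfaces are invented here for illustration; marginalizing over the user policy covers both the deterministic and stochastic cases on the slide):

```python
NOOP = "noop"

def assistant_transition(T, user_policy, wg, a_prime, wg_next) -> float:
    """T'((w, g), a', (w', g')) as defined on the slide."""
    (w, g), (w_next, g_next) = wg, wg_next
    if g != g_next:            # the user's goal is fixed within an episode
        return 0.0
    if a_prime != NOOP:        # the assistant acts directly in the world
        return T(w, a_prime, w_next)
    # On noop, the user acts: marginalize over the user's action distribution.
    return sum(p * T(w, u, w_next) for u, p in user_policy(w, g).items())

def assistant_cost(C, user_policy, wg, a_prime) -> float:
    """C'((w, g), a'): the assistant's own cost, or the expected cost of the user's action."""
    w, g = wg
    if a_prime != NOOP:
        return C(w, a_prime)
    return sum(p * C(w, u) for u, p in user_policy(w, g).items())
```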
[Graphical model of the assistant POMDP over two time slices: goal G, world states Wt and Wt+1, user actions At and At+1, assistant actions A′t and A′t+1, and nodes St and St+1.]
Approximate Solution Approach • Online action-selection cycle: 1) Estimate the posterior goal distribution given the observations 2) Select an action via myopic heuristics [Diagram: the assistant comprises a goal recognizer producing P(G) and an action-selection module; it observes the environment state Wt and the user's action Ut through observations Ot, and executes assistive actions At.]
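One way the two-step cycle might look in code; `env`, `estimate_posterior`, and `select_action` are assumed interfaces standing in for the components in the diagram, not the paper's API:

```python
def assist_episode(env, prior, estimate_posterior, select_action):
    """Run one episode of the observe / update-posterior / act cycle."""
    posterior = dict(prior)                      # P(G) before any observations
    w = env.current_state()
    while not env.goal_achieved():
        u = env.observe_user_action()            # new observation of the user
        posterior = estimate_posterior(posterior, w, u)   # step 1
        a = select_action(posterior, env.current_state()) # step 2
        env.execute(a)                           # may be a noop
        w = env.current_state()
    return posterior
```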
Goal Estimation • Given: • P(G | Ot): goal posterior at time t, given observations up to time t • P(Ut | G, Wt): the user policy (must be learned) • Ot+1: new observation of the user action and world state • Compute the updated goal posterior P(G | Ot+1) [Diagram: the current state Wt and the observed user action Ut update the goal posterior from P(G | Ot) to P(G | Ot+1).]
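This update is just Bayes' rule: P(g | O_{t+1}) ∝ P(U_t | g, W_t) · P(g | O_t). A minimal sketch, assuming the learned user policy is available as a function returning an action distribution (the zero-probability fallback is one simple choice, not the paper's):

```python
def update_goal_posterior(posterior, w, u, user_policy):
    """Bayesian update of P(G) after observing user action u in state w."""
    unnorm = {g: user_policy(w, g).get(u, 0.0) * p_g
              for g, p_g in posterior.items()}
    z = sum(unnorm.values())
    if z == 0.0:                 # observation inconsistent with every goal:
        return dict(posterior)   # keep the old posterior (one simple choice)
    return {g: v / z for g, v in unnorm.items()}
```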
Action Selection: Assistant POMDP • Assume the user's goal G and policy are known • Can create a corresponding assistant MDP over assistant actions • Can compute Q(A′, W, G), the value of taking assistive action A′ when the user's goal is G • Select the action that maximizes the expected (myopic) value under the goal posterior [Diagram: the assistant MDP for a fixed goal G, interleaving assistant actions A′t and user actions U over states Wt, Wt+1, Wt+2.]
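A sketch of the myopic rule, assuming a function `Q(a_prime, w, g)` obtained by solving (or approximating) the goal-specific assistant MDP; the action maximizing the posterior-weighted value is selected:

```python
def select_assistive_action(actions, w, posterior, Q):
    """Greedy action under the goal posterior: argmax_a' sum_g P(g) * Q(a', w, g)."""
    def expected_value(a_prime):
        return sum(p_g * Q(a_prime, w, g) for g, p_g in posterior.items())
    return max(actions, key=expected_value)
```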
Introduction • Decision-Theoretic Model • Experiment with folder predictor • Incorporating Relational Hierarchies • Open Problems • Conclusion
Folder Predictor • Previous work (Bao et al., IUI 2006): • No repredictions • Does not consider new folders • Decision-theoretic model: • Naturally handles repredictions • Uses a mixture density to obtain the folder distribution • Data set: a set of Open and SaveAs requests • Folder hierarchy: 226 folders • Prior distribution initialized according to the prior model
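A hedged illustration of how repredictions could work: maintain a posterior over destination folders, recommend the top-k, and after each user click renormalize over the folders still consistent with that click. The `is_descendant` consistency test and k = 3 are assumptions for illustration, not the paper's exact cost model:

```python
def recommend(posterior, k=3):
    """Recommend the k most probable destination folders."""
    return sorted(posterior, key=posterior.get, reverse=True)[:k]

def repredict(posterior, clicked_folder, is_descendant):
    """After a user click, keep only folders reachable below the clicked folder."""
    consistent = {f: p for f, p in posterior.items()
                  if is_descendant(f, clicked_folder)}
    z = sum(consistent.values())
    if z == 0.0:            # click inconsistent with every candidate folder
        return posterior
    return {f: p / z for f, p in consistent.items()}
```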
[Results (avg. no. of clicks per Open/SaveAs): current TaskTracer 1.3724; decision-theoretic model without repredictions: 1.34 on the restricted folder set, 1.319 with all folders considered; full assistant framework with repredictions: 1.2344.]
Introduction • Decision-Theoretic Model • Experiment with folder predictor • Incorporating Relational Hierarchies • Open Problems • Conclusion
Incorporating Relational Hierarchies • Tasks are hierarchical • Writing a paper decomposes into subtasks • Tasks have a natural class–subclass hierarchy • Papers for ICML or IJCAI involve similar subtasks • Tasks are chosen based on attributes of the world • Grad students work on a paper as its deadline approaches • Goal: combine these ideas to • Specify prior knowledge easily • Accelerate learning of the parameters
[Example task hierarchy from a game domain: Gather(R) decomposes into Collect(R) and Deposit(R,S) under the constraint R.Type = S.Type; Attack(E) decomposes into KillDragon(D) and DestroyCamp(E) under E.Type = D.Type. Lower levels: Pickup(R) (L = R.Loc), DropOff(R,S) (L = S.Loc), Kill(D) (L = D.Loc), Destroy(E) (L = E.Loc), over the primitives Goto(L), Open(D), and Move(X).]
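One possible encoding of such a hierarchy, with typed parameters and variable-binding constraints attached to each task node; all names are illustrative, mirroring the diagram above rather than the paper's representation:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    name: str                                              # e.g. "Gather"
    params: List[str]                                      # e.g. ["R"]
    constraints: List[str] = field(default_factory=list)   # e.g. ["R.Type = S.Type"]
    subtasks: List["Task"] = field(default_factory=list)

# Fragment of the hierarchy shown in the diagram.
goto = Task("Goto", ["L"])
pickup = Task("Pickup", ["R"], ["L = R.Loc"], [goto])
dropoff = Task("DropOff", ["R", "S"], ["L = S.Loc"], [goto])
collect = Task("Collect", ["R"], subtasks=[pickup])
deposit = Task("Deposit", ["R", "S"], subtasks=[dropoff])
gather = Task("Gather", ["R"], ["R.Type = S.Type"], [collect, deposit])
```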
Introduction • Decision-Theoretic Model • Experiment with folder predictor • Incorporating Relational Hierarchies • Open Problems • Conclusion
Open Problems • Partial observability for the user • Currently the user completely observes the environment • Not the case in the real world – the user need not know what is in the refrigerator • The assistant can still completely observe the world • The current system does not consider the user's exploratory actions • The setting is similar to interactive POMDPs (Doshi et al.) • The environment becomes a POMDP • The belief states of the POMDP are the belief states of the user • The state space needs to be extended to capture the user's beliefs
Open Problems • Large state spaces • Solving the POMDP exactly is impractical • Kitchen domain (Fern et al.) – 140,000 states • Prune regions of the search space (cf. Electric Elves) • Can use user trajectories as training examples • Parallel actions • The assistant and the user execute actions in parallel • Useful for parallel subgoals – the user writes the paper while the assistant runs the experiments • Requires identifying which actions can be executed in parallel • The assistant's actions can change the user's goal stack • Goal estimation has to account for the user's response
Open Problems • Changing goals • The user can change goals midway – e.g., switch to a different project • Currently, the system would converge to the new goal only slowly • Explicitly model this possibility • Borrow ideas from user modeling to predict goal changes • Expanding goal sets • A large number of dishes can be cooked • Forgotten subgoals • Forgetting to attach a document to an email • Explicitly model this possibility – borrow ideas from the cognitive science literature
Introduction • Decision-Theoretic Model • Experiment with folder predictor • Incorporating Relational Hierarchies • Open Problems • Conclusion
Conclusion • Proposed a general framework based on decision theory • Experiments in a real-world domain show that repredictions are useful • Currently working on a relational hierarchical model • Outlined several open problems • Motivated the need for more sophisticated user models