A Decision-Theoretic Model of Assistance - Evaluation, Extensions and Open Problems



  1. A Decision-Theoretic Model of Assistance - Evaluation, Extensions and Open Problems Sriraam Natarajan, Kshitij Judah, Prasad Tadepalli and Alan Fern School of EECS, Oregon State University

  2. Outline • Introduction • Decision-Theoretic Model • Experiment with folder predictor • Incorporating Relational Hierarchies • Open Problems • Conclusion

  3. Motivation • Several assistant systems have been proposed to • Assist users in daily tasks • Reduce their cognitive load • Examples: CALO (CALO, 2003), COACH (Boger et al., 2005), etc. • Problems with previous work • Fine-tuned to particular application domains • Utilize specialized technologies • Lack an overarching framework

  4.–8. Interaction Model [animated diagram: starting from the initial state W1, the user and the assistant take turns acting, each with its own action set, moving through world states W2–W9; the assistant’s objective is to minimize the number of user actions, and the episode ends when the user’s goal is achieved]

  9. Introduction • Decision-Theoretic Model • Experiment with folder predictor • Incorporating Relational Hierarchies • Open Problems • Conclusion

  10. Markov Decision Process • MDP – (S, A, T, R, I) • Policy (π) – a mapping from S to A • V(π) = E[Σ_{t=1..T} r_t], where T is the length of the episode • Optimal policy: π* = argmax_π V(π) • A Partially Observable Markov Decision Process (POMDP) adds: • O – the set of observations • µ(o|s) – a distribution over observations o ∈ O given the current state s
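
The definitions above can be made concrete with a small sketch. The following Python fragment (illustrative only; the state set, policy, transition table and reward table are hypothetical placeholders) evaluates a fixed policy π over a finite-horizon episode, matching V(π) = E[Σ_{t=1..T} r_t]:

    # Finite-horizon policy evaluation for a small MDP (S, A, T, R, I).
    # S: list of states; pi[s]: action chosen in state s;
    # T[(s, a)]: dict mapping next states to probabilities;
    # R[(s, a)]: expected immediate reward; horizon: episode length.
    def evaluate_policy(S, pi, T, R, horizon):
        V = {s: 0.0 for s in S}          # value-to-go with 0 steps remaining
        for _ in range(horizon):
            V_next = {}
            for s in S:
                a = pi[s]
                V_next[s] = R[(s, a)] + sum(p * V[s2] for s2, p in T[(s, a)].items())
            V = V_next
        return V                         # V[s] approximates E[sum of rewards | start in s]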

  11. Decision-Theoretic Model • Assistant: history-dependent stochastic policy π′(a′ | w, O) • Observables: world states, agent’s actions • Hidden: agent’s goals • An episode begins at state w with goal g • C(w, g, π, π′): cost of the episode • Objective: compute the π′ that minimizes E[C(I, G0, π, π′)]

  12. Assistant POMDP • Given the MDP <W, A, A′, T, C, I>, goal distribution G0, and user policy π, the assistant POMDP is defined as: • State space: W x G • Action set: A′ • Transition function T′: T′((w,g), a′, (w′,g′)) = 0 if g ≠ g′; = T(w, a′, w′) if a′ ≠ noop; = P(T(w, π(w,g)) = w′) if a′ = noop • Cost model C′: C′((w,g), a′) = C(w, a′) if a′ ≠ noop; = E[C(w, a)], with a distributed according to π, if a′ = noop
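
As a minimal sketch, assuming a tabular world model and (for brevity) a deterministic user policy, the assistant POMDP’s transition and cost models could be assembled as follows (all names here are illustrative, not the paper’s implementation):

    # Assistant POMDP dynamics over joint states (w, g).
    # T[(w, a)]: dict mapping next world states to probabilities;
    # C[(w, a)]: cost of executing a in w; user_policy(w, g): user's action for goal g;
    # the special action NOOP hands control back to the user.
    NOOP = "noop"

    def T_prime(w, g, a_prime, w2, g2, T, user_policy):
        if g != g2:                        # the goal never changes within an episode
            return 0.0
        if a_prime != NOOP:                # assistant acts directly on the world
            return T[(w, a_prime)].get(w2, 0.0)
        a_user = user_policy(w, g)         # noop: the user acts according to pi
        return T[(w, a_user)].get(w2, 0.0)

    def C_prime(w, g, a_prime, C, user_policy):
        if a_prime != NOOP:
            return C[(w, a_prime)]
        return C[(w, user_policy(w, g))]   # cost of the user's own action (deterministic pi)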

  13. Assistant POMDP [dynamic Bayesian network over time steps t and t+1, with nodes G, Wt, Wt+1, At, At+1, A′t, A′t+1, St, St+1]

  14. Approximate Solution Approach • Online action selection cycle: 1) Estimate the posterior goal distribution P(G) given the observations 2) Select an action via myopic heuristics [architecture diagram: the user’s action Ut and the resulting world state Wt / observation Ot are fed from the environment to the assistant, whose goal recognizer maintains P(G) and whose action-selection module issues the assistant action At]
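
A rough sketch of this cycle in Python, assuming a hypothetical environment interface (env.observe, env.execute, env.assistant_actions, env.goal_achieved) and relying on the estimate_goal_posterior and select_assistive_action functions sketched after the next two slides:

    # One episode of online assistance: alternate goal estimation and action selection.
    def assist_episode(env, prior, user_model, q_values):
        posterior = dict(prior)                       # P(G) before any observation
        while not env.goal_achieved():
            w, u = env.observe()                      # current world state and user action
            posterior = estimate_goal_posterior(posterior, u, w, user_model)
            a = select_assistive_action(posterior, w, q_values, env.assistant_actions())
            env.execute(a)                            # may be a noop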

  15. Goal Estimation • Given: • P(G | Ot) – the goal posterior given observations up to time t • P(Ut | G, Wt) – the user policy (must be learned) • Ot+1 – a new observation of the user action Ut and the resulting world state Wt+1 from the current state Wt • Compute the updated goal posterior P(G | Ot+1)
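
A minimal sketch of the Bayesian update implied here, assuming the learned user policy P(Ut | G, Wt) is available as a function user_model(u, g, w) (the names are illustrative):

    # Goal posterior update: P(g | O_{t+1}) is proportional to P(u | g, w) * P(g | O_t).
    def estimate_goal_posterior(posterior, u, w, user_model):
        unnormalized = {g: user_model(u, g, w) * p for g, p in posterior.items()}
        z = sum(unnormalized.values())
        if z == 0.0:                       # observation inconsistent with the user model
            return posterior               # fall back to the previous posterior
        return {g: p / z for g, p in unnormalized.items()}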

  16. Action Selection: Assistant POMDP • Assume we know the user’s goal G and policy • Can create a corresponding assistant MDP over assistant actions • Can compute Q(A′, W, G), the value of taking assistive action A′ when the user’s goal is G • Select the action that maximizes the expected (myopic) value: argmax_{a′} Σ_g P(g | Ot) Q(a′, Wt, g)
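
A corresponding sketch of the myopic selection step, assuming the goal-specific Q-values are precomputed and stored in a hypothetical dictionary q_values keyed by (action, state, goal):

    # Myopic action selection: argmax over a' of sum_g P(g | O_t) * Q(a', w, g).
    def select_assistive_action(posterior, w, q_values, actions):
        def expected_value(a_prime):
            return sum(p * q_values[(a_prime, w, g)] for g, p in posterior.items())
        return max(actions, key=expected_value)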

  17. Introduction • Decision-Theoretic Model • Experiment with folder predictor • Incorporating Relational Hierarchies • Open Problems • Conclusion

  18. Folder Predictor • Previous work (Bao et al., IUI 2006): • No repredictions • Does not consider new folders • Decision-Theoretic Model: • Naturally handles repredictions • Uses a mixture density to obtain the folder distribution • Data set – a set of Open and SaveAs requests • Folder hierarchy – 226 folders • Prior distribution initialized according to the prior model

  19. [Results chart: average number of clicks per Open/SaveAs for the current TaskTracer predictor and for the full assistant framework, without and with repredictions, on a restricted folder set and with all folders considered; the values shown are 1.2344, 1.319, 1.34 and 1.3724]

  20. Introduction • Decision-Theoretic Model • Experiment with folder predictor • Incorporating Relational Hierarchies • Open Problems • Conclusion

  21. Incorporating Relational Hierarchies • Tasks are hierarchical • Writing a paper • Tasks have a natural class – subclass hierarchy • Papers to ICML or IJCAI involve similar subtasks • Tasks are chosen based on some attribute of the world • Grad students work on a paper closer to the deadline • Goal: Combine these ideas to • Specify prior knowledge easily • Accelerate learning of the parameters

  22. Doorman Domain

  23. [Task hierarchy diagram: Gather(R) decomposes into Collect(R) and Deposit(R,S) with the constraint R.Type = S.Type; Attack(E) decomposes into KillDragon(D) and DestroyCamp(E) with E.Type = D.Type; Collect(R) → Pickup(R), Goto(L) with L = R.Loc; Deposit(R,S) → DropOff(R,S), Goto(L) with L = S.Loc; KillDragon(D) → Kill(D), Goto(L) with L = D.Loc; DestroyCamp(E) → Destroy(E), Goto(L) with L = E.Loc; Goto(L) → Open(D), Move(X)]
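
To illustrate how such prior knowledge might be written down, here is a small, purely illustrative Python representation of part of the hierarchy above (the class structure is an assumption, not the paper’s actual encoding; only the Gather branch is shown, and the Attack branch would be analogous):

    # A task schema: a parameterized task, its subtasks, and variable-binding constraints.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class TaskSchema:
        name: str                                              # e.g. "Gather(R)"
        subtasks: List["TaskSchema"] = field(default_factory=list)
        constraints: List[str] = field(default_factory=list)   # e.g. "R.Type = S.Type"

    goto = TaskSchema("Goto(L)", [TaskSchema("Open(D)"), TaskSchema("Move(X)")])
    gather = TaskSchema(
        "Gather(R)",
        subtasks=[
            TaskSchema("Collect(R)", [TaskSchema("Pickup(R)"), goto], ["L = R.Loc"]),
            TaskSchema("Deposit(R,S)", [TaskSchema("DropOff(R,S)"), goto], ["L = S.Loc"]),
        ],
        constraints=["R.Type = S.Type"],
    )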

  24. Performance of different models

  25. Introduction • Decision-Theoretic Model • Experiment with folder predictor • Incorporating Relational Hierarchies • Open Problems • Conclusion

  26. Open Problems • Partial observability for the user • Currently the user completely observes the environment • Not the case in the real world – the user need not know what is in the refrigerator • The assistant can still completely observe the world • The current system does not consider the user’s exploratory actions • The setting is similar to interactive POMDPs (Doshi et al.) • Environment – a POMDP • Belief states of the POMDP are the belief states of the user • The state space needs to be extended to capture the user’s beliefs

  27. Open Problems • Large state spaces • Solving the POMDP exactly is impractical • Kitchen domain (Fern et al.) – 140,000 states • Prune certain regions of the search space (as in Electric Elves) • Can use user trajectories as training examples • Parallel actions • The assistant and the user execute actions in parallel • Useful for executing parallel subgoals – the user writes the paper while the assistant runs experiments • Requires identifying the possible parallel actions • The assistant can change the goal stack of the user • Goal estimation has to include the user’s response

  28. Open Problems • Changing goals • The user can change goals midway – e.g., switch to a different project • Currently, the system would converge to the new goal only slowly • Explicitly model this possibility • Borrow ideas from user modelling to predict changing goals • Expanding set of goals • A large number of dishes can be cooked • Forgetting subgoals • Forgetting to attach a document to an email • Explicitly model this possibility – borrow ideas from the cognitive science literature

  29. Introduction • Decision-Theoretic Model • Experiment with folder predictor • Incorporating Relational Hierarchies • Open Problems • Conclusion

  30. Conclusion • Proposed a general framework based on decision theory • Experiments in a real-world domain • Repredictions are useful • Currently working on a relational hierarchical model • Outlined several open problems • Motivated the need for more sophisticated user models

  31. Thank you!!!
