
Learning of Mediation Strategies for Heterogeneous Agents Cooperation


Presentation Transcript


  1. Learning of Mediation Strategies for Heterogeneous Agents Cooperation. R. Charton, A. Boyer and F. Charpillet, Maia Team, LORIA, France. ICTAI'03, Sacramento, CA, USA, November 4th, 2003

  2. Context of our work. Industrial collaboration for the design of adaptive services that are multimedia, interactive, and aimed at the general public. • Focus on Information Retrieval assistance. Constraints: • User: occasional, novice • Information Source: ownership, costs. Goal: to enhance the service quality.

  3. Cooperation in heterogeneous multi-agent systems. Agents of different nature: human, software, robots… How to make these agents cooperate? Achieve together applicative goals that satisfy a subset of agents. [Diagram: controllable (C), partially controllable (P) and non-controllable (N) agents in the physical and virtual environments, connected by interaction links.]

  4. Presentation Overview • Typical example of interaction • Mediator and mediation strategies • Towards an MDP-based mediation • Our prototype of mediator • Experiments and results

  5. An example of problem: a flight booking system. Goal: book a flight from Paris to Sacramento. The customer doesn't know how to formulate a request and gets too many / raw results. [Diagram: Customer and Information Source interacting through the Mediator, which exchanges queries and results.]

  6. Role of the mediator agent. The mediator has to perform a useful task: • Build a query that best matches the user's goal • Provide relevant results to the user • Maximize a utility approximation: user satisfaction to be maximized, information source cost to be minimized. At any time, the mediator can: • Ask the user a question about the query • Send the query to the information source • Propose a limited number of results to the user. In return, it perceives the other agents' answers: values, results, selections, rejections…

  7. Mediation and mediation strategies. A mediation is a sequence of question & answer interactions between the agents, directed by the mediator. It is successful if the user gets the relevant results or if the mediator discovers that the source can't give any result. Now, the question is how to • produce the mediator's behavior? • optimize the service quality? A mediation strategy specifies which action the mediator must select to control the mediation, according to the current state of the interactions. • This requires finding an optimal mediation strategy.

  8. Progressive query building. [Figure: query precision (totally unknown, partially specified, sufficiently specified, fully specified) plotted against the number of interactions; a sufficiently specified query is a good compromise, yielding useful answers without the unuseful interactions required to fully specify the query.]

  9. Requirements for the mediator. Management of uncertainty and of imperfect knowledge: • agents: users may misunderstand the questions, users may have a partial knowledge of their needs • environment: noise during communication, imperfect sensors (for instance: speech recognition). This requires an adaptive behavior. We propose to model the mediation problem with an MDP and to compute a stochastic behavior for the mediator.

  10. Markov Decision Process (MDP). Stochastic model <S, A, T, R>: • States S = {s0, s1, s2} • Actions A = {a0, a1} • Transition T: S × A × S → [0;1] with T(s,a,s') = P(s'|s,a) • Reward R: S × A × S → IR. A decision is taken according to a policy π: S × A → [0;1], and the objective is to optimize the expected discounted reward (γ: discount factor). Computing a mediation strategy therefore amounts to computing a stochastic policy. [Diagram: example three-state MDP with its transition probabilities.]
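To make the <S, A, T, R> tuple concrete, here is a minimal sketch of such a model in Python. It is illustrative only: the transition probabilities and the reward of the toy instance are placeholders, since the slide's three-state diagram is only partially recoverable here.

```python
import random

class MDP:
    """Minimal stochastic model <S, A, T, R> with a discount factor gamma."""
    def __init__(self, states, actions, T, R, gamma=0.95):
        self.states = states      # S
        self.actions = actions    # A
        self.T = T                # T[(s, a)] = {s_next: P(s_next | s, a)}
        self.R = R                # R[(s, a, s_next)] = immediate reward
        self.gamma = gamma        # discount factor

    def step(self, s, a):
        """Sample a successor state and the associated reward."""
        successors = self.T[(s, a)]
        s_next = random.choices(list(successors), weights=list(successors.values()))[0]
        return s_next, self.R.get((s, a, s_next), 0.0)

# Toy three-state instance in the spirit of slide 10
# (probabilities and reward are placeholders, not the slide's values).
mdp = MDP(
    states=["s0", "s1", "s2"],
    actions=["a0", "a1"],
    T={("s0", "a0"): {"s1": 0.9, "s2": 0.1},
       ("s0", "a1"): {"s0": 0.5, "s1": 0.5},
       ("s1", "a0"): {"s2": 1.0},
       ("s1", "a1"): {"s1": 0.3, "s2": 0.7},
       ("s2", "a0"): {"s0": 1.0},
       ("s2", "a1"): {"s2": 0.8, "s0": 0.2}},
    R={("s1", "a0", "s2"): 1.0},
)
```

A stochastic policy π: S × A → [0;1] then gives, for each state, a probability distribution over actions, and the objective is to maximize the expected discounted return E[Σt γ^t rt].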

  11. Modeling of the flight booking example. Define the model: • S: state space • A: mediator's actions • T: transitions • R: rewards

  12. States: how to describe goals and objects? Form-filling approach (Goddeau et al. 1996): queries and source objects are described within a referential. The referential is built on a set of attributes: Ref = {At1, …, Atm}. • Example of referential: • Departure: {London, Geneva, Paris, Berlin, …} • Arrival: {Sacramento, Beijing, Moscow, …} • Class: {Business, Normal, Economic, …}

  13. State space S. The mediator's state combines the user side and the source side: S = SU × SR. • SU is the set of partial queries: sU = {(ea1, val1), …, (eam, valm)}, where the state of attribute Ati is a couple (eai, vali): Open (ea = '?', val is free), Closed (ea = 'F', val cannot be specified), Assigned (ea = 'A', val is already instantiated). • SR is the power set of all the objects of the information source: sR = {flight1, …, flightr} is the set of objects that match the current query.
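As a rough illustration of this state space (not the actual prototype's data structures), the attribute states and the two-sided state could be represented as follows; the referential values come from slide 12, and the class and constant names are assumptions:

```python
OPEN, ASSIGNED, CLOSED = "?", "A", "F"   # binding states ea of an attribute

# Referential from slide 12 (value sets truncated).
REFERENTIAL = {
    "Departure": {"London", "Geneva", "Paris", "Berlin"},
    "Arrival":   {"Sacramento", "Beijing", "Moscow"},
    "Class":     {"Business", "Normal", "Economic"},
}

class UserState:
    """s_U: one (ea, val) couple per attribute of the referential."""
    def __init__(self, referential):
        self.attributes = {name: (OPEN, None) for name in referential}

    def assign(self, name, value):
        """The user gave a value for this attribute."""
        self.attributes[name] = (ASSIGNED, value)

    def close(self, name):
        """The user cannot, or will not, specify this attribute."""
        self.attributes[name] = (CLOSED, None)

class MediationState:
    """Full state s = (s_U, s_R): the partial query plus the matching objects."""
    def __init__(self, referential):
        self.s_U = UserState(referential)
        self.s_R = None               # None = the source has not been queried yet
```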

  14. State abstraction. An idea: use a state abstraction for the MDP and only keep • the binding state {?, A, F} of each attribute from sU • the response quality qr ∈ {?, 0, +, *} from sR, where '?' means the source has not been queried yet, '0' means no response, '+' means between 1 and nr_max responses, and '*' means more than nr_max responses. The size of the full state space S is (2^n + 1) × (2 + i)^m, where n is the total count of objects of the information source, m the number of attributes, and i the average number of values per attribute. The size of the abstract state space is only 4 × 3^m.
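A possible implementation of this abstraction, building on the MediationState sketch above; the function names and the value of the nr_max threshold are assumptions:

```python
def response_quality(s_R, nr_max=10):
    """Map the source-side state s_R to the symbol qr in {?, 0, +, *}."""
    if s_R is None:
        return "?"        # the source has not been queried yet
    if len(s_R) == 0:
        return "0"        # no object matches the query
    if len(s_R) <= nr_max:
        return "+"        # few enough results to be proposed to the user
    return "*"            # too many results

def abstract_state(state, nr_max=10):
    """Project the real state onto the abstract space of size 4 * 3^m."""
    bindings = tuple(ea for (ea, _val) in state.s_U.attributes.values())
    return bindings + (response_quality(state.s_R, nr_max),)
```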

  15. Actions of the mediator. • Questions: ask the user a question about an attribute. Example for the travel class: "In which class do you want to travel?", "Do you want to travel in business class?", "Are you sure you want to travel in economic class?" • Query & propositions: send the current query to the information source; ask the user to select a response.
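A sketch of how this small, value-independent action set could be encoded (the names are illustrative, not the prototype's API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AskUser:
    """Ask the user a question about one attribute (e.g. the travel class)."""
    attribute: str

@dataclass(frozen=True)
class QuerySource:
    """Send the current partial query to the information source."""

@dataclass(frozen=True)
class ProposeResults:
    """Ask the user to select one of the retrieved results."""

def available_actions(referential):
    """One question per attribute, plus the query and proposition actions."""
    return [AskUser(name) for name in referential] + [QuerySource(), ProposeResults()]
```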

  16. Rewards. Rewards can be obtained… • from the user interaction part: +R_selection (the user selects a proposition), -R_noselect (the user refuses all the propositions), -R_timeout (too long interaction, user disconnection) • from the information source interaction part: +R_noresp (no results for a fully specified query), -R_overnum (too many results, response quality is '*').
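These reinforcements can be encoded as a small lookup over the events observed after each exchange; in the sketch below only the signs come from the slide, the magnitudes are assumptions chosen for illustration:

```python
# Signs follow slide 16; the magnitudes are illustrative assumptions.
REWARDS = {
    "selection": +10.0,   # the user selects one of the proposed results
    "noselect":   -5.0,   # the user refuses all the propositions
    "timeout":   -10.0,   # interaction too long / user disconnection
    "noresp":     +5.0,   # no result for a fully specified query
    "overnum":    -1.0,   # too many results (response quality '*')
}

def reward(event):
    """Immediate reward associated with the last observed interaction event."""
    return REWARDS.get(event, 0.0)
```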

  17. Example of mediation with the flight booking service. [Figure: sample dialogue; colors distinguish the User, Mediator, and Source turns.]

  18. Compute the mediation strategy. Problem: two parts of the model are unknown! • T = f(user, information source) • R = f(user, information source). Therefore, learn the mediation strategy by reinforcement.

  19. Reinforcement learning. [Diagram: the learner interacts with a dynamic system; it emits actions, observes the resulting transitions, and receives reinforcements (rewards).]

  20. Q-Learning (Watkins 89). • A reinforcement learning method • Can be used online. The Q-value function Q: S × A → IR is updated after each transition (Bellman 57): Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') - Q(s,a)], where α is the learning rate. [Diagram: the Q-learner receives (s, r) from the environment and emits an action a; the backup of Q(s,a) combines the values V(s') of the possible successor states.]
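A minimal tabular Q-learner implementing this update; the epsilon-greedy exploration scheme is an assumption, as the slide does not state how exploration was handled:

```python
import random
from collections import defaultdict

class QLearner:
    """Tabular Q-learning over the abstract states and the mediator's actions."""
    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.Q = defaultdict(float)          # Q[(s, a)], initialized to 0
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, s):
        """Epsilon-greedy selection (the exploration scheme is an assumption)."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(s, a)])

    def update(self, s, a, r, s_next):
        """Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
        best_next = max(self.Q[(s_next, a2)] for a2 in self.actions)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])
```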

  21. Mediator architecture. The mediator agent comprises: • a Decision Module (Q-Learning) that receives the abstract state and the rewards and returns the selected actions • a Task Manager that maintains and updates the real state • an Interaction Manager that exchanges requests, answers, selections, and results with the User / Client Agent and the Information Source Agent • a User Profile used to store & retrieve preferences.
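Putting the earlier sketches together, one mediation episode could be driven by a loop like the one below. The user and source interfaces (user.answer, user.select, source.query) and the is_fully_specified helper are assumptions about the architecture, not the prototype's actual API:

```python
def run_mediation(learner, referential, user, source, max_steps=30, nr_max=10):
    """One training episode: interact until selection, exhaustion, or step limit."""
    state = MediationState(referential)              # real state (Task Manager)
    s = abstract_state(state, nr_max)                # abstract state (Decision Module)
    for _ in range(max_steps):
        a = learner.choose(s)
        if isinstance(a, AskUser):                   # Interaction Manager, user side
            event = user.answer(a.attribute, state)  # updates state.s_U, returns an event label
        elif isinstance(a, QuerySource):             # Interaction Manager, source side
            state.s_R = source.query(state.s_U)
            if is_fully_specified(state) and not state.s_R:
                event = "noresp"                     # source exhausted: mediation can stop
            elif len(state.s_R) > nr_max:
                event = "overnum"
            else:
                event = "step"
        else:                                        # ProposeResults
            event = user.select(state.s_R)           # "selection" or "noselect"
        s_next = abstract_state(state, nr_max)
        learner.update(s, a, reward(event), s_next)  # online Q-learning update
        if event in ("selection", "noresp"):
            break                                    # end of the mediation episode
        s = s_next
```

Training would then amount to creating learner = QLearner(available_actions(REFERENTIAL)) and running many such episodes against simulated user and source agents (the timeout penalty at the step limit is omitted here for brevity).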

  22. Experimentation on the flight-booking application. We trained the mediator on tasks with: • 3 attributes (cities of departure/arrival and flight class) • 4 attributes (+ the time of day for taking off) • 5 attributes (+ the airline). Complexity grows as a function of the number of attributes.
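For reference, applying the 4 × 3^m abstract-state count from slide 14 to these configurations gives 4 × 3^3 = 108, 4 × 3^4 = 324, and 4 × 3^5 = 972 abstract states respectively, which is consistent with the slower convergence reported below for the 5-attribute task.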

  23. Learning results for 3-5 attributes (% of hits). • 3 and 4 attributes: 99% of selections (close to optimal) • 5 attributes: 90% of selections (more time required to converge).

  24. Learning results for 3-5 attributes (average mediation length). • 3 and 4 attributes: the minimal length of the mediation is reached • 5 attributes: longer mediations.

  25. Conclusion. Advantages: • MDP + RL makes it possible to learn mediation strategies • Addresses the needs of a majority of users (profiles) • Designer-oriented → user-oriented • Incremental approach • Implemented solution. Limits: • The user is only partially observable, especially through imperfect sensors such as speech recognition • Performance degrades for more complex tasks.

  26. Future work. • Use other probabilistic models and methods: learn from a pre-established policy; learn the model (Sutton's Dyna-Q, classifiers); POMDP approaches (modified Q-learning, Baxter's gradient method). • For more generic / complex tasks: abstraction & scalability (change the abstract state space for a better guidance of the process in the real state space); hierarchical decomposition (H-MDP & H-POMDP) with management of attribute dependencies (e.g. City → possible companies, specific options).

  27. Thank you for your attention. Any questions?

  28. References
(Allen et al. 2000) Allen J., Byron D., Dzikovska M., Ferguson G., Galescu L., Stent A., An Architecture for a Generic Dialogue Shell. In Natural Language Engineering, Cambridge University Press, vol. 6, 2000.
(Young 1999) Young S., Probabilistic Methods in Spoken Dialog Systems. In Royal Society, London, September 1999.
(Levin et al. 1998) Levin E., Pieraccini R. and Eckert W., Using Markov Decision Process for Learning Dialogue Strategies. In Proceedings of ICASSP'98, Seattle, USA, 1998.
(Goddeau et al. 1996) Goddeau D., Meng H., Polifroni J., Seneff S., Busayapongchaiy S., A Form-Based Dialogue Manager For Spoken Language Applications. In Proceedings of ICSLP'96, Philadelphia, 1996.
(Sutton & Barto 1998) Sutton R. S. and Barto A. G., Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
(Watkins 1989) Watkins C., Learning from Delayed Rewards. PhD Thesis, King's College, University of Cambridge, England, 1989.
(Shardanand & Maes 1995) Shardanand U. and Maes P., Social Information Filtering: Algorithms for Automating "Word of Mouth". In Proceedings of ACM CHI'95, vol. 1, pp. 210-217, 1995.

  29. A trace in the abstract state space. 0: state initialization. 1-Mediator: ask the user for Attribute 1. 1-User: don't know. 2-Mediator: send the query to the source. 2-Source: 25 answers. 3-Mediator: ask the user for Attribute 2. 3-User: correct value. 4-Mediator: send the query to the source. 4-Source: 3 answers. [Diagram: grid of abstract states <ea1, ea2 | qr> with ea in {?, A, F} and qr in {?, 0, +, *}, on which this trace is drawn.]
