
Learning of Mediation Strategies for Heterogeneous Agents Cooperation


Presentation Transcript


  1. Learning of Mediation Strategies for Heterogeneous Agents Cooperation. R. Charton, A. Boyer and F. Charpillet, Maia Team, LORIA, France. ICTAI'03, Sacramento, CA, USA, November 4th, 2003

  2. Context of our work. Industrial collaboration for the design of adaptive services that are multimedia, interactive, and aimed at the general public. • Focus on Information Retrieval assistance. Constraints: • User: occasional, novice • Information Source: ownership, costs. Goal: to enhance the service quality.

  3. Cooperation in heterogeneous multi-agent systems. Agents of different nature: human, software, robots… How to make these agents cooperate? Achieve together applicative goals that satisfy a subset of agents. [Diagram: controllable (C), partially controllable (P) and non-controllable (N) agents in the physical and virtual environments, connected by interaction links.]

  4. Presentation Overview • Typical example of interaction • Mediator and mediation strategies • Towards an MDP-based mediation • Our prototype of mediator • Experiments and results

  5. An example of problem: a flight booking system. Goal: book a flight from Paris to Sacramento. The customer doesn't know how to formulate a request and gets too many / raw results. [Diagram: Customer and Information Source interacting through the Mediator, which exchanges queries and results.]

  6. Role of the mediator agent. The mediator has to perform a useful task: • Build a query that best matches the user's goal • Provide relevant results to the user • Maximize a utility approximation: user satisfaction to be maximized, information source cost to be minimized. At any time, the mediator can: • Ask the user a question about the query • Send the query to the information source • Propose a limited number of results to the user. In return, it perceives the other agents' answers: values, results, selections, rejections…

  7. Mediation and mediation strategies. A mediation is a sequence of question & answer interactions between the agents, directed by the mediator. It is successful if the user gets the relevant results or if the mediator discovers that the source can't give any result. Now, the question is how to • produce the mediator's behavior? • optimize the service quality? A mediation strategy specifies which action the mediator must select to control the mediation, according to the current state of the interactions. • This requires finding an optimal mediation strategy.

  8. Progressive query building. [Figure: query precision (totally unknown, partially specified, sufficiently specified, fully specified) plotted against the number of interactions; a sufficiently specified query is a good compromise, yielding useful answers without the unuseful interactions required to fully specify the query.]

  9. Requirements for the mediator. Management of uncertainty and of imperfect knowledge: • agents: users may misunderstand the questions, users may have a partial knowledge of their needs • environment: noise during communication, imperfect sensors (for instance: speech recognition). This requires an adaptive behavior. We propose to model the mediation problem with an MDP and to compute a stochastic behavior for the mediator.

  10. Markov Decision Process (MDP). Stochastic model <S, A, T, R>: • States S = {s0, s1, s2} • Actions A = {a0, a1} • Transition T: S × A × S → [0;1] with T(s,a,s') = P(s'|s,a) • Reward R: S × A × S → IR. A decision is taken according to a policy π: S × A → [0;1], and the objective is to optimize the expected discounted reward (γ: discount factor). Computing a mediation strategy therefore amounts to computing a stochastic policy. [Diagram: example three-state MDP with its transition probabilities.]
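To make the <S, A, T, R> tuple concrete, here is a minimal sketch of such a model in Python. It is illustrative only: the transition probabilities and the reward of the toy instance are placeholders, since the slide's three-state diagram is only partially recoverable here.

```python
import random

class MDP:
    """Minimal stochastic model <S, A, T, R> with a discount factor gamma."""
    def __init__(self, states, actions, T, R, gamma=0.95):
        self.states = states      # S
        self.actions = actions    # A
        self.T = T                # T[(s, a)] = {s_next: P(s_next | s, a)}
        self.R = R                # R[(s, a, s_next)] = immediate reward
        self.gamma = gamma        # discount factor

    def step(self, s, a):
        """Sample a successor state and the associated reward."""
        successors = self.T[(s, a)]
        s_next = random.choices(list(successors), weights=list(successors.values()))[0]
        return s_next, self.R.get((s, a, s_next), 0.0)

# Toy three-state instance in the spirit of slide 10
# (probabilities and reward are placeholders, not the slide's values).
mdp = MDP(
    states=["s0", "s1", "s2"],
    actions=["a0", "a1"],
    T={("s0", "a0"): {"s1": 0.9, "s2": 0.1},
       ("s0", "a1"): {"s0": 0.5, "s1": 0.5},
       ("s1", "a0"): {"s2": 1.0},
       ("s1", "a1"): {"s1": 0.3, "s2": 0.7},
       ("s2", "a0"): {"s0": 1.0},
       ("s2", "a1"): {"s2": 0.8, "s0": 0.2}},
    R={("s1", "a0", "s2"): 1.0},
)
```

A stochastic policy π: S × A → [0;1] then gives, for each state, a probability distribution over actions, and the objective is to maximize the expected discounted return E[Σt γ^t rt].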

  11. Modeling of the flight booking example. Define the model: • S: state space • A: mediator's actions • T: transitions • R: rewards

  12. States: how to describe goals and objects? Form-filling approach (Goddeau et al. 1996): queries and source objects are described within a referential. The referential is built on a set of attributes: Ref = {At1, …, Atm}. • Example of referential: • Departure: {London, Geneva, Paris, Berlin, …} • Arrival: {Sacramento, Beijing, Moscow, …} • Class: {Business, Normal, Economic, …}

  13. State space S. The mediator's state combines the user side and the source side: S = SU × SR. • SU is the set of partial queries: sU = {(ea1, val1), …, (eam, valm)}, where the state of attribute Ati is a couple (eai, vali): Open (ea = '?', val is free), Closed (ea = 'F', val cannot be specified), Assigned (ea = 'A', val is already instantiated). • SR is the power set of all the objects of the information source: sR = {flight1, …, flightr} is the set of objects that match the current query.
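As a rough illustration of this state space (not the actual prototype's data structures), the attribute states and the two-sided state could be represented as follows; the referential values come from slide 12, and the class and constant names are assumptions:

```python
OPEN, ASSIGNED, CLOSED = "?", "A", "F"   # binding states ea of an attribute

# Referential from slide 12 (value sets truncated).
REFERENTIAL = {
    "Departure": {"London", "Geneva", "Paris", "Berlin"},
    "Arrival":   {"Sacramento", "Beijing", "Moscow"},
    "Class":     {"Business", "Normal", "Economic"},
}

class UserState:
    """s_U: one (ea, val) couple per attribute of the referential."""
    def __init__(self, referential):
        self.attributes = {name: (OPEN, None) for name in referential}

    def assign(self, name, value):
        """The user gave a value for this attribute."""
        self.attributes[name] = (ASSIGNED, value)

    def close(self, name):
        """The user cannot, or will not, specify this attribute."""
        self.attributes[name] = (CLOSED, None)

class MediationState:
    """Full state s = (s_U, s_R): the partial query plus the matching objects."""
    def __init__(self, referential):
        self.s_U = UserState(referential)
        self.s_R = None               # None = the source has not been queried yet
```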

  14. State abstraction. An idea: use a state abstraction for the MDP and only keep • the binding state {?, A, F} of each attribute from sU • the response quality qr ∈ {?, 0, +, *} from sR, where '?' means the source has not been queried yet, '0' means no response, '+' means between 1 and nr_max responses, and '*' means more than nr_max responses. The size of the full state space S is (2^n + 1) × (2 + i)^m, where n is the total count of objects of the information source, m the number of attributes, and i the average number of values per attribute. The size of the abstract state space is only 4 × 3^m.
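A possible implementation of this abstraction, building on the MediationState sketch above; the function names and the value of the nr_max threshold are assumptions:

```python
def response_quality(s_R, nr_max=10):
    """Map the source-side state s_R to the symbol qr in {?, 0, +, *}."""
    if s_R is None:
        return "?"        # the source has not been queried yet
    if len(s_R) == 0:
        return "0"        # no object matches the query
    if len(s_R) <= nr_max:
        return "+"        # few enough results to be proposed to the user
    return "*"            # too many results

def abstract_state(state, nr_max=10):
    """Project the real state onto the abstract space of size 4 * 3^m."""
    bindings = tuple(ea for (ea, _val) in state.s_U.attributes.values())
    return bindings + (response_quality(state.s_R, nr_max),)
```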

  15. Actions of the mediator. • Questions: ask the user a question about an attribute. Example for the travel class: "In which class do you want to travel?", "Do you want to travel in business class?", "Are you sure you want to travel in economic class?" • Query & propositions: send the current query to the information source; ask the user to select a response.
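A sketch of how this small, value-independent action set could be encoded (the names are illustrative, not the prototype's API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AskUser:
    """Ask the user a question about one attribute (e.g. the travel class)."""
    attribute: str

@dataclass(frozen=True)
class QuerySource:
    """Send the current partial query to the information source."""

@dataclass(frozen=True)
class ProposeResults:
    """Ask the user to select one of the retrieved results."""

def available_actions(referential):
    """One question per attribute, plus the query and proposition actions."""
    return [AskUser(name) for name in referential] + [QuerySource(), ProposeResults()]
```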

  16. Rewards. Rewards can be obtained… • from the user interaction part: +R_selection (the user selects a proposition), -R_noselect (the user refuses all the propositions), -R_timeout (too long interaction, user disconnection) • from the information source interaction part: +R_noresp (no results for a fully specified query), -R_overnum (too many results, response quality is '*').
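These reinforcements can be encoded as a small lookup over the events observed after each exchange; in the sketch below only the signs come from the slide, the magnitudes are assumptions chosen for illustration:

```python
# Signs follow slide 16; the magnitudes are illustrative assumptions.
REWARDS = {
    "selection": +10.0,   # the user selects one of the proposed results
    "noselect":   -5.0,   # the user refuses all the propositions
    "timeout":   -10.0,   # interaction too long / user disconnection
    "noresp":     +5.0,   # no result for a fully specified query
    "overnum":    -1.0,   # too many results (response quality '*')
}

def reward(event):
    """Immediate reward associated with the last observed interaction event."""
    return REWARDS.get(event, 0.0)
```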

  17. Example of mediation with the flight booking service. [Figure: sample dialogue; colors distinguish the User, Mediator, and Source turns.]

  18. Compute the mediation strategy. Problem: two parts of the model are unknown! • T = f(user, information source) • R = f(user, information source). Therefore, learn the mediation strategy by reinforcement.

  19. Reinforcement learning. [Diagram: the learner interacts with a dynamic system; it emits actions, observes the resulting transitions, and receives reinforcements (rewards).]

  20. Q-Learning (Watkins 89). • A reinforcement learning method • Can be used online. The Q-value function Q: S × A → IR is updated after each transition (Bellman 57): Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') - Q(s,a)], where α is the learning rate. [Diagram: the Q-learner receives (s, r) from the environment and emits an action a; the backup of Q(s,a) combines the values V(s') of the possible successor states.]
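A minimal tabular Q-learner implementing this update; the epsilon-greedy exploration scheme is an assumption, as the slide does not state how exploration was handled:

```python
import random
from collections import defaultdict

class QLearner:
    """Tabular Q-learning over the abstract states and the mediator's actions."""
    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.Q = defaultdict(float)          # Q[(s, a)], initialized to 0
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, s):
        """Epsilon-greedy selection (the exploration scheme is an assumption)."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(s, a)])

    def update(self, s, a, r, s_next):
        """Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
        best_next = max(self.Q[(s_next, a2)] for a2 in self.actions)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])
```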

  21. Mediator architecture. The mediator agent comprises: • a Decision Module (Q-Learning) that receives the abstract state and the rewards and returns the selected actions • a Task Manager that maintains and updates the real state • an Interaction Manager that exchanges requests, answers, selections, and results with the User / Client Agent and the Information Source Agent • a User Profile used to store & retrieve preferences.
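Putting the earlier sketches together, one mediation episode could be driven by a loop like the one below. The user and source interfaces (user.answer, user.select, source.query) and the is_fully_specified helper are assumptions about the architecture, not the prototype's actual API:

```python
def run_mediation(learner, referential, user, source, max_steps=30, nr_max=10):
    """One training episode: interact until selection, exhaustion, or step limit."""
    state = MediationState(referential)              # real state (Task Manager)
    s = abstract_state(state, nr_max)                # abstract state (Decision Module)
    for _ in range(max_steps):
        a = learner.choose(s)
        if isinstance(a, AskUser):                   # Interaction Manager, user side
            event = user.answer(a.attribute, state)  # updates state.s_U, returns an event label
        elif isinstance(a, QuerySource):             # Interaction Manager, source side
            state.s_R = source.query(state.s_U)
            if is_fully_specified(state) and not state.s_R:
                event = "noresp"                     # source exhausted: mediation can stop
            elif len(state.s_R) > nr_max:
                event = "overnum"
            else:
                event = "step"
        else:                                        # ProposeResults
            event = user.select(state.s_R)           # "selection" or "noselect"
        s_next = abstract_state(state, nr_max)
        learner.update(s, a, reward(event), s_next)  # online Q-learning update
        if event in ("selection", "noresp"):
            break                                    # end of the mediation episode
        s = s_next
```

Training would then amount to creating learner = QLearner(available_actions(REFERENTIAL)) and running many such episodes against simulated user and source agents (the timeout penalty at the step limit is omitted here for brevity).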

  22. Experimentation on the flight-booking application. We trained the mediator on tasks with: • 3 attributes (cities of departure/arrival and flight class) • 4 attributes (+ the time of day for taking off) • 5 attributes (+ the airline). Complexity grows as a function of the number of attributes.
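For reference, applying the 4 × 3^m abstract-state count from slide 14 to these configurations gives 4 × 3^3 = 108, 4 × 3^4 = 324, and 4 × 3^5 = 972 abstract states respectively, which is consistent with the slower convergence reported below for the 5-attribute task.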

  23. Learning results for 3-5 attributes (% of hits). • 3 and 4 attributes: 99% of selections (close to optimal) • 5 attributes: 90% of selections (more time required to converge).

  24. Learning results for 3-5 attributes (average mediation length). • 3 and 4 attributes: the minimal length of the mediation is reached • 5 attributes: longer mediations.

  25. Conclusion. Advantages: • MDP + RL makes it possible to learn mediation strategies • Addresses the needs of a majority of users (profiles) • Designer-oriented → user-oriented • Incremental approach • Implemented solution. Limits: • The user is only partially observable, especially through imperfect sensors such as speech recognition • Performance degrades for more complex tasks.

  26. Future work. • Use other probabilistic models and methods: learn from a pre-established policy; learn the model (Sutton's Dyna-Q, classifiers); POMDP approaches (modified Q-learning, Baxter's gradient method). • For more generic / complex tasks: abstraction & scalability (change the abstract state space for a better guidance of the process in the real state space); hierarchical decomposition (H-MDP & H-POMDP) with management of attribute dependencies (e.g. City → possible companies, specific options).

  27. Thank you for your attention. Any questions?

  28. References
(Allen et al. 2000) Allen J., Byron D., Dzikovska M., Ferguson G., Galescu L., Stent A., An Architecture for a Generic Dialogue Shell. In Natural Language Engineering, Cambridge University Press, vol. 6, 2000.
(Young 1999) Young S., Probabilistic Methods in Spoken Dialog Systems. In Royal Society, London, September 1999.
(Levin et al. 1998) Levin E., Pieraccini R. and Eckert W., Using Markov Decision Process for Learning Dialogue Strategies. In Proceedings of ICASSP'98, Seattle, USA, 1998.
(Goddeau et al. 1996) Goddeau D., Meng H., Polifroni J., Seneff S., Busayapongchaiy S., A Form-Based Dialogue Manager For Spoken Language Applications. In Proceedings of ICSLP'96, Philadelphia, 1996.
(Sutton & Barto 1998) Sutton R. S. and Barto A. G., Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
(Watkins 1989) Watkins C., Learning from Delayed Rewards. PhD Thesis, King's College, University of Cambridge, England, 1989.
(Shardanand & Maes 1995) Shardanand U. and Maes P., Social Information Filtering: Algorithms for Automating "Word of Mouth". In Proceedings of ACM CHI'95, vol. 1, pp. 210-217, 1995.

  29. A trace in the abstract state space. 0: state initialization. 1-Mediator: ask the user for Attribute 1. 1-User: don't know. 2-Mediator: send the query to the source. 2-Source: 25 answers. 3-Mediator: ask the user for Attribute 2. 3-User: correct value. 4-Mediator: send the query to the source. 4-Source: 3 answers. [Diagram: grid of abstract states <ea1, ea2 | qr> with ea in {?, A, F} and qr in {?, 0, +, *}, on which this trace is drawn.]
