Lecture about Agents that Learn • 3rd April 2000 • INT4/2I1235
Agenda • Introduction • Centralized learning vs decentralized learning • Credit Assignment Problem • Learning and Activity Coordination • Learning about and from other agents • Learning and Communication • Summary
Introduction • Today's topic • Who is the lecturer • Why do we have this lecture
Today's topic • How do agents learn? • What are the benefits of learning agents? • Learning in isolation, or in cooperation?
Who is the lecturer • Johan Kummeneje • Doctoral Student • RoboCup, Social Decisions, and Java
Why do we have this lecture • Beats me… You tell me. • Take 2 minutes to think about why this is interesting, and then I will ask 2 or 3 of you what you think.
Agenda • Introduction • Centralized learning vs decentralized learning • Credit Assignment Problem • Learning and Activity Coordination • Learning about and from other agents • Learning and Communication • Summary
Centralized vs Decentralized • Introduction • The Degree of Decentralization • Interaction-specific features • Involvement-specific features • Goal-specific features • The learning method • The learning feedback
Introduction • Learning process => planning, inference, decision steps etc. • Centralized learning or isolated learning • Decentralized learning or interactive learning
The Degree of Decentralization • Distributedness • Parallelism
Interaction-specific features • Level of interaction (from "simple" observation to complex negotiations and dialogues) • Persistence of interaction (short to long) • Frequency (low to high) • Pattern (unstructured to hierarchical) • Variability (fixed to dynamic)
Involvement-specific features • Relevance to the learning process • Role in the learning process • Generalist vs Specialist
Goal-specific features • Improvement (Individual vs Social) • Conflict vs Compatible Goals
The learning method • Rote learning ("korvstoppning", Swedish for cramming) • Instruction and advice • Examples and practice (learning by doing, Baden-Powell) • Analogy • Discovery. Learning effort increases from top to bottom.
The learning feedback • Supervised (the feedback specifies the best action) • Reinforcement (the feedback is a utility of the action, to be maximized) • Unsupervised (no explicit feedback)
Agenda • Introduction • Centralized learning vs decentralized learning • Credit Assignment Problem • Learning and Activity Coordination • Learning about and from other agents • Learning and Communication • Summary
Credit Assignment Problem • Inter-agent CAP (how to distribute credit among the agents involved) • Intra-agent CAP (how to distribute credit among the different actions performed within a single agent)
Agenda • Introduction • Centralized learning vs decentralized learning • Credit Assignment Problem • Learning and Activity Coordination • Learning about and from other agents • Learning and Communication • Summary
Learning and Activity Coordination • Introduction • Reinforcement Learning • Q-Learning and Learning Classifier Systems • Isolated, Concurrent Reinforcement Learners • Interactive Reinforcement Learning of Coordination • ACE and AGE
Introduction • Activity coordination • Adaptation to differences in the coordination process • Effective use of opportunities and avoidance of pitfalls.
Reinforcement Learning • Optimize the feedback (reinforcement) • Modeled as a Markov decision process <S, A, P, r>, where P: S x S x A -> [0,1] gives the transition probabilities and r is the reward function
Q-Learning • On receiving feedback, update the Q-value: • Q(s,a) <- (1-β)Q(s,a) + β(r + γ max_a' Q(s',a')) • where β is a small constant called the learning rate and γ is the discount factor
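The update rule above can be sketched as a few lines of Python (the tabular dictionary representation and the parameter values are my own illustration, not from the lecture):

```python
def q_update(Q, s, a, r, s_next, actions, beta=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- (1-beta)*Q(s,a) + beta*(r + gamma * max_a' Q(s',a'))."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = (1 - beta) * Q.get((s, a), 0.0) + beta * (r + gamma * best_next)
    return Q[(s, a)]

# Starting from an empty table, one rewarded step yields Q(s,a) = beta * r:
Q = {}
q_update(Q, "s0", "go", 1.0, "s1", ["go", "stay"])  # -> 0.1
```

Because unseen state-action pairs default to 0, the first update simply moves the estimate a fraction β toward the observed return.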
Learning Classifier Systems • A classifier is a (condition, action) pair • Each classifier has a strength S(c,a) at a given time • At each timestep a classifier is chosen from the match set (the classifiers whose conditions match the environment) • Feedback is received and the strength S is modified accordingly.
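A minimal sketch of the classifier cycle just described (the dictionary representation, the deterministic selection, and the update rate are assumptions of mine):

```python
def match_set(classifiers, state):
    # the match set: classifiers whose condition matches the current state
    return [c for c in classifiers if c["cond"](state)]

def select(ms):
    # deterministic variant for this sketch: pick the strongest matching classifier
    return max(ms, key=lambda c: c["strength"])

def reinforce(c, reward, rate=0.2):
    # move the classifier's strength S toward the received feedback
    c["strength"] += rate * (reward - c["strength"])

classifiers = [
    {"cond": lambda s: s > 0, "action": "push", "strength": 0.5},
    {"cond": lambda s: True,  "action": "wait", "strength": 0.3},
]
chosen = select(match_set(classifiers, 1))  # both match; "push" is stronger
reinforce(chosen, reward=1.0)               # strength 0.5 -> 0.6
```

Real classifier systems select stochastically in proportion to strength; the deterministic `max` keeps the sketch short.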
Isolated, Concurrent Reinforcement Learners • Agent coupling • Agent relationships • Feedback timing • Optimal behaviour combinations • CIRL • No modelling of other agents • In cooperative situations, complementary policies can be developed • Adapts to similar situations.
Interactive Reinforcement Learning of Coordination • Eliminates incompatible actions • Agents can observe the set of actions the other agents are considering • Two alternatives are ACE and AGE
Action Estimation Algorithm (ACE) • Each agent computes its set of performable actions • For each of these actions the agent estimates the goal relevance E(S) • For every action with goal relevance above a threshold, the agent calculates and announces a bid with a risk factor α and a noise term β: • B(S) = (α + β)E(S) • Incompatible actions are removed; the action with the highest bid is then executed • The feedback increases the probability that successful actions are performed again in the future.
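The bidding step of ACE can be sketched as follows (the function names, the uniform noise model, and the parameter values are illustrative assumptions; the slide only fixes the form B(S) = (α + β)E(S)):

```python
import random

def ace_bids(relevance, threshold, risk=0.1, noise=0.0, rng=random):
    # relevance: {action: goal-relevance estimate E(S)}
    # an agent bids only for actions whose goal relevance exceeds the threshold
    return {a: (risk + rng.uniform(0, noise)) * e
            for a, e in relevance.items() if e > threshold}

def choose_action(bids, compatible):
    # remove incompatible actions, then execute the highest remaining bid
    feasible = {a: b for a, b in bids.items() if a in compatible}
    return max(feasible, key=feasible.get)

bids = ace_bids({"pass": 1.0, "shoot": 0.5, "wait": 0.1}, threshold=0.2)
best = choose_action(bids, compatible={"pass", "shoot"})  # -> "pass"
```

With `noise=0` the example is deterministic: "wait" falls below the threshold, and "pass" wins on the largest bid.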
Action Group Estimation Algorithm (AGE) • The applicable actions of all agents are collected into all possible activity contexts, in which all actions are mutually compatible. • Using the same bidding strategy as ACE, the activity context with the highest sum of bids is chosen for execution. • Credit assignment depends on the actions performed and their relevance. • Requires more computational effort than ACE.
Agenda • Introduction • Centralized learning vs decentralized learning • Credit Assignment Problem • Learning and Activity Coordination • Learning about and from other agents • Learning and Communication • Summary
Learning about and from other agents • Introduction • Learning Organizational Roles • Learning in Market Environments
Introduction • Learning to improve individual performance • At the expense of other agents • Anticipatory agents, RMM
Learning Organizational Roles • Agents learn roles in order to better complement each other. • Each agent can take on roles from a given set (one at a time), and the choice is to pick the most appropriate role (minimizing cost). • The role utility is a function f(U, P, C, Potential)
Learning in Market Environments • Agents buy and sell information from each other. • 0-level agents do not model other agents • 1-level agents model other agents as 0-level agents • 2-level agents model other agents as 1-level agents
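The nested-modelling hierarchy can be illustrated with a toy recursive bidder (the undercutting rule and the prices are purely hypothetical; the lecture only defines the modelling levels):

```python
def bid(level, base_price=10.0, margin=1.0):
    # a 0-level agent models nobody and simply quotes a fixed price;
    # a k-level agent predicts the bid of a (k-1)-level opponent and undercuts it
    if level == 0:
        return base_price
    return bid(level - 1, base_price, margin) - margin

prices = [bid(k) for k in range(3)]  # deeper opponent modelling -> lower bids
```

Each extra level of modelling lets an agent react to the predicted behaviour one level down, which is the essence of the 0/1/2-level distinction above.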
Agenda • Introduction • Centralized learning vs decentralized learning • Credit Assignment Problem • Learning and Activity Coordination • Learning about and from other agents • Learning and Communication • Summary
Learning and Communication • Introduction • Reducing Communication by Learning • Improving Learning by Communication
Introduction • Learning to communicate • Communicating as learning • What to communicate? • When to communicate? • With whom to communicate? • How to communicate?
Reducing Communication by Learning • Learning about the abilities of other agents • Learning which agents to ask, instead of broadcasting • Exploiting problem similarities
Improving Learning by Communication • Communicating beliefs and pieces of information • Explanation • Ontologies • Finding out complex relationships between different agents and actions.
Agenda • Introduction • Centralized learning vs decentralized learning • Credit Assignment Problem • Learning and Activity Coordination • Learning about and from other agents • Learning and Communication • Summary
Summary • We have seen the focus shift from isolated (individual, centralized) learning to a more diverse range of learning approaches. • Besides standard (older) ML methods, several new ML algorithms have been proposed. • Agents learn to improve communication and cooperation.
Further reading • Peter Stone, PhD thesis • Weiss (course material), chapter 6 • Russell and Norvig, Artificial Intelligence: A Modern Approach