Learning Agents • Presented by: Huayan Gao (huayan.gao@uconn.edu), Thibaut Jahan (thj@ifrance.com), David Keil (dmkeil@att.net), Jian Lian (lianjian@yahoo.com) Students in CSE 333 Distributed Component Systems Prof. Steven A. Demurjian, Sr. Computer Science & Engineering Department The University of Connecticut
Outline • Agents • Distributed computing agents • The JADE platform • Reinforcement learning • UML design of agents • The maze problem • Conclusion and future work
Agents • Some general features characterizing agents: • autonomy • goal-orientedness • collaboration • flexibility • ability to be self-starting • temporal continuity • character • adaptiveness • mobility • capacity to learn
Classification of agents • Interface agents: use AI techniques to provide assistance to the user • Mobile agents: capable of moving around networks gathering information • Co-operative agents: communicate with, and react to, other agents in a multi-agent system within a common environment • Reactive agents: react to a stimulus or input governed by some state or event in their environment
Distributed Computing Agents • Common learning goal (strong sense) • Separate goals but information sharing (weak sense)
The JADE Platform • Java Agent DEvelopment Framework - Java software framework - middleware platform - simplifies implementation and deployment of MAS • Services provided - AMS (Agent Management System): registration, directory, and management - DF (Directory Facilitator): yellow-pages service - ACC (Agent Communication Channel): message-passing service within the platform (including remote agents)
Agents and Markov processes • Agent type by environment type: - accessible, deterministic: reflex agent - accessible, stochastic: solves MDPs - inaccessible, deterministic: policy-based agent - inaccessible, stochastic: solves non-Markov POMDPs* (*partially observable Markov decision problems)
Learning from the environment • The environment, especially a distributed one, may be complex and may change • Necessity to learn dynamically, without supervision • Reinforcement learning - used in adaptive systems - involves finding a policy • Q-learning, a special case of RL - computes Q-values into a Q-table - finds an optimal policy
Policy search • Policy: a mapping from states to actions • A policy contrasts with a fixed, precomputed action sequence • Agents that precompute action sequences cannot respond to new sensory information • An agent that follows a policy incorporates sensory information about the current state into its choice of action (see the sketch below)
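A minimal sketch of a policy as an explicit state-to-action table, written in Java (our own illustration; the Policy and TabularPolicy names are hypothetical and not from the project code):

    import java.util.HashMap;
    import java.util.Map;

    // A policy maps each observed state to an action (hypothetical types).
    interface Policy<S, A> {
        A actionFor(S state);
    }

    // Simple table-backed policy: unlike a precomputed action sequence,
    // the chosen action depends on the state actually sensed at decision time.
    class TabularPolicy<S, A> implements Policy<S, A> {
        private final Map<S, A> table = new HashMap<>();
        private final A defaultAction;

        TabularPolicy(A defaultAction) { this.defaultAction = defaultAction; }

        void set(S state, A action) { table.put(state, action); }

        public A actionFor(S state) { return table.getOrDefault(state, defaultAction); }
    }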
Components of a learner • In learning, percepts may help improve the agent's future success in interaction • Components: - Learning element: improves the policy - Performance element: executes the policy - Critic: applies a fixed performance measure - Problem generator: suggests experimental actions that will provide information to the learning element (see the interface sketch below)
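Purely as an illustration (these interface names are our own, not part of the project's design), the four components might be sketched in Java as:

    // Hypothetical interfaces for the components of a learning agent.
    interface PerformanceElement { int selectAction(int state); }             // executes the current policy
    interface LearningElement    { void improve(double feedback); }           // updates the policy
    interface Critic             { double evaluate(int state); }              // applies a fixed performance measure
    interface ProblemGenerator   { int suggestExploratoryAction(int state); } // proposes informative experiments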
Temporal difference learning • Uses observed transitions and differences between utilities of successive states to adjust utility estimates • Update rule based on a transition from state i to state j: U(i) ← U(i) + α(R(i) + U(j) − U(i)) where - U is the estimated utility - R is the reward - α is the learning rate
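A minimal Java sketch of this update over an array of utility estimates (the class name, state indexing, and parameters are illustrative assumptions, not the project's code):

    // Temporal-difference update of utility estimates, applied after observing
    // a transition from state i to state j with reward r received at state i.
    class TDLearner {
        private final double[] utility;   // U(i): one estimate per state
        private final double alpha;       // learning rate

        TDLearner(int numStates, double alpha) {
            this.utility = new double[numStates];
            this.alpha = alpha;
        }

        // U(i) <- U(i) + alpha * (R(i) + U(j) - U(i))
        void observeTransition(int i, int j, double r) {
            utility[i] += alpha * (r + utility[j] - utility[i]);
        }
    }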
Q-learning • Q-learning: a variant of reinforcement learning in which the agent incrementally computes a table of expected aggregate future rewards • The agent modifies the values in the table to refine its estimates • Using the temporal-difference approach, the update is applied after the learner goes from state i to state j by action a: Q(a, i) ← Q(a, i) + α(R(i) + max_a′ Q(a′, j) − Q(a, i))
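A minimal sketch of this tabular update in Java (integer state/action encodings and the class name are our own assumptions; this is not the project's implementation):

    // Tabular Q-learning: q[state][action] holds the expected aggregate
    // future reward for taking that action in that state.
    class QLearner {
        final double[][] q;    // the Q-table
        final double alpha;    // learning rate

        QLearner(int numStates, int numActions, double alpha) {
            this.q = new double[numStates][numActions];
            this.alpha = alpha;
        }

        // Applied after going from state i to state j via action a,
        // with reward r observed at state i:
        //   Q(a, i) <- Q(a, i) + alpha * (R(i) + max_a' Q(a', j) - Q(a, i))
        void update(int a, int i, int j, double r) {
            double best = q[j][0];
            for (double v : q[j]) best = Math.max(best, v);
            q[i][a] += alpha * (r + best - q[i][a]);
        }
    }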
Q-values • Definition: Q-values are values Q(a, i) of the expected utility associated with a given action in a given state • Utility of a state: U(i) = max_a Q(a, i) • Q-values permit decision making without a transition model • Q-values are directly learnable from reward percepts
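Continuing the illustrative QLearner sketch above, the state utility and a greedy policy can be read directly off the Q-table, with no transition model:

    // Further methods for the illustrative QLearner class above.

    // Utility of a state: U(i) = max_a Q(a, i)
    double utility(int i) {
        double best = q[i][0];
        for (double v : q[i]) best = Math.max(best, v);
        return best;
    }

    // Greedy policy derived from the Q-table: pick the highest-valued action.
    int greedyAction(int i) {
        int best = 0;
        for (int a = 1; a < q[i].length; a++)
            if (q[i][a] > q[i][best]) best = a;
        return best;
    }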
UML design of agents • Standard UML does not provide a complete solution for depicting the design of multi-agent systems • Because multi-agent systems are both actors and software, their design does not follow typical UML practice • Goals, complex strategies, knowledge, etc. are often not captured
A maze problem • A simple example consisting of a maze for which the learner must find a policy, where the reward is determined by eventually reaching or not reaching a goal location in the maze • The original problem definition may be modified by permitting multiple distributed agents that communicate, either directly or via the environment • A sketch of such an environment follows
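A minimal Java sketch of such a maze environment (the grid representation, reward values, and method names are illustrative assumptions, not the project's code):

    // A grid maze: the learner moves until it reaches the goal cell.
    class MazeEnvironment {
        static final int[] DR = {-1, 1, 0, 0};   // actions: up, down, left, right
        static final int[] DC = {0, 0, -1, 1};

        final boolean[][] wall;                  // true where movement is blocked
        final int goalRow, goalCol;
        int row, col;                            // learner's current position

        MazeEnvironment(boolean[][] wall, int goalRow, int goalCol) {
            this.wall = wall; this.goalRow = goalRow; this.goalCol = goalCol;
        }

        // Apply an action; the reward is positive only when the goal is reached.
        double step(int action) {
            int nr = row + DR[action], nc = col + DC[action];
            boolean inside = nr >= 0 && nr < wall.length && nc >= 0 && nc < wall[0].length;
            if (inside && !wall[nr][nc]) { row = nr; col = nc; }
            return atGoal() ? 1.0 : 0.0;
        }

        boolean atGoal() { return row == goalRow && col == goalCol; }

        int stateIndex() { return row * wall[0].length + col; }  // for indexing a Q-table
    }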
Cat and Mouse problem • Example of reinforcement learning • The rules of the Cat and Mouse game are: - cat catches mouse; - mouse escapes cat; - mouse catches cheese; - the game is over when the cat catches the mouse • Source: T. Eden, A. Knittel, R. van Uffelen. Reinforcement learning. www.cse.unsw.edu.au/~aek/catmouse • Our project included modifying existing Java code to enable remote deployment of learning agents and to begin exploration of a multiagent version
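These rules map naturally onto a reward signal for the mouse; below is a sketch with purely illustrative reward values (the cited code may use different values and structure):

    // Illustrative reward signal for the mouse in the Cat and Mouse game.
    class CatMouseRewards {
        static final double CAUGHT_BY_CAT = -100.0;   // cat catches mouse: strong penalty, episode ends
        static final double ATE_CHEESE    =   50.0;   // mouse catches cheese
        static final double ESCAPED_CAT   =    1.0;   // small bonus for getting away from the cat
        static final double STEP_COST     =   -0.1;   // mild pressure to act efficiently

        static double rewardFor(boolean caught, boolean ateCheese, boolean escaped) {
            if (caught) return CAUGHT_BY_CAT;
            double r = STEP_COST;
            if (ateCheese) r += ATE_CHEESE;
            if (escaped) r += ESCAPED_CAT;
            return r;
        }
    }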
The JADE cat agent looks up the maze agent through the AMS and DF services
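A sketch of how the cat agent could query the DF (yellow pages) for a maze service using the standard JADE API (the "maze" service type string is an assumption of this example):

    import jade.core.AID;
    import jade.core.Agent;
    import jade.domain.DFService;
    import jade.domain.FIPAException;
    import jade.domain.FIPAAgentManagement.DFAgentDescription;
    import jade.domain.FIPAAgentManagement.ServiceDescription;

    // Cat agent: query the Directory Facilitator for a maze service.
    public class CatAgent extends Agent {
        protected void setup() {
            DFAgentDescription template = new DFAgentDescription();
            ServiceDescription sd = new ServiceDescription();
            sd.setType("maze");                       // assumed service type
            template.addServices(sd);
            try {
                DFAgentDescription[] results = DFService.search(this, template);
                if (results.length > 0) {
                    AID mazeAgent = results[0].getName();
                    System.out.println(getLocalName() + " found maze: " + mazeAgent.getName());
                }
            } catch (FIPAException e) {
                e.printStackTrace();
            }
        }
    }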
The JADE mouse agent is created and registers its service with the DF
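Correspondingly, a sketch of an agent registering its service with the DF at startup and deregistering on shutdown (the type and name strings are illustrative):

    import jade.core.Agent;
    import jade.domain.DFService;
    import jade.domain.FIPAException;
    import jade.domain.FIPAAgentManagement.DFAgentDescription;
    import jade.domain.FIPAAgentManagement.ServiceDescription;

    // Mouse agent: advertise itself in the DF so maze/cat agents can discover it.
    public class MouseAgent extends Agent {
        protected void setup() {
            DFAgentDescription dfd = new DFAgentDescription();
            dfd.setName(getAID());
            ServiceDescription sd = new ServiceDescription();
            sd.setType("mouse");                      // assumed service type
            sd.setName(getLocalName() + "-mouse");
            dfd.addServices(sd);
            try {
                DFService.register(this, dfd);
            } catch (FIPAException e) {
                e.printStackTrace();
            }
        }

        protected void takeDown() {
            try { DFService.deregister(this); } catch (FIPAException e) { e.printStackTrace(); }
        }
    }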
Game begins • The maze (master) agent and the mouse agents exchange information by ACL messages
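A sketch of such an exchange using JADE ACL messages inside a cyclic behaviour (the receiver name "maze" and the message content are assumptions of this example):

    import jade.core.AID;
    import jade.core.behaviours.CyclicBehaviour;
    import jade.lang.acl.ACLMessage;

    // Behaviour added to the mouse agent: report a move and react to the reply.
    class ExchangeMoves extends CyclicBehaviour {
        public void action() {
            // Inform the maze master of the chosen move.
            ACLMessage msg = new ACLMessage(ACLMessage.INFORM);
            msg.addReceiver(new AID("maze", AID.ISLOCALNAME));   // assumed local agent name
            msg.setContent("move north");                        // illustrative content
            myAgent.send(msg);

            // Wait for the maze master's update (e.g., new positions, reward).
            ACLMessage reply = myAgent.blockingReceive();
            if (reply != null) {
                System.out.println("Maze says: " + reply.getContent());
            }
        }
    }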
Remote deployment of learning agents • Using JADE, we can deploy maze, mouse, and cat agents: Jademaze maze1 Jademouse mouse1 Jadecat cat1 • Jademaze, jademouse, and jadecat are batch files that deploy the maze, mouse, and cat agents. To create them from a remote PC, we use the following commands: Jademaze -host hostname mazename; Jadecat -host hostname catname; Jademouse -host hostname mousename;
Cat-Mouse in JADE • JADE allows services to be hosted and discovered in a distributed, dynamic environment • On top of those "basic" services, mouse and cat agents can discover the maze/mouse/cat services provided and join or quit the maze server they find through the DF service
Innovation • A backbone for a core platform encouraging other agents to connect and join • Access to ontologies and service description to move towards interoperability at the service level • A baseline set of deployed agent services that can be used as building blocks by application developers to create innovative value added services • A practical test for a learning agent system complying with FIPA standards.
Deployment Scenario • Infrastructure deployment - enables developers' agents to interact with service agents developed by others - tests applications in a realistic, distributed, open environment • Agent and service deployment - FIPA ACL messages to exchange information - standard FIPA-ACL-compatible content languages - FIPA-defined agent management services (directories, communication, and naming)
Conclusions • Demonstration of a feasible research approach exploring the relationship between reinforcement learning and deployment of component-based distributed agents • Communication between agents • Issues with the space complexity of Q-learning: where n = grid size, m = # mice, c = # cats, space complexity is 64·n^(2(m+c+1)) bytes; 1 mouse + 1 cat => 481 MB of memory storage for the Q-table
Future work • Learning in environments that change in response to the learning agent • Communication among learning agents; multiagent learning • Overcoming problems of table size under multiagent conditions • Security in message-passing
Partial list of references • S. Flake, C. Geiger, J. Kuster. Towards UML-based analysis and design of multi-agent systems. ENAIS’2001. • T. Mitchell. Machine learning. McGraw-Hill, 1997. • A. Printista, M. Errecalde, C. Montoya. A parallel implementation of Q-learning based on communication with cache. http://journal.info.unlp.edu.ar/journal6/papers/p4.pdf. • S. Russell, P. Norvig. Artificial intelligence: A modern approach. Prentice Hall, 1995. • S. Sen, G. Weiss. Learning in multiagent systems. In G. Weiss, ed., Multiagent systems: A modern approach to distributed artificial intelligence, MIT Press, 1999. • R. Sutton, A. Barto. Reinforcement learning: An introduction. MIT Press, 1998. • K. Sycara, A. Pannu, M. Williamson, D. Zeng, K. Decker. Distributed intelligent agents. IEEE Expert, December 1996.