Learning Agents • Presented by: Huayan Gao (huayan.gao@uconn.edu), Thibaut Jahan (thj@ifrance.com), David Keil (dmkeil@att.net), Jian Lian (lianjian@yahoo.com) Students in CSE 333 Distributed Component Systems Prof. Steven A. Demurjian, Sr. Computer Science & Engineering Department The University of Connecticut
Outline • Agents • Distributed computing agents • The JADE platform • Reinforcement learning • UML design of agents • The maze problem • Conclusion and future work
Agents • Some general features characterizing agents: • autonomy • goal-orientedness • collaboration • flexibility • ability to be self-starting • temporal continuity • character • adaptiveness • mobility • capacity to learn
Classification of agents • Interface agents: use AI techniques to provide assistance to the user • Mobile agents: capable of moving around networks gathering information • Co-operative agents: communicate with, and react to, other agents in a multi-agent system within a common environment • Reactive agents: react to a stimulus or input governed by some state or event in their environment
Distributed Computing Agents • Common learning goal (strong sense) • Separate goals but information sharing (weak sense)
The JADE Platform • Java Agent DEvelopment Framework - Java software framework - middleware platform - simplifies implementation and deployment of MAS • Services provided - AMS (Agent Management System): registration, directory, and management - DF (Directory Facilitator): yellow-pages service - ACC (Agent Communication Channel): message-passing service within the platform (including remote agents)
Agents and Markov processes • Agent type by environment type: - accessible, deterministic: reflex agent - accessible, stochastic: solves MDPs - inaccessible, deterministic: policy-based agent - inaccessible, stochastic: solves non-Markov POMDPs* (*partially observable Markov decision problems)
Learning from the environment • The environment, especially a distributed one, may be complex and may change • Necessity to learn dynamically, without supervision • Reinforcement learning - used in adaptive systems - involves finding a policy • Q-learning, a special case of RL - computes Q-values into a Q-table - finds an optimal policy
Policy search • Policy: a mapping from states to actions • A policy contrasts with a fixed, precomputed action sequence • Agents that precompute action sequences cannot respond to new sensory information • An agent that follows a policy incorporates sensory information about the current state into its choice of action (see the sketch below)
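A minimal sketch of a policy as an explicit state-to-action table, written in Java (our own illustration; the Policy and TabularPolicy names are hypothetical and not from the project code):

    import java.util.HashMap;
    import java.util.Map;

    // A policy maps each observed state to an action (hypothetical types).
    interface Policy<S, A> {
        A actionFor(S state);
    }

    // Simple table-backed policy: unlike a precomputed action sequence,
    // the chosen action depends on the state actually sensed at decision time.
    class TabularPolicy<S, A> implements Policy<S, A> {
        private final Map<S, A> table = new HashMap<>();
        private final A defaultAction;

        TabularPolicy(A defaultAction) { this.defaultAction = defaultAction; }

        void set(S state, A action) { table.put(state, action); }

        public A actionFor(S state) { return table.getOrDefault(state, defaultAction); }
    }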
Components of a learner • In learning, percepts may help improve the agent's future success in interaction • Components: - Learning element: improves the policy - Performance element: executes the policy - Critic: applies a fixed performance measure - Problem generator: suggests experimental actions that will provide information to the learning element (see the interface sketch below)
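Purely as an illustration (these interface names are our own, not part of the project's design), the four components might be sketched in Java as:

    // Hypothetical interfaces for the components of a learning agent.
    interface PerformanceElement { int selectAction(int state); }             // executes the current policy
    interface LearningElement    { void improve(double feedback); }           // updates the policy
    interface Critic             { double evaluate(int state); }              // applies a fixed performance measure
    interface ProblemGenerator   { int suggestExploratoryAction(int state); } // proposes informative experiments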
Temporal difference learning • Uses observed transitions and differences between utilities of successive states to adjust utility estimates • Update rule based on a transition from state i to state j: U(i) ← U(i) + α(R(i) + U(j) − U(i)) where - U is the estimated utility - R is the reward - α is the learning rate
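A minimal Java sketch of this update over an array of utility estimates (the class name, state indexing, and parameters are illustrative assumptions, not the project's code):

    // Temporal-difference update of utility estimates, applied after observing
    // a transition from state i to state j with reward r received at state i.
    class TDLearner {
        private final double[] utility;   // U(i): one estimate per state
        private final double alpha;       // learning rate

        TDLearner(int numStates, double alpha) {
            this.utility = new double[numStates];
            this.alpha = alpha;
        }

        // U(i) <- U(i) + alpha * (R(i) + U(j) - U(i))
        void observeTransition(int i, int j, double r) {
            utility[i] += alpha * (r + utility[j] - utility[i]);
        }
    }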
Q-learning • Q-learning: a variant of reinforcement learning in which the agent incrementally computes a table of expected aggregate future rewards • The agent modifies the values in the table to refine its estimates • Using the temporal-difference approach, the update is applied after the learner goes from state i to state j by action a: Q(a, i) ← Q(a, i) + α(R(i) + max_a′ Q(a′, j) − Q(a, i))
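A minimal sketch of this tabular update in Java (integer state/action encodings and the class name are our own assumptions; this is not the project's implementation):

    // Tabular Q-learning: q[state][action] holds the expected aggregate
    // future reward for taking that action in that state.
    class QLearner {
        final double[][] q;    // the Q-table
        final double alpha;    // learning rate

        QLearner(int numStates, int numActions, double alpha) {
            this.q = new double[numStates][numActions];
            this.alpha = alpha;
        }

        // Applied after going from state i to state j via action a,
        // with reward r observed at state i:
        //   Q(a, i) <- Q(a, i) + alpha * (R(i) + max_a' Q(a', j) - Q(a, i))
        void update(int a, int i, int j, double r) {
            double best = q[j][0];
            for (double v : q[j]) best = Math.max(best, v);
            q[i][a] += alpha * (r + best - q[i][a]);
        }
    }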
Q-values • Definition: Q-values are values Q(a, i) of the expected utility associated with a given action in a given state • Utility of a state: U(i) = max_a Q(a, i) • Q-values permit decision making without a transition model • Q-values are directly learnable from reward percepts
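Continuing the illustrative QLearner sketch above, the state utility and a greedy policy can be read directly off the Q-table, with no transition model:

    // Further methods for the illustrative QLearner class above.

    // Utility of a state: U(i) = max_a Q(a, i)
    double utility(int i) {
        double best = q[i][0];
        for (double v : q[i]) best = Math.max(best, v);
        return best;
    }

    // Greedy policy derived from the Q-table: pick the highest-valued action.
    int greedyAction(int i) {
        int best = 0;
        for (int a = 1; a < q[i].length; a++)
            if (q[i][a] > q[i][best]) best = a;
        return best;
    }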
UML design of agents • Standard UML does not provide a complete solution for depicting the design of multi-agent systems • Because multi-agent systems are both actors and software, their design does not follow typical UML practice • Goals, complex strategies, knowledge, etc. are often not captured
A maze problem • A simple example consisting of a maze for which the learner must find a policy, where the reward is determined by eventually reaching or not reaching a goal location in the maze • The original problem definition may be modified by permitting multiple distributed agents that communicate, either directly or via the environment • A sketch of such an environment follows
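A minimal Java sketch of such a maze environment (the grid representation, reward values, and method names are illustrative assumptions, not the project's code):

    // A grid maze: the learner moves until it reaches the goal cell.
    class MazeEnvironment {
        static final int[] DR = {-1, 1, 0, 0};   // actions: up, down, left, right
        static final int[] DC = {0, 0, -1, 1};

        final boolean[][] wall;                  // true where movement is blocked
        final int goalRow, goalCol;
        int row, col;                            // learner's current position

        MazeEnvironment(boolean[][] wall, int goalRow, int goalCol) {
            this.wall = wall; this.goalRow = goalRow; this.goalCol = goalCol;
        }

        // Apply an action; the reward is positive only when the goal is reached.
        double step(int action) {
            int nr = row + DR[action], nc = col + DC[action];
            boolean inside = nr >= 0 && nr < wall.length && nc >= 0 && nc < wall[0].length;
            if (inside && !wall[nr][nc]) { row = nr; col = nc; }
            return atGoal() ? 1.0 : 0.0;
        }

        boolean atGoal() { return row == goalRow && col == goalCol; }

        int stateIndex() { return row * wall[0].length + col; }  // for indexing a Q-table
    }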
Cat and Mouse problem • Example of reinforcement learning • The rules of the Cat and Mouse game are: - cat catches mouse; - mouse escapes cat; - mouse catches cheese; - the game is over when the cat catches the mouse • Source: T. Eden, A. Knittel, R. van Uffelen. Reinforcement learning. www.cse.unsw.edu.au/~aek/catmouse • Our project included modifying existing Java code to enable remote deployment of learning agents and to begin exploration of a multiagent version
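These rules map naturally onto a reward signal for the mouse; below is a sketch with purely illustrative reward values (the cited code may use different values and structure):

    // Illustrative reward signal for the mouse in the Cat and Mouse game.
    class CatMouseRewards {
        static final double CAUGHT_BY_CAT = -100.0;   // cat catches mouse: strong penalty, episode ends
        static final double ATE_CHEESE    =   50.0;   // mouse catches cheese
        static final double ESCAPED_CAT   =    1.0;   // small bonus for getting away from the cat
        static final double STEP_COST     =   -0.1;   // mild pressure to act efficiently

        static double rewardFor(boolean caught, boolean ateCheese, boolean escaped) {
            if (caught) return CAUGHT_BY_CAT;
            double r = STEP_COST;
            if (ateCheese) r += ATE_CHEESE;
            if (escaped) r += ESCAPED_CAT;
            return r;
        }
    }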
The JADE cat agent looks up the maze agent through the AMS and DF services
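A sketch of how the cat agent could query the DF (yellow pages) for a maze service using the standard JADE API (the "maze" service type string is an assumption of this example):

    import jade.core.AID;
    import jade.core.Agent;
    import jade.domain.DFService;
    import jade.domain.FIPAException;
    import jade.domain.FIPAAgentManagement.DFAgentDescription;
    import jade.domain.FIPAAgentManagement.ServiceDescription;

    // Cat agent: query the Directory Facilitator for a maze service.
    public class CatAgent extends Agent {
        protected void setup() {
            DFAgentDescription template = new DFAgentDescription();
            ServiceDescription sd = new ServiceDescription();
            sd.setType("maze");                       // assumed service type
            template.addServices(sd);
            try {
                DFAgentDescription[] results = DFService.search(this, template);
                if (results.length > 0) {
                    AID mazeAgent = results[0].getName();
                    System.out.println(getLocalName() + " found maze: " + mazeAgent.getName());
                }
            } catch (FIPAException e) {
                e.printStackTrace();
            }
        }
    }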
The JADE mouse agent is created and registers its service with the DF
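Correspondingly, a sketch of an agent registering its service with the DF at startup and deregistering on shutdown (the type and name strings are illustrative):

    import jade.core.Agent;
    import jade.domain.DFService;
    import jade.domain.FIPAException;
    import jade.domain.FIPAAgentManagement.DFAgentDescription;
    import jade.domain.FIPAAgentManagement.ServiceDescription;

    // Mouse agent: advertise itself in the DF so maze/cat agents can discover it.
    public class MouseAgent extends Agent {
        protected void setup() {
            DFAgentDescription dfd = new DFAgentDescription();
            dfd.setName(getAID());
            ServiceDescription sd = new ServiceDescription();
            sd.setType("mouse");                      // assumed service type
            sd.setName(getLocalName() + "-mouse");
            dfd.addServices(sd);
            try {
                DFService.register(this, dfd);
            } catch (FIPAException e) {
                e.printStackTrace();
            }
        }

        protected void takeDown() {
            try { DFService.deregister(this); } catch (FIPAException e) { e.printStackTrace(); }
        }
    }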
Game begins • The maze (master) agent and the mouse agents exchange information by ACL messages
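A sketch of such an exchange using JADE ACL messages inside a cyclic behaviour (the receiver name "maze" and the message content are assumptions of this example):

    import jade.core.AID;
    import jade.core.behaviours.CyclicBehaviour;
    import jade.lang.acl.ACLMessage;

    // Behaviour added to the mouse agent: report a move and react to the reply.
    class ExchangeMoves extends CyclicBehaviour {
        public void action() {
            // Inform the maze master of the chosen move.
            ACLMessage msg = new ACLMessage(ACLMessage.INFORM);
            msg.addReceiver(new AID("maze", AID.ISLOCALNAME));   // assumed local agent name
            msg.setContent("move north");                        // illustrative content
            myAgent.send(msg);

            // Wait for the maze master's update (e.g., new positions, reward).
            ACLMessage reply = myAgent.blockingReceive();
            if (reply != null) {
                System.out.println("Maze says: " + reply.getContent());
            }
        }
    }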
Remote deployment of learning agents • Using JADE, we can deploy maze, mouse, and cat agents: Jademaze maze1 Jademouse mouse1 Jadecat cat1 • Jademaze, jademouse, and jadecat are batch files that deploy the maze, mouse, and cat agents. To create them from a remote PC, we use the following commands: Jademaze -host hostname mazename; Jadecat -host hostname catname; Jademouse -host hostname mousename;
Cat-Mouse in JADE • JADE allows services to be hosted and discovered in a distributed, dynamic environment • On top of those "basic" services, mouse and cat agents can discover the maze/mouse/cat services provided and join or quit the maze server they find through the DF service
Innovation • A backbone for a core platform encouraging other agents to connect and join • Access to ontologies and service description to move towards interoperability at the service level • A baseline set of deployed agent services that can be used as building blocks by application developers to create innovative value added services • A practical test for a learning agent system complying with FIPA standards.
Deployment Scenario • Infrastructure deployment - enables developers' agents to interact with service agents developed by others - tests applications in a realistic, distributed, open environment • Agent and service deployment - FIPA ACL messages to exchange information - standard FIPA-ACL-compatible content languages - FIPA-defined agent management services (directories, communication, and naming)
Conclusions • Demonstration of a feasible research approach exploring the relationship between reinforcement learning and deployment of component-based distributed agents • Communication between agents • Issues with the space complexity of Q-learning: where n = grid size, m = # mice, c = # cats, space complexity is 64·n^(2(m+c+1)) bytes; 1 mouse + 1 cat => 481 MB of memory storage for the Q-table
Future work • Learning in environments that change in response to the learning agent • Communication among learning agents; multiagent learning • Overcoming problems of table size under multiagent conditions • Security in message-passing
Partial list of references • S. Flake, C. Geiger, J. Kuster. Towards UML-based analysis and design of multi-agent systems. ENAIS’2001. • T. Mitchell. Machine learning. McGraw-Hill, 1997. • A. Printista, M. Errecalde, C. Montoya. A parallel implementation of Q-learning based on communication with cache. http://journal.info.unlp.edu.ar/journal6/papers/p4.pdf. • S. Russell, P. Norvig. Artificial intelligence: A modern approach. Prentice Hall, 1995. • S. Sen, G. Weiss. Learning in multiagent systems. In G. Weiss, ed., Multiagent systems: A modern approach to distributed artificial intelligence, MIT Press, 1999. • R. Sutton, A. Barto. Reinforcement learning: An introduction. MIT Press, 1998. • K. Sycara, A. Pannu, M. Williamson, D. Zeng, K. Decker. Distributed intelligent agents. IEEE Expert, December 1996.