240 likes | 408 Views
Efficient Implementation of Reinforcement Learning In Co-ordinated Group Activities. By Ashwinkumar Ganesan CMSC 601. Agenda. Reinforcement Learning Problem Statement Proposed Method Conclusions. What is Reinforcement Learning?. Method for learning by experience
E N D
Efficient Implementation of Reinforcement Learning In Co-ordinated Group Activities By AshwinkumarGanesan CMSC 601
Agenda • Reinforcement Learning • Problem Statement • Proposed Method • Conclusions
What is Reinforcement Learning? • Method for learning by experience • There agent or bot learns by interacting with the environment. • There is reward attached for action taken in a particular state. • GOAL : MAXIMIZE THE REWARD
Bellman Equation (RL in a bit more detail) • Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision [1] • U(s) is Utility Of a state • R(s) is the reward of the state. • T(s,a,s’) is the transition from s to s’
Markov Decision Process • It is a discrete time control process • S’ is the next state • S is the current state • A is the action
Q-Learning(A Reinforcement Learning Algorithm) • Q – learning is method of maintaining reward information without learning the policy. • Q(a,s) is the current action state pair. • R(s’) is the reward for the next state.
What is Co-ordinated Group Action? • Co-ordinated Group Action is the situation a set of agents perform a single task. • GOAL: To maximize the output or the reward globally.
Agenda • Reinforcement Learning • Problem Statement • Proposed Method • Conclusions
Some Problems In Multi-Agent systems… • Communication i.e. what should the agent communicate and how much should it communicate with other agents. • Optimal Policy i.e. defining an optimal policy of the entire group. Is an optimal policy a set of optimal individual policies for each agent? • How much of the individual policy information of a certain agent is available to the entire group.
What Am I Proposing? • Create a method for implementing reinforcement learning on co-ordinated group activity efficiently • Modify Reinforcement Learning algorithm to implement group action • Implement the proposed method and measure its efficiency in World of Warcraft.
Problem Environment • The proposal is to research co-operative group learning under the following conditions: • The environment is assumed to partially observable. • Each agent in the system is knows the final action taken by the other agent. • Agents do not have access the to state information generated by other agents while selecting an action. • Agents do not have access to the policies of other agents, to make the decision. • The rewards are not linearly separable.
World Of Warcraft • World of Warcraft is a large multi-player online game by Blizzard Entertainment • It is game where every player has his own and roams the virtual world, fighting demons, observing the landscape, buy and selling items and interacting with other players • In short, it is a large game with lots of options for tools, skills and levels, making it challenging for Bots.
World Of Warcraft http://world-of-warcraft.en.softonic.com/
Motivation • Today games especially like World Of Warcraft, have a large number of human players who play in groups. • Single as well as multiplayer games have AI engines, but attacks and actions in these games by opponents is still one at a time. • Reinforcement Learning can be implemented real time in these games, to improve the AI over a period of time and customize the games for users • Other applications – Robotics,defense
Related Work • QUICR Method: This method calculates the counterfactual action, which is the action an agent did not take at a time[2]. • Least Square Policy Iteration (LSPI) can implemented. The method performs policy iterations using samples instead of an actual policy[3]. • FMQ Algorithm. The algorithm is helpful for environments where agents have partial or no observability[4].
Inverse Reinforcement Learning • Inverse reinforcement learning is the exact opposite of reinforcement learning. • Input is the optimal policy or behavior that is expected from the agent • The agent learns to find the reward function based on observed values in the environment
Agenda • Reinforcement Learning • Problem Statement • Proposed Method • Conclusions
Proposed Method • Implement Reinforcement Learning in 2 parts: • Implement Inverse Reinforcement Learning • Implement a modified Reinforcement Learning on the rewards learned in step1. • Observe Expert • Calculate Reward • Observe other agents • Calculate the new Q(a,s) value • Calculate Policy based on reward
Challenges • Calculating the iteration when the agent is known to have optimal policy • Finding the point when to switch from Inverse Learning to Reinforcement Learning • Working with the reward function got from observing the expert and rewards obtained from the environment • Finding a method to observe other agents and “experts”.
Evaluation Metrics • The metrics will be evaluated against the known methods • Metrics are: • Number of states generated • Number of iterations required to reach optimal policy • Rewards in terms of points earned (in the game) • Rate of Convergence to optimal Policy
Agenda • Reinforcement Learning • Problem Statement • Proposed Method • Conclusions
Conclusion • We can use Inverse reinforcement learning with reinforcement learning methods to speed up the learning time required for bots. • Improve bot reward functions over time.
References • R Bellman, On the Theory of Dynamic Programming, Proceedings of the National Academy of Sciences, 1952 • Adrian K. Agogino and Kagan Turner, QUICKER Q-Learning in Multi-agent systems. • Lihong Li, Micheal L. Littman, Christopher R. Mansley, Online Exploration in Least – Squares policy Iteration. • La¨etitiaMatignon, Guillaume J. Laurent and Nadine Le Fort-Piat, A study of FMQ heuristic in cooperative multi-agent games • Acknowledgement to Prof. Tim Oates for helping with the literature survey.