Reinforcement learning (RL) involves an agent learning through trial and error to maximize rewards in a dynamic environment. This text explores the characteristics of RL, the tradeoff between exploration and exploitation, and the components of an RL agent. It also discusses how RL can be applied to adaptive clustering. References to seminal works in the field are included.
Introduction to Reinforcement Learning Shijiang Lu
What Is Reinforcement Learning • Reinforcement learning (RL) is the problem faced by an agent that must learn, by trial and error, how to interact with a dynamic environment so as to maximize a scalar reward.
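Agent And Its Environment [Figure: agent-environment loop. The agent sends action a to the environment; the environment’s true state sT reaches the agent as s = Fs(sT) and r = Fr(sT).] • a: the agent’s action • sT: the true state of the environment • s: the state of the environment as perceived by the agent • r: the immediate reward perceived by the agent • Fs(sT) and Fr(sT): functions that map sT to s and r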
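The interaction loop in the diagram can be sketched in Python as follows. This is only an illustration under stated assumptions, not the author's implementation: the `Environment` class, the integer-valued true state, the sensor noise used for Fs, the distance-based reward used for Fr, and the hand-written `toward_goal` policy are all hypothetical.

```python
import random


class Environment:
    """Toy environment (assumed): the true state sT is an integer position."""

    def __init__(self, goal=3, seed=0):
        self.rng = random.Random(seed)
        self.goal = goal        # hypothetical target position
        self.true_state = 0     # sT, never shown to the agent directly

    def f_s(self):
        """Fs(sT): the state s the agent perceives (sT plus sensor noise)."""
        return self.true_state + self.rng.choice([-1, 0, 0, 1])

    def f_r(self):
        """Fr(sT): the immediate reward r (negative distance to the goal)."""
        return -abs(self.goal - self.true_state)

    def step(self, action):
        """Apply the agent's action a to sT, then return the perceived (s, r)."""
        self.true_state += action
        return self.f_s(), self.f_r()


def toward_goal(s, r, goal=3):
    """Hand-written policy (assumed): step toward the goal as perceived."""
    if s < goal:
        return 1
    if s > goal:
        return -1
    return 0


def run_episode(env, policy, steps=10):
    """Repeat the loop: perceive (s, r), choose a, act on the environment."""
    s, r = env.f_s(), env.f_r()
    rewards = []
    for _ in range(steps):
        a = policy(s, r)
        s, r = env.step(a)
        rewards.append(r)
    return rewards


rewards = run_episode(Environment(), toward_goal)
```

Because the agent sees only s and r, never sT, the noisy Fs in this sketch can make it step the wrong way even when it is already at the goal.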
Characteristics of RL • The agent has a goal (or goals) to achieve • The agent can take actions and the agent’s action will affect its environment • The agent learns in a trial and error fashion, i.e., the agent has no teacher and must learn by itself
Characteristics of RL (Cont.) • The agent’s action should be chosen based on its perception of its environment and its evaluation of how well its needs have been met so far. • The agent may or may not have knowledge about its environment initially. Either way, it must interact with its environment.
Characteristics of RL (Cont.) • The agent may not know everything about the environment, i.e., there can be hidden states that the agent has no knowledge of. • The environment may change independently of the agent’s actions.
Characteristics of RL (Cont.) • The environment may be non-deterministic, i.e., when the agent takes the same action in the same state, the environment may respond differently. • The reward for an action may arrive immediately, or it may be delayed, i.e., it may not come right after the agent’s action.
Tradeoff Between Exploration and Exploitation • Exploration: Finding new knowledge by trying new actions, etc. • Exploitation: Using learned knowledge to find the best action. • Tradeoff: Neither exploration nor exploitation alone will yield satisfactory results
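One common way to balance the two is an epsilon-greedy rule: explore with a small probability epsilon, otherwise exploit. The slides do not name a specific method, so the sketch below is an assumed example on a toy multi-armed bandit; `run_bandit`, `TRUE_MEANS`, and the Gaussian reward noise are hypothetical.

```python
import random


def run_bandit(true_means, epsilon, steps=5000, seed=0):
    """Epsilon-greedy on a toy bandit; returns (average reward, pull counts)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms         # how often each action has been tried
    estimates = [0.0] * n_arms    # learned knowledge: mean reward per action
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # exploration
        else:
            arm = max(range(n_arms), key=lambda i: estimates[i])   # exploitation
        r = rng.gauss(true_means[arm], 1.0)   # noisy immediate reward
        counts[arm] += 1
        estimates[arm] += (r - estimates[arm]) / counts[arm]   # running mean
        total += r
    return total / steps, counts


TRUE_MEANS = [0.1, 0.5, 1.5]   # hypothetical payoffs, unknown to the agent
results = {eps: run_bandit(TRUE_MEANS, eps) for eps in (0.0, 0.1, 1.0)}
```

Under these assumed settings, epsilon = 1.0 (pure exploration) averages roughly the mean payoff of all arms, while epsilon = 0.0 (pure exploitation) can lock onto whichever arm looked acceptable first. A small positive epsilon typically does better than either extreme.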
Four Components of an RL Agent • Policy π. At each time step, the policy takes s and r as input and outputs an action a. • Reward function R(s, a). The reward function takes s and a as input and returns a scalar value (the expected immediate reward) for taking action a in state s.
Four Components of an RL Agent (Cont.) • Value function V. The expected total return from s, given that the agent follows policy π. • Model. The model predicts the behavior of the environment, i.e., for a given s and a, what the immediate reward will be and how the state will change.
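To make the four components concrete, here is a minimal sketch on an assumed toy problem: a chain of five states in which every step costs -1 until the terminal state is reached. The chain, the constants, and all function names are hypothetical. For brevity the policy here ignores r, and the value function is computed by repeatedly sweeping over the states using the model.

```python
N_STATES = 5              # states 0..4 (assumed toy chain)
TERMINAL = N_STATES - 1   # the episode ends in state 4


def policy(s, r=None):
    """Policy: choose an action from the perceived state (always move right)."""
    return +1


def reward(s, a):
    """Reward function R(s, a): expected immediate reward, -1 for every step."""
    return -1.0


def model(s, a):
    """Model: predict the next state and the immediate reward for (s, a)."""
    next_s = min(max(s + a, 0), TERMINAL)
    return next_s, reward(s, a)


def evaluate_policy(policy, model, sweeps=100):
    """Value function V: total return from each state under the given policy."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(TERMINAL):   # the terminal state keeps value 0
            next_s, r = model(s, policy(s))
            v[s] = r + v[next_s]
    return v


V = evaluate_policy(policy, model)
print(V)  # [-4.0, -3.0, -2.0, -1.0, 0.0]
```

Each value is minus the number of steps left to the terminal state, which is the total (undiscounted) return of the always-right policy in this chain.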
RL for Adaptive Clustering • Actions: changing clustering algorithms, parameters, attributes/features, etc. • Immediate reward: how good the clustering result is • By using a trial and error approach, we can learn which clustering algorithm is best, which attributes/features to choose, etc.
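As a rough sketch of this idea, the example below treats the number of clusters k as the action and a clustering quality score as the immediate reward, and learns by trial and error which k works best. Everything specific here is an assumption made for illustration: the 1-D k-means, the evenly spaced initial centers, the score (squared error plus a per-cluster penalty), the epsilon-greedy learner, and the toy data.

```python
import random


def kmeans_sse(points, k, iters=10):
    """Plain 1-D k-means with evenly spaced initial centers; returns squared error."""
    pts = sorted(points)
    n = len(pts)
    centers = [pts[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sum(min((p - c) ** 2 for c in centers) for p in pts)


def clustering_reward(points, k, penalty=1.0):
    """Assumed quality score: low error is good, each extra cluster costs a penalty."""
    return -(kmeans_sse(points, k) + penalty * k)


def learn_best_k(points, candidate_ks, trials=50, epsilon=0.2, seed=0):
    """Trial-and-error search over k: try each once, then epsilon-greedy."""
    rng = random.Random(seed)
    counts = {k: 0 for k in candidate_ks}
    estimates = {k: 0.0 for k in candidate_ks}
    for _ in range(trials):
        untried = [k for k in candidate_ks if counts[k] == 0]
        if untried:
            k = untried[0]
        elif rng.random() < epsilon:
            k = rng.choice(candidate_ks)                         # explore
        else:
            k = max(candidate_ks, key=lambda c: estimates[c])    # exploit
        r = clustering_reward(points, k)
        counts[k] += 1
        estimates[k] += (r - estimates[k]) / counts[k]           # running mean
    return max(candidate_ks, key=lambda c: estimates[c]), estimates


# Toy data (assumed): three tight groups around 0, 10, and 20.
points = [c + d for c in (0, 10, 20) for d in (-0.4, -0.2, 0.0, 0.2, 0.4)]
best_k, estimates = learn_best_k(points, [1, 2, 3, 4, 5])
```

On this toy data the learner settles on k = 3. The reward here is deterministic, so one trial per action would already suffice; with a noisy quality score or randomly initialized clustering, the repeated trials and running averages would matter more.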
References • [Sutton98] Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press. http://citeseer.nj.nec.com/sutton98reinforcement.html • [Kaelbling96] Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 4, 237-285. http://citeseer.ist.psu.edu/kaelbling96reinforcement.html