Convergence Analysis of Reinforcement Learning Agents
Srinivas Turaga
9.912, 30th March 2004
The Learning Algorithm

The Assumptions
• Players use stochastic strategies.
• Players observe only their own reward.
• Players attempt to estimate the value of choosing a particular action.

The Algorithm
• Play action i with probability Pr(i).
• Observe reward r.
• Update value function v.
(A minimal code sketch of this loop follows.)
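As an illustration, here is a minimal sketch of this loop, assuming the value-proportional action rule described on the next slide. All names (choose_action, run, env_reward, update) and the two-armed-bandit usage example are hypothetical, not from the original slides.

```python
import numpy as np

# Minimal sketch of the play/observe/update loop. The environment interface
# (env_reward) and the update rule are left abstract; two concrete update
# schemes are given on the next slide.

rng = np.random.default_rng(0)

def choose_action(v):
    """Stochastic strategy: play action i with Pr(i) proportional to v[i]."""
    p = v / v.sum()
    return rng.choice(len(v), p=p)

def run(env_reward, update, v, steps=1000):
    for _ in range(steps):
        i = choose_action(v)   # play action i with probability Pr(i)
        r = env_reward(i)      # observe only one's own reward
        v = update(v, i, r)    # update the value function
    return v

# Tiny usage example: a two-armed bandit with fixed reward probabilities
# and a 'no forgetting' style update (hypothetical constants).
bandit = lambda i: float(rng.random() < [0.2, 0.8][i])
no_forgetting = lambda v, i, r: v + 0.1 * r * np.eye(len(v))[i]
print(run(bandit, no_forgetting, np.array([1.0, 1.0])))
```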
The Learning Algorithm

[Slide figure: the payoff matrix, indexed by Player 1's choice and Player 2's choice, alongside the value of each action i.]

The Algorithm
• Play action i with probability Pr(i), proportional to the value of action i.
• Observe reward r, which also depends on the other player's choice j.
• Update value function v, using one of two simple schemes: Algorithm 1 ('no forgetting') and Algorithm 2 ('forgetting'), each distinguishing whether action i was chosen or not. A reconstruction of the update rules is given below.
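The update equations themselves were rendered as images on the original slide. The following is a plausible reconstruction: the chosen/not-chosen case split and the 'forgetting'/'no forgetting' labels come from the slide, while the learning rate ε and the exact placement of the constants are assumptions.

```latex
% Algorithm 1 ('no forgetting'): only the chosen action's value changes.
v_i \leftarrow
  \begin{cases}
    v_i + \epsilon\, r & \text{if action } i \text{ was chosen,}\\
    v_i                & \text{if action } i \text{ was not chosen.}
  \end{cases}

% Algorithm 2 ('forgetting'): every value decays; the chosen one is reinforced.
v_i \leftarrow
  \begin{cases}
    (1-\epsilon)\, v_i + \epsilon\, r & \text{if action } i \text{ was chosen,}\\
    (1-\epsilon)\, v_i                & \text{if action } i \text{ was not chosen.}
  \end{cases}
```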
Analysis Techniques
• Analysis of the stochastic dynamics is hard!
• So approximate:
  • Consider the average case (deterministic).
  • Consider continuous time (differential equation).
[Slide diagram: random, discrete-time dynamics → deterministic, discrete-time dynamics → deterministic, continuous-time dynamics.]
A sketch of the resulting averaged dynamics is given below.
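To make the approximation concrete, here is a minimal Python sketch of the deterministic continuous-time dynamics for the forgetting scheme (Algorithm 2), integrated with Euler steps. The specifics are assumptions: matching-pennies payoffs rescaled to {0, 1} so that values stay nonnegative under the proportional rule, and the mean-field form dv_i/dt = ε(Pr(i)·E[r | i] − v_i) obtained by averaging the stochastic update.

```python
import numpy as np

# Euler integration of the averaged, continuous-time dynamics for the
# 'forgetting' scheme (Algorithm 2). Illustrative assumptions throughout:
# {0,1}-rescaled matching-pennies payoffs and the mean-field update
#   dv_i/dt = eps * ( Pr(i) * E[r | i] - v_i ).

A1 = np.array([[1.0, 0.0], [0.0, 1.0]])  # player 1 is paid on a match
A2 = 1.0 - A1                            # player 2 is paid on a mismatch

def mean_field_step(v1, v2, eps, dt):
    p1, p2 = v1 / v1.sum(), v2 / v2.sum()
    rbar1 = A1 @ p2       # player 1's expected reward for each action
    rbar2 = A2.T @ p1     # player 2's expected reward for each action
    v1 = v1 + dt * eps * (p1 * rbar1 - v1)
    v2 = v2 + dt * eps * (p2 * rbar2 - v2)
    return v1, v2

v1, v2 = np.array([0.6, 0.4]), np.array([0.3, 0.7])
for _ in range(20000):
    v1, v2 = mean_field_step(v1, v2, eps=0.1, dt=0.01)
print(v1 / v1.sum(), v2 / v2.sum())  # strategies under the averaged dynamics
```

Replacing the random, discrete-time updates by their expectation and then taking the step size to zero is what turns the intractable stochastic process into an ordinary differential equation that can be analyzed with standard fixed-point and linear stability tools.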
Results - Matching Pennies Game
• Analysis shows a fixed point corresponding to the Nash equilibrium; linear stability analysis shows that it is only marginally stable.
• Simulations of the stochastic algorithm and of the deterministic dynamics both diverge to the corners.

Results - Matching Behavior
• Analysis shows a stable fixed point corresponding to matching behavior.
• Simulations of the stochastic algorithm and the deterministic dynamics converge as expected.
(A simulation sketch for the matching-pennies case is given below.)
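For reproducing the qualitative matching-pennies behavior, here is a sketch that simulates the stochastic algorithm directly, under the same illustrative assumptions as above ({0, 1}-rescaled payoffs, value-proportional play, 'no forgetting' updates); the learning rate and run length are arbitrary choices, not values from the slides.

```python
import numpy as np

# Direct simulation of the stochastic algorithm on {0,1}-rescaled matching
# pennies, using the 'no forgetting' scheme (Algorithm 1). Illustrative only.

rng = np.random.default_rng(1)
A1 = np.array([[1.0, 0.0], [0.0, 1.0]])  # player 1 is paid on a match
A2 = 1.0 - A1                            # player 2 is paid on a mismatch
eps = 0.1

def act(v):
    return rng.choice(2, p=v / v.sum())

v1, v2 = np.array([1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(50000):
    i, j = act(v1), act(v2)
    v1[i] += eps * A1[i, j]   # no forgetting: only the chosen action updated
    v2[j] += eps * A2[i, j]
print(v1 / v1.sum(), v2 / v2.sum())  # check whether play drifts to a corner
```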
Future Directions
• Validate the approximation technique.
• Analyze the properties of more general reinforcement learners.
• Consider situations with asymmetric learning rates.
• Study the behavior of the algorithms for arbitrary payoff matrices.