Reinforcement Learning in Games

Colin Cherry
colinc@cs
Oct 29/01

Outline
• Reinforcement Learning & TD Learning
• TD-Gammon
• TDLeaf
• Chinook
• Conclusion

The Ideas Behind Reinforcement Learning
• Two broad categories for learning:
  • Supervised
  • Unsupervised (our concern)
• Problem with unsupervised learning:
  • Delayed rewards (temporal credit assignment)
• Goal:
  • Create a good control policy based on delayed rewards

Evaluation Function: Developing a Control Policy
• Evaluation function:
  • A function that estimates the total reward the agent will receive if, from this point onward, it follows the policy derived from the function
  • We will assume the function evaluates states (good for deterministic games)
• The evaluation function could be:
  • Look-up table, linear function, neural network, any function approximator…
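
As a rough illustration of the options listed above, here is a minimal sketch of two of them: a look-up table and a linear function. The function names, the string state labels, and the feature vectors are hypothetical and are not taken from the slides.

```python
from typing import Dict, Sequence


def table_eval(table: Dict[str, float], state: str) -> float:
    """Look-up table: one stored value per state (unseen states default to 0)."""
    return table.get(state, 0.0)


def linear_eval(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear function: weighted sum of hand-picked state features."""
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical usage: a stored value for state "P", and a two-feature position.
print(table_eval({"P": 0.5}, "P"))
print(linear_eval([0.5, -1.0], [2.0, 1.0]))
```

Either form can be tuned by the same learning rule; what changes is how many parameters there are and how well values generalize to unseen states.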

Temporal Difference Learning: TD(λ)
• Set initial weights to 0 or random values
• Assume our evaluation function evaluates a state at time t with the value Y_t according to some weight vector w
• Modify the weights at the end of each game, applying one update for each time step t+1 [equation not preserved in this transcript]
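
For reference, the TD(λ) weight update is commonly written as below (the form given by Sutton and used by Tesauro for TD-Gammon). This is offered as the standard formulation, not necessarily the exact equation shown on the slide. The learning rate α, the decay parameter λ in [0, 1], and the gradient notation are assumptions; only Y_t and w come from the slide text.

```latex
\Delta w_t \;=\; \alpha \,\bigl(Y_{t+1} - Y_t\bigr) \sum_{k=1}^{t} \lambda^{\,t-k}\, \nabla_{w} Y_k
```

With λ = 0 only the most recent prediction is adjusted. With λ = 1 every earlier prediction in the game shares equally in the correction.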

A Quick Example: Printer Robot
Objective: dock to printer, collect a document
Assume 3 states:
• C: next to coffee machine, no documents
• P: next to printer, no documents
• D: next to printer, carrying documents
Assume 2 actions seen:
• a: dock to printer (available only from P or D)
• b: go to printer (available only from C)
Transitions:
• P --a--> D: reward (end)
• (Some time later) C --b--> P: no reward (continue)
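
The two transitions above can be traced with a small tabular sketch. It uses the simplest case, TD(0), on a look-up table. The step size of 0.5, the reward of 1.0, the lack of discounting, and all names are assumptions made for illustration; the slide gives no numbers.

```python
ALPHA = 0.5   # assumed step size
GAMMA = 1.0   # assumed: no discounting


def td0_update(values, state, reward, next_state, terminal):
    """Move values[state] toward reward + value of the successor state."""
    future = 0.0 if terminal else GAMMA * values[next_state]
    values[state] += ALPHA * (reward + future - values[state])


values = {"C": 0.0, "P": 0.0, "D": 0.0}

# P --a--> D: reward, episode ends. P now looks valuable.
td0_update(values, "P", 1.0, "D", terminal=True)

# Some time later, C --b--> P: no reward, episode continues.
# C still gains value because it leads to P, which was credited earlier.
td0_update(values, "C", 0.0, "P", terminal=False)

print(values)
```

The second update is the point of the example: C never sees a reward directly, yet its estimate rises because the estimate for P already has. This is how delayed rewards propagate backward.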

TD-Gammon
• Self-taught backgammon player
• Good enough to make the best sweat
• Huge success for reinforcement learning
• Far surpassed its supervised learning cousin, Neurogammon

How Does It Work?
• Used an artificial neural network as its evaluation function approximator
• Excellent neural network design
• Used expert features developed for Neurogammon along with a basic board representation
• Hundreds of thousands of training games against itself
• Hard-coded doubling algorithm

Why Did It Work So Well?
• Stochastic domain: forces exploration
• Linear (basic) concepts are learned first
• Shallow search is “good enough” against humans

Backgammon vs. Other Games: Shallow Search
• TD-Gammon followed a greedy approach
  • 1-ply look-ahead (later increased to 3-ply)
  • It's hard to predict your opponent's move without his or her dice roll. What about your move after that?
• Doesn't work so well for other games:
  • What features will tell me what move to take by looking only at the immediate results of the moves available to me?

TDLeaf(λ)
• TD learning applied to the minimax algorithm
• For each state, search to a constant depth
• Evaluate a state according to the heuristic evaluation of the leaf of its principal variation
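
A minimal sketch of the search step described above: a fixed-depth minimax that returns both the backed-up value and the leaf at the end of the principal variation. The toy tree, its scores, and the `children`/`evaluate` callbacks are hypothetical stand-ins for a real game interface. Alpha-beta pruning and the weight update itself are left out.

```python
from typing import Callable, List, Tuple


def pv_leaf(state: str, depth: int, maximizing: bool,
            children: Callable[[str], List[str]],
            evaluate: Callable[[str], float]) -> Tuple[float, str]:
    """Return (minimax value, leaf state ending the principal variation)."""
    kids = children(state)
    if depth == 0 or not kids:
        return evaluate(state), state
    results = [pv_leaf(k, depth - 1, not maximizing, children, evaluate)
               for k in kids]
    pick = max if maximizing else min
    return pick(results, key=lambda r: r[0])


# Hypothetical two-ply game tree with static scores at the leaves.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
SCORES = {"a1": 3.0, "a2": 5.0, "b1": 2.0, "b2": 9.0}

value, leaf = pv_leaf("root", 2, True,
                      lambda s: TREE.get(s, []),
                      lambda s: SCORES.get(s, 0.0))
print(value, leaf)
```

In TDLeaf(λ), the value assigned to the position at time t is the evaluation of this leaf rather than of the root itself, so the temporal-difference update adjusts the weights as they apply to the leaf position.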

Chinook
• This program, at this school, in this class, should need no introduction
• 84 features (4 sets of 21) were tunable by weight
• Each feature consists of many hand-picked parameters
• Question: Can we learn the 84 weights as well as a human can set them?

The Test
• Trained using TDLeaf
• All weight values set to 0
• Variation introduced by using a book of opening moves (144 3-ply openings)
• Played no more than 10,000 games against itself before hitting a plateau
• Both programs use the same search depth

The Results Were Very Positive
• Chinook with all weights set to 1 vs. Tournament Chinook: 94.5-193.5
• Chinook after self-play training vs. Tournament Chinook: Even Steven
• Some lessons learned:
  • You have to train at the same depth you plan to play at
  • You have to play against real people too

Conclusions
• TD(λ) can be a powerful tool in the creation of game-playing evaluation functions
• Training must be of a type that introduces variation
• Features need to be hand-picked (for now)
• TD and TDLeaf allow quick weight tuning
  • Takes a lot of the tedium out of player design
  • Allows designers to experiment more with features
