Reinforcement Learning in Games
Colin Cherry
colinc@cs
Oct 29/01
Outline
• Reinforcement Learning & TD Learning
• TD-Gammon
• TDLeaf
• Chinook
• Conclusion
The ideas behind Reinforcement Learning
• Two broad categories for learning:
  • Supervised
  • Unsupervised (our concern)
• Problem with unsupervised learning:
  • Delayed rewards (temporal credit assignment)
• Goal:
  • Create a good control policy based on delayed rewards
Evaluation Function: Developing a Control Policy
• Evaluation function:
  • Estimates the total reward the agent will receive if it acts according to the function from this point onward
  • We will assume the function evaluates states (good for deterministic games)
• The evaluation function could be:
  • A look-up table, a linear function, a neural network, or any other function approximator
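To make the first two options concrete, here is a minimal sketch of a look-up table and a linear function used as state evaluators. The state names, feature meanings, and weights are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence

# Hypothetical look-up table: one stored estimate per state.
table: Dict[str, float] = {"winning": 1.0, "even": 0.0, "losing": -1.0}


def evaluate_table(state: str) -> float:
    """Return the stored estimate, defaulting to 0.0 for unseen states."""
    return table.get(state, 0.0)


def evaluate_linear(features: Sequence[float], weights: Sequence[float]) -> float:
    """Linear evaluation: a weighted sum of feature values."""
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical features: (material advantage, mobility advantage).
weights = [1.0, 0.5]
print(evaluate_table("winning"))              # 1.0
print(evaluate_linear([2.0, -1.0], weights))  # 2.0*1.0 + (-1.0)*0.5 = 1.5
```

A table needs one entry per state, so it only suits tiny problems; the linear form generalizes across states through shared weights, which is what makes weight tuning by TD learning practical.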
Temporal Difference Learning: TD(λ)
• Set the initial weights to 0 or to random values
• Assume the evaluation function assigns the state at time t the value Y_t, computed from a weight vector w
• At the end of each game, adjust w once for each time step t+1, using the difference between Y_{t+1} and Y_t
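A minimal sketch of an end-of-game TD(λ) update, assuming a linear evaluation function and the standard rule Δw_t = α (Y_{t+1} − Y_t) Σ_{k≤t} λ^(t−k) ∇_w Y_k, with the final game result standing in for the last Y. The function name, the learning rate `alpha`, and the values of `lam` are assumptions for illustration, not details taken from the slides.

```python
from typing import List, Sequence


def td_lambda_update(
    weights: Sequence[float],
    states: Sequence[Sequence[float]],
    outcome: float,
    alpha: float = 0.1,
    lam: float = 0.7,
) -> List[float]:
    """Return new weights after one game, using TD(lambda) with a linear evaluator.

    `states` holds the feature vector of each position in the game, in order.
    `outcome` is the final result, used as the target for the last position.
    """
    # Predictions Y_t, computed with the weights held fixed during the game.
    preds = [sum(w * x for w, x in zip(weights, s)) for s in states]
    # The target for Y_t is Y_{t+1}; the last target is the game outcome.
    targets = preds[1:] + [outcome]

    trace = [0.0] * len(weights)    # running sum of lam^(t-k) * grad Y_k
    delta_w = [0.0] * len(weights)  # accumulated weight change
    for t, features in enumerate(states):
        # For a linear evaluator the gradient of Y_t is the feature vector.
        trace = [lam * e + x for e, x in zip(trace, features)]
        td_error = targets[t] - preds[t]
        delta_w = [d + alpha * td_error * e for d, e in zip(delta_w, trace)]
    return [w + d for w, d in zip(weights, delta_w)]


# Two positions with one-hot features, zero initial weights, and a win (1.0).
new_w = td_lambda_update([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 1.0,
                         alpha=0.5, lam=0.5)
print(new_w)  # [0.25, 0.5]
```

The earlier position receives a smaller share of the final error (scaled by λ), which is how TD(λ) spreads a delayed reward back over the moves that led to it.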
A quick example: Printer Robot
• Objective: dock to printer, collect a document
• Assume 3 states:
  • C: next to coffee machine, no documents
  • P: next to printer, no documents
  • D: next to printer, carrying documents
• Assume 2 actions seen:
  • a: dock to printer (available only from P or D)
  • b: go to printer (available only from C)
• Observed transitions:
  • P --a--> D (end): reward
  • (Some time later) C --b--> P (continue): no reward
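The two transitions above can be traced with a small tabular sketch. It assumes the simplest one-step TD update (λ = 0), a reward of 1.0 for collecting the document, no discounting, and a learning rate of 0.5; none of these numbers come from the slides.

```python
ALPHA = 0.5  # learning rate (assumed)

# Value estimate for each state, all starting at 0.
values = {"C": 0.0, "P": 0.0, "D": 0.0}


def td0_update(state: str, reward: float, next_state: str, terminal: bool) -> None:
    """Move V(state) toward reward + V(next_state); terminal states are worth 0."""
    next_value = 0.0 if terminal else values[next_state]
    values[state] += ALPHA * (reward + next_value - values[state])


# First experience: P --a--> D, the episode ends with a reward.
td0_update("P", 1.0, "D", terminal=True)
# Some time later: C --b--> P, no reward, the episode continues.
td0_update("C", 0.0, "P", terminal=False)

print(values)  # {'C': 0.25, 'P': 0.5, 'D': 0.0}
```

Although going from C to P earns no reward, V(C) still rises because P has already been learned to be valuable. This is the delayed reward being credited to an earlier step.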
TD-Gammon
• Self-taught backgammon player
• Good enough to make the best human players sweat
• Huge success for reinforcement learning
• Far surpassed its supervised-learning cousin, Neurogammon
How does it work?
• Used an artificial neural network as its evaluation function approximator
• Excellent neural network design
• Used expert features developed for Neurogammon along with a basic board representation
• Hundreds of thousands of training games against itself
• Hard-coded doubling algorithm
Why did it work so well?
• Stochastic domain: forces exploration
• Linear (basic) concepts are learned first
• Shallow search is “good enough” against humans
Backgammon vs. Other Games: Shallow Search
• TD-Gammon followed a greedy approach
  • 1-ply look-ahead (later increased to 3-ply)
  • It's hard to predict your opponent's move without his or her dice roll. What about your move after that?
• Doesn't work so well for other games:
  • What features will tell me which move to take if I look only at the immediate results of the moves available to me?
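A minimal sketch of greedy 1-ply move selection: apply each legal move, evaluate the resulting state, and keep the best. The toy game (reach 10 by adding 1, 2, or 3) and every function name are hypothetical; in backgammon the dice roll would also determine which moves are legal.

```python
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")  # state type
M = TypeVar("M")  # move type


def greedy_move(
    state: S,
    legal_moves: Callable[[S], Iterable[M]],
    apply_move: Callable[[S, M], S],
    evaluate: Callable[[S], float],
) -> M:
    """Pick the move whose immediate successor state evaluates highest."""
    return max(legal_moves(state), key=lambda m: evaluate(apply_move(state, m)))


# Toy game: the state is a running total and the evaluator prefers totals near 10.
def toy_moves(total: int) -> list:
    return [1, 2, 3]


def toy_apply(total: int, move: int) -> int:
    return total + move


def toy_evaluate(total: int) -> float:
    return -abs(10 - total)


print(greedy_move(8, toy_moves, toy_apply, toy_evaluate))  # 2
```

Everything the player knows about the future has to be packed into `evaluate`, which is why this approach struggles in games where the value of a move only becomes visible several plies later.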
TDLeaf(λ)
• TD learning applied to the minimax algorithm
• For each state, search to a constant depth
• Evaluate a state by the heuristic evaluation of the leaf of its principal variation
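A minimal sketch of the search step: a fixed-depth minimax that returns both the backed-up value and the leaf at the end of the principal variation. In TDLeaf, the temporal-difference update is then driven by the evaluations of these leaves rather than of the root states. The tiny game tree, its leaf scores, and the function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def minimax_leaf(
    state: str,
    depth: int,
    maximizing: bool,
    children: Callable[[str], List[str]],
    evaluate: Callable[[str], float],
) -> Tuple[float, str]:
    """Return (minimax value, principal-variation leaf) for a fixed-depth search."""
    successors = children(state)
    if depth == 0 or not successors:
        return evaluate(state), state
    results = [
        minimax_leaf(child, depth - 1, not maximizing, children, evaluate)
        for child in successors
    ]
    pick = max if maximizing else min
    return pick(results, key=lambda r: r[0])


# Hypothetical two-ply tree with heuristic scores at the leaves.
tree: Dict[str, List[str]] = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
leaf_scores: Dict[str, float] = {"a1": 3.0, "a2": 5.0, "b1": 2.0, "b2": 9.0}


def children(state: str) -> List[str]:
    return tree.get(state, [])


def evaluate(state: str) -> float:
    return leaf_scores.get(state, 0.0)


print(minimax_leaf("root", 2, True, children, evaluate))  # (3.0, 'a1')
```

Here the opponent would answer `a` with `a1` (value 3.0) and `b` with `b1` (value 2.0), so the root's value is 3.0 and the leaf whose features would receive the weight update is `a1`.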
Chinook
• This program, at this school, in this class, should need no introduction
• 84 features (4 sets of 21) were tunable by weight
• Each feature consists of many hand-picked parameters
• Question: can we learn the 84 weights as well as a human can set them?
The Test
• Trained using TDLeaf
• All weight values set to 0
• Variations introduced by using a book of opening moves (144 3-ply openings)
• Played no more than 10,000 games against itself before hitting a plateau
• Both programs are to use the same depth
The results were very positive
• Chinook with all weights set to 1 vs. Tournament Chinook: 94.5-193.5
• Chinook after self-play training vs. Tournament Chinook: Even Steven
• Some lessons learned:
  • You have to train at the same depth you plan to play at
  • You have to play against real people too
Conclusions
• TD(λ) can be a powerful tool in the creation of game-playing evaluation functions
• The training must be of a type that introduces variation
• Features need to be hand-picked (for now)
• TD and TDLeaf allow quick weight tuning
  • Takes a lot of the tedium out of player design
  • Allows designers to experiment more with features