
Reinforcement Learning in Games


Presentation Transcript


  1. Reinforcement Learning in Games Colin Cherry colinc@cs Oct 29/01

  2. Outline • Reinforcement Learning & TD Learning • TD-Gammon • TDLeaf • Chinook • Conclusion

  3. The ideas behind Reinforcement Learning • Two broad categories for learning: • Supervised • Unsupervised (our concern) • Problem with unsupervised learning: • Delayed rewards (temporal credit assignment) • Goal: • Create a good control policy based on delayed rewards

  4. Evaluation Function: Developing a Control Policy • Evaluation function: • A function that estimates the total reward the agent will receive if it acts according to the function's recommendations from this point onward • We will assume the function evaluates states (good for deterministic games) • The evaluation function could be: • A look-up table, a linear function, a neural network, any function approximator…
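
As a concrete illustration of that last point, here is a minimal sketch of a state evaluator built as a linear function approximator over a hand-crafted feature vector. The class name, interface, and initialization are assumptions made for this example only, not code from any of the programs discussed:

```python
import numpy as np

class LinearEvaluator:
    """Estimates the total future reward of a state as a weighted sum of features."""

    def __init__(self, num_features, rng=None):
        rng = rng or np.random.default_rng(0)
        # Weights start at zero or small random values, as the next slide suggests.
        self.w = rng.normal(scale=0.01, size=num_features)

    def features(self, state):
        # Placeholder: in a real player these would be hand-picked board features
        # (piece counts, mobility, ...). Here we assume `state` is already a vector.
        return np.asarray(state, dtype=float)

    def value(self, state):
        # Y_t = w . phi(s_t): predicted total reward from this state onward.
        return float(self.w @ self.features(state))

    def gradient(self, state):
        # For a linear evaluator the gradient of Y_t with respect to w is just the
        # feature vector; a neural network would return d(output)/d(weights) instead.
        return self.features(state)
```

A neural network (as in TD-Gammon) or a look-up table would expose the same value/gradient interface; only the function approximator changes.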

  5. Temporal Difference Learning TD(λ) • Set initial weights to 0 or random values • Assume our evaluation function evaluates the state at time t with the value Yt according to some weight vector w • Update the weights at the end of each game as follows, for each time t+1:
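
For reference, the TD(λ) weight update the slide points to, written in the slide's notation (prediction Y_t, weight vector w) with an assumed learning rate α, is commonly given as

    w \leftarrow w + \alpha \, (Y_{t+1} - Y_t) \sum_{k=1}^{t} \lambda^{t-k} \, \nabla_w Y_k

with Y_{t+1} replaced by the actual game outcome at the final step, which is how the delayed reward flows back into earlier predictions. A self-contained sketch of this update applied over one complete game, assuming the LinearEvaluator interface from the earlier example (the α and λ values are arbitrary, not tuned settings):

```python
import numpy as np

def apply_td_lambda(evaluator, values, grads, outcome, alpha=0.1, lam=0.7):
    """TD(lambda) weight update over one game.

    values[t] is the prediction Y_t after move t, grads[t] is the gradient of
    Y_t with respect to the weights, and `outcome` is the final reward.
    """
    targets = list(values[1:]) + [outcome]    # Y_{t+1}, ending with the true outcome
    trace = np.zeros_like(evaluator.w)        # running sum of lam^(t-k) * grad Y_k
    for y_t, y_next, grad in zip(values, targets, grads):
        trace = lam * trace + grad
        evaluator.w += alpha * (y_next - y_t) * trace
```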

  6. A quick example: Printer Robot • Objective: dock to the printer, collect a document • Assume 3 states: • C: next to coffee machine, no documents • P: next to printer, no documents • D: next to printer, carrying documents • Assume 2 actions: • a: dock to printer (available only from P or D) • b: go to printer (available only from C) • Observed transitions: from P, action a leads to D with a reward (end); some time later, from C, action b leads to P with no reward (continue)
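
To see the delayed reward propagate through these three states, here is a minimal tabular sketch (not from the slides; the reward of 1 and the learning rate 0.5 are assumed values, and λ = 0 is used for brevity):

```python
# Tabular TD(0) on the printer-robot states, all values starting at 0.
alpha = 0.5
V = {"C": 0.0, "P": 0.0, "D": 0.0}

def td_update(state, next_state, reward, terminal=False):
    target = reward + (0.0 if terminal else V[next_state])
    V[state] += alpha * (target - V[state])

# Episode 1: from P, action a docks at the printer -> reward, episode ends.
td_update("P", "D", reward=1.0, terminal=True)   # V["P"] moves toward 1

# Some time later: from C, action b reaches P -> no immediate reward.
td_update("C", "P", reward=0.0)                  # V["C"] moves toward V["P"]

print(V)  # {'C': 0.25, 'P': 0.5, 'D': 0.0}
```

Even though the C-to-P step produced no reward, C's value becomes positive once P's value has been pulled up by the later reward, which is exactly the temporal credit assignment problem mentioned on slide 3.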

  7. TD-Gammon • Self-taught backgammon player • Good enough to make the best sweat • Huge success for reinforcement learning • Far surpassed its supervised learning cousin, Neurogammon

  8. How does it work? • Used an artificial neural network as its evaluation function approximator • Excellent neural network design • Used expert features developed for Neurogammon along with a basic board representation • Hundreds of thousands of training games against itself • Hard-coded doubling algorithm
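
A rough sketch of the self-play loop described here. Everything in it is an assumption for illustration: the game interface (new_game, legal_moves, after, play, state, is_over, outcome) is invented, and the evaluator and apply_td_lambda are the ones sketched on the earlier slides, not TD-Gammon's actual code:

```python
def self_play_training(evaluator, new_game, num_games=200_000):
    """Train the evaluator by TD(lambda) on games it plays against itself."""
    for _ in range(num_games):
        game = new_game()                    # assumed factory for a fresh game
        values, grads = [], []
        while not game.is_over():
            # Greedy 1-ply move choice: score the position each legal move
            # leads to with the current evaluator and pick the best.
            move = max(game.legal_moves(),
                       key=lambda m: evaluator.value(game.after(m)))
            game.play(move)
            values.append(evaluator.value(game.state()))
            grads.append(evaluator.gradient(game.state()))
        # Update the weights from this game's sequence of predictions.
        apply_td_lambda(evaluator, values, grads, outcome=game.outcome())
```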

  9. Why did it work so well? • Stochastic domain – forces exploration • Linear (basic) concepts are learned first • Shallow search is “good enough” against humans

  10. Backgammon vs. Other Games: Shallow Search • TD-Gammon followed a greedy approach • 1-ply look-ahead (later increased to 3-ply) • It’s hard to predict your opponent’s move without his or her dice roll, let alone your move after that • Doesn’t work so well for other games: • What features will tell me what move to take by looking only at the immediate results of the moves available to me?

  11. TDLeaf(λ) • TD Learning applied to the minimax algorithm • For each state, search to a constant depth • Evaluate a state according to the heuristic evaluation of the leaf of its principal variation
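
A minimal sketch of the idea, under assumptions made for this example only (the state interface with is_terminal/successors, and plain minimax without alpha-beta pruning): search to a fixed depth and return the heuristic value of the leaf of the principal variation, which is the value TDLeaf(λ) then trains on.

```python
def minimax_leaf(state, depth, evaluator, maximizing=True):
    """Return (value, leaf_state) for the principal variation below `state`."""
    children = [] if depth == 0 or state.is_terminal() else state.successors()
    if not children:
        # The value TDLeaf trains on: the heuristic evaluation of this PV leaf.
        return evaluator.value(state), state
    best_val, best_leaf = None, None
    for child in children:
        val, leaf = minimax_leaf(child, depth - 1, evaluator, not maximizing)
        if best_val is None or (val > best_val if maximizing else val < best_val):
            best_val, best_leaf = val, leaf
    return best_val, best_leaf

# TDLeaf(lambda) then applies the TD(lambda) update to the evaluator at the PV
# leaf states rather than at the root states, so the weights are tuned on the
# positions the search actually scores.
```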

  12. Chinook • This program, at this school, in this class, should need no introduction • 84 features (4 sets of 21) were tunable by weight • Each feature consists of many hand-picked parameters • Question: Can we learn the 84 weights as well as a human can set them?

  13. The Test • Trained using TDLeaf • All weight values set to 0 • Variation introduced by using a book of opening moves (144 3-ply openings) • Played no more than 10,000 games against itself before hitting a plateau • Both programs use the same search depth

  14. The results were very positive • Chinook w/ all weights set to 1 vs. Tournament Chinook: 94.5-193.5 • Chinook after self-play training vs. Tournament Chinook: Even Steven • Some lessons learned: • You have to train at the same depth you plan to play at • You have to play against real people too

  15. Conclusions • TD(λ) can be a powerful tool in the creation of game-playing evaluation functions • The training must be of a type that introduces variation • Features need to be hand-picked (for now) • TD and TDLeaf allow quick weight tuning • Takes a lot of the tedium out of player design • Allows designers to experiment more with features
