
CS 188: Artificial Intelligence Spring 2006

Learn about game playing in practice, including examples of AI systems that have defeated human champions in games like checkers, chess, and Othello. Explore different search algorithms and strategies used in deterministic, stochastic, two-player, and multi-player games.


Presentation Transcript


  1. CS 188: Artificial Intelligence Spring 2006 Lecture 23: Games 4/18/2006 Dan Klein – UC Berkeley

  2. Game Playing in Practice • Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. Used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions. Exact solution imminent. • Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue examined 200 million positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. • Othello: human champions refuse to compete against computers, who are too good. • Go: human champions refuse to compete against computers, who are too bad. In go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.

  3. Game Playing • Axes: • Deterministic or not • Number of players • Perfect information or not • Want algorithms for calculating a strategy (policy) which recommends a move in each state

  4. Deterministic Single Player? • Deterministic, single player, perfect information: • Know the rules • Know what moves will do • Have some utility function over outcomes • E.g. Freecell, 8-Puzzle, Rubik’s cube • … it’s (basically) just search! • Slight reinterpretation: • Calculate best utility from each node • Each node is a max over children • Note that goal values are on the goal, not path sums as before • [Figure: example search tree with leaf values 8, 2, 5, 6]
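  A minimal sketch of the "max over children" backup described above, in Python; the node interface (is_terminal, utility, children) is a hypothetical stand-in, not something defined in the slides:

      def value(node):
          # Terminal nodes carry the goal value directly (not a path sum)
          if node.is_terminal():
              return node.utility()
          # Single agent: every node is just a max over its children's values
          return max(value(child) for child in node.children())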

  5. Stochastic Single Player • What if we don’t know what the result of an action will be? • E.g. solitaire, minesweeper, trying to drive home • … just an MDP! • Can also do expectimax search • Chance nodes, like actions except the environment controls the action chosen • Calculate utility for each node • Max nodes as in search • Chance nodes take expectations of children 8 2 5 6
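  A similar sketch for expectimax, assuming chance nodes expose (probability, child) pairs through a hypothetical outcomes() method:

      def expectimax(node):
          if node.is_terminal():
              return node.utility()
          if node.is_max_node():
              # Agent's move: max over children, as in ordinary search
              return max(expectimax(child) for child in node.children())
          # Chance node: the environment picks, so take the expectation
          return sum(p * expectimax(child) for p, child in node.outcomes())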

  6. Deterministic Two Player (Turns) • E.g. tic-tac-toe • Minimax search • Basically, a state-space search tree • Each layer, or ply, alternates players • Choose move to position with highest minimax value = best achievable utility against best play • Zero-sum games • One player maximizes result • The other minimizes result • [Figure: example minimax tree with leaf values 8, 2, 5, 6]
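  A sketch of the minimax backup for a two-player zero-sum game, with utilities stated from the maximizer's point of view (node interface again hypothetical):

      def minimax(node, maximizing_player):
          if node.is_terminal():
              return node.utility()
          child_values = [minimax(child, not maximizing_player) for child in node.children()]
          # Plies alternate: one player maximizes the value, the other minimizes it
          return max(child_values) if maximizing_player else min(child_values)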

  7. Minimax Example

  8. Minimax Search

  9. Minimax Properties • Optimal against a perfect player. Otherwise? • Time complexity? • O(b^m) • Space complexity? • O(bm) • For chess, b ≈ 35, m ≈ 100 • Exact solution is completely infeasible • But, do we need to explore the whole tree?
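  To put a rough number on the infeasibility claim (back-of-the-envelope arithmetic, not from the slide):

      b^m \approx 35^{100} = 10^{100 \log_{10} 35} \approx 10^{154}

  far beyond anything that can be searched exhaustively.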

  10. Multi-Player Games • Similar to minimax: • Utilities are now tuples • Each player maximizes their own entry at each node • Propagate (or back up) nodes from children • [Figure: three-player game tree with leaf utility tuples (1,2,6), (4,3,2), (6,1,2), (7,4,1), (5,1,1), (1,5,2), (7,7,1), (5,4,5)]
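  A sketch of the tuple backup: each leaf carries one utility per player, and the player to move keeps the child tuple that is best in their own entry (node interface hypothetical):

      def multi_player_value(node):
          if node.is_terminal():
              return node.utilities()              # e.g. (1, 2, 6)
          child_tuples = [multi_player_value(child) for child in node.children()]
          # The moving player maximizes their own component of the tuple
          return max(child_tuples, key=lambda t: t[node.player_to_move()])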

  11. Games with Chance • E.g. backgammon • Expectiminimax search! • Environment is an extra player that moves after each agent • Chance nodes take expectations, otherwise like minimax
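  Expectiminimax just interleaves the two backups, with a chance layer (the dice) between the agents' layers; a sketch under the same hypothetical node interface:

      def expectiminimax(node):
          if node.is_terminal():
              return node.utility()
          if node.is_chance_node():
              # The environment "moves" after each agent: expectation over outcomes
              return sum(p * expectiminimax(child) for p, child in node.outcomes())
          child_values = [expectiminimax(child) for child in node.children()]
          return max(child_values) if node.is_max_node() else min(child_values)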

  12. Games with Chance • Dice rolls increase b: 21 possible rolls with 2 dice • Backgammon ≈ 20 legal moves • Depth 4 = 20 × (21 × 20)³ ≈ 1.2 × 10⁹ • As depth increases, probability of reaching a given node shrinks • So value of lookahead is diminished • So limiting depth is less damaging • But pruning is less possible… • TD-Gammon uses depth-2 search + very good eval function + reinforcement learning: world-champion level play

  13. Games with Hidden Information • Imperfect information: • E.g., card games, where opponent's initial cards are unknown • Typically we can calculate a probability for each possible deal • Seems just like having one big dice roll at the beginning of the game • Idea: compute the minimax value of each action in each deal, then choose the action with highest expected value over all deals • Special case: if an action is optimal for all deals, it's optimal. • GIB, current best bridge program, approximates this idea by • 1) generating 100 deals consistent with bidding information • 2) picking the action that wins most tricks on average • Drawback to this approach? • It’s broken! • (Though useful in practice)
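  For concreteness, a sketch of that averaging approximation; sample_consistent_deals and minimax_value are hypothetical stand-ins, not GIB's actual machinery:

      def choose_action(legal_actions, bidding_info, n_deals=100):
          # Generate deals consistent with what the bidding has revealed
          deals = sample_consistent_deals(bidding_info, n_deals)
          def expected_tricks(action):
              # Average the minimax value (tricks won) of this action across the sampled deals
              return sum(minimax_value(deal, action) for deal in deals) / len(deals)
          # Pick the action that does best on average over the deals
          return max(legal_actions, key=expected_tricks)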

  14. Averaging over Deals is Broken • Road A leads to a small heap of gold pieces • Road B leads to a fork: • take the left fork and you'll find a mound of jewels; • take the right fork and you'll be run over by a bus. • Road A leads to a small heap of gold pieces • Road B leads to a fork: • take the left fork and you'll be run over by a bus; • take the right fork and you'll find a mound of jewels. • Road A leads to a small heap of gold pieces • Road B leads to a fork: • guess correctly and you'll find a mound of jewels; • guess incorrectly and you'll be run over by a bus.

  15. Efficient Search • Several options: • Pruning: avoid regions of search tree which will never enter into (optimal) play • Limited depth: don’t search very far into the future, approximate utility with a value function (familiar?)
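  A sketch combining the two options: depth-limited minimax with α-β pruning, where a hypothetical evaluate() value function approximates utility at the cutoff:

      def alpha_beta(node, depth, alpha, beta, maximizing):
          if node.is_terminal() or depth == 0:
              return evaluate(node)      # value function approximates true utility
          if maximizing:
              best = float('-inf')
              for child in node.children():
                  best = max(best, alpha_beta(child, depth - 1, alpha, beta, False))
                  alpha = max(alpha, best)
                  if alpha >= beta:      # the minimizer will never let play reach here
                      break
              return best
          best = float('inf')
          for child in node.children():
              best = min(best, alpha_beta(child, depth - 1, alpha, beta, True))
              beta = min(beta, best)
              if alpha >= beta:          # the maximizer will never let play reach here
                  break
          return best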

  16. Next Class • More game playing • Pruning • Limited depth search • Connection to reinforcement learning!

  17. - Pruning Example

  18. Q-Learning • Model-free, TD learning with Q-functions:
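  The update referred to here is the standard sample-based Q-learning rule; in textbook notation (the slide's own rendering may differ slightly):

      Q(s,a) \leftarrow Q(s,a) + \alpha \big[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \big]

  where α is the learning rate, γ the discount factor, and (s, a, r, s') the observed transition.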

  19. Function Approximation • Problem: too slow to learn each state’s utility one by one • Solution: what we learn about one state should generalize to similar states • Very much like supervised learning • If states are treated entirely independently, we can only learn on very small state spaces

  20. Discretization • Can put states into buckets of various sizes • E.g. can have all angles between 0 and 5 degrees share the same Q estimate • Buckets too fine → takes a long time to learn • Buckets too coarse → learn suboptimal, often jerky control • Real systems that use discretization usually require clever bucketing schemes • Adaptive sizes • Tile coding • [DEMOS]
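  A minimal bucketing sketch using the 5-degree angle example from the slide; the table layout and keying are assumptions:

      from collections import defaultdict

      BUCKET_DEGREES = 5.0
      q_table = defaultdict(float)                 # keyed by (bucket_index, action)

      def bucket(angle_degrees):
          # All angles within the same 5-degree band share one Q estimate
          return int(angle_degrees // BUCKET_DEGREES)

      def q_value(angle_degrees, action):
          return q_table[(bucket(angle_degrees), action)]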

  21. Linear Value Functions • Another option: values are linear functions of features of states (or action-state pairs) • Good if you can describe states well using a few features (e.g. for game playing board evaluations) • Now we only have to learn a few weights rather than a value for each state • [Figure: gridworld with a learned value for each state, ranging from 0.60 to 0.95]
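  In symbols (standard notation, not copied from the slide), the linear form is

      Q(s,a) = \sum_i w_i \, f_i(s,a)

  so learning reduces to estimating the weights w_i rather than one value per state.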

  22. TD Updates for Linear Values • Can use TD learning with linear values • (Actually it’s just like the perceptron!) • Old Q-learning update: • Simply update weights of features in Q(a,s)
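  A sketch of the perceptron-like weight update mentioned above, which nudges each weight by the TD error times that feature's value; the function name and default step sizes are assumptions:

      def td_update(weights, feats, reward, max_next_q, current_q, alpha=0.1, gamma=0.9):
          # TD error: observed sample (r + gamma * max_a' Q(s',a')) minus current estimate
          error = (reward + gamma * max_next_q) - current_q
          # Each weight moves in proportion to its feature's value at (s, a)
          return [w + alpha * error * f for w, f in zip(weights, feats)]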

  23. Example: TD for Linear Qs
