Game-Playing Strategies in Artificial Intelligence

Learn about the minimax algorithm, alpha-beta pruning, types of games, playing strategies, and more in applied artificial intelligence.

ahibbert

Presentation Transcript


  1. Game Playing ECE457 Applied Artificial Intelligence Spring 2008 Lecture #5

  2. Outline • Types of games • Playing a perfect game • Minimax search • Alpha-beta pruning • Playing an imperfect game • Real-time • Imperfect information • Chance • Russell & Norvig, chapter 6 • Project #2 ECE457 Applied Artificial Intelligence R. Khoury (2008) Page 2

  3. Game Problems • Games are well-defined search problems… • Well-defined board configurations (states) • Limited set of well-defined moves (actions) • Well-defined victory conditions (goal) • Values assigned to pieces, moves, outcomes (cost) • …that are hard to solve by searching • A search tree for chess has an average branching factor of 35 • An average chess game lasts for about 50 moves per player, i.e. 100 ply (one ply is a single move by one player) • The average search tree therefore has about $35^{100}$ nodes!
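A quick check of the arithmetic behind that tree size, using only the two estimates on the slide (branching factor 35, about 100 ply). It is a back-of-the-envelope sketch, not a count of distinct chess positions.

```python
import math

# Estimates from the slide: average branching factor ~35, ~100 ply per game.
BRANCHING_FACTOR = 35
PLY = 100

tree_size = BRANCHING_FACTOR ** PLY              # exact integer (Python ints are unbounded)
digits = len(str(tree_size))                     # number of decimal digits
exponent = PLY * math.log10(BRANCHING_FACTOR)    # log10 of the tree size

print(f"35^100 has {digits} digits (about 10^{exponent:.1f})")
```

So the full tree has on the order of $10^{154}$ nodes, far beyond anything that can be searched exhaustively.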

  4. Game Problems • The opponent • He wants to win and make our agent lose • We have no control over his actions • He prevents us from reaching the optimal solution • Introduces uncertainty in the search • We don’t know what moves the opponent will do • We will assume “perfect play” behaviour

  5. Types of Games • Zero-sum games: a player’s gains are exactly subtracted from another player’s score (chess) • Non-zero-sum games: players can gain or lose without an exact change on others (prisoner’s dilemma)
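The distinction can be checked mechanically on a payoff table: a game is zero-sum when, in every outcome, the two players' payoffs cancel. The sketch below assumes two-player tables; the chess payoffs follow the 1 / 0 / -1 convention used later in these slides, while the prisoner's dilemma numbers (negative years in prison) are hypothetical values chosen only for illustration.

```python
# Each table maps an outcome to (player 1 payoff, player 2 payoff).
CHESS = {
    "white wins": (1, -1),
    "draw": (0, 0),
    "black wins": (-1, 1),
}

# Hypothetical prisoner's dilemma payoffs: negative years spent in prison.
PRISONERS_DILEMMA = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-3, 0),
    ("defect", "cooperate"): (0, -3),
    ("defect", "defect"): (-2, -2),
}

def is_zero_sum(payoffs):
    """True if, in every outcome, one player's gain is exactly the other's loss."""
    return all(p1 + p2 == 0 for p1, p2 in payoffs.values())

print(is_zero_sum(CHESS))              # True
print(is_zero_sum(PRISONERS_DILEMMA))  # False
```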

  6. Game-Playing Strategy • Our agent and the opponent play sequentially • We assume the opponent plays perfectly • Our agent cannot get to the optimal goal • The opponent won’t allow it • Our agent must find the best achievable goal

  7. Minimax Algorithm • Payoff (utility) function assigns a value to each leaf node in the tree • Value then propagates up to non-leaf nodes • Two players • MAX wants to maximise payoff • MIN wants to minimise payoff • MAX is the player currently looking for a move (i.e. at root of tree) • Payoff function • Simple: 1 = win / 0 = draw / -1 = lose • Complex for different victory conditions • Always expressed as win/lose for MAX

  8. Minimax Algorithm [Figure: generic game tree; not reproduced]

  9. Minimax Algorithm [Figure: MAX root with value 3; three MIN children with values 3, 1 and -12; MAX-level leaves (3, 18, 5), (1, 15, 42) and (56, -12, -5)]

  10. Minimax Algorithm • Game of Nim • Initial state: 7 matches in a pile • Each player must divide a pile into two non-empty unequal piles • The player who can’t do that loses • Payoff • +1 win, -1 loss

  11. Minimax Algorithm [Figure: complete game tree for Nim starting from 7, with levels alternating MAX and MIN. States by depth: 7 • 6-1, 5-2, 4-3 • 5-1-1, 4-2-1, 3-2-2, 3-3-1 • 4-1-1-1, 3-2-1-1, 2-2-2-1 • 3-1-1-1-1, 2-2-1-1-1 • 2-1-1-1-1-1. Each node is labelled +1 (MAX wins) or -1 (MAX loses).] The value of each node is the value of the best leaf the current player (MAX or MIN) can reach.
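The Nim rules above are small enough to solve exhaustively. In this sketch, positions are tuples of pile sizes sorted in descending order, and values are +1/-1 from MAX's point of view; the tuple representation and the `max_to_move` parameter are assumptions made for illustration. Under these rules, whoever moves first from a single pile of 7 loses under perfect play, so the root value depends on which player the tree puts at the top.

```python
from functools import lru_cache

def moves(piles):
    """All positions reachable by splitting one pile into two unequal, non-empty piles."""
    result = set()
    for i, pile in enumerate(piles):
        rest = piles[:i] + piles[i + 1:]
        for small in range(1, (pile + 1) // 2):  # guarantees small < pile - small
            result.add(tuple(sorted(rest + (small, pile - small), reverse=True)))
    return result

@lru_cache(maxsize=None)
def minimax(piles, max_to_move):
    """Minimax value from MAX's point of view: +1 if MAX wins, -1 if MAX loses."""
    children = moves(piles)
    if not children:
        # The player to move cannot split any pile and therefore loses.
        return -1 if max_to_move else +1
    values = [minimax(child, not max_to_move) for child in children]
    return max(values) if max_to_move else min(values)

print(minimax((7,), max_to_move=True))   # -1
print(minimax((7,), max_to_move=False))  # 1
```

Caching with `lru_cache` merges repeated states (4-2-1 is reachable three ways), which is why the figure can be drawn as a graph rather than a tree.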

  12. Minimax Algorithm • Generate entire game tree • Compute payoff of leaf nodes • For each non-leaf node, from the lowest in the tree to the root • If MAX level, then assign value of the child with maximum payoff • If MIN level, then assign value of the child with minimum payoff • At the root, select action with maximum payoff
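The bottom-up procedure above can be sketched recursively on the tree of slide 9. Encoding the tree as nested Python lists (a number is a leaf payoff, a list is an internal node) is an assumption for this example; the depth-first recursion computes the same values as the level-by-level description.

```python
# The tree of slide 9, MAX at the root. Numbers are leaf payoffs.
TREE = [[3, 18, 5], [1, 15, 42], [56, -12, -5]]

def minimax(node, is_max):
    """Propagate leaf payoffs upward: MAX levels take the max, MIN levels the min."""
    if not isinstance(node, list):
        return node  # leaf: return its payoff
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

def best_move(tree):
    """At the root (MAX), pick the child with the maximum backed-up value."""
    values = [minimax(child, is_max=False) for child in tree]
    return values.index(max(values)), values

print(minimax(TREE, is_max=True))  # 3
print(best_move(TREE))             # (0, [3, 1, -12])
```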

  13. Minimax Algorithm • Complete, if tree is finite • Optimal against a perfect opponent • Time complexity = $O(b^m)$ • Space complexity = $O(bm)$ • But remember, b and m can be huge • For chess, b ≈ 35 and m ≈ 100

  14. Alpha-Beta Pruning • MAX takes the max of its children • Each MIN child takes the min of its own children: max(min(3,18,5), min(1,15,42), min(56,-12,-5)) • We don’t need to compute the values of all the grandchildren! • In each MIN node, only until we find a value lower than the best value MAX has already found: max(min(3,18,5), min(1,?,?), min(56,-12,?))

  15. Alpha-Beta Pruning • Maintain values α and β • α is the maximum value that MAX is assured of at any point in the search • β is the minimum value that MIN is assured of at any point in the search • Both computed using payoff propagated through the tree • Start with α = -∞ and β = ∞ • As the search goes on, the interval [α, β] of values still worth exploring narrows • When α ≥ β • Current path is not the result of best play by both players, so no need to explore further

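  16. Alpha-Beta Pruning [Figure: alpha-beta search of the tree of slide 9 (levels MAX, MIN, MAX), annotated with [α, β] in visiting order: 1. root [-∞, ∞] • 2. first MIN node [-∞, ∞] • 3. after leaf 3: [-∞, 3]; leaves 18 and 5 leave the value at 3 • 4. root [3, ∞] • 5. second MIN node [3, ∞] • 6. after leaf 1: [3, 1], remaining leaves pruned • 7. third MIN node [3, ∞] • 8. after leaf 56: [3, 56] • 9. after leaf -12: [3, -12], remaining leaf pruned • MIN values 3, 1, -12; root value 3]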

  17. Alpha-Beta Pruning • Called as “rootvalue = Evaluate(root, -∞, ∞)” Evaluate(node, α, β) • If node is leaf • Return payoff • If node is MAX • v = -∞ • For each child of node • v = max(v, Evaluate(child, α, β)) • Break if v ≥ β • α = max(α, v) • Return v • If node is MIN • v = ∞ • For each child of node • v = min(v, Evaluate(child, α, β)) • Break if v ≤ α • β = min(β, v) • Return v
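The Evaluate pseudocode translates almost line for line into Python. This sketch reuses the nested-list encoding of the slide 9 tree (an assumption for the example) and adds a hypothetical `visited` list that records which leaves were actually examined, so the pruning is visible.

```python
import math

TREE = [[3, 18, 5], [1, 15, 42], [56, -12, -5]]  # tree of slide 9, MAX at the root

def evaluate(node, alpha, beta, is_max, visited):
    """Alpha-beta search following the Evaluate pseudocode.

    `visited` collects the leaf payoffs actually examined.
    """
    if not isinstance(node, list):
        visited.append(node)
        return node  # leaf: return payoff
    if is_max:
        v = -math.inf
        for child in node:
            v = max(v, evaluate(child, alpha, beta, False, visited))
            if v >= beta:
                break  # MIN above already has an alternative worth at most beta
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, evaluate(child, alpha, beta, True, visited))
        if v <= alpha:
            break  # MAX above already has an alternative worth at least alpha
        beta = min(beta, v)
    return v

visited = []
root_value = evaluate(TREE, -math.inf, math.inf, True, visited)
print(root_value)  # 3
print(visited)     # [3, 18, 5, 1, 56, -12]
```

Leaves 15, 42 and -5 are never examined, matching the question marks in max(min(3,18,5), min(1,?,?), min(56,-12,?)).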

  18. Alpha-Beta Pruning • Efficiency dependent on ordering of children • Will check each of MAX’s children until finding one with a value of at least β • Will check each of MIN’s children until finding one with a value of at most α • Use heuristics to order the nodes to check • Check the highest-value children first for MAX • Check the lowest-value children first for MIN • Good ordering can reduce time complexity to $O(b^{m/2})$ • Random ordering gives roughly $O(b^{3m/4})$ • Minimax is $O(b^m)$
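To see the effect of ordering on the tree of slide 9, this sketch sorts children best-first (highest first for MAX, lowest first for MIN) before running alpha-beta and counts the leaves examined. It sorts by exact minimax values, an "oracle" standing in for the ordering heuristics mentioned above; a real program cannot afford that, so treat it purely as an illustration.

```python
import math

TREE = [[3, 18, 5], [1, 15, 42], [56, -12, -5]]  # tree of slide 9, MAX at the root

def minimax(node, is_max):
    if not isinstance(node, list):
        return node
    values = [minimax(c, not is_max) for c in node]
    return max(values) if is_max else min(values)

def alphabeta(node, alpha, beta, is_max, visited):
    """Alpha-beta that records every leaf it examines in `visited`."""
    if not isinstance(node, list):
        visited.append(node)
        return node
    v = -math.inf if is_max else math.inf
    for child in node:
        score = alphabeta(child, alpha, beta, not is_max, visited)
        if is_max:
            v = max(v, score)
            if v >= beta:
                break
            alpha = max(alpha, v)
        else:
            v = min(v, score)
            if v <= alpha:
                break
            beta = min(beta, v)
    return v

def order(node, is_max):
    """Copy of the tree with children sorted best-first for the player to move."""
    if not isinstance(node, list):
        return node
    children = [order(c, not is_max) for c in node]
    # Each child is evaluated from the opponent's side (not is_max).
    children.sort(key=lambda c: minimax(c, not is_max), reverse=is_max)
    return children

counts = {}
for name, tree in [("original", TREE), ("ordered", order(TREE, True))]:
    visited = []
    value = alphabeta(tree, -math.inf, math.inf, True, visited)
    counts[name] = (value, len(visited))
    print(name, value, len(visited))
# original 3 6
# ordered 3 5
```

The root value is unchanged; only the amount of work drops. On trees this small the saving is one leaf, but it compounds with depth.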

  19. Minimax Exercise [Figure: exercise game tree with nodes A to O and leaf payoffs; not reproduced]

  20. Pruning Exercise [Figure: alpha-beta trace on the exercise tree of slide 19, annotated with numbered [α, β] intervals; not reproduced]

  21. Imperfect Play • Real-time or time constraints • Chance • Hidden information

  22. Real-Time Games • Sometimes we can’t search the entire tree • Real-time games • Time constraints (playing against a clock) • Tree too big (e.g. chess)

  23. Real-Time Games • Evaluation function • Estimate value of a non-leaf node in the tree • Cut off search at a given level • Chess: count value of pieces, available moves, board configurations, …
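A minimal sketch of the "count value of pieces" part of a chess evaluation function, assuming the board is given as the piece-placement field of a FEN string (uppercase = White = MAX, lowercase = Black = MIN). The piece values are the conventional 1/3/3/5/9; mobility and board-configuration terms from the slide are left out, so this is only one ingredient of a real evaluation.

```python
# Conventional material values; the king is not counted.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

def material_eval(placement):
    """Material balance from MAX's (White's) point of view.

    `placement` is the piece-placement field of a FEN string: letters are pieces,
    digits count empty squares, '/' separates ranks.
    """
    score = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digit or rank separator
        score += value if ch.isupper() else -value
    return score

START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
print(material_eval(START))                   # 0
print(material_eval("4k3/8/8/8/8/8/8/R3K3"))  # 5: White is a rook up
```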

  24. Real-Time Minimax Algorithm • Generate entire game tree down to maximum number of ply • Evaluate lowest nodes • For each non-leaf node, from the lowest in the tree to the root • If MAX level, then assign value of the child with maximum payoff • If MIN level, then assign value of the child with minimum payoff • At the root, select action with maximum payoff

  25. Real-Time Alpha-Beta Pruning • Called as “rootvalue = Evaluate(root, -∞, ∞)” Evaluate(node, α, β) • If node is terminal • Return payoff • If node is at lowest level • Return evaluation • If node is MAX • v = -∞ • For each child of node • v = max(v, Evaluate(child, α, β)) • Break if v ≥ β • α = max(α, v) • Return v • If node is MIN • v = ∞ • For each child of node • v = min(v, Evaluate(child, α, β)) • Break if v ≤ α • β = min(β, v) • Return v
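A sketch of the depth-limited version, applied to the Nim game from slide 10. It returns the exact payoff when the game is over and an evaluation when the depth limit is reached. The `evaluation` function here is a hypothetical placeholder that always answers 0 ("unknown"); a real game would plug in a domain heuristic.

```python
import math

def moves(piles):
    """Positions reachable by splitting one pile into two unequal, non-empty piles."""
    result = set()
    for i, pile in enumerate(piles):
        rest = piles[:i] + piles[i + 1:]
        for small in range(1, (pile + 1) // 2):
            result.add(tuple(sorted(rest + (small, pile - small), reverse=True)))
    return result

def evaluation(piles):
    """Hypothetical cutoff heuristic: 0 means 'no idea who wins'."""
    return 0

def evaluate(piles, depth, alpha, beta, is_max):
    """Depth-limited alpha-beta: exact payoff at terminal states, heuristic at the cutoff."""
    children = sorted(moves(piles))  # sorted only for a deterministic visiting order
    if not children:
        return -1 if is_max else 1   # player to move cannot move and loses
    if depth == 0:
        return evaluation(piles)
    if is_max:
        v = -math.inf
        for child in children:
            v = max(v, evaluate(child, depth - 1, alpha, beta, False))
            if v >= beta:
                break
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in children:
        v = min(v, evaluate(child, depth - 1, alpha, beta, True))
        if v <= alpha:
            break
        beta = min(beta, v)
    return v

print(evaluate((7,), 1, -math.inf, math.inf, True))   # 0: cut off before any game ends
print(evaluate((7,), 10, -math.inf, math.inf, True))  # -1: deep enough to reach every end
```

Checking for a terminal state before the depth test matters: a finished game should report its real payoff even at the cutoff.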

  26. Real-Time Games: Problems • Non-quiescent positions • Some state configurations cause value to change wildly • Solved with quiescence search • Expand non-quiescent boards deeper, until you reach stable “quiescent” boards
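A toy illustration of quiescence search on a hand-built tree. Each hypothetical node carries a heuristic estimate, a `quiet` flag and children; a real program would decide quietness from captures or checks. Position A looks like +5 at the depth limit (say, just after a capture), but its only continuation, a recapture, is worth -9. The `extra` cap on additional plies is an assumption added so the search always terminates.

```python
def node(estimate, quiet=True, children=()):
    """Hypothetical game-tree node: heuristic estimate, quietness flag, children."""
    return {"estimate": estimate, "quiet": quiet, "children": list(children)}

def search(n, depth, is_max, quiescence=True, extra=4):
    """Depth-limited minimax; with quiescence, non-quiet frontier nodes are expanded further.

    `extra` caps how many additional plies quiescence may add.
    """
    if not n["children"]:
        return n["estimate"]
    if depth <= 0:
        if not quiescence or n["quiet"] or extra == 0:
            return n["estimate"]
        extra -= 1  # position is still noisy: keep expanding
    values = [search(c, depth - 1, not is_max, quiescence, extra) for c in n["children"]]
    return max(values) if is_max else min(values)

A = node(5, quiet=False, children=[node(-9)])  # looks good, but a recapture follows
B = node(1)                                    # quiet, modest advantage
ROOT = node(0, children=[A, B])                # MAX chooses between A and B

print(search(ROOT, 1, True, quiescence=False))  # 5: fooled by A's noisy estimate
print(search(ROOT, 1, True))                    # 1: A resolves to -9, so B is chosen
```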

  27. Real-Time Games: Problems • Horizon effect • A damaging, unavoidable move is (or can be pushed, by delaying moves) just beyond the search depth limit (the “horizon”) • Mitigated with singular extension • A “singular” move is considerably better than all others • Expand singular states deeper

  28. Games of Chance • Minimax requires planning for upcoming moves • If moves depend on dice rolls, random draws, etc., we cannot plan a fixed sequence of moves • We need to add all possible outcomes to the tree!

  29. Recall [Figure: the minimax tree of slide 9: MAX root = 3; MIN nodes = 3, 1, -12; leaves (3, 18, 5), (1, 15, 42), (56, -12, -5)]

  30. Expectiminimax [Figure: MAX has already rolled the dice and has three possible moves. Then, MIN rolls the dice: there are three possible outcomes to the roll, with probabilities 0.8, 0.15 and 0.05, and MIN picks an action based on the roll result. Values after MIN’s choice, per move: (3, 16, -7), (1, 25, -8), (-12, -25, 58). Chance-node values: 4.45, 4.15, -10.45. Root value: 4.45.]

  31. Expectiminimax [Figure: the tree of slide 30 expanded to show MIN’s alternative actions after each roll, with chance-node values 4.45, 4.15 and -10.45 and root value 4.45; not reproduced in full]

  32. Problems with Expectiminimax [Figure: the tree of slide 30 with the payoff 58 replaced by 800. The third chance node’s value becomes 26.65 (versus 4.45 and 4.15 for the other two), so MAX now picks the third move and the root value is 26.65.]
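The chance-node arithmetic of slides 30 and 32 can be reproduced in a few lines. The sketch starts from the values after MIN's reply (it does not model MIN's choice itself), weights each roll outcome by its probability, and lets MAX take the best expected value.

```python
# Values from slide 30: MAX has three moves; each leads to a chance node
# (MIN's dice roll) with three outcomes. Each number is the value after
# MIN's best reply to that roll.
PROBS = [0.8, 0.15, 0.05]
MOVES = [[3, 16, -7], [1, 25, -8], [-12, -25, 58]]

def chance_value(outcome_values, probs):
    """Expected value of a chance node: probability-weighted sum of its outcomes."""
    return sum(p * v for p, v in zip(probs, outcome_values))

def expectiminimax_root(moves, probs):
    """MAX picks the move whose chance node has the highest expected value."""
    values = [chance_value(m, probs) for m in moves]
    best = max(range(len(values)), key=lambda i: values[i])
    return best, values

best, values = expectiminimax_root(MOVES, PROBS)
print(best, [round(v, 2) for v in values])    # 0 [4.45, 4.15, -10.45]

# Slide 32: raising 58 to 800 keeps every payoff in the same order,
# yet the third move's expected value jumps and MAX's choice changes.
RESCALED = [[3, 16, -7], [1, 25, -8], [-12, -25, 800]]
best2, values2 = expectiminimax_root(RESCALED, PROBS)
print(best2, [round(v, 2) for v in values2])  # 2 [4.45, 4.15, 26.65]
```

Unlike plain minimax, which only depends on the order of payoffs, expectiminimax depends on their magnitudes, which is the problem slide 32 illustrates.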

  33. Problems with Expectiminimax • Time complexity: $O(b^m n^m)$ • n is the number of possible outcomes of a chance node • Recall: minimax is $O(b^m)$ • Trees can grow very large very quickly • Minimax and pruning limit the search to likely sequences of actions given perfect play • With randomness, there is no likely sequence of actions

  34. Imperfect Information • Algorithms so far require knowing everything about the game • In some games, information about the opponent is hidden • Cards in poker, pieces in Stratego, etc. • We could approximate hidden information as random events • The probability that the opponent has a flush, the probability that a piece is a bomb, etc. • Then use expectiminimax to get the best action

  35. Imperfect Information • List all possible outcomes, then average the best action over all of them • Can lead to irrational behaviour! • Possible cases: • Road 1 leads to money, road 2-a leads to gold, road 2-b leads to death (rational action is road 2, then a) • Road 1 leads to money, road 2-a leads to death, road 2-b leads to gold (rational action is road 2, then b) • But the real situation is: • Road 1 leads to money, road 2 leads to gold or death (rational action is road 1)
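The road example can be made concrete with numbers. The utilities (money = 10, gold = 100, death = -1000) and the 50/50 chance that fork a holds the gold are hypothetical values chosen for illustration; only their rough ordering matters. Averaging over fully known worlds makes road 2 look like a sure win, while evaluating the decision we actually face, without knowing the fork, favours road 1.

```python
# Hypothetical utilities for the road example.
MONEY, GOLD, DEATH = 10, 100, -1000
P_GOLD_ON_A = 0.5  # assumed: we cannot tell which fork leads to gold

# "Average over possible worlds": in each fully known world, take the best
# branch of road 2, then average across worlds. Road 2 always looks like 100.
worlds = [
    {"road 1": MONEY, "road 2": max(GOLD, DEATH)},  # 2-a gold, 2-b death
    {"road 1": MONEY, "road 2": max(DEATH, GOLD)},  # 2-a death, 2-b gold
]
averaged = {road: sum(w[road] for w in worlds) / len(worlds)
            for road in ("road 1", "road 2")}

# Real situation: at the fork we still don't know which branch is gold, so
# under the 50/50 assumption either branch is gold only half the time.
real = {"road 1": MONEY,
        "road 2": P_GOLD_ON_A * GOLD + (1 - P_GOLD_ON_A) * DEATH}

print(max(averaged, key=averaged.get))  # road 2
print(max(real, key=real.get))          # road 1
```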

  36. Imperfect Information • It’s a useful approximation, but it’s not exact! • Advantages: • Works in many cases • Doesn’t require new techniques to handle information discovery • Disadvantages: • In reality, hidden information is not the same as random events • Can lead to irrational behaviour

  37. Imperfect Information • Need to handle information • Gather information • Plan based on what information we will have at a given point in the future • Leads to more rational behaviour • Acting to gain information • Acting to give information to partners • Acting to conceal information from the opponents • We will learn to do that later in the course

  38. IBM Deep Blue • First chess computer to defeat a reigning world champion (Garry Kasparov) under normal chess tournament constraints, in 1997 • Relied on brute-force hardware search power • 30 processors for the search • 480 custom VLSI chess processors for move generation and ordering, and leaf node evaluation

  39. IBM Deep Blue • Searched a minimax tree • 100-200M states per second, maximum 330M • Average 6 to 16 ply, maximum 40 ply • Decide which moves are worth expanding, giving priority to singular extensions and chess threats • Null-window alpha-beta pruning • Alpha-beta search with a minimal “null” window of values (β = α + 1 for integer scores) instead of the full range (-∞, ∞) • Faster and easier to implement on hardware • Approximate: can only return bounds on the minimax value • Allows for a highly non-uniform, more selective and human-like search of the tree
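A generic sketch of what a null-window search answers, on the slide 9 tree with integer payoffs. Running fail-soft alpha-beta with the window (t - 1, t) only decides whether the minimax value is at least t, which is why it yields bounds rather than an exact value. This illustrates the general technique; it is not a description of Deep Blue's hardware implementation.

```python
import math

TREE = [[3, 18, 5], [1, 15, 42], [56, -12, -5]]  # tree of slide 9, MAX at the root

def alphabeta(node, alpha, beta, is_max, visited):
    """Fail-soft alpha-beta: a result <= alpha or >= beta is only a bound."""
    if not isinstance(node, list):
        visited.append(node)
        return node
    v = -math.inf if is_max else math.inf
    for child in node:
        score = alphabeta(child, alpha, beta, not is_max, visited)
        if is_max:
            v = max(v, score)
            if v >= beta:
                break
            alpha = max(alpha, v)
        else:
            v = min(v, score)
            if v <= alpha:
                break
            beta = min(beta, v)
    return v

def value_at_least(tree, threshold):
    """Null-window test for integer payoffs: is the minimax value >= threshold?

    No integer lies strictly inside (threshold - 1, threshold), so the search either
    fails high (value >= threshold) or fails low (value <= threshold - 1).
    Returns the answer and the number of leaves examined.
    """
    visited = []
    v = alphabeta(tree, threshold - 1, threshold, True, visited)
    return v >= threshold, len(visited)

print(value_at_least(TREE, 3))  # (True, 3)
print(value_at_least(TREE, 4))  # (False, 4)
```

Together the two tests pin the value to 3, and each examines fewer leaves than the 6 a full-window alpha-beta search of this tree needs.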

  40. IBM Deep Blue • Two board evaluation heuristics • Fast evaluation to get a quick approximate value • Considers piece position value • Slow evaluation to get a more precise value • Considers 8,000 features • Includes common chess concepts and specific Kasparov strategies • Features have programmable weights learned automatically from 700,000 grandmaster games and fine-tuned manually by a chess grandmaster

  41. Assumptions • Utility-based agent • Environment • Fully observable • Deterministic • Sequential • Static • Discrete / Continuous • Single agent

  42. Assumptions Updated • Utility-based agent • Environment • Fully observable / Partially observable (approximation) • Deterministic / Strategic / Stochastic • Sequential • Static / Semi-dynamic • Discrete / Continuous • Single agent / Multi-agent
