
  1. Artificial Intelligence: Representation and Problem Solving. Multi-agent Systems (1): Adversarial Search 15-381 / 681 Instructors: Fei Fang (This Lecture) and Dave Touretzky feifang@cmu.edu Wean Hall 4126

  2. Recap • Search/Satisfiability/Optimization/Deterministic and Symbolic Reasoning • Reasoning and optimization in problems without uncertainty • Probabilistic Reasoning/Sequential Decision Making • Reasoning and optimization in problems with uncertainty • Both settings have at most one agent that is not treated as part of the “environment” Fei Fang

  3. Outline • Multi-Agent Systems • A special case: Two-Player Sequential Zero-Sum Complete-Information Games • Minimax Algorithm • Alpha-Beta Pruning • With chance nodes • Overview of Deep Blue and AlphaGo (and AlphaZero) Fei Fang

  4. Multi-Agent Systems Robocup 2006 Mafia Game Texas Hold’em Fei Fang

  5. Multi-Agent Systems Mobility Negotiation Societal Decision Making Security Environment Sustainability Fei Fang

  6. A Special Case • Two-Player Sequential Zero-Sum Complete-Information Games • Two-Player • Two players involved • Sequential • Players take turns to move (take actions) • Act alternately • Zero-sum • Utility values of the two players at the end of the game have equal absolute value and opposite sign • Perfect Information • Each player can fully observe what actions other agents have taken Fei Fang

  7. Tic-Tac-Toe • Two players, X and O, take turns marking the spaces in a 3×3 grid • The player who succeeds in placing three of their marks in a horizontal, vertical, or diagonal row wins the game https://en.wikipedia.org/wiki/Tic-tac-toe Fei Fang

  8. Chess https://en.wikipedia.org/wiki/Chess Fei Fang

  9. Chess Garry Kasparov vs Deep Blue (1996) Result (from Deep Blue's perspective): win-loss-draw-draw-loss-loss (In even-numbered games, Deep Blue played white) Fei Fang

  10. Chess Fei Fang

  11. Go • DeepMind promotion video before the game with Lee Sedol https://www.youtube.com/watch?v=SUbqykXVx0A Fei Fang

  12. Go • AlphaGo vs Lee Sedol (3/2016): win-win-win-loss-win • AlphaGo Zero vs AlphaGo (2017): 100-0 (https://deepmind.com/blog/alphago-zero-learning-scratch/) AlphaGo: https://www.nature.com/articles/nature16961.pdf AlphaGo Zero: www.nature.com/articles/nature24270.pdf Fei Fang

  13. Solution Concept • What strategies are appropriate to use in these two-player sequential zero-sum complete-information games? • At each state of the game, what action should the player to move take? Fei Fang

  14. Solution Concept • Iterative Definition: Each player should choose the best action, i.e., the one leading to the highest utility for itself, assuming both players will choose the best actions afterwards • What if someone accidentally chooses a sub-optimal action? • In the next step, the player who needs to move should still choose the best action leading to the highest utility for itself, assuming both players will choose the best actions afterwards Fei Fang

  15. Formulation and Representation • Define a two-player sequential zero-sum complete-information game as a search-like problem (can be visualized as a simple game tree) • Initial state s0 • Player(s): which player has the move in state s • Actions(s): the set of legal moves in state s • Result(s, a): successor state of state s after move a • Terminal-Test(s): true if the game is over at state s and false otherwise • Utility(s, p): utility for player p if the game ends in terminal state s Fei Fang
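
To make the formulation concrete, here is a minimal Python sketch of this interface, instantiated for Tic-Tac-Toe. The class and method names (initial_state, player, actions, result, terminal_test, utility) simply mirror the bullets above; the representation choices are illustrative, not the lecture's reference code.

```python
# Minimal sketch of the game interface above, instantiated for Tic-Tac-Toe.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

class TicTacToe:
    def initial_state(self):
        return (' ',) * 9                        # empty 3x3 board, X moves first

    def player(self, s):
        return 'X' if s.count(' ') % 2 == 1 else 'O'   # players alternate

    def actions(self, s):
        return [i for i in range(9) if s[i] == ' ']    # empty squares

    def result(self, s, a):
        p = self.player(s)
        return s[:a] + (p,) + s[a + 1:]                # place the mover's mark

    def winner(self, s):
        for i, j, k in WIN_LINES:
            if s[i] != ' ' and s[i] == s[j] == s[k]:
                return s[i]
        return None

    def terminal_test(self, s):
        return self.winner(s) is not None or ' ' not in s

    def utility(self, s, p):
        w = self.winner(s)
        if w is None:
            return 0                             # draw
        return 1 if w == p else -1               # zero-sum: +1 for winner, -1 for loser
```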

  16. Formulation and Representation • A (partial) game tree for Tic-Tac-Toe Fei Fang

  17. Formulation and Representation • In search, we find a path from initial state to a terminal state • In two-player sequential zero-sum complete-information games, we are looking for a “strategy profile” or “contingent plan”: An action for each state Fei Fang

  18. Quiz 1 • How many terminal nodes are there in the game tree of Tic-Tac-Toe? • Answer choices A-D (candidate counts and bounds) are shown on the slide Fei Fang
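
Not from the slides: one way to check the quiz answer is to count terminal nodes by brute force, reusing the hypothetical TicTacToe sketch above. Distinct move orders reaching the same position count as distinct tree nodes.

```python
# Brute-force count of terminal nodes (complete games) in the Tic-Tac-Toe game tree.
def count_terminal_nodes(game, s):
    if game.terminal_test(s):
        return 1
    return sum(count_terminal_nodes(game, game.result(s, a))
               for a in game.actions(s))

game = TicTacToe()
print(count_terminal_nodes(game, game.initial_state()))  # prints 255168
```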

  19. Minimax Algorithm • How to find the optimal action for a given game state? • Minimax value of a state: the value of the best outcome possible for the player who needs to move at that state, assuming both players will take the best actions in the corresponding states after the current move • Minimax Algorithm • Label the minimax value of each state given the minimax values of its successors • For the MAX player, take the max of the values of its successors • For the MIN player, take the min of the values of its successors • Essentially a backward induction algorithm (von Neumann & Morgenstern 1944) • Often implemented in a depth-first manner (recursive algorithm) Fei Fang
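
A minimal depth-first implementation of the recursion just described, written against the hypothetical game interface sketched earlier (player/actions/result/terminal_test/utility); treat it as a sketch rather than the lecture's reference code.

```python
# Recursive (depth-first) minimax. Utilities are from max_player's perspective.
def minimax_value(game, s, max_player):
    if game.terminal_test(s):
        return game.utility(s, max_player)
    values = [minimax_value(game, game.result(s, a), max_player)
              for a in game.actions(s)]
    if game.player(s) == max_player:
        return max(values)   # MAX node: best successor value
    else:
        return min(values)   # MIN node: worst successor value for MAX

def minimax_decision(game, s):
    # Return the action at state s with the highest minimax value
    # for the player to move at s.
    me = game.player(s)
    return max(game.actions(s),
               key=lambda a: minimax_value(game, game.result(s, a), me))
```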

  20. Minimax Algorithm Fei Fang

  21. Minimax Algorithm What action should the MAX player take at the root? Fei Fang

  22. Minimax Algorithm • Formally: Game value = the minimax value of the root node • Tic-Tac-Toe: Game value = 0 • The best play from both parties leads to a draw Fei Fang
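
Written out in standard textbook notation (using the formulation from slide 15), the minimax value is defined recursively as:

```latex
\[
\mathrm{MINIMAX}(s) =
\begin{cases}
\mathrm{Utility}(s, \mathrm{MAX}) & \text{if } \mathrm{TerminalTest}(s)\\[2pt]
\max_{a \in \mathrm{Actions}(s)} \mathrm{MINIMAX}(\mathrm{Result}(s,a)) & \text{if } \mathrm{Player}(s) = \mathrm{MAX}\\[2pt]
\min_{a \in \mathrm{Actions}(s)} \mathrm{MINIMAX}(\mathrm{Result}(s,a)) & \text{if } \mathrm{Player}(s) = \mathrm{MIN}
\end{cases}
\]
```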

  23. Minimax Algorithm Fei Fang

  24. Minimax Algorithm • How to solve such games and determine the optimal strategy profile or contingent plan, i.e., the best action to take for each possible game state? • Calling MINIMAX-DECISION at every state automatically provides the best action for each state • Intractable for large games • The game trees of Chess and Go contain astronomically many nodes • Cannot even represent the optimal strategy profile in available space • For many problems, you do not need a full contingent plan at the very beginning of the game • Can solve the problem in a more “online” fashion, just like how human players play Chess/Go: my opponent takes this action, so what should I do now? Fei Fang

  25. Minimax Algorithm • If we only care about the game value, or the optimal action at a specific game state, can we do better? • Trick 1: Depth-limited search (limit the depth) • Minimax algorithm (Shannon, 1950): • Set d (an integer) as an upper bound on the search depth • Eval(s), the static evaluation function, returns an estimate of the minimax value of state s • Whenever we reach a nonterminal node at depth d, return Eval(s) • If d = ∞ (the default), then Eval will never be called, and the algorithm visits all the nodes in the tree and provides the optimal action for all states Fei Fang
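
A minimal sketch of the depth-limit modification to the recursive minimax above; eval_fn is a caller-supplied static evaluation function, and the game interface is the hypothetical one from slide 15.

```python
import math

# Depth-limited minimax: cut the recursion off at depth d and fall back to a
# static evaluation function. With d = math.inf this reduces to plain minimax.
def dl_minimax_value(game, s, max_player, eval_fn, d=math.inf):
    if game.terminal_test(s):
        return game.utility(s, max_player)
    if d <= 0:
        return eval_fn(s, max_player)   # estimate of the minimax value of s
    values = [dl_minimax_value(game, game.result(s, a), max_player, eval_fn, d - 1)
              for a in game.actions(s)]
    return max(values) if game.player(s) == max_player else min(values)
```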

  26. Minimax Algorithm • Static evaluation function • Often a heuristic function that is easily computable given the representation of a state, e.g., a weighted sum of features • For chess: Count the pieces for each side, giving each a weight (queen=9, rook=5, knight/bishop=3, pawn=1) • What properties do we care about in the evaluation function? For the optimal action, only the ordering matters • In chess, can only search full-width tree to about 4 levels Fei Fang
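
For illustration, a material-count evaluation in the spirit of the weights just listed; the board representation (a dict mapping squares to piece letters, uppercase for MAX's side) is a hypothetical choice, not the lecture's.

```python
# Simple material-count evaluation for chess, using the weights above
# (queen=9, rook=5, knight/bishop=3, pawn=1). The board is assumed to be a
# dict from square to piece letter, uppercase for MAX's pieces, lowercase for MIN's.
PIECE_VALUE = {'q': 9, 'r': 5, 'n': 3, 'b': 3, 'p': 1, 'k': 0}

def material_eval(board, max_player=None):
    score = 0
    for piece in board.values():
        value = PIECE_VALUE[piece.lower()]
        score += value if piece.isupper() else -value
    return score
```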

  27. Quiz 2 • When applying the minimax algorithm with a search depth limit of 2, we used two different static evaluation functions, shown on the leaf nodes of the left and right figures. What is the optimal action for nodes A, B, C respectively? • A: Left: a,c,e; Right: b,c,e • B: Left: a,d,f; Right: a,d,f • C: Left: a,c,e; Right: a,d,f • D: Left: b,c,e; Right: b,c,e (Figures: two game trees with root A, actions a and b leading to nodes B and C, whose actions are c, d, e, f; the leaf values differ between the left and right figures) Fei Fang

  28. Pruning • If we only care about the game value, or the optimal action at a specific game state, can we do better? • Trick 2: Pruning subtrees (Limit the width) • Intuition: Can compute the correct decision without looking at every node (consider the bounds of the minimax value) Fei Fang

  29. Quiz 3 • A: 3 • B: 2 • C: Need more information Fei Fang

  30. Pruning Intuition Fei Fang

  31. Pruning • Alpha-Beta (α-β) pruning: compute the minimax value of a game tree (or a specific state) with minimal exploration • During the search, at state s, record the min (for a MIN node) or max (for a MAX node) value v of its successors that have been explored • v is a lower bound of the minimax value for a MAX node (initialized as -∞) and an upper bound for a MIN node (initialized as +∞) • During the search, at state s, also record a lower bound α (initialized as -∞) and an upper bound β (initialized as +∞) of the minimax value based on what has been searched so far (not only based on its explored successors, but also the other explored branches of the tree) • As more successors of s are explored, update the values of v, α, β • Prune the remaining subtrees below a node if v falls outside of the range [α, β] • For a MAX node, prune if v ≥ β • For a MIN node, prune if v ≤ α • α and β are bounds determined globally, while v is a bound determined locally; if there is a conflict, it only means the local branch is useless and can be pruned Fei Fang
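
A compact sketch of α-β pruning applied to the recursive minimax above; as before, the game interface is the hypothetical one from slide 15, not the lecture's reference code.

```python
import math

# Minimax with alpha-beta pruning. alpha is the best value MAX can already
# guarantee on the path to s, beta the best value MIN can guarantee; a node
# whose running value v leaves [alpha, beta] cannot affect the root decision.
def alphabeta_value(game, s, max_player, alpha=-math.inf, beta=math.inf):
    if game.terminal_test(s):
        return game.utility(s, max_player)
    if game.player(s) == max_player:                      # MAX node
        v = -math.inf
        for a in game.actions(s):
            v = max(v, alphabeta_value(game, game.result(s, a), max_player, alpha, beta))
            if v >= beta:                                 # MIN above will never allow this
                return v                                  # prune remaining successors
            alpha = max(alpha, v)
        return v
    else:                                                 # MIN node
        v = math.inf
        for a in game.actions(s):
            v = min(v, alphabeta_value(game, game.result(s, a), max_player, alpha, beta))
            if v <= alpha:                                # MAX above will never allow this
                return v                                  # prune remaining successors
            beta = min(beta, v)
        return v
```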

  32. Pruning • α: lower bound of the minimax value • β: upper bound of the minimax value Fei Fang

  33. Pruning • α: lower bound of the minimax value • β: upper bound of the minimax value Fei Fang

  34. Pruning • Effectiveness depends on the ordering of successors • With perfect ordering, can search twice as deep in a given amount of time • While perfect ordering cannot be achieved, simple heuristics are very effective Fei Fang

  35. Iterative Deepening Search • If we only care about the game value, or the optimal action at a specific game state, can we do better? • Trick 3: Iterative Deepening Search • Minimax algorithm with a varying depth limit • Can be integrated with α-β pruning: use the result of a search with a small depth limit d to decide the move ordering when searching with a larger d Fei Fang
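
A rough sketch of how iterative deepening can feed move ordering into the next, deeper search. The search_value argument is assumed to be a depth-limited α-β search combining the two sketches above; none of this is the lecture's reference code.

```python
# Iterative deepening at the root: search with increasing depth limits and use
# the values found at depth d to order moves for depth d+1 (better ordering
# makes alpha-beta pruning more effective).
def iterative_deepening_decision(game, s, eval_fn, search_value, max_depth):
    me = game.player(s)
    ordered = list(game.actions(s))
    best = ordered[0]
    for d in range(1, max_depth + 1):
        scored = [(search_value(game, game.result(s, a), me, eval_fn, d - 1), a)
                  for a in ordered]
        scored.sort(key=lambda va: va[0], reverse=True)   # best-first for the root player
        ordered = [a for _, a in scored]
        best = ordered[0]
    return best
```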

  36. With Chance Nodes Fei Fang
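
The slide's figure is not reproduced here. For games with chance nodes (e.g., dice rolls), the standard generalization adds an expectation over chance outcomes r with probabilities P(r) to the minimax recursion:

```latex
\[
\mathrm{EXPECTIMINIMAX}(s) =
\begin{cases}
\mathrm{Utility}(s,\mathrm{MAX}) & \text{if } \mathrm{TerminalTest}(s)\\[2pt]
\max_{a} \mathrm{EXPECTIMINIMAX}(\mathrm{Result}(s,a)) & \text{if } \mathrm{Player}(s)=\mathrm{MAX}\\[2pt]
\min_{a} \mathrm{EXPECTIMINIMAX}(\mathrm{Result}(s,a)) & \text{if } \mathrm{Player}(s)=\mathrm{MIN}\\[2pt]
\sum_{r} P(r)\,\mathrm{EXPECTIMINIMAX}(\mathrm{Result}(s,r)) & \text{if } \mathrm{Player}(s)=\mathrm{CHANCE}
\end{cases}
\]
```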

  37. Before Deep Blue Fei Fang

  38. Before Deep Blue • Claude Shannon, Alan Turing: Minimax search with scoring function (1950) • Only a few branches are shown here Fei Fang

  39. Before Deep Blue Chess Opening Book Example opening where the book goes 16 moves (32 plies) deep Fei Fang

  40. Before Deep Blue Fei Fang

  41. How Deep Blue Works • ~200 million moves / second = 3.6 × 10^10 moves in 3 minutes • 3 min corresponds to • ~7 plies of uniform-depth minimax search • 10-14 plies of uniform-depth alpha-beta search • 1 sec corresponds to 380 years of human thinking time • Specialized hardware searches the last 5 ply Fei Fang
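
A back-of-the-envelope check of the ply counts above, assuming a chess branching factor of roughly 35 (the branching factor is an assumption, not a figure from the slide):

```latex
\[
2\times 10^{8}\ \text{moves/s} \times 180\ \text{s} = 3.6\times 10^{10}\ \text{moves},
\qquad
35^{d} \approx 3.6\times 10^{10} \;\Rightarrow\; d \approx \frac{\log\!\left(3.6\times 10^{10}\right)}{\log 35} \approx 7\ \text{plies},
\]
\[
\text{while alpha-beta with good move ordering explores roughly } 35^{d/2} \text{ nodes, allowing } d \approx 14\ \text{plies.}
\]
```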

  42. How Deep Blue Works • Hardware requirement • 32-node RS6000 SP multicomputer • Each node had • 1 IBM Power2 Super Chip (P2SC) • 16 chess chips • Move generation (often takes 40-50% of time) • Evaluation • Some endgame heuristics & small endgame databases • 32GB opening & endgame database Fei Fang

  43. (Optional) Connections with RL • From the MAX player's perspective, a “perfectly rational” MIN player can be treated as part of the environment: given the MAX player's action, the MIN player chooses its optimal response, and then it is the MAX player's turn again (one “environment step” maps a MAX-to-move state to the next MAX-to-move state) • In fact, if the MIN player follows some behavioral pattern that may depend on the MAX player's actions so far but does not depend on the MAX player's overall contingent plan, then the MIN player can still be treated as part of the environment • If the MAX player does not know ahead of time which action the MIN player will take, the problem can be viewed as an RL problem • If s is a node where the MAX player moves, its estimated minimax value can be viewed as an estimate of the state value function in the MDP/RL Fei Fang

  44. Connections with RL Fei Fang

  45. Connections with RL • Learning / Estimating the state value function will help the MAX player find the optimal policy against a perfectly rational MIN player Fei Fang

  46. How AlphaGo Works • Supervised learning + Reinforcement learning • Monte-Carlo Tree Search Fei Fang

  47. How AlphaGo Works • Supervised learning + Reinforcement learning • Monte-Carlo Tree Search Fei Fang

  48. How AlphaZero Works • Reinforcement learning • Monte-Carlo Tree Search Fei Fang

  49. How AlphaZero Works • Reinforcement learning • Monte-Carlo Tree Search Fei Fang

  50. Acknowledgment • Some slides are borrowed from previous slides made by Tuomas Sandholm, Ariel Procaccia, Zico Kolter and Zack Rubinstein Fei Fang
