Artificial Intelligence: Representation and Problem Solving
Multi-agent Systems (1): Adversarial Search
15-381 / 681 Instructors: Fei Fang (This Lecture) and Dave Touretzky feifang@cmu.edu Wean Hall 4126
Recap • Search/Satisfiability/Optimization/Deterministic and Symbolic Reasoning • Reasoning and optimization in problems without uncertainty • Probabilistic Reasoning/Sequential Decision Making • Reasoning and optimization in problems with uncertainty • At most one agent that is not treated as part of the “environment” Fei Fang
Outline • Multi-Agent Systems • A special case: Two-Player Sequential Zero-Sum Complete-Information Games • Minimax Algorithm • Alpha-Beta Pruning • With chance nodes • Overview of Deep Blue and AlphaGo (and AlphaZero) Fei Fang
Multi-Agent Systems Robocup 2006 Mafia Game Texas Hold’em Fei Fang
Multi-Agent Systems Mobility Negotiation Societal Decision Making Security Environment Sustainability Fei Fang
A Special Case • Two-Player Sequential Zero-Sum Complete-Information Games • Two-Player • Two players involved • Sequential • Players take turns to move (take actions) • Act alternately • Zero-sum • Utility values of the two players at the end of the game have equal absolute value and opposite sign • Complete (Perfect) Information • Each player can fully observe what actions the other player has taken Fei Fang
Tic-Tac-Toe • Two players, X and O, take turns marking the spaces in a 3×3 grid • The player who succeeds in placing three of their marks in a horizontal, vertical, or diagonal row wins the game https://en.wikipedia.org/wiki/Tic-tac-toe Fei Fang
Chess https://en.wikipedia.org/wiki/Chess Fei Fang
Chess Garry Kasparov vs Deep Blue (1996) Result (from Deep Blue's perspective): win-loss-draw-draw-loss-loss; Kasparov won the match 4–2 (In odd-numbered games, Deep Blue played white) Fei Fang
Chess Fei Fang
Go • DeepMind promotion video before the game with Lee Sedol https://www.youtube.com/watch?v=SUbqykXVx0A Fei Fang
Go AlphaGo vs Lee Sedol (3/2016) Result: win-win-win-loss-win AlphaGo Zero vs AlphaGo (2017) Result: 100-0 https://deepmind.com/blog/alphago-zero-learning-scratch/ AlphaGo: https://www.nature.com/articles/nature16961.pdf AlphaGo Zero: www.nature.com/articles/nature24270.pdf Fei Fang
Solution Concept • What strategies are appropriate to use in these two-player sequential zero-sum complete-information games? In each of the example positions shown, what action should the player to move take? Fei Fang
Solution Concept • Iterative definition: each player should choose the action leading to the highest utility for itself, assuming both players choose their best actions afterwards • What if someone accidentally chooses a sub-optimal action? • At the next step, the player to move should again choose the action leading to the highest utility for itself, assuming both players choose their best actions afterwards Fei Fang
Formulation and Representation • Define a two-player sequential zero-sum complete-information game as a search-like problem (can be visualized as a simple game tree) • s0: initial state • Player(s): which player has the move in state s • Actions(s): the set of legal moves in state s • Result(s, a): successor state of state s after move a • Terminal(s): true if the game is over at state s and false otherwise • Utility(s, p): utility for player p if the game ends in state s Fei Fang
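As a concrete sketch, this formulation maps naturally onto a small Python interface. The method names below simply mirror the components listed above; they are illustrative scaffolding, not code from the lecture.

```python
# Illustrative interface for a two-player sequential zero-sum
# complete-information game, mirroring the components above.
from abc import ABC, abstractmethod

class Game(ABC):
    @abstractmethod
    def initial_state(self): ...   # s0: the initial state

    @abstractmethod
    def player(self, s): ...       # Player(s): "MAX" or "MIN", whoever moves at s

    @abstractmethod
    def actions(self, s): ...      # Actions(s): the set of legal moves in state s

    @abstractmethod
    def result(self, s, a): ...    # Result(s, a): successor of s after move a

    @abstractmethod
    def terminal(self, s): ...     # Terminal(s): True iff the game is over at s

    @abstractmethod
    def utility(self, s): ...      # Utility(s): MAX's payoff at terminal s
                                   # (zero-sum: MIN's payoff is the negation)
```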
Formulation and Representation • A (partial) game tree for Tic-Tac-Toe Fei Fang
Formulation and Representation • In search, we find a path from initial state to a terminal state • In two-player sequential zero-sum complete-information games, we are looking for a “strategy profile” or “contingent plan”: An action for each state Fei Fang
Quiz 1 • How many terminal nodes are there in the game tree of Tic-Tac-Toe? • A: #terminal nodes= • B: #terminal nodes= • C: #terminal nodes • D: #terminal nodes Fei Fang
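The answer can be sanity-checked by brute force: enumerate every line of play, stopping as soon as a player wins or the board fills. A small illustrative script (not from the lecture):

```python
# Count the terminal nodes (distinct complete games) of Tic-Tac-Toe.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def count_leaves(board, player):
    # A node is terminal if someone has won or the board is full
    if winner(board) or ' ' not in board:
        return 1
    other = 'O' if player == 'X' else 'X'
    return sum(count_leaves(board[:i] + player + board[i + 1:], other)
               for i in range(9) if board[i] == ' ')

print(count_leaves(' ' * 9, 'X'))  # 255168
```

This counts 255,168 terminal nodes, well below the 9! = 362,880 upper bound on move sequences, since many games end before the board fills.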
Minimax Algorithm • How to find the optimal action for a given possible game state? • Minimax value of a state: the value of the best outcome possible for the player who needs to move at the state, assuming both players take the best actions in the corresponding states after the current move • Minimax Algorithm • Label the minimax value of each state given the minimax values of its successors • For a MAX node (MAX player to move), take the max of its successors' values • For a MIN node (MIN player to move), take the min of its successors' values • Essentially a backward induction algorithm (von Neumann & Morgenstern 1944) • Often implemented in a depth-first manner (recursive algorithm) Fei Fang
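A minimal recursive sketch of this labeling procedure, written against the illustrative Game interface from the earlier sketch (depth-first, as noted on the slide):

```python
# Minimax value: label a state from the labels of its successors.
def minimax_value(game, s):
    if game.terminal(s):
        return game.utility(s)
    values = [minimax_value(game, game.result(s, a)) for a in game.actions(s)]
    # MAX takes the max of its successors' values, MIN takes the min
    return max(values) if game.player(s) == "MAX" else min(values)

# Best action at state s for the player to move
def minimax_decision(game, s):
    best = max if game.player(s) == "MAX" else min
    return best(game.actions(s),
                key=lambda a: minimax_value(game, game.result(s, a)))
```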
Minimax Algorithm Fei Fang
Minimax Algorithm What action should the player to move take at the root? Fei Fang
Minimax Algorithm • Formally: MINIMAX(s) = Utility(s) if Terminal(s); max over a in Actions(s) of MINIMAX(Result(s, a)) if Player(s) = MAX; min over a in Actions(s) of MINIMAX(Result(s, a)) if Player(s) = MIN • Game value = the minimax value of the root node • Tic-Tac-Toe: Game value = 0 • The best play from both parties leads to a draw Fei Fang
Minimax Algorithm Fei Fang
Minimax Algorithm • How to solve such games and determine the optimal strategy profile or contingent plan, i.e., the best action to take for each possible game state? • Calling MINIMAX-DECISION for every state automatically provides the best action for each state • Intractable for large games • Chess (≈35^100 ≈ 10^154 nodes), Go (≈250^150 ≈ 10^360 nodes, by the standard branching-factor estimates) • Cannot even represent the optimal strategy profile in reasonable space • For many problems, you do not need a full contingent plan at the very beginning of the game • Can solve the problem in a more “online” fashion, just like how human players play Chess/Go: my opponent took this action, and what should I do now? Fei Fang
Minimax Algorithm • If we only care about the game value, or the optimal action at a specific game state, can we do better? • Trick 1: Depth-limited search (limit the depth) • Minimax algorithm (Shannon, 1950): • Set d (an integer) as an upper bound on the search depth • Eval(s), the static evaluation function, returns an estimate of the minimax value of state s • Whenever we reach a nonterminal node at depth d, return Eval(s) • If d = ∞ (by default), then Eval(s) will never be called, and the algorithm visits all the nodes in the tree and provides the optimal action for all states Fei Fang
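A sketch of the depth-limited variant against the same assumed interface; evaluate stands in for the static evaluation function Eval(s):

```python
import math

# Depth-limited minimax (Shannon, 1950): cut off at depth limit d
# and fall back on a static evaluation of the position.
def dl_minimax(game, s, d, evaluate):
    if game.terminal(s):
        return game.utility(s)
    if d == 0:
        return evaluate(s)   # nonterminal node at the depth limit
    values = [dl_minimax(game, game.result(s, a), d - 1, evaluate)
              for a in game.actions(s)]
    return max(values) if game.player(s) == "MAX" else min(values)

# With d = infinity the cutoff is never reached and Eval is never called:
# value = dl_minimax(game, s0, math.inf, evaluate)
```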
Minimax Algorithm • Static evaluation function • Often a heuristic function that is easily computable given the representation of a state, e.g., a weighted sum of features • For chess: Count the pieces for each side, giving each a weight (queen=9, rook=5, knight/bishop=3, pawn=1) • What properties do we care about in the evaluation function? For the optimal action, only the ordering matters • In chess, can only search full-width tree to about 4 levels Fei Fang
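For illustration, a material-count evaluator using the piece weights above. The board encoding (a dict from square to piece letter, uppercase for MAX's pieces) is an assumption made for this sketch:

```python
# Weighted material count: queen=9, rook=5, knight/bishop=3, pawn=1.
PIECE_VALUE = {'q': 9, 'r': 5, 'n': 3, 'b': 3, 'p': 1, 'k': 0}

def material_eval(board):
    score = 0
    for piece in board.values():
        value = PIECE_VALUE[piece.lower()]
        score += value if piece.isupper() else -value  # sign by side
    return score
```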
Quiz 2 • When applying the Minimax algorithm with a search depth limit of 2, we used two different static evaluation functions, as shown on the leaf nodes of the left and right figures. What are the optimal actions for nodes A, B, C, respectively? • A: Left: a,c,e; Right: b,c,e • B: Left: a,d,f; Right: a,d,f • C: Left: a,c,e; Right: a,d,f • D: Left: b,c,e; Right: b,c,e [Figures: two depth-2 game trees with root A (moves a, b), whose children B (moves c, d) and C (moves e, f) lead to leaves with different evaluation values] Fei Fang
Pruning • If we only care about the game value, or the optimal action at a specific game state, can we do better? • Trick 2: Pruning subtrees (Limit the width) • Intuition: Can compute the correct decision without looking at every node (consider the bounds of the minimax value) Fei Fang
Quiz 3 • A: 3 • B: 2 • C: Need more information Fei Fang
Pruning Intuition Fei Fang
Pruning • Alpha-Beta (α-β pruning): compute the minimax value of a game tree (or a specific state) with minimal exploration • During the search, at state s, record the min (for a MIN node) or max (for a MAX node) value v of its successors that have been explored • v is a lower bound of the minimax value for a MAX node (initialized as −∞) and an upper bound for a MIN node (initialized as +∞) • During the search, at state s, also record a lower bound (α, initialized as −∞) and an upper bound (β, initialized as +∞) of the minimax value based on what has been searched so far (not only its explored successors, but also the other explored branches of the tree) • As more successors of s are explored, update the values of v, α, β • Prune the remaining subtrees at a node if v falls outside of the range [α, β] • At a MAX node, prune if v ≥ β • At a MIN node, prune if v ≤ α • α and β are bounds determined globally, v is a bound determined locally; if there is a conflict, it only means the local branch is useless and can be pruned Fei Fang
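A sketch of alpha-beta against the same assumed interface. alpha and beta are the globally determined bounds passed down the tree; v is the locally determined bound at each node:

```python
# Alpha-beta pruning: returns the minimax value of s, skipping
# subtrees that cannot affect the decision at the root.
def alphabeta(game, s, alpha=float('-inf'), beta=float('inf')):
    if game.terminal(s):
        return game.utility(s)
    if game.player(s) == "MAX":
        v = float('-inf')
        for a in game.actions(s):
            v = max(v, alphabeta(game, game.result(s, a), alpha, beta))
            if v >= beta:            # MIN above would never let play reach here
                return v             # prune the remaining successors
            alpha = max(alpha, v)
        return v
    else:
        v = float('inf')
        for a in game.actions(s):
            v = min(v, alphabeta(game, game.result(s, a), alpha, beta))
            if v <= alpha:           # MAX above would never let play reach here
                return v
            beta = min(beta, v)
        return v
```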
Pruning • α: lower bound of the minimax value • β: upper bound of the minimax value Fei Fang
Pruning • α: lower bound of the minimax value • β: upper bound of the minimax value Fei Fang
Pruning • Effectiveness depends on the ordering of successors • With perfect ordering, alpha-beta can search twice as deep as plain minimax in a given amount of time • While perfect ordering cannot be achieved in practice, simple ordering heuristics are very effective Fei Fang
Iterative Deepening Search • If we only care about the game value, or the optimal action at a specific game state, can we do better? • Trick 3: Iterative Deepening Search • Minimax Algorithm with a varying depth limit • Can be integrated with pruning: use the result of a small depth limit d to decide the move ordering when searching with a larger d Fei Fang
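A rough sketch of the combination: run depth-limited search with increasing limits, reusing the values found at depth d to order moves at depth d+1. Here alphabeta_d is an assumed depth-limited alpha-beta helper in the style of the earlier sketches:

```python
# Iterative deepening with move ordering: shallow results order
# the moves for the next, deeper pass.
def iterative_deepening_decision(game, s, evaluate, max_depth):
    actions = list(game.actions(s))
    for d in range(1, max_depth + 1):
        scored = [(alphabeta_d(game, game.result(s, a), d - 1, evaluate), a)
                  for a in actions]
        # Most promising moves first (descending for MAX, ascending for MIN)
        scored.sort(key=lambda pair: pair[0],
                    reverse=(game.player(s) == "MAX"))
        actions = [a for _, a in scored]
    return actions[0]  # best action from the deepest completed pass
```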
With Chance Nodes Fei Fang
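For games with chance nodes (e.g., dice rolls), the minimax recursion extends by averaging over outcomes at chance nodes. The standard textbook expectiminimax recursion, added here for reference alongside the slide's figure:

```latex
\mathrm{EXPECTIMINIMAX}(s) =
\begin{cases}
\mathrm{Utility}(s) & \text{if } \mathrm{Terminal}(s)\\
\max_{a \in \mathrm{Actions}(s)} \mathrm{EXPECTIMINIMAX}(\mathrm{Result}(s,a)) & \text{if } \mathrm{Player}(s) = \mathrm{MAX}\\
\min_{a \in \mathrm{Actions}(s)} \mathrm{EXPECTIMINIMAX}(\mathrm{Result}(s,a)) & \text{if } \mathrm{Player}(s) = \mathrm{MIN}\\
\sum_{r} P(r)\, \mathrm{EXPECTIMINIMAX}(\mathrm{Result}(s,r)) & \text{if } s \text{ is a chance node}
\end{cases}
```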
Before Deep Blue Fei Fang
Before Deep Blue • Claude Shannon, Alan Turing: Minimax search with a scoring function (1950) • Only a few branches are shown here Fei Fang
Before Deep Blue Chess Opening Book Example opening where the book goes 16 moves (32 plies) deep Fei Fang
Before Deep Blue Fei Fang
How Deep Blue Works • ~200 million moves / second = 3.6 × 10^10 moves in 3 minutes • 3 min corresponds to • ~7 plies of uniform-depth minimax search • 10-14 plies of uniform-depth alpha-beta search • 1 sec corresponds to 380 years of human thinking time • Specialized hardware searches the last 5 plies Fei Fang
How Deep Blue Works • Hardware requirement • 32-node RS6000 SP multicomputer • Each node had • 1 IBM Power2 Super Chip (P2SC) • 16 chess chips • Move generation (often takes 40-50% of time) • Evaluation • Some endgame heuristics & small endgame databases • 32GB opening & endgame database Fei Fang
Connections with RL • From the MAX player's perspective, a “perfectly rational” MIN player can be treated as part of the environment: given the MAX player's action, the MIN player chooses its optimal action, and then it is the MAX player's turn again (s′ = Result(Result(s, a), a′) for the MIN player's optimal response a′) • If the MAX player does not know ahead of time which action the MIN player will take, the problem can be viewed as an RL problem • If s is a node for the MAX player, its minimax value can be viewed as an estimate of the state value function in the MDP/RL • (Optional) In fact, if the MIN player follows some behavioral pattern that may depend on the MAX player's actions so far, but not on the MAX player's overall contingent plan, then the MIN player can still be treated as part of the environment Fei Fang
Connections with RL Fei Fang
Connections with RL • Learning / Estimating the state value function will help the MAX player find the optimal policy against a perfectly rational MIN player Fei Fang
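As a sketch of this view, a fixed MIN policy can be folded into an environment step so the MAX player faces a single-agent problem; min_policy (a mapping from state to MIN's action) is an assumption of this sketch:

```python
# One environment step from the MAX player's point of view:
# MAX moves, then the "environment" (the MIN player) responds.
def env_step(game, s, a_max, min_policy):
    s1 = game.result(s, a_max)
    if game.terminal(s1):
        return s1, game.utility(s1), True          # game over after MAX's move
    s2 = game.result(s1, min_policy(s1))           # MIN's response
    reward = game.utility(s2) if game.terminal(s2) else 0
    return s2, reward, game.terminal(s2)           # (next state, reward, done)
```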
How AlphaGo Works • Supervised learning + Reinforcement learning • Monte-Carlo Tree Search Fei Fang
How AlphaGo Works • Supervised learning + Reinforcement learning • Monte-Carlo Tree Search Fei Fang
How AlphaZero Works • Reinforcement learning • Monte-Carlo Tree Search Fei Fang
How AlphaZero Works • Reinforcement learning • Monte-Carlo Tree Search Fei Fang
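For reference, MCTS in the AlphaGo family selects actions inside the search tree by trading off the value estimate Q(s,a) against a policy prior P(s,a); the papers use a PUCT-style rule roughly of this form (constants and details vary across AlphaGo, AlphaGo Zero, and AlphaZero):

```latex
a^* = \operatorname*{arg\,max}_{a} \left( Q(s,a) + c_{\mathrm{puct}} \, P(s,a) \,
\frac{\sqrt{\sum_{b} N(s,b)}}{1 + N(s,a)} \right)
```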
Acknowledgment • Some slides are borrowed from previous slides made by Tuomas Sandholm, Ariel Procaccia, Zico Kolter, and Zack Rubinstein Fei Fang