Artificial Intelligence: Representation and Problem Solving
Multi-agent Systems (1): Adversarial Search
15-381 / 681 Instructors: Fei Fang (This Lecture) and Dave Touretzky feifang@cmu.edu Wean Hall 4126
Recap • Search/Satisfiability/Optimization/Deterministic and Symbolic Reasoning • Reasoning and optimization in problems without uncertainty • Probabilistic Reasoning/Sequential Decision Making • Reasoning and optimization in problems with uncertainty • At most one agent that is not treated as part of the “environment” Fei Fang
Outline • Multi-Agent Systems • A special case: Two-Player Sequential Zero-Sum Complete-Information Games • Minimax Algorithm • Alpha-Beta Pruning • With chance nodes • Overview of Deep Blue and AlphaGo (and AlphaZero) Fei Fang
Multi-Agent Systems Robocup 2006 Mafia Game Texas Hold’em Fei Fang
Multi-Agent Systems Mobility Negotiation Societal Decision Making Security Environment Sustainability Fei Fang
A Special Case • Two-Player Sequential Zero-Sum Complete-Information Games • Two-Player • Two players involved • Sequential • Players take turns to move (take actions) • Act alternately • Zero-sum • Utility values of the two players at the end of the game have equal absolute value and opposite sign • Complete (Perfect) Information • Each player can fully observe what actions the other player has taken Fei Fang
Tic-Tac-Toe • Two players, X and O, take turns marking the spaces in a 3×3 grid • The player who succeeds in placing three of their marks in a horizontal, vertical, or diagonal row wins the game https://en.wikipedia.org/wiki/Tic-tac-toe Fei Fang
Chess https://en.wikipedia.org/wiki/Chess Fei Fang
Chess Garry Kasparov vs Deep Blue (1996) Result (from Deep Blue's perspective): win-loss-draw-draw-loss-loss; Kasparov won the match 4–2 (In odd-numbered games, Deep Blue played white) Fei Fang
Chess Fei Fang
Go • DeepMind promotion video before the game with Lee Sedol https://www.youtube.com/watch?v=SUbqykXVx0A Fei Fang
Go AlphaGo vs Lee Sedol (3/2016) Result: win-win-win-loss-win AlphaGo Zero vs AlphaGo (2017) Result: 100-0 https://deepmind.com/blog/alphago-zero-learning-scratch/ AlphaGo: https://www.nature.com/articles/nature16961.pdf AlphaGo Zero: www.nature.com/articles/nature24270.pdf Fei Fang
Solution Concept • What strategies are appropriate to use in these two-player sequential zero-sum complete-information games? In each of the example positions shown, what action should the player to move take? Fei Fang
Solution Concept • Iterative definition: each player should choose the action leading to the highest utility for itself, assuming both players choose their best actions afterwards • What if someone accidentally chooses a sub-optimal action? • At the next step, the player to move should again choose the action leading to the highest utility for itself, assuming both players choose their best actions afterwards Fei Fang
Formulation and Representation • Define a two-player sequential zero-sum complete-information game as a search-like problem (can be visualized as a simple game tree) • s0: initial state • Player(s): which player has the move in state s • Actions(s): the set of legal moves in state s • Result(s, a): successor state of state s after move a • Terminal(s): true if the game is over at state s and false otherwise • Utility(s, p): utility for player p if the game ends in state s Fei Fang
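As a concrete sketch, this formulation maps naturally onto a small Python interface. The method names below simply mirror the components listed above; they are illustrative scaffolding, not code from the lecture.

```python
# Illustrative interface for a two-player sequential zero-sum
# complete-information game, mirroring the components above.
from abc import ABC, abstractmethod

class Game(ABC):
    @abstractmethod
    def initial_state(self): ...   # s0: the initial state

    @abstractmethod
    def player(self, s): ...       # Player(s): "MAX" or "MIN", whoever moves at s

    @abstractmethod
    def actions(self, s): ...      # Actions(s): the set of legal moves in state s

    @abstractmethod
    def result(self, s, a): ...    # Result(s, a): successor of s after move a

    @abstractmethod
    def terminal(self, s): ...     # Terminal(s): True iff the game is over at s

    @abstractmethod
    def utility(self, s): ...      # Utility(s): MAX's payoff at terminal s
                                   # (zero-sum: MIN's payoff is the negation)
```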
Formulation and Representation • A (partial) game tree for Tic-Tac-Toe Fei Fang
Formulation and Representation • In search, we find a path from initial state to a terminal state • In two-player sequential zero-sum complete-information games, we are looking for a “strategy profile” or “contingent plan”: An action for each state Fei Fang
Quiz 1 • How many terminal nodes are there in the game tree of Tic-Tac-Toe? • A: #terminal nodes= • B: #terminal nodes= • C: #terminal nodes • D: #terminal nodes Fei Fang
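The answer can be sanity-checked by brute force: enumerate every line of play, stopping as soon as a player wins or the board fills. A small illustrative script (not from the lecture):

```python
# Count the terminal nodes (distinct complete games) of Tic-Tac-Toe.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def count_leaves(board, player):
    # A node is terminal if someone has won or the board is full
    if winner(board) or ' ' not in board:
        return 1
    other = 'O' if player == 'X' else 'X'
    return sum(count_leaves(board[:i] + player + board[i + 1:], other)
               for i in range(9) if board[i] == ' ')

print(count_leaves(' ' * 9, 'X'))  # 255168
```

This counts 255,168 terminal nodes, well below the 9! = 362,880 upper bound on move sequences, since many games end before the board fills.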
Minimax Algorithm • How to find the optimal action for a given possible game state? • Minimax value of a state: the value of the best outcome possible for the player who needs to move at the state, assuming both players take the best actions in the corresponding states after the current move • Minimax Algorithm • Label the minimax value of each state given the minimax values of its successors • For a MAX node (MAX player to move), take the max of its successors' values • For a MIN node (MIN player to move), take the min of its successors' values • Essentially a backward induction algorithm (von Neumann & Morgenstern 1944) • Often implemented in a depth-first manner (recursive algorithm) Fei Fang
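A minimal recursive sketch of this labeling procedure, written against the illustrative Game interface from the earlier sketch (depth-first, as noted on the slide):

```python
# Minimax value: label a state from the labels of its successors.
def minimax_value(game, s):
    if game.terminal(s):
        return game.utility(s)
    values = [minimax_value(game, game.result(s, a)) for a in game.actions(s)]
    # MAX takes the max of its successors' values, MIN takes the min
    return max(values) if game.player(s) == "MAX" else min(values)

# Best action at state s for the player to move
def minimax_decision(game, s):
    best = max if game.player(s) == "MAX" else min
    return best(game.actions(s),
                key=lambda a: minimax_value(game, game.result(s, a)))
```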
Minimax Algorithm Fei Fang
Minimax Algorithm What action should the player to move take at the root? Fei Fang
Minimax Algorithm • Formally: MINIMAX(s) = Utility(s) if Terminal(s); max over a in Actions(s) of MINIMAX(Result(s, a)) if Player(s) = MAX; min over a in Actions(s) of MINIMAX(Result(s, a)) if Player(s) = MIN • Game value = the minimax value of the root node • Tic-Tac-Toe: Game value = 0 • The best play from both parties leads to a draw Fei Fang
Minimax Algorithm Fei Fang
Minimax Algorithm • How to solve such games and determine the optimal strategy profile or contingent plan, i.e., the best action to take for each possible game state? • Calling MINIMAX-DECISION for every state automatically provides the best action for each state • Intractable for large games • Chess (≈35^100 ≈ 10^154 nodes), Go (≈250^150 ≈ 10^360 nodes, by the standard branching-factor estimates) • Cannot even represent the optimal strategy profile in reasonable space • For many problems, you do not need a full contingent plan at the very beginning of the game • Can solve the problem in a more “online” fashion, just like how human players play Chess/Go: my opponent took this action, and what should I do now? Fei Fang
Minimax Algorithm • If we only care about the game value, or the optimal action at a specific game state, can we do better? • Trick 1: Depth-limited search (limit the depth) • Minimax algorithm (Shannon, 1950): • Set d (an integer) as an upper bound on the search depth • Eval(s), the static evaluation function, returns an estimate of the minimax value of state s • Whenever we reach a nonterminal node at depth d, return Eval(s) • If d = ∞ (by default), then Eval(s) will never be called, and the algorithm visits all the nodes in the tree and provides the optimal action for all states Fei Fang
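A sketch of the depth-limited variant against the same assumed interface; evaluate stands in for the static evaluation function Eval(s):

```python
import math

# Depth-limited minimax (Shannon, 1950): cut off at depth limit d
# and fall back on a static evaluation of the position.
def dl_minimax(game, s, d, evaluate):
    if game.terminal(s):
        return game.utility(s)
    if d == 0:
        return evaluate(s)   # nonterminal node at the depth limit
    values = [dl_minimax(game, game.result(s, a), d - 1, evaluate)
              for a in game.actions(s)]
    return max(values) if game.player(s) == "MAX" else min(values)

# With d = infinity the cutoff is never reached and Eval is never called:
# value = dl_minimax(game, s0, math.inf, evaluate)
```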
Minimax Algorithm • Static evaluation function • Often a heuristic function that is easily computable given the representation of a state, e.g., a weighted sum of features • For chess: Count the pieces for each side, giving each a weight (queen=9, rook=5, knight/bishop=3, pawn=1) • What properties do we care about in the evaluation function? For the optimal action, only the ordering matters • In chess, can only search full-width tree to about 4 levels Fei Fang
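For illustration, a material-count evaluator using the piece weights above. The board encoding (a dict from square to piece letter, uppercase for MAX's pieces) is an assumption made for this sketch:

```python
# Weighted material count: queen=9, rook=5, knight/bishop=3, pawn=1.
PIECE_VALUE = {'q': 9, 'r': 5, 'n': 3, 'b': 3, 'p': 1, 'k': 0}

def material_eval(board):
    score = 0
    for piece in board.values():
        value = PIECE_VALUE[piece.lower()]
        score += value if piece.isupper() else -value  # sign by side
    return score
```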
Quiz 2 • When applying the Minimax algorithm with a search depth limit of 2, we used two different static evaluation functions, as shown on the leaf nodes of the left and right figures. What are the optimal actions for nodes A, B, C, respectively? • A: Left: a,c,e; Right: b,c,e • B: Left: a,d,f; Right: a,d,f • C: Left: a,c,e; Right: a,d,f • D: Left: b,c,e; Right: b,c,e [Figures: two depth-2 game trees with root A (moves a, b), whose children B (moves c, d) and C (moves e, f) lead to leaves with different evaluation values] Fei Fang
Pruning • If we only care about the game value, or the optimal action at a specific game state, can we do better? • Trick 2: Pruning subtrees (Limit the width) • Intuition: Can compute the correct decision without looking at every node (consider the bounds of the minimax value) Fei Fang
Quiz 3 • A: 3 • B: 2 • C: Need more information Fei Fang
Pruning Intuition Fei Fang
Pruning • Alpha-Beta (α-β pruning): compute the minimax value of a game tree (or a specific state) with minimal exploration • During the search, at state s, record the min (for a MIN node) or max (for a MAX node) value v of its successors that have been explored • v is a lower bound of the minimax value for a MAX node (initialized as −∞) and an upper bound for a MIN node (initialized as +∞) • During the search, at state s, also record a lower bound (α, initialized as −∞) and an upper bound (β, initialized as +∞) of the minimax value based on what has been searched so far (not only its explored successors, but also the other explored branches of the tree) • As more successors of s are explored, update the values of v, α, β • Prune the remaining subtrees at a node if v falls outside of the range [α, β] • At a MAX node, prune if v ≥ β • At a MIN node, prune if v ≤ α • α and β are bounds determined globally, v is a bound determined locally; if there is a conflict, it only means the local branch is useless and can be pruned Fei Fang
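A sketch of alpha-beta against the same assumed interface. alpha and beta are the globally determined bounds passed down the tree; v is the locally determined bound at each node:

```python
# Alpha-beta pruning: returns the minimax value of s, skipping
# subtrees that cannot affect the decision at the root.
def alphabeta(game, s, alpha=float('-inf'), beta=float('inf')):
    if game.terminal(s):
        return game.utility(s)
    if game.player(s) == "MAX":
        v = float('-inf')
        for a in game.actions(s):
            v = max(v, alphabeta(game, game.result(s, a), alpha, beta))
            if v >= beta:            # MIN above would never let play reach here
                return v             # prune the remaining successors
            alpha = max(alpha, v)
        return v
    else:
        v = float('inf')
        for a in game.actions(s):
            v = min(v, alphabeta(game, game.result(s, a), alpha, beta))
            if v <= alpha:           # MAX above would never let play reach here
                return v
            beta = min(beta, v)
        return v
```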
Pruning • α: lower bound of the minimax value • β: upper bound of the minimax value Fei Fang
Pruning • α: lower bound of the minimax value • β: upper bound of the minimax value Fei Fang
Pruning • Effectiveness depends on the ordering of successors • With perfect ordering, alpha-beta can search twice as deep as plain minimax in a given amount of time • While perfect ordering cannot be achieved in practice, simple ordering heuristics are very effective Fei Fang
Iterative Deepening Search • If we only care about the game value, or the optimal action at a specific game state, can we do better? • Trick 3: Iterative Deepening Search • Minimax Algorithm with a varying depth limit • Can be integrated with pruning: use the result of a small depth limit d to decide the move ordering when searching with a larger d Fei Fang
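A rough sketch of the combination: run depth-limited search with increasing limits, reusing the values found at depth d to order moves at depth d+1. Here alphabeta_d is an assumed depth-limited alpha-beta helper in the style of the earlier sketches:

```python
# Iterative deepening with move ordering: shallow results order
# the moves for the next, deeper pass.
def iterative_deepening_decision(game, s, evaluate, max_depth):
    actions = list(game.actions(s))
    for d in range(1, max_depth + 1):
        scored = [(alphabeta_d(game, game.result(s, a), d - 1, evaluate), a)
                  for a in actions]
        # Most promising moves first (descending for MAX, ascending for MIN)
        scored.sort(key=lambda pair: pair[0],
                    reverse=(game.player(s) == "MAX"))
        actions = [a for _, a in scored]
    return actions[0]  # best action from the deepest completed pass
```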
With Chance Nodes Fei Fang
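For games with chance nodes (e.g., dice rolls), the minimax recursion extends by averaging over outcomes at chance nodes. The standard textbook expectiminimax recursion, added here for reference alongside the slide's figure:

```latex
\mathrm{EXPECTIMINIMAX}(s) =
\begin{cases}
\mathrm{Utility}(s) & \text{if } \mathrm{Terminal}(s)\\
\max_{a \in \mathrm{Actions}(s)} \mathrm{EXPECTIMINIMAX}(\mathrm{Result}(s,a)) & \text{if } \mathrm{Player}(s) = \mathrm{MAX}\\
\min_{a \in \mathrm{Actions}(s)} \mathrm{EXPECTIMINIMAX}(\mathrm{Result}(s,a)) & \text{if } \mathrm{Player}(s) = \mathrm{MIN}\\
\sum_{r} P(r)\, \mathrm{EXPECTIMINIMAX}(\mathrm{Result}(s,r)) & \text{if } s \text{ is a chance node}
\end{cases}
```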
Before Deep Blue Fei Fang
Before Deep Blue • Claude Shannon, Alan Turing: Minimax search with a scoring function (1950) • Only a few branches are shown here Fei Fang
Before Deep Blue Chess Opening Book Example opening where the book goes 16 moves (32 plies) deep Fei Fang
Before Deep Blue Fei Fang
How Deep Blue Works • ~200 million moves / second = 3.6 × 10^10 moves in 3 minutes • 3 min corresponds to • ~7 plies of uniform-depth minimax search • 10-14 plies of uniform-depth alpha-beta search • 1 sec corresponds to 380 years of human thinking time • Specialized hardware searches the last 5 plies Fei Fang
How Deep Blue Works • Hardware requirement • 32-node RS6000 SP multicomputer • Each node had • 1 IBM Power2 Super Chip (P2SC) • 16 chess chips • Move generation (often takes 40-50% of time) • Evaluation • Some endgame heuristics & small endgame databases • 32GB opening & endgame database Fei Fang
Connections with RL • From the MAX player's perspective, a “perfectly rational” MIN player can be treated as part of the environment: given the MAX player's action, the MIN player chooses its optimal action, and then it is the MAX player's turn again (s′ = Result(Result(s, a), a′) for the MIN player's optimal response a′) • If the MAX player does not know ahead of time which action the MIN player will take, the problem can be viewed as an RL problem • If s is a node for the MAX player, its minimax value can be viewed as an estimate of the state value function in the MDP/RL • (Optional) In fact, if the MIN player follows some behavioral pattern that may depend on the MAX player's actions so far, but not on the MAX player's overall contingent plan, then the MIN player can still be treated as part of the environment Fei Fang
Connections with RL Fei Fang
Connections with RL • Learning / Estimating the state value function will help the MAX player find the optimal policy against a perfectly rational MIN player Fei Fang
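As a sketch of this view, a fixed MIN policy can be folded into an environment step so the MAX player faces a single-agent problem; min_policy (a mapping from state to MIN's action) is an assumption of this sketch:

```python
# One environment step from the MAX player's point of view:
# MAX moves, then the "environment" (the MIN player) responds.
def env_step(game, s, a_max, min_policy):
    s1 = game.result(s, a_max)
    if game.terminal(s1):
        return s1, game.utility(s1), True          # game over after MAX's move
    s2 = game.result(s1, min_policy(s1))           # MIN's response
    reward = game.utility(s2) if game.terminal(s2) else 0
    return s2, reward, game.terminal(s2)           # (next state, reward, done)
```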
How AlphaGo Works • Supervised learning + Reinforcement learning • Monte-Carlo Tree Search Fei Fang
How AlphaGo Works • Supervised learning + Reinforcement learning • Monte-Carlo Tree Search Fei Fang
How AlphaZero Works • Reinforcement learning • Monte-Carlo Tree Search Fei Fang
How AlphaZero Works • Reinforcement learning • Monte-Carlo Tree Search Fei Fang
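For reference, MCTS in the AlphaGo family selects actions inside the search tree by trading off the value estimate Q(s,a) against a policy prior P(s,a); the papers use a PUCT-style rule roughly of this form (constants and details vary across AlphaGo, AlphaGo Zero, and AlphaZero):

```latex
a^* = \operatorname*{arg\,max}_{a} \left( Q(s,a) + c_{\mathrm{puct}} \, P(s,a) \,
\frac{\sqrt{\sum_{b} N(s,b)}}{1 + N(s,a)} \right)
```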
Acknowledgment • Some slides are borrowed from previous slides made by Tuomas Sandholm, Ariel Procaccia, Zico Kolter, and Zack Rubinstein Fei Fang