Chapter 6: Game Search

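Characteristics of game search
- Exhaustive search is almost always impossible: the branching factor and the search depth are both too large (e.g., Go: on the order of (19 × 19)! move sequences; chess: Deep Blue?).
- Static evaluation score: a number that estimates the quality of a board position.
- Maximizing player: the side that wants the score to be high (me). Minimizing player: the side that wants the score to be low, i.e., wants me to lose (the opponent).
- Game tree: a semantic tree whose nodes are board configurations and whose branches are moves.
[Figure: the original board state branching into new board states, one per move.]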

Minimax Game Search
Idea: take the maximum score at a maximizing level (my turn) and the minimum score at a minimizing level (the opponent's turn).
[Figure: a tree with leaf scores 2, 7, 1, 8. The minimizing level backs up 2 and 1 (what will the opponent do?); the maximizing level above it then picks 2 (what will I do?): "this move guarantees the best outcome".]

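Minimax search example
• Search by the maximizer A, who tries to maximize the value of the evaluation function
• A's choice must take the intention of the minimizer B into account
One move ahead (A's options): [c1] f=0.8, [c2] f=0.3, [c3] f=-0.2
Two moves ahead (minimizer B's level):
  under c1: [c11] f=0.9, [c12] f=0.1, [c13] f=-0.6
  under c2: [c21] f=0.1, [c22] f=-0.7
  under c3: [c31] f=-0.1, [c32] f=-0.3
Judged one move ahead, c1 looks best (0.8). Once B's best replies are taken into account, the backed-up values are -0.6 for c1, -0.7 for c2, and -0.3 for c3, so A should choose c3.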
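The comparison between a one-move and a two-move look-ahead can be checked with a few lines of Python. This is only a sketch: the scores are copied from the example, while the dictionary layout and the variable names are assumptions made for illustration.

```python
# Evaluation scores copied from the example; the dictionary layout is an assumption.
one_ply = {"c1": 0.8, "c2": 0.3, "c3": -0.2}
two_ply = {
    "c1": {"c11": 0.9, "c12": 0.1, "c13": -0.6},
    "c2": {"c21": 0.1, "c22": -0.7},
    "c3": {"c31": -0.1, "c32": -0.3},
}

# Looking one move ahead, A simply takes the highest static score.
greedy_choice = max(one_ply, key=one_ply.get)

# Looking two moves ahead, B is assumed to pick the reply that is worst for A,
# so each of A's moves is worth the minimum over B's replies.
backed_up = {move: min(replies.values()) for move, replies in two_ply.items()}
minimax_choice = max(backed_up, key=backed_up.get)

print(greedy_choice)   # c1
print(backed_up)       # {'c1': -0.6, 'c2': -0.7, 'c3': -0.3}
print(minimax_choice)  # c3
```

The deeper search reverses the ranking: the move that looked best statically is the one B can punish most.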

Minimax Algorithm

Function MINIMAX( state ) returns an action
    inputs: state, current state in game
    v = MAX-VALUE( state )
    return the action corresponding with value v

Function MAX-VALUE( state ) returns a utility value
    if TERMINAL-TEST( state ) then return UTILITY( state )
    v = -∞
    for a, s in SUCCESSORS( state ) do
        v = MAX( v, MIN-VALUE( s ) )
    return v

Function MIN-VALUE( state ) returns a utility value
    if TERMINAL-TEST( state ) then return UTILITY( state )
    v = +∞
    for a, s in SUCCESSORS( state ) do
        v = MIN( v, MAX-VALUE( s ) )
    return v
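The pseudocode above can be sketched in Python as follows. The game representation is an assumption made for illustration: a state is either a number (a terminal state holding its utility) or a list of successor states, and an "action" is simply the index of the chosen successor. The sample tree uses the leaf values from this chapter's minimax example.

```python
import math

def is_terminal(state):
    # Assumption: a leaf is a plain number, an internal node is a list of children.
    return not isinstance(state, list)

def max_value(state):
    if is_terminal(state):
        return state
    v = -math.inf
    for child in state:
        v = max(v, min_value(child))
    return v

def min_value(state):
    if is_terminal(state):
        return state
    v = math.inf
    for child in state:
        v = min(v, max_value(child))
    return v

def minimax(state):
    """Return (index of MAX's best move, backed-up value) at a non-terminal root."""
    values = [min_value(child) for child in state]
    best = max(range(len(values)), key=values.__getitem__)
    return best, values[best]

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))  # (0, 3)
```

A real game would replace the nested lists with a move generator and a terminal test, but the mutual recursion between the two value functions stays the same.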

Minimax Example
The root is a MAX node; its three children are MIN nodes.
Step 1: the leaves under the three MIN nodes are (3, 12, 8), (2, 4, 6), and (14, 5, 2).
Step 2: each MIN node backs up its smallest leaf: 3, 2, and 2.
Step 3: the MAX root takes the largest of these values: 3.

Tic-Tac-Toe
• Tic-tac-toe, also called noughts and crosses (in the British Commonwealth countries) and X's and O's (in the Republic of Ireland), is a pencil-and-paper game for two players, X and O, who take turns marking the spaces in a 3×3 grid. The X player usually goes first. The player who succeeds in placing three of their own marks in a horizontal, vertical, or diagonal row wins the game.
• The following example game is won by the first player, X:
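Tic-tac-toe is small enough that minimax can search it to the end. The sketch below is one possible encoding, not a prescribed one: the board is assumed to be a 9-character string read row by row, with " " for an empty square, and the helper names `winner` and `value` are hypothetical. Results are cached because many move orders reach the same position.

```python
from functools import lru_cache

# The eight winning lines of the 3x3 grid, as index triples.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    """Minimax value for X (the maximizer): +1 X wins, -1 O wins, 0 draw."""
    w = winner(board)
    if w == "X":
        return 1
    if w == "O":
        return -1
    if " " not in board:
        return 0
    nxt = "O" if player == "X" else "X"
    scores = [value(board[:i] + player + board[i + 1:], nxt)
              for i, cell in enumerate(board) if cell == " "]
    return max(scores) if player == "X" else min(scores)

print(value(" " * 9, "X"))  # 0: with best play from both sides the game is a draw
```

Calling `value("XX OO    ", "X")` returns 1, because X can complete the top row immediately.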

Save time

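α-β Pruning
Idea: reduce the search space! (Minimax blows up exponentially.)
α-β principle: "If you have an idea that is surely bad, do not take time to see how truly bad it is."
[Figure: the first MIN node sees leaves 2 and 7 and backs up 2, so the MAX root is >= 2. The second MIN node sees leaf 1 and is therefore <= 1; it can no longer beat 2, so its remaining leaf is skipped (α-cut) and the root value is 2.]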

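Alpha-beta pruning
• A game search method that uses alpha (α), the lowest value a maximizing node can end up with, and beta (β), the highest value a minimizing node can end up with
• The search basically proceeds depth-first (DFS)
[Figure: root [c0] with α=0.2; [c1] f=0.2 with children [c11] f=0.2 and [c12] f=0.7; [c2] f=-0.1 with children [c21] f=-0.1, [c22], [c23]; α-cut below c2.]
Once c21's value of -0.1 is backed up to c2, the remaining nodes (c22, c23) no longer need to be searched: c2 can be worth at most -0.1, which is already below α = 0.2.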

α-β Procedure
- α never decreases (initially -∞).
- β never increases (initially +∞).
- Search rules:
  1. α-cutoff: stop searching below any minimizing node whose β is <= the α of one of its maximizing ancestors.
  2. β-cutoff: stop searching below any maximizing node whose α is >= the β of one of its minimizing ancestors.

Example
[Figure: a tree with alternating levels max, min, max, min and leaf values 90, 89, 100, 99, 60, 59, 75, 74, with one α-cut and one β-cut marked.]

Alpha-Beta Pruning Algorithm

Function ALPHA-BETA( state ) returns an action
    inputs: state, current state in game
    v = MAX-VALUE( state, -∞, +∞ )
    return the action corresponding with value v

Function MAX-VALUE( state, α, β ) returns a utility value
    inputs: state, current state in game
            α, the value of the best alternative for MAX along the path to state
            β, the value of the best alternative for MIN along the path to state
    if TERMINAL-TEST( state ) then return UTILITY( state )
    v = -∞
    for a, s in SUCCESSORS( state ) do
        v = MAX( v, MIN-VALUE( s, α, β ) )
        if v >= β then return v
        α = MAX( α, v )
    return v

Alpha-Beta Pruning Algorithm (continued)

Function MIN-VALUE( state, α, β ) returns a utility value
    inputs: state, current state in game
            α, the value of the best alternative for MAX along the path to state
            β, the value of the best alternative for MIN along the path to state
    if TERMINAL-TEST( state ) then return UTILITY( state )
    v = +∞
    for a, s in SUCCESSORS( state ) do
        v = MIN( v, MAX-VALUE( s, α, β ) )
        if v <= α then return v
        β = MIN( β, v )
    return v
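A Python sketch of the two value functions may make the cutoff tests easier to follow. As an assumption for illustration, a state is either a number (a terminal state holding its utility) or a list of successor states, and the extra `seen` list, which is not part of the pseudocode, records which leaves were actually evaluated. Note that MIN-VALUE recurses into MAX-VALUE and vice versa.

```python
import math

def max_value(state, alpha, beta, seen):
    if not isinstance(state, list):      # terminal test: a leaf holds its utility
        seen.append(state)
        return state
    v = -math.inf
    for child in state:
        v = max(v, min_value(child, alpha, beta, seen))
        if v >= beta:                    # beta-cutoff: MIN above will never allow this
            return v
        alpha = max(alpha, v)
    return v

def min_value(state, alpha, beta, seen):
    if not isinstance(state, list):
        seen.append(state)
        return state
    v = math.inf
    for child in state:
        v = min(v, max_value(child, alpha, beta, seen))
        if v <= alpha:                   # alpha-cutoff: MAX above already has better
            return v
        beta = min(beta, v)
    return v

# Small tree with leaves 2, 7, 1, 8 under two MIN nodes.
seen = []
root_value = max_value([[2, 7], [1, 8]], -math.inf, math.inf, seen)
print(root_value, seen)  # 2 [2, 7, 1]
```

Leaf 8 is never evaluated: after seeing 1, the second MIN node is worth at most 1, which cannot beat the 2 that MAX already has.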

Alpha-Beta Pruning Example
The root is a MAX node; its three children are MIN nodes. Each bracket gives the range [lowest, highest] of values the node can still take.
Step 1: leaf 3 is examined. Root [-∞, +∞]; first MIN node [-∞, 3].
Step 2: leaf 12 is examined. The ranges do not change.
Step 3: leaf 8 is examined. The ranges do not change.
Step 4: the first MIN node is finished: [3, 3].
Step 5: the root is now at least 3: [3, +∞].
Step 6: leaf 2 is examined under the second MIN node, which becomes [-∞, 2]. It cannot beat 3, so its remaining leaves are pruned.
Step 7: leaf 14 is examined under the third MIN node, which becomes [-∞, 14]; the root becomes [3, 14].
Step 8: leaf 5 is examined. No cutoff, since 5 is still above 3.
Step 9: leaf 2 is examined.
Step 10: the third MIN node is finished: [2, 2].
Step 11: the root is finished: [3, 3].
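The outcome of this example can be reproduced with a short script that counts how many leaves alpha-beta actually evaluates. This is a sketch under stated assumptions: the tree is written as nested lists, the two pruned leaves are taken to be 4 and 6 as in the minimax example earlier in the chapter, and the function name `alphabeta` and the `evaluated` list are illustrative.

```python
import math

def alphabeta(node, alpha, beta, maximizing, evaluated):
    """Depth-first alpha-beta search; appends every evaluated leaf to `evaluated`."""
    if not isinstance(node, list):
        evaluated.append(node)
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        score = alphabeta(child, alpha, beta, not maximizing, evaluated)
        if maximizing:
            best = max(best, score)
            if best >= beta:       # beta-cutoff
                break
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            if best <= alpha:      # alpha-cutoff
                break
            beta = min(beta, best)
    return best

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
evaluated = []
value = alphabeta(tree, -math.inf, math.inf, True, evaluated)
all_leaves = [leaf for branch in tree for leaf in branch]

print(value)                             # 3
print(evaluated)                         # [3, 12, 8, 2, 14, 5, 2]
print(len(all_leaves) - len(evaluated))  # 2 leaves were never looked at
```

The result equals the plain minimax value, while two of the nine leaves are skipped. With this leaf order the third MIN node saves nothing, because its smallest leaf comes last; how much is pruned depends on the order in which moves are examined.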
