BDDs in Planning and General Game Playing Peter Kissmann and Stefan Edelkamp Graph Search Engineering Schloss Dagstuhl 2009
Structure
• BDDs
• Symbolic Search
• BDDs in Planning
  • Sequential Optimal Planning
  • Net-Benefit Planning
  • Conclusion
• BDDs in General Game Playing
  • Solving Single-Player Games
  • Solving Two-Player Games
  • Results
  • Conclusion

BDDs and Symbolic Search

Binary Decision Diagrams (BDDs)
• good variable ordering crucial

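To illustrate why the variable ordering matters, here is a minimal sketch that builds a reduced ordered BDD by brute-force Shannon expansion and counts its inner nodes under two orderings. The helper `build_robdd` and the example function are assumptions made for illustration only; this is not the BDD package used by the authors, and the exponential construction is only viable for tiny functions.

```python
def build_robdd(func, order):
    """Build a reduced ordered BDD for func under the given variable order.

    Returns (root, unique) where unique maps (var, low, high) to a node id.
    Node ids 0 and 1 are the terminals.
    """
    unique = {}

    def mk(var, low, high):
        if low == high:              # redundant test: no node needed
            return low
        key = (var, low, high)
        if key not in unique:        # share isomorphic subgraphs
            unique[key] = len(unique) + 2
        return unique[key]

    def build(i, assignment):
        if i == len(order):
            return int(func(assignment))
        var = order[i]
        low = build(i + 1, {**assignment, var: False})
        high = build(i + 1, {**assignment, var: True})
        return mk(var, low, high)

    return build(0, {}), unique


def f(a):
    # (x1 and y1) or (x2 and y2) or (x3 and y3)
    return (a["x1"] and a["y1"]) or (a["x2"] and a["y2"]) or (a["x3"] and a["y3"])


interleaved = ["x1", "y1", "x2", "y2", "x3", "y3"]
separated = ["x1", "x2", "x3", "y1", "y2", "y3"]
_, good = build_robdd(f, interleaved)
_, bad = build_robdd(f, separated)
print(len(good), len(bad))  # inner node counts for the two orderings
```

For this function the interleaved ordering needs 6 inner nodes and the separated ordering 14; the gap grows exponentially with the number of variable pairs.
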
Symbolic Search
• uses (Reduced Ordered) Binary Decision Diagrams ((RO)BDDs)
• set-based search: sets of states and transitions represented as relations
• unique representation
  • no duplicate elimination within a set required
• layered exploration (e.g., BFS): duplicate elimination with respect to previous layers
• advantages due to compressed representation:
  • save RAM
  • might save time

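The layered exploration can be sketched with explicit Python sets standing in for BDDs. This is only an analogue under that assumption: a real symbolic search would keep each layer as a BDD and never enumerate states. The names `image` and `bfs_layers` are hypothetical.

```python
def image(trans, states):
    """All successors of a state set in one step (explicit stand-in for a BDD image)."""
    return {t for (s, t) in trans if s in states}


def bfs_layers(trans, init):
    """Breadth-first layers; each new layer is cleaned against all earlier ones."""
    reached = set(init)
    layers = [set(init)]
    while True:
        # duplicate elimination with respect to previous layers
        frontier = image(trans, layers[-1]) - reached
        if not frontier:
            return layers
        reached |= frontier
        layers.append(frontier)


trans = {(0, 1), (0, 2), (1, 3), (2, 3), (3, 0)}
layers = bfs_layers(trans, {0})
print(layers)
```

Because a set holds each state once, no duplicate elimination inside a layer is needed, which mirrors the unique-representation property on the slide.
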
Symbolic Search
• two sets of variables
  • S for current states
  • S' for successor states
• expansion of state sets (not single states) as relation
• calculation of successors (image)
• calculation of predecessors:
  • predecessors with at least one successor in states (pre-image)
  • predecessors with all successors in states (strong pre-image)

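The three operations can be sketched on an explicit transition relation, again with Python sets as a stand-in for BDDs. The comments give the usual textbook relational formulation, which is an assumption about what the slide showed; the function names are hypothetical. The strong pre-image here deliberately excludes states without any successor.

```python
def image(trans, states):
    # exists S. trans(S, S') and states(S), then rename S' to S
    return {t for (s, t) in trans if s in states}


def pre_image(trans, states):
    # exists S'. trans(S, S') and states(S')
    return {s for (s, t) in trans if t in states}


def strong_pre_image(trans, states):
    # forall S'. trans(S, S') implies states(S'),
    # restricted here to states that have at least one successor
    escapes = {s for (s, t) in trans if t not in states}
    return pre_image(trans, states) - escapes


trans = {(0, 1), (0, 2), (1, 3), (2, 3), (2, 4)}
print(image(trans, {0}))             # successors of state 0
print(pre_image(trans, {3}))         # some successor is 3
print(strong_pre_image(trans, {3}))  # every successor is 3
```
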
BDDs in Planning

Structure
• Sequential Optimal Planning
  • Symbolic Algorithms
  • Competition Results (IPC-6)
• Net-Benefit Planning
  • Symbolic Algorithms
  • Competition Results (IPC-6)
• Conclusion

Sequential Optimal Planning
• Given: problem <S, O, I, T, c> with
  • S: set of states
  • O ⊆ S × S: operators (actions)
  • I ∈ S: initial state
  • T ⊆ S: terminal states
  • c: O → {1, …, C}: action costs
• Aim: find a plan from the initial state to one of the terminal states
  • no action costs: minimal plan (in plan length) → Symbolic (Bidirectional) BFS
  • with action costs: Symbolic A* (BDDA*)

BDDA*
[figure: search space laid out along the axes h and g]

Competition Results (IPC-6)
• Extension (of Gamer(comp) to Gamer):
  • use of a hashmap instead of a matrix for large action costs
  • the matrix became too large while being sparse

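A minimal sketch of the hashmap idea: buckets are keyed by the pair (g, h) and only non-empty buckets occupy memory, so large action costs do not force a huge, mostly empty matrix. The class `SparseBuckets` and its tie-breaking rule (smallest f = g + h, then smallest g) are assumptions for illustration, not Gamer's actual data structure.

```python
class SparseBuckets:
    """Open list for an A*-style search, storing only non-empty (g, h) buckets."""

    def __init__(self):
        self._buckets = {}

    def add(self, g, h, states):
        self._buckets.setdefault((g, h), set()).update(states)

    def pop_min(self):
        # smallest f = g + h first; ties broken by smaller g (an assumption)
        key = min(self._buckets, key=lambda gh: (gh[0] + gh[1], gh[0]))
        return key, self._buckets.pop(key)

    def __len__(self):
        return len(self._buckets)


buckets = SparseBuckets()
buckets.add(0, 1_000_000, {"s0"})
buckets.add(250_000, 800_000, {"s1"})
buckets.add(0, 1_000_000, {"s2"})
print(len(buckets))  # number of non-empty buckets actually stored
```

A dense matrix for the same cost range would need on the order of 250,001 × 1,000,001 cells to hold these two buckets.
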
Net-Benefit
• challenge at IPC-6
• total plan net-benefit = total achieved goal rewards - total action cost
• transformation: goal rewards → costs for violating soft constraints
• net-benefit = total violation cost + total action cost
  • to be minimized

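The transformation works because the sum of all goal rewards is a constant: achieved rewards plus violated rewards always add up to it, so maximizing rewards minus action cost selects the same plan as minimizing violation cost plus action cost. The small example below checks this on made-up numbers; the reward table and the three plans are purely hypothetical.

```python
# Hypothetical soft goals with rewards, and plans as (achieved goals, total action cost).
rewards = {"p": 10, "q": 5, "r": 2}
plans = {
    "plan_a": ({"p"}, 3),
    "plan_b": ({"p", "q"}, 9),
    "plan_c": ({"p", "q", "r"}, 20),
}


def net_benefit(achieved, action_cost):
    """Original objective: achieved rewards minus action cost (to be maximized)."""
    return sum(rewards[g] for g in achieved) - action_cost


def transformed_cost(achieved, action_cost):
    """Transformed objective: violation cost plus action cost (to be minimized)."""
    violated = set(rewards) - achieved
    return sum(rewards[g] for g in violated) + action_cost


best_by_benefit = max(plans, key=lambda p: net_benefit(*plans[p]))
best_by_cost = min(plans, key=lambda p: transformed_cost(*plans[p]))
print(best_by_benefit, best_by_cost)
```
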
Symbolic Branch-and-Bound Search
• Symbolic Breadth-First Branch-and-Bound
  • by Jensen et al. 2006
  • cost-optimal BFS → ignores action costs
  • improves upper bound U
    • initially: sum of costs for violating all soft constraints + 1
    • can be represented by a BDD: disjunction of all values from 0 to U
• Symbolic Cost-First Branch-and-Bound
  • expansion according to action costs, not BFS layers
  • action costs still not part of objective function

Symbolic Net-Benefit
• adds total action costs to objective function
• net-benefit = (total action cost f) + (sum of costs for violated soft constraints)
• total cost not bounded
  • no BDD representation
  • but: can use cost-first search's buckets
• also stores current best net-benefit V
  • initialized to ∞

Symbolic Net-Benefit
• Algorithm:
  1. start with initial state
  2. check if goals within current states
     • take only goals with cost < U
     • find goal with minimal cost U' (and U' + f < V) and calculate plan
     • set U = U', V = U' + f
  3. calculate successors (image)
  4. sort successors into corresponding buckets (f + 1, …, f + C)
  5. repeat from 2. until no new states found (or all soft constraints satisfied or total action cost ≥ V)
  6. return last generated plan

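The loop above can be sketched on explicit sets, with a dictionary of cost buckets in place of BDD buckets. This is a simplified analogue under stated assumptions: states are enumerated individually, plan reconstruction is omitted (only the best goal state and its net-benefit are returned), and the names `net_benefit_search`, `edges`, `goals` and `violation_cost` are hypothetical.

```python
import math


def net_benefit_search(init, edges, goals, violation_cost, total_violation_cost):
    """edges: state -> list of (action cost >= 1, successor)."""
    U = total_violation_cost + 1     # bound on violation cost
    V = math.inf                     # best net-benefit found so far
    best = None
    buckets = {0: {init}}            # total action cost f -> states
    closed = set()
    while buckets:
        f = min(buckets)             # cost-first expansion
        if f >= V:                   # total action cost >= V: cannot improve
            break
        layer = buckets.pop(f) - closed
        closed |= layer
        candidates = [s for s in layer if s in goals and violation_cost[s] < U]
        if candidates:
            s = min(candidates, key=lambda x: violation_cost[x])
            u = violation_cost[s]
            if u + f < V:
                U, V, best = u, u + f, s
        if U == 0:                   # all soft constraints satisfied
            break
        for s in layer:
            for cost, t in edges.get(s, ()):
                if t not in closed:
                    buckets.setdefault(f + cost, set()).add(t)
    return V, best


edges = {"A": [(1, "B"), (3, "C")], "B": [(1, "D")]}
goals = {"B", "C", "D"}
violation_cost = {"B": 5, "C": 0, "D": 2}
result = net_benefit_search("A", edges, goals, violation_cost, total_violation_cost=5)
print(result)
```

In this toy instance the search first finds B (net-benefit 6), improves to D (net-benefit 4), and stops at C, which violates nothing and has net-benefit 3.
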
Competition Results (IPC-6)
• hsp*p: enumerates all possible soft constraint violations and runs an ordinary planner on each sub-instance
• Mips XXL: external-memory algorithms

Conclusion and Additional Remarks
• new set-based algorithm for computing optimal net-benefit
• covers cost-optimal search and over-subscribed planning with preferences
• Gamer can handle 0-cost actions
  • additional BFS for 0-cost fixpoint calculation
• extension to partial initial states

BDDs in General Game Playing

Structure
• Solving Single-Player Games
• Solving Two-Player Games
  • Zero-Sum Games
  • General Two-Player Turn-Taking Games
• Results
• Conclusion

General Game Playing - Games
• Given a description of a game that is
  • finite
  • discrete
  • deterministic
  • full information
• Games can be
  • single-player or multi-player
  • simultaneous or turn-taking

Solving Games
• In General Game Playing, rewards for all players
  • range from 0 to 100 (higher = better)
  • only in goal states
• Solving: find rewards for all states (in case of optimal play)
• Solving to
  • analyze players
  • play optimally
  • use as endgame database (if not complete)

Solving Single-Player Games
• might use Planning technology, but
  • in Planning (as in General Game Playing) interested in searching only necessary states
  • here: solve all states
• approach:
  • calculate reachable states
  • start at goal states giving reward 100
  • apply backward BFS
  • remove all found states from reachable states
  • go to goal states giving reward 99 and repeat steps

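The approach can be sketched with explicit sets in place of BDDs. The sketch assumes that goal states are terminal (no outgoing moves) and that `reachable` has already been computed; the function and parameter names are hypothetical.

```python
def solve_single_player(reachable, trans, goal_reward):
    """Assign each solvable state the best reward reachable from it.

    trans: set of (state, successor) pairs.
    goal_reward: terminal state -> reward in 0..100.
    """
    unsolved = set(reachable)
    value = {}
    for reward in range(100, -1, -1):          # best rewards first
        frontier = {s for s, r in goal_reward.items() if r == reward and s in unsolved}
        while frontier:                        # backward BFS
            for s in frontier:
                value[s] = reward
            unsolved -= frontier               # remove found states
            frontier = {s for (s, t) in trans if t in frontier and s in unsolved}
    return value


trans = {("a", "b"), ("b", "g100"), ("a", "g50"), ("c", "g50")}
reachable = {"a", "b", "c", "g100", "g50"}
value = solve_single_player(reachable, trans, {"g100": 100, "g50": 50})
print(value)
```

State `a` can reach both goals but is claimed by the reward-100 pass first, so the later pass for reward 50 no longer touches it.
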
Solving 2-Player Zero-Sum Games
[figure legend: player 0's turn, player 1's turn, lost for player 0, lost for player 1]
• two backward searches (one for each player j ∈ {0, 1}):
  • start with goal states lost for player j
  • find all lost predecessors using two steps:
    • find preceding states where the opponent could move to a state lost for j (pre-image)
    • find preceding states where every one of j's moves results in a state lost for j (strong pre-image)
  • repeat the double step until no new states are found

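One of the two backward searches can be sketched as follows, with explicit sets in place of BDDs. The representation (`moves` mapping each non-terminal state to its successor set, `turn` giving the player to move) and the function name are assumptions made for this illustration.

```python
def lost_states(j, moves, turn, lost_goals):
    """All states that are lost for player j under optimal play."""
    lost = set(lost_goals)
    while True:
        # pre-image: the opponent is to move and has some move into a lost state
        opponent_wins = {s for s, succ in moves.items()
                         if turn[s] != j and succ & lost}
        # strong pre-image: j is to move and every move leads into a lost state
        j_stuck = {s for s, succ in moves.items()
                   if turn[s] == j and succ and succ <= (lost | opponent_wins)}
        new = (opponent_wins | j_stuck) - lost
        if not new:
            return lost
        lost |= new


moves = {"s0": {"a", "b"}, "a": {"l0", "w0"}, "b": {"l0"}}
turn = {"s0": 0, "a": 1, "b": 1}
print(lost_states(0, moves, turn, {"l0"}))  # terminal l0 is lost for player 0
print(lost_states(1, moves, turn, {"w0"}))  # terminal w0 is lost for player 1
```
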
Solving General 2-Player Turn-Taking Games
• 101 × 101 matrix of BDDs
  • BDD at (i, j) represents states achieving reward i for player 0 and j for player 1 (in case of optimal play)
• only 1 backward search
  • alternating between players within loop

Algorithm Outline
• find all reachable states
• initialize reward matrix with goal states
• solved states: all states within matrix
• while (not all states solved) do
  • for each player j ∈ {0, 1} do
    • find all solvable states of j (strongPreImage(solved))
    • solve these states (pre-image from matrix's buckets)

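A minimal explicit-state sketch of this outline follows. A dictionary from state to reward pair replaces the 101 × 101 matrix of BDDs, and a state counts as solvable once all of its successors are solved, which is what the strong pre-image of the solved set expresses. The player to move picks the successor that maximizes the difference to the opponent's reward. All names are hypothetical, and every key of `moves` is assumed to have at least one successor.

```python
def solve_general(moves, turn, goal_rewards):
    """moves: non-terminal state -> set of successors; turn: state -> 0 or 1;
    goal_rewards: terminal state -> (reward of player 0, reward of player 1)."""
    solved = dict(goal_rewards)
    unsolved = set(moves)
    while unsolved:
        progress = False
        for j in (0, 1):
            # states of player j whose successors are all solved
            solvable = {s for s in unsolved
                        if turn[s] == j and all(t in solved for t in moves[s])}
            for s in solvable:
                # player j maximizes own reward minus opponent's reward
                solved[s] = max((solved[t] for t in moves[s]),
                                key=lambda r: r[j] - r[1 - j])
            unsolved -= solvable
            progress = progress or bool(solvable)
        if not progress:             # remaining states cannot be solved
            break
    return solved


moves = {"s0": {"a", "b"}, "a": {"t1", "t2"}, "b": {"t3"}}
turn = {"s0": 0, "a": 1, "b": 1}
goal_rewards = {"t1": (100, 0), "t2": (0, 100), "t3": (50, 50)}
solved = solve_general(moves, turn, goal_rewards)
print(solved["s0"])
```
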
Order to classify states
[figure: two reward matrices with axes "own" and "opponent", each ranging from 0 to 100]
• problem in general case: order to classify states
  • maximize own reward (and minimize opponent's)?
  • or maximize difference to opponent's reward?
• might change during one competition
• we chose second case for all examples

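The two orderings can disagree. The sketch below encodes each as a sort key and applies both to two made-up outcomes, given as (own reward, opponent's reward); the function names and the numbers are hypothetical.

```python
def prefer_own_reward(own, opponent):
    """First case: maximize own reward, then minimize the opponent's."""
    return (own, -opponent)


def prefer_difference(own, opponent):
    """Second case: maximize the difference to the opponent's reward."""
    return own - opponent


outcomes = [(60, 90), (50, 10)]
choice_1 = max(outcomes, key=lambda o: prefer_own_reward(*o))
choice_2 = max(outcomes, key=lambda o: prefer_difference(*o))
print(choice_1, choice_2)
```

The first ordering takes the higher own reward of 60 even though the opponent gets 90; the second prefers 50 against 10.
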
Example
[figure: reward pairs (player 0 / player 1) such as 0/1, 0/3, 2/0 and 3/1, with rewards ranging over 0 to 3 for each player; legend: player 0's turn, player 1's turn]

Results (Reachability Analyses)
• Single-Player Games:
• Two-Player Games:

Results (Peg Solitaire)
• total #reachable: 375,110,246

Results (Connect Four)
• 85 bits to represent one state
  • 2 bits per cell (blank, red, yellow); 42 cells
  • 1 bit for active player
• originally solved by Allis ('88)
  • estimate on total #states: 70,728,639,995,483 ≈ 70 × 10^12
• complete reachability analysis using BDDs
  • 12 GB RAM
  • 2.67 GHz CPU
  • total time: 5:15 h
  • total #states: 4,531,985,219,092 ≈ 4.5 × 10^12
  • explicit representation: ≈ 43.5 TB

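The 85-bit figure follows from 42 × 2 + 1. A minimal sketch of such an explicit encoding is shown below; the bit layout (active player in the top bit, cells in row-major order) and the cell symbols are assumptions, since the slide only gives the bit counts.

```python
CELL_CODE = {".": 0, "r": 1, "y": 2}          # blank, red, yellow
CODE_CELL = {code: cell for cell, code in CELL_CODE.items()}
NUM_CELLS = 42                                # 7 columns x 6 rows


def encode(board, active_player):
    """Pack a 42-character board and the active player (0 or 1) into one integer."""
    assert len(board) == NUM_CELLS
    bits = active_player
    for cell in board:
        bits = (bits << 2) | CELL_CODE[cell]  # 2 bits per cell
    return bits


def decode(bits):
    cells = []
    for _ in range(NUM_CELLS):
        cells.append(CODE_CELL[bits & 3])
        bits >>= 2
    return "".join(reversed(cells)), bits     # remaining bit is the active player


board = "." * 35 + "r.y.r.y"
state = encode(board, 1)
print(state.bit_length())                     # never exceeds 2 * 42 + 1 = 85
```
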
Results (Two-Player Games)

Conclusion
• Solving single-player games and two-player zero-sum games is fairly easy
• Solving general two-player games is more involved
  • first approach (Planning & Games Workshop 2007) very slow
  • current one needs a linear number of pre-images
• for playing still too slow
  • UCT to get good estimates faster
  • UCT works well with endgame databases
  • BDDs for complete state space can be used as perfect hash-functions