This study focuses on evolving heuristics for searching games using genetic programming. It explores topics such as state-graph representation, uninformed search, heuristics, and informed search. The study also discusses the application of genetic algorithms and genetic programming to evolve heuristics for Rush Hour and FreeCell games.
Evolving Hyper-Heuristics Using Genetic Programming • Achiya Elyasaf • Supervisor: Moshe Sipper
Overview • Introduction • Searching Games State-Graphs • Uninformed Search • Heuristics • Informed Search • Evolving Heuristics • Previous Work • Rush Hour • FreeCell
Representing Games as State-Graphs • Every puzzle/game can be represented as a state-graph: • In puzzles, board games, etc., every piece move leads to a different state • In computer war games etc., the positions of the player and enemies, together with all the parameters (health, shield, …), define a state
Searching Games State-Graphs: Uninformed Search • BFS – exponential in the search depth • DFS – linear in the length of the current search path. BUT: • We might “never” track down the right path • Games usually contain cycles • Iterative Deepening: a combination of BFS & DFS • In each iteration, a DFS with a depth limit is performed • The limit grows from one iteration to the next • Worst case – traverse the entire graph
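The iterative-deepening idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation; `iddfs`, `is_goal`, and `neighbors` are hypothetical names, and cycles are avoided only along the current search path, as in plain DFS:

```python
def iddfs(start, is_goal, neighbors, max_depth=20):
    """Iterative deepening: run depth-limited DFS with a growing limit."""
    def dls(state, depth, path):
        if is_goal(state):
            return path
        if depth == 0:
            return None
        for nxt in neighbors(state):
            if nxt in path:            # avoid cycles along the current path
                continue
            found = dls(nxt, depth - 1, path + [nxt])
            if found is not None:
                return found
        return None

    # Each iteration repeats the shallow work, but the deepest
    # level dominates the cost, so the overhead is modest.
    for limit in range(max_depth + 1):
        result = dls(start, limit, [start])
        if result is not None:
            return result
    return None
```

Because the limit grows by one each iteration, the first solution found is also a shallowest one, combining BFS's optimality with DFS's linear memory.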
Searching Games State-Graphs: Uninformed Search • Most game domains are PSPACE-complete! • Worst case – traverse the entire graph • We need an informed search!
Searching Games State-Graphs: Heuristics • h: states → ℝ • For every state s, h(s) is an estimate of the minimal distance/cost from s to a solution • If h is perfect, an informed search that tries states with the lowest h-score first will walk straight to a solution • For hard problems, finding a good h is hard • A bad heuristic means the search might never track down the solution • We need a good heuristic function to guide the informed search
Searching Games State-Graphs: Informed Search • Best-First search: like DFS, but expands the node with the best heuristic value first • Not necessarily optimal • Might enter cycles (local extrema) • A*: • Holds a closed list and an open list sorted by f(s) = g(s) + h(s); the best open node is expanded next • Maintaining the open and closed lists is not negligible in either time or memory
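Greedy best-first search can be sketched as below, a minimal illustration under the usual convention that lower h means closer to the goal (so a min-heap serves as the open list); `best_first` and its parameters are illustrative names, not from the original work:

```python
import heapq

def best_first(start, is_goal, neighbors, h):
    """Greedy best-first: always expand the open node with the
    best (lowest) h-value. Not optimal: it ignores path cost g."""
    open_heap = [(h(start), start)]
    closed = {start}                   # states already generated
    parent = {start: None}             # for path reconstruction
    while open_heap:
        _, state = heapq.heappop(open_heap)
        if is_goal(state):
            path = []
            while state is not None:   # walk parents back to the start
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in neighbors(state):
            if nxt not in closed:
                closed.add(nxt)
                parent[nxt] = state
                heapq.heappush(open_heap, (h(nxt), nxt))
    return None
```

The closed set prevents revisiting states, which is what keeps the search out of the cycles mentioned above, at the cost of memory proportional to the number of states generated.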
Searching Games State-Graphs: Informed Search (Cont.) • IDA*: Iterative Deepening with A* • The expanded nodes are pushed onto the DFS stack by descending heuristic value • Let g(s) be the minimal depth of state s: only nodes with f(s) = g(s) + h(s) ≤ depth-limit are visited • Near-optimal solution (depends on the depth limit) • The heuristic needs to be admissible
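The f-bounded depth-first iteration that IDA* performs can be sketched as follows; this is an illustrative version assuming unit edge costs, with hypothetical names, rather than the authors' solver:

```python
def ida_star(start, is_goal, neighbors, h):
    """IDA*: depth-first search bounded by f(s) = g(s) + h(s).
    The bound grows to the smallest f-value that overshot it.
    With an admissible h, the returned path is optimal."""
    bound = h(start)
    path = [start]

    def search(g, bound):
        state = path[-1]
        f = g + h(state)
        if f > bound:
            return f                   # overshoot; candidate next bound
        if is_goal(state):
            return True
        minimum = float('inf')
        for nxt in neighbors(state):
            if nxt in path:            # avoid cycles along the current path
                continue
            path.append(nxt)
            t = search(g + 1, bound)   # unit edge costs assumed
            if t is True:
                return True
            minimum = min(minimum, t)
            path.pop()
        return minimum

    while True:
        t = search(0, bound)
        if t is True:
            return list(path)
        if t == float('inf'):
            return None                # no state left under any bound
        bound = t
```

Memory stays linear in the path length, as in plain DFS, while the growing f-bound plays the role of the depth limit in iterative deepening.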
Overview • Introduction • Searching Games State-Graphs • Uninformed Search • Heuristics • Informed Search • Evolving Heuristics • Previous Work • Rush Hour • FreeCell
Evolving Heuristics • For building blocks H1, …, Hn (not necessarily admissible or in the same range), how should we choose the fittest heuristic? • Minimum? Maximum? Linear combination? • GA/GP may be used for: • Building new heuristics from existing building blocks • Finding weights for each heuristic (for applying a linear combination) • Finding conditions for applying each heuristic • H should probably fit the stage of the search • E.g., “goal” heuristics when we assume we’re close
Evolving Heuristics: GA • Genotype – a vector of weights (w1, …, wn), one per building-block heuristic • Phenotype – the combined heuristic H(s) = w1·H1(s) + … + wn·Hn(s)
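The GA encoding for linear combinations can be sketched as below. This is a minimal illustration under assumed conventions (the building blocks, operator choices, and function names are hypothetical, not taken from the papers):

```python
import random

def make_heuristic(weights, blocks):
    """Phenotype: the weighted sum H(s) = sum(w_i * H_i(s))."""
    def h(state):
        return sum(w * b(state) for w, b in zip(weights, blocks))
    return h

def mutate(weights, sigma=0.1):
    """Gaussian mutation of one randomly chosen weight (clipped at 0)."""
    i = random.randrange(len(weights))
    out = list(weights)
    out[i] = max(0.0, out[i] + random.gauss(0, sigma))
    return out

def crossover(a, b):
    """One-point crossover of two weight vectors."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]
```

Fitness would then be measured by running an informed search guided by the phenotype on a set of training deals, e.g. counting how many are solved and how many nodes were expanded.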
Evolving Heuristics: GP • An individual is a tree: inner nodes are operators (If, And, +, *, /, ≤, ≥) and leaves are building-block heuristics (H1, H2, H5) or constants (0.1, 0.4, 0.7) • [Figure: example GP tree combining conditions over building-block heuristics]
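Such a GP individual can be evaluated by recursive descent over the tree. The sketch below uses a hypothetical tuple encoding (operator name followed by children) and a small operator set for illustration; the papers' actual function and terminal sets differ:

```python
def evaluate(node, state):
    """Recursively evaluate a GP tree on a game state.
    Leaves are building-block heuristics (callables) or numeric
    constants; inner nodes are (operator, child, child, ...) tuples."""
    if callable(node):                 # building-block heuristic leaf
        return node(state)
    if isinstance(node, (int, float)): # constant leaf
        return node
    op, *args = node
    if op == 'if':                     # ('if', condition, then, else)
        cond, then_b, else_b = args
        branch = then_b if evaluate(cond, state) else else_b
        return evaluate(branch, state)
    if op == '<=':
        a, b = args
        return evaluate(a, state) <= evaluate(b, state)
    if op == '+':
        return sum(evaluate(a, state) for a in args)
    if op == '*':
        a, b = args
        return evaluate(a, state) * evaluate(b, state)
    raise ValueError(f'unknown operator {op!r}')
```

Crossover then swaps subtrees between two individuals and mutation replaces a subtree with a random one, so evolution can discover both the conditions and the arithmetic combinations shown in the figure.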
Overview • Introduction • Searching Games State-Graphs • Uninformed Search • Heuristics • Informed Search • Evolving Heuristics • Previous Work • Rush Hour • FreeCell
Rush Hour • GP-Rush [Hauptman et al., 2009] • Bronze Humies award
Domain-Specific Heuristics • Hand-crafted heuristics / guides: • Blocker estimation – lower bound (admissible) • Goal distance – Manhattan distance • Hybrid blockers distance – combines the above two • Is Move To Secluded – did the car enter a secluded area? • Is Releasing Move – does the move free a previously blocked car?
Policy “Ingredients” Functions & Terminals:
Coevolving (Hard) 8x8 Boards • [Figure: an example evolved 8x8 board; letters (F, G, H, I, K, M, P, S) denote cars and RED denotes the target car]
Results Average reduction of nodes required to solve test problems, with respect to the number of nodes scanned by a blind search:
Results (cont’d) Time (in seconds) required to solve problems JAM01 . . . JAM40:
FreeCell • FreeCell remained relatively obscure until Windows 95 • There are 32,000 solvable deals (known as the Microsoft 32K), except for game #11982, which has been proven to be unsolvable • Evolving hyper-heuristic-based solvers for Rush Hour and FreeCell [Hauptman et al., SOCS 2010] • GA-FreeCell: Evolving Solvers for the Game of FreeCell [Elyasaf et al., GECCO 2011]
FreeCell (cont’d) • As opposed to Rush Hour, blind search failed miserably • The best published solver to date solves 96% of Microsoft 32K • Reasons: • High branching factor • Hard to generate a good heuristic
Learning Methods: Random Deals Which deals should we use for training? First method tested - random deals • This is what we did in Rush Hour • Here it yielded poor results • Very hard domain
Learning Methods: Gradual Difficulty Second method tested - gradual difficulty • Sort the problems by difficulty • Each generation test solvers against 5 deals from the current difficulty level + 1 random deal
Learning Methods: Hillis-Style Coevolution Third method tested - Hillis-style coevolution using “Hall-of-Fame”: • A deal population is composed of 40 deals (=40 individuals) + 10 deals that represent a hall-of-fame • Each hyper-heuristic is tested against 4 deal individuals and 2 hall-of-fame deals • Evolved hyper-heuristics failed to solve almost all Microsoft 32K! Why?
Learning Methods: Rosin-Style Coevolution • Fourth method tested – Rosin-style coevolution: • Each deal individual consists of 6 deals • Mutation and crossover operate on these 6-deal vectors • [Figure: crossover of two deal individuals p1 and p2]
Thank you for listening. Any questions?