Lookahead pathology in real-time pathfinding Mitja Luštrek Jožef Stefan Institute, Department of Intelligent Systems Vadim Bulitko University of Alberta, Department of Computer Science
Introduction • Problem • Explanation • Remedy
Real-time single-agent heuristic search • Task: • find a path from a start state to a goal state • Complete search: • plan the whole path to the goal state • execute the plan • example: A* [Hart et al. 68] • good: given an admissible heuristic, the path is optimal • bad: the delay before the first move can be large
Real-time single-agent heuristic search • Incomplete search: • plan a part of the path to the goal • execute the plan • repeat • example: LRTA* [Korf 90], LRTS [Bulitko & Lee 06] • good: delay before the first move small, amount of planning per move bounded • bad: the path is typically not optimal
Why do we need it? • Picture a real-time strategy game • The user commands dozens of units to move towards a distant goal • Complete search would have to compute the whole paths for all of them • Incomplete search computes just the first couple of steps
Heuristic lookahead search • The agent searches a lookahead area of depth d around its current state, in the direction of the goal state • For each frontier state of the area, f = g + h, where g is the true shortest distance from the current state and h is the estimated shortest distance to the goal • The frontier state with the lowest f (denoted fopt) is selected • The heuristic of the current state is updated to h = fopt, and the agent moves towards the selected frontier state
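To make the plan, update, and move cycle on these slides concrete, here is a minimal sketch in Python. It is an illustration of the f = g + h / h := fopt scheme, not the actual LRTS or HOG implementation: the map is a 4-connected empty grid with a Manhattan initial heuristic (the experiments below use 8-connected maps with the octile heuristic), and all names (`plan_lookahead`, `GRID`, etc.) are invented for the example.

```python
from collections import deque

GRID = 5  # hypothetical example map: an empty GRID x GRID grid, 4-connected


def h0(state, goal):
    """Initial heuristic: Manhattan distance (the talk uses octile distance)."""
    return abs(state[0] - goal[0]) + abs(state[1] - goal[1])


def neighbours(state):
    x, y = state
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < GRID and 0 <= ny < GRID:
            yield (nx, ny)


def plan_lookahead(current, goal, h, depth):
    """One planning step: expand the lookahead area to `depth`, pick the
    frontier state with the lowest f = g + h, set h(current) to f_opt and
    return the first move towards that frontier state."""
    g = {current: 0}          # true shortest distances within the lookahead area
    first_move = {}           # first step on the path to each expanded state
    frontier, queue = [], deque([current])
    while queue:
        s = queue.popleft()
        if s == goal or g[s] == depth:
            frontier.append(s)
            continue
        for n in neighbours(s):
            if n not in g:
                g[n] = g[s] + 1
                first_move[n] = first_move.get(s, n)
                queue.append(n)
    best = min(frontier, key=lambda s: g[s] + h.get(s, h0(s, goal)))
    f_opt = g[best] + h.get(best, h0(best, goal))
    h[current] = max(h.get(current, 0), f_opt)   # learning step: h := f_opt
    return first_move.get(best, current)


def real_time_search(start, goal, depth, max_steps=10_000):
    """Plan-execute-repeat loop of an LRTA*/LRTS-style agent (sketch)."""
    h, state, path = {}, start, [start]
    for _ in range(max_steps):
        if state == goal:
            return path
        state = plan_lookahead(state, goal, h, depth)
        path.append(state)
    raise RuntimeError("step limit exceeded")


print(real_time_search((0, 0), (4, 4), depth=2))
```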
Lookahead pathology • Generally believed that larger lookahead depths produce better solutions • Solution-length pathology: larger lookahead depths produce worse solutions • Figure: an example with degree of pathology = 2
Lookahead pathology • Pathology on states that do not form a path • Error pathology: larger lookahead depths produce more suboptimal decisions • Figure: an example with degree of pathology = 2 (the pathology is present)
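One plausible reading of the degree-of-pathology figures, assuming the degree counts how many times the measure gets worse as the lookahead depth grows by one (the talk's exact counting convention may differ); the numbers below are invented for illustration:

```python
def degree_of_pathology(measure_by_depth):
    """Count how many times the measure (solution length or error; lower is
    better) gets worse when the lookahead depth increases by one."""
    return sum(
        larger > smaller
        for smaller, larger in zip(measure_by_depth, measure_by_depth[1:])
    )

# Hypothetical solution lengths for depths 1..5: two increases -> degree 2.
print(degree_of_pathology([120, 100, 110, 105, 112]))  # -> 2
```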
Related: minimax pathology • Minimax backs up heuristic values from the leaves of the game tree to the root • Attempts to explain why backed-up heuristic values are better than static values • Theoretical analyses show that they are worse – pathology [Nau 79, Beal 80] • Explanations: • similarity of nearby positions in real games • realistic modeling of error • ... • Focus on why the pathology does not appear in practice
Related: pathology in single-agent search • Discovered on synthetic search trees [Bulitko et al. 03] • Observed in eight puzzle [Bulitko 03] • appears with different evaluation functions • shown that the benefit from knowing the optimal lookahead depth is large • Explained on synthetic search trees [Luštrek 05] • caused by certain properties of trees • caused by inconsistent and inadmissible heuristics • Unexplored in pathfinding
Introduction • Problem • Explanation • Remedy
Our setting • HOG – Hierarchical Open Graph [Sturtevant et al.] • Maps from commercial computer games (Baldur’s Gate, Warcraft III) • Initial heuristic: octile distance (true distance assuming an empty map) • 1,000 problems (map, start state, goal state)
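The octile distance has a standard closed form on an 8-connected grid with unit straight moves and √2 diagonal moves; a small sketch (representing cells as (x, y) tuples is an assumption of the example):

```python
import math

def octile_distance(a, b):
    """True shortest distance between cells a and b on an empty 8-connected
    map: diagonal moves cost sqrt(2), straight moves cost 1."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy)

print(octile_distance((0, 0), (3, 5)))  # 5 + 3 * (sqrt(2) - 1) ≈ 6.24
```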
On-policy experiments • The agent follows a path from the start state to the goal state, updating the heuristic along the way • Solution length and error over the whole path computed for each lookahead depth -> pathology • Figure: example paths for d = 1, d = 2, d = 3
Off-policy experiments • The agent spawns in a number of states • It takes one move towards the goal state • Heuristic not updated • Error is computed from these first moves -> pathology • Figure: first moves from several spawn states for d = 1, 2, 3
Basic on-policy experiment • A lot of pathology – over 60%! • First explanation: a lot of states are intrinsically pathological (off-policy mode) • Not true: only 3.9% are • If the topology of the maps is not at fault, perhaps the algorithm is to blame?
Off-policy experiment on 188 states • The comparison is not fair: • on-policy: pathology computed from the error over a number of states • off-policy: pathology of single states • Fair: off-policy error over the same number of states as on-policy – 188 (chosen randomly) • Only error can be used – there is no solution length off-policy • Not much less pathology than on-policy: 42.2% vs. 61.5%
Tolerance • The first off-policy experiment showed little pathology, the second one quite a lot • Perhaps off-policy pathology is caused by minor differences in error – noise • Introduce tolerance t: • an increase in error counts towards the pathology only if error (d1) > t ∙ error (d2), where d1 is the larger of the two depths • set t so that the pathology in the off-policy experiment on 188 states is < 5%: t = 1.09
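A minimal sketch of the tolerance test, under the assumption above that the larger depth only counts as worse when its error exceeds the smaller depth's error by more than the factor t:

```python
def counts_as_pathological(err_smaller_depth, err_larger_depth, t=1.09):
    """The larger depth counts as worse only if its error exceeds the smaller
    depth's error by more than the tolerance factor t."""
    return err_larger_depth > t * err_smaller_depth

print(counts_as_pathological(10.0, 10.5))  # False: within the noise tolerance
print(counts_as_pathological(10.0, 11.5))  # True
```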
Experiments with t = 1.09 • On-policy pathology changes little vs. t = 1: 57.7% vs. 61.9% • Apparently on-policy pathology is more severe than off-policy • Investigate why! • These two experiments are referred to below as the basic on-policy experiment and the basic off-policy experiment
Introduction • Problem • Explanation • Remedy
Hypothesis 1 • LRTS tends to visit pathological states with an above-average frequency • Test: compute pathology from states visited on-policy instead of 188 random states • More pathology than in random states: 6.3% vs. 4.3% • Much less pathology than basic on-policy: 6.3% vs. 57.7% • Hypothesis 1 is correct, but it is not the main reason for on-policy pathology
Is learning the culprit? • There is learning (updating the heuristic) on-policy, but not off-policy • Learning necessary on-policy, otherwise the agent gets caught in infinite loops • Test: traverse paths in the normal on-policy manner, measure error without learning • Less pathology than basic on-policy: 20.2% vs. 57.7% • Still more pathology than basic off-policy: 20.2% vs. 4.3% • Learning is a reason, although not the only one
Hypothesis 2 • At smaller lookahead depths, a larger fraction of the states in the lookahead area has already been updated • Figure: current lookahead area and updated states
Hypothesis 2 • Smaller lookahead depths benefit more from learning • This makes their decisions better than the mere depth suggests • Thus they are closer to larger depths • If they are closer to larger depths, cases where a larger depth happens to be worse than a smaller depth are more common • Test: equalize depths by learning as much as possible in the whole lookahead area – uniform learning
Uniform learning • The search and the heuristic update alternate, with the update applied over the whole lookahead area (animation: search, update, search, update)
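A sketch of what "learning as much as possible in the whole lookahead area" might look like, assuming each interior state's heuristic is raised to its cheapest estimated route through the frontier (which keeps the heuristic consistent, as the consistency slide below notes). This is an illustrative reconstruction, not necessarily the exact update used in the experiments; all names are invented.

```python
def uniform_update(interior, frontier, dist_within_area, h):
    """Raise h for every interior state s of the lookahead area to
    min over frontier states f of dist_within_area(s, f) + h(f)."""
    for s in interior:
        best = min(dist_within_area(s, f) + h[f] for f in frontier)
        h[s] = max(h.get(s, 0), best)  # never lower the heuristic

# Tiny demo: a 1-D corridor 0-1-2-3 whose lookahead frontier is state 3.
h = {3: 4.0}
uniform_update(interior={0, 1, 2}, frontier={3},
               dist_within_area=lambda s, f: abs(s - f), h=h)
print(h)  # h[0] = 7.0, h[1] = 6.0, h[2] = 5.0
```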
Pathology with uniform learning • Even more pathology than basic on-policy: 59.1% vs. 57.7% • Is Hypothesis 2 wrong? • Let us look at the volume of heuristic updates encountered per state generated during search • This seems to be the best measure of the benefit of learning
Volume of updates encountered • Hypothesis 2 is correct after all
Consistency • Initial heuristic is consistent • the difference in heuristic value between two states does not exceed the actual shortest distance between them • Updates make it inconsistent • Research on synthetic trees showed that inconsistency causes pathology [Luštrek 05] • Uniform learning preserves consistency, yet it is more pathological than regular learning • Consistency is therefore not the problem in our case
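For completeness, consistency can be checked locally: with h(goal) = 0, it suffices that h(s) ≤ cost(s, s') + h(s') for every edge, which implies the pairwise condition above. A minimal sketch, with the graph representation (`neighbours`, `cost`) assumed for the example:

```python
def is_consistent(states, neighbours, cost, h, goal):
    """Local consistency check: h(goal) == 0 and h(s) <= cost(s, n) + h(n)
    for every state s and neighbour n."""
    if h.get(goal, 0) != 0:
        return False
    return all(
        h.get(s, 0) <= cost(s, n) + h.get(n, 0)
        for s in states
        for n in neighbours(s)
    )

# Tiny demo: two states on a line with unit edge cost; the goal is state 1.
print(is_consistent([0, 1], lambda s: [1 - s], lambda s, n: 1,
                    h={0: 1, 1: 0}, goal=1))  # -> True
```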
Hypothesis 3 • On-policy: one search every d moves, so fewer searches at larger depths • Off-policy: one search every move
Hypothesis 3 • The difference between depths in the amount of search is smaller on-policy than off-policy • This makes the depths closer on-policy • If they are closer, cases where a larger depth happens to be worse than a smaller depth are more common • Test: search every move on-policy
Pathology when searching every move • Less pathology than basic on-policy: 13.1% vs. 57.7% • Still more pathology than basic off-policy: 13.1% vs. 4.3% • Hypothesis 3 is correct; the remaining pathology is due to Hypotheses 1 and 2 • Further test: number of states generated per move
States generated / move • Hypothesis 3 confirmed again
Summary of explanation • On-policy pathology is caused by different lookahead depths being closer to each other in terms of decision quality than the depths alone would suggest: • due to the volume of heuristic updates encountered per state generated • due to the number of states generated per move • LRTS tends to visit pathological states with an above-average frequency
Introduction • Problem • Explanation • Remedy
Is a remedy worth looking for? • Optimal lookahead depth selected for each problem: • Solution length = 107.9 • States generated / move = 73.6 • The answer is yes – solution length improved by 38.5%
What can we do? • House + garden • Precompute the optimal depth for every start state
Optimal depth per start state • Optimal lookahead depth selected for each start state: • Solution length: 132.4 • States generated / move: 59.3 • Results similar to the 1,000-problem experiments – the map is representative