Explore effects of lookahead depth on pathfinding solutions using agent-centered search. Investigate errors and solutions in on-policy and off-policy scenarios.
Lookahead pathology in real-time pathfinding. Mitja Luštrek, Jožef Stefan Institute, Department of Intelligent Systems; Vadim Bulitko, University of Alberta, Department of Computer Science
Introduction • Problem • Explanation
Agent-centered search (LRTS) • The agent searches a lookahead area around its current state, bounded by the lookahead depth d • The goal state typically lies outside the lookahead area
Agent-centered search (LRTS) • Each frontier state is scored with f = g + h • g: true shortest distance from the current state to the frontier state • h: estimated shortest distance from the frontier state to the goal
Agent-centered search (LRTS) • The agent heads for the frontier state with the lowest f (fopt)
Agent-centered search (LRTS) • The heuristic of the current state is updated: h = fopt • The agent then moves towards the chosen frontier state
Lookahead pathology • Generally believed that larger lookahead depths produce better solutions • Solution-length pathology: larger lookahead depths produce worse solutions • Degree of pathology: the number of times the solution length increases as the depth grows (2 in the slide's example)
Lookahead pathology • Pathology can also be measured on states that do not form a path • Error pathology: larger lookahead depths produce more suboptimal decisions • Degree of pathology counted analogously from the error (2 in the slide's example, so there is pathology)
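Read literally, the degree of pathology can be counted from the per-depth measurements (solution lengths on-policy, errors off-policy). A small sketch follows; the counting convention and the example numbers are assumptions consistent with the definition above, not values taken from the slides.

```python
def degree_of_pathology(values):
    """Count how many times the measure worsens as the lookahead depth grows.

    values[i] is the solution length (or error) at lookahead depth i + 1;
    every increase from one depth to the next adds one to the degree,
    and a degree of 0 means no pathology.
    """
    return sum(1 for a, b in zip(values, values[1:]) if b > a)

# Hypothetical solution lengths for depths 1..5, chosen to give degree 2
# (two increases: 12 -> 13 and 11 -> 12).
assert degree_of_pathology([14, 12, 13, 11, 12]) == 2
```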
Introduction • Problem • Explanation
Our setting • HOG – Hierarchical Open Graph [Sturtevant et al.] • Maps from commercial computer games (Baldur’s Gate, Warcraft III) • Initial heuristic: octile distance (true distance assuming an empty map) • 1,000 problems (map, start state, goal state)
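The initial heuristic, octile distance, has a closed form; a one-function sketch for 8-connected grid maps, assuming straight moves of cost 1 and diagonal moves of cost √2:

```python
import math

def octile_distance(a, b):
    """Shortest distance between cells a and b on an empty 8-connected grid.

    Straight moves cost 1 and diagonal moves cost sqrt(2), so the distance
    is the larger coordinate difference plus the diagonal surcharge on the
    smaller one.
    """
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy)
```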
On-policy experiments • The agent follows a path from the start state to the goal state, updating the heuristic along the way • Solution length and error over the whole path are computed for each lookahead depth -> pathology • (Figure: example paths for d = 1, d = 2 and d = 3)
Off-policy experiments • The agent spawns in a number of states • It takes one move towards the goal state • Heuristic not updated • Error is computed from these first moves -> pathology • (Figure: first moves of several spawned agents for d = 1, 2 and 3)
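Both experimental modes can be sketched on top of the hypothetical lrts_lookahead helper from above. The error bookkeeping via an optimal_moves oracle is an assumption about how the per-depth error is obtained, not the authors' code; note that on-policy the agent executes the whole lookahead path (one search every d moves), while off-policy only the first move of a fresh search counts.

```python
def on_policy_run(start, goal, h0, neighbors, depth, optimal_moves):
    """Follow a whole path with learning; return (solution length, error).

    optimal_moves(s) is a hypothetical oracle returning the set of optimal
    successor states of s; the error is the fraction of suboptimal moves.
    """
    h = dict(h0)                                   # learning writes into a private copy
    state, length, suboptimal = start, 0, 0
    while state != goal:
        # Basic on-policy behaviour: execute the whole lookahead path,
        # i.e. one search every `depth` moves.
        for nxt in lrts_lookahead(state, goal, h, neighbors, depth):
            if nxt not in optimal_moves(state):
                suboptimal += 1
            state, length = nxt, length + 1
    return length, suboptimal / max(length, 1)

def off_policy_error(spawn_states, goal, h0, neighbors, depth, optimal_moves):
    """First-move error over a set of spawn states; heuristic updates are discarded."""
    bad = 0
    for s in spawn_states:
        first = lrts_lookahead(s, goal, dict(h0), neighbors, depth)[0]
        if first not in optimal_moves(s):
            bad += 1
    return bad / len(spawn_states)
```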
Basic on-policy experiment • A lot of pathology – over 60%! • First explanation: a lot of states are intrinsically pathological (off-policy mode) • Not true: only 3.9% are • If the topology of the maps is not at fault, perhaps the algorithm is to blame?
Off-policy experiment on 188 states • Comparison not fair: • On-policy: pathology computed from the error over a number of states • Off-policy: whether individual states are pathological • Fair: off-policy error over the same number of states as on-policy – 188 (chosen randomly) • Only error can be used – there is no solution length off-policy • Not much less pathology than on-policy: 42.2% vs. 61.5%
Tolerance • The first off-policy experiment showed little pathology, the second one quite a lot • Perhaps off-policy pathology is caused by minor differences in error – noise • Introduce tolerance t: • an increase in error counts towards the pathology only if the error at the larger depth exceeds t times the error at the smaller depth • set t so that the pathology in the off-policy experiment on 188 states is < 5%: t = 1.09
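Combining this with the earlier degree-of-pathology sketch, a tolerant count might look as follows; the factor t comes from the slide, the rest of the bookkeeping is an assumption.

```python
def degree_of_pathology_tolerant(errors, t=1.09):
    """Count increases in error with depth, ignoring increases within the noise band.

    errors[i] is the error at lookahead depth i + 1; a step to a larger depth
    counts towards the pathology only if its error exceeds t times the error
    at the smaller depth.
    """
    return sum(1 for a, b in zip(errors, errors[1:]) if b > t * a)
```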
Experiments with t = 1.09 • On-policy pathology changes little vs. t = 1: 57.7% vs. 61.9% • Apparently on-policy pathology is more severe than off-policy • Investigate why! • These two experiments with t = 1.09 are referred to below as the basic on-policy experiment and the basic off-policy experiment
Introduction • Problem • Explanation
Hypothesis 1 • LRTS tends to visit pathological states with an above-average frequency • Test: compute pathology from states visited on-policy instead of 188 random states • More pathology than in random states: 6.3% vs. 4.3% • Much less pathology than basic on-policy: 6.3% vs. 57.7% • Hypothesis 1 is correct, but it is not the main reason for on-policy pathology
Is learning the culprit? • There is learning (updating the heuristic) on-policy, but not off-policy • Learning necessary on-policy, otherwise the agent gets caught in infinite loops • Test: traverse paths in the normal on-policy manner, measure error without learning • Less pathology than basic on-policy: 20.2% vs. 57.7% • Still more pathology than basic off-policy: 20.2% vs. 4.3% • Learning is a reason, although not the only one
Hypothesis 2 • Larger fraction of updated states at smaller depths • (Figure: current lookahead area and previously updated states)
Hypothesis 2 • Smaller lookahead depths benefit more from learning • This makes their decisions better than their depth alone would suggest • Thus they are closer to larger depths • If they are closer to larger depths, cases where a larger depth happens to be worse than a smaller depth are more common • Test: equalize the depths by learning as much as possible in the whole lookahead area – uniform learning (a sketch follows the slides below)
Uniform learning (animation: alternating Search and Update steps, with the heuristic updated throughout the whole lookahead area after each search)
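A sketch of what the uniform-learning update could look like: a Dijkstra-style sweep that raises the heuristic of every interior state of the lookahead area from the frontier values. Only the idea of learning in the whole area, rather than in the current state alone, comes from the slides; the exact update rule and the helper below are assumptions.

```python
import heapq

def uniform_update(area, frontier, h, neighbors):
    """Raise the heuristic of every interior state of the lookahead area.

    Multi-source Dijkstra sweep seeded with the frontier heuristics
    (unit move costs): each interior state s ends up with
    h(s) = min over frontier states fs of (dist(s, fs) + h(fs)),
    and the stored value is only ever raised, never lowered.
    """
    best = {fs: h[fs] for fs in frontier}
    heap = [(h[fs], fs) for fs in frontier]        # states assumed comparable, e.g. (x, y) tuples
    heapq.heapify(heap)
    while heap:
        val, s = heapq.heappop(heap)
        if val > best.get(s, float("inf")):
            continue                               # stale heap entry
        for n in neighbors(s):
            if n in area and val + 1 < best.get(n, float("inf")):
                best[n] = val + 1
                heapq.heappush(heap, (val + 1, n))
    for s in area:
        if s not in frontier and s in best:
            h[s] = max(h[s], best[s])              # learn in the whole area, not just the current state
```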
Pathology with uniform learning • Even more pathology than basic on-policy: 59.1% vs. 57.7% • Is Hypothesis 2 wrong? • Let us look at the volume of heuristic updates encountered per state generated during search • This seems to be the best measure of the benefit of learning
Volume of updates encountered • Hypothesis 2 is correct after all
Hypothesis 3 • On-policy: one search every d moves, so fewer searches at larger depths • Off-policy: one search every move
Hypothesis 3 • The difference between depths in the amount of search is smaller on-policy than off-policy • This makes the depths closer on-policy • If they are closer, cases where a larger depth happens to be worse than a smaller depth are more common • Test: search every move on-policy
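Relative to the earlier on_policy_run sketch, the Hypothesis 3 test amounts to executing only the first move of each lookahead path instead of all d of them; moves_per_search is a name introduced here purely for illustration.

```python
def on_policy_run_every_move(start, goal, h0, neighbors, depth, optimal_moves,
                             moves_per_search=1):
    """Like on_policy_run, but re-searches after `moves_per_search` moves.

    moves_per_search=depth reproduces the basic on-policy behaviour
    (one search every d moves); moves_per_search=1 is the Hypothesis 3 test.
    """
    h = dict(h0)
    state, length, suboptimal = start, 0, 0
    while state != goal:
        path = lrts_lookahead(state, goal, h, neighbors, depth)
        for nxt in path[:moves_per_search]:
            if nxt not in optimal_moves(state):
                suboptimal += 1
            state, length = nxt, length + 1
    return length, suboptimal / max(length, 1)
```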
Pathology when searching every move • Less pathology than basic on-policy: 13.1% vs. 57.7% • Still more pathology than basic off-policy: 13.1% vs. 4.3% • Hypothesis 3 is correct, the remaining pathology due to Hypotheses 1 and 2 • Further test: number of states generated per move
States generated / move • Hypothesis 3 confirmed again
Summary of explanation • On-policy pathology is caused by different lookahead depths being closer to each other in terms of decision quality than the depths alone would suggest: • due to the volume of heuristic updates encountered per state generated • due to the number of states generated per move • In addition, LRTS tends to visit pathological states with an above-average frequency
Thank you. Questions?