A riddle for you: What is the magic idea in planning that is at once more efficient and has higher complexity than vanilla planners?

Learning Probabilistic Hierarchical Task Networks to Capture User Preferences
Nan Li, Subbarao Kambhampati, and Sungwook Yoon
School of Computing and Informatics, Arizona State University, Tempe, AZ 85281 USA
nan.li.3@asu.edu, rao@asu.edu, Sungwook.Yoon@asu.edu
Special thanks to William Cushing
Two Tales of HTN Planning
• Abstraction: efficiency, top-down (most prior work)
• Preference handling: quality, bottom-up learning (our work)
Hitchhike? No way! Learning User Plan Preferences
Observed plans, with how often the user chose each:
• P_bus: Getin(bus, source), Buyticket(bus), Getout(bus, dest) (chosen 2 times)
• P_train: Buyticket(train), Getin(train, source), Getout(train, dest) (chosen 8 times)
• P_hike: Hitchhike(source, dest) (chosen 0 times)
Learning User Preferences as pHTNs
• Given a set O of plans executed by the user
• Find the generative model H_l that best explains them: H_l = argmax_H p(O | H)
Probabilistic Hierarchical Task Networks (pHTNs), written as weighted reduction schemas:
S → 0.2: A1 B1
S → 0.8: A2 B2
B1 → 1.0: A2 A3
B2 → 1.0: A1 A3
A1 → 1.0: Getin
A2 → 1.0: Buyticket
A3 → 1.0: Getout
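To make the generative reading concrete, here is a minimal sketch that encodes the example pHTN above as weighted reduction rules and samples plans from it top-down. The `RULES` encoding and the `sample_plan` helper are illustrative names, not from the paper.

```python
import random

# Each non-primitive symbol maps to a list of (probability, right-hand side).
# Primitive actions (Getin, Buyticket, Getout) have no entry.
RULES = {
    "S":  [(0.2, ["A1", "B1"]), (0.8, ["A2", "B2"])],
    "B1": [(1.0, ["A2", "A3"])],
    "B2": [(1.0, ["A1", "A3"])],
    "A1": [(1.0, ["Getin"])],
    "A2": [(1.0, ["Buyticket"])],
    "A3": [(1.0, ["Getout"])],
}

def sample_plan(symbol="S"):
    """Sample a plan (an action sequence) top-down from the pHTN."""
    if symbol not in RULES:                    # primitive action: emit it
        return [symbol]
    probs, rhss = zip(*RULES[symbol])
    rhs = random.choices(rhss, weights=probs, k=1)[0]
    plan = []
    for child in rhs:
        plan.extend(sample_plan(child))
    return plan

# ~80% of samples are the "train" ordering ['Buyticket', 'Getin', 'Getout'],
# ~20% the "bus" ordering ['Getin', 'Buyticket', 'Getout'].
print(sample_plan())
```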
Learning pHTNs
• HTNs can be seen as providing a grammar of desired solutions:
  Actions ↔ Words, Plans ↔ Sentences, HTNs ↔ Grammar, HTN learning ↔ Grammar induction
• So learn pHTNs via probabilistic context-free grammar (pCFG) induction
• Assumptions: parameter-less, unconditional
(Running example: the pHTN shown above.)
A Two-Step Algorithm
• Greedy Structure Hypothesizer (GSH): hypothesizes the schema structure
• Expectation-Maximization (EM) phase: refines schema probabilities and removes redundant schemas
Generalizes the Inside-Outside algorithm (Lari & Young, 1990)
Greedy Structure Hypothesizer
• Learns the schema structure
• Works bottom-up over the observed plans
• Prefers recursive schemas to non-recursive ones (see the sketch below)
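The bottom-up flavor can be illustrated with a small chunking routine: repeatedly promote the most frequent adjacent symbol pair in the observed plans to a new non-primitive schema. This is only a sketch of the idea; the paper's GSH additionally prefers recursive schemas (e.g. S → a S) when repetition suggests them, a check omitted here, and `hypothesize_structure` is a hypothetical name.

```python
from collections import Counter

def hypothesize_structure(plans, max_schemas=10):
    """Bottom-up greedy chunking over observed plans (illustrative sketch).

    plans: list of action sequences, e.g. [["Buyticket", "Getin", "Getout"]].
    Returns hypothesized schemas as a dict: new symbol -> (left, right).
    """
    sequences = [list(p) for p in plans]
    schemas = {}
    for i in range(max_schemas):
        # Count adjacent symbol pairs across all partially reduced plans.
        pairs = Counter()
        for seq in sequences:
            pairs.update(zip(seq, seq[1:]))
        if not pairs or pairs.most_common(1)[0][1] < 2:
            break                              # nothing repeats; stop
        pair = pairs.most_common(1)[0][0]
        symbol = f"N{i}"                       # fresh non-primitive action
        schemas[symbol] = pair
        for seq in sequences:                  # reduce every occurrence
            j = 0
            while j < len(seq) - 1:
                if (seq[j], seq[j + 1]) == pair:
                    seq[j:j + 2] = [symbol]
                else:
                    j += 1
    return schemas
```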
EM Phase
• E step: compute each plan's parse trees, keeping the most probable parse tree
• M step: update the selection probability p of each schema s: a_i → a_j a_k
(A sketch of one iteration follows.)
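The EM phase can be sketched as hard EM under the slide's schema forms: unit schemas a_i → action and binary schemas a_i → a_j a_k, each with a selection probability. The E step below finds the most probable parse with a CYK-style Viterbi parser, and the M step re-estimates each schema's probability from its usage counts; the paper's Inside-Outside generalization uses expected counts over all parses instead. Probabilities are assumed positive, and `viterbi_parse` / `em_step` are illustrative names.

```python
import math
from collections import defaultdict

def viterbi_parse(plan, rules):
    """Most probable parse of a plan under rules {(lhs, rhs_tuple): prob}.

    Returns (log-probability, list of schemas used) for the best parse
    rooted at "S", or None if the plan cannot be parsed."""
    n = len(plan)
    best = defaultdict(dict)              # best[(i, j)][symbol] = (logp, used)
    for i, action in enumerate(plan):     # unit schemas a_i -> action
        for (lhs, rhs), p in rules.items():
            if rhs == (action,):
                cand = (math.log(p), [(lhs, rhs)])
                cur = best[(i, i + 1)].get(lhs)
                if cur is None or cand[0] > cur[0]:
                    best[(i, i + 1)][lhs] = cand
    for span in range(2, n + 1):          # binary schemas a_i -> a_j a_k
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (lhs, rhs), p in rules.items():
                    if len(rhs) != 2:
                        continue
                    left = best[(i, k)].get(rhs[0])
                    right = best[(k, j)].get(rhs[1])
                    if left and right:
                        cand = (math.log(p) + left[0] + right[0],
                                left[1] + right[1] + [(lhs, rhs)])
                        cur = best[(i, j)].get(lhs)
                        if cur is None or cand[0] > cur[0]:
                            best[(i, j)][lhs] = cand
    return best[(0, n)].get("S")

def em_step(plans, rules):
    """One hard-EM iteration: parse (E), then re-estimate probabilities (M)."""
    counts = defaultdict(float)
    for plan in plans:
        parse = viterbi_parse(plan, rules)
        if parse is not None:
            for schema in parse[1]:
                counts[schema] += 1.0
    totals = defaultdict(float)
    for (lhs, _), c in counts.items():
        totals[lhs] += c
    # Schemas that no best parse uses drift toward probability 0:
    # this is how redundant schemas get removed.
    return {rule: counts[rule] / totals[rule[0]] if totals[rule[0]] else p
            for rule, p in rules.items()}
```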
Evaluation
Pipeline: H* → observed plans P1, P2, …, Pn → Learner → H_l
• Ideal: user studies (too hard)
• Our approach:
  • Assume H* represents the user's preferences
  • Generate observed plans O using H* (H* → O)
  • Learn H_l from O (O → H_l)
  • Compare H* and H_l via the plan distributions they induce (H* → T*, H_l → T_l)
• Syntactic similarity is not important; only the distributions matter
• Use the KL divergence between the distributions T* and T_l, which measures the distance between them (sketch below)
• Domains: randomly generated; Logistics Planning and Gold Miner
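The distribution comparison itself is simple to sketch. Assuming both plan distributions are estimated by sampling (e.g. with a generator like `sample_plan` above applied to H* and to H_l), build the empirical distributions T* and T_l and compute KL(T* || T_l); the `eps` smoothing constant is an illustrative choice that keeps the divergence finite when H_l assigns no mass to a plan H* generates.

```python
import math
from collections import Counter

def plan_distribution(plans):
    """Empirical distribution over plans, keyed by the action-sequence tuple."""
    counts = Counter(tuple(p) for p in plans)
    total = sum(counts.values())
    return {plan: c / total for plan, c in counts.items()}

def kl_divergence(t_star, t_learned, eps=1e-6):
    """KL(T* || T_l) = sum over plans of T*(plan) * log(T*(plan) / T_l(plan))."""
    return sum(p * math.log(p / t_learned.get(plan, eps))
               for plan, p in t_star.items())

# Hypothetical usage: sample plans from H* and from the learned H_l, then
# compare the two empirical distributions (smaller divergence = closer fit).
# t_star    = plan_distribution([sample_plan() for _ in range(1000)])
# t_learned = plan_distribution([sample_plan() for _ in range(1000)])
# print(kl_divergence(t_star, t_learned))
```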
Rate of Learning and Conciseness (Randomly Generated Domains)
• Rate of learning: the more training plans, the better the learned schemas
• Conciseness:
  • Small domains: only 1 or 2 more non-primitive actions than H*
  • Large domains: many more non-primitive actions (refine structure learning?)
Effectiveness of EM (Randomly Generated Domains)
• Compare the greedily hypothesized schemas with the schemas after EM
• The EM step is very effective in capturing user preferences
"Benchmark" Domains

Logistics Planning
• H*: move by plane or truck; prefer plane; prefer fewer steps
• KL divergence: 0.04
• Recovers plane > truck and fewer steps > more steps

Gold Miner
• H*: get the laser cannon; shoot rock until adjacent to the gold; get a bomb; use the bomb to remove the last wall
• KL divergence: 0.52
• Reproduces the basic strategy
Conclusions & Extensions
• Learn user plan preferences: the learned HTNs capture preferences rather than domain abstractions
• Evaluate predictive power: compare distributions rather than structure
• Preference obfuscation: a poor graduate student who prefers to travel by plane usually travels by car
• See: Learning user plan preferences obfuscated by feasibility constraints, ICAPS'09