A riddle for you: What is the magic idea in planning that is at once more efficient and has higher complexity than vanilla planners?

Learning Probabilistic Hierarchical Task Networks to Capture User Preferences
Nan Li, Subbarao Kambhampati, and Sungwook Yoon
School of Computing and Informatics, Arizona State University, Tempe, AZ 85281 USA
nan.li.3@asu.edu, rao@asu.edu, Sungwook.Yoon@asu.edu
Special thanks to William Cushing
Two Tales of HTN Planning
• Abstraction: efficiency, top-down (most prior work)
• Preference handling: quality, bottom-up learning (our work)
Hitchhike? No way! Learning User Plan Preferences
Observed plans, with how often the user chose each:
• P_bus: Getin(bus, source), Buyticket(bus), Getout(bus, dest) (chosen 2 times)
• P_train: Buyticket(train), Getin(train, source), Getout(train, dest) (chosen 8 times)
• P_hike: Hitchhike(source, dest) (chosen 0 times)
Learning User Preferences as pHTNs
• Given a set O of plans executed by the user
• Find the generative model H_l that best explains them: H_l = argmax_H p(O | H)
Probabilistic Hierarchical Task Networks (pHTNs), written as weighted reduction schemas:
S → 0.2: A1 B1
S → 0.8: A2 B2
B1 → 1.0: A2 A3
B2 → 1.0: A1 A3
A1 → 1.0: Getin
A2 → 1.0: Buyticket
A3 → 1.0: Getout
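To make the generative reading concrete, here is a minimal sketch that encodes the example pHTN above as weighted reduction rules and samples plans from it top-down. The `RULES` encoding and the `sample_plan` helper are illustrative names, not from the paper.

```python
import random

# Each non-primitive symbol maps to a list of (probability, right-hand side).
# Primitive actions (Getin, Buyticket, Getout) have no entry.
RULES = {
    "S":  [(0.2, ["A1", "B1"]), (0.8, ["A2", "B2"])],
    "B1": [(1.0, ["A2", "A3"])],
    "B2": [(1.0, ["A1", "A3"])],
    "A1": [(1.0, ["Getin"])],
    "A2": [(1.0, ["Buyticket"])],
    "A3": [(1.0, ["Getout"])],
}

def sample_plan(symbol="S"):
    """Sample a plan (an action sequence) top-down from the pHTN."""
    if symbol not in RULES:                    # primitive action: emit it
        return [symbol]
    probs, rhss = zip(*RULES[symbol])
    rhs = random.choices(rhss, weights=probs, k=1)[0]
    plan = []
    for child in rhs:
        plan.extend(sample_plan(child))
    return plan

# ~80% of samples are the "train" ordering ['Buyticket', 'Getin', 'Getout'],
# ~20% the "bus" ordering ['Getin', 'Buyticket', 'Getout'].
print(sample_plan())
```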
Learning pHTNs
• HTNs can be seen as providing a grammar of desired solutions:
  Actions ↔ Words, Plans ↔ Sentences, HTNs ↔ Grammar, HTN learning ↔ Grammar induction
• So learn pHTNs via probabilistic context-free grammar (pCFG) induction
• Assumptions: parameter-less, unconditional
(Running example: the pHTN shown above.)
A Two-Step Algorithm
• Greedy Structure Hypothesizer (GSH): hypothesizes the schema structure
• Expectation-Maximization (EM) phase: refines schema probabilities and removes redundant schemas
Generalizes the Inside-Outside algorithm (Lari & Young, 1990)
Greedy Structure Hypothesizer
• Learns the schema structure
• Works bottom-up over the observed plans
• Prefers recursive schemas to non-recursive ones (see the sketch below)
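The bottom-up flavor can be illustrated with a small chunking routine: repeatedly promote the most frequent adjacent symbol pair in the observed plans to a new non-primitive schema. This is only a sketch of the idea; the paper's GSH additionally prefers recursive schemas (e.g. S → a S) when repetition suggests them, a check omitted here, and `hypothesize_structure` is a hypothetical name.

```python
from collections import Counter

def hypothesize_structure(plans, max_schemas=10):
    """Bottom-up greedy chunking over observed plans (illustrative sketch).

    plans: list of action sequences, e.g. [["Buyticket", "Getin", "Getout"]].
    Returns hypothesized schemas as a dict: new symbol -> (left, right).
    """
    sequences = [list(p) for p in plans]
    schemas = {}
    for i in range(max_schemas):
        # Count adjacent symbol pairs across all partially reduced plans.
        pairs = Counter()
        for seq in sequences:
            pairs.update(zip(seq, seq[1:]))
        if not pairs or pairs.most_common(1)[0][1] < 2:
            break                              # nothing repeats; stop
        pair = pairs.most_common(1)[0][0]
        symbol = f"N{i}"                       # fresh non-primitive action
        schemas[symbol] = pair
        for seq in sequences:                  # reduce every occurrence
            j = 0
            while j < len(seq) - 1:
                if (seq[j], seq[j + 1]) == pair:
                    seq[j:j + 2] = [symbol]
                else:
                    j += 1
    return schemas
```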
EM Phase
• E step: compute each plan's parse trees, keeping the most probable parse tree
• M step: update the selection probability p of each schema s: a_i → a_j a_k
(A sketch of one iteration follows.)
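The EM phase can be sketched as hard EM under the slide's schema forms: unit schemas a_i → action and binary schemas a_i → a_j a_k, each with a selection probability. The E step below finds the most probable parse with a CYK-style Viterbi parser, and the M step re-estimates each schema's probability from its usage counts; the paper's Inside-Outside generalization uses expected counts over all parses instead. Probabilities are assumed positive, and `viterbi_parse` / `em_step` are illustrative names.

```python
import math
from collections import defaultdict

def viterbi_parse(plan, rules):
    """Most probable parse of a plan under rules {(lhs, rhs_tuple): prob}.

    Returns (log-probability, list of schemas used) for the best parse
    rooted at "S", or None if the plan cannot be parsed."""
    n = len(plan)
    best = defaultdict(dict)              # best[(i, j)][symbol] = (logp, used)
    for i, action in enumerate(plan):     # unit schemas a_i -> action
        for (lhs, rhs), p in rules.items():
            if rhs == (action,):
                cand = (math.log(p), [(lhs, rhs)])
                cur = best[(i, i + 1)].get(lhs)
                if cur is None or cand[0] > cur[0]:
                    best[(i, i + 1)][lhs] = cand
    for span in range(2, n + 1):          # binary schemas a_i -> a_j a_k
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (lhs, rhs), p in rules.items():
                    if len(rhs) != 2:
                        continue
                    left = best[(i, k)].get(rhs[0])
                    right = best[(k, j)].get(rhs[1])
                    if left and right:
                        cand = (math.log(p) + left[0] + right[0],
                                left[1] + right[1] + [(lhs, rhs)])
                        cur = best[(i, j)].get(lhs)
                        if cur is None or cand[0] > cur[0]:
                            best[(i, j)][lhs] = cand
    return best[(0, n)].get("S")

def em_step(plans, rules):
    """One hard-EM iteration: parse (E), then re-estimate probabilities (M)."""
    counts = defaultdict(float)
    for plan in plans:
        parse = viterbi_parse(plan, rules)
        if parse is not None:
            for schema in parse[1]:
                counts[schema] += 1.0
    totals = defaultdict(float)
    for (lhs, _), c in counts.items():
        totals[lhs] += c
    # Schemas that no best parse uses drift toward probability 0:
    # this is how redundant schemas get removed.
    return {rule: counts[rule] / totals[rule[0]] if totals[rule[0]] else p
            for rule, p in rules.items()}
```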
Evaluation
Pipeline: H* → observed plans P1, P2, …, Pn → Learner → H_l
• Ideal: user studies (too hard)
• Our approach:
  • Assume H* represents the user's preferences
  • Generate observed plans O using H* (H* → O)
  • Learn H_l from O (O → H_l)
  • Compare H* and H_l via the plan distributions they induce (H* → T*, H_l → T_l)
• Syntactic similarity is not important; only the distributions matter
• Use the KL divergence between the distributions T* and T_l, which measures the distance between them (sketch below)
• Domains: randomly generated; Logistics Planning and Gold Miner
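The distribution comparison itself is simple to sketch. Assuming both plan distributions are estimated by sampling (e.g. with a generator like `sample_plan` above applied to H* and to H_l), build the empirical distributions T* and T_l and compute KL(T* || T_l); the `eps` smoothing constant is an illustrative choice that keeps the divergence finite when H_l assigns no mass to a plan H* generates.

```python
import math
from collections import Counter

def plan_distribution(plans):
    """Empirical distribution over plans, keyed by the action-sequence tuple."""
    counts = Counter(tuple(p) for p in plans)
    total = sum(counts.values())
    return {plan: c / total for plan, c in counts.items()}

def kl_divergence(t_star, t_learned, eps=1e-6):
    """KL(T* || T_l) = sum over plans of T*(plan) * log(T*(plan) / T_l(plan))."""
    return sum(p * math.log(p / t_learned.get(plan, eps))
               for plan, p in t_star.items())

# Hypothetical usage: sample plans from H* and from the learned H_l, then
# compare the two empirical distributions (smaller divergence = closer fit).
# t_star    = plan_distribution([sample_plan() for _ in range(1000)])
# t_learned = plan_distribution([sample_plan() for _ in range(1000)])
# print(kl_divergence(t_star, t_learned))
```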
Rate of Learning and Conciseness (Randomly Generated Domains)
• Rate of learning: the more training plans, the better the learned schemas
• Conciseness:
  • Small domains: only 1 or 2 more non-primitive actions than H*
  • Large domains: many more non-primitive actions (refine structure learning?)
Effectiveness of EM (Randomly Generated Domains)
• Compare the greedily hypothesized schemas with the schemas after EM
• The EM step is very effective in capturing user preferences
"Benchmark" Domains

Logistics Planning
• H*: move by plane or truck; prefer plane; prefer fewer steps
• KL divergence: 0.04
• Recovers plane > truck and fewer steps > more steps

Gold Miner
• H*: get the laser cannon; shoot rock until adjacent to the gold; get a bomb; use the bomb to remove the last wall
• KL divergence: 0.52
• Reproduces the basic strategy
Conclusions & Extensions
• Learn user plan preferences: the learned HTNs capture preferences rather than domain abstractions
• Evaluate predictive power: compare distributions rather than structure
• Preference obfuscation: a poor graduate student who prefers to travel by plane usually travels by car
• See: Learning user plan preferences obfuscated by feasibility constraints, ICAPS'09