Heuristics for Planning
Initial project advisors assigned—please contact them. Project 1 ready; due 2/14. Due to underwhelming interest, note-taking has been suspended.
Before, planning algorithms could synthesize plans of about 6–10 actions in minutes. Significant scale-up in the last 6–7 years: now we can synthesize 100-action plans in seconds, with realistic encodings of the Munich airport! Scalability of planning: the problem is search control!!! The primary revolution in planning in recent years has been domain-independent heuristics to scale up plan synthesis. IJCAI'07 Tutorial T12
Motivation • Ways to improve planner scalability: problem formulation, search space, reachability heuristics • Reachability heuristics are domain (formulation) independent • Work for many search spaces • Flexible: work with most domain features • Overall, complement other scalability techniques • Effective!!
Rover Domain
Classical Planning • Relaxed Reachability Analysis • Types of Heuristics • Level-based • Relaxed Plans • Mutexes • Heuristic Search • Progression • Regression • Plan Space • Exploiting Heuristics
Planning Graph and Search Tree • Envelope of progression tree (relaxed progression) • Proposition lists: union of states at the kth level • Lower-bound reachability information
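The union-of-states view above can be sketched concretely. Below is a minimal relaxed planning-graph expansion in Python, assuming a toy STRIPS encoding where each action is a (preconditions, add-effects) pair and delete effects are ignored; the rover-flavored names are illustrative, not the tutorial's exact domain.

```python
def expand_relaxed_pg(init, actions, goals, max_levels=50):
    """Return proposition layers P0, P1, ... until goals appear or level-off."""
    layers = [frozenset(init)]
    while True:
        props = layers[-1]
        if goals <= props:
            return layers
        # apply every action whose preconditions are reachable; keep only adds
        nxt = set(props)
        for pre, add in actions:
            if pre <= props:
                nxt |= add
        nxt = frozenset(nxt)
        if nxt == props or len(layers) > max_levels:  # leveled off
            return layers
        layers.append(nxt)

# toy domain: drive to a location, then sample soil there
actions = [
    (frozenset({"at(alpha)"}), frozenset({"at(beta)"})),                       # drive
    (frozenset({"at(beta)", "avail(soil,beta)"}), frozenset({"have(soil)"})),  # sample
]
layers = expand_relaxed_pg({"at(alpha)", "avail(soil,beta)"},
                           actions, {"have(soil)"})
print(len(layers) - 1)  # have(soil) first appears at level 2
```

Each layer is the union of all states reachable in that many (relaxed) steps, which is exactly why it gives a lower bound on reachability.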
Level Based Heuristics • The distance of a proposition is the index of the first proposition layer in which it appears • Proposition distance changes when we propagate cost functions (described later) • What is the distance of a set of propositions?? • Max: maximum proposition distance • Sum: summation of proposition distances • Set-Level: index of the first proposition layer where all goal propositions appear; admissible; gets better with mutexes, otherwise same as Max
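A minimal sketch of the three set heuristics, assuming proposition levels have already been computed; the level table mirrors the 2, 3, 3 example values used on the next slide, and the proposition names g1–g3 are made up:

```python
def lev(p, levels):
    return levels.get(p, float("inf"))  # never appears => unreachable

def h_max(goals, levels):
    """Maximum proposition distance."""
    return max(lev(p, levels) for p in goals)

def h_sum(goals, levels):
    """Summation of proposition distances (inadmissible)."""
    return sum(lev(p, levels) for p in goals)

def h_set_level(goals, layers):
    """Index of the first layer containing ALL goals (mutexes ignored here)."""
    for i, props in enumerate(layers):
        if goals <= props:
            return i
    return float("inf")

levels = {"g1": 2, "g2": 3, "g3": 3}
goals = {"g1", "g2", "g3"}
print(h_max(goals, levels))  # 3
print(h_sum(goals, levels))  # 8
```

Without mutexes, set-level coincides with max; with mutex propagation the first layer where the goals appear jointly non-mutex can be strictly later, which is where set-level gains.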
Example of Level Based Heuristics set-level(sI, G) = 3; max(sI, G) = max(2, 3, 3) = 3; sum(sI, G) = 2 + 3 + 3 = 8
How do Level-Based Heuristics Break? [Figure: a planning graph in which actions B1…B100 each supply one of the propositions p1…p100, all of which action A needs to produce g.] The goal g is reached at level 2, but requires 101 actions to support it.
Relaxed Plan Heuristics • When level does not reflect distance well, we can find a relaxed plan. • A relaxed plan is a subgraph of the planning graph, where: • Every goal proposition is in the relaxed plan at the last level • Every proposition in the relaxed plan has a supporting action in the relaxed plan • Every action in the relaxed plan has its preconditions supported. • Relaxed plans are not admissible, but are generally effective. • Finding the optimal relaxed plan is NP-hard, but finding a greedy one is easy. Later we will see how "greedy" can change. • Later, in over-subscription planning, we'll see that the effort of finding an optimal relaxed plan is worthwhile
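The extraction described above can be sketched as a backward sweep over the layers. This is a simplified greedy version, assuming actions are (name, preconditions, add-effects) triples and ties are broken by taking the first supporting action found; the toy rover layers are illustrative.

```python
def extract_relaxed_plan(layers, actions, goals):
    """layers[i]: props at level i; actions: list of (name, pre, add).
    Returns a set of (action, level) pairs forming a relaxed plan."""
    n = len(layers) - 1
    needed = {n: set(goals)}
    plan = set()
    for lvl in range(n, 0, -1):
        for p in needed.get(lvl, set()):
            if p in layers[lvl - 1]:
                needed.setdefault(lvl - 1, set()).add(p)  # noop support
                continue
            # greedily pick the first action that adds p and applies earlier
            for name, pre, add in actions:
                if p in add and pre <= layers[lvl - 1]:
                    plan.add((name, lvl - 1))
                    needed.setdefault(lvl - 1, set()).update(pre)
                    break
    return plan

layers = [
    {"at(a)", "avail(soil,b)"},
    {"at(a)", "avail(soil,b)", "at(b)"},
    {"at(a)", "avail(soil,b)", "at(b)", "have(soil)"},
]
actions = [
    ("drive(a,b)", {"at(a)"}, {"at(b)"}),
    ("sample(soil,b)", {"at(b)", "avail(soil,b)"}, {"have(soil)"}),
]
plan = extract_relaxed_plan(layers, actions, {"have(soil)"})
print(len(plan))  # 2: drive then sample
```

The heuristic value is the number of actions in the relaxed plan; swapping the tie-breaking rule is one way the "greedy" choice changes later.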
Example of Relaxed Plan Heuristic [Figure: relaxed plan extracted from the planning graph.] Identify goal propositions; support each goal proposition individually; count actions: RP(sI, G) = 8
Results [Chart: relaxed-plan heuristics vs. level-based heuristics.]
Optimizations in Heuristic Computation • Taming space/time costs • Bi-level planning graph representation • Partial expansion of the PG (stop before level-off) • It is FINE to cut corners when using the PG for heuristics (instead of search)!! • Branching factor can still be quite high • Use actions appearing in the PG (complete) • Select actions in lev(S) vs. levels-off (incomplete) • Consider actions appearing in the RP (incomplete)
Adjusting for Negative Interactions • Until now we assumed actions only interact positively. What about negative interactions? • Mutexes help us capture some negative interactions • Types • Actions: interference / competing needs • Propositions: inconsistent support • Binary mutexes are the most common and practical • (|A| + 2|P|)-ary mutexes would allow us to solve the planning problem with a backtrack-free Graphplan search • An action layer may have |A| actions and 2|P| noops • A serial planning graph assumes all non-noop actions are mutex
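The two action-mutex conditions can be sketched directly from their definitions. The snippet assumes actions are (preconditions, adds, deletes) triples of sets; the drive/sample pair is an illustrative reconstruction of the rover interference case, not the tutorial's exact encoding.

```python
def interference(a1, a2):
    """a1 deletes a precondition or add-effect of a2 (or vice versa)."""
    pre1, add1, del1 = a1
    pre2, add2, del2 = a2
    return bool(del1 & (pre2 | add2)) or bool(del2 & (pre1 | add1))

def competing_needs(a1, a2, prop_mutexes):
    """Some precondition of a1 is mutex with some precondition of a2."""
    pre1, pre2 = a1[0], a2[0]
    return any(frozenset({p, q}) in prop_mutexes for p in pre1 for q in pre2)

# drive(a,b) deletes at(a); sample(soil,a) needs at(a) => interference
drive = (frozenset({"at(a)"}), frozenset({"at(b)"}), frozenset({"at(a)"}))
sample = (frozenset({"at(a)", "avail(soil,a)"}),
          frozenset({"have(soil)"}), frozenset())
print(interference(drive, sample))  # True
```

Proposition mutexes (inconsistent support) would then be derived at the next layer: two propositions are mutex when every pair of their supporters is mutex.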
Binary Mutexes [Figure: levels P0–A0–P1–A1–P2 of the Rover planning graph, annotated with mutexes.] • At level 1, have(soil)'s only supporter is mutex with at()'s only supporter: inconsistent support • sample needs at(), drive negates at(): interference • At level 2, have(soil) has a supporter not mutex with a supporter of at(): feasible together • Set-Level(sI, {at(), have(soil)}) = 2; Max(sI, {at(), have(soil)}) = 1
Distance of a Set of Literals • lev(p): index of the first level at which p enters the planning graph • lev(S): index of the first level where all propositions in S appear, non-mutexed; if there is no such level, then ∞ if the graph has been grown to level-off, else k+1 (k is the current length of the graph) • Sum: h(S) = Σ_{p ∈ S} lev({p}) • Set-Level: h(S) = lev(S) • Admissible: set-level (also with memos), partition-k; not admissible: adjusted sum, combo
Adjusting the Relaxed Plans [AAAI 2000] • Start with the RP heuristic and adjust it to take subgoal interactions into account • Negative interactions in terms of "degree of interaction" • Positive interactions in terms of co-achievement links • Ignore negative interactions when accounting for positive interactions (and vice versa) • It is NP-hard to find a plan when there are mutexes, and even NP-hard to find an optimal relaxed plan; it is easier to add a penalty to the heuristic for ignored mutexes: HAdjSum2M(S) = length(RelaxedPlan(S)) + max_{p,q ∈ S} δ(p, q), where δ(p, q) = lev({p, q}) − max{lev(p), lev(q)} /* degree of negative interaction */
Anatomy of a State-space Regression planner Problem: Given a set of subgoals (regressed state) estimate how far they are from the initial state [AAAI 2000; AIPS 2000; AIJ 2002; JAIR 2003]
Rover Example in Regression [Figure: regressing sG = {comm(soil), comm(rock), comm(image)} back through states s1–s4 via commun(rock), commun(image), sample(rock), sample(image).] Sum(sI, s4) = 0 + 0 + 1 + 1 + 2 = 4; Sum(sI, s3) = 0 + 1 + 2 + 2 = 5. Sum(sI, s4) should be ∞: s4 is inconsistent. How do we improve the heuristic??
AltAlt Performance [Charts: level-based vs. adjusted RP heuristics on the Logistics and Scheduling problem sets from IPC 2000.]
Plan Space Search In the beginning it was all POP. Then it was cruelly UnPOPped. The good times return with Re(vived)POP.
POP Algorithm 1. Plan selection: select a plan P from the search queue 2. Flaw selection: choose a flaw f (open condition or unsafe link) 3. Flaw resolution: if f is an open condition, choose an action S that achieves f; if f is an unsafe link, choose promotion or demotion; update P; return NULL if no resolution exists 4. If there is no flaw left, return P [Figure: initial plan S0 → Sinf with goals g1, g2; refinement adds steps S1, S2, S3 with causal links, open conditions oc1, oc2, and a threat involving ~p.] • Choice points • Flaw selection (open condition? unsafe link? a non-backtrack choice) • Flaw resolution / plan selection (how to select and rank partial plans?)
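A stripped-down version of this loop, handling only open-condition flaws (unsafe links and promotion/demotion are omitted, so it degenerates to a least-commitment search over open conditions), might look like the following; the action encoding and rover domain are illustrative assumptions:

```python
from collections import deque

def pop_search(goals, init, actions):
    """actions: list of (name, pre, add). Returns step names or None."""
    # a partial plan here is just (chosen steps, remaining open conditions)
    queue = deque([((), tuple(sorted(goals - init)))])
    seen = set()
    while queue:
        steps, open_conds = queue.popleft()          # 1. plan selection (FIFO)
        if not open_conds:
            return list(steps)                       # 4. flaw-free plan found
        flaw, rest = open_conds[0], open_conds[1:]   # 2. flaw selection
        for name, pre, add in actions:               # 3. flaw resolution
            if flaw in add:
                new_open = tuple(sorted(set(rest) | (pre - init)))
                node = (steps + (name,), new_open)
                if node not in seen:
                    seen.add(node)
                    queue.append(node)
    return None  # no resolution exists

acts = [("drive(a,b)", frozenset({"at(a)"}), frozenset({"at(b)"})),
        ("sample(soil,b)", frozenset({"at(b)", "avail(soil,b)"}),
         frozenset({"have(soil)"}))]
plan = pop_search({"have(soil)"}, {"at(a)", "avail(soil,b)"}, acts)
print(plan)
```

Note the two choice points appear as the queue discipline (plan selection) and the `open_conds[0]` pick (flaw selection); a real POP would rank partial plans with the distance heuristics discussed next.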
PG Heuristics for Partial Order Planning • Distance heuristics to estimate the cost of partially ordered plans (and to select flaws) • If we ignore negative interactions, the set of open conditions can be seen as a regression state • Mutexes used to detect indirect conflicts in partial plans • A step threatens a link if there is a mutex between the link condition and the step's effect or precondition • Post disjunctive precedences and use propagation to simplify
Regression and Plan Space [Figure: side-by-side comparison of a regression branch (sG back through s1–s4) and the corresponding partial-order plans; the open conditions of each partial plan match the propositions of the regression state.]
RePOP's Performance • RePOP implemented on top of UCPOP • Dramatically better than any other partial order planner before it • Competitive with Graphplan and AltAlt • VHPOP carried the torch at IPC 2002 • Written in Lisp, runs on Linux, 500MHz, 250MB • You see, pop, it is possible to Re-use all the old POP work! [IJCAI, 2001]
Exploiting Planning Graphs • Restricting action choice • Use actions from: • Last level before level-off (complete) • Last level before goals (incomplete) • First level of relaxed plan (incomplete): FF's helpful actions • Only action sequences in the relaxed plan (incomplete): YAHSP • Reducing state representation • Remove static propositions (those whose truth value never changes); these can be identified from the final proposition layer.
Classical Planning Conclusions • Many heuristics • Set-Level, Max, Sum, Relaxed Plans • Heuristics can be improved by adjustments • Mutexes • Useful for many types of search • Progression, Regression, POCL
Cost-Based Planning
Cost-based Planning • Propagating Cost Functions • Cost-based Heuristics • Generalized Level-based heuristics • Relaxed Plan heuristics
Rover Cost Model
Cost Propagation [Figure: Rover planning graph levels P0–P2 annotated with propagated cost values on propositions and actions.] Cost reduces because of a different supporter at a later level.
Cost Propagation (cont'd): 1-lookahead [Figure: levels P2–P4 of the cost-annotated planning graph; proposition costs continue to decrease as cheaper supporters appear at later levels.]
Terminating Cost Propagation • Stop when: • goals are reached (no-lookahead) • costs stop changing (∞-lookahead) • k levels after goals are reached (k-lookahead)
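Cost propagation with a pluggable termination test can be sketched as a fixpoint loop. This uses max-propagation over preconditions and additive action costs, with `lookahead_stop` standing in for the no-/k-lookahead tests above; the actions and their costs are made-up numbers:

```python
def propagate_costs(init, actions, lookahead_stop):
    """actions: list of (cost, pre, add). Iterate until costs stop changing
    (∞-lookahead) or lookahead_stop(costs) says to stop earlier."""
    costs = {p: 0 for p in init}
    while True:
        changed = False
        for c, pre, add in actions:
            if all(p in costs for p in pre):
                c_pre = max((costs[p] for p in pre), default=0)
                for q in add:
                    if costs.get(q, float("inf")) > c + c_pre:
                        costs[q] = c + c_pre  # cheaper supporter found
                        changed = True
        if not changed or lookahead_stop(costs):
            return costs

acts = [(10, frozenset({"at(a)"}), frozenset({"at(b)"})),
        (20, frozenset({"at(b)"}), frozenset({"have(soil)"})),
        (25, frozenset({"at(a)"}), frozenset({"have(soil)"}))]
costs = propagate_costs({"at(a)"}, acts, lambda c: False)
print(costs["have(soil)"])  # 25: direct action beats drive+sample (30)
```

The example reproduces the effect noted two slides back: a proposition's cost drops when a different, cheaper supporter becomes applicable.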
Guiding Relaxed Plans with Costs [Figure: relaxed-plan extraction on the cost-annotated graph.] Start extraction at the last level (where the goal proposition is cheapest)
Cost-Based Planning Conclusions • Cost-Functions: • Remove false assumption that level is correlated with cost • Improve planning with non-uniform cost actions • Are cheap to compute (constant overhead)
Partial Satisfaction (Over-Subscription) Planning
Partial Satisfaction Planning • Selecting Goal Sets • Estimating goal benefit • Anytime goal set selection • Adjusting for negative interactions between goals
Partial Satisfaction (Oversubscription) Planning In many real-world planning tasks, the agent often has more goals than it has resources to accomplish. Example: Rover mission planning (MER). We need automated support for over-subscription / partial satisfaction planning: actions have execution costs, goals have utilities, and the objective is to find the plan with the highest net benefit.
Adapting PG heuristics for PSP • Challenges: • Need to propagate costs on the planning graph • The exact set of goals to pursue is not clear • Interactions between goals • The obvious approach of considering all 2^n goal subsets is infeasible • Idea: select a subset of the top-level goals up front • Challenge: goal interactions • Approach: estimate the net benefit of each goal in terms of its utility minus the cost of its relaxed plan • Bias the relaxed plan extraction to (re)use the actions already chosen for other goals
Goal Set Selection in the Rover Problem [Figure: iterative goal-set evaluation. Individual net benefits (utility − cost): comm(soil) 20 − 35 = −15, comm(rock) 60 − 40 = 20, comm(image) 50 − 25 = 25, with costs estimated by cost propagation, plain RP, and biased RP; larger sets evaluated as 70 − 60 = 20, 110 − 65 = 45, and 130 − 100 = 30.]
SapaPS (anytime goal selection) • A* progression search • g-value: net benefit of the plan so far • h-value: relaxed-plan estimate for the best goal set • A relaxed plan is found for all goals, then goals are removed iteratively until net benefit does not increase • Returns plans with increasing g-values.
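The iterative goal-removal idea can be sketched as a greedy loop over net benefit. The cost model below is a deliberately simplistic additive one (no action sharing between goals), and the utility/cost numbers echo the Rover goal-selection example; in a real planner `est_cost` would be a relaxed-plan estimate:

```python
def select_goal_set(goals, utility, est_cost):
    """est_cost(goal_set) -> estimated plan cost (e.g. relaxed-plan cost).
    Iteratively drop the goal whose removal most improves net benefit."""
    chosen = set(goals)

    def net(gs):
        return sum(utility[g] for g in gs) - est_cost(gs) if gs else 0

    while chosen:
        # goal whose removal yields the best remaining net benefit
        best = max(chosen, key=lambda g: net(chosen - {g}))
        if net(chosen - {best}) > net(chosen):
            chosen.discard(best)
        else:
            break  # no removal improves net benefit
    return chosen

utility = {"comm(soil)": 20, "comm(rock)": 60, "comm(image)": 50}
# toy additive cost model: each goal has its own cost, no sharing
cost = {"comm(soil)": 35, "comm(rock)": 40, "comm(image)": 25}
est = lambda gs: sum(cost[g] for g in gs)
print(select_goal_set(set(utility), utility, est))
```

With these numbers, dropping comm(soil) raises net benefit from 30 to 45, after which no further removal helps, so the anytime search would pursue {comm(rock), comm(image)}.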
Some Empirical Results for AltAltps [Chart.] Exact algorithms based on MDPs don't scale at all. [AAAI 2004]
Adjusting for Negative Interactions (AltWlt) • Problem: • What if the a priori goal set is not achievable because of negative interactions? • What if the greedy algorithm hits a bad local optimum? • Solution: • Do not consider mutex goals • Add a penalty for goals whose relaxed plan has mutexes • Use an interaction factor to adjust cost, similar to the adjusted sum heuristic: max_{g1, g2 ∈ G} {lev({g1, g2}) − max(lev(g1), lev(g2))} • Find the best goal set for each goal
Goal Utility Dependencies • One shoe: cost 50, utility 0 (BAD); two shoes: cost 100, utility 500 (GOOD) • High-res photo utility 150, soil sample utility 100; achieving both goals: additional 200 • High-res photo utility 150, low-res photo utility 100; achieving both goals: remove 80
Dependencies Goal interactions exist as two distinct types: • Cost dependencies: goals share actions in the plan trajectory; defined by the plan. Exists in classical planning, but is a larger issue in PSP. Handled by all PSP planners (AltWlt, Optiplan, SapaPS). • Utility dependencies: goals may interact in the utility they give; explicitly defined by the user. No need to consider this in classical planning. Handled by SPUDS, our planner based on SapaPS. See talk Thurs. (11th) in session G4 (Planning and Scheduling II).
PSP Conclusions • Goal set selection • A priori for regression search • Anytime for progression search • Both types of search use greedy goal insertion/removal to optimize the net benefit of relaxed plans
Overall Conclusions • Relaxed reachability analysis • Concentrates on positive interactions and independence by ignoring negative interactions • Estimates improve as more negative interactions are accounted for • Heuristics can estimate and aggregate the costs of goals, or find relaxed plans • Propagate numeric information to adjust estimates: cost, resources, probability, time • Solving hybrid problems is hard • Extra approximations: phased relaxation, adjustments/penalties
Why do we love PG Heuristics? • They work! • They are "forgiving" • You don't like doing mutexes? Okay. • You don't like growing the graph all the way? Okay. • Allow propagation of many types of information • Level, subgoal interaction, time, cost, world support, probability • Support phased relaxation • E.g. ignore mutexes and resources and bring them back later… • Graph structure supports other synergistic uses • e.g. action selection • Versatility…
Versatility of PG Heuristics • PG variations: serial, parallel, temporal, labelled • Propagation methods: level, mutex, cost, label • Planning problems: classical, resource/temporal, conformant • Planners: regression, progression, partial order, Graphplan-style