Metric/Temporal Planning
Metric Temporal Planning • MTP adds time and resources to planning • Special cases: • TP: Temporal planning • RP: Resource planning
Issues with Time • Changes brought by the introduction of time into Planning can be grouped into two categories • Changes brought by having a metric (clock) time • I.e., there is a clock with respect to which we can specify events • Changes brought by durations to actions • I.e., actions are not instantaneous • Without metric time, a plan has just a beginning and ending point. Metric time allows us to talk about all time points (and intervals) during the execution of the plan. Changes brought by metric time include • Exogenous events • Special case: Timed initial literals (in the initial state we can state that some fluent becomes true at a specific time in the future during execution) • Deadline goals • We can state that different goals need to be made true by different deadline times (instead of all goals being true at the end) • Durative goals • We can state that a certain fluent must have a specific value over an entire interval
Issues with Time (contd.) • Durations of actions may be static or “dynamic” • Duration depends on the context, e.g. the time to fill your gas tank depends on how empty the tank is to begin with • Advanced issues: Uncertain durations… • With instantaneous actions, an action has just “before” and “after”: preconditions must hold “before” and effects will hold “after”. Durative actions have “before” and “after” as well as “during”. With metric time (i.e., an external clock), we can refer to all these points. We can now ask: • When are preconditions needed? • Are they needed at a single point or over a duration? • When are effects given? Are they point effects or “durative” effects (which are guaranteed over a certain duration)? • Note that because actions have durations, they can have multiple effects on a single fluent at different times • E.g. the action can make fluent P true at start, false after 10 sec, true again after another 10 sec etc. • A default assumption is that all preconditions are needed at the beginning and must hold during the action’s entire duration, and that all effects will be available at the end of the action • E.g. consider the “Grading homeworks” action: when are the homeworks needed? When are the grades available? What does your teacher tell you?
Issues with Time (contd.) • Durative actions bring more pointed meaning to “concurrency”. • Concurrency is not just a luxury (to reduce makespan), but is often a necessity (e.g. burn a match, and cross the dark corridor while the match is burning) • Suppose I tell you that a plan P contains actions A1…A10, with durations d1…d10. What is the makespan (execution duration) of P? • Makespan(P) >= max(d1…d10) • If Makespan(P) = Sum(d1…d10), then it is a strictly serial plan • If Makespan(P) > Sum(d1…d10), then there is idle time in the plan • If Makespan(P) < Sum(d1…d10), then there is concurrency • Actions don’t need to start right after the preceding action • Think of the bank teller gossiping with his colleague in between servicing each customer • Planned idle/slack time may not always be a bad thing; it can sometimes improve the robustness of the plan • Think of three travel plans involving connections in Minneapolis: Plan 1 schedules 5 min for connection time; plan 2 schedules 1 hour; plan 3 schedules 2 days. Which one is better (all else being equal)?
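The makespan cases on this slide can be checked mechanically once start times are known. The following is a minimal sketch, assuming a plan is given as hypothetical (start_time, duration) pairs; the function names and the sample schedules are illustrative and not part of the slides.

```python
from typing import List, Tuple

# Assumption: a plan is a list of (start_time, duration) pairs, one per action.
Schedule = List[Tuple[float, float]]

def makespan(schedule: Schedule) -> float:
    """Clock time from the earliest start to the latest end."""
    start = min(s for s, _ in schedule)
    end = max(s + d for s, d in schedule)
    return end - start

def classify(schedule: Schedule) -> str:
    """Apply the slide's three cases: compare makespan with the sum of durations."""
    total = sum(d for _, d in schedule)
    span = makespan(schedule)
    if span < total:
        return "concurrent"
    if span > total:
        return "idle time"
    return "serial"  # the slide's reading of Makespan(P) = Sum(d1…dn)

# Burn a match (15) and cross the corridor (10) while it burns.
print(classify([(0, 15), (0, 10)]))
# The same two actions one after the other, then with a gap in between.
print(classify([(0, 15), (15, 10)]))
print(classify([(0, 15), (20, 10)]))
```

Note that the makespan can never drop below the longest single duration, which is the first inequality on the slide.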
Issues with Resources (continuous quantities) • Resources: Actions may consume/produce (continuous quantity) “resources” • The main consequence is that we have numeric state variables, instead of just boolean (or multi-valued) ones (multi-valued does not mean numeric: a variable can take red, blue, green as values). • Actions can update a numeric state variable (whereas they just assign a non-numeric one) • Resource qty after action := Some-function-of(Resource qty before action, action parameters) • Updates can be linear OR non-linear • When combined with durative actions, updates can be discrete (i.e., happen all at once at the end of the action) OR continuous (i.e., happen at some given rate during the action) • Planning issues: How to efficiently reason with continuous quantities during planning
PDDL 2.1 Standard: Summary • Durations • Static and dynamic durations allowed • Also allows duration inequalities • Preconditions • Can be “at start” or “over all” (throughout the duration) • Doesn’t model preconditions being needed for arbitrary durations in the middle • Effects • Can be “at start” or “at end” • This makes effects “discrete” • Numeric quantities • Can be present in the preconditions or effects • Presence in the effects can be “discrete” (“at start”/“at end”) or continuous • Continuous change specified by giving a “rate” at which the quantity changes • Non-linear rates are harder
PDDL 2.1 (Level 2): Pure Durative Actions

(:durative-action burn_match
  :parameters ()
  :duration (= ?duration 15)
  :condition (and (at start have_match)
                  (at start have_strikepad))
  :effect (and (at start have_light)
               (at end (not have_light))))

(:durative-action cross_cellar
  :parameters ()
  :duration (= ?duration 10)
  :condition (and (at start have_light)
                  (over all have_light)
                  (at start at_steps))
  :effect (and (at start (not at_steps))
               (at start crossing)
               (at end at_fuse_box)))

[Figure: timeline of the two actions. BURN_MATCH (dur: 15): conditions have_match, have_strikepad; effects have_light at start, ~have_light at end. CROSS_CELLAR (dur: 10): conditions have_light, at_steps; effects ~at_steps and crossing at start, at_fuse_box at end.]
PDDL 2.1 Level 3: Durative actions and numeric quantities (but discrete effects) • The entire energy to be consumed is “encumbered” at the very beginning (even though it gets consumed slowly over the full duration).
PDDL 2.1 Level 4: Durative actions and numeric quantities (with continuous effects)
Issues in modeling continuous change by discrete vs. continuous effects • Consider the action of boiling a pan of water • The quantity “temperature of water” changes continuously over the duration of the action • We can ignore continuous effects by specifying that the temperature is 100 °C at the end • Easy to handle; can only access the temperature at the end of the action; reduces concurrency (what if we also put a blow torch to the pan to “hasten” the process?) • Or we can specify that the temperature of the water rises at a linear rate until it reaches 100 °C • Harder to handle; but allows more concurrency (the total rate of increase is the sum of all the individual rates of increase)
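To make the "rates add up" point concrete, here is a small sketch of the continuous-effect view of boiling. The starting temperature and the two heating rates are invented numbers for illustration only; the slides give no figures.

```python
from typing import Sequence

def time_to_reach(target: float, start: float, rates: Sequence[float]) -> float:
    """Time until `target` is reached when concurrent actions each add a linear rate.

    Assumption: every concurrent heating action contributes a constant rate
    (degrees per second) and the total rate is the sum of the individual rates.
    """
    total_rate = sum(rates)
    if total_rate <= 0:
        raise ValueError("the quantity never reaches the target")
    return (target - start) / total_rate

# Hypothetical numbers: water starts at 20 °C, the stove adds 2 °C/s.
print(time_to_reach(100, 20, [2]))     # stove alone
# Adding a blow torch at 3 °C/s concurrently shortens the boiling time.
print(time_to_reach(100, 20, [2, 3]))  # stove plus torch
```

With a discrete "100 °C at the end" effect, the second scenario cannot be expressed: the duration of the boiling action is fixed in advance and a concurrent heater has nothing to add to.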
Compiling durative actions into instantaneous ones • A durative action A that has only at-start, at-end and over-all conditions can be modeled in terms of two coupled instantaneous actions As and Ae • As gets all the at-start conditions and effects • Ae gets all the at-end conditions and effects • An “invariant” (think of it as an Interval Preservation Constraint) from As to Ae for all the “over-all” preconditions [Figure: action A with conditions ps, po, pe and effects +es, +ee, split into As (ps, +es) and Ae (pe, +ee), with po holding between them]
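The compilation described on this slide can be sketched as a small data transformation. The class and field names below are hypothetical, chosen only for illustration; the example action is cross_cellar from the Level 2 slide.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class DurativeAction:          # hypothetical container, not a PDDL parser
    name: str
    duration: float
    cond_start: FrozenSet[str] = frozenset()
    cond_over_all: FrozenSet[str] = frozenset()
    cond_end: FrozenSet[str] = frozenset()
    eff_start: FrozenSet[str] = frozenset()
    eff_end: FrozenSet[str] = frozenset()

@dataclass(frozen=True)
class InstantAction:
    name: str
    pre: FrozenSet[str]
    eff: FrozenSet[str]

@dataclass(frozen=True)
class Invariant:
    """Conditions that must hold from `start` to `end`, which are `gap` apart."""
    start: str
    end: str
    conditions: FrozenSet[str]
    gap: float

def compile_action(a: DurativeAction) -> Tuple[InstantAction, InstantAction, Invariant]:
    a_s = InstantAction(a.name + "_start", a.cond_start, a.eff_start)  # As
    a_e = InstantAction(a.name + "_end", a.cond_end, a.eff_end)        # Ae
    inv = Invariant(a_s.name, a_e.name, a.cond_over_all, a.duration)
    return a_s, a_e, inv

cross_cellar = DurativeAction(
    name="cross_cellar", duration=10,
    cond_start=frozenset({"have_light", "at_steps"}),
    cond_over_all=frozenset({"have_light"}),
    eff_start=frozenset({"not at_steps", "crossing"}),
    eff_end=frozenset({"at_fuse_box"}),
)
start, end, invariant = compile_action(cross_cellar)
print(start.name, end.name, sorted(invariant.conditions))
```

The invariant carries the duration so that a planner using the two halves still knows they are coupled: Ae must occur exactly `gap` time units after As, with the over-all conditions protected in between.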
Plan representation • An executable plan must provide • the actions that need to be executed • the start times for each of the actions • Or a set of simple temporal constraints on the set of actions (simple temporal constraints are a generalization of partial orders) • E.g. A1 --[4,5]--> A2 (means 4 <= ST(A2) – ST(A1) <= 5) • Plan views: PERT and Gantt charts • The Gantt chart is what is shown on the right • PERT shows the causal links [Figure: Gantt chart with actions A1, A2, A3; labels Q, At(truck,B), Drive(cityA,cityB)]
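A set of simple temporal constraints can be checked against concrete start times with a few lines of code. This is a minimal sketch; the tuple layout (a, b, lo, hi), read as lo <= ST(b) - ST(a) <= hi, is an assumed encoding, not a notation from the slides.

```python
from typing import Dict, List, Tuple

# Assumed encoding: (a, b, lo, hi) means lo <= ST(b) - ST(a) <= hi.
Constraint = Tuple[str, str, float, float]

def satisfies(start_times: Dict[str, float], constraints: List[Constraint]) -> bool:
    """True if the given start times respect every simple temporal constraint."""
    return all(lo <= start_times[b] - start_times[a] <= hi
               for a, b, lo, hi in constraints)

INF = float("inf")
constraints = [
    ("A1", "A2", 4, 5),    # the slide's example: A2 starts 4 to 5 units after A1
    ("A2", "A3", 0, INF),  # a plain ordering "A2 no later than A3" is a special case
]

print(satisfies({"A1": 0, "A2": 4.5, "A3": 9}, constraints))
print(satisfies({"A1": 0, "A2": 6, "A3": 9}, constraints))
```

The second constraint shows in what sense these constraints generalize partial orders: an ordering is just an interval with an unbounded upper end.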
Problem Representation • Achievement goals are specified as a list of pairs <pi,ti> where pi needs to hold by time ti • ti is the deadline by which pi must hold. It can be metric time (e.g. make clear(b) true by 2pm) • If ti is omitted we will assume that pi is a non-deadline goal (must be true by the time the plan is done) • “Persist goals” are specified as a condition and an interval over which it must hold • A persist goal may be supported by different actions for the different parts of the duration (“goal reduction” a la ZENO) • E.g. striking multiple matches to have light over a duration
Plan Quality Measures • Makespan: Clock time for the execution of the plan (more concurrency means lower makespan) • Slack: The difference between the deadline for a goal and the time by which the plan achieves it • Tardiness is negative slack • Optimize max/min/average slack/tardiness measures • Cost: Sum of costs of all the actions • Can be split into multiple dimensions, one corresponding to each resource
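The slack and tardiness measures can be computed per goal and then aggregated. The sketch below assumes each goal is given as a hypothetical (deadline, achieved_at) pair, and it reads "tardiness is negative slack" as max(0, -slack); both the data layout and that reading are assumptions.

```python
from typing import Dict, Tuple

def slack_report(goals: Dict[str, Tuple[float, float]]) -> dict:
    """goals maps a goal name to (deadline, time at which the plan achieves it)."""
    slack = {g: deadline - achieved for g, (deadline, achieved) in goals.items()}
    # Assumed reading: a goal is tardy by the amount its slack is below zero.
    tardiness = {g: max(0, -s) for g, s in slack.items()}
    values = list(slack.values())
    return {
        "slack": slack,
        "tardiness": tardiness,
        "min_slack": min(values),
        "max_slack": max(values),
        "avg_slack": sum(values) / len(values),
    }

# Hypothetical goals: one achieved 2 units early, one 2 units late.
report = slack_report({"clear_b": (14, 12), "at_fuse_box": (8, 10)})
print(report["slack"], report["tardiness"])
```

Which aggregate to optimize (minimum, maximum or average) is a modeling choice; the report simply exposes all three.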
Concurrency • Two actions are concurrent if their execution durations overlap in time • A plan is concurrent if it has concurrently executing actions • If the makespan of a plan is less than the sum of the durations of the actions in the plan, then the plan has concurrency • A problem requires concurrency if every solution plan for the problem is concurrent • Note that a problem may have sequential solutions, but for optimality reasons it may have to go for concurrent solutions • A domain requires concurrency if any of its problems requires concurrency • One distinguishing feature of temporal planning domains is that they may have problems that require concurrency. • Interesting factoid: Several of the planners that won the temporal planning competition could not actually solve problems requiring concurrency • Another interesting factoid: Most of the benchmark domains actually didn’t have problems that required concurrency [Cushing et al. IJCAI-07; ICAPS-07]
Looking at STRIPS Actions from PDDL2.1 Vantage Point • How best to view non-durative actions? • Instantaneous • Makes it hard to provide physical semantics (no change is instantaneous) • Epsilon duration with only over-all preconditions and at-end postconditions • We can show that domains with this type of action can never have problems that require concurrency
TGP-style durative actions • A PDDL-2.1 action is a TGP-style durative action if • All preconditions are “over-all” preconditions • All effects are “at-end” effects • It can be shown that domains in which all actions are TGP-style will not require concurrency • Concurrency may still be needed for makespan optimization
Temporal Gap • A PDDL-2.1 style action is said to have a temporal gap if there is no single time point in the action where all the preconditions and effects of the action must hold • Epsilon-duration STRIPS actions have no temporal gap • TGP-style actions have no temporal gap • All the preconditions and effects must hold together at the end point of the action • If none of the actions in a domain have a temporal gap, then that domain cannot have problems with required concurrency • “Duration” is like a cost measure
Add… • The issue of time—dense vs. integer • Rintanen’s complexity issue—R.C. with the same action.. • Non-RC plans can be compiled 1-1 • A huge modeling jump
Some Brand Names • Planners that can handle similar types of temporal and resource constraints: • TLPlan, HSTS, IxTeT, Zeno, Sapa, LPG • TLPlan, Sapa are progression-based planners • HSTS, IxTeT, Zeno are partial-order-based planners • TLPlan, HSTS are domain-customized planners; the rest are domain independent • Planners that can handle a subset of constraints: • Only temporal: TGP, TPG, LPGP • Only resources: LPSAT, GRT-R, Kautz-Walser, Metric-FF • Subset of temporal and resource constraints: TP4, Resource-IPP • LPGP and LPSAT are “loosely-coupled” systems. LPSAT connects SAT and LP solvers; LPGP connects Graphplan and an LP solver • Issues of how “tight” the loose connection is. • TGP, TPG, LPGP are Graphplan-based • LPSAT is based on SAT encodings being sent to LP solvers • Kautz-Walser is based solely on LP encodings
State of the Art (as of IPC 2002) (revised for IPC 2004) • At IPC 2002, the PDDL 2.1 standard had three levels • Level 1: STRIPS/ADL • Level 2: +Durative actions • FF, LPG, Sapa, SGPlan (extends LPG) • Level 3: +Numeric quantities with discrete change • Sapa, LPG, SGPlan (extends LPG) • Level 4: +Continuous change • None at IPC • Some planners can handle it “in theory” but none are scalable
Approaches for MTP • In theory, pretty much every one of the approaches we saw for classical planning can be (and have been) extended to MTP (with varying degrees of scalability) • There are some interesting tradeoffs • PO planners are easiest to extend to support the concurrency needed for durative actions • Have harder time handling resources (because resource consumption depends on exactly what actions occurred before this time point) • Progression planners easiest to extend to support resource consuming actions • But harder time handling concurrency (need to consider “advancing clock” as a separate option in addition to applying one of the actions)
Our Road Map • Will focus on conjunctive planning approaches—with special attention to Sapa • action models • Using PDDL2.1 standard • how to model the search • Progression; Regression; PO planning • how to extract good heuristics Done
Action Representation • Durative, with EA = SA + DA (end time = start time + duration) • Instantaneous effects e at time te = SA + d, 0 <= d <= DA • Preconditions need to be true at the starting point, and protected during a period of time d, 0 <= d <= DA • Action can consume or produce a continuous amount of some resource • Action Conflicts: • Consuming the same resource • One action’s effect conflicting with the other’s precondition or effect [Figure: Flying action, labeled with (in-city ?airplane ?city1), (fuel ?airplane) > 0, (in-city ?airplane ?city2), consume (fuel ?airplane)]
Digression: Concurrent vs. Parallel plans • The main difference with temporal planning is that we need to produce concurrent plans • In the context of classical planning, concurrent plans are akin to parallel plans (a la Graphplan) • This analogy is not complete of course. For every solvable problem in classical planning, there is guaranteed to be a sequential plan. This guarantee does not hold for temporal planning (which means we have to search in the space of concurrent plans) • Progression planners that we have seen until now produce sequential plans (FF does not produce parallel plans!) • FF is still complete because in classical planning, there is always a sequential plan for every solvable problem • So, we can start by asking what we need to do to make progression produce parallel plans.
Digression: How to produce parallel plans with progression? • The naïve idea is to project over subsets of non-interfering actions (rather than single actions). • Problem: Exponential branching factor • A better idea: Consider “fattening” as well as “lengthening” the current partial plan as two options. • We start by representing the state of a partial plan prefix as [S, {A1…Ak}] where S is the current state, and {A1…Ak} are the mutually non-interfering actions that we have already committed to applying at S. • Notice that this is just a generalization of the normal progression state, in which the action set {A1…Ak} will be a singleton • Given a state [S, {A1…Ak}] to expand, we have two (backtrackable) choices: • Fatten: Consider applying another action B in state S [One branch for each possible action B] • For this to be feasible, B should be applicable in S and B should not interfere with A1…Ak. The resulting state will be [S, {A1…Ak, B}] • Lengthen: Consider applying an action C in the state S’ which is obtained by applying actions {A1…Ak} in S [One branch for each applicable action] • For this to be feasible, C should be applicable in S’. The resulting state is [S’, {C}] • Notice that • Fattening is only done at the current state (once lengthening is done, the current state changes, so any new fattening will be done at the new state). • Normal progression always selects “Lengthen”. The addition needed to support parallel plans is the “Fatten” branch.
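The fatten/lengthen choice can be sketched as a successor function over nodes of the form [S, {A1…Ak}]. This is a minimal illustration for STRIPS-style actions; the Action class, the interference test and the toy actions are assumptions made for the example, not definitions taken from the slides.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

@dataclass(frozen=True)
class Action:                      # hypothetical STRIPS-style action
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    dele: FrozenSet[str] = frozenset()

Node = Tuple[FrozenSet[str], FrozenSet[Action]]   # [S, {A1..Ak}]

def interferes(a: Action, b: Action) -> bool:
    """Assumed test: one action deletes something the other needs or adds."""
    return bool(a.dele & (b.pre | b.add)) or bool(b.dele & (a.pre | a.add))

def apply_all(state: FrozenSet[str], actions: FrozenSet[Action]) -> FrozenSet[str]:
    """Apply mutually non-interfering actions; their order does not matter."""
    for a in actions:
        state = (state - a.dele) | a.add
    return state

def successors(node: Node, actions: List[Action]) -> List[Tuple[str, Node]]:
    state, committed = node
    result = []
    # Fatten: commit to one more non-interfering action at the same state S.
    for b in actions:
        if (b not in committed and b.pre <= state
                and not any(interferes(b, a) for a in committed)):
            result.append(("fatten", (state, committed | {b})))
    # Lengthen: move to S' and start a new singleton action set there.
    if committed:
        nxt = apply_all(state, committed)
        for c in actions:
            if c.pre <= nxt:
                result.append(("lengthen", (nxt, frozenset({c}))))
    return result

a = Action("a", frozenset({"p"}), frozenset({"q"}))
b = Action("b", frozenset({"p"}), frozenset({"r"}))
c = Action("c", frozenset({"q", "r"}), frozenset({"g"}))
first = successors((frozenset({"p"}), frozenset({a})), [a, b, c])
print([kind for kind, _ in first])
```

Dropping the "fatten" loop leaves ordinary sequential progression, which matches the slide's remark that normal progression always selects "Lengthen".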
Digression: Generating concurrent plans is similar to generating parallel plans… • To generate concurrent plans using progression, we start with the idea of generating parallel plans with progression • For parallel plans, the “state of the partial plan” is represented by [S, {A1..Ak}] • For temporal concurrent plans, we need to generalize this to consider the fact that • Each action may have different duration • Actions may have effects that are realized at different time points in the future • This means that some actions that we have committed to applying at previous states may wind up posting their effects now. • The solution is to start thinking in terms of “current time stamp”, and information about the set of durative actions that we have committed to apply whose effects have not yet been realized. • We can either add additional non-interfering actions at the current time-stamp • OR advance the timestamp (to the nearest future time where new effects of already committed actions can be realized). NOT!
State-Space Search: Search is through time-stamped states • Search states should have information about • what conditions hold at the current time slice (P, M below) • what actions we have already committed to put into the plan (Π, Q below) • S=(P,M,Π,Q,t) • P: Set <pi,ti> of predicates pi and the time of their last achievement ti < t • M: Set of functions representing resource values • Π: Set of protected persistent conditions (could be binary or resource conditions) • Q: Event queue (contains resource as well as binary fluent events) • t: Time stamp of S • In the initial state, P and M are non-empty; Q is non-empty if we have exogenous events
Let current state S be P:{have_light@0; at_steps@0}; Q:{~have_light@15}; t: 0 (presumably after doing the light-match action) • Applying cross_cellar to this state gives S’ = P:{have_light@0; crossing@0}; Π:{have_light,<0,10>}; Q:{at_fuse_box@10; ~have_light@15}; t: 0 [Figure: timeline from the current time stamp, with Light-match ending at 15 and Cross-cellar ending at 10]
“Advancing” the clock as a device for concurrency control • To support concurrency, we need to consider advancing the clock • How far to advance the clock? • One shortcut is to advance the clock to the time of the earliest event in the event queue, since this is the least advance needed to make changes to P and M of S • At this point, all the events happening at that time point are transferred from Q to P and M (to signify that they have happened) • This strategy will find “a” plan for every problem, but will have the effect of enforcing concurrency by making the concurrent actions “align on the left end” • In the match/cellar example, we will find plans where the cross-cellar action starts right when the light-match action starts • If we need slack in the start times, we will have to post-process the plan • If we want plans with arbitrary slack on start times to appear in the search space, we will have to consider advancing the clock by arbitrary amounts (even if it changes nothing in the state other than the clock time itself) • In the cellar plan above, the clock, if advanced, will be advanced to 15, where an event (~have-light) will occur. This means cross-cellar can either be done at 0 or at 15 (and the latter makes no sense) [Figure: timeline with Light-match ending at 15 (~have-light) and Cross-cellar (dur: 10) placed either at 0 or at 15]
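The "advance to the earliest event" shortcut can be sketched on a simplified state. The representation below keeps only P (fluent mapped to its achievement time), the event queue Q and the time stamp t; M and the protected conditions are left out, and the tuple layout of an event is an assumption made for the example.

```python
from typing import Dict, List, Tuple

# Assumed event layout: (time, fluent, is_add); is_add=False means a delete event.
Event = Tuple[float, str, bool]

def advance_clock(P: Dict[str, float], Q: List[Event], t: float):
    """Advance t to the earliest queued event and move all events at that time into P."""
    if not Q:
        return P, Q, t                      # nothing to advance to
    t_next = min(time for time, _, _ in Q)
    new_P = dict(P)
    remaining = []
    for time, fluent, is_add in Q:
        if time == t_next:
            if is_add:
                new_P[fluent] = time        # fluent achieved at this time
            else:
                new_P.pop(fluent, None)     # fluent no longer holds
        else:
            remaining.append((time, fluent, is_add))
    return new_P, remaining, t_next

# State S' from the match/cellar example, after applying cross_cellar at t = 0.
P = {"have_light": 0, "crossing": 0}
Q = [(10, "at_fuse_box", True), (15, "have_light", False)]
P1, Q1, t1 = advance_clock(P, Q, 0)
P2, Q2, t2 = advance_clock(P1, Q1, t1)
print(t1, sorted(P1), t2, sorted(P2))
```

Because the clock only ever jumps to event times, actions added between jumps all start at those times, which is the "align on the left end" behavior the slide describes.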
Search Algorithm (cont.) • Goal Satisfaction: S=(P,M,Π,Q,t) ⊨ G if for every <pi,ti> ∈ G either: • ∃ <pi,tj> ∈ P with tj < ti and no event in Q deletes pi • ∃ e ∈ Q that adds pi at time te < ti • Action Application: Action A is applicable in S if: • All instantaneous preconditions of A are satisfied by P and M • A’s effects do not interfere with Π and Q • No event in Q interferes with persistent preconditions of A • A does not lead to concurrent resource change • When A is applied to S: • P is updated according to A’s instantaneous effects • Persistent preconditions of A are put in Π • Delayed effects of A are put in Q [TLPlan; Sapa; 2001]
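The goal-satisfaction test on this slide translates almost directly into code. The sketch below covers only that test (not action application) and reuses an assumed representation: P maps a fluent to its achievement time, and Q holds (time, fluent, is_add) events.

```python
from typing import Dict, List, Tuple

Event = Tuple[float, str, bool]     # assumed layout: (time, fluent, is_add)
Goal = Tuple[str, float]            # <pi, ti>: fluent and its deadline

def satisfies_goals(P: Dict[str, float], Q: List[Event], goals: List[Goal]) -> bool:
    for fluent, deadline in goals:
        # Case 1: pi already holds (achieved before the deadline) and no queued event deletes it.
        holds = (fluent in P and P[fluent] < deadline
                 and not any(f == fluent and not is_add for _, f, is_add in Q))
        # Case 2: some queued event adds pi before the deadline.
        will_hold = any(f == fluent and is_add and time < deadline
                        for time, f, is_add in Q)
        if not (holds or will_hold):
            return False
    return True

# State S' from the match/cellar example.
P = {"have_light": 0, "crossing": 0}
Q = [(10, "at_fuse_box", True), (15, "have_light", False)]

print(satisfies_goals(P, Q, [("at_fuse_box", 12)]))   # added by a queued event in time
print(satisfies_goals(P, Q, [("at_fuse_box", 10)]))   # the add event is not strictly before 10
print(satisfies_goals(P, Q, [("have_light", 20)]))    # holds now, but a queued event deletes it
```

The test lets the search stop as soon as the committed actions guarantee the goals, without advancing the clock through the remaining events.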