Execution Monitoring & Replanning
Replanning(?) • Why would we need to replan? • You recognize, during execution, that things are not going according to the plan • Execution failure • Quality reduction • How can this happen? • Simple answer: modeling failure (or intentional model simplification) • The world is not static (dynamic) • The actions are not instantaneous (durative) • The world is not deterministic (stochastic) • The world is not fully observable (partially observable) • The specific action model you assumed is faulty • There are additional preconditions for actions • There are additional effects for actions • The specific cost/reward model you assumed is faulty • Actions are more (or less) costly than you assumed • Goals have higher (or lower) reward than you assumed • The problem specification is not yet complete(!) • New goals are being added • Some goals are being retracted
Replanning (contd.) • What should you do? • First, you need to recognize that something is astray • Execution monitoring • Could be non-trivial if what is going astray is plan quality • Then, you need to fix the problem, at least for now • Simplest: restart execution (somewhere) • Complex: modify the plan • Figure out where you are (both the initial and goal states and the cost/reward metrics) • Init state: sense it • Goal state: re-select objectives • Cost/reward: modify costs/rewards to allow for new goals, impossible actions, and/or commitment/reservation penalties • Plan • This process can be different from normal planning (commitments caused by publication of the old plan) • Finally, if this keeps happening, you need to fix the model • There is nothing wrong in going with a wrong model if it causes only very occasional failures (all models are wrong!)
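For concreteness, here is a minimal Python sketch of the sense / re-select / replan cycle just described. All the helpers passed in (sense_state, holds, select_objectives, make_plan, execute) are hypothetical placeholders rather than any real planner's API.

```python
# Minimal sketch of an epoch-based execute/monitor/replan cycle.
# Every helper is a hypothetical placeholder, not a real planner interface.

def run_with_monitoring(initial_plan, all_goals, sense_state, holds,
                        select_objectives, make_plan, execute):
    plan = list(initial_plan)
    while plan:
        state = sense_state()                       # figure out where you are
        action = plan[0]
        if not holds(action.preconditions, state):  # execution monitoring
            goals = select_objectives(state, all_goals)  # re-select objectives
            plan = make_plan(state, goals)               # fix the plan for now
            continue
        execute(action)
        plan.pop(0)
```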
Simple Replanning Scenario • Replanning necessitated only because of correctness considerations (no regard to optimality) • Problem specification is complete (no new goals are being added)
Things are more complicated if the world is partially observable • Need to insert sensing actions for fluents that can only be observed indirectly
Cutset(s) = { P s.t. <s', P, s''> is a causal link and s' < s < s'' } • For sequential plans, this is also simply the regression of the goal state up to this action • These cutsets are what "Triangle Tables" capture • Can be generalized to partially ordered plans
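As an illustration, a small Python sketch of computing these regression states (cutsets) for a sequential STRIPS-style plan; the Action dataclass with precondition/add/delete sets is an assumed representation, not taken from the slides.

```python
from dataclasses import dataclass

# Hypothetical STRIPS-style action; not tied to any particular planner.
@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset

def regression_states(goal, plan):
    """cutsets[i] = what must hold just before executing plan[i]
    (the information a triangle table records for a sequential plan)."""
    cutsets = [frozenset()] * len(plan)
    g = frozenset(goal)
    for i in range(len(plan) - 1, -1, -1):
        a = plan[i]
        # standard goal regression over a (assumes a.delete does not
        # clobber anything in g; a full regression check would verify this)
        g = (g - a.add) | a.pre
        cutsets[i] = g
    return cutsets
```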
This involves a disjunction of conjunctive goal sets! • The only reason to get back to the old plan is to reduce planning cost
(Simple) Replanning as Disjunctive Goal Sets • Suppose you are executing a plan P which goes through "regression states" (or cutsets) G1..Gn • You find yourself in a state S' • If any of G1..Gn holds in S', then restart execution from the action after that state • If not, you need to get from S' to any one of G1..Gn • Use the relaxed plan heuristic to find out which of G1..Gn is closest to S'; suppose it is Gi • Solve the problem [S', Gi]
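A possible sketch of this cutset-based rule, assuming states and cutsets are represented as frozensets of facts and that relaxed_plan_cost and make_plan are supplied by the underlying planner (both names are assumptions, not a real API):

```python
def resume_or_replan(state, plan, cutsets, relaxed_plan_cost, make_plan):
    """If the observed state S' satisfies some regression state G_i, restart
    execution at the corresponding step of the old plan; otherwise plan from
    S' to the G_i that the relaxed-plan heuristic says is closest."""
    # prefer the latest matching cutset, so as to skip as much work as possible
    for i in reversed(range(len(cutsets))):
        if cutsets[i] <= state:                  # G_i holds in S': resume here
            return list(plan[i:])
    # no cutset holds: pick the one with the cheapest relaxed-plan estimate
    i_best = min(range(len(cutsets)),
                 key=lambda i: relaxed_plan_cost(state, cutsets[i]))
    prefix = make_plan(state, cutsets[i_best])   # solve [S', G_i]
    return prefix + list(plan[i_best:])
```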
Replanning as the universal antidote to domain-modeling laziness • As long as the world is forgiving, you can always go with a possibly faulty domain model during planning, and replan as needed • You learn to improve the domain model only when the failures are getting too frequent.. • (The alternative of going with the correct domain model up-front can be computationally intractable!)
Stochastic Planning with Replanning • If the domain is observable and lenient to failures, and we are willing to do replanning, then we can always handle non-deterministic as well as stochastic actions with classical planning! • Solve the "deterministic" relaxation of the problem • Start executing it, while monitoring the world state • When an unexpected state is encountered, replan • A planner that did exactly this, called FF-Replan, won the first probabilistic track of the International Planning Competition • (Much to the chagrin of many planners that took the stochastic dynamics into account while doing planning..)
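A rough sketch of an FF-Replan-style loop, under the assumption that plan_deterministic plans against a determinized model (e.g., a most-likely-outcome relaxation) and apply_deterministic gives the successor that model predicts; none of these names come from FF-Replan itself.

```python
def ff_replan_loop(state, goal, plan_deterministic, apply_deterministic,
                   execute_and_observe, holds):
    """Plan on a deterministic relaxation of the stochastic problem, execute
    while monitoring the world state, and replan whenever the observed state
    differs from the one the relaxation predicted. All helpers are assumed."""
    plan = plan_deterministic(state, goal)
    while plan and not holds(goal, state):
        action = plan.pop(0)
        expected = apply_deterministic(action, state)  # successor under relaxation
        state = execute_and_observe(action)            # actual stochastic outcome
        if state != expected:                          # unexpected state
            plan = plan_deterministic(state, goal)     # replan from here
    return holds(goal, state)
```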
20 years of research into decision-theoretic planning... and FF-Replan is the result? 30 years of research into programming languages... and C++ is the result?
MURI Rescue Scenario • Human and robot collaborating on a rescue scenario • The planner helps the robot in prioritizing its goals and selecting its actions • The planning part has the characteristics of • Online planning (new goals may arrive as the current plan is being executed; relative rewards for existing goals may change because of affect) • Replanning (the current plan may hit execution snags) • Opportunistic planning (previously inactive goals may become active because of knowledge gained during execution) • Commitment sensitivity (the robot needs to be sensitive to the plans that it said it would be following)
Can the PSP model help? • We argued that the PSP model helps in MURI • It does help in capturing the replanning, changing utilities, and commitment sensitivity • Can we extend it to also handle opportunistic goals? • Simple answer: yes, we just re-select objectives (goals) during each replanning epoch
Opportunistic Goals in PSP Would these be related to conditional reward models? e.g., how do we model the goal "if you see someone injured, help them" (and not let the robot injure someone just so it can collect the reward)? • Opportunistic goals can be handled in the PSP model without much change • Goals like "spot aliens" may be seen as always being present in the list of goals that the planner (robot) has • Initially, these goals may not be picked because, despite having high reward, they also have high cost (i.e., no cheap plan to satisfy them, even as "estimated" by the relaxed plan analysis) • As execution progresses, however, the robot may reach states from which these goals become "reachable" (even as estimated by the PSP goal selection heuristic) • Note that this happens only because the world is not static
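One way to make this concrete is a greedy, per-goal caricature of PSP objective (re)selection, a simplistic stand-in for the select_objectives placeholder used in the earlier loop sketch. The reward and relaxed_plan_cost functions are assumed interfaces, and real PSP goal selection also reasons about interactions among goals.

```python
def select_objectives(state, goals, reward, relaxed_plan_cost):
    """Greedy per-goal sketch of PSP objective (re)selection at an epoch.
    A 'sleeper' goal such as 'spot aliens' stays unselected while its
    estimated cost exceeds its reward, and gets picked up automatically once
    execution reaches a state from which the relaxed-plan estimate makes it
    look cheap enough."""
    selected = set()
    for g in goals:
        cost = relaxed_plan_cost(state, g)      # None if g looks unreachable
        if cost is not None and reward(g) > cost:
            selected.add(g)
    return selected
```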
Replanning: Respecting Commitments • In the real world, where you make commitments based on your plan, you cannot just throw away the plan at the first sign of failure • One heuristic is to reuse as much of the old plan as possible while doing replanning • A more systematic approach is to • Capture the commitments made by the agent based on the current plan • Model these commitments in terms of penalties for certain (new) goals • Just as goals can have rewards, they can also have penalties that you accrue for not achieving them. Makes PSP objective selection a bit more interesting ;-) The worst team member is not the one who does nothing, but rather the one who promises but doesn't deliver
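As a sketch (not the actual PSP objective function), the net-benefit computation with penalty-carrying commitment goals might look like this; reward and penalty are assumed per-goal functions.

```python
def net_benefit(plan_cost, achieved_goals, all_goals, reward, penalty):
    """Net benefit of a candidate plan when goals carry both rewards for
    achievement and penalties for being dropped. Commitments made when the
    old plan was published are posted as goals with penalty > 0, so a plan
    that breaks a promise pays for it even if it is otherwise cheaper."""
    value = -plan_cost
    for g in all_goals:
        if g in achieved_goals:
            value += reward(g)
        else:
            value -= penalty(g)      # accrued for not achieving the goal
    return value
```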
Interaction between Opportunistic Goals and Commitments • Even if a high-reward sleeper goal becomes available because a plan for it is now feasible, it may still not get selected because of the commitments already made through the partial execution of the current plan • The interesting point is that the objective selection phase used by PSP should be able to handle this automatically (as long as we did post the commitment-induced goals onto the goal stack).
Monitoring for optimality • Given the online nature of planning, we need to assume an "epoch"-based model of planning, where every so often you "replan" • So as not to spend your whole life planning, you need to be good at monitoring • Not just potential execution failures • But also potential optimality reductions • (The plan being followed is no longer likely to be optimal.) • Optimality monitoring has been considered by Koenig (in his Lifelong Planning A* work) and more recently by Fritz & McIlraith. Their approaches are similar and have several limitations • The "annotations" used are often of the size of the search space. E.g., the idea in Fritz's work seems to be mostly to keep track of all possible action sequences (including those that weren't applicable originally) and see if they become applicable and reduce the f-value. Secondly, Fritz doesn't consider optimality damage caused by, for example, sleeper goals becoming active • A better idea may be to reduce the scope of monitoring and check necessary conditions and sufficient conditions for optimality separately (e.g., "secondary search") • Monitoring, in general, may have to do some relaxed-plan-based objective (re)selection, to see whether or not the current plan's focus on the selected set of goals is still optimal
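As an illustration only (this is not the Koenig or Fritz & McIlraith machinery), a reduced-scope optimality monitor might compare the net benefit the current plan still promises against an optimistic relaxed-plan re-estimate over all goals, sleepers included, and trigger replanning only when the gap is large. All parameters below are assumed interfaces.

```python
def optimality_alarm(state, remaining_plan_cost, covered_goals, all_goals,
                     reward, relaxed_plan_cost, slack=0.0):
    """Cheap optimality-monitoring sketch: replan only when an optimistic
    per-goal relaxed-plan estimate beats the current plan's remaining net
    benefit by more than some slack, rather than replanning every epoch."""
    current_value = sum(reward(g) for g in covered_goals) - remaining_plan_cost
    optimistic = 0.0
    for g in all_goals:
        c = relaxed_plan_cost(state, g)      # None if g looks unreachable
        if c is not None:
            optimistic += max(0.0, reward(g) - c)   # treat goals independently
    return optimistic - current_value > slack
```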
Allegiance to the "old plan" in replanning • From the execution cost point of view, you never have any allegiance to the old plan in replanning • You may try to salvage the old plan to reduce planning cost • You can have allegiance to the commitments you made if you published your old plan (the allegiance is to the commitments, not to the specific plan) • These can be modeled in terms of goal penalties • Of course, one way of ensuring commitments are not broken is to stick to the exact old plan. But this could be sub-optimal • E.g., I was going to drive a red Mercedes to Tucson, and when I published this plan, my friend in Casa Grande said he would meet me to say hello. Now the commitment is only to meeting the friend in Casa Grande, not to driving the red Mercedes. I make a big deal of this only because in the literature, replanning is often made synonymous with sticking to as much of the old plan as possible
Effect of Planning Strategy on Allegiance to the Plan • Once we agree that allegiance to the old plan can be useful to reduce planning cost, a related question is which planning strategies are best equipped to modify an existing plan to achieve the new objectives • Here, there is some argument in favor of partial order planners (inasmuch as they allow insertion of actions into the existing plan) • But I am not fully convinced…
Epilogue • As we go on to look at how to do planning in the presence of more and more expressive domain models, remember that you can (and may) intentionally simplify the domain model, plan with it, and then replan.. • So you already know one way of handling dynamic, multi-agent, stochastic, non-instantaneous, etc. worlds • And it may not even be all that sub-optimal, considering the success of FF-Replan
Related Work • Ours • All the PSP work • The online planning work (Benton/Minh) • The commitment-sensitive replanning work (Will) • Outsiders • Lifelong Planning A* work by Sven Koenig • Optimality monitoring work by Fritz & McIlraith • Online anticipatory algorithms?