What’s Planning? Derek Long University of Strathclyde, Glasgow
What’s Planning? • Johann: “Diagnosis = Planning (almost)” • Rearranging the inequality: • Planning is more than diagnosis • Hadas: planning is finding counter-examples to disprove LTL formulae • Actually, this is a way to view one kind of planning • Brian: planning is venerable and geriatric (or was it ‘generative’?) • Most interesting planning is much less than 10 years old
What do we need? • Start with some assumptions… • Assume the world can be described as a set of states • Assume that things cause transitions between these states – these things include (controllable) actions, but could also include events and processes • Assume that an initial state is (partly) known • Assume that the causal relationship between transitions and states is sufficiently predictable that there is a point in considering how to use the controllable actions to direct the transitions of the world
Hybrid Timed Automaton • Our world model can be seen as a hybrid timed automaton • Finite set of discrete states • Associated vector of real-valued variables that can be changed by discrete transitions or by passage of time (under influence of processes) • Transitions can be triggered (events) or controlled (actions) and can be non-deterministic (with or without probability distributions) • States might be fully observable or only partially observable
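As an illustration only (the class and field names below are invented, not taken from any planning system), this world model might be sketched in Python as:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class State:
    discrete: str                      # one of a finite set of discrete modes
    continuous: Dict[str, float]       # real-valued variables (fuel, position, ...)

@dataclass
class Transition:
    name: str
    kind: str                          # "action" (controlled) or "event" (triggered)
    guard: Callable[[State], bool]     # when the transition is applicable or triggered
    effect: Callable[[State], State]   # the discrete jump it makes on the state

@dataclass
class Process:
    name: str
    active: Callable[[State], bool]                    # is the process running here?
    flow: Callable[[State, float], Dict[str, float]]   # continuous change over dt

@dataclass
class HybridAutomaton:
    transitions: List[Transition] = field(default_factory=list)
    processes: List[Process] = field(default_factory=list)

    def let_time_pass(self, s: State, dt: float) -> State:
        """Advance the continuous variables by dt under all active processes."""
        cont = dict(s.continuous)
        for p in self.processes:
            if p.active(s):
                for var, delta in p.flow(s, dt).items():
                    cont[var] = cont.get(var, 0.0) + delta
        return State(s.discrete, cont)
```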
What is a plan? • A plan is something that tells an executive what to do in order to bring about desirable states of the world • Perform an action or wait • Desirable states are often states we want to get to (and stop) • Classical planning goals • Could be properties of states in a path (LTL formula) • Could also be determined by some reward function that accumulates reward for visiting states (and perhaps penalises bad states, or actions) • In general, we assume that we can map the trajectories determined by a plan to a value such that the higher the value, the better the plan
What can an executive do? • Simple executives dispatch actions based only on time • Wall clock time (actions must be timestamped) • Sequenced (actions need only be ordered) • More complex executives could dispatch actions based on sensed states and an internal state • So, plans must map from the state of the executive and the sensed state of the world to actions (including wait)
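A toy sketch of the two kinds of executive; the function names and interfaces below are invented for illustration:

```python
import time

def timestamped_executive(plan, dispatch):
    """Dispatch (timestamp, action) pairs by wall-clock time; waiting is itself a plan step."""
    start = time.time()
    for t, action in sorted(plan, key=lambda step: step[0]):
        wait = t - (time.time() - start)
        if wait > 0:
            time.sleep(wait)
        dispatch(action)

def state_based_executive(policy, sense, dispatch, internal_state, done):
    """Dispatch actions chosen from the sensed world state and an internal state."""
    while not done(internal_state):
        world = sense()
        action, internal_state = policy(world, internal_state)
        dispatch(action)
```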
States with Structure • Typically, discrete state sets are large, so are represented using assignments to a vector of finite-domain variables • For example, consider a world in which vehicles perform a search over a square grid • A vehicle occupies a square of the grid and faces north, south, east or west • A vehicle can move forward, left or right, completing its move facing the direction it moved in • A vehicle can search a square it occupies • A state characterises the positions of the vehicles, their facings and the status of the squares (searched or unsearched) [Figure: grid with Move and Search actions, compass directions N/S/E/W, searched and unsearched squares] Say 12x12 grid, 4 vehicles: 2^144 x 4^4 x 144^4 ≈ 2.5 x 10^54 states
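The state count quoted for this example can be checked directly:

```python
# 144 squares, each searched or unsearched; 4 vehicles, each with a
# position (144 choices) and a facing (4 choices).
squares = 12 * 12
vehicles = 4
states = (2 ** squares) * (144 ** vehicles) * (4 ** vehicles)
print(f"{states:.2e}")   # ~2.45e+54, i.e. about 2.5 x 10^54 states
```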
Planning: Classical and more • Classical planning: • only finite-domain variables and only deterministic transitions • Initial state is fully observable and goal specifies a set of alternative destination states • Plan quality is measured by number of actions • A plan can be specified as a sequence of actions • Determinism means we can be sure that only the states on the path from the initial state to the selected goal state are ever visited • Typically hard to find a feasible solution, so optimising is a secondary objective • Current best solutions based on heuristic guided search, using relaxations as the basis for heuristics
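A generic sketch of heuristic-guided forward search of the kind these planners use (greedy best-first here; the heuristic would come from a relaxation, discussed on the next slide). This is an illustration, not any particular planner's code:

```python
import heapq
from itertools import count

def greedy_best_first(initial, is_goal, successors, heuristic):
    """successors(s) -> iterable of (action, next_state); heuristic(s) -> estimated
    cost to goal. States must be hashable so visited states can be remembered."""
    tie = count()                              # tie-breaker so heapq never compares states
    frontier = [(heuristic(initial), next(tie), initial, [])]
    closed = set()
    while frontier:
        _, _, state, plan = heapq.heappop(frontier)
        if is_goal(state):
            return plan                        # a sequence of actions: a classical plan
        if state in closed:
            continue
        closed.add(state)
        for action, nxt in successors(state):
            if nxt not in closed:
                heapq.heappush(frontier, (heuristic(nxt), next(tie), nxt, plan + [action]))
    return None                                # no feasible plan
```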
Progress in Planning • A great deal of effort has been spent on finding good relaxations • Admissible relaxations guarantee optimal plans: the current best are based on a combination of techniques, including identification of landmarks and automatically constructed pattern databases • The classical planning problem is PSPACE-hard, but many benchmark domains are actually only NP-hard • A separate approach to planning has been compilation into other solving technologies: • SAT, CSP, model-checking • None of these approaches is currently competitive with the best dedicated planning systems
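For example, the delete relaxation ignores negative effects; a compact, illustrative sketch of the additive heuristic h_add over a STRIPS-like encoding (inadmissible, but typical of the guidance such planners compute) might look like:

```python
def h_add(state, goal, actions):
    """actions: list of (preconditions, add_effects) over propositions.
    Estimates the cost of reaching all goal propositions, ignoring deletes."""
    INF = float("inf")
    cost = {p: 0 for p in state}                # propositions already true cost nothing
    changed = True
    while changed:                              # fixpoint over relaxed reachability
        changed = False
        for pre, add in actions:
            if all(p in cost for p in pre):
                c = 1 + sum(cost[p] for p in pre)
                for p in add:
                    if c < cost.get(p, INF):
                        cost[p] = c
                        changed = True
    return sum(cost.get(p, INF) for p in goal)  # INF if some goal is unreachable
```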
Temporal Planning [Figure: a durative action with its duration, start and end preconditions, an invariant condition, and start and end effects] • Actions embedded in time, with a coupled pair of transitions marking the starts and ends of the durative actions • Plan quality usually measured by total duration of plan • Current best solutions extend classical planners by coupling the heuristic search to temporal constraint managers (STNs) and using relaxed temporal bounds on earliest application times of actions • Temporal uncertainty can be approached using controllable and uncontrollable temporal transitions (STNUs) • Planning with time is PSPACE-hard if the number of copies of the same action executing concurrently is bounded, otherwise EXPTIME
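A minimal sketch of the reasoning such a temporal constraint manager performs over a Simple Temporal Network: constraints of the form t_j − t_i ≤ b, with consistency decided by checking for negative cycles (Floyd-Warshall here; illustrative only):

```python
def stn_consistent(num_events, constraints):
    """constraints: list of (i, j, bound) meaning t_j - t_i <= bound.
    Returns (consistent?, matrix of tightest bounds) via all-pairs shortest paths."""
    INF = float("inf")
    d = [[0 if i == j else INF for j in range(num_events)] for i in range(num_events)]
    for i, j, b in constraints:
        d[i][j] = min(d[i][j], b)
    for k in range(num_events):
        for i in range(num_events):
            for j in range(num_events):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # A negative cycle (d[i][i] < 0) means the temporal constraints are inconsistent.
    return all(d[i][i] >= 0 for i in range(num_events)), d
```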
Trajectory Constraints • Planning to satisfy LTL formulae has been explored using approaches based on compiling the formulae to automata, linking these into the existing actions and then applying standard planners • Surprisingly effective for interesting constraints
Using real variables • Resources can be modelled using real-valued variables • Actions can change these values (discretely) and processes can change them as a function of the passage of time • Plan quality can be measured by combinations of duration and values of metric variables (e.g. fuel costs, monetary costs, benefits from rewards, etc.) • Best current approaches also use heuristic guided search, with bounding-interval or LP relaxations of the MILP constraints generated by discrete action effects and LP relaxations of linear continuous effects • Another alternative for continuous processes is discretise-and-validate (can handle non-linear effects) • Adding numbers makes planning undecidable, in general…
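The discretise-and-validate idea can be sketched as stepping a (possibly non-linear) continuous process forward in small time increments and checking invariants at each sample; the interface below is invented for illustration:

```python
def validate_interval(x0, flow, invariants, duration, dt=0.01):
    """x0: dict of variable values; flow(x) -> dict of rates (may be non-linear);
    invariants: predicates over x that must hold throughout the interval."""
    x, t = dict(x0), 0.0
    while t <= duration:
        if not all(inv(x) for inv in invariants):
            return False                       # an invariant is violated at this sample
        rates = flow(x)                        # Euler step; finer dt gives a tighter check
        x = {v: x[v] + dt * rates.get(v, 0.0) for v in x}
        t += dt
    return True
```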
For instance • Temporal version of the search problem, vehicles moving at different speeds, with fuel limiting the number of moves each vehicle can perform • Solve 10x10 grid problem, 4 vehicles, in under 10 seconds for balanced problem (makespan 80 versus nominal optimal of 72) • Up to 1 minute for unbalanced vehicles (a 255 step plan), within 10% of optimal
Planning under Uncertainty • Uncertainty can arise in many forms… • Partially observable initial state (and subsequent states) • Non-deterministic action effects • Uncertainty about duration and resource consumption of actions • Plans can no longer specify only what to do in states on the planned trajectory, since uncertainty means we might visit states we had not intended • Now require policies: • Mapping from current “state” to action (and possibly a new internal state) • Here a state might be a world state or a sensed partial world state and an internal state (often a belief state)
Policy construction • Finding policies is hard! • Usually assume we have a reward function and a cost for actions • Reward often assumed to be additive and discounted • Plan quality is measured by expected total net reward • Bellman equations determine the optimal policy for such a problem… • In principle, these can be solved by convergent iterated approximation schemes (policy iteration or value iteration) • In practice, these schemes do not scale to interesting problems
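Value iteration on a small, fully enumerated MDP, solving the Bellman equations by iterated approximation; this only works for toy state spaces, which is exactly why the next slide looks for alternatives (names illustrative):

```python
def value_iteration(states, actions, transition, reward, gamma=0.95, eps=1e-6):
    """transition(s, a) -> list of (probability, next_state); reward(s, a) -> float.
    Returns a value function and a greedy policy mapping each state to an action."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            q = [reward(s, a) + gamma * sum(p * V[s2] for p, s2 in transition(s, a))
                 for a in actions]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:                         # Bellman backups have converged
            break
    policy = {s: max(actions,
                     key=lambda a: reward(s, a) +
                                   gamma * sum(p * V[s2] for p, s2 in transition(s, a)))
              for s in states}
    return V, policy
```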
Realistic Policy Building? • Partial policies (only offer actions for states that are likely to be visited) • Abstraction (grouping of similar states) and other “clumping” techniques • Hindsight optimisation: • Solve Monte Carlo samples and then improve the policy based on better solutions to samples • Partial observability introduces a potential exponential lift in complexity due to handling belief states (or state occupancy probability distributions)
What to do with a partial policy • If we find ourselves in a state with no policy mapping then we can extend the policy: • On-line planning/replanning (policy extension or repair) • Default actions (to attempt to return to nominal trajectory)