
Replanning in Domains with Partial Information and Sensing Actions


Presentation Transcript


  1. Replanning in Domains with Partial Information and Sensing Actions • Guy Shani, Ronen Brafman • Ben-Gurion University • Outline: Problem, Background, SDR, Results

  2. Online Planning under Uncertainty with Partial Observability and Sensing • Deterministic actions • Concrete goal condition • Uncertainty about the initial state • Non-stochastic model – states are either possible or impossible • Sensing actions provide information about the world • We can generate a conditional plan • Online planning • We do not plan for all contingencies ahead of time – just until the next observation
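The setting above can be made concrete with a small data-structure sketch. This is purely illustrative (the class and field names are assumptions, not from the paper): a conditional plan interleaves ordinary deterministic actions with sensing actions that branch on the observation, and online planning only commits to the prefix up to the next sensing action.

```python
# Illustrative sketch of a conditional plan: ordinary actions execute in
# sequence, sensing actions branch on the observation they return.
from dataclasses import dataclass, field
from typing import Dict, List, Union

@dataclass
class Action:
    name: str                                   # e.g. "move-right"

@dataclass
class SensingAction:
    name: str                                   # e.g. "sense-stench"
    branches: Dict[bool, List["PlanStep"]] = field(default_factory=dict)

PlanStep = Union[Action, SensingAction]

# Online planning: commit only to the prefix up to the next sensing action;
# the branch matching the actual observation is planned after it is received.
plan_prefix: List[PlanStep] = [Action("move-right"), SensingAction("sense-stench")]
```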

  3. Examples • Toy problems from CLG [Albore et al.] • Doors • Gate location unknown • Start state: (oneof (door-at 2,1) … (door-at 2,5)) (oneof (door-at 4,1) … (door-at 4,5)) • Wumpus • Monster location unknown • Must correlate observations from multiple locations • Start state: (oneof (wumpus-at 2,3) (wumpus-at 3,2)) (oneof (wumpus-at 3,4) (wumpus-at 4,3)) • Localize • Agent location unknown • Must reason from history • Start state: (oneof (at 1,1) … (at 5,1) (at 1,2) (at 5,2) … (at 5,5))
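As an aside, the "oneof" clauses above induce a belief that is just the cross product of the alternatives. A tiny illustration (the tuples below simply copy the Wumpus clauses from this slide; the code is not from the paper):

```python
# Enumerate the possible initial states induced by two independent oneof clauses.
from itertools import product

oneof_clauses = [
    [("wumpus-at", 2, 3), ("wumpus-at", 3, 2)],
    [("wumpus-at", 3, 4), ("wumpus-at", 4, 3)],
]

possible_initial_states = [frozenset(choice) for choice in product(*oneof_clauses)]
print(len(possible_initial_states))  # 4 possible initial states
```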

  4. Why work on this problem? • Uncertainty, partial observability • no need to motivate • Study the challenge of planning to sense/learn • Many POMDP methods cope poorly with information-gathering sub-plans that do not provide rewards • We study this in a slightly simpler setting obtained by: • Simpler form of uncertainty: non-stochastic, deterministic actions • Structured actions and state (à la STRIPS) • Extend existing techniques that focus on contingent planning with full observability

  5. Our contributions • Extending replanning techniques to handle this case • A lazy technique for (not) maintaining the belief state

  6. Replanning (basic idea) • Generate a simpler classical problem, e.g. reduce initial-state uncertainty by choosing one state • The loop: reduce uncertainty → classical problem → plan (plan for the reduced problem) → execute (execute the plan until things break, e.g. an observation doesn't agree with the selected state) – see the sketch below • Pros: very simple, fast, and often effective • Cons: a greedy approach with the regular drawbacks • A simplistic classical model can lead to poor choices • Can get caught in dead-ends • Smart sampling may reduce these problems
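A minimal sketch of this loop, assuming placeholder helpers (sample_state, classical_plan, goal_reached, and an environment object) that are passed in as callables and are not part of the authors' implementation:

```python
# Generic replanning loop: pick one state, plan classically for it, and act
# until an observation disagrees with the chosen state; then repeat.
def replanning_loop(belief, goal, sample_state, classical_plan, goal_reached, env):
    while not goal_reached(belief, goal):
        s = sample_state(belief)             # reduce uncertainty: choose one state
        plan = classical_plan(s, goal)       # plan for the reduced (classical) problem
        for action in plan:
            obs = env.execute(action)        # act in the real environment
            belief = env.update(belief, action, obs)
            if not env.consistent(obs, s):   # observation disagrees with the chosen state
                break                        # things broke: replan from the new belief
    return belief
```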

  7. Replanning with PO and Sensing – Take 1 • Determinize the problem by determinizing the current state • Plan for this initial state only • Execute until observations conflict with the deterministic model • Replan! • Problem: the planner will make no effort to sense. It plans as if it knows everything. ⇒ Need a more sophisticated model that captures the agent's belief state

  8. Solution: Use Palacios and Geffner's Translation-based Approach • Explicitly represent the agent's knowledge • Knowledge predicates replace regular predicates • Kp = Know that p is true • Must ground knowledge on some initial features • A short tutorial with zero details.

  9. Translation to Classical Planning • Maintain predicate values given an initial state • i.e. we know that p is true given that si was the initial state and false given that sj was the initial state • Kp means that we know that p is true in all valid states • Revise actions: • Each effect is transformed into corresponding effects on the predicates conditioned on each possible initial state • Precondition p is transformed to Kp, i.e. an action can be applied only if its preconditions hold in all valid states • Example (given initial state s0): K(wumpus-at p-2,3)|s0 • K(not (wumpus-at p-3,2))|s0 • K(stench-at p-2,4)|s0 • K(stench-at p-2,2)
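A rough sketch of the predicate copies this translation introduces. The string encodings ("p|s", "Kp") and the helper names are assumptions made here for illustration; they are not the paper's actual encoding:

```python
# One copy of predicate p per possible initial state s ("p given s"), plus a
# knowledge atom Kp that holds when p|s holds for every still-valid state.
def conditioned_atom(p, s):
    return f"{p}|{s}"            # "p is true given that s was the initial state"

def knowledge_atom(p):
    return f"K{p}"

def kp_holds(p, valid_states, true_atoms):
    # Kp: p is true no matter which of the still-valid initial states is the real one.
    return all(conditioned_atom(p, s) in true_atoms for s in valid_states)

# Example: the wumpus position is known relative to each candidate initial state.
states = ["s0", "s1"]
atoms = {conditioned_atom("wumpus-at-2-3", "s0"), conditioned_atom("wumpus-at-2-3", "s1")}
print(kp_holds("wumpus-at-2-3", states, atoms))   # True
```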

  10. Translation to Classical Planning • Sensing actions reveal an unknown predicate p, and hence have the (non-deterministic) effect Kp or K(not p) • Actions to eliminate states from the belief • If an observation contradicts the value of p conditioned on s, then we know that s was not the initial state ⇒ effect: s is removed from the set of valid states • Warning! Many details are missing…
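The state-elimination idea can be written down in a few lines. This is a semantic sketch of what those elimination actions achieve, not the classical-planning encoding itself; value_given is a hypothetical lookup:

```python
# Drop every candidate initial state whose predicted value of p contradicts
# the observation actually received.
def eliminate_states(belief, p, observed_value, value_given):
    # value_given(p, s): the truth value p would have if s were the initial state
    return {s for s in belief if value_given(p, s) == observed_value}
```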

  11. Replanning with PO and Sensing – Take 2 • Use knowledge domain translation • Feed translation into classical planner • Execute plan until things break • E.g. observation is inconsistent with expectations • Replan! • Still missing… • What happens when sensing actions are executed in the knowledge domain? • Translation size is often huge!

  12. Replanning with PO and Sensing – Take 2 • Problem 1: Sensing actions translate into non-deterministic actions • Solution: Determinize sensing by choosing an initial state s0. All observations will be consistent with this state • Sensing actions have conditional (deterministic) effects: the sensed predicate takes the value it has in s0 (see the sketch below) • Planner must KNOW the preconditions of actions and the goals • It must use explicit sensing actions
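A tiny illustration of determinized sensing under an assumed representation (s0 as a dict from predicate names to truth values); once s0 is fixed, the sensing action's effect is the knowledge literal matching p's value in s0:

```python
# Determinized sensing: the observation is whatever value p has in the chosen s0.
def determinized_sensing_effect(p, s0):
    return f"K{p}" if s0[p] else f"K(not {p})"

s0 = {"stench-at-2-2": True}
print(determinized_sensing_effect("stench-at-2-2", s0))   # Kstench-at-2-2
```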

  13. Replanning with PO and Sensing – Take 2 • Problem: the translation is often huge • Given N initial states: a predicate copy for each initial state (N copies) • Each condition in every action is copied 2N times • Actions to eliminate every initial state on every predicate • Solution: sample a small number of possible initial states • To summarize: • Sample a subset S of the possible current states (to reduce the translation size) • Sample s0 from S and base observations on s0 • Generate the knowledge domain translation, given S and s0 • Solve using a classical planner
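The sampling step itself is simple; a sketch under the assumption that the belief is an explicit collection of states (uniform random sampling here is just one possible choice, not necessarily the one used in the experiments):

```python
# Sample a small subset S of the currently possible states, and pick s0 in S
# as the state all observations will be based on.
import random

def sample_states(belief, k):
    states = list(belief)
    S = random.sample(states, min(k, len(states)))
    s0 = random.choice(S)
    return S, s0
```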

  14. Still missing… Belief Maintenance • Must recognize if the goal was reached • Must recognize if the preconditions of the next action are guaranteed to be true • Requires maintaining information about the current belief state (set of valid states) • This issue is orthogonal to how we generate the plan

  15. Belief Maintenance through Regression • A (very) lazy approach • Maintain b0 as a formula • Maintain the history a1,o1,…,at,ot • Cons: must regenerate the formula on every query • Pros: the generated formula is focused only on the current query and remains small • To check whether ct holds at bt: regress ct through at,ot resulting in ct-1, regress through at-1,ot-1 resulting in ct-2, …, regress through a1,o1 resulting in c0 • Solve the resulting SAT problem (b0 together with the negation of c0); if there is no satisfying assignment then ct holds at bt
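A toy sketch of this lazy belief maintenance for STRIPS-style deterministic actions and positive conjunctive queries. It ignores observations and action preconditions for brevity, and replaces MiniSat with a direct check over enumerated initial states, so it only illustrates the regression idea, not the paper's actual machinery:

```python
# Keep b0 (here: an explicit set of possible initial states) and the action
# history; answer "does condition c hold now?" by regressing c back to time 0.
def regress_literal(lit, adds, dels):
    if lit in adds:
        return True          # the action makes it true regardless of the past
    if lit in dels:
        return False         # the action makes it false regardless of the past
    return lit               # unchanged: it must already have held beforehand

def regress(cond, history_back_to_front):
    # cond: set of positive literals; history: (adds, dels) pairs, newest first.
    for adds, dels in history_back_to_front:
        new_cond = set()
        for lit in cond:
            r = regress_literal(lit, adds, dels)
            if r is False:
                return False
            if r is not True:
                new_cond.add(r)
        cond = new_cond
    return cond              # c0: what must have held in the initial state

def holds_now(b0_states, cond, history):
    c0 = regress(set(cond), list(reversed(history)))
    if c0 is False:
        return False
    # "b0 and not c0 is unsatisfiable"  <=>  c0 holds in every valid initial state.
    return all(c0 <= s for s in b0_states)

# Example: after an "open-door" action that adds door-open, is door-open known?
b0 = [frozenset({"door-at-2-1"}), frozenset({"door-at-2-3"})]
history = [({"door-open"}, set())]
print(holds_now(b0, {"door-open"}, history))    # True: the action guarantees it
print(holds_now(b0, {"door-at-2-1"}, history))  # False: not true in every valid state
```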

  16. Sample, Determinize, Replan – SDR • Select S and s0 • Translate to classical planning • Plan: run a classical planner (FF) • Execute: regress the goal and the next action's precondition (solved using MiniSat), execute the action, check observation consistency • Goal achieved: terminate • A sketch of this loop follows below
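Putting the pieces together, a high-level sketch of the SDR loop on this slide. Every component (sample, translate, the classical planner, the regression-based holds check, the environment) is passed in as a callable; the names and signatures are assumptions for illustration, not the authors' API:

```python
# SDR main loop: sample, determinize, translate, plan with a classical planner,
# then execute while verifying goal/preconditions by regression and checking
# that observations stay consistent with the chosen s0.
def sdr(belief, goal, sample, translate, classical_planner, holds, env):
    while not holds(belief, goal):                        # regress the goal (SAT check)
        S, s0 = sample(belief)                            # select S and s0
        problem = translate(S, s0)                        # knowledge-based classical problem
        plan = classical_planner(problem)                 # e.g. FF as a black-box planner
        for action in plan:
            if not holds(belief, env.precondition(action)):  # regress the precondition
                break                                     # cannot guarantee it: replan
            obs = env.execute(action)
            belief = env.update(belief, action, obs)
            if not env.consistent(obs, s0):               # observation disagrees with s0
                break                                     # replan with the updated belief
    return belief                                         # goal achieved: terminate
```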

  17. CLG vs. SDR • The CLG translation generates non-deterministic effects for observation actions • In offline mode all possibilities are checked • In online mode the environment is queried (as we do) • CLG uses a specialized semi-classical planner (FF variant) – SDR can use any black-box planner (experiments use FF) • CLG uses tags • In many (most) cases more efficient than complete states • The complete translation still blows up rapidly

  18. Results

  19. Summary • SDR – a contingent replanner under partial observability • Sample a set of possible states from the current belief • Create a classical planning translation • Execute the plan until the sample is proven invalid or the goal is reached • SDR is shown to be faster and to scale up to larger domains than CLG (the state of the art)

  20. Future Work • Sensing costs • Sensing can have a cost (e.g. sensor warm-up) • Should trade off sensing against acting • Remove sensed preconditions – the agent should decide whether it wants to sense or not • Dead ends – a well-known pitfall of replanning algorithms • Smarter sampling techniques • Scaling up – currently not much better than POMDPs! Thank you
