Belief space planning assuming maximum likelihood observations
Robert Platt, Russ Tedrake, Leslie Kaelbling, Tomas Lozano-Perez
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
June 30, 2010
Planning from a manipulation perspective (image from www.programmingvision.com, Rosen Diankov)
• The "system" being controlled includes both the robot and the objects being manipulated.
• Motion plans are useless if the environment is misperceived.
• Perception can be improved by interacting with the environment: move the head, push objects, feel objects, etc.
The general problem: planning under uncertainty
• Planning and control with:
• Imperfect state information
• Continuous states, actions, and observations
• This setting covers most robotics problems. (image credit: N. Roy, et al.)
Strategy: plan in belief space
1. Redefine the problem: plan over "belief" states rather than the underlying state space.
2. Convert the underlying dynamics into belief space dynamics.
3. Create a plan from the start belief to the goal belief.
[Figure: mapping from the underlying state space to belief space, with start and goal beliefs.]
Related work
• Prentice, Roy. The Belief Roadmap: Efficient Planning in Belief Space by Factoring the Covariance. IJRR 2009.
• Porta, Vlassis, Spaan, Poupart. Point-based value iteration for continuous POMDPs. JMLR 2006.
• Miller, Harris, Chong. Coordinated guidance of autonomous UAVs via nominal belief-state optimization. ACC 2009.
• Van den Berg, Abbeel, Goldberg. LQG-MP: Optimized path planning for robots with motion uncertainty and imperfect state information. RSS 2010.
Simple example: the light-dark domain
• Underlying system (deterministic): the next state is the current state plus the action, x_{t+1} = x_t + u_t.
• Observations: z_t = x_t + ω, where ω is observation noise with state-dependent covariance w(x_t).
• The noise is small in the "light" region and large in the "dark" region, so the robot localizes well only near the light.
[Figure: start and goal states in the light-dark domain.]
Simple example: the light-dark domain (continued)
[Figure: a nominal information-gathering plan that moves toward the light to localize before proceeding to the goal.]
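To make the setup concrete, here is a minimal Python sketch of a light-dark-style domain. The constants (the light's location `light_x` and the noise scaling `scale`) are illustrative assumptions, not the exact values from the paper.

```python
import numpy as np

def f(x, u):
    """Deterministic process dynamics: the state moves by the action."""
    return x + u

def obs_noise_var(x, light_x=5.0, scale=0.5):
    """State-dependent observation noise variance: small when x[0] is near
    the 'light' (light_x), growing quadratically in the 'dark'.
    light_x and scale are assumed, illustrative constants."""
    return scale * (light_x - x[0]) ** 2 + 1e-6

def h(x, rng):
    """Stochastic observation: the state corrupted by state-dependent noise."""
    return x + rng.normal(0.0, np.sqrt(obs_noise_var(x)), size=x.shape)
```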
Belief system
• Underlying system: x_{t+1} = f(x_t, u_t) (deterministic process dynamics); z_t = h(x_t) + ω (stochastic observation dynamics), with state x, action u, observation z.
• Belief system: the state is a distribution over underlying states.
• Approximate the belief state as a Gaussian, b_t = (m_t, Σ_t).
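As a minimal representation, the Gaussian belief can be captured by its mean and covariance (a sketch; the class name and fields are my choice, not the paper's notation):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Belief:
    """Gaussian approximation of the belief state."""
    m: np.ndarray  # belief mean: the estimate of the underlying state
    S: np.ndarray  # belief covariance: the remaining state uncertainty
```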
Similarity to an underactuated mechanical system
Like the Acrobot, the belief system is underactuated: the action directly drives only part of the state.
• Gaussian belief: b = (m, Σ)
• State space: the space of belief parameters (m, Σ)
• Planning objective: reach a goal belief (mean at the goal, low covariance)
• Underactuated dynamics: the covariance cannot be commanded directly; it evolves only through the filter dynamics.
Belief space dynamics
The belief evolves according to a generalized Kalman filter: b_{t+1} = F(b_t, u_t, z_{t+1}).
[Figure: a belief-space trajectory from start to goal.]
Belief space dynamics are stochastic
The generalized Kalman filter update depends on the observation z_{t+1}, which is random, so an unexpected observation can push the belief off the planned trajectory. BUT – we don't know the observations at planning time.
[Figure: belief trajectory from start to goal perturbed by an unexpected observation.]
Plan for the expected observation
• Assume the maximum likelihood observation: at planning time, substitute into the generalized Kalman filter the observation expected under the predicted belief. This makes the belief dynamics deterministic.
• Model the observation stochasticity as Gaussian noise about this expectation.
• We will use feedback and replanning to handle departures from the expected observation.
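A sketch of one step of these deterministic belief dynamics, using a standard (extended) Kalman filter update with the maximum likelihood observation substituted in: the innovation is then zero, so the mean follows the process model and only the covariance is updated. The linearization matrices F, H and noise covariances Q, R are assumed inputs.

```python
import numpy as np

def belief_step(m, S, u, F, H, Q, R):
    """One step of belief dynamics b = (m, S) under the maximum likelihood
    observation assumption.
    F, H: Jacobians of the process/observation models at the belief mean.
    Q, R: process/observation noise covariances (R may depend on m)."""
    # Predict: propagate the mean and covariance through the process model.
    m_pred = m + u                      # light-dark process model f(x, u) = x + u
    S_pred = F @ S @ F.T + Q
    # Update with the expected observation z = h(m_pred): the innovation
    # is zero, so the mean is unchanged and only the covariance contracts.
    K = S_pred @ H.T @ np.linalg.inv(H @ S_pred @ H.T + R)
    S_new = (np.eye(len(m)) - K @ H) @ S_pred
    return m_pred, S_new
```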
Belief space planning problem
Find a finite horizon path, u_{1:T}, starting at the initial belief, that minimizes a cost function of the general form J = tr(Λ Σ_T) + Σ_t u_tᵀ R u_t (reconstructed; the exact weighting did not survive extraction):
• Minimize covariance at the final state: tr(Λ Σ_T) penalizes state uncertainty along the directions weighted by Λ.
• Action cost: Σ_t u_tᵀ R u_t finds a least-effort path.
Subject to: the belief-mean trajectory must reach the specified final state.
Existing planning and control methods apply
Now we can apply:
• Motion planning w/ differential constraints (RRT, …)
• Policy optimization
• LQR
• LQR-Trees
Planning method: direct transcription to SQP
1. Parameterize the trajectory by via points.
2. Shift the via points until a local minimum is reached, enforcing the dynamic constraints during shifting.
3. This is accomplished by transcribing the control problem into a Sequential Quadratic Programming (SQP) problem.
• Only guaranteed to find locally optimal solutions; a minimal sketch follows.
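A minimal direct-transcription sketch under the assumptions above (it reuses `obs_noise_var` and `belief_step` from the earlier sketches; the horizon, weights, start, and goal are illustrative). SciPy's SLSQP stands in for the SQP solver: the decision variables are the actions, the cost combines final uncertainty and effort, and an equality constraint forces the belief mean to reach the goal.

```python
import numpy as np
from scipy.optimize import minimize

T, dim = 10, 2
x0, goal = np.array([2.0, 2.0]), np.array([0.0, 0.0])

def rollout(u_flat):
    """Propagate the belief through the deterministic belief dynamics."""
    m, S = x0.copy(), np.eye(dim)
    for u in u_flat.reshape(T, dim):
        R = np.eye(dim) * obs_noise_var(m)   # state-dependent observation noise
        m, S = belief_step(m, S, u, np.eye(dim), np.eye(dim),
                           np.zeros((dim, dim)), R)
    return m, S

def cost(u_flat):
    """Final-covariance penalty plus a quadratic action cost."""
    m, S = rollout(u_flat)
    return np.trace(S) + 0.1 * np.sum(u_flat ** 2)

cons = {'type': 'eq', 'fun': lambda u: rollout(u)[0] - goal}
result = minimize(cost, np.zeros(T * dim), method='SLSQP', constraints=cons)
u_plan = result.x.reshape(T, dim)            # locally optimal action sequence
```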
Example: light-dark problem
[Figure: planned trajectory in the light-dark domain, shown in the x-y plane.]
• In this case, the covariance is constrained to remain isotropic.
Replanning
• Replan when the deviation of the current belief from the planned trajectory exceeds a threshold.
[Figure: original trajectory to the goal and the new trajectory produced by replanning.]
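One plausible form of the trigger (the threshold `theta` and the particular deviation measure are assumptions for illustration):

```python
import numpy as np

def should_replan(m, S, m_bar, S_bar, theta=0.5):
    """Replan when the current belief (m, S) deviates from the planned
    belief (m_bar, S_bar) by more than a threshold theta."""
    deviation = (np.linalg.norm(m - m_bar)
                 + np.linalg.norm(S - S_bar, ord='fro'))
    return deviation > theta
```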
Replanning: light-dark problem
[Figures: the originally planned path and the path actually followed by the system.]
Planning vs. Control in Belief Space
Given our specification, we can also apply control methods:
• Control methods find a policy – no need to replan.
• A policy can stabilize a stochastic system.
[Figure: a plan (single trajectory) vs. a control policy (feedback over the belief space).]
Control in belief space: B-LQR
• In general, finding an optimal policy for a nonlinear system is hard.
• Linear quadratic regulation (LQR) is one way to find an approximate policy.
• LQR is optimal only for linear systems w/ Gaussian noise.
[Figure: belief space LQR (B-LQR) policy for the light-dark domain.]
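B-LQR applies LQR to the belief dynamics linearized about a nominal trajectory; the sketch below shows only the generic finite-horizon, time-varying LQR backward pass that such a construction rests on. `A_list`/`B_list` are the linearized belief dynamics along the nominal trajectory, and Q, R, Qf are assumed cost weights.

```python
import numpy as np

def lqr_gains(A_list, B_list, Q, R, Qf):
    """Backward Riccati recursion for x_{t+1} = A_t x_t + B_t u_t.
    Returns gains K_t; the feedback law u_t = u_bar_t - K_t (b_t - b_bar_t)
    stabilizes the system about the nominal (belief) trajectory."""
    P, gains = Qf, []
    for A, B in zip(reversed(A_list), reversed(B_list)):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # optimal gain
        P = Q + A.T @ P @ (A - B @ K)                      # cost-to-go update
        gains.append(K)
    return gains[::-1]
```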
Combination of planning and control
Algorithm (reconstructed outline; the original slide's symbols did not survive extraction):
1. repeat
2.   plan a belief-space trajectory from the current belief
3.   for each step along the planned horizon:
4.     execute the action (with B-LQR feedback) and update the belief with the Kalman filter
5.     if the belief deviates too far from the plan then break (and replan)
6.   if the belief mean is at the goal (with low covariance) then
7.     halt
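A sketch of this loop in Python, with `plan`, `execute_step`, and `at_goal` as assumed callables standing in for the SQP planner, one step of B-LQR control plus filtering, and the goal test:

```python
def plan_and_control(b, plan, execute_step, at_goal, T=10):
    """Replan from the current belief, execute with feedback, break back
    to the planner if the belief diverges, halt once the goal is reached."""
    while True:
        b_nom, u_nom = plan(b)                        # step 2: replan from b
        for t in range(T):                            # step 3
            b = execute_step(b, u_nom[t], b_nom[t])   # step 4
            if should_replan(b.m, b.S, b_nom[t].m, b_nom[t].S):
                break                                 # step 5: deviated too far
        if at_goal(b):                                # step 6: mean at goal
            return b                                  # step 7: halt
```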
Analysis of replanning with B-LQR stabilization
Theorem: eventually (after finitely many replanning steps), the belief state mean reaches the goal with low covariance.
Conditions:
• Zero process noise.
• The underlying system is passively critically stable.
• Non-zero measurement noise.
• SQP finds a path of length < T to the goal belief region from anywhere in the reachable belief space.
• The cost function is of the correct form (given earlier).
Laser-grasp: reality
[Figure: initially planned path vs. the path actually executed on the real robot.]
Conclusions
• Planning for partially observable problems is one of the keys to robustness.
• Our work is one of the few methods for partially observable planning in continuous state/action/observation spaces.
• We view the problem as an underactuated planning problem in belief space.