Robust Belief-based Execution of Manipulation Programs
Kaijen Hsiao, Tomás Lozano-Pérez, Leslie Pack Kaelbling
MIT CSAIL
Achieving Goals under Uncertainty
• Two kinds of uncertainty:
  • current state: need to plan in information space
  • results of future actions: search branches on outcomes as well as actions
• Choice of action must depend on the current information state
Discrete POMDP Formulation
• states S
• actions A
• observations O
• transition model P(s′ | s, a)
• observation model P(o | s′, a)
• reward R(s, a)
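A minimal sketch of how such a discrete POMDP could be represented in code (an illustrative structure, not the authors' implementation; field names are assumptions):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class DiscretePOMDP:
    """Illustrative container for a discrete POMDP: states S, actions A,
    observations O, transition model T, observation model Z, reward R."""
    n_states: int
    n_actions: int
    n_obs: int
    T: np.ndarray  # T[a, s, s2] = P(s2 | s, a)
    Z: np.ndarray  # Z[a, s2, o] = P(o | s2, a)
    R: np.ndarray  # R[a, s]     = expected reward for action a in state s
```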
POMDP Controller
• State estimation is a discrete Bayesian filter (one update step is sketched below)
• Policy maps belief states to actions
[Figure: control loop — environment → sensing → state estimator (SE) → belief → policy → action → environment]
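The discrete Bayesian filter is standard; a minimal sketch of one update step (the generic Bayes filter recursion, not the authors' code):

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """One step of a discrete Bayes filter:
    b'(s2) ∝ Z[a, s2, o] * sum_s T[a, s, s2] * b(s)."""
    predicted = T[a].T @ b            # prediction through the transition model
    updated = Z[a, :, o] * predicted  # correction by the observation likelihood
    norm = updated.sum()
    if norm == 0:
        raise ValueError("observation has zero probability under this belief")
    return updated / norm
```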
Action selection in POMDPs
• Off-line optimal policy generation: intractable for large spaces
• On-line search: finite-depth expansion of the belief-space tree from the current belief state to select a single action; tractable in a broad subclass of problems
Challenges for action selection
• Continuous state spaces
• Requirement to select an action for any belief state
• Long horizon
• Action branching factor
• Outcome branching factor
• Computationally complex observation and transition models
Grasping in uncluttered environments
Points of leverage:
• Robot pose is approximately observable
• Robot dynamics are nearly deterministic
• Bounded uncertainty over unobserved object parameters
• Room to maneuver
Online belief-space search
• Continuous state space: discretize object state space
Discretize object configuration space
[Figure: workspace, configuration space, and the discretized belief state]
Online belief-space search
• Continuous state space: discretize object state space
• Action for any belief: search forward from current belief state
Search forward from current belief
• Low-entropy belief states enable a reliable grasp
• Use entropy as the static evaluation function at the leaves (see the sketch below)
• Actions can be useful for information gathering
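For concreteness, a static evaluation based on belief entropy might look like this (a generic sketch; the paper's exact evaluation function is not reproduced here):

```python
import numpy as np

def belief_entropy(b):
    """Shannon entropy of a discrete belief vector; low entropy means the
    object pose is well localized, so a reliable grasp is possible."""
    nz = b[b > 0]                 # ignore zero-probability states
    return -float(np.sum(nz * np.log(nz)))
```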
Online belief-space search
• Continuous state space: discretize object state space
• Action for any belief: search forward from current belief state
• Long horizon: use temporally extended actions
Use temporally extended actions
• Primitive actions → entire trajectories
• Reduces the horizon
• Observations arrive at the end of each trajectory
Online belief-space search
• Continuous state space: discretize object state space
• Action for any belief: search forward from current belief state
• Long horizon: use temporally extended actions
• Large action branching factor: parameterize small set of action types by current belief
Parameterize actions with belief
• Actions are entire world-relative trajectories (WRTs)
• In the current belief state (sketched below):
  • execute with respect to the most likely object configuration
  • terminate on contact or at the end of the trajectory
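Grounding a world-relative trajectory in the most likely object configuration can be sketched as follows (planar (x, y, θ) poses, consistent with the tabletop setting; function names are hypothetical):

```python
import numpy as np

def most_likely_pose(belief, poses):
    """Return the discretized object pose with the highest belief mass."""
    return poses[int(np.argmax(belief))]

def ground_wrt(wrt_waypoints, pose):
    """Map waypoints defined relative to the object frame into the world
    frame using an (x, y, theta) object pose."""
    x, y, th = pose
    c, s = np.cos(th), np.sin(th)
    R = np.array([[c, -s], [s, c]])
    return [R @ np.asarray(p) + np.array([x, y]) for p in wrt_waypoints]
```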
Online belief-space search
• Continuous state space: discretize object state space
• Action for any belief: search forward from current belief state
• Long horizon: use temporally extended actions
• Large action branching factor: parameterize small set of action types by current belief
• Computationally complex observation and transition models: precompute models
Precompute models
• Execute each WRT with respect to estimated state e, in world state w
• Expected observation and transition computed for each (e, w) pair
• Based on geometric simulation (see the sketch below)
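Precomputation can be sketched as a sweep over (estimated state, world state) pairs, with a geometric simulator standing in for real execution (`simulate_wrt` is a hypothetical stand-in for the simulation):

```python
def precompute_models(wrts, estimated_states, world_states, simulate_wrt):
    """For each WRT and each (e, w) pair, cache the simulated observation
    and resulting outcome; states are assumed hashable (e.g., tuples)."""
    models = {}
    for i, wrt in enumerate(wrts):
        for e in estimated_states:
            for w in world_states:
                obs, outcome = simulate_wrt(wrt, e, w)
                models[(i, e, w)] = (obs, outcome)
    return models
```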
Online belief-space search
• Continuous state space: discretize object state space
• Action for any belief: search forward from current belief state
• Long horizon: use temporally extended actions
• Large action branching factor: parameterize small set of action types by current belief
• Computationally complex observation and transition models: precompute models
• Large observation branching factor: canonicalize observations for each discrete state and action
Canonicalize observations
• Any (e, w) pair with the same relative transformation has the same world-relative outcomes and observations
• So it suffices to sample for a single e, with w varying within the initial range of uncertainty
• Cluster the observations and represent each bin of object configurations by a single representative (one possible clustering is sketched below)
• Branch only on canonical observations
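One simple way to realize the clustering step is greedy threshold-based grouping (a sketch only; the paper's actual clustering criterion may differ):

```python
def canonicalize(observations, dist, threshold):
    """Greedily cluster observations: each joins the first canonical
    representative within `threshold` under metric `dist`, otherwise it
    becomes a new representative. Returns representatives and assignments."""
    canonical, assignment = [], []
    for o in observations:
        for i, rep in enumerate(canonical):
            if dist(o, rep) < threshold:
                assignment.append(i)
                break
        else:
            canonical.append(o)
            assignment.append(len(canonical) - 1)
    return canonical, assignment
```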
Algorithm
Off-line:
• plan WRTs for grasping and information gathering
• compute models
On-line (see the sketch below):
• while the current belief state does not satisfy the goal:
  • compute the expected information gain of each WRT
  • execute the best WRT until termination
  • use the observation to update the current belief
  • return to the initial pose
• execute the final grasp trajectory
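The on-line phase amounts to a greedy 1-step-lookahead loop; a sketch under the assumption that the helpers below exist (all names are illustrative):

```python
def online_grasp(belief, wrts, goal_satisfied, expected_info_gain,
                 execute_wrt, update_belief, execute_final_grasp):
    """Repeatedly execute the WRT with the highest expected information
    gain until the belief satisfies the goal, then grasp."""
    while not goal_satisfied(belief):
        best = max(wrts, key=lambda w: expected_info_gain(belief, w))
        obs = execute_wrt(best, belief)   # runs w.r.t. most likely state;
                                          # terminates on contact or at end,
                                          # then returns to the initial pose
        belief = update_belief(belief, best, obs)
    execute_final_grasp(belief)
    return belief
```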
Application to grasping with a simulated robot arm
• Initial conditions (ultimately from vision):
  • object shape is roughly known (contacted vertices should be within ~1 cm of actual positions)
  • object is on a table and its pose (x, y, rotation) is roughly known (center-of-mass std ~5 cm, 30 deg)
• Goal: achieve a specific grasp of the object
Observations
• Fingertips: 6-axis force/torque sensors give contact position and normal
• Additional contact sensors: binary contact only
• The swept, non-colliding path rules out poses that would have generated contact (sketched below)
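The swept-path observation can be folded into the belief update by zeroing out inconsistent poses (a sketch with a hypothetical `would_have_contacted` predicate):

```python
import numpy as np

def prune_by_swept_path(belief, poses, swept_path, would_have_contacted):
    """Zero out any object pose that would have produced contact along a
    path that was actually swept without contact, then renormalize."""
    mask = np.array([not would_have_contacted(p, swept_path) for p in poses])
    pruned = belief * mask
    total = pruned.sum()
    return pruned / total if total > 0 else belief
```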
Grasping a Box
[Figure: most likely robot-relative object position vs. where the object actually is]
Trying again, with a new belief
[Figure: back up, then try again]
Final state and observation
[Figure: final grasp and the resulting observation probabilities]
Updated belief state: Success!
Goal: variance < 1 cm in x, 15 cm in y, 6 deg in theta
Simulation Experiments
Methods tested:
• Single open-loop execution of the goal-achieving WRT with respect to the most likely state
• Repeated execution of the goal-achieving WRT with respect to the most likely state
• Online selection of information-gathering and goal-achieving grasps (1-step lookahead)
Box experiments
• Allowed variation in goal grasp: 1 cm, 1 cm, 5 deg
• Initial uncertainty: 5 cm, 5 cm, 30 deg
Cup experiments
• Goal: 1 cm x, 1 cm y; rotation doesn't matter (no info-grasps used)
• Start uncertainty: 30 deg theta (x, y varies)
[Figure: results with increasing uncertainty]
Grasping a Brita Pitcher
Target grasp: put one finger through the handle and grasp
Brita Pitcher results
[Figure: results with increasing uncertainty]
Other recent probabilistic approaches to manipulation
• Off-line POMDP solution for grasping (Hsiao et al. 2007)
• Bayesian state estimation using tactile sensors to locate the object before grasping (Petrovskaya et al. 2006)
• Finding a fixed trajectory that is most likely to succeed under uncertainty (Alterovitz et al. 2007; Burns and Brock 2007)
Timing for Brita Pitcher
(2.16 GHz processor, 3.24 GB RAM, running Python; times in seconds)
Creating Information-gain Trajectories
• Trajectory generation:
  • generate endpoints, then use a randomized planner (such as OpenRAVE) to find a nominal collision-free path
  • sweep through the entire workspace
• Choose a small set of trajectories based on information gain from the start uncertainty