Hierarchical Mechanisms for Robot Programming
Shiraj Sen, Stephen Hart, Rod Grupen
Laboratory for Perceptual Robotics, University of Massachusetts Amherst
May 30, 2008, NEMS '08
Outline: Hierarchical Mechanisms for Robot Programming
• Representation: actions as potential functions and value functions; state representation.
• Programming: reinforcement learning with intrinsic and extrinsic (user-defined) reward.
Hierarchical Actions
• Programs: value functions (Φ); greedy traversal avoids local minima.
• Closed-loop primitive actions: potential fields (ϕ) driven by feedback signals, producing force/velocity references.
[Figure: cascaded feedback-control block diagram (G, Σ, H loops).]
Primitive Action Programming Interface
• Sensory error (σ): visual (u_ref), tactile (f_ref), configuration variables (θ_ref), operational space (x_ref).
• Potential functions (ϕ): spring potential fields (ϕ_h), collision-free motion fields (ϕ_c), kinematic conditioning fields (ϕ_cond).
• Motor variables (τ): subsets of configuration variables and operational-space variables.
• Primitive actions compose by null-space projection: a = a2 ◁ a1 runs a2 in the null space of a1, as sketched below.
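A concrete reading of the null-space composition, as a minimal numerical sketch. This is not the authors' implementation; the use of a task Jacobian with its Moore-Penrose pseudoinverse, and all names and values below, are illustrative assumptions.

```python
import numpy as np

def nullspace_projector(J):
    """Projector onto the null space of task sensitivity J: N = I - J^+ J."""
    n = J.shape[1]
    return np.eye(n) - np.linalg.pinv(J) @ J

def compose(u1, J1, u2):
    """a2 'subject to' a1: a2's update acts only in a1's null space."""
    return u1 + nullspace_projector(J1) @ u2

# Example: 3 motor variables; a1 serves a 2-D task, a2 conditions posture.
J1 = np.array([[1.0, 0.5, 0.0],
               [0.0, 1.0, 0.5]])   # sensitivity of a1's potential (assumed)
u1 = np.array([0.1, -0.2, 0.0])    # gradient step suggested by a1
u2 = np.array([0.0, 0.1, 0.3])     # gradient step suggested by a2
u = compose(u1, J1, u2)            # combined update for a = a2 ◁ a1
```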
State Representation
• Discrete abstraction of action dynamics.
• 4-level logic in control predicate p_i: '-' = no reference, 'X' = convergence unknown, '0' = descending gradient (transient), '1' = converged. A sketch of this quantization follows.
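A minimal sketch of the 4-level predicate, assuming a controller's state is quantized from its potential value phi and time derivative phi_dot; the threshold and the None encoding of missing signals are assumptions.

```python
def predicate(phi, phi_dot, eps=1e-3):
    """Quantize a controller's dynamics into the 4-level predicate p_i."""
    if phi is None:         # no reference available in the feedback signal
        return '-'
    if phi_dot is None:     # controller has not run yet: convergence unknown
        return 'X'
    if abs(phi_dot) < eps:  # quiescent potential: converged
        return '1'
    return '0'              # still descending the gradient: transient
```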
Hierarchical Programming
• A program is defined as an MDP over a vector of controller predicates: s = (p_1, …, p_N).
• Value functions are learned using reinforcement learning (sketched below).
• Absorbing states in the value function capture "convergence" of programs.
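A sketch of one such program as tabular Q-learning over predicate vectors. The learning rate, discount, and exploration parameters are assumptions, not values from the talk.

```python
import random
from collections import defaultdict

Q = defaultdict(float)                  # Q[(state, action)]; state = (p_1, ..., p_N)
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # assumed hyperparameters

def choose(state, actions):
    """Epsilon-greedy action selection over the predicate-vector state."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions, absorbing):
    """One Q-learning backup; absorbing states bootstrap with value 0."""
    target = reward if absorbing else reward + gamma * max(
        Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (target - Q[(state, action)])
```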
Intrinsic Reward
• Goal: build deep control knowledge.
• Reward controllable interaction with the world: convergence events of controllers with direct feedback from the external world (sketched below).
• Behavior catalog: Track, Touch, Grasp, Insert, Stack.
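One plausible reading of this reward in code: pay reward when a controller whose feedback loop closes through the external world produces a convergence event, i.e. its predicate transitions to '1'. The controller names are placeholders.

```python
# Controllers whose feedback comes directly from the world (assumed set).
direct_feedback = {'track', 'touch', 'grasp'}

def intrinsic_reward(controller, prev_p, p):
    """Reward a convergence event of an externally-grounded controller."""
    if controller in direct_feedback and prev_p != '1' and p == '1':
        return 1.0
    return 0.0
```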
Experimental Demonstration: Dexter
• Motor units: two 7-DOF Barrett WAMs, two 4-DOF Barrett Hands, 2-DOF pan/tilt stereo head.
• Sensory feedback:
  • Visual: hue, saturation, intensity, texture.
  • Tactile: 6-axis fingertip F/T sensors.
  • Proprioceptive.
STAGE 1: SaccadeTrack – 25 Learning Episodes
• State: s_st = (p_saccade, p_track); actions: a_saccade, a_track.
• Rewarding action: Track-saturation.
[Figure: learned transition/policy graph over predicate states.]
STAGE 2: ReachGrab – 25 Learning Episodes
• State: s_rg = (p_st, p_reach, p_grab).
• Rewarding action: Touch.
[Figure: learned policy; Track-saturation and Touch convergence events.]
The Stage-1 program enters this state vector as the single predicate p_st, as sketched below.
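A sketch of that staging mechanism, under the assumption that a learned program is abstracted exactly like a primitive controller: its predicate reads '1' once the program's MDP reaches an absorbing state. The `absorbing_states` attribute and all names here are hypothetical.

```python
def program_predicate(program, state):
    """Treat a learned program as a primitive: '1' once it has reached an
    absorbing (converged) state of its MDP, '0' while still running."""
    return '1' if state in program.absorbing_states else '0'

def stage2_state(saccade_track, st_state, p_reach, p_grab):
    """Build s_rg = (p_st, p_reach, p_grab) for the ReachGrab MDP."""
    p_st = program_predicate(saccade_track, st_state)
    return (p_st, p_reach, p_grab)
```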
STAGE 3: VisualInspect – 25 Learning Episodes
• State: s_vi = (p_rg, p_cond, p_track(blue)).
• Rewarding action: Track-blue.
[Figure: learned policy; Touch, Track-saturation, and Track-blue convergence events.]
STAGE 4: Grasp – User Defined Reward
• State: s_grasp = (p_rg, p_moment, p_force); actions: ReachGrab, a_moment, a_force.
• Rewarding action: Grasp.
[Figure: learned transition/policy graph over predicate states.]
STAGE 5: PickAndPlace – User Defined Reward
• State: s_pnp = (p_g, p_transport, p_moment); actions: Grasp, a_transport, a_moment.
[Figure: learned transition/policy graph over predicate states; rewarding action marked.]
In contrast to the intrinsic reward of the earlier stages, the reward here is supplied by the programmer, as sketched below.
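A minimal sketch of a user-defined reward, assuming the user simply designates a goal pattern over the predicate vector; the particular pattern is illustrative.

```python
def user_reward(state, goal=('1', '1', '1')):
    """Pay reward only in the user-designated goal state, e.g. all of
    (p_g, p_transport, p_moment) converged for PickAndPlace."""
    return 1.0 if state == goal else 0.0
```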
Conclusions
• Mechanisms for creating hierarchical programs:
  • recursive formulation of potential functions and value functions;
  • control-theoretic representation for action, state, and intrinsic reward.
• Experimental demonstration of programming manipulation skills using staged learning episodes.
• Intrinsic reward drives the acquisition of new behavior and models the affordances of objects.