Hierarchical Mechanisms for Robot Programming
Shiraj Sen, Stephen Hart, Rod Grupen
Laboratory for Perceptual Robotics, University of Massachusetts Amherst
May 30, 2008, NEMS '08
Outline: Hierarchical Mechanisms for Robot Programming
• Representation: actions as potential functions and value functions; state representation.
• Programming: reinforcement learning with intrinsic and extrinsic (user-defined) reward.
Hierarchical Actions
• Programs: value functions (Φ); greedy traversal avoids local minima.
• Closed-loop primitive actions: potential fields (ϕ) driven by feedback signals, producing force/velocity references.
[Figure: cascaded feedback-control block diagram (G, Σ, H loops).]
Primitive Action Programming Interface
• Sensory error (σ): visual (u_ref), tactile (f_ref), configuration variables (θ_ref), operational space (x_ref).
• Potential functions (ϕ): spring potential fields (ϕ_h), collision-free motion fields (ϕ_c), kinematic conditioning fields (ϕ_cond).
• Motor variables (τ): subsets of configuration variables and operational-space variables.
• Primitive actions compose by null-space projection: a = a2 ◁ a1 runs a2 in the null space of a1, as sketched below.
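A concrete reading of the null-space composition, as a minimal numerical sketch. This is not the authors' implementation; the use of a task Jacobian with its Moore-Penrose pseudoinverse, and all names and values below, are illustrative assumptions.

```python
import numpy as np

def nullspace_projector(J):
    """Projector onto the null space of task sensitivity J: N = I - J^+ J."""
    n = J.shape[1]
    return np.eye(n) - np.linalg.pinv(J) @ J

def compose(u1, J1, u2):
    """a2 'subject to' a1: a2's update acts only in a1's null space."""
    return u1 + nullspace_projector(J1) @ u2

# Example: 3 motor variables; a1 serves a 2-D task, a2 conditions posture.
J1 = np.array([[1.0, 0.5, 0.0],
               [0.0, 1.0, 0.5]])   # sensitivity of a1's potential (assumed)
u1 = np.array([0.1, -0.2, 0.0])    # gradient step suggested by a1
u2 = np.array([0.0, 0.1, 0.3])     # gradient step suggested by a2
u = compose(u1, J1, u2)            # combined update for a = a2 ◁ a1
```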
State Representation
• Discrete abstraction of action dynamics.
• 4-level logic in control predicate p_i: '-' = no reference, 'X' = convergence unknown, '0' = descending gradient (transient), '1' = converged. A sketch of this quantization follows.
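A minimal sketch of the 4-level predicate, assuming a controller's state is quantized from its potential value phi and time derivative phi_dot; the threshold and the None encoding of missing signals are assumptions.

```python
def predicate(phi, phi_dot, eps=1e-3):
    """Quantize a controller's dynamics into the 4-level predicate p_i."""
    if phi is None:         # no reference available in the feedback signal
        return '-'
    if phi_dot is None:     # controller has not run yet: convergence unknown
        return 'X'
    if abs(phi_dot) < eps:  # quiescent potential: converged
        return '1'
    return '0'              # still descending the gradient: transient
```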
Hierarchical Programming
• A program is defined as an MDP over a vector of controller predicates: s = (p_1, …, p_N).
• Value functions are learned using reinforcement learning (sketched below).
• Absorbing states in the value function capture "convergence" of programs.
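A sketch of one such program as tabular Q-learning over predicate vectors. The learning rate, discount, and exploration parameters are assumptions, not values from the talk.

```python
import random
from collections import defaultdict

Q = defaultdict(float)                  # Q[(state, action)]; state = (p_1, ..., p_N)
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # assumed hyperparameters

def choose(state, actions):
    """Epsilon-greedy action selection over the predicate-vector state."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions, absorbing):
    """One Q-learning backup; absorbing states bootstrap with value 0."""
    target = reward if absorbing else reward + gamma * max(
        Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (target - Q[(state, action)])
```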
Intrinsic Reward
• Goal: build deep control knowledge.
• Reward controllable interaction with the world: convergence events of controllers with direct feedback from the external world (sketched below).
• Behavior catalog: Track, Touch, Grasp, Insert, Stack.
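One plausible reading of this reward in code: pay reward when a controller whose feedback loop closes through the external world produces a convergence event, i.e. its predicate transitions to '1'. The controller names are placeholders.

```python
# Controllers whose feedback comes directly from the world (assumed set).
direct_feedback = {'track', 'touch', 'grasp'}

def intrinsic_reward(controller, prev_p, p):
    """Reward a convergence event of an externally-grounded controller."""
    if controller in direct_feedback and prev_p != '1' and p == '1':
        return 1.0
    return 0.0
```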
Experimental Demonstration: Dexter
• Motor units: two 7-DOF Barrett WAMs, two 4-DOF Barrett Hands, 2-DOF pan/tilt stereo head.
• Sensory feedback:
  • Visual: hue, saturation, intensity, texture.
  • Tactile: 6-axis fingertip F/T sensors.
  • Proprioceptive.
STAGE 1: SaccadeTrack – 25 Learning Episodes
• State: s_st = (p_saccade, p_track); actions: a_saccade, a_track.
• Rewarding action: Track-saturation.
[Figure: learned transition/policy graph over predicate states.]
STAGE 2: ReachGrab – 25 Learning Episodes
• State: s_rg = (p_st, p_reach, p_grab).
• Rewarding action: Touch.
[Figure: learned policy; Track-saturation and Touch convergence events.]
The Stage-1 program enters this state vector as the single predicate p_st, as sketched below.
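A sketch of that staging mechanism, under the assumption that a learned program is abstracted exactly like a primitive controller: its predicate reads '1' once the program's MDP reaches an absorbing state. The `absorbing_states` attribute and all names here are hypothetical.

```python
def program_predicate(program, state):
    """Treat a learned program as a primitive: '1' once it has reached an
    absorbing (converged) state of its MDP, '0' while still running."""
    return '1' if state in program.absorbing_states else '0'

def stage2_state(saccade_track, st_state, p_reach, p_grab):
    """Build s_rg = (p_st, p_reach, p_grab) for the ReachGrab MDP."""
    p_st = program_predicate(saccade_track, st_state)
    return (p_st, p_reach, p_grab)
```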
STAGE 3: VisualInspect – 25 Learning Episodes
• State: s_vi = (p_rg, p_cond, p_track(blue)).
• Rewarding action: Track-blue.
[Figure: learned policy; Touch, Track-saturation, and Track-blue convergence events.]
STAGE 4: Grasp – User Defined Reward
• State: s_grasp = (p_rg, p_moment, p_force); actions: ReachGrab, a_moment, a_force.
• Rewarding action: Grasp.
[Figure: learned transition/policy graph over predicate states.]
STAGE 5: PickAndPlace – User Defined Reward
• State: s_pnp = (p_g, p_transport, p_moment); actions: Grasp, a_transport, a_moment.
[Figure: learned transition/policy graph over predicate states; rewarding action marked.]
In contrast to the intrinsic reward of the earlier stages, the reward here is supplied by the programmer, as sketched below.
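A minimal sketch of a user-defined reward, assuming the user simply designates a goal pattern over the predicate vector; the particular pattern is illustrative.

```python
def user_reward(state, goal=('1', '1', '1')):
    """Pay reward only in the user-designated goal state, e.g. all of
    (p_g, p_transport, p_moment) converged for PickAndPlace."""
    return 1.0 if state == goal else 0.0
```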
Conclusions
• Mechanisms for creating hierarchical programs:
  • recursive formulation of potential functions and value functions;
  • control-theoretic representation for action, state, and intrinsic reward.
• Experimental demonstration of programming manipulation skills using staged learning episodes.
• Intrinsic reward drives the acquisition of new behavior and models the affordances of objects.