Learning Prospective Robot Behavior

Shichao Ou and Roderic Grupen, Laboratory for Perceptual Robotics, University of Massachusetts Amherst

A Developmental Approach • Infant Learning • In stages • Maturation processes • Parents provide constrained learning contexts • Protect • Easy → Complex • Motion mobile for newborns • Use brightly colored, easy-to-pick-up objects • Use building blocks • Association of words and objects

Application in Robotics • Framework for Robot Developmental Learning • Role of teacher: set up learning contexts that make the target concept conspicuous • Role of robot: acquire concepts, generalize to new contexts by autonomous exploration, provide feedback • Control Basis • Robot actions are created using combinations of <σ,φ,τ> • Establish stages of learning by time-varying constraints on resources • Easy → Complex

Example • Learning to Reach for Objects • Stage 1: SearchTrack • Focus attention using a single brightly colored object (σ) • Limit DOF (τ) to use head ONLY • Stage 2: ReachGrab • Limit DOF (τ) to use one arm ONLY • Stage 3: Handedness, Scale-Sensitive (Hart et al., 2008)
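The staged constraints on σ and τ described above can be pictured as a filter over candidate <σ,φ,τ> triples. The following is a minimal sketch, not the control basis implementation: every sensor, controller, and effector name is hypothetical, and the assumption that Stage 3 releases all resources is ours, not the slides'.

```python
from itertools import product

# Hypothetical resource sets; all names are illustrative only.
SENSORS = ["bright_object", "any_object"]            # sigma: sensory resources
CONTROLLERS = ["search", "track", "reach", "grab"]   # phi: control objectives
EFFECTORS = ["head", "left_arm", "right_arm"]        # tau: motor resources

# Each stage restricts which sigma and tau the learner may use.
# Stage 3 releasing every resource is an assumption of this sketch.
STAGES = {
    1: {"sensors": {"bright_object"}, "effectors": {"head"}},
    2: {"sensors": {"bright_object"}, "effectors": {"left_arm"}},
    3: {"sensors": set(SENSORS), "effectors": set(EFFECTORS)},
}

def available_actions(stage):
    """Enumerate the (sigma, phi, tau) triples allowed at the given stage."""
    limits = STAGES[stage]
    return [
        (s, c, e)
        for s, c, e in product(SENSORS, CONTROLLERS, EFFECTORS)
        if s in limits["sensors"] and e in limits["effectors"]
    ]
```

Tightening the constraints shrinks the action set the learner has to explore, which is the sense in which early stages are "easy" and later ones "complex".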

Prospective Learning • Infants adapt to new situations by looking ahead prospectively, predicting failure, and then learning a repair strategy

Robot Prospective Learning with Human Guidance • [Figure: a chain of states S0, S1, …, Si, …, Sj, …, Sn linked by actions a0, a1, …, ai-1, ai, …, aj-1, aj, …, an-1, shown three times: the original policy; the policy under a challenge, labeled g(f)=0 and g(f)=1; and the policy with a sub-task over states Si1, …, Sij, …, Sin]

A 2D Navigation Domain Problem • 30x30 map • 6 doors, randomly closed • 6 buttons • 1 start and 1 goal • 3-bit door sensor on robot

Flat Learning Results • Flat Q-Learning • 5-dimensional state • (x, y, door-bit1, door-bit2, door-bit3) • 4 actions • up, down, left, right • Reward • 1 for reaching the goal • -0.01 for every step taken • Learning parameters • α=0.1, γ=1.0, ε=0.1 • Learned solutions after 30,000 episodes
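The flat learner above is standard tabular Q-learning with ε-greedy exploration. A minimal sketch follows, using the slide's reward and learning parameters. The environment is an assumption: a small open grid with no doors or buttons stands in for the 30x30 map, the state is reduced to (x, y), and the goal step is assumed to pay 1 with no step penalty.

```python
import random
from collections import defaultdict

ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.1  # learning parameters from the slide

def step(state, action, size, goal):
    """Hypothetical open grid: move if in bounds, otherwise stay in place."""
    x, y = state
    dx, dy = ACTIONS[action]
    nx, ny = x + dx, y + dy
    if not (0 <= nx < size and 0 <= ny < size):
        nx, ny = x, y
    nxt = (nx, ny)
    done = nxt == goal
    reward = 1.0 if done else -0.01  # assumption: no step penalty on the goal step
    return nxt, reward, done

def q_learning(size=5, start=(0, 0), goal=(4, 4),
               episodes=2000, max_steps=500, seed=0):
    """Tabular Q-learning; returns a dict mapping (state, action) to value."""
    rng = random.Random(seed)
    q = defaultdict(float)
    names = list(ACTIONS)
    for _ in range(episodes):
        state = start
        for _ in range(max_steps):
            if rng.random() < EPSILON:          # explore
                action = rng.choice(names)
            else:                               # exploit current estimates
                action = max(names, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action, size, goal)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in names)
            target = reward + GAMMA * best_next
            q[(state, action)] += ALPHA * (target - q[(state, action)])
            state = nxt
            if done:
                break
    return q

def greedy_path(q, size=5, start=(0, 0), goal=(4, 4), max_steps=50):
    """Follow the greedy policy from start and return the visited states."""
    state, path = start, [start]
    for _ in range(max_steps):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _, done = step(state, action, size, goal)
        path.append(state)
        if done:
            break
    return path
```

In the flat formulation every combination of position and door bits is a separate table entry, which is one plausible reason the full problem needs so many episodes.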

Prospective Learning • Stage 1 • All doors open • Constrain resources to use only (x,y) sensors • Allow the agent to learn a policy from start to goal • [Figure: learned path Down, Right, Right, Up, Right, Right, Right through states S0, S1, …, Si, …, Sj, …, Sn]
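Constraining the learner to the (x,y) sensors amounts to projecting the full state onto a subset of its components. A minimal sketch, assuming the state is the 5-tuple from the flat learner; the field names and the `project` helper are hypothetical.

```python
# Hypothetical field names for the full state tuple.
FIELDS = ("x", "y", "door_bit1", "door_bit2", "door_bit3")

def project(state, allowed):
    """Keep only the state components whose sensors are enabled."""
    return tuple(v for f, v in zip(FIELDS, state) if f in allowed)

STAGE1_SENSORS = {"x", "y"}  # Stage 1: position only, door bits hidden
```

Under this projection, two states that differ only in their door bits look identical to the Stage 1 learner, so its table covers positions only.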

Prospective Learning • Stage 2 • Close 1 door • Robot learns the cause of the failure • Robot backtracks and finds an earlier indicator of this cause • Create a sub-task • Learn a new policy for the sub-task • Resume original policy
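The repair steps above can be sketched as two small operations: locate the earliest step at which the failed run looks different from a successful one, then splice a sub-task into the existing action sequence at that point. This is an illustration under stated assumptions, not the authors' algorithm. The slides do not say how the indicator is found, so comparing door-sensor readings step by step is our assumption, as is a sub-task that returns the robot to the state where it branched. All function names are hypothetical.

```python
def find_indicator(success_obs, failure_obs):
    """Return the earliest step whose observation differs between a
    successful run and a failed run, or None if they never differ."""
    for step, (ok, bad) in enumerate(zip(success_obs, failure_obs)):
        if ok != bad:
            return step
    return None

def splice_subtask(actions, branch_step, subtask_actions):
    """Insert the sub-task before the action at branch_step, then
    resume the original action sequence unchanged."""
    return actions[:branch_step] + subtask_actions + actions[branch_step:]

# Stage 1 path from the slides; the observations below are made up.
original = ["down", "right", "right", "up", "right", "right", "right"]
success_obs = [(0, 0, 0)] * 7                                  # all doors open
failure_obs = [(0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 0)]     # run stops at a closed door

branch = find_indicator(success_obs, failure_obs)
subtask = ["up", "down"]  # hypothetical detour that ends where it began
repaired = splice_subtask(original, branch, subtask)
```

Only the spliced segment is new; the actions before and after the branch point are reused as learned in Stage 1.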

Prospective Learning Results • Learned solutions in < 2,000 episodes

Humanoid Robot Manipulation Domain • Benefits of Prospective Learning • Adapts to new contexts while maintaining the majority of the existing policy • Automatically generates sub-goals • Sub-tasks can be learned in a completely different state space • Supports interactive learning

Conclusion • A developmental view of robot learning • A framework that enables interactive, incremental learning in stages • Extension of the control basis learning framework using the idea of prospective learning
