
Learning From Demonstration: Atkeson and Schaal

This study explores robot learning from a small number of human demonstrations at the task level, focusing on learning the demonstrator's intent rather than simply mimicking the motion. It covers implementation details and compares parametric and nonparametric learning methods. Learning succeeds on a pendulum swing-up task, and direct task-level learning extends the approach to a harder double-pump swing-up task.

Presentation Transcript


  1. Learning From Demonstration: Atkeson and Schaal • Dang, RLAB • Feb 28th, 2007

  2. Goal • Robot learning from demonstration • Small number of human demonstrations • Task-level learning (learn intent, not just mimicry) • Explore: parametric vs. nonparametric learning; role of a priori knowledge

  3. Known Task • Pendulum swing-up task • Like pole balancing, but more complex • Difficult, but easy to evaluate success • Simplified: restricted to horizontal motion; important variables picked out (pendulum angle, pendulum angular velocity, hand location, hand velocity, hand acceleration)

  4. Implementation details • SARCOS 7-DOF arm • Stereo vision, colored ball indicators • 0.12 s delay overcome with a Kalman filter • Idealized pendulum dynamics • Redundant inverse kinematics and real-time inverse dynamics for control
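
The delay compensation on slide 4 can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the slides mention idealized pendulum dynamics, whereas this version swaps in a simpler constant-velocity motion model, and the class name, noise variances, and sample period are hypothetical. It filters delayed angle readings and then extrapolates the estimate across the 0.12 s delay.

```python
class ConstantVelocityKalman:
    """Two-state (angle, angular velocity) Kalman filter on angle readings.

    Assumption: a constant-velocity model with white acceleration noise,
    standing in for the idealized pendulum dynamics named on the slide.
    """

    def __init__(self, dt, meas_var=1e-4, accel_var=1.0):
        self.dt = dt          # sample period (s), hypothetical value
        self.r = meas_var     # variance of the vision angle measurement
        self.q = accel_var    # variance of the unmodeled acceleration
        self.theta = 0.0      # angle estimate (rad)
        self.omega = 0.0      # angular velocity estimate (rad/s)
        # Symmetric 2x2 covariance stored as three numbers; large = uncertain.
        self.p00, self.p01, self.p11 = 10.0, 0.0, 10.0

    def step(self, z):
        """Advance one sample period, then fuse the angle measurement z."""
        dt, q = self.dt, self.q
        # Predict: x <- F x, P <- F P F^T + Q with F = [[1, dt], [0, 1]].
        theta = self.theta + dt * self.omega
        omega = self.omega
        p00 = self.p00 + 2 * dt * self.p01 + dt * dt * self.p11 + q * dt**4 / 4
        p01 = self.p01 + dt * self.p11 + q * dt**3 / 2
        p11 = self.p11 + q * dt**2
        # Update with measurement matrix H = [1, 0] (only the angle is seen).
        s = p00 + self.r
        k0, k1 = p00 / s, p01 / s
        resid = z - theta
        self.theta = theta + k0 * resid
        self.omega = omega + k1 * resid
        self.p00 = (1 - k0) * p00
        self.p01 = (1 - k0) * p01
        self.p11 = p11 - k1 * p01

    def predict_ahead(self, delay=0.12):
        """Extrapolate the delayed estimate to the present time."""
        return self.theta + delay * self.omega, self.omega
```

In use, `step` would be called once per camera frame and `predict_ahead` would supply the state handed to the controller. A filter built on the pendulum model itself would replace the constant-velocity prediction with the pendulum equations of motion.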

  5. Learning • Task composed of two subtasks (belief: subtask learning accelerates new-task learning) • 1. Pole swing-up (open-loop) • 2. Upright balance (feedback) • Focus here on swing-up; balancing already learned

  6. First approach • Directly mimic human hand movement • Fails because of: differences in human and robot capabilities; improper demonstration (not horizontal); imprecise mimicry

  7. Second approach • Learn reward • Learn a model • Use human demonstration as seed so a planner can find a good policy

  8. Learn Task Model • Parametric: learn parameters via linear regression • Nonparametric: use locally weighted learning, given a desired output variable and a set of possibly relevant input variables; cross-validation tunes the meta-parameters
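
A minimal sketch of the nonparametric option on slide 8, under stated assumptions: a single input variable, a Gaussian kernel, a local linear fit, and leave-one-out cross-validation over a hand-picked list of bandwidths. The function names and candidate bandwidths are hypothetical, and the slides do not say which kernel or meta-parameters were actually used.

```python
import math


def lwr_predict(xs, ys, query, bandwidth):
    """Locally weighted linear regression: predict y at `query` (1-D input)."""
    sw = swx = swy = swxx = swxy = 0.0
    for x, y in zip(xs, ys):
        d = x - query                                 # offset from the query
        w = math.exp(-0.5 * (d / bandwidth) ** 2)     # Gaussian kernel weight
        sw += w
        swx += w * d
        swy += w * y
        swxx += w * d * d
        swxy += w * d * y
    if sw == 0.0:
        return sum(ys) / len(ys)      # no data in range: fall back to the mean
    det = sw * swxx - swx * swx
    if det <= 1e-12:
        return swy / sw               # degenerate fit: weighted average
    # Intercept of the weighted least-squares line, i.e. its value at d = 0.
    return (swxx * swy - swx * swxy) / det


def loo_error(xs, ys, bandwidth):
    """Mean squared leave-one-out prediction error for one bandwidth."""
    total = 0.0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        total += (lwr_predict(rest_x, rest_y, xs[i], bandwidth) - ys[i]) ** 2
    return total / len(xs)


def tune_bandwidth(xs, ys, candidates):
    """Pick the candidate bandwidth with the lowest leave-one-out error."""
    return min(candidates, key=lambda h: loo_error(xs, ys, h))
```

With several candidate inputs, the same cross-validation loop could also score which inputs to include, which is one reading of "possibly relevant input variables".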

  9. Swing-up • Transition to balance occurs at ±0.5 rad with angular velocity < 3 rad/s • Reward function set so that the robot tries to match the demonstrator
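
The two ideas on slide 9 can be written down as a short sketch. It assumes the angle is measured from upright, and it assumes a weighted squared-deviation form for the reward; the slides give only the thresholds and the intent ("be like the demonstrator"), so the reward form, the weights, and both function names are hypothetical.

```python
def ready_to_balance(angle_from_upright, angular_velocity,
                     max_angle=0.5, max_speed=3.0):
    """True when the swing-up may hand over to the balance controller.

    Thresholds follow the slide: within 0.5 rad of upright and slower
    than 3 rad/s. Measuring the angle from upright is an assumption.
    """
    return (abs(angle_from_upright) <= max_angle
            and abs(angular_velocity) < max_speed)


def imitation_reward(trajectory, demonstration, weights):
    """Negative weighted squared deviation from the demonstration.

    Assumed form: each trajectory is a sequence of state tuples sampled
    at the same times, and `weights` holds one weight per state variable.
    """
    total = 0.0
    for state, demo_state in zip(trajectory, demonstration):
        for w, value, target in zip(weights, state, demo_state):
            total -= w * (value - target) ** 2
    return total
```

A trajectory identical to the demonstration scores zero, and any deviation lowers the reward, which is what lets a planner use the demonstration as its target.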

  10. Parametric • Parameters learned from failure data • Trajectory optimized using human trajectory as seed • Success
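
As an illustration of learning model parameters by linear regression, the sketch below fits two coefficients of an assumed pendulum model, theta_accel = a * sin(theta) + b * hand_accel * cos(theta), from recorded samples. The model form, the sign convention, and the function name are assumptions for this example; the slides do not list the parameters that were actually fitted.

```python
import math


def fit_pendulum_parameters(samples):
    """Least-squares fit of (a, b) in
        theta_accel = a * sin(theta) + b * hand_accel * cos(theta).

    `samples` is an iterable of (theta, hand_accel, theta_accel) tuples.
    The model is linear in (a, b), so the 2x2 normal equations suffice.
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for theta, hand_accel, theta_accel in samples:
        f1 = math.sin(theta)                  # gravity feature
        f2 = hand_accel * math.cos(theta)     # hand-acceleration feature
        s11 += f1 * f1
        s12 += f1 * f2
        s22 += f2 * f2
        r1 += f1 * theta_accel
        r2 += f2 * theta_accel
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("samples do not excite both features")
    a = (s22 * r1 - s12 * r2) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b
```

Data from an unsuccessful trial still excites the dynamics, so samples of that kind are enough for a fit like this; the refitted model can then be passed back to the trajectory optimizer.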

  11. Nonparametric • Slower, but still successful

  12. Harder Task • Double-pump swing-up • Approach fails • Believed to be due to improper modeling of the system • Solved by

  13. Direct task-level learning • Learn a correction term ∆ to add to the target angle • Target is now ±(0.5 + ∆) rad • Use binary search for ∆ • Worked for parametric • Did not work for nonparametric: left the region of validity of the local models • So, tweak the velocity throughout instead • Binary search for the coefficient
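
The binary search on slide 13 could look roughly like the sketch below. It assumes each robot trial can be summarized by a signed outcome (positive for overshoot, negative for undershoot) that changes monotonically with the correction term; `run_trial`, the search bounds, and the tolerance are hypothetical placeholders, not details taken from the slides.

```python
def search_correction(run_trial, low, high, tolerance=1e-3, max_trials=20):
    """Bisect on a correction term until the bracket is narrower than
    `tolerance` or the trial budget runs out.

    `run_trial(delta)` is a hypothetical callback that executes one
    swing-up with target angle 0.5 + delta and returns a signed outcome:
    > 0 for overshoot, < 0 for undershoot, 0 for an exact hit.
    """
    for _ in range(max_trials):
        if high - low <= tolerance:
            break
        mid = 0.5 * (low + high)
        outcome = run_trial(mid)
        if outcome > 0:
            high = mid        # overshoot: the correction is too large
        elif outcome < 0:
            low = mid         # undershoot: the correction is too small
        else:
            return mid        # exact hit
    return 0.5 * (low + high)
```

The same routine would serve for the velocity coefficient by passing a callback that applies the coefficient instead of the angle correction. Each call to `run_trial` stands for one physical trial, so `max_trials` caps the number of robot runs.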

  14. Results

  15. Summary
      Technique | Succeeds for | Math
      Watch demo, mimic hand | None |
      Learn model, optimize demo trajectory | Parametric, single |
      Tune model, reoptimize | Nonparametric, single |
      Binary search for delta | Parametric, double |
      Binary search for c | Nonparametric, double |

  16. Discussion points • Was the reward function given or learned? • Does direct task-level learning make sense? Is it only useful in this task / implementation? Is it like the I in PID? • Nonparametrics don't avoid all modeling errors: poor planner? not enough data? • A priori knowledge: the human selects inputs, outputs, control system, perception, model selection, reward function, task segmenting, task factors • It works!
