Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)

Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso) Basia Korel, Brown University, cs2950-z, February 15, 2010

Outline • Overview • Demonstration Learning Algorithm • Confident Execution • Corrective Demonstration • Limitations • Option Class Algorithm • Experiments and Results • Conclusion

Overview • Addressing: equivalent action choices • The context: learning from demonstration • In the real world: equivalent actions demonstrated arbitrarily and inconsistently

Overview • Resulting problem: labeled training data lacks consistency • Contribution: identify, represent and enact equivalent action choices • Identify conflicting demonstrations • Represent choice of multiple actions in the policy • Common assumption of previous approaches: each state maps to one best action

Demonstration Learning Algorithm • Learning equivalent actions is built upon: • Confident Execution: to obtain teacher demonstrations and learn the action policy • Corrective Demonstration: to correct execution mistakes by additional demonstrations

Confident Execution • An interactive learning algorithm. Given the current world state, the robot: • Determines whether it needs a demonstration, based on its action-selection confidence • May request demonstrations to improve its policy

Confident Execution • Robot’s policy represented by classifier C : s → (a, c, db), mapping a state s to an action a, a confidence c and a decision boundary db • Trained using states as inputs and actions as labels • c is the measure of action-selection confidence
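The confidence check above can be sketched as follows. This is a minimal illustration, assuming a 1-nearest-neighbor policy, a confidence of 1 / (1 + distance to the nearest demonstration), and a fixed threshold. The names `classify`, `confident_step` and `ask_teacher`, the confidence formula, and the threshold value are all hypothetical and are not taken from the paper.

```python
import math

def classify(dataset, state):
    """Return (action, confidence) for `state` from a 1-NN lookup.

    Confidence is 1 / (1 + distance to the nearest demonstration): an
    assumed stand-in for the classifier's own confidence measure.
    """
    if not dataset:
        return None, 0.0
    nearest_state, action = min(dataset, key=lambda sa: math.dist(sa[0], state))
    return action, 1.0 / (1.0 + math.dist(nearest_state, state))

def confident_step(dataset, state, ask_teacher, threshold=0.5):
    """Act autonomously when confident; otherwise request a demonstration."""
    action, confidence = classify(dataset, state)
    if confidence >= threshold:
        return action, False           # executed autonomously
    action = ask_teacher(state)        # teacher demonstrates the action
    dataset.append((state, action))    # new labeled point updates the policy
    return action, True                # a demonstration was requested
```

With an empty dataset the first call always asks the teacher; later calls near an existing demonstration run autonomously.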

Corrective Demonstration • An algorithm that lets the teacher correct unwanted actions by providing supplementary corrective demonstrations
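A minimal sketch of the correction step, assuming the dataset is a list of (state, action) pairs and that the teacher signals acceptance by returning `None`. The function name `apply_correction` and this signalling convention are assumptions made for illustration.

```python
def apply_correction(dataset, state, executed_action, teacher_correction):
    """Add a corrective demonstration if the teacher disagrees.

    `teacher_correction` is None when the teacher accepts the executed
    action. Returns True when the dataset was updated.
    """
    if teacher_correction is None or teacher_correction == executed_action:
        return False
    # Store the state with the teacher's preferred action as a new label.
    dataset.append((state, teacher_correction))
    return True
```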

Limitations • Assumptions made: • One-to-one state-action mapping • Consistent demonstrations • A complete policy given enough demonstrations • Assumptions may fail in the real world! • Multiple equivalent actions cause ambiguity • Robot sensor noise may cause inconsistency

Option Class Algorithm • Option class: a cluster of data points that have been labeled with at least two different actions • Algorithm: extracts and explicitly models option classes in the robot’s policy

Option Class Algorithm
given demonstration dataset D
M ← PointsInLowConfidenceRegion(D)
d ← MeanNearestNeighborDist(D)
C ← ConnectedComponents(M, d)
for c ∈ C do
    A ← ActionClasses(c)
    if Size(c) > 3 and Size(A) > 1 then
        CreateClass(D, c, Option-A)
UpdateClassifier(D)
ResetClass(D)
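The extraction loop above can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation: states are numeric tuples, actions are strings, and the indices of low-confidence points are supplied by the caller, since the slides do not say how PointsInLowConfidenceRegion is computed. The helper names and the "Option-left/right" label format are hypothetical, and the final classifier update and label reset are omitted.

```python
import math

def mean_nearest_neighbor_dist(states):
    """Mean distance from each state to its closest other state (needs >= 2)."""
    total = 0.0
    for i, s in enumerate(states):
        total += min(math.dist(s, t) for j, t in enumerate(states) if j != i)
    return total / len(states)

def connected_components(states, d):
    """Group indices of states linked by chains of hops no longer than d."""
    unvisited = set(range(len(states)))
    components = []
    while unvisited:
        stack = [unvisited.pop()]
        component = []
        while stack:
            i = stack.pop()
            component.append(i)
            near = {j for j in unvisited if math.dist(states[i], states[j]) <= d}
            unvisited -= near
            stack.extend(near)
        components.append(sorted(component))
    return components

def find_option_classes(dataset, low_confidence_idx):
    """Return {member indices: option label} for clusters with mixed actions.

    `low_confidence_idx` lists dataset indices assumed to lie in
    low-confidence regions; computing it is left to the caller.
    """
    states = [s for s, _ in dataset]
    d = mean_nearest_neighbor_dist(states)
    m = [states[i] for i in low_confidence_idx]
    options = {}
    for comp in connected_components(m, d):
        members = [low_confidence_idx[k] for k in comp]
        actions = sorted({dataset[i][1] for i in members})
        # Same test as the pseudocode: Size(c) > 3 and Size(A) > 1.
        if len(members) > 3 and len(actions) > 1:
            options[tuple(members)] = "Option-" + "/".join(actions)
    return options
```

A cluster of four nearby points labeled with both "left" and "right" would become one option class, while a cluster carrying a single action is left unchanged.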

Experiment • Obstacle avoidance domain: • Gathered data:

Evaluation • Evaluation: Confident Execution with and without option classes • Metrics: • % of complete policies • # of demonstrations • NOT classification accuracy • Results with option classes: • Converges to a complete policy much more often • Requires far fewer demonstrations

Example Option Class Policies

Conclusion • Multiple equivalent actions exist in the real world • Model action choices explicitly in the policy • Domain limitations: discrete action labels

Thanks • Chad Jenkins, Brown RLAB and cs2950-z course staff/leaders
