Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso). Basia Korel, Brown University, cs2950-z, February 15, 2010
Outline • Overview • Demonstration Learning Algorithm • Confident Execution • Corrective Demonstration • Limitations • Option Class Algorithm • Experiments and Results • Conclusion
Overview • Addressing: equivalent action choices • The context: learning from demonstration • In the real world: equivalent actions demonstrated arbitrarily and inconsistently
Overview • Resulting problem: labeled training data lacks consistency • Contribution: identify, represent and enact equivalent action choices • Identify conflicting demonstrations • Represent choice of multiple actions in the policy • Common assumption of previous approaches: each state maps to one best action
Demonstration Learning Algorithm • Learning equivalent actions is built upon: • Confident Execution: to obtain teacher demonstrations and learn the action policy • Corrective Demonstration: to correct execution mistakes by additional demonstrations
Confident Execution • An interactive learning algorithm. Given the current world state, the robot: • Determines the need for a demonstration based on a confidence • May request demonstrations to improve policy
Confident Execution • Robot’s policy represented by classifier C : s → (a, c, db) • Trained using states as inputs and actions as labels • Measure of action selection confidence
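The confidence-gated loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: a k-nearest-neighbor vote stands in for the classifier, confidence is taken to be the fraction of agreeing neighbors, the decision boundary output db is omitted, and the names `classify`, `confident_execution_step`, `teacher`, and `threshold` are all hypothetical.

```python
import math
from collections import Counter


def classify(dataset, state, k=3):
    """Return (action, confidence) for `state` from (state, action) pairs.

    Assumption of this sketch: confidence is the fraction of the k nearest
    neighbors that voted for the winning action.
    """
    neighbors = sorted(dataset, key=lambda pair: math.dist(pair[0], state))[:k]
    votes = Counter(action for _, action in neighbors)
    action, count = votes.most_common(1)[0]
    return action, count / len(neighbors)


def confident_execution_step(dataset, state, teacher, threshold=0.8, k=3):
    """Act autonomously when confident, otherwise request a demonstration.

    Returns (action, requested), where `requested` is True if the teacher
    was asked. `teacher` is a hypothetical callable mapping state -> action.
    """
    if len(dataset) >= k:                      # need k points for a vote
        action, confidence = classify(dataset, state, k)
        if confidence >= threshold:
            return action, False               # executed autonomously
    demo_action = teacher(state)               # ask the teacher what to do
    dataset.append((state, demo_action))       # new labeled training point
    return demo_action, True
```

With a threshold of 0.8 and k = 3, a single disagreeing neighbor is enough to trigger a demonstration request, so the training set grows mainly in regions where the labels are mixed or sparse.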
Corrective Demonstration • An algorithm that lets the teacher correct unwanted robot actions by providing supplementary corrective demonstrations
Limitations • Assumptions made: • One-to-one state-action mapping • Consistent demonstrations • A complete policy given enough demonstrations • Assumptions may fail in the real world! • Multiple equivalent actions cause ambiguity • Robot sensor noise may cause inconsistency
Option Class Algorithm • Option class: a cluster of data points that have been labeled with at least two different actions • Algorithm: extracts and explicitly models option classes in the robot’s policy
Option Class Algorithm
given demonstration dataset D
M ← PointsInLowConfidenceRegion(D)
d ← MeanNearestNeighborDist(D)
C ← ConnectedComponents(M, d)
for c ∈ C do
    A ← ActionClasses(c)
    if Size(c) > 3 and Size(A) > 1 then
        CreateClass(D, c, Option-A)
UpdateClassifier(D)
ResetClass(D)
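The clustering part of the pseudocode above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: the low-confidence points are passed in as a list of indices (`low_conf`) rather than computed from a classifier, two points are treated as connected when they lie within distance d of each other, and the retraining and label-reset steps (UpdateClassifier, ResetClass) are left out. All function names are hypothetical.

```python
import math


def mean_nearest_neighbor_dist(states):
    """Average distance from each state to its closest other state.

    Assumes at least two states.
    """
    nearest = [min(math.dist(s, t) for j, t in enumerate(states) if j != i)
               for i, s in enumerate(states)]
    return sum(nearest) / len(nearest)


def connected_components(indices, states, d):
    """Group indices so that points within distance d share a component."""
    unvisited, components = set(indices), []
    while unvisited:
        frontier = [unvisited.pop()]
        component = set(frontier)
        while frontier:
            i = frontier.pop()
            linked = {j for j in unvisited
                      if math.dist(states[i], states[j]) <= d}
            unvisited -= linked
            component |= linked
            frontier.extend(linked)
        components.append(component)
    return components


def find_option_classes(states, actions, low_conf):
    """Return {point index: option label} for clusters with mixed actions."""
    d = mean_nearest_neighbor_dist(states)       # computed over all of D
    relabeled = {}
    for comp in connected_components(low_conf, states, d):
        acts = sorted({actions[i] for i in comp})
        if len(comp) > 3 and len(acts) > 1:      # same test as the slide
            label = "Option-" + "/".join(acts)
            relabeled.update({i: label for i in comp})
    return relabeled
```

A cluster whose points all carry the same action is skipped, as is any cluster of three or fewer points, so only sizeable regions with genuinely conflicting labels are turned into option classes.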
Experiment • Obstacle avoidance domain: • Gathered data:
Evaluation • Comparison: Confident Execution with and without option classes • Metrics: • % of complete policies • # of demonstrations • NOT classification accuracy • Results (with option classes): • Converges to a complete policy much more often • Requires far fewer demonstrations
Conclusion • Multiple equivalent actions exist in the real world • Model action choices explicitly in the policy • Domain limitations: discrete action labels
Thanks • Chad Jenkins, Brown RLAB and cs2950-z course staff/leaders