Learning from Disagreeing Demonstrators Bruno N. da Silva University of British Columbia bnds@cs.ubc.ca
Motivation • Some traditional approaches to Learning from Demonstration assume a single human expert • In some (subjective) tasks, there might not be a single expert • e.g., how to drive from point A to point B
Motivation • In general, these tasks involve more than one feature • e.g., in the driving domain, we want to minimize both travel time and the number of crashes • Different contexts lead to different tradeoffs between features • Idiosyncratic demonstrators do not reflect on their routine approach to the problem
Problem definition • How can we integrate idiosyncratic (disagreeing) demonstrations to form a homogeneous and effective policy?
Solution • We extend the framework presented by Argall et al., 2007 • Traditional demonstrations in the first stage • Robot execution and human critique in the second stage • Robot collects critiques • Robot updates policy
A little more concretely… • The first stage can be interpreted as producing a set of datapoints (p_m, a_n, c) • Perception p_m • Action a_n • Confidence c in the mapping • The criticism will affect the confidence • If the demonstrator praises the execution, increase c • If the demonstrator knocks the execution, decrease c
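The datapoints and the praise/knock rule above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the method from the talk: the names `DataPoint`, `select_point` and `apply_critique`, the fixed step size, the floor at zero, and the nearest-neighbour lookup used to pick which datapoint the robot executes are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DataPoint:
    perception: tuple[float, ...]  # p_m
    action: str                    # a_n
    confidence: float = 1.0        # c, confidence in the mapping p_m -> a_n


def select_point(points: list[DataPoint], perception: tuple[float, ...]) -> DataPoint:
    """Hypothetical policy lookup: nearest perception wins, ties go to higher c."""
    def key(point: DataPoint) -> tuple[float, float]:
        distance = sum((x - y) ** 2 for x, y in zip(point.perception, perception))
        return (distance, -point.confidence)
    return min(points, key=key)


def apply_critique(point: DataPoint, praised: bool, step: float = 0.1) -> None:
    """Praise increases c, a knock decreases it (step size is an assumption)."""
    delta = step if praised else -step
    # Assumption: confidence is kept non-negative.
    point.confidence = max(0.0, point.confidence + delta)


if __name__ == "__main__":
    points = [
        DataPoint((0.0, 0.0), "turn_left"),
        DataPoint((0.0, 0.0), "turn_right", confidence=0.5),
        DataPoint((5.0, 5.0), "go_straight"),
    ]
    chosen = select_point(points, (0.1, 0.0))        # turn_left: same distance, higher c
    apply_critique(chosen, praised=False, step=0.6)  # c drops from 1.0 to 0.4
    print(select_point(points, (0.1, 0.0)).action)   # turn_right now has the higher c
```

Because the critique only changes c, a knocked datapoint is not deleted; it simply loses ties against competing demonstrations of the same perception.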
But let’s not be naïve • If demonstrators “lie” in the demonstration, they would also “lie” in the criticism • Therefore, associate a reputation r_i with each demonstrator d_i • And update the confidence level carefully • c := c + r_i * f(feedback)
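The reputation-weighted update c := c + r_i * f(feedback) can be sketched as below. This is an illustrative sketch only: the mapping f(praise) = +1, f(knock) = -1, the function names, and the demonstrator names `alice` and `bob` are hypothetical choices, not taken from the talk.

```python
from __future__ import annotations

# Hypothetical f(feedback): praise counts as +1, a knock as -1.
FEEDBACK_SIGNAL = {"praise": 1.0, "knock": -1.0}


def f(feedback: str) -> float:
    return FEEDBACK_SIGNAL[feedback]


def update_confidence(
    confidence: float,
    critiques: list[tuple[str, str]],
    reputations: dict[str, float],
) -> float:
    """Apply c := c + r_i * f(feedback) once per (demonstrator, feedback) critique."""
    for demonstrator, feedback in critiques:
        confidence += reputations[demonstrator] * f(feedback)
    return confidence


if __name__ == "__main__":
    reputations = {"alice": 0.5, "bob": 0.1}
    # Alice (high reputation) praises the execution; Bob (low reputation) knocks it.
    c = update_confidence(1.0, [("alice", "praise"), ("bob", "knock")], reputations)
    print(round(c, 2))
```

With these numbers the low-reputation knock only partly offsets the high-reputation praise, so c ends above its starting value. Since the update is a sum, the order in which critiques arrive does not matter.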
Adjusting reputation ranks • Adjust r_i based on the (lack of) improvement resulting from d_i’s feedback • r_i := r_i + α * evaluation(feedback) • evaluation(·) can be interpreted as the Pareto improvement resulting from the feedback
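One way to read evaluation(·) as a Pareto test is sketched below, assuming each execution is summarized by a vector of cost features (e.g. travel time and number of crashes, both to be minimized) measured before and after d_i’s feedback is incorporated. The +1 / 0 / -1 scoring, the function names, and the step size `alpha` standing for the coefficient in the reputation update are all hypothetical choices for illustration.

```python
from __future__ import annotations


def pareto_evaluation(before: tuple[float, ...], after: tuple[float, ...]) -> int:
    """+1 if `after` Pareto-dominates `before`, -1 if `before` dominates `after`, else 0.

    Assumes every feature is a cost to minimize (e.g. travel time, number of crashes).
    """
    pairs = list(zip(after, before))
    if all(a <= b for a, b in pairs) and any(a < b for a, b in pairs):
        return 1
    if all(a >= b for a, b in pairs) and any(a > b for a, b in pairs):
        return -1
    return 0  # incomparable trade-off, or no change at all


def adjust_reputation(
    reputation: float,
    before: tuple[float, ...],
    after: tuple[float, ...],
    alpha: float = 0.1,
) -> float:
    """r_i := r_i + alpha * evaluation(feedback); alpha is a hypothetical step size."""
    return reputation + alpha * pareto_evaluation(before, after)


if __name__ == "__main__":
    before = (30.0, 2.0)  # (travel time in minutes, crashes) before d_i's feedback
    print(adjust_reputation(0.5, before, (25.0, 2.0)))  # faster, no extra crashes: r_i rises
    print(adjust_reputation(0.5, before, (25.0, 3.0)))  # faster but more crashes: r_i unchanged
```

Scoring trade-offs as 0 means a demonstrator is rewarded only for feedback that helps on every feature at once, which matches the idea that disagreeing demonstrators weigh the features differently.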
Current investigations • Policy convergence? • Rate of convergence? • What are the long-term effects on human demonstrators? • Frustration? • Repudiation? • Will critiques really be mindful?
Thanks! • Questions?