Learning from Disagreeing Demonstrators

Bruno N. da Silva, University of British Columbia, bnds@cs.ubc.ca

Motivation • Some traditional cases of Learning from Demonstration assume a human expert • In some (subjective) tasks, there might not be a single expert • How to drive from point A to B

Motivation • In general, these tasks involve more than one feature • e.g. in the driving domain, want to optimize travel time and number of crashes • Different contexts lead to different tradeoffs between features • Idiosyncratic demonstrators do not reflect on their routine approach to the problem

Problem definition • How can we integrate idiosyncratic (disagreeing) demonstrations to form a homogeneous and effective policy?

Solution • We extend the framework presented by Argall et al., 2007 • Traditional demonstrations in the first stage • Robot execution and human critique in the second stage • Robot collects critiques • Robot updates policy

The 1st stage of the mechanism

The 2nd stage of the mechanism

A little more concretely… • The first stage can be interpreted as a set of datapoints (p_m, a_n, c) • Perception p_m • Action a_n • Confidence c in the mapping • The criticism will affect the confidence • If the critic praises the execution, increase c • If the critic knocks the execution, decrease c
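A minimal Python sketch of the datapoint representation described on this slide may help. The class name `Datapoint`, the function `apply_critique`, the initial confidence of 1.0, and the fixed `step` of 0.1 are all assumptions made for illustration; the slides do not specify them.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Datapoint:
    """One first-stage demonstration sample (perception, action, confidence)."""
    perception: Tuple[float, ...]  # p_m: the observed state
    action: str                    # a_n: the demonstrated action
    confidence: float = 1.0        # c: assumed initial value, not from the slides


def apply_critique(point: Datapoint, praised: bool, step: float = 0.1) -> float:
    """Raise c when the execution is praised, lower it when it is knocked.

    `step` is a hypothetical constant; the slides only give the direction
    of the change, not its size.
    """
    point.confidence += step if praised else -step
    return point.confidence


point = Datapoint(perception=(0.0, 12.5), action="turn_left")
apply_critique(point, praised=True)   # confidence goes up
apply_critique(point, praised=False)  # confidence comes back down
```

Keeping the confidence on the datapoint itself means a later policy lookup can weight or discard mappings by `confidence` without consulting the critique history.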

But let’s not be naïve • If demonstrators “lie” in the demonstration, they would “lie” in the criticism • Therefore, associate a reputation r_i with each demonstrator d_i • And update the confidence level carefully • c := c + r_i * f(feedback)
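The reputation-weighted update c := c + r_i * f(feedback) can be sketched as follows. The slides do not define f, so the mapping used here (+1 for praise, -1 for a knock) and the names `f` and `update_confidence` are assumptions.

```python
def f(feedback: str) -> float:
    """Map a critique to a signed value.

    Assumed mapping: +1.0 for "praise", -1.0 for "knock". The slides leave
    f unspecified.
    """
    return {"praise": 1.0, "knock": -1.0}[feedback]


def update_confidence(c: float, r_i: float, feedback: str) -> float:
    """Apply c := c + r_i * f(feedback) for a critique weighted by reputation r_i."""
    return c + r_i * f(feedback)


# The same praise moves the confidence less when it comes from a
# low-reputation source than from a high-reputation one.
c_trusted = update_confidence(0.5, r_i=0.8, feedback="praise")
c_doubted = update_confidence(0.5, r_i=0.1, feedback="praise")
```

Under this sketch a reputation near zero effectively mutes a critic, which is the "careful" behaviour the slide asks for.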

Adjusting reputation ranks • And adjust r_i based on (lack of) improvement from d_i’s feedback • r_i := r_i + (step size) * evaluation(feedback) • evaluation(.) can be interpreted as a Pareto improvement from the feedback
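One way to read evaluation(.) as a Pareto improvement is sketched below. It assumes every feature is a cost to be minimized (for example travel time and number of crashes), that evaluation returns +1, -1, or 0, and that the step size is 0.1. None of these choices, nor the function names, come from the slides.

```python
from typing import Sequence


def pareto_improves(before: Sequence[float], after: Sequence[float]) -> bool:
    """True if `after` is no worse on every cost and strictly better on at least one."""
    pairs = list(zip(before, after))
    return all(a <= b for b, a in pairs) and any(a < b for b, a in pairs)


def evaluation(before: Sequence[float], after: Sequence[float]) -> float:
    """Score the effect of applying a critique (assumed three-valued scheme)."""
    if pareto_improves(before, after):
        return 1.0   # the feedback led to a Pareto improvement
    if pareto_improves(after, before):
        return -1.0  # the feedback made the policy Pareto-worse
    return 0.0       # a tradeoff or no change: leave the reputation alone


def update_reputation(r_i: float, before: Sequence[float],
                      after: Sequence[float], step: float = 0.1) -> float:
    """Apply r_i := r_i + step * evaluation(...); `step` is a hypothetical constant."""
    return r_i + step * evaluation(before, after)


# Features are (travel time in minutes, number of crashes), measured before
# and after the policy absorbed the critique.
r_new = update_reputation(0.5, before=(30.0, 2), after=(25.0, 2))
```

Treating a pure tradeoff (faster but more crashes) as neutral avoids penalizing a critic merely for holding a different preference between features.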

Current investigations • Policy convergence? • Rate of convergence? • What are the long-term effects on human demonstrators? • Frustration? • Repudiation? • Will critiques really be mindful?

Thanks! • Questions?
