Socially Guided Machine Learning

Socially Guided Machine Learning • Andrea L. Thomaz • PhD Thesis Defense • April 7, 2006


Thesis Committee
• Cynthia Breazeal, Associate Professor of Media Arts & Sciences, MIT
• Rosalind Picard, Professor of Media Arts & Sciences, MIT
• Andrew Barto, Professor of Computer Science, U. Massachusetts, Amherst

If robots are going to be successfully deployed in human environments, like homes, schools, and offices... they will need to learn new skills from everyday people.

Socially Guided Machine Learning • How can algorithms and systems take better advantage of learning from a human partner and the ways that partner will naturally approach teaching?

Personalization agents, Adaptive user interfaces: {Lashkari, Metral, Maes, Collaborative Interface Agents, AAAI 1994} {E. Horvitz et al., The Lumiere project, UAI 1998}
Active Learning, Learning with Queries: {Cohn, Ghahramani, Jordan, Active learning with statistical models, 1995} {Cohn et al., Semi-supervised clustering with user feedback, 2003}
Learning by Demonstration, Programming by Example: {Voyles, Khosla, Programming robotic agents by demonstration, 1998} {Lieberman, Your Wish is my Command, 2001}
Learning by Imitation: {S. Schaal review in TICS 1999}
Animal training techniques: {Stern, Frank, Resner, Virtual Petz, Agents 1998} {Blumberg et al., Integrated learning for interactive characters, SIGGRAPH 2002} {Kaplan et al., Robot clicker training, RAS 2002}
Reinforcement Learning with humans: {Isbell et al., Cobot: a social reinforcement learning agent, UAI 1998} {Evans, Varieties of Learning, AI Game Programming Wisdom, 2002} {Clouse, Utgoff, Teaching a Reinforcement Learner, ICML 1992}


Overview • Initial Experiment • Guidance • Transparency • Asymmetry

Research Platforms • Leonardo • Sophie’s Kitchen

The Leonardo Platform
Inputs:
• Pointing gesture recognition
• Eye cameras & environmental cameras for object recognition
• Head pose tracking (Darrell)
• Sphinx-4 speech recognition
Cognitive Architecture:
• Builds on c5m system of the Synthetic Characters Group

Sophie’s Kitchen • A “computer game”: players teach a virtual robot to bake a cake by sending various messages with a mouse interface.
• Sophie learns via Q-Learning (~10,000 states, 2-7 actions/state)
• The human player uses the mouse to give feedback to Sophie
• An object-specific reward is about a particular part of the world


Experiments in Sophie’s Kitchen • “How Do People Want to Teach?”

Findings: Guidance • People tried to use the object-specific rewards as future-directed guidance.

• Many object rewards were not about the last object used.
[Figure: each player’s % of object rewards about the last object, on a scale from “Never About Most Recent Object” to “Always About Most Recent Object”]

• Almost everyone gave rewards to the bowl or tray sitting empty on the shelf... a guidance reward.
[Figure: number of people who gave at least 1 reward to the empty bowl vs. zero rewards to the empty bowl]

Findings: People Infer a Mental Model • People gave more rewards after realizing their feedback made a difference

[Figure: ratio of human rewards to agent actions, shown as averages and for individual players]

Findings: Positive Bias • Even in the first quarter of their training sessions, most people had a positive bias in their rewards.


Guidance • What’s the right level of interaction?

The Guidance-Exploration spectrum
Guidance end:
• Learning by Demonstration
• Programming by Example
• Imitation learning
• Programming with natural language
Exploration end:
• RL with human reward
• Robot shaping
• RL game characters

Systems placed on the Guidance-Exploration spectrum: Leo Learning in a Social Dialog • Leo Learning in Guided Exploration • Adding Guidance to Sophie • Original Sophie

Learning within a Social Dialog • Goal-oriented task built based on known actions and tasks. • Expands hypotheses of goal representations. • Through tightly coupled dialog with a human partner, the hypothesis space is refined to the best representation of the task.

Tasks & Goals • Task structure & goals inferred in interaction with human teacher

Goals • Goal inferred: Criteria & Expectation features for each object incurring change over the task/action.
Example: Task X (objects A and B), goal inferred for object A:
  expectation: color: red
  criteria: type: toy, shape: cir., loc: 1,2,3, name: A

Expand Task Hypotheses • The exact action sequence is always a hypothesis • Also expands the hypothesis space of representations consistent with the current task example

Expand Task Hypotheses • Common Goal Belief = least common denominator for all the changed objects.
Example: Task X (objects A, B, C)
  Object A: expectation: color: red; criteria: type: toy, shape: cir., loc: 1,2,3, name: A
  Object C: expectation: color: red; criteria: type: toy, shape: cir., loc: 4,5,6, name: C
  Common goal belief: expectation: color: red; criteria: type: toy, shape: cir.

Expand Task Hypotheses • Expand various combinations
  expectation: color: red; criteria: shape: cir.
  expectation: color: red; criteria: type: toy, shape: cir.
  expectation: color: red; criteria: type: toy
And include the literal version too...
  expectation: color: red; criteria: type: toy, shape: cir., loc: 1,2,3, name: A
  expectation: color: red; criteria: type: toy, shape: cir., loc: 4,5,6, name: C
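The common goal belief and its expansion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout and the helper names (`common_features`, `common_goal`, `expand_criteria`) are hypothetical and not taken from the thesis; the feature values are the ones shown on the slide for objects A and C.

```python
from itertools import combinations

# Feature values from the slide's example; the dict layout is an assumption.
goal_a = {"expectation": {"color": "red"},
          "criteria": {"type": "toy", "shape": "cir.", "loc": (1, 2, 3), "name": "A"}}
goal_c = {"expectation": {"color": "red"},
          "criteria": {"type": "toy", "shape": "cir.", "loc": (4, 5, 6), "name": "C"}}


def common_features(feature_dicts):
    """Keep only the feature/value pairs shared by every object."""
    first, *rest = feature_dicts
    return {k: v for k, v in first.items()
            if all(d.get(k) == v for d in rest)}


def common_goal(goals):
    """Common goal belief: features shared by all changed objects."""
    return {"expectation": common_features([g["expectation"] for g in goals]),
            "criteria": common_features([g["criteria"] for g in goals])}


def expand_criteria(criteria):
    """Every non-empty combination of the shared criteria (hypothetical expansion)."""
    items = sorted(criteria.items())
    return [dict(combo)
            for size in range(1, len(items) + 1)
            for combo in combinations(items, size)]


common = common_goal([goal_a, goal_c])
hypotheses = expand_criteria(common["criteria"])
```

Here `common` keeps `color: red` as the expectation and `type: toy`, `shape: cir.` as criteria, and `hypotheses` holds the three criteria combinations listed on the slide. The literal per-object versions would be added to the hypothesis list separately.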

Hypothesis Testing • Current best task representation chosen through Bayesian likelihood, P(h|D) ∝ P(D|h)P(h) • D = examples seen of this task so far • P(D|h) = % of examples consistent with hypothesis h • P(h) = prior that prefers more specific hypotheses (more criteria over fewer)
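A minimal sketch of this scoring rule follows. The slide only says that P(D|h) is the share of consistent examples and that P(h) prefers more criteria; the specific prior used here (number of criteria divided by the largest number of criteria among the candidates) is an assumption, and the example data and function names are hypothetical.

```python
def likelihood(hypothesis, examples):
    """P(D|h): fraction of examples consistent with the hypothesis."""
    if not examples:
        return 1.0
    consistent = sum(
        all(example.get(k) == v for k, v in hypothesis.items())
        for example in examples
    )
    return consistent / len(examples)


def prior(hypothesis, max_criteria):
    """Assumed prior: grows with the number of criteria (more specific is preferred)."""
    return len(hypothesis) / max_criteria


def best_hypothesis(hypotheses, examples):
    """Pick the hypothesis with the highest unnormalized posterior P(D|h)P(h)."""
    max_criteria = max(len(h) for h in hypotheses)
    return max(hypotheses,
               key=lambda h: likelihood(h, examples) * prior(h, max_criteria))


# Hypothetical candidate criteria and hypothetical observed examples.
candidates = [{"type": "toy"}, {"shape": "cir."}, {"type": "toy", "shape": "cir."}]
seen = [
    {"type": "toy", "shape": "cir.", "name": "A"},
    {"type": "toy", "shape": "cir.", "name": "C"},
    {"type": "toy", "shape": "sq.", "name": "D"},
]
best = best_hypothesis(candidates, seen)
```

With this assumed prior, the more specific hypothesis (type and shape) scores 2/3, ahead of `type: toy` alone at 1/2, even though the latter is consistent with every example. A different prior would shift that trade-off.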


Utilizing Guidance in Sophie’s Kitchen • Interactive Q-Learning: the algorithm used in the original Sophie experiment, with a slight delay after each step to animate the act and receive human reward.
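The interactive loop can be sketched as ordinary tabular Q-learning with a pause after each action in which the human may send a reward. Everything specific below is an assumption made for illustration: the learning parameters, the choice to add the human reward to the environment reward, the `ChainEnv` toy world, and the `simulated_teacher` callback are hypothetical stand-ins, not the Sophie’s Kitchen implementation.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.3, 0.75, 0.1  # assumed values, not from the slides


class ChainEnv:
    """Hypothetical 3-state chain; reaching state 2 ends the episode with reward 1."""

    def reset(self):
        self.state = 0
        return self.state

    def actions(self, state):
        return ["left", "right"]

    def step(self, action):
        self.state = max(0, self.state - 1) if action == "left" else self.state + 1
        done = self.state == 2
        return self.state, (1.0 if done else 0.0), done


def simulated_teacher(state, action):
    """Stand-in for the human's mouse feedback after watching the action."""
    return 0.5 if action == "right" else -0.5


def interactive_q_learning(env, get_human_reward, episodes=20, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # maps (state, action) to its estimated value
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            actions = env.actions(state)
            if rng.random() < EPSILON:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, env_reward, done = env.step(action)
            # The "slight delay": the act is shown, then human feedback is collected.
            reward = env_reward + get_human_reward(state, action)
            best_next = 0.0 if done else max(
                q[(next_state, a)] for a in env.actions(next_state))
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q


q = interactive_q_learning(ChainEnv(), simulated_teacher)
```

In this toy run the teacher consistently rewards moving right, so the learned values favor "right" in both non-terminal states.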

Utilizing Guidance in Sophie’s Kitchen
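The slides do not spell out how a guidance message changes Sophie’s behavior. One plausible reading, shown here purely as an assumption, is that guidance toward an object narrows action selection to the actions involving that object. The `(verb, object)` action encoding, the action names, and `select_action` itself are hypothetical.

```python
import random


def select_action(q_values, actions, guided_object=None, epsilon=0.1, rng=random):
    """Epsilon-greedy choice, restricted to the guided object when one is given.

    actions: list of (verb, object) pairs available in the current state.
    q_values: dict mapping an action to its current value estimate.
    """
    candidates = actions
    if guided_object is not None:
        focused = [a for a in actions if a[1] == guided_object]
        if focused:  # fall back to all actions if none involve the object
            candidates = focused
    if rng.random() < epsilon:
        return rng.choice(candidates)
    return max(candidates, key=lambda a: q_values.get(a, 0.0))


# Hypothetical state: two available actions with made-up value estimates.
available = [("pick-up", "tray"), ("pick-up", "bowl")]
values = {("pick-up", "tray"): 1.0, ("pick-up", "bowl"): 0.2}

unguided = select_action(values, available, epsilon=0.0)
guided = select_action(values, available, guided_object="bowl", epsilon=0.0)
```

Without guidance the greedy choice is the higher-valued tray action; with guidance toward the bowl, only bowl actions are considered. Feedback would then still be applied through the usual reward update.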

Effects of Guidance • 28 subjects played Sophie’s Kitchen in lab • Conditions: feedback only vs. feedback + guidance • Result: feedback + guidance >> feedback only; t(26), p<.01 for each

Leo Learning in a Social Dialog • Benefits: social cues frame the learning interaction; assuming a goal-oriented partner helps build a flexible task
Leo Learning in Guided Exploration • Benefits: teacher need not know the task exactly; teacher need not be present for learning
Also on the Guidance-Exploration spectrum: Sophie with Guidance

Guided Exploration • Self-Motivated Behavior: Novelty, Mastery, Activity
[Figure: each on a lo-to-hi scale, with an initial range and drift]


Task Learning Action Group • Novelty Action • Explore Action • Relevance Action

Task Representation • Task Option Model
  criteria: ...
  expectation: goal features ...
  action values: State x; Act.1 : val, Act.2 : val, ....
  learning mechanism: RL with self-created goal; learn a hierarchical policy to achieve it (Options & Intra-Option learning)

Guided Exploration • Human partner influences learning:
• Directing attention
• Suggesting actions
• Labeling goal states
• Providing positive / negative feedback

Leo’s Virtual Playroom

