Socially Guided Machine Learning • Andrea L. Thomaz • PhD Thesis Defense • April 7, 2006
Thesis Committee
• Cynthia Breazeal, Associate Professor of Media Arts & Sciences, MIT
• Rosalind Picard, Professor of Media Arts & Sciences, MIT
• Andrew Barto, Professor of Computer Science, U. Massachusetts, Amherst
If robots are going to be successfully deployed in human environments, like homes, schools, and offices... they will need to learn new skills from everyday people.
Socially Guided Machine Learning • How can algorithms and systems take better advantage of learning from a human partner and the ways that partner will naturally approach teaching?
Personalization agents, Adaptive user interfaces
• {Lashkari, Metral, Maes, Collaborative Interface Agents, AAAI 1994}
• {E. Horvitz et al., The Lumiere project, UAI 1998}
Active Learning, Learning with Queries
• {Cohn, Ghahramani, Jordan, Active learning with statistical models, 1995}
• {Cohn et al., Semi-supervised clustering with user feedback, 2003}
Learning by Demonstration, Programming by Example
• {Voyles, Khosla, Programming robotic agents by demonstration, 1998}
• {Lieberman, Your Wish is my Command, 2001}
Learning by Imitation
• {S. Schaal, review in TICS 1999}
Animal training techniques
• {Stern, Frank, Resner, Virtual Petz, Agents 1998}
• {Blumberg et al., Integrated learning for interactive characters, SIGGRAPH 2002}
• {Kaplan et al., Robot clicker training, RAS 2002}
Reinforcement Learning with humans
• {Isbell et al., Cobot: a social reinforcement learning agent, UAI 1998}
• {Evans, Varieties of Learning, AI Game Programming Wisdom, 2002}
• {Clouse, Utgoff, Teaching a Reinforcement Learner, ICML 1992}
Overview • Initial Experiment • Guidance • Transparency • Asymmetry
Research Platforms • Leonardo • Sophie’s Kitchen
The Leonardo Platform
Inputs
• Pointing gesture recognition
• Eye cameras & environmental cameras for object recognition
• Head pose tracking (Darrell)
• Sphinx-4 speech recognition
Cognitive Architecture
• Builds on c5m system of the Synthetic Characters Group
Sophie’s Kitchen
• A “computer game”: players teach a virtual robot to bake a cake by sending various messages with a mouse interface.
• Sophie learns via Q-Learning (~10,000 states, 2-7 actions per state).
• The human player uses the mouse to give feedback to Sophie.
• An object-specific reward is about a particular part of the world.
Overview • Initial Experiment • Guidance • Transparency • Asymmetry
Experiments in Sophie’s Kitchen • “How Do People Want to Teach?”
Findings: Guidance • People tried to use the object-specific rewards as future-directed guidance.
[Figure: each player’s % of object rewards about the last object, on a scale from “Never About Most Recent Object” to “Always About Most Recent Object”] • Many object rewards not about the last object used
[Figure: number of people who gave at least 1 reward to the empty bowl vs. zero rewards to the empty bowl] • Almost everyone gave rewards to the bowl or tray sitting empty on the shelf... a guidance reward.
Findings: People Infer a Mental Model • People gave more rewards after realizing their feedback made a difference
[Figure: ratio of human rewards to agent actions; three panels, each showing individual players and the average]
Findings: Positive Bias • Even in the first quarter of their training sessions, most people had a positive bias in their rewards.
Overview • Initial Experiment • Guidance • Transparency • Asymmetry
Guidance • What’s the right level of interaction?
Guidance / Exploration
Guidance
• Learning by Demonstration
• Programming by Example
• Imitation learning
• Programming with natural language
Exploration
• RL with human reward
• Robot shaping
• RL game characters
Guidance / Exploration • Leo Learning in a Social Dialog • Leo Learning in Guided Exploration • Adding Guidance to Sophie • Original Sophie
Learning within a Social Dialog • Goal-oriented task built based on known actions and tasks. • Expands hypotheses of goal representations. • Through tightly coupled dialog with a human partner, the hypothesis space is refined to the best representation of the task.
Tasks & Goals • Task structure & goals inferred in interaction with human teacher
Goals
• Goal inferred: criteria & expectation features for each object incurring change over the task/action.
• Example (Task X, objects A and B), goal for object A:
  expectation: color: red
  criteria: type: toy, shape: cir., loc: 1,2,3, name: A
Expand Task Hypotheses • Exact action sequence is always a hypothesis • AND expands hypothesis space of representations consistent with the current task example
Expand Task Hypotheses
• Common Goal Belief = least common denominator for all the changed objects.
• Example (Task X, objects A, B, and C):
  Object A: expectation: color: red; criteria: type: toy, shape: cir., loc: 1,2,3, name: A
  Object C: expectation: color: red; criteria: type: toy, shape: cir., loc: 4,5,6, name: C
  Common goal: expectation: color: red; criteria: type: toy, shape: cir.
Expand Task Hypotheses
• Expand various combinations:
  expectation: color: red; criteria: shape: cir.
  expectation: color: red; criteria: type: toy, shape: cir.
  expectation: color: red; criteria: type: toy
• And include the literal version too...
  expectation: color: red; criteria: type: toy, shape: cir., loc: 1,2,3, name: A
  expectation: color: red; criteria: type: toy, shape: cir., loc: 4,5,6, name: C
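The expansion step on the "Expand Task Hypotheses" slides can be illustrated with a short Python sketch. It rests on assumptions that are not stated in the slides: objects are flat feature dictionaries, the common goal belief is the set of feature/value pairs shared by all changed objects, and "various combinations" means every non-empty subset of those shared criteria. The function names are hypothetical.

```python
from itertools import combinations


def common_features(objects):
    """Feature/value pairs shared by every object (the assumed 'common goal belief')."""
    shared = dict(objects[0])
    for obj in objects[1:]:
        shared = {k: v for k, v in shared.items() if obj.get(k) == v}
    return shared


def expand_hypotheses(changed_objects, expectation):
    """Return candidate goal hypotheses as (criteria, expectation) pairs."""
    hypotheses = []
    shared = common_features(changed_objects)
    keys = sorted(shared)
    # Every non-empty combination of the shared criteria (assumed reading).
    for size in range(1, len(keys) + 1):
        for subset in combinations(keys, size):
            hypotheses.append(({k: shared[k] for k in subset}, dict(expectation)))
    # The literal version: each changed object's full description.
    for obj in changed_objects:
        hypotheses.append((dict(obj), dict(expectation)))
    return hypotheses


obj_a = {"type": "toy", "shape": "cir.", "loc": (1, 2, 3), "name": "A"}
obj_c = {"type": "toy", "shape": "cir.", "loc": (4, 5, 6), "name": "C"}
hyps = expand_hypotheses([obj_a, obj_c], {"color": "red"})
```

With the two example objects from the slides, this produces three generalized hypotheses (shape only, type only, and both) plus the two literal ones.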
Hypothesis Testing
• Current best task representation chosen through Bayesian likelihood: P(h|D) ∝ P(D|h) P(h)
• D = examples of this task seen so far
• P(D|h) = fraction of examples consistent with hypothesis h
• P(h) = prior that prefers more specific hypotheses (more criteria over fewer)
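A minimal Python sketch of this selection rule follows. The slide does not give the exact form of the prior, so an unnormalized prior equal to the number of criteria is assumed here purely for illustration; the representation of hypotheses and examples as feature dictionaries and all function names are also assumptions.

```python
def likelihood(criteria, examples):
    """P(D|h): fraction of examples whose features satisfy all criteria."""
    if not examples:
        return 0.0
    consistent = sum(
        all(ex.get(k) == v for k, v in criteria.items()) for ex in examples
    )
    return consistent / len(examples)


def prior(criteria):
    """P(h), unnormalized (assumed form): more criteria scores higher."""
    return len(criteria)


def best_hypothesis(hypotheses, examples):
    """Pick the hypothesis maximizing P(D|h) * P(h)."""
    return max(hypotheses, key=lambda h: likelihood(h, examples) * prior(h))


examples = [
    {"type": "toy", "shape": "cir.", "name": "A"},
    {"type": "toy", "shape": "cir.", "name": "C"},
]
hypotheses = [
    {"type": "toy"},
    {"type": "toy", "shape": "cir."},
    {"type": "toy", "shape": "cir.", "name": "A"},
]
best = best_hypothesis(hypotheses, examples)
```

Here the two-criteria hypothesis wins: it is consistent with both examples and more specific than `{"type": "toy"}`, while the most specific hypothesis is consistent with only one of the two examples.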
Utilizing Guidance in Sophie’s Kitchen
• Interactive Q-Learning algorithm used in the original Sophie experiment
• Slight delay to animate the action and receive human reward
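The algorithm listing from this slide did not survive in the transcript, so the following is a generic tabular Q-learning sketch and not the thesis's algorithm. It assumes that the human reward is simply added to the environment reward and that a guidance message narrows the set of actions considered at the next step. The constants, state names, and action names are illustrative only.

```python
import random
from collections import defaultdict

ALPHA = 0.3   # learning rate (illustrative value)
GAMMA = 0.75  # discount factor (illustrative value)


def choose_action(q, state, actions, guided=None, epsilon=0.1, rng=random):
    """Epsilon-greedy choice; guidance, if any, narrows the candidate set."""
    candidates = [a for a in actions if guided is None or a in guided] or list(actions)
    if rng.random() < epsilon:
        return rng.choice(candidates)
    return max(candidates, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, next_actions):
    """Standard one-step Q-learning update."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])


q = defaultdict(float)
# One interaction step: the human suggests "pick_up_bowl", then rewards it.
action = choose_action(q, "start", ["pick_up_bowl", "pick_up_spoon"],
                       guided={"pick_up_bowl"}, epsilon=0.0)
env_reward, human_reward = 0.0, 1.0
q_update(q, "start", action, env_reward + human_reward, "holding_bowl", ["put_down"])
```

In an interactive setting, the update would run after the short animation delay mentioned on the slide, so that any human reward for the action can be collected first.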
Effects of Guidance
• 28 subjects played Sophie’s Kitchen in the lab
• Conditions: feedback only vs. feedback + guidance
• Feedback + guidance >> feedback only: t(26); p<.01 for each
Guidance / Exploration • Leo Learning in a Social Dialog • Sophie with Guidance • Leo Learning in Guided Exploration
Benefits:
• Teacher need not know the task exactly
• Teacher need not be present for learning
• Social cues frame the learning interaction
• Assumes goal-oriented partner helps build flexible task
Guided Exploration: Self-Motivated Behavior
[Figure: Novelty, Mastery, and Activity, each on a lo-hi scale with an initial range and drift]
Task Learning Action Group • Novelty Action • Explore Action • Relevance Action
Task Representation
• Task Option Model:
  criteria: ...
  expectation: Goal features ...
  action values: State x: Act.1: val, Act.2: val, ....
• Learning mechanism: RL with self-created goal, learn hierarchical policy to achieve it (Options & Intra-Option learning)
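As a rough illustration, the three fields of the Task Option Model could be held in a small Python container like the one below. This is a hypothetical sketch: reading "criteria" as conditions under which the task applies and "expectation" as the goal features to achieve is an assumption, and the sketch shows only how a self-created goal could supply a reward signal, not Options or Intra-Option learning.

```python
from dataclasses import dataclass, field


@dataclass
class TaskOption:
    """Hypothetical container mirroring the slide's Task Option Model fields."""
    criteria: dict      # features assumed to hold for the task to apply
    expectation: dict   # goal features the task is expected to bring about
    action_values: dict = field(default_factory=dict)  # (state, action) -> value

    def applies(self, features):
        """True if every criterion matches the given features."""
        return all(features.get(k) == v for k, v in self.criteria.items())

    def goal_reached(self, features):
        """True if every expected goal feature matches the given features."""
        return all(features.get(k) == v for k, v in self.expectation.items())

    def self_reward(self, features):
        """Reward from the self-created goal: 1.0 once the expectation holds."""
        return 1.0 if self.goal_reached(features) else 0.0


option = TaskOption(criteria={"type": "toy"}, expectation={"color": "red"})
before = {"type": "toy", "color": "blue"}
after = {"type": "toy", "color": "red"}
rewards = (option.self_reward(before), option.self_reward(after))
```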
Guided Exploration
Human partner influences learning:
• Directing attention
• Suggesting actions
• Labeling goal states
• Providing positive / negative feedback