
Socially Guided Learning



Presentation Transcript


  1. Socially Guided Learning • Perspectives from child development • Perspectives from machine learning and robotics • Current work of the Robotic Life Group • Future goals

  2. Situated Learning • Vygotsky - Zone of Proximal Development (ZPD) - Adults provide support that lets the child accomplish new things they would not be able to do alone. • Internal/external planes - everything internal is first external: • learning to control one’s own behavior involves a transition from external speech to internal speech (speech used to influence others becomes internalized and is able to influence the self). • Ex: learning to direct one’s attention is internalizing the parent’s direction of attention. • Ex: pointing starts as an unsuccessful grasp that gains meaning as a gesture through other people’s responses.

  3. Situated Learning • Social Scaffolding • Bruner: Scaffolding is asymmetric cooperation that becomes symmetric over time • Roles are initially demarcated but eventually become reversible; the child grows into a more capable partner. • Cazden: Scaffolding externalizes the thinking process • In problem solving (reading), a common simplification: the adult switches from wh-questions to yes/no questions (“Do you know where X is?”–“No...”, then the adult shapes the search to externalize the correct mental processes: “Is it ... ?”). The yes/no questions are often absurd, to define the extremes. • Trevarthen: Scaffolding is dynamic • Referential gestures - if the infant doesn’t understand a pointing reference, the adult will simplify and touch the object • Language - initially the adult treats all speech as conversational; over time expectations rise, scaffolding the child’s abilities

  4. Situated Learning • Arbib: key aspects of a parent/child tutoring session • Placing important objects close to the child’s face • Embodying – guiding the child’s body through the action • Talking while acting (word/action correspondence) • Arranging the environment so the desired action is reachable • Simplifying the task after a failed attempt • Approving of embellishment after a successful attempt • Demonstrating the action in the infant’s line of sight, to introduce important affordances • Inviting the child to imitate after a demonstration (assisted-imitation turn-taking) • The child pays attention to the gaze direction of the adult to direct their own attention

  5. Situated Learning • What motivates a child to learn from an adult? • The ‘like-me’ mapping is innate, motivates learning, and motivates the desire to control one’s own behavior: • Meltzoff: the ability to map actions seen in others onto actions done by the self appears immediately (day-old infants imitate facial expressions), thus the mapping is somewhat innate. • Lave, Wenger: ‘Legitimate peripheral participation’; the motivation to become a full participant in a practice is the driving force in learning it. Learning is motivated by children ‘wanting to become full participants in the adult world’ • Litowitz: the child stops imitating due to a desire to take on the adult role of structuring the activity (wanting to choose the clothes they wear, resisting being told what to do, etc.)

  6. Machine Learning • Unsupervised learning • Machine analyzes unlabeled data to find regularities, clustering. • Supervised learning • Human provides labeled examples of desired result - various approaches learn decision boundaries or function approximations to handle future unlabeled data appropriately. • Reinforcement learning • Human provides reinforcement signal - telling the machine what to learn rather than how - machine learns how to best behave in various states of the world to maximize reward.
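A minimal sketch (with purely illustrative toy data) of what the human provides in each paradigm:

```python
# Unsupervised: the machine only sees unlabeled observations and looks for structure.
unlabeled = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]

# Supervised: the human additionally provides the desired label for each example.
labeled = [([0.1, 0.2], "off"), ([0.9, 0.8], "on"), ([0.2, 0.1], "off")]

# Reinforcement: the human (or environment) provides only a scalar reward after the
# machine acts, specifying *what* to achieve rather than *how* to achieve it.
def reward(button_states):
    return 1.0 if all(button_states) else 0.0
```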

  7. Socially Guided Machine Learning • Human can be more than data labeler or reinforcement signal • Teachers direct a learner's attention, structure their experiences, support their learning attempts, and regulate the complexity of information. • The learner contributes by revealing their internal state to help guide the teaching process. • Tutelage is a fundamentally collaborative process, enabling a learner to acquire new concepts and skills from few examples.

  8. Related Works • Active learning, learning with queries • The machine does unsupervised learning to identify examples that lie on decision boundaries, then queries the human for labels of just those interesting examples. • Learning by demonstration • Perception/action mapping - forward models (Schaal) • Generally such approaches use demonstration to seed the search space, optimizing a pre-defined goal function • Blumberg • Inspired by animal training; the interactive dog character is an example of a system that learns from guided exploration and is rewarding to teach - it has transparency.

  9. Related Works • Lauria et al.: Instruction-Based Learning • Using natural language to instruct a robot in a navigation task. • Words and phrases are grounded to motor primitives through a corpus-based approach • The human can also teach new phrases, constructed from known ones. • Instruction occurs prior to execution, requiring the human to mentally represent the task ahead of time.

  10. Related Works • Nicolescu: robot task learning (navigation) • The robot follows the human through the task, learning a path through an environment of objects. • Short verbal commands (‘here’, ‘start’, etc.) point out key information and frame the demo. • The human can interrupt to correct the task model during a demo (‘stop’). • Learns a generalized task model over a few examples

  11. Dialog systems • Discourse analysis and collaborative dialog theory are used in plan recognition and tutorial systems • Given a database of domain-specific plans, generic dialog modeling lets the machine assist or teach the human how to progress through and execute the plan. • Collagen - an architecture for tutorial systems • the focus of attention constrains the search over possible plans. • plans are constructed incrementally through dialog. • clarification dialog resolves ambiguity.

  12. Socially Guided Machine Learning • Platform of the Robotic Life Group: Leonardo • Learning by human tutelage leverages the structure provided through interpersonal interaction. It is a collaboration between the teacher and the learner. • Exploring how social guidance interacts with traditional algorithms (such as Bayesian hypothesis testing) in an interactive learning scenario. • Allowing a human to teach a robot a new task using dialog and gesture

  13. Leo’s Learning - Overview • The human interactively instructs the robot, building a new task from its set of known actions and tasks. • Each task example yields a number of potential task representation hypotheses, letting Leo build a flexible, goal-oriented, hierarchical task model. • Executing tasks and incorporating feedback narrows the hypothesis space, converging on the best representation.

  14. Goal Oriented Tasks • Tasks and their constituent actions are Action-Tuples: <precondition, executable, goal>, yielding a hierarchical representation. • Learning: pay attention to the actions performed and infer goals upon completion of each action, sub-task, and task; the result is an overall task goal plus component goals. • Learning is naturally ambiguous, so the task representation is kept flexible as a set of hypotheses. • A task hypothesis has executables, a goal representation, and the number of examples that have been consistent with this representation.
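A rough sketch of how such an action-tuple might be represented; the class and field names are illustrative, not the group's actual implementation:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass
class ActionTuple:
    # <precondition, executable, goal>: the same structure serves for a primitive
    # action (executable = a motor primitive name) and for a task (executable = a
    # list of constituent ActionTuples), which is what makes the model hierarchical.
    precondition: dict = field(default_factory=dict)   # what must hold before starting
    executable: Union[str, List["ActionTuple"]] = ""   # primitive name, or sub-tuples
    goal: Optional[dict] = None                         # inferred when it completes
```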

  15. Goal Types • Goals are object-based - a set of beliefs about what must hold true • Just-do-it goals need to be executed regardless of their impact; each has a single goal belief. • State-change goals represent a world change • there is ambiguity around what exactly about the end state was the goal (the change to an object, a class of objects, the whole world, etc.) • a goal belief is made for each object that incurred a change during the action or task.

  16. Goal Beliefs • A Goal Belief has criteria and expectation features • Criteria: features that remained constant • Expectation: features of the object that changed. • Example: Task-X finishes with toy A unchanged and toy B changed from green to red. • The state-change goal for Task-X has one goal belief • criteria features [type=toy, location=xyz, label=B, etc.] • expectation feature [color=red].
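One way to read the criteria/expectation split in code, as a sketch over simple feature dictionaries (the helper name and feature keys are assumptions):

```python
def make_goal_belief(before: dict, after: dict) -> dict:
    """Criteria = features that stayed constant; expectation = features that changed."""
    criteria = {k: v for k, v in before.items() if after.get(k) == v}
    expectation = {k: v for k, v in after.items() if before.get(k) != v}
    return {"criteria": criteria, "expectation": expectation}

# Toy B changed from green to red during Task-X (toy A is unchanged, so no belief for it):
toy_b_before = {"type": "toy", "label": "B", "location": (1, 2, 3), "color": "green"}
toy_b_after = {"type": "toy", "label": "B", "location": (1, 2, 3), "color": "red"}
print(make_goal_belief(toy_b_before, toy_b_after))
# {'criteria': {'type': 'toy', 'label': 'B', 'location': (1, 2, 3)}, 'expectation': {'color': 'red'}}
```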

  17. Making Task Hypotheses • The literal representation of the task example (these exact actions and these specific goal beliefs) is always kept as one hypothesis. • Other, more general hypotheses may also be made • If the task consisted of different kinds of actions, it is assumed that the specific actions and their sequence matter, and no generalization happens • If all task actions have the same primitive type but different objects of attention, the hypothesis space can be expanded

  18. Making Task Hypotheses • Primitive type = generalized task action (GTA). • Common Goal Belief (CGB) = the least common denominator of all the task’s goal beliefs. • Example: a task with two goal beliefs, [criteria: red, toy, loc (1,2,3)][expectation: on] and [criteria: red, toy, loc (4,5,6)][expectation: on], yields the CGB [criteria: red, toy][expectation: on]
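The "least common denominator" can be pictured as an intersection over the goal beliefs' features; a small sketch continuing the dictionary representation above:

```python
def common_goal_belief(goal_beliefs: list) -> dict:
    """Keep only criteria (and expectation) features shared, with equal values, by all beliefs."""
    criteria = dict(goal_beliefs[0]["criteria"])
    expectation = dict(goal_beliefs[0]["expectation"])
    for gb in goal_beliefs[1:]:
        criteria = {k: v for k, v in criteria.items() if gb["criteria"].get(k) == v}
        expectation = {k: v for k, v in expectation.items() if gb["expectation"].get(k) == v}
    return {"criteria": criteria, "expectation": expectation}

beliefs = [
    {"criteria": {"color": "red", "type": "toy", "loc": (1, 2, 3)}, "expectation": {"state": "on"}},
    {"criteria": {"color": "red", "type": "toy", "loc": (4, 5, 6)}, "expectation": {"state": "on"}},
]
print(common_goal_belief(beliefs))
# {'criteria': {'color': 'red', 'type': 'toy'}, 'expectation': {'state': 'on'}}
```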

  19. Making Task Hypotheses • Make a task hypothesis for each of the various combinations of features from the Common Goal Belief • The GTA is the executable for each • This expansion yields a hypothesis space of all representations consistent with the current task example.

  20. Making Task Hypotheses • Continuing the example… • CGB: [criteria: red, toy] [expectation: on] • GTA: press • Then expanded task hypothesis space => • Action: press; Goal: [criteria: red, toy] [exp: on] • Action: press; Goal: [criteria: toy] [exp: on] • Action: press; Goal: [criteria: red] [exp: on]
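The expansion on this slide can be sketched as enumerating subsets of the CGB's criteria features and pairing each with the GTA (whether the empty subset is also included is an implementation detail; this sketch generates only non-empty subsets):

```python
from itertools import combinations

def expand_hypotheses(gta: str, cgb: dict) -> list:
    """One hypothesis per non-empty subset of the CGB's criteria features."""
    hypotheses = []
    keys = sorted(cgb["criteria"])
    for r in range(len(keys), 0, -1):                      # most specific first
        for subset in combinations(keys, r):
            criteria = {k: cgb["criteria"][k] for k in subset}
            hypotheses.append({"action": gta,
                               "goal": {"criteria": criteria,
                                        "expectation": dict(cgb["expectation"])}})
    return hypotheses

cgb = {"criteria": {"color": "red", "type": "toy"}, "expectation": {"state": "on"}}
for h in expand_hypotheses("press", cgb):
    print(h["action"], h["goal"]["criteria"], h["goal"]["expectation"])
# press {'color': 'red', 'type': 'toy'} {'state': 'on'}
# press {'color': 'red'} {'state': 'on'}
# press {'type': 'toy'} {'state': 'on'}
```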

  21. Hypothesis Testing • The current best task representation is chosen through a Bayesian likelihood method. • P(h|D) ∝ P(D|h)P(h) • D is the set of all examples seen for this task. • P(D|h) is the fraction of examples whose state change is consistent with the goal representation of h. • P(h) is a prior that prefers a more specific hypothesis over a general one. • E.g., when the task is first learned, every hypothesis is equally supported by the data, so the most specific representation is chosen for the next execution.
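A sketch of the selection rule; only P(h|D) ∝ P(D|h)P(h) comes from the slide, while the consistency check and the exact form of the specificity-favoring prior are assumptions:

```python
def consistent_with(h: dict, example: dict) -> bool:
    """An example supports h if some changed object matches h's criteria and expectation."""
    wanted = {**h["goal"]["criteria"], **h["goal"]["expectation"]}
    return any(all(obj.get(k) == v for k, v in wanted.items())
               for obj in example["changed_objects"])

def select_hypothesis(hypotheses: list, examples: list) -> dict:
    """Pick the hypothesis maximizing P(D|h) * P(h)."""
    def likelihood(h):                      # fraction of examples consistent with h
        return sum(consistent_with(h, ex) for ex in examples) / len(examples)

    def prior(h):                           # more criteria features => more specific
        return 1.0 + 0.01 * len(h["goal"]["criteria"])

    return max(hypotheses, key=lambda h: likelihood(h) * prior(h))
```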

  22. Task Execution and Feedback • The task's executables are expanded onto a focus stack • Execution pops actions when they are done, or tests a sub-task's goal before pushing its actions onto the stack. • If the likelihood of the hypothesis is < 0.5, Leo is unconfident: he frequently looks to the instructor during execution, and leans forward with ears perked when finished. • Positive verbal feedback confirms the hypothesis; negative verbal feedback means Leo expects refinement • Every task execution results in a new task example • The Bayesian method selects the hypothesis for next time
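A compact sketch of that execution loop, assuming the list-based ActionTuple from the earlier sketch; the perform and glance callbacks stand in for Leo's motor and social behaviors:

```python
def goal_satisfied(goal: dict, world_objects: list) -> bool:
    """True if some object already matches the goal's criteria and expectation."""
    wanted = {**goal["criteria"], **goal["expectation"]}
    return any(all(obj.get(k) == v for k, v in wanted.items()) for obj in world_objects)

def execute_task(task, world_objects, confidence, perform, glance_at_instructor):
    """Expand the task onto a focus stack, skipping steps whose goals already hold."""
    stack = [task]
    while stack:
        step = stack.pop()
        if step.goal is not None and goal_satisfied(step.goal, world_objects):
            continue                                  # sub-task goal already met
        if isinstance(step.executable, list):         # a sub-task: push its steps
            stack.extend(reversed(step.executable))
        else:
            perform(step.executable)                  # a primitive action
            if confidence < 0.5:                      # unconfident: solicit feedback
                glance_at_instructor()
```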

  23. Social Cues for Scaffolding

  24. Social Cues for Scaffolding

  25. Learning - Trial 1 • Human: “Leo, turn the buttons on” / Robot: shrugs, confused face / Social function: indicates that he does not know how • Human: “I can teach you to turn the buttons on” / Robot: confirming head nod / Social function: establishes teacher and learner roles • Human: “Press the red button” / Robot: presses the red button, looking at it / Social function: learning by doing, encodes the button state change as the goal of the press action • Human: “Now the buttons are on!” / Robot: confirming head nod

  26. Learning - Post Trial 1 • Expanded task hypothesis space after Trial 1: [press] goal->[the red button: ON]; [press] goal->[any button: ON]; [press] goal->[any red button: ON]; [press] goal->[any object: ON] • All are equally likely given this one example; the priors favor the most specific for the next execution: [press] goal->[the red button: ON]

  27. Learning - Trial 2 • Human: “Can you turn the buttons on?” / Robot: nods yes, presses the red button, perks ears for feedback / Social function: using the hypothesis from Trial 1 but not too confident yet, communicates this to the teacher • Human: “Not quite…” / Robot: expects refinement • Human: “Press the green button” / Robot: presses the green button, looking at it / Social function: learning by doing, encodes the button state change as the goal of the press action, refining the task example • Human: “Now the buttons are on!” / Robot: confirming head nod

  28. Learning - Post Trial 2 • With the new hypotheses from Trial 2, the hypothesis space is (hypothesis, coverage): [press] goal->[the red button: ON] (1); [press] goal->[the red and green buttons: ON] (1); [press] goal->[any button: ON] (2); [press] goal->[any red button: ON] (1); [press] goal->[any object: ON] (2) • Two are more likely given the two examples; the priors favor the most specific for the next execution: [press] goal->[any button: ON]

  29. Learning - Trial 3 • Human: “Leo, have you learned to turn the buttons on?” / Robot: nods head, looks down to check the task state, expands a press action for any button not on, perks ears for feedback / Social function: indicates that he does know; eye gaze and joint task attention; still not really confident in the task hypothesis, communicates this to the teacher • Human: “Good!” / Robot: confirming head nod

  30. Learning - Post Trial 3 • With the new hypotheses from Trial 3, the hypothesis space is (hypothesis, coverage): [press] goal->[the red button: ON] (1); [press] goal->[the red and green buttons: ON] (1); [press] goal->[the red/green/blue buttons: ON] (1); [press] goal->[any button: ON] (3); [press] goal->[any red button: ON] (1); [press] goal->[any object: ON] (3) • Two are more likely given the three examples; the priors favor the most specific for the next execution: [press] goal->[any button: ON]

  31. Learning - Trial 4 • Human: “Leo, turn the buttons on?” / Robot: looks to check the task state, presses the green button / World: the human undoes the blue button / Robot: looks at the human and the button, presses the blue button, waits for feedback / Social function: eye gaze and joint task attention; shows that he noticed the change and shows commitment to the [any button: ON] task goal • Human: “Good!” / Robot: confirming head nod

  32. Learning - Post Trial 4 • With more hypotheses from Trial 4, the most specific hypothesis among the most likely is again: [press] goal->[any button: ON]

  33. The Full Learning Session

  34. Leo learning: ‘Turn buttons ON’

  35. Reinforcement Learning: ‘Turn buttons ON’ • State space: all ON/OFF button configurations • Actions: the different button presses. • Reward function: (+1) for 'all buttons ON', else (0) • The Q-learning algorithm learns the values of the various state-action pairs (the Q-values). • Three configurations were simulated: • Two buttons visible (8 Q-values, initialized to 0.5) • Three buttons visible (24 Q-values, initialized to 0.5) • Three buttons, seeded with the two-button Q-values
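For comparison, a minimal tabular Q-learning sketch of this buttons domain; the toggle dynamics, learning rate, and exploration settings are assumptions rather than the exact setup behind the reported numbers:

```python
import random

def q_learn(n_buttons=3, episodes=500, alpha=0.3, gamma=0.9, epsilon=0.2):
    actions = range(n_buttons)                       # action i = press (toggle) button i
    q = {}                                           # (state, action) -> value, default 0.5
    for _ in range(episodes):
        state = tuple(random.choice((False, True)) for _ in range(n_buttons))
        while not all(state):                        # episode ends when all buttons are ON
            if random.random() < epsilon:
                a = random.choice(list(actions))
            else:
                a = max(actions, key=lambda b: q.get((state, b), 0.5))
            nxt = tuple((not s) if i == a else s for i, s in enumerate(state))
            r = 1.0 if all(nxt) else 0.0             # reward only for 'all buttons ON'
            best_next = 0.0 if all(nxt) else max(q.get((nxt, b), 0.5) for b in actions)
            old = q.get((state, a), 0.5)
            q[(state, a)] = old + alpha * (r + gamma * best_next - old)
            state = nxt
    return q

print(len(q_learn()), "state-action pairs visited")   # at most 8 states x 3 actions = 24
```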

  36. Q-Learning Results

  37. Goal Oriented Task Representation • Leo learns the goal of the task - not just the actions but the desired result - and shows commitment to it. • Flexible, realistic generalization: the task state can change, or a partial state can be presented • Other approaches encode the goal in a reward function, and the machine learns the best way to achieve it. • Reinforcement learning: adding a button alters the state space and requires more learning. • Seeding the process with the prior configuration's Q-values still requires more learning.

  38. Transparent Learning • Transparency leads to more relevant and timely instruction and thus to more efficient learning. • Mutual beliefs are maintained with demonstrations, expressive gestures, and eye gaze • The teacher interactively narrows the hypothesis space with the robot • “Opaque” systems: it is difficult to understand the learning process or why the system is getting it wrong • Q-learning explores the state-action space with no specific guidance from the human partner - the human waits through 45 presses and interacts only 12 times.

  39. Just-in-time Error Correction • The turn-taking dialog framework surfaces problems right away. • Gesture and eye gaze express low confidence, naturally soliciting feedback and further examples. • Typical reinforcement learning: feedback propagates only after a complete attempt, causing credit-assignment problems • Socially guided approach: feedback leads to refinement, correcting errors as they arise.

  40. And better for the Teacher too… • Transparency in the learning process lets the human maintain a mental model of the learner and set expectations appropriately • Typical machine learning approaches are opaque, leaving the teacher asking: ‘why is it doing that?’ • Q-learning: the teacher would be bored/frustrated • Watching the robot explore the space with 45 presses • Interacting only the 12 times it reaches the goal • For robot learning to be successful, the process will need to be rewarding for the humans who have to teach these robots.

  41. Related Work (from the teacher’s perspective) • Kaiser - The teacher uses a joystick to drive the robot, then can optimize the skill with ‘good’/’no good’ feedback after the end of each demo; credit is hard to assign, and the robot's internal state is not presented to the teacher. • Lauria et al. - All teaching happens before action, placing a cognitive burden on the teacher to maintain a model of the task out of context. • Nicolescu & Mataric - The right idea, though somewhat slow and tedious; the robot can be interrupted mid-demo for just-in-time corrections; main difference: no use of natural social cues for expressing internal state to the teacher (only demonstrations)

  42. Future Goals • Meta-learning strategies • Apply something learned in one context to another • Social cues to direct attention • Implementing more of the social scaffolding cues • Adding exploration to yield guided exploration • Modeling motivation to learn • ‘Like-me’ recognition • Curiosity, boredom, etc. • User studies to verify that people prefer this approach to standard machine learning
