Adapting to User Affect in a Spoken Dialogue System

Diane Litman, Computer Science Department & Learning Research and Development Center, University of Pittsburgh, Pittsburgh, PA

Outline • Motivation • The ITSPOKE System & Corpora • Detecting and Adapting to Student Uncertainty (joint work with Kate Forbes-Riley) • Uncertainty Detection • System Adaptation • Impact on Student (Meta)Cognition • Wizarded and fully automated experiments • Summing Up

What is Tutoring? • “A one-on-one dialogue between a teacher and a student for the purpose of helping the student learn something.” [Evens and Michael 2006] • Human Tutoring Excerpt [Thanks to Natalie Person and Lindsay Sears, Rhodes College]

Intelligent Tutoring Systems • Students who receive one-on-one instruction perform as well as the top two percent of students who receive traditional classroom instruction [Bloom 1984] • Unfortunately, providing every student with a personal human tutor is infeasible • Develop computer tutors instead

Tutorial Dialogue Systems • Why is one-on-one tutoring so effective? “...there is something about discourse and natural language (as opposed to sophisticated pedagogical strategies) that explains the effectiveness of unaccomplished human [tutors].” [Graesser, Person et al. 2001] • Currently only humans use full-fledged natural language dialogue

Spoken Tutorial Dialogue Systems • Most human tutoring involves face-to-face spoken interaction, while most computer dialogue tutors are text-based • Can the effectiveness of tutorial dialogue systems be further increased by using spoken interaction?

Potential Benefit of Spoken Dialogue • Spoken dialogue contains linguistic information (e.g., acoustic, prosodic, lexical, discourse), providing new sources of information for tutor adaptation • A correct but uncertain student turn • ITSPOKE: How does his velocity compare to that of his keys? • STUDENT: his velocity is constant

More generally… • Detection • Promising across affective states and applications, e.g.: • Craig et al., 2006 • Liscombe, 2006 • Lee & Narayanan, 2005 • Vidrascu & Devillers, 2005 • Batliner et al., 2003 • Adaptation • Sparse; it can be difficult to show that adaptation improves performance • Some systems used basic adaptations and showed that likeability increases • For other performance metrics, suitable basic adaptations are not clear a priori
Application | User Affect | System Adaptation
Health Assessment | Stress | Empathy [Liu & Picard 2005]
Gaming | Frustration | Apology [Klein et al. 2002]
Tutoring | ??? | ???

Outline • Motivation • The ITSPOKE System and Corpora • Detecting and Adapting to Student Uncertainty • Uncertainty Detection • System Adaptation • Experimental Evaluation • Summing Up

The ITSPOKE System • Back-end is the Why2-Atlas system [VanLehn, Jordan, Rose et al. 2002] • Sphinx2 speech recognition and Cepstral text-to-speech

Three Types of Tutoring Corpora • Human Tutoring • 14 students / 128 dialogues (physics problems) • Computer Tutoring • 72 students / 360 dialogues • Wizard Tutoring • 81 students / 405 dialogues • human performs speech recognition, semantic analysis • computer performs dialogue management

Experimental Procedure • College students without physics • Read a small background document • Took a multiple-choice Pretest • Worked 5 problems (dialogues) with ITSPOKE • Took an isomorphic Posttest • Goal was to optimize Learning Gain • e.g., Posttest – Pretest
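The learning-gain measure on this slide is a per-student difference of test scores. A minimal Python sketch, assuming scores are proportions correct; the function names are hypothetical, and the normalized variant is a common alternative that the slide itself does not mention:

```python
def learning_gain(pretest: float, posttest: float) -> float:
    """Raw learning gain as on the slide: Posttest - Pretest."""
    return posttest - pretest


def normalized_gain(pretest: float, posttest: float, max_score: float = 1.0) -> float:
    """Assumed alternative: gain as a fraction of the room left to improve."""
    if pretest >= max_score:
        raise ValueError("pretest is already at ceiling; normalized gain undefined")
    return (posttest - pretest) / (max_score - pretest)


# Hypothetical student scores (proportion correct), not experimental data.
pre, post = 0.4, 0.7
print(f"raw gain = {learning_gain(pre, post):.2f}")           # prints raw gain = 0.30
print(f"normalized gain = {normalized_gain(pre, post):.2f}")  # prints normalized gain = 0.50
```

The normalized form only matters when students start at very different pretest levels; the raw difference is what the slide names.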

Monitoring Student State (motivation)
Tutor26: Let me ask my question in a more general way. What happens to the velocity of a body when there is no force acting on it?
Student27: dammit (ASR: it is)
Tutor28: Could you please repeat that?
Student29: same (ASR: i same)
Tutor30: Right. And what happens to the HORIZONTAL velocity of a body when there is no HORIZONTAL force acting on it?
Student31: zero (ASR: the zero)
Tutor32: Well... OK, let's consider how forces in the vertical direction affect the motion of a body in a horizontal direction in a different scenario <…omitted…>
Student33: oh gravity you already said this (ASR: beats gravity you're exert less)
Tutor34: Fine. Are there any other forces acting on the apple as it falls?
Student35: no why are you doing this again (ASR: no y and to it yes)
Tutor36: Correct. The only force acting on the apple as it falls to the earth is the earth's gravitational force. In which direction does gravitational force act?
Student37: downward you computer (ASR: downward you computer)

Why Uncertainty? • Most frequent student state in our dialogue corpora [Litman and Forbes-Riley 2004] • Focus of other learning sciences, speech and language processing, and psycholinguistic studies [Craig et al. 2004; Liscombe et al. 2005; Pon-Barry et al. 2006; Dijkstra et al. 2006] • Inter-annotator agreement on uncertainty: Kappa = .73 [Forbes-Riley et al. 2008]
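Kappa corrects raw annotator agreement for the agreement expected by chance. A minimal Python sketch of Cohen's kappa for two annotators labeling the same turns; the labels below are hypothetical toy data, not corpus annotations, and are not meant to reproduce the .73 figure:

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same student turns."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of turns where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations: U = uncertain, N = not uncertain.
annotator_1 = ["U", "N", "N", "U", "N", "N", "U", "N", "N", "N"]
annotator_2 = ["U", "N", "N", "N", "N", "N", "U", "N", "U", "N"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # prints 0.524
```

Here the annotators agree on 8 of 10 turns, but because both label most turns N, chance agreement is already .58, so kappa is far below .8.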

Corpus-Based Detection Methodology • Learn detection models from training corpora • Use spoken language processing to automatically extract features from user turns • Use extracted features (e.g., prosodic, lexical) to predict uncertainty annotations • Evaluate learned models on testing corpora • Significant reduction of error compared to baselines [Litman and Forbes-Riley 2006; Litman et al. 2007]
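To make the train/evaluate loop concrete, here is a minimal Python sketch that learns a one-feature decision stump from labeled turns and compares its test error with a majority-class baseline. The feature names, the toy data, and the choice of a decision stump are all assumptions for illustration; the cited work used its own features and learners:

```python
from collections import Counter

# Hypothetical acoustic-prosodic/lexical features; not ITSPOKE's real feature set.
FEATURES = ["pitch_range_hz", "turn_duration_s", "hedge_words"]


def predict_one(stump, row):
    """Apply a decision stump (feature index, threshold, polarity) to one turn."""
    feature, threshold, polarity = stump
    above = row[feature] >= threshold
    # Polarity 1 labels turns at/above the threshold uncertain (1); -1 flips that.
    return int(above) if polarity == 1 else int(not above)


def train_stump(rows, labels):
    """Choose the stump with the lowest training error (first one wins ties)."""
    best_err, best_stump = None, None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(predict_one(stump, r) != y for r, y in zip(rows, labels))
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    return best_stump


def error_rate(preds, labels):
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical turns: [pitch range, duration, hedge count]; label 1 = uncertain.
train_rows = [[40, 1.2, 0], [35, 0.9, 0], [90, 2.5, 2], [85, 2.0, 1],
              [50, 1.5, 0], [95, 3.1, 2], [42, 1.3, 0]]
train_labels = [0, 0, 1, 1, 0, 1, 0]
test_rows = [[45, 1.0, 0], [88, 2.2, 1], [38, 1.1, 0], [92, 2.8, 3]]
test_labels = [0, 1, 0, 1]

# Baseline: always predict the majority training label.
majority = Counter(train_labels).most_common(1)[0][0]
baseline_err = error_rate([majority] * len(test_labels), test_labels)

stump = train_stump(train_rows, train_labels)
model_err = error_rate([predict_one(stump, r) for r in test_rows], test_labels)
print(f"stump on {FEATURES[stump[0]]}: baseline error {baseline_err:.2f}, "
      f"model error {model_err:.2f}")
```

The comparison against a majority-class baseline mirrors the slide's claim structure ("reduction of error compared to baselines"); the toy numbers themselves carry no evidential weight.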

System Adaptation: How to Respond? • Theory-based • [VanLehn et al. 2003; Craig et al. 2004] • Corpus-based • [Forbes-Riley and Litman 2005, 2007, 2008, 2010]

Theory-Based Adaptation: Uncertainty as Learning Opportunity • Uncertainty represents one type of learning impasse, and is also associated with cognitive disequilibrium • An impasse motivates a student to take an active role in constructing a better understanding of the principle. [VanLehn et al. 2003] • A state of failed expectations causing deliberation aimed at restoring equilibrium. [Craig et al. 2004] • Hypothesis: The system should adapt to uncertainty in the same way it responds to other impasses (e.g., incorrectness)

Corpus-Based Adaptation: How Do Human Tutors Respond? • An empirical method for designing dialogue systems adaptive to student state • extraction of “dialogue bigrams” from annotated human tutoring corpora • χ2 analysis to identify dependent bigrams • generalizable to any domain with corpora labeled for user state and system response
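The bigram method can be sketched in a few lines of Python: pair each student turn with the tutor turn that follows it, cross-tabulate student state against whether a given tutor act occurs, and run a χ2 test of independence. The turn encoding and function names are assumptions for illustration; the toy data reuses the labels from the human tutoring excerpt on the next slide:

```python
from collections import Counter


def dialogue_bigrams(turns):
    """Pair each student turn with the tutor turn that immediately follows it."""
    return [
        (s_label, t_acts)
        for (s_spk, s_label), (t_spk, t_acts) in zip(turns, turns[1:])
        if s_spk == "S" and t_spk == "T"
    ]


def chi_square(bigrams, tutor_act):
    """Chi-square test of independence: student state x (tutor act present?)."""
    observed = Counter((state, tutor_act in acts) for state, acts in bigrams)
    states = sorted({state for state, _ in observed})
    columns = (True, False)
    n = sum(observed.values())
    row_tot = {s: sum(observed[(s, c)] for c in columns) for s in states}
    col_tot = {c: sum(observed[(s, c)] for s in states) for c in columns}
    if 0 in col_tot.values():
        raise ValueError("tutor act is always or never present; test undefined")
    chi2 = 0.0
    for s in states:
        for c in columns:
            expected = row_tot[s] * col_tot[c] / n
            chi2 += (observed[(s, c)] - expected) ** 2 / expected
    df = (len(states) - 1) * (len(columns) - 1)
    return chi2, df


# Student state and tutor acts, taken from the example excerpt.
excerpt = [
    ("S", "uncertain"), ("T", {"Restatement", "Expansion"}),
    ("S", "neutral"),   ("T", {"Short Answer Question"}),
    ("S", "certain"),   ("T", {"Positive Feedback", "Restatement"}),
]
chi2, df = chi_square(dialogue_bigrams(excerpt), "Positive Feedback")
print(f"chi2 = {chi2:.2f}, df = {df}")  # prints chi2 = 3.00, df = 2
```

Three bigrams are far too few for a valid χ2 test (expected counts should generally be at least 5); the real analysis runs over the full annotated corpus.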

Example Human Tutoring Excerpt
S: So the- when you throw it up the acceleration will stay the same? [Uncertain]
T: Acceleration uh will always be the same because there is- that is being caused by force of gravity which is not changing. [Restatement, Expansion]
S: mm-k. [Neutral]
T: Acceleration is– it is in- what is the direction uh of this acceleration- acceleration due to gravity? [Short Answer Question]
S: It’s- the direction- it’s downward. [Certain]
T: Yes, it’s vertically down. [Positive Feedback, Restatement]

Bigram Dependency Analysis - “Student Certainness – Tutor Positive Feedback” Bigrams: χ2 = 225.92 (critical χ2 value at p = .001 is 16.27)
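The quoted critical value corresponds to a test with 3 degrees of freedom. Assuming (our inference, not stated on the slide) that student certainness has four categories (certain, uncertain, mixed, neutral) and each tutor turn is coded for presence vs. absence of Positive Feedback, the arithmetic is:

```latex
\[
\mathit{df} = (r-1)(c-1) = (4-1)(2-1) = 3,
\qquad
\chi^2_{\text{crit}}(\mathit{df}=3,\ p=.001) \approx 16.27 \ll 225.92 = \chi^2_{\text{obs}}
\]
```

The observed statistic exceeds the critical value by more than an order of magnitude, so independence is rejected well beyond p = .001.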

Bigram Dependency Analysis (cont.) - Less Tutor Positive Feedback after Student Neutral turns - More Tutor Positive Feedback after “Emotional” turns

Findings • Statistically significant dependencies exist between students’ state of certainty and the responses of an expert human tutor • After uncertain, tutor Bottoms Out and avoids expansions • After certain, tutor Restates • After any emotion, tutor increases Feedback • Dependencies suggest adaptive strategies for implementation in our computer tutor [Forbes-Riley and Litman 2010]

Adaptation to Student Uncertainty in ITSPOKE • Most systems respond only to (in)correctness • Recall that the literature suggests that uncertain as well as incorrect student answers signal learning impasses • Experimentally manipulate tutor responses to student uncertainty, over and above correctness, and investigate the impact on learning • Platform: Adaptive version(s) of ITSPOKE

Normal (non-adaptive) ITSPOKE • System Initiative • Dialogue Format: Tutor Question – Student Answer – Tutor Response • Tutor Response Types: • to Corrects (C): positive feedback (e.g., “Fine”) • to Incorrects (I): negative feedback (e.g., “Well…”) and either a Bottom Out: correct answer with reasoning (easier), or a Subdialogue: questions walk through reasoning (harder)

Adaptive ITSPOKE(s) • Our Prior Work: Rank correctness (C, I) + uncertainty (U, nonU) states in terms of impasse severity
State: I+nonU | I+U | C+U | C+nonU
Severity: most | less | least | none
• Adaptation Hypothesis: ITSPOKE already resolves I impasses (I+nonU, I+U), but it ignores one type of U impasse (C+U) • Performance should improve if ITSPOKE provides additional content to resolve all impasses

Two Uncertainty Adaptations • Simple Adaptation • Same response for all 3 impasses • Feedback on (in)correctness only • Complex Adaptation • Different responses for the 3 impasses, based on human responses [Forbes-Riley and Litman 2010] • Feedback on both uncertainty and (in)correctness, based on the empathetic computer tutor literature [Wang et al., 2005; Hall et al., 2004; Burleson et al., 2004]

Simple Adaptation Example: C+U
TUTOR1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to?
STUDENT1: The force of the car hitting it?? [C+U]
TUTOR2: Fine. [FEEDBACK] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE]
Same TUTOR2 subdialogue if student was I+U or I+nonU

Complex Adaptation Example: C+U
TUTOR2: That’s exactly right, but you seem unsure, so let’s sum up. [FEEDBACK] The net force on the truck is equal to the impact force on it. We can prove this just like we did for the car. First, we know that gravity and the normal force on the truck must cancel each other, otherwise the truck would not be at rest vertically. Second, we know that the impact force is the only horizontal force exerted on the truck. [LONG BOTTOM OUT]

Complex Adaptation Example: I+U
TUTOR1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to?
STUDENT1: Zero?? [I+U]
TUTOR2: That’s not correct, but don’t worry. You seem to be aware of your mistake. Let’s resolve it. [FEEDBACK] The net force on the truck is equal to the impact force on it. Let’s walk through this answer step by step. [SHORT BOTTOM OUT] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE]

Complex Adaptation Example: I+nonU
TUTOR1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to?
STUDENT1: The force of gravity. [I+nonU]
TUTOR2: I’m sorry, but I see an error in your answer. Let’s fix it. [FEEDBACK] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE]

Experiment 1: ITSPOKE-WOZ • Wizard of Oz version of ITSPOKE • Human recognizes speech, annotates correctness and uncertainty • Provides upper-bound speech and language processing performance • 4 Conditions • Simple Adaptation: used same response for all impasses • Complex Adaptation: used different responses for each impasse • Normal Control: used original system (no adaptation) • Random Control: gave Simple Adaptation to a random 20% of correct answers (to control for additional tutoring) • Prediction: Complex Adaptation > Simple Adaptation > Random Control > Normal Control (for increasing learning) • Procedure: reading, pretest, 5 problems, survey, posttest
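The four conditions differ only in how the tutor responds to each annotated student turn. A minimal Python sketch of those response policies, reconstructed from the example dialogues above; the response strings and function names are hypothetical labels, not ITSPOKE's actual problem-specific content:

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class StudentTurn:
    correct: bool    # wizard's correctness annotation
    uncertain: bool  # wizard's uncertainty annotation


def normal_control(turn):
    """Original ITSPOKE: reacts to (in)correctness only, ignoring uncertainty."""
    if turn.correct:
        return ("positive feedback",)
    # The real system picks a bottom out or a subdialogue depending on the question.
    return ("negative feedback", "bottom out or subdialogue")


def simple_adaptation(turn):
    """Same remediation subdialogue for all three impasses (I+nonU, I+U, C+U)."""
    if turn.correct and not turn.uncertain:
        return ("positive feedback",)  # C+nonU: no impasse
    feedback = "positive feedback" if turn.correct else "negative feedback"
    return (feedback, "subdialogue")


def complex_adaptation(turn):
    """A different response per impasse, with feedback on uncertainty too."""
    if turn.correct and not turn.uncertain:
        return ("positive feedback",)
    if turn.correct:    # C+U
        return ("feedback: correct but unsure", "long bottom out")
    if turn.uncertain:  # I+U
        return ("feedback: incorrect but aware", "short bottom out", "subdialogue")
    return ("feedback: incorrect", "subdialogue")  # I+nonU


def make_random_control(rate=0.2, seed=None):
    """Normal Control, plus the Simple Adaptation remediation on a random share of correct answers."""
    rng = random.Random(seed)

    def respond(turn):
        if turn.correct and rng.random() < rate:
            return ("positive feedback", "subdialogue")  # Simple Adaptation for a correct answer
        return normal_control(turn)

    return respond


turn = StudentTurn(correct=True, uncertain=True)  # a C+U turn
print(normal_control(turn))      # ('positive feedback',)
print(simple_adaptation(turn))   # ('positive feedback', 'subdialogue')
print(complex_adaptation(turn))  # ('feedback: correct but unsure', 'long bottom out')
```

Writing the conditions as interchangeable functions of the same annotated turn makes the experimental manipulation explicit: only the C+U row separates Normal Control from the adaptive conditions, while Random Control adds remediation regardless of uncertainty.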

ITSPOKE-WOZ Screenshot

Results I: Learning • F(3, 77) = 3.275, p = 0.02 • Simple Adaptation yields more student learning than Normal Control (original ITSPOKE) [Forbes-Riley and Litman 2010] • Similar results for learning efficiency [Forbes-Riley and Litman 2009]
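The reported degrees of freedom are arithmetically consistent with a one-way ANOVA over the k = 4 conditions and the N = 81 students of the Wizard Tutoring corpus. This is our inference from the corpus counts, not something the slide states; an ANCOVA with pretest as a covariate would instead leave N - k - 1 error degrees of freedom.

```latex
\[
\mathit{df}_{\text{between}} = k - 1 = 4 - 1 = 3,
\qquad
\mathit{df}_{\text{within}} = N - k = 81 - 4 = 77,
\]
\[
F(3, 77) = \frac{\mathit{SS}_{\text{between}} / 3}{\mathit{SS}_{\text{within}} / 77} = 3.275
\]
```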

Discussion • Predictions versus results: Complex Adaptation > Simple Adaptation > Random Control > Normal Control • Why didn't Simple Adaptation and Complex Adaptation outperform Random Control? • Random Control adapted to some C+U turns, diminishing differences • Adapting to C+nonU turns may increase certainty • Why didn't Complex Adaptation outperform Simple Adaptation? • Complex Adaptation's human-based content responses were based on frequency, not effectiveness

Additional Evaluations - Metacognition • Do metacognitive performance measures differ across experimental conditions? • Impasse Severity [Forbes-Riley et al. 2008] • Monitoring Accuracy [Nietfeld et al. 2006] • Bias and Discrimination [Kelemen et al. 2000; Saadawi et al. 2009]

Impasse Severity • Use the scalar value associated with each student turn to compute an average impasse severity per student
Nominal State: I+nonU | I+U | C+U | C+nonU
Scalar State: 3 | 2 | 1 | 0
Severity: most | less | least | none
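The per-student average is a straightforward mean over the scalar values in the table above. A minimal Python sketch; the state strings follow the slide's notation, while the function name and the sample turns are hypothetical:

```python
# Scalar values from the slide: I+nonU = 3, I+U = 2, C+U = 1, C+nonU = 0.
SEVERITY = {"I+nonU": 3, "I+U": 2, "C+U": 1, "C+nonU": 0}


def average_impasse_severity(states):
    """Mean impasse severity over one student's annotated turns."""
    if not states:
        raise ValueError("student has no annotated turns")
    return sum(SEVERITY[s] for s in states) / len(states)


# Hypothetical turn annotations for one student.
turns = ["C+nonU", "C+U", "C+nonU", "I+U", "C+nonU", "I+nonU"]
print(average_impasse_severity(turns))  # prints 1.0
```

Lower averages mean the student spent more turns in less severe impasse states, which is why a reduction relative to Normal Control counts as an improvement.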

Monitoring Accuracy • The wizard's annotations for each student are first represented in an array, where each cell represents a mutually exclusive option (e.g., correct and certain, incorrect and uncertain) • motivated by Feeling of (Another’s) Knowing [Smith and Clark 1993; Brennan and Williams 1995], which is closely related to uncertainty [Dijkstra et al. 2006] • The array is then used to compute monitoring accuracy

Monitoring Accuracy • Ranges from -1 (no monitoring accuracy) to 1 (perfect monitoring accuracy)
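The slides give the range of the measure but not its formula. One widely used monitoring-accuracy measure with exactly this -1 to 1 range is the Hamann coefficient over a 2x2 array of certainty × correctness counts; the sketch below assumes that measure, and the function name and sample turns are hypothetical:

```python
from collections import Counter


def monitoring_accuracy(states):
    """Hamann coefficient over a 2x2 (certainty x correctness) array.

    Assumed measure: the slides state only the -1..1 range, not the formula.
    Labels other than the four states below are ignored.
    """
    cells = Counter(states)
    a = cells["C+nonU"]  # certain and correct: accurate monitoring
    b = cells["I+nonU"]  # certain but incorrect: inaccurate monitoring
    c = cells["C+U"]     # uncertain but correct: inaccurate monitoring
    d = cells["I+U"]     # uncertain and incorrect: accurate monitoring
    n = a + b + c + d
    if n == 0:
        raise ValueError("no annotated turns")
    return ((a + d) - (b + c)) / n


# Hypothetical turn annotations for one student.
turns = ["C+nonU"] * 6 + ["I+U"] * 2 + ["I+nonU", "C+U"]
print(monitoring_accuracy(turns))  # prints 0.6
```

A student whose certainty always matches correctness scores 1; one whose certainty always mismatches scores -1.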

Additional Results I • Both complex and simple reduced average impasse severity, compared to normal (p < .08 in paired contrasts) [Litman and Forbes-Riley 2009]

Additional Results I • Simple (and random) increased monitoring accuracy, compared to normal (p < .06 in paired contrasts) [Litman and Forbes-Riley 2009]

Additional Evaluations - (Meta)cognition • Do metacognitive and cognitive performance measures (i.e., learning) correlate?
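A basic way to pose this question is a correlation across students between a metacognitive measure (e.g., monitoring accuracy) and a learning measure. A minimal Python sketch of Pearson's r, assuming one value of each measure per student; the data are hypothetical, and the cited analyses may have used a different correlation (for example, one controlling for pretest):

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two per-student measures."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant measure")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-student values, not results from the experiments.
monitoring = [0.2, 0.5, 0.4, 0.8, 0.6]
gains = [0.10, 0.30, 0.20, 0.40, 0.35]
print(round(pearson_r(monitoring, gains), 3))
```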
