This article explores the use of spoken dialogue in intelligent tutoring systems, discussing its history, potential benefits, and challenges. It also examines performance evaluation, affective reasoning, and discourse analysis in tutoring systems.
Spoken Dialogue for Intelligent Tutoring Systems: Opportunities and Challenges
Diane Litman
Computer Science Department & Learning Research & Development Center, University of Pittsburgh
Currently Leverhulme Visiting Professor, School of Informatics, University of Edinburgh
Outline • Motivation and History • The ITSPOKE System and Corpora • Opportunities and Challenges • Performance Evaluation • Affective Reasoning • Discourse Analysis • Summing Up
What is Tutoring? • “A one-on-one dialogue between a teacher and a student for the purpose of helping the student learn something.” [Evens and Michael 2006] • Human Tutoring Excerpt [Thanks to Natalie Person and Lindsay Sears, Rhodes College]
Intelligent Tutoring Systems • Students who receive one-on-one instruction perform as well as the top two percent of students who receive traditional classroom instruction [Bloom 1984] • Unfortunately, providing every student with a personal human tutor is infeasible • Develop computer tutors instead
Tutorial Dialogue Systems • Why is one-on-one tutoring so effective? “...there is something about discourse and natural language (as opposed to sophisticated pedagogical strategies) that explains the effectiveness of unaccomplished human [tutors].” [Graesser, Person et al. 2001] • Working hypothesis regarding learning gains • Human Dialogue > Computer Dialogue > Text
Spoken Tutorial Dialogue Systems • Most human tutoring involves face-to-face spoken interaction, while most computer dialogue tutors are text-based • Can the effectiveness of dialogue tutorial systems be further increased by using spoken interactions?
A Brief History • 1970 – Mid 1980s • SCHOLAR (Carbonell) • WHY (Stevens and Collins) • SOPHIE (Burton and Brown) • Meno-Tutor (Woolf and McDonald) … • Late 1980s – 1990s • CIRCSIM-Tutor (Evens, Michael and Rovick) • SHERLOCK II (Lesgold) • Unix Consultant (Wilensky et al.) • EDGE (Cawsey) … • Currently… • Why2-AutoTutor (Graesser et al.) (speech synthesis) • Why2-Atlas (VanLehn et al.) • CyclePad (Rose et al.) • Beetle (Moore et al.) • DIAG-NLG (Di Eugenio) • SCoT (Peters et al.) (spoken dialogue) • ITSPOKE (Litman et al.) … (spoken dialogue)
Potential Benefits of Speech: I • Self-explanation correlates with learning [Chi et al. 1994] and occurs more in speech [Hausmann and Chi 2002] • Tutor: The right side pumps blood to the lungs, and the left side pumps blood to the other parts of the body. Could you explain how that works? • Student 1 (self-explains): So the septum is a divider so that the blood doesn't get mixed up. So the right side is to the lungs, and the left side is to the body. So the septum is like a wall that divides the heart into two parts...it kind of like separates it so that the blood doesn't get mixed up... • Student 2 (doesn’t self-explain): right side pumps blood to lungs
Potential Benefits of Speech: II • Speech contains prosodic information, providing new sources of information about the student for dialogue adaptation [Fox 1993; Litman and Forbes-Riley 2003; Pon-Barry et al. 2005] • A correct but uncertain student turn • ITSPOKE: How does his velocity compare to that of his keys? • STUDENT: his velocity is constant
Potential Benefits of Speech: III • Spoken computational environments may foster social relationships that may enhance learning • AutoTutor [Graesser et al. 2003]
Potential Benefits of Speech: IV • Some applications inherently involve spoken language • Spoken Conversational Interface for Language Learning [MIT (Seneff, Glass, Wang), Cambridge (Young, He, Ye)] • Reading Tutors [Mostow, Cole] • Others require hands-free interaction • Circuit Fix-It Shop [Smith 1992]
Why Should NLP Researchers Care? • Many reasons why tutoring researchers are interested in spoken dialogue • Why should researchers in computational linguistics become interested in tutoring? • Tutoring applications differ in many ways from typical spoken dialogue applications • Opportunities and Challenges!
Outline • Motivation and History • The ITSPOKE System and Corpora • Opportunities and Challenges • Performance Evaluation • Affective Reasoning • Discourse Analysis • Summing Up
Back-end is Why2-Atlas system [VanLehn, Jordan, Rose et al. 2002] • Sphinx2 speech recognition and Cepstral text-to-speech
Two Types of Tutoring Corpora • Human Tutoring • 14 students / 128 dialogues (physics problems) • 5948 student turns, 5505 tutor turns • Computer Tutoring • ITSPOKE v1 • 20 students / 100 dialogues • 2445 student turns, 2967 tutor turns • ITSPOKE v2 • 57 students / 285 dialogues • both synthesized and pre-recorded tutor voices
ITSPOKE Experimental Procedure • College students without physics • Read a small background document • Took a multiple-choice Pretest • Worked 5 problems (dialogues) with ITSPOKE • Took an isomorphic Posttest • Goal was to optimize Learning Gain • e.g., Posttest – Pretest
Outline • Motivation and History • The ITSPOKE System and Corpora • Opportunities and Challenges • Performance Evaluation • Affective Reasoning • Discourse Analysis • Summing Up
Predictive Performance Modeling • Opportunity • Spoken dialogue system evaluation methodologies can improve our understanding of how dialogue facilitates student learning [Forbes-Riley and Litman 2006] • Challenges • How to measure system performance? • What are predictive interaction parameters?
Predictive Performance Modeling • Understand why a spoken dialogue system fails or succeeds • PARADISE [Walker et al. 1997] • Measure parameters (interaction costs and benefits) and performance in a system corpus • Train model via multiple linear regression over parameters, predicting performance: $\text{System Performance} = \sum_{i=1}^{n} w_i \, p_i$ • Test model on new corpus • Predict performance during future system design
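The PARADISE-style training step above can be sketched as an ordinary least-squares fit. This is a minimal illustration under stated assumptions, not the evaluation code behind the talk: the corpus is synthetic, the helper names `zscore` and `fit_performance_model` are hypothetical, and z-score normalization of the parameters is assumed here so that the fitted weights are comparable across parameters.

```python
import numpy as np

def zscore(columns):
    """Normalize each parameter column to zero mean and unit variance."""
    return (columns - columns.mean(axis=0)) / columns.std(axis=0)

def fit_performance_model(params, performance):
    """Least-squares fit of performance = sum_i w_i * p_i, plus an intercept."""
    design = np.column_stack([zscore(params), np.ones(len(params))])
    solution, *_ = np.linalg.lstsq(design, performance, rcond=None)
    return solution[:-1], solution[-1]  # (weights w_i, intercept)

# Hypothetical corpus: 50 dialogues, 3 interaction parameters each.
rng = np.random.default_rng(0)
params = rng.normal(size=(50, 3))
true_w = np.array([0.8, 0.5, -0.4])  # made-up weights used to generate the data
performance = zscore(params) @ true_w + rng.normal(scale=0.01, size=50)

w, intercept = fit_performance_model(params, performance)
print(np.round(w, 2))
```

To test the model on a new corpus, the same weights would be applied to that corpus's normalized parameters and compared against its measured performance.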
Challenges • System Performance • Prior evaluations used User Satisfaction • Is Student Learning more relevant for the tutoring domain? • Interaction Parameters • Prior applications used Generic parameters • Are Task-Specific and Affective parameters also useful?
Findings • Using PARADISE to predict Learning • Posttest = .86 * Time + .65 * Pretest - .54 * #Neutrals • Traditional predictive parameters • e.g., Elapsed Time, Dialogue and Turn Length • New parameters • e.g., Affect, Correctness • Predictive power increases with the linguistic sophistication of the parameters • e.g., Semantic concepts rather than words
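As a small worked example, the learned equation can be evaluated directly. The coefficients come from the slide. The input values are hypothetical, and treating them as normalized scores (0 = corpus average) is an assumption of this sketch.

```python
def predicted_posttest(time, pretest, neutrals):
    """Evaluate Posttest = .86 * Time + .65 * Pretest - .54 * #Neutrals."""
    return 0.86 * time + 0.65 * pretest - 0.54 * neutrals

# Hypothetical students, described relative to the corpus average.
longer_dialogue = predicted_posttest(time=1.0, pretest=0.0, neutrals=0.0)
more_neutral_turns = predicted_posttest(time=0.0, pretest=0.0, neutrals=1.0)

print(longer_dialogue, more_neutral_turns)
```

The signs carry the interpretation: more time on task and a higher pretest raise the predicted posttest, while more neutral turns lower it.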
Contrasts with Non-Tutorial Dialogue • User Satisfaction models are less useful • Tutoring systems are not designed to maximize User Satisfaction • Interaction parameters for learning • Posttest = .86 * Time + .65 * Pretest - .54 * #Neutrals • longer dialogues are better • speech recognition problems don’t seem to matter • lack of some types of affect is bad
Outline • Motivation and History • The ITSPOKE System and Corpora • Opportunities and Challenges • Performance Evaluation • Affective Reasoning • Discourse Analysis • Summing Up
Detecting and Responding to Student Affective States • Opportunity • Affective spoken dialogue system technology can improve student learning and other measures of performance [Aist et al. 2002; Pon-Barry et al. 2006] • Challenges • What to detect? • How to respond?
Monitoring Student State (motivation)
Tutor26: Let me ask my question in a more general way. What happens to the velocity of a body when there is no force acting on it?
Student27: dammit (ASR: it is)
Tutor28: Could you please repeat that?
Student29: same (ASR: i same)
Tutor30: Right. And what happens to the HORIZONTAL velocity of a body when there is no HORIZONTAL force acting on it?
Student31: zero (ASR: the zero)
Tutor32: Well... OK, let's consider how forces in the vertical direction affect the motion of a body in a horizontal direction in a different scenario <…omitted…>
Student33: oh gravity you already said this (ASR: beats gravity you're exert less)
Tutor34: Fine. Are there any other forces acting on the apple as it falls?
Student35: no why are you doing this again (ASR: no y and to it yes)
Tutor36: Correct. The only force acting on the apple as it falls to the earth is the earth's gravitational force. In which direction does gravitational force act?
Student37: downward you computer (ASR: downward you computer)
Affective Spoken Dialogue Systems: Standard Methodology • Manual Annotation of Affect and Attitudes • Naturally-occurring spoken dialogue data [Ang et al. 2002; Lee et al. 2002; Batliner et al. 2003; Devillers et al. 2003; Shafran et al. 2003; Liscombe et al. 2005] • Prediction via Machine Learning • Automatically extract features from user turns • Use different feature sets (e.g. prosodic, lexical) to predict affect • Significant reduction of baseline error
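One common way to report the last bullet is relative error reduction over a majority-class baseline. The sketch below assumes that baseline. The labels and predictions are invented for illustration, and the code does not test statistical significance, which the cited studies assess separately.

```python
from collections import Counter

def majority_baseline_error(labels):
    """Error rate of always predicting the most frequent label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return 1 - most_common_count / len(labels)

def relative_error_reduction(labels, predictions):
    """Fraction of the baseline error removed by the classifier."""
    baseline = majority_baseline_error(labels)
    errors = sum(1 for gold, pred in zip(labels, predictions) if gold != pred)
    return (baseline - errors / len(labels)) / baseline

# Hypothetical annotated turns and classifier output.
gold = ["neutral"] * 6 + ["negative"] * 3 + ["positive"]
pred = ["neutral"] * 6 + ["negative"] * 2 + ["neutral", "neutral"]

reduction = relative_error_reduction(gold, pred)
print(round(reduction, 2))
```

Here the baseline is wrong on 4 of 10 turns and the classifier on 2 of 10, so half of the baseline error is removed.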
Challenge 1: What “emotions” to detect? • Communicator and Customer Care Systems • Negative: Angry, Annoyed, Frustrated, Tired • Positive/Neutral: Amused, Cheerful, Delighted, Happy, Serious [Ang et al. 2002; Shafran et al. 2003; Lee and Narayanan 2005; Liscombe et al. 2005] • Tutorial Dialogue Systems • Negative: Angry, Annoyed, Frustrated, Bored, Confused, Uncertain, Contempt, Disgusted, Sad • Positive/Neutral: Certain, Curious, Enthusiastic, Eureka [Litman and Forbes-Riley 2006, D’Mello et al. 2006]
Example Student States in ITSPOKE
ITSPOKE: What else do you need to know to find the box's acceleration?
Student: the direction [UNCERTAIN]
ITSPOKE: If you see a body accelerate, what caused that acceleration?
Student: force [CERTAIN]
ITSPOKE: Good job. Say there is only one force acting on the box. How is this force, the box's mass, and its acceleration related?
Student: velocity [UNCERTAIN]
ITSPOKE: Could you please repeat that?
Student: velocity [ANNOYED]
Challenge 2: How to respond? • In tutoring, not all negatively-valenced emotions are bad! Instead, they often represent learning opportunities • Uncertainty represents one type of learning impasse [VanLehn et al. 2003]: An impasse motivates a student to take an active role in constructing a better understanding of the principle. • Uncertainty is also associated with cognitive disequilibrium [Craig et al. 2004]: A state of failed expectations causing deliberation aimed at restoring equilibrium • Uncertainty positively correlates with learning
Example from ITSPOKE
ITSPOKE: What is the net force acting on the truck equal to?
Student: I don’t know [Uncertain, Frustrated]
ITSPOKE: Since net force is defined as the vector sum of all the forces exerted on an object, let's first find the forces acting on the truck. Try to name them.
Student: The impact force, the normal force, and gravity [Certain, Neutral]
ITSPOKE: Yep.
Do Human Tutors Respond to Student Uncertainty? • A data-driven method for designing dialogue systems adaptive to student state [Forbes-Riley and Litman 2005] • extraction of “dialogue bigrams” from annotated human tutoring corpora • χ² analysis to identify dependent bigrams • generalizable to any domain with corpora labeled for user state and system response
Example Human Tutoring Excerpt
S: So the- when you throw it up the acceleration will stay the same? [Uncertain]
T: Acceleration uh will always be the same because there is- that is being caused by force of gravity which is not changing. [Restatement, Expansion]
S: mm-k. [Neutral]
T: Acceleration is– it is in- what is the direction uh of this acceleration- acceleration due to gravity? [Short Answer Question]
S: It’s- the direction- it’s downward. [Certain]
T: Yes, it’s vertically down. [Positive Feedback, Restatement]
Bigram Dependency Analysis - “Student Certainness – Tutor Positive Feedback” bigrams: χ² = 225.92 (critical χ² value at p = .001 is 16.27)
Bigram Dependency Analysis (cont.) - Less Tutor Positive Feedback after Student Neutral turns - More Tutor Positive Feedback after “Emotional” turns
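The dependency test can be sketched as a Pearson χ² statistic over a contingency table of dialogue bigrams. The critical value 16.27 at p = .001 corresponds to three degrees of freedom, which is consistent with a 4×2 table (four student states by tutor positive feedback vs. any other act). That layout is an assumption of this sketch, and the counts are hypothetical rather than taken from the corpus.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical bigram counts: [tutor positive feedback, any other tutor act]
counts = [
    [60, 40],   # after certain student turns
    [50, 50],   # after uncertain student turns
    [20, 20],   # after mixed student turns
    [70, 290],  # after neutral student turns
]

stat, dof = chi_square(counts)
dependent = stat > 16.27  # critical value at p = .001 for 3 degrees of freedom
print(round(stat, 2), dof, dependent)
```

Comparing observed and expected counts row by row then shows the direction of the dependency, for example less positive feedback than expected after neutral turns.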
Findings • Statistically significant dependencies exist between students’ state of certainty and the responses of an expert human tutor • After uncertain, tutor Bottoms Out & avoids expansions • After certain, tutor Restates • After mixed, tutor Hints • After any emotion, tutor increases Feedback • Dependencies suggest adaptive strategies for implementation in computer tutoring systems • Experiment in progress with adaptive ITSPOKE
Outline • Motivation and History • The ITSPOKE System and Corpora • Opportunities and Challenges • Performance Evaluation • Affective Reasoning • Discourse Analysis • Summing Up
Discourse Structure • Opportunity • Dialogues with tutoring systems have more complex hierarchical discourse structures compared to many other types of dialogues • Challenges • How can discourse structure be exploited in the context of spoken dialogue systems?
Exploiting Discourse Structure (Motivation) • Average ITSPOKE dialogue is 20 minutes • Student turns are hierarchically structured • Level 1: 1350 (57.3%) • Level 2: 643 (27.3%) • Level 3: 248 (10.5%) • Levels 4-6: 113 (4.8%)
Discourse Structure Annotation and Transitions • Based on the Grosz & Sidner theory of discourse structure • Discourse segment, discourse segment purpose • Hierarchy of discourse segments • Tutoring information encoded in a hierarchical structure • Human tutor manually authored dialogue paths for ITSPOKE • Automatic traversal of logs places utterances into the structure [Figure: discourse tree with nodes Q1, Q2, Q3, Q2.1, Q2.2]
ITSPOKE Behavior & Discourse Structure Annotation [Figure: discourse tree with nodes Q1, Q2, Q3, Q2.1, Q2.2]
Discourse Structure Transitions [Figure: transitions between nodes Q1, Q2, Q3, Q2.1, Q2.2]
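A rough sketch of how transitions could be derived from the hierarchy: each turn is labeled by how its depth changes relative to the previous turn. The node names follow the figure, but the traversal order, the use of dots in a name to encode depth, and the three simplified labels (Push, Pop, Advance) are all assumptions of this sketch, not the annotation scheme used in the talk.

```python
def label_transitions(depths):
    """Label each turn after the first by the change in depth from the previous turn."""
    labels = []
    for previous, current in zip(depths, depths[1:]):
        if current > previous:
            labels.append("Push")     # entering a sub-dialogue
        elif current < previous:
            labels.append("Pop")      # returning to a higher-level segment
        else:
            labels.append("Advance")  # moving on at the same level
    return labels

# Hypothetical path through the tree: Q2.1 and Q2.2 are sub-questions of Q2.
path = ["Q1", "Q2", "Q2.1", "Q2.2", "Q3"]
depths = [node.count(".") + 1 for node in path]

transitions = label_transitions(depths)
print(list(zip(path[1:], transitions)))
```

Grouping student turns by such labels is what would allow correctness or certainness to be analyzed separately after each kind of transition.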
Findings • Student correctness is predictive of student learning, but only after particular discourse transitions [Rotaru and Litman 2006] • e.g., After Pops (PopUp, PopUpAdvance) • incorrect turns negatively predict learning • correct turns positively predict learning • Currently testing with experimental manipulation • Student certainness is more predictive only after particular transitions