Adapting to Multiple Affective States in Spoken Dialogue • Kate Forbes-Riley and Diane Litman • University of Pittsburgh, Pittsburgh, PA, USA
Outline • Background • ITSPOKE (Intelligent Tutoring Spoken Dialogue System) • User Study • Experimental Results • Conclusions & Current Directions
Background • Affective Spoken Dialogue Systems have clear benefits • Task Success (Forbes-Riley & Litman 2011; Wang et al. 2008) • Rapport/User Satisfaction (Acosta & Ward 2011; Liu & Picard 2005; Klein et al. 2002) • Most research focuses on detecting user affect, or on system adaptation to a single state • Fewer systems adapt to multiple states (Acosta & Ward 2011; D’Mello et al. 2010; Aist et al. 2002) • No benefits shown yet for task success • Typically compared to non-affective systems
This Paper • Two (Wizard-of-Oz) Affective Spoken Dialogue Systems • UNC ITSPOKE: adapts to user uncertainty • UNC+DISE ITSPOKE: adapts to uncertainty & disengagement • Why uncertainty and disengagement? (Pon-Barry & Shieber 2011; Paek & Ju 2008; Craig et al. 2004; Wang & Hirschberg 2011; Schuller et al. 2010; Bohus & Horvitz 2009) • Multiple Evaluation Dimensions • Learning, User Satisfaction, Motivation • Global & Local Impacts, Correlations with Task Success
Disengagement Example (annotations performed by human wizard) • User sings answer, indicating lack of interest in its purpose • ITSPOKE: What vertical force is always exerted on an object near the surface of the earth? • USER: Gravity (disengaged, correct, certain)
Disengagement and Uncertainty Example • User gives up immediately when too much prior knowledge is required • ITSPOKE: What does Newton's Third Law say? • USER: I have no idea (disengaged, incorrect, uncertain)
Original (Non-Affective) System • ITSPOKE (Intelligent Tutoring Spoken Dialogue System) • Back-end: text-based Why2-Atlas (VanLehn, Jordan, Rosé et al., 2002) • Tutors qualitative physics • Dialogue format: Question – Student Answer – Response • Response types: • To corrects: positive feedback (e.g., “Fine”) • To incorrects: negative feedback (e.g., “Well…”) and one of: • Bottom Out: correct answer with reasoning (easier) • Subdialogue: questions that walk through the reasoning (harder)
UNC ITSPOKE (Control System) • Adapts to uncertainty (UNC), over and above correctness • Gives uncertain+correct answers extra remediation • The remediation is the same as that given for an incorrect answer, except that the feedback is positive (e.g., “Right”) • Improves learning (compared to non-affective ITSPOKE) in both wizard-of-oz and fully automated conditions (Forbes-Riley & Litman 2011a,b)
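The response rule described on this slide can be sketched as a small decision function. This is only an illustrative sketch: the names `Response` and `unc_response` are hypothetical and are not taken from ITSPOKE itself, and the sketch assumes feedback polarity depends only on correctness, as the slide suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    feedback: str      # "positive" or "negative"
    remediation: bool  # True if extra content (bottom out or subdialogue) follows


def unc_response(correct: bool, uncertain: bool) -> Response:
    """Hypothetical sketch of the uncertainty adaptation.

    Assumption: feedback polarity follows correctness only, and
    remediation is given for every incorrect answer plus every
    correct-but-uncertain answer.
    """
    feedback = "positive" if correct else "negative"
    remediation = (not correct) or uncertain
    return Response(feedback=feedback, remediation=remediation)


if __name__ == "__main__":
    # A correct but uncertain answer gets positive feedback plus remediation.
    print(unc_response(correct=True, uncertain=True))
```

Under this sketch, only a correct and certain answer moves on without extra remediation.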
Dialogue with UNC ITSPOKE • System: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to? • User: The force of the car hitting it?? [Correct+Uncertain] • System: Fine. [FEEDBACK] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE]
UNC+DISE (Experimental System) • Adapts to uncertainty (UNC) and disengagement (DISE) • After disengaged answers, the system gives motivational feedback, progress reports, or easier Fill-In-Blank versions of questions • Goal is to renew engagement and trigger learning of any missed concepts • Similar strategies have increased task success, engagement, and user satisfaction in intelligent tutoring systems (Arroyo et al. 2007; Walonoski & Heffernan 2006; Baker et al. 2006)
Dialogue with UNC+DISE ITSPOKE • System: Let’s begin by looking at the motion of the man and his keys while he’s holding them. How does his velocity compare to that of his keys? • User: Same same same [Correct+Certain+Disengaged] • System: Right. Here’s a quick progress update. You might keep in mind that these concepts will reappear in more complex forms. [FEEDBACK] Now let’s see what happens after the man releases his keys…
User Study • Subjects: College students with no college-level physics • Conditions: UNC versus UNC+DISE ITSPOKE • Procedure • Short physics reading • Physics and motivation pretests • 5 physics problems (1 per dialogue), with the version of ITSPOKE varied by condition • Physics and motivation posttests; user satisfaction survey • 6th physics problem (isomorphic to 5th) with UNC ITSPOKE
Resulting Corpus • 19 subjects per condition (38 in total) • 228 dialogues (6 per subject) • 3518 user turns, labelled by the wizard during the experiment
Evaluation • Learning • Isomorphic pre- and posttests of 26 multiple-choice questions • Normalized gain: (posttest - pretest) / (1 - pretest) • Motivation • Isomorphic pre- and posttests of 19 statements on a 7-point Likert scale (Ward 2010; Roll 2009; Pintrich and Degroot 1990) • Raw gain: (posttest - pretest) • User Satisfaction • 40 statements on a 5-point Likert scale (Dzikovska et al. 2011) • Percent user satisfaction: (user score) / (maximum possible score)
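The three evaluation measures above can be written out directly. The following is a minimal sketch, assuming test scores are expressed as fractions in [0, 1] for the learning measure; the function names are hypothetical, and the numbers in the usage lines are made-up illustrations, not data from the study.

```python
def normalized_learning_gain(pretest: float, posttest: float) -> float:
    """(posttest - pretest) / (1 - pretest), with scores as fractions in [0, 1]."""
    if not 0.0 <= pretest < 1.0:
        # A perfect pretest leaves no room for gain, so the ratio is undefined.
        raise ValueError("pretest must be in [0, 1)")
    return (posttest - pretest) / (1.0 - pretest)


def raw_motivation_gain(pretest: float, posttest: float) -> float:
    """posttest - pretest on the motivation questionnaire score."""
    return posttest - pretest


def percent_user_satisfaction(user_score: float, max_score: float) -> float:
    """user score / maximum possible score."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return user_score / max_score


if __name__ == "__main__":
    # Illustrative values only: 13/26 correct before, 19.5/26 equivalent after.
    print(normalized_learning_gain(0.5, 0.75))
    # Assumed maximum: 40 statements x 5 points = 200.
    print(percent_user_satisfaction(150, 40 * 5))
```

Normalizing by (1 - pretest) expresses the gain as a share of the improvement that was still possible, so students with different pretest scores can be compared.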
Hypotheses: UNC+DISE will outperform UNC ITSPOKE • H1 (Global Performance): Increased learning, motivation, and user satisfaction as measured by posttests after first 5 dialogues • H2 (Test Dialogue): Reduced uncertainty, disengagement, and incorrectness for questions repeated from 5th to 6th dialogue • H3 (Negative Correlations): Fewer negative correlations between amount of disengagement and learning/user satisfaction • H4 (Local Transitions): Less likely for users to remain disengaged in two consecutive dialogue turns
Global Performance Evaluation • H1: UNC+DISE ITSPOKE increases learning, motivation, and user satisfaction (compared to UNC ITSPOKE) • No main effects of condition (one-way ANOVA) • Interaction effect (p < .05): users who most frequently received the disengagement adaptation (%Disengagement Split = High) had higher motivation gain with UNC+DISE ITSPOKE • Same interaction effect (p < .05): users who least frequently received the disengagement adaptation (%Disengagement Split = Low) had lower motivation gain with UNC+DISE ITSPOKE
Differences for Test Dialogue • H2: responding to disengagement will increase correctness, as well as reduce uncertainty and disengagement, for questions repeated in the test dialogue • UNC+DISE ITSPOKE: Incorrect answers more often become correct • UNC+DISE ITSPOKE: Uncertain answers more often become certain • However: In some cases, engaged users more often become disengaged
“Breaking” Negative Correlations • H3: even though disengagement may still occur, it will no longer negatively correlate with performance • Measures examined: Learning Gain; User Satisfaction
“Breaking” Negative Correlations • Note that UNC+DISE ITSPOKE doesn’t always reduce disengagement • This suggests that while reducing disengagement might partially explain the broken correlations, the adaptation may also ameliorate the negative impact of disengagement even when disengagement is not reduced
Local Disengagement State Transitions • H4: users interacting with UNC+DISE ITSPOKE will be less likely to transition into disengagement states • Transition Likelihood L (D’Mello et al. 2007): likelihood of transitioning from a (dis)engagement state in turn n to a (dis)engagement state in turn n+1 • L = 0: n+1 follows n at chance level • UNC ITSPOKE: a disengaged user is more likely to remain disengaged (p = .06) • UNC+DISE ITSPOKE: a disengaged user is equally likely to become engaged or disengaged (no difference between L values, p = .14)
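A transition likelihood of this kind can be computed from a sequence of turn-level labels. The sketch below assumes the commonly cited form of the D’Mello et al. (2007) metric, L = (P(next | current) - P(next)) / (1 - P(next)), which is not spelled out on the slide; it also assumes the base rate P(next) is estimated over all turns that have a predecessor. The function name and the toy label sequences are hypothetical.

```python
from typing import Sequence


def transition_likelihood(states: Sequence[str], current: str, nxt: str) -> float:
    """L = (P(nxt | current) - P(nxt)) / (1 - P(nxt)).

    L = 0 means `nxt` follows `current` at chance level; L > 0 means
    more often than chance; L < 0 means less often than chance.
    """
    # Consecutive (turn n, turn n+1) label pairs.
    pairs = list(zip(states, states[1:]))
    followers_of_current = [b for a, b in pairs if a == current]
    if not followers_of_current:
        raise ValueError(f"state {current!r} never has a following turn")

    # Conditional probability of `nxt` given the previous turn was `current`.
    p_cond = followers_of_current.count(nxt) / len(followers_of_current)

    # Base rate of `nxt` (assumption: estimated over turns with a predecessor).
    all_followers = [b for _, b in pairs]
    p_base = all_followers.count(nxt) / len(all_followers)
    if p_base == 1.0:
        raise ValueError("base rate of 1.0 makes L undefined")

    return (p_cond - p_base) / (1.0 - p_base)


if __name__ == "__main__":
    # Toy labels: D = disengaged, E = engaged (not study data).
    turns = ["D", "D", "E", "E", "D", "D"]
    print(transition_likelihood(turns, "D", "D"))
```

In this framing, H4 corresponds to comparing L(disengaged → disengaged) against L(disengaged → engaged) within each condition.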
Conclusions • Responding to user disengagement over and above uncertainty can improve global and local performance • H1: No main effect, but a small, significant motivation increase for users with high disengagement • H2: Reduces uncertainty and incorrectness (but not disengagement) in test dialogues • H3: Breaks some negative correlations with task success and user satisfaction • H4: Trend toward reducing the likelihood of continued disengagement
Current Directions • Repeat experiment with fully-automated versions of ITSPOKE • New DISE (only) ITSPOKE experimental condition
Questions? Further Information? www.cs.pitt.edu/~litman/itspoke.html Thank You!
Why Uncertainty & Disengagement? • Focus of speech and language research • Uncertainty (Pon-Barry & Shieber 2011; Paek & Ju 2008) • Disengagement (Wang & Hirschberg 2011; Schuller et al. 2010; Bohus & Horvitz 2009) • Related to learning in intelligent tutoring systems • Uncertainty (Forbes-Riley & Litman 2011; Craig et al. 2004) • Disengagement (Arroyo et al. 2007; Baker et al. 2006) • Related states • Confusion • Boredom, lack of interest, gaming
Disengagement Annotation Scheme • DISE (binary turn-level tag) • User answers given without effort or caring, may display irritation or boredom • E.g. fast answers with leaden/sarcastic/playful tone, signs of distraction such as tapping or electronics usage • Full scheme derived from empirical observations of our data and substantial prior work (e.g., Lehman et al. 2008; Porayska-Pomsta et al. 2008; Conati & Maclaren 2009)
Two Affect-Adaptive (Wizard) Versions • UNC: adapts to single affective state • UNC+DISE: adapts to multiple affective states • Uncertainty • Disengagement • Human “wizard” does speech recognition, semantic analysis, uncertainty and disengagement detection
UNC+DISE (Experimental System) • Adapts to uncertainty (UNC) and disengagement (DISE) • Correct+Disengaged: motivational feedback + progress report (graph of correctness so far) (Arroyo et al. 2007; Walonoski & Heffernan 2006) • Incorrect+Disengaged: motivational feedback + easier Fill-In-Blank version of question (Baker et al. 2006) • Goals: renew engagement; trigger learning of the missed concept
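The disengagement adaptation on this slide amounts to a lookup on correctness for disengaged turns. A minimal sketch follows; the function name and the action labels are hypothetical stand-ins for the system's actual response components, and the uncertainty adaptation is deliberately left out.

```python
from typing import List


def dise_adaptation(correct: bool, disengaged: bool) -> List[str]:
    """Hypothetical sketch: extra response components added for a disengaged turn.

    Returns an empty list for engaged turns, where only the usual
    correctness and uncertainty handling would apply.
    """
    if not disengaged:
        return []
    if correct:
        # Correct+Disengaged: motivational feedback plus a progress report.
        return ["motivational_feedback", "progress_report"]
    # Incorrect+Disengaged: motivational feedback plus an easier question.
    return ["motivational_feedback", "fill_in_blank_question"]


if __name__ == "__main__":
    print(dise_adaptation(correct=True, disengaged=True))
    print(dise_adaptation(correct=False, disengaged=True))
```

Keeping this rule separate from the uncertainty rule mirrors the slide's framing of UNC+DISE as the UNC system with one additional adaptation.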
Examples • Correct + Disengaged • System-a: Well done. Here’s a quick progress update. Good effort so far!! [FEEDBACK when progress is improving] Now let’s ... • System-b: Right. Here's a quick progress update. It might help to remember we will build on the topics we're discussing now. [FEEDBACK when progress is not improving] Now let’s see ... • Incorrect + Disengaged • System: That doesn't sound right. Let's think about this a little more. [FEEDBACK] Since the man is holding his keys, they aren’t moving relative to each other. So their velocities must be WHAT? [SUPPLEMENTARY QUESTION]
Uncertainty-Adaptive ITSPOKE • Tutoring Theory: Uncertainty and Incorrectness both signal Learning Impasses (opportunities to better learn concepts; VanLehn et al., 2003) • Our Prior Work: Rank correctness (C, I) + uncertainty (U, nonU) states in terms of impasse severity
State:    I+nonU | I+U  | C+U   | C+nonU
Severity: most   | less | least | none
• Adaptation Hypothesis: ITSPOKE already provides content to resolve I impasses (I+U, I+nonU), but it ignores one type of U impasse (C+U) • Performance should improve if ITSPOKE provides additional content to resolve all impasses