Improving (Meta)cognitive Tutoring by Detecting and Responding to Uncertainty

Diane Litman & Kate Forbes-Riley, University of Pittsburgh, Pittsburgh, PA, USA

Outline • Motivation • Metacognitive Measures • System(s) and Corpus • Evaluation Results • Discussion

Background
• Speaker uncertainty is of interest in several research communities
  • Human Language Technologies (Liscombe et al. 2005; Dijkstra et al. 2006; Pon-Barry 2008)
  • Psycholinguistics (Brennan & Williams 1995)
  • AI & Education (Tsukahara & Ward 2001; Aist et al. 2002; Craig et al. 2004; Pon-Barry et al. 2006; Forbes-Riley & Litman 2009)

This Paper
• We show that remediating after student uncertainty has the potential to increase students' metacognitive (and cognitive) abilities
• Evaluations use a corpus of previously collected dialogues between students and several versions of a Wizard of Oz spoken tutorial dialogue system

Metacognitive Performance
• We measure metacognitive performance in dialogues annotated for student uncertainty and correctness
  • Impasse severity
  • Monitoring accuracy (Nietfeld et al. 2006)
  • Bias (Kelemen et al. 2000; Saadawi et al. 2009)
  • Discrimination (Kelemen et al. 2000; Saadawi et al. 2009)
• We then conduct two evaluations
  • Do the measures differ across experimental conditions?
  • Do metacognitive and cognitive performance correlate?

Impasse Severity
• Tutoring Theory: Uncertainty and Incorrectness both signal Learning Impasses (VanLehn et al., 2003)
• Our Prior Work: Rank correctness (C, I) + uncertainty (U, nonU) states in terms of impasse severity
    Nominal State:  I+nonU  I+U   C+U    C+nonU
    Scalar State:   3       2     1      0
    Severity:       most    less  least  none
• This Paper: Use the scalar value associated with each student turn to compute an average impasse severity, per student
• Not remediated in many systems
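The per-student average described on this slide can be sketched in a few lines. This is a minimal illustration, assuming each annotated turn is stored as a (correctness, uncertainty) pair; the function name, the tuple encoding, and the example turns are hypothetical and not taken from the ITSPOKE corpus.

```python
# Scalar impasse severity for each nominal state, as ranked on the slide.
SEVERITY = {
    ("I", "nonU"): 3,  # incorrect and not uncertain: most severe
    ("I", "U"): 2,     # incorrect and uncertain: less severe
    ("C", "U"): 1,     # correct but uncertain: least severe
    ("C", "nonU"): 0,  # correct and not uncertain: no impasse
}


def average_impasse_severity(turns):
    """Mean severity over one student's (correctness, uncertainty) turn labels."""
    if not turns:
        raise ValueError("need at least one annotated turn")
    return sum(SEVERITY[turn] for turn in turns) / len(turns)


# Hypothetical annotations for one student (illustrative only).
example_turns = [("C", "nonU"), ("C", "U"), ("I", "U"), ("I", "nonU"), ("C", "nonU")]
print(average_impasse_severity(example_turns))
```

Lower values mean the student spent fewer turns in severe impasse states.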

Measures from the Metacognitive Performance Literature • The wizard's annotations for each student are first represented in an array, where each cell represents a mutually exclusive option • motivated by Feeling of Knowing (FOK) research, which is closely related to uncertainty (Dijkstra et al., 2006) • The array is then used to compute various standard measures
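One way to picture the array is as four counts per student, one for each combination of correctness and uncertainty. The sketch below assumes that layout; the cell names and the helper `annotation_array` are hypothetical, since the slide's own array figure is not reproduced here.

```python
from collections import Counter

# The four mutually exclusive cells: (correctness, uncertainty).
CELLS = [("C", "nonU"), ("C", "U"), ("I", "nonU"), ("I", "U")]


def annotation_array(turns):
    """Count how many of a student's turns fall into each of the four cells."""
    for turn in turns:
        if turn not in CELLS:
            raise ValueError(f"unknown annotation: {turn!r}")
    counts = Counter(turns)
    return {cell: counts.get(cell, 0) for cell in CELLS}


# Hypothetical annotations for one student (illustrative only).
example_turns = [("C", "nonU"), ("C", "U"), ("I", "U"), ("C", "nonU"), ("I", "nonU")]
print(annotation_array(example_turns))
```

Every turn lands in exactly one cell, so the four counts sum to the student's number of annotated turns.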

Monitoring Accuracy • Ranges from -1 (no monitoring accuracy) to 1 (perfect monitoring accuracy)
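The formula for this slide is not present in the text. A common choice for monitoring accuracy in the metacognition literature is the Hamann coefficient, and the sketch below assumes that definition: turns where certainty matches correctness count as agreements, the rest as disagreements. The function and parameter names are hypothetical.

```python
def monitoring_accuracy(c_nonu, c_u, i_nonu, i_u):
    """Hamann coefficient over the four annotation counts (assumed definition).

    Agreements: correct and certain (c_nonu), incorrect and uncertain (i_u).
    Disagreements: incorrect but certain (i_nonu), correct but uncertain (c_u).
    """
    agreements = c_nonu + i_u
    disagreements = i_nonu + c_u
    total = agreements + disagreements
    if total == 0:
        raise ValueError("need at least one annotated turn")
    return (agreements - disagreements) / total


# Hypothetical counts for one student (illustrative only).
print(monitoring_accuracy(c_nonu=3, c_u=1, i_nonu=1, i_u=3))
```

Under this definition the score reaches 1 only when every correct answer is certain and every incorrect answer is uncertain, and -1 in the reverse case.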

Bias • Bias scores greater than zero indicate over-confidence and scores less than zero indicate under-confidence, with zero indicating best performance
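The slide's formula is not present in the text. The sketch below assumes the usual definition from the metacognition literature: the proportion of answers given with certainty minus the proportion of answers that were correct. Names are hypothetical.

```python
def bias(c_nonu, c_u, i_nonu, i_u):
    """Proportion certain minus proportion correct (assumed definition)."""
    total = c_nonu + c_u + i_nonu + i_u
    if total == 0:
        raise ValueError("need at least one annotated turn")
    proportion_certain = (c_nonu + i_nonu) / total
    proportion_correct = (c_nonu + c_u) / total
    return proportion_certain - proportion_correct


# Hypothetical counts: certain on 6 of 10 turns but correct on only 4 of 10.
print(bias(c_nonu=2, c_u=2, i_nonu=4, i_u=2))
```

A student who is certain more often than correct gets a positive score, which matches the over-confidence reading on the slide.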

Discrimination • Discrimination scores greater than zero indicate higher metacognitive performance, in terms of certainty for correct responses and uncertainty for incorrect responses
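The slide's formula is not present in the text. The sketch below assumes the usual definition from the metacognition literature: the share of correct answers given with certainty minus the share of incorrect answers given with certainty. Names are hypothetical, and the choice to raise an error when a student has no correct or no incorrect turns is also an assumption.

```python
def discrimination(c_nonu, c_u, i_nonu, i_u):
    """P(certain | correct) minus P(certain | incorrect) (assumed definition)."""
    n_correct = c_nonu + c_u
    n_incorrect = i_nonu + i_u
    if n_correct == 0 or n_incorrect == 0:
        raise ValueError("need at least one correct and one incorrect turn")
    return c_nonu / n_correct - i_nonu / n_incorrect


# Hypothetical counts: certain on 4 of 5 corrects, but on only 1 of 5 incorrects.
print(discrimination(c_nonu=4, c_u=1, i_nonu=1, i_u=4))
```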

Prior Experiment: Does Remediating after Student Uncertainty Impact Learning?
• ITSPOKE-WOZ
  • Speech-enabled Why2-Atlas (VanLehn, Jordan, Rosé et al. 2002), further parameterized to adapt to uncertainty and/or correctness
  • Speech recognition and correctness/uncertainty annotations performed in real-time by a human "Wizard"
• 4 Conditions
  • Normal Control: original system (no adaptation)
  • Simple Adaptation: same response for all impasses
  • Complex Adaptation: different responses for each impasse
  • Random Control: Simple Adaptation applied to a random 20% of correct answers
• Results (Forbes-Riley & Litman 2009)
  • Learning: main effect (Simple > Normal, Complex)

Normal (non-adaptive) System
• ITSPOKE (Intelligent Tutoring Spoken Dialogue System)
• Dialogue Format: Question – Student Answer – Response
• Response Types:
  • to Corrects (C): positive feedback (e.g. “Fine”)
  • to Incorrects (I): negative feedback (e.g. “Well…”) and
    • Bottom Out: correct answer with reasoning
    • Subdialogue: questions walk through reasoning

Two Uncertainty Adaptations
• Simple Adaptation
  • Same response for all 3 impasses
  • Feedback on only (in)correctness
• Complex Adaptation
  • Different responses for the 3 impasses
  • Feedback on both uncertainty and (in)correctness
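The contrast between the two adaptations can be sketched as a lookup from student state to a planned response. This is only an illustration: `plan_response` is a hypothetical helper, the content labels are borrowed from the bracketed tags in the deck's truck-collision examples and describe that one question, and the treatment of C+nonU turns is an assumption.

```python
# The three impasse states; C+nonU is not an impasse.
IMPASSES = {("C", "U"), ("I", "U"), ("I", "nonU")}

# Content labels from the deck's worked examples for one question (assumption:
# other questions may pair each impasse with different content).
COMPLEX_CONTENT = {
    ("C", "U"): "new bottom out",
    ("I", "U"): "short bottom out + subdialogue",
    ("I", "nonU"): "subdialogue",
}


def plan_response(condition, correctness, uncertainty):
    """Return (feedback_kind, content_kind) for one student turn."""
    state = (correctness, uncertainty)
    if condition not in ("simple", "complex"):
        raise ValueError(f"unknown condition: {condition!r}")
    if state not in IMPASSES:
        # Assumed: no remediation content when there is no impasse.
        return ("correctness only", None)
    if condition == "simple":
        # Same content for every impasse, feedback on (in)correctness only.
        return ("correctness only", "subdialogue")
    # Complex: a different response per impasse, feedback on both dimensions.
    return ("uncertainty and correctness", COMPLEX_CONTENT[state])


for state in sorted(IMPASSES):
    print(state, plan_response("simple", *state), plan_response("complex", *state))
```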

Simple Adaptation Example: C+U
TUTOR1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to?
STUDENT1: The force of the car hitting it?? [C+U]
TUTOR2: Fine. [FEEDBACK] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE]
• Same TUTOR2 subdialogue if student was I+U or I+nonU

Complex Adaptation Example: C+U
TUTOR2: That’s exactly right, but you seem unsure, so let’s sum up. [FEEDBACK] The net force on the truck is equal to the impact force on it. We can prove this just like we did for the car. First, we know that gravity and the normal force on the truck must cancel each other, otherwise the truck would not be at rest vertically. Second, we know that the impact force is the only horizontal force exerted on the truck. [NEW BOTTOM OUT]
• Different TUTOR2 subdialogue if student was I+U or I+nonU

Evaluation I: Does Remediating after Uncertainty Impact Metacognition?
• Although our experiment was designed to impact learning, we hypothesized that the experimental conditions might also improve metacognitive performance
• For each measure: 1-way ANOVA with condition as between-subjects factor
• Planned comparisons for each pair of conditions
• Predicted ordering: complex adaptation > simple adaptation > random control > normal control
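For readers who want to see what the 1-way ANOVA computes, the F statistic can be written out directly. This is a plain-Python sketch rather than the analysis code used in the study; the function name is hypothetical and the per-condition scores below are invented for illustration. It returns the F statistic and degrees of freedom only, so obtaining a p-value would need an F distribution from a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of per-condition score lists."""
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group variability: how far each condition mean is from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variability: spread of students around their own condition mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented average impasse severity scores per condition (not real data).
hypothetical_scores = {
    "normal": [1.4, 1.6, 1.5, 1.7],
    "random": [1.3, 1.5, 1.4, 1.6],
    "simple": [1.1, 1.3, 1.2, 1.4],
    "complex": [1.0, 1.3, 1.2, 1.3],
}
print(one_way_anova_f(list(hypothetical_scores.values())))
```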

Results I: Means across Conditions • Both complex and simple reduced average impasse severity, compared to normal (p < .08 in paired contrasts)

Results I: Means across Conditions • Simple (and random) increased monitoring accuracy, compared to normal (p < .06 in paired contrasts)

Results I: Means across Conditions • No statistically significant differences or trends for bias

Results I: Means across Conditions • Trend for discrimination differences overall (p = .09) • However, contrary to our predictions, complex reduced discrimination ability, compared to random and simple (p < .04 in paired contrasts)

Evaluation II: Do Metacognitive and Cognitive Performance Correlate?
• We also hypothesized that better metacognitive performance would be associated with greater learning
• For each measure: partial Pearson's correlation over all 81 students with posttest score, controlling for pretest score to measure learning gain
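A first-order partial correlation can be computed from three ordinary Pearson correlations. The sketch below uses the standard formula, with x standing for a metacognitive measure, y for posttest score, and z for pretest score; the function names and the tiny data set are hypothetical and chosen only so the result is easy to check by hand.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def partial_correlation(x, y, z):
    """Correlation between x and y after controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented scores for four students (not real data).
pretest = [1, 2, 3, 4]
measure = [1, -1, -1, 1]   # constructed to be uncorrelated with pretest
posttest = [2, 1, 2, 5]    # equals pretest + measure
print(partial_correlation(measure, posttest, pretest))
```

In this toy data the posttest is exactly pretest plus the measure, so once pretest is controlled for, the measure accounts for all remaining variation in posttest.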

Results II: Significant Partial Correlations with Posttest (after controlling for Pretest) • Average Impasse Severity (where smaller is better) is negatively correlated with learning

Results II: Significant Partial Correlations with Posttest (after controlling for Pretest) • Better monitoring accuracy and discrimination (where higher is better) are positively correlated with learning

Summary
• We analyzed metacognitive performance in tutorial dialogue
  • Evaluation uses both new and traditional measures
• Remediating after student uncertainty improved metacognitive performance
  • Impasse Severity
  • Monitoring Accuracy
• Metacognitive performance is correlated with learning
  • Impasse Severity (negative)
  • Monitoring Accuracy, Discrimination (positive)

Current and Future Work
• Use results to inform ITSPOKE modifications that improve both cognitive and metacognitive abilities
• Measure metacognition using Feeling of Knowing ratings
• Replicate analysis using fully automated ITSPOKE

Questions? Further Information? http://www.cs.pitt.edu/~litman/itspoke.html Thank You!

Background
• Speaker uncertainty is of interest in several research communities
  • Prosodic/lexical properties (Liscombe et al. 2005; Dijkstra et al., 2006; Pon-Barry 2008)
  • Feeling of (Another's) Knowing (Brennan and Williams, 1995)
  • Learning correlations (Craig et al., 2004; Forbes-Riley et al. 2008)
• Tutoring evaluations have investigated the impact of responding to student uncertainty, over and above correctness
  • Positive feedback responses improved satisfaction (Tsukahara and Ward, 2001) and persistence (Aist et al., 2002)
  • More substantive responses didn’t improve learning in a fully-automated system (Pon-Barry et al., 2006), but did improve learning, efficiency, and satisfaction under semi-automated conditions (Forbes-Riley and Litman, 2009)

Discussion
• Why didn’t Simple Adaptation and Complex Adaptation outperform Random Control?
  • Random Control adapted to some C+U, diminishing differences
  • Adapting to C+nonU may increase certainty
• Why didn’t Complex Adaptation outperform Simple Adaptation?
  • Complex Adaptation’s human-based content responses were based on frequency, not effectiveness

Current and Future Work
• Use results to inform ITSPOKE modifications that improve both cognitive and metacognitive abilities
  • Better optimize remediations for each impasse type, particularly those of highest severity
  • Remediate metacognitive abilities, not just domain content (e.g., Roll and Aleven, 2008; Saadawi et al., 2009)
• Measure metacognition using Feeling of Knowing ratings
• Replicate analysis using recently collected data from fully automated ITSPOKE

Example Dialogue Excerpt
TUTOR1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to?
STUDENT1: The force of the car hitting it??
TUTOR2: Fine. [FEEDBACK]

Two Uncertainty Adaptations
• Simple Adaptation: for CU, IU, InonU answers
  • ITSPOKE gives same content with same dialogue act
  • ITSPOKE gives feedback on (in)correctness
• Complex Adaptation: depending on whether the answer is CU, IU, or InonU
  • ITSPOKE gives same content but varies dialogue act
    • Based on human tutor responses significantly associated with CU, IU, InonU answers
  • ITSPOKE gives complex feedback on affect and (in)correctness
    • Based on empathetic computer tutor literature (Wang et al., 2005; Hall et al., 2004; Burleson et al., 2004)

Complex Adaptation Example: I+U
TUTOR1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to?
STUDENT1: Zero?? [I+U]
TUTOR2: That’s not correct, but don’t worry. You seem to be aware of your mistake. Let’s resolve it. [FEEDBACK] The net force on the truck is equal to the impact force on it. Let’s walk through this answer step by step. [SHORT BOTTOM OUT] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE]

Complex Adaptation Example: I+nonU
TUTOR1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to?
STUDENT1: The force of gravity. [I+nonU]
TUTOR2: I’m sorry, but I see an error in your answer. Let’s fix it. [FEEDBACK] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE]
