AutoTutor: Integrating Learning with Emotions Art Graesser, Bethany McDaniel and Sidney D’Mello
Overview
• Project goals (Art)
• Methods of data collection (Bethany)
• Affective states during learning (Sidney)
  • Occurrence of affective states
  • Inter-judge agreement
• Detection of Ekman's facial actions (Bethany)
• Dialogue acts and emotions (Sidney)
AutoTutor Highlights
Team: Art Graesser (PI), Zhiqiang Cai, Stan Franklin, Barry Gholson, Max Louwerse, Natalie Person, Roz Picard (MIT), Vasile Rus, Patrick Chipman, Scotty Craig, Sidney D’Mello, Tanner Jackson, Brandon King, Bethany McDaniel, Jeremiah Sullins, Kristy Tapp, Adam Wanderman, Amy Witherspoon
• Learn by conversation in natural language
• Subject matters: computer literacy, conceptual physics, critical thinking (now)
• Tracks and adapts to the cognition, emotions, and abilities of the learner
• Improves learning by nearly a letter grade compared to reading textbooks (0.8 sigma effect size)
AutoTutor 1998 (Graesser, Wiemer-Hastings, Wiemer-Hastings, & Kreuz, 1999)
AutoTutor Interface (screenshot)
• Talking head with gestures and synthesized speech
• Presentation of the question/problem
• Student input (answers, comments, questions)
• Dialog history with tutor turns and student turns
Expectations and Misconceptions in the Sun & Earth Problem
EXPECTATIONS
• The sun exerts a gravitational force on the earth.
• The earth exerts a gravitational force on the sun.
• The two forces are a third-law pair.
• The magnitudes of the two forces are the same.
MISCONCEPTIONS
• Only the larger object exerts a force.
• The force of earth on sun may be less than that of sun on earth.
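To make the script structure concrete, here is a hypothetical encoding of this problem as data. The field names and question wording are illustrative; the slides do not show AutoTutor's actual curriculum-script format.

```python
# Hypothetical curriculum-script entry for the Sun & Earth problem.
# Field names and question wording are illustrative, not AutoTutor's format.
sun_earth_problem = {
    "question": "How do the gravitational forces between the sun "
                "and the earth compare?",  # wording hypothetical
    "expectations": [
        "The sun exerts a gravitational force on the earth.",
        "The earth exerts a gravitational force on the sun.",
        "The two forces are a third-law pair.",
        "The magnitudes of the two forces are the same.",
    ],
    "misconceptions": [
        "Only the larger object exerts a force.",
        "The force of earth on sun is less than that of sun on earth.",
    ],
}
```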
Expectation- and Misconception-Tailored Dialog (pervasive in AutoTutor & unskilled human tutors)
• Tutor asks a question that requires explanatory reasoning
• Student answers with fragments of information, distributed over multiple turns
• Tutor analyzes the fragments of the explanation
  • Compares them to a list of expectations (good sentences)
  • Compares them to a list of misconceptions (bad answers)
• Tutor posts goals & performs dialog acts (hints, prompts) to improve the explanation
  • Fills in missing expectations (one at a time)
  • Corrects expected misconceptions (immediately)
• Tutor handles periodic sub-dialogues
  • Student questions
  • Student meta-communicative acts (e.g., "What did you say?")
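As a rough sketch of that loop, assuming a toy word-overlap matcher in place of AutoTutor's LSA component and a hypothetical move vocabulary:

```python
# Sketch of expectation- and misconception-tailored dialogue management.
# match_score is a crude word-overlap stand-in for AutoTutor's LSA cosine.

THRESHOLD = 0.7  # illustrative; the slides do not give the real value

def match_score(text, target):
    """Placeholder semantic match: proportion of target words present."""
    words, target_words = set(text.lower().split()), set(target.lower().split())
    return len(words & target_words) / len(target_words)

def tutor_move(student_input, expectations, misconceptions, covered):
    """Pick the tutor's next dialog move from the student's latest fragment."""
    # Correct expected misconceptions immediately.
    for m in misconceptions:
        if match_score(student_input, m) > THRESHOLD:
            return ("correct", m)
    # Credit any expectations the fragment now covers.
    for e in expectations:
        if match_score(student_input, e) > THRESHOLD:
            covered.add(e)
    remaining = [e for e in expectations if e not in covered]
    if not remaining:
        return ("summarize", None)   # all expectations covered
    return ("hint", remaining[0])    # fill in missing expectations one at a time
```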
Dialog Moves
• Positive immediate feedback: "Yeah," "Right!"
• Neutral immediate feedback: "Okay," "Uh huh"
• Negative immediate feedback: "No," "Not quite"
• Pump for more information: "What else?"
• Hint: "What about the earth's gravity?"
• Prompt for specific information: "The earth exerts a gravitational force on what?"
• Assert: "The earth exerts a gravitational force on the sun."
• Correct: "The smaller object also exerts a force."
• Repeat: "So, once again, …"
• Summarize: "So to recap, …"
• Answer student question
Hint-Prompt-Assertion Cycles to Cover One Expectation
• The cycle fleshes out one expectation at a time: Hint → Prompt → Assertion, repeated as needed
• Exit the cycle when LSA-cosine(S, E) > T, where S = student input, E = expectation, T = threshold
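A minimal sketch of the exit test, assuming the student input and expectation have already been mapped to LSA vectors; the threshold value is illustrative:

```python
import numpy as np

def lsa_cosine(s_vec, e_vec):
    """Cosine between the LSA vectors of student input S and expectation E."""
    return float(np.dot(s_vec, e_vec) /
                 (np.linalg.norm(s_vec) * np.linalg.norm(e_vec)))

def expectation_covered(s_vec, e_vec, threshold=0.7):
    """Exit the hint-prompt-assertion cycle once LSA-cosine(S, E) > T."""
    # threshold T is illustrative; the slides do not give its value
    return lsa_cosine(s_vec, e_vec) > threshold
```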
How Might AutoTutor Be Responsive to the Learner's Affective States?
• If the learner is frustrated, then AutoTutor gives a hint
• If bored, then some engaging razzle-dazzle
• If in flow/absorbed, then lay low
• If confused, then intelligently manage optimal confusion
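A minimal sketch of these production rules as a lookup table; the state and strategy names come from the slide, while the function and default behavior are illustrative:

```python
# Illustrative affect-to-strategy production rules from the slide.
AFFECT_POLICY = {
    "frustrated": "give_hint",
    "bored": "engaging_razzle_dazzle",
    "flow": "lay_low",
    "confused": "manage_optimal_confusion",
}

def respond_to_affect(state):
    # Default to normal tutoring when no target state is detected.
    return AFFECT_POLICY.get(state, "continue_normal_dialogue")
```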
AutoTutor tracking learners' emotions (Memphis team & MIT team with Roz Picard)
• Confusion
• Excitement
• Boredom
• Flow
• Frustration
• Eureka
Sensors feeding AutoTutor
• Visual – IBM Blue Eyes camera
• Posture – Body Pressure Measurement System (BPMS)
• Pressure – force-sensitive mouse and keyboard
• AutoTutor text dialog
Facial Expressions: The IBM Blue Eyes Camera
(images: red-eye effect, IBM Blue Eyes camera, eyebrow templates)
Gold Standard Study
• Session one
  • Participants (N=28) interact with AutoTutor
  • Collect data with sensors: BPMS, Blue Eyes camera, AutoTutor logs
  • Participants view their own video and give affect ratings
• Session two (one week later)
  • Participants view another participant's video and give affect judgments every 20 seconds
  • Expert judges (N=2) give affect ratings
• Learning measures: 32 multiple-choice questions (pretest-posttest)
Interrater Reliability: Judges
• F(5,135) = 38.94, MSe = .046, p < .01
• Trained Judges > All Raters
• Self-Peer < All Raters
• Peer-Judge 1 > Peer-Judge 2
• Peer-Judge 1 > Self-Judge 1
Interrater Reliability: Emotions
• F(6,162) = 23.13, MSe = .058, p < .01
• Delight > Confusion > Boredom > (Flow = Frustration = Neutral = Surprise)
Interrater Reliability: Trained Judge Recoding
• Overall: kappa = .49 (N = 1133)
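For reference, Cohen's kappa measures agreement corrected for chance. A minimal sketch of the standard computation over two judges' label sequences (applied to the N = 1133 recoded points, this is the statistic behind the .49 figure):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two judges' categorical labels of the same N points."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each judge's marginal label proportions.
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```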
Conclusions
• First, trained judges who are experienced in coding facial actions provide affective judgments that are more reliable, and that match the learner's self reports more closely, than the judgments of untrained peers.
• Second, the judgments by peers have very little correspondence to the self reports of learners.
• Third, an emotion-labeling task is more difficult when judges are asked to make emotion ratings at regularly polled timestamps than when they make spontaneous judgments.
• Fourth, different types of emotions are elicited at different judgment points.
Detection of Ekman's Facial Actions
• Facial Action Coding System (Ekman, 2003; Ekman & Friesen, 1978)
• Each Action Unit (AU) represents muscular activity that produces a change in facial appearance
(images: Neutral; AU 1 – Inner Brow Raise; AU 7 – Lid Tightener)
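A coded observation can be represented simply as the set of active AUs. In the sketch below, AU 1 and AU 7 come from the slide; the other entries are standard FACS labels added for illustration:

```python
# A few Action Units from the standard FACS inventory.
# AU 1 and AU 7 are the ones shown on the slide; the rest are illustrative.
ACTION_UNITS = {
    1: "Inner Brow Raiser",
    4: "Brow Lowerer",
    6: "Cheek Raiser",
    7: "Lid Tightener",
    12: "Lip Corner Puller",
}

def describe_frame(active_aus):
    """Human-readable description of a coded video frame."""
    return [f"AU {au}: {ACTION_UNITS.get(au, 'unknown')}" for au in active_aus]
```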
Discriminability of Action Units
How well each AU can be detected, rated on three dimensions:
• Motion: how detectable the change is from the neutral position
• Edge: how clearly defined a line or object on the face is in relation to the surrounding area
• Texture: the level of graininess of the general area, i.e., the degree of variation in the intensity of the surface, quantifying properties such as smoothness, coarseness, and regularity
Grading Discriminability
• 2 expert judges, trained on the Facial Action Coding System
• Each rates motion, edge, and texture on a 1-6 scale (1 = very difficult, 6 = very easy)
• The three ratings are summed per expert and averaged across the two experts
• Composite score between 3 and 18 (3 = very difficult to detect, 18 = very easy to detect)
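A worked sketch of that composite under the reading above (summing the three 1-6 ratings per expert and averaging the two experts; the function name and example numbers are illustrative):

```python
def discriminability(ratings_expert1, ratings_expert2):
    """Composite discriminability from (motion, edge, texture) ratings.

    Each rating is on a 1-6 scale, so each expert's sum falls in 3-18;
    averaging the two experts keeps the composite in the 3-18 range.
    """
    return (sum(ratings_expert1) + sum(ratings_expert2)) / 2

# Example: an AU rated (motion=5, edge=4, texture=3) and (4, 4, 3).
score = discriminability((5, 4, 3), (4, 4, 3))  # -> 11.5, fairly easy to detect
```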
Probabilities of the occurrence of AUs across emotions and discriminability
(four slides of charts; the underlying data is not recoverable from the extracted text)
AutoTutor's Text Dialog (screenshot showing student answers and LSA matches)
AutoTutor's Conversational Channels
• Temporal: real time, subtopic, turn
• Verbosity: response, words, chars
• Answer quality (directness): local good, local bad, global good, global bad, delayed local good, delayed local bad, delayed global good, delayed global bad
• Advancer (speech act): pump, hint, prompt, correction, assertion, summary
• Feedback: negative, neutral negative, neutral, neutral positive, positive
Multiple Regression Analyses (coefficient table not recoverable from the extracted text; *p < .10, **p < .05)
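As a sketch of the kind of analysis behind this slide: ordinary least squares predicting an affect rating from dialogue-channel features. The data below is a random placeholder for shape only, not the study's data:

```python
import numpy as np

# Placeholder design matrix: one row per student turn, columns are
# dialogue features such as feedback polarity, answer quality, verbosity.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # fake features, for shape only
X = np.column_stack([np.ones(200), X])   # add intercept column
y = rng.normal(size=200)                 # fake affect ratings

# OLS coefficients via least squares: beta = argmin ||X b - y||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)
```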
How Might AutoTutor Be Responsive to the Learner's Affective States?
• If the learner is frustrated, then AutoTutor gives a hint
• If bored, then some engaging razzle-dazzle
• If in flow/absorbed, then lay low
• If confused, then intelligently manage optimal confusion