Using Word-level Features to Better Predict Student Emotions during Spoken Tutoring Dialogues Mihai Rotaru Diane J. Litman DoD Group Meeting Presentation
Introduction • Why is it important to detect/handle emotions? • Emotion annotation • Classification task • Previous work
(Spoken) Tutoring dialogues • Education • Classroom setting • Human (one-on-one) tutoring • Computer tutoring (ITS – Intelligent Tutoring Systems) • Addressing the learning gap between human and computer tutoring • Dialogue-based ITS (Ex: Why2) • Improve the language understanding module of ITS • Incorporate affective reasoning • Connection between learning and student emotional state • Adding human-provided emotional scaffolding to a reading tutor increases student persistence (Aist et al., 2002)
Human-Computer Excerpt
Tutor26: Let me ask my question in a more general way. What happens to the velocity of a body when there is no force acting on it?
Student27: dammit (ASR: it is)
Tutor28: Could you please repeat that?
Student29: same (ASR: i same)
Tutor30: Right. And what happens to the HORIZONTAL velocity of a body when there is no HORIZONTAL force acting on it?
Student31: zero (ASR: the zero)
Tutor32: Well... OK, let's consider how forces in the vertical direction affect the motion of a body in a horizontal direction in a different scenario <…omitted…>
Student33: oh gravity you already said this (ASR: beats gravity you're exert less)
Tutor34: Fine. Are there any other forces acting on the apple as it falls?
Student35: no why are you doing this again (ASR: no y and to it yes)
Tutor36: Correct. The only force acting on the apple as it falls to the earth is the earth's gravitational force. In which direction does gravitational force act?
Student37: downward you computer (ASR: downward you computer)
Affective reasoning • Prerequisites • Dialogue-based ITS : Why2 • Interaction via speech : ITSPOKE (Intelligent Tutoring SPOKEn dialogue system) • Affective reasoning • Detect student emotions • Handle student emotions
Back-end is Why2-Atlas system (VanLehn et al., 2002) • Sphinx2 speech recognition and Cepstral text-to-speech
Student emotions • Emotion annotation • Perceived, intuitive expressions of emotion • Relative to other turns in context and to the tutoring task • 3 main emotion classes • Negative – e.g. uncertain, bored, irritated, confused, sad (also question turns) • Positive – e.g. confident, enthusiastic • Neutral – no strong expression of negative or positive emotion (also grounding turns) • Corpora • Human-Human (453 student turns from 10 dialogues) • Human-Computer (333 student turns from 15 dialogues)
Annotation example
Tutor: Uh let us talk of one car first.
Student: ok. (EMOTION = NEUTRAL)
Tutor: If there is a car, what is it that exerts force on the car such that it accelerates forward?
Student: The engine. (EMOTION = POSITIVE)
Tutor: Uh well engine is part of the car, so how can it exert force on itself?
Student: um… (EMOTION = NEGATIVE)
Classification task • 3 levels of annotation granularity • NPN – Negative, Positive, Neutral • NnN – Negative, Non-Negative • positives and neutrals are conflated as Non-Negative • EnE – Emotional, Non-Emotional • negatives and positives are conflated as Emotional; neutrals are Non-Emotional (see the mapping sketch below) • useful for triggering system adaptation (HH corpus analysis) • Agreed subset • Predict the class of each student turn
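A tiny sketch of the two label conflations, assuming each turn carries one of the three base NPN labels (the dictionary names are illustrative, not from the paper):

```python
# Map the base NPN labels onto the two coarser annotation schemes.
NnN = {"negative": "Negative", "positive": "Non-Negative", "neutral": "Non-Negative"}
EnE = {"negative": "Emotional", "positive": "Emotional", "neutral": "Non-Emotional"}

assert NnN["positive"] == "Non-Negative" and EnE["positive"] == "Emotional"
```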
Previous work – Features • Human-Human: 5 feature types • Acoustic-prosodic (amplitude, pitch, duration) • Lexical • Other automatic • Manual • Identifiers • plus Combinations • Feature context • Current turn • Contextual • Local – previous two turns • Global – all turns so far • Human-Computer: 3 feature types (a subset of the Human-Human types)
Previous work – Results (Litman and Forbes, ACL 2004)
How to improve? • Use word-level features instead of turn-level features • Extend the pitch feature set • Simplified word-level emotion model
Why word-level features? • Emotion might not be expressed over the entire turn • Example: "This is great" spoken with an angry vs. a happy pitch contour
Why word-level features? (2) • Can approximate the pitch contour better at sub-turn levels • Especially for longer turns (illustrated on the "This is great" contour)
Extended pitch feature set • Previous work • Min, Max • Avg, Stdev • Extended with • Start, End • Regression coefficient and regression error • Quadratic regression coefficient from Batliner et al., 2003 (a feature-extraction sketch follows this slide)
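As an illustration, a minimal sketch (not the authors' code) of computing this feature set from one unit's F0 contour; the input array `f0`, the frame-index time axis, and the RMSE definition of regression error are assumptions, since the slides do not specify them:

```python
import numpy as np

def pitch_features(f0: np.ndarray) -> dict:
    """Pitch statistics over one unit (a whole turn or a single word)."""
    t = np.arange(len(f0))                 # frame index as the time axis (assumed)
    lin = np.polyfit(t, f0, 1)             # linear fit: [slope, intercept]
    quad = np.polyfit(t, f0, 2)            # quadratic fit (cf. Batliner et al., 2003)
    rmse = np.sqrt(np.mean((f0 - np.polyval(lin, t)) ** 2))
    return {
        "min": f0.min(), "max": f0.max(),  # previous work
        "avg": f0.mean(), "stdev": f0.std(),
        "start": f0[0], "end": f0[-1],     # extended set
        "reg_coef": lin[0],                # linear regression coefficient
        "reg_error": rmse,                 # regression error (assumed RMSE)
        "quad_coef": quad[0],              # quadratic regression coefficient
    }

# Example: pitch_features(np.array([210.0, 220.0, 215.0, 230.0]))
```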
But wait… • Turn-level: one feature vector per student turn → machine learning → turn emotional class • Word-level: one feature vector per word (Word 1 … Word n), but what emotional class should each word be trained on? (cf. Sönmez et al., 1998)
Word-level emotion model • Each word (Word 1 … Word n) gets its own feature vector and a word-level emotion label • Machine learning operates at the word level; the turn's emotional class is derived from the predicted word-level emotions
Word-level emotion model • Training phase • Each word is labeled with its turn's class • Extra features identify the position of the word in the turn (distance in words from the beginning and end of the turn) • Learn the emotion model at the word level • Test phase • Predict each word's class using the learned model • Use majority/weighted voting over the word classes to label the turn • Ties are broken randomly (see the voting sketch below)
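A hedged sketch of this train/test scheme; the feature-dict representation and the `predict_word` callable are assumptions, since the paper's implementation is not shown:

```python
import random
from collections import Counter

def word_instances(turn_words, turn_class):
    """Training: every word inherits its turn's class; positional features
    record the distance (in words) from the beginning and end of the turn."""
    n = len(turn_words)                    # turn_words: list of feature dicts
    return [(dict(feats, dist_start=i, dist_end=n - 1 - i), turn_class)
            for i, feats in enumerate(turn_words)]

def predict_turn(predict_word, turn_words):
    """Test: predict each word's class, then label the turn by majority
    vote over the word classes; ties are broken randomly."""
    votes = Counter(predict_word(feats) for feats in turn_words)
    best = max(votes.values())
    return random.choice([cls for cls, v in votes.items() if v == best])
```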
Questions to answer • Will word-level features work better than turn-level features for emotion prediction? • Yes • If yes, where does the advantage come from? • Better prediction of longer turns • Is there a feature set that offers robust performance? • Yes: the combination of pitch and lexical features at the word level
Experiments • EnE classification, agreed turns • Two contrasting corpora • Two contrasting learners (WEKA) • IB1 – nearest neighbor classifier • ADA – boosted decision trees
Feature sets • Only pitch and lexical features • 6 feature sets • Turn level: • Lex-Turn – only lexical • Pitch-Turn – only pitch • PitchLex-Turn – pitch and lexical • Word level: • Lex-Word – only lexical + positional • Pitch-Word – only pitch + positional • PitchLex-Word – pitch and lexical + positional • Baseline: majority class • 10 x 10 cross-validation (an evaluation sketch follows)
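For concreteness, a minimal sketch of the evaluation loop using scikit-learn stand-ins for the WEKA learners (IB1 ≈ 1-nearest-neighbor, ADA ≈ boosted decision stumps; `X` and `y` are an assumed feature matrix and turn labels built from one of the six sets):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

def evaluate(X, y):
    """10 x 10 cross-validation for both learners on one feature set."""
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10)
    learners = {"IB1": KNeighborsClassifier(n_neighbors=1),   # nearest neighbor
                "ADA": AdaBoostClassifier()}                  # boosting stand-in
    return {name: cross_val_score(clf, X, y, cv=cv).mean()
            for name, clf in learners.items()}
```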
Results – IB1 on HH • Word-level features significantly outperform turn-level features • Word-level better than turn-level on longer turns • Best performers: Lex-Word, PitchLex-Word
Results – ADA on HH • Turn-level performance increases substantially • Word-level significantly better than turn-level on feature sets with pitch • Word-level better than turn-level on longer turns, but the difference is smaller • Best performers: Lex-Turn, Lex-Word, PitchLex-Word
Results – IB1 on HC • Word-level features significantly outperform turn-level features • Lexical information is less helpful than on the HH corpus • Word-level better than turn-level on longer turns • Best performers: Pitch-Word, PitchLex-Word
Results – ADA on HC • The word-level vs. turn-level difference is no longer significant • IB1 better than ADA on word-level features • ADA has larger variance on this corpus • Word-level better than turn-level on longer turns, but the difference is smaller • Best performers: Pitch-Turn, Pitch-Word, PitchLex-Turn, PitchLex-Word
Discussion • Lexical features perform similarly at turn and word level • Performance depends on corpus and learner • Pitch features differ significantly • Word-level better than turn-level (4 of 6 configurations) • PitchLex-Word is a consistent best performer • Our best accuracies are comparable with previous work
Conclusions & Future work • Word-level better than turn-level for emotion prediction • Even under a very simple word-level emotion model • Word-level better at predicting longer turns • PitchLex-Word is a consistent best performer • Future work: • More refined word-level emotion models • HMMs • Co-training • Filter irrelevant words • Use the prosodic information left out • See whether our conclusions generalize to detecting student uncertainty • Experiment with other sub-turn units (breath groups)