Computational Extraction of Social and Interactional Meaning from Speech
Dan Jurafsky and Mari Ostendorf

Lecture 7: Dialog Acts & Sarcasm (Mari Ostendorf)
Note: Uncredited examples are from the Dialogue & Conversational Agents chapter.
H-H (Human-Human) Conversation Dynamics
[Figures from Stolcke et al., CL 2000, and from the Jurafsky book.]
Human-Computer Dialog
• System: "Welcome to the Communicator..." (Greeting)
• User: "I wanna go from Denver to ..." (Request)
• System: "What time do you want to leave Denver?" (Clarification Question)
• User: "I'd like to leave in the morning ..." (Response)
• System: "Eight flight options were returned. Option 1..." (Inform)
Overview
• Dialog acts
  – Definitions
  – Important special cases
  – Detection
• Role of prosody
• Sarcasm
  – In speech
  – In text
Overview: Dialog acts (Definitions, Important special cases, Detection), Role of prosody, Sarcasm
Speech/Dialog/Conversation Acts
• Characterize the purpose of an utterance
• Associated with sentences (or intonational phrases)
• Used for:
  – Determining and controlling the "state" of a conversation in a spoken language system (see the sketch below)
  – Conversation analysis, e.g., extracting social information
• Many different tag sets, depending on the application
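To make the "controlling the state" idea concrete, here is a minimal Python sketch. The tag set and states are hypothetical, loosely modeled on the Communicator-style flight dialog above; none of this is from the lecture itself.

```python
# Minimal sketch: dialog acts driving the state of a spoken dialog system.
# DialogAct, State, and the transitions are illustrative assumptions.
from enum import Enum, auto

class DialogAct(Enum):
    GREETING = auto()
    REQUEST = auto()
    CLARIFICATION_QUESTION = auto()
    RESPONSE = auto()
    INFORM = auto()

class State(Enum):
    OPENING = auto()
    COLLECTING_CONSTRAINTS = auto()
    PRESENTING_OPTIONS = auto()

def update_state(state: State, act: DialogAct) -> State:
    """Advance the dialog state based on the act of the latest utterance."""
    if state is State.OPENING and act is DialogAct.REQUEST:
        return State.COLLECTING_CONSTRAINTS   # user stated a travel request
    if state is State.COLLECTING_CONSTRAINTS and act is DialogAct.RESPONSE:
        return State.PRESENTING_OPTIONS       # constraints answered; list flights
    return state                              # otherwise stay in the same state
```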
Aside: Speech vs. Text
• Speech/dialog/conversation act inventories were developed when conversations were spoken
• Now conversations also happen online or via text messaging
• Dialog acts are relevant there too, and researchers are starting to study them
• Some differences:
  – Text is impoverished relative to speech, so extra punctuation, emoticons, etc., are added
  – Turn-taking & grounding work differently
Overview: Dialog acts (Definitions, Important special cases, Detection), Role of prosody, Sarcasm
Special Cases
• Question detection (application: punctuation prediction)
• 4-category general set: statement, question, incomplete, backchannel (application: cross-domain training and transfer)
• Agreement vs. disagreement (application: social analysis)
• Error corrections for communication errors (application: human-computer dialogs)
Questions: Harder than you'd think…
• Indirect speech acts, e.g., "Can you pass the salt?" is formally a yes/no question but functions as a request
Overview: Dialog acts (Definitions, Important special cases, Detection), Role of prosody, Sarcasm
Automatic Detection
Two problems:
• Classification given segmentation
• Segmentation (often multiple DAs per turn)
These are best treated jointly, but that is computationally complex, so start with the known-segmentation case.

Example turn: "ok uh let me pull up your profile and I'll be right with you here and you said you wanted to travel next week?"

One candidate segmentation:
• ok
• uh let me pull up your profile and
• I'll be right with you here and
• you said you wanted to travel
• next week
Another:
• ok uh let me pull up your profile and I'll be right with you here
• and you said you wanted to travel next week
Looking at Segmentation
[Example figure from Stolcke et al., CL 2000.]
More Segmentation Challenges

A: Ok, so what do you think?
B: Well, that's a pretty loaded topic.
A: Absolutely.
B: Well, here in uh – Hang on just a minute, the dog is barking – Ok, here in Oklahoma, we just went through a major educational reform…

A: After all these things, he raises hundreds of millions of dollars. I mean uh the fella
B: but he never stops talking about it.
A: but ok
B: Aren't you supposed to y- I mean
A: well that's a little- the Lord says
B: Does charity mean something if you're constantly using it as a cudgel to beat your enemies over the- I'm better than you. I give money to charity.
A: Well look, now I…
Knowledge Sources for Classification
• Words and grammar
  – "please," "would you": cue to a request
  – Aux inversion: cue to a Y/N question
  – "uh-huh," "yeah": often backchannels
• Prosody
  – Rising final pitch: Y/N question, declarative question
  – Pitch & energy can distinguish a backchannel ("yeah") from agreement; pitch reset may indicate an incomplete utterance
  – Pitch accent type… (more on this later)
• Conversational structure (context)
  – Answers follow questions
Feature Extraction
• Words
  – N-grams as features
  – DA-dependent n-gram language model scores
  – Presence/absence of syntactic constituents
• Prosody (typically with normalization; see the sketch below)
  – Speaking rate
  – Mean and variance of log energy
  – Fundamental frequency: mean, variance, overall contour trend, utterance-final contour shape, change in mean across utterance boundaries
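As an illustration of the prosody statistics listed above, here is a minimal numpy sketch. It assumes frame-level F0 and energy tracks have already been computed by some pitch tracker; the specific contour-shape definitions are placeholder choices, not the lecture's exact recipe.

```python
# Minimal sketch of utterance-level prosody statistics from precomputed
# frame-level tracks. Assumes f0 in Hz (0 for unvoiced frames) and an
# energy array at a fixed frame rate (e.g., 10 ms).
import numpy as np

def prosody_features(f0: np.ndarray, energy: np.ndarray,
                     n_words: int, duration_s: float) -> dict:
    voiced = f0[f0 > 0]                       # keep voiced frames only
    log_e = np.log(energy + 1e-10)            # log energy, guarded against zeros
    # Overall contour trend: slope of a line fit to the voiced F0 frames.
    trend = np.polyfit(np.arange(len(voiced)), voiced, 1)[0] if len(voiced) > 1 else 0.0
    # Final contour shape (one simple proxy): mean F0 of the last 20% of
    # voiced frames minus the mean of the earlier frames.
    k = max(1, len(voiced) // 5)
    final_rise = float(voiced[-k:].mean() - voiced[:-k].mean()) if len(voiced) > k else 0.0
    return {
        "speaking_rate": n_words / duration_s,
        "log_energy_mean": float(log_e.mean()),
        "log_energy_var": float(log_e.var()),
        "f0_mean": float(voiced.mean()) if len(voiced) else 0.0,
        "f0_var": float(voiced.var()) if len(voiced) else 0.0,
        "f0_trend": float(trend),
        "f0_final_rise": final_rise,
    }
```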
Combining Cues with Context
• With conversational structure, we need a sequence model:
  – d = dialog act sequence d_1, …, d_T
  – f = prosody features, w = word/grammar features
• Direct model (e.g., conditional random field):
  argmax_d p(d | f, w), where p(d | f, w) = ∏_t p(d_t | f_t, w_t, d_{t-1})
• Generative model (e.g., HMM, or hidden event model):
  argmax_d p(f, w | d) p(d), where p(f, w | d) p(d) = ∏_t p(f_t | d_t) p(w_t | d_t) p(d_t | d_{t-1})
• Experimental results show only a small gain from context
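For the generative model, decoding the best dialog-act sequence is standard Viterbi over DA states. A minimal sketch, assuming the caller supplies per-segment emission log-scores log p(f_t|d) + log p(w_t|d), transition log-probabilities, and a log prior (all hypothetical inputs):

```python
# Minimal Viterbi sketch for the HMM-style dialog act model.
import numpy as np

def viterbi(log_emit: np.ndarray, log_trans: np.ndarray,
            log_prior: np.ndarray) -> list[int]:
    """log_emit: (T, D) per-segment DA scores; log_trans: (D, D); log_prior: (D,)."""
    T, D = log_emit.shape
    score = log_prior + log_emit[0]          # best score ending in each DA at t=0
    back = np.zeros((T, D), dtype=int)       # backpointers
    for t in range(1, T):
        cand = score[:, None] + log_trans    # (prev DA, next DA) score matrix
        back[t] = cand.argmax(axis=0)        # best predecessor for each next DA
        score = cand.max(axis=0) + log_emit[t]
    # Trace back the best dialog act sequence.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```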
Assuming Independent Segments
• No sequence model, but the DA prior (unigram) is still important
• Direct model: argmax_d p(d_t | f_t, w_t)
  – Features can extend beyond the utterance to approximately capture context; need to handle nonhomogeneous cues or make them homogeneous
• Generative model: argmax_d p(f_t | d_t) p(w_t | d_t) p(d_t)
  – Can predict d_t using separate word and prosody classifiers, then do classifier combination (see the sketch below)
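The generative combination for an independent segment reduces to a sum of log-scores. A minimal sketch with caller-supplied (hypothetical) per-class arrays:

```python
# Minimal sketch of combining separate word and prosody classifiers with
# the DA prior: each array holds one log-score per dialog act class.
import numpy as np

def combine(log_p_w: np.ndarray, log_p_f: np.ndarray,
            log_prior: np.ndarray) -> int:
    """Return the index of the best dialog act under the generative model."""
    return int((log_p_w + log_p_f + log_prior).argmax())
```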
Some Results (not directly comparable)
• 42 classes (Stolcke et al., CL 2000)
  – Hidden-event model: prosody & words (& context)
  – 42-class accuracy: 62-65% on Switchboard ASR output (68-71% on hand transcripts)
• 4 classes (Margolis et al., DANLP 2009)
  – Liblinear, n-grams + length (no prosody), hand transcripts
  – 4-class accuracy: 89% Swbd, 84% MRDA
  – 4-class avg recall: 85% Swbd, 81% MRDA
• 2 classes (Margolis & Ostendorf, ACL 2011)
  – Liblinear, n-grams + prosody, hand transcripts
  – Question F-measure: 0.6 on MRDA (recall = 92%)
• 3 classes (Galley et al., ACL 2004)
  – Maxent, lexical-structural-duration features, hand transcripts
  – 3-class accuracy: 86% MRDA
Backchannel "Universals"
• What do backchannels have in common across languages? Short length and low energy, NOT the words
• Examples:
  – English: uh-huh, right, yeah
  – Spanish: mmm, si, ya ("mmm," "yes," "already")
• Experiment (Margolis et al., 2009): cross-language DA classification for English vs. Spanish conversational telephone speech
  – Tags: statement, question, incomplete, backchannel
  – Uses automatic translation in cross-language classification
Spanish vs. English DAs
• Backchannels:
  – roughly 20% of DAs
  – lexical cues are useful within languages, so length is not used much
  – length is more important across languages
• Questions:
  – "<s> es que" often starts a statement in Spanish
  – its translation, "<s> is that", indicates a question in English
Overview: Dialog acts, Role of prosody, Sarcasm
Prosody
• The overall impact is small (results from Stolcke et al., CL 2000)
• BUT it can be important for some distinctions
• Other examples: right, so, absolutely, ok, thank you, …
  – "Oh." (disappointment) vs. "Oh!" (I get it)
  – "Yeah": positive vs. negative
Question Detection
[Results from Margolis & Ostendorf, ACL 2011.]
Whatever! (Benus, Gravano & Hirschberg, 2007)
• Perception: listeners' negativity judgments from prosody on "whatever" alone are similar to their judgments given full context
• Production: the 1st syllable is more likely to carry a pitch accent under a negative interpretation
Overview: Dialog acts, Role of prosody, Sarcasm (In speech, In text)
Sarcasm
• Changing the default (or literal) meaning
• Objectives of sarcasm:
  – Make someone else feel bad or stupid
  – Display anger or annoyance about something
  – Share an inside joke
• Why is it interesting?
  – More accurate sentiment detection
  – More accurate agreement/disagreement detection
  – General understanding of communication strategies
Negative positives in talk shows:
• yeah and i don't think you're going to be going back …
• yeah oh yeah that's right yeah yeah yeah but …
• yeah well i well m my understanding is …
• yeah it it it gosh you know is that the standard that prosecutors use the maybe possibly she's telling the truth standard
• yeah i i don't think it was just the radical right
• yeah larry i i want to correct something randi said of course
Negative positives (cont.)
• right right th that's right that's right yeah you know what you're right but
• right right but but you you can't say that punching him …
• right but the but the psychiatrists in this case were not just …
• senators are not polling very well right then as a columnist who's offering opinions on what i think the right policy is it seems to me…
Yeah, right. (Tepperman et al., 2006)
• 131 instances of "yeah right" in Switchboard & Fisher; 23% annotated as sarcastic
• Annotation:
  – In isolation: very low agreement between human listeners (κ = 0.16)*
  – In context: still weak agreement (κ = 0.31)
  – Gold standard based on discussion
• Observation: laughter is much more frequent around sarcastic versions

* "Prosody alone is not sufficient to discern whether a speaker is being sarcastic."
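The κ values above (and in the Twitter studies later) are Cohen's kappa, chance-corrected inter-annotator agreement. For reference, a minimal sketch of computing it with scikit-learn on two hypothetical annotators' sarcastic/not-sarcastic labels:

```python
# Minimal sketch: Cohen's kappa between two annotators' binary labels.
from sklearn.metrics import cohen_kappa_score

annotator_a = [1, 0, 0, 1, 0, 0, 0, 1]   # 1 = sarcastic, 0 = not (toy data)
annotator_b = [1, 0, 1, 0, 0, 0, 0, 1]
print(cohen_kappa_score(annotator_a, annotator_b))  # 1.0 = perfect, 0 = chance
```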
Sarcasm Detector
• Features:
  – Prosody: relative pitch, duration & energy for each word
  – Spectral: class-dependent HMM acoustic model score
  – Context: laughter, gender, pause, Q/A DA, location in utterance
• Classifier: decision tree (WEKA), as sketched below
  – Implicit feature selection in tree training
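A minimal sketch of this kind of detector, using scikit-learn's decision tree rather than WEKA; the feature columns and data values here are illustrative assumptions, not Tepperman et al.'s actual features or numbers.

```python
# Minimal sketch: decision-tree sarcasm detector over per-instance features.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical columns: nearby_laughter (0/1), relative_pitch,
# relative_energy, pause duration in seconds.
X = np.array([
    [1, 0.8, 1.2, 0.4],
    [0, 1.0, 1.0, 0.1],
    [1, 0.7, 1.3, 0.6],
    [0, 1.1, 0.9, 0.2],
])
y = np.array([1, 0, 1, 0])               # 1 = sarcastic "yeah right" (toy labels)

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
# Tree training performs implicit feature selection: a feature that never
# appears in a split is simply unused.
print(tree.predict([[1, 0.75, 1.25, 0.5]]))
```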
Results
• Laughter is the most important contextual feature
• Energy seems a little more important than pitch
Let's do our own experiment
[Audio examples: "yeah", "absolutely", and "exactly", each from a male and a female speaker.]
Overview: Dialog acts, Role of prosody, Sarcasm (In speech, In text)
• Davidov, Tsur & Rappoport, 2010 (DTR10)
• Gonzalez-Ibanez, Muresan & Wacholder, 2011 (GIMW11)
Sarcasm in Twitter & Amazon
• Twitter examples (DTR10):
  – "thank you Janet Jackson for yet another year of Super Bowl classic rock!"
  – "He's with his other woman: XBox 360. It's 4:30 fool. Sure I can sleep through the gunfire"
  – "Wow GPRS data speeds are blazing fast."
• More Twitter examples (GIMW11):
  – "@UserName That must suck. I can't express how much I love shopping on black Friday."
  – "@UserName that's what I love about Miami. Attention to detail in preserving historic landmarks of the past."
  – "@UserName im just loving the positive vibes out of that!"
• Amazon examples (DTR10):
  – "[I] Love The Cover" (book): a negative positive
  – "Defective by design" (music player)
Twitter #sarcasm issues
• Problems (DTR10):
  – The hashtag is used infrequently
  – It is used in non-sarcastic cases, e.g., to clarify a previous tweet ("it was #Sarcasm")
  – It is used when sarcasm is otherwise ambiguous (a prosody surrogate?), which biases the data toward the most difficult cases
• GIMW11 argues that the non-sarcastic cases are easily filtered by keeping only tweets with #sarcasm at the end
DTR10 Study
• Data:
  – Twitter: 5.9M tweets, unconstrained context
  – Amazon: 66k reviews, known product context
  – Mechanical Turk annotation: κ = 0.34 on Amazon, κ = 0.41 on Twitter
• Features:
  – Patterns of high-frequency words + content-word (CW) slots, e.g., "[COMPANY] CW does not CW much" (see the sketch below)
  – Punctuation
• Classifier: k-NN
  – Semi-supervised labeling of training samples
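A minimal sketch of the pattern-feature idea: high-frequency words stay literal and CW slots match arbitrary content words. This is simplified to binary match/no-match (DTR10 uses graded match values), and the pattern syntax and examples here are illustrative assumptions.

```python
# Minimal sketch: DTR10-style word patterns as binary features.
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a pattern like 'does not CW much' into a regex,
    where CW matches any single content word."""
    parts = [r"\w+" if tok == "CW" else re.escape(tok)
             for tok in pattern.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

patterns = ["does not CW much", "thank you CW for"]
text = "This player does not work much of the time."
features = [1 if pattern_to_regex(p).search(text) else 0 for p in patterns]
print(features)   # binary pattern-match vector, here [1, 0]
```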
DTR10 Results
[Tables: Amazon/Twitter SASI results for the evaluation paradigms; Amazon results for different feature sets on the gold standard.]
GIMW11 Study
• Data: 2700 tweets, equal amounts of positive, negative, and sarcastic (no neutral)
  – Annotation by hashtags: sarcasm/sarcastic; happy/joy/lucky; sadness/angry/frustrated
• Features:
  – Unigrams, LIWC classes (grouped), WordNet affect
  – Interjections and punctuation, emoticons & ToUser
• Classifier: SVM & logistic regression (a sketch follows)
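A minimal sketch of one plausible setup, a unigram SVM for 3-way sarcastic/positive/negative classification; the tiny training set and feature choice are illustrative, not GIMW11's actual features or data.

```python
# Minimal sketch: unigram SVM for sarcastic/positive/negative tweets.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

tweets = [
    "I can't express how much I love shopping on black Friday.",
    "what a wonderful day with friends",
    "this traffic is making me miserable",
]
labels = ["sarcastic", "positive", "negative"]   # toy training data

clf = make_pipeline(CountVectorizer(ngram_range=(1, 1)), LinearSVC())
clf.fit(tweets, labels)
print(clf.predict(["loving the positive vibes out of that!"]))
```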
Results
• Automatic system accuracy: 57% for 3-way S-P-N, 65% for 2-way S-NS
  – Roughly equal difficulty separating sarcastic from positive and from negative
• Human S-P-N labeling (270-tweet subset): κ = 0.48
  – Human "accuracy": 43% unanimous, 63% average
• New human S-NS labeling: κ = 0.59
  – Human "accuracy": 59% unanimous, 67% average; automatic: 68%
• Accuracies & agreement go up for the subset with emoticons
• Conclusion: humans are not so good at this task either…
Summary
• Dialog acts
  – Characterize the purpose of an utterance in conversation
  – Useful for punctuation in transcription, social analysis, and dialog management in human-computer interaction
  – Detection leverages words, grammar, prosody & context
• Prosody
  – Matters for only a small subset of DAs, but can matter a lot for those cases
  – Is realized in both continuous (range) and symbolic (accent) cues, so it needs contextual normalization
• Sarcasm: a difficult task (for both text and speech)!
Topics not covered
• Joint segmentation and classification
• Semi-supervised learning
• Domain-dependent tag set differences
• etc.