
Acoustic Cues to Emotional Speech


Presentation Transcript


  1. Acoustic Cues to Emotional Speech Julia Hirschberg (joint work with Jennifer Venditti and Jackson Liscombe) Columbia University 26 June 2003

  2. Motivation • A speaker’s emotional state conveys important and potentially useful information • To recognize (e.g. spoken dialogue systems, tutoring systems) • To generate (e.g. games) • But exploiting it requires knowing what emotion is and which aspects of production convey different types • Defining emotion in a multidimensional space • Valence: happy vs. sad • Activation: sad vs. despairing
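The valence/activation framing amounts to placing each emotion at a point in a two-dimensional space. Below is a minimal Python sketch of that representation; the coordinates are hypothetical, chosen only to illustrate the two contrasts on the slide (happy vs. sad differ mainly in valence, sad vs. despairing mainly in activation).

```python
# Minimal sketch: emotions as points in valence/activation space.
# Coordinates are illustrative assumptions, not values from the study.
from dataclasses import dataclass


@dataclass
class EmotionPoint:
    name: str
    valence: float     # negative (-1) to positive (+1)
    activation: float  # calm (-1) to aroused (+1)


EMOTIONS = {
    "happy":      EmotionPoint("happy", valence=0.8, activation=0.5),
    "sad":        EmotionPoint("sad", valence=-0.7, activation=-0.5),
    "despairing": EmotionPoint("despairing", valence=-0.8, activation=0.6),
}


def distance_along(dim: str, a: EmotionPoint, b: EmotionPoint) -> float:
    """Distance between two emotions along a single dimension."""
    return abs(getattr(a, dim) - getattr(b, dim))


if __name__ == "__main__":
    # happy vs. sad: a valence contrast; sad vs. despairing: an activation contrast
    print(distance_along("valence", EMOTIONS["happy"], EMOTIONS["sad"]))
    print(distance_along("activation", EMOTIONS["sad"], EMOTIONS["despairing"]))
```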

  3. Features that might convey emotion • Acoustic and prosodic • Lexical and syntactic • Facial and gestural

  4. Previous Research • Emotion detection in corpus studies • Batliner, Noeth, et al; Ang et al: anger/frustration in dialogue systems • Lee et al: pos/neg emotion in call center data • Ringel & Hirschberg: voicemail • … in laboratory studies • Forced choice among 10-12 emotion categories • Sometimes with confidence rating

  5. Problems • Hard to identify emotions reliably: ‘emotional’ utterances vary in both production and perception • How can we obtain better training data? • Easier to detect variation in activation than in valence • Large space of potential features • Which are necessary and sufficient?

  6. New methods for eliciting judgments • Hypothesis: Utterances in natural speech may evoke multiple emotions • Elicit judgments on multiple scales • Tokens from LDC Emotional Prosody Speech and Transcripts Corpus • Professional actors reading 4-syllable dates and numbers • disgust, panic, anxiety, hot anger, cold anger, despair, sadness, elation, happiness, interest, boredom, shame, pride, contempt, neutrality

  7. Modified category set: • Positive: confident, encouraging, friendly, happy, interested • Negative: angry, anxious, bored, frustrated, sad • Neutral • For study: 1 token of each from each of 4 voices plus practice tokens • Subjects participated over the internet

  8. 40 native speakers of standard American English with no reported hearing impairment • 17 female, 23 male, all 18+ • 4 random orders rotated among subjects

  9. Correlations between Judgments (upper triangle of the pairwise correlation matrix over the ten rating scales):

                     ang    bor    fru    anx    fri    con    hap    int    enc
     sad             .06    .44    .26    .22   -.27   -.32   -.42   -.32   -.33
     angry                  .05    .70    .21   -.41    .02    .37   -.09   -.32
     bored                         .14   -.14   -.28   -.17   -.32   -.42   -.27
     frustrated                           .32   -.43   -.09   -.47   -.16   -.39
     anxious                                    -.14   -.25   -.17    .07   -.14
     friendly                                           .44    .77    .59    .75
     confident                                                 .45    .51    .53
     happy                                                            .58    .73
     interested                                                              .62
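For readers who want to reproduce this kind of analysis, a hedged sketch follows: collect per-token ratings on each emotion scale and take pairwise Pearson correlations between the scales. The ratings below are fabricated stand-ins, not the study's data; in the study each subject rated every token on all the scales.

```python
# Sketch: pairwise correlations between emotion-judgment scales.
# Rows = individual (subject, token) judgments; columns = rating scales.
import pandas as pd

# Fabricated example ratings for illustration only.
ratings = pd.DataFrame(
    {
        "sad":      [4, 1, 2, 5, 3],
        "angry":    [2, 1, 1, 4, 3],
        "friendly": [1, 5, 4, 1, 2],
        "happy":    [1, 5, 3, 1, 2],
    }
)

# The slide's table is the upper triangle of exactly this kind of matrix,
# computed over ten scales instead of four.
corr = ratings.corr(method="pearson")
print(corr.round(2))
```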

  10. What acoustic features correlate with which emotion categories? • F0: min, max, mean, ‘range’, stdev • RMS: min, max, mean, range, stdev • Voiced samples / all samples (VCD) • Mean syllable length • TILT: spectral tilt (2nd minus 1st harmonic over a 30 ms window) of the highest-amplitude vowel and of the nuclear-stressed vowel • Type of nuclear accent, contour, and phrasal ending
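As an illustration of the per-token statistics in this list, the sketch below computes min/max/mean/range/stdev and the voiced-frame ratio (VCD) from already-extracted tracks. It assumes a pitch tracker has produced a frame-level F0 array with NaN for unvoiced frames; the example arrays are fabricated.

```python
# Sketch: per-token summary statistics over frame-level F0 and RMS tracks.
import numpy as np


def summarize(track: np.ndarray) -> dict:
    """min / max / mean / range / stdev over defined (non-NaN) frames."""
    v = track[~np.isnan(track)]
    return {
        "min": float(v.min()),
        "max": float(v.max()),
        "mean": float(v.mean()),
        "range": float(v.max() - v.min()),
        "stdev": float(v.std()),
    }


def voiced_ratio(f0: np.ndarray) -> float:
    """VCD: fraction of frames with a defined F0 (voiced samples / all samples)."""
    return float(np.mean(~np.isnan(f0)))


# Fabricated 10-frame example: NaN marks unvoiced frames in the F0 track.
f0 = np.array([np.nan, 180.0, 195.0, 210.0, np.nan,
               205.0, 190.0, np.nan, 185.0, np.nan])
rms = np.abs(np.random.default_rng(0).normal(0.1, 0.02, size=10))

print(summarize(f0))
print(summarize(rms))
print(voiced_ratio(f0))  # 0.6 here: 6 of 10 frames are voiced
```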

  11. Results • F0, RMS, and speaking rate distinguish emotion categories by activation (act) • +act emotions correlate with higher F0 and RMS and faster rate • these features do not distinguish valence (val) • Tilt of the highest-amplitude vowel separates +act emotions of different val into different categories (e.g. friendly, happy, encouraging vs. angry, frustrated) • Phrase accent/boundary tone also separates +val from -val

  12. H-L% phrase endings correlate positively with -val emotions and negatively with +val • L-L% endings correlate positively with +val but not with -val
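One simple way to quantify an association like this is a point-biserial correlation between a binary contour indicator (e.g. "token ends in H-L%") and a per-token valence score. The sketch below shows the mechanics on fabricated numbers; it is not the study's actual statistical test.

```python
# Sketch: association between a binary boundary-tone feature and valence.
from scipy.stats import pointbiserialr

# Fabricated per-token data: 1 = token ends in H-L%, 0 = some other ending.
ends_in_HL = [1, 1, 0, 0, 1, 0, 0, 1]
# Fabricated mean judged valence per token (-1 = negative, +1 = positive).
valence = [-0.6, -0.4, 0.5, 0.7, -0.2, 0.3, 0.6, -0.5]

r, p = pointbiserialr(ends_in_HL, valence)
# A negative r would mean H-L% endings go with negative-valence judgments.
print(f"r = {r:.2f}, p = {p:.3f}")
```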

  13. Predicting Emotion Categories Automatically • 1760 judgment/token datapoints (90%/10% training/test) • collapse 2-5 ratings to one • Ripper machine-learning algorithm • Baseline: choose most frequent ranking • Mean performance over all emotions 75% (22% improvement over baseline) • Performance on individual emotion categories varies (next slide)
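A hedged sketch of this experimental setup follows: a 90%/10% train/test split, a most-frequent-class baseline, and a learned classifier. Ripper itself has no standard scikit-learn implementation, so a decision tree stands in for it here, and the feature matrix and labels are random placeholders rather than the real acoustic features and collapsed judgments.

```python
# Sketch: per-emotion binary classification with a most-frequent baseline.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1760, 12))    # placeholder for acoustic feature vectors
y = rng.integers(0, 2, size=1760)  # placeholder binary labels (e.g. "happy")

# 90% training / 10% test, as on the slide.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.10, random_state=0)

# Baseline: always predict the most frequent class in the training data.
baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)

# Decision tree as a stand-in for the Ripper rule learner.
model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

print("baseline accuracy:", baseline.score(X_te, y_te))
print("model accuracy:   ", model.score(X_te, y_te))
```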

  14. Happy, encouraging, sad, and anxious predicted well • Confident and interested show little improvement • Which features best predict which emotion categories?

  15. Best Performing Features

  16. Conclusions • New features to distinguish valence: spectral tilt and prosodic endings • New understanding of relations among emotion categories • Judgments • Features

  17. Current/Future Work • Use ML to rank rather than classify (RankBoost) • Eye-tracking task, matching tokens to ‘emotional’ pictures • Web survey to ‘norm’ pictures • Layout issues
