Julia’s Little Helper: A Real-time Demo of Cantonese/Mandarin Emotional Speech Detection. Suzanne Yuen, Mechanical Engineering. William Y. Wang, Computer Science. CS 6998 Computational Approach to Emotional Speech. Instructor: Prof. Julia Hirschberg, Columbia University, 12/21/2009.
Review
• Target Languages: Cantonese (9 tones), Mandarin (4 tones)
• Target Emotions: Anger and Gladness
• Lexical Features: ASR using an HMM acoustic model trained on Mandarin Broadcast News [1] and a simple hand-written decoding dictionary.
• Prosodic Features: Energy and Tonal Features
• Real-time drawing of pitch contour, waveform, and energy.
• A text-to-speech agent that greets the user and teaches them how to use this demo.
[1] Yang Shao, Lan Wang, "E-Seminar: an Audio-guide e-Learning System," IEEE International Workshop on Education Technology and Training (ETT), 2008.
Scoring
• Lexical Scoring: 1-3 pts
• Energy: 1 pt
• Tone: 1 pt
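The point breakdown above can be sketched as a simple sum. This is a minimal illustration, not the demo's actual code: it assumes the three components add up to a rating of at most five, and that the energy and tone components are each either 0 or 1. The function name `star_rating` and its parameters are hypothetical.

```python
def star_rating(lexical_pts: int, energy_pt: int, tone_pt: int) -> int:
    """Combine the three component scores into one rating (assumed additive)."""
    # Lexical component: 1 to 3 points, per the slide.
    if not 1 <= lexical_pts <= 3:
        raise ValueError("lexical_pts must be between 1 and 3")
    # Assumption: energy and tone are each awarded as 0 or 1 point.
    for name, value in (("energy_pt", energy_pt), ("tone_pt", tone_pt)):
        if value not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1")
    return lexical_pts + energy_pt + tone_pt


print(star_rating(3, 1, 1))  # maximum under these assumptions: 5
```

How the demo decides whether the energy or tone point is awarded is not stated on the slides, so that decision is left outside this sketch.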
Dictionary of Affects in Language, by Dr. Cynthia Whissell
• Total words: 8,742.
• Source: developed from various sources, for example college student essays, interviews, and teenagers' descriptions of their own emotional states. This gives it broad coverage and helps avoid biased data.
Sentence Lexical Scoring “I won best paper award!” Score = (2.375 + 2.5556 + 2.5455 + 1.2857 + 2.8333) / 5 = 2.319
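The averaging in this example can be reproduced in a few lines. This is a sketch under stated assumptions: the per-word scores are copied from the slide, and the helper name `sentence_score` is hypothetical. How the demo handles words missing from the dictionary is not stated, so the sketch simply returns `None` when there is nothing to average.

```python
from statistics import mean

# Per-word scores copied from the slide for "I won best paper award!".
word_scores = [2.375, 2.5556, 2.5455, 1.2857, 2.8333]


def sentence_score(scores):
    """Average the per-word scores; return None if no word was scored."""
    return mean(scores) if scores else None


print(round(sentence_score(word_scores), 3))  # 2.319
```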
Machine Translation
• Multilingual Challenges: English → Chinese
Encoding and Mapping
• Tasks:
1. Mandarin → Pinyin (phone set used by the acoustic model)
2. Mandarin → Cantonese
• Note that not every Mandarin word has an exact, direct counterpart in Cantonese, and vice versa.
3. Cantonese → Pinyin
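The three mapping tasks can be pictured as chained dictionary lookups. This is only a sketch: the tables below are toy entries chosen for illustration, not the demo's hand-written dictionary, and the use of tone-numbered pinyin and Jyutping-style Cantonese romanization is an assumption, since the slides do not say which schemes were used. The name `map_word` is hypothetical.

```python
# Toy lookup tables (illustrative entries only, not the demo's dictionary).
MANDARIN_TO_PINYIN = {"你好": "ni3 hao3", "什么": "shen2 me5"}
MANDARIN_TO_CANTONESE = {"你好": "你好", "什么": "乜嘢"}
# Assumption: Jyutping-style romanization for Cantonese.
CANTONESE_TO_ROMANIZATION = {"你好": "nei5 hou2", "乜嘢": "mat1 je5"}


def map_word(mandarin_word):
    """Run one Mandarin word through the three mapping tasks.

    A missing entry yields None, reflecting the note that not every
    word has a direct counterpart in the other language.
    """
    cantonese = MANDARIN_TO_CANTONESE.get(mandarin_word)
    romanized = CANTONESE_TO_ROMANIZATION.get(cantonese) if cantonese else None
    return {
        "pinyin": MANDARIN_TO_PINYIN.get(mandarin_word),
        "cantonese": cantonese,
        "cantonese_pinyin": romanized,
    }


print(map_word("什么"))
```

Returning `None` rather than raising keeps unmapped words visible to whatever step comes next, which then has to decide how to handle them.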
Text-to-speech Engine
1. Implement the text-to-speech engine.
2. “Play with” a text-to-speech engine.
3. Engine: TruVoice
• Lernout & Hauspie Speech Products (L&H)
• Went bankrupt in 2001
• Technology now owned by Nuance
L&H TTS Functionality • Developed in 1997 • Advanced text pre-processing and no vocabulary restrictions • User-definable pronunciation dictionary • Accurately pronounces surnames and place names • Flexible pitch, volume and speech rate • Intonation support for punctuation
Test Overview
• Participants:
  • Gender: 6 male, 6 female
  • Native language: 6 Mandarin, 6 Cantonese
• Two parts:
  • JLH module and self-rating (24 lines total)
  • Perception test: rating lines from others (72)
Sentences • Three types – questions, exclamations, statements • Randomized order of sentences for each participant • Examples:
Analysis
• Plan to examine the differences among, and effects of, the following:
  • Ratings: JLH star rating, self-rating, and 3 perception ratings
  • Language: Cantonese, Mandarin
  • Gender: female, male
  • Sentence structure: exclamation, question, and statement
• Interesting points:
  • Huge range of Chinese accents
  • Tones of words may change depending on the previous words (comparable to English “a mug” vs. “an umbrella”)
  • Variations in colloquial speech, addressed by using Chinese script
Future Work
• Improve the prosodic analysis; more features should be explored.
• Improve the lexical scoring: use a POS tagger or other NLP tools to weight the different constituents of the recognized sentence.
• Use finer-grained emotion types and investigate the differences between them.
• Study translational divergence in English-Chinese MT.