VoiceUNL: a proposal to represent speech control mechanisms within the Universal Networking Digital Language
Mutsuko Tomokiyo (GETA-CLIPS-IMAG) & Gérard Chollet (ENST), reviewed by Christian Boitet (GETA-CLIPS-IMAG)
Content
• Background of this work
• Proposal for an extension of UNL
  - Speech-to-speech MT
• Emotion representation
Background
• Normalangue: normalization of linguistic resources (2002)
• TECHNOVOC and RNIL (2002): normalization of technologies for written and spoken language engineering, involving SIEMENS, TELISMA, IDYLIC, DIALOCA, ELAN Speech, ST Microelectronics, LORIA and ENST Paris
• Lingtour (2002): multilingual, multimedia MT, involving Tsinghua University (China), Paris 8 University (France), INT (France), ENST Paris and ENST Bretagne (France), and CLIPS (France)
Extension of UNL (1)
Example:
- May I smoke?
- No! You may not, Victor.

[S:01]
{org:e1}
- May I smoke?
{/org}
{unl}
agt(smoke(icl>do).@entry.@present.@may.@interrogative, I)
{/unl}
[/S]
Extension of UNL (2)
[S:02]
{org:e2}
- No! you may not, Victor [arte]
{/org}
{unl}
agt(smoke(icl>do).@entry.@present.@may.@not, you)
mod(smoke(icl>do).@entry.@present.@may.@not, no)
mod(no, !(icl>symbol).@interjection)
mod(smoke(icl>do).@entry.@present.@may.@not, Victor(icl>name).@vocative)
{/unl}
[/S]
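To make the notation of the two examples concrete, here is a minimal Python sketch that splits a UNL binary relation into its relation label and two Universal Words (headword, restriction in parentheses, and `.@` attributes). The class and function names are illustrative, not part of any UNL toolkit, and the parser assumes that headwords contain no top-level dots and that each string holds one relation, as in the examples.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UW:
    """A Universal Word: headword, optional restriction, and @-attributes."""
    headword: str
    restriction: Optional[str] = None
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relation:
    """A binary UNL relation such as agt(source, target)."""
    name: str
    source: UW
    target: UW


def split_top_level(text: str, sep: str) -> List[str]:
    """Split on `sep`, ignoring separators nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def parse_uw(text: str) -> UW:
    # Assumption: headwords contain no top-level "." (true for these examples).
    head, *attrs = split_top_level(text.strip(), ".")
    m = re.fullmatch(r"([^()]+)(?:\((.*)\))?", head)
    if m is None:
        raise ValueError(f"malformed UW: {text!r}")
    return UW(m.group(1), m.group(2), attrs)


def parse_relation(text: str) -> Relation:
    # The greedy (.*) runs to the last ")", i.e. the relation's closing paren.
    m = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", text)
    if m is None:
        raise ValueError(f"malformed relation: {text!r}")
    args = split_top_level(m.group(2), ",")
    if len(args) != 2:
        raise ValueError(f"expected two arguments in {text!r}")
    return Relation(m.group(1), parse_uw(args[0]), parse_uw(args[1]))
```

For instance, `parse_relation("mod(no, !(icl>symbol).@interjection)")` yields a `mod` relation whose target has headword `!`, restriction `icl>symbol` and the single attribute `@interjection`.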
Speech-to-Speech Machine Translation (SSMT)
1. Speaker recognition
2. Gesture, facial movement and speech recognition
3. Transcription and text transfer (UNL)
4. Target language generation (Ariane-G5)
5. Voice, speech and gesture synthesis
[Furui, 03; Blanchon, 02]
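The five stages can be read as a chain in which each module enriches a shared state before handing it on. The sketch below fixes only that ordering; the stage bodies are hypothetical placeholders, where a real system would call a speaker recognizer, a UNL converter, Ariane-G5 and a synthesizer.

```python
from typing import Callable, Dict, List

# Hypothetical stage signature: read a shared state dict, extend it, return it.
Stage = Callable[[Dict], Dict]

STAGE_NAMES = [
    "speaker recognition",
    "gesture, facial movement and speech recognition",
    "transcription and text transfer (UNL)",
    "target language generation (Ariane-G5)",
    "voice, speech and gesture synthesis",
]


def make_stage(name: str) -> Stage:
    """Return a placeholder stage that only records that it ran."""
    def stage(state: Dict) -> Dict:
        # A real module would add its output (speaker profile, transcript,
        # UNL graph, target text, audio) to `state` here.
        state.setdefault("trace", []).append(name)
        return state
    return stage


PIPELINE: List[Stage] = [make_stage(name) for name in STAGE_NAMES]


def run(pipeline: List[Stage], state: Dict) -> Dict:
    """Apply the stages strictly in order, as in steps 1 to 5."""
    for stage in pipeline:
        state = stage(state)
    return state


result = run(PIPELINE, {"audio": b""})
```

Keeping the state in one object means later stages (for example synthesis) can still consult early outputs such as the recognized speaker, which the emotion slides below rely on.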
Emotion representation (1)
Classification of emotions:
(1) happiness
(2) sadness
(3) disgust
(4) surprise
(5) fear
(6) anger
(7) irritation
(8) hesitation
(9) uncertainty
(10) neutral
[Morita, 89; Ekman, 79, 03; OOC, 90; ESPIRE, 00]
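Because later slides refer to the classes by number, an integer enumeration keeps the numbering and the labels together. A minimal sketch; the names `Emotion` and `emotion_from_label` are illustrative.

```python
from enum import IntEnum


class Emotion(IntEnum):
    """The ten emotion classes, numbered as in the classification."""
    HAPPINESS = 1
    SADNESS = 2
    DISGUST = 3
    SURPRISE = 4
    FEAR = 5
    ANGER = 6
    IRRITATION = 7
    HESITATION = 8
    UNCERTAINTY = 9
    NEUTRAL = 10


def emotion_from_label(label: str) -> Emotion:
    """Map a label such as 'surprise' to its class (raises KeyError if unknown)."""
    return Emotion[label.strip().upper()]
```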
Emotion representation (2)
Emotion-eliciting factors and task facets in SSMT:
• lexicon (sad, happy, etc.)
• phatics (ah, hein, etc.)
• prosodies (fast, slow, strong, etc.)
• voice (noisy, soft, young, etc.)
• gestures (movements of hands, mouth, eyes, etc.)
Emotion representation (3)
Eliciting factors (rows) marked against emotion classes (1) to (10) (columns):
lexicon: *** * * * * * * *
phatics: *** * * * * * * *
prosodies: *** * * * * * *
voice: *** * * * * *
hands: * *
mouth: *
eyes: *** * * *
eyebrows: ** **
head: *
shoulders: ** *
Emotion representation (4)
Speaker recognition and voice synthesis:
• gender
• age
• variant (natural, artificial, etc.)
• voice name (high-pitched, husky, etc.)
[BMC, 02; W3C Rec., 02]
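These attributes correspond to those of the `<voice>` element in the W3C Speech Synthesis Markup Language (SSML 1.0), where `gender` is one of `male`, `female` or `neutral`, `age` is a non-negative integer, `variant` is an integer and `name` is a platform-specific voice name. Descriptive categories such as "natural" or "husky" would therefore need a project-specific mapping onto these values; that mapping is an assumption and is not shown. The sketch builds the element with Python's standard `xml.etree.ElementTree`; the helper `voice` is hypothetical, and it conservatively accepts only positive integers for `variant`.

```python
import xml.etree.ElementTree as ET
from typing import Optional

GENDERS = {"male", "female", "neutral"}  # values allowed by SSML 1.0


def voice(text: str, gender: Optional[str] = None, age: Optional[int] = None,
          variant: Optional[int] = None, name: Optional[str] = None) -> ET.Element:
    """Build an SSML <voice> element, checking attribute values first."""
    attrs = {}
    if gender is not None:
        if gender not in GENDERS:
            raise ValueError(f"unknown gender: {gender!r}")
        attrs["gender"] = gender
    if age is not None:
        if age < 0:
            raise ValueError("age must be non-negative")
        attrs["age"] = str(age)
    if variant is not None:
        if variant < 1:
            raise ValueError("variant must be a positive integer")
        attrs["variant"] = str(variant)
    if name is not None:
        attrs["name"] = name  # platform-specific voice name
    element = ET.Element("voice", attrs)
    element.text = text
    return element


print(ET.tostring(voice("May I smoke?", gender="male", age=30), encoding="unicode"))
# prints: <voice gender="male" age="30">May I smoke?</voice>
```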
Emotion representation (5)
Prosody:
• Pitch: x-high, high, medium, low, x-low, default
• Range
• Rate: x-fast, fast, medium, slow, x-slow, default
• Duration
• Volume: silent, x-soft, soft, medium, loud, x-loud, default
• Emphasis
• Break
[BMC, 02; W3C Rec., 02]
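The label sets listed here match the `pitch`, `rate` and `volume` values of the SSML 1.0 `<prosody>` element, while emphasis and breaks are separate elements (`<emphasis>`, `<break>`). The sketch validates labels before building a `<prosody>` element, then marks "smoke" with emphasis inside "may I smoke?". Only the labelled attributes are handled; `range` and `duration`, which also take relative or time values, are left out. The helper name `prosody` is illustrative.

```python
import xml.etree.ElementTree as ET

# Label sets from the slide (identical to the SSML 1.0 <prosody> labels).
LABELS = {
    "pitch": {"x-high", "high", "medium", "low", "x-low", "default"},
    "rate": {"x-fast", "fast", "medium", "slow", "x-slow", "default"},
    "volume": {"silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"},
}


def prosody(**labels: str) -> ET.Element:
    """Return an empty <prosody> element after validating each label."""
    for attr, value in labels.items():
        if attr not in LABELS:
            raise ValueError(f"unsupported attribute: {attr}")
        if value not in LABELS[attr]:
            raise ValueError(f"{value!r} is not a valid {attr} label")
    return ET.Element("prosody", labels)


# "may I smoke?" spoken slowly, with emphasis on "smoke".
p = prosody(rate="slow")
p.text = "may I "
em = ET.SubElement(p, "emphasis")
em.text = "smoke"
em.tail = "?"
print(ET.tostring(p, encoding="unicode"))
# prints: <prosody rate="slow">may I <emphasis>smoke</emphasis>?</prosody>
```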
Emotion representation (6)
Lexicon and speech acts:
Inform, Offer, Offer-follow-up, Promise, Yn-question, Action-request, Confirmation-question, Do-you-understand-question, Permission-request, Wh-question, Yes, No, Acknowledge, Thanks, Thanks-response, Farewell, Good-wishes, Good-wishes-response, Greet, Apology, Apology-response, Alert, Instruct, Confirmation-question-to-self, Invite, Vocative, Topic, Expressive
[Tomokiyo, 00]
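A single reply can carry several speech acts, one per segment. The minimal sketch below stores an utterance as (label, text) pairs and rejects labels outside the inventory above. Splitting "No! you may not, Victor" into Expressive, Inform and Vocative follows the document's own example; the `tag` helper is hypothetical.

```python
from typing import List, Tuple

SPEECH_ACTS = {
    "Inform", "Offer", "Offer-follow-up", "Promise", "Yn-question",
    "Action-request", "Confirmation-question", "Do-you-understand-question",
    "Permission-request", "Wh-question", "Yes", "No", "Acknowledge", "Thanks",
    "Thanks-response", "Farewell", "Good-wishes", "Good-wishes-response",
    "Greet", "Apology", "Apology-response", "Alert", "Instruct",
    "Confirmation-question-to-self", "Invite", "Vocative", "Topic", "Expressive",
}

Segment = Tuple[str, str]  # (speech-act label, text span)


def tag(segments: List[Segment]) -> List[Segment]:
    """Check that every segment carries a label from the inventory."""
    for label, text in segments:
        if label not in SPEECH_ACTS:
            raise ValueError(f"unknown speech act {label!r} on {text!r}")
    return segments


reply = tag([("Expressive", "No!"), ("Inform", "you may not,"), ("Vocative", "Victor")])
```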
Emotion representation (7)
Facial movements (left, right, up, down):
• mouth
• eyes
• eyebrows
Body movements (left, right, up, down):
• hands
• shoulders
• head
[ACE, 02; BMC, 02; MPEG-4, 00]
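The movement inventory is a small cross product of body parts and directions. A frozen dataclass can validate each annotation when it is created; this is a sketch with illustrative names, and it covers only the four listed directions.

```python
from dataclasses import dataclass

FACIAL_PARTS = {"mouth", "eyes", "eyebrows"}
BODY_PARTS = {"hands", "shoulders", "head"}
DIRECTIONS = {"left", "right", "up", "down"}


@dataclass(frozen=True)
class Movement:
    """One facial or body movement annotation, e.g. eyebrows moving up."""
    part: str
    direction: str

    def __post_init__(self) -> None:
        # Reject parts or directions outside the inventory.
        if self.part not in FACIAL_PARTS | BODY_PARTS:
            raise ValueError(f"unknown body part: {self.part!r}")
        if self.direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {self.direction!r}")

    @property
    def is_facial(self) -> bool:
        return self.part in FACIAL_PARTS


raised_brows = [Movement("eyebrows", "up")]
```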
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- <?xml-stylesheet type="text/xsl" href="newshow2.xsl"?> -->
<!-- XML for TV -->
<!DOCTYPE D>
<D dn="TV" on="TV. 1.2" dt="2003">
  <Paragraph number="1">
    <Sentence snumber="1">
      <org lang="el">May I smoke?</org>
      <unlsem>agt(smoke(icl>do).@entry.@present.@may.@interrogative, I)</unlsem>
      <speech-act type="Yn-question">May I smoke?</speech-act>
      <prosody>May I <emphasis>smoke</emphasis>?</prosody>
    </Sentence>
  </Paragraph>
</D>
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- <?xml-stylesheet type="text/xsl" href="newshow2.xsl"?> -->
<!-- XML for TV -->
<!DOCTYPE D>
<D dn="TV" on="TV. 1.2" dt="2003">
  <Paragraph number="1">
    <Sentence snumber="2">
      <org lang="el">No! You may not, Victor.</org>
      <unlsem>
        aoj(smoke(icl>do).@entry.@present.@may.@not, you)
        mod(smoke(icl>do).@entry.@present.@may.@not, no)
        mod(no, !(icl>symbol))
        mod(smoke(icl>do).@entry.@present.@may.@not, Victor(icl>name).@vocative)
      </unlsem>
      <speech-act type="Expressive">No!</speech-act>
      <speech-act type="Inform">you may not,</speech-act>
      <speech-act type="Vocative">Victor</speech-act>
      <prosody><emphasis>No!</emphasis> you may <emphasis>not</emphasis> Victor</prosody>
      <emotion type="surprise" lexicon="No!" eyebrows="left and right raised">No! you may not</emotion>
    </Sentence>
  </Paragraph>
</D>
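Once speech-act and emotion labels are written as attributes (the quoted `type="..."` values suggest that intent), a sentence record can be read back with the Python standard library. The sample string is a trimmed, hypothetical record modelled on the "No! You may not, Victor." example; the XML declaration and DOCTYPE are left out so the string parses as-is.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical, trimmed sentence record (no XML declaration or DOCTYPE).
SAMPLE = """
<D dn="TV" on="TV. 1.2" dt="2003">
  <Paragraph number="1">
    <Sentence snumber="2">
      <org lang="el">No! You may not, Victor.</org>
      <speech-act type="Expressive">No!</speech-act>
      <speech-act type="Inform">you may not,</speech-act>
      <speech-act type="Vocative">Victor</speech-act>
      <emotion type="surprise" lexicon="No!" eyebrows="left and right raised">No! you may not</emotion>
    </Sentence>
  </Paragraph>
</D>
"""


def speech_acts(sentence: ET.Element) -> List[Tuple[str, str]]:
    """Return (type, text) for each <speech-act> child, in document order."""
    return [(act.get("type", ""), (act.text or "").strip())
            for act in sentence.findall("speech-act")]


root = ET.fromstring(SAMPLE.strip())
for sentence in root.iter("Sentence"):
    emotion = sentence.find("emotion")
    print(sentence.get("snumber"), speech_acts(sentence),
          emotion.get("type") if emotion is not None else None)
```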
Reflections and next steps
• Extension of UNL from written-text processing to multimodal, multilingual SSMT, focusing on emotion representation
• Visual corpus development
• Development of a prototype with speech and image interfaces