
Collection of multimodal data Face – Speech – Body




  1. Collection of multimodal data Face – Speech – Body
     George Caridakis (ICCS), Ginevra Castellano (DIST), Loic Kessous (TAU)

  2. Overview
     • Objectives
     • Scenario
     • Equipment specifications
     • Subjects & Procedure
     • Visual aspects
     • Acoustic aspects
     • Future processing
     • Please try this at home…

  3. Objectives
     • Collection of emotional multimodal data
     • Process different modalities
     • Holy Grail: “EMOTION RECOGNITION”

  4. Scenario
     • Inspired by the GEMEP corpus
     • Pseudo-language sentence (“Toko, damato ma gali sa”)
     • Standing body posture
     • 10 subjects
     • 8 emotions uniformly distributed across the quadrants (2D emotion theory, valence-arousal)
     • 3 repetitions of an emotion-specific gesture
     • 3 repetitions of an emotion-independent gesture
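The scenario numbers can be turned into a recording checklist. The sketch below assumes a fully crossed design, i.e. every subject performs every emotion with both gesture types, three times each; the slide does not state this explicitly. The name `recording_plan` and the dictionary keys are hypothetical.

```python
from itertools import product

SUBJECTS = range(1, 11)      # 10 subjects
EMOTIONS = range(1, 9)       # 8 emotions (indices only; labels are not needed here)
GESTURE_TYPES = ("emotion-specific", "emotion-independent")
REPETITIONS = range(1, 4)    # 3 repetitions per gesture type

def recording_plan():
    """Enumerate every take, assuming a fully crossed design (an assumption)."""
    return [
        {"subject": s, "emotion": e, "gesture": g, "repetition": r}
        for s, e, g, r in product(SUBJECTS, EMOTIONS, GESTURE_TYPES, REPETITIONS)
    ]

plan = recording_plan()
print(len(plan), "takes in total")
```

Under that assumption the plan has 10 × 8 × 2 × 3 = 480 takes, which is useful when estimating tape and disk requirements.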

  5. Emotion specific gestures

  6. Equipment specifications
     • 2 DV cameras
        • Full body
        • Face
     • Wireless microphone (shirt-mounted)
     • PC + external sound card
     • Uniform dark background
     • 2 artificial light sources
     • Light-coloured, long-sleeved shirt ;)

  7. Subjects & Procedure
     • Subjects
        • 10 “actors”
        • 6 males
        • 4 females
     • Emotions: despair, hot anger, irritation, sadness, interest, pleasure, joy, pride
     • Procedure
        • Subject instructions
        • Clap before every execution, to synchronize the streams
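The clap gives each stream a sharp, loud event that can be located automatically. The following is a minimal sketch of that idea, assuming each stream has an audio track available as a list of samples and that the clap is the loudest event in the analysed excerpt. The function names and the frame length are hypothetical, and a real pipeline would likely need a more robust onset detector.

```python
def clap_index(samples, frame=256):
    """Return the start sample of the frame with the highest energy."""
    best_start, best_energy = 0, -1.0
    for start in range(0, len(samples) - frame + 1, frame):
        energy = sum(x * x for x in samples[start:start + frame])
        if energy > best_energy:
            best_start, best_energy = start, energy
    return best_start

def clap_offset_seconds(samples_a, rate_a, samples_b, rate_b, frame=256):
    """Seconds to add to stream B's timestamps so its clap lines up with A's."""
    t_a = clap_index(samples_a, frame) / rate_a
    t_b = clap_index(samples_b, frame) / rate_b
    return t_a - t_b

# Synthetic example: the same "clap" occurs later in stream A than in stream B.
stream_a = [0.0] * 1024 + [1.0] * 256 + [0.0] * 1024
stream_b = [0.0] * 512 + [1.0] * 256 + [0.0] * 1024
print(clap_offset_seconds(stream_a, 256, stream_b, 256))
```

The resolution of this estimate is one frame, so a shorter frame gives a finer alignment at the cost of more sensitivity to noise.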

  8. Video quality issues
     • Highest possible resolution
     • Progressive video (not interlaced)
     • Correct exposure
     • Good color quality
     • No compression artifacts
     • Uniform lighting

  9. Interlacing / Over-exposure
     • Interlacing / de-interlacing
     • Over-exposure
        • 70% zebra pattern
        • Prefer a lower exposure so that the signal is not clipped
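An exposure check in the spirit of the zebra pattern can also be run on captured frames. The sketch below measures how much of a frame is at or above 70% of full scale. It assumes 8-bit, full-range luma values (0 to 255) given as rows of integers; a camera's zebra indicator works on the video signal level, so this mapping is an approximation, and `overexposed_fraction` is a hypothetical name.

```python
def overexposed_fraction(luma_rows, threshold=0.70, max_value=255):
    """Fraction of pixels whose luma is at or above threshold * max_value."""
    limit = threshold * max_value
    total = 0
    hot = 0
    for row in luma_rows:
        for y in row:
            total += 1
            if y >= limit:
                hot += 1
    return hot / total if total else 0.0

# Tiny synthetic frame: two of the four pixels exceed 70% of full scale.
frame = [[0, 100], [200, 255]]
print(overexposed_fraction(frame))
```

A large fraction on the face region would suggest lowering the exposure before recording further takes.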

  10. Colour/Lighting
     • Medium Y/C resolution
     • Compression artifacts
     • Exposure
     • Good video quality
     • Source: DV

  11. Archiving
     • PAL: 720x576 @ 25 frames/second
     • DV format: ~36 Mbit/sec
        • ~16 GBytes/hour
     • MPEG-2 @ 4-8 Mbit/sec (DVD quality)
        • ~1.8-3.5 GBytes/hour
     • MPEG-1 @ 1.1 Mbit/sec
        • ~500 MBytes/hour
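The storage figures follow directly from the bitrates. The helper below is a small sketch of that arithmetic, assuming decimal units (1 Mbit = 10^6 bits, 1 GByte = 10^9 bytes); `gbytes_per_hour` is a hypothetical name.

```python
def gbytes_per_hour(mbit_per_sec):
    """Convert a bitrate in Mbit/s to storage in GBytes per hour (decimal units)."""
    bytes_per_sec = mbit_per_sec * 1e6 / 8
    return bytes_per_sec * 3600 / 1e9

for label, rate in [("DV", 36), ("MPEG-2 low", 4), ("MPEG-2 high", 8), ("MPEG-1", 1.1)]:
    print(f"{label}: {gbytes_per_hour(rate):.2f} GBytes/hour")
```

With these units the results are about 16.2 GBytes/hour for DV, 1.8 to 3.6 for MPEG-2 and 0.5 for MPEG-1, matching the approximate values on the slide.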

  12. Visual Aspects Summary
     • Video camera
        • DV or better
        • Progressive scan capability
        • Over-exposure indication (zebra patterns)
     • Shooting
        • Use the zebra patterns at 70%
        • Zoom in as much as possible to increase the subject’s resolution
        • Facial features must be visible for facial analysis
        • Try to avoid occlusions (hair, glasses, clothes, hand movement)
        • Uniform lighting conditions
     • Archive DV tapes, DV video or frames (not MPEG-1)

  13. Acoustic aspects
     • Why “Toko, damato ma gali sa”?
        • Toko: solicitation by naming the interlocutor
        • Vowels found in the majority of languages
        • Meaning: “Toko, can you open it?” (a request), to maintain a semantic aspect
     • Sampling frequency: 44.1 kHz
     • 16-bit, mono
     • Uncompressed .wav files
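The audio format can be checked automatically for every recorded take. The sketch below uses Python's standard `wave` module to write a short silent test file and to verify that a file is 44.1 kHz, 16-bit, mono PCM. The function names and the temporary file are hypothetical and only serve the demonstration.

```python
import os
import tempfile
import wave

def write_silence(path, seconds=1, rate=44100):
    """Write a mono, 16-bit PCM .wav file containing silence (for testing only)."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 2 bytes per sample = 16 bits
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(rate * seconds))

def matches_spec(path):
    """True if the file is mono, 16-bit, 44.1 kHz."""
    with wave.open(path, "rb") as w:
        return (w.getnchannels(), w.getsampwidth(), w.getframerate()) == (1, 2, 44100)

demo_path = os.path.join(tempfile.mkdtemp(), "take.wav")
write_silence(demo_path)
print(matches_spec(demo_path))
```

At these settings the data rate is 44100 × 2 = 88200 bytes per second, so audio needs far less storage than the DV video.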

  14. Future processing
     • Process different modalities
        • Facial feature extraction
        • Gesture expressiveness analysis
        • Acoustic analysis
        • Gesture recognition
     • Synchronization
     • Modalities fusion
        • RNN
        • RSOM + Markov
        • SVM
        • …
     • Emotion recognition
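To make the fusion step concrete, here is a very simple decision-level baseline: each modality produces a score per emotion, and the scores are combined by a weighted average. This is not one of the methods named on the slide (RNN, RSOM + Markov, SVM); it is a swapped-in illustration, and the function name, weights and example scores are all hypothetical.

```python
def fuse_scores(per_modality, weights=None):
    """Weighted average of per-modality class scores; returns (label, fused)."""
    names = list(per_modality)
    if weights is None:
        weights = {m: 1.0 for m in names}   # equal weights by default
    total = sum(weights[m] for m in names)
    labels = list(per_modality[names[0]])   # assumes all modalities share labels
    fused = {
        label: sum(weights[m] * per_modality[m][label] for m in names) / total
        for label in labels
    }
    return max(fused, key=fused.get), fused

# Made-up scores for one take, restricted to two emotions for brevity.
scores = {
    "face":   {"joy": 0.6, "sadness": 0.4},
    "speech": {"joy": 0.3, "sadness": 0.7},
    "body":   {"joy": 0.7, "sadness": 0.3},
}
label, fused = fuse_scores(scores)
print(label, fused)
```

A baseline like this gives a reference point against which a learned fusion model can be compared.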
