
Probabilistic Combination of Multiple Modalities to Detect Interest






Presentation Transcript


  1. Probabilistic Combination of Multiple Modalities to Detect Interest Ashish Kapoor, Rosalind W. Picard & Yuri Ivanov* MIT Media Laboratory *Honda Research Institute US

  2. Skills of Emotional Intelligence: • Expressing emotions • Recognizing emotions • Handling another’s emotions • Regulating emotions • Utilizing emotions (the last two apply if one “has” the emotion) (Salovey and Mayer 90, Goleman 95, Picard 97)

  3. Emotions give rise to changes that can be sensed • Distance sensing: face, voice, posture, gestures, movement, behavior • Up-close sensing: skin conductivity, pupillary dilation, respiration, heart rate, pulse, temperature, blood pressure • Internal sensing: hormones, neurotransmitters, …

  4. “Emotion recognition” • Detecting Interest • Postures (Mota, 2002) • Detecting Stress • Physiology, heart rate (Qi & Picard, 2002) • Detecting Frustration • Pressure sensors on mouse (Reynolds, Qi and Picard, PUI 2001)

  5. Example: On Task

  6. Example: Off-Task

  7. “Emotion recognition” • Advantages of multiple modalities: • Robust affect recognition: more information leads to more reliable recognition of affect. • Some modalities are good for certain emotions and not for others; for example, skin conductivity can distinguish between excitement levels but not valence. • If one modality fails, the remaining modalities can still be used to infer the affective state.

  8. Previous Work • Ensemble Methods • Decision Level Fusion • Kittler et al. PAMI, 1998 • Critic-based Fusion • Miller and Yan, Trans on Signal Processing, 1999 • Boosting and Bagging

  9. Previous Work • Multimodal Recognition of Affect • Huang et al, 1998 • Other Applications • Biometrics, Hong and Jain, PAMI 1998 • Computer Vision, Toyama & Horvitz, ACCV 2000 • Text Classification, Bennett et al, 2002

  10. Problems in Multimodal Combination • No single “best” rule works for all problems • Rule based: Product rule • Assumes the classifiers are independent, which might not hold • Very sensitive to errors in any one classifier • Rule based: Sum rule • An approximation to the product rule • Might work where the product rule fails

  11. Using multiple modalities • Aim: given multimodal data, infer the affective state (what we are ultimately interested in!!) • The affective state can, for example, represent anger, stress, etc.

  12. Graphical Models for Fusion • Generative Model Paradigm

  13. Graphical Models for Fusion • Assuming conditional independence of the channels: Product Rule!!

  14. Graphical Models for Fusion • A Switching Variable

  15. Graphical Models for Fusion • If the switching variable is marginalized out with uniform weights: Sum Rule!!

  16. Graphical Models for Fusion • Additionally, if we replace ‘+’ with ‘max’: Max Rule!!
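The product, sum, and max rules on slides 13–16 can be sketched as simple operations over per-channel posteriors. This is a minimal illustration, not the paper's implementation; the channel names and probability values are made up.

```python
import numpy as np

# Posterior P(class | channel) from three hypothetical single-channel
# classifiers over three affect classes (illustrative numbers only).
posteriors = np.array([
    [0.6, 0.3, 0.1],   # e.g. face channel
    [0.5, 0.2, 0.3],   # e.g. posture channel
    [0.2, 0.5, 0.3],   # e.g. game channel
])

def product_rule(p):
    """Multiply posteriors across channels (conditional independence)."""
    fused = p.prod(axis=0)
    return fused / fused.sum()

def sum_rule(p):
    """Average posteriors across channels; more tolerant of one bad channel."""
    fused = p.sum(axis=0)
    return fused / fused.sum()

def max_rule(p):
    """Take the per-class maximum across channels."""
    fused = p.max(axis=0)
    return fused / fused.sum()

for rule in (product_rule, sum_rule, max_rule):
    print(rule.__name__, rule(posteriors).round(3))
```

Note how the product rule's sensitivity to errors (slide 10) shows up here: a single channel assigning a class probability near zero drives that class's fused score near zero, while the sum rule merely lowers it.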

  17. Graphical Models for Fusion Performance Based Averaging!!

  18. Graphical Models for Fusion Critic Based Averaging!!
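Performance-based averaging (slide 17) weights each channel by a fixed measure of its reliability; critic-based averaging (slide 18) lets that weight depend on the current input. A minimal sketch of the first, assuming validation accuracy is used as the fixed weight (the numbers are illustrative, not from the paper):

```python
import numpy as np

def performance_weighted(posteriors, accuracies):
    """Weight each channel's posterior by its (fixed) validation
    accuracy, then renormalize: performance-based averaging."""
    w = np.asarray(accuracies, dtype=float)
    w = w / w.sum()                          # normalize channel weights
    fused = (w[:, None] * posteriors).sum(axis=0)
    return fused / fused.sum()

# Illustrative numbers: the second channel is assumed most reliable.
posteriors = np.array([[0.6, 0.3, 0.1],
                       [0.2, 0.7, 0.1],
                       [0.4, 0.4, 0.2]])
accuracies = [0.55, 0.80, 0.60]
fused = performance_weighted(posteriors, accuracies)
print(fused.round(3))
```

Critic-based averaging would replace the fixed `accuracies` vector with a critic's per-input estimate of each channel's reliability, recomputed for every example.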

  19. Graphical Models for Fusion

  20. Graphical Models for Fusion

  21. Model in this work • Classifiers on individual channels • Combination trained using the results of each classifier on training data, based on its confusion matrix • Learning: • Unsupervised (EM) • Supervised
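Slide 21 bases the combination on each classifier's confusion matrix from the training data. One simple way a confusion matrix can be used, sketched here as an assumption rather than the paper's exact model, is to convert a classifier's hard prediction into a posterior over true classes:

```python
import numpy as np

# Hypothetical confusion matrix C[i, j]: how often true class i was
# classified as j by one channel's classifier on training data.
C = np.array([[40,  5,  3],
              [ 6, 35,  6],
              [ 4,  8, 30]])

def posterior_from_prediction(C, predicted):
    """P(true class | classifier output = `predicted`), read off the
    confusion-matrix column (Bayes' rule with empirical priors)."""
    col = C[:, predicted].astype(float)
    return col / col.sum()

p = posterior_from_prediction(C, 0)   # classifier said class 0
print(p.round(3))
```

These per-channel posteriors can then feed any of the fusion rules from slides 13–18.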

  22. Training and Testing Data • Scenario: a child solving a puzzle for 20 min. • Puzzle: Fripple Place, a constraint-satisfaction problem. • Sensory data recorded: • Video of the face • Posture information • Full recording of the moves made by the child to solve the puzzle • The database consists of about 8 children in the same scenario.

  23. Multiple Modalities: • Face (manually encoded) • Upper face • Eyebrow raises/frowns (AU 1, 2 & 4) • Eye widening/narrowing (AU 5, 6 & 7) • Postures (automatically from the chair) • Leaning forward / slumped back, etc. • Activity on chair (high, medium & low) • Game status (manually encoded) • Level of difficulty • Action performed (game start/end, asked for hint, etc.)

  24. Tracking the State: Posture (e.g. sitting upright, slumped back, leaning sideways, leaning forward) • Two sensor sheets, each an array of 42-by-48 sensing units. • Each unit outputs an 8-bit pressure reading. • Sampling frequency of 50 Hz.
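To make the sensor geometry on slide 24 concrete, here is a sketch of processing one pressure frame. The feature choices (total pressure, center of pressure) are my own illustration, not the paper's feature set, and the frame is randomly generated:

```python
import numpy as np

ROWS, COLS = 42, 48    # sensing units per sheet (slide 24)
SAMPLE_HZ = 50         # pressure frames per second

# Fake 8-bit pressure frame standing in for one sheet's readout.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(ROWS, COLS)).astype(float)

def pressure_features(frame):
    """Illustrative posture features: total pressure and the
    center of pressure (pressure-weighted centroid of the sheet)."""
    total = frame.sum()
    ys, xs = np.mgrid[0:frame.shape[0], 0:frame.shape[1]]
    cy = (ys * frame).sum() / total
    cx = (xs * frame).sum() / total
    return total, (cy, cx)

total, (cy, cx) = pressure_features(frame)
print(f"total={total:.0f}, center of pressure=({cy:.1f}, {cx:.1f})")
```

A shift of the center of pressure toward the front rows, for instance, would be one cue for a leaning-forward posture.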

  25. Posture Classification • Pipeline: sensory input, then posture features (modeled using Gaussian mixtures), then posture classification using a multi-layer NN

  26. Fusing Everything • Face video, via a human coder, yields AU 1 … AU 7, each feeding an HMM-based classifier • Posture sensor output, via the mixture model & neural network, yields posture and activity channels, each feeding an HMM-based classifier • Game information (hint button, game level, room constraints, Fripples), via a human coder, yields game status, feeding an HMM-based classifier • All channel outputs are combined

  27. Experimental Evaluation • Database of 8 children • All channels available for 4 children • Only posture & game channels available for the rest • Three classes: High Interest (98), Low Interest (94), Refreshing (70) • 60% training data, 40% testing data • Recognition accuracy averaged over 50 runs

  28. Results: Individual Channels (posture, face, game)

  29. Experimental Evaluation • Reduction in error for round k, combination method a: • Average Reduction in error:
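The equations on slide 29 did not survive transcription. A plausible reconstruction, assuming the standard relative reduction in error with respect to the best individual channel, would be:

```latex
% Hypothetical reconstruction: the original slide equations are missing.
% e_{k,a}: error of combination method a on round k;
% e_k^{best}: error of the best individual channel on round k;
% K: number of rounds.
\[
  r_{k,a} = \frac{e_k^{\text{best}} - e_{k,a}}{e_k^{\text{best}}},
  \qquad
  \bar{r}_a = \frac{1}{K} \sum_{k=1}^{K} r_{k,a}
\]
```

The exact baseline used on the slide (best channel, a fixed channel, or chance) cannot be recovered from the transcript.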

  30. Results: Combining Channels

  31. Limitations • Conditional Independence Assumption is Invalid • For example AU1 and AU2 are highly correlated • Too much manual intervention • Training Requires Large Amount of Data

  32. Summary • Multiple modalities are useful for robust recognition of affect. • Graphical Models for sensor fusion • Interest detection using multiple modalities

  33. Future Work • Look at the pixel level relationships in video images of face (rather than AUs) • Semi-supervised learning using GP • Accuracy over 80% • Extend the framework • unsupervised learning (EM) • Bayesian Inference (Expectation Propagation) • Learning with human in the loop

  34. Acknowledgements • John Hershey, Selene Mota & Nancy Alvarado • Affective Computing Group, MIT Media Lab • National Science Foundation • This material is based upon work supported by the National Science Foundation under Grant No. 0087768. • Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
