Senior Project – Computer Science – 2013
Multimodal Emotion Recognition
Colin Grubb
Advisor – Prof. Nick Webb

Introduction
Multimodal fusion is a technique in which two or more inputs are combined to improve classification accuracy on a particular problem. In this study, we aimed to improve the classification accuracy of existing systems via fusion. We took two existing pieces of software, one audio and one visual, and combined them using decision-level fusion. We conducted experiments to see how the two individual systems could complement each other to achieve the highest possible accuracy.

Emotion Software
• Audio Software: EmoVoice (EMV)
  • Open source, real time
  • Naïve Bayes classifier
  • Accuracy: 38.43%
• Visual Software: Partial Component Analysis (PCA)
  • Created by Professor Shane Cotter
  • Works on still images of faces
  • Accuracy: 77.4%

Gathering Data
• Four emotional states
  • Angry
  • Happy
  • Neutral
  • Sad
• List of sentences read to EmoVoice
• Normal visual data and long range visual data (6 ft.)
• Datasets constructed using outputs from unimodal systems

Manual Rules
• Created rules to modify EmoVoice output based on
  • EmoVoice bias towards negative and active voice
  • PCA weaknesses
• Rules classified by training instance class attribute
  • Happy: If the EMV confidence levels of content and happy voice outweighed all other confidence levels, change instance to Happy
  • Neutral: If all confidence levels were within 0.05 of each other, or if neutral confidence was tied for first, change instance to Neutral
  • Sad: If second to angry within 0.05, change instance to Sad

Experimentation
• EmoVoice data modified to complement PCA weaknesses and combat negative/active voice bias
• J48 decision tree (C4.5) used as classifier

System Layout
(figure not reproduced)

Results
• Four experiments run:
  • Regular Distance
  • Long Distance
  • Regular Distance – No Conf.
  • Long Distance – No Conf.
(Results were statistically significant with p = 0.05)

Conclusion and Future Work
We achieved higher classification accuracy by combining audio and visual data and then applying manual bias to handle emotions where the classification accuracy of the individual systems was weak. Future work will include the automation of individual system components, an online classifier whose output is returned in real time, and refinement of the manual rules used to counteract bias. There is also potential for the system to be mounted on a robot currently residing in the department.

Acknowledgements
Prof. Nick Webb
Prof. Shane Cotter
Prof. Aaron Cass
Thomas Yanuklis
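
The Happy, Neutral, and Sad rules listed under Manual Rules can be sketched in Python as below. This is only one possible reading of the poster text, not the project's actual code. The label names, the function name apply_manual_rules, the order in which the rules are tried, and the interpretation of "outweighed all other confidence levels" as "content plus happy exceeds the sum of the remaining confidences" are all assumptions made for illustration.

```python
import math
from typing import Dict

TOLERANCE = 0.05  # the 0.05 margin named in the Neutral and Sad rules


def apply_manual_rules(conf: Dict[str, float], tol: float = TOLERANCE) -> str:
    """Return a (possibly overridden) label for one EmoVoice output.

    `conf` maps label names to confidence levels. The keys used here
    ("angry", "content", "happy", "neutral", "sad") are hypothetical.
    """
    # Labels sorted from highest to lowest confidence.
    ranked = sorted(conf, key=conf.get, reverse=True)
    top_score = conf[ranked[0]]

    # Neutral rule: all confidences within `tol` of each other,
    # or neutral shares first place with at least one other label.
    spread = top_score - conf[ranked[-1]]
    tied = [label for label in conf if math.isclose(conf[label], top_score)]
    if spread <= tol or ("neutral" in tied and len(tied) > 1):
        return "neutral"

    # Happy rule (assumed reading): content + happy outweigh everything else.
    positive = conf.get("content", 0.0) + conf.get("happy", 0.0)
    rest = sum(v for k, v in conf.items() if k not in ("content", "happy"))
    if positive > rest:
        return "happy"

    # Sad rule: sad is ranked second, directly behind angry, within `tol`.
    if (
        len(ranked) > 1
        and ranked[0] == "angry"
        and ranked[1] == "sad"
        and top_score - conf["sad"] <= tol
    ):
        return "sad"

    # No rule fired: keep the top-ranked label.
    return ranked[0]
```

For example, an output where angry leads sad by only 0.03 would be relabeled as sad under this reading, while an output where angry leads by a wide margin is left unchanged.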