Emotion Tagging – A Comparative Study on Bengali and English Blogs
Dipankar Das and Sivaji Bandyopadhyay
Department of Computer Science & Engineering, Jadavpur University, Kolkata-700032, India
ICON 2009
Outline
- Motivation
- Resources
- Word Level Tagging
  - Baseline Model
  - Morphology
  - CRF based Model
- Sentence Level Tagging
- Evaluation
- Conclusion
Motivation (1/3)
In psychology and common use, emotion is an aspect of a person's mental state of being, normally based in or tied to the person's internal (physical) and external (social) sensory feeling (Zhang et al., 2008).
Motivation (2/3)
Natural Language Processing (NLP) tasks
- Tracking users' emotion (products, events, politics)
- Customer relationship management
- Question Answering (QA) systems
- Modern Information Retrieval (IR) systems
Motivation (3/3)
Blogs
- A communicative and informative repository of text-based emotional content on Web 2.0 (Lin et al., 2007)
- Online diary of the bloggers
- Blog posts annotated by other bloggers
- Large data suitable for machine learning
Goal: recognition of emotion from written text
Resources (1/4)
Bengali blog
- Web blog archive (www.amarblog.com)
- 14 different comic related topics and user comments
- 1200 sentences
English blog
- Saima Aman and Stan Szpakowicz. 2007. Identifying Expressions of Emotion in Text. V. Matoušek and P. Mautner (Eds.): TSD 2007, LNAI 4629, pp. 196–205
- 1200 sentences
Resources (2/4)
English sentiment lexicon
- SentiWordNet (Esuli et al., 2006)
- WordNet Affect lists (WAL) (Strapparava et al., 2004)
Updating of WAL
- Inadequate number of emotion word entries
- Retrieved synsets from English SentiWordNet
- Updated WAL with the retrieved synsets
Resources (3/4)
- No sentiment lexicon is available for Bengali
- Both SentiWordNet and the WordNet Affect lists were translated into Bengali
  - Using Bengali synsets from the English to Bengali bilingual synset dictionary being developed as part of the English to Indian Languages Machine Translation (EILMT) project, a TDIL project undertaken by a consortium of premier institutes and sponsored by MCIT, Govt. of India
- The translated WAL is termed the Emotion List
Resources (4/4)
A knowledge base for emoticons
Word Level Tagging
- Semi-automatic annotation
- An emotion tag is assigned to a word with the help of the Emotion List
- Other, non-emotional words are tagged as neutral
- Stemming process
- Verified by linguists
- 700 sentences for training, 300 sentences as the development set, and 200 sentences as the test set
Baseline Model
- Identifies word level emotion tagging accuracies for each emotion class
- No prior knowledge of word features is incorporated for any word
- Six separate modules for the six emotion classes
- Each word is passed through the six separate modules
- Each word is tagged with the emotion tag of the emotion class in which that word appears
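The baseline lookup can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the word lists are hypothetical toy entries standing in for the Emotion List, and `make_module` and `tag_words` are names introduced here.

```python
# Hypothetical toy emotion lists; the real system uses the Emotion List
# (updated WordNet Affect lists), which is not reproduced here.
EMOTION_LISTS = {
    "happy": {"joy", "delight", "glad"},
    "sad": {"grief", "sorrow", "unhappy"},
    "anger": {"rage", "furious"},
    "disgust": {"nausea", "revolting"},
    "fear": {"terror", "afraid"},
    "surprise": {"astonished", "amazed"},
}


def make_module(emotion, words):
    """Build one module that recognises words of a single emotion class."""
    def module(token):
        return emotion if token.lower() in words else None
    return module


# Six separate modules, one per emotion class.
MODULES = [make_module(emotion, words) for emotion, words in EMOTION_LISTS.items()]


def tag_words(tokens):
    """Pass every token through the six modules; untagged tokens are neutral."""
    tagged = []
    for token in tokens:
        tag = "neutral"
        for module in MODULES:
            result = module(token)
            if result is not None:
                tag = result
                break  # assumption: the first matching class wins
        tagged.append((token, tag))
    return tagged
```

For example, `tag_words(["I", "felt", "delight"])` tags only the last token as "happy"; a surface form such as "delighted" would stay neutral because no stemming is applied at this stage.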
Morphology
- Aim: minimize errors in recognizing emotional words
- Bengali, like other Indian languages, is morphologically very rich
- Different suffixes mark different features (e.g. for verbs, the features are Tense, Aspect, and Person)
- The stemmer uses a suffix list to identify the stem form
- For English, the Porter stemmer (Porter, 1997) is used
- 3.65% and 6.03% improvement over the baseline system in average accuracy on the Bengali and English test sets, respectively
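A suffix-list stemmer of the kind described above might look like the following sketch. The suffix list is a hypothetical English-like toy list (the Bengali suffix list is not given in the slides), and the longest-match rule and minimum stem length are assumptions made for illustration.

```python
# Hypothetical toy suffix list; the actual Bengali suffix list is not shown
# in the slides.
SUFFIXES = ["ing", "ed", "ly", "s"]


def stem(word, suffixes=SUFFIXES, min_stem_len=3):
    """Strip the longest matching suffix, keeping at least min_stem_len characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no suffix matched, or the remaining stem would be too short
```

With this in place, "delighted" reduces to "delight" and can then be matched against an emotion list that only contains the stem form.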
Baseline vs. Morphology (Result)
CRF based Model (1/4)
10 active features (Das and Bandyopadhyay, 2009a)
- POS information (adjective, verb, noun, adverb)
- First sentence in a topic
- SentiWordNet emotion word (delight…)
- Reduplication (so-so, good-good..)
- Question words (what, why…)
- Colloquial / Foreign words
- Special punctuation symbols (!,@,?..)
- Quoted sentence (“you are 2 good man”)
- Sentence Length (>=8,<15)
- Emoticons
Different unigram and bi-gram context features (word level as well as POS tag level) and their combinations
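As an illustration of how several of these features could be encoded per token before being passed to a CRF toolkit, here is a hedged sketch. The function and feature names, the question-word set beyond "what" and "why", and the emoticon set are assumptions; the colloquial/foreign word and quoted-sentence features are left out because they need resources not described in the slides.

```python
QUESTION_WORDS = {"what", "why", "who", "when", "where", "how"}  # partly assumed
SPECIAL_PUNCT = set("!@?")
EMOTICONS = {":)", ":(", ":D"}  # hypothetical stand-in for the emoticon knowledge base


def is_reduplicated(token):
    """True for hyphenated repetitions such as 'so-so' or 'good-good'."""
    parts = token.lower().split("-")
    return len(parts) == 2 and parts[0] != "" and parts[0] == parts[1]


def token_features(tokens, pos_tags, i, emotion_words, first_in_topic=False):
    """Feature dictionary for the token at position i of one sentence."""
    token = tokens[i]
    lower = token.lower()
    n = len(tokens)
    return {
        "word": lower,
        "pos": pos_tags[i],
        "first_in_topic": first_in_topic,
        "emotion_word": lower in emotion_words,
        "reduplication": is_reduplicated(token),
        "question_word": lower in QUESTION_WORDS,
        "special_punct": any(ch in SPECIAL_PUNCT for ch in token),
        "emoticon": token in EMOTICONS,
        "length_8_to_14": 8 <= n < 15,  # sentence length >= 8 and < 15
        # Context features at word level and POS tag level.
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "prev_pos": pos_tags[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < n - 1 else "<EOS>",
        "next_pos": pos_tags[i + 1] if i < n - 1 else "<EOS>",
    }
```

A sentence is then represented as the list of these dictionaries, one per token, which is a common input shape for CRF toolkits that accept feature dictionaries.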
CRF based Model (2/4)
Feature Analysis
- Frequencies
- Combination of multiple features vs. single feature
- Some features play a passive role, e.g. "First sentence in a topic" (a phenomenon specific to the English blog corpus), but the same feature is active for topic, user comment, or title sentences of the Bengali blog
- Special punctuation symbols (!,@,? etc.), their frequencies and attachments give 3% and 6% improvement for Bengali and English, respectively
- Length of a sentence (> eight and < fifteen words per sentence)
- Each feature was added only if its inclusion along with the pre-selected features improved accuracy
- Accuracy improvement of 20.83% for Bengali and 24.33% for English over the baseline model
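The incremental feature selection described in the last bullets can be sketched as a greedy forward pass. This is an illustration under stated assumptions: `forward_select` is a name introduced here, and `evaluate` is a hypothetical callback that, in a real setting, would train the CRF with the given features and return development-set accuracy.

```python
def forward_select(candidates, evaluate):
    """Keep a candidate feature only if it improves on the features kept so far."""
    selected = []
    best = evaluate(selected)
    for feature in candidates:
        score = evaluate(selected + [feature])
        if score > best:
            selected.append(feature)
            best = score
    return selected, best


# Toy evaluator with made-up accuracy gains (percentage points), used only to
# show the control flow; it does not reflect the reported results.
TOY_GAINS = {"pos": 10, "first_sentence": 0, "punct": 3}


def toy_evaluate(features):
    return 50 + sum(TOY_GAINS[f] for f in features)
```

Calling `forward_select(["pos", "first_sentence", "punct"], toy_evaluate)` keeps "pos" and "punct" and drops "first_sentence", which adds nothing in this toy setting. The outcome of a greedy pass depends on the order in which candidates are tried.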
CRF based Model (3/4)
CRF based Model (4/4)
Sentence Level Tagging (1/2)
Sense_Tag_Weight (STW)
- The six basic words “happy”, “sad”, “anger”, “disgust”, “fear” and “surprise” are selected as seed words for the six emotions
- Positive and negative scores are retrieved from English SentiWordNet for each synset in which each seed word appears
- The average of the retrieved scores is fixed as the Sense_Tag_Weight (STW) of that particular emotion tag
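The STW computation can be sketched as follows. The slide does not say exactly how the positive and negative scores are combined, so this sketch assumes one plausible reading: all retrieved scores for a seed word are pooled and averaged. The scores in the usage note are made up, not real SentiWordNet values.

```python
def sense_tag_weight(synset_scores):
    """Average all retrieved (positive, negative) scores for one seed word.

    synset_scores: list of (pos_score, neg_score) pairs, one per synset in
    which the seed word appears. Pooling both scores is an assumption.
    """
    scores = [score for pos, neg in synset_scores for score in (pos, neg)]
    return sum(scores) / len(scores) if scores else 0.0
```

For instance, with the made-up pairs `[(0.75, 0.0), (0.5, 0.25)]` the four scores sum to 1.5, giving a weight of 0.375.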
Sentence Level Tagging (2/2)
Sense_Weight_Score (SWS) for each emotion tag
- $SWS_i = \dfrac{STW_i \cdot N_i}{\sum_{j=1}^{7} STW_j \cdot N_j}$, for each emotion tag $i \in \{1, \dots, 7\}$
- $SWS_i$ is the sentence level Sense_Weight_Score for the emotion tag $i$
- $N_i$ is the number of occurrences of that emotion tag in the sentence
- Sentence level emotion tag: $SET = [\max_{i=1}^{7}(SWS_i)]$, i.e. the tag with the highest score
- Sentences are of neutral type if, for all emotion tags $i$, $SWS_i$ produces a zero (0) emotion score
Post-processing for handling negative words (Das and Bandyopadhyay, 2009b)
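A minimal sketch of the scoring rule above, assuming the word-level tags of a sentence and a dictionary of STW values are already available. The function name, the treatment of tags missing from the dictionary (ignored, as if their weight were zero), and the weights in the usage note are assumptions; the negation post-processing is not included.

```python
from collections import Counter


def sentence_emotion(word_tags, stw):
    """Return (sentence tag, SWS per tag) from word-level tags and STW values."""
    # N_i: occurrences of each emotion tag that has a weight.
    counts = Counter(tag for tag in word_tags if tag in stw)
    total = sum(stw[tag] * n for tag, n in counts.items())
    if total == 0:
        return "neutral", {}  # every SWS_i is zero
    sws = {tag: stw[tag] * n / total for tag, n in counts.items()}
    best = max(sws, key=sws.get)  # tag with the highest SWS_i
    return best, sws
```

With hypothetical weights `{"happy": 0.5, "sad": 0.25}` and word tags `["happy", "happy", "sad", "neutral"]`, the weighted counts are 1.0 for "happy" and 0.25 for "sad", so the scores are 0.8 and 0.2 and the sentence is tagged "happy".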
Evaluation (1/2)
Accuracies
- Computed by counting the number of sentences whose system-assigned emotion tag matches the emotion tag of their emotion class
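The counting scheme above can be written down as a short sketch. The function names and the per-class breakdown are illustrative assumptions; the example labels in the usage note are made up and are not the reported results.

```python
from collections import defaultdict


def accuracy(gold, predicted):
    """Fraction of sentences whose system tag matches the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def per_class_accuracy(gold, predicted):
    """Accuracy computed separately over the sentences of each gold class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, p in zip(gold, predicted):
        total[g] += 1
        correct[g] += int(g == p)
    return {tag: correct[tag] / total[tag] for tag in total}
```

For gold tags `["happy", "sad", "happy", "fear"]` and system tags `["happy", "sad", "sad", "fear"]`, three of four sentences match, so overall accuracy is 0.75, while the "happy" class alone scores 0.5.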
Evaluation (2/2)
Loss in accuracies
- Frequent use of metaphoric words in blogs
Bengali blogs collected from comic articles
- Emotions such as “happy”, “sad”, and “surprise” are present in sufficient numbers in the blog corpus
- The presence of an adequate number of training examples for a particular emotion tag improves the accuracy of that tag
Conclusion
- Handling of metaphors
- Phrase level analysis concerning genre of corpus
- Document level emotion identification
- More emotion annotated data
  - To improve the performance
  - Suitable for machine learning approach