NLP Group at Jadavpur University, Kolkata, India • Computer Science and Engineering Department • Teaching • Natural Language Processing for undergraduate and Masters’ students in Computer Science and Engineering • Laboratory projects for students • Research and Development
Research and Development in NLP • International Projects • "Strategic India-Japan Cooperative Programme-Project" in the area of multidisciplinary ICT,Project Entitled:: "Sentiment Analysis where AI meets Psychology" Research Leader in Japan:: Professor Manabu Okumura, Precision and Intelligence Laboratory; Tokyo Institute of Technology, Japan
Research and Development in NLP • International Projects • "INDO-FRENCH CENTER FOR THE PROMOTION OF ADVANCED RESEARCH (IFCPAR)", Govt. of India and France Project Entitled:: "An advanced platform for question answering systems" Principal Collaborator in France:: Prof Patrick Saint Dizier, Institut de Recherche en Informatique du Toulouse, Toulouse, France
Research and Development in NLP • International Projects • CONACYT-DST India Project Entitled:: "Answer Validation through Textual Entailment". Principal Collaborator in Mexico:: Professor Alexander Gelbukh, Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico
Research and Development in NLP • National Projects (Consortium Mode) • Cross Lingual Information Access • Snippet and Summary Generation • Snippet Translation • English to Indian Languages Machine Translation Systems • Indian Language to Indian Languages Machine Translation Systems
NLP Manpower • Doctoral Students • Statistical Machine Translation • Answer Validation through Textual Entailment (joint supervision with Prof. Alexander Gelbukh) • Opinion Mining • Emotion Analysis • Event Identification and Event – Time Analysis
NLP Manpower • Masters’ Students • Multi Word Expressions • Comparative and Evaluative Question Answering Systems • Undergraduate Students
Emotional Expression, Holder and Topic – The Three Vertices of an Emotion Triangle Prof. Sivaji Bandyopadhyay Department of Computer Science & Engineering Jadavpur University, Kolkata-700032, India
Introduction • (Quan and Ren, 2009) “Opinion Mining and Sentiment Analyses have been attempted with more focused perspectives rather than fine-grained emotion” • Emotion - An aspect of a person's mental state of being, normally based in or tied to the person’s internal (physical) and external (social) sensory feeling (Zhang et al., 2008)
Introduction • Emotion - A private state, not open to any objective observation or verification (Quirk et al., 1985) - Direct affective word (“He is really happy enough”) - Indirect notion (“Dream of music is in their eyes and hearts”) - Difficult to identify emotional stance in text - Need for Syntactic, Semantic and Pragmatic analysis of text (Polanyi and Zaenen, 2006)
Introduction - Natural language text contains attitudinal information of a reader or writer with respect to some subject, event or topic - Attitude may be - Judgment - Evaluation - of a Reader - of a Writer • “There is indeed a relationship between writer and reader emotions” (Yang et al., 2009)
Emotion/Sentiment Triangle • Vertices: Expression, Holder, Topic • Where do we start? Lexicon and Corpus!
Emotion lexicon Existing Resources Development - Updating - Translation - Sense Disambiguation Evaluation
Existing Resources (English) • WordNet (Miller, 1995) - Contains no emotion-specific information • WordNet Affect (Strapparava and Valitutti, 2004) - A resource for the SemEval-2007 shared task on “Affective Text” - SemEval-2007 used a set of words from WordNet Affect relevant to Ekman’s (1993) six emotion labels (joy, fear, anger, sadness, disgust, surprise) • SentiWordNet (Esuli and Sebastiani, 2006) - Assigns three sentiment scores (positive, negative and objective) to each synset of WordNet • Subjectivity Wordlist (Banea et al., 2008) - Labels words with strong or weak subjectivity and with prior polarity (positive, negative or neutral)
Updating (1/4) • /* WordNet Affect Synset */ n#10337658 fit(A) scene(B) tantrum • /* SentiWordNet Synset for A’ */ tantrum/scene/conniption/fit/burst/fit_out/equip/outfit/tally/jibe/match/correspond/gibe/agree/check/conform_to/meet/set/primed/fit_to/fit_for/convulsion/paroxysm • /* SentiWordNet Synset for B’ */ tantrum/scene/conniption/fit/scenery/view/prospect/vista/panorama/aspect/shot • /* Updated Synset E’ */ tantrum/scene/conniption/fit/burst/fit_out/equip/outfit/tally/jibe/match/correspond/gibe/agree/check/conform_to/meet/set/primed/fit_to/fit_for/convulsion/paroxysm/scenery/view/prospect/vista/panorama/aspect/shot
Updating (2/4) • Updating Using SentiWordNet (SW) (Esuli and Sebastiani, 2006) - Replace each word in WordNet Affect by the equivalent synsets retrieved from SentiWordNet, if those synsets contain that emotion word - Part of speech (POS) information is considered - Subjective score is not considered • Updating Using VerbNet (VN) (Kipper-Schuler, 2005) - Largest online verb lexicon with explicitly stated syntactic and semantic information based on Levin’s verb classification - VerbNet files, stored in XML format, contain member verbs with similar sense - The member verbs of a class are sense-based synonyms, so a verb synset is created from each VerbNet class - Each word present in a verb synset (identified by the “v” POS category in the WordNet Affect lists) is updated with the VerbNet synset - Duplicate Removal Strategy
Updating (3/4) • Duplicate Removal - If the words “A” and “B” in a WordNet Affect entry “E” are replaced by the retrieved SentiWordNet synsets A’ and B’ such that $A_1, A_2, A_3, B_3 \in A'$ and $B_1, B_2, B_3, A_3 \in B'$, then the updated entry is $E' = (A' - B') \cup (B' - A') \cup (A' \cap B')$, so each shared word (here $A_3$ and $B_3$) appears only once - $A_1$, $A_2$ and $A_3$ are the words present in the retrieved synset A’, and $B_1$, $B_2$ and $B_3$ are those in the retrieved synset B’, as extracted from SentiWordNet • [Figure: entry E with words A and B, the retrieved synsets A’ and B’, and the merged entry E’]
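The duplicate removal rule amounts to a set union of the two retrieved synsets. A minimal sketch in Python, assuming each synset is given as a list of words; the function name `merge_synsets` and the placeholder words are illustrative and not taken from the original system:

```python
def merge_synsets(a_prime, b_prime):
    """Build E' = (A' - B') + (B' - A') + (A' & B') without duplicates."""
    only_a = [w for w in a_prime if w not in b_prime]   # A' - B'
    only_b = [w for w in b_prime if w not in a_prime]   # B' - A'
    shared = [w for w in a_prime if w in b_prime]       # A' & B'
    merged = []
    for word in only_a + only_b + shared:
        if word not in merged:  # guards against repeats inside one synset
            merged.append(word)
    return merged


# Placeholder words mirroring the slide: A3 and B3 occur in both synsets.
a_prime = ["A1", "A2", "A3", "B3"]
b_prime = ["B1", "B2", "B3", "A3"]
e_prime = merge_synsets(a_prime, b_prime)
print(e_prime)
```

The three parts are disjoint by construction, so the result contains every word of A’ and B’ exactly once.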
Updating (4/4) Table 1: Update of English WordNet Affect using SentiWordNet and VerbNet
Translation (1/2) • The Samsad Bengali-to-English bilingual dictionary is available (http://home.uchicago.edu/~cbs2/banglainstruction.html) • An English-to-Bengali bilingual synset-based dictionary containing approximately 1,02,119 entries is being developed as part of the EILMT project (English to Indian Languages Machine Translation (EILMT) is a TDIL project undertaken by a consortium of premier institutes and sponsored by MCIT, Govt. of India) • The Affect word lists are converted into Bengali using the dictionary, followed by manual updates • Word combinations and idioms are not translated automatically • The six emotion lists contain 210 non-translated words in total, a number manageable for manual translation
Translation (2/2) Example of a Translated Synset Table 2: Results of the Translation
Sense Disambiguation (1/3) • Bengali-English bilingual dictionary (http://home.uchicago.edu/~cbs2/banglainstruction.html) • Synonymous Word Set (SWS) - <[ kruddha ] a angry; angered, enraged; wrathful; indignant …> - <[ kruddha ] a SWS1;SWS2;SWS3;SWS4; …> • Hypothesis: “Two words belonging to the same or different translated synsets are grouped together to form a new Bengali synset if at least one common English equivalent word is present in the SWSs formed for those words”
Sense Disambiguation (2/3) • [Figure: example synset built from Bengali words Xb and Yb whose Synonymous Word Sets SWS1 and SWS2 share an English word Ze]
Sense Disambiguation (3/3) • Xb and Yb are two Bengali words • Cxb and Cyb are the English equivalent classes of Xb and Yb - $C_{xb} = \{SWS_1; SWS_2; \ldots; SWS_q\}$ - $C_{yb} = \{SWS_1; SWS_2; \ldots; SWS_p\}$ • If, for some $i$ in 1 to $p$ and $j$ in 1 to $q$, $SWS_i \cap SWS_j \neq \emptyset$, or equivalently $\exists\, Z_e \mid Z_e \in SWS_i \cap SWS_j$ - where Ze is an equivalent English word present in Synonymous Word Sets (SWS) of Cxb and Cyb simultaneously - then a new Bengali synset containing Xb and Yb is formed - and a new English equivalent class is formed by merging the SWSs of both Cxb and Cyb • The process continues until no word in the Bengali translated synsets remains unclassified
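The grouping step can be sketched as repeated merging of words whose English equivalents overlap. The sketch below is written under simplifying assumptions: each Bengali word's SWSs are flattened into one set of English words, and the names `group_by_shared_equivalent`, `Xb`, `Yb`, `Wb` and the English entries are placeholders rather than dictionary data from the project:

```python
def group_by_shared_equivalent(equivalents):
    """Group words that share at least one English equivalent.

    equivalents maps a Bengali word to the set of English words found in
    its Synonymous Word Sets. Returns a list of (bengali_words,
    english_words) pairs; the English sets of different groups are disjoint.
    """
    groups = []
    for word, english in equivalents.items():
        words, merged_english = {word}, set(english)
        remaining = []
        for g_words, g_english in groups:
            if g_english & merged_english:
                # A common English word Ze exists: merge the two classes.
                words |= g_words
                merged_english |= g_english
            else:
                remaining.append((g_words, g_english))
        remaining.append((words, merged_english))
        groups = remaining
    return groups


# Placeholder entries: Xb and Yb share "enraged"; Wb shares nothing.
equivalents = {
    "Xb": {"angry", "enraged"},
    "Yb": {"enraged", "furious"},
    "Wb": {"happy"},
}
synsets = group_by_shared_equivalent(equivalents)
print(synsets)
```

Because every word is placed either into an existing group or into a new one, the loop ends with no word left unclassified.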
Evaluation (1/2) • Manual Agreement (Cohen’s Kappa) - Measures agreement between two raters who each classify items into mutually exclusive categories - Applied to the emotion words present in the translated Bengali synonym sets - Binary decision (Yes / No) - Agreement values range from 0.44 to 0.56, indicating moderate agreement
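For reference, Cohen's kappa compares the observed agreement with the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). A minimal sketch for two raters follows; the Yes/No decisions are made-up illustration data, not the annotations behind the values reported above:

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical binary decisions on eight candidate emotion words.
rater_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
rater_b = ["yes", "no", "no", "no", "yes", "yes", "yes", "no"]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 2))
```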
Bengali WordNet Affect Lists Snapshot
Resources • Emotion Lexicon - D. Das and S. Bandyopadhyay. 2010. Developing Bengali WordNet Affect for Analyzing Emotion. In the proceedings of the 23rd International Conference on the Computer Processing of Oriental Languages (ICCPOL-2010), pp. 35-40, California, USA - Y. Torii, D. Das, S. Bandyopadhyay and M. Okumura. 2011. Developing Japanese WordNet Affect for Analyzing Emotions. In the Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011), 49th Annual Meeting of the Association for Computational Linguistics (ACL), Portland, USA. (Accepted)
Emotion Corpus Guideline (1/3) • Random collection of 123 blog posts from Bengali web blog archive (www.amarblog.com) • Total 12,149 sentences (comics, politics, sports and short stories) • Three Annotators • No prior training was provided to the annotators • Instruction based on some illustrated annotated samples • Open source graphical tool (http://gate.ac.uk/gate/doc/releases.html)
Emotion Corpus Guideline (2/3) Items for Annotation • Emotional Expression (word / phrase) • Emotion Holder • Emotion Topic • Sentential Emotion - Ekman’s (1993) six classes “anger”, “disgust”, “fear”, “joy”, “sad” and “surprise” • Sentential Intensity - Low (L), General (G) and High (H)
Emotion Corpus Guideline (3/3) • Relaxed Scheme - Annotators are free to select the text spans (e.g. emotional expressions and topic) • Fixed Scheme - Annotators are given emotional items with fixed text spans (e.g. Emotion Holder, Sentential Emotion and Intensity)
Agreement (1/4) • Emotional expressions are words or strings of words • Agreement is measured between the sets of text spans selected by two annotators • Strategies - MASI (Measure of Agreement on Set-valued Items), used in coreference annotation (Passonneau, 2004) and in semantic and pragmatic annotation (Passonneau, 2006) - agr metric (Wiebe et al., 2005) for measuring directional agreement - Cohen’s Kappa (κ) (Cohen, 1960)
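The two span-based measures can be sketched as follows. MASI is computed here as the Jaccard coefficient scaled by a monotonicity weight (1 for identical sets, 2/3 when one set contains the other, 1/3 for partial overlap, 0 for disjoint sets). For agr, the sketch assumes that a span of annotator a counts as matched when it overlaps any span of annotator b; the (start, end) offsets are illustrative only:

```python
def masi(x, y):
    """MASI agreement between two sets of items (1.0 = identical)."""
    x, y = set(x), set(y)
    if not x and not y:
        return 1.0
    inter, union = x & y, x | y
    jaccard = len(inter) / len(union)
    if x == y:
        weight = 1.0
    elif x <= y or y <= x:
        weight = 2 / 3      # one set is a subset of the other
    elif inter:
        weight = 1 / 3      # partial overlap
    else:
        weight = 0.0        # disjoint sets
    return jaccard * weight


def agr(spans_a, spans_b):
    """Directional agreement agr(a||b): share of a's spans matched by b."""
    if not spans_a:
        return 0.0

    def overlaps(s, t):
        # Half-open (start, end) spans overlap when each starts before
        # the other ends.
        return s[0] < t[1] and t[0] < s[1]

    matched = sum(any(overlaps(s, t) for t in spans_b) for s in spans_a)
    return matched / len(spans_a)


print(masi({1, 2, 3}, {2, 3}))
print(agr([(0, 2), (5, 7)], [(1, 3)]), agr([(1, 3)], [(0, 2), (5, 7)]))
```

The agr values differ by direction because the metric asks how much of one annotator's selection is recovered by the other.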
Agreement (2/4) Emotional Expressions (MASI, agr) Emoticons (Kappa) Sentential Emotions and Intensities (Kappa)
Agreement (3/4) • Emotion Holder - Cohen’s kappa (κ) (Cohen, 1960) - Inter Annotator Agreement (IAA): if X is the set of emotion holders selected by the first annotator and Y is the set selected by the second annotator, $IAA = |X \cap Y| / |X \cup Y|$ - Agreement is moderately high for a single emotion holder and lower for multiple holders - Disagreement occurs mostly over satisfying implicit constraints - The issues were resolved by mutual understanding • Emotion Holder (Kappa), [IAA]
Agreement (4/4) • Emotion Topic - A topic consists of a single word or a string of words - The scope of individual topics inside a target span is hard to determine - MASI and the agr metric are used - Agreement for target span annotation is satisfactory (≈ 0.9) • Disagreement - Lower in sentences containing a single emotion topic - Arises in selecting the boundaries of topic spans - Arises in selecting the emotion topic from other relevant topics • Emotion Topic (MASI), [agr]
Resources • Emotion Corpus - D. Das and S. Bandyopadhyay. 2010. Labeling Emotion in Bengali Blog Corpus – A Fine Grained Tagging at Sentence Level. In the 8th Workshop on Asian Language Resources (ALR8), 23rd International Conference on Computational Linguistics (COLING 2010), pp. 47-55, August 21-22, Beijing, China
Example • John surprisingly narrated the actual story. Evaluative Expression: surprisingly Emotion Holder: <John> Emotion Topic: story • রাশেদ অনুভব করেছিল যে রামের সুখ অন্তহীন। (Rashed) (anubhab) (korechilo) (je) (Ramer) (sukh) (antohin) Rashed felt that Ram’s pleasure is endless. Evaluative Expression: সুখ (sukh) ‘pleasure’ Emotion Holder: <writer, রাশেদ (Rashed), রাম (Ram)> Emotion Topic: রামের সুখ (Ramer sukh) ‘Ram’s pleasure’
Salient Vertices • Evaluative Expressions (word/phrase/sentence/document level) • Holder Identification • Topic Detection
Evaluative Expressions (word/phrase/sentence/document level) • Evaluative Expressions - Subjective or Objective • Subjective Expressions - Positive or Negative (Sentiment) - Beyond Sentiment or fine-grained Sentiment • Emotional Expression (word or phrase) is the subjective counterpart • Ekman’s (1993) six universal emotions (joy / happiness, sadness, anger, disgust, fear and surprise)
Evaluative Expressions (word/phrase/sentence/document level) • (Ku et al., 2006) - Word - Phrase (Word + Context Features, e.g. intensifier, negation, conjunct) - Sentence (syntax + semantics + pragmatics) - Document • Hierarchical forward granular approach - word → phrase, phrase → sentence, sentence → document - word → sentence, sentence → document - word → document - phrase → document
Word Level Tagging • Baseline System - No prior knowledge regarding word features - Six separate modules for six emotion classes - Words passed through six separate modules - Tag each word with the emotion class • Baseline System + Stemming + WordNet Affect Lists - Stemming (suffixes of Bengali verbs depend on tense, aspect and person) - The Bengali stemmer uses a suffix list; for English, the Porter stemmer (Porter, 1997) / WordNet Morphological Analyzer (Miller, 1990) is used - Evaluated using WordNet Affect lists (Strapparava and Valitutti, 2006; Das and Bandyopadhyay, 2010) - 3.65% and 6.03% improvement over the baseline system in average accuracy on the Bengali and English test sets
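The stemming-plus-lexicon step can be illustrated with a toy lookup. This is only a sketch of the general idea, not the system described above: the suffix list, the two-entry lexicon, the "neutral" fallback tag and the function names are all hypothetical, and English words stand in for Bengali ones:

```python
def stem(word, suffixes):
    """Strip the longest matching suffix, keeping a stem of 3+ letters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def tag_words(tokens, lexicon, suffixes):
    """Tag each token with an emotion class from an affect lexicon."""
    tagged = []
    for token in tokens:
        key = token.lower()
        # Try the surface form first, then fall back to the stemmed form.
        tag = lexicon.get(key) or lexicon.get(stem(key, suffixes), "neutral")
        tagged.append((token, tag))
    return tagged


# Hypothetical resources for illustration only.
lexicon = {"fear": "fear", "delight": "joy"}
suffixes = ["ed", "s", "ing"]
tagged = tag_words(["Fears", "delighted", "him"], lexicon, suffixes)
print(tagged)
```

Stemming matters here because inflected forms such as "delighted" would otherwise miss the lexicon entry for "delight".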
Word Level Tagging • Machine Learning System (CRF, SVM) • Features (Das and Bandyopadhyay, 2009) · POS information (adjective, verb, noun, adverb) · First sentence in a topic · SentiWordNet emotion word (delight…) · Reduplication (so-so, good-good..) · Question words (what, why…) · Colloquial / Foreign words · Special punctuation symbols (!,@,?..) · Quoted sentence (“you are 2 good man”) · Sentence Length (>=8,<15) · Emoticons • Different unigram and bigram context features (word level as well as POS tag level) and their combinations
Sentence Level Tagging (1/2) • Sense_Tag_Weight (STW) - Select the six basic words “happy”, “sad”, “anger”, “disgust”, “fear” and “surprise” as seed words for the six emotions - Retrieve the positive and negative scores from English SentiWordNet (Esuli and Sebastiani, 2006) for each synset in which each seed word appears - Fix the average retrieved score as the Sense_Tag_Weight (STW) of that particular emotion tag • Table 1: Sense_Tag_Weights (STW) of six emotion tags
Sentence Level Tagging (2/2) • Sense_Weight_Score (SWS) for each emotion type - $SWS_i = \dfrac{STW_i \times N_i}{\sum_{j=1}^{7} STW_j \times N_j}$, $i \in \{1, \ldots, 7\}$ - $SWS_i$ is the sentence-level Sense_Weight_Score for emotion type $i$ - $N_i$ is the number of occurrences of that emotion type in the sentence - Sentence-level emotion tag $SET = \max_{i=1}^{7}(SWS_i)$, i.e. the tag with the highest score - A sentence is of neutral type if $SWS_i$ is zero (0) for all emotion tags $i$ - Post-processing for handling negative words (Das and Bandyopadhyay, 2009)
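The score normalisation and tag selection can be sketched as below. The STW values are hypothetical placeholders (the actual weights come from Table 1, which is not reproduced in this text), and the function name `sentence_emotion` is an assumption made for illustration:

```python
def sentence_emotion(tag_counts, stw):
    """Return (sentence tag, per-tag SWS) from word-level tag counts.

    tag_counts maps an emotion tag to N_i, its number of occurrences in
    the sentence; stw maps each tag to its Sense_Tag_Weight.
    """
    denominator = sum(stw[tag] * n for tag, n in tag_counts.items())
    if denominator == 0:
        return "neutral", {}  # every score is zero, so the sentence is neutral
    scores = {tag: stw[tag] * n / denominator for tag, n in tag_counts.items()}
    best = max(scores, key=scores.get)  # tag with the highest SWS
    return best, scores


# Hypothetical weights; not the values reported in Table 1.
stw = {"happy": 0.5, "sad": 0.25, "anger": 0.25,
       "disgust": 0.25, "fear": 0.25, "surprise": 0.25}
tag, scores = sentence_emotion({"happy": 2, "sad": 1}, stw)
print(tag, scores)
```

Since the scores are normalised by the same denominator, they sum to 1 over the tags that occur in the sentence.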
Document Level Tagging (1/2) • Heuristic features - Emotion tags of the title sentence - Emotion tags of the end sentence of a topic - Emotion tags assigned to an overall topic - Emotion tags for user comment portions of a document - Most frequent emotion tags identified from the document - Identical emotions that appear in the longest series of tagged sentences (Yang et al., 2007) - Emotion tags of the largest section among all of the user comments’ sections • [Figure: General structure of a Bengali blog document]
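Two of these heuristics, the most frequent tag and the longest series of identical tags, are simple to compute from a sequence of sentence-level tags. A small sketch, assuming the document is already represented as a list of tags in sentence order; the function names and the tag sequence are made up for illustration:

```python
from collections import Counter
from itertools import groupby


def most_frequent_tag(tags):
    """Emotion tag that occurs most often in the document."""
    return Counter(tags).most_common(1)[0][0]


def longest_run_tag(tags):
    """Tag of the longest series of consecutive identical tags, with its length."""
    best_tag, best_len = None, 0
    for tag, run in groupby(tags):
        length = len(list(run))
        if length > best_len:
            best_tag, best_len = tag, length
    return best_tag, best_len


# Hypothetical sentence-level tags for one blog document.
sentence_tags = ["joy", "sad", "sad", "sad", "joy", "joy", "fear", "joy"]
print(most_frequent_tag(sentence_tags))
print(longest_run_tag(sentence_tags))
```

The two features can disagree, as in this example, which is one reason to feed both to the document-level tagger instead of relying on either alone.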