The Automatic Text Sentiment Analysis Method based on Emotional Vocabulary
M.V. Klekovkina, E.V. Kotelnikov
VSHU, Kirov
Contents
• Sentiment analysis tasks
• Automatic sentiment analysis approaches
• Appraisal words extraction methods
• Emotional vocabulary forming
• Our method
• Results of experiments
Sentiment analysis tasks
• Determining the subject
  • Who expresses the opinion?
• Determining the object
  • What is the opinion about?
• Determining the sentiment
  • What is the direction (polarity) of the opinion?
Sentiment analysis tasks
Elementary case of sentiment analysis:
• Positive sentiment
• Negative sentiment
Approaches to classification
• Rule-based approach with patterns
  • Division of the text into words and word combinations
  • Selection of common patterns
  • Assignment of positive or negative sentiment to each pattern
  • Attachment of patterns to rules: «IF condition THEN conclusion»
Approaches to classification
• Machine learning
  • TF.IDF term weighting
  • Building a statistical or probabilistic classifier
Approaches to classification
• Hybrid method
  • Application of classifiers based on several approaches in a particular sequence
Method of classification
• Tokenization
• Selection of appraisal words and their emotional weighting
• Combination of the weights by some function
Methods of appraisal words extraction
• Method proposed in [Turney, 2002]:
  • creation of two reference sets of appraisal words (positive and negative)
  • estimation of a word's orientation from its co-occurrence with the appraisal words in the reference sets
• Manually, by experts
• Using different dictionaries to enlarge the sets of appraisal words:
  • antonyms, synonyms, hyponyms
  • similarity of word definitions
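The co-occurrence idea can be sketched as a PMI-style score in the spirit of [Turney, 2002]: a word leans positive if it co-occurs with positive reference words more often than with negative ones. This is a minimal sketch, not the authors' code. The function name, the hit counts, and the `smoothing` default are assumptions made for illustration.

```python
import math

def semantic_orientation(hits_near_pos, hits_near_neg, hits_pos, hits_neg,
                         smoothing=0.01):
    """PMI-style orientation of a candidate word.

    hits_near_pos / hits_near_neg: co-occurrence counts of the candidate
    with the positive / negative reference words (hypothetical inputs).
    hits_pos / hits_neg: total counts of the reference words themselves.
    smoothing: small constant that avoids division by zero (assumed value).
    """
    numerator = (hits_near_pos + smoothing) * hits_neg
    denominator = (hits_near_neg + smoothing) * hits_pos
    return math.log2(numerator / denominator)

# Made-up counts: the candidate co-occurs four times more often
# with the positive reference set than with the negative one.
score = semantic_orientation(80, 20, 1000, 1000, smoothing=0.0)
print(score > 0)  # a positive score suggests a positive appraisal word
```

A score above zero suggests a positive appraisal word and a score below zero a negative one. In practice a threshold would decide whether the word is added to the vocabulary.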
Vocabulary forming
• Appraisal words:
  • Manual selection (60 words)
  • Words from the training collection with the largest weight (RF method) for each class of sentiment (+200 words) [Lan, 2009]
• Manual weight setting:
  • from -5 to -1 for negative words
  • from +1 to +5 for positive words
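As an illustration of selecting the top-weighted words per class, the sketch below uses the relevance frequency weight as it is commonly stated, rf = log2(2 + a / max(1, c)). Here a is the number of texts of the target class containing the term and c is the number of texts of the other class containing it. The function names and the toy collection are hypothetical, and the slides do not say how the authors computed the weights in detail.

```python
import math
from collections import Counter

def relevance_frequency(docs, labels, target):
    """Return {term: rf} for one sentiment class.

    docs: list of token lists; labels: class label of each text.
    """
    in_class, out_class = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        counter = in_class if label == target else out_class
        counter.update(set(tokens))  # count each term once per text
    return {term: math.log2(2 + a / max(1, out_class[term]))
            for term, a in in_class.items()}

def top_terms(docs, labels, target, k):
    """The k terms with the largest rf for the target class."""
    rf = relevance_frequency(docs, labels, target)
    return sorted(rf, key=lambda term: (-rf[term], term))[:k]

# Toy collection (made up for illustration).
docs = [["great", "film"], ["great", "plot"],
        ["boring", "film"], ["boring", "plot", "film"]]
labels = ["pos", "pos", "neg", "neg"]
print(top_terms(docs, labels, "pos", 1))
```

The candidates returned this way would still need the manual weight from -5 to +5 described on the slide.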
Vocabulary forming
Tuning of the appraisal word weight:
• Word-modifiers
  • Мне очень понравился фильм, особенно порадовал непредсказуемый конец (I liked the film very much; the unpredictable ending was especially pleasing.)
• Words which express negation
  • Самая посредственная и не смешная комедия из тех, что я видел (The most mediocre and unfunny comedy of all those I have seen.)
Vocabulary forming
• Word-modifiers
  • Increasing the emotional weight of the appraisal word: очень хорошо (very good)
  • Reducing the emotional weight of the appraisal word: немного лучше (a little better)
  • Adverbs: довольно, особенно (rather, especially)
  • Adjectives: полный, абсолютный (complete, absolute)
Vocabulary forming
• Tuning the weight with word-modifiers:
  • change the weight of the neighboring appraisal word:
    «хорошо» (good): weight = 3
    «очень хорошо» (very good): weight = 3 * (100% + 50%) = 4.5
  • apply recursively, starting from the word-modifier nearest to the appraisal word:
    «действительно очень хорошо» (really very good): weight = 4.5 * (100% + 15%) = 5.175
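The arithmetic on this slide can be reproduced with a few lines of Python. This is a minimal sketch. The dictionaries hold only the values from the slide's example (weight 3, +50%, +15%), and the function name and the assumption that the appraisal word is the last token of the phrase are illustrative.

```python
# Values taken from the slide's example; a real vocabulary would be larger.
APPRAISAL = {"хорошо": 3.0}                          # appraisal word -> weight
MODIFIERS = {"очень": 50.0, "действительно": 15.0}   # modifier -> percent

def phrase_weight(tokens):
    """Weight of a phrase: zero or more modifiers, then the appraisal word.

    Modifiers are applied starting from the one nearest to the appraisal
    word, each multiplying the current weight by (100% + percent).
    """
    *modifiers, head = tokens
    weight = APPRAISAL[head]
    for modifier in reversed(modifiers):  # nearest modifier first
        weight *= 1 + MODIFIERS[modifier] / 100
    return weight

print(round(phrase_weight(["очень", "хорошо"]), 3))
print(round(phrase_weight(["действительно", "очень", "хорошо"]), 3))
```

A weight-reducing modifier could be represented by a negative percentage. That representation is an assumption, since the slide does not show one.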
Vocabulary forming
• Words which express negation:
  • particles: не, ни (not, nor)
  • pronouns: ничего (nothing)
• Weight tuning:
  • shift the weight of the appraisal word toward the opposite polarity by a fixed value:
    «хорошо» (good): weight = 3
    «не хорошо» (not good): weight = 3 + (-4) = -1
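The negation shift can be sketched as follows. The shift of 4 is the value used in the slide's example. The treatment of negative words (shifting them toward the positive side) is an assumption, because the slide only shows a positive word being negated.

```python
def negate(weight, shift=4.0):
    """Shift an appraisal word weight toward the opposite polarity.

    shift: fixed value from the slide's example (the tuned value may differ).
    Negative weights are shifted upward by symmetry, which is an assumption.
    """
    if weight > 0:
        return weight - shift
    if weight < 0:
        return weight + shift
    return weight

print(negate(3.0))   # «не хорошо» (not good)
```

Unlike a simple sign flip, a fixed shift lets a negated strong word keep part of its original polarity. For example, a weight of 5 becomes 1 rather than -5.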
Vocabulary forming
Parameters tuned by Q-fold cross-validation (Q = 5) on the training collection:
• percentages for word-modifiers
• the shift value for words which express negation
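One way to picture this tuning step is a small grid search with Q-fold cross-validation. The sketch below is an assumption-laden illustration. The fold-splitting scheme, the candidate values, the toy data, and the scoring function `toy_score` are all hypothetical, and the slides do not describe the authors' actual procedure beyond Q = 5.

```python
def q_folds(items, q=5):
    """Split items into q folds by striding (a simple, assumed scheme)."""
    return [items[i::q] for i in range(q)]

def cross_validate(items, candidates, fit_and_score, q=5):
    """Return the candidate with the best mean held-out score."""
    folds = q_folds(items, q)
    best, best_score = None, float("-inf")
    for params in candidates:
        scores = []
        for i, held_out in enumerate(folds):
            train = [x for j, fold in enumerate(folds) if j != i for x in fold]
            scores.append(fit_and_score(params, train, held_out))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best, best_score = params, mean_score
    return best, best_score

def toy_score(shift, train, held_out):
    """Accuracy of a toy negation rule; `train` is unused in this toy."""
    correct = 0
    for base, negated, label in held_out:
        weight = base - shift if negated else base
        predicted = "pos" if weight >= 0 else "neg"
        correct += predicted == label
    return correct / len(held_out)

# Toy items: (appraisal word weight, is negated, true label), made up.
data = [(3, True, "neg"), (3, False, "pos"), (1, True, "neg"),
        (2, False, "pos"), (3, True, "neg"), (1, False, "pos"),
        (2, True, "neg"), (3, False, "pos"), (3, True, "neg"),
        (2, False, "pos")]
best_shift, best_accuracy = cross_validate(data, [0, 2, 4], toy_score)
print(best_shift, best_accuracy)
```

The modifier percentages could be tuned the same way by making each candidate a tuple of parameters instead of a single number.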
Vocabulary forming
Emotional vocabulary:
• 260 appraisal words
• 19 word-modifiers
• 3 words which express negation
Our method
• Weighting the texts from the training collection
• Calculating the average weight for each class of sentiment
• Determining a boundary weight between the classes of text sentiment
Our method
• Weighting of the text:
  • W_T: weight of text T
  • W_i: weight of the i-th appraisal word
  • N_T: number of appraisal words in text T
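The symbols listed above are consistent with averaging the weights of the appraisal words found in a text, i.e. W_T = (W_1 + ... + W_{N_T}) / N_T. That reading is an assumption, as are the function name, the toy vocabulary, and the choice to return 0 for a text without appraisal words.

```python
def text_weight(tokens, vocabulary):
    """Mean weight of the appraisal words in a tokenized text (assumed formula).

    vocabulary: {appraisal word: weight}; other tokens are ignored.
    Returns 0.0 when the text contains no appraisal words (assumed convention).
    """
    weights = [vocabulary[token] for token in tokens if token in vocabulary]
    if not weights:
        return 0.0
    return sum(weights) / len(weights)

# Toy vocabulary with hypothetical weights on the slide's -5..+5 scale.
vocabulary = {"хорошо": 3.0, "отлично": 5.0, "плохо": -4.0}
print(text_weight(["хорошо", "фильм", "отлично"], vocabulary))
```

Modifier and negation adjustments would be applied to each W_i before this aggregation step.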
Our method
• Exclusion of positive texts whose weights lie far to the left of most positive texts
• Exclusion of negative texts whose weights lie far to the right of most negative texts
Our method
• Calculation of the average weight for each class of sentiment:
  $AW_T^{C} = \frac{1}{N_C} \sum_{T_i \in C} W_{T_i}$
  • $AW_T^{C}$: average weight of the texts in sentiment class C
  • $N_C$: number of texts belonging to sentiment class C
Our method
• Determining a boundary weight between the classes of text sentiment:
  $d = \frac{AW_T^{pos} + AW_T^{neg}}{2}$
  • $AW_T^{C}$: average weight of the texts in sentiment class C
  • $d$: the centre of the segment between the two class averages
Our method
• Decision of the classifier:
  • text T has positive sentiment if its weight W_T is greater than or equal to d
  • text T has negative sentiment if its weight W_T is less than d
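The training and decision steps can be sketched together. The sketch reads "the centre of the segment" as the midpoint between the two class averages. The function names and the toy weights are assumptions made for illustration.

```python
def average_weight(text_weights):
    """Average weight of the texts of one sentiment class."""
    return sum(text_weights) / len(text_weights)

def boundary(positive_weights, negative_weights):
    """Midpoint between the two class averages (assumed reading of d)."""
    return (average_weight(positive_weights)
            + average_weight(negative_weights)) / 2

def classify(text_weight, d):
    """Positive if the text weight is at least d, negative otherwise."""
    return "positive" if text_weight >= d else "negative"

# Toy text weights from a hypothetical training collection.
positive_weights = [2.0, 3.0, 4.0]    # class average 3.0
negative_weights = [-1.0, -2.0, 0.0]  # class average -1.0
d = boundary(positive_weights, negative_weights)
print(d, classify(1.0, d), classify(0.5, d))
```

Because the class averages need not be symmetric around zero, the boundary d is generally not zero. That is why it is estimated from the training collection rather than fixed in advance.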
ROMIP (Russian Information Retrieval Evaluation Seminar)
• Tracks for sentiment analysis
  • movie reviews
  • book reviews
  • digital camera reviews
• Tasks
  • two-class classification
  • three-class classification
  • five-class classification
ROMIP (Russian Information Retrieval Evaluation Seminar)
Movie reviews
• Training collection
  • 15,718 reviews
  • each review includes:
    - the text of the review
    - a rating on a scale from 1 to 10
• Test collection
  • 312 reviews
Results of experiments
[Chart comparing: Baseline, SVM, Lexical, the best result of ROMIP-2011]