The Automatic Text Sentiment Analysis Method based on Emotional Vocabulary

Presentation Transcript


  1. The Automatic Text Sentiment Analysis Method based on Emotional Vocabulary. M.V. Klekovkina, E.V. Kotelnikov (VSHU, Kirov)

  2. Contents • Sentiment analysis tasks • Automatic sentiment analysis approaches • Appraisal words extraction methods • Emotional vocabulary forming • Our method • Results of experiments

  3. Sentiment analysis tasks • Subject determination: who expresses the opinion? • Object determination: what is the opinion about? • Sentiment determination: the direction (polarity) of the opinion

  4. Sentiment analysis tasks • Elementary case of sentiment analysis: • Positive sentiment • Negative sentiment

  5. Approaches to classification • Rule-based approach with patterns • Division of the text into words and word combinations • Selection of common patterns • Assignment of positive or negative sentiment to each pattern • Attachment of patterns to rules: «IF condition THEN conclusion»
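
The rule-based steps on this slide can be sketched roughly as follows. The rule table, the pattern words, and the function names are hypothetical illustrations, not the authors' rules; the sketch only shows the «IF condition THEN conclusion» shape, where the condition is "the pattern occurs in the text".

```python
# Hypothetical rule table: (pattern as a sequence of words, conclusion).
RULES = [
    (("not", "bad"), "positive"),
    (("waste", "of", "time"), "negative"),
    (("highly", "recommend"), "positive"),
]

def contains(tokens, pattern):
    """Return True if `pattern` occurs in `tokens` as a contiguous run."""
    n = len(pattern)
    return any(tuple(tokens[i:i + n]) == pattern
               for i in range(len(tokens) - n + 1))

def classify_by_rules(text, rules=RULES, default="unknown"):
    """Apply the rules in order: IF the pattern occurs THEN return its conclusion."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    for pattern, conclusion in rules:
        if contains(tokens, pattern):
            return conclusion
    return default
```

The first matching rule wins in this sketch; a real rule set would need an explicit policy for conflicting patterns.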

  6. Approaches to classification • Machine learning • Term weighting with TF-IDF • Building a statistical or probabilistic classifier
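
As an illustration of the TF-IDF weighting named on this slide, here is a minimal sketch of one common variant: term frequency normalized by document length, multiplied by log(N / df). The slide does not say which variant was used, so the exact formula and the function name are assumptions.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors
```

A term that occurs in every document gets weight 0 under this variant, which is why such vectors emphasize class-specific vocabulary before a classifier is trained on them.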

  7. Approaches to classification • Hybrid method • Application of classifiers based on several approaches in a particular sequence

  8. Method of classification • Tokenization • Selection of appraisal words and their emotional weighting • Combination of the weights by some function

  9. Methods of appraisal words extraction • Method proposed in [Turney, 2002]: • creation of two reference ("etalon") sets of appraisal words (positive and negative) • estimation of a word by its co-occurrence with the appraisal words from the reference sets • Manually, by experts • Use of various dictionaries to enlarge the sets of appraisal words: • antonyms, synonyms, hyponyms • similarity of word definitions
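
The co-occurrence idea behind [Turney, 2002] can be sketched as a difference of pointwise mutual information (PMI) with the positive and the negative reference words. Turney's original work estimated the counts from search-engine hits; here document-level co-occurrence in a small local corpus stands in for that, and the smoothing constant and function names are assumptions made for the sketch.

```python
import math

def semantic_orientation(word, documents, positive_seeds, negative_seeds,
                         smoothing=0.01):
    """Sum of PMI with positive seeds minus sum of PMI with negative seeds."""
    docs = [set(doc.lower().split()) for doc in documents]
    n_docs = len(docs)

    def hits(*terms):
        # Number of documents containing all terms; smoothed to avoid log(0).
        return sum(1 for d in docs if all(t in d for t in terms)) + smoothing

    def pmi(a, b):
        return math.log2(hits(a, b) * n_docs / (hits(a) * hits(b)))

    positive = sum(pmi(word, seed) for seed in positive_seeds)
    negative = sum(pmi(word, seed) for seed in negative_seeds)
    return positive - negative
```

A positive score suggests the word tends to occur with the positive reference set, a negative score with the negative one.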

  10. Vocabulary forming • Appraisal words: • Manual selection (60 words) • Words from the training collection with the largest weight (RF method) for each sentiment class (+200 words) [Lan, 2009] • Manual weight setting: – from –5 to –1 for negative words – from +1 to +5 for positive words
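
A rough sketch of ranking candidate words by relevance frequency (RF) follows. It assumes the usual definition rf = log2(2 + a / max(1, c)), where a is the number of documents of the target class containing the term and c is the number of documents of the other class containing it; the slide does not spell the formula out, and the function names and the `top_n` parameter are hypothetical.

```python
import math

def relevance_frequency(term, class_docs, other_docs):
    """rf = log2(2 + a / max(1, c)); each document is a set of tokens."""
    a = sum(1 for doc in class_docs if term in doc)
    c = sum(1 for doc in other_docs if term in doc)
    return math.log2(2 + a / max(1, c))

def top_terms(class_docs, other_docs, top_n=200):
    """Return the `top_n` terms of the class with the largest RF weight."""
    vocabulary = set().union(*class_docs) if class_docs else set()
    ranked = sorted(
        vocabulary,
        key=lambda t: (-relevance_frequency(t, class_docs, other_docs), t),
    )
    return ranked[:top_n]
```

In the method described on the slide, the words selected this way still receive their final weights manually.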

  11. Vocabulary forming • Tuning of the appraisal word weight: • word-modifiers: «Мне очень понравился фильм, особенно порадовал непредсказуемый конец» (I liked the film very much; the unpredictable ending was especially pleasing) • words which express negation: «Самая посредственная и не смешная комедия из тех, что я видел» (The most mediocre and unfunny comedy I have ever seen)

  12. Vocabulary forming • Word-modifiers • Increasing the emotional weight of the appraisal word: очень хорошо (very good) • Reducing the emotional weight of the appraisal word: немного лучше (a little better) • Adverbs: довольно, особенно (rather, especially) • Adjectives: полный, абсолютный (complete, absolute)

  13. Vocabulary forming • Tuning the weight with the word-modifiers: • change the weight of the neighbouring appraisal word: «хорошо» (good): weight = 3; «очень хорошо» (very good): weight = 3 * (100% + 50%) = 4.5 • apply recursively, starting from the word-modifier nearest to the appraisal word: «действительно очень хорошо» (really very good): weight = 4.5 * (100% + 15%) = 5.175
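
The arithmetic on this slide can be reproduced with a short sketch. The percentages (50% for «очень», 15% for «действительно») and the base weight 3 are the slide's example values; the function name and the assumption that all modifiers directly precede the appraisal word are illustrative choices.

```python
# Example values taken from the slide; real values are tuned on training data.
MODIFIER_PERCENT = {"очень": 50, "действительно": 15}
APPRAISAL_WEIGHT = {"хорошо": 3}

def phrase_weight(tokens):
    """Weight of a phrase: zero or more modifiers followed by an appraisal word.

    Modifiers are applied starting from the one nearest to the appraisal word.
    """
    *modifiers, appraisal = tokens
    weight = APPRAISAL_WEIGHT[appraisal]
    for modifier in reversed(modifiers):  # nearest modifier first
        weight *= (100 + MODIFIER_PERCENT[modifier]) / 100
    return weight
```

A reducing modifier would simply carry a negative percentage in the same table, for example a hypothetical -30 for «немного».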

  14. Vocabulary forming • Words which express negation: • particles не, ни (not, nor) • pronouns ничего (nothing) • Weight tuning: • shift the weight of the appraisal word towards the opposite polarity by a fixed value: «хорошо» (good): weight = 3; «не хорошо» (not good): weight = 3 + (–4) = –1
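
The negation shift can be sketched as below. The shift value 4 is the slide's example; treating negative words symmetrically (shifting them towards the positive side) is an assumption, since the slide only shows a positive word.

```python
def negate(weight, shift=4):
    """Shift an appraisal weight towards the opposite polarity by `shift`."""
    if weight > 0:
        return weight - shift
    if weight < 0:
        return weight + shift  # symmetric handling is an assumption
    return weight
```

Unlike a sign flip, a fixed shift turns «хорошо» (3) into a mildly negative –1 rather than a strongly negative –3.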

  15. Vocabulary forming • Parameter tuning by Q-fold cross-validation (Q = 5) on the training collection: • percentages for the word-modifiers • the shift value for words which express negation
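
One way to organize such tuning is a grid search scored by Q-fold cross-validation, sketched below. The candidate grid, the `evaluate` callback (which would train on one part and return accuracy on the held-out fold), and all names are hypothetical; the slide does not describe the search procedure itself.

```python
from itertools import product

def k_fold_indices(n_items, q=5):
    """Split indices 0..n_items-1 into q interleaved folds."""
    return [list(range(i, n_items, q)) for i in range(q)]

def tune(samples, candidates, evaluate, q=5):
    """Return (best_params, best_mean_score) over the candidate grid.

    candidates: {parameter name: list of values to try}
    evaluate(train, held_out, params) -> score, higher is better (hypothetical).
    """
    folds = k_fold_indices(len(samples), q)
    names = list(candidates)
    best_params, best_score = None, float("-inf")
    for values in product(*(candidates[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for fold in folds:
            held = set(fold)
            train = [s for i, s in enumerate(samples) if i not in held]
            held_out = [samples[i] for i in fold]
            scores.append(evaluate(train, held_out, params))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

With `candidates = {"shift": [...], "percent": [...]}` the same loop covers both parameters listed on the slide.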

  16. Vocabulary forming • Emotional vocabulary: • 260 appraisal words • 19 word-modifiers • 3 words which express negation

  17. Our method • Weighting of the texts from the training collection • Calculation of the average weight for each sentiment class • Determination of a boundary weight between the sentiment classes

  18. Our method • Weighting of the text: W_T – weight of text T; W_i – weight of the i-th appraisal word; N_T – number of appraisal words in text T
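
The formula on this slide did not survive in the transcript. A minimal sketch consistent with the three symbols defined above is the arithmetic mean of the appraisal word weights, W_T = (W_1 + ... + W_NT) / N_T; this form, the function name, and the value returned for a text without appraisal words are assumptions.

```python
def text_weight(appraisal_weights):
    """Assumed form: mean of the weights W_i of the N_T appraisal words in a text."""
    if not appraisal_weights:
        return 0.0  # no appraisal words found: neutral weight (assumption)
    return sum(appraisal_weights) / len(appraisal_weights)
```

Here `appraisal_weights` would hold the weights after the modifier and negation adjustments described on the earlier slides.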

  19. Our method • Exclusion of positive-sentiment texts that lie far to the left of most positive texts (on the weight axis) • Exclusion of negative-sentiment texts that lie far to the right of most negative texts

  20. Our method

  21. Our method • Calculation of the average weight for each sentiment class (over the texts T_i ∈ C): AW_T – average weight of the texts of sentiment class C; N_C – number of texts belonging to sentiment class C

  22. Our method • Determination of a boundary weight between the sentiment classes: AW_T – average weight of the texts of sentiment class C; d – the centre of the segment (between the average weights of the classes)

  23. Our method • Decision of the classifier: • text T has positive sentiment if its weight W_T is greater than or equal to d • text T has negative sentiment if its weight W_T is less than d
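
The training and decision steps of the last three slides can be sketched together. The sketch assumes that each class average is the plain mean of the text weights in that class and that d is the midpoint between the positive and the negative class averages; the formulas themselves are not preserved in the transcript, and the function names are hypothetical.

```python
def class_average(text_weights):
    """Assumed form: mean of the weights W_T over the N_C texts of one class."""
    return sum(text_weights) / len(text_weights)

def boundary(positive_weights, negative_weights):
    """Assumed form: d is the centre of the segment between the class averages."""
    return (class_average(positive_weights)
            + class_average(negative_weights)) / 2

def classify(weight, d):
    """Decision rule from the slide: positive if W_T >= d, otherwise negative."""
    return "positive" if weight >= d else "negative"
```

For example, with positive training weights [2.0, 3.0] and negative training weights [-1.0, -2.0], the class averages are 2.5 and -1.5 and the boundary is 0.5.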

  24. ROMIP (Russian Information Retrieval Evaluation Seminar) • Tracks for sentiment analysis • movie reviews • book reviews • digital camera reviews • Tasks • two-class classification • three-class classification • five-class classification

  25. ROMIP (Russian Information Retrieval Evaluation Seminar) Movie reviews • Training collection • 15718 reviews • each review includes: - text message - score on a scale from 1 to 10 • Test collection • 312 reviews

  26. Results of experiments • Compared methods: Baseline, SVM, Lexical, the best result of ROMIP-2011

  27. Thank you for your attention!
