
Sentiment and Textual analysis of Create-Debate data


Presentation Transcript


  1. Sentiment and Textual analysis of Create-Debate data EECS 595 – End Term Project Poorva Potdar

  2. EUREKA!! – Getting the Idea • Why sentiment analysis? • Huge amount of opinionated text on the web • Sentiment analysis on the web – gauging the popularity of a product, a movie, or a person • Idea: • Create Debate – an online debating forum where people argue for/against a topic • Mine the salient textual features of agreement/disagreement posts

  3. Creating the Haystack … • 14,308 debates • 983,800 sentences • 178,290 posts • 9,194 users • Labeled dataset: Neutral / Agreement / Disagreement • Structural Analysis – features of the language in a post that make it a high-scoring agreement/disagreement post • Behavioral Analysis – aspects of a user's behavior that earn them a high rank on the forum

  4. What's the gain? • Influence detection in a community • Sub-group detection • Stance identification – are there any visible groups with a particular stance? • Predicting the crowd trend for a particular topic of interest • Text summarization

  5. Finding the needle - structural features ….

  6. Experiment 1 : Polarity Measure • Intuition : Is the number of positive/negative words indicative of how popular a post is? • Tool – OpinionFinder / WordNet • Example of OpinionFinder output on the processed data: • <MPQASRC>It</MPQASRC> <MPQASD>think</MPQASD> it's <MPQAPOL autoclass="negative">wrong</MPQAPOL> to <MPQASD>assume</MPQASD> that in order to be a revolutionary thinker you have to be <MPQAPOL autoclass="negative">crazy</MPQAPOL> • MPQAPOL – indicates the polarity of a word, e.g. "bad" • MPQASRC – indicates the opinion source in the sentence, e.g. "It" • MPQASD – a direct subjective expression in the sentence, e.g. "said" • Result : • No evident correlation between the number of polar words and the rank of a post • Authors use a roughly equal distribution of positive and negative words when expressing agreement/disagreement
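
A minimal sketch of how such a polarity check could be run, assuming OpinionFinder-style tagged text and labeled post scores are already available; the `posts` structure and function names here are illustrative, not the project's actual code:

```python
# Hedged sketch, not the project's actual code: count MPQAPOL annotations in
# OpinionFinder-style output and correlate the counts with labeled post scores.
# `posts` is a hypothetical list of (tagged_text, labeled_score) pairs.
import re
from scipy.stats import pearsonr

def count_polar_words(tagged_text, polarity):
    # Count <MPQAPOL autoclass="positive"> or autoclass="negative" tags.
    return len(re.findall(r'<MPQAPOL autoclass="{}">'.format(polarity), tagged_text))

def polarity_correlation(posts):
    counts = [count_polar_words(text, "positive") + count_polar_words(text, "negative")
              for text, _ in posts]
    scores = [score for _, score in posts]
    return pearsonr(counts, scores)  # (r, p-value); r near 0 means no correlation
```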

  7. Experiment 2 : Readability Measure • Intuition : Do posts that are more readable/formal gain higher scores? • Tool – Flesch toolkit to compute the Flesch readability measure for each post • Calculated Pearson's coefficient between the labeled score and the Flesch score for each post • Result : High correlation – the more formal the language of a post, the more points it earns • Eg 1 : "good times . . .bring it back ! -------------=-=-=-=-=-=-=-=-=-=-=-=-==-=-=- ))))))))))))" [Flesch – 0, labeled points – 1] • Eg 2 : "Vegetables is often seen as more healthy than eating meat." [Flesch – 93.12, labeled points – 29 (max)]
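
The Flesch reading-ease score is 206.835 - 1.015 x (words per sentence) - 84.6 x (syllables per word); higher means easier to read. A hedged sketch of the correlation step, using the textstat package as a stand-in for the Flesch toolkit named on the slide:

```python
# Hedged sketch: textstat stands in for the Flesch toolkit named on the slide.
# `posts` is a hypothetical list of (post_text, labeled_score) pairs.
import textstat
from scipy.stats import pearsonr

def readability_correlation(posts):
    flesch = [textstat.flesch_reading_ease(text) for text, _ in posts]
    scores = [score for _, score in posts]
    # A strongly positive r would mean more readable/formal posts earn more points.
    return pearsonr(flesch, scores)
```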

  8. Experiment 3 : Emoticon analysis • Intuition : Do emoticons in agreement/disagreement posts have any correlation with their labeled scores? • Tool – CMU ARK tagger [the Stanford Parser doesn't scale well] • Pearson's coefficient between the labeled score and the number of positive/negative emoticons for agreement/disagreement posts • Result : High correlation between the number of emoticons and the rank of disagreement posts • Analysis : authors tend to use expressive emoticons like smileys to convey a sarcastic opinion of a particular argument • "Hey! What's that supposed to mean?;)" • "Sure If you say so :P"
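
A hedged sketch of the emoticon count and correlation; the slide's pipeline used the CMU ARK tagger's emoticon tag, so the hand-written emoticon lists below are an illustrative simplification:

```python
# Hedged sketch: a plain substring count of emoticons; the project used the CMU
# ARK tagger's emoticon tag, and these emoticon lists are illustrative only.
from scipy.stats import pearsonr

POSITIVE_EMOTICONS = [":)", ":-)", ":D", ";)", ":P"]
NEGATIVE_EMOTICONS = [":(", ":-(", ":/", ">:("]

def count_emoticons(text, emoticons):
    return sum(text.count(e) for e in emoticons)

def emoticon_correlation(disagreement_posts):
    # `disagreement_posts` is a hypothetical list of (text, labeled_score) pairs.
    counts = [count_emoticons(t, POSITIVE_EMOTICONS) + count_emoticons(t, NEGATIVE_EMOTICONS)
              for t, _ in disagreement_posts]
    scores = [s for _, s in disagreement_posts]
    return pearsonr(counts, scores)
```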

  9. Experiment 4 : Dependency Parse • Intuition : Do highly ranked agreement/disagreement posts exhibit a common dependency pattern? • Agreement posts tend to express agreement early in the post, while disagreement is mild • Tool – Stanford Parser for a syntactic and dependency parse of the posts • Result : Many highly ranked agreement posts open with the dependency pattern I -> nsubj -> positive word [I agree to, I like your point, I up-voted your argument] • Pipeline : Stanford Parser + ExtractDependencies → code to traverse PRP to PRP$ → SentiWordNet
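
A hedged sketch of the pattern check; spaCy stands in for the Stanford Parser here, and a small hand-picked verb set stands in for the SentiWordNet lookup, so this is only an approximation of the slide's pipeline:

```python
# Hedged sketch: spaCy stands in for the Stanford Parser, and an assumed verb
# set stands in for the SentiWordNet lookup.
import spacy

nlp = spacy.load("en_core_web_sm")
POSITIVE_VERBS = {"agree", "like", "love", "support", "upvote"}  # assumed lexicon

def has_agreement_pattern(post_text):
    doc = nlp(post_text)
    for tok in doc:
        # "I" attached as nominal subject (nsubj) to a positively polar head word
        if (tok.dep_ == "nsubj" and tok.text.lower() == "i"
                and tok.head.lemma_.lower() in POSITIVE_VERBS):
            return True
    return False

# e.g. has_agreement_pattern("I agree with your point.")  ->  True
```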

  10. Finding the needle - behavioral features ….

  11. Which Authors get the highest rank? – 1 • Intuition : Does the average number of times an author participates in a thread correlate with their ranking? • Result : • There is a clear positive correlation between an author's points and the number of discussion posts they contribute per thread
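
A minimal sketch of this check, assuming the posts are available as (author, thread) records and author points as a lookup table; both names are illustrative stand-ins for the Create-Debate dataset fields:

```python
# Hedged sketch: average posts-per-thread for each author, correlated with the
# author's points. `posts` and `author_points` are hypothetical inputs.
from collections import defaultdict
from scipy.stats import pearsonr

def participation_correlation(posts, author_points):
    per_thread = defaultdict(lambda: defaultdict(int))  # author -> thread -> #posts
    for author, thread_id in posts:
        per_thread[author][thread_id] += 1
    authors = [a for a in per_thread if a in author_points]
    avg_posts = [sum(per_thread[a].values()) / len(per_thread[a]) for a in authors]
    points = [author_points[a] for a in authors]
    return pearsonr(avg_posts, points)  # positive r supports the intuition
```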

  12. Which Authors get the highest rank? – 2 • Intuition : Do authors who join an existing discussion, as opposed to starting a new thread, get a higher rank? • Result : • Rating of authors who agree > rating of authors who disagree more > rating of authors who start a new debate • Authors who participate more in discussions are more popular

  13. Which Authors get the highest rank? – 3 • Intuition : Do authors who participate early or late in a discussion earn a higher ranking? • Result : • Authors who participate late in a discussion are likely to have a higher ranking • Intuitively, authors who join a discussion late already know the opinion bias • Participating early does not help ranking

  14. Get the Ranking of Authors w.r.t. features • Trained a linear regression model using Weka's LibSVM and obtained a predicted ranking of all authors based on the features • Computed a correlation coefficient by comparing the predicted rankings against the gold-standard rankings • Result : • The feature vector set shows a decent correlation with the actual rankings
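
A hedged sketch of the same idea in Python; the slide's model was trained with Weka's LibSVM, so the scikit-learn SVR below is an analogous stand-in rather than the original setup, and X and y are the hypothetical per-author feature matrix and gold scores:

```python
# Hedged sketch: scikit-learn's SVR with a linear kernel plays the role of
# Weka's LibSVM from the slide. X holds per-author features (participation
# rate, agreement/disagreement counts, timing, ...); y holds gold scores.
from sklearn.svm import SVR
from scipy.stats import pearsonr

def ranking_correlation(X, y):
    model = SVR(kernel="linear").fit(X, y)
    predicted = model.predict(X)
    r, _ = pearsonr(predicted, y)  # correlation of predicted vs. gold rankings
    return r
```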

  15. Future Work • In this project, I looked at a set of structural and behavioral features • The OpinionFinder tool also labels whether a sentence is subjective or objective • A future experiment : find whether there is a correlation between subjective/objective sentences and the score of a post • Does the length of the post matter? • Going forward – consolidate all these features and results in a database and make it available as an open-source dataset

  16. Thank You!
