Diagram: Lu Chen, Amit Sheth, Kno.e.sis Center, and Wright State University, linked by the relations “advisor of”, “has PhD student”, “director of”, and “is in”. I am here…
Extracting What We Think and How We Feel from What We Say in Social Media ---- Subjective Information Extraction
Lu Chen
Kno.e.sis Center, Wright State University
http://cdryan.com/blog/think-feel/
Subjective Information Extraction, Lu Chen
Subjectivity refers to the subject and his or her perspective, feelings, beliefs, and desires. In philosophy, the term is usually contrasted with objectivity. [1]
• Extraction of subjective information:
  • Extracting structured subjective information from unstructured content
  • Allowing computation to be done on “what people think” and “how people feel”
http://fineartamerica.com/featured/its-all-subjective-john-crowther.html
[1] Block, Ned; Flanagan, Owen J.; & Güzeldere, Güven (Eds.) The Nature of Consciousness: Philosophical Debates. Cambridge, MA: MIT Press.
Directions
(Diagram axes: static to dynamic, coarse-grained to fine-grained; label: subjective information)
• From coarse-grained to fine-grained
  • Document level -> sentence level -> expression level
  • General sentiment -> domain-dependent sentiment -> target-dependent sentiment
• Sentiment -> subjective information
  • Sentiment (positive/negative/neutral) -> emotion (happy, sad, angry, surprise, etc.)
  • Other types of subjective information: intent, suggestion/recommendation, wish/expectation, outlook, viewpoint, etc.
• From static to dynamic
  • Our attitude can change during social communication.
  • Modeling, detecting, and tracking the change of attitude
  • What leads to the change of attitude? E.g., a persuasion campaign
Progress
Timeline: Aug 2011, Jan 2012, May 2012
• Understanding and Modeling Emotions with Tweets
• Electoral Prediction
• Discovering Fine-grained Sentiment in Suicide Notes
• Extracting Sentiment Expressions from Twitter
Extracting Diverse Sentiment Expressions With Target-dependent Polarity from Twitter
Lu Chen, Wenbo Wang, Meenakshi Nagarajan, Shaojun Wang, and Amit P. Sheth
• Extracting a diverse and richer set of sentiment-bearing expressions, including formal and slang words/phrases
• Assessing the target-dependent polarity of each sentiment expression
• A novel formulation of assigning polarity to a sentiment expression as a constrained optimization problem over the tweet corpus
Challenges
• Sentiment expressions in tweets can be very diverse.
Figure: Quantitative Study of 3000 Tweets: Distributions of N-grams and Part-of-speech of the Sentiment Expressions
Challenges
• The polarity of a sentiment expression is sensitive to its target.
Examples:
  • “predictable”: predictable movie, predictable stock
  • “long”: long river, long battery life, long time for downloading
Approach
1. Extracting Candidate Expressions
2. Identifying Inter-Expression Relations
3. Assessing Target-dependent Polarity
Extracting Candidate Expressions
• Root word: a word that is considered sentiment-bearing in a general sense.
• Collecting root words from
  • General-purpose sentiment lexicons: MPQA, General Inquirer, and SentiWordNet
  • Slang dictionary: Urban Dictionary
• For each tweet, selecting the “on-target” root words, and extracting all the n-grams that contain at least one selected root word as candidates
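The last bullet can be illustrated with a minimal sketch. It assumes the tweet is already tokenized and that the "on-target" root words have already been selected; the function name `candidate_ngrams`, the `max_n` cutoff, and the example root-word set are hypothetical and are not taken from the slides.

```python
from typing import List, Set


def candidate_ngrams(tokens: List[str], root_words: Set[str], max_n: int = 3) -> List[str]:
    """Return every n-gram (1 <= n <= max_n) that contains at least one root word."""
    candidates = []
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            gram = tokens[start:start + n]
            # Keep the n-gram only if one of its tokens is a selected root word.
            if any(tok in root_words for tok in gram):
                candidates.append(" ".join(gram))
    return candidates


# Hypothetical root-word set; the slides collect root words from lexicons and Urban Dictionary.
roots = {"predictable"}
tokens = "the plot was very predictable".split()
print(candidate_ngrams(tokens, roots, max_n=2))
```

Selecting which root words are "on-target" is a separate step that this sketch does not attempt.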
Identifying Inter-Expression Relations
• Connecting the candidate expressions via two types of inter-expression relations: consistency relation and inconsistency relation
• Basic ideas:
  • A sentiment expression is inconsistent with its negation; two sentiment expressions linked by contrasting conjunctions are likely to be inconsistent.
  • Two adjacent expressions are consistent if they do not overlap, and there is no extra negation applied to them or no contrasting conjunction connecting them.
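One way to read these rules is sketched below. It is a simplification under stated assumptions: the word lists `CONTRAST` and `NEGATION` are illustrative, expressions are given as token spans, and a negation between the two expressions leaves the pair undecided. None of these details are confirmed by the slides.

```python
from typing import List, Optional, Tuple

CONTRAST = {"but", "however", "although", "though", "yet"}  # assumed word list
NEGATION = {"not", "no", "never"}                            # assumed word list

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def relation(tokens: List[str], a: Span, b: Span) -> Optional[str]:
    """Classify the relation between two candidate expressions in one tweet."""
    if a[0] > b[0]:
        a, b = b, a                      # make `a` the earlier expression
    if a[1] > b[0]:
        return None                      # overlapping expressions: no relation
    between = {t.lower() for t in tokens[a[1]:b[0]]}
    if between & CONTRAST:
        return "inconsistent"            # linked by a contrasting conjunction
    if between & NEGATION:
        return None                      # extra negation: left undecided in this sketch
    return "consistent"


tokens = "it was long but it was very good".split()
print(relation(tokens, (2, 3), (6, 8)))  # "long" vs "very good"
```

Counting how often each relation is observed for a pair of expressions across the corpus would give the edge weights of the two relation networks.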
An Example
• I saw The Avengers yesterday evening. It was long but it was very good!
• I do enjoy The Avengers, but it's both overrated and problematic.
• Saw the avengers last night. Mad overrated. Cheesy lines and horrible writing. Very predictable.
• The avengers was good but the plot was just simple minded and predictable.
• The Avengers was good. I was not disappointed.
Assessing Target-dependent Polarity
• For each candidate expression:
  • P-Probability – the probability that the expression indicates positive sentiment
  • N-Probability – the probability that the expression indicates negative sentiment
• For each pair of candidate expressions:
  • Consistency probability – the probability that the two expressions have the same polarity
  • Inconsistency probability – the probability that the two expressions have different polarities
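These definitions can be written out as follows. This is a sketch, assuming that each candidate is either positive or negative and that the polarities of two candidates are independent; the symbols $c_i$, $\Pr^{P}$, $\Pr^{N}$, $\Pr^{cons}$, and $\Pr^{incons}$ are notation chosen here for illustration.

```latex
\begin{align}
\Pr^{P}(c_i) + \Pr^{N}(c_i) &= 1, \\
\Pr^{cons}(c_i, c_j) &= \Pr^{P}(c_i)\,\Pr^{P}(c_j) + \Pr^{N}(c_i)\,\Pr^{N}(c_j), \\
\Pr^{incons}(c_i, c_j) &= \Pr^{P}(c_i)\,\Pr^{N}(c_j) + \Pr^{N}(c_i)\,\Pr^{P}(c_j).
\end{align}
```

Under these assumptions the consistency and inconsistency probabilities of any pair sum to 1.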
An Optimization Model
We want the consistency and inconsistency probabilities derived from the P-Probabilities and N-Probabilities of the candidates to be as close as possible to the expectations suggested by the relation networks.
Objective function: defined over pairs of candidate expressions, where each pair is weighted by the weights of the edges (the frequency of the relations) between the two candidates in the consistency and inconsistency relation networks, and n is the total number of candidate expressions.
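A small sketch of what such an objective could look like is given below. The penalty form (edge weight times the shortfall of the corresponding probability from 1), the function name `objective`, and the grid search are all assumptions made for illustration; a real implementation would use a constrained solver over all candidates.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # indices (i, j) of two candidate expressions


def objective(p: List[float], w_cons: Dict[Pair, float], w_incons: Dict[Pair, float]) -> float:
    """Weighted gap between pairwise probabilities and what the relation networks suggest.

    p[i] is the P-Probability of candidate i; its N-Probability is assumed to be 1 - p[i].
    """
    total = 0.0
    for (i, j) in set(w_cons) | set(w_incons):
        cons = p[i] * p[j] + (1 - p[i]) * (1 - p[j])  # same polarity, assuming independence
        incons = 1.0 - cons                            # different polarities
        total += w_cons.get((i, j), 0.0) * (1.0 - cons)
        total += w_incons.get((i, j), 0.0) * (1.0 - incons)
    return total


# Hypothetical case: candidate 0 is seeded as likely positive (0.9), and it was
# observed 3 times in an inconsistency relation with candidate 1.
w_cons: Dict[Pair, float] = {}
w_incons: Dict[Pair, float] = {(0, 1): 3.0}

# Coarse grid search over candidate 1's P-Probability (a stand-in for a real solver).
best = min((k / 10 for k in range(11)), key=lambda q: objective([0.9, q], w_cons, w_incons))
print(best)
```

In this toy case the search pushes candidate 1 toward negative polarity, which is what an inconsistency edge with a positive candidate should suggest.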
The Example
Evaluation
• Datasets:
  • 168,005 tweets about movies
  • 258,655 tweets about persons
• Gold standard:
  • 1,500 tweets labeled with sentiment expressions and overall polarities for the movie targets
  • 1,500 tweets labeled with sentiment expressions and overall polarities for the person targets
• Baseline methods:
  • MPQA, GI, SWN: For each extracted root word regarding the target, simply look up its polarity in MPQA, General Inquirer, and SentiWordNet, respectively.
  • PROP: a propagation approach proposed by Qiu et al. (2009)
  • COM-const: Assign 0.5 to all the candidates as their initial P-Probabilities.
  • COM-gelex: Initialize the candidates’ polarities according to the root word set.
Reference: Qiu, G.; Liu, B.; Bu, J.; and Chen, C. 2009. Expanding domain sentiment lexicon through double propagation. In Proc. of IJCAI.
Application
Relevance of User Groups Based on Demographics and Participation to Social Media Based Prediction -- A Case Study of 2012 U.S. Republican Presidential Primaries
Lu Chen, Wenbo Wang, and Amit P. Sheth
1. Providing a detailed analysis of the social media users on different dimensions
2. Estimating the “vote” of each user by analyzing his/her tweets, and predicting the results based on “vote-counting”
3. Examining the predictive power of different user groups in predicting the results of Super Tuesday races in 10 states
Existing studies on predicting election results assume that all users should be treated equally. How do different groups of users differ in predicting election results?
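The "vote-counting" idea in item 2 can be sketched as follows. This is an illustration only: the per-user scores, the rule of picking the highest-scoring candidate, and the names `estimate_vote` and `vote_shares` are assumptions, since the slides do not say how a user's vote is estimated from tweets.

```python
from collections import Counter
from typing import Dict, Optional


def estimate_vote(scores: Dict[str, float]) -> Optional[str]:
    """Pick the candidate with the highest score for one user; None if there are no scores."""
    if not scores:
        return None
    # On ties, the candidate listed first in the dict wins (an arbitrary choice).
    return max(scores, key=lambda cand: scores[cand])


def vote_shares(user_scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Count one vote per user and return each candidate's share of the counted votes."""
    votes: Counter = Counter()
    for scores in user_scores.values():
        vote = estimate_vote(scores)
        if vote is not None:
            votes[vote] += 1
    total = sum(votes.values())
    return {cand: n / total for cand, n in votes.items()} if total else {}


# Hypothetical users and candidates "A" and "B"; user u4 has no usable tweets.
users = {
    "u1": {"A": 3.0, "B": 1.0},
    "u2": {"B": 2.0},
    "u3": {"A": 1.0},
    "u4": {},
}
print(vote_shares(users))
```

Restricting `users` to one user group before calling `vote_shares` would give that group's prediction, which is how groups could be compared against each other.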
User Categorization
• Content Type
• Location
• Engagement Degree
• Tweet Mode
• Political Preference
Electoral Prediction with Different User Groups
• Revealing the challenge of identifying the vote intent of the “silent majority”
• Retweets may not necessarily reflect users' attitudes.
• The right-leaning user group provides the most accurate prediction result. In the best case (56-day time window), it correctly predicts the winners in 8 out of 10 states with an average prediction error of 0.1. To some extent, this demonstrates the importance of identifying likely voters in electoral prediction.
• Predicting a user’s vote from more opinion tweets is not necessarily more accurate than predicting it from more information tweets.
Emotion
• Discovering Fine-grained Sentiment in Suicide Notes: Classify each sentence from suicide notes into 15 emotional categories, e.g., love, pride, guilt, blame, hopelessness, etc.
• Emotion Identification from Twitter Data: 7 emotion categories, including joy, sadness, anger, love, fear, thankfulness, and surprise
  • Can we automatically create a large emotion dataset with high quality labels from Twitter? How?
  • What features can effectively improve the performance of supervised machine learning algorithms?
  • How much performance will be gained by increasing the size of the training data?
  • Can the system developed on Twitter data be directly applied to identify emotions from other datasets?
What’s next?
(Diagram axes: static to dynamic, coarse-grained to fine-grained; label: subjective information)
• Detecting the change of attitude during persuasive communication
• Discriminating other types of subjective information from sentiment, e.g., wish, intent
Thank you!