Subjectivity and Sentiment Analysis of Arabic Tweets with Limited Resources

Subjectivity and Sentiment Analysis of Arabic Tweets with Limited Resources Supervisor Dr. Verena Rieser Presented By EshragRefaee OSACT 27 May 2014

Outline 1. Introduction • The concept of subjectivity and sentiment analysis (SSA) • Motivations and challengesof SSA for Arabic • Previous work on SSA of Arabic social networks 2. Experimental setup • Twitter corpus: collection and annotation • Evaluation metrics • Machine learners 3. Results and Error Analysis 4. Summary and future work

Subjectivity and Sentiment analysis (SSA) • Definition: Analysing and understanding people’s sentiments, evaluations, opinions, attitudes, and emotions from written text.

Hierarchical Model of Subjectivity and Sentiment analysis (SSA)

SSA andSocial Networks • The growing importance of sentiment analysis coincides with the growth of social media such as micro-blogs.

SSA andTwitter • Twitter (Statistic Brain, 2014) • March 2012, Twitter now available in Arabic (Twitter Blog, 2012)

About Arabic • Arabic is the language of over 422 million people • Arabic language can be classified into three major levels (Habash, 2010): • Classic Arabic (CA) • Modern standard Arabic (MSA) • Arabic Dialects (AD). Used in social networks side-by-side

Challenges with Respect to Arabic • Limited availability of NLP resources for DA. • Noisy features. • No large-scale Arabic Twitter corpus annotated for SSA publically available. • Sparse labelled data. • BUT: Lots of unlabelled data!

Challenges With Respect to Twitter • ‘Bad language’ (Eisenstein, J. 2013) • Unclear sentiment indicator • Dynamic nature/ topic-shifting (Go et al, 2009). المساواة في قمع الحريات الشخصية عدل Equality in supressing personal freedom is justice ew , ugh instead of disgusting bro instead of brother

Previous Work on SSA of Arabic Tweets • Word-based features. • SVMshown to perform best (large feature sets) • Evaluation: • 10-fold cross-validation • Held-out test set from same corpus •  No test for unseen topics/ scalability for topic shift!

Outline 1. Introduction • Motivations and challengesof subjectivity and sentiment analysis (SSA) for Arabic • Previous work on SSA of Arabic social networks 2. Experimental setup • Twitter corpus: collection and annotation • Evaluation metrics • Machine learners 3. Results and Error Analysis 4. Summary and future work

Methodology and Approach Features Human annotators Gold-standard labelled tweets Un-labelled tweets Arabic ALP tools Model evaluation Manually-annotated held-out test set Train machine learning scheme: SVM classifier

Arabic Twitter SSA Corpora • Data Collection • Twitter Search Application Programming Interface (API) • Search criteria • Keywords, locations, etc. • Pre-processing • Normalising user-names, URLS, digits, query-terms.

Arabic Twitter SSA Corpora: Gold Standard Data Set • Manually annotated for sentiment analysis (total=3,309) • 2 native speaker annotators (weighted Kappa=0.76)

Arabic Twitter SSA Corpora: Held-out Test Set • 963 tweets were manually annotated for evaluating the trained models.

Arabic Twitter SSA Corpora • Examples of annotated tweets

Features Extraction

SSA Classification:Problem Formulations

Machine Learning Classifiers • Support Vector Machines (SVM): Sequential Minimal Optimization-SMO (Platt, 1999) • Majority baseline: ZeroR SVM aims to identify the Optimal hyperplanethat linearly separates data instances with the maximum margin (Hsu et al, 2003)

Evaluation Metrics • F-measure • Accuracy: • Significant differences: T-test with p<0.05

Outline 1. Introduction • Motivations and challengesof subjectivity and sentiment analysis (SSA) for Arabic • Previous work on SSA of Arabic social networks 2. Experimental setup • Twitter corpus: collection and annotation • Evaluation metrics • Machine learners 3. Results and Error Analysis 4. Summary and future work

Results and evaluation

Error Analysis: • The most predictive word uni-grams in the two datasets as evaluated by Chi-Squared

Error Analysis • The most predictive word uni-grams in the two datasets as evaluated by Chi-Squared

Current Work Noisy labels: #hashtags & A large-scale Arabic Twitter SSA Corpus: DISTANT supervision (DS) data set Un-labelled tweets Features Automatically-labelled tweets Arabic ALP tools Model evaluation: Manually-annotated test set Train machine learning scheme: Learn SVM classifier **Refaee and Rieser (2014). Can we Read Emotions from a smiley face? Emoticon-based distant supervision for subjectivity and sentiment analysis of Arabic Twitter feeds. In the 5th International Workshop on Emotion, Social Signals, Sentiment and Linked Open Data.

Please come and see my poster on May 29, Time 11:45-13:25 Session: social media processing P 32 No. 317

Thanks Looking forward to hear your feedback … Or contact me through Eaar1@hw.ac.uk @eshragR

DS for SSA of social networks in other languages

Example of annotation disagreement

Methodology and Approach Noisy labels: #hashtags & Features Un-labelled tweets Arabic ALP tools Automatically-labelled tweets Model evaluatio: Manually-annotated test set Train machine learning scheme: Learn SVM classifier

Approach and methodology

Experimental settings • Pre-processing Remove re-tweets Normalize Latin characters , digits, URLs, user-names, hashtags Replace > 2 repetitive characters consecutively with only 2 Apply light Arabic stemmer Remove stop words • Problem formulations Two-stage binary classification: subjective vs. objective; positive vs. negative One-stage multi-class classification: positive vs. negative vs. neutral

DS for SSA of social networks in other languages

DS for SSA of social networks in other languages 73.81%

Subjectivity and Sentiment Analysis of Arabic Tweets with Limited Resources

Subjectivity and Sentiment Analysis of Arabic Tweets with Limited Resources

Presentation Transcript

Sentiment Analysis

Manual and Automatic Subjectivity and Sentiment Analysis

Subjectivity and Sentiment Analysis

Sentiment Analysis of Arabic Social Networks

Sentiment Analysis

Subjectivity and Sentiment Analysis: from Words to Discourse

Sentiment Analysis and Subjectivity

Sentiment Analysis

Subjectivity and Sentiment Analysis: from Words to Discourse

Sentiment Analysis

Sentiment Analysis

Sentiment analysis

Sentiment Analysis

Subjectivity and Sentiment Analysis

Manual and Automatic Subjectivity and Sentiment Analysis

CS3730 Fall 2008 Subjectivity and Sentiment Analysis

Sentiment Analysis

Manual Subjectivity Analysis

Subjectivity and Sentiment Analysis: from Words to Discourse

Subjectivity and Sentiment Analysis

Sentiment Analysis