Domain-based Lexicon Enhancement for Sentiment Analysis
A. Muhammad, N. Wiratunga, R. Lothian, R. Glassey
IDEAS Research Institute, Robert Gordon University, Aberdeen

Introduction
• Sentiment Classification
• Sentiment Analysis
  • A wider task, involves identification of
    • Object/Aspects
    • Opinion holder
    • Time
[Diagram: Text, Sentiment Classification]
BCS-SGAI-SMA-2013, Cambridge UK

Sentiment Classification
• Machine Learning
Training examples:
  The movie is good : +
  The movie is horrible : -
  I don’t like the movie : -
  I love the movie : +
  …
Classifier (e.g. NB, SVMs) → Model
New text: The movie is nice : ?
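
The machine learning route on this slide can be sketched with a tiny multinomial Naive Bayes classifier trained on the four example sentences. This is a minimal illustration in plain Python, not the classifier setup used in the talk; the whitespace tokenizer, the add-one smoothing, and the choice to skip unseen words are all assumptions.

```python
import math
from collections import Counter, defaultdict

# The four labelled sentences from the slide.
TRAIN = [
    ("The movie is good", "+"),
    ("The movie is horrible", "-"),
    ("I don't like the movie", "-"),
    ("I love the movie", "+"),
]

def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a toy example.
    return text.lower().split()

def train_nb(examples):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab

def classify_nb(model, text):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_logp = None, -math.inf
    for label in class_docs:
        logp = math.log(class_docs[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # unseen words carry no evidence in this sketch
            logp += math.log((word_counts[label][tok] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

MODEL = train_nb(TRAIN)
```

Note that "nice" never appears in the training data, so a model this small has no real evidence for "The movie is nice"; its answer there depends only on the remaining words and the smoothing.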

Sentiment Classif… Cont’d
• Lexicon-Based
[Diagram: Contextual analysis/Aggregation]
The movie is nice : ?
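
A lexicon-based classifier of the kind named on this slide can be sketched as a lookup followed by aggregation. The lexicon scores, the negator list, and the rule that a negator flips the next sentiment-bearing word are hypothetical simplifications chosen for illustration, not the contextual analysis used in the talk.

```python
# Hypothetical prior polarity scores; a real system would load a sentiment lexicon.
LEXICON = {
    "good": 0.6, "nice": 0.5, "love": 0.8, "like": 0.4,
    "horrible": -0.8, "bad": -0.6,
}
# Hypothetical negator list for a very simple form of contextual analysis.
NEGATORS = {"not", "don't", "never", "no"}

def lexicon_classify(text):
    """Sum the scores of known words, flipping the word that follows a negator."""
    total = 0.0
    negate = False
    for tok in text.lower().split():
        if tok in NEGATORS:
            negate = True
            continue
        score = LEXICON.get(tok, 0.0)
        if score != 0.0:
            total += -score if negate else score
            negate = False  # a negator affects only the next sentiment word
    if total > 0:
        return "+"
    if total < 0:
        return "-"
    return "?"  # no sentiment evidence found
```

Unlike the learned model, this approach can label "The movie is nice" as long as "nice" is in the lexicon, which is why lexicon coverage matters.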

Lexicon Generation
Manual, Corpus, Dictionary
• Could be too narrow
• Could be too general
• This movie is fantastic
• Ugh!! this movie sucks!

Sentiment Lexicons
• Dictionary-based: SentiWordNet (Baccianella et al., 2010)
• Corpus-based
  • Generated from target domain
  • Existing approaches rely on well-formed spelling/grammar
[Diagram: Corpus words: horrible, happy, affordable, rubbish, enjoyable, mad, …]
[Diagram: Seed words: good, bad, terrible, nice, excellent, poor]
[Diagram: Corpus words are linked to seed words through "and"/"but" conjunctions (Hatzivassiloglou and McKeown, 1997) or co-occurrence (Turney, 2002)]
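
The seed-and-conjunction idea in the diagram can be sketched as follows: a word joined to a known word by "and" inherits its polarity, and a word joined by "but" gets the opposite polarity. This is a heavily simplified illustration in the spirit of the conjunction approach, not a reimplementation of it; the toy corpus is invented, and the sketch assumes the words directly on either side of the conjunction are the relevant adjectives.

```python
# Seed polarities taken from the slide's seed list (+1 positive, -1 negative).
SEEDS = {"good": 1, "nice": 1, "excellent": 1, "bad": -1, "terrible": -1, "poor": -1}

# Hypothetical toy corpus; each sentence joins two adjectives with "and" or "but".
CORPUS = [
    "the hotel was nice and affordable",
    "affordable but rubbish food",
    "rubbish and horrible service",
    "the ending was poor but enjoyable",
]

def expand_lexicon(sentences, seeds):
    """Propagate polarity across 'X and Y' (same) and 'X but Y' (opposite)."""
    lexicon = dict(seeds)
    changed = True
    while changed:  # repeat until no new word can be labelled
        changed = False
        for sentence in sentences:
            toks = sentence.lower().split()
            for i in range(1, len(toks) - 1):
                if toks[i] not in ("and", "but"):
                    continue
                left, right = toks[i - 1], toks[i + 1]
                sign = 1 if toks[i] == "and" else -1
                for known, unknown in ((left, right), (right, left)):
                    if known in lexicon and unknown not in lexicon:
                        lexicon[unknown] = sign * lexicon[known]
                        changed = True
    return lexicon

EXPANDED = expand_lexicon(CORPUS, SEEDS)
```

The sketch depends on clean tokens around the conjunction, which hints at why such approaches struggle with the informal spelling and grammar found in tweets.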

Corpus-based lexicon
• Distant-Supervision (Read, 2005; Go et al., 2009)
  • Automated approach for labelling
  • Based on appearance of positive and negative emoticons
• I’m at work today
• I’m happy with chocolate on vday
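
Emoticon-based distant supervision can be sketched as a small labelling function. The emoticon sets, the rule of discarding tweets with mixed or missing emoticons, and the example tweets in the usage note are assumptions made for illustration.

```python
# Assumed emoticon sets; the exact emoticons used in the talk are not known here.
POSITIVE = {":)", ":-)", ":D"}
NEGATIVE = {":(", ":-("}

def distant_label(tweet):
    """Return (text_without_emoticons, label), or None if the tweet is unusable."""
    toks = tweet.split()
    has_pos = any(t in POSITIVE for t in toks)
    has_neg = any(t in NEGATIVE for t in toks)
    if has_pos == has_neg:
        return None  # no emoticon, or conflicting emoticons
    # Remove the emoticons so the label does not leak into the text.
    text = " ".join(t for t in toks if t not in POSITIVE and t not in NEGATIVE)
    return (text, "+" if has_pos else "-")
```

For example, under these assumptions `distant_label("stuck at work today :(")` yields a negative example and `distant_label("chocolate on vday :)")` a positive one, with no manual annotation involved.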

Scores Generation
• Proportion-based
• Scores are compatible with SentiWordNet
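
One plausible reading of "proportion-based" is that a term's positive score is the share of its occurrences that fall in positively labelled tweets, and likewise for the negative score. The sketch below assumes that reading; the exact formula from the talk is not reproduced here. Scores of this form lie in [0, 1], the same range as SentiWordNet's positive and negative scores.

```python
from collections import Counter

def proportion_scores(labelled_tweets):
    """Map each term to (pos_score, neg_score) based on occurrence proportions."""
    pos_counts, neg_counts = Counter(), Counter()
    for text, label in labelled_tweets:
        target = pos_counts if label == "+" else neg_counts
        target.update(text.lower().split())
    scores = {}
    for term in set(pos_counts) | set(neg_counts):
        total = pos_counts[term] + neg_counts[term]
        scores[term] = (pos_counts[term] / total, neg_counts[term] / total)
    return scores

# Hypothetical distant-supervision output used only to exercise the function.
TWEETS = [
    ("happy with chocolate", "+"),
    ("happy day", "+"),
    ("stuck at work", "-"),
    ("not happy at work", "-"),
]
SCORES = proportion_scores(TWEETS)
```

In this toy data "happy" appears twice in positive tweets and once in a negative one, so its scores are two thirds positive and one third negative.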

Integration with SentiWordNet
• General scores are extracted from SentiWordNet
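
The slide does not state how general and domain scores are merged, so the following is only one possible strategy: a weighted average when a term has both scores, and a fallback to whichever score exists otherwise. The `weight` parameter and the small dictionaries standing in for SentiWordNet and the domain lexicon are hypothetical.

```python
def combine_scores(general, domain, weight=0.5):
    """Merge two {term: (pos, neg)} lexicons; `weight` is the share given to general scores."""
    combined = {}
    for term in set(general) | set(domain):
        if term in general and term in domain:
            combined[term] = tuple(
                weight * g + (1 - weight) * d
                for g, d in zip(general[term], domain[term])
            )
        elif term in general:
            combined[term] = general[term]
        else:
            combined[term] = domain[term]
    return combined

# Hypothetical scores: GENERAL stands in for SentiWordNet, DOMAIN for the tweet lexicon.
GENERAL = {"good": (0.75, 0.0), "sick": (0.0, 0.75)}
DOMAIN = {"sick": (0.5, 0.25), "lol": (1.0, 0.0)}
COMBINED = combine_scores(GENERAL, DOMAIN)
```

Here a term such as "sick" keeps some of its general negative polarity while being pulled toward its domain usage, and a domain-only term such as "lol" is still covered.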

Evaluation
• 20,000 distant-supervision tweets used to:
  • Generate domain lexicon
  • Train machine learning classifiers
    • For comparison
• 359 hand-labelled tweets used for evaluation
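
Scoring a classifier against a hand-labelled set reduces to computing accuracy, as in the minimal sketch below. The `toy_classifier` and the four labelled examples are hypothetical placeholders, not the 359-tweet evaluation set.

```python
def accuracy(classify, labelled):
    """Fraction of (text, gold_label) pairs that `classify` labels correctly."""
    if not labelled:
        raise ValueError("evaluation set is empty")
    correct = sum(1 for text, gold in labelled if classify(text) == gold)
    return correct / len(labelled)

def toy_classifier(text):
    # Hypothetical stand-in: positive only if the word "good" is present.
    return "+" if "good" in text.lower().split() else "-"

# Hypothetical hand-labelled examples.
GOLD = [
    ("good film", "+"),
    ("bad film", "-"),
    ("not good at all", "-"),
    ("great fun", "+"),
]
TOY_ACCURACY = accuracy(toy_classifier, GOLD)
```

The same function can score any of the approaches, lexicon-based or learned, as long as each is wrapped as a function from text to label.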

Evaluation Cont’d
• Individual lexicons vs Combined
  • General < Domain < Combined
  • Difference not significant between Domain and Combined
• Machine learning vs Combined
  • SVM < NB < LogReg < Combined
  • Difference not significant between LogReg and Combined

Evaluation Cont’d
• Varying data sizes
  • Performance improves with increasing size for all except SVM

Conclusions
• Sentiment lexicon is generated using distant-supervision
• Sentiment classification improves with combination of domain-dependent and domain-independent lexicons
• Accuracy of the combination is higher than that of the machine learning classifiers (difference from LogReg not significant)

Future work
• Lexicon refinement
• Improve aggregation strategy
• Extend approach to other social media platforms
• Extend distant-supervision to neutral labelling
• Experiment with ‘big data’

Thank you for listening!