Experimental Study on Sentiment Classification of Chinese Review using Machine Learning Techniques Jun Li and Maosong Sun Department of Computer Science and Technology Tsinghua University, Beijing, China IEEE NLP-KE 2007
Outline • Introduction • Corpus • Features • Performance Comparison • Analysis and Conclusion
Introduction • Why do we perform this task? • Much of the attention so far has centered on feature-based sentiment extraction • Sentence-level analysis is useful, but it involves complex processing and is usually format dependent (Liu et al., WWW 2005) • Sentiment classification using machine learning techniques • based on the overall sentiment of a text • transfers easily to new domains given a training set • Applications: • Split reviews into positive and negative sets • Monitor bloggers' mood trends • Filter subjective web pages
Corpus • From www.ctrip.com • Average length 69.6 words, with standard deviation 89.0 • 90% of the reviews are shorter than 155 words • Some reviews include English words
Review rating distribution & score threshold • Ratings of 4.5 and up are considered positive; 2.0 and below are considered negative (see the sketch below) • 12,000 reviews as the training set, 4,000 reviews as the test set
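The rating-to-label rule is simple enough to state as code. A minimal sketch, assuming the in-between ratings are simply discarded as ambiguous (the function name and return convention are ours, not the paper's):

```python
def label_review(rating):
    """Map a ctrip.com star rating to a sentiment label
    using the thresholds above."""
    if rating >= 4.5:
        return "positive"
    if rating <= 2.0:
        return "negative"
    return None  # mid-range rating: assumed excluded from the corpus
```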
Features – text representation • Text representation schemes (see the sketch below) • Word-Based Unigram (WBU), widely used • Word-Based Bigram (WBB) • Chinese Character-Based Bigram (CBB) • Chinese Character-Based Trigram (CBT) Table 1. Statistics of the training set under the four text representation schemes
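To make the four schemes concrete, here is a minimal sketch of each feature extractor. It assumes WBU and WBB operate on an already word-segmented review (the Chinese word segmenter itself is not shown), while CBB and CBT operate directly on the raw character string:

```python
def word_unigrams(words):   # WBU: each segmented word is a feature
    return list(words)

def word_bigrams(words):    # WBB: adjacent word pairs
    return [w1 + w2 for w1, w2 in zip(words, words[1:])]

def char_bigrams(text):     # CBB: adjacent character pairs
    return [text[i:i + 2] for i in range(len(text) - 1)]

def char_trigrams(text):    # CBT: adjacent character triples
    return [text[i:i + 3] for i in range(len(text) - 2)]
```

For example, char_bigrams("服务很好") yields ["服务", "务很", "很好"].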
Features – representation in a graph model [Figure: feature representation (n = 2) in a graph model, with a document node D, feature nodes f1 … fk−1, and token nodes x1 … xk]
Performance Comparison - methods • Support Vector Machines (SVM) • Naïve Bayes (NB) • Maximum Entropy (ME) • Artificial Neural Network (ANN) • two-layer feed-forward • Baseline: Naive Counting (see the sketch below) • Predicts by comparing the numbers of positive and negative sentiment words • Heavily depends on the sentiment dictionary • micro-averaging F1 0.7931, macro-averaging F1 0.7573
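A minimal sketch of the Naive Counting baseline, assuming two hypothetical sentiment word lists standing in for the dictionary the method depends on (the tie-break toward positive is our choice; the slide does not specify it):

```python
POS_WORDS = {"好", "干净", "满意"}  # illustrative entries, not the paper's dictionary
NEG_WORDS = {"差", "脏", "失望"}

def naive_counting(text):
    """Predict by comparing counts of positive vs. negative sentiment words."""
    pos = sum(text.count(w) for w in POS_WORDS)
    neg = sum(text.count(w) for w in NEG_WORDS)
    return "positive" if pos >= neg else "negative"
```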
Performance Comparison - WBU SVM, NB, ME, ANN using WBU as features with different feature weights
Performance Comparison - WBU Four methods using WBU as features
Performance Comparison - WBB Four methods using WBB as features
Performance Comparison – CBB & CBT Four methods using CBB as features Four methods using CBT as features
Analysis • On average, NB outperforms all the other classifiers when using WBB and CBT • N-gram features relax the conditional independence assumption of the Naive Bayes model (see the sketch below) • They capture more integral semantic content • People tend to use combinations of words to express positive and negative sentiment
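To see why, consider 不好 ("not good"): with unigram features NB scores 不 and 好 independently, and 好 alone looks positive, whereas with bigram features 不好 is a single feature with its own class-conditional probability. A minimal multinomial NB sketch over such feature lists (ours, with Laplace smoothing; not the authors' implementation):

```python
from collections import Counter
import math

def train_nb(docs_by_class):
    """docs_by_class: {label: [feature_list, ...]}, e.g. lists of bigrams."""
    total = sum(len(docs) for docs in docs_by_class.values())
    vocab = {f for docs in docs_by_class.values() for d in docs for f in d}
    priors, cond = {}, {}
    for label, docs in docs_by_class.items():
        priors[label] = math.log(len(docs) / total)
        counts = Counter(f for d in docs for f in d)
        denom = sum(counts.values()) + len(vocab)  # Laplace smoothing
        cond[label] = {f: math.log((counts[f] + 1) / denom) for f in vocab}
    return priors, cond

def classify(feats, priors, cond):
    def score(label):
        return priors[label] + sum(cond[label][f] for f in feats if f in cond[label])
    return max(priors, key=score)
```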
Conclusion • (1) On average, NB outperforms all the other classifiers when using WBB or CBT as the text representation scheme with Boolean weighting, under the different feature dimensionalities produced by chi-max reduction, and it is more stable than the others. • (2) Compared with WBU, WBB and CBB carry stronger meaning as semantic units for the classifiers. • (3) In most cases, tfidf-c works much better for SVM and ME (see the sketch below). • (4) Given that SVM achieves the best performance under all conditions and is the most popular method, we recommend representing text with WBB or CBB and weighting features with tfidf-c to obtain better performance than with WBU.
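For reference, a minimal sketch of the two feature weightings named above. We read "bool" as simple presence/absence and "tfidf-c" as tf·idf with cosine (L2) normalization; that reading of tfidf-c is an assumption, so check the paper for the exact formula:

```python
from collections import Counter
import math

def bool_weights(doc_feats, vocab):
    """Boolean weighting: 1 if the feature occurs in the document, else 0."""
    present = set(doc_feats)
    return [1.0 if f in present else 0.0 for f in vocab]

def tfidf_c_weights(doc_feats, vocab, df, n_docs):
    """tf*idf with cosine normalization (our reading of "tfidf-c").
    df maps each feature to its document frequency in the training set."""
    tf = Counter(doc_feats)
    raw = [tf[f] * math.log(n_docs / df[f]) if f in df else 0.0 for f in vocab]
    norm = math.sqrt(sum(w * w for w in raw)) or 1.0
    return [w / norm for w in raw]
```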
Thank you! Q & A Dataset and software are available at http://nlp.csai.tsinghua.edu.cn/~lj/