Experimental Study on Sentiment Classification of Chinese Review using Machine Learning Techniques Jun Li and Maosong Sun Department of Computer Science and Technology Tsinghua University, Beijing, China IEEE NLP-KE 2007
Outline • Introduction • Corpus • Features • Performance Comparison • Analysis and Conclusion
Introduction • Why do we perform this task? • Much of the attention so far has centered on feature-based sentiment extraction • Sentence-level analysis is useful, but it involves complex processing and is usually format dependent (Liu et al., WWW 2005) • Sentiment classification using machine learning techniques • based on the overall sentiment of a text • transfers easily to new domains given a training set • Applications: • Split reviews into positive and negative sets • Monitor bloggers' mood trends • Filter subjective web pages
Corpus • From www.ctrip.com • Average length 69.6 words, with standard deviation 89.0 • 90% of the reviews are shorter than 155 words • Some reviews include English words
Review rating distribution & score threshold • Ratings of 4.5 and up are considered positive; 2.0 and below are considered negative (see the sketch below) • 12,000 reviews as the training set, 4,000 reviews as the test set
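The rating-to-label rule is simple enough to state as code. A minimal sketch, assuming the in-between ratings are simply discarded as ambiguous (the function name and return convention are ours, not the paper's):

```python
def label_review(rating):
    """Map a ctrip.com star rating to a sentiment label
    using the thresholds above."""
    if rating >= 4.5:
        return "positive"
    if rating <= 2.0:
        return "negative"
    return None  # mid-range rating: assumed excluded from the corpus
```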
Features – text representation • Text representation schemes (see the sketch below) • Word-Based Unigram (WBU), widely used • Word-Based Bigram (WBB) • Chinese Character-Based Bigram (CBB) • Chinese Character-Based Trigram (CBT) Table 1. Statistics of the training set under the four text representation schemes
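To make the four schemes concrete, here is a minimal sketch of each feature extractor. It assumes WBU and WBB operate on an already word-segmented review (the Chinese word segmenter itself is not shown), while CBB and CBT operate directly on the raw character string:

```python
def word_unigrams(words):   # WBU: each segmented word is a feature
    return list(words)

def word_bigrams(words):    # WBB: adjacent word pairs
    return [w1 + w2 for w1, w2 in zip(words, words[1:])]

def char_bigrams(text):     # CBB: adjacent character pairs
    return [text[i:i + 2] for i in range(len(text) - 1)]

def char_trigrams(text):    # CBT: adjacent character triples
    return [text[i:i + 3] for i in range(len(text) - 2)]
```

For example, char_bigrams("服务很好") yields ["服务", "务很", "很好"].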
Features – representation in a graph model [Figure: feature representation (n = 2) in a graph model, with a document node D, feature nodes f1 … fk−1, and token nodes x1 … xk]
Performance Comparison - methods • Support Vector Machines (SVM) • Naïve Bayes (NB) • Maximum Entropy (ME) • Artificial Neural Network (ANN) • two-layer feed-forward • Baseline: Naive Counting (see the sketch below) • Predicts by comparing the numbers of positive and negative sentiment words • Heavily depends on the sentiment dictionary • micro-averaging F1 0.7931, macro-averaging F1 0.7573
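A minimal sketch of the Naive Counting baseline, assuming two hypothetical sentiment word lists standing in for the dictionary the method depends on (the tie-break toward positive is our choice; the slide does not specify it):

```python
POS_WORDS = {"好", "干净", "满意"}  # illustrative entries, not the paper's dictionary
NEG_WORDS = {"差", "脏", "失望"}

def naive_counting(text):
    """Predict by comparing counts of positive vs. negative sentiment words."""
    pos = sum(text.count(w) for w in POS_WORDS)
    neg = sum(text.count(w) for w in NEG_WORDS)
    return "positive" if pos >= neg else "negative"
```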
Performance Comparison - WBU SVM, NB, ME, ANN using WBU as features with different feature weights
Performance Comparison - WBU Four methods using WBU as features
Performance Comparison - WBB Four methods using WBB as features
Performance Comparison – CBB & CBT Four methods using CBB as features Four methods using CBT as features
Analysis • On average, NB outperforms all the other classifiers when using WBB and CBT • N-gram features relax the conditional independence assumption of the Naive Bayes model (see the sketch below) • They capture more integral semantic content • People tend to use combinations of words to express positive and negative sentiment
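To see why, consider 不好 ("not good"): with unigram features NB scores 不 and 好 independently, and 好 alone looks positive, whereas with bigram features 不好 is a single feature with its own class-conditional probability. A minimal multinomial NB sketch over such feature lists (ours, with Laplace smoothing; not the authors' implementation):

```python
from collections import Counter
import math

def train_nb(docs_by_class):
    """docs_by_class: {label: [feature_list, ...]}, e.g. lists of bigrams."""
    total = sum(len(docs) for docs in docs_by_class.values())
    vocab = {f for docs in docs_by_class.values() for d in docs for f in d}
    priors, cond = {}, {}
    for label, docs in docs_by_class.items():
        priors[label] = math.log(len(docs) / total)
        counts = Counter(f for d in docs for f in d)
        denom = sum(counts.values()) + len(vocab)  # Laplace smoothing
        cond[label] = {f: math.log((counts[f] + 1) / denom) for f in vocab}
    return priors, cond

def classify(feats, priors, cond):
    def score(label):
        return priors[label] + sum(cond[label][f] for f in feats if f in cond[label])
    return max(priors, key=score)
```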
Conclusion • (1) On average, NB outperforms all the other classifiers when using WBB or CBT as the text representation scheme with Boolean weighting, under the different feature dimensionalities produced by chi-max reduction, and it is more stable than the others. • (2) Compared with WBU, WBB and CBB carry stronger meaning as semantic units for the classifiers. • (3) In most cases, tfidf-c works much better for SVM and ME (see the sketch below). • (4) Given that SVM achieves the best performance under all conditions and is the most popular method, we recommend representing text with WBB or CBB and weighting features with tfidf-c to obtain better performance than with WBU.
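For reference, a minimal sketch of the two feature weightings named above. We read "bool" as simple presence/absence and "tfidf-c" as tf·idf with cosine (L2) normalization; that reading of tfidf-c is an assumption, so check the paper for the exact formula:

```python
from collections import Counter
import math

def bool_weights(doc_feats, vocab):
    """Boolean weighting: 1 if the feature occurs in the document, else 0."""
    present = set(doc_feats)
    return [1.0 if f in present else 0.0 for f in vocab]

def tfidf_c_weights(doc_feats, vocab, df, n_docs):
    """tf*idf with cosine normalization (our reading of "tfidf-c").
    df maps each feature to its document frequency in the training set."""
    tf = Counter(doc_feats)
    raw = [tf[f] * math.log(n_docs / df[f]) if f in df else 0.0 for f in vocab]
    norm = math.sqrt(sum(w * w for w in raw)) or 1.0
    return [w / norm for w in raw]
```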
Thank you! Q & A Dataset and software are available at http://nlp.csai.tsinghua.edu.cn/~lj/