Sentiment Analyzer Using a Multi-Level Classifier
Tommy Tsai and Sam Yam
CS224N Final Project, Spring 2006
Motivation
• Humans can infer a document’s sentiment orientation from high-level features
  • Sentiment of subjective sentences only
  • Local sentiments of select portions of the document
  • Length of document (domain-specific)
• Domain of documents analyzed: 2000 IMDB movie reviews
  • 1000 positive, 1000 negative (classified by hand)
• Hard
  • Most mention both positive and negative aspects
  • Many are written in a sarcastic tone
Architecture
• Three stages
  • Subjectivity filter
    • N-gram-based classifier
    • Input: Full review
    • Output: Filtered review
  • Sentence-level sentiment classifier
    • Also N-gram-based
    • Input: Filtered review
    • Output: Positive/negative classification and scores for each sentence
  • Document-level sentiment classifier
    • Support vector machine classifier
    • Inputs: Filtered review + scores and classification for each sentence
    • Output: Positive/negative classification for the entire review
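The data flow between the three stages can be sketched roughly as follows. This is only an illustration of the inputs and outputs listed above: the function names, the naive sentence splitter, and the keyword-based toy stand-ins are all assumptions, not the authors' N-gram or SVM models.

```python
from typing import Callable, List, Tuple

# All names here are hypothetical; the slides give no implementation.
SentenceScore = Tuple[str, float]  # (label, score) for one sentence


def split_sentences(review: str) -> List[str]:
    # Naive period-based splitter, for illustration only.
    return [s.strip() for s in review.split(".") if s.strip()]


def run_pipeline(
    review: str,
    is_subjective: Callable[[str], bool],
    score_sentence: Callable[[str], SentenceScore],
    classify_document: Callable[[List[str], List[SentenceScore]], str],
) -> str:
    # Stage 1: subjectivity filter (full review -> filtered review).
    filtered = [s for s in split_sentences(review) if is_subjective(s)]
    # Stage 2: sentence-level classification and score per sentence.
    sentence_results = [score_sentence(s) for s in filtered]
    # Stage 3: document-level decision from filtered text + sentence results.
    return classify_document(filtered, sentence_results)


# Toy stand-ins so the sketch runs; real stages would be trained models.
def toy_is_subjective(sentence: str) -> bool:
    words = ("great", "awful", "loved", "boring")
    return any(w in sentence.lower() for w in words)


def toy_score_sentence(sentence: str) -> SentenceScore:
    positive = any(w in sentence.lower() for w in ("great", "loved"))
    return ("positive", 1.0) if positive else ("negative", 1.0)


def toy_classify_document(
    sentences: List[str], results: List[SentenceScore]
) -> str:
    if not results:
        return "negative"
    positives = sum(1 for label, _ in results if label == "positive")
    return "positive" if positives * 2 >= len(results) else "negative"


review = (
    "The film runs two hours. I loved the cast. "
    "The plot was great. The ending was boring."
)
label = run_pipeline(
    review, toy_is_subjective, toy_score_sentence, toy_classify_document
)
```

Passing the stages in as callables keeps the sketch independent of any particular classifier, so each stand-in could be swapped for a trained model without changing the flow.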
Stage 3 SVM Classifier
• Support vector machines
  • Proven to have good performance for machine-learning classification problems
  • Maximization of “margin”: absolute distances between classification hyperplane and closest data points on either side
• Features that worked well:
  • Average sentence-level “positive” and “negative” classification scores
  • Document-wide ratio of positive sentences to all sentences (PSRs)
  • Local PSRs for each of 5 buckets in the review
  • Number of sentences in the review
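As a rough sketch of how the features listed above might be assembled into one vector for the SVM, consider the function below. The per-sentence tuple layout `(label, positive_score, negative_score)`, the contiguous equal-width bucketing, and the choice of 0.0 for an empty bucket are all assumptions; the slides do not specify them.

```python
from typing import List, Tuple

# Hypothetical stage 2 output: (label, positive_score, negative_score).
SentenceResult = Tuple[str, float, float]


def positive_sentence_ratio(labels: List[str]) -> float:
    # PSR: positive sentences divided by all sentences (0.0 if empty).
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == "positive") / len(labels)


def document_features(
    results: List[SentenceResult], n_buckets: int = 5
) -> List[float]:
    n = len(results)
    labels = [label for label, _, _ in results]
    avg_pos = sum(pos for _, pos, _ in results) / n if n else 0.0
    avg_neg = sum(neg for _, _, neg in results) / n if n else 0.0
    # Local PSRs: split the review into contiguous buckets in reading order.
    local_psrs = [
        positive_sentence_ratio(
            labels[i * n // n_buckets:(i + 1) * n // n_buckets]
        )
        for i in range(n_buckets)
    ]
    # Layout: [avg_pos, avg_neg, document PSR, local PSRs..., sentence count]
    return (
        [avg_pos, avg_neg, positive_sentence_ratio(labels)]
        + local_psrs
        + [float(n)]
    )


example = [
    ("positive", 0.8, 0.2),
    ("negative", 0.4, 0.6),
    ("positive", 0.6, 0.4),
    ("positive", 1.0, 0.0),
    ("negative", 0.2, 0.8),
]
features = document_features(example)
```

The resulting list has 9 entries with the default 5 buckets and could be passed to any SVM implementation as one training row.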
Results and Conclusions
• Hybrid model worked well: our multi-level model does improve the classifier’s discriminatory power
• Character N-gram models worked better than token N-gram models
• Negative reviews are harder to classify!
• 30 video game reviews: 27/30 = 90% accuracy
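To make the character versus token N-gram distinction concrete, here is a minimal sketch of the two extraction schemes. The function names and whitespace tokenization are assumptions; the slides do not say how the authors tokenized or which N they used.

```python
from typing import List, Tuple


def char_ngrams(text: str, n: int) -> List[str]:
    # Sliding window over characters, spaces included.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def token_ngrams(text: str, n: int) -> List[Tuple[str, ...]]:
    # Sliding window over whitespace-separated tokens.
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


char_trigrams = char_ngrams("good", 3)
token_bigrams = token_ngrams("not very good", 2)
```

Character N-grams share sub-word pieces across related forms such as "good" and "goodness", whereas token N-grams treat each word as an atomic unit.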