Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. Bo Pang and Lillian Lee, Cornell University; Carnegie Mellon University. ACL 2005.
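About this problem
• Label reviews on a rating scale (e.g., number of stars)
• Differs from binary “thumbs up” vs. “thumbs down” classification
• Differs from identifying opinion strength
• Differs from ranking: each item receives an absolute class label (a classification output), not just a relative order
• Data: movie reviews from Rotten Tomatoes
• A study of human performance on the task
• Three algorithms compared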
Problem validation and formulation (1)
• Measure human performance as a reference point for machine performance
• Use reviews by a single author to factor out the effects of cross-author divergence
• One “notch” equals half a star on a four- or five-star scale, or 10 points on a 100-point scale
• Random-choice baseline: 33%
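The notch definition amounts to dividing a rating difference by a scale-specific step size. A minimal Python sketch; the `NOTCH_SIZE` table, the scale names, and `notch_difference` are hypothetical names introduced only for illustration:

```python
# Hypothetical table: size of one "notch" on each rating scale, per the slide.
NOTCH_SIZE = {
    "4-star": 0.5,    # half a star
    "5-star": 0.5,    # half a star
    "100-point": 10,  # ten points
}


def notch_difference(rating_a, rating_b, scale):
    """Return the absolute difference between two ratings, measured in notches."""
    return abs(rating_a - rating_b) / NOTCH_SIZE[scale]


print(notch_difference(3.5, 2.0, "4-star"))   # 3.0
print(notch_difference(80, 60, "100-point"))  # 2.0
```

Expressing differences in notches lets ratings from different scales be compared on one footing.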
Problem validation and formulation (2)
• A three-class task seems like one that most people would do quite well on
• To keep the classes balanced, the problem is reduced from five classes to four
A scale dataset
• Movie reviews from four corpora, one per author
• Explicit rating indicators removed from the text
• Objective sentences removed
• 1,770, 902, 1,307, and 1,027 documents for the four authors, respectively (5,006 in total)
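The slide does not say how rating indicators were detected. One plausible approach, assumed here purely for illustration, is to drop any sentence that matches a rating-like pattern; the regular expressions and the `strip_rating_indicators` helper are hypothetical, not the authors' preprocessing:

```python
import re

# Hypothetical patterns for explicit rating indicators; real reviews would need
# patterns tuned to each author's rating conventions.
RATING_PATTERNS = [
    re.compile(r"\b\d+(\.\d+)?\s*/\s*\d+\b"),                        # "3/4", "8.5 / 10"
    re.compile(r"\b\d+(\.\d+)?\s+out\s+of\s+\d+\b", re.IGNORECASE),  # "3 out of 4"
    re.compile(r"\b\d+(\.\d+)?\s*stars?\b", re.IGNORECASE),          # "3.5 stars"
]


def strip_rating_indicators(sentences):
    """Drop every sentence that contains an explicit rating indicator."""
    return [s for s in sentences if not any(p.search(s) for p in RATING_PATTERNS)]


review = [
    "A tense, well-acted thriller.",
    "Rating: 3/4",
    "I'd give it 3.5 stars.",
    "The ending drags a little.",
]
print(strip_rating_indicators(review))
# ['A tense, well-acted thriller.', 'The ending drags a little.']
```

Pattern-based filtering is crude: it can drop sentences that merely mention numbers, and it misses ratings spelled out in words, so each corpus would need its own checks.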
Algorithm (1)
• Implementation: the SVMlight package
• Algorithm 1: One-vs-all (OVA): for each label l, a binary SVM classifier distinguishes label l from not-l
• Algorithm 2: Regression: find the hyperplane that best fits the training data; points within distance ε of it incur no loss
• Intuition: similar items, similar labels
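The OVA scheme and the ε-insensitive loss behind the regression variant can be sketched in pure Python. This is not SVMlight: it is a minimal linear SVM trained by subgradient descent on the L2-regularized hinge loss, and the function names, learning rate, regularization weight, and toy data are all assumptions for illustration.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_binary_svm(xs, ys, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss.

    ys are +1/-1. A constant 1.0 feature is appended so the last weight acts
    as a bias (regularized along with the rest, for brevity).
    """
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            xb = list(x) + [1.0]
            violated = y * dot(w, xb) < 1
            w = [wi * (1 - lr * lam) for wi in w]  # L2 penalty: shrink weights
            if violated:  # hinge-loss subgradient step
                w = [wi + lr * y * xi for wi, xi in zip(w, xb)]
    return w


def train_ova(xs, labels, **kwargs):
    """One-vs-all: one binary classifier per label l, trained as l vs. not-l."""
    return {
        l: train_binary_svm(xs, [1 if y == l else -1 for y in labels], **kwargs)
        for l in sorted(set(labels))
    }


def predict_ova(models, x):
    """Predict the label whose binary classifier gives the highest score."""
    xb = list(x) + [1.0]
    return max(models, key=lambda l: dot(models[l], xb))


def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """SVR-style loss: predictions within eps of the target incur no loss."""
    return max(0.0, abs(y_true - y_pred) - eps)


# Toy usage: three well-separated 2-D points with labels 0, 1, 2.
xs = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
labels = [0, 1, 2]
models = train_ova(xs, labels)
print([predict_ova(models, x) for x in xs])  # [0, 1, 2]
print(eps_insensitive_loss(3.0, 3.05))       # 0.0 (inside the eps-tube)
```

OVA treats each label as an unrelated class, whereas regression treats labels as points on a numeric scale, which is why the two can behave differently on rating data.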
Algorithm (2)
• Algorithm 3: Metric labeling
• Combines Algorithm 1 or 2 with a similarity measure
• A distance metric on labels (e.g., the absolute difference between label values)
• The k nearest neighbors of item x according to sim
• An item-similarity function sim
• Related to locally weighted learning
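A sketch of the metric-labeling step for a single test item, assuming integer labels with distance |l − l′|, neighbors drawn from labeled training items, and an additive cost of the form preference + α · Σ (label distance × similarity). The helpers `k_nearest` and `metric_label`, the weight `alpha`, and the toy numbers are hypothetical; the paper's exact objective may differ in details.

```python
def k_nearest(x, train, sim, k):
    """Return (label, similarity) pairs for the k labeled items most similar to x."""
    scored = sorted(((sim(x, tx), ty) for tx, ty in train), reverse=True)
    return [(ty, s) for s, ty in scored[:k]]


def metric_label(pref_cost, neighbors, labels, alpha=1.0):
    """Pick the label l minimizing
        pref_cost[l] + alpha * sum(|l - l_y| * sim(x, y) over neighbors y).
    pref_cost comes from a base classifier (lower = more preferred); labels are
    assumed to be integers so that d(l, l') = |l - l'|.
    """
    def cost(l):
        return pref_cost[l] + alpha * sum(abs(l - ly) * s for ly, s in neighbors)
    return min(labels, key=cost)


# Toy usage with hypothetical 1-D items and similarity sim(a, b) = 1 - |a - b|.
train = [(0.1, 0), (0.5, 1), (0.9, 3)]
neighbors = k_nearest(0.45, train, lambda a, b: 1 - abs(a - b), k=2)
print([ly for ly, _ in neighbors])  # [1, 0]

# The base classifier mildly prefers label 3, but the neighbors pull toward 1.
pref_cost = {0: 1.0, 1: 0.9, 2: 0.6, 3: 0.5}
nbrs = [(1, 0.9), (1, 0.8), (2, 0.3)]
print(metric_label(pref_cost, nbrs, [0, 1, 2, 3], alpha=0.0))  # 3
print(metric_label(pref_cost, nbrs, [0, 1, 2, 3], alpha=1.0))  # 1
```

Because every neighbor in this sketch already has a label, each test item can be labeled independently by trying every label; `alpha` controls how strongly neighbors override the base classifier.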
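Algorithm (3)
• The key is an item-similarity function that correlates with label similarity
• Plain vocabulary overlap (e.g., cosine similarity of term vectors) is not suitable: reviews with very different ratings can share most of their words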
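A small illustration of why raw vocabulary overlap is a poor proxy for rating similarity: two hypothetical sentences with opposite sentiment still score high under cosine similarity of term counts. The `cosine` helper and the sentences are invented for this example.

```python
import math
from collections import Counter


def cosine(doc_a, doc_b):
    """Cosine similarity of bag-of-words term-count vectors."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


positive = "the plot was good"
negative = "the plot was not good"
print(round(cosine(positive, negative), 3))  # 0.894
```

A single negation flips the sentiment but barely moves the term vector, which is the mismatch the slide points to.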
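Algorithm (PSP)
• Use PSP (positive-sentence percentage): the fraction of a review's sentences classified as positive
• A Naive Bayes (NB) classifier trained on 10,062 movie-review snippets (each a striking extract, usually one sentence long)
• The classifier is then applied to the sentences of their test data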
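The PSP pipeline can be sketched as: train a Naive Bayes sentence classifier on labeled snippets, classify each sentence of a review, and take the fraction labeled positive. In this Python sketch the toy snippets, the "pos"/"neg" labels, and all function names are hypothetical; `psp_similarity` (cosine between the vectors (PSP, 1 − PSP)) is an assumed way to turn PSP into an item-similarity function, not a confirmed detail from the slides.

```python
import math
from collections import Counter


def train_nb(snippets):
    """Train multinomial Naive Bayes on (text, label) pairs."""
    doc_counts = Counter(label for _, label in snippets)
    word_counts = {label: Counter() for label in doc_counts}
    for text, label in snippets:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return doc_counts, word_counts, vocab


def classify_nb(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    doc_counts, word_counts, vocab = model
    n_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        n_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / n_docs)
        for w in text.lower().split():
            if w in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def psp(model, sentences):
    """Positive-sentence percentage: fraction of sentences classified "pos"."""
    if not sentences:
        return 0.0
    return sum(classify_nb(model, s) == "pos" for s in sentences) / len(sentences)


def psp_similarity(p_x, p_y):
    """Cosine of the vectors (PSP, 1 - PSP); an assumed choice of similarity."""
    dot = p_x * p_y + (1 - p_x) * (1 - p_y)
    return dot / (math.hypot(p_x, 1 - p_x) * math.hypot(p_y, 1 - p_y))


# Hypothetical toy snippets standing in for the real snippet corpus.
snippets = [
    ("a gripping and moving film", "pos"),
    ("wonderful performances throughout", "pos"),
    ("a dull and tedious mess", "neg"),
    ("painfully boring and predictable", "neg"),
]
model = train_nb(snippets)
review = ["the performances are wonderful", "the pacing is tedious", "a moving finale"]
print(psp(model, review))  # 0.6666666666666666 (two of three sentences "pos")
```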
Algorithm (PSP)
• Distinguishing terms: terms that appear more than 20 times and occur in a single class 50% or more of the time
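The selection rule for distinguishing terms, read here as "more than 20 occurrences overall, with at least 50% of those occurrences in one class", can be sketched as follows. That reading, the `distinguishing_terms` name, and the toy data are assumptions. Under this reading, a 50% threshold with only two classes would pass every frequent term, so the rule only filters when there are three or more classes.

```python
from collections import Counter


def distinguishing_terms(docs, min_count=20, min_share=0.5):
    """Return terms seen more than min_count times whose occurrences fall at
    least min_share in a single class. docs: list of (tokens, label) pairs."""
    total = Counter()
    per_class = {}
    for tokens, label in docs:
        total.update(tokens)
        per_class.setdefault(label, Counter()).update(tokens)
    selected = set()
    for term, n in total.items():
        if n <= min_count:
            continue  # not frequent enough
        top_share = max(c[term] for c in per_class.values()) / n
        if top_share >= min_share:
            selected.add(term)
    return selected


# Hypothetical token lists for three rating classes.
docs = [
    (["great"] * 15 + ["film"] * 8, 3),
    (["great"] * 6 + ["film"] * 8, 2),
    (["film"] * 8 + ["bad"] * 5, 1),
]
print(distinguishing_terms(docs))  # {'great'}
```

Here "film" is frequent but spread evenly across classes, and "bad" is concentrated in one class but too rare, so only "great" survives.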
Experiment Results (2)
• Adding PSP is useful; however, PSP by itself is not good enough
Multiple authors
• Comparable results are obtained on the other authors' reviews
Future Work
• Vary the kernel used in the SVMs
• Use mixture models (combining “positive” and “negative” language models) to capture class relationships
• Multi-class categorization problems that are not scale-based (positive vs. negative vs. neutral)
• A transductive setting (a small amount of labeled data, exploiting relationships between unlabeled items), which is well suited to the metric-labeling approach