Mining Opinions from Reviews • Aditi S. Muralidharan • Summer Intern • Dept. of Computer Science, UC Berkeley
Dig • A walk-up-and-use, task-centered product browsing interface (any product, not just cameras)
My Job • Reviews: too many to read, too much to analyze • Star ratings are too coarse; we want customer opinions of features • Extract useful information from customer reviews, quantitatively expressed
The Tasks (opinion mining: reviews → customer opinions of features) • Extract product features from reviews • Extract opinions about features • Show them to users (Part 2)
Opinion Mining / Sentiment Analysis • 1. Evaluation unit, e.g. newspaper article, review, product feature • 2. Opinion units, e.g. sentences, phrases, adjectives • 3. Sentiment score
Established Application: Scoring Documents • e.g. how many positive articles about President Obama last week? • Split each news article into sentences • A classifier, trained on a labeled training set with n-gram / bag-of-words features, scores each sentence • Majority voting over sentence scores gives the article's score
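The document-scoring pipeline above can be sketched as follows. The sentence classifier here is a toy lexicon lookup standing in for a trained n-gram / bag-of-words model; the lexicons and function names are illustrative, not from the original system.

```python
# Sketch of the document-scoring pipeline: a sentence-level classifier
# labels each sentence, and majority voting gives the document's label.

def classify_sentence(sentence):
    # Stand-in for a trained bag-of-words classifier: count words from
    # tiny hand-made polarity lexicons (illustrative only).
    positive = {"good", "great", "praised", "success"}
    negative = {"bad", "poor", "criticized", "failure"}
    words = sentence.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return 1 if score >= 0 else -1

def score_document(text):
    """Label a document +1/-1 by majority vote over its sentences."""
    sentences = [s for s in text.split(".") if s.strip()]
    votes = sum(classify_sentence(s) for s in sentences)
    return 1 if votes >= 0 else -1
```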
Scoring Product Features • What are product features? • Explicit: "The controls are intuitive." (easy) • Implicit: "I easily figured out how to operate it." (hard) • We focus our analysis on explicit features.
Extracting Features From Reviews • Which words are product features? • FREQUENCY COUNTING: people describe product features in reviews, so frequent terms are likely to be product features; extract frequent sequences of nouns; computationally cheap but imprecise • INFORMATION EXTRACTION: how do people talk about known product features, and what else do they talk about that way? Learn patterns and extract more; computationally expensive and precise
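The frequency-counting approach can be sketched as follows, assuming review sentences arrive already POS-tagged (the tagger itself is out of scope here, and all names and thresholds are illustrative):

```python
from collections import Counter

# Frequency-counting sketch: collect maximal runs of consecutive nouns
# from POS-tagged sentences and keep the ones that recur often.

def noun_runs(tagged_sentence):
    """Yield maximal sequences of consecutive noun tokens (NN* tags)."""
    run = []
    for word, tag in tagged_sentence:
        if tag.startswith("NN"):
            run.append(word.lower())
        elif run:
            yield " ".join(run)
            run = []
    if run:
        yield " ".join(run)

def frequent_features(tagged_sentences, min_count=2):
    """Noun phrases seen at least `min_count` times are candidate features."""
    counts = Counter(p for s in tagged_sentences for p in noun_runs(s))
    return {phrase for phrase, n in counts.items() if n >= min_count}
```

This is cheap (one pass plus a counter) but imprecise, exactly as the slide says: frequent non-feature nouns slip through, and rare features are missed.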
Pattern-Based Information Extraction • Start from seed features (e.g. flash, zoom lens, weight) • Learn extraction patterns from reviews: "camera has ___", "the ___ of this camera", "it features a ___" • Patterns yield candidates (flash, daughter, vacation, ...) • Filter candidates with Web-PMI: hits("camera has <feature>") / (hits(<feature>) × hits("camera has")) • Result: extracted features (flash, controls, battery, ...) • Parallelized implementation takes advantage of all available resources
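A minimal sketch of the Web-PMI filter described above. `hits` is a hypothetical stand-in for a web search hit-count API; here it is a plain dictionary, and all counts are made up for illustration:

```python
# Web-PMI candidate filter: a candidate that appears in "camera has ___"
# far more often than chance predicts is likely a real product feature.

def web_pmi(candidate, hits, total=1e9):
    """PMI-style score: joint hits of the pattern with the candidate,
    normalized by the independent hit counts of its parts."""
    joint = hits.get(f"camera has {candidate}", 0)
    alone = hits.get(candidate, 1)
    ctx = hits.get("camera has", 1)
    return joint * total / (alone * ctx)
```

With made-up counts, "flash" scores far above "daughter": both may match a pattern once, but only the real feature co-occurs with "camera has" out of proportion to its overall frequency.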
Scoring Product Features • Score each explicit product feature from its referring opinions
Extracting Opinions • Which words are opinion words? Opinion words are adjectives and adverbs • PROXIMITY: likely to be opinions if they occur near a feature mention; computationally cheap; negation is hard to detect; imprecise • DEPENDENCY PARSING: likely to be opinions if an amod / nsubj / advmod relationship exists to a feature mention; neg (negation) relations are easily detected; computationally expensive; precise
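The cheap proximity heuristic can be sketched as follows. POS tags are assumed given, and the window size is an illustrative choice, not the deck's:

```python
# Proximity sketch: treat adjectives (JJ*) and adverbs (RB*) within a
# small token window of a feature mention as opinions about that feature.

def opinions_near(tagged_tokens, feature, window=3):
    """Return adjectives/adverbs within `window` tokens of `feature`."""
    words = [w.lower() for w, _ in tagged_tokens]
    found = []
    for i, w in enumerate(words):
        if w == feature:
            lo, hi = max(0, i - window), i + window + 1
            for word, tag in tagged_tokens[lo:hi]:
                if tag.startswith(("JJ", "RB")):
                    found.append(word.lower())
    return found
```

The weakness the slide names is visible here: "not intuitive" and "intuitive" both put the adjective near the feature, so negation goes undetected, which is why the dependency-parsing route (with its explicit neg relation) is more precise.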
Extracting Opinions: Examples • Dependency parses of review sentences link extracted features (flash, controls, battery, ...) to adjectives via nsubj / amod / advmod • "The controls are intuitive." (nsubj) • "There are large controls on the top." (amod) • "The controls feel natural." (nsubj) • How to classify adjectives?
Scoring Product Features • Score each explicit product feature from its referring opinions
Classifying Opinions • Known-polarity adjectives serve as training words: great (+), excellent (+), poor (−), terrible (−), ... • Synonymous words have high Web-PMI with each other • For an unknown adjective (e.g. intuitive), build a Web-PMI feature vector over the training words: WebPMI(adj, great) = HITS("camera" NEAR adj, great) / (HITS("camera" NEAR adj) × HITS("camera" NEAR great)) • A classifier maps the vector to +/− • F1 scores: 0.78 (+), 0.76 (−)
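A simplified sketch of the idea: instead of the trained classifier over the full Web-PMI feature vector that the slide describes, this toy version just compares co-occurrence scores against positive versus negative seed words. The `hits` dictionary (keyed by term sets) stands in for a NEAR-query hit-count API, and every number below is invented:

```python
# Toy polarity classifier: an unknown adjective is "+" if it co-occurs
# with positive seed adjectives more strongly than with negative ones.

POS_SEEDS = ["great", "excellent"]
NEG_SEEDS = ["poor", "terrible"]

def near_pmi(adj, seed, hits):
    """PMI-style co-occurrence of adj and seed in a camera context."""
    joint = hits.get(frozenset(["camera", adj, seed]), 0)
    a = hits.get(frozenset(["camera", adj]), 1)
    s = hits.get(frozenset(["camera", seed]), 1)
    return joint / (a * s)

def classify_adjective(adj, hits):
    pos = sum(near_pmi(adj, s, hits) for s in POS_SEEDS)
    neg = sum(near_pmi(adj, s, hits) for s in NEG_SEEDS)
    return "+" if pos >= neg else "-"
```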
Scoring Product Features • Score each explicit product feature from its referring opinions • Avoid extreme estimates
Estimating Product Feature Scores • When there are few data points, plain averaging gives extreme estimates • Beta-binomial smoothing model: estimate the "true" sentiment s for each product feature • Fixed priors a+, a− on the true adjective polarity p • The distribution of observed adjectives is binomial on the "true" sentiment • An added layer models classification mistakes: the observed polarity w comes from the classifier
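The smoothing idea can be sketched as the posterior mean under a Beta prior. The prior strengths below are illustrative (not the deck's values), and the extra layer for classifier mistakes is omitted:

```python
# Beta-binomial smoothing sketch: with a Beta(a_pos, a_neg) prior on a
# feature's "true" positivity, the posterior mean after observing
# n_pos positive and n_neg negative classified opinions is a smoothed
# score that stays moderate when data is scarce.

def smoothed_score(n_pos, n_neg, a_pos=2.0, a_neg=2.0):
    return (n_pos + a_pos) / (n_pos + n_neg + a_pos + a_neg)
```

With a single positive opinion the raw average would be an extreme 1.0, but the smoothed score is (1+2)/(1+0+4) = 0.6, pulled toward the prior mean; with 90 positive and 10 negative opinions the data dominates and the score approaches 0.9.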
Opinions in the UI • The main interface helps the user select a set of products • Users need to compare the selected products • Users need to compare customer opinion summaries and details
Comparison Interface • Parallel coordinates show different quantitative attributes
Customer Opinions • Red and green bars summarize the number and positivity of opinions. Adjectives appear in a list.