Opinion Observer: Analyzing and Comparing Opinions on the Web • Bing Liu, Minqing Hu, Junsheng Cheng • Paper Presentation: Vinay Goel
Introduction • Web: excellent source of consumer opinions • Online customer reviews of products • Useful information to customers and product manufacturers • Novel framework for analyzing and comparing customer opinions • Technique based on language pattern matching to extract product features
Technical Tasks • Identify product features that customers have expressed their opinions on • For each feature, identify whether the opinion is positive or negative • Review Format (2) - Pros, Cons and detailed review • The paper proposes a technique to identify product features from pros and cons in this format
Problem Statement • Let P = {P1, P2, …, Pn} be a set of products that the user is interested in • Each product Pi has a set of reviews Ri = {r1, r2, …, rk} • Each review rj is a sequence of sentences rj = <sj1, sj2, …, sjm>
Product Feature • A product feature f in rj is an attribute/component of the product that has been commented on in rj • If f appears in rj, explicit feature • “The battery life of this camera is too short” • If f does not appear in rj but is implied, implicit feature • “This camera is too large” (size)
Opinions and features • Opinion segment of a feature • Set of consecutive sentences that expresses a positive or negative opinion on f • “The picture quality is good, but the battery life is short” • Positive opinion set of a feature (Pset) • Set of opinion segments of f, taken from all the reviews of the product, that express positive opinions about f • Negative opinion set (Nset) is defined similarly
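The Pset/Nset definitions above can be sketched as a small data structure. This is a minimal illustration, not the paper's implementation: the `OpinionSegment` class, the `build_opinion_sets` helper, and the hand-labeled segments are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OpinionSegment:
    """Hypothetical record for one opinion segment of a feature f."""
    feature: str    # the product feature f being commented on
    text: str       # the consecutive sentence(s) expressing the opinion
    positive: bool  # True for a positive opinion, False for a negative one


def build_opinion_sets(segments):
    """Group segments per feature into a positive set (Pset) and a negative set (Nset)."""
    pset, nset = defaultdict(list), defaultdict(list)
    for seg in segments:
        target = pset if seg.positive else nset
        target[seg.feature].append(seg.text)
    return dict(pset), dict(nset)


# Hand-labeled segments, based on the example sentences in the slides.
segments = [
    OpinionSegment("picture quality", "The picture quality is good", True),
    OpinionSegment("battery life", "the battery life is short", False),
    OpinionSegment("battery life", "The battery life of this camera is too short", False),
]
pset, nset = build_opinion_sets(segments)
```

Comparing `len(pset[f])` with `len(nset[f])` for each feature gives the kind of per-feature positive/negative counts that a comparison view could plot.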
Automated opinion analysis • Explicit and implicit features • Synonyms • Granularity of features
Extracting Product Features - Labeling • Perform POS tagging and remove digits • “<V>included<N>MB<V>is<Adj>stingy” • Replace actual feature words with [feature] • “<V>included<N>[feature]<V>is<Adj>stingy” • Use n-gram to produce shorter segments • “<V>included<N>[feature]<V>is” • “<N>[feature]<V>is<Adj>stingy” • Distinguish duplicate tags • “<N1>[feature]<N2>usage” • Perform word stemming
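The labeling steps above can be sketched as follows. This is an illustration under stated assumptions: the input is assumed to be already POS-tagged as `(tag, word)` pairs, the helper names (`label_segment`, `ngrams`, `number_duplicate_tags`, `render`) are hypothetical, and word stemming is left out because it needs an external stemmer.

```python
import re
from collections import Counter


def label_segment(tagged, feature_words):
    """Remove digits and replace known feature words with the [feature] placeholder."""
    out = []
    for tag, word in tagged:
        word = re.sub(r"\d+", "", word)  # drop digits, e.g. "16MB" -> "MB"
        if not word:
            continue                     # the token was digits only
        if word.lower() in feature_words:
            word = "[feature]"
        out.append((tag, word))
    return out


def ngrams(items, n):
    """Split a labeled segment into shorter overlapping segments of length n."""
    if len(items) <= n:
        return [items]
    return [items[i:i + n] for i in range(len(items) - n + 1)]


def number_duplicate_tags(items):
    """Distinguish repeated tags within one segment: N, N -> N1, N2."""
    totals = Counter(tag for tag, _ in items)
    seen = Counter()
    result = []
    for tag, word in items:
        if totals[tag] > 1:
            seen[tag] += 1
            tag = f"{tag}{seen[tag]}"
        result.append((tag, word))
    return result


def render(items):
    """Print a segment in the <Tag>word notation used on the slide."""
    return "".join(f"<{tag}>{word}" for tag, word in items)


# Assumed tagger output for "included 16MB is stingy"; "MB" is the labeled feature word.
tagged = [("V", "included"), ("N", "16MB"), ("V", "is"), ("Adj", "stingy")]
labeled = label_segment(tagged, {"mb"})
trigrams = [render(g) for g in ngrams(labeled, 3)]
numbered = render(number_duplicate_tags([("N", "[feature]"), ("N", "usage")]))
```

Here `trigrams` holds the two shorter segments shown on the slide, and `numbered` is the segment with its two noun tags told apart.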
Rule Generation • Association Rule Mining • Only need rules that have [feature] on the right-hand-side (<N1>,<N2> --> [feature]) • Consider the sequence of items in the conditional part (left-hand-side) of each rule • Generate language patterns (<N1>[feature]<N2>)
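A rough sketch of the rule-generation idea: keep only rules whose right-hand side is `[feature]`, then recover the word order to form a language pattern. The brute-force enumeration below is a stand-in for a real association rule miner and is not the algorithm used in the paper; the function name, thresholds, and toy segments are assumptions.

```python
from collections import Counter
from itertools import combinations

FEATURE = "[feature]"


def mine_feature_rules(sequences, min_support=0.5, min_confidence=0.8, max_lhs=2):
    """Return (lhs, support, confidence, pattern) for rules of the form lhs -> [feature]."""
    n = len(sequences)
    lhs_count, both_count = Counter(), Counter()
    orderings = {}  # lhs -> Counter of observed item orders, [feature] included
    for seq in sequences:
        items = sorted(set(seq) - {FEATURE})
        for k in range(1, max_lhs + 1):
            for lhs in combinations(items, k):
                lhs_count[lhs] += 1
                if FEATURE in seq:
                    both_count[lhs] += 1
                    keep = set(lhs) | {FEATURE}
                    ordered = tuple(x for x in seq if x in keep)
                    orderings.setdefault(lhs, Counter())[ordered] += 1
    rules = []
    for lhs, both in both_count.items():
        support = both / n                # fraction of segments with lhs and [feature]
        confidence = both / lhs_count[lhs]
        if support >= min_support and confidence >= min_confidence:
            # Use the most common ordering of the rule's items as the language pattern.
            pattern = "".join(orderings[lhs].most_common(1)[0][0])
            rules.append((lhs, support, confidence, pattern))
    return rules


# Toy labeled segments, flattened into item sequences (illustrative data only).
sequences = [
    ["<N1>", "[feature]", "<N2>", "usage"],
    ["<N1>", "[feature]", "<N2>", "life"],
    ["<V>", "is", "<Adj>", "short"],
]
rules = {lhs: (sup, conf, pat) for lhs, sup, conf, pat in mine_feature_rules(sequences)}
```

On this toy data the rule `<N1>, <N2> --> [feature]` passes both thresholds and yields the pattern `<N1>[feature]<N2>`, while single words such as `usage` fall below the support threshold.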
Feature Refinement strategies • There may be a more likely feature in the sentence segment but not extracted by any pattern • “slight hum from subwoofer when not in use” • Frequent-Noun • Only a noun replaces another noun • Frequent-Term • Any type replacement
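One possible reading of the two refinement strategies, sketched below: a candidate extracted by a pattern is swapped for a more frequent candidate found in the same segment. The `refine` function, the tag names, and the frequency counts are hypothetical and chosen only to illustrate the slide's example.

```python
def refine(candidate, tagged_segment, freq, nouns_only=True):
    """Replace a candidate feature with a more frequent one from the same segment.

    candidate:      (word, tag) extracted by a language pattern
    tagged_segment: list of (word, tag) pairs for the sentence segment
    freq:           how often each word was extracted as a feature overall
    nouns_only:     True  -> Frequent-Noun (only a noun replaces another noun)
                    False -> Frequent-Term (any type may be replaced)
    """
    word, tag = candidate
    if nouns_only and tag != "N":
        return word
    best = word
    for other, other_tag in tagged_segment:
        if nouns_only and other_tag != "N":
            continue
        if freq.get(other, 0) > freq.get(best, 0):
            best = other
    return best


# Assumed tagging of "slight hum from subwoofer when not in use".
segment = [
    ("slight", "Adj"), ("hum", "N"), ("from", "Prep"), ("subwoofer", "N"),
    ("when", "Adv"), ("not", "Adv"), ("in", "Prep"), ("use", "N"),
]
freq = {"subwoofer": 12, "hum": 1}  # made-up counts for illustration
refined = refine(("hum", "N"), segment, freq)
```

With these counts, the pattern's initial pick `hum` is replaced by `subwoofer`, the more likely feature in the segment.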
Extracting Reviews from Web Pages • Non-trivial task • MDR-2 • System finds patterns from a page containing reviews • System uses these patterns to extract reviews from other pages of the site
Experimental Results • Semi-automatic tagging saves around 45% of the tagging time • Group synonyms using WordNet (52% recall and 100% precision) • Does not handle context-dependent synonyms
Conclusion • Novel visual analysis system • Supervised pattern discovery method • Interactive correction of errors of the automatic system • Improve techniques, study strength of opinions