Deriving the Pricing Power of Product Features by Mining Consumer Reviews • Nikolay Archak, Anindya Ghose, Panagiotis G. Ipeirotis • Class Presentation By: Arunava Bhattacharya
INDEX • Introduction • Importance of Consumer Product Reviews • Opinion Mining Problems • Possible Solutions • Background • Proposed Model • Proposed Algorithm • Experimental Results • Related Work
Importance of consumer product reviews • Consumer product reviews have a significant impact on consumer buying decisions, and consumer-generated product information on the Internet attracts more product interest than vendor information. • Reasons: • More user-oriented • Evaluate the product from the user’s perspective • Often considered trustworthy by customers
Opinion Mining Problems • Earlier methods failed to achieve high accuracy. • Reasons: • They were targeted primarily at evaluating the polarity of the review. • Review sentiment was classified as positive or negative by looking for occurrences of specific sentiment phrases.
Possible Solutions • Identify not only the opinions of customers but also the importance of those opinions. • Reliably capture the pragmatic meaning of customer evaluations. • E.g.: Is “good battery life” better than “nice battery life”? • Follow a hedonic regression model in which the weights of individual features determine the overall price of a product.
Hedonic Regressions • The hedonic model assumes that differentiated goods can be described by vectors of objectively measured features. • It is designed to estimate the value that different product aspects contribute to a consumer’s utility. • E.g.: A backpacking tent can be decomposed into characteristics such as weight (w), capacity (c), and pole material (p). Tent utility is then given by a function u(w, c, p, ...). • Weakness: Product features and their measurement scales must be identified manually.
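As a minimal sketch of the decomposition above, the following Python assumes the simplest possible hedonic form, a linear function of the characteristics. The linearity, the characteristic names, and every number are illustrative assumptions; the slides only state that utility is some function u(w, c, p, ...).

```python
# Hypothetical implicit prices per unit of each characteristic (made-up values).
IMPLICIT_PRICES = {"weight_kg": -10.0, "capacity_persons": 40.0, "aluminum_poles": 25.0}


def hedonic_value(characteristics, implicit_prices, base=0.0):
    """Linear hedonic function: base plus the weighted sum of characteristics."""
    return base + sum(implicit_prices[name] * value
                      for name, value in characteristics.items())


# A made-up tent described as a vector of objectively measured features.
tent = {"weight_kg": 2.0, "capacity_persons": 3.0, "aluminum_poles": 1.0}
print(hedonic_value(tent, IMPLICIT_PRICES, base=50.0))
```

In a hedonic regression the implicit prices are not chosen by hand as they are here; they are the coefficients estimated from observed products.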
Product Feature Identification • Part-of-speech (POS) tagger: Identifies whether a word is a noun or an adjective. Nouns and noun phrases are popular candidates for product features. • Statistical patterns: Search the text for words and phrases that appear frequently in the reviews. • Hybrid model: A POS tagger is used as a preprocessing step before applying an association rule mining algorithm to discover nouns and noun phrases.
Mining Consumer Opinions • A feature mining technique is used to identify product features. • Algorithms extract sentences that give positive or negative opinions about a product feature. • A summary is produced from the discovered information. • Such techniques fail to capture the strength of the underlying evaluations.
Identifying Customer Opinions • Each of the n features can be expressed by a noun chosen from the set of all nouns that appear in the reviews. • Consumers typically use adjectives such as “bad”, “good”, and “amazing” to evaluate quality, so a syntactic dependency parser is used to identify the adjectives. • The result is a set of pairs of product features and their respective evaluations. These pairs are referred to as opinion phrases.
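To make the idea of an opinion phrase concrete, here is a deliberately crude Python sketch. It does not use a syntactic dependency parser; as a stand-in it only pairs an evaluation word with a feature noun that immediately follows it. The word lists and the function name are hypothetical.

```python
import re

# Hypothetical feature nouns and evaluation adjectives.
FEATURES = {"battery", "screen", "lens"}
EVALUATIONS = {"good", "bad", "amazing", "great"}


def extract_opinion_phrases(review):
    """Return (feature, evaluation) pairs for adjacent adjective-noun tokens."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [(noun, adjective)
            for adjective, noun in zip(tokens, tokens[1:])
            if adjective in EVALUATIONS and noun in FEATURES]


print(extract_opinion_phrases("Great screen, but a bad battery."))
print(extract_opinion_phrases("The lens is amazing."))
```

The second call finds nothing, because the adjective follows the noun. Cases like this are the reason a dependency parser is used rather than simple adjacency.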
Structuring the opinion phrase space I • Model sets of the n product features as elements of a vector space with basis $f_1, \dots, f_n$. This is called the feature space (F). • Model the evaluations as a vector space with basis $e_1, \dots, e_m$. This is called the evaluation space (E). • The review space (R) is constructed as the tensor product of the feature and evaluation spaces: • $R = F \otimes E$
Structuring the opinion phrase space II • The set of opinion phrases $f_i \otimes e_j$ forms a basis of the review space, denoted V. • The weight of an opinion phrase (phrase) in a review (rev) for a product (prod) is given by: • $w(\text{phrase}, \text{rev}, \text{prod}) = \dfrac{N(\text{phrase}, \text{rev}, \text{prod}) + s}{\sum_{y \in V} \left( N(y, \text{rev}, \text{prod}) + s \right)}$ (1) • $N(y, \text{rev}, \text{prod})$ = number of occurrences of opinion phrase y in review rev for product prod • s = smoothing constant
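Equation (1) can be sketched in a few lines of Python. The function name, the example phrases, and the value of the smoothing constant are assumptions chosen for illustration; the slides do not give a value for s.

```python
from collections import Counter
from itertools import product


def phrase_weights(observed, features, evaluations, s=0.5):
    """Smoothed relative frequency of every basis phrase, as in equation (1)."""
    counts = Counter(observed)
    basis = list(product(features, evaluations))      # V: all feature-evaluation pairs
    total = sum(counts[phrase] + s for phrase in basis)
    return {phrase: (counts[phrase] + s) / total for phrase in basis}


# Made-up opinion phrases found in one review of one product.
observed = [("battery", "good"), ("battery", "good"), ("screen", "bad")]
weights = phrase_weights(observed, ["battery", "screen"], ["good", "bad"])
print(weights[("battery", "good")], weights[("screen", "good")])
```

Because of the smoothing constant, phrases that never occur still receive a small positive weight, and the weights over V always sum to 1.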
Econometric model of product reviews I • Product demand can be modeled as a function of product characteristics and price: • $\ln(D_{kt}) = a_k + \beta \ln(p_{kt}) + \varepsilon_{kt}$ (2) • $D_{kt}$ = demand for product k at time t • $p_{kt}$ = price of product k at time t • $\beta$ = price elasticity • $a_k$ = product-specific constant term • Drawback: The model cannot evaluate different product characteristics separately; it lumps all product features into the single term $a_k$.
Econometric model of product reviews II • Solution: • Replace $a_k = \alpha + \psi(W_{kt})$ (3) • $\alpha$ = constant that does not vary across products or time • $W_{kt}$ = all opinions for product k available at time t, including all reviews posted before t • $\psi$ = bilinear form of features and evaluations • $\psi(W_{kt}) = \sum_{\text{phrase} \in V} \psi(\text{phrase}) \, w(\text{phrase}, \text{reviews}_t, \text{product}_k) = \sum_{i=1}^{n} \sum_{j=1}^{m} \psi(f_i \otimes e_j) \, w(f_i \otimes e_j, \text{reviews}_t, \text{product}_k)$
Econometric model of product reviews III • Using equations (2) and (3), we can extend the linear model: • $\ln(D_{kt}) = \alpha + \beta \ln(p_{kt}) + \psi(W_{kt}) + \varepsilon_{kt}$ • Drawback: The model has a large number of parameters and requires a very large training set of product reviews to estimate. • Solution: Reduce the model dimension by placing a rank constraint on the matrix of $\psi$. In other words, $\psi$ can be decomposed into the product of a feature component and an evaluation component: • $\psi(\text{shots} \otimes \text{fantastic}) = \gamma(\text{shots}) \, \delta(\text{fantastic})$
Econometric model of product reviews IV • Using the rank-1 approximation of the tensor product functional, we can rewrite the extended model as: • $\ln(D_{kt}) = \alpha + \beta \ln(p_{kt}) + \gamma^{T} W_{kt} \delta + \varepsilon_{kt}$ (4) • $\gamma$ = vector of n elements, one weight for each product feature • $\delta$ = vector of m elements, the implicit score that each evaluation assigns to a product feature • This decreases the total number of parameters (from nm to n + m) but loses the linearity of the original model.
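The effect of the rank constraint can be checked numerically. The Python sketch below uses made-up values for the two vectors and for the review matrix, and shows that the rank-1 term equals the unconstrained bilinear term whenever each coefficient is the product of a feature weight and an evaluation score. The parameter count at the end uses roughly 30 features and 30 evaluation words, the sizes reported in the experimental setup.

```python
def full_score(psi, W):
    """Unconstrained term: sum over i, j of psi[i][j] * W[i][j] (n * m parameters)."""
    return sum(p * w for psi_row, w_row in zip(psi, W) for p, w in zip(psi_row, w_row))


def rank_one_score(gamma, W, delta):
    """Rank-1 term gamma^T W delta (n + m parameters)."""
    return sum(g * sum(w * d for w, d in zip(w_row, delta))
               for g, w_row in zip(gamma, W))


gamma = [2.0, -1.0]                        # hypothetical feature weights (n = 2)
delta = [0.5, 1.0, 3.0]                    # hypothetical evaluation scores (m = 3)
W = [[0.1, 0.2, 0.0], [0.3, 0.0, 0.4]]     # made-up review matrix of phrase weights
psi = [[g * d for d in delta] for g in gamma]

print(full_score(psi, W), rank_one_score(gamma, W, delta))

n, m = 30, 30
print(n * m, "parameters unconstrained,", n + m, "with the rank-1 constraint")
```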
Algorithm: • Based on the observation that if one of the vectors γ or δ is fixed, the equation becomes linear in the other. • Steps: • 1. Set δ to a vector of initial evaluation scores. • 2. Minimize the fit function by choosing the optimal feature weights (γ), assuming that the evaluation scores (δ) are fixed. • 3. Minimize the fit function by choosing the optimal evaluation scores (δ), assuming that the feature weights (γ) are fixed. • 4. Repeat steps 2 and 3 until the algorithm converges.
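The alternating steps can be sketched in plain Python as follows. This is an illustration under simplifying assumptions, not the authors' implementation: it fits only the bilinear term $\gamma^{T} W \delta$ (no intercept, price, or rating), uses synthetic noise-free data generated from made-up vectors, runs a fixed number of sweeps instead of testing for convergence, and solves each linear step with a small hand-written least-squares routine. Here gamma has one entry per row of W and delta one entry per column, as in equation (4).

```python
import random


def solve_least_squares(rows, targets):
    """Ordinary least squares via the normal equations and Gauss-Jordan elimination."""
    p = len(rows[0])
    # Augmented system [X^T X | X^T y].
    a = [[sum(x[i] * x[j] for x in rows) for j in range(p)]
         + [sum(x[i] * t for x, t in zip(rows, targets))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * u for v, u in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def predict(gamma, W_k, delta):
    """Bilinear term gamma^T W_k delta for one review matrix."""
    return sum(g * sum(w * d for w, d in zip(row, delta)) for g, row in zip(gamma, W_k))


def alternating_least_squares(W, y, n, m, sweeps=25):
    delta = [1.0] * m  # step 1: arbitrary starting vector
    history = []
    for _ in range(sweeps):  # step 4: repeat steps 2 and 3
        # Step 2: with delta fixed, y_k ~ gamma . (W_k delta) is linear in gamma.
        gamma = solve_least_squares(
            [[sum(W_k[i][j] * delta[j] for j in range(m)) for i in range(n)]
             for W_k in W], y)
        # Step 3: with gamma fixed, y_k ~ (W_k^T gamma) . delta is linear in delta.
        delta = solve_least_squares(
            [[sum(gamma[i] * W_k[i][j] for i in range(n)) for j in range(m)]
             for W_k in W], y)
        history.append(sum((t - predict(gamma, W_k, delta)) ** 2
                           for W_k, t in zip(W, y)))
    return gamma, delta, history


# Synthetic, noise-free data generated from made-up gamma and delta vectors.
rng = random.Random(0)
n, m = 2, 3
true_gamma, true_delta = [1.0, -2.0], [0.5, 1.5, -1.0]
W = [[[rng.random() for _ in range(m)] for _ in range(n)] for _ in range(40)]
y = [predict(true_gamma, W_k, true_delta) for W_k in W]

gamma, delta, history = alternating_least_squares(W, y, n, m)
print("squared error after first and last sweep:", history[0], history[-1])
```

Each step is an exact least-squares solve, so the squared error cannot increase from one sweep to the next. Only the products of entries of gamma and delta are identified: scaling gamma by a constant and dividing delta by the same constant gives an identical fit.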
Data • The data set covers the “Camera & Photo” (115 products) and “Audio & Video” (127 products) categories on Amazon.com. • Each observation contains the collection date, the product ID, the price (with possible discounts), the suggested retail price, the sales rank of the product, and the rating. • Amazon Web Services was also used to collect the full set of reviews for each product. • Products in both categories had about 20 reviews each on average.
Selecting feature and evaluation words • Steps: • 1. Used a part-of-speech tagger to analyze the reviews and assign a part-of-speech tag to each word. • 2. Selected a subset of approximately 30 nouns to use as product features. For example, in the “Camera & Photo” category the set of features included “battery/batteries”, “screen/lcd/display”, and “software”. • 3. Extracted the adjectives that evaluated the selected product features using a syntactic dependency parser. • 4. Kept the 30 most frequent adjectives to create the evaluation space. Words like “amazing”, “bad”, and “great” appeared here.
Experimental Setup I • Amazon.com reports sales rank instead of product demand. • Sales rank is converted into product demand using the following Pareto relationship: • $\ln(D) = a + b \ln(S)$ (5) • D = unobserved product demand • S = observed sales rank • a > 0 and b < 0 are industry-specific parameters • Both the suggested retail price (P1) and the price on Amazon.com (P2) are included because prices influence product demand. • The review rating variable (R) is also included.
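A short Python sketch of relationship (5) follows. The parameter values are placeholders picked only to satisfy a > 0 and b < 0; the slides do not report estimates for them.

```python
import math


def demand_from_rank(sales_rank, a, b):
    """Exponentiate ln(D) = a + b * ln(S) to get demand D from sales rank S."""
    return math.exp(a + b * math.log(sales_rank))


A, B = 10.0, -0.8  # hypothetical industry-specific parameters
for rank in (1, 10, 100, 1000):
    print(rank, round(demand_from_rank(rank, A, B), 1))
```

Because b is negative, implied demand falls as the sales rank number grows, which is why coefficients in a regression on ln(S) have the opposite sign to their effect on demand.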
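Experimental Setup II • Modify equation (4) as follows: • $\ln(S_{kt}) = \alpha + \beta_1 R_{kt} + \beta_2 \ln(P1_{kt}) + \beta_3 \ln(P2_{kt}) + \sum_{i=1}^{n} \sum_{j=1}^{m} W_{kt}^{ij} \gamma_i \delta_j + \varepsilon_{kt} = \alpha + \beta^{T} y_{kt} + \gamma^{T} W_{kt} \delta + \varepsilon_{kt}$ (6) • Here $y_{kt}$ collects the rating and the two log prices, $W_{kt}$ is the review matrix, and $W_{kt}^{ij}$ is calculated using equation (1).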
Experimental Results • Once the review matrix has been obtained, the model can predict future sales. • The model can identify the product feature weights and the evaluation scores associated with the adjectives, within the context of an electronic market.
Experimental Results • Feature and evaluation tables for “Camera & Photo” • An evaluation that increases sales receives a negative score, because sales rank on Amazon.com decreases as demand increases.
Experimental Results • Partial effects for the “Camera & Photo” product category. • A negative sign implies a decrease in sales rank, which means higher sales.
Evaluation Conclusions • The results show that the model can identify the features that are important to customers. • An implicit evaluation score can be derived for each adjective. • Evaluations like “best camera”, “excellent camera”, and “perfect camera” have a negative effect on demand. • Weak positive opinions like “nice” and “decent” are also evaluated negatively.
Related Work • The feature selection in this model is very close to the one presented by Hu and Liu (2004). • Opinion strength analysis by Popescu and Etzioni (2005). • Das and Chen’s examination of Yahoo bulletin boards, which combines economic methods with text mining (2006). • Ghose and Ipeirotis’s work on econometric analysis (2006).