A Joint Model of Feature Mining and Sentiment Analysis for Product Review Rating
Jorge Carrillo de Albornoz, Laura Plaza, Pablo Gervás, Alberto Díaz
Universidad Complutense de Madrid, NIL (Natural Interaction based on Language)
ECIR 2011
Motivation • Product review forums have become commonplace • Reviews are of great interest • Companies use them to improve their marketing mix • Individuals are interested in others’ opinions when purchasing a product • Manual analysis is infeasible • Typical NLP tasks: • Subjectivity detection • Polarity recognition • Rating inference, etc.
Motivation • Traditional approaches: • Term frequencies, POS, etc. • Polar expressions • They do not take into account: • The product features on which the opinions are expressed • The relations between them
Hypothesis • Humans have a conceptual model of what is relevant regarding a certain product • This model influences the polarity and strength of their opinions • It is necessary to combine feature mining and sentiment analysis strategies to: • Automatically extract the important features • Quantify the strength of the opinions about such features
The HotelReview Corpus • 25 reviews for each of 60 different hotels (1,500 reviews in total) • Each review contains: • The city • The reviewer’s nationality • The date • The reviewer category • A score from 0 to 10 rating the opinion • Free text describing, separately, what the reviewer liked and disliked
The HotelReview Corpus • The numeric score does not always reflect the text describing the user opinion, so the reviews were manually annotated: • Two annotators • Five categories: Excellent, Good, Fair, Poor and Very poor • Three categories: Good, Fair and Poor • After removing conflicting judgments: 1,000 reviews • Example: “Good location. Nice roof restaurant (I have stayed in the Baglioni more than 5 times before). Maybe reshaping/redecorating the lobby. Noisy due to road traffic. The room was extremely small. Parking awkward. Shower screen was broken and there was no bulb in the bedside light.”
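To make the corpus structure concrete, a review record could be modelled roughly as below. This is a minimal sketch: the class and field names are ours, not the schema of the distributed corpus.

```python
from dataclasses import dataclass

@dataclass
class HotelReview:
    """Illustrative record for one review in the HotelReview corpus.
    Field names are our own; the distributed corpus may use a different layout."""
    city: str
    reviewer_nationality: str
    date: str
    reviewer_category: str   # type of reviewer, as recorded on the booking site
    score: float             # original 0-10 score given by the reviewer
    liked: str               # free text: what the reviewer liked
    disliked: str            # free text: what the reviewer disliked
    label_5: str             # annotator label: Very poor / Poor / Fair / Good / Excellent
    label_3: str             # annotator label: Poor / Fair / Good
```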
The HotelReview Corpus Download: http://nil.fdi.ucm.es/index.php?q=node/456
Automatic Product Review Rating • Step I: Detecting Salient Product Features • Identifying the features that are relevant to consumers • Step II: Extracting the User Opinion • Extracting from the review the opinions expressed on such features • Step III: Quantifying the User Opinions • Predicting the polarity of the sentences associated with each feature • Step IV: Predicting the Rating of a Review • Translating the product review into a Vector of Feature Intensities (VFI)
Step I: Detecting Salient Product Features • Objective: Identifying the product features that are relevant to consumers • Given a set of reviews R = {r1, r2, …, rn}: • The set of reviews is represented as a graph • Vertices = concepts • Edges = is-a and semantic similarity relations • The concepts are ranked according to their salience and a degree-based clustering algorithm is run • The result is a set of clusters, where each cluster represents a product feature
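A minimal sketch of this step, under simplifying assumptions: the concept graph is already built (vertices are WordNet concepts, edge weights encode is-a and similarity relatedness), salience is approximated by weighted degree, and each non-hub concept joins the cluster of the hub it is most strongly connected to. Function and parameter names are illustrative, not the authors' implementation.

```python
import networkx as nx

def detect_salient_features(graph: nx.Graph, n_clusters: int):
    """Vertices of `graph` are WordNet concepts; edge weights encode relatedness."""
    # Rank concepts by salience (approximated here by weighted degree).
    salience = dict(graph.degree(weight="weight"))
    hubs = sorted(salience, key=salience.get, reverse=True)[:n_clusters]

    # Degree-based clustering: attach every other concept to the hub
    # to which it has the strongest direct connection, if any.
    clusters = {hub: {hub} for hub in hubs}
    for concept in graph.nodes:
        if concept in hubs:
            continue
        best_hub, best_weight = None, 0.0
        for hub in hubs:
            if graph.has_edge(concept, hub):
                weight = graph[concept][hub].get("weight", 1.0)
                if weight > best_weight:
                    best_hub, best_weight = hub, weight
        if best_hub is not None:
            clusters[best_hub].add(concept)
    return clusters  # each cluster approximates one salient product feature
```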
Step II: Extracting the User Opinion on Each Product Feature • Objective: Locating in the review all textual mentions related to each product feature • Mapping the reviews to WordNet concepts • Associating sentences with feature clusters: • Most Common Feature (MCF): the feature sharing the most WordNet concepts with the sentence • All Common Features (ACF): every feature sharing some concept with the sentence • Most Salient Feature (MSF): the sentence is associated with the highest-scoring feature among those sharing concepts with it
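A hedged sketch of the three heuristics, assuming a sentence has already been mapped to a set of WordNet concepts, `clusters` maps each feature to its concept set (from Step I), and `salience` maps each feature to its salience score. Names are illustrative.

```python
def assign_sentence(sentence_concepts, clusters, salience, heuristic="MSF"):
    """Return the feature(s) a sentence talks about, or ["other"] if none."""
    # How many concepts does the sentence share with each feature cluster?
    overlap = {f: len(sentence_concepts & concepts) for f, concepts in clusters.items()}
    candidates = [f for f, n in overlap.items() if n > 0]
    if not candidates:
        return ["other"]

    if heuristic == "MCF":   # Most Common Feature: largest concept overlap
        return [max(candidates, key=overlap.get)]
    if heuristic == "ACF":   # All Common Features: every overlapping feature
        return candidates
    if heuristic == "MSF":   # Most Salient Feature: highest-scoring overlapping feature
        return [max(candidates, key=lambda f: salience[f])]
    raise ValueError(f"unknown heuristic: {heuristic}")
```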
Step III: Quantifying the User Opinions • Objective: Quantifying the opinion expressed by the reviewer on the different product features • Classifying the sentences of each review as positive or negative • Any polarity classification system may be used • Our system uses: • Concepts rather than terms • Emotional categories • Negations and quantifiers
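Since any sentence-level polarity classifier can be plugged in here, the sketch below uses a simple stand-in (bag-of-words logistic regression) rather than the authors' concept- and emotion-based system; its predicted probabilities can later feed the Probability of Polarity strategy in Step IV.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_polarity_classifier(sentences, labels):
    """labels: 1 for positive sentences, 0 for negative ones."""
    clf = make_pipeline(CountVectorizer(lowercase=True),
                        LogisticRegression(max_iter=1000))
    clf.fit(sentences, labels)
    return clf

# Usage (illustrative):
# clf = train_polarity_classifier(train_sentences, train_labels)
# p_positive = clf.predict_proba(["The room was extremely small"])[0, 1]
```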
Step IV: Predicting the Rating of a Review • Objective: Aggregating all previous information to provide an overall rating for the review • Mapping the product review to a VFI • A VFI is a vector of N+1 values representing the N detected features plus an “other” feature • Two strategies for assigning values to the VFI: • Binary Polarity (BP): the position in the VFI of the feature assigned to each sentence is increased or decreased by one according to the polarity of the sentence • Probability of Polarity (PP): the feature position is increased or decreased by the probability calculated by the classifier • Example VFI: [-1.0, 0.0, 0.0, 0.0, …, -1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, …, 1.0]
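A minimal sketch of how a review could be mapped to a VFI under both strategies. Each sentence is assumed to arrive as a triple: the index of its assigned feature (index N reserved for “other”), its predicted polarity as +1/-1, and the classifier’s probability for that polarity. Names are illustrative.

```python
def build_vfi(sentence_info, n_features, strategy="BP"):
    """sentence_info: iterable of (feature_index, polarity, probability) triples."""
    vfi = [0.0] * (n_features + 1)           # one slot per feature, plus "other"
    for feature_index, polarity, probability in sentence_info:
        if strategy == "BP":                  # Binary Polarity: +/- 1 per sentence
            vfi[feature_index] += polarity
        elif strategy == "PP":                # Probability of Polarity: +/- P(class)
            vfi[feature_index] += polarity * probability
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return vfi
```

The resulting VFI is then the instance handed to a standard classifier (the Weka classifiers mentioned later in the experiments) to predict the overall review rating.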
Experimental Setup • HotelReview corpus: 1,000 reviews • Different feature sets: • Feature set 1: 50 reviews → 24 feature clusters and 114 concepts • Feature set 2: 1,000 reviews → 18 feature clusters and 330 concepts • Feature set 3: 1,500 reviews → 18 feature clusters and 353 concepts • Baselines: • Carrillo de Albornoz et al. (2010) • Pang et al. (2002)
Experiment I • Objectives: • To examine the effect of the product feature set • To determine the best heuristic for sentence-to-feature assignment (Most Common Feature, All Common Features and Most Salient Feature) • Task: three-class classification (Poor, Fair and Good) • We use the Binary Polarity strategy for assigning values to the VFI vector
Experiment I - Results Average accuracies for different classifiers, using different feature sets and sentence-to-feature assignment strategies
Experiment I - Discussion • Feature set 2 gives the best results for all classifiers • Accuracy differs little across feature sets, and increasing the number of reviews used for extracting the features does not always improve accuracy • This is because users are concerned with a small set of features, which is also quite consistent across users
Experiment I - Discussion • The Most Salient Feature (MSF) heuristic for sentence-to-feature assignment produces the best outcome • The Most Common Feature (MCF) heuristic gives very similar results • But the All Common Features (ACF) heuristic behaves significantly worse • It seems that only the main feature in each sentence provides useful information for the task
Experiment II • Objectives: • To check whether the Probability of Polarity strategy produces better results than the Binary Polarity strategy • To test the system in a five-class prediction task • Tasks: • Three-class classification (Poor, Fair and Good) • Five-class classification (Very Poor, Poor, Fair, Good and Excellent) • We use Feature set 2 and the MSF strategy for these experiments
Experiment II - Results Average accuracies for different classifiers in the three-class and five-class prediction tasks
Experiment II - Discussion • The Probability of Polarity strategy behaves significantly better than the Binary Polarity strategy • It captures the degree of negativity/positivity of a sentence, not only its polarity • It is clearly not the same to say “The bedcover was a bit dirty” as to say “The bedcover was terribly dirty”
Experiment II - Discussion • The results in the five-class prediction task are considerably lower than in the three-class task • This was expected: • The task is more difficult • The borderline between Poor/Very poor and Good/Excellent instances is fuzzy • Our system significantly outperforms both baselines in all tasks
Conclusions and Future Work • The system performs significantly better than previous approaches • Different product features have a different impact on the overall user opinion • Users are concerned with a relatively small set of product features • The salient features can be easily obtained from a relatively small set of product reviews, without prior knowledge • Differences between the various Weka classifiers are not pronounced
Conclusions and Future Work • Remaining error sources: • Errors propagated from the sentence polarity classifier • Errors in assigning sentences to features • Sentences with too little information: “Dirty. Stinky. Unfriendly. Noisy” • Co-reference problems: “Anyway, everybody else was nice” • Future work: • To evaluate the system on other domains • To port the system to other languages
Thank you! Any questions?