Modeling and Exploiting Review Helpfulness for Summarization
Diane Litman
Professor, Computer Science Department; Senior Scientist, Learning Research & Development Center; Co-Director, Intelligent Systems Program, University of Pittsburgh
Joint work with Wenting Xiong, Computer Science (PhD Dissertation; now at IBM)
Online reviews • Online reviews are influential in customer decision-making
Online peer reviews • Student peer reviews have been used for grading assignments in Massive Open Online Courses (MOOCs) • Online peer-review software • E.g. SWoRD, developed at the University of Pittsburgh
While reviews thrive on the internet… Overwhelming! Mixed quality!
Review metadata includes user-provided quality assessments (e.g., helpfulness votes) Research Problem 1: What if helpfulness metadata is not available?
Helpfulness metadata has been used to facilitate review exploration Research Problem 2: What about helpfulness for summarization?
Outline • Introduction • Challenges for NLP • Review content analysis for helpfulness prediction • From customer reviews to peer reviews • A general helpfulness model based on review text • Helpfulness-guided review summarization • Human summary analysis • User studies • Conclusions
Challenges for NLP • The definition of review helpfulness varies • E.g. Educational aspects of peer reviews
Product review examples • More helpful review: personal experience, product support • Less helpful review: comparison with iPad
Peer review examples
• Expert-rated helpfulness = 5 (criticism; problem localization; solution): "I thought there were some good opportunities to provide further data to strengthen your argument. For example the statement 'These methods of intimidation, and the lack of military force offered by the government to stop the KKK, led to the rescinding of African American democracy.' Maybe here include data about how…" (omit 126 words)
• Expert-rated helpfulness = 2 (praise): "The author also has great logic in this paper. How can we consider the United States a great democracy when everyone is not treated equal. All of the main points were indeed supported in this piece."
• Problem localization and solutions are significantly correlated with the likelihood of feedback implementation <Nelson and Schunn 2009>
Challenges for NLP • The definition of review helpfulness varies • E.g. Educational aspects of peer reviews • Review content may have multiple sources • E.g. A description of movie plot
Review content from multiple sources • The external content is highlighted in green
• Product reviews: "The Nikon D3100 is a very good entry-level digital SLR. Clearly targeted toward the beginner, its combination of Guide Modes, assist images, and help screens easily makes it the most accessible of any D-SLR out there."
Review content from multiple sources • The external content is highlighted in green
• Movie reviews: "…Schultz tells Django to pick out whatever he likes. Django looks at the smiling white man in disbelief. You're gonna let me pick out my own clothes? Django can't believe it. The following shot delivered one of the biggest laughs from the audience I watched the film with. …"
• Peer reviews: "The paragraph about Abraham Lincoln's actions towards the former slaves is not clear. Which social and political reforms were not made quickly by Lincoln? It may well be true that Lincoln did not accomplish everything he intended before his assassination, but this sentence is too vague to know whether the writer is historically accurate."
Challenges for NLP • The definition of review helpfulness varies • E.g. Educational aspects of peer reviews • Review content may have multiple sources • E.g. A description of movie plot • User helpfulness ratings are not at a fine granularity • E.g. At the paragraph rather than the sentence level
Identifying review helpfulness at a fine granularity • An example: I really like this camera. It has 10x optical, image stabilization, a 3.0-inch LCD with 230,000 pixels, and more. The size is great for a 10x zoom camera. Image stabilization is great for getting shots that would come out blurry with my Canon Powershot A620. My other favorite feature besides the zoom and image stabilization is the wide angle. It is great to finally get cityscapes and have the whole skyline in one shot!! And with the camera set to 16X9, I can get a 24mm shot!
Identifying review helpfulness at a fine granularity • Sentence-level review helpfulness prediction
Identifying review helpfulness at a fine granularity • Highlight the most helpful sentences
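As a minimal sketch of this idea (not the talk's actual pipeline), one can score each sentence of a review with a trained helpfulness regressor and surface the top-scoring ones. The helpers `model` and `featurize` below are hypothetical stand-ins for a trained model and its feature extractor.

```python
# Minimal sketch: highlight the most helpful sentences of a review.
# `model` (a trained regressor) and `featurize` (its feature extractor)
# are assumed to exist; they are not part of the original talk.
from nltk.tokenize import sent_tokenize

def highlight_helpful(review_text, model, featurize, top_k=2):
    """Score each sentence with the helpfulness model; return the top-k."""
    sentences = sent_tokenize(review_text)
    scores = [model.predict([featurize(s)])[0] for s in sentences]
    ranked = sorted(zip(sentences, scores), key=lambda pair: -pair[1])
    return ranked[:top_k]
```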
Challenges for NLP • The definition of review helpfulness varies • E.g. Educational aspects of peer reviews • Review content may have multiple sources • E.g. A description of movie plot • User helpfulness ratings are not at a fine granularity • E.g. At the paragraph rather than the sentence level • Existing summarization heuristics are not designed for reviews • E.g. Similarity of word distributions • Specialized subject pools are needed for user studies • E.g. Students or teachers for peer reviews
Research questions • Can we model review helpfulness based on review textual content automatically? • Can we improve summarization performance by introducing review helpfulness?
Outline • Introduction • Challenges for NLP • Review content analysis for helpfulness prediction • From customer reviews to peer reviews • A general helpfulness model based on review text • Helpfulness-guided review summarization • Human summary analysis • User studies • Conclusions
Automatically assessing peer-review helpfulness Our approach – Adaptation • From product reviews <Kim et al 2006> to peer reviews • Introduce peer-review domain knowledge
Annotated peer-review corpus • Collected from a college-level introductory history class: 22 papers and 267 reviews
• Paper ratings and review helpfulness ratings provided by experts
• Prior annotations <Nelson and Schunn 2009>
• Feedback types -- praise, summary, criticism (Kappa = .92)
• For criticisms: localization information of the problem (pLocalization, Kappa = .69) and concrete solution to problems (Solution, Kappa = .87)
• Annotated example (feedbackType = criticism, pLocalization = True, Solution = True): "I thought there were some good opportunities to provide further data to strengthen your argument. For example the statement 'These methods of intimidation, and the lack of military force offered by the government to stop the KKK, led to the rescinding of African American democracy.' Maybe here include data about how…" (omit 126 words)
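The reported agreement figures are Cohen's kappa; the sketch below shows how such a check is typically computed with scikit-learn. The two label lists are made-up stand-ins, not the corpus's real annotations.

```python
# Illustrative inter-annotator agreement check (Cohen's kappa), as used
# for the feedbackType / pLocalization / Solution annotations.
# The label lists below are invented examples, not the real data.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["criticism", "praise", "summary", "criticism", "praise"]
annotator_b = ["criticism", "praise", "criticism", "criticism", "praise"]
print(cohen_kappa_score(annotator_a, annotator_b))  # 1.0 would mean perfect agreement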
Adaptation from product reviews to peer reviews • Generic features motivated by prior work on product reviews <Kim et al 2006> • Topic words are automatically extracted from students’ papers using publicly available software (by Annie Louis 2008) • Sentiment words are extracted from General Inquirer Dictionary
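A rough sketch of such generic features in the spirit of <Kim et al 2006> follows: structural counts (STR) plus topic- and sentiment-word counts. The word lists here are tiny placeholders; the actual topic words come from students' papers and the sentiment words from the General Inquirer dictionary.

```python
# Sketch of generic review features in the spirit of <Kim et al 2006>.
# TOPIC_WORDS / POSITIVE_WORDS / NEGATIVE_WORDS are placeholders for the
# automatically extracted topic words and General Inquirer sentiment words.
from collections import Counter

TOPIC_WORDS = {"democracy", "slavery", "reconstruction"}   # placeholder
POSITIVE_WORDS = {"good", "great", "clear"}                # placeholder
NEGATIVE_WORDS = {"vague", "weak", "unclear"}              # placeholder

def generic_features(review_text):
    tokens = review_text.lower().split()
    counts = Counter(tokens)
    return {
        "num_tokens": len(tokens),                      # STR: review length
        "num_sentences": review_text.count(".") + 1,    # STR: crude sentence count
        "topic_word_count": sum(counts[w] for w in TOPIC_WORDS),
        "pos_word_count": sum(counts[w] for w in POSITIVE_WORDS),
        "neg_word_count": sum(counts[w] for w in NEGATIVE_WORDS),
    }
```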
Introducing domain knowledge • Peer-review specialized features
Experiment 1 • Comparison • Generic features vs. peer-review specialized features • Algorithm • SVM Regression (SVMlight) • Evaluation • 10-fold cross validation • Pearson correlation coefficient r
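As a rough illustration of this setup (the talk used SVMlight; scikit-learn's SVR stands in here), the sketch below runs 10-fold cross-validation and scores the pooled predictions with Pearson's r. X (a NumPy feature matrix) and y (expert helpfulness ratings) are assumed inputs.

```python
# Sketch of the evaluation setup: SVM regression, 10-fold cross-validation,
# Pearson correlation coefficient r. scikit-learn's SVR stands in for SVMlight.
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import KFold
from sklearn.svm import SVR

def cv_pearson(X, y, n_splits=10):
    """Return Pearson's r between cross-validated predictions and gold ratings."""
    preds = np.zeros(len(y))
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train, test in folds.split(X):
        model = SVR(kernel="linear").fit(X[train], y[train])
        preds[test] = model.predict(X[test])
    r, _ = pearsonr(preds, y)
    return r
```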
Results – Analysis of the generic features • Most helpful features: STR • Best feature combination: STR+UGR+META • Feature redundancy effect: combining all features together does not add up their predictive power
Results – Analysis of the peer-review features • Introducing peer-review specific features enhances performance • Feature redundancy effect is reduced after replacing UGR with Lexical Categories
Outline • Introduction • Challenges for NLP • Review content analysis for helpfulness prediction • From customer reviews to peer reviews • A general helpfulness model based on review text • Helpfulness-guided review summarization • Human summary analysis • User studies • Conclusions
Modeling review helpfulness based on content patterns of multiple sources • High-level representation of review content patterns • Differentiating review content sources
Content patterns – LU • Linguistic Inquiry and Word Count (LIWC) <Pennebaker et al. 2007> • To examine review language usage patterns
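A hedged sketch of LIWC-style language-usage features: the proportion of review tokens falling into each word category. The category dictionary below is a toy stand-in, since the real LIWC dictionaries are proprietary.

```python
# Sketch of LIWC-style language-usage (LU) features: per-category token
# proportions. TOY_LIWC is a made-up stand-in for the real LIWC dictionaries.
TOY_LIWC = {
    "cognitive": {"think", "believe", "consider", "know"},
    "tentative": {"maybe", "perhaps", "might"},
    "negation":  {"not", "never", "no"},
}

def liwc_features(review_text):
    tokens = review_text.lower().split()
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    return {cat: sum(t in words for t in tokens) / n
            for cat, words in TOY_LIWC.items()}
```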
Content patterns – CD • Language entropy over the word distribution <Stark et al. 2012>
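This feature reduces to the Shannon entropy of the review's word distribution; a minimal sketch, following the idea in <Stark et al. 2012>:

```python
# Minimal sketch of content diversity (CD): language entropy over the
# review's word distribution.
import math
from collections import Counter

def language_entropy(review_text):
    tokens = review_text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())
```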
Content patterns – hRT • Statistical topic modeling -- sLDA <Blei et al 2007> • Introduce document information (the helpfulness rating) as supervision
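True sLDA jointly learns topics and the response variable; as a rough, runnable approximation (not the talk's method), the sketch below fits unsupervised LDA and then regresses helpfulness ratings on the inferred topic proportions.

```python
# Rough approximation of the sLDA idea: unsupervised LDA topics used as
# features for a helpfulness regression. (sLDA itself learns topics and
# the response jointly, which this sketch does not do.)
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LinearRegression

def topic_regression(review_texts, ratings, n_topics=10):
    X = CountVectorizer(stop_words="english").fit_transform(review_texts)
    theta = LatentDirichletAllocation(
        n_components=n_topics, random_state=0).fit_transform(X)
    return LinearRegression().fit(theta, ratings)
```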
Differentiating review content sources • Feature extraction with respect to different content sources • Internal content: reviewers' judgments • External content: reviewers' references to the reviewed item • Consider review external content as external topic words • Topic signature acquisition algorithm <Lin and Hovy, 2000> • Software: TopicS <Nenkova and Louis, 2008> • E.g. In the Django excerpt shown earlier, the plot retelling is external content
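The topic-signature step can be sketched as a log-likelihood ratio test in the style of <Lin and Hovy, 2000>: a word is topical if its rate in the topic corpus (e.g., the reviewed paper or the movie) differs sharply from its rate in a background corpus. A minimal version of the statistic:

```python
# Sketch of topic-signature word scoring <Lin and Hovy, 2000>: a
# log-likelihood ratio comparing a word's counts in the topic corpus
# (k1 out of n1 tokens) against a background corpus (k2 out of n2).
import math

def _log_l(k, n, p):
    """Binomial log-likelihood, guarding the p = 0 or 1 edge cases."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    return k * math.log(p) + (n - k) * math.log(1 - p)

def llr(k1, n1, k2, n2):
    """-2 log lambda; large values mark the word as a topic signature."""
    p = (k1 + k2) / (n1 + n2)
    return 2 * (_log_l(k1, n1, k1 / n1) + _log_l(k2, n2, k2 / n2)
                - _log_l(k1, n1, p) - _log_l(k2, n2, p))
```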
Data • Three domains
• Camera reviews: from Amazon.com <Jindal and Liu 2008>
• Movie reviews: collected from IMDB.com
• Educational peer reviews <Xiong and Litman 2011>
• Each camera/movie review received helpfulness votes from more than 3 users
• Helpfulness gold standard
• Camera/movie reviews: based on user helpfulness votes <Kim et al. 2006>
• Peer reviews: 5-point expert ratings <Nelson and Schunn 2009>
Experiment 2 • Comparison • Content patterns (LU, CD, hRT) vs. unigram • Content patterns + others vs. unigram + others • Content sources: F, I, E, I+E • Algorithm • SVM Regression (SVMlight) • Evaluation • 10-fold cross validation • Pearson correlation coefficient r
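A hedged sketch of how the content-source conditions (F, I, E, I+E) might be assembled: split a review's sentences into external (containing topic-signature words) versus internal, then extract features from each side. The split criterion and the helpers `topic_signature` and `featurize` are assumptions for illustration, not the talk's actual code.

```python
# Sketch of the content-source conditions: F (full review), I (internal),
# E (external), and I+E (concatenated feature vectors). `topic_signature`
# (a set of topic words) and `featurize` (returning a feature list) are
# hypothetical helpers.
from nltk.tokenize import sent_tokenize

def source_features(review_text, topic_signature, featurize):
    internal, external = [], []
    for sent in sent_tokenize(review_text):
        words = set(sent.lower().split())
        (external if words & topic_signature else internal).append(sent)
    feats = {
        "F": featurize(review_text),
        "I": featurize(" ".join(internal)),
        "E": featurize(" ".join(external)),
    }
    feats["I+E"] = feats["I"] + feats["E"]  # list concatenation of the two vectors
    return feats
```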
Experiment 2 – Feature results • The proposed features outperform unigrams for movie and peer reviews • The best result is in bold • Significant improvements over baselines are noted with + • Unigrams work best for camera reviews • The same pattern holds after down-sampling • Domain difficulty: movie > peer > camera (?)
Experiment 2 – Feature results • Content patterns + others vs. unigram + others • The same pattern holds
Experiment 2 – Content source results • The best content source is in bold for each feature type • Significant improvements over F are in purple • Movie reviews • Peer reviews • For movie reviews: external > internal • For both: internal + external yields the most predictive models (LU+CD+hRT)
Lessons learned • Techniques used in predicting product review helpfulness can be effectively adapted to the new peer-review domain • Prediction performance can be further improved by incorporating features that capture helpfulness information specific to peer reviews • Content features which capture review content patterns at a high level work better than unigrams for predicting review helpfulness • Review content source also matters in modeling review helpfulness; differentiating content sources yields better performance
Outline • Introduction • Challenges for NLP • Review content analysis for helpfulness prediction • From customer reviews to peer reviews • A general helpfulness model based on review text • Helpfulness-guided review summarization • Human summary analysis • User studies • Conclusions