THE ANDREW W. MELLON FOUNDATION Stylistics in Customer Reviews of Cultural Objects Xiao Hu, J. Stephen Downie The International Music Information Retrieval Systems Evaluation Lab (IMIRSEL) University of Illinois at Urbana-Champaign
Agenda • Motivation • Customer reviews in epinions.com • Experiments • Genre classification • Rating classification • Usage classification • Feature studies • Conclusions & Future Work
Motivation • Online customer reviews of cultural objects: • User-generated → user-centered retrieval • Detailed descriptions → contextual info. • Large amount → rich resource • Self-organized ground truth • Text mining: • Mature techniques and handy tools • Review mining: a place to apply stylistic text analysis!
(Diagram) Motivation: Customer reviews (Epinions.com, Amazon.com, …) → Classify reviews (genres, ratings, usages) → Identify user descriptions (prominent features) → Connect to objects (user-centered access points)
Customer Reviews • Published on www.epinions.com • Focused on books, movies and music • Each review is associated with: • a genre label • a numerical quality rating • a recommended usage (for music reviews)
(Screenshot of an example review, annotated: numerical rating; associated full text, to be analyzed; recommended usage)
Genre Taxonomy (music) • 28 major genre categories: Jazz, Rock, Country, Classical, Blues, Gospel, Punk, .… • Further categories: Renaissance, Medieval, Baroque, Romantic, …
Experiments • To build and evaluate a prototype system that could automatically: • predict the genre of the work being reviewed • predict the quality rating assigned to the reviewed item • predict the usage recommended by the reviewer • discover distinctive features contributing to each of the above
Models and Methods • Prediction problem: • Naïve Bayesian (NB) Classifier • Computationally efficient • Empirically effective • Hierarchical clustering (for usage prediction only) • Feature analysis: • Frequent pattern mining • Naïve Bayesian feature ranking
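The slides do not specify which NB variant was used. A minimal sketch, assuming a multinomial model with add-one (Laplace) smoothing over already-tokenized reviews; the class `NaiveBayes` and its methods are hypothetical names, not the authors' implementation:

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (assumed variant)."""

    def fit(self, docs, labels):
        # docs: list of token lists; labels: parallel list of class labels
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for tokens in docs for t in tokens}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, tokens, c):
        # Unnormalised log P(c | doc) = log P(c) + sum of log P(w | c)
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[c] / n_docs)
        denom = self.totals[c] + len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][t] + 1) / denom)
        return score

    def predict(self, tokens):
        # Pick the class with the highest (unnormalised) log posterior
        return max(self.class_counts, key=lambda c: self.log_posterior(tokens, c))
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied, which matters for long reviews.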
Data Preprocessing • HTML tags were stripped out; • Stop words were NOT stripped out; • Punctuation was NOT stripped out; • Both may carry stylistic information • Tokens were stemmed
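A rough sketch of the preprocessing steps listed above, assuming lowercasing and a regular-expression tokenizer (neither is stated on the slide). `crude_stem` is a hypothetical toy suffix stripper standing in for a real stemmer such as Porter's; the slide does not say which stemmer was used:

```python
import re

TAG_RE = re.compile(r"<[^>]+>")
# ASCII-only for simplicity: runs of letters/digits/apostrophes form words,
# and every other non-space character becomes its own punctuation token.
TOKEN_RE = re.compile(r"[a-z0-9']+|[^\sa-z0-9']")


def crude_stem(token):
    """Toy suffix stripper (hypothetical stand-in for a real stemmer).
    Keeps at least three characters of the stem."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(html):
    text = TAG_RE.sub(" ", html).lower()  # strip HTML tags
    tokens = TOKEN_RE.findall(text)       # stop words and punctuation are kept
    return [crude_stem(t) for t in tokens]
```

For example, `preprocess("<p>The songs were <b>amazing</b>!</p>")` keeps "the" and "!" as tokens, since both may carry stylistic signal.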
Genre Classification • Data set
Genre Classification Results • 5-fold random cross validation for book and movie reviews • 3-fold random cross validation for music reviews
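Random k-fold cross validation can be sketched as follows, assuming a caller-supplied `train_and_score` callback (hypothetical) that trains on one split and returns accuracy on the held-out split; k = 5 or k = 3 matches the settings above:

```python
import random


def random_kfold(n_items, k, seed=0):
    """Shuffle item indices once, then deal them into k near-equal folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(docs, labels, k, train_and_score, seed=0):
    """Mean score over k folds; each fold is held out once for testing."""
    folds = random_kfold(len(docs), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(docs)) if i not in held]
        scores.append(train_and_score(
            [docs[i] for i in train], [labels[i] for i in train],
            [docs[i] for i in held_out], [labels[i] for i in held_out]))
    return sum(scores) / len(scores)
```

Fixing the seed makes the random fold assignment reproducible across runs.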
Rating Classification • Five-class classification • 1 star vs. 2 stars vs. 3 stars vs. 4 stars vs. 5 stars • Binary group classification • 1 star + 2 stars vs. 4 stars + 5 stars • Ad extremis classification • 1 star vs. 5 stars • 5-fold random cross validation for book and movie review experiments • 5-fold cross validation for music review experiments
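One way to derive labels for the three rating tasks from star ratings, assuming hypothetical task names and `positive`/`negative` labels. Reviews that a task excludes (e.g. 3-star reviews in the binary group task) get `None` and are dropped:

```python
def rating_label(stars, task):
    """Map a 1-5 star rating to a class label for one task, or None if excluded."""
    if task == "five_class":
        return stars
    if task == "binary_group":  # 1+2 stars vs. 4+5 stars; 3 stars dropped
        if stars <= 2:
            return "negative"
        return "positive" if stars >= 4 else None
    if task == "ad_extremis":   # 1 star vs. 5 stars only
        return {1: "negative", 5: "positive"}.get(stars)
    raise ValueError(f"unknown task: {task}")


def build_dataset(reviews, task):
    """reviews: iterable of (tokens, stars) pairs; keeps only reviews the task uses."""
    data = []
    for tokens, stars in reviews:
        label = rating_label(stars, task)
        if label is not None:
            data.append((tokens, label))
    return data
```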
Usage Classification • Each music review has one usage suggested by the reviewer • It is chosen from a ready-made list of 13 usages • The 11 most popular usages were chosen for the experiments
Data and initial results • 10-fold cross validation
Usage super-classes • Frequent confusions: a measure of similarity • Hierarchical clustering based on the confusion matrix
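A minimal sketch of clustering classes from a confusion matrix. It assumes (our choice, not stated on the slide) that the similarity of two classes is their symmetrized confusion count normalized by the two classes' row totals, and that clusters are merged by average linkage. Function names are hypothetical:

```python
def confusion_similarity(conf):
    """Similarity of classes i and j: how often each is predicted as the other,
    divided by the two classes' combined row totals (assumed normalisation)."""
    n = len(conf)
    row_totals = [sum(row) for row in conf]
    return [[(conf[i][j] + conf[j][i]) / (row_totals[i] + row_totals[j]) if i != j else 0.0
             for j in range(n)] for i in range(n)]


def average_linkage(sim, names):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    highest average pairwise similarity; returns the sequence of merges."""
    clusters = [[i] for i in range(len(names))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                s /= len(clusters[a]) * len(clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        _, a, b = best
        merges.append(([names[i] for i in clusters[a]], [names[i] for i in clusters[b]]))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges
```

Cutting the merge sequence before its last steps yields the super-classes; classes that are often confused end up merged early.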
(Dendrogram: hierarchical clustering of the 11 usages into super-classes Relaxing (R1, R2) and Stimulating (S1, S2). Usages: Going to sleep, Listening, Reading or studying, Romancing, Cleaning the house, At work, Hanging out with friends, Getting ready to go out, Driving, Waking up, Exercising.)
Classification on usage super-classes • 10-fold cross validation
Feature studies • What makes the classes distinguishable? • What are important features? • How important are they? • Two techniques applied • Frequent Pattern Mining • Naïve Bayesian Feature Ranking • Focus on music reviews
Frequent Pattern Mining (FPM) • Originally used to discover association rules • Finds patterns consisting of items that frequently occur together in individual transactions • Items = candidate words (terms), selected according to the specific question • Transactions = review sentences
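An Apriori-style sketch of frequent pattern mining with sentences as transactions and terms as items. The slides do not name the mining algorithm; Apriori, the absolute `min_support` threshold and the `max_size` cap are assumptions:

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size items that
    occur in at least min_support transactions (sentences)."""
    transactions = [set(t) for t in transactions]
    result = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    size = 1
    while candidates and size <= max_size:
        # Support = number of sentences containing every item of the candidate
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        size += 1
        # Apriori idea: only unions of frequent itemsets can be frequent,
        # so next-size candidates are built from this round's survivors.
        candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == size}
    return result
```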
Positive and negative descriptive patterns • Recall: rating classification on music reviews
Positive and negative descriptive patterns • Mining frequent descriptive patterns in positive and negative reviews • Items: adjectives, adverbs, verbs and negatives • No nouns, no stop words
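To see whether a term separates positive from negative reviews, its support can be compared across the two sets. A sketch, assuming sentences have already been filtered to the allowed parts of speech (the tagging step is not shown) and using hypothetical function names:

```python
from collections import Counter


def relative_support(sentences):
    """Fraction of sentences containing each term (counted once per sentence)."""
    counts = Counter(term for s in sentences for term in set(s))
    return {t: n / len(sentences) for t, n in counts.items()}


def compare_classes(pos_sentences, neg_sentences, min_support):
    """(term, positive support, negative support) for every term that is
    frequent in at least one of the two classes, sorted by term."""
    pos = relative_support(pos_sentences)
    neg = relative_support(neg_sentences)
    terms = {t for t in pos.keys() | neg.keys()
             if max(pos.get(t, 0.0), neg.get(t, 0.0)) >= min_support}
    return sorted((t, pos.get(t, 0.0), neg.get(t, 0.0)) for t in terms)
```

A term with high support in both columns does not discriminate between positive and negative reviews on its own.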
Single-term patterns • Good = Bad?! • Digging deeper:
"good" in a negative context • Negation: “Nothing is good.” “It just doesn't sound good.” • Band and song titles: “Good Charlotte, you make me so mad.” “Feels So Good is dated and reprehensibly bad.” • Rhetoric: “And this is a good ruiner: …” “What a waste of my good two dollars…” • Faint praise: “…the only good thing… is the packaging.” • Expressions: “You all have heard … the good old cliché.”
Double-term patterns • Good vs. Bad?! • Digging deeper and deeper:
Noun patterns in genre classification • Recall: genre classification on music reviews
Noun patterns in genre classification • Studied four popular genres • Only nouns considered
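Naïve Bayesian Feature Ranking (NBFR) • Based on the NB text categorization model • Prediction in binary classification cases:

$$\log\frac{P(C_j \mid d_i)}{P(\bar{C}_j \mid d_i)} = \log\frac{P(C_j)}{P(\bar{C}_j)} + \sum_{w \in d_i} \log\frac{P(w \mid C_j)}{P(w \mid \bar{C}_j)}$$

where $\bar{C}_j$ denotes "not $C_j$" and the sum runs over the word occurrences in $d_i$. • If the log-odds is $> 0$, $d_i$ is in $C_j$ • If it is $< 0$, $d_i$ is not in $C_j$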
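In a binary NB model the log-odds of class membership is a sum of per-term log-ratios $\log \frac{P(w \mid C_j)}{P(w \mid \bar{C}_j)}$, so terms can be ranked by that value. A sketch, assuming add-one smoothing and a hypothetical function name; it is not necessarily the authors' exact ranking score:

```python
import math
from collections import Counter


def nb_feature_ranking(docs_in, docs_out, top_n=10):
    """Rank terms by log P(w | C) - log P(w | not C), each term's per-occurrence
    contribution to the binary NB log-odds (add-one smoothing assumed)."""
    c_in = Counter(t for d in docs_in for t in d)
    c_out = Counter(t for d in docs_out for t in d)
    vocab = c_in.keys() | c_out.keys()
    n_in = sum(c_in.values()) + len(vocab)    # smoothed denominator for C
    n_out = sum(c_out.values()) + len(vocab)  # smoothed denominator for not C
    score = {w: math.log((c_in[w] + 1) / n_in) - math.log((c_out[w] + 1) / n_out)
             for w in vocab}
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Positive scores mark terms that push a document toward $C_j$; the most negative scores mark terms typical of the other class.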
Features in usage super-classes • Recall: classification on usage super-classes
Top-ranked terms in super-classes • Terms in parentheses were added manually for clarity
Artist-usage relationship • Binomial exact test on artists with >10 reviews (p < 0.05)
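An exact two-sided binomial test can be computed directly from the binomial distribution. A sketch, assuming that for each artist the test compares the number of reviews `k` (out of `n`) recommending a given usage against a baseline proportion `p0`, such as that usage's share across all reviews; the slides do not state the null hypothesis, and the function name is hypothetical:

```python
from math import comb


def binomial_two_sided_p(k, n, p0):
    """Exact two-sided binomial test: total probability of all outcomes
    that are no more likely than the observed count k under Binomial(n, p0)."""
    pmf = [comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so equally likely outcomes are not lost to rounding
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-7)))
```

An artist-usage pair would then be flagged when the artist has more than 10 reviews and the p-value is below 0.05.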
Implementation & T2K (demo) • Data preprocessing and NB classifier • Text-to-Knowledge (T2K) Toolkit • A text mining framework • Ready-to-use modules and itineraries • Integrated natural language processing tools • Supports fast prototyping of text mining
Conclusions • Text analysis of user-generated reviews of cultural objects • NB for genre, rating, and usage classification • Feature studies: FPM and NBFR • Customer reviews are good resources for connecting users’ opinions to cultural objects, and thus for facilitating information access via novel, user-oriented facets.
Future work • More text mining techniques • Other critical texts • blogs, wikis, etc. • Feature studies • other kinds of features
Questions? IMIRSEL Thank you!