This project focuses on using a supervised classification approach to predict the ratings assigned to restaurant reviews. The reviews are collected from Yelp.com and various preprocessing techniques are applied to extract semantic information. The MaxEnt classifier is used for classification, and different variations of classifiers are evaluated. The results and future work are discussed.
A Semantic, Supervised Classification Approach to Restaurant Reviews Pavani Vantimitta
Problem definition • Reviews: an important source of information on new businesses • Use semantic information in the reviews to predict the rating assigned to a review • Use machine learning classifiers, in particular the MaxEnt classifier
Data Collection • Restaurant reviews from yelp.com of places around Palo Alto • Use “Web-harvest”, a web extraction tool, to convert the reviews into text files • Training data comprises 61 restaurants and 1971 reviews; validation data comprises 12 restaurants and 361 reviews; test data comprises 10 restaurants and 260 reviews.
Preprocessing • Removing multiple spaces between words, sentences, multiple punctuation marks • Inserting a space between a punctuation mark and the preceding word • The final data collected contains
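The two cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function name `preprocess` and the set of punctuation marks handled are assumptions.

```python
import re

# Assumption: these are the punctuation marks of interest.
PUNCT = r"[!?.,;:]"

def preprocess(text: str) -> str:
    """Hypothetical cleaning step for one review."""
    # Collapse runs of the same punctuation mark ("!!!" -> "!").
    text = re.sub(r"(" + PUNCT + r")\1+", r"\1", text)
    # Insert a space between a punctuation mark and the preceding word.
    # Note: this also splits decimals such as "3.5" into "3 .5".
    text = re.sub(r"(\w)(" + PUNCT + r")", r"\1 \2", text)
    # Collapse runs of whitespace into a single space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()

cleaned = preprocess("Great   food!!!  Loved it.")
```

Separating punctuation from words keeps a tagger or tokenizer from treating "food!" and "food" as different tokens.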
Part-of-Speech Tagging • Stanford POS tagger
Semantic information • Extracting tags from words enables us to understand to some extent the tone of the review • Aim to use only adjectives (words tagged as ‘JJ’) for classification
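As a rough sketch of the adjective filter, assume the tagger output has already been parsed into `(word, tag)` pairs. The helper name `adjectives` and the lowercasing step are assumptions made for illustration.

```python
def adjectives(tagged_tokens):
    """Keep only words tagged exactly 'JJ' (hypothetical helper)."""
    # Lowercasing is an assumption; it merges "Good" and "good".
    return [word.lower() for word, tag in tagged_tokens if tag == "JJ"]

# Toy tagged sentence, written by hand for this example.
tagged = [
    ("The", "DT"), ("food", "NN"), ("was", "VBD"), ("delicious", "JJ"),
    ("but", "CC"), ("the", "DT"), ("service", "NN"), ("was", "VBD"),
    ("slow", "JJ"),
]
features = adjectives(tagged)
```

Matching the tag exactly excludes comparative and superlative adjectives (Penn Treebank tags `JJR` and `JJS`), which follows the slide's stated choice of 'JJ' only.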
Vocabulary • Full vocabulary (all words tagged as ‘JJ’) • Vocabulary cut short by the count of words { 4,10,50,100,500 } • Vocabulary cut short by comparing words appearing in different rating reviews • Stemming – Lovins Stemmer and Iterated Lovins Stemmer
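One possible reading of the count-based cut is a minimum-frequency threshold: a word stays in the vocabulary only if it occurs at least that many times in the training data. The sketch below follows that reading, which is an assumption, and the name `build_vocabulary` is hypothetical.

```python
from collections import Counter

def build_vocabulary(reviews, min_count=1):
    """Return the set of words seen at least `min_count` times.

    `reviews` is a list of reviews, each a list of adjective strings.
    """
    counts = Counter(word for review in reviews for word in review)
    return {word for word, n in counts.items() if n >= min_count}

# Toy data; the thresholds listed on the slide are 4, 10, 50, 100 and 500.
toy_reviews = [["good", "tasty"], ["good", "slow"], ["good", "tasty"]]
vocab = build_vocabulary(toy_reviews, min_count=2)
```

Raising the threshold shrinks the feature set, trading coverage of rare adjectives for fewer noisy features.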
Variations in classification • V1 : Each rating as a separate class • V2 : Rating 1 as a class and rating 5 as a class • V3 : Ratings 1,2,3 as a class and ratings 4,5 as a class
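The three labelling schemes can be written as a single mapping from a star rating to a class label. This is a sketch under assumptions: the function name `to_label`, the label strings "low" and "high", and the choice to drop ratings 2 to 4 under V2 are illustrative and not taken from the slides.

```python
def to_label(rating, variation):
    """Map a 1-5 star rating to a class label (hypothetical helper)."""
    if variation == "V1":
        # Five classes, one per rating.
        return rating
    if variation == "V2":
        # Only the extremes are kept; None means "skip this review".
        return rating if rating in (1, 5) else None
    if variation == "V3":
        # Binary split: ratings 1-3 versus ratings 4-5.
        return "low" if rating <= 3 else "high"
    raise ValueError(f"unknown variation: {variation}")

labels_v3 = [to_label(r, "V3") for r in (1, 2, 3, 4, 5)]
```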
MaxEnt Classifier: Variation 3: Best feature set has 33 features
Future Work • Sentence boundary detection • Incorporate N-gram models • Predict a rating for each sentence in a review, then average the results to obtain the rating for the full review. This takes conflicting tones within a review into account.
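The sentence-level averaging idea could look roughly like the sketch below. It is illustrative only: the naive regex split stands in for real sentence boundary detection, and `predict_sentence` is a hypothetical callable that returns a numeric rating for one sentence.

```python
import re

def predict_review_rating(review, predict_sentence):
    """Average per-sentence predictions into one review-level rating."""
    # Naive split after '.', '!' or '?' followed by whitespace (assumption).
    parts = re.split(r"(?<=[.!?])\s+", review)
    sentences = [s.strip() for s in parts if s.strip()]
    if not sentences:
        raise ValueError("review contains no sentences")
    return sum(predict_sentence(s) for s in sentences) / len(sentences)

# Toy scorer used only to exercise the function.
def toy_scorer(sentence):
    return 5 if "great" in sentence else 1

score = predict_review_rating(
    "The food was great. The service was terrible.", toy_scorer
)
```

A review that praises the food and criticises the service lands between the two extremes instead of being dominated by either sentence.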