A Joint Model of Text and Aspect Ratings for Sentiment Summarization. Ivan Titov (University of Illinois) Ryan McDonald (Google Inc.) ACL 2008. Introduction. An example of an aspect-based summary Q1: Aspect identification and mention extraction (coarse or fine?) Q2: sentiment classification.
A Joint Model of Text and Aspect Ratings for Sentiment Summarization Ivan Titov (University of Illinois) Ryan McDonald (Google Inc.) ACL 2008
Introduction • An example of an aspect-based summary • Q1: Aspect identification and mention extraction (coarse or fine?) • Q2: sentiment classification
Assumptions for their model • Ratable aspects normally represent coherent topics that can potentially be discovered from co-occurrence information in the text. • The most predictive features of an aspect rating are those derived from the text segments discussing the corresponding aspect.
Multi-Aspect Sentiment model (MAS) • This model consists of two parts: • Multi-Grain Latent Dirichlet Allocation (Titov and McDonald, 2008): builds topics. • A set of sentiment predictors: force specific topics to be correlated with a particular aspect.
MG-LDA (1) • An extension of LDA (Latent Dirichlet Allocation). Standard LDA tends to build topics that globally classify terms into product instances (Creative Labs Mp3 players versus iPods, New York versus Paris hotels). • MG-LDA models both global topics and local topics. • The distribution of global topics is fixed for a document, while the distribution of local topics is allowed to vary across the document.
MG-LDA (2) • Ratable aspects are captured by local topics, while global topics capture properties of the reviewed items. • Example: “. . . public transport in London is straightforward, the tube station is about an 8 minute walk . . . or you can get a bus for £1.50” • This text is a mixture of the global topic London (London, tube, £) • and the ratable aspect location (transport, walk, bus). • Local topics are reused between very different types of items.
MG-LDA (3) • A document is represented as a set of sliding windows, each covering T adjacent sentences. • Each window v in document d has an associated distribution over local topics and a distribution defining the preference for local topics versus global topics. • A word can be sampled using any window covering its sentence s, where the window is chosen according to a categorical distribution. • Because windows overlap, the model can exploit a larger co-occurrence domain. • Symmetrical Dirichlet prior for
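The window mechanism on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the convention that a window is identified by its first sentence index, and the choice to let windows overhang the document boundaries (so that every sentence is covered by exactly T windows) are all assumptions made here.

```python
import random

def windows_covering(s, T):
    """Start indices of the T-sentence windows that cover sentence s.

    Assumption: a window starting at index v covers sentences v .. v+T-1,
    and windows may overhang the document boundaries, so every sentence
    is covered by exactly T windows.
    """
    return list(range(s - T + 1, s + 1))

def sample_window(s, T, psi, rng):
    """Choose one covering window from a categorical distribution.

    psi is a hypothetical list of T non-negative weights, one per
    covering window, in the same order as windows_covering(s, T).
    """
    candidates = windows_covering(s, T)
    return rng.choices(candidates, weights=psi, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    print(windows_covering(5, 3))              # windows starting at 3, 4, 5
    print(sample_window(5, 3, [0.2, 0.3, 0.5], rng))
```

Neighbouring sentences share most of their covering windows, which is one way to see why overlap enlarges the co-occurrence domain available to each local topic.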
Dirichlet distribution: Dir(α) • Its probability density function returns the belief that the probabilities of K rival events are x_i, given that each event has been observed α_i − 1 times. • Figure: probability density of the Dirichlet distribution for K = 3 and various parameter vectors α. Clockwise from top left: α = (6, 2, 2), (3, 7, 5), (6, 2, 6), (2, 3, 4).
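As a small worked example of the density described on this slide, the sketch below evaluates the standard Dirichlet probability density function with the Python standard library only. The function name `dirichlet_pdf` and the input checks are choices made for this illustration.

```python
import math

def dirichlet_pdf(x, alpha):
    """Density of Dir(alpha) at the probability vector x.

    x must contain strictly positive entries that sum to 1;
    alpha must contain strictly positive parameters.
    """
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    if any(xi <= 0 for xi in x) or abs(sum(x) - 1.0) > 1e-9:
        raise ValueError("x must be strictly positive and sum to 1")
    # Log of the normalising constant: Gamma(sum alpha) / prod Gamma(alpha_i)
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    # Log of the kernel: prod x_i ** (alpha_i - 1)
    log_kernel = sum((a - 1) * math.log(xi) for a, xi in zip(alpha, x))
    return math.exp(log_norm + log_kernel)

if __name__ == "__main__":
    # With alpha = (6, 2, 2), mass concentrates near the first event.
    print(dirichlet_pdf([0.6, 0.2, 0.2], [6, 2, 2]))
    print(dirichlet_pdf([0.2, 0.2, 0.6], [6, 2, 2]))
```

With α = (1, 1, 1) the density is flat over the simplex, which corresponds to having observed each event zero times.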
Multi-Aspect Sentiment Model (1) • Assumption: the text of the review discussing an aspect is predictive of its rating. • MAS introduces a classifier for each aspect, which is used to predict its rating. • Only words assigned to that aspect's topic can participate in the prediction of the aspect's sentiment rating. • However, ratings for different aspects can be correlated. Ex. A negative opinion about cleanliness is predictive of low ratings for rooms, service, and dining.
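The restriction that only words assigned to an aspect's topic feed that aspect's classifier can be illustrated as a feature filter. This is a sketch under stated assumptions: the topic labels, the word weights, and the linear scorer below are hypothetical and stand in for whatever classifier the model actually uses.

```python
from collections import Counter

def aspect_features(tokens, topic_assignments, aspect_topic):
    """Count only the words whose topic assignment matches the aspect's topic."""
    return Counter(
        word
        for word, topic in zip(tokens, topic_assignments)
        if topic == aspect_topic
    )

def aspect_score(features, weights):
    """Hypothetical linear scorer: sum of per-word weights times counts."""
    return sum(weights.get(word, 0.0) * count for word, count in features.items())

if __name__ == "__main__":
    tokens = ["dirty", "room", "friendly", "staff"]
    # Hypothetical topic assignments, one per token.
    topics = ["rooms", "rooms", "service", "service"]
    weights = {"dirty": -2.0, "friendly": 1.5}  # made-up illustration weights

    room_feats = aspect_features(tokens, topics, "rooms")
    # "friendly" is ignored for the rooms aspect because it belongs
    # to a different topic.
    print(room_feats, aspect_score(room_feats, weights))
```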
Multi-Aspect Sentiment Model (2) • Reviews also contain opinions about an item in general, without referring to any particular aspect. Ex. "This product is the worst I have ever purchased" -> low ratings for every aspect. • The model therefore bases aspect predictions on an overall sentiment rating and uses the aspect-specific text to compute corrections to it. • N-gram model:
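The idea of an overall rating plus per-aspect corrections can be illustrated with a deliberately simplified additive sketch. This is not the probabilistic formulation used by MAS; the function name, the additive form, and the clipping to the corpus's 1-to-5 rating scale are assumptions made only to show the intuition.

```python
def corrected_rating(overall, correction, low=1, high=5):
    """Shift an overall rating by an aspect-specific correction.

    Simplified illustration: the result is clipped to the rating
    scale [low, high].
    """
    return max(low, min(high, overall + correction))

if __name__ == "__main__":
    # A very negative overall opinion keeps every aspect low,
    # even when one aspect's text pulls slightly upward.
    overall = 1.2
    corrections = {"service": -0.5, "location": 0.8, "rooms": -0.1}  # made-up values
    for aspect, delta in corrections.items():
        print(aspect, corrected_rating(overall, delta))
```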
Inference in MAS • Inference uses Gibbs sampling. • The rating-dependent factor in the sampling distribution appears only if ratings are known.
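One step of a Gibbs sampler draws a new assignment from a conditional distribution given as unnormalised weights. The sketch below shows such a step and reads the slide's note about ratings as an extra multiplicative factor that is applied only when ratings are observed. That reading, the function name, and the weight values are assumptions; the actual conditional distributions of MAS are not reproduced here.

```python
import random

def sample_assignment(base_weights, rating_factors=None, rng=random):
    """Draw an index from unnormalised conditional weights.

    base_weights: text-based weight for each candidate assignment.
    rating_factors: optional per-candidate factor; pass None when
        ratings are not observed, in which case the factor is omitted.
    """
    if rating_factors is None:
        weights = list(base_weights)
    else:
        weights = [b * r for b, r in zip(base_weights, rating_factors)]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    # Without ratings, only the text-based weights matter.
    print(sample_assignment([0.5, 0.3, 0.2], None, rng))
    # With ratings, the extra factor reweights the candidates.
    print(sample_assignment([0.5, 0.3, 0.2], [0.1, 2.0, 1.0], rng))
```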
Experiments - Corpus • Reviews of hotels from TripAdvisor.com. • 10,000 reviews (109,024 sentences, 2,145,313 words in total). • Every review includes ratings for at least 3 aspects: service, location, and rooms. • Ratings range from 1 to 5.
Evaluation • 779 random sentences, each labeled with one or more aspects. • 164, 176, and 263 sentences for service, location, and rooms, respectively.