Movie Review Mining and Summarization Li Zhuang, Feng Jing, and Xiao-Yan Zhu ACM CIKM 2006 Speaker: Yu-Jiun Liu Date : 2007/01/10
Outline • Introduction • The characteristic of movie review mining • Definition • Approach • Experiment
Introduction • Reviews are useful to both information promulgators and readers. • However, many reviews are lengthy, with only a few sentences expressing the author’s opinions. • Goal: automatically generate a summary of reviews. • Product review vs. movie review
The characteristic of movie review mining • Promulgators tend to comment on more movie-related elements than just the movie itself. • Readers probably want more than a product-style summary offers. • A movie review summary must therefore be richer than a product review summary. • Proposed: a multi-knowledge based approach.
Definition 1 • Movie Feature • A movie feature is a movie element or a movie-related person that has been commented on. • According to IMDB, feature classes are divided into two groups: ELEMENT and PEOPLE. • ELEMENT: OA, ST (screenplay), SE (special effects), etc. • PEOPLE: PPR, PDR, PAC, etc. • Example: “story”, “script”, and “screenplay” belong to the ST class; “actor”, “actress”, and “supporting cast” belong to the PAC class.
Definition 2 • Relevant Opinion of a Feature • The relevant opinion of a feature is a set of words or phrases that expresses a positive (PRO) or negative (CON) opinion on the feature. • The polarity of the same opinion word may vary across domains. • Example: “predictable” is neutral in product reviews but sounds negative in movie reviews.
Definition 3 • Feature-Opinion Pair • A feature-opinion (F-O) pair consists of a feature and a relevant opinion. • An explicit F-O pair: both the feature and the opinion appear in the sentence. • Example: “The movie is excellent.” • An implicit F-O pair: the feature or the opinion does not appear in the sentence. • Example: “When I watched this film, I hoped it ended as soon as possible.” (no opinion word)
Keyword list generation • Build a keyword list to capture the main feature and opinion words in movie reviews. • Divide the list into two classes: features and opinions.
Feature Keywords • The set of feature words converges: few new feature words appear as more reviews are examined. • Special case: people's names, which appear in multiple formats (e.g., Liu Yu Jiun; Liu Y.J.; L. Y. Jiun … etc.)
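The multi-format name problem can be illustrated with a minimal Python sketch. It is not the authors' method: `name_variants` and `find_people` are hypothetical helpers, and only the three formats shown on the slide are generated.

```python
import re

def name_variants(full_name):
    """Return surface forms of a name, limited to the three formats on the slide."""
    parts = full_name.split()
    if len(parts) < 2:
        return {full_name}
    # First part in full, remaining parts as fused initials: "Liu Y.J."
    trailing_initials = parts[0] + " " + "".join(p[0] + "." for p in parts[1:])
    # Leading parts as spaced initials, last part in full: "L. Y. Jiun"
    leading_initials = " ".join(p[0] + "." for p in parts[:-1]) + " " + parts[-1]
    return {full_name, trailing_initials, leading_initials}

def find_people(text, cast):
    """Return the canonical names from `cast` mentioned in `text` in any variant."""
    found = set()
    for canonical in cast:
        for variant in name_variants(canonical):
            # Lookarounds instead of \b, because a variant may end with "."
            pattern = r"(?<!\w)" + re.escape(variant) + r"(?!\w)"
            if re.search(pattern, text):
                found.add(canonical)
                break
    return found

variants = name_variants("Liu Yu Jiun")
mentioned = find_people("L. Y. Jiun gave the talk.", ["Liu Yu Jiun"])
```

Mapping every variant back to one canonical name lets all mentions of the same person count toward the same PEOPLE feature.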
Opinion Keywords • Statistical results alone are not enough. • The first 100 positive/negative words are selected as seeds. • For each substantive, look up in WordNet the synsets of its first two meanings. If one of the seed words is in those synsets, the substantive is added to the opinion word list. • Remaining opinion words with high frequency are added as domain-specific words.
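The seed-expansion step can be sketched as follows. To stay self-contained, the sketch replaces WordNet with a small hand-made dictionary (`TOY_SYNSETS`, an assumption with invented entries), and `expand_opinion_words` is a hypothetical helper name. A real run would query WordNet for each word's senses instead.

```python
# Toy stand-in for WordNet (assumption): word -> synsets, one per sense, in sense order.
TOY_SYNSETS = {
    "superb": [{"superb", "brilliant"}, {"superb", "fantastic", "great"}],
    "dull": [{"dull", "boring"}, {"dull", "blunt"}],
    "table": [{"table", "tabular array"}, {"table", "desk"}],
    "fine": [{"fine", "penalty"}, {"fine", "thin"}, {"fine", "great"}],
}

def expand_opinion_words(seeds, synsets, max_senses=2):
    """Add every word whose first `max_senses` synsets contain a seed word."""
    seeds = set(seeds)
    expanded = set(seeds)
    for word, senses in synsets.items():
        for sense in senses[:max_senses]:
            if sense & seeds:  # a seed word shares this synset
                expanded.add(word)
                break
    return expanded

opinion_words = expand_opinion_words({"great", "boring"}, TOY_SYNSETS)
```

In this toy data "fine" is not added, because its only synset containing a seed is its third sense and only the first two are checked.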
Mining Explicit F-O Pairs • In each sentence, use the keyword list to find all feature and opinion words. • Use the dependency grammar graph to detect the path between each feature word and each opinion word. • Parser: Stanford Parser (http://www-nlp.stanford.edu/software/lex-parser.shtml)
Mining Explicit F-O Pairs II • Example: “This movie is a masterpiece.” • Path: “movie (NN) – nsubj – is (VBZ) – dobj – masterpiece (NN)”
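Path detection on a dependency graph can be sketched with a breadth-first search. This is only an illustration: the edges are typed in by hand rather than produced by a parser, the two `det` edges are assumptions added to make the graph less trivial, and `dependency_path` is a hypothetical helper.

```python
from collections import deque

def dependency_path(edges, source, target):
    """Shortest path between two words, treating dependency edges as undirected.

    `edges` holds (head, relation, dependent) triples. The result alternates
    words and relation labels, or is None if the words are not connected.
    """
    graph = {}
    for head, rel, dep in edges:
        graph.setdefault(head, []).append((rel, dep))
        graph.setdefault(dep, []).append((rel, head))
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for rel, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [rel, neighbor])
    return None

# Hand-coded edges for "This movie is a masterpiece."
edges = [
    ("is", "nsubj", "movie"),
    ("is", "dobj", "masterpiece"),
    ("movie", "det", "This"),       # assumed edge
    ("masterpiece", "det", "a"),    # assumed edge
]
path = dependency_path(edges, "movie", "masterpiece")
path_text = " – ".join(path)
```

Here `path_text` reproduces the path on the slide, without the part-of-speech tags.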
Mining Implicit F-O Pairs • This problem is difficult, so only two simple cases, both containing opinion words, are handled. • Case 1: very short sentences that appear at the beginning or end of a review and contain obvious opinion words. • Ex: “Great!” → “movie-great” or “film-great” • Case 2: a specific mapping from opinion word to feature word.
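Both simple cases can be sketched in a few lines of Python. The details are assumptions, not taken from the talk: the `max_tokens` threshold for a "very short" sentence, the choice of "movie" as the default feature, and the sample mapping entry "well-acted" → "actor" are all hypothetical.

```python
def implicit_pairs(sentences, opinion_words, opinion_to_feature, max_tokens=3):
    """Return (feature, opinion) pairs for the two simple implicit cases."""
    pairs = []
    last = len(sentences) - 1
    for i, sentence in enumerate(sentences):
        tokens = [t.strip(".,!?\"'").lower() for t in sentence.split()]
        tokens = [t for t in tokens if t]
        for tok in tokens:
            if tok in opinion_to_feature:
                # Opinion word that maps to one specific feature word
                pairs.append((opinion_to_feature[tok], tok))
            elif tok in opinion_words and i in (0, last) and len(tokens) <= max_tokens:
                # Very short first or last sentence with an obvious opinion word
                pairs.append(("movie", tok))
    return pairs

review = [
    "Great!",
    "The plot follows two brothers across the country.",
    "It is well-acted from start to finish.",
]
pairs = implicit_pairs(review, {"great"}, {"well-acted": "actor"})
```

The sketch does not check whether a feature word already appears in the sentence, so in practice it would run only on sentences where explicit mining found no pair.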
Summary Generation • Collect all the sentences that express opinions on a feature class. • Identify the semantic orientation of the relevant opinion in each sentence. • List the organized sentences as the summary.
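The three steps above can be sketched as a grouping operation, assuming each sentence has already been labeled with a feature class and a PRO/CON orientation. The record layout, the `build_summary` helper, and the sample sentences are illustrative assumptions.

```python
from collections import defaultdict

def build_summary(records):
    """Group opinion sentences by feature class, then by orientation (PRO/CON)."""
    summary = defaultdict(lambda: {"PRO": [], "CON": []})
    for sentence, feature_class, polarity in records:
        summary[feature_class][polarity].append(sentence)
    return dict(summary)

def format_summary(summary):
    """Render the grouped sentences as plain text, one feature class per block."""
    lines = []
    for feature_class in sorted(summary):
        lines.append(feature_class)
        for polarity in ("PRO", "CON"):
            for sentence in summary[feature_class][polarity]:
                lines.append(f"  {polarity}: {sentence}")
    return "\n".join(lines)

# Invented sample records: (sentence, feature class, orientation)
records = [
    ("The story is gripping.", "ST", "PRO"),
    ("The script feels predictable.", "ST", "CON"),
    ("The supporting cast is excellent.", "PAC", "PRO"),
]
summary = build_summary(records)
text = format_summary(summary)
```

Keeping PRO and CON sentences in separate lists per feature class matches the PRO/CON distinction in Definition 2.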
Experiments • Performance measure
Data • 11 movies are selected from the IMDB top 250 list. • For each movie, the first 100 reviews are downloaded. • In total, more than 16,000 sentences and more than 260,000 words. • Four movie fans were asked to label F-O pairs and to give the classes of the feature word and the opinion word, respectively.
Results • Use 880 reviews as training data, and 220 reviews as testing data.