Multi-Document Summarization of Evaluative Text
Giuseppe Carenini, Raymond T. Ng, Adam Pauls
Computer Science Dept., University of British Columbia, Vancouver, CANADA
EACL 2006
Motivation and Focus • Large amounts of info expressed in text form are constantly produced • News, reports, reviews, blogs, emails… • Pressing need to summarize • Considerable prior work, but mostly limited to factual info
Our Focus • Evaluative documents (good vs. bad, right vs. wrong) about a single entity • Customer reviews (e.g. Amazon.com) • Travel logs about a destination • Teaching evaluations • User studies (!) . . .
Our Focus • We want to do this: given reviews like “The Canon G3 is a great camera. . .”, “Though great, the G3 has bad menus. . .”, “I love the Canon G3! It . . .”, produce a summary like “Most users liked the Canon G3. Even though some did not like the menus, many . . .”
Two Approaches • Automatic summarizers generally produce two types of summaries: • Extracts: A representative subset of text from the original corpus • Abstracts: Generated text which contains the most relevant info from the original corpus
Two Approaches (cont'd) • Extract-based summarizers generally fare better for factual summarization (cf. DUC 2005) • But extracts aren't well suited to capturing evaluative info • Can't express the distribution of opinions (‘some/all’) • Can't aggregate opinions either numerically or conceptually • So we tried both
Two Approaches (cont'd) • Extract-based approach (MEAD*): • Based on the MEAD (Radev et al. 2003) framework for summarization • Augmented with knowledge of evaluative info (I'll explain later) • Abstract-based approach (SEA): • Based on the GEA (Carenini & Moore, 2001) framework for generating evaluative arguments about an entity
Pipeline Approach (for both): Evaluative Documents → Extraction of evaluative info → Organization of extracted info → Selection of extracted info → Presentation of extracted info • Extraction: shared
Extracting evaluative info • We adopt the previous work of Hu & Liu (2004) (but many others exist . . .) • Their approach extracts: • What features of the entity are evaluated • The strength and polarity of the evaluation on the [-3 … +3] interval • Approach is (mostly) unsupervised
Examples • “the menus are easy to navigate and the buttons are easy to use. it is a fantastic camera ……” • “… the canon computer software used to download , sort , . . . is very easy to use. the only two minor issues i have with the camera are the lens cap ( it is not very snug and can come off too easily). . . .”
Feature Discovery • “the menus are easy to navigate and the buttons are easy to use. it is a fantastic camera …” • “…… the canon computer software used to download , sort , . . . is very easy to use. the only two minor issues i have with the camera are the lens cap ( it is not very snug and can come off too easily). . . .” • Discovered features: menus, buttons, camera, software, lens cap
Strength/Polarity Determination • “the menus are easy to navigate(+2) and the buttons are easy to use(+2). it is a fantastic(+3) camera …” • “…… the canon computer software used to download , sort , . . . is very easy to use (+3). the only two minor issues i have with the camera are the lens cap ( it is not very snug (-2) and can come off too easily (-2))...”
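As a concrete sketch, the output of this extraction step can be thought of as a list of (feature, strength) records per sentence. The record type and variable names below are illustrative, not taken from the authors' system:

```python
from dataclasses import dataclass

@dataclass
class Evaluation:
    feature: str   # crude feature the customer evaluated, e.g. "menus"
    strength: int  # signed polarity/strength on the [-3, +3] scale

# Evaluations extracted from:
# "the menus are easy to navigate and the buttons are easy to use.
#  it is a fantastic camera"
extracted = [
    Evaluation("menus", +2),
    Evaluation("buttons", +2),
    Evaluation("camera", +3),
]

# downstream stages can then filter and aggregate these records
positive = [e for e in extracted if e.strength > 0]
print(len(positive))  # all three evaluations here are positive
```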
Pipeline (recap): Organization of extracted info (partially shared between SEA and MEAD*)
Organizing Extracted Info • Extraction provides a bag of features • But • features are redundant • features may range from concrete and specific (e.g. “resolution”) to abstract and general (e.g. “image”) • Solution: map features to a hierarchy [Carenini, Ng, & Zwart 2005]
Feature Ontology (excerpt; crude features such as “canon”, “canon g3”, “digital camera”, “menu”, “menus”, “lever” map onto ontology nodes):
Canon G3 Digital Camera [-1,-1,+1,+2,+2,+3,+3,+3]
  User Interface [+1]
    Menus [+1]
    Buttons [+2,+2,+2,+3,+3]
  Convenience
    Battery
      Battery Life
      Battery Charging System [-1,-1,-2]
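The hierarchy on this slide can be sketched as a small tree whose nodes carry the raw strength scores mapped to them. The node class and exact values are illustrative, not the paper's data structures:

```python
class FeatureNode:
    """A node in the feature ontology; `evals` holds the [-3, +3] strengths
    of all crude-feature evaluations mapped onto this node."""
    def __init__(self, name, evals=(), children=()):
        self.name = name
        self.evals = list(evals)
        self.children = list(children)

camera = FeatureNode(
    "Canon G3 Digital Camera", [-1, -1, +1, +2, +2, +3, +3, +3],
    [
        FeatureNode("User Interface", [+1], [
            FeatureNode("Menus", [+1]),
            FeatureNode("Buttons", [+2, +2, +2, +3, +3]),
        ]),
        FeatureNode("Convenience", [], [
            FeatureNode("Battery", [], [
                FeatureNode("Battery Life"),
                FeatureNode("Battery Charging System", [-1, -1, -2]),
            ]),
        ]),
    ])
```

Each crude feature ("menus", "lever", "canon g3", …) contributes its evaluations to exactly one node, so redundancy among surface terms is resolved by the mapping.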
Organization: SEA vs. MEAD* • SEA operates only on the hierarchical data and forgets about raw extracted features • MEAD* operates on the raw extracted features and only uses hierarchy for sentence ordering (I'll come back to this)
Pipeline (recap): Selection of extracted info (not shared between SEA and MEAD*)
Feature Selection: SEA • We define a measure of importance (moi) for each feature f_i in the hierarchy, computed from the polarity/strengths ps_k of the evaluations attached to f_i and its descendants (e.g. Canon G3 Digital Camera [-1,-1,+1,+2,+2,+3,+3,+3], User Interface [+1], Convenience, …)
Selection Procedure • Straightforward greedy selection would not work: if a node derives most of its importance from its child(ren), including both the node and the child(ren) would be redundant • ⇒ Dynamic greedy selection: until the desired number of features is selected, (1) the most important node is selected, (2) that node is removed from the tree, (3) the importance of the remaining nodes is recomputed • Similar to the redundancy reduction step in many automatic summarization algorithms
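The dynamic greedy procedure can be sketched as follows. The importance measure here (sum of absolute strengths over a node and its not-yet-selected descendants) is a stand-in with invented details; the paper's actual moi differs in its exact form:

```python
class FeatureNode:
    def __init__(self, name, evals=(), children=()):
        self.name, self.evals, self.children = name, list(evals), list(children)

def select_features(root, k):
    """Dynamic greedy selection: repeatedly pick the most important node,
    drop its contribution, and recompute importance of the rest."""
    removed = set()

    def importance(node):
        # A selected ("removed") node no longer contributes its own
        # evaluations, so an ancestor that derived its importance from
        # that node loses importance on the next round.
        own = 0 if node.name in removed else sum(abs(s) for s in node.evals)
        return own + sum(importance(c) for c in node.children)

    def all_nodes(node):
        yield node
        for c in node.children:
            yield from all_nodes(c)

    selected = []
    nodes = list(all_nodes(root))
    for _ in range(min(k, len(nodes))):
        best = max((n for n in nodes if n.name not in removed), key=importance)
        selected.append(best.name)
        removed.add(best.name)
    return selected

camera = FeatureNode("Camera", [+3, -1], [
    FeatureNode("User Interface", [+1], [FeatureNode("Menus", [+2, +2])]),
])
print(select_features(camera, 2))  # ['Camera', 'User Interface']
```

Note how the root wins the first round because it aggregates the whole subtree; once it is removed, the remaining nodes compete on their own merits, which is the redundancy-reduction effect the slide describes.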
Feature Selection: MEAD* • MEAD* selects sentences, not features • Calculate a score for each sentence s_i from the evaluations ps_k of the features it mentions, e.g. “the menus are easy to navigate(+2) and the buttons are easy to use(+2)” • Break ties with the MEAD centroid (a common feature in multi-document summarization)
Feature Selection: MEAD* • We want to extract sentences for the most important features, and only one sentence per feature • Put each sentence in a “bucket” for each feature it evaluates: “the menus are easy to navigate(+2) and the buttons are easy to use(+2)” goes into both the menus and buttons buckets; “I like the menus . . .” goes into the menus bucket only
Feature Selection: MEAD* • Take the (single) highest scoring sentence from the “fullest” buckets until desired summary length is reached
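A minimal sketch of this bucket-based selection (data layout and scores are invented for illustration; the real system also breaks ties with the MEAD centroid, which is omitted here):

```python
from collections import defaultdict

def mead_star_select(sentences, max_sentences):
    """sentences: list of (text, score, features-the-sentence-evaluates)."""
    buckets = defaultdict(list)
    for text, score, feats in sentences:
        for f in feats:
            buckets[f].append((score, text))

    chosen, used = [], set()
    # fullest buckets first; at most one sentence per feature bucket
    for f in sorted(buckets, key=lambda f: -len(buckets[f])):
        if len(chosen) == max_sentences:
            break
        candidates = [(s, t) for s, t in buckets[f] if t not in used]
        if candidates:
            _, text = max(candidates)  # highest-scoring unused sentence
            chosen.append(text)
            used.add(text)
    return chosen

reviews = [
    ("the menus are easy to navigate and the buttons are easy to use.", 4,
     ["menus", "buttons"]),
    ("i like the menus.", 2, ["menus"]),
    ("the battery life is poor.", 2, ["battery"]),
]
print(mead_star_select(reviews, 2))
```

Because the first sentence already covers both menus and buttons, the buttons bucket yields nothing new, and the remaining slot goes to the battery sentence.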
Pipeline (recap): Presentation of extracted info (not shared between SEA and MEAD*)
Presentation: MEAD* • Display selected sentences in order from most general (top of feature hierarchy) to most specific • That's it!
Presentation: SEA • SEA (Summarizer of Evaluative Arguments) is based on GEA (Generator of Evaluative Arguments) (Carenini & Moore, 2001) • GEA takes as input • a hierarchical model of features for an entity • objective values (good vs. bad) for each feature of the entity • Adaptation is (in theory) straightforward
Possible GEA Output: The Canon G3 is a good camera. Although the interface is poor, the image quality is excellent.
Target SEA Summary: Most users thought the Canon G3 was a good camera. Although several users did not like the interface, almost all users liked the image quality.
Extra work • What GEA gives us: • High-level text plan (i.e. content selection and ordering) • Cue phrases for argumentation strategy (“In fact”, “Although”, etc.) • What GEA does not give us: • Appropriate micro-planning (lexicalization) • Need to give indication of distribution of customer opinions
Microplanning (incomplete!) • We generate one clause for each selected feature • Each clause includes 3 key pieces of information: • Distribution of customers who evaluated the feature (“many”, “most”, “some”, etc.) • Name of the feature (“menus”, “image quality”, etc.) • Aggregate of opinions (“excellent”, “fair”, “poor”, etc.) • “most users found the menus to be poor”
Microplanning • Distribution is (roughly) based on the fraction of customers who evaluated the feature (+ disagreement . . . ) • Name of the feature is straightforward • Aggregate of opinions is based on a function similar in form to the measure of importance, but averaging polarity/strength over all evaluations rather than summing
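The three pieces of information can be realized with simple threshold functions. The cut-off values and wording tables below are invented for this sketch; the paper does not publish its exact thresholds:

```python
def quantifier(fraction):
    """Map the fraction of customers who evaluated the feature to a
    distribution word (thresholds are illustrative)."""
    if fraction > 0.9:
        return "almost all"
    if fraction > 0.6:
        return "most"
    if fraction > 0.3:
        return "several"
    return "some"

def adjective(strengths):
    """Map the *average* polarity/strength (not the sum) to an
    evaluative adjective (thresholds are illustrative)."""
    avg = sum(strengths) / len(strengths)
    if avg >= 2.5:
        return "excellent"
    if avg >= 1.5:
        return "very good"
    if avg >= 0.5:
        return "good"
    if avg > -0.5:
        return "mediocre"
    if avg > -1.5:
        return "poor"
    return "terrible"

def clause(feature, strengths, n_customers):
    """Combine distribution, feature name, and opinion aggregate."""
    q = quantifier(len(strengths) / n_customers)
    return f"{q} users found the {feature} to be {adjective(strengths)}"

print(clause("menus", [-2, -1, -1], 4))
# -> "most users found the menus to be poor"
```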
Microplanning • We “glue” clauses together using cue phrases from GEA • Also perform basic aggregation
Formative Evaluation • Goal: test users’ perceived effectiveness • Participants: 28 undergraduate students • Procedure • Participants pretended they worked for the manufacturer • Given 20 reviews (from either the Camera or DVD corpus) and asked to generate a summary (~100 words) for the marketing dept • After 20 mins, given a summary of the 20 reviews • Asked to fill out a questionnaire assessing summary effectiveness (multiple choice and open form)
Formative Evaluation (cont'd) • Conditions: User given one of 4 summaries • Topline summary (human) • Baseline summary (vanilla MEAD) • MEAD* summary • SEA summary
Quantitative Results • Responses on a scale from 1 (Strongly disagree) to 5 (Strongly agree) • [Per-question rating charts not reproduced here]
Qualitative Results: MEAD* • Surprising: many participants didn't notice or didn't mind verbatim text extraction • Two major complaints about content • (1) Summary was not representative (negative sentence extracted even though the majority were positive) • (2) Evaluations of some features were repeated • (2) could be addressed, but (1) can only partially be fixed with pure extraction
Qualitative Results: SEA • Some complaints about the “robotic” feel of the summary, and about repetition/lack of pronouns • Need to do more complex microplanning • Some wanted more details (which “manual features . . .”) • Note: this complaint was absent with MEAD* • Some disagreed with the feature selection (precision/recall), but this is a problem even with human summaries
Conclusions • Extraction works surprisingly well even for evaluative summarization • Topline > MEAD* ≈ SEA > Baseline • Need to combine strengths of SEA and MEAD* for evaluative summarization • Need the detail, variety, and natural-sounding text provided by extraction • Need to generate opinion distributions • Need argument structure from SEA (?)
Other Future Work • Automatically induce the feature hierarchy • Produce summaries tailored to user preferences for the evaluated entity • Summarize corpora of evaluative documents about more than one entity
Examples MEAD*: Bottom line , well made camera , easy to use, very flexible and powerful features to include the ability to use external flash and lense / filters choices . It has a beautiful design , lots of features, very easy to use , very configurable and customizable , and the battery duration is amazing! Great colors , pictures and white balance. The camera is a dream to operate in automode , but also gives tremendous flexibility in aperture priority , shutter priority, and manual modes . I ’d highly recommend this camera for anyone who is looking for excellent quality pictures and a combination of ease of use and the flexibility to get advanced with many options to adjust if you like.
Examples SEA: Almost all users loved the Canon G3 possibly because some users thought the physical appearance was very good. Furthermore, several users found the manual features and the special features to be very good. Also, some users liked the convenience because some users thought the battery was excellent. Finally, some users found the editing/viewing interface to be good despite the fact that several customers really disliked the viewfinder. However, there were some negative evaluations. Some customers thought the lens was poor even though some customers found the optical zoom capability to be excellent. Most customers thought the quality of the images was very good.
Examples MEAD: I am a software engineer and am very keen into technical details of everything i buy , i spend around 3 months before buying the digital camera ; and i must say , g3 worth every single cent i spent on it . I do n’t write many reviews but i ’m compelled to do so with this camera . I spent a lot of time comparing different cameras , and i realized that there is not such thing as the best digital camera . I bought my canon g3 about a month ago and i have to say i am very satisfied .