I256: Applied Natural Language Processing Marti Hearst Oct 2, 2006
Contents • Introduction and Applications • Types of summarization tasks • Basic paradigms • Single document summarization • Evaluation methods From lecture notes by Nachum Dershowitz & Dan Cohen
Introduction • The problem – information overload • 4 billion URLs indexed by Google • 200 TB of data on the Web [Lyman and Varian 03] • Enormous amounts of new information are created every day • One solution – summarization • Abstracts promote current awareness • save reading time • facilitate selection • facilitate literature searches • aid in the preparation of reviews • But what is an abstract?
Introduction • abstract: • a brief but accurate representation of the contents of a document • goal: • take an information source, extract the most important content from it, and present it to the user in a condensed form and in a manner sensitive to the user's needs • compression: • the ratio of the length of the summary to the length of the source; e.g., a 100-word summary of a 1,000-word document has a compression rate of 10%
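Stated compactly (a minimal formulation, assuming length is counted in words or sentences):

```latex
\text{compression rate} = \frac{\text{length of summary}}{\text{length of source}}
```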
History • The problem has been addressed since the 50's [Luhn 58] • Numerous methods are currently being suggested • Most methods still rely on algorithms from the 50's-70's • The problem is still hard, yet there are some applications: • MS Word's AutoSummarize • www.newsinessence.com by Drago Radev's research group
MS Word AutoSummarize [screenshot]
Applications • Abstracts for scientific and other articles • News summarization (mostly multiple-document summarization) • Classification of articles and other written data • Web pages for search engines • Web access from PDAs and cell phones • Question answering and data gathering
Types of Summaries • Indicative vs informative • Informative: a substitute for the entire document • Indicative: gives an idea of what is there • Background • Does the reader have the needed prior knowledge? • Expert reader vs novice reader • Query-based or general • Query-based – the summary is driven by a query or form, and the questions posed should be answered • General – general-purpose summarization
Types of Summaries (input) • Single document vs multiple documents • Domain-specific (chemistry) or general • Genre-specific (newspaper items) or general
Types of Summaries (output) • Extract vs abstract • Extracts – representative paragraphs/sentences/phrases/words, fragments of the original text • Abstracts – a concise summary of the central subjects of the document • Research shows that sometimes readers prefer extracts! • Language chosen for the summary • Format of the resulting summary (table/paragraph/keywords)
Methods • Quantitative heuristics, manually scored • Machine-learning-based statistical scoring methods • Higher semantic/syntactic structures • Network (graph) based methods • Other methods (rhetorical analysis, lexical chains, co-reference chains) • AI methods
Quantitative Heuristics • General method: • score each entity (sentence, word); combine scores; choose best sentence(s) • Scoring techniques: • Word frequencies throughout the text (Luhn 58) • Position in the text (Edmundson 69, Lin & Hovy 97) • Title method (Edmundson 69) • Cue phrases in sentences (Edmundson 69)
Using Word Frequencies (Luhn 58) • The very first work in automated summarization • Assumptions: • Frequent words indicate the topic • Frequent is measured relative to corpus frequency • Clusters of frequent words indicate a summarizing sentence • Stemming based on similar prefix characters • Very common words and very rare words are ignored
[Figure: ranked word frequency – Zipf's curve]
Word frequencies (Luhn 58) • Find consecutive sequences of high-weight keywords • Allow a certain number of gaps of low-weight terms • Sentences with the highest sum of cluster weights are chosen (a sketch of this scoring follows)
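A minimal sketch of Luhn-style cluster scoring in Python (the sentence splitter, gap size, and frequency cutoff are illustrative assumptions, not Luhn's exact parameters):

```python
from collections import Counter
import re

def luhn_sentence_scores(text, max_gap=4, min_freq=2):
    """Score each sentence by Luhn's cluster heuristic:
    (significant words in the best cluster)^2 / cluster span."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)
    # Significant = sufficiently frequent words; a real system would also
    # drop very common words (stopwords) and apply prefix stemming.
    significant = {w for w, f in freq.items() if f >= min_freq}

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        positions = [i for i, t in enumerate(tokens) if t in significant]
        if not positions:
            return 0.0
        best, start, prev, count = 0.0, positions[0], positions[0], 1
        for pos in positions[1:]:
            if pos - prev <= max_gap:   # still inside the current cluster
                count += 1
            else:                       # close the cluster, start a new one
                best = max(best, count ** 2 / (prev - start + 1))
                start, count = pos, 1
            prev = pos
        return max(best, count ** 2 / (prev - start + 1))

    return [(s, score(s)) for s in sentences]
```

The highest-scoring sentences would then be extracted up to the desired compression rate.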
Position in the text (Edmundson 69) • Claim: important sentences occur in specific positions • "lead-based" summary • The inverse of position in the document works well for news • Important information occurs in specific sections of the document (introduction/conclusion)
Title method (Edmundson 69) • Claim: the title of a document indicates its content • Unless editors are being cute • Usually not true for novels • What about blogs…? • Words in the title help find relevant content • Create a list of title words, remove stop words • Use those as keywords to find important sentences (for example, with Luhn's method)
Cue phrases method (Edmundson 69) • Claim: important sentences contain cue words/indicative phrases • "The main aim of the present paper is to describe…" (IND) • "The purpose of this article is to review…" (IND) • "In this report, we outline…" (IND) • "Our investigation has shown that…" (INF) • Some words are considered bonus, others stigma • bonus: comparatives, superlatives, conclusive expressions, etc. • stigma: negatives, pronouns, etc.
Feature combination (Edmundson 69) • Linear combination of 4 features: • title, cue, keyword, position • The weights are adjusted using training data with any minimization technique • Evaluated on a corpus of 200 chemistry articles • Length ranged from 100 to 3,900 words • Judges were told to extract 25% of the sentences, to maximize coherence and minimize redundancy • Features: • position (sensitive to types of headings for sections) • cue • title • keyword • Best results obtained with: cue + title + position (see the sketch below)
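A minimal sketch of the linear combination (the weights, word lists, and lead-based position score are placeholders; Edmundson fit the weights on training data):

```python
def edmundson_score(sentence, index, n_sentences,
                    title_words, keywords, bonus, stigma,
                    weights=(1.0, 1.0, 1.0, 1.0)):
    """Edmundson-style score: a*cue + b*title + c*keyword + d*position.
    title_words, keywords, bonus, and stigma are lowercase word sets."""
    tokens = set(sentence.lower().split())
    cue = sum(w in tokens for w in bonus) - sum(w in tokens for w in stigma)
    title = len(tokens & title_words)
    keyword = len(tokens & keywords)
    # Lead-based position feature: earlier sentences score higher.
    position = (n_sentences - index) / n_sentences
    a, b, c, d = weights
    return a * cue + b * title + c * keyword + d * position
```

The best-performing combination reported above corresponds to zeroing the keyword weight.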
Statistical learning method: Bayesian classifier (Kupiec et al. 95) • Feature set: • sentence length: |S| > 5 • fixed phrases: 26 manually chosen • paragraph: position of the sentence in its paragraph • thematic words: binary • uppercase words: not common acronyms • Label: whether the sentence is included in the manual extract • Corpus: 188 document + summary pairs from scientific journals
Bayesian Classifier (Kupiec et al. 95) • Uses a Bayesian classifier over the features F_1, …, F_k:

$$P(s \in S \mid F_1,\dots,F_k) = \frac{P(F_1,\dots,F_k \mid s \in S)\, P(s \in S)}{P(F_1,\dots,F_k)}$$

• Assuming statistical independence of the features:

$$P(s \in S \mid F_1,\dots,F_k) = \frac{\prod_{j=1}^{k} P(F_j \mid s \in S)\, P(s \in S)}{\prod_{j=1}^{k} P(F_j)}$$

where S is the set of sentences in the manual summary.
Bayesian Classifier (Kupiec et al. 95) • Each probability is estimated empirically from a corpus • Higher-probability sentences are chosen to be in the summary • Performance: • for 25% summaries, 84% precision
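A minimal sketch of the scoring rule (the probability tables are placeholders for the empirical estimates, which in Kupiec et al.'s system came from the 188-pair training corpus):

```python
import math

def kupiec_log_score(features, p_f_given_s, p_f, p_s):
    """Log of P(s in S | F1..Fk) under the independence assumption:
    log P(s in S) + sum_j [log P(Fj | s in S) - log P(Fj)].
    features maps feature name -> observed value; p_f_given_s and p_f
    map (name, value) -> empirical probabilities; p_s = P(s in S)."""
    log_p = math.log(p_s)
    for name, value in features.items():
        log_p += math.log(p_f_given_s[(name, value)])
        log_p -= math.log(p_f[(name, value)])
    return log_p
```

Sentences are ranked by this score and the top fraction (e.g., 25%) is taken as the extract.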
Evaluation methods • When a manual summary is available: 1. choose a granularity (clause; sentence; paragraph), 2. create a similarity measure for that granularity (word overlap; multi-word overlap; perfect match), 3. measure the similarity of each unit in the new summary to the most similar unit(s) in the manual summary, 4. measure recall and precision • Otherwise: 1. Intrinsic – how good is the summary as a summary? 2. Extrinsic – how well does the summary help the user?
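For the manual-summary case, a minimal sketch at sentence granularity with exact match as the similarity measure (both choices are assumptions picked from the options above):

```python
def extract_precision_recall(system_extract, manual_extract):
    """Sentence-level precision and recall of a system extract against
    a manual extract, using exact string match as similarity."""
    system, manual = set(system_extract), set(manual_extract)
    overlap = len(system & manual)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(manual) if manual else 0.0
    return precision, recall
```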
Intrinsic measures • Intrinsic measures (glass-box): how good is the summary as a summary? • Problem: how do you measure the goodness of a summary? • Studies: compare to an ideal (Edmundson 69; Kupiec et al. 95; Salton et al. 97; Marcu 97) or supply criteria such as fluency, informativeness, and coverage (Brandow et al. 95) • The summary is evaluated on its own or by comparing it with the source • Is the text cohesive and coherent? • Does it contain the main topics of the document? • Are important topics omitted?
Extrinsic measures • (Black-box): how well does the summary help a user with a task? • Problem: does summary quality correlate with task performance? • Studies: GMAT tests (Morris et al. 92); news analysis (Miike et al. 94); IR (Mani and Bloedorn 97); text categorization (SUMMAC 98; Sundheim 98) • Evaluation in a specific task: • Can the summary be used instead of the document? • Can the document be classified by reading the summary? • Can we answer questions by reading the summary?
The Document Understanding Conference (DUC) • This is really the text summarization competition • Started in 2001 • Task and evaluation (for 2001-2004): • Various target sizes were used (10-400 words) • Both single- and multiple-document summaries were assessed • Summaries were manually judged for both content and readability • Each peer (human or automatic) summary was compared against a single model summary • using SEE (http://www.isi.edu/~cyl/SEE/) • estimates the percentage of information in the model that was covered in the peer • Also used ROUGE (Lin '04) in 2004 • Recall-Oriented Understudy for Gisting Evaluation • Uses counts of n-gram overlap between the candidate and a gold-standard summary; assumes fixed-length summaries
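A minimal sketch of the core ROUGE-N computation (recall of clipped n-gram overlap against a single reference summary; tokenization is assumed to be done by the caller):

```python
from collections import Counter

def rouge_n(candidate_tokens, reference_tokens, n=2):
    """ROUGE-N recall: clipped n-gram matches / n-grams in the reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate_tokens), ngrams(reference_tokens)
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / max(sum(ref.values()), 1)
```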
The Document Understanding Conference (DUC) • Made a big change in 2005 • An extrinsic evaluation was proposed but rejected (write a natural-disaster summary) • Instead: a complex question-focused summarization task that required summarizers to piece together information from multiple documents to answer a question or set of questions posed in a DUC topic • Topics also indicated a desired granularity of information
The Document Understanding Conference (DUC) • Evaluation metrics for new task: • Grammaticality • Non-redundancy • Referential clarity • Focus • Structure and Coherence • Responsiveness (content-based evaluation) • This was a difficult task to do well in.
Let’s make a summarizer! • Each person (or pair) writes code for one small part of the problem, using Kupiec et al.’s method. • We’ll combine the parts in class.
Next Time • More on Bayesian classification • Other summarization approaches (Marcu paper) • Multi-document summarization (Goldstein et al. paper) • In-class summarizer!