TEXT SUMMARIZERS: Research and development on the automated creation of summaries of one or more texts. Rohit Yaduvanshi, Anurag Meena, Yogendra Singh Dabi
Overview • Principal approaches to summarization • Design and implementation of summarizers • Summary generation • Methods of evaluating summaries • LexRank and TextRank • Conclusion
Summary? A summary is a text that is produced from one or more texts, that contains a significant portion of the information in the original text(s), and that is no longer than half of the original text(s).
Types • Indicative summaries provide an idea of what the text is about without giving any content. • Informative summaries provide a shortened version of the content. • Extracts are summaries created by reusing portions (words, sentences, etc.) of the input text verbatim. • Abstracts are created by re-generating the extracted content.
Stages of Automated Summarization • Topic identification: identifies the most important unit(s) (words, sentences, paragraphs, etc.). • Interpretation: fusion of concepts, evaluation, and other processing. • Summary generation: the results of interpretation are usually unreadable abstract representations, so systems include a generation stage to produce human-readable text.
Recall and Precision Given an input text, a human’s extract, and a system’s extract, these scores quantify how closely the system’s extract corresponds to the human’s. • correct = the number of sentences extracted by both the system and the human • wrong = the number of sentences extracted by the system but not by the human • missed = the number of sentences extracted by the human but not by the system • Precision = correct / (correct + wrong) • Recall = correct / (correct + missed)
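The definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming each extract is represented as a collection of sentence indices; the function name `precision_recall` and the sample indices are hypothetical.

```python
def precision_recall(system_extract, human_extract):
    """Score a system extract against a human extract (both: sentence indices)."""
    system, human = set(system_extract), set(human_extract)
    correct = len(system & human)   # extracted by both
    wrong = len(system - human)     # extracted by the system only
    missed = len(human - system)    # extracted by the human only
    precision = correct / (correct + wrong) if system else 0.0
    recall = correct / (correct + missed) if human else 0.0
    return precision, recall


# Hypothetical extracts: the system picked sentences 0, 2, 5, 7; the human picked 0, 1, 2.
precision, recall = precision_recall([0, 2, 5, 7], [0, 1, 2])
print(precision, recall)
```

In this made-up case two sentences are correct, two are wrong and one is missed, so precision is 2/4 and recall is 2/3.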
Criteria to assign scores • Positional criteria: certain locations in the text (headings, titles, first paragraphs, etc.) tend to contain important information. • Cue phrase indicator criteria: in some genres, certain words and phrases (‘significant’, ‘in this paper we show’) explicitly signal importance, so sentences containing them should be extracted.
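Criteria to assign scores (cont.) • Word and phrase frequency criteria: if a text contains some words unusually frequently, then sentences containing these words are probably important. • Query and title overlap criteria: sentences that share words with the title or with the user’s query score higher. • Combination of various module scores: the individual scores are combined into a single score per sentence.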
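As a rough sketch of how these criteria might be combined, the Python below scores each sentence with four simple modules and a weighted sum. Everything specific here is an assumption made for illustration: the stopword list, the 1/(i+1) position score, the normalization of the frequency score, and the equal default weights. The slides do not prescribe any of them.

```python
import re
from collections import Counter

CUE_PHRASES = ("significant", "in this paper we show")  # examples named on the slide
# Tiny hypothetical stopword list, only for this illustration.
STOPWORDS = {"the", "a", "an", "of", "in", "is", "and", "to", "we", "this", "that", "are"}


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def score_sentences(sentences, title="", weights=(1.0, 1.0, 1.0, 1.0)):
    """Return one combined score per sentence (position, cue, frequency, title overlap)."""
    w_pos, w_cue, w_freq, w_title = weights
    freq = Counter(t for s in sentences for t in tokenize(s) if t not in STOPWORDS)
    max_freq = max(freq.values(), default=1)
    title_words = set(tokenize(title)) - STOPWORDS
    scores = []
    for i, sentence in enumerate(sentences):
        words = [t for t in tokenize(sentence) if t not in STOPWORDS]
        position = 1.0 / (i + 1)  # assumption: earlier sentences matter more
        cue = 1.0 if any(p in sentence.lower() for p in CUE_PHRASES) else 0.0
        frequency = sum(freq[w] for w in words) / (max_freq * len(words)) if words else 0.0
        overlap = len(title_words & set(words)) / len(title_words) if title_words else 0.0
        scores.append(w_pos * position + w_cue * cue + w_freq * frequency + w_title * overlap)
    return scores


sentences = [
    "Text summarization shortens documents.",
    "In this paper we show a significant result.",
    "The weather was nice.",
]
print(score_sentences(sentences, title="Text summarization"))
```

A linear combination is only one option; the weights could also be tuned on a corpus of human extracts.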
Stage 2 - Interpretation or topic fusion • During interpretation, the topics identified as important are fused, represented in new terms, and expressed using a new formulation, using concepts or words not found in the original text. • Interpretation is what distinguishes extract-type summarization systems from abstract-type systems. • No system can perform interpretation without prior knowledge about the domain. • But acquiring enough prior domain knowledge is so difficult that summarizers to date have only attempted it in a small way.
Approach • Hovy and Lin (1999) use topic signatures (sets of words and relative strengths of association, each set related to a single headword) to perform topic fusion. • By automatically constructing these signatures, they overcome the knowledge paucity problem. • They use these topic signatures both during topic identification and during interpretation. • The effectiveness of signatures for interpretation has not yet been shown.
Stage 3 - Generation • The results of abstraction are unreadable. • Extracts are seldom coherent: • Repetition of material • Omitted discourse linkages • So a summary generation stage is needed to produce human-readable text.
Text Planning • The facts must be organized so as to signal the causal, logical and intentional relationships between them.
Example: • Poorly planned: The system performs the enhancement. Before that, the system resolves conflicts. First, the system asks the user to tell it the characteristic of the program to be enhanced. The system applies transformations to the program. It confirms the enhancement with the user. It scans the program in order to find opportunities to apply transformations to the program. • Well planned: The system asks the user to tell it the characteristic of the program to be enhanced. Then the system applies transformations to the program. In particular, the system scans the program in order to find opportunities to apply transformations to the program. Then the system resolves conflicts. It confirms the enhancement with the user. Finally, it performs the enhancement.
Sentence Planning • Generate semantic and syntactic specifications that can actually be realized in natural language. • Example: (a) This sentence contains two noun phrases. (b) It contains them. Sentence Realization • This is a purely linguistic level, which takes the choices about words and syntactic structures made during sentence planning and constructs a sentence using them.
Dysfluencies: • Repetition of clauses • Repetition of named entities • Inclusion of less important material
Approaches: 1) Lexical connectedness • Score the degree of lexical connectedness between potential passages and the remainder of the text. • Connectedness may be measured by the number of shared words. 2) Knight and Marcu • The EM algorithm is used to find the maximum likelihood parameters of a statistical model. • Ultimately, this approach can likely be used for shortening two sentences into one, three into two (or one), and so on. 3) Jing and McKeown • They train a hidden Markov model to identify where in the document each (fragment of each) summary sentence resides.
Multi-Document Summarization Three major problems: • Recognizing and coping with redundancy • Identifying important differences among documents • Ensuring summary coherence
SUMMONS: • Takes an information extraction approach. • All input documents are parsed into templates. • Clusters the templates according to their contents, and then applies predefined rules to extract items of major import. • Other systems measure the similarity of a candidate passage to already-selected passages and retain it only if it contains enough new (dissimilar) information.
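Determination of additional material • Identify the units most relevant to the user’s query. • MMR (Maximal Marginal Relevance) technique: selects the most relevant sentences while avoiding redundancy. • In extractive summarization, the final score of a given sentence S_i in MMR is calculated as: MMR(S_i) = λ × Sim1(S_i, D) − (1 − λ) × Sim2(S_i, Summ), where Sim1 measures the similarity of S_i to the document D, Sim2 measures its similarity to the summary Summ built so far, and λ trades off relevance against redundancy.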
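A greedy selection loop based on the MMR score might look like the sketch below. It assumes bag-of-words cosine similarity for both Sim1 and Sim2 and whitespace tokenization. The slide fixes neither choice, and the names `cosine`, `mmr_select` and the default λ = 0.7 are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mmr_select(sentences, k, lam=0.7):
    """Greedily pick k sentence indices by lam*Sim1(S_i, D) - (1 - lam)*Sim2(S_i, Summ)."""
    bags = [Counter(s.lower().split()) for s in sentences]
    document = sum(bags, Counter())  # D: all sentences pooled together
    selected = []
    while len(selected) < min(k, len(sentences)):
        summary = sum((bags[j] for j in selected), Counter())  # Summ: chosen so far
        candidates = [i for i in range(len(sentences)) if i not in selected]
        best = max(
            candidates,
            key=lambda i: lam * cosine(bags[i], document) - (1 - lam) * cosine(bags[i], summary),
        )
        selected.append(best)
    return selected


sentences = [
    "the cat sat on the mat",
    "the cat sat on the mat",
    "dogs chase cars downtown",
]
print(mmr_select(sentences, k=2, lam=0.5))
```

With λ = 0.5 the exact duplicate of the first pick is penalized enough that the unrelated third sentence is chosen second. A λ closer to 1 favors relevance over novelty.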
Previous Evaluation Studies Two approaches: • Intrinsic evaluations measure output quality (only). • Extrinsic evaluations measure user assistance in task performance.
Intrinsic Evaluations • Create a set of ideal summaries, one for each test text. • Compare the summarizer’s output to the ideal summary, measuring content overlap, often by sentence or phrase recall and precision, but sometimes by simple word overlap.
Intrinsic (cont.) Alternatively, rate systems’ summaries on some scale: • Readability • Informativeness • Fluency • Coverage
Extrinsic Evaluation • Easy to motivate. • But one has to ensure that the metric applied correlates well with task performance efficiency. • Examples of extrinsic evaluation can be found in: • Morris, Kasper, and Adams (1992) for GMAT testing • Miike et al. (1994) for news analysis • Mani and Bloedorn (1997) for information retrieval
TIPSTER-SUMMAC • Largest extrinsic evaluation study to date (Firmin Hand and Sundheim 1998; Firmin and Chrzanowski 1999). • Two main extrinsic evaluation tasks were defined. • Ad hoc task: the focus was on indicative summaries tailored to a particular topic. • Categorization task: the evaluation sought to find out whether a generic summary could effectively present enough information to allow an analyst to quickly and correctly categorize a document.
Results of study • Donaway, Drummey, and Mather (2000) showed how summaries receive different scores with different measures, or when compared to different ideal summaries. • Jing et al. (1998) compare several evaluation methods, intrinsic and extrinsic, on the same extracts. • With regard to summary length, they find great variation. • They find fairly high consistency in the news genre, as long as the summary (extract) length is fixed as relatively short.
Two basic measures In general, to be a summary, a text must obey two requirements: • it must be shorter than the original input text; • it must contain the important information of the original. Two measures capture the extent to which a summary S conforms to these requirements with regard to a text T: • Compression Ratio: CR = (length S) / (length T) • Retention Ratio: RR = (info in S) / (info in T)
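The two ratios can be illustrated with a small Python sketch. Length is measured in whitespace-separated words, and "information" is approximated, purely as an assumption for this example, by which terms from a hand-picked list of key terms appear. The slides do not say how information content should be counted, and the sample text and key terms are invented.

```python
def compression_ratio(summary, text):
    """CR = (length S) / (length T), with length counted in words."""
    return len(summary.split()) / len(text.split())


def retention_ratio(summary, text, key_terms):
    """RR = (info in S) / (info in T), approximating info by covered key terms."""
    text_words = set(text.lower().split())
    summary_words = set(summary.lower().split())
    in_text = {term for term in key_terms if term in text_words}
    if not in_text:
        return 0.0
    return len(in_text & summary_words) / len(in_text)


# Hypothetical text, summary and key terms.
text = "the storm closed the airport and delayed forty flights on monday"
summary = "storm closed airport"
key_terms = {"storm", "airport", "flights", "monday"}

print(compression_ratio(summary, text))           # 3 words out of 11
print(retention_ratio(summary, text, key_terms))  # 2 of 4 key terms kept
```

A good summary in this framing has a low CR together with a high RR.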
• Fig. (a): as the summary gets longer, it includes more information, until it equals the original. • Fig. (b): a more desirable situation: at some special point, the addition of just a little more text to the summary adds a disproportionately large amount of information. • Fig. (c): another desirable situation: quite early, most of the important material is included in the summary; as the length grows, the added material is less interesting.
Measuring information content • The Expert Game. Ask experts to underline and extract the most interesting or informative fragments of the text. Measure recall and precision of the system’s summary against the human’s extract. • The Question Game. This measure approximates the information content of S by determining how well it allows readers to answer questions drawn up about T.
Approaches: LexRank and TextRank In both LexRank and TextRank, a graph is constructed in which • each sentence in the document is a vertex, and • the edges between sentences are based on some form of semantic similarity or content overlap.
How are edges formed? • LexRank uses the cosine similarity of TF-IDF vectors, where tf_{w,s} is the number of occurrences of the word w in the sentence s. • TextRank uses a very similar measure based on the number of words two sentences have in common.
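The two similarity measures can be sketched as follows. The exact forms are assumptions based on the commonly cited definitions: an idf-weighted cosine for LexRank with idf_w = log(N / n_w), and for TextRank the number of shared words divided by the sum of the log sentence lengths. Sentences are assumed to be pre-tokenized lists of words, and the function names are hypothetical.

```python
import math
from collections import Counter


def idf_table(sentences):
    """idf_w = log(N / n_w), where n_w is the number of sentences containing w."""
    n = len(sentences)
    df = Counter(w for s in sentences for w in set(s))
    return {w: math.log(n / df[w]) for w in df}


def idf_modified_cosine(x, y, idf):
    """LexRank-style similarity: cosine of tf*idf vectors of sentences x and y."""
    tf_x, tf_y = Counter(x), Counter(y)  # tf_{w,s}: occurrences of w in s
    num = sum(tf_x[w] * tf_y[w] * idf[w] ** 2 for w in tf_x if w in tf_y)
    norm_x = math.sqrt(sum((tf_x[w] * idf[w]) ** 2 for w in tf_x))
    norm_y = math.sqrt(sum((tf_y[w] * idf[w]) ** 2 for w in tf_y))
    return num / (norm_x * norm_y) if norm_x and norm_y else 0.0


def textrank_overlap(x, y):
    """TextRank-style similarity: shared words, normalized by log sentence lengths."""
    if not x or not y:
        return 0.0
    denom = math.log(len(x)) + math.log(len(y))
    return len(set(x) & set(y)) / denom if denom > 0 else 0.0


sentences = [
    ["cats", "chase", "mice"],
    ["dogs", "chase", "cats"],
    ["stocks", "fell", "sharply"],
]
idf = idf_table(sentences)
print(idf_modified_cosine(sentences[0], sentences[1], idf))
print(textrank_overlap(sentences[0], sentences[1]))
```

Note that a word occurring in every sentence gets idf 0 under this definition and so contributes nothing to the LexRank-style score.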
• In the unweighted version, edges are formed between sentences whose similarity is greater than a threshold. • In the weighted version, the similarity scores are used as edge weights.
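Both graph variants can be built from a precomputed similarity matrix, as in the sketch below. The matrix values and the threshold of 0.1 are invented for illustration, and leaving out self-loops is an assumption of this sketch rather than something the slides specify.

```python
def build_graph(sim, threshold=None):
    """Turn a square similarity matrix into an adjacency matrix.

    threshold=None  -> weighted graph: similarity scores become edge weights.
    threshold=value -> unweighted graph: edge (weight 1) where similarity > value.
    """
    n = len(sim)
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption: no self-loops
            if threshold is None:
                graph[i][j] = float(sim[i][j])
            elif sim[i][j] > threshold:
                graph[i][j] = 1.0
    return graph


# Hypothetical pairwise similarities between three sentences.
sim = [
    [1.0, 0.3, 0.05],
    [0.3, 1.0, 0.2],
    [0.05, 0.2, 1.0],
]
print(build_graph(sim, threshold=0.1))  # unweighted
print(build_graph(sim))                 # weighted
```

A higher threshold gives a sparser unweighted graph; the weighted variant keeps every nonzero similarity.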
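How are summaries formed? In both algorithms, the sentences are ranked by applying PageRank to the resulting graph. A summary is formed by combining the top-ranking sentences, using a threshold or length cutoff to limit the size of the summary. PageRank equation (in the form used in the LexRank paper): p(u) = d/N + (1 − d) × Σ_{v ∈ adj[u]} p(v) / deg(v), where N is the number of nodes (sentences), adj[u] is the set of nodes adjacent to u, deg(v) is the degree of node v, and d is the damping factor, typically chosen from [0.1, 0.2].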
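The ranking step can be sketched with a plain power iteration. This assumes the form p(u) = d/N + (1 − d) Σ p(v)/deg(v), in which d is the probability of jumping to a random sentence (so d = 0.15 plays the role that 0.85 plays in formulations that put the damping factor on the link term). Dividing by the sum of outgoing weights for weighted graphs, the renormalization step, and the names `pagerank` and `summarize` are all assumptions of this sketch.

```python
def pagerank(graph, d=0.15, iterations=100, tol=1e-8):
    """Power iteration on an adjacency matrix (graph[v][u] = weight of edge v -> u)."""
    n = len(graph)
    out = [sum(row) for row in graph]  # deg(v), or total outgoing weight
    p = [1.0 / n] * n
    for _ in range(iterations):
        new = []
        for u in range(n):
            incoming = sum(p[v] * graph[v][u] / out[v] for v in range(n) if out[v] > 0)
            new.append(d / n + (1 - d) * incoming)
        total = sum(new)  # renormalize in case isolated nodes leaked probability mass
        new = [x / total for x in new]
        converged = max(abs(a - b) for a, b in zip(new, p)) < tol
        p = new
        if converged:
            break
    return p


def summarize(sentences, graph, max_sentences=2):
    """Keep the top-ranked sentences, presented in their original order."""
    scores = pagerank(graph)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:max_sentences]
    return [sentences[i] for i in sorted(top)]


# Hypothetical 3-sentence graph in which sentence 1 is linked to both others.
graph = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
print(pagerank(graph))
print(summarize(["First.", "Second.", "Third."], graph, max_sentences=2))
```

Here the length cutoff is a fixed number of sentences; a score threshold or a word budget would work the same way.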
Differences between LexRank and TextRank • TextRank was applied to summarization exactly as described here, while LexRank combines the LexRank score with other features like sentence position and length. • TextRank was used for single-document summarization, while LexRank has been applied to multi-document summarization.
• When summarizing multiple documents, there is a greater risk of selecting duplicate or highly redundant sentences to place in the same summary. • To address this issue, LexRank builds up a summary by adding sentences in rank order, but discards any sentences that are too similar to ones already placed in the summary. • The method used is called Cross-Sentence Information Subsumption (CSIS).
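The "add in rank order, discard near-duplicates" step can be illustrated as below. This is a simplified stand-in and not the CSIS definition itself: it uses Jaccard word overlap with an arbitrary cutoff of 0.5 as the "too similar" test, and the function names and sample sentences are hypothetical.

```python
def jaccard(a, b):
    """Word-set overlap between two sentences (stand-in similarity measure)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def select_non_redundant(sentences, ranked, max_sentences, max_sim=0.5):
    """Walk sentence indices in rank order; skip any too similar to one already chosen."""
    chosen = []
    for i in ranked:
        if len(chosen) == max_sentences:
            break
        if all(jaccard(sentences[i], sentences[j]) <= max_sim for j in chosen):
            chosen.append(i)
    return chosen


sentences = [
    "the storm closed the airport",
    "the storm closed the city airport",
    "flights resume on tuesday",
]
# Hypothetical ranking, best first (for example, from a PageRank-style score).
print(select_non_redundant(sentences, ranked=[0, 1, 2], max_sentences=2))
```

The second sentence is skipped because it overlaps heavily with the first, so the third one takes its place in the summary.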
Conclusion • There are many approaches to both the design and the assessment of summaries. • But researchers have found that no single method of scoring performs as well as humans do at creating extracts. • Moreover, no one seems to know exactly what a ‘correct’ summary is.
References • Hovy, E.H. 2005. Automated Text Summarization. In R. Mitkov (ed), The Oxford Handbook of Computational Linguistics, pp. 583–598. Oxford: Oxford University Press. • Erkan, G. and Radev, D.R. 2004. LexRank: Graph-based Lexical Centrality as Salience in Text Summarization. University of Michigan. • TIPSTER-SUMMAC. http://www-nlpir.nist.gov/related_projects/tipster_summac/index.html • Automatic Summarization. http://en.wikipedia.org/wiki/Automatic_summarization