Assessing sentence scoring techniques for extractive text summarization.
Assessing sentence scoring techniques for extractive text summarization • Presenter: Jian-Ren Chen • Authors: Rafael Ferreira, Luciano De Souza Cabral, Rafael Dueire Lins, Gabriel Pereira E Silva, Fred Freitas, George D.C. Cavalcanti, Rinaldo Lima, Steven J. Simske, Luciano Favaro • 2013, ESA
Outlines • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments
Motivation • Due to the huge volume of information on the Internet, it has become infeasible to efficiently sieve useful information from the mass of documents. • Text summarization: - Extractive - Abstractive
Objectives • We want to introduce 15 sentence scoring methods and assess all of them for extractive text summarization.
Methodology – Word scoring • Word frequency • TF/IDF • Upper case • Proper noun • Word co-occurrence (n-gram) • Lexical similarity Score(s) =
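The first two word scoring methods can be sketched as follows. This is a minimal illustration, not the formulation from the paper: the tokenizer, the choice to treat each sentence as a "document" when computing IDF, and the function names `word_frequency_scores` and `tfidf_scores` are all assumptions made for this example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphabetic tokens only (a simplifying assumption).
    return re.findall(r"[a-z]+", text.lower())

def word_frequency_scores(sentences):
    # Count every word across the whole document.
    counts = Counter(w for s in sentences for w in tokenize(s))
    # Score each sentence as the sum of its words' document-wide counts.
    return [sum(counts[w] for w in tokenize(s)) for s in sentences]

def tfidf_scores(sentences):
    # Assumption: each sentence plays the role of a document for IDF.
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    # Each occurrence of a word adds its IDF, so repeated words contribute tf * idf.
    return [sum(math.log(n / df[w]) for w in d) for d in docs]

sentences = [
    "The cat sat on the mat.",
    "The dog barked.",
    "A cat and a dog met.",
]
print(word_frequency_scores(sentences))
print(tfidf_scores(sentences))
```

Under TF/IDF a word that occurs in every sentence contributes nothing to any score, whereas plain word frequency rewards it. Neither score is normalized by sentence length here.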
Methodology – Sentence scoring • Cue-phrases (e.g. in summary, in conclusion, our investigation, the best, the most important, according to the study, significantly, important, in particular, hardly, impossible) • Sentence inclusion of numerical data • Sentence length • Sentence position • Sentence centrality • Sentence resemblance to the title Score(s) =
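Three of the sentence-level features listed above can be sketched in a few lines. The scoring formulas were not preserved on the slide, so the definitions below are assumptions: a raw count of cue phrases, a yes/no test for digits, and the fraction of title words found in the sentence. `CUE_PHRASES` is a small subset of the examples given, chosen so that no phrase contains another.

```python
import re

# Subset of the cue phrases from the slide (assumption: no overlapping phrases).
CUE_PHRASES = ["in summary", "in conclusion", "significantly", "in particular"]

def cue_phrase_score(sentence):
    # Count how many cue-phrase occurrences the sentence contains.
    s = sentence.lower()
    return sum(s.count(p) for p in CUE_PHRASES)

def has_numerical_data(sentence):
    # True if the sentence contains at least one digit.
    return bool(re.search(r"\d", sentence))

def title_resemblance(sentence, title):
    # Fraction of distinct title words that also appear in the sentence.
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    title_words = set(re.findall(r"[a-z]+", title.lower()))
    return len(words & title_words) / len(title_words) if title_words else 0.0

print(cue_phrase_score("In summary, results improved significantly."))
print(has_numerical_data("Sales rose 15 percent."))
print(title_resemblance("Scoring sentences for summarization",
                        "Sentence scoring for summarization"))
```

The title comparison matches exact word forms only, so "sentences" does not match "sentence" unless the text is stemmed or lemmatized first.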
Methodology – Graph scoring • TextRank • Bushy path of the node: Score(s) = #(branches connected to the node) • Aggregate similarity Score(s) =
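A rough sketch of the bushy path and aggregate similarity scores follows. Sentences are graph nodes, and two nodes are connected when their similarity exceeds a threshold. Several pieces are assumptions made for illustration: Jaccard word overlap as the similarity measure, the `threshold` value, and the reading of aggregate similarity as the sum of the similarities on a node's branches (that formula did not survive on the slide).

```python
import re

def words(sentence):
    # Distinct lowercase alphabetic tokens of a sentence.
    return set(re.findall(r"[a-z]+", sentence.lower()))

def similarity(a, b):
    # Jaccard overlap between the word sets (an assumed similarity measure).
    wa, wb = words(a), words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0

def graph_scores(sentences, threshold=0.1):
    n = len(sentences)
    bushy = [0] * n        # number of branches connected to each node
    aggregate = [0.0] * n  # assumed: sum of similarities on those branches
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            sim = similarity(sentences[i], sentences[j])
            if sim > threshold:
                bushy[i] += 1
                aggregate[i] += sim
    return bushy, aggregate

bushy, aggregate = graph_scores([
    "cats chase mice",
    "mice fear cats",
    "stocks fell today",
])
print(bushy, aggregate)
```

The bushy path score only counts connections, while the aggregate variant also reflects how strong each connection is.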
Experiments - Datasets, Evaluation • Datasets: CNN, Blog, SUMMAC • Evaluation: ROUGE - Quantitative assessment - Qualitative assessment
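To make the ROUGE evaluation concrete, here is a minimal sketch of ROUGE-1 recall: the share of reference-summary unigrams that also appear in the candidate summary, with counts clipped. The slides do not say which ROUGE variant was used, so the choice of ROUGE-1 recall, the tokenizer, and the function name `rouge_1_recall` are assumptions; this is not the official ROUGE toolkit.

```python
import re
from collections import Counter

def rouge_1_recall(candidate, reference):
    # Unigram counts for both summaries (lowercased, alphanumeric tokens).
    cand = Counter(re.findall(r"[a-z0-9]+", candidate.lower()))
    ref = Counter(re.findall(r"[a-z0-9]+", reference.lower()))
    # Clipped overlap: a reference word is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[w]) for w, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

print(rouge_1_recall("the cat sat", "the cat sat on the mat"))
```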
Experiments - CNN • Word scoring: TF/IDF • Sentence scoring: Sentence position 1 • Graph scoring: TextRank score
Experiments - Blog • Word scoring: TF/IDF • Sentence scoring: Sentence length • Graph scoring: TextRank score
Experiments - SUMMAC • Word scoring: TF/IDF • Sentence scoring: Resemblance to the title • Graph scoring: TextRank score
Improving sentence scoring results • Morphological transformation: - Truncation, Stemming, Lemmatization • Stop words • Similar semantics: - WordNet, Lexical Chains • Co-reference: - word frequency features • Ambiguity: - Lexical Chains • Redundancy: - Sentence fusion
Examples (morphological transformation): colleg*: college, colleges, collegium, collegial; col*r: color, colour, colander; lights: light, lights, lighting, lit; be: is, am, are
Example (lexical chain): car, wheel, seat, passenger => automobile topic
Example (co-reference): John will travel tomorrow. He bought the ticket yesterday.
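Truncation and stop-word removal can be illustrated with a small sketch. It uses shell-style wildcard matching from the standard library's `fnmatch` module as a stand-in for truncation patterns such as `colleg*` and `col*r`. The `STOP_WORDS` set is a tiny hypothetical list for the example, not the one used in the study.

```python
from fnmatch import fnmatchcase

def truncation_matches(pattern, vocabulary):
    # Return the vocabulary words matched by a wildcard pattern ("*" = any characters).
    return [w for w in vocabulary if fnmatchcase(w, pattern)]

# Tiny illustrative stop-word list (assumption).
STOP_WORDS = {"the", "a", "is", "on", "of"}

def remove_stop_words(tokens):
    # Drop tokens that carry little content before scoring.
    return [t for t in tokens if t not in STOP_WORDS]

print(truncation_matches("col*r", ["color", "colour", "colander", "cold"]))
print(truncation_matches("colleg*", ["college", "collegial", "collar"]))
print(remove_stop_words(["the", "cat", "is", "on", "the", "mat"]))
```

As the `col*r` pattern shows, truncation can over-match: "colander" is picked up alongside "color" and "colour".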
Conclusions • Word Frequency, TF/IDF, Lexical Similarity, Sentence Length and TextRank Score were chosen as providing good results. - Computationally intensive: TF/IDF - Balance in execution time: Word Frequency, Sentence Length
Comments • Advantages - understand the basic methods and their differences • Applications - text summarization