Collaborating Discourse for Text Summarization
Authors: Laura Alonso Alemany & Maria Fuentes Fort
Presented by: Mr. Thana Sukvaree, NAiST laboratory, 28 Nov 2003

Objective
• To present an initial collaboration between lexical chains and rhetorical structure for text summarization (TS).
• To improve accuracy with this method.

Outline
• Introduction (What is TS?)
• Background knowledge
• Proposed summarization systems
• Experiment & results
• Conclusions & future work
• Suggestions & discussion

Introduction
Text Summarization (TS) is the process of identifying salient concepts in text narrative, conceptualizing the relationships that exist among them and generating concise representations of the input text that preserve the gist of its content. (Radev, 2000)
• Two methodological approaches to TS:
~ Statistical-based: extracts heuristic information from the text …
~ Knowledge-based: combines linguistic information in suitable forms such as lexical chains, discourse structure trees, ..

(Introduction) Text Summarization Architecture
[Figure: text summarization architecture. Recoverable labels: input (single-doc, multi-docs, query); stages (pre-processing, processing, generating); length (headline, very brief, brief, long; 10%, 50%, 100%); output (extracts, abstracts); indicative, informative; generic, query-oriented; just the news, background; representations (case frames, templates, core concepts, core events, relationships, clause fragments, index terms).]

Background Knowledge: What are Lexical Chains (LC)?
• LC try to identify cohesion links between parts of a text by identifying relations that hold between their words.
• Identity chains contain terms that refer to the same object.
[Diagram labels: select candidate words; find an appropriate chain (distance & path); compute the chain score.]

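The chaining idea on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' system: it only handles identity (word repetition), the candidate words are supplied by hand instead of coming from a tagger, and the function name `build_identity_chains` and the `max_gap` distance limit are assumptions made for the example.

```python
def build_identity_chains(sentences, candidates, max_gap=3):
    """Group repeated candidate words into chains of sentence indices.

    Assumption: a word joins the latest chain for the same word only if its
    sentence lies at most `max_gap` sentences after that chain's last member;
    otherwise it starts a new chain.
    """
    chains = []        # list of (word, [sentence indices])
    open_chain = {}    # word -> position in `chains` of its latest chain
    for idx, sentence in enumerate(sentences):
        for token in sentence.lower().split():
            word = token.strip(".,;:!?\"'()")
            if word not in candidates:
                continue
            pos = open_chain.get(word)
            if pos is not None and idx - chains[pos][1][-1] <= max_gap:
                # Close enough to the previous mention: extend the chain,
                # recording each sentence only once.
                if chains[pos][1][-1] != idx:
                    chains[pos][1].append(idx)
            else:
                # First mention, or too far from the previous one.
                chains.append((word, [idx]))
                open_chain[word] = len(chains) - 1
    return chains
```

A fuller implementation would also link different words through a lexical resource (the "path" part of the diagram), which this sketch leaves out.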
(Background Knowledge) What is Rhetorical Structure?
• It provides a representation of textual coherence: for every part of a coherent text, there is some function, some plausible reason for its presence, evident to readers.

Proposed Summarization System: LC vs. RST
• LC can produce extracts at different granularity levels and compression rates.
• RS granularity is determined by the structure of the text.

(Proposed Summarization) Summarization by LC
Procedure:
• First, the text is segmented, with a degree of granularity that depends on the application.
• Second, chain candidates are detected:
- the text is pre-processed and annotated with tags (POS, named entities);
- chain candidates, linked by extra-strong, strong or medium-strong relations, are created from common nouns, proper nouns, named entities, definite noun phrases and pronouns.
• Third, chains are scored so that strong chains are identified. Sentences are ranked, and those crossed by the most chains are considered the most relevant.
Outcome: sentences are extracted from this ranked list until a predetermined summary length is reached.

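The third step and the outcome can be sketched as follows. The slide does not give the scoring formula, so this example assumes the simplest one (a chain's score is its number of members, and a chain is "strong" when its score reaches the mean). The function name `summarize_by_chains` and the input layout are hypothetical.

```python
from statistics import mean

def summarize_by_chains(sentences, chains, max_sentences=2):
    """Extract the sentences crossed by the most strong chains.

    chains: dict mapping a chain id to the list of sentence indices
    in which members of that chain occur.
    """
    if not chains:
        return sentences[:max_sentences]
    # Assumed score: number of member occurrences in the chain.
    scores = {cid: len(members) for cid, members in chains.items()}
    # Assumed strength criterion: score at or above the mean score.
    threshold = mean(scores.values())
    strong = [cid for cid, score in scores.items() if score >= threshold]
    # Count how many strong chains cross each sentence.
    crossings = [0] * len(sentences)
    for cid in strong:
        for idx in set(chains[cid]):
            crossings[idx] += 1
    # Rank by crossings (ties broken by position), then cut at the length limit.
    ranked = sorted(range(len(sentences)), key=lambda i: (-crossings[i], i))
    chosen = sorted(ranked[:max_sentences])   # restore document order
    return [sentences[i] for i in chosen]
```

The summary length is expressed here as a sentence count; a compression rate could be used instead by deriving `max_sentences` from the document length.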
(Proposed Summarization) Summarization by RST
Procedure:
• First, a segmenter identifies unambiguous minimal discursive segments.
• Second, the coherence relations holding between minimal discursive segments are identified from the kind of segment or the presence of a discourse marker.
• Third, a discourse marker lexicon containing 600 cue phrases is used.
Outcome: a representation of the coherence relations holding at three levels: sentence, information block (paragraph-like) and full text.

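Marker-based relation detection, as in the second step, might look like the sketch below. The three lexicon entries and their relation labels are hypothetical stand-ins for the 600 cue phrases mentioned on the slide, the segments are assumed to be already split, and `find_relations` is a name invented for this example.

```python
import re

# Hypothetical three-entry lexicon; the real one holds 600 cue phrases.
MARKER_LEXICON = {
    "because": "cause",
    "for example": "elaboration",
    "however": "contrast",
}

def find_relations(segments, lexicon=MARKER_LEXICON):
    """Return (segment index, marker, relation) for every marker found."""
    found = []
    for idx, segment in enumerate(segments):
        text = segment.lower()
        for marker, relation in lexicon.items():
            # Match whole words or phrases only, so "because" does not
            # fire inside a longer word.
            if re.search(r"\b" + re.escape(marker) + r"\b", text):
                found.append((idx, marker, relation))
    return found
```

This covers only the "presence of a discourse marker" cue; identifying relations from the kind of segment would need syntactic information that the sketch does not model.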
Experiment & Results
[Figures not recoverable: system design; results (news).]

Results
[Figures not recoverable: lexical chains; rhetorical structure.]

(Results) Lexical Chains
[Figure not recoverable.]

(Results) Rhetorical Structure
[Figure not recoverable.]

Conclusion
• An initial collaboration between cohesion-based and coherence-based approaches can be successful.
• The main problem is that qualitative aspects of summaries are not adequately captured by the quantitative metrics used in the experiment.
Future work
• Integrating lexical chains and rhetorical structure so that the two interact.
• Exploring further interactions between these two kinds of discursive information.
• Assessing, by way of adequate evaluation, the improvement these techniques may introduce in summaries.

Discussion
• Increase the corpus size to obtain more reliable data.
• Improve the accuracy of this system by using the content relations of RST.
• Combine statistical methods: generate a confidence value and merge it as a new factor in the decision-making process that identifies salient units.
• Focus on multi-document summarization: temporal problems, anaphora resolution, textual unit redundancy.

ATS Process & Methodological Approaches
[Diagram. Recoverable labels: corpus; document dimension; pre-processing (NLP tasks: morphological analysis, annotation tagging, sentence boundary, document clustering, topic analysis); processing (finding salience), from shallow to deep; statistical-based (extraction of units by heuristic information; linear combinations); knowledge-based (textual relations for text understanding; discourse structure, lexical chains, ontology); generating representation (indicative summary, informative summary).]

Comparison table of accuracy values between the statistical-based and knowledge-based methodologies
[Table not recoverable.]

Cohesion vs. Coherence
“Cohesion” tries to account for relationships among the elements of a text; four broad categories are identified: reference, ellipsis, conjunction and lexical cohesion.
“Coherence” is represented in terms of relations between text segments, such as elaboration, cause or explanation, …
Extract vs. Abstract
"an Extract is a selection of some of the material of the original, while an Abstract is a condensation and reformulation of the original" (Chapter 3: Cross-lingual Information Extraction and Automated Text Summarization. Ed. E. Hovy)

Intrinsic/Extrinsic Evaluations
Evaluation of summarization systems can be intrinsic or extrinsic (Jones & Galliers 1996). Intrinsic methods measure a system's quality; extrinsic methods measure a system's performance in a particular task. (cited in Jing and McKeown, 1998)
Gold standard
We will call the criteria of what constitutes success the gold standard, and the set of sentences that fulfill these criteria the gold standard sentences. Apart from evaluation, a gold standard is also needed for supervised learning. (cited in Teufel and Moens, 1997)
In Kupiec et al. (1995), a gold standard sentence is a sentence in the source text that is matched with a summary sentence on the basis of semantic and syntactic similarity. (cited in Teufel and Moens, 1997)
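
One common way to use gold standard sentences in an intrinsic evaluation is to compare them with the sentences a system extracted. The sketch below assumes precision and recall over sentence indices; the slides do not state which quantitative metrics the experiment used, so this is an illustration and not a reconstruction of it. The function name `precision_recall` is hypothetical.

```python
def precision_recall(extracted, gold):
    """Compare extracted sentence indices with gold standard indices.

    precision = share of extracted sentences that are gold sentences
    recall    = share of gold sentences that were extracted
    """
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall
```

For instance, a system that extracts sentences 0, 2 and 5 against gold sentences 0, 2, 3 and 4 has two hits, giving a precision of 2/3 and a recall of 1/2.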