Human summary production operations for computer-aided summarisation Laura Hasler University of Wolverhampton 30 May 2007
Overview • Original contributions of my thesis • Human summarisation (HS) • Automatic summarisation (AS) • Computer-aided summarisation (CAS) • Classification of human summary production operations • Guidelines derived from the classification • Evaluation of guidelines and classification
Original contributions • Reliable ways of creating abstracts from extracts, improving coherence/readability • Set of guidelines to annotate source texts for important information resulting in extracts for corpus of extract/abstract pairs • Corpus of extract/abstract pairs for analysis • Corpus-based classification of human summary production operations that successfully transform extracts into abstracts by improving coherence and readability
Original contributions 2 • Set of summary production guidelines derived from the classification which can be issued to users of a CAS system • Development of Centering Theory (Grosz, Joshi & Weinstein 1995) as an evaluation metric because existing methods were unsuitable • Evaluation of the coherence and readability of abstracts produced using summary production operations, and therefore of the guidelines and operations themselves
Human summarisation: 3 stages (Endres-Niggemeyer 1998) • Document exploration: summariser explores layout and organisation of document to identify position of important information • Relevance assessment: summariser assesses information in document to see if it is relevant to summary by recognising the theme (what it is ‘about’) • Summary production: summariser cuts and pastes relevant information from document and edits it to form a coherent summary
Automatic summarisation Extracting • Units extracted from source verbatim → problems with coherence, unnecessary info • Methods can be easily used across domains • Currently more popular; CAST Abstracting • Additional knowledge can be used → concepts • Not restricted to linguistic realisation of source → more coherent and concise • Needs knowledge base → domain dependent
Computer-aided summarisation • A feasible alternative to fully automatic summarisation given current technology – problems of coherence and readability with automatic extracts • Uses automatic summarisation methods to produce an extract (stages 1&2), which is then post-edited by a human summariser/user (stage 3) • Focus of this research on post-editing (extract → abstract) to improve coherence/readability
Aim of the research A) Chernobyl reactor number 4 was ripped apart by an explosion on 26 April 1986. Last September, the IAEA and the WHO released a report. Its headline conclusion that radiation from the accident would kill a total of 4000 people was widely reported. B) Last September, the IAEA/WHO released a report on the explosion of Chernobyl reactor number 4 on 26 April 1986, concluding that radiation from the accident would kill a total of 4000 people. (h03-ljh)
How can we consistently transform extracts into abstracts? • Guidelines: available for other aspects/types of summarisation • Investigation of what exactly a human summariser does to get from an extract to an abstract (and improve coherence) • Corpus to allow analysis and classification • Set of guidelines derived from classification • Application and evaluation of classification/guidelines to prove they work
Corpus of extract/abstract pairs • 43 pairs of news texts (extract, abstract) • Source texts manually annotated for important information - higher quality • Annotated using adapted CAST guidelines (Hasler et al. 2003): 30% extracts produced • Extracts transformed into 20% abstracts - no guidelines given
Classification of operations • 5 general classes of operations • Atomic and complex • Atomic: deletion, insertion • Complex: replacement, reordering, merging • Each split into sub-operations (26 in total) • Sub-operations linked to triggers, or recognisable surface forms • Function of units also important
Classification Atomic operations and sub-operations • Deletion: complete sentences, subordinate clauses, PPs, adverb phrases, reporting clauses, NPs, determiners, the verb be, specially formatted text, punctuation • Insertion: connectives, formulaic units, modifiers, punctuation
Classification 2 Complex operations and sub-operations • Replacement: pronominalisation, lexical substitution, NP restructuring, nominalisation, referred sentences, VPs, passivisation, abbreviations • Reordering: emphasising, coherence • Merging: clause/sentence restructuring, punctuation/connectives
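The taxonomy on the two classification slides can be written down as a small data structure. The following is a minimal sketch, assuming a plain Python dictionary layout; the names `ATOMIC`, `COMPLEX`, `operation_class` and `count_sub_operations` are illustrative and do not come from the thesis. The sub-operation labels are transcribed from the slides, and the count confirms the stated total of 26.

```python
# Taxonomy transcribed from the two "Classification" slides.
# The dictionary layout and all identifiers are illustrative assumptions.
ATOMIC = {
    "deletion": [
        "complete sentences", "subordinate clauses", "PPs", "adverb phrases",
        "reporting clauses", "NPs", "determiners", "the verb be",
        "specially formatted text", "punctuation",
    ],
    "insertion": ["connectives", "formulaic units", "modifiers", "punctuation"],
}

COMPLEX = {
    "replacement": [
        "pronominalisation", "lexical substitution", "NP restructuring",
        "nominalisation", "referred sentences", "VPs", "passivisation",
        "abbreviations",
    ],
    "reordering": ["emphasising", "coherence"],
    "merging": ["clause/sentence restructuring", "punctuation/connectives"],
}


def operation_class(name):
    """Return 'atomic' or 'complex' for one of the five general classes."""
    if name in ATOMIC:
        return "atomic"
    if name in COMPLEX:
        return "complex"
    raise KeyError(name)


def count_sub_operations():
    """Count the sub-operations across all five general classes."""
    return sum(len(subs) for subs in {**ATOMIC, **COMPLEX}.values())
```

Calling `count_sub_operations()` on this transcription gives 26, matching the figure on the "Classification of operations" slide.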
Deletion • “The process of removing a unit from a certain place in the extract so it does not appear in the same place in the abstract” • Used alone or as part of complex operations • Very useful for reducing text when used alone • Deletes non-essential units e.g. details, repetitions • Complete sentences, subordinate clauses, PPs, reporting clauses, determiners, be
Deletion examples • [I suspect that] the set would be the ideal book for a physicist to be cast away with on a desert island. (new-sci-B7L-54-ljh) • Three papers published recently in Science move us a little closer to understanding the basis of the disease[, which turns out to be highly complex]. (sci04done-an) • Britain [is] among [the] front runners as tomorrow’s supercomputers take shape. (sci05done-an)
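In the deletion examples, square brackets mark the units removed from the extract. As a rough sketch of how that notation can be read mechanically, the hypothetical helper below (`apply_deletions`, not part of CAST or the thesis) drops every bracketed unit and tidies the spacing left behind. It only replays an annotation a human has already made; it does not decide what to delete.

```python
import re


def apply_deletions(marked):
    """Drop every [bracketed] unit and tidy the whitespace left behind."""
    text = re.sub(r"\[[^\]]*\]", "", marked)     # remove the marked units
    text = re.sub(r"\s+", " ", text)             # collapse doubled spaces
    text = re.sub(r"\s+([.,;:])", r"\1", text)   # no space before punctuation
    return text.strip()


example = "Britain [is] among [the] front runners as tomorrow’s supercomputers take shape."
result = apply_deletions(example)
```

Here `result` is the headline-style sentence "Britain among front runners as tomorrow’s supercomputers take shape." The sketch leaves capitalisation alone, so a sentence whose opening unit is deleted would still need its first letter capitalised by the post-editor.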
Insertion • “The process of adding a unit which is not present in the extract into the abstract” • Used alone or as part of complex operations • Interesting because it adds text to something which is supposed to be reduced • Used to add coherence and to clarify whilst saving space • Connectives, modifiers, ‘formulaic units’, punctuation
Insertion examples • He sees the need to raise public awareness and demystify science and technology as a key point… (new-sci-B7L-75-ljh) [X sees Y as Z] • The TV series Men of Science is now being shown in a few other areas. (new-sci-B7L-69-ljh)
Replacement • “The deletion of one unit and the insertion of a different one in the same place in the text” • Complex operation, can be used in combination with other complex operations • Useful for avoiding repetition and saving space • Pronominalisation, lexical substitution, NP restructuring, nominalisation, VPs, passivisation, abbreviations
Replacement examples • [Zhanat Carr, a radiation scientist with the WHO in Geneva,] The WHO [says] admits the 5000 deaths were omitted because the report was a "political communication tool". (h03-ljh) • [All this] [is] hardly Culver’s fault. [The same difficulties are to be found in all other parts of evolutionary ecology.] These general difficulties of evolutionary ecology are hardly Culver’s fault. (new-sci-B7L-63-ljh)
Reordering • “The deletion of a unit from one place in the extract and its insertion in a different place in the abstract” • Complex operation, can be used in combination with other complex operations • Sub-functions rather than operations – difficult to sub-classify • Emphasises information, improves coherence and readability
Reordering example • Text about world’s second face transplant, all other sentences about a specific person/operation: S2 → last sentence • Experts predict the number of these operations will rise rapidly as centres around the world gear up to perform the procedure. (h01-ljh)
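The definitions of replacement and reordering describe both as combinations of the two atomic operations. A minimal sketch of that composition, assuming a text is represented as a list of units (words or sentences), could look like this. All function names are hypothetical.

```python
def delete(units, index):
    """Atomic: remove the unit at `index`; return (new list, removed unit)."""
    return units[:index] + units[index + 1:], units[index]


def insert(units, index, unit):
    """Atomic: add `unit` so that it ends up at position `index`."""
    return units[:index] + [unit] + units[index:]


def replace(units, index, new_unit):
    """Complex: delete a unit and insert a different one in the same place."""
    remaining, _ = delete(units, index)
    return insert(remaining, index, new_unit)


def reorder(units, src, dst):
    """Complex: delete a unit from `src` and insert it at position `dst`."""
    remaining, moved = delete(units, src)
    return insert(remaining, dst, moved)


# The reordering example moves the second sentence to the end of the text.
sentences = ["S1", "S2", "S3", "S4"]
reordered = reorder(sentences, 1, len(sentences) - 1)
```

With the placeholder sentences above, `reordered` is `["S1", "S3", "S4", "S2"]`. The functions return new lists rather than editing in place, so the extract stays available for comparison with the abstract.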
Merging • “Taking information from different units in the extract and presenting them as one unit in the abstract” • All other operations can be used • Large class, most difficult to sub-classify – anything (appropriate) goes! • Best embodies abstracting as opposed to extracting – conciseness • Restructuring of clauses/sentences, punctuation/connectives
Merging example • In October 1980 Zuccarelli filed [an expensive] European patent application, covering nine countries including Britain [. … The cost of pushing a European patent through in nine countries is around $10000. The cost of application alone is around $2000 and Zuccarelli has already paid an extra $500 for a further stage of official examination]. (new-sci-B7K-37)
Evaluation • Applied guidelines to a different set of extracts • 25 human-produced extracts + corresponding abstracts • 25 automatically produced extracts + corresponding abstracts • Developed Centering Theory as an evaluation method due to unsuitability of existing methods
Centering Theory (CT) (Grosz, Joshi & Weinstein 1995) • Theory of local coherence and salience • Accounts for coherence using repetitions of entities across consecutive utterances (Cfs, Cps, Cbs) • Uses the relationship between repetitions to derive ‘transitions’ (position in utterance) • Transitions are ordered in preference from most to least coherent (continue, retain, smooth shift, rough shift, no transition/no Cb)
Centering Theory: an example John[Cp] went to his favorite music store to buy a piano. He[Cp], [Cb] had frequented the store for many years. He[Cp], [Cb] was excited that he could finally buy a piano. He[Cp], [Cb] arrived just as the store was closing for the day. • Continue, continue, continue John[Cp] went to his favorite music store to buy a piano. It[Cp] was a store John[Cb] had frequented for many years. He[Cp], [Cb] was excited that he could finally buy a piano. It[Cp] was closing just as John[Cb] arrived. • Retain, continue, retain (Grosz, Joshi & Weinstein 1995: 206)
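The transition labels in the two piano discourses can be reproduced with a few lines of code. This is a sketch of the standard four-way classification only, not the adapted metric developed in the thesis. It assumes the Cb and Cp of each utterance are already annotated, and it treats an undefined previous Cb as compatible with the current one, which is a common convention in the Centering literature. The function names are illustrative.

```python
def transition(prev_cb, cb, cp):
    """Classify one transition from the previous Cb and the current Cb/Cp."""
    if cb is None:
        return "no Cb"
    # Assumption: an undefined previous Cb counts as "same Cb".
    same_cb = prev_cb is None or cb == prev_cb
    if same_cb:
        return "continue" if cb == cp else "retain"
    return "smooth shift" if cb == cp else "rough shift"


def transitions(utterances):
    """utterances: list of (cb, cp) pairs; the first has cb set to None."""
    return [
        transition(prev_cb, cb, cp)
        for (prev_cb, _), (cb, cp) in zip(utterances, utterances[1:])
    ]


# (Cb, Cp) annotations read off the two versions of the piano discourse.
discourse_1 = [(None, "John"), ("John", "John"), ("John", "John"), ("John", "John")]
discourse_2 = [(None, "John"), ("John", "store"), ("John", "John"), ("John", "store")]

labels_1 = transitions(discourse_1)
labels_2 = transitions(discourse_2)
```

`labels_1` comes out as continue, continue, continue and `labels_2` as retain, continue, retain, matching the labels given for the two discourses. Corpus texts can need extra cases, such as indirectly realised entities, which this sketch does not cover.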
Centering Theory: a real example 1. (Everybody)[Cp] should be ready for ((Monday)'s national championship game), despite (casualties in ((Saturday night)'s NCAA semifinal battles)). no transition (indirect) 2. (Jason Terry of (Arizona))[Cp], [Cb] was injured. retain 3. “(We)[Cp] were going to put (him)[Cb] in late in (the game),” said (Arizona coach (Lute Olson)). rough shift 4. “(He)[Cp] had played a lot before (that), of course, but when (we)'re protecting (a lead), (we)[Cb] like getting (four perimeter guys) in there and (that) gives (us) (another ball handler), gives (us) (another free throw shooter).” retain 5. (Kentucky coach (Rick Pitino))[Cp] predicted that ((Monday)'s championship game) would also be physical, in view of (((Kentucky)'s all-out pressure defence) and ((Arizona)[Cb]'s blazing speed)).
Evaluation 2 • Human judgment obtained to complement CT • Overall, human summary production operations improve texts: CT = 78%; Judge = 82% • Agreement between CT and judge = 70% • Classification and resulting guidelines can be reliably used during post-editing in CAS • CT is useful as an evaluation method
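Figures such as "CT = 78%" and "Agreement between CT and judge = 70%" are simple proportions over the evaluated extract/abstract pairs. The sketch below shows one way such percentages can be computed, assuming each method yields a yes/no verdict per pair on whether the abstract improves on the extract. The verdict lists are invented toy data, not the thesis results, and the function names are hypothetical.

```python
def percent_true(verdicts):
    """Percentage of pairs for which a method says the abstract is better."""
    return 100.0 * sum(verdicts) / len(verdicts)


def percent_agreement(a, b):
    """Percentage of pairs on which two methods give the same verdict."""
    if len(a) != len(b):
        raise ValueError("both methods must judge the same pairs")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Toy verdicts for ten hypothetical pairs (True = abstract judged better).
ct_verdicts = [True, True, True, False, True, True, False, True, True, True]
judge_verdicts = [True, True, False, False, True, True, True, True, True, True]

ct_score = percent_true(ct_verdicts)
judge_score = percent_true(judge_verdicts)
agreement = percent_agreement(ct_verdicts, judge_verdicts)
```

Raw percentage agreement does not correct for agreement expected by chance, so a chance-corrected statistic might be reported alongside it when more judges are involved.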
Directions for future work • To use more human summarisers/judges to further validate classification/guidelines • To further explore/improve CT for evaluation • To investigate the feasibility of automating certain elements of summary production operations for CAS • To look at scientific texts (also popular in AS)