
Human summary production operations for computer-aided summarisation

Laura Hasler, University of Wolverhampton, 30 May 2007.


Presentation Transcript


  1. Human summary production operations for computer-aided summarisation • Laura Hasler • University of Wolverhampton • 30 May 2007

  2. Overview • Original contributions of my thesis • Human summarisation (HS) • Automatic summarisation (AS) • Computer-aided summarisation (CAS) • Classification of human summary production operations • Guidelines derived from the classification • Evaluation of guidelines and classification

  3. Original contributions • Reliable ways of creating abstracts from extracts, improving coherence/readability • Set of guidelines to annotate source texts for important information resulting in extracts for corpus of extract/abstract pairs • Corpus of extract/abstract pairs for analysis • Corpus-based classification of human summary production operations that successfully transform extracts into abstracts by improving coherence and readability

  4. Original contributions 2 • Set of summary production guidelines derived from classification which can be issued to users of a CAS system • Development of Centering Theory (Grosz, Joshi & Weinstein 1995) as evaluation metric due to unsuitable existing methods • Evaluation of coherence and readability of abstracts produced using summary production operations → therefore of guidelines and operations themselves

  5. Human summarisation: 3 stages (Endres-Niggemeyer 1998) • Document exploration: summariser explores layout and organisation of document to identify position of important information • Relevance assessment: summariser assesses information in document to see if it is relevant to summary by recognising the theme (what it is ‘about’) • Summary production: summariser cuts and pastes relevant information from document and edits it to form a coherent summary

  6. Automatic summarisation Extracting: • Units extracted from source verbatim → problems with coherence, unnecessary info • Methods can be easily used across domains • Currently more popular; CAST Abstracting: • Additional knowledge can be used → concepts • Not restricted to linguistic realisation of source → more coherent and concise • Needs knowledge base → domain dependent

  7. Computer-aided summarisation • A feasible alternative to fully automatic summarisation given current technology – problems of coherence and readability with automatic extracts • Uses automatic summarisation methods to produce an extract (stages 1 & 2), then post-edited by human summariser/user (stage 3) • Focus of this research on post-editing (extract → abstract) to improve coherence/readability

  8. Aim of the research A) Chernobyl reactor number 4 was ripped apart by an explosion on 26 April 1986. Last September, the IAEA and the WHO released a report. Its headline conclusion that radiation from the accident would kill a total of 4000 people was widely reported. B) Last September, the IAEA/WHO released a report on the explosion of Chernobyl reactor number 4 on 26 April 1986, concluding that radiation from the accident would kill a total of 4000 people. (h03-ljh)

  9. How can we consistently transform extracts into abstracts? • Guidelines: available for other aspects/types of summarisation • Investigation of what exactly a human summariser does to get from an extract to an abstract (and improve coherence) • Corpus to allow analysis and classification • Set of guidelines derived from classification • Application and evaluation of classification/guidelines to prove they work

  10. Corpus of extract/abstract pairs • 43 pairs of news texts (extract, abstract) • Source texts manually annotated for important information - higher quality • Annotated using adapted CAST guidelines (Hasler et al. 2003): 30% extracts produced • Extracts transformed into 20% abstracts - no guidelines given

  11. Classification of operations • 5 general classes of operations • Atomic and complex • Atomic: deletion, insertion • Complex: replacement, reordering, merging • Each split into sub-operations (26 in total) • Sub-operations linked to triggers, or recognisable surface forms • Function of units also important

  12. Classification Atomic operations and sub-operations • Deletion: complete sentences, subordinate clauses, PPs, adverb phrases, reporting clauses, NPs, determiners, the verb be, specially formatted text, punctuation • Insertion: connectives, formulaic units, modifiers, punctuation

  13. Classification 2 Complex operations and sub-operations • Replacement: pronominalisation, lexical substitution, NP restructuring, nominalisation, referred sentences, VPs, passivisation, abbreviations • Reordering: emphasising, coherence • Merging: clause/sentence restructuring, punctuation/connectives
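The classification on slides 11 to 13 can be written down as a small data structure. The Python sketch below is only an illustration: the dictionary layout and the names `CLASSIFICATION`, `ATOMIC`, `operation_kind` and `total_sub_operations` are assumptions made here, not part of the thesis. The class and sub-operation labels are copied from the slides.

```python
# Sketch only: the layout and identifiers are illustrative assumptions;
# the labels themselves are taken from slides 11-13.
CLASSIFICATION = {
    "deletion": [
        "complete sentences", "subordinate clauses", "PPs", "adverb phrases",
        "reporting clauses", "NPs", "determiners", "the verb be",
        "specially formatted text", "punctuation",
    ],
    "insertion": ["connectives", "formulaic units", "modifiers", "punctuation"],
    "replacement": [
        "pronominalisation", "lexical substitution", "NP restructuring",
        "nominalisation", "referred sentences", "VPs", "passivisation",
        "abbreviations",
    ],
    "reordering": ["emphasising", "coherence"],
    "merging": ["clause/sentence restructuring", "punctuation/connectives"],
}

# Slide 11: deletion and insertion are atomic, the other classes are complex.
ATOMIC = {"deletion", "insertion"}


def operation_kind(name: str) -> str:
    """Return 'atomic' or 'complex' for one of the five general classes."""
    if name not in CLASSIFICATION:
        raise KeyError(name)
    return "atomic" if name in ATOMIC else "complex"


def total_sub_operations() -> int:
    """Count the sub-operations across all five classes."""
    return sum(len(subs) for subs in CLASSIFICATION.values())
```

Counting the entries this way gives the 26 sub-operations stated on slide 11.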

  14. Deletion • “The process of removing a unit from a certain place in the extract so it does not appear in the same place in the abstract” • Used alone or as part of complex operations • Very useful for reducing text when used alone • Deletes non-essential units e.g. details, repetitions • Complete sentences, subordinate clauses, PPs, reporting clauses, determiners, be

  15. Deletion examples • [I suspect that] the set would be the ideal book for a physicist to be cast away with on a desert island. (new-sci-B7L-54-ljh) • Three papers published recently in Science move us a little closer to understanding the basis of the disease[, which turns out to be highly complex]. (sci04done-an) • Britain [is] among [the] front runners as tomorrow’s supercomputers take shape. (sci05done-an)
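In the examples above, square brackets mark the units that are deleted from the extract. A minimal sketch of reading that notation mechanically, assuming brackets are never nested and are used only for deleted units (the function name `apply_deletions` is hypothetical):

```python
import re


def apply_deletions(annotated: str) -> str:
    """Drop every [bracketed] unit and tidy the leftover spacing.

    Assumes brackets are not nested and mark deleted units only.
    Capitalisation is not repaired, so a sentence whose opening unit
    is deleted will start with a lower-case letter.
    """
    without_units = re.sub(r"\[[^\[\]]*\]", "", annotated)
    collapsed = re.sub(r"\s+", " ", without_units).strip()
    # Remove any space left stranded in front of punctuation.
    return re.sub(r"\s+([.,;:!?])", r"\1", collapsed)


print(apply_deletions(
    "Three papers published recently in Science move us a little closer to "
    "understanding the basis of the disease[, which turns out to be highly complex]."
))
```

This only reproduces the surface effect of a deletion; deciding which units are non-essential remains the summariser's job.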

  16. Insertion • “The process of adding a unit which is not present in the extract into the abstract” • Used alone or as part of complex operations • Interesting because it adds text to something which is supposed to be reduced • Used to add coherence and to clarify whilst saving space • Connectives, modifiers, ‘formulaic units’, punctuation

  17. Insertion examples • He sees the need to raise public awareness and demystify science and technology as a key point… (new-sci-B7L-75-ljh) [X sees Y as Z] • The TV series Men of Science is now being shown in a few other areas. (new-sci-B7L-69-ljh)

  18. Replacement • “The deletion of one unit and the insertion of a different one in the same place in the text” • Complex operation, can be used in combination with other complex operations • Useful for avoiding repetition and saving space • Pronominalisation, lexical substitution, NP restructuring, nominalisation, VPs, passivisation, abbreviations

  19. Replacement examples • [Zhanat Carr, a radiation scientist with the WHO in Geneva,] The WHO [says] admits the 5000 deaths were omitted because the report was a "political communication tool". (h03-ljh) • [All this] [is] hardly Culver’s fault. [The same difficulties are to be found in all other parts of evolutionary ecology.] These general difficulties of evolutionary ecology are hardly Culver’s fault. (new-sci-B7L-63-ljh)

  20. Reordering • “The deletion of a unit from one place in the extract and its insertion in a different place in the abstract” • Complex operation, can be used in combination with other complex operations • Sub-functions rather than operations – difficult to sub-classify • Emphasises information, improves coherence and readability

  21. Reordering example • Text about world’s second face transplant, all other sentences about a specific person/operation: S2 → last sentence • Experts predict the number of these operations will rise rapidly as centres around the world gear up to perform the procedure. (h01-ljh)
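The definitions on slides 14 to 20 describe replacement and reordering as combinations of the two atomic operations. A rough sketch of that composition over a list of units (sentences, clauses or words), in which all four function names are hypothetical and the choice of a plain list of strings is an assumption:

```python
from typing import List


def delete(units: List[str], index: int) -> List[str]:
    """Atomic: remove the unit at `index`."""
    return units[:index] + units[index + 1:]


def insert(units: List[str], index: int, unit: str) -> List[str]:
    """Atomic: add `unit` so that it ends up at position `index`."""
    return units[:index] + [unit] + units[index:]


def replace(units: List[str], index: int, new_unit: str) -> List[str]:
    """Complex: delete one unit and insert a different one in the same place."""
    return insert(delete(units, index), index, new_unit)


def reorder(units: List[str], source: int, target: int) -> List[str]:
    """Complex: delete a unit from one place and insert it in another.

    `target` is the position the unit should occupy in the result.
    """
    unit = units[source]
    return insert(delete(units, source), target, unit)


# The reordering example: the second sentence is moved to the end.
print(reorder(["S1", "S2", "S3", "S4"], source=1, target=3))
# -> ['S1', 'S3', 'S4', 'S2']
```

Each function returns a new list, so an extract can be kept unchanged alongside the abstract built from it.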

  22. Merging • “Taking information from different units in the extract and presenting them as one unit in the abstract” • All other operations can be used • Large class, most difficult to sub-classify – anything (appropriate) goes! • Best embodies abstracting as opposed to extracting – conciseness • Restructuring of clauses/sentences, punctuation/connectives

  23. Merging example • In October 1980 Zuccarelli filed [an expensive] European patent application, covering nine countries including Britain [. … The cost of pushing a European patent through in nine countries is around $10000. The cost of application alone is around $2000 and Zuccarelli has already paid an extra $500 for a further stage of official examination]. (new-sci-B7K-37)

  24. Evaluation • Applied guidelines to a different set of extracts • 25 human-produced extracts + corresponding abstracts • 25 automatically produced extracts + corresponding abstracts • Developed Centering Theory as an evaluation method due to unsuitability of existing methods

  25. Centering Theory (CT) (Grosz, Joshi & Weinstein 1995) • Theory of local coherence and salience • Accounts for coherence using repetitions of entities across consecutive utterances (Cfs, Cps, Cbs) • Uses the relationship between repetitions to derive ‘transitions’ (position in utterance) • Transitions are ordered in preference from most to least coherent (continue, retain, smooth shift, rough shift, no transition/no Cb)

  26. Centering Theory: an example John[Cp] went to his favorite music store to buy a piano. He[Cp], [Cb] had frequented the store for many years. He[Cp], [Cb] was excited that he could finally buy a piano. He[Cp], [Cb] arrived just as the store was closing for the day. • Continue, continue, continue John[Cp] went to his favorite music store to buy a piano. It[Cp] was a store John[Cb] had frequented for many years. He[Cp], [Cb] was excited that he could finally buy a piano. It[Cp] was closing just as John[Cb] arrived. • Retain, continue, retain (Grosz, Joshi & Weinstein 1995: 206)
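The transition labels in the example above can be derived from the Cb and Cp of each utterance. The sketch below uses the usual formulation of the rules: compare the current Cb with the previous Cb, and with the current Cp. Two points are assumptions made here rather than statements from the slides: an undefined previous Cb is treated as if it matched (which is what reproduces the labels in the example), and the function names and the `(Cp, Cb)` tuple layout are hypothetical.

```python
from typing import List, Optional, Tuple


def transition(prev_cb: Optional[str], cb: Optional[str], cp: str) -> str:
    """Classify one transition from the previous Cb and the current Cb and Cp."""
    if cb is None:
        return "no Cb"
    # Assumption: an undefined previous Cb counts as "same Cb".
    same_cb = prev_cb is None or cb == prev_cb
    if same_cb:
        return "continue" if cb == cp else "retain"
    return "smooth shift" if cb == cp else "rough shift"


def classify(utterances: List[Tuple[str, Optional[str]]]) -> List[str]:
    """Label the transitions for a list of (Cp, Cb) pairs.

    The first utterance has no Cb and yields no transition.
    """
    labels = []
    prev_cb = None
    for cp, cb in utterances[1:]:
        labels.append(transition(prev_cb, cb, cp))
        prev_cb = cb
    return labels


# The two versions of the example, encoded as (Cp, Cb) per utterance.
first = [("John", None), ("John", "John"), ("John", "John"), ("John", "John")]
second = [("John", None), ("store", "John"), ("John", "John"), ("store", "John")]
print(classify(first))   # -> ['continue', 'continue', 'continue']
print(classify(second))  # -> ['retain', 'continue', 'retain']
```

Identifying the Cb and Cp in the first place is the harder part and is done by hand here; the sketch only covers the final labelling step.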

  27. Centering Theory: a real example 1. (Everybody)[Cp] should be ready for ((Monday)'s national championship game), despite (casualties in ((Saturday night)'s NCAA semifinal battles)). no transition (indirect) 2. (Jason Terry of (Arizona))[Cp], [Cb] was injured. retain 3. “(We)[Cp] were going to put (him)[Cb] in late in (the game),” said (Arizona coach (Lute Olson)). rough shift 4. “(He)[Cp] had played a lot before (that), of course, but when (we)'re protecting (a lead), (we)[Cb] like getting (four perimeter guys) in there and (that) gives (us) (another ball handler), gives (us) (another free throw shooter).” retain 5. (Kentucky coach (Rick Pitino))[Cp] predicted that ((Monday)'s championship game) would also be physical, in view of (((Kentucky)'s all-out pressure defence) and ((Arizona)[Cb]'s blazing speed)).

  28. CT evaluation metric

  29. Evaluation 2 • Human judgment obtained to complement CT • Overall, human summary production operations improve texts: CT = 78%; Judge = 82% • Agreement between CT and judge = 70% • Classification and resulting guidelines can be reliably used during post-editing in CAS • CT is useful as an evaluation method

  30. Directions for future work • To use more human summarisers/judges to further validate classification/guidelines • To further explore/improve CT for evaluation • To investigate the feasibility of automating certain elements of summary production operations for CAS • To look at scientific texts (also popular in AS)
