From extracts to abstracts: human summary production operations for computer-aided summarisation
Laura Hasler
University of Wolverhampton
L.Hasler@wlv.ac.uk
CALP 2007: 30.09.07
Overview
• Computer-aided summarisation (CAS)
• Summary production stage of summarisation
• Classification of human summary production operations (and guidelines)
• Evaluation of classification (and guidelines derived from it)
• Some conclusions and possibilities for future work
Laura Hasler: CALP 2007
Computer-aided summarisation
• Feasible alternative to fully automatic summarisation, given the problems of coherence/readability in automatic extracts
• Automatic summarisation methods produce an extract (document exploration, relevance assessment), which the user then post-edits (summary production)
• No resources exist to ensure consistency in this post-editing
• This research focuses on summary production (extract → abstract) to improve coherence and readability
Aim of the research
• A) Chernobyl reactor number 4 was ripped apart by an explosion on 26 April 1986. Last September, the IAEA and the WHO released a report. Its headline conclusion that radiation from the accident would kill a total of 4000 people was widely reported.
• B) Last September, the IAEA/WHO released a report on the explosion of Chernobyl reactor number 4 on 26 April 1986, concluding that radiation from the accident would kill a total of 4000 people. (h03-ljh)
Classification of operations
• 43 pairs of news texts (extract, abstract)
• 30% extracts (CAST guidelines) → 20% abstracts
• 5 general classes of operations
• Atomic: deletion, insertion
• Complex: replacement, reordering, merging
• Each class is split into sub-operations (26 in total)
• Sub-operations are linked to triggers (recognisable surface forms)
• The function of units is also important
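The two-level scheme (five general classes, grouped as atomic or complex) maps naturally onto a small lookup structure. The sketch below is illustrative only: `OperationClass`, `TAXONOMY` and `classes_of_kind` are hypothetical names, and the example sub-operations are only those named on these slides, not the full set of 26.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationClass:
    """One of the five general classes of human summary production operations."""
    name: str
    kind: str  # "atomic" or "complex"
    # Example sub-operations/triggers named on the slides (not the full inventory of 26).
    examples: tuple[str, ...] = ()


# Hypothetical encoding of the classification; reordering has no listed sub-operations
# because the slides describe it as difficult to sub-classify.
TAXONOMY = (
    OperationClass("deletion", "atomic",
                   ("complete sentence", "subordinate clause", "PP",
                    "reporting clause", "determiner", "be")),
    OperationClass("insertion", "atomic",
                   ("connective", "modifier", "formulaic unit", "punctuation")),
    OperationClass("replacement", "complex",
                   ("pronominalisation", "lexical substitution", "NP restructuring",
                    "nominalisation", "VP", "passivisation", "abbreviation")),
    OperationClass("reordering", "complex"),
    OperationClass("merging", "complex",
                   ("clause/sentence restructuring", "punctuation/connectives")),
)


def classes_of_kind(kind: str) -> list[str]:
    """Return the names of all operation classes of the given kind, in TAXONOMY order."""
    return [op.name for op in TAXONOMY if op.kind == kind]


print(classes_of_kind("atomic"))   # ['deletion', 'insertion']
print(classes_of_kind("complex"))  # ['replacement', 'reordering', 'merging']
```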
Deletion
• “The process of removing a unit from a certain place in the extract so it does not appear in the same place in the abstract”
• Used alone or as part of complex operations
• Very useful for reducing text when used alone
• Deletes non-essential units (details, repetitions)
• Deleted units include complete sentences, subordinate clauses, PPs, reporting clauses, determiners and ‘be’
Deletion examples
• [I suspect that] the set would be the ideal book for a physicist to be cast away with on a desert island. (new-sci-B7L-54-ljh)
• Three papers published recently in Science move us a little closer to understanding the basis of the disease[, which turns out to be highly complex]. (sci04done-an)
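The deletion examples mark deleted units in square brackets, so the abstract text can be recovered mechanically from an annotated extract. A minimal sketch, assuming brackets are never nested and mark only deletions. The hypothetical `apply_deletions` helper also re-capitalises the first character, in case a sentence-initial unit was removed.

```python
import re


def apply_deletions(annotated: str) -> str:
    """Produce abstract text from an extract in which deleted units are in [brackets]."""
    text = re.sub(r"\[[^\]]*\]", "", annotated)   # drop every bracketed (deleted) unit
    text = re.sub(r"\s+", " ", text).strip()       # collapse leftover whitespace
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)   # no space before punctuation
    return text[:1].upper() + text[1:]             # re-capitalise after a sentence-initial deletion


print(apply_deletions(
    "[I suspect that] the set would be the ideal book for a physicist "
    "to be cast away with on a desert island."
))
# The set would be the ideal book for a physicist to be cast away with on a desert island.
```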
Insertion
• “The process of adding a unit which is not present in the extract into the abstract”
• Used alone or as part of complex operations
• Interesting because it adds text to something which is supposed to be reduced
• Used to add coherence and to clarify whilst saving space
• Inserted units include connectives, modifiers, ‘formulaic units’ and punctuation
Insertion examples
• He sees the need to raise public awareness and demystify science and technology as a key point… (new-sci-B7L-75-ljh) [X sees Y as Z]
• The TV series Men of Science is now being shown in a few other areas. (new-sci-B7L-69-ljh)
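By its definition, insertion is the mirror image of deletion: a unit absent from the extract is added at some position. As a sketch, the text is treated as a list of units. The `insert_unit` helper, the clause segmentation and the `because` connective are invented for illustration and are not taken from the corpus.

```python
def insert_unit(units: list[str], position: int, unit: str) -> list[str]:
    """Return a copy of `units` with `unit` added at `position` (0-based); input is not mutated."""
    return units[:position] + [unit] + units[position:]


# Hypothetical two-clause extract; a connective is inserted between the clauses.
extract = ["The trial was stopped early", "the drug showed no benefit."]
abstract = insert_unit(extract, 1, "because")
print(" ".join(abstract))  # The trial was stopped early because the drug showed no benefit.
```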
Replacement
• “The deletion of one unit and the insertion of a different unit in the same place in the text”
• Complex operation; can be combined with other complex operations
• Useful for avoiding repetition and saving space
• Sub-operations include pronominalisation, lexical substitution, NP restructuring, nominalisation, VPs, passivisation and abbreviations
Replacement examples
• [Zhanat Carr, a radiation scientist with the WHO in Geneva,] The WHO [says] admits the 5000 deaths were omitted because the report was a "political communication tool". (h03-ljh)
• [All this] [is] hardly Culver’s fault. [The same difficulties are to be found in all other parts of evolutionary ecology.] These general difficulties of evolutionary ecology are hardly Culver’s fault. (new-sci-B7L-63-ljh)
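Replacement is defined as a deletion plus an insertion at the same position, so it can be written as a composition of the two atomic operations. The sketch below replays the first WHO example. The `replace_unit` helper and the split of the sentence into three units are assumptions made for illustration.

```python
def replace_unit(units: list[str], position: int, new_unit: str) -> list[str]:
    """Delete the unit at `position`, then insert `new_unit` at that same position."""
    remaining = units[:position] + units[position + 1:]          # atomic deletion
    return remaining[:position] + [new_unit] + remaining[position:]  # atomic insertion, same index


# Assumed segmentation of the first replacement example (h03-ljh) into three units.
extract = [
    "Zhanat Carr, a radiation scientist with the WHO in Geneva,",
    "says",
    'the 5000 deaths were omitted because the report was a "political communication tool".',
]
abstract = replace_unit(replace_unit(extract, 0, "The WHO"), 1, "admits")
print(" ".join(abstract))
# The WHO admits the 5000 deaths were omitted because the report was a "political communication tool".
```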
Reordering
• “The deletion of a unit from one place in the extract and its insertion in a different place in the abstract”
• Complex operation; can be combined with other complex operations
• Sub-divided by function rather than by sub-operation, so difficult to sub-classify
• Used to emphasise information and to improve coherence and readability
Reordering example
• Text about the world’s second face transplant; all other sentences are about a specific person/operation
• Experts predict the number of these operations will rise rapidly as centres around the world gear up to perform the procedure. (h01-ljh)
• S2 → last sentence
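Reordering is likewise a deletion followed by an insertion, but at a different position. A minimal sketch, mirroring the face-transplant example in which S2 is moved to the end. The `reorder_unit` helper and the four placeholder sentence labels are hypothetical.

```python
def reorder_unit(units: list[str], source: int, target: int) -> list[str]:
    """Delete the unit at `source` and insert it at `target` (an index into the result)."""
    moved = units[source]
    remaining = units[:source] + units[source + 1:]   # deletion from the original place
    return remaining[:target] + [moved] + remaining[target:]  # insertion elsewhere


sentences = ["S1", "S2", "S3", "S4"]  # hypothetical four-sentence extract
print(reorder_unit(sentences, 1, len(sentences) - 1))  # ['S1', 'S3', 'S4', 'S2']
```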
Merging
• “Taking information from different units in the extract and presenting it as one unit in the abstract”
• All other operations can be used
• Large class, the most difficult to sub-classify: anything (appropriate) goes!
• Best embodies abstracting as opposed to extracting, because it achieves conciseness
• Restructuring of clauses/sentences, punctuation/connectives
Merging example
• In October 1980 Zuccarelli filed [an expensive] European patent application, covering nine countries including Britain[. … The cost of pushing a European patent through in nine countries is around $10000. The cost of application alone is around $2000 and Zuccarelli has already paid an extra $500 for a further stage of official examination]. (new-sci-B7K-37)
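Merging cannot be reduced to a fixed rewrite rule, since any of the other operations may shape the merged wording. Its structural effect can still be sketched: several extract units are replaced by a single abstract unit, placed where the first of them stood. The hypothetical `merge_units` helper takes the merged wording as input. The usage reads A) on the "Aim of the research" slide as the extract and B) as the abstract, which is an assumption.

```python
def merge_units(units: list[str], indices: list[int], merged: str) -> list[str]:
    """Replace the units at `indices` with one unit `merged`, placed where the first of them stood.

    The merged wording is supplied by the summariser; it is not derived here.
    """
    chosen = set(indices)
    if not chosen:
        raise ValueError("need at least one unit to merge")
    first = min(chosen)
    result = []
    for i, unit in enumerate(units):
        if i == first:
            result.append(merged)      # the single merged unit takes the first slot
        elif i not in chosen:
            result.append(unit)        # units outside the merge are kept in order
    return result


extract = [
    "Chernobyl reactor number 4 was ripped apart by an explosion on 26 April 1986.",
    "Last September, the IAEA and the WHO released a report.",
    "Its headline conclusion that radiation from the accident would kill a total of 4000 people was widely reported.",
]
abstract = merge_units(extract, [0, 1, 2],
    "Last September, the IAEA/WHO released a report on the explosion of Chernobyl reactor "
    "number 4 on 26 April 1986, concluding that radiation from the accident would kill "
    "a total of 4000 people.")
print(len(extract), "->", len(abstract))  # 3 -> 1
```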
Evaluation
• Applied the guidelines to a different set of extracts
• 25 human-produced extracts + corresponding abstracts
• 25 automatically produced extracts + corresponding abstracts
• Developed an evaluation method (metric) based on Centering Theory, because existing evaluation methods were unsuitable
Centering Theory (CT) (Grosz, Joshi & Weinstein 1995)
• Parametric theory of local coherence and salience
• Accounts for coherence using repetitions of entities across consecutive utterances
• Uses the relationship between repetitions to derive ‘transitions’
• Transitions are ordered in preference from most to least coherent
• Metric developed to reflect the effect of transitions in summaries
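The transition logic can be made concrete. The slides do not give the details of the metric developed in this research, so the sketch below does not reproduce it. It implements only the standard CT definitions. Cb(U_n) is the highest-ranked entity of Cf(U_{n-1}) realised in U_n, and Cp(U_n) is the top-ranked entity of Cf(U_n). It uses the four-way table common in later CT work, which splits the Shift transition of Grosz, Joshi & Weinstein into smooth and rough shifts, and adds NOCB for an utterance that shares no entity with its predecessor. Ranking the Cf lists (e.g. by grammatical role) is assumed to happen upstream. `mean_rank` is a hypothetical summary score, and the entity names in the usage are invented.

```python
from typing import Optional

# Preference order, most to least coherent (four-way table plus NOCB).
PREFERENCE = ["CONTINUE", "RETAIN", "SMOOTH-SHIFT", "ROUGH-SHIFT", "NOCB"]


def backward_center(prev_cf: list[str], cur_cf: list[str]) -> Optional[str]:
    """Cb(U_n): the highest-ranked entity of Cf(U_{n-1}) that is realised in U_n."""
    for entity in prev_cf:
        if entity in cur_cf:
            return entity
    return None


def transitions(cf_lists: list[list[str]]) -> list[str]:
    """Label each pair of consecutive utterances, given ranked Cf lists (highest first)."""
    labels: list[str] = []
    prev_cb: Optional[str] = None
    for prev_cf, cur_cf in zip(cf_lists, cf_lists[1:]):
        cb = backward_center(prev_cf, cur_cf)
        if cb is None:
            labels.append("NOCB")
        else:
            cp = cur_cf[0]  # preferred center; cur_cf is non-empty because it contains cb
            # Convention assumed here: an undefined previous Cb (text start, or after
            # NOCB) counts as the same Cb.
            if prev_cb is None or cb == prev_cb:
                labels.append("CONTINUE" if cb == cp else "RETAIN")
            else:
                labels.append("SMOOTH-SHIFT" if cb == cp else "ROUGH-SHIFT")
        prev_cb = cb
    return labels


def mean_rank(labels: list[str]) -> float:
    """Hypothetical score: average position in PREFERENCE (lower = more coherent)."""
    if not labels:
        raise ValueError("no transitions to score")
    return sum(PREFERENCE.index(label) for label in labels) / len(labels)


utterances = [["john", "car"], ["john", "shop"], ["mary", "john"],
              ["mary", "book"], ["shelf", "book"], ["weather"]]
print(transitions(utterances))
# ['CONTINUE', 'RETAIN', 'SMOOTH-SHIFT', 'ROUGH-SHIFT', 'NOCB']
```

Comparing `mean_rank` for an extract and its abstract would then give one crude signal of whether the summary production operations improved local coherence.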
Evaluation 2
• Human judgments obtained to complement CT
• Overall, human summary production operations improve texts: CT = 78%; judge = 82%
• Agreement between CT and the judge = 70%
• The classification and resulting guidelines can be used reliably during post-editing in CAS
• CT is useful as an evaluation method
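The slides do not say how these percentages were computed. Assuming each figure is the share of texts judged improved, and that CT-judge agreement is simple percent agreement over the same texts, the arithmetic looks like this. The helper names and verdict lists are hypothetical and are not the study's data.

```python
def improvement_rate(verdicts: list[bool]) -> float:
    """Percentage of texts judged improved (True) by one evaluator."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    return 100 * sum(verdicts) / len(verdicts)


def percent_agreement(a: list[bool], b: list[bool]) -> float:
    """Percentage of texts on which two evaluators give the same verdict."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty verdict lists of equal length")
    same = sum(x == y for x, y in zip(a, b))
    return 100 * same / len(a)


ct = [True, True, False, True]      # hypothetical CT verdicts for four abstracts
judge = [True, False, False, True]  # hypothetical human verdicts for the same abstracts
print(improvement_rate(ct), improvement_rate(judge), percent_agreement(ct, judge))
# 75.0 50.0 75.0
```

Percent agreement does not correct for chance; a chance-corrected statistic such as Cohen's kappa would be a stricter alternative.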
Conclusions
• Analysis and classification of human summary production operations for CAS (→ guidelines)
• Evaluation: applying these operations to extracts results in more coherent/readable abstracts
• The guidelines can help CAS system users in their task
Future work
• To use more human summarisers/judges to further validate the classification/guidelines
• To look at scientific texts (also popular in automatic summarisation)
• To further explore CT for evaluation
Thank you! Any questions?