Summarisation Work at Sheffield Robert Gaizauskas Natural Language Processing Group Department of Computer Science University of Sheffield
Outline • Terminology • Approach 1: Generation from Templates • Approach 2: Coreference Chains • Approach 3: Statistical AKT Workshop
Terminology
• Extract vs Abstract
  • Extract: subset of the sentences in the original
  • Abstract: fusion of topics in original + text generation
• Generic vs User-focused
  • Generic: captures essence of text, independent of user’s interests
  • User-focused: summarises content with respect to a particular user interest
• Indicative vs Informative
  • Indicative: indicates whether document should be examined in more detail
  • Informative: serves as a surrogate for original
Approach 1: Generation from Templates
• To generate user-focused, informative abstracts, we have used an IE system + simple NL generation techniques to produce simple summaries
Example: A Wall Street Journal Article
<DOC>
<DOCID> wsj94_008.0212 </DOCID>
<DOCNO> 940413-0062. </DOCNO>
<HL> Who's News: @ Burns Fry Ltd. </HL>
<DD> 04/13/94 </DD>
<SO> WALL STREET JOURNAL (J), PAGE B10 </SO>
<CO> MER </CO>
<IN> SECURITIES (SCR) </IN>
<TXT>
<p>
BURNS FRY Ltd. (Toronto) -- Donald Wright, 46 years old, was named executive vice president and director of fixed income at this brokerage firm. Mr. Wright resigned as president of Merrill Lynch Canada Inc., a unit of Merrill Lynch & Co., to succeed Mark Kassirer, 48, who left Burns Fry last month. A Merrill Lynch spokeswoman said it hasn't named a successor to Mr. Wright, who is expected to begin his new position by the end of the month.
</p>
</TXT>
</DOC>
Example: BNF Definition of a Management Succession Event Template (MUC-6)
<TEMPLATE> :=
  DOC_NR: "NUMBER" ^
  CONTENT: <SUCCESSION_EVENT> *
<SUCCESSION_EVENT> :=
  ORGANIZATION: <ORGANIZATION> ^
  POST: "POSITION TITLE" | "no title" ^
  IN_AND_OUT: <IN_AND_OUT> +
  VACANCY_REASON: {DEPART_WORKFORCE, REASSIGNMENT, NEW_POST_CREATED, OTH_UNK} ^
<IN_AND_OUT> :=
  PERSON: <PERSON> ^
  NEW_STATUS: {IN, IN_ACTING, OUT, OUT_ACTING} ^
  ON_THE_JOB: {YES, NO, UNCLEAR}
  OTHER_ORG: <ORGANIZATION> -
  REL_OTHER_ORG: {SAME_ORG, RELATED_ORG, OUTSIDE_ORG} -
<ORGANIZATION> :=
  ORG_NAME: "NAME" -
  ORG_ALIAS: "ALIAS" *
  ORG_DESCRIPTOR: "DESCRIPTOR" -
  ORG_TYPE: {GOVERNMENT, COMPANY, OTHER} ^
  ORG_LOCALE: LOCALE_STRING {{CITY, PROVINCE, COUNTRY, REGION, UNK}} *
  ORG_COUNTRY: NORMALIZED-COUNTRY-or-REGION | COUNTRY-or-REGION-STRING *
<PERSON> :=
  PER_NAME: "NAME" -
  PER_ALIAS: "ALIAS" *
  PER_TITLE: "TITLE" *
Example: A (Partially) Filled Management Succession Event Template
<TEMPLATE-9404130062> :=
  DOC_NR: "9404130062"
  CONTENT: <SUCCESSION_EVENT-1>
<SUCCESSION_EVENT-1> :=
  SUCCESSION_ORG: <ORGANIZATION-1>
  POST: "executive vice president"
  IN_AND_OUT: <IN_AND_OUT-1> <IN_AND_OUT-2>
  VACANCY_REASON: OTH_UNK
<IN_AND_OUT-1> :=
  IO_PERSON: <PERSON-1>
  NEW_STATUS: OUT
  ON_THE_JOB: NO
<IN_AND_OUT-2> :=
  IO_PERSON: <PERSON-2>
  NEW_STATUS: IN
  ON_THE_JOB: NO
  OTHER_ORG: <ORGANIZATION-2>
  REL_OTHER_ORG: OUTSIDE_ORG
<ORGANIZATION-1> :=
  ORG_NAME: "Burns Fry Ltd."
  ORG_ALIAS: "Burns Fry"
  ORG_DESCRIPTOR: "this brokerage firm"
  ORG_TYPE: COMPANY
  ORG_LOCALE: Toronto CITY
  ORG_COUNTRY: Canada
<ORGANIZATION-2> :=
  ORG_NAME: "Merrill Lynch Canada Inc."
  ORG_ALIAS: "Merrill Lynch"
  ORG_DESCRIPTOR: "a unit of Merrill Lynch & Co."
  ORG_TYPE: COMPANY
<PERSON-1> :=
  PER_NAME: "Mark Kassirer"
<PERSON-2> :=
  PER_NAME: "Donald Wright"
  PER_ALIAS: "Wright"
  PER_TITLE: "Mr."
Example: One Use for a Template - Generating a Summary
• From the completely filled version of the preceding template the LaSIE system generates the following natural language summary:
  BURNS FRY Ltd. named Donald Wright as executive vice president. Donald Wright resigned as president of Merrill Lynch Canada Inc.. Mark Kassirer left as president of BURNS FRY Ltd.
• Producing summaries in other languages is relatively easy (compared to full machine translation).
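The template-to-text step can be illustrated with a minimal sketch. This is not LaSIE's generator: the dict layout, the function names `end_sentence` and `generate_summary`, and the three sentence patterns are all assumptions made for illustration. Only the slot names and values come from the filled template in the example.

```python
# Hypothetical dict rendering of the filled template. Slot names follow the
# example template; the data structure itself is an assumption.
burns_fry = {"ORG_NAME": "Burns Fry Ltd."}
merrill = {"ORG_NAME": "Merrill Lynch Canada Inc."}
event = {
    "SUCCESSION_ORG": burns_fry,
    "POST": "executive vice president",
    "IN_AND_OUT": [
        {"IO_PERSON": {"PER_NAME": "Mark Kassirer"}, "NEW_STATUS": "OUT"},
        {"IO_PERSON": {"PER_NAME": "Donald Wright"}, "NEW_STATUS": "IN",
         "OTHER_ORG": merrill},
    ],
}


def end_sentence(text):
    """Add a full stop unless the text already ends with one (e.g. 'Inc.')."""
    return text if text.endswith(".") else text + "."


def generate_summary(event):
    """Fill fixed sentence patterns (assumed, not LaSIE's) from template slots."""
    org = event["SUCCESSION_ORG"]["ORG_NAME"]
    post = event["POST"]
    incoming, outgoing = [], []
    for io in event["IN_AND_OUT"]:
        person = io["IO_PERSON"]["PER_NAME"]
        if io["NEW_STATUS"] in ("IN", "IN_ACTING"):
            incoming.append(end_sentence(f"{org} named {person} as {post}"))
            other = io.get("OTHER_ORG")
            if other is not None:
                incoming.append(
                    end_sentence(f"{person} came from {other['ORG_NAME']}"))
        else:
            outgoing.append(end_sentence(f"{person} left as {post} of {org}"))
    # Report arrivals first, then departures.
    return incoming + outgoing


summary = generate_summary(event)
print(" ".join(summary))
```

Because the sentence patterns are the only language-specific part, swapping them for patterns in another language is one plausible reading of why multilingual output is comparatively easy here.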
Approach 2: Coreference Chains
• To generate generic, informative extracts, we have used coreference chains
Approach 2: Coreference Chains (cont)
• Background:
  • Morris and Hirst (’94) investigated lexical chains: chains of lexically-related words in a text that serve to make texts cohere
  • Barzilay + Elhadad (’97) suggested using lexical chains as a basis for selecting sentences to form a summary: rank chains based on number of links + extent over text
  • Halliday and Hasan (’76) proposed coreference as another major factor contributing to coherence of NL texts
• Idea:
  • Explore use of coreference chains to produce summaries
Approach 2: Coreference Chains (cont)
• Technique
  • Use LaSIE to carry out discourse analysis of text, including coreference resolution
  • Extract all coreference chains
  • Rank chains by a metric which counts chain length + extent + starting point
  • Intuition: entities which occur most frequently and most widely in a text are those which the text is most “about”
  • Depending on desired summary length, select m sentences from top n chains
• Details in Azzam, Humphreys and Gaizauskas ’99
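The rank-then-select technique can be sketched as follows. The scoring formula is an assumption (an unweighted sum of mention count, normalised extent, and an early-start bonus), not the metric from Azzam, Humphreys and Gaizauskas ’99. The chains are written by hand as lists of sentence indices, standing in for the output of a coreference resolver, and the sentences are a paraphrase of the article used as toy data.

```python
def chain_score(mention_sents, n_sents):
    """Assumed metric: chain length + extent over the text + starting point."""
    length = len(mention_sents)
    extent = (max(mention_sents) - min(mention_sents) + 1) / n_sents
    start = 1.0 - min(mention_sents) / n_sents  # earlier chains score higher
    return length + extent + start


def rank_chains(chains, n_sents):
    """Return chain names ordered from highest to lowest score."""
    return sorted(chains, key=lambda name: chain_score(chains[name], n_sents),
                  reverse=True)


def summarise(sentences, chains, n_chains, m_sentences):
    """Select up to m sentences mentioned by the top n chains."""
    chosen = []
    for name in rank_chains(chains, len(sentences))[:n_chains]:
        for idx in chains[name]:
            if idx not in chosen and len(chosen) < m_sentences:
                chosen.append(idx)
    # Present the extract in original document order.
    return [sentences[i] for i in sorted(chosen)]


# Toy data: each chain lists the sentence index of every mention of the entity.
sentences = [
    "Donald Wright was named executive vice president at Burns Fry Ltd.",
    "Mr. Wright resigned as president of Merrill Lynch Canada Inc.",
    "He succeeds Mark Kassirer, who left Burns Fry last month.",
    "Merrill Lynch said it hasn't named a successor to Mr. Wright.",
]
chains = {
    "Wright": [0, 1, 2, 3],
    "Burns Fry": [0, 2],
    "Merrill Lynch": [1, 3],
    "Kassirer": [2],
}

ranking = rank_chains(chains, len(sentences))
extract = summarise(sentences, chains, n_chains=1, m_sentences=2)
print(ranking)
print(extract)
```

In this toy case the "Wright" chain ranks first because it is both the longest and the most widely spread, which matches the intuition that the text is most "about" that entity.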
Approach 3: Statistical
• To generate generic, indicative extracts, we have used a statistical approach based on a set of factors
Approach 3: Statistical (cont)
• Factors which have been examined in selecting sentences for inclusion in extractive summaries include:
  • number of content words shared with title/headings (T)
  • presence of “cue words” (C)
  • location of sentence in text (L)
  • number of content words discriminative of current text as opposed to corpus of texts from which it is drawn, using, e.g. tf-idf measure (K)
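A rough sketch of how the four factors might be computed for one sentence is given here. Everything concrete in it is an assumption: the stopword and cue-word lists, the tokeniser, the choice of L as a simple "earlier is better" value, and the choice of K as a sum of tf-idf values rather than a count of discriminative words. The toy document and corpus are invented.

```python
import math
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "in", "at", "of", "and", "to", "is", "was"}
CUE_WORDS = {"summary", "conclusion", "significant"}  # hypothetical cue list


def content_words(text):
    """Lower-case alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS]


def tfidf_table(doc, corpus):
    """tf-idf per content word of doc; corpus must include doc itself."""
    tf = Counter(content_words(doc))
    df = Counter(w for d in corpus for w in set(content_words(d)))
    return {w: tf[w] * math.log(len(corpus) / df[w]) for w in tf}


def sentence_factors(sentence, position, n_sentences, title, tfidf):
    """Return the T, C, L and K factors for one sentence."""
    words = set(content_words(sentence))
    return {
        "T": len(words & set(content_words(title))),  # words shared with title
        "C": 1.0 if words & CUE_WORDS else 0.0,       # any cue word present
        "L": 1.0 - position / n_sentences,            # earlier scores higher
        "K": sum(tfidf.get(w, 0.0) for w in words),   # summed tf-idf
    }


title = "Summarisation Work at Sheffield"
doc_sentences = [
    "In summary, Sheffield explored three summarisation approaches.",
    "Coreference chains worked well.",
]
doc = " ".join(doc_sentences)
corpus = [doc,
          "Stock markets fell sharply in Toronto.",
          "Three banks named new directors."]

tfidf = tfidf_table(doc, corpus)
factors = sentence_factors(doc_sentences[0], 0, len(doc_sentences),
                           title, tfidf)
print(factors)
```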
Approach 3: Statistical (cont)
• Assign a weight to each sentence according to a weighted linear combination of these factors
• Learn weights to optimise sentence selection as measured against a corpus of extracts + texts
• Select top ranked sentences up to desired summary length
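The three steps above can be sketched in a few lines. The learning procedure here is an assumption chosen for brevity: a coarse grid search that keeps the weights whose selections overlap most with reference extracts. The slides do not say which optimisation method was used, and the factor values below are invented toy numbers.

```python
from itertools import product

FACTORS = ("T", "C", "L", "K")


def score(factors, weights):
    """Weighted linear combination of a sentence's factor values."""
    return sum(weights[f] * factors[f] for f in FACTORS)


def select(sent_factors, weights, k):
    """Indices of the k top-scoring sentences, in document order."""
    ranked = sorted(range(len(sent_factors)),
                    key=lambda i: score(sent_factors[i], weights),
                    reverse=True)
    return sorted(ranked[:k])


def learn_weights(training, grid=(0.0, 0.5, 1.0)):
    """Grid search (assumed method) maximising overlap with reference extracts.

    training: list of (sent_factors, reference_indices) pairs.
    """
    best, best_overlap = None, -1
    for combo in product(grid, repeat=len(FACTORS)):
        weights = dict(zip(FACTORS, combo))
        overlap = sum(len(set(select(sf, weights, len(ref))) & set(ref))
                      for sf, ref in training)
        if overlap > best_overlap:
            best, best_overlap = weights, overlap
    return best


# Toy training corpus: per-sentence factor values plus the reference extract.
doc1 = [{"T": 0, "C": 0, "L": 1.0, "K": 3.0},
        {"T": 2, "C": 1, "L": 0.5, "K": 1.0},
        {"T": 0, "C": 0, "L": 0.0, "K": 0.5}]
doc2 = [{"T": 0, "C": 0, "L": 1.0, "K": 0.2},
        {"T": 0, "C": 0, "L": 0.5, "K": 0.1},
        {"T": 1, "C": 0, "L": 0.0, "K": 2.0}]
training = [(doc1, [1]), (doc2, [2])]

weights = learn_weights(training)
picked1 = select(doc1, weights, 1)
picked2 = select(doc2, weights, 1)
print(weights, picked1, picked2)
```

A grid search only scales to a handful of factors; with more factors or finer weight steps, a regression or other learned classifier over the same factor values would be the more usual choice.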