This paper discusses a machine reading-based approach to update summarization, focusing on the task of maximizing new information in the summary. It covers topics such as question processing, sentence retrieval and ranking, recognizing textual entailment, sentence selection, and summary generation. The results demonstrate the effectiveness of the approach.
A Machine Reading-based Approach to Update Summarization
Andrew Hickl, Kirk Roberts, and Finley Lacatusu
Language Computer Corporation
April 26, 2007
Overview • Introduction • Why Machine Reading? • System Overview • Question Processing • Sentence Retrieval and Ranking • Recognizing Textual Entailment • Sentence Selection • Summary Generation • Results • Main Task • Update Task • Conclusions and Future Considerations
Update Summarization • As currently defined, the task of update summarization requires systems to maximize the amount of new information included in a summary that is not available from any previously-considered document. • Don't consider identical content (term overlap, etc.) or textually entailed content. • Do consider new information and contradictory content; potentially consider inferable content. • These requirements demand access to models of the knowledge available from texts!
What is Machine Reading? • Machine Reading (MR) applications seek to promote the understanding of texts by providing a representation of the knowledge available from a corpus. • Three important components: • Knowledge Acquisition: How can we automatically extract the semantic/pragmatic content of a text? • Knowledge Representation: How do we encode the propositional content of a text in a regular manner? • Stability/Control: How do we ensure that the knowledge acquired from text is consistent with previous commitments stored in a knowledge base? • We believe that the recognition of knowledge that is consistent with a KB is an important prerequisite for performing update summarization: • Identify content that is already stored in the KB • Identify content that is inferable from the KB • Identify content that contradicts content in the KB • Consistency: Assume that knowledge is consistent with respect to a particular model M iff the truth of a proposition can be reasonably inferred from the other knowledge commitments of M.
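This consistency test can be phrased operationally. Below is a minimal sketch (not from the original slides) of the predicate, assuming hypothetical `entails` and `contradicts` judgment functions standing in for the textual inference components described later in the deck:

```python
def consistent_with(kb_commitments, proposition, entails, contradicts):
    """Sketch of the consistency test: a proposition is consistent with
    model M (the KB's commitments) iff it can reasonably be inferred from
    M and contradicts none of M's commitments.

    `entails(a, b)` and `contradicts(a, b)` are hypothetical pairwise
    judgments standing in for the RTE and contradiction systems."""
    inferable = any(entails(c, proposition) for c in kb_commitments)
    contradicted = any(contradicts(c, proposition) for c in kb_commitments)
    return inferable and not contradicted
```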
From Textual Inference to Machine Reading • The recent attention paid to the task of recognizing textual entailment (Dagan et al. 2006) and textual contradiction (Harabagiu et al. 2006) has led to the development of systems capable of accurately recognizing different types of textual inference relationships in natural language texts. • Example (RTE-3 Test Set): • Text: A revenue cutter, the ship was named for Harriet Lane, niece of President James Buchanan, who served as Buchanan's White House hostess. • Hyp (textual entailment): Lane worked at the White House. • Hyp (textual contradiction): Lane never set foot in White House. • Despite still being a relatively new evaluation area for NLP, statistical knowledge-lean approaches are achieving near human-like performance: • PASCAL RTE-2 (2006): 75.38% accuracy (Hickl et al. 2006) [max: 86.5%] • PASCAL RTE-3 (2007): 81.75% accuracy (Hickl and Bensley 2007) [max: 85.75%] • Contradiction: 66.5% accuracy (Harabagiu, Hickl, and Lacatusu, 2006) • Human Agreement: 86% (entailment), 81% (contradiction)
The Machine Reading Cycle (diagram): Document Ingestion feeds a Text Repository; Commitment Extraction turns texts into probes (hypotheses), which are checked against KB commitments ("texts") by Textual Entailment and Textual Contradiction; Knowledge Selection routes non-entailed and contradicted content to Knowledge Consolidation, which updates the Knowledge Base.
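One rough way to read the cycle as code: the loop below is a sketch (my paraphrase of the diagram, not the authors' implementation) in which each extracted commitment is tested against the knowledge base with hypothetical entailment and contradiction judgments before being consolidated:

```python
def machine_reading_cycle(documents, kb, extract_commitments, entails, contradicts):
    """Sketch of the machine reading cycle in the diagram above.

    `extract_commitments`, `entails`, and `contradicts` are hypothetical
    stand-ins for the commitment extraction, RTE, and contradiction components."""
    corrections, new_knowledge = [], []
    for doc in documents:                            # document ingestion
        for probe in extract_commitments(doc):       # text probes (hypotheses)
            if any(entails(c, probe) for c in kb):   # already known: skip
                continue
            if any(contradicts(c, probe) for c in kb):
                corrections.append(probe)            # changed information
            else:
                new_knowledge.append(probe)          # genuinely new information
    kb.extend(corrections + new_knowledge)           # knowledge consolidation
    return kb, corrections, new_knowledge
```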
Architecture of GISTexter (diagram): a complex question enters Question Processing (Keyword Extraction, Syntactic Q Decomposition, Semantic Q Decomposition); Sentence Retrieval and Ranking combines Question Answering and Multi-Document Summarization with Sentence Ranking; Summary Generation produces the main summary. For updates, the Machine Reading components (Commitment Extraction, Textual Entailment, Textual Contradiction) check candidate content against the Knowledge Base and pass corrections and new knowledge to Sentence Selection before the update summary is generated.
Question Processing • GISTexter uses three different Question Processing modules in order to represent the information need of complex questions. • Example question: What are the long term and short term implications of Israel's continuing military action against Lebanon, including airstrikes on Hezbollah positions in Southern Lebanon? • Keyword Extraction/Alternation: • Extracted: implications, Israel, military, action, (Southern) Lebanon, airstrikes, Hezbollah, positions • Alternations: implications, effects, outcomes, disaster, scandal, crisis; Israel, Israeli, Jewish state; military action, attack, operation, onslaught, invasion; Lebanon, Lebanese; positions, locations, facilities, targets, bunkers, areas, situations • Syntactic Question Decomposition: • What are the long term implications of Israel's action against Lebanon? • What are the short term implications of Israel's action against Lebanon? • What are the long term implications of Israeli airstrikes on Hezbollah positions in Southern Lebanon? • What are the short term implications of Israeli airstrikes on Hezbollah positions in Southern Lebanon? • Semantic Question Decomposition: • What ramifications could another round of airstrikes have on relations? • What could foment anti-Israeli sentiment among the Lebanese population? • What kinds of humanitarian assistance has Hezbollah provided in Lebanon? • How much damage resulted from the Israeli airstrikes on Lebanon? • Who has re-built roads, schools, and hospitals in Southern Lebanon?
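As a small illustration of the keyword alternation step (not the authors' code), the sketch below expands extracted keywords with WordNet synonyms using NLTK; the keyword list comes from the example above, while the expansion strategy itself is an assumption:

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def expand_keywords(keywords):
    """Sketch of keyword alternation: expand each extracted keyword with
    WordNet synonyms (one plausible source of alternations such as
    implications -> effects, outcomes)."""
    alternations = {}
    for kw in keywords:
        synonyms = set()
        for synset in wn.synsets(kw):
            for lemma in synset.lemmas():
                synonyms.add(lemma.name().replace("_", " "))
        alternations[kw] = sorted(synonyms - {kw})
    return alternations

# Example with keywords extracted from the sample question above.
print(expand_keywords(["implications", "airstrikes", "positions"]))
```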
Question Processing (semantic decomposition example): • Q0. What are the long-term ramifications of Israeli airstrikes against Hezbollah? • A0. Security experts warn that this round of airstrikes could have serious ramifications for Israel, including fomenting anti-Israeli sentiment among most of the Lebanese population for generations to come. • Relations extracted from A0: R1. ramifications-airstrikes, R2. fomenting-unrest, R3. provide-humanitarian assistance • Q1. What ramifications could this round of airstrikes have? • A1. The most recent round of Israeli airstrikes has caused significant damage to the Lebanese civilian infrastructure, resulting in more than an estimated $900 million in damage in the Lebanese capital of Beirut alone. • Q2. What could foment anti-Israeli sentiment among the Lebanese population? • A2. Hezbollah has provided humanitarian assistance to the people of Southern Lebanon following recent airstrikes; a surprising move to many who believe Hezbollah's sole purpose was to foment unrest in Lebanon and Israel. • Q3. What kinds of humanitarian assistance has Hezbollah provided in Lebanon? • A3. Following the widespread destruction caused by Israeli airstrikes, Hezbollah has moved quickly to provide humanitarian assistance, including rebuilding roads, schools, and hospitals and ensuring that water and power are available in metropolitan areas. • Relations extracted from the answers: R4. result-COST, R5. ORG-purpose, R6. ORG-rebuild • Q4. How much damage resulted from the airstrikes? • Q5. What is Hezbollah's sole purpose? • Q6. Who has re-built roads, schools, and hospitals?
Sentence Retrieval and Ranking • As with our DUC 2006 system, we used two different mechanisms to retrieve relevant sentences for a summary: • Question Answering (Hickl et al. 2006): • Keywords extracted from subquestions and automatically expanded • Sentences retrieved and ranked based on number and proximity of keywords in each sentence • Top 10 answers from each subquestion are combined and re-ranked in order to produce a ranked list of sentences for a summary • Multi-Document Summarization (Harabagiu et al. 2006, Harabagiu & Lacatusu 2005): • Computed topic signatures (Lin and Hovy 2000) and enhanced topic signatures (Harabagiu 2004) for each relevant set of documents • Sentences retrieved based on keywords; re-ranked based on combined topic score derived from topic signatures • All retrieved sentences were then re-ranked based on a number of features, including: • Relevance score assigned by retrieval engine • Position in document • Number of topical terms / named entities • Length of original document • Feature weights were determined using a hill-climber trained on “human” summaries from the DUC 2005 and 2006 main tasks.
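To make the re-ranking step concrete, here is a small sketch (my illustration, with assumed feature names and a simple random hill-climber rather than the authors' trained weights) of combining the listed features into a single sentence score:

```python
import random

# Hypothetical names for the features listed above.
FEATURES = ["retrieval_score", "position_in_doc", "num_topic_terms",
            "num_named_entities", "doc_length"]

def score_sentence(features, weights):
    """Weighted linear combination of the ranking features."""
    return sum(weights[f] * features[f] for f in FEATURES)

def hill_climb(ranking_quality, steps=1000, step_size=0.05, seed=0):
    """Sketch of a hill-climber over feature weights. `ranking_quality(weights)`
    is a hypothetical callback that scores a weight setting against
    reference ('human') summaries, as described on the slide."""
    rng = random.Random(seed)
    weights = {f: rng.random() for f in FEATURES}
    best = ranking_quality(weights)
    for _ in range(steps):
        f = rng.choice(FEATURES)
        trial = dict(weights)
        trial[f] += rng.uniform(-step_size, step_size)
        quality = ranking_quality(trial)
        if quality > best:
            weights, best = trial, quality
    return weights
```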
Recognizing Textual Entailment (pipeline diagram): the text and hypothesis pass through Preprocessing and Commitment Extraction, producing text commitments and hypothesis commitments; these are compared via Commitment Alignment and Lexical Alignment, after which Entailment Classification and Contradiction Recognition yield a +TE or -TE judgment, separating entailed knowledge from newly extracted knowledge.
Recognizing Textual Entailment • Step 1. Preprocessing of text-hypothesis pairs • POS Tagging, Syntactic Parsing, Morphological Stemming, Collocation Detection • Annotation of Tense/Aspect, Modality, Polarity • Semantic Parsing (PropBank, NomBank, FrameNet) • Named Entity Recognition (~300 named entity types) • Temporal Normalization • Temporal Relation Detection (t-link, s-link, a-link) • Pronominal Co-reference • Nominal Co-reference • Synonymy and Antonymy Detection • Predicate Alternation (based on pre-cached corpus of predicate paraphrases)
Recognizing Textual Entailment • Step 2. Commitment Extraction • Text: A revenue cutter, the ship was named for Harriet Lane, niece of President James Buchanan, who served as Buchanan's White House hostess. • Phenomena handled: Conjunction, Subordination, Reported Speech, Appositives, Relative Clauses, Titles and Epithets, Co-reference Resolution, Ellipsis Resolution, Pre-Nominal Modifiers, Possessives • Commitments extracted from the text: 1. The ship was named for Harriet Lane. 2. A revenue cutter was named for Harriet Lane. 3. The ship was named for the niece of Buchanan. 4. Buchanan had the title of President. 5. Buchanan had a niece. 6. A revenue cutter was named for the niece of Buchanan. 7. Harriet Lane was the niece of Buchanan. 8. Harriet Lane was related to Buchanan. 9. Harriet Lane served as Buchanan's White House hostess. 10. Buchanan had a White House hostess. 11. There was a hostess at the White House. 12. The niece of Buchanan served as White House hostess. 13. Harriet Lane served as White House hostess. 14. Harriet Lane served as a hostess. 15. Harriet Lane served at the White House. • Hypothesis: Harriet Lane worked at the White House. • Commitment extracted from the hypothesis: 16. Harriet Lane worked at the White House.
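For flavor, the sketch below (my illustration, not the authors' extractor) pulls a few simple commitments out of appositive constructions using spaCy's dependency parse; a real commitment extractor also handles subordination, reported speech, ellipsis, and the other phenomena listed above:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def simple_commitments(sentence):
    """Sketch: extract commitments licensed by appositives, e.g.
    'Harriet Lane, niece of President James Buchanan' ->
    'Harriet Lane was niece of President James Buchanan.'
    Output quality depends entirely on the parse; this is only a toy."""
    doc = nlp(sentence)
    commitments = []
    for token in doc:
        if token.dep_ == "appos":
            # Crude NP approximation: head plus everything to its left in its subtree.
            head_phrase = " ".join(t.text for t in token.head.subtree
                                   if t.i <= token.head.i)
            appos_phrase = " ".join(t.text for t in token.subtree)
            commitments.append(f"{head_phrase} was {appos_phrase}.")
    return commitments

print(simple_commitments(
    "A revenue cutter, the ship was named for Harriet Lane, "
    "niece of President James Buchanan."))
```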
Recognizing Textual Entailment • Step 3. Commitment Alignment • Used Taskar et al. (2005)'s discriminative matching approach to word alignment • Cast alignment prediction as maximum weight bipartite matching • Used large-margin estimation to learn parameters w such that w · f(xi, yi) ≥ w · f(xi, ŷi) for every alternative alignment ŷi, where yi is the correct alignment, ŷi an alternative (predicted) alignment, xi the sentence pair, w the parameter vector, and f the feature mapping • Used reciprocal best-hit match to ensure that the best commitment alignments were considered • Example: text commitments 13. Harriet Lane served as White House hostess. 14. Harriet Lane served as a hostess. 15. Harriet Lane served at the White House. aligned against the hypothesis commitment: Harriet Lane worked at the White House.
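The maximum-weight bipartite matching itself can be computed with an off-the-shelf assignment solver; the following sketch (my illustration, with a toy word-overlap scorer in place of the learned w · f) uses scipy:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_commitments(text_commitments, hyp_commitments, score):
    """Sketch of commitment alignment as maximum-weight bipartite matching.
    `score(t, h)` stands in for the learned scorer w . f(t, h)."""
    weights = np.array([[score(t, h) for h in hyp_commitments]
                        for t in text_commitments])
    rows, cols = linear_sum_assignment(weights, maximize=True)
    return [(text_commitments[r], hyp_commitments[c], weights[r, c])
            for r, c in zip(rows, cols)]

# Toy scorer: word overlap, standing in for the learned feature weights.
def overlap(t, h):
    return len(set(t.lower().split()) & set(h.lower().split()))

pairs = align_commitments(
    ["Harriet Lane served as White House hostess.",
     "Harriet Lane served at the White House."],
    ["Harriet Lane worked at the White House."],
    overlap)
print(pairs)
```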
Recognizing Textual Entailment • Step 4. Lexical Alignment • Used a Maximum Entropy classifier to identify the best possible token-wise alignment for each phrase chunk found in a t-h pair • Morphological Stemming / Levenshtein Edit Distance • Numeric/Date Comparators (second, 2; 1920's, 1928) • Named Entity Categories (350+ types from LCC's CiceroLite) • WordNet synonymy/antonymy distance • Example: Harriet Lane served at the White House. / Harriet Lane worked at the White House.
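As a rough illustration of the kinds of features such a classifier might consume (my sketch; the actual feature set and the CiceroLite named-entity categories are not reproduced here), the function below computes string-similarity and WordNet-based signals for one token pair:

```python
from difflib import SequenceMatcher
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def token_pair_features(t_token, h_token):
    """Sketch of lexical-alignment features for a text/hypothesis token pair."""
    syns_t = {l.name() for s in wn.synsets(t_token) for l in s.lemmas()}
    syns_h = {l.name() for s in wn.synsets(h_token) for l in s.lemmas()}
    antonyms_t = {a.name() for s in wn.synsets(t_token)
                  for l in s.lemmas() for a in l.antonyms()}
    return {
        "same_stem": stemmer.stem(t_token) == stemmer.stem(h_token),
        "edit_similarity": SequenceMatcher(None, t_token, h_token).ratio(),
        "wordnet_synonyms": bool(syns_t & syns_h),
        "wordnet_antonyms": h_token in antonyms_t,
    }

print(token_pair_features("served", "worked"))
```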
Recognizing Textual Entailment • Step 5. Entailment Classification and Contradiction Recognition • Used a Decision Tree classifier (C4.5) • 2006 (RTE-2): Trained on 100K+ entailment pairs • 2007 (RTE-3): Trained only on the RTE-3 Development Set (800 pairs) • If a NO judgment is returned: • Consider all other commitment-hypothesis pairs with p_align ≥ θ (θ = 0.85) • If none yields entailment, return NO as the RTE judgment • If a YES judgment is returned: • Use the system for recognizing textual contradiction (Harabagiu et al. 2006) to determine whether the hypothesis contradicts any other extracted commitment • If no contradiction can be found → positive instance of textual entailment • If a contradiction is found → negative instance of textual entailment
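A minimal sketch of the classification step (my illustration, using scikit-learn's CART decision tree rather than C4.5, with made-up feature vectors and a loose rendering of the thresholding described above) might look like this:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical alignment-based feature vectors for text-hypothesis pairs,
# e.g. [best alignment probability, fraction of aligned hypothesis tokens,
# contradiction score]; labels: 1 = entailed, 0 = not entailed.
X_train = [[0.95, 0.90, 0.05],
           [0.40, 0.35, 0.10],
           [0.88, 0.80, 0.60],
           [0.20, 0.15, 0.02]]
y_train = [1, 0, 0, 0]

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

THETA = 0.85  # alignment-confidence threshold from the slide

def rte_judgment(pair_features, contradiction_found):
    """Sketch of the decision logic: classify, then veto a YES judgment
    if the hypothesis contradicts another extracted commitment."""
    if pair_features[0] < THETA:   # weakly aligned pair: do not trust it
        return "NO"
    if clf.predict([pair_features])[0] == 1 and not contradiction_found:
        return "YES"
    return "NO"

print(rte_judgment([0.92, 0.85, 0.03], contradiction_found=False))
```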
Sentence Selection and KB Update • Step 1. Entailment confidence scores assigned to commitments are then used to re-rank the sentences that they were extracted from: • Textual Entailment: • Entailed Commitments (known information): Negative Weight • Non-Entailed Commitments (new information): Positive Weight • Textual Contradiction: • Contradicted Commitments (changed information): Positive Weight • Non-Contradicted Commitment (no change): No Contribution • Confidence scores are normalized for textual entailment and textual contradiction • Step 2. After each round of summarization, GISTexter’s knowledge base is updated to include: • All non-entailed commitments • All contradicted commitments • Step 3. Fixed-length summaries were generated as in (Lacatusu et al. 2006): • Top-ranked sentences clustered based on topic signatures to promote coherence • Heuristics used to insert paragraph breaks, drop words until word limit was met.
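The re-ranking in Step 1 can be pictured as a simple adjustment of each sentence's retrieval score; the sketch below is my illustration, with assumed weight values rather than the system's normalized confidences:

```python
def update_score(base_score, commitments, kb, entails, contradicts,
                 w_known=-1.0, w_new=1.0, w_changed=1.0):
    """Sketch of sentence re-scoring for the update task.

    Entailed commitments (known information) pull the score down,
    non-entailed commitments (new information) and contradicted
    commitments (changed information) push it up. The weights and the
    `entails`/`contradicts` judgment functions are assumptions."""
    score = base_score
    for c in commitments:
        if any(entails(k, c) for k in kb):
            score += w_known          # already known: penalize
        else:
            score += w_new            # new information: reward
        if any(contradicts(k, c) for k in kb):
            score += w_changed        # changed information: reward
    return score
```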
Results: Main Task • Two differences between 2006 and 2007 versions of GISTexter: • Sentence Ranking: • 2006: Used textual entailment to create Pyramids from 6 candidate summaries • 2007: Learned sentence weights based on 2005, 2006 summaries • Coreference Resolution: • 2006: Used heuristics to select sentences with “resolvable” pronouns • 2007: Used coreference resolution system to resolve all pronouns
Non-Redundancy vs. Referential Clarity • Using output from a pronoun resolution system can boost referential clarity, but at what price? • Only a modest gain in referential clarity: 3.71 → 4.09 • Marked loss in non-redundancy: 4.60 → 3.89 • The same redundancy filtering techniques were used in 2006 and 2007 • Summaries appear to be incurring a "repeat mention penalty"; we need to know: • When pronouns should be resolved • When pronouns should not be resolved • We need to revisit our heuristics!
Results: Update Task • Evaluation results from the Update Task were encouraging: GISTexter produced some of the most responsive summaries evaluated in DUC 2007.
Results: Update Task • On average, "B" summaries were judged to be significantly worse than either "A" or "C" summaries on both Content Responsiveness and Modified Pyramid. • It is unclear exactly why this was the case. • It is not due to "over-filtering": KB_A was always smaller than KB_A+B, so there was less knowledge available to potentially entail commitments extracted from the text.
Future Considerations • What’s the right way to deal with contradictory information? • Do users want to be notified when information changes? • When any information changes? • When relevant information changes? • How do you incorporate updates into a coherent text? • How can we evaluate the quality of updates? • Current approaches only measure the responsiveness of individual summaries • Is it possible to create “gold standard” lists of the facts (propositions?) that are available from a reading of a text? • Isn’t it enough just to be responsive? • For Q/A or QDS – yes. For database update tasks – maybe not. • How much recall do readers have? • Is it fair to assume that a reader of a text has access to all of the knowledge stored in a knowledge repository? • What level of “recap” is needed?
Ensuring Stability and Control • In order to take full advantage of the promise of machine reading for summarization, systems need to take steps to provide greater stability and control over the knowledge being added to a KB. • Control: How do we keep from introducing erroneous knowledge into our knowledge bases? • Stability: How do we keep from removing accurate knowledge from our knowledge bases? • (Figure: quality of knowledge in the KB over time. Machine reading should approach the upper bound set by including perfect knowledge; introducing errors, or introducing errors while also removing accurate knowledge, drags the curve toward zero.)
Semantic Question Decomposition • The method for decomposing questions operates on a Markov Chain (MC) by performing a random walk on a bipartite graph of: • Sequences of operators on relations (Addition(R1), Remove(R1), Replace(R1, R2)) • Questions created by previous sequences of operators • The Markov Chain alternates between selecting a sequence of operations ({Oi}) and generating a question decomposition (Qi): O0 → Q1 (with probability p(Q1|O0)) → O1 (p(O1|Q1)) → Q2 (p(Q2|O1)) → O2 (p(O2|Q2)) → ... • Assume the initial state of the MC depends on the initial sequence of operators available ({O0}) • Defining {O0} depends on access to a knowledge mapping function M1(KB, T, TC): • KB: available knowledge base • T: available text in corpus • TC: concepts extracted from T • Assume that {O0} represents the set of operators that maximizes the value of M1.
Semantic Question Decomposition • Following (Lapata and Lascarides 2003), the role of M1 is to coerce knowledge from a conceptual representation of a text into a form that can be used in question decomposition. • State transition probabilities also depend on a second mapping function, M2, defined as: M2(KB, T) = {CL, RL} • CL: set of related concepts stored in a KB • RL: set of relations that exist between concepts in CL • Both CL and RL are assumed to be discovered using M1 • This notation allows us to define a random walk for hypothesis generation in matrix notation: • Given N = |CL| and M = |RL|, we define: • A stochastic matrix A (with dimensions N × M) with entries a_ij = p(r_i | h_j), where r_i is a sequence of relations and h_j a partial hypothesis generated so far • ... and a second matrix B (with dimensions M × N) with entries b_ij = p(h_i | r_j) • We can estimate the probabilities a_ij and b_ij by applying the Viterbi algorithm to the maximum likelihood estimates resulting from the knowledge mappings M1 and M2. • Several possibilities exist for M1 and M2, including: • Density functions introduced by (Resnik 1995) • The probabilistic framework for taxonomy representation of (Snow et al. 2006)
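To make the alternating random walk concrete, here is a small numpy sketch (my illustration with toy dimensions and made-up probabilities, not the estimates derived from M1 and M2; the indexing convention is simplified so each matrix is row-stochastic):

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 4, 3  # N ~ |CL| hypotheses/concepts, M ~ |RL| relation sequences
A = rng.random((N, M)); A /= A.sum(axis=1, keepdims=True)  # row h: p(relation | hypothesis h)
B = rng.random((M, N)); B /= B.sum(axis=1, keepdims=True)  # row r: p(next hypothesis | relation r)

def random_walk(steps=6, start_hypothesis=0):
    """Sketch of the alternating walk: from a hypothesis state, pick a
    relation/operator sequence via A; from a relation state, generate the
    next partial question decomposition via B (toy probabilities only)."""
    h = start_hypothesis
    trace = [("Q", h)]
    for _ in range(steps):
        r = rng.choice(M, p=A[h])   # select operator/relation sequence
        h = rng.choice(N, p=B[r])   # generate next question decomposition
        trace += [("O", r), ("Q", h)]
    return trace

print(random_walk())
```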