
Acquiring entailment pairs across languages and domains: A data analysis

Manaal Faruqui, Dept. of Computer Science & Engineering, IIT Kharagpur; Sebastian Padó, Institut für Computerlinguistik, Universität Heidelberg.


Presentation Transcript


  1. Acquiring entailment pairs across languages and domains: A data analysis • Manaal Faruqui, Dept. of Computer Science & Engineering, IIT Kharagpur • Sebastian Padó, Institut für Computerlinguistik, Universität Heidelberg

  2. Textual Entailment • A premise P entails a hypothesis H if a human reading P can infer that H is most likely true (Dagan et al. 2004) • Entailment (+): (P): I have won Rs. 5000 in a lottery today! (H): I made a huge profit today. • No entailment (−): (P): Victor, the parrot kept shrieking “Water, water”. (H): Thirsty Jaguar procures water for Bulgarian zoo.

  3. Recognizing Textual Entailment • A variety of approaches • Alignment/matching (Monz & de Rijke 2001, MacCartney et al. 2006) • Transformations (Bar-Haim et al. 2007, Harmeling 2009) • Logic-based (MacCartney & Manning 2008, Bos & Markert 2005) • Many systems have a supervised learning component • Optimize model parameters • Training requires positive/negative entailment pairs • Little training data available: RTE Challenges create ~1000 pairs per year • Creating manually tagged training data is expensive • Wanted: Automatic extraction of entailment pairs

  4. A heuristic for extracting entailment pairs • The most prominent idea: Take advantage of document structure (Burger and Ferro 2005) • In newswire articles, the title is often an abbreviated version of the first sentence, so the first sentence entails the title. • Title (H): Sainsbury's reports record Christmas sales • First sentence (P): J Sainsbury has posted its best ever Christmas sales, with strong demand for homeware and electrical goods driving up trading over the festive period. …(Guardian, Jan 12 2011)
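
A minimal Python sketch of the title/first-sentence heuristic, assuming articles arrive as (title, body) string pairs. The regex sentence splitter and the names `first_sentence` and `extract_pairs` are illustrative assumptions, not the authors' pipeline; a naive splitter like this one is exactly the kind of component whose mistakes produce the “Err” pairs annotated later.

```python
import re
from typing import Iterable, List, Tuple

# Naive sentence splitter (assumption for illustration): split after ., ! or ?
# when followed by whitespace and an uppercase letter. Abbreviations such as
# "U.S. President" will be split wrongly.
_SENT_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def first_sentence(body: str) -> str:
    """Return the first sentence of an article body (heuristic)."""
    body = body.strip()
    if not body:
        return ""
    return _SENT_END.split(body, maxsplit=1)[0]


def extract_pairs(articles: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (title, body) articles into (premise P, hypothesis H) candidates.

    P is the first sentence of the body and H is the title.
    """
    pairs = []
    for title, body in articles:
        premise = first_sentence(body)
        if premise and title.strip():
            pairs.append((premise, title.strip()))
    return pairs


if __name__ == "__main__":
    article = ("Sainsbury's reports record Christmas sales",
               "J Sainsbury has posted its best ever Christmas sales. Shares rose.")
    print(extract_pairs([article]))
```

Every extracted pair is only a candidate; the filtering and annotation steps decide which ones actually show entailment.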

  5. Previous work – Our questions • Burger and Ferro (2005) • 50% of title-first sentence pairs show entailment • SVM identifies documents with 77% accuracy • Hickl et al. (2006) • remove pairs that “do not share an entity (or NP)”: 92% acc. • Not a lot of detail available • Our questions: • Does this work across languages? • Does this work across sources (genres)?

  6. Our Agenda • Extract headline-first sentence pairs from newswire • Experiment 1: Different languages (English, German, Hindi) • Experiment 2: Different sources (German newspapers) • Filtering of entailment pairs (motivated by Hickl et al.) • Remove pairs whose sentences do not share a noun, and remove questions • Manual annotation of entailment pair candidates • Identify phenomena that break entailment • Classification with respect to entailment by a logistic regression model • Analyse usefulness of predictors
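
The filtering step can be sketched as follows. This is a hedged illustration: the slide does not say whether questions are removed on the P side, the H side, or both, so this sketch drops a pair if either side ends in “?”. Noun extraction is passed in as a function because it depends on a language-specific POS tagger; `toy_nouns` and its tiny lexicon are hypothetical stand-ins used only to make the example run.

```python
from typing import Callable, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (premise P, hypothesis H)


def is_question(sentence: str) -> bool:
    """Treat a sentence ending in '?' as a question (simple heuristic)."""
    return sentence.rstrip().endswith("?")


def filter_pairs(pairs: Iterable[Pair],
                 nouns: Callable[[str], Set[str]]) -> List[Pair]:
    """Keep pairs where neither side is a question and P and H share a noun."""
    kept = []
    for premise, hypothesis in pairs:
        if is_question(premise) or is_question(hypothesis):
            continue
        if nouns(premise) & nouns(hypothesis):
            kept.append((premise, hypothesis))
    return kept


# Hypothetical stand-in for a POS tagger: look words up in a tiny noun lexicon.
LEXICON = {"gaza", "outlet", "restaurant", "sales", "christmas"}


def toy_nouns(sentence: str) -> Set[str]:
    return {w.strip(".,!?'\"").lower() for w in sentence.split()} & LEXICON


if __name__ == "__main__":
    candidates = [
        ("Gaza will soon get its first American fast food outlet",
         "KFC to open restaurant in Gaza"),
        ("Is this a question?", "Gaza news"),
    ]
    print(filter_pairs(candidates, toy_nouns))
```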

  7. Step 3: Manual annotation (I) • A fine-grained annotation scheme (8 classes) • Main improvement: Subdivision of “No” class into five subclasses for entailment-breaking phenomena • “No-par(tial)”: When P “almost” entails H, but P misses one crucial bit of information (P): Gaza will soon get its first American fast food outlet (H): KFC to open restaurant in Gaza • “No-pre”: Comprehension of P presupposes H (P): In this manner, he hopes to increase industrial growth. (H): Bush ordered tax rates on import to be reduced.

  8. A fine-grained annotation scheme • “No-con”: Direct contradiction between P and H. (P): How the biological clock works is still unknown. (H): Light regulates the biological clock. • “No-emb(ed)”: Some type of embedding (e.g. a modal verb) breaks the entailment (P): A gambling amendment is expected to be submitted to the state Supreme Court (H): Gaming petition goes before court • “No-oth(er)”: All cases without a more specific category (P): Victor, the parrot kept shrieking “Water, water”. (H): Thirsty Jaguar procures water for Bulgarian zoo.

  9. Ill-formed sentence pairs • “Err”: Due to errors in sentence boundary detection • “Ill”: Some titles are not single grammatical sentences and cannot be sensibly interpreted as a hypothesis, e.g. (H): Research Alert: Mexico Upped, Chile cut.
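
The eight annotation classes can be written down as an enumeration, together with the collapse into a binary “yes”/“no” response that the regression analysis uses. Mapping “Err” and “Ill” to “no” is an assumption of this sketch; the slides only say that all classes are lumped into two. The helper `distribution` is a hypothetical convenience for comparing class proportions across corpora.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class Label(Enum):
    """The eight classes of the fine-grained annotation scheme."""
    YES = "yes"
    NO_PAR = "no-par"  # P almost entails H but misses a crucial bit
    NO_PRE = "no-pre"  # comprehension of P presupposes H
    NO_CON = "no-con"  # direct contradiction between P and H
    NO_EMB = "no-emb"  # an embedding (e.g. a modal verb) breaks entailment
    NO_OTH = "no-oth"  # no more specific category applies
    ERR = "err"        # sentence boundary detection error
    ILL = "ill"        # title is not a single grammatical sentence


def to_binary(label: Label) -> int:
    """Collapse to y=1 for "yes" and y=0 for everything else (assumption)."""
    return 1 if label is Label.YES else 0


def distribution(labels: Iterable[Label]) -> Dict[Label, float]:
    """Proportion of each class in an annotated sample."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: counts[label] / total for label in Label}
```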

  10. Logistic regression modeling • Logistic regression models predict a binary response variable y from a set of predictors x: $p(y=1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \sum_i \beta_i x_i)}}$ • Train on annotated data (lump all classes into “yes”/“no”) • Analysis step 1: Compute coefficients β of predictors • Significance • e^β can be read as an odds ratio: the factor by which the odds p(y=1)/p(y=0) change when x increases by one unit • Analysis step 2: Test how well predictors generalize • Apply models trained on corpus 1 to predict entailment for corpus 2
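
A dependency-free sketch of analysis step 1: fit the coefficients β by batch gradient ascent on the log-likelihood and report e^β as odds ratios. The learning rate, epoch count and toy data are assumptions for illustration; this sketch does not compute significance, which needs standard errors (a statistics package such as statsmodels, with its `Logit` model, reports those).

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(beta: Sequence[float], x: Sequence[float]) -> float:
    """p(y=1 | x) with beta[0] as the intercept."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Maximum-likelihood fit by batch gradient ascent; returns [b0, b1, ...]."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(beta, xi)  # gradient of the log-likelihood
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def odds_ratios(beta: Sequence[float]) -> List[float]:
    """e^beta_j: multiplicative change in the odds per unit increase of x_j."""
    return [math.exp(b) for b in beta[1:]]


if __name__ == "__main__":
    # Toy data: one predictor (e.g. word overlap) and binary labels.
    X = [[0.0], [0.1], [0.2], [0.3], [0.7], [0.8], [0.9], [1.0]]
    y = [0, 0, 0, 1, 0, 1, 1, 1]
    beta = fit_logistic(X, y)
    print(beta, odds_ratios(beta))
```

An odds ratio above 1 means the predictor raises the chance of entailment; below 1 means it lowers it.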

  11. Predictors • Four (hopefully language-independent) predictors • Weighted word overlap: a tf-idf (informativity-based) weighting scheme to compute the word overlap between P and H • Hypothesis: high word overlap → higher chance of entailment
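
The slide does not give the exact weighting formula, so the following is one plausible reading, stated as an assumption: the overlap is the share of H's tf-idf mass carried by words that also occur in P, with idf(w) = log(N / df(w)) computed over a background document collection. Frequent, uninformative words such as “the” then contribute almost nothing. The tokenizer and the `default_idf` fallback for unseen words are also illustrative choices.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def tokenize(text: str) -> List[str]:
    """Lower-cased whitespace tokenization with edge punctuation stripped."""
    tokens = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [t for t in tokens if t]


def idf_table(documents: Iterable[str]) -> Dict[str, float]:
    """idf(w) = log(N / df(w)) over a background collection."""
    docs = [set(tokenize(d)) for d in documents]
    df = Counter(w for d in docs for w in d)
    return {w: math.log(len(docs) / c) for w, c in df.items()}


def weighted_overlap(premise: str, hypothesis: str,
                     idf: Dict[str, float], default_idf: float) -> float:
    """Fraction (0..1) of H's tf-idf weight whose words also occur in P."""
    p_words = set(tokenize(premise))
    h_tf = Counter(tokenize(hypothesis))
    total = sum(tf * idf.get(w, default_idf) for w, tf in h_tf.items())
    if total == 0:
        return 0.0
    shared = sum(tf * idf.get(w, default_idf)
                 for w, tf in h_tf.items() if w in p_words)
    return shared / total
```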

  12. Predictors • Strict noun match: Precision-focused boolean predictor: true if all H nouns are present in P • Hypothesis: strict noun match → higher chance of entailment • Log num words: (logarithmized) length of article • Hypothesis: longer article → lower chance of entailment • Punctuation: presence of a colon, full stop, or hyphen in the title, an indicator of titles that cannot be interpreted as hypotheses • Hypothesis: punctuation → lower chance of entailment
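
The remaining three predictors are simple to compute once nouns are available. In this sketch the noun sets are passed in (they would come from a POS tagger), the strict match is vacuously true when H has no nouns, and the length feature uses log(1 + n) to avoid log(0) on empty text; all three are assumptions about details the slide leaves open.

```python
import math
from typing import Set

TITLE_PUNCTUATION = (":", ".", "-")  # colon, full stop, hyphen


def strict_noun_match(premise_nouns: Set[str], hypothesis_nouns: Set[str]) -> bool:
    """True iff every noun of H also occurs in P (vacuously true if H has none)."""
    return hypothesis_nouns <= premise_nouns


def log_num_words(article: str) -> float:
    """Logarithmized article length in words, computed as log(1 + n)."""
    return math.log1p(len(article.split()))


def has_title_punctuation(title: str) -> bool:
    """True if the title contains a colon, full stop or hyphen."""
    return any(mark in title for mark in TITLE_PUNCTUATION)
```

For example, the “Ill” title “Research Alert: Mexico Upped, Chile cut.” triggers the punctuation predictor, while “KFC to open restaurant in Gaza” does not.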

  13. Exp 1, Analysis by Language: Annotation • English & German: Reuters RCV2 (politics/economy) • Hindi: EMILLE Corpus (politics) • Reasonable number of entailing pairs (“yes”) • More than 50% (Burger/Ferro) but less than 92% (Hickl) • German headlines often not simple sentences (“ill”) • Many Hindi “other” cases: 1st sentence less “to the point” • Embeddings, presuppositions, contradictions very rare

  14. Exp 1, Analysis by Language: Predictors • Odds ratios of predictors trained on different corpora: • Highly significant for all three languages: Word overlap and punctuation • Hypotheses validated by the data • Generally insignificant: Article length (too noisy) and noun match (too strict)

  15. Exp 1, Analysis by Language: Accuracy • Application of L1 models to predict L2 data • Goal is a clean dataset (high precision) • Evaluation: fix recall of the “yes” class at 30% and compare precision • Precision for all models over 90% • Only minor losses when applying models across languages • Overall: Languages and predictors behave similarly
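
The evaluation protocol (fixed recall, compare precision) can be sketched as follows, assuming that a model trained on one corpus scores every pair of another corpus with p(yes), and that pairs are accepted from the top of that ranking until 30% of the gold “yes” pairs are covered. The function name and the tie handling (stable sort) are illustrative choices.

```python
import math
from typing import Sequence


def precision_at_recall(scores: Sequence[float], gold: Sequence[int],
                        target_recall: float = 0.3) -> float:
    """Precision of the smallest top-ranked set reaching `target_recall`."""
    total_yes = sum(gold)
    if total_yes == 0:
        raise ValueError("gold data contains no 'yes' pairs")
    # Small epsilon guards against float artifacts such as 0.3 * 20.
    needed = max(1, math.ceil(target_recall * total_yes - 1e-9))
    ranked = sorted(zip(scores, gold), key=lambda sg: sg[0], reverse=True)
    true_pos = 0
    for accepted, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos >= needed:
            return true_pos / accepted
    return true_pos / len(ranked)  # only reached if target_recall > 1
```

Fixing recall rather than a probability threshold makes models trained on different corpora comparable: each one is judged on how clean its most confident 30% of true pairs is.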

  16. Exp 2, Analysis by Source: Annotation • German newspapers: Reuters, Stuttgarter Zeitung, Die Zeit • Stuttgarter Zeitung (StuttZ): newswire-like, but less consistent “house style”, more coverage of regional and local events • Die Zeit: “high-brow” weekly (culture, science, sociopolitics) • StuttZ, Die Zeit: fewer entailment pairs (“yes”) • Die Zeit: many ill-formed and unrelated (“no-oth”) pairs • “Intellectual style”

  17. Exp 2, Analysis by Source: Predictors • Odds ratios of predictors trained on different corpora: • Fairly similar picture to Exp 1 • word overlap, punctuation highly significant • log num words, noun match not significant

  18. Exp 2, Analysis by Source: Accuracy • Results much worse than in Exp 1 • Precision > 90% only for Reuters; StuttZ 84%, Die Zeit < 50%! • Generally larger losses for application across sources (5%) • Reflects “more difficult” distribution of Die Zeit (only?) • Generalization across sources seems to be more difficult than across languages

  19. Lexical analysis of the sources • How domain-specific are the three sources? • KL divergence between each source's unigram distribution and that of a domain-unspecific reference corpus (Ciaramita and Baroni 2006) • Higher KL divergence = more specific • Stand-in for the reference corpus: deWaC (Baroni et al. 2009) • Reuters most specific, Die Zeit least specific
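
A sketch of the specificity measure as KL(P_source || Q_reference) between unigram distributions. Both the direction of the divergence and the add-one smoothing over the joint vocabulary (needed so that words unseen in the reference do not yield an infinite divergence) are assumptions of this sketch; the original setup may differ.

```python
import math
from collections import Counter
from typing import Iterable


def unigram_kl(source_tokens: Iterable[str], reference_tokens: Iterable[str]) -> float:
    """KL(P_source || Q_reference) in nats, add-one smoothed over the union vocabulary."""
    p_counts = Counter(source_tokens)
    q_counts = Counter(reference_tokens)
    vocab = set(p_counts) | set(q_counts)
    v = len(vocab)
    p_total = sum(p_counts.values()) + v
    q_total = sum(q_counts.values()) + v
    kl = 0.0
    for w in vocab:
        p = (p_counts[w] + 1) / p_total  # smoothed source probability
        q = (q_counts[w] + 1) / q_total  # smoothed reference probability
        kl += p * math.log(p / q)
    return kl


if __name__ == "__main__":
    reference = "the cat sat on the mat the dog ran".split()
    specific = "shares shares profit shares profit".split()
    general = "the dog sat on the mat".split()
    print(unigram_kl(specific, reference), unigram_kl(general, reference))
```

A source whose vocabulary concentrates on a narrow domain (here the toy finance tokens) diverges more from the general reference than one that reuses common words.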

  20. Summary: The “newswire heuristic” • A prominent heuristic to obtain entailment pairs: Combine the title of a newspaper article with its first sentence • We applied it to three languages and three sources • Annotation + analysis by logistic regression model • Main Results: • Entailment breakers: ill-formed titles, unrelatedness • Generalization across languages works well • Generalization across sources does not work well

  21. Why does the newswire heuristic work? • Reuters articles have a consistent style • Reuters articles come from a specific domain • These two properties are shared by similar news agency outlets in other languages… • …but not necessarily by other types of newspapers!

  22. The take-home message • Unless you want to extract entailment pairs from Reuters, look for another heuristic • Extraction from Wikipedia (edits)? • Committee of RTE systems? • Generation? • At the end of the day, the question is what entailment phenomena you want to collect instances of • There is no such thing as a representative sample of entailment pairs

  23. Thank you!
