Natural Language Inference
Bill MacCartney, NLP Group, Stanford University
8 May 2009
Natural language inference (NLI)
• Aka recognizing textual entailment (RTE)
• Does premise P justify an inference to hypothesis H?
  • An informal, intuitive notion of inference: not strict logic
  • Emphasis on variability of linguistic expression

P: Several airlines polled saw costs grow more than expected, even after adjusting for inflation.
H: Some of the companies in the poll reported cost increases.  →  yes

• Necessary to the goal of natural language understanding (NLU)
• Many more immediate applications …
Applications of NLI
• Semantic search [King et al. 07]: the query Georgia's gas bill doubled should retrieve passages like "…double Georgia's gas bill…", "…two-fold increase in gas price…", "…price of gas will be doubled…" (Economist.com)
• Question answering [Harabagiu & Hickl 06]:
  Q: How much did Georgia's gas price increase?
  A: In 2006, Gazprom doubled Georgia's gas bill.
  A: Georgia's main imports are natural gas, machinery, ...
  A: Tbilisi is the capital and largest city of Georgia.
  A: Natural gas is a gas consisting primarily of methane.
• Summarization [Tatar et al. 08]
• MT evaluation [Pado et al. 09]:
  input: Gazprom va doubler le prix du gaz pour la Géorgie.
  machine translation output: Gazprom will double the price of gas for Georgia.
  target: Gazprom will double Georgia's gas bill.
  evaluation: does the output paraphrase the target?
NLI problem sets
• RTE (Recognizing Textual Entailment)
  • 4 years, each with dev & test sets of 800 NLI problems
  • Longish premises taken from (e.g.) newswire; short hypotheses
  • Balanced 2-way classification: entailment vs. non-entailment
• FraCaS test suite
  • 346 NLI problems, constructed by semanticists in mid-90s
  • 55% have single premise; remainder have 2 or more premises
  • 3-way classification: entailment, contradiction, compatibility
NLI: a spectrum of approaches
From robust but shallow to deep but brittle:
• lexical/semantic overlap [Jijkoun & de Rijke 2005]
• patterned relation extraction [Romano et al. 2006]
• semantic graph matching [MacCartney et al. 2006; Hickl et al. 2006]
• FOL & theorem proving [Bos & Markert 2006]
Problem with shallow approaches: imprecise; easily confounded by negation, quantifiers, conditionals, factive & implicative verbs, etc.
Problem with the formal approach: hard to translate NL to FOL (idioms, anaphora, ellipsis, intensionality, tense, aspect, vagueness, modals, indexicals, reciprocals, propositional attitudes, scope ambiguities, anaphoric adjectives, non-intersective adjectives, temporal & causal relations, unselective quantifiers, adverbs of quantification, donkey sentences, generic determiners, comparatives, phrasal verbs, …)
Solution? Natural logic (this work), in the middle of the spectrum
Shallow approaches to NLI
• Example: the bag-of-words approach [Glickman et al. 2005]
• Measures approximate lexical similarity of H to (part of) P

P: Several airlines polled saw costs grow more than expected, even after adjusting for inflation.
H: Some of the companies in the poll reported cost increases.
(figure: each word of H linked to its most similar word in P, with per-word similarity scores)

• Robust, and surprisingly effective for many NLI problems
• But imprecise, and hence easily confounded
  • Ignores predicate-argument structure — this can be remedied
  • Struggles with antonymy, negation, verb-frame alternation
  • Crucially, depends on assumption of upward monotonicity
  • Non-upward-monotone constructions are rife! [Danescu et al. 2009]
    not, all, most, few, rarely, if, tallest, without, doubt, avoid, regardless, unable, …
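To make the idea concrete, here is a minimal sketch of a bag-of-words entailment scorer in the spirit of the approach above. The token similarity function and the decision threshold are illustrative assumptions, not the exact formulation of Glickman et al. (2005).

```python
# Bag-of-words NLI sketch: score each hypothesis token by its best match in the
# premise, average the scores, and threshold. Similarity here is crude string
# similarity; a real system would use lexical resources.
from difflib import SequenceMatcher

def token_sim(h_tok: str, p_tok: str) -> float:
    """Crude lexical similarity between two tokens (stand-in for a real resource)."""
    return SequenceMatcher(None, h_tok.lower(), p_tok.lower()).ratio()

def bag_of_words_score(premise: str, hypothesis: str) -> float:
    """Average, over hypothesis tokens, of the best similarity found in the premise."""
    p_toks, h_toks = premise.split(), hypothesis.split()
    if not h_toks:
        return 1.0
    best = [max(token_sim(h, p) for p in p_toks) for h in h_toks]
    return sum(best) / len(best)

premise = ("Several airlines polled saw costs grow more than expected, "
           "even after adjusting for inflation.")
hypothesis = "Some of the companies in the poll reported cost increases."
score = bag_of_words_score(premise, hypothesis)
print(round(score, 2), "entailment" if score > 0.7 else "non-entailment")  # threshold is arbitrary
```

Note that nothing in this scorer knows about negation or monotonicity, which is exactly the weakness the slide points out.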
The formal approach to NLI
• Relies on full semantic interpretation of P & H
• Translate to formal representation & apply automated reasoner
• Can succeed in restricted domains, but not in open-domain NLI!

P: Several airlines polled saw costs grow more than expected, even after adjusting for inflation.
(exists p (and (poll-event p) (several x (and (airline x) (obj p x) (exists c (and (cost c) (has x c) (exists g (and (grow-event g) (subj g c) (greater-than (magnitude g) ..... ?

• Need background axioms to complete proofs — but from where?
• Besides, NLI task based on informal definition of inferability
• Bos & Markert 06 found FOL proof for just 4% of RTE problems
Solution? Natural logic! (≠ natural deduction)
• Characterizes valid patterns of inference via surface forms
  • precise, yet sidesteps difficulties of translating to FOL
• A long history
  • traditional logic: Aristotle's syllogisms, scholastics, Leibniz, …
  • modern natural logic begins with Lakoff (1970)
  • van Benthem & Sánchez Valencia (1986–91): monotonicity calculus
  • Nairn et al. (2006): an account of implicatives & factives
• We introduce a new theory of natural logic…
  • extends the monotonicity calculus to account for negation & exclusion
  • incorporates elements of Nairn et al.'s model of implicatives
• …and implement & evaluate a computational model of it
Outline
• Introduction
• Alignment for NLI
• A theory of entailment relations
• A theory of compositional entailment
• The NatLog system
• Conclusions
[Not covered today: the bag-of-words model, the Stanford RTE system]
Alignment for NLI
• Most approaches to NLI depend on a facility for alignment

P: Gazprom today confirmed a two-fold increase in its gas price for Georgia, beginning next Monday.
H: Gazprom will double Georgia's gas bill.  →  yes

• Linking corresponding words & phrases in two sentences
• Alignment problem is familiar in machine translation (MT)
Alignment example
(figure: a gold alignment of P (premise) to H (hypothesis), illustrating:)
• unaligned content: "deletions" from P
• approximate match: price ~ bill
• phrase alignment: two-fold increase ~ double
Approaches to NLI alignment
• Alignment addressed variously by current NLI systems
• In some approaches to NLI, alignments are implicit:
  • NLI via lexical overlap [Glickman et al. 05, Jijkoun & de Rijke 05]
  • NLI as proof search [Tatu & Moldovan 07, Bar-Haim et al. 07]
• Other NLI systems make the alignment step explicit:
  • Align first, then determine inferential validity [Marsi & Krahmer 05, MacCartney et al. 06]
• What about using an MT aligner?
  • Alignment is familiar in MT, with an extensive literature [Brown et al. 93, Vogel et al. 96, Och & Ney 03, Marcu & Wong 02, DeNero et al. 06, Birch et al. 06, DeNero & Klein 08]
  • Can tools & techniques of MT alignment transfer to NLI?
  • Dissertation argues: not very well
The MANLI aligner
A model of alignment for NLI consisting of four components:
1. Phrase-based representation
2. Feature-based scoring function
3. Decoding using simulated annealing
4. Perceptron learning
Phrase-based alignment representation
• Represent alignments by a sequence of phrase edits: EQ, SUB, DEL, INS
  EQ(Gazprom₁, Gazprom₁), INS(will₂), DEL(today₂), DEL(confirmed₃), DEL(a₄), SUB(two-fold₅ increase₆, double₃), DEL(in₇), DEL(its₈), …
• One-to-one at the phrase level (but many-to-many at the token level)
• Avoids arbitrary alignment choices; can use phrase-based resources
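A minimal sketch of this representation in code: an alignment is just a list of typed edits over token spans. The span indices below follow the example edits above but are illustrative, not taken from actual MANLI output.

```python
# Phrase-edit representation sketch: EQ / SUB / DEL / INS edits over token spans.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PhraseEdit:
    kind: str                    # one of "EQ", "SUB", "DEL", "INS"
    p_span: Tuple[int, int]      # half-open token span in the premise (empty for INS)
    h_span: Tuple[int, int]      # half-open token span in the hypothesis (empty for DEL)

alignment: List[PhraseEdit] = [
    PhraseEdit("EQ",  (0, 1), (0, 1)),   # Gazprom ~ Gazprom
    PhraseEdit("INS", (1, 1), (1, 2)),   # will (no premise counterpart)
    PhraseEdit("DEL", (1, 2), (2, 2)),   # today
    PhraseEdit("DEL", (2, 3), (2, 2)),   # confirmed
    PhraseEdit("SUB", (4, 6), (2, 3)),   # two-fold increase ~ double
]
print(len(alignment), "edits")
```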
A feature-based scoring function
• Score edits as a linear combination of features, then sum over edits
• Edit type features: EQ, SUB, DEL, INS
• Phrase features: phrase sizes, non-constituents
• Lexical similarity feature: max over similarity scores
  • WordNet: synonymy, hyponymy, antonymy, Jiang-Conrath
  • Distributional similarity à la Dekang Lin
  • Various measures of string/lemma similarity
• Contextual features: distortion, matching neighbors
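A minimal sketch of the scoring scheme: each edit yields a sparse feature vector, the edit score is its dot product with a weight vector, and the alignment score is the sum over edits. The feature names are illustrative placeholders, not MANLI's actual feature set.

```python
# Linear scoring sketch over phrase edits.
from collections import Counter
from typing import Dict, List, NamedTuple, Tuple

class Edit(NamedTuple):
    kind: str
    p_span: Tuple[int, int]
    h_span: Tuple[int, int]

def edit_features(edit: Edit) -> Dict[str, float]:
    feats: Counter = Counter()
    feats["type=" + edit.kind] += 1.0
    feats["p_size"] = float(edit.p_span[1] - edit.p_span[0])
    feats["h_size"] = float(edit.h_span[1] - edit.h_span[0])
    # a real system adds lexical-similarity and contextual features here
    return feats

def score_alignment(edits: List[Edit], weights: Dict[str, float]) -> float:
    return sum(weights.get(f, 0.0) * v
               for e in edits
               for f, v in edit_features(e).items())

weights = {"type=EQ": 1.0, "type=DEL": -0.2, "type=INS": -0.3, "type=SUB": 0.5}
edits = [Edit("EQ", (0, 1), (0, 1)), Edit("SUB", (4, 6), (2, 3)), Edit("DEL", (1, 2), (2, 2))]
print(score_alignment(edits, weights))
```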
Decoding using simulated annealing
1. Start from an initial alignment
2. Generate successors
3. Score them
4. Smooth/sharpen the distribution: P(A) ← P(A)^(1/T)
5. Sample a successor
6. Lower the temperature (starting from T = 0.9)
7. Repeat … 100 times
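Here is a minimal sketch of such an annealed decoding loop, assuming a successor generator and a scoring function are supplied; both are placeholders for the real MANLI components.

```python
# Simulated-annealing decoding sketch: sample successors from a temperature-
# sharpened distribution over scores, cool down, and keep the best alignment seen.
import math
import random
from typing import Callable, List, TypeVar

A = TypeVar("A")

def decode(initial: A,
           successors: Callable[[A], List[A]],
           score: Callable[[A], float],
           steps: int = 100, temp: float = 0.9, cooling: float = 0.9) -> A:
    current, best = initial, initial
    for _ in range(steps):
        candidates = successors(current)          # local modifications of current
        if not candidates:
            break
        # P(A) proportional to exp(score); sharpening by 1/T gives exp(score / T)
        weights = [math.exp(score(a) / temp) for a in candidates]
        total = sum(weights)
        r, acc = random.random() * total, 0.0
        for cand, w in zip(candidates, weights):
            acc += w
            if acc >= r:
                current = cand
                break
        if score(current) > score(best):
            best = current
        temp *= cooling                           # lower the temperature
    return best
```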
Perceptron learning of feature weights
We use a variant of averaged perceptron [Collins 2002]:
  Initialize weight vector w = 0, learning rate R₀ = 1
  For training epoch i = 1 to 50:
    For each problem ⟨Pj, Hj⟩ with gold alignment Ej:
      Set Êj = ALIGN(Pj, Hj, w)
      Set w = w + Ri (Φ(Ej) – Φ(Êj))
    Set w = w / ‖w‖₂ (L2 normalization)
    Set w[i] = w (store weight vector for this epoch)
    Set Ri = 0.8 Ri–1 (reduce learning rate)
  Throw away weight vectors from the first 20% of epochs
  Return the average weight vector
Training runs require about 20 hours (on 800 RTE problems)
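Below is a minimal runnable sketch of this training loop, assuming an aligner align(P, H, w) and a feature function phi(alignment) returning a sparse dict; both names are placeholders, not the actual MANLI interfaces.

```python
# Averaged-perceptron sketch following the pseudocode above: per-problem updates,
# per-epoch L2 normalization and learning-rate decay, averaging over the kept epochs.
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

def train(problems: Iterable[Tuple[str, str, object]],
          align: Callable, phi: Callable[[object], Dict[str, float]],
          epochs: int = 50, r0: float = 1.0, decay: float = 0.8,
          burn_in: float = 0.2) -> Dict[str, float]:
    w: Dict[str, float] = defaultdict(float)
    rate, snapshots = r0, []
    problems = list(problems)
    for _ in range(epochs):
        for premise, hypothesis, gold in problems:
            guess = align(premise, hypothesis, w)
            for f, v in phi(gold).items():       # w += rate * (phi(gold) - phi(guess))
                w[f] += rate * v
            for f, v in phi(guess).items():
                w[f] -= rate * v
        norm = sum(v * v for v in w.values()) ** 0.5 or 1.0
        for f in w:                              # L2-normalize the weight vector
            w[f] /= norm
        snapshots.append(dict(w))                # store this epoch's weights
        rate *= decay                            # reduce the learning rate
    kept = snapshots[int(burn_in * epochs):]     # discard the first 20% of epochs
    avg: Dict[str, float] = defaultdict(float)
    for snap in kept:
        for f, v in snap.items():
            avg[f] += v / len(kept)
    return dict(avg)
```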
The MSR RTE2 alignment data
• Previously, little supervised alignment data
• Now, MSR gold alignments for RTE2 [Brockett 2007]
  • dev & test sets, 800 problems each
  • Token-based, but many-to-many: allows implicit alignment of phrases
  • 3 independent annotators
    • 3 of 3 agreed on 70% of proposed links
    • 2 of 3 agreed on 99.7% of proposed links
    • merged using majority rule
Evaluation on MSR data
• We evaluate several alignment models on MSR data
  • Baseline: a simple bag-of-words aligner, which matches each token in H to the most string-similar token in P
  • Two well-known MT aligners: GIZA++ & Cross-EM, supplemented with a lexicon; tried various symmetrization heuristics
  • A representative NLI aligner: the Stanford RTE aligner; can't do phrase alignments, but can exploit syntactic features
  • The MANLI aligner just presented
• How well do they recover gold-standard alignments?
  • Assess per-link precision, recall, and F1; and exact match rate
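The per-link metrics are straightforward to compute once alignments are represented as sets of links; a minimal sketch:

```python
# Per-link precision/recall/F1 and exact-match rate over predicted vs. gold alignments,
# each alignment given as a set of (premise_index, hypothesis_index) links.
from typing import Iterable, Set, Tuple

Link = Tuple[int, int]

def evaluate(predicted: Iterable[Set[Link]], gold: Iterable[Set[Link]]):
    predicted, gold = list(predicted), list(gold)
    tp = fp = fn = exact = 0
    for pred_links, gold_links in zip(predicted, gold):
        tp += len(pred_links & gold_links)
        fp += len(pred_links - gold_links)
        fn += len(gold_links - pred_links)
        exact += int(pred_links == gold_links)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1, exact / len(gold)

pred = [{(0, 0), (2, 1)}]
gold = [{(0, 0), (2, 1), (3, 2)}]
print(evaluate(pred, gold))   # (1.0, 0.667, 0.8, 0.0), up to rounding
```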
Aligner evaluation results
• Bag-of-words aligner: good recall, but poor precision
• MT aligners fail to learn word-word correspondences
• Stanford RTE aligner struggles with function words
• MANLI outperforms all others on every measure
  • F1: 10.5% higher than GIZA++, 6.2% higher than Stanford
  • Good balance of precision & recall; matched >20% exactly
MANLI results: discussion
• Three factors contribute to success:
  • Lexical resources: jail ~ prison, prevent ~ stop, injured ~ wounded
  • Contextual features enable matching function words
  • Phrases: death penalty ~ capital punishment, abdicate ~ give up
• But phrases help less than expected!
  • If we set max phrase size = 1, we lose just 0.2% in F1
• Recall errors: room to improve
  • 40%: need better lexical resources: conservation ~ protecting, organization ~ agencies, bone fragility ~ osteoporosis
• Precision errors harder to reduce
  • equal function words (49%), forms of be (21%), punctuation (7%)
Alignment for NLI: conclusions
• MT aligners not directly applicable to NLI
  • They rely on unsupervised learning from massive amounts of bitext
  • They assume semantic equivalence of P & H
• MANLI succeeds by:
  • Exploiting (manually & automatically constructed) lexical resources
  • Accommodating frequent unaligned phrases
  • Using contextual features to align function words
• Phrase-based representation shows potential
  • But not yet proven: need better phrase-based lexical resources
Outline
• Introduction
• Alignment for NLI
• A theory of entailment relations
• A theory of compositional entailment
• The NatLog system
• Conclusion
Entailment relations in past work
• 2-way (RTE1, 2, 3): entailment (yes) vs. non-entailment (no)
• 3-way (FraCaS, PARC, RTE4): entailment (yes), contradiction (no), compatibility (unknown)
• Containment (Sánchez Valencia): P = Q (equivalence), P < Q (forward entailment), P > Q (reverse entailment), P # Q (non-entailment)
16 elementary set relations
Assign sets x, y to one of 16 relations, depending on the emptiness or non-emptiness of each of the four partitions of the universe: x ∩ y, x ∩ y′, x′ ∩ y, and x′ ∩ y′.
But 9 of the 16 are degenerate: either x or y is either empty or universal. I.e., they correspond to semantically vacuous expressions, which are rare outside logic textbooks. We therefore focus on the remaining seven relations: x ≡ y, x ⊏ y, x ⊐ y, x ^ y, x | y, x ‿ y, and x # y.
The set of basic entailment relations
Relations are defined for all semantic types: tiny ⊏ small, hover ⊏ fly, kick ⊏ strike, this morning ⊏ today, in Beijing ⊏ in China, everyone ⊏ someone, all ⊏ most ⊏ some
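For finite sets, the seven basic relations can be computed directly by checking emptiness of the four partitions; a minimal sketch (degenerate cases, where x or y is empty or universal, are reported separately):

```python
# Classify two finite sets into one of the seven basic entailment relations.
def basic_relation(x: set, y: set, universe: set) -> str:
    if not x or not y or x == universe or y == universe:
        return "degenerate"
    if x == y:
        return "≡"                 # equivalence
    if x < y:
        return "⊏"                 # forward entailment (proper subset)
    if x > y:
        return "⊐"                 # reverse entailment (proper superset)
    disjoint = not (x & y)
    exhaustive = (x | y) == universe
    if disjoint and exhaustive:
        return "^"                 # negation
    if disjoint:
        return "|"                 # alternation
    if exhaustive:
        return "‿"                 # cover
    return "#"                     # independence

D = set(range(10))
print(basic_relation({1, 2}, {1, 2, 3}, D))              # ⊏
print(basic_relation({0, 1, 2}, set(range(2, 10)), D))   # ‿
print(basic_relation({1, 2}, {3, 4}, D))                 # |
```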
Joining entailment relations
If x R y and y S z, what entailment relation holds between x and z?
Example: fish | human and human ^ nonhuman; joining the two gives fish ⊏ nonhuman.
What is | ⋈ | ?
| ⋈ | = {≡, ⊏, ⊐, |, #}
Some joins yield unions of relations!
The complete join table
• Of the 49 join pairs, 32 yield single relations in B (the set of seven basic relations); 17 yield unions
• Larger unions convey less information — limits power of inference
• In practice, any union which contains # can be approximated by # — so, in practice, we can avoid the complexity of unions
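The join table can also be explored by brute force: enumerate triples of sets over a small universe, classify each pair, and record which relations between x and z actually co-occur with a given pair of relations. The sketch below does this for a 4-element universe, repeating the relation classifier from the earlier sketch in compact form. (A small universe may not witness every possibility for every cell, but it already reproduces the entries discussed here.)

```python
# Brute-force computation of joins: R join S = relations that can hold between
# x and z when x R y and y S z. Degenerate sets (empty or universal) are excluded.
from itertools import combinations, product

D = frozenset(range(4))

def rel(x, y):
    if x == y:  return "≡"
    if x < y:   return "⊏"
    if x > y:   return "⊐"
    disjoint, exhaustive = not (x & y), (x | y) == D
    if disjoint and exhaustive: return "^"
    if disjoint:                return "|"
    if exhaustive:              return "‿"
    return "#"

subsets = [frozenset(c) for r in range(1, len(D)) for c in combinations(D, r)]
join = {}
for x, y, z in product(subsets, repeat=3):
    join.setdefault((rel(x, y), rel(y, z)), set()).add(rel(x, z))

print(sorted(join[("|", "|")]))   # the union {≡, ⊏, ⊐, |, #} from the previous slide
print(sorted(join[("^", "^")]))   # two negations compose to equivalence: {≡}
```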
Outline
• Introduction
• Alignment for NLI
• A theory of entailment relations
• A theory of compositional entailment
• The NatLog system
• Conclusion
Lexical entailment relations
An atomic edit e (DEL, INS, or SUB) maps a compound expression x to e(x). The entailment relation between x and e(x) will depend on:
• the lexical entailment relation generated by e: β(e)
• other properties of the context x in which e is applied (its projectivity, discussed below)
Example: suppose x is red car
• If e is SUB(car, convertible), then β(e) is ⊐
• If e is DEL(red), then β(e) is ⊏
Crucially, β(e) depends solely on the lexical items in e, independent of the context x.
But how are lexical entailment relations determined?
Lexical entailment relations: SUBs
β(SUB(x, y)) = the entailment relation between x and y
• For open-class terms, use a lexical resource (e.g. WordNet)
  • ≡ for synonyms: sofa ≡ couch, forbid ≡ prohibit
  • ⊏ for hypo-/hypernyms: crow ⊏ bird, frigid ⊏ cold, soar ⊏ rise
  • | for antonyms and coordinate terms: hot | cold, cat | dog
  • ≡ or | for proper nouns: USA ≡ United States, JFK | FDR
  • # for most other pairs: hungry # hippo
• Closed-class terms may require special handling
  • Quantifiers: all ⊏ some, some ^ no, no | all, at least 4 ‿ at most 6
  • See dissertation for discussion of pronouns, prepositions, …
Lexical entailment relations: DEL & INS
• Generic (default) case: β(DEL(·)) = ⊏, β(INS(·)) = ⊐
  • Examples: red car ⊏ car, sing ⊐ sing off-key
  • Even quite long phrases: car parked outside since last week ⊏ car
  • Applies to intersective modifiers, conjuncts, independent clauses, …
  • This heuristic underlies most approaches to RTE! Does P subsume H? Deletions OK; insertions penalized.
• Special cases
  • Negation: didn't sleep ^ did sleep
  • Implicatives & factives (e.g. refuse to, admit that): discussed later
  • Non-intersective adjectives: former spy | spy, alleged spy # spy
  • Auxiliaries etc.: is sleeping ≡ sleeps, did sleep ≡ slept
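A minimal sketch of assigning lexical entailment relations to edits, using tiny hand-coded word lists in place of WordNet and the quantifier tables; the lexicon entries are just the illustrative examples from these slides.

```python
# Lexical entailment relation for an edit, before any projection through context.
SYNONYMS   = {("sofa", "couch"), ("forbid", "prohibit")}
HYPONYMS   = {("crow", "bird"), ("frigid", "cold"), ("soar", "rise")}
EXCLUSIONS = {("hot", "cold"), ("cat", "dog")}
NEGATIONS  = {"not", "n't", "never"}

def sub_relation(x: str, y: str) -> str:
    pair, rev = (x, y), (y, x)
    if x == y or pair in SYNONYMS or rev in SYNONYMS:
        return "≡"
    if pair in HYPONYMS:
        return "⊏"                 # hyponym replaced by its hypernym
    if rev in HYPONYMS:
        return "⊐"
    if pair in EXCLUSIONS or rev in EXCLUSIONS:
        return "|"
    return "#"

def edit_relation(kind: str, old: str = "", new: str = "") -> str:
    if kind == "SUB":
        return sub_relation(old, new)
    if kind == "DEL":              # generic case: deleting a modifier broadens
        return "^" if old in NEGATIONS else "⊏"
    if kind == "INS":              # generic case: inserting a modifier narrows
        return "^" if new in NEGATIONS else "⊐"
    return "#"

print(edit_relation("SUB", "crow", "bird"))   # ⊏
print(edit_relation("DEL", "red"))            # ⊏
print(edit_relation("DEL", "n't"))            # ^   (negation special case)
```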
The impact of semantic composition
How are entailment relations affected by semantic composition? That is, given the relation between x and y, what is the relation between f(x) and f(y)?
The monotonicity calculus provides a partial answer:
• if f is upward-monotone, ⊏ and ⊐ are preserved
• if f is downward-monotone, ⊏ and ⊐ are swapped
• if f is non-monotone, ⊏ and ⊐ are projected as #
But how are the other relations (|, ^, ‿) projected?
A typology of projectivity
• Projectivity signatures: a generalization of monotonicity classes
• Each projectivity signature is a map from entailment relations to entailment relations
• In principle, 7⁷ possible signatures, but few are actually realized
• See dissertation for the projectivity of connectives, quantifiers, verbs
Projecting through multiple levels
Propagate the entailment relation between atoms upward, according to the projectivity class of each node on the path to the root.
Example: SUB(a shirt, clothes) generates ⊏; projecting through without (downward) yields ⊐ at the PP and VP levels; projecting through nobody (downward) flips it back to ⊏ at the sentence level:
nobody can enter without a shirt ⊏ nobody can enter without clothes
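A minimal sketch of this upward projection, restricted for simplicity to the monotonicity fragment (≡, ⊏, ⊐, #); the full projectivity signatures also specify how ^, |, and ‿ project. The two maps below stand in for the signatures of the nodes in the example.

```python
# Project a lexical entailment relation upward through the composition tree by
# applying the projectivity map of each node on the path to the root.
UP   = {"≡": "≡", "⊏": "⊏", "⊐": "⊐", "#": "#"}     # upward-monotone context
DOWN = {"≡": "≡", "⊏": "⊐", "⊐": "⊏", "#": "#"}     # downward-monotone context

def project(relation: str, path_to_root: list) -> str:
    for node_map in path_to_root:
        relation = node_map[relation]
    return relation

# a shirt ⊏ clothes; the path to the root passes through "without" and "nobody",
# both downward-monotone, so the relation flips twice and ends up as ⊏ again:
print(project("⊏", [DOWN, DOWN]))   # ⊏
```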
Implicatives & factives [Nairn et al. 06]
9 signatures, according to the implications (+, –, or o) generated in positive and negative contexts. For example, refuse to has signature –/o: refused to dance implies didn't dance, while didn't refuse to dance implies nothing.
Implicatives & factives (continued)
• We can specify the relation generated by DEL or INS of each signature
• Room for variation w.r.t. infinitives, complementizers, passivization, etc.
• Some are more intuitive when negated: he didn't hesitate to ask | he didn't ask
• Doesn't cover factives, which involve presuppositions — see dissertation
Putting it all together
1. Find a sequence of edits e₁, …, eₙ which transforms p into h. Define x₀ = p, xₙ = h, and xᵢ = eᵢ(xᵢ₋₁) for i ∈ [1, n].
2. For each atomic edit eᵢ:
   • Determine the lexical entailment relation β(eᵢ).
   • Project β(eᵢ) upward through the semantic composition tree of expression xᵢ₋₁ to find the atomic entailment relation β(xᵢ₋₁, xᵢ).
3. Join atomic entailment relations across the sequence of edits:
   β(p, h) = β(x₀, xₙ) = β(x₀, x₁) ⋈ … ⋈ β(xᵢ₋₁, xᵢ) ⋈ … ⋈ β(xₙ₋₁, xₙ)
Limitations: need to find an appropriate edit sequence connecting p and h; tendency of the ⋈ operation toward less-informative entailment relations; lack of a general mechanism for combining multiple premises. Less deductive power than FOL: can't handle e.g. de Morgan's laws.
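The final joining step is a simple left-to-right fold over the atomic relations. The sketch below uses only a handful of join-table entries (other pairs fall back to #, mirroring the approximation noted earlier); the entries shown were chosen to cover the examples in this section and are not the complete table.

```python
# Join atomic entailment relations across an edit sequence.
JOIN = {
    ("⊏", "⊏"): "⊏", ("⊐", "⊐"): "⊐",
    ("^", "^"): "≡",                 # two negations cancel
    ("⊏", "^"): "|", ("^", "⊐"): "|",
    ("|", "^"): "⊏",                 # e.g. fish | human, human ^ nonhuman => fish ⊏ nonhuman
}

def join(r: str, s: str) -> str:
    if r == "≡":
        return s
    if s == "≡":
        return r
    return JOIN.get((r, s), "#")     # unknown pairs approximated by #

def join_all(relations) -> str:
    out = "≡"
    for r in relations:
        out = join(out, r)
    return out

print(join_all(["^", "^", "⊏"]))     # ⊏ : two negations cancel, then forward entailment
```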
An example
P: The doctor didn't hesitate to recommend Prozac.
H: The doctor recommended medication.  →  yes
(figure: the edit sequence from P to H, annotated with the atomic entailment relation for each edit; joining them step by step yields the final answer yes)
Different edit orders?
Intermediate steps may vary; the final result is typically (though not necessarily) the same.
Outline
• Introduction
• Alignment for NLI
• A theory of entailment relations
• A theory of compositional entailment
• The NatLog system
• Conclusion
The NatLog system
An NLI problem is processed by a five-stage pipeline:
1. Linguistic analysis (next slide)
2. Alignment (from outside sources)
3. Lexical entailment classification (core of system; covered shortly)
4. Entailment projection (straightforward; not covered further)
5. Entailment joining (straightforward; not covered further)
→ prediction
Stage 1: Linguistic analysis
• Tokenize & parse input sentences (future: & NER & coref & …)
• Identify items with special projectivity & determine their scope
• Problem: a PTB-style parse tree ≠ semantic structure!
  Example: Jimmy Dean refused to move without blue jeans
  category: –/o implicatives
  examples: refuse, forbid, prohibit, …
  scope: S complement
  pattern: __ > (/VB.*/ > VP $. S=arg)
  projectivity: {≡:≡, ⊏:⊐, ⊐:⊏, ^:|, |:#, ‿:#, #:#}
• Solution: specify scope in PTB trees using Tregex [Levy & Andrew 06]
Stage 3: Lexical entailment classification
• Goal: predict the entailment relation for each edit, based solely on lexical features, independent of context
• Approach: use lexical resources & machine learning
• Feature representation:
  • WordNet features: synonymy (≡), hyponymy (⊏/⊐), antonymy (|)
  • Other relatedness features: Jiang-Conrath (WN-based), NomBank
  • Fallback: string similarity (based on Levenshtein edit distance)
  • Also lexical category, quantifier category, implication signature
• Decision tree classifier
  • Trained on 2,449 hand-annotated lexical entailment problems
  • E.g., SUB(gun, weapon): ⊏, SUB(big, small): |, DEL(often): ⊏
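For concreteness, here is a toy sketch of such a classifier using scikit-learn; the features and training pairs are placeholders, not NatLog's actual feature set or its 2,449 annotated examples.

```python
# Decision-tree lexical entailment classifier sketch.
from difflib import SequenceMatcher
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

def features(edit_type: str, x: str, y: str) -> dict:
    return {
        "type": edit_type,
        "string_sim": SequenceMatcher(None, x, y).ratio(),
        "same_word": float(x == y),
        # a real system adds WordNet, Jiang-Conrath, NomBank, lexical-category,
        # quantifier-category, and implication-signature features here
    }

train = [
    (("SUB", "gun", "weapon"), "⊏"),
    (("SUB", "big", "small"), "|"),
    (("DEL", "often", ""), "⊏"),
    (("SUB", "car", "car"), "≡"),
]
vec = DictVectorizer(sparse=False)
X = vec.fit_transform([features(*edit) for edit, _ in train])
y = [label for _, label in train]
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

test = vec.transform([features("SUB", "cat", "cat")])
print(clf.predict(test))   # likely ≡, since the toy features make cat/cat look like car/car
```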
The FraCaS test suite
• FraCaS: a project in computational semantics [Cooper et al. 96]
• 346 "textbook" examples of NLI problems
• 3 possible answers: yes, no, unknown (not balanced!)
• 55% single-premise, 45% multi-premise (excluded)
Results on FraCaS
(results table not reproduced; key observations below)
• 27% error reduction overall
• In the largest category, all but one problem correct
• High accuracy in the sections most amenable to natural logic
• High precision even outside areas of expertise