Natural Language Inference
Bill MacCartney, NLP Group, Stanford University
8 May 2009
Natural language inference (NLI)
• Aka recognizing textual entailment (RTE)
• Does premise P justify an inference to hypothesis H?
  • An informal, intuitive notion of inference: not strict logic
  • Emphasis on variability of linguistic expression

P: Several airlines polled saw costs grow more than expected, even after adjusting for inflation.
H: Some of the companies in the poll reported cost increases.  →  yes

• Necessary to the goal of natural language understanding (NLU)
• Many more immediate applications …
Applications of NLI
• Semantic search [King et al. 07]: the query Georgia's gas bill doubled should retrieve passages like "…double Georgia's gas bill…", "…two-fold increase in gas price…", "…price of gas will be doubled…" (Economist.com)
• Question answering [Harabagiu & Hickl 06]:
  Q: How much did Georgia's gas price increase?
  A: In 2006, Gazprom doubled Georgia's gas bill.
  A: Georgia's main imports are natural gas, machinery, ...
  A: Tbilisi is the capital and largest city of Georgia.
  A: Natural gas is a gas consisting primarily of methane.
• Summarization [Tatar et al. 08]
• MT evaluation [Pado et al. 09]:
  input: Gazprom va doubler le prix du gaz pour la Géorgie.
  machine translation output: Gazprom will double the price of gas for Georgia.
  target: Gazprom will double Georgia's gas bill.
  evaluation: does the output paraphrase the target?
NLI problem sets
• RTE (Recognizing Textual Entailment)
  • 4 years, each with dev & test sets of 800 NLI problems
  • Longish premises taken from (e.g.) newswire; short hypotheses
  • Balanced 2-way classification: entailment vs. non-entailment
• FraCaS test suite
  • 346 NLI problems, constructed by semanticists in mid-90s
  • 55% have single premise; remainder have 2 or more premises
  • 3-way classification: entailment, contradiction, compatibility
NLI: a spectrum of approaches
From robust but shallow to deep but brittle:
• lexical/semantic overlap [Jijkoun & de Rijke 2005]
• patterned relation extraction [Romano et al. 2006]
• semantic graph matching [MacCartney et al. 2006; Hickl et al. 2006]
• FOL & theorem proving [Bos & Markert 2006]
Problem with shallow approaches: imprecise; easily confounded by negation, quantifiers, conditionals, factive & implicative verbs, etc.
Problem with the formal approach: hard to translate NL to FOL (idioms, anaphora, ellipsis, intensionality, tense, aspect, vagueness, modals, indexicals, reciprocals, propositional attitudes, scope ambiguities, anaphoric adjectives, non-intersective adjectives, temporal & causal relations, unselective quantifiers, adverbs of quantification, donkey sentences, generic determiners, comparatives, phrasal verbs, …)
Solution? Natural logic (this work), in the middle of the spectrum
Shallow approaches to NLI
• Example: the bag-of-words approach [Glickman et al. 2005]
• Measures approximate lexical similarity of H to (part of) P

P: Several airlines polled saw costs grow more than expected, even after adjusting for inflation.
H: Some of the companies in the poll reported cost increases.
(figure: each word of H linked to its most similar word in P, with per-word similarity scores)

• Robust, and surprisingly effective for many NLI problems
• But imprecise, and hence easily confounded
  • Ignores predicate-argument structure — this can be remedied
  • Struggles with antonymy, negation, verb-frame alternation
  • Crucially, depends on assumption of upward monotonicity
  • Non-upward-monotone constructions are rife! [Danescu et al. 2009]
    not, all, most, few, rarely, if, tallest, without, doubt, avoid, regardless, unable, …
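To make the idea concrete, here is a minimal sketch of a bag-of-words entailment scorer in the spirit of the approach above. The token similarity function and the decision threshold are illustrative assumptions, not the exact formulation of Glickman et al. (2005).

```python
# Bag-of-words NLI sketch: score each hypothesis token by its best match in the
# premise, average the scores, and threshold. Similarity here is crude string
# similarity; a real system would use lexical resources.
from difflib import SequenceMatcher

def token_sim(h_tok: str, p_tok: str) -> float:
    """Crude lexical similarity between two tokens (stand-in for a real resource)."""
    return SequenceMatcher(None, h_tok.lower(), p_tok.lower()).ratio()

def bag_of_words_score(premise: str, hypothesis: str) -> float:
    """Average, over hypothesis tokens, of the best similarity found in the premise."""
    p_toks, h_toks = premise.split(), hypothesis.split()
    if not h_toks:
        return 1.0
    best = [max(token_sim(h, p) for p in p_toks) for h in h_toks]
    return sum(best) / len(best)

premise = ("Several airlines polled saw costs grow more than expected, "
           "even after adjusting for inflation.")
hypothesis = "Some of the companies in the poll reported cost increases."
score = bag_of_words_score(premise, hypothesis)
print(round(score, 2), "entailment" if score > 0.7 else "non-entailment")  # threshold is arbitrary
```

Note that nothing in this scorer knows about negation or monotonicity, which is exactly the weakness the slide points out.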
The formal approach to NLI
• Relies on full semantic interpretation of P & H
• Translate to formal representation & apply automated reasoner
• Can succeed in restricted domains, but not in open-domain NLI!

P: Several airlines polled saw costs grow more than expected, even after adjusting for inflation.
(exists p (and (poll-event p) (several x (and (airline x) (obj p x) (exists c (and (cost c) (has x c) (exists g (and (grow-event g) (subj g c) (greater-than (magnitude g) ..... ?

• Need background axioms to complete proofs — but from where?
• Besides, NLI task based on informal definition of inferability
• Bos & Markert 06 found FOL proof for just 4% of RTE problems
Solution? Natural logic! (≠ natural deduction)
• Characterizes valid patterns of inference via surface forms
  • precise, yet sidesteps difficulties of translating to FOL
• A long history
  • traditional logic: Aristotle's syllogisms, scholastics, Leibniz, …
  • modern natural logic begins with Lakoff (1970)
  • van Benthem & Sánchez Valencia (1986–91): monotonicity calculus
  • Nairn et al. (2006): an account of implicatives & factives
• We introduce a new theory of natural logic…
  • extends the monotonicity calculus to account for negation & exclusion
  • incorporates elements of Nairn et al.'s model of implicatives
• …and implement & evaluate a computational model of it
Outline
• Introduction
• Alignment for NLI
• A theory of entailment relations
• A theory of compositional entailment
• The NatLog system
• Conclusions
[Not covered today: the bag-of-words model, the Stanford RTE system]
Alignment for NLI
• Most approaches to NLI depend on a facility for alignment

P: Gazprom today confirmed a two-fold increase in its gas price for Georgia, beginning next Monday.
H: Gazprom will double Georgia's gas bill.  →  yes

• Linking corresponding words & phrases in two sentences
• Alignment problem is familiar in machine translation (MT)
Alignment example
(figure: a gold alignment of P (premise) to H (hypothesis), illustrating:)
• unaligned content: "deletions" from P
• approximate match: price ~ bill
• phrase alignment: two-fold increase ~ double
Approaches to NLI alignment
• Alignment addressed variously by current NLI systems
• In some approaches to NLI, alignments are implicit:
  • NLI via lexical overlap [Glickman et al. 05, Jijkoun & de Rijke 05]
  • NLI as proof search [Tatu & Moldovan 07, Bar-Haim et al. 07]
• Other NLI systems make the alignment step explicit:
  • Align first, then determine inferential validity [Marsi & Krahmer 05, MacCartney et al. 06]
• What about using an MT aligner?
  • Alignment is familiar in MT, with an extensive literature [Brown et al. 93, Vogel et al. 96, Och & Ney 03, Marcu & Wong 02, DeNero et al. 06, Birch et al. 06, DeNero & Klein 08]
  • Can tools & techniques of MT alignment transfer to NLI?
  • Dissertation argues: not very well
The MANLI aligner
A model of alignment for NLI consisting of four components:
1. Phrase-based representation
2. Feature-based scoring function
3. Decoding using simulated annealing
4. Perceptron learning
Phrase-based alignment representation
• Represent alignments by a sequence of phrase edits: EQ, SUB, DEL, INS
  EQ(Gazprom₁, Gazprom₁), INS(will₂), DEL(today₂), DEL(confirmed₃), DEL(a₄), SUB(two-fold₅ increase₆, double₃), DEL(in₇), DEL(its₈), …
• One-to-one at the phrase level (but many-to-many at the token level)
• Avoids arbitrary alignment choices; can use phrase-based resources
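A minimal sketch of this representation in code: an alignment is just a list of typed edits over token spans. The span indices below follow the example edits above but are illustrative, not taken from actual MANLI output.

```python
# Phrase-edit representation sketch: EQ / SUB / DEL / INS edits over token spans.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PhraseEdit:
    kind: str                    # one of "EQ", "SUB", "DEL", "INS"
    p_span: Tuple[int, int]      # half-open token span in the premise (empty for INS)
    h_span: Tuple[int, int]      # half-open token span in the hypothesis (empty for DEL)

alignment: List[PhraseEdit] = [
    PhraseEdit("EQ",  (0, 1), (0, 1)),   # Gazprom ~ Gazprom
    PhraseEdit("INS", (1, 1), (1, 2)),   # will (no premise counterpart)
    PhraseEdit("DEL", (1, 2), (2, 2)),   # today
    PhraseEdit("DEL", (2, 3), (2, 2)),   # confirmed
    PhraseEdit("SUB", (4, 6), (2, 3)),   # two-fold increase ~ double
]
print(len(alignment), "edits")
```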
A feature-based scoring function
• Score edits as a linear combination of features, then sum over edits
• Edit type features: EQ, SUB, DEL, INS
• Phrase features: phrase sizes, non-constituents
• Lexical similarity feature: max over similarity scores
  • WordNet: synonymy, hyponymy, antonymy, Jiang-Conrath
  • Distributional similarity à la Dekang Lin
  • Various measures of string/lemma similarity
• Contextual features: distortion, matching neighbors
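A minimal sketch of the scoring scheme: each edit yields a sparse feature vector, the edit score is its dot product with a weight vector, and the alignment score is the sum over edits. The feature names are illustrative placeholders, not MANLI's actual feature set.

```python
# Linear scoring sketch over phrase edits.
from collections import Counter
from typing import Dict, List, NamedTuple, Tuple

class Edit(NamedTuple):
    kind: str
    p_span: Tuple[int, int]
    h_span: Tuple[int, int]

def edit_features(edit: Edit) -> Dict[str, float]:
    feats: Counter = Counter()
    feats["type=" + edit.kind] += 1.0
    feats["p_size"] = float(edit.p_span[1] - edit.p_span[0])
    feats["h_size"] = float(edit.h_span[1] - edit.h_span[0])
    # a real system adds lexical-similarity and contextual features here
    return feats

def score_alignment(edits: List[Edit], weights: Dict[str, float]) -> float:
    return sum(weights.get(f, 0.0) * v
               for e in edits
               for f, v in edit_features(e).items())

weights = {"type=EQ": 1.0, "type=DEL": -0.2, "type=INS": -0.3, "type=SUB": 0.5}
edits = [Edit("EQ", (0, 1), (0, 1)), Edit("SUB", (4, 6), (2, 3)), Edit("DEL", (1, 2), (2, 2))]
print(score_alignment(edits, weights))
```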
Decoding using simulated annealing
1. Start from an initial alignment
2. Generate successors
3. Score them
4. Smooth/sharpen the distribution: P(A) ← P(A)^(1/T)
5. Sample a successor
6. Lower the temperature (starting from T = 0.9)
7. Repeat … 100 times
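Here is a minimal sketch of such an annealed decoding loop, assuming a successor generator and a scoring function are supplied; both are placeholders for the real MANLI components.

```python
# Simulated-annealing decoding sketch: sample successors from a temperature-
# sharpened distribution over scores, cool down, and keep the best alignment seen.
import math
import random
from typing import Callable, List, TypeVar

A = TypeVar("A")

def decode(initial: A,
           successors: Callable[[A], List[A]],
           score: Callable[[A], float],
           steps: int = 100, temp: float = 0.9, cooling: float = 0.9) -> A:
    current, best = initial, initial
    for _ in range(steps):
        candidates = successors(current)          # local modifications of current
        if not candidates:
            break
        # P(A) proportional to exp(score); sharpening by 1/T gives exp(score / T)
        weights = [math.exp(score(a) / temp) for a in candidates]
        total = sum(weights)
        r, acc = random.random() * total, 0.0
        for cand, w in zip(candidates, weights):
            acc += w
            if acc >= r:
                current = cand
                break
        if score(current) > score(best):
            best = current
        temp *= cooling                           # lower the temperature
    return best
```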
Perceptron learning of feature weights
We use a variant of averaged perceptron [Collins 2002]:
  Initialize weight vector w = 0, learning rate R₀ = 1
  For training epoch i = 1 to 50:
    For each problem ⟨Pj, Hj⟩ with gold alignment Ej:
      Set Êj = ALIGN(Pj, Hj, w)
      Set w = w + Ri (Φ(Ej) – Φ(Êj))
    Set w = w / ‖w‖₂ (L2 normalization)
    Set w[i] = w (store weight vector for this epoch)
    Set Ri = 0.8 Ri–1 (reduce learning rate)
  Throw away weight vectors from the first 20% of epochs
  Return the average weight vector
Training runs require about 20 hours (on 800 RTE problems)
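Below is a minimal runnable sketch of this training loop, assuming an aligner align(P, H, w) and a feature function phi(alignment) returning a sparse dict; both names are placeholders, not the actual MANLI interfaces.

```python
# Averaged-perceptron sketch following the pseudocode above: per-problem updates,
# per-epoch L2 normalization and learning-rate decay, averaging over the kept epochs.
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

def train(problems: Iterable[Tuple[str, str, object]],
          align: Callable, phi: Callable[[object], Dict[str, float]],
          epochs: int = 50, r0: float = 1.0, decay: float = 0.8,
          burn_in: float = 0.2) -> Dict[str, float]:
    w: Dict[str, float] = defaultdict(float)
    rate, snapshots = r0, []
    problems = list(problems)
    for _ in range(epochs):
        for premise, hypothesis, gold in problems:
            guess = align(premise, hypothesis, w)
            for f, v in phi(gold).items():       # w += rate * (phi(gold) - phi(guess))
                w[f] += rate * v
            for f, v in phi(guess).items():
                w[f] -= rate * v
        norm = sum(v * v for v in w.values()) ** 0.5 or 1.0
        for f in w:                              # L2-normalize the weight vector
            w[f] /= norm
        snapshots.append(dict(w))                # store this epoch's weights
        rate *= decay                            # reduce the learning rate
    kept = snapshots[int(burn_in * epochs):]     # discard the first 20% of epochs
    avg: Dict[str, float] = defaultdict(float)
    for snap in kept:
        for f, v in snap.items():
            avg[f] += v / len(kept)
    return dict(avg)
```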
The MSR RTE2 alignment data
• Previously, little supervised alignment data
• Now, MSR gold alignments for RTE2 [Brockett 2007]
  • dev & test sets, 800 problems each
  • Token-based, but many-to-many: allows implicit alignment of phrases
  • 3 independent annotators
    • 3 of 3 agreed on 70% of proposed links
    • 2 of 3 agreed on 99.7% of proposed links
    • merged using majority rule
Evaluation on MSR data
• We evaluate several alignment models on MSR data
  • Baseline: a simple bag-of-words aligner, which matches each token in H to the most string-similar token in P
  • Two well-known MT aligners: GIZA++ & Cross-EM, supplemented with a lexicon; tried various symmetrization heuristics
  • A representative NLI aligner: the Stanford RTE aligner; can't do phrase alignments, but can exploit syntactic features
  • The MANLI aligner just presented
• How well do they recover gold-standard alignments?
  • Assess per-link precision, recall, and F1; and exact match rate
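The per-link metrics are straightforward to compute once alignments are represented as sets of links; a minimal sketch:

```python
# Per-link precision/recall/F1 and exact-match rate over predicted vs. gold alignments,
# each alignment given as a set of (premise_index, hypothesis_index) links.
from typing import Iterable, Set, Tuple

Link = Tuple[int, int]

def evaluate(predicted: Iterable[Set[Link]], gold: Iterable[Set[Link]]):
    predicted, gold = list(predicted), list(gold)
    tp = fp = fn = exact = 0
    for pred_links, gold_links in zip(predicted, gold):
        tp += len(pred_links & gold_links)
        fp += len(pred_links - gold_links)
        fn += len(gold_links - pred_links)
        exact += int(pred_links == gold_links)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1, exact / len(gold)

pred = [{(0, 0), (2, 1)}]
gold = [{(0, 0), (2, 1), (3, 2)}]
print(evaluate(pred, gold))   # (1.0, 0.667, 0.8, 0.0), up to rounding
```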
Aligner evaluation results
• Bag-of-words aligner: good recall, but poor precision
• MT aligners fail to learn word-word correspondences
• Stanford RTE aligner struggles with function words
• MANLI outperforms all others on every measure
  • F1: 10.5% higher than GIZA++, 6.2% higher than Stanford
  • Good balance of precision & recall; matched >20% exactly
MANLI results: discussion
• Three factors contribute to success:
  • Lexical resources: jail ~ prison, prevent ~ stop, injured ~ wounded
  • Contextual features enable matching function words
  • Phrases: death penalty ~ capital punishment, abdicate ~ give up
• But phrases help less than expected!
  • If we set max phrase size = 1, we lose just 0.2% in F1
• Recall errors: room to improve
  • 40%: need better lexical resources: conservation ~ protecting, organization ~ agencies, bone fragility ~ osteoporosis
• Precision errors harder to reduce
  • equal function words (49%), forms of be (21%), punctuation (7%)
Alignment for NLI: conclusions
• MT aligners not directly applicable to NLI
  • They rely on unsupervised learning from massive amounts of bitext
  • They assume semantic equivalence of P & H
• MANLI succeeds by:
  • Exploiting (manually & automatically constructed) lexical resources
  • Accommodating frequent unaligned phrases
  • Using contextual features to align function words
• Phrase-based representation shows potential
  • But not yet proven: need better phrase-based lexical resources
Outline
• Introduction
• Alignment for NLI
• A theory of entailment relations
• A theory of compositional entailment
• The NatLog system
• Conclusion
Entailment relations in past work
• 2-way (RTE1, 2, 3): entailment (yes) vs. non-entailment (no)
• 3-way (FraCaS, PARC, RTE4): entailment (yes), contradiction (no), compatibility (unknown)
• Containment (Sánchez Valencia): P = Q (equivalence), P < Q (forward entailment), P > Q (reverse entailment), P # Q (non-entailment)
16 elementary set relations
Assign sets x, y to one of 16 relations, depending on the emptiness or non-emptiness of each of the four partitions of the universe: x ∩ y, x ∩ y′, x′ ∩ y, and x′ ∩ y′.
But 9 of the 16 are degenerate: either x or y is either empty or universal. I.e., they correspond to semantically vacuous expressions, which are rare outside logic textbooks. We therefore focus on the remaining seven relations: x ≡ y, x ⊏ y, x ⊐ y, x ^ y, x | y, x ‿ y, and x # y.
The set of basic entailment relations
Relations are defined for all semantic types: tiny ⊏ small, hover ⊏ fly, kick ⊏ strike, this morning ⊏ today, in Beijing ⊏ in China, everyone ⊏ someone, all ⊏ most ⊏ some
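For finite sets, the seven basic relations can be computed directly by checking emptiness of the four partitions; a minimal sketch (degenerate cases, where x or y is empty or universal, are reported separately):

```python
# Classify two finite sets into one of the seven basic entailment relations.
def basic_relation(x: set, y: set, universe: set) -> str:
    if not x or not y or x == universe or y == universe:
        return "degenerate"
    if x == y:
        return "≡"                 # equivalence
    if x < y:
        return "⊏"                 # forward entailment (proper subset)
    if x > y:
        return "⊐"                 # reverse entailment (proper superset)
    disjoint = not (x & y)
    exhaustive = (x | y) == universe
    if disjoint and exhaustive:
        return "^"                 # negation
    if disjoint:
        return "|"                 # alternation
    if exhaustive:
        return "‿"                 # cover
    return "#"                     # independence

D = set(range(10))
print(basic_relation({1, 2}, {1, 2, 3}, D))              # ⊏
print(basic_relation({0, 1, 2}, set(range(2, 10)), D))   # ‿
print(basic_relation({1, 2}, {3, 4}, D))                 # |
```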
Joining entailment relations
If x R y and y S z, what entailment relation holds between x and z?
Example: fish | human and human ^ nonhuman; joining the two gives fish ⊏ nonhuman.
What is | ⋈ | ?
| ⋈ | = {≡, ⊏, ⊐, |, #}
Some joins yield unions of relations!
The complete join table
• Of the 49 join pairs, 32 yield single relations in B (the set of seven basic relations); 17 yield unions
• Larger unions convey less information — limits power of inference
• In practice, any union which contains # can be approximated by # — so, in practice, we can avoid the complexity of unions
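The join table can also be explored by brute force: enumerate triples of sets over a small universe, classify each pair, and record which relations between x and z actually co-occur with a given pair of relations. The sketch below does this for a 4-element universe, repeating the relation classifier from the earlier sketch in compact form. (A small universe may not witness every possibility for every cell, but it already reproduces the entries discussed here.)

```python
# Brute-force computation of joins: R join S = relations that can hold between
# x and z when x R y and y S z. Degenerate sets (empty or universal) are excluded.
from itertools import combinations, product

D = frozenset(range(4))

def rel(x, y):
    if x == y:  return "≡"
    if x < y:   return "⊏"
    if x > y:   return "⊐"
    disjoint, exhaustive = not (x & y), (x | y) == D
    if disjoint and exhaustive: return "^"
    if disjoint:                return "|"
    if exhaustive:              return "‿"
    return "#"

subsets = [frozenset(c) for r in range(1, len(D)) for c in combinations(D, r)]
join = {}
for x, y, z in product(subsets, repeat=3):
    join.setdefault((rel(x, y), rel(y, z)), set()).add(rel(x, z))

print(sorted(join[("|", "|")]))   # the union {≡, ⊏, ⊐, |, #} from the previous slide
print(sorted(join[("^", "^")]))   # two negations compose to equivalence: {≡}
```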
Outline
• Introduction
• Alignment for NLI
• A theory of entailment relations
• A theory of compositional entailment
• The NatLog system
• Conclusion
Lexical entailment relations
An atomic edit e (DEL, INS, or SUB) maps a compound expression x to e(x). The entailment relation between x and e(x) will depend on:
• the lexical entailment relation generated by e: β(e)
• other properties of the context x in which e is applied (its projectivity, discussed below)
Example: suppose x is red car
• If e is SUB(car, convertible), then β(e) is ⊐
• If e is DEL(red), then β(e) is ⊏
Crucially, β(e) depends solely on the lexical items in e, independent of the context x.
But how are lexical entailment relations determined?
Lexical entailment relations: SUBs
β(SUB(x, y)) = the entailment relation between x and y
• For open-class terms, use a lexical resource (e.g. WordNet)
  • ≡ for synonyms: sofa ≡ couch, forbid ≡ prohibit
  • ⊏ for hypo-/hypernyms: crow ⊏ bird, frigid ⊏ cold, soar ⊏ rise
  • | for antonyms and coordinate terms: hot | cold, cat | dog
  • ≡ or | for proper nouns: USA ≡ United States, JFK | FDR
  • # for most other pairs: hungry # hippo
• Closed-class terms may require special handling
  • Quantifiers: all ⊏ some, some ^ no, no | all, at least 4 ‿ at most 6
  • See dissertation for discussion of pronouns, prepositions, …
Lexical entailment relations: DEL & INS
• Generic (default) case: β(DEL(·)) = ⊏, β(INS(·)) = ⊐
  • Examples: red car ⊏ car, sing ⊐ sing off-key
  • Even quite long phrases: car parked outside since last week ⊏ car
  • Applies to intersective modifiers, conjuncts, independent clauses, …
  • This heuristic underlies most approaches to RTE! Does P subsume H? Deletions OK; insertions penalized.
• Special cases
  • Negation: didn't sleep ^ did sleep
  • Implicatives & factives (e.g. refuse to, admit that): discussed later
  • Non-intersective adjectives: former spy | spy, alleged spy # spy
  • Auxiliaries etc.: is sleeping ≡ sleeps, did sleep ≡ slept
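A minimal sketch of assigning lexical entailment relations to edits, using tiny hand-coded word lists in place of WordNet and the quantifier tables; the lexicon entries are just the illustrative examples from these slides.

```python
# Lexical entailment relation for an edit, before any projection through context.
SYNONYMS   = {("sofa", "couch"), ("forbid", "prohibit")}
HYPONYMS   = {("crow", "bird"), ("frigid", "cold"), ("soar", "rise")}
EXCLUSIONS = {("hot", "cold"), ("cat", "dog")}
NEGATIONS  = {"not", "n't", "never"}

def sub_relation(x: str, y: str) -> str:
    pair, rev = (x, y), (y, x)
    if x == y or pair in SYNONYMS or rev in SYNONYMS:
        return "≡"
    if pair in HYPONYMS:
        return "⊏"                 # hyponym replaced by its hypernym
    if rev in HYPONYMS:
        return "⊐"
    if pair in EXCLUSIONS or rev in EXCLUSIONS:
        return "|"
    return "#"

def edit_relation(kind: str, old: str = "", new: str = "") -> str:
    if kind == "SUB":
        return sub_relation(old, new)
    if kind == "DEL":              # generic case: deleting a modifier broadens
        return "^" if old in NEGATIONS else "⊏"
    if kind == "INS":              # generic case: inserting a modifier narrows
        return "^" if new in NEGATIONS else "⊐"
    return "#"

print(edit_relation("SUB", "crow", "bird"))   # ⊏
print(edit_relation("DEL", "red"))            # ⊏
print(edit_relation("DEL", "n't"))            # ^   (negation special case)
```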
The impact of semantic composition
How are entailment relations affected by semantic composition? That is, given the relation between x and y, what is the relation between f(x) and f(y)?
The monotonicity calculus provides a partial answer:
• if f is upward-monotone, ⊏ and ⊐ are preserved
• if f is downward-monotone, ⊏ and ⊐ are swapped
• if f is non-monotone, ⊏ and ⊐ are projected as #
But how are the other relations (|, ^, ‿) projected?
A typology of projectivity
• Projectivity signatures: a generalization of monotonicity classes
• Each projectivity signature is a map from entailment relations to entailment relations
• In principle, 7⁷ possible signatures, but few are actually realized
• See dissertation for the projectivity of connectives, quantifiers, verbs
Projecting through multiple levels
Propagate the entailment relation between atoms upward, according to the projectivity class of each node on the path to the root.
Example: SUB(a shirt, clothes) generates ⊏; projecting through without (downward) yields ⊐ at the PP and VP levels; projecting through nobody (downward) flips it back to ⊏ at the sentence level:
nobody can enter without a shirt ⊏ nobody can enter without clothes
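A minimal sketch of this upward projection, restricted for simplicity to the monotonicity fragment (≡, ⊏, ⊐, #); the full projectivity signatures also specify how ^, |, and ‿ project. The two maps below stand in for the signatures of the nodes in the example.

```python
# Project a lexical entailment relation upward through the composition tree by
# applying the projectivity map of each node on the path to the root.
UP   = {"≡": "≡", "⊏": "⊏", "⊐": "⊐", "#": "#"}     # upward-monotone context
DOWN = {"≡": "≡", "⊏": "⊐", "⊐": "⊏", "#": "#"}     # downward-monotone context

def project(relation: str, path_to_root: list) -> str:
    for node_map in path_to_root:
        relation = node_map[relation]
    return relation

# a shirt ⊏ clothes; the path to the root passes through "without" and "nobody",
# both downward-monotone, so the relation flips twice and ends up as ⊏ again:
print(project("⊏", [DOWN, DOWN]))   # ⊏
```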
Implicatives & factives [Nairn et al. 06]
9 signatures, according to the implications (+, –, or o) generated in positive and negative contexts. For example, refuse to has signature –/o: refused to dance implies didn't dance, while didn't refuse to dance implies nothing.
Implicatives & factives (continued)
• We can specify the relation generated by DEL or INS of each signature
• Room for variation w.r.t. infinitives, complementizers, passivization, etc.
• Some are more intuitive when negated: he didn't hesitate to ask | he didn't ask
• Doesn't cover factives, which involve presuppositions — see dissertation
Putting it all together
1. Find a sequence of edits e₁, …, eₙ which transforms p into h. Define x₀ = p, xₙ = h, and xᵢ = eᵢ(xᵢ₋₁) for i ∈ [1, n].
2. For each atomic edit eᵢ:
   • Determine the lexical entailment relation β(eᵢ).
   • Project β(eᵢ) upward through the semantic composition tree of expression xᵢ₋₁ to find the atomic entailment relation β(xᵢ₋₁, xᵢ).
3. Join atomic entailment relations across the sequence of edits:
   β(p, h) = β(x₀, xₙ) = β(x₀, x₁) ⋈ … ⋈ β(xᵢ₋₁, xᵢ) ⋈ … ⋈ β(xₙ₋₁, xₙ)
Limitations: need to find an appropriate edit sequence connecting p and h; tendency of the ⋈ operation toward less-informative entailment relations; lack of a general mechanism for combining multiple premises. Less deductive power than FOL: can't handle e.g. de Morgan's laws.
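The final joining step is a simple left-to-right fold over the atomic relations. The sketch below uses only a handful of join-table entries (other pairs fall back to #, mirroring the approximation noted earlier); the entries shown were chosen to cover the examples in this section and are not the complete table.

```python
# Join atomic entailment relations across an edit sequence.
JOIN = {
    ("⊏", "⊏"): "⊏", ("⊐", "⊐"): "⊐",
    ("^", "^"): "≡",                 # two negations cancel
    ("⊏", "^"): "|", ("^", "⊐"): "|",
    ("|", "^"): "⊏",                 # e.g. fish | human, human ^ nonhuman => fish ⊏ nonhuman
}

def join(r: str, s: str) -> str:
    if r == "≡":
        return s
    if s == "≡":
        return r
    return JOIN.get((r, s), "#")     # unknown pairs approximated by #

def join_all(relations) -> str:
    out = "≡"
    for r in relations:
        out = join(out, r)
    return out

print(join_all(["^", "^", "⊏"]))     # ⊏ : two negations cancel, then forward entailment
```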
An example
P: The doctor didn't hesitate to recommend Prozac.
H: The doctor recommended medication.  →  yes
(figure: the edit sequence from P to H, annotated with the atomic entailment relation for each edit; joining them step by step yields the final answer yes)
Different edit orders?
Intermediate steps may vary; the final result is typically (though not necessarily) the same.
Outline
• Introduction
• Alignment for NLI
• A theory of entailment relations
• A theory of compositional entailment
• The NatLog system
• Conclusion
The NatLog system
An NLI problem is processed by a five-stage pipeline:
1. Linguistic analysis (next slide)
2. Alignment (from outside sources)
3. Lexical entailment classification (core of system; covered shortly)
4. Entailment projection (straightforward; not covered further)
5. Entailment joining (straightforward; not covered further)
→ prediction
Stage 1: Linguistic analysis
• Tokenize & parse input sentences (future: & NER & coref & …)
• Identify items with special projectivity & determine their scope
• Problem: a PTB-style parse tree ≠ semantic structure!
  Example: Jimmy Dean refused to move without blue jeans
  category: –/o implicatives
  examples: refuse, forbid, prohibit, …
  scope: S complement
  pattern: __ > (/VB.*/ > VP $. S=arg)
  projectivity: {≡:≡, ⊏:⊐, ⊐:⊏, ^:|, |:#, ‿:#, #:#}
• Solution: specify scope in PTB trees using Tregex [Levy & Andrew 06]
Stage 3: Lexical entailment classification
• Goal: predict the entailment relation for each edit, based solely on lexical features, independent of context
• Approach: use lexical resources & machine learning
• Feature representation:
  • WordNet features: synonymy (≡), hyponymy (⊏/⊐), antonymy (|)
  • Other relatedness features: Jiang-Conrath (WN-based), NomBank
  • Fallback: string similarity (based on Levenshtein edit distance)
  • Also lexical category, quantifier category, implication signature
• Decision tree classifier
  • Trained on 2,449 hand-annotated lexical entailment problems
  • E.g., SUB(gun, weapon): ⊏, SUB(big, small): |, DEL(often): ⊏
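For concreteness, here is a toy sketch of such a classifier using scikit-learn; the features and training pairs are placeholders, not NatLog's actual feature set or its 2,449 annotated examples.

```python
# Decision-tree lexical entailment classifier sketch.
from difflib import SequenceMatcher
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

def features(edit_type: str, x: str, y: str) -> dict:
    return {
        "type": edit_type,
        "string_sim": SequenceMatcher(None, x, y).ratio(),
        "same_word": float(x == y),
        # a real system adds WordNet, Jiang-Conrath, NomBank, lexical-category,
        # quantifier-category, and implication-signature features here
    }

train = [
    (("SUB", "gun", "weapon"), "⊏"),
    (("SUB", "big", "small"), "|"),
    (("DEL", "often", ""), "⊏"),
    (("SUB", "car", "car"), "≡"),
]
vec = DictVectorizer(sparse=False)
X = vec.fit_transform([features(*edit) for edit, _ in train])
y = [label for _, label in train]
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

test = vec.transform([features("SUB", "cat", "cat")])
print(clf.predict(test))   # likely ≡, since the toy features make cat/cat look like car/car
```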
The FraCaS test suite
• FraCaS: a project in computational semantics [Cooper et al. 96]
• 346 "textbook" examples of NLI problems
• 3 possible answers: yes, no, unknown (not balanced!)
• 55% single-premise, 45% multi-premise (excluded)
Results on FraCaS
(results table not reproduced; key observations below)
• 27% error reduction overall
• In the largest category, all but one problem correct
• High accuracy in the sections most amenable to natural logic
• High precision even outside areas of expertise