Global Inference and Learning Towards Natural Language Understanding Dan Roth Department of Computer Science University of Illinois at Urbana-Champaign AAAI-06 Invited Talk
Learning and Inference • Global decisions in which several local decisions play a role but there are mutual dependencies on their outcome. • (Learned) classifiers for different sub-problems • Incorporate classifiers’ information, along with constraints, in making coherent decisions – decisions that respect the local classifiers as well as domain & context specific constraints. • Global inference for the best assignment to all variables of interest.
Comprehension: a process that maintains and updates a collection of propositions about the state of affairs. (ENGLAND, June, 1989) - Christopher Robin is alive and well. He lives in England. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book. He made up a fairy tale land where Chris lived. His friends were animals. There was a bear called Winnie the Pooh. There was also an owl and a young pig, called a piglet. All the animals were stuffed toys that Chris owned. Mr. Robin made them come to life with his words. The places in the story were all near Cotchfield Farm. Winnie the Pooh was written in 1925. Children still love to read about Christopher Robin and his animal friends. Most people don't know he is a real person who is grown now. He has written two books of his own. They tell what it is like to be famous. 1. Christopher Robin was born in England. 2. Winnie the Pooh is a title of a book. 3. Christopher Robin's dad was a magician. 4. Christopher Robin must be at least 65 now.
How to Address Comprehension? It's an Inference Problem. A huge number of problems: variability of language, knowledge acquisition, reasoning patterns. • One option: map into a well-defined language and use standard reasoning tools. But multiple levels of ambiguity make it hard to make language precise; is there a canonical mapping? Underspecificity calls for a "purposeful", goal-specific mapping. • Another option: statistics (know a word by its neighbors) and Machine Learning: counting, classification, probabilistic models, clustering, structured models.
What we Know: Stand-Alone Ambiguity Resolution • Illinois' bored of education: board • ...Nissan Car and truck plant is... vs. ...divide life into plant and animal kingdom... • (This Art) (can N) (will MD) (rust V): V, N, N • The dog bit the kid. He was taken to a veterinarian / a hospital. • Learn a function f: X → Y that maps observations in a domain to one of several categories. Broad coverage.
Comprehension (Christopher Robin passage, repeated from above).
This Talk • Integrating Learning and Inference • Historical Perspective • Role of Learning • Global Inference over Classifiers • Semantic Parsing • Multiple levels of processing • Textual Entailment • Summary and Future Directions
Learning, Knowledge Representation, Reasoning • There has been a lot of work on these three topics in AI. • But mostly as separate threads: work on Learning, and work on Knowledge Representation and Reasoning. • Very little has been done on an integrative framework: • Inductive Logic Programming • Some work from the perspective of probabilistic AI • Some work from the perspective of Machine Learning (EBL) • Recent: Valiant's Robust Logic • Learning to Reason
Learning to Reason ['94-'97]: Key Insights • A unified framework to study Learning, Knowledge Representation and Reasoning • The goal is to Reason (deduction; abduction - best explanation) • Reasoning is not done from a static Knowledge Base but rather done with knowledge that is learned via interaction with the world. • Intermediate Representation is important, but only to the extent that it is learnable and facilitates reasoning. • Feedback to learning is given by the reasoning stage. • There may not be a need (or even a possibility) to learn the intermediate representation exactly, but only to the extent that it supports Reasoning. [Khardon & Roth JACM97, AAAI94; Roth95, Roth96, Khardon&Roth99 Learning to Plan: Khardon'99]
Learning to Reason [Diagram: World W → Learning → Knowledge Representation (KB) → Reasoning Task] • Interaction with the World • Knowledge representation is: chosen to facilitate inference, and learned by interaction with the world • Performance is measured with respect to the world, on a reasoning task
L2R: Computational Advantages • Deduction (Khardon, Roth 94, 97): decide whether f ⊨ q, for f ∈ CNF and q ∈ Monotone CNF • Learning to Reason is easy when Reasoning is hard: learn a different representation for f, one that allows efficient reasoning • Learning to Reason is easy when Learning is hard: learn a different function, which only approximates f, but is sufficient for exact reasoning • No Magic • Other studies: on non-monotonic reasoning as learning, etc.
Learning to Reason ['94-'97]: Relevant? • A unified framework to study Learning, Knowledge Representation and Reasoning • The goal is to Reason (deduction; abduction - best explanation) • Reasoning is not done from a static Knowledge Base but rather done with knowledge that is learned via interaction with the world. • Intermediate Representation is important, but only to the extent that it is learnable and facilitates reasoning. • Feedback to learning is given by the reasoning stage. • There may not be a need (or even a possibility) to learn the intermediate representation exactly, but only to the extent that it supports Reasoning. [Khardon & Roth JACM97, AAAI94; Roth95, Roth96, Khardon&Roth99 Learning to Plan: Khardon'99] How?
Comprehension (Christopher Robin passage, repeated from above).
Today's Research Issues It's an Inference Problem. What is the role of Learning? • Learning serves to support abstraction: a context-sensitive operation that is done at multiple levels, from names (Mr. Robin, Christopher Robin) to relations (wrote, author) and concepts • Learning serves to generate the vocabulary over which reasoning is possible (part-of-speech; a subject-of, ...) • Knowledge acquisition; tuning; memory, ... • Learning in the context of a reasoning system: training with an inference mechanism; feedback is done at the inference level, not at the single-classifier level; labeling using inference
This Talk (outline, revisited).
What I will not talk about: Knowledge Representation [Concept graph for "Mohammed Atta met with an Iraqi intelligence agent in Prague in April 2001": meeting(person: participant, location, time); nationality, affiliation, country, organization, city; date: month(April), year(2001); name(Iraq), name(Prague)] • A unified representation that is used as an input to learning processes, and as an output of learning processes • Specifically, we use an abstract representation that is centered around a semantic parse (predicate-argument representation) augmented by additional information. • Formalized as a hierarchical concept graph (Description Logic inspired) • Feature Description Logic [Cumby & Roth '00, '02, '03]
Inference and Learning • Global decisions in which several local decisions play a role but there are mutual dependencies on their outcome. • Learned classifiers for different sub-problems • Incorporate classifiers’ information, along with constraints, in making coherent decisions – decisions that respect the local classifiers as well as domain & context specific constraints. • Global inference for the best assignment to all variables of interest. How to induce a predicate argument representation of a sentence. How to use inference methods over learned outcomes. How to use declarative information over/along with learned information.
Semantic Role Labeling I left my pearls to my daughter in my will. [I]A0 left [my pearls]A1 [to my daughter]A2 [in my will]AM-LOC • A0: Leaver • A1: Things left • A2: Benefactor • AM-LOC: Location • Special case (structured output problem): here, all the data is available at one time; in general, classifiers might be learned from different sources, at different times, in different contexts. • Implications on training paradigms • Constraints: no overlapping arguments; if A2 is present, A1 must also be present.
Problem Setting [Diagram: variables y1 ... y8 with constraints C(y1, y4) and C(y2, y3, y6, y7, y8), possibly weighted] • Random variables Y • Conditional distributions P (learned by classifiers) • Constraints C: any Boolean function defined on partial assignments (possibly with weights W) • Goal: find the "best" assignment, the assignment that achieves the highest global accuracy. • This is an Integer Programming problem: Y* = argmax_Y P(Y) subject to constraints C
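The setting above (argmax_Y P(Y) subject to Boolean constraints C) can be sketched by exhaustive search, a stand-in for the ILP solver used in the talk; the scores and the constraint below are made up for illustration:

```python
from itertools import product

def best_assignment(scores, constraints):
    """Find the highest-scoring assignment to variables y1..yn that
    satisfies every Boolean constraint in C.
    scores[i][v] is the learned score of variable i taking value v."""
    n, labels = len(scores), range(len(scores[0]))
    best, best_score = None, float("-inf")
    for y in product(labels, repeat=n):
        if not all(c(y) for c in constraints):
            continue  # assignment violates some constraint in C
        s = sum(scores[i][y[i]] for i in range(n))
        if s > best_score:
            best, best_score = y, s
    return best, best_score

# Toy scores for three variables over labels {0, 1} (made up)
scores = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
# Boolean constraint: y2 = 1 is allowed only when y1 = 1
constraints = [lambda y: y[0] == 1 or y[1] == 0]
y_star, val = best_assignment(scores, constraints)
```

Note that the unconstrained best assignment (0, 1, 0) violates the constraint; global inference trades away the locally best choice for y2 to keep the assignment coherent.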
Semantic Role Labeling (1/2) • For each verb in a sentence: identify all constituents that fill a semantic role, and determine their roles • Core arguments, e.g., Agent, Patient or Instrument • Their adjuncts, e.g., Locative, Temporal or Manner • Who did what to whom, when, where, why, ... Examples (role labels from the slide: A0 leaver/sayer, A1 thing left/utterance, C-A1, R-A1, A2 benefactor, AM-LOC): • The pearls which I left to my daughter-in-law are fake. • The pearls, I said, were left to my daughter-in-law. • I left my pearls to my daughter-in-law in my will.
Semantic Role Labeling (2/2) • PropBank [Palmer et al. '05] provides a large human-annotated corpus of semantic verb-argument relations. • It adds a layer of generic semantic labels to Penn Treebank II. • (Almost) all the labels are on the constituents of the parse trees. • Core arguments: A0-A5 and AA; different semantics for each verb, specified in the PropBank frame files • 13 types of adjuncts labeled as AM-arg, where arg specifies the adjunct type
Algorithmic Approach I left my nice pearls to her → [I] left [my nice pearls] to [her] (candidate arguments) • Identify argument candidates (identify the vocabulary): pruning [Xue & Palmer, EMNLP'04]; argument identifier: binary classification (SNoW) • Classify argument candidates: argument classifier, multi-class classification (SNoW) • Inference (over old and new vocabulary): use the estimated probability distribution given by the argument classifier; use structural and linguistic constraints; infer the optimal global output
Argument Identification & Classification • Both the argument identifier and the argument classifier are trained phrase-based classifiers. • Features (some examples): voice, phrase type, head word, path, chunk, chunk pattern, etc. [some make use of a full syntactic parse] • Learning algorithm: SNoW, a sparse network of linear functions; weights learned by a regularized Winnow multiplicative update rule • Probability conversion is done via softmax: p_i = exp(act_i) / Σ_j exp(act_j)
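The softmax conversion on this slide is straightforward to implement; a minimal sketch (subtracting the maximum activation first is a standard numerical-stability trick, not something the slide specifies):

```python
import math

def softmax(acts):
    """p_i = exp(act_i) / sum_j exp(act_j), as on the slide.
    Subtracting max(acts) first avoids overflow in exp()."""
    m = max(acts)
    exps = [math.exp(a - m) for a in acts]
    z = sum(exps)
    return [e / z for e in exps]

# Made-up raw activations for three argument types
p = softmax([2.0, 1.0, 0.1])
```

The outputs sum to one and preserve the ranking of the raw activations, which is what lets the inference step treat them as probability estimates.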
Inference I left my nice pearls to her • The output of the argument classifier often violates some constraints, especially when the sentence is long. • Finding the best legitimate output is formalized as an optimization problem and solved via Integer Linear Programming. [Punyakanok et al. '04; Roth & Yih '04] • Input: the probability estimates (from the argument classifier) and structural and linguistic constraints • Allows incorporating expressive (non-sequential) constraints on the variables (the argument types).
Integer Linear Programming (ILP) Maximize: Σ_i score(a_i = t) · a_{i,t} Subject to: linear constraints over the 0/1 indicator variables a_{i,t}
Integer Linear Programming Inference • For each argument a_i, set up a Boolean variable a_{i,t} indicating whether a_i is classified as t • Goal is to maximize Σ_i score(a_i = t) · a_{i,t} • Subject to the (linear) constraints • Any Boolean constraint can be encoded as linear constraint(s). • If score(a_i = t) = P(a_i = t), the objective is to find the assignment that maximizes the expected number of arguments that are correct and satisfies the constraints.
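A sketch of this objective with hypothetical scores, solved by enumeration rather than an ILP solver (the real system used a solver; the constraint shown is the talk's "no duplicate argument classes"):

```python
from itertools import product

LABELS = ["A0", "A1", "A2", "null"]
# Hypothetical scores: score[i][t] for candidate argument i, label t
score = [[0.7, 0.2, 0.05, 0.05],
         [0.6, 0.3, 0.05, 0.05],
         [0.1, 0.2, 0.6, 0.1]]

def solve():
    """Maximize sum_i score[i][t] * x_{i,t} over 0/1 indicators,
    subject to: each argument takes exactly one label, and the
    no-duplicate-classes constraint sum_i x_{i,A0} <= 1."""
    best, best_val = None, float("-inf")
    # 'Exactly one label per argument' lets us enumerate label choices
    for assign in product(range(len(LABELS)), repeat=len(score)):
        if [LABELS[t] for t in assign].count("A0") > 1:
            continue  # violates sum_i x_{i,A0} <= 1
        val = sum(score[i][t] for i, t in enumerate(assign))
        if val > best_val:
            best_val, best = val, [LABELS[t] for t in assign]
    return best, best_val

labels, total = solve()
```

Without the constraint, both of the first two candidates would grab A0; with it, the second candidate falls back to its next-best label.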
Inference I left my nice pearls to her Candidate-by-label score table: 0.3 0.2 0.2 0.3 / 0.6 0.0 0.0 0.4 / 0.1 0.3 0.5 0.1 / 0.1 0.2 0.3 0.4 • Independent Max: cost = 0.3 + 0.6 + 0.5 + 0.4 = 1.8 • Non-Overlapping: cost = 0.3 + 0.4 + 0.5 + 0.4 = 1.6 • Blue ⇒ Red & Non-Overlapping: cost = 0.3 + 0.4 + 0.3 + 0.4 = 1.4 • Maximize expected number correct: T* = argmax_T Σ_i P(a_i = t_i) • Subject to some constraints, structural and linguistic (R-A1 ⇒ A1) • Solved with Integer Linear Programming
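The independent-max cost on this slide can be checked directly from the score table (the two constrained costs depend on span-overlap information that the original figure encodes graphically, so only the unconstrained sum is reproduced here):

```python
# Score table from the slide: rows are candidate arguments,
# columns are candidate labels
scores = [[0.3, 0.2, 0.2, 0.3],
          [0.6, 0.0, 0.0, 0.4],
          [0.1, 0.3, 0.5, 0.1],
          [0.1, 0.2, 0.3, 0.4]]

# Independent max: each argument picks its best label in isolation
independent = sum(max(row) for row in scores)  # 0.3 + 0.6 + 0.5 + 0.4
```

Adding the non-overlap and R-A1 ⇒ A1 constraints can only lower this total (to 1.6 and 1.4 on the slide), which is exactly the price of coherence.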
Constraints Any Boolean rule can be encoded as a linear constraint; these are universally quantified rules. • No duplicate argument classes: Σ_{a ∈ POTARG} x_{a = A0} ≤ 1 • R-ARG (if there is an R-ARG phrase, there is an ARG phrase): ∀ a2 ∈ POTARG: x_{a2 = R-A0} ≤ Σ_{a ∈ POTARG} x_{a = A0} • C-ARG (if there is a C-ARG phrase, there is an ARG phrase before it): ∀ a2 ∈ POTARG: x_{a2 = C-A0} ≤ Σ_{a ∈ POTARG, a before a2} x_{a = A0} • Many other possible constraints: unique labels; no overlapping or embedding; relations between number of arguments; if the verb is of type A, no argument of type B
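The claim that any Boolean rule becomes a linear constraint over 0/1 variables can be spot-checked exhaustively; a small sketch for the two encoding patterns this slide uses (implication and at-most-one):

```python
from itertools import combinations, product

# Over 0/1 variables, the implication x_a => x_b is exactly
# the linear inequality x_a <= x_b
impl_ok = all(((not xa) or xb) == (xa <= xb)
              for xa, xb in product([0, 1], repeat=2))

# "No two of x1..x3 are both true" (no duplicate classes) is
# exactly the linear constraint x1 + x2 + x3 <= 1
amo_ok = all(
    all(not (x[i] and x[j]) for i, j in combinations(range(3), 2))
    == (sum(x) <= 1)
    for x in product([0, 1], repeat=3))
```

Both equivalences hold for every 0/1 assignment, which is what justifies handing the Boolean rules to an ILP solver unchanged.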
Semantic Parsing: Summary I • This approach produces a very good semantic parser; top-ranked system in the CoNLL'05 shared task. The key difference is the inference. • Easy and fast: ~1 sentence/second (using Xpress-MP) • A lot of room for improvement (additional constraints) • Demo available: http://L2R.cs.uiuc.edu/~cogcomp • Significant also in enabling knowledge acquisition • So far, we have shown the use of only declarative (deterministic) constraints; in fact, this approach can be used with both statistical and declarative constraints.
ILP as a Unified Algorithmic Scheme [Trellis diagram: states A/B/C over positions x1 ... x5, with source s and sink t; outputs y1 ... y5] • Consider a common model for sequential inference: HMM/CRF. Inference in this model is done via the Viterbi algorithm. • Viterbi is a special case of Linear Programming based inference: it is a shortest-path problem, which is an LP with a canonical matrix that is totally unimodular, so you get integrality for free. • One can now incorporate non-sequential/expressive/declarative constraints by modifying this canonical matrix • The extension reduces to a polynomial scheme under some conditions (e.g., when constraints are sequential, when the solution space does not change, etc.) • Does not necessarily increase complexity, and is very efficient in practice [Roth & Yih, ICML'05]
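As a reference point for the claim that Viterbi is a path problem over the trellis, a minimal log-space Viterbi sketch (the two-state model and all numbers are made up):

```python
def viterbi(start, trans, emit, obs):
    """Best state sequence under additive (log) scores: a best-path
    problem on the trellis, i.e. the LP special case the slide names."""
    n = len(start)
    # delta[s] = best score of any path ending in state s
    delta = [start[s] + emit[s][obs[0]] for s in range(n)]
    back = []
    for o in obs[1:]:
        ptr, new = [], []
        for s in range(n):
            p = max(range(n), key=lambda q: delta[q] + trans[q][s])
            ptr.append(p)
            new.append(delta[p] + trans[p][s] + emit[s][o])
        back.append(ptr)
        delta = new
    s = max(range(n), key=lambda q: delta[q])
    path = [s]
    for ptr in reversed(back):  # follow back-pointers
        s = ptr[s]
        path.append(s)
    return path[::-1], max(delta)

# Two states, two symbols; log-score parameters (made up)
start = [0.0, -1.0]
trans = [[0.0, -2.0], [-2.0, 0.0]]
emit  = [[0.0, -3.0], [-3.0, 0.0]]
path, score = viterbi(start, trans, emit, [0, 0, 1])
```

Each added non-sequential constraint would remove edges or paths from this trellis, which is the matrix modification the slide describes.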
Integer Linear Programming Inference - Summary • An Inference method for the “best explanation”, used here to induce a semantic representation of a sentence. • A general Information Integration framework. • Allows expressive constraints • Any Boolean rule can be represented by a set of linear (in)equalities • Combining acquired (statistical) constraints with declarative constraints • Start with shortest path matrix and constraints • Add new constraints to the basic integer linear program. • Solved using off-the-shelf packages • If the additional constraints don’t change the solution, LP is enough • Otherwise, the computational time depends on sparsity; fast in practice • Demo available http://L2R.cs.uiuc.edu/~cogcomp
This Talk (outline, revisited).
Inference and Learning • Global decisions in which several local decisions play a role, but there are mutual dependencies on their outcome. • So far, this was a single-stage process: learn (acquire a new vocabulary) and run inference over it to guarantee the coherency of the outcome. • Is that it? Of course this isn't sufficient; the process of learning and inference needs to be done in phases. It's turtles all the way down…
Pipeline Raw Data → POS Tagging → Phrases → Semantic Entities → Relations (with Parsing, WSD and Semantic Role Labeling along the way) • Vocabulary is generated in phases; left-to-right processing of sentences is also a pipeline process. • Pipelining is a crude approximation; interactions occur across levels, and downstream decisions often interact with previous decisions. • Leads to propagation of errors; occasionally, later-stage problems are easier, but upstream mistakes will not be corrected. • There are good reasons for pipelining decisions. • Global inference over the outcomes of different levels can be used to break away from this paradigm [between pipeline & fully global]; allows a flexible way to incorporate linguistic and structural constraints.
Entities and Relations: Information Integration J.V. Oswald was murdered at JFK after his assassin, K. F. Johns… Identify: Kill(X, Y): [person] J.V. Oswald was murdered at [location] JFK after his assassin, [person] K. F. Johns… • Identify named entities • Identify relations between entities • Exploit mutual dependencies between named entities and relations to yield a coherent global detection. [Roth & Yih, COLING'02; CoNLL'04] • Some knowledge (classifiers) may be known in advance; some constraints may be available only at decision time.
This Talk (outline, revisited).
Comprehension (Christopher Robin passage, repeated from above).
Textual Entailment By "semantically entailed" we mean: most people would agree that one sentence implies the other. Simply: making plausible inferences. • Given Q: Who acquired Overture? • Determine whether A answers it: "Eyeing the huge market potential, currently led by Google, Yahoo took over search company Overture Services Inc last year." • The sentence entails "Yahoo acquired Overture"; related hypotheses include "Overture is a search company", "Google is a search company", "Google owns Overture", … • Components: phrasal verb paraphrasing [Connor & Roth '06]; entity matching [Li et al., AAAI'04, NAACL'04]; semantic role labeling
Discussing Textual Entailment • Requires an inference process that makes use of a large number of learned (and knowledge-based) operators. • A sound approach for determining whether a statement of interest holds in a given sentence. [Braz et al., AAAI'05] • A pair (sentence, hypothesis) is transformed into a simpler pair, in an entailment-preserving manner. • Constrained optimization formulation over a large number of learned operators, aimed at the best (simplest) mapping between predicate-argument representations. • Inference is purposeful: no canonical representation, but rather reasoning on sentences; the transformation depends on the hypothesis. • What is shown next is a proof. At any stage, a large number of operators are entertained; some do not fire, some lead nowhere. This is a path through the optimization process that leads to a justifiable (and explainable) answer.
Sample Entailment Pair S: Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. They finally made up their minds to release 2 million barrels a day, of oil and refined products, from their reserves. T: Offers by individual European governments involved supplies of crude or refined oil products. Does T follow from S?
OPERATOR 1: Phrasal Verb Replace phrasal verbs with an equivalent single-word verb. Before, S: … They finally made up their minds to release 2 million barrels a day, of oil and refined products, from their reserves. After, S: … They finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. T (unchanged): Offers by individual European governments involved supplies of crude or refined oil products.
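Operator 1 can be sketched as a lookup-driven rewrite; the tiny paraphrase table below is made up (the talk's operator is learned from data [Connor & Roth '06], not hard-coded):

```python
# Hypothetical phrasal-verb -> single-verb table; a real system
# would learn these paraphrases rather than hard-code them
PHRASAL = {
    "made up their minds": "decided",
    "took over": "acquired",
}

def replace_phrasal_verbs(sentence):
    """Entailment-preserving rewrite: swap each known phrasal verb
    for an equivalent single-word verb."""
    for phrase, verb in PHRASAL.items():
        sentence = sentence.replace(phrase, verb)
    return sentence

s = replace_phrasal_verbs(
    "They finally made up their minds to release 2 million barrels a day.")
```

The same table entry "took over" → "acquired" is what links the earlier Yahoo/Overture sentence to the hypothesis "Yahoo acquired Overture".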
OPERATOR 2: Coreference Resolution Replace pronouns/possessive pronouns with the entity to which they refer. Before, S: … They finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. After, S: … U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. T (unchanged): Offers by individual European governments involved supplies of crude or refined oil products.
OPERATOR 3: Focus of Attention Remove segments of a sentence that do not appear to be necessary; may allow more accurate annotation of the remaining words. Before, S: Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. After, S: U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. T (unchanged): Offers by individual European governments involved supplies of crude or refined oil products.
OPERATOR 4: Nominalization Promotion Replace a verb that does not express a useful/meaningful relationship with a nominalization in one of its arguments. Requires semantic role labeling (for noun predicates). Before, T: Offers by individual European governments involved supplies of crude or refined oil products. After, T: Individual European governments offered supplies of crude or refined oil products. S (unchanged): U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves.