Learning to “Read Between the Lines” using Bayesian Logic Programs

Sindhu Raghavan, Raymond Mooney, and Hyeonseo Ku
The University of Texas at Austin
July 2012

Information Extraction
• Information extraction (IE) systems extract factual information that occurs in text [Cowie and Lehnert, 1996; Sarawagi, 2008]
• Natural language text is typically “incomplete”
  • Commonsense information is not explicitly stated
  • Easily inferred facts are omitted from the text
• Human readers use commonsense knowledge and “read between the lines” to infer implicit information
• IE systems have no access to commonsense knowledge and hence cannot infer implicit information

Example
• Natural language text: “Barack Obama is the President of the United States of America.”
• Query: “Barack Obama is a citizen of what country?”
• IE systems cannot answer this query because citizenship information is not explicitly stated!

Objective
• Infer implicit facts from explicitly stated information
  • Extract explicitly stated facts using an IE system
  • Learn commonsense knowledge in the form of logical rules to deduce additional facts
  • Employ models from statistical relational learning (SRL) that allow probabilities to be estimated using well-founded probabilistic graphical models

Related Work
• Learning propositional rules [Nahm and Mooney, 2000]
  • Learn propositional rules from the output of an IE system on computer-related job postings
  • Perform logical deduction to infer new facts
  • Purely logical deduction is brittle
    • Cannot assign probabilities or confidence estimates to inferences

Related Work
• Learning first-order rules
  • Logical deduction using probabilistic rules [Carlson et al., 2010; Doppa et al., 2010]
    • Modify existing rule learners like FOIL and FARMER to learn probabilistic rules
    • Probabilities are not computed using well-founded probabilistic graphical models
  • Use approaches based on Markov Logic Networks (MLNs) [Domingos and Lowd, 2009] to infer additional facts [Schoenmackers et al., 2010; Sorower et al., 2011]
    • Grounding process could result in intractably large networks for large domains

Related Work
• Learning for textual entailment [Lin and Pantel, 2001; Yates and Etzioni, 2007; Berant et al., 2011]
  • Textual entailment rules have a single antecedent in the body of the rule
  • Approaches from statistical relational learning have not been applied so far
  • Do not use extractions from a traditional IE system to learn rules

Our Approach
• Use an off-the-shelf IE system to extract facts
• Learn commonsense knowledge from the extracted facts in the form of probabilistic first-order rules
• Infer additional facts based on the learned rules using Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001]

System Architecture
• Training: Training Documents → Information Extractor (IBM SIRE) → Extracted Facts → Inductive Logic Programming (LIME) → First-Order Logical Rules → BLP Weight Learner (version of EM) → Bayesian Logic Program (BLP)
• Testing: Test Document → Extractions → BLP Inference Engine → Inferences with probabilities
• Example training document: “Barack Obama is the current President of USA … Obama was born on August 4, 1961, in Hawaii, USA.”
• Extracted facts: nationState(USA), Person(BarackObama), isLedBy(USA,BarackObama), hasBirthPlace(BarackObama,USA), hasCitizenship(BarackObama,USA)
• Learned first-order rules:
  • nationState(B) ∧ isLedBy(B,A) ⇒ hasCitizenship(A,B)
  • nationState(B) ∧ employs(B,A) ⇒ hasCitizenship(A,B)
• Bayesian Logic Program (clauses with learned parameters):
  • hasCitizenship(A,B) | nationState(B), isLedBy(B,A)   .9
  • hasCitizenship(A,B) | nationState(B), employs(B,A)   .6
• Test document extractions: nationState(malaysian), Person(mahathir-mohamad), isLedBy(malaysian,mahathir-mohamad), employs(malaysian,mahathir-mohamad)
• Inference with probability: hasCitizenship(mahathir-mohamad, malaysian) 0.75

Bayesian Logic Programs [Kersting and De Raedt, 2001]
• Set of Bayesian clauses a | a1, a2, …, an
  • Definite clauses in first-order logic, universally quantified
  • Head of the clause: a
  • Body of the clause: a1, a2, …, an
  • Associated conditional probability table (CPT): P(head | body)
• Bayesian predicates a, a1, a2, …, an have finite domains
• Combining rule, such as noisy-or, for mapping multiple CPTs into a single CPT
• Given a set of Bayesian clauses and a query, SLD resolution is used to construct ground Bayesian networks for probabilistic inference

Why BLPs?
• Pure logical deduction is brittle and results in many undifferentiated inferences
• Inference in BLPs is probabilistic, i.e., inferences are assigned probabilities
  • Probabilities can be used to select only high-confidence inferences
• Efficient grounding mechanism in BLPs enables our approach to scale

Inductive Logic Programming (ILP) for learning first-order rules
• Target relation: hasCitizenship(X,Y)
• Positive instances: hasCitizenship(BarackObama, USA), hasCitizenship(GeorgeBush, USA), hasCitizenship(IndiraGandhi, India), …
• Negative instances, generated using the closed-world assumption: hasCitizenship(BarackObama, India), hasCitizenship(GeorgeBush, India), hasCitizenship(IndiraGandhi, USA), …
• Background KB: hasBirthPlace(BarackObama, USA), person(BarackObama), nationState(USA), nationState(India), …
• Rules produced by the ILP rule learner: nationState(Y) ∧ isLedBy(Y,X) ⇒ hasCitizenship(X,Y), …
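A minimal Python sketch of how closed-world negatives like the ones above can be produced. It assumes the candidate argument values are simply those that appear in the positive instances; a typed variant could instead draw Y from every nationState constant in the KB. `closed_world_negatives` is a hypothetical helper, not part of LIME.

```python
def closed_world_negatives(positives):
    """Negatives for a binary target relation under the closed-world assumption:
    every (X, Y) pair of observed argument values that is not a known positive.
    ASSUMPTION: candidate X and Y values come only from the positive instances."""
    pos = set(positives)
    xs = sorted({x for x, _ in pos})
    ys = sorted({y for _, y in pos})
    return [(x, y) for x in xs for y in ys if (x, y) not in pos]


positives = [("BarackObama", "USA"), ("GeorgeBush", "USA"), ("IndiraGandhi", "India")]
for x, y in closed_world_negatives(positives):
    print(f"hasCitizenship({x}, {y})")
# prints hasCitizenship(BarackObama, India), hasCitizenship(GeorgeBush, India),
# hasCitizenship(IndiraGandhi, USA), one per line
```

Anything not asserted is treated as false, which is exactly why the future-work slide lists an open-world assumption as an alternative.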

Inference using BLPs
• Test document: “Malaysian Prime Minister Mahathir Mohamad Wednesday announced for the first time that he has appointed his deputy Abdullah Ahmad Badawi as his successor.”
• Extracted facts: nationState(malaysian), Person(mahathir-mohamad), isLedBy(malaysian,mahathir-mohamad), employs(malaysian,mahathir-mohamad)
• Learned rules:
  • nationState(B) ∧ isLedBy(B,A) ⇒ hasCitizenship(A,B)
  • nationState(B) ∧ employs(B,A) ⇒ hasCitizenship(A,B)

Logical Inference in BLPs
Rule 1: nationState(B) ∧ isLedBy(B,A) ⇒ hasCitizenship(A,B)
• Evidence: nationState(malaysian), isLedBy(malaysian,mahathir-mohamad)
• Inferred: hasCitizenship(mahathir-mohamad, malaysian)

Logical Inference in BLPs
Rule 2: nationState(B) ∧ employs(B,A) ⇒ hasCitizenship(A,B)
• Evidence: nationState(malaysian), employs(malaysian,mahathir-mohamad)
• Inferred: hasCitizenship(mahathir-mohamad, malaysian)
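Both rule slides apply the same step: match a rule head against the query, then look up extracted facts that satisfy the body. The Python sketch below mimics that step under simplifying assumptions: atoms are (predicate, arguments) tuples, arguments starting with an uppercase letter are variables (Prolog convention), and body atoms are matched only against extracted facts. It is therefore a one-step simplification of SLD resolution, not a full resolver, and all function names are hypothetical.

```python
def is_var(term):
    """Prolog convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def unify(pattern, ground, subst):
    """Extend subst so that pattern matches the ground atom, or return None."""
    pred, args = pattern
    gpred, gargs = ground
    if pred != gpred or len(args) != len(gargs):
        return None
    s = dict(subst)
    for a, g in zip(args, gargs):
        if is_var(a):
            if s.setdefault(a, g) != g:  # variable already bound to another constant
                return None
        elif a != g:
            return None
    return s


def solve(body, facts, subst):
    """Yield every substitution that turns all body atoms into known facts."""
    if not body:
        yield subst
        return
    for fact in facts:
        s = unify(body[0], fact, subst)
        if s is not None:
            yield from solve(body[1:], facts, s)


def ground_rules(query, rules, facts):
    """Return the ground body of every rule instance whose head is the query."""
    groundings = []
    for head, body in rules:
        s = unify(head, query, {})
        if s is None:
            continue
        for full in solve(body, facts, s):
            groundings.append([(p, tuple(full.get(a, a) for a in args)) for p, args in body])
    return groundings


rules = [
    (("hasCitizenship", ("A", "B")), [("nationState", ("B",)), ("isLedBy", ("B", "A"))]),
    (("hasCitizenship", ("A", "B")), [("nationState", ("B",)), ("employs", ("B", "A"))]),
]
facts = [
    ("nationState", ("malaysian",)),
    ("Person", ("mahathir-mohamad",)),
    ("isLedBy", ("malaysian", "mahathir-mohamad")),
    ("employs", ("malaysian", "mahathir-mohamad")),
]
query = ("hasCitizenship", ("mahathir-mohamad", "malaysian"))
for body in ground_rules(query, rules, facts):
    print(body)
```

Each returned ground body becomes one "logical and" node in the ground Bayesian network for the query.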

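Probabilistic inference in BLPs
Ground Bayesian network for the query:
• Evidence nodes: nationState(malaysian), isLedBy(malaysian, mahathir-mohamad), employs(malaysian, mahathir-mohamad)
• dummy1 = Logical And of nationState(malaysian) and isLedBy(malaysian, mahathir-mohamad)
• dummy2 = Logical And of nationState(malaysian) and employs(malaysian, mahathir-mohamad)
• hasCitizenship(mahathir-mohamad, malaysian) = Noisy Or of dummy1 and dummy2
• Query: marginal probability of hasCitizenship(mahathir-mohamad, malaysian)?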
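For a network this small, the marginal can be computed exactly by enumerating truth assignments. The sketch assumes the .9 and .6 parameters from the system-architecture slide, independent evidence atoms with given probabilities, and a noisy-or with no leak term. With all evidence observed true it gives 1 - 0.1 × 0.4 = 0.96; it is not claimed to reproduce the 0.75 shown on the architecture slide, which comes from the authors' system.

```python
from itertools import product
from math import prod


def noisy_or(params, active):
    """P(head = true) given which parent (logical-and) nodes are true; no leak term."""
    return 1.0 - prod(1.0 - p for p, on in zip(params, active) if on)


def marginal(root_probs, clauses, params):
    """Exact P(head = true) by enumerating every truth assignment to the root atoms.

    root_probs: {atom: P(atom is true)} for the evidence (extracted) atoms
    clauses:    one list of body atoms per ground rule, in the same order as params
    params:     one noisy-or parameter per ground rule
    """
    atoms = list(root_probs)
    total = 0.0
    for values in product([True, False], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        weight = prod(root_probs[a] if world[a] else 1.0 - root_probs[a] for a in atoms)
        # Logical-and: a rule's node is true iff every atom in its body is true
        ands = [all(world[a] for a in body) for body in clauses]
        # Noisy-or: combine the nodes of all ground rules that share this head
        total += weight * noisy_or(params, ands)
    return total


evidence = {
    "nationState(malaysian)": 1.0,
    "isLedBy(malaysian,mahathir-mohamad)": 1.0,
    "employs(malaysian,mahathir-mohamad)": 1.0,
}
clauses = [
    ["nationState(malaysian)", "isLedBy(malaysian,mahathir-mohamad)"],  # Rule 1 (isLedBy)
    ["nationState(malaysian)", "employs(malaysian,mahathir-mohamad)"],  # Rule 2 (employs)
]
print(round(marginal(evidence, clauses, [0.9, 0.6]), 4))  # prints 0.96
```

Enumeration is exponential in the number of evidence atoms, so it only illustrates the semantics; it is not how a BLP inference engine would scale.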

Sample rules learned
• governmentOrganization(A) ∧ employs(A,B) ⇒ hasMember(A,B)
• eventLocation(A,B) ∧ bombing(A) ⇒ thingPhysicallyDamage(A,B)
• isLedBy(A,B) ⇒ hasMemberPerson(A,B)

Experimental Evaluation
• Data
  • DARPA’s intelligence community (IC) data set from the Machine Reading Project (MRP)
  • Consists of news articles on politics, terrorism, and other international events
  • 10,000 documents in total
• Perform 10-fold cross validation

Experimental Evaluation
• Learning first-order rules using LIME [McCreath and Sharma, 1998]
  • Learn rules for 13 target relations
  • Learn rules using both positive and negative instances and using only positive instances
  • Include all unique rules learned from the different models
• Learning BLP parameters
  • Learn noisy-or parameters using Expectation Maximization (EM)
  • Set priors to maximum likelihood estimates
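The slides say only that noisy-or parameters are learned with a version of EM (online EM, per the systems-compared slide). As a minimal sketch of the underlying idea, the code below is a textbook batch EM for a leak-free noisy-or; it is not the authors' online implementation, and the `cases` encoding and `em_noisy_or` name are hypothetical. The hidden variable is whether each active rule's link "fired"; a link can only fire when the head is true.

```python
from math import prod


def em_noisy_or(cases, n_rules, iters=50, init=0.5):
    """Batch EM for leak-free noisy-or parameters.

    cases: list of (active, y); active is the set of rule indices whose ground
           body is true for one head atom, y is whether that head atom is true.
    Returns one parameter per rule: P(rule's link fires | its body is true).
    """
    p = [init] * n_rules
    for _ in range(iters):
        fired = [0.0] * n_rules  # expected number of times each link fired
        trials = [0] * n_rules   # number of cases in which each rule's body was true
        for active, y in cases:
            p_head = 1.0 - prod(1.0 - p[i] for i in active)
            for i in active:
                trials[i] += 1
                # E-step: if the head is false, no active link fired; if it is
                # true, link i fired with posterior p[i] / P(head = true)
                if y and p_head > 0:
                    fired[i] += p[i] / p_head
        # M-step: expected fraction of opportunities on which each link fired
        p = [fired[i] / trials[i] if trials[i] else p[i] for i in range(n_rules)]
    return p


cases = ([({0}, True)] * 9 + [({0}, False)]
         + [({1}, True)] * 6 + [({1}, False)] * 4
         + [({0, 1}, True)] * 5)
print([round(x, 3) for x in em_noisy_or(cases, 2)])
```

When a head has only one active rule the posterior is 1 and the update reduces to a frequency count; the EM weighting only matters when several ground rules share a head.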

Experimental Evaluation
• Performance evaluation
  • Manually evaluated inferred facts from 40 documents, randomly selected from each test set
  • Compute two precision scores
    • Unadjusted (UA): does not account for the extractor’s mistakes
    • Adjusted (AD): accounts for the extractor’s mistakes
  • Rank inferences using marginal probabilities and evaluate the top n
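A minimal sketch of top-n precision over inferences ranked by marginal probability. The slide does not spell out how AD accounts for extractor mistakes; this sketch ASSUMES that incorrect inferences attributed to extraction errors are dropped before ranking. The triple format and `precision_at_n` name are hypothetical.

```python
def precision_at_n(inferences, n, adjusted=False):
    """Precision of the n highest-probability inferences.

    inferences: (probability, is_correct, caused_by_extraction_error) triples.
    Unadjusted (UA) scores every inference. For adjusted (AD), this sketch
    ASSUMES incorrect inferences caused by extraction errors are dropped first.
    """
    pool = [inf for inf in inferences if not (adjusted and not inf[1] and inf[2])]
    top = sorted(pool, key=lambda inf: inf[0], reverse=True)[:n]
    return sum(1 for _, ok, _ in top if ok) / len(top) if top else 0.0


judged = [(0.9, True, False), (0.8, False, True), (0.7, False, False), (0.6, True, False)]
print(precision_at_n(judged, 3), precision_at_n(judged, 3, adjusted=True))
```

Sweeping n over the ranked list gives the precision curves for UA and AD.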

Experimental Evaluation
• Systems compared
  • BLP Learned Weights: noisy-or parameters learned using online EM
  • BLP Manual Weights: noisy-or parameters set to 0.9
  • Logical Deduction
  • MLN Learned Weights: weights learned using a generative online weight learner
  • MLN Manual Weights: a weight of 10 assigned to all rules and MLE priors to all predicates

Unadjusted Precision

Adjusted Precision

Future Work
• Improve the performance of weight learning for BLPs and MLNs
• Learn parameters on larger data sets
• Improve performance of MLNs
• Use open-world assumption for learning
• Add constraints required to prevent inference of facts like employs(a,a)
• Specialize types that do not have strictly defined types
• Develop an online rule learner that can learn rules from uncertain training data

Conclusions
• Efficient learning of probabilistic first-order rules that represent commonsense knowledge using extractions from an IE system
• Inference of implicitly stated facts with high precision using BLPs
• Superior performance of BLPs over purely logical deduction and MLNs

Questions??

Back Up

Results for Logical Deduction

Experimental Evaluation
• Learning BLP parameters
  • Use the logical-and model to combine evidence from the conjuncts in the body of the clause
  • Use the noisy-or model to combine evidence from several ground rules that have the same head
  • Learn noisy-or parameters using Expectation Maximization (EM)
  • Set priors to maximum likelihood estimates
