
Markov Logic in Natural Language Processing



Presentation Transcript


  1. Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington

  2. Overview • Motivation • Foundational areas • Markov logic • NLP applications • Basics • Supervised learning • Unsupervised learning

  3. Languages Are Structural [figure: the unsegmented words "governments" and "lm$pxtm" ("according to their families")]

  4. Languages Are Structural [figure: examples of linguistic structure. Morphology: govern-ment-s, l-m$px-t-m ("according to their families"). Syntax: parse tree (S, NP, VP, V) for "IL-4 induces CD11B". Biomedical event extraction: nested events with Theme/Cause/Site arguments (involvement, activation, up-regulation; p70(S6)-kinase, IL-10, gp41, human monocytes) from "Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 …". Coreference: "George Walker Bush was the 43rd President of the United States. … Bush was the eldest son of President G. H. W. Bush and Barbara Bush. … In November 1977, he met Laura Welch at a barbecue."]

  5. Languages Are Structural [figure: same examples as slide 4]

  6. Languages Are Structural • Objects are not just feature vectors • They have parts and subparts • Which have relations with each other • They can be trees, graphs, etc. • Objects are seldom i.i.d. (independent and identically distributed) • They exhibit local and global dependencies • They form class hierarchies (with multiple inheritance) • Objects’ properties depend on those of related objects • Deeply interwoven with knowledge

  7. First-Order Logic • Main theoretical foundation of computer science • General language for describing complex structures and knowledge • Trees, graphs, dependencies, hierarchies, etc. easily expressed • Inference algorithms (satisfiability testing, theorem proving, etc.)

  8. Languages Are Statistical [figure: examples of variation and ambiguity. Paraphrase: "Microsoft buys Powerset", "Microsoft acquires Powerset", "Powerset is acquired by Microsoft Corporation", "The Redmond software giant buys Powerset", "Microsoft’s purchase of Powerset, …". PP-attachment ambiguity: "I saw the man with the telescope" (NP vs. ADVP attachment). Entity ambiguity: "Here in London, Frances Deek is a retired teacher …", "In the Israeli town …, Karen London says …", "Now London says …" (London: PERSON or LOCATION?). Coreference ambiguity: "G. W. Bush …… Laura Bush …… Mrs. Bush ……" (which one?)]

  9. Languages Are Statistical • Languages are ambiguous • Our information is always incomplete • We need to model correlations • Our predictions are uncertain • Statistics provides the tools to handle this

  10. Probabilistic Graphical Models • Mixture models • Hidden Markov models • Bayesian networks • Markov random fields • Maximum entropy models • Conditional random fields • Etc.

  11. The Problem • Logic is deterministic, requires manual coding • Statistical models assume i.i.d. data, objects = feature vectors • Historically, statistical and logical NLP have been pursued separately • We need to unify the two! • Burgeoning field in machine learning: Statistical relational learning

  12. Costs and Benefits of Statistical Relational Learning • Benefits • Better predictive accuracy • Better understanding of domains • Enable learning with less or no labeled data • Costs • Learning is much harder • Inference becomes a crucial issue • Greater complexity for user

  13. Progress to Date • Probabilistic logic [Nilsson, 1986] • Statistics and beliefs [Halpern, 1990] • Knowledge-based model construction[Wellman et al., 1992] • Stochastic logic programs [Muggleton, 1996] • Probabilistic relational models [Friedman et al., 1999] • Relational Markov networks [Taskar et al., 2002] • Etc. • This talk: Markov logic [Domingos & Lowd, 2009]

  14. Markov Logic: A Unifying Framework • Probabilistic graphical models and first-order logic are special cases • Unified inference and learning algorithms • Easy-to-use software: Alchemy • Broad applicability • Goal of this tutorial: Quickly learn how to use Markov logic and Alchemy for a broad spectrum of NLP applications

  15. Overview • Motivation • Foundational areas • Probabilistic inference • Statistical learning • Logical inference • Inductive logic programming • Markov logic • NLP applications • Basics • Supervised learning • Unsupervised learning

  16. Markov Networks • Undirected graphical models [figure: network over Smoking, Cancer, Asthma, Cough] • Potential functions defined over cliques: P(x) = (1/Z) ∏_k φ_k(x_{k}), with partition function Z = Σ_x ∏_k φ_k(x_{k})

  17. Markov Networks • Undirected graphical models [figure: network over Smoking, Cancer, Asthma, Cough] • Log-linear model: P(x) = (1/Z) exp(Σ_i w_i f_i(x)), where w_i is the weight of feature i and f_i(x) is feature i
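
A minimal sketch of this log-linear form, assuming a toy network over the four variables in the slide's figure; the features and weights are made up for illustration, not taken from the tutorial.

# Minimal sketch of a log-linear Markov network over the four Boolean
# variables from the slide (Smoking, Cancer, Asthma, Cough).
# The features and weights below are illustrative assumptions.
from itertools import product
from math import exp

VARS = ["Smoking", "Cancer", "Asthma", "Cough"]

# Each feature is a Boolean function of a world; weights are made up.
features = [
    (1.5, lambda w: (not w["Smoking"]) or w["Cancer"]),   # Smoking => Cancer
    (1.1, lambda w: (not w["Asthma"]) or w["Cough"]),     # Asthma  => Cough
]

def worlds():
    for values in product([False, True], repeat=len(VARS)):
        yield dict(zip(VARS, values))

def unnorm(w):
    """exp(sum_i w_i * f_i(x)) for one world."""
    return exp(sum(wt * f(w) for wt, f in features))

Z = sum(unnorm(w) for w in worlds())          # partition function

def prob(w):
    return unnorm(w) / Z

example = {"Smoking": True, "Cancer": True, "Asthma": False, "Cough": False}
print(prob(example))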

  18. Markov Nets vs. Bayes Nets

  19. Inference in Markov Networks • Goal: compute marginals & conditionals of P(X) • Exact inference is #P-complete • Conditioning on the Markov blanket is easy: P(x_i = 1 | MB(x_i)) = exp(Σ_j w_j f_j(x_i = 1, MB(x_i))) / [exp(Σ_j w_j f_j(x_i = 0, MB(x_i))) + exp(Σ_j w_j f_j(x_i = 1, MB(x_i)))] • Gibbs sampling exploits this

  20. MCMC: Gibbs Sampling
  state ← random truth assignment
  for i ← 1 to num-samples do
      for each variable x
          sample x according to P(x | neighbors(x))
          state ← state with new value of x
  P(F) ← fraction of states in which F is true
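
A runnable sketch of the pseudocode above on the same kind of toy log-linear model; the variable names, features, weights, and sample count are illustrative assumptions.

# Gibbs sampling sketch for the slide's pseudocode, on a toy model with
# P(x) proportional to exp(sum_i w_i f_i(x)).
import random
from math import exp

VARS = ["Smoking", "Cancer", "Asthma", "Cough"]
features = [
    (1.5, lambda w: (not w["Smoking"]) or w["Cancer"]),
    (1.1, lambda w: (not w["Asthma"]) or w["Cough"]),
]

def score(world):
    return sum(wt * f(world) for wt, f in features)

def gibbs(query, num_samples=10000, seed=0):
    """Estimate P(query is true) by Gibbs sampling."""
    random.seed(seed)
    state = {v: random.random() < 0.5 for v in VARS}   # random truth assignment
    hits = 0
    for _ in range(num_samples):
        for x in VARS:
            # P(x = True | all other variables) from two unnormalized scores
            state[x] = True
            s_true = exp(score(state))
            state[x] = False
            s_false = exp(score(state))
            state[x] = random.random() < s_true / (s_true + s_false)
        hits += query(state)
    return hits / num_samples

print(gibbs(lambda w: w["Cancer"]))   # fraction of samples in which Cancer holds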

  21. Other Inference Methods • Belief propagation (sum-product) • Mean field / Variational approximations

  22. MAP/MPE Inference • Goal: Find most likely state of world given evidence: arg max_y P(y | x) (y: query, x: evidence)

  23. MAP Inference Algorithms • Iterated conditional modes • Simulated annealing • Graph cuts • Belief propagation (max-product) • LP relaxation
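
As an illustration of one of the algorithms listed above, here is a minimal simulated-annealing sketch for MAP inference on a toy weighted-formula model; the model, cooling schedule, and parameters are assumptions, not taken from the tutorial.

# Simulated-annealing sketch for MAP inference on a toy model.
import random
from math import exp

VARS = ["A", "B", "C"]
features = [
    (2.0, lambda w: (not w["A"]) or w["B"]),   # A => B
    (1.0, lambda w: w["C"]),
]

def score(world):
    return sum(wt * f(world) for wt, f in features)

def map_anneal(steps=5000, t0=2.0, seed=0):
    random.seed(seed)
    state = {v: random.random() < 0.5 for v in VARS}
    best, best_score = dict(state), score(state)
    for step in range(1, steps + 1):
        temp = t0 / step                      # simple cooling schedule
        x = random.choice(VARS)
        old = score(state)
        state[x] = not state[x]               # propose a single flip
        delta = score(state) - old
        if delta < 0 and random.random() >= exp(delta / temp):
            state[x] = not state[x]           # reject the move: flip back
        if score(state) > best_score:
            best, best_score = dict(state), score(state)
    return best

print(map_anneal())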

  24. Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Markov logic NLP applications Basics Supervised learning Unsupervised learning

  25. Generative Weight Learning • Maximize likelihood • Use gradient ascent or L-BFGS • No local maxima • Requires inference at each step (slow!) • Gradient: ∂/∂w_i log P_w(x) = n_i(x) − E_w[n_i(x)], i.e., no. of times feature i is true in the data minus expected no. of times feature i is true according to the model
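
A minimal sketch of this learning rule, assuming a tiny two-variable model where expected counts can be computed exactly by enumerating all worlds; the features, "training" worlds, learning rate, and iteration count are made up for illustration.

# Gradient-ascent sketch for generative weight learning:
# d/dw_i log P(x) = n_i(x) - E_w[n_i].  Exact expectations are computed by
# enumerating all worlds, which is only feasible for tiny domains.
from itertools import product
from math import exp

VARS = ["Smoking", "Cancer"]
features = [lambda w: (not w["Smoking"]) or w["Cancer"],   # Smoking => Cancer
            lambda w: w["Smoking"]]

data = [{"Smoking": True, "Cancer": True},
        {"Smoking": False, "Cancer": False}]               # made-up training worlds

def worlds():
    for vals in product([False, True], repeat=len(VARS)):
        yield dict(zip(VARS, vals))

def expected_counts(weights):
    scores = {tuple(w.items()): exp(sum(wt * f(w) for wt, f in zip(weights, features)))
              for w in worlds()}
    Z = sum(scores.values())
    return [sum(scores[tuple(w.items())] * f(w) for w in worlds()) / Z
            for f in features]

weights = [0.0, 0.0]
for _ in range(200):                       # gradient ascent on the log-likelihood
    exp_counts = expected_counts(weights)
    for i, f in enumerate(features):
        data_count = sum(f(x) for x in data) / len(data)
        weights[i] += 0.1 * (data_count - exp_counts[i])
print(weights)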

  26. Pseudo-Likelihood • Likelihood of each variable given its neighbors in the data: PL(x) = ∏_i P(x_i | MB(x_i)) • Does not require inference at each step • Widely used in vision, spatial statistics, etc. • But PL parameters may not work well for long inference chains
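
A short sketch of the pseudo-log-likelihood for a toy binary model, showing why no partition function is needed; the model and the example world are illustrative assumptions.

# Pseudo-log-likelihood sketch: sum_i log P(x_i | all other variables).
from math import exp, log

VARS = ["Smoking", "Cancer"]
features = [(1.0, lambda w: (not w["Smoking"]) or w["Cancer"])]

def score(world):
    return sum(wt * f(world) for wt, f in features)

def pseudo_log_likelihood(world):
    pll = 0.0
    for x in VARS:
        flipped = dict(world)
        flipped[x] = not flipped[x]
        s_obs, s_flip = exp(score(world)), exp(score(flipped))
        pll += log(s_obs / (s_obs + s_flip))    # P(x_i = observed value | rest)
    return pll

print(pseudo_log_likelihood({"Smoking": True, "Cancer": True}))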

  27. Discriminative Weight Learning • Maximize conditional likelihood of query (y) given evidence (x) • Gradient: ∂/∂w_i log P_w(y | x) = n_i(x, y) − E_w[n_i(x, y)], i.e., no. of true groundings of clause i in the data minus expected no. of true groundings according to the model • Approximate expected counts by counts in MAP state of y given x

  28. Voted Perceptron • Originally proposed for training HMMs discriminatively • Assumes network is linear chain • Can be generalized to arbitrary networks
  w_i ← 0
  for t ← 1 to T do
      y_MAP ← Viterbi(x)
      w_i ← w_i + η [count_i(y_Data) − count_i(y_MAP)]
  return w_i / T
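
A sketch of this training loop as an averaged perceptron. The MAP decoder (e.g., Viterbi) and the feature-count function are left as parameters because they depend on the model; both are assumptions, not tutorial code.

# Voted/averaged perceptron sketch following the slide's pseudocode.
# `decode` stands in for Viterbi (or any MAP decoder); `counts(x, y)` returns
# the list of feature counts for (x, y).  Both are caller-supplied.
def voted_perceptron(examples, counts, decode, num_feats, T=10, eta=1.0):
    """examples: list of (x, y_gold) pairs."""
    w = [0.0] * num_feats
    w_sum = [0.0] * num_feats                 # running sum for averaging
    for _ in range(T):
        for x, y_gold in examples:
            y_map = decode(x, w)              # MAP state under current weights
            c_gold, c_map = counts(x, y_gold), counts(x, y_map)
            for i in range(num_feats):
                w[i] += eta * (c_gold[i] - c_map[i])
        for i in range(num_feats):
            w_sum[i] += w[i]
    return [wi / T for wi in w_sum]           # averaged weights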

  29. Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Markov logic NLP applications Basics Supervised learning Unsupervised learning

  30. First-Order Logic • Constants, variables, functions, predicates. E.g.: Anna, x, MotherOf(x), Friends(x, y) • Literal: Predicate or its negation • Clause: Disjunction of literals • Grounding: Replace all variables by constants. E.g.: Friends(Anna, Bob) • World (model, interpretation): Assignment of truth values to all ground predicates

  31. Inference in First-Order Logic • Traditionally done by theorem proving (e.g.: Prolog) • Propositionalization followed by model checking turns out to be faster (often by a lot) • Propositionalization: Create all ground atoms and clauses • Model checking: Satisfiability testing • Two main approaches: • Backtracking (e.g.: DPLL) • Stochastic local search (e.g.: WalkSAT)

  32. Satisfiability • Input: Set of clauses (convert KB to conjunctive normal form (CNF)) • Output: Truth assignment that satisfies all clauses, or failure • The paradigmatic NP-complete problem • Solution: Search • Key point: Most SAT problems are actually easy • Hard region: Narrow range of #Clauses / #Variables

  33. Stochastic Local Search • Uses complete assignments instead of partial • Start with random state • Flip variables in unsatisfied clauses • Hill-climbing: Minimize # unsatisfied clauses • Avoid local minima: Random flips • Multiple restarts

  34. The WalkSAT Algorithm
  for i ← 1 to max-tries do
      solution ← random truth assignment
      for j ← 1 to max-flips do
          if all clauses satisfied then return solution
          c ← random unsatisfied clause
          with probability p
              flip a random variable in c
          else
              flip variable in c that maximizes # satisfied clauses
  return failure
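
A runnable sketch of WalkSAT following the pseudocode above; the clause encoding, parameter values, and the example formula are illustrative assumptions.

# WalkSAT sketch.  Clauses are tuples of signed literals, e.g. (1, -2) means
# "x1 OR NOT x2".
import random

def satisfied(clause, assign):
    return any(assign[abs(l)] == (l > 0) for l in clause)

def walksat(clauses, num_vars, max_tries=10, max_flips=1000, p=0.5, seed=0):
    random.seed(seed)
    for _ in range(max_tries):
        assign = {v: random.random() < 0.5 for v in range(1, num_vars + 1)}
        for _ in range(max_flips):
            unsat = [c for c in clauses if not satisfied(c, assign)]
            if not unsat:
                return assign                          # all clauses satisfied
            clause = random.choice(unsat)
            if random.random() < p:
                var = abs(random.choice(clause))       # random-walk step
            else:                                      # greedy step
                def num_sat(v):
                    assign[v] = not assign[v]
                    n = sum(satisfied(c, assign) for c in clauses)
                    assign[v] = not assign[v]
                    return n
                var = max((abs(l) for l in clause), key=num_sat)
            assign[var] = not assign[var]
    return None                                        # failure

clauses = [(1, 2), (-1, 3), (-2, -3)]
print(walksat(clauses, num_vars=3))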

  35. Overview Motivation Foundational areas Probabilistic inference Statistical learning Logical inference Inductive logic programming Markov logic NLP applications Basics Supervised learning Unsupervised learning

  36. Rule Induction • Given: Set of positive and negative examples of some concept • Example: (x1, x2, …, xn, y) • y: concept (Boolean) • x1, x2, …, xn: attributes (assume Boolean) • Goal: Induce a set of rules that cover all positive examples and no negative ones • Rule: xa ∧ xb ∧ … ⇒ y (xa: literal, i.e., xi or its negation) • Same as Horn clause: Body ⇒ Head • Rule r covers example x iff x satisfies body of r • Eval(r): Accuracy, info gain, coverage, support, etc.

  37. Learning a Single Rule
  head ← y
  body ← Ø
  repeat
      for each literal x
          r_x ← r with x added to body
          Eval(r_x)
      body ← body ∧ best x
  until no x improves Eval(r)
  return r

  38. Learning a Set of Rules
  R ← Ø
  S ← examples
  repeat
      learn a single rule r
      R ← R ∪ { r }
      S ← S − positive examples covered by r
  until S = Ø
  return R
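
A sketch combining slides 37 and 38: greedily grow one rule by adding the best literal, remove the positives it covers, and repeat. The attribute encoding and the accuracy-based Eval are assumptions chosen for brevity (accuracy is one of the options slide 36 mentions).

# Sequential-covering sketch.  Examples are dicts of Boolean attributes;
# a rule body is a list of (attribute, value) literals.

def covers(body, example):
    return all(example[attr] == val for attr, val in body)

def eval_rule(body, pos, neg):
    tp = sum(covers(body, e) for e in pos)
    fp = sum(covers(body, e) for e in neg)
    return tp / (tp + fp) if tp + fp else 0.0

def learn_one_rule(pos, neg, attrs):
    body = []
    while True:
        candidates = [body + [(a, v)] for a in attrs for v in (True, False)
                      if a not in dict(body)]
        best = max(candidates, key=lambda b: eval_rule(b, pos, neg), default=None)
        if best is None or eval_rule(best, pos, neg) <= eval_rule(body, pos, neg):
            return body
        body = best

def learn_rules(pos, neg, attrs):
    rules, pos = [], list(pos)
    while pos:
        rule = learn_one_rule(pos, neg, attrs)
        if not rule:                       # cannot make further progress
            break
        rules.append(rule)
        pos = [e for e in pos if not covers(rule, e)]
    return rules

pos = [{"a": True, "b": True}, {"a": True, "b": False}]
neg = [{"a": False, "b": True}]
print(learn_rules(pos, neg, ["a", "b"]))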

  39. First-Order Rule Induction • y and the xi are now predicates with arguments. E.g.: y is Ancestor(x,y), xi is Parent(x,y) • Literals to add are predicates or their negations • Literal to add must include at least one variable already appearing in the rule • Adding a literal changes # groundings of the rule. E.g.: Ancestor(x,z) ∧ Parent(z,y) ⇒ Ancestor(x,y) • Eval(r) must take this into account. E.g.: Multiply by # positive groundings of rule still covered after adding the literal

  40. Overview • Motivation • Foundational areas • Markov logic • NLP applications • Basics • Supervised learning • Unsupervised learning

  41. Markov Logic • Syntax: Weighted first-order formulas • Semantics: Feature templates for Markov networks • Intuition: Soften logical constraints • Give each formula a weight (higher weight ⇒ stronger constraint)

  42. Example: Coreference Resolution Barack Obama, the 44th President of the United States, is the first African American to hold the office. ……

  43. Example: Coreference Resolution

  44. Example: Coreference Resolution

  45. Example: Coreference Resolution Two mention constants: A and B. Ground atoms: MentionOf(A,Obama), MentionOf(B,Obama), Head(A,“Obama”), Head(B,“Obama”), Head(A,“President”), Head(B,“President”), Apposition(A,B), Apposition(B,A)

  46. Markov Logic Networks • MLN is template for ground Markov nets • Probability of a world x: P(x) = (1/Z) exp(Σ_i w_i n_i(x)), where w_i is the weight of formula i and n_i(x) is the no. of true groundings of formula i in x • Typed variables and constants greatly reduce size of ground Markov net • Functions, existential quantifiers, etc. • Can handle infinite domains [Singla & Domingos, 2007] and continuous domains [Wang & Domingos, 2008]
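
A sketch of what n_i(x) means: counting the true groundings of weighted first-order formulas in one possible world over a two-constant domain. The "friends and smokers" formulas, weights, and world are illustrative, not taken from this tutorial.

# Counting true groundings of first-order formulas in a world.
from itertools import product

constants = ["Anna", "Bob"]

# One possible world: truth values of all ground atoms.
world = {
    ("Smokes", ("Anna",)): True,  ("Smokes", ("Bob",)): False,
    ("Cancer", ("Anna",)): True,  ("Cancer", ("Bob",)): False,
    ("Friends", ("Anna", "Bob")): True,  ("Friends", ("Bob", "Anna")): True,
    ("Friends", ("Anna", "Anna")): False, ("Friends", ("Bob", "Bob")): False,
}

def implies(a, b):
    return (not a) or b

# Weighted formulas as (weight, arity, grounding -> bool).
formulas = [
    (1.5, 1, lambda x: implies(world[("Smokes", (x,))], world[("Cancer", (x,))])),
    (1.1, 2, lambda x, y: implies(world[("Friends", (x, y))] and world[("Smokes", (x,))],
                                  world[("Smokes", (y,))])),
]

def true_groundings(arity, formula):
    return sum(formula(*args) for args in product(constants, repeat=arity))

total = sum(w * true_groundings(arity, f) for w, arity, f in formulas)
print(total)   # sum_i w_i * n_i(x); exponentiate and normalize by Z to get P(x)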

  47. Relation to Statistical Models • Special cases: Markov networks, Markov random fields, Bayesian networks, log-linear models, exponential models, max. entropy models, Gibbs distributions, Boltzmann machines, logistic regression, hidden Markov models, conditional random fields • Obtained by making all predicates zero-arity • Markov logic allows objects to be interdependent (non-i.i.d.)

  48. Relation to First-Order Logic • Infinite weights ⇒ first-order logic • Satisfiable KB, positive weights ⇒ satisfying assignments = modes of distribution • Markov logic allows contradictions between formulas

  49. MLN Algorithms:The First Three Generations

  50. MAP/MPE Inference • Problem: Find most likely state of world given evidence: arg max_y P(y | x) = arg max_y Σ_i w_i n_i(x, y) (y: query, x: evidence)
