Markov Logic Networks: A Step Towards a Unified Theory of Learning and Cognition

Markov Logic Networks: A Step Towards a Unified Theory of Learning and Cognition • Pedro Domingos, Dept. of Computer Science & Eng., University of Washington • Joint work with Jesse Davis, Stanley Kok, Daniel Lowd, Hoifung Poon, Matt Richardson, Parag Singla, Marc Sumner

One Algorithm • Observation: The cortex has the same architecture throughout • Hypothesis: A single learning/inference algorithm underlies all cognition • Let’s discover it!

The Neuroscience Approach • Map the brain and figure out how it works • Problem: We don’t know nearly enough

The Engineering Approach • Pick a task (e.g., object recognition) • Figure out how to solve it • Generalize to other tasks • Problem: Any one task is too impoverished

The Foundational Approach • Consider all tasks the brain does • Figure out what they have in common • Formalize, test and repeat • Advantage: Plenty of clues • Where to start? • “The grand aim of science is to cover the greatest number of experimental facts by logical deduction from the smallest number of hypotheses or axioms.” (Albert Einstein)

Recurring Themes • Noise, uncertainty, incomplete information → Probability / Graphical models • Complexity, many objects & relations → First-order logic

Examples

Research Plan • Unify graphical models and first-order logic • Develop learning and inference algorithms • Apply to wide variety of problems • This talk: Markov logic networks

Markov Logic Networks • MLN = Set of 1st-order formulas with weights • Formula = Feature template (Vars→Objects) • $P(x) = \frac{1}{Z} \exp\left(\sum_i w_i \, n_i(x)\right)$, where $w_i$ is the weight of formula $i$, $n_i(x)$ is the no. of true instances of formula $i$ in $x$, and $Z$ is the normalizing constant • E.g., Ising model: Up(x) ^ Neighbor(x,y) => Up(y) • Most graphical models are special cases • First-order logic is infinite-weight limit
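As a rough illustration of the definition on this slide, the sketch below enumerates every world of a two-object domain, scores each world as exp(weight × number of true groundings) of the Ising-style formula, and normalizes. The object names, the weight 1.5, the helper names, and the choice to treat Neighbor as fixed evidence are all assumptions made for the example; brute-force enumeration is only feasible for toy domains.

```python
import itertools
import math


def count_true_groundings(up, neighbor, objects):
    """Count pairs (x, y) for which Up(x) ^ Neighbor(x,y) => Up(y) holds."""
    return sum(
        1
        for x in objects
        for y in objects
        if not (up[x] and neighbor[(x, y)]) or up[y]
    )


def world_probabilities(weight, neighbor, objects):
    """Return {world: P(world)} over all truth assignments to Up.

    Neighbor is held fixed (an assumption of this sketch), so this is the
    distribution over Up given the Neighbor facts.
    """
    scores = {}
    for world in itertools.product([False, True], repeat=len(objects)):
        up = dict(zip(objects, world))
        n = count_true_groundings(up, neighbor, objects)
        scores[world] = math.exp(weight * n)  # unnormalized: exp(w * n(x))
    z = sum(scores.values())  # normalizing constant Z
    return {world: score / z for world, score in scores.items()}


# Hypothetical toy domain: two objects that are neighbors of each other.
objects = ["a", "b"]
neighbor = {(x, y): x != y for x in objects for y in objects}
probs = world_probabilities(1.5, neighbor, objects)
```

Worlds in which neighbors agree satisfy all four groundings, and worlds in which they disagree satisfy three, so each agreeing world is exp(1.5) times more probable than each disagreeing one.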

MLN Algorithms: The First Three Generations

Weighted Satisfiability • SAT: Find truth assignment that makes all formulas (clauses) true • Huge amount of research on this problem • State of the art: Millions of vars/clauses in minutes • MaxSAT: Make as many clauses true as possible • Weighted MaxSAT: Clauses have weights; maximize satisfied weight • MAP inference in MLNs is just weighted MaxSAT • Best current solver: MaxWalkSAT
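The local-search idea behind weighted MaxSAT solvers can be sketched as follows. This is a simplified MaxWalkSAT-style loop, not the code of any particular solver: it assumes positive clause weights, represents a clause as a list of (variable, wanted value) literals, and uses hypothetical parameter names (`max_flips`, `p_random`). At each step it picks a random unsatisfied clause and flips either a random variable in it or the variable whose flip leaves the least unsatisfied weight.

```python
import random


def satisfied(clause, assignment):
    """A clause is true if any of its (variable, wanted_value) literals matches."""
    return any(assignment[var] == val for var, val in clause)


def unsatisfied_weight(clauses, assignment):
    """Total weight of the clauses the assignment violates."""
    return sum(w for w, clause in clauses if not satisfied(clause, assignment))


def max_walk_sat(clauses, variables, max_flips=1000, p_random=0.5, seed=0):
    rng = random.Random(seed)
    assignment = {v: rng.random() < 0.5 for v in variables}
    best, best_cost = dict(assignment), unsatisfied_weight(clauses, assignment)

    def cost_if_flipped(v):
        # Tentatively flip v, measure the cost, then undo the flip.
        assignment[v] = not assignment[v]
        cost = unsatisfied_weight(clauses, assignment)
        assignment[v] = not assignment[v]
        return cost

    for _ in range(max_flips):
        unsat = [c for w, c in clauses if not satisfied(c, assignment)]
        if not unsat:
            break  # everything is satisfied; cannot improve further
        clause = rng.choice(unsat)
        if rng.random() < p_random:
            var = rng.choice(clause)[0]  # random-walk move
        else:
            var = min((v for v, _ in clause), key=cost_if_flipped)  # greedy move
        assignment[var] = not assignment[var]
        cost = unsatisfied_weight(clauses, assignment)
        if cost < best_cost:
            best, best_cost = dict(assignment), cost
    return best, best_cost


# Hypothetical instance: no assignment satisfies all four clauses.
clauses = [
    (2.0, [("A", True)]),
    (1.0, [("A", False), ("B", True)]),
    (1.0, [("B", True)]),
    (0.5, [("B", False)]),
]
best, best_cost = max_walk_sat(clauses, ["A", "B"])
```

In this instance the cheapest clause to violate is the last one, so the best assignment found sets both A and B to true.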

MC-SAT • Deterministic dependences break MCMC • In practice, even strong probabilistic ones do • Swendsen-Wang: • Introduce aux. vars. u to represent constraints among x • Alternately sample u | x and x | u. • But Swendsen-Wang only works for Ising models • MC-SAT: Generalize S-W to arbitrary clauses • Uses SAT solver to sample x | u. • Orders of magnitude faster than Gibbs sampling, etc.
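A minimal sketch of the MC-SAT loop, under strong simplifying assumptions: all weights are positive, there are no hard clauses, and the "sample x | u" step is done by brute-force enumeration of all states rather than by a SAT-based sampler, which only works for a handful of variables. The clause encoding and function names are illustrative, not taken from any library.

```python
import itertools
import math
import random


def satisfied(clause, state):
    """A clause is true if any of its (variable, wanted_value) literals matches."""
    return any(state[var] == val for var, val in clause)


def mc_sat(clauses, variables, num_samples=5000, seed=0):
    rng = random.Random(seed)
    all_states = [
        dict(zip(variables, values))
        for values in itertools.product([False, True], repeat=len(variables))
    ]
    state = rng.choice(all_states)
    samples = []
    for _ in range(num_samples):
        # Auxiliary step (u | x): keep each clause that the current state
        # satisfies with probability 1 - exp(-w).
        kept = [
            clause
            for w, clause in clauses
            if satisfied(clause, state) and rng.random() < 1.0 - math.exp(-w)
        ]
        # State step (x | u): draw uniformly among states satisfying every kept
        # clause. The current state always qualifies, so the list is non-empty.
        candidates = [s for s in all_states if all(satisfied(c, s) for c in kept)]
        state = rng.choice(candidates)
        samples.append(state)
    return samples


# Hypothetical two-clause model; estimate the marginal probability of A.
clauses = [(1.0, [("A", True)]), (2.0, [("A", False), ("B", True)])]
samples = mc_sat(clauses, ["A", "B"], num_samples=20000, seed=1)
estimate = sum(s["A"] for s in samples) / len(samples)
```

For a model this small the estimate can be checked against the exact marginal obtained by enumerating the four states.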

Lifted Inference • Consider belief propagation (BP) • Often in large problems, many nodes are interchangeable: They send and receive the same messages throughout BP • Basic idea: Group them into supernodes, forming lifted network • Smaller network → Faster inference • Akin to resolution in first-order logic
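One way to find interchangeable nodes is iterative signature refinement: start with nodes grouped by their evidence label, then repeatedly split groups whose members see different neighborhoods, until nothing splits. The sketch below illustrates that grouping step only (it does not run BP), and it is a simplified stand-in rather than the exact supernode construction used in lifted BP. The data layout, the string labels, and the function name `lift` are assumptions.

```python
from collections import defaultdict


def lift(nodes, features, max_iters=100):
    """Group interchangeable nodes.

    nodes: {name: initial_label}, labels must be strings (e.g. evidence status).
    features: list of (kind, (node, node, ...)) with string kinds.
    Returns a sorted list of supernodes, each a sorted list of node names.
    """
    node_color = dict(nodes)
    for _ in range(max_iters):
        # A feature's signature: its kind plus its arguments' colors, in order.
        feat_sig = [(kind, tuple(node_color[n] for n in args)) for kind, args in features]
        # A node's signature: its own color plus the multiset of
        # (feature signature, argument position) pairs it takes part in.
        seen = defaultdict(list)
        for (_, args), sig in zip(features, feat_sig):
            for pos, n in enumerate(args):
                seen[n].append((sig, pos))
        new_sig = {n: (node_color[n], tuple(sorted(seen[n]))) for n in nodes}
        # Rename signatures to small integers to keep them compact.
        ids = {}
        new_color = {n: ids.setdefault(new_sig[n], len(ids)) for n in nodes}
        # Refinement only ever splits groups, so an unchanged count means stable.
        stable = len(set(new_color.values())) == len(set(node_color.values()))
        node_color = new_color
        if stable:
            break
    groups = defaultdict(list)
    for n, color in node_color.items():
        groups[color].append(n)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical star-shaped network: a hub linked to three leaves.
features = [("pair", ("h", leaf)) for leaf in ("l1", "l2", "l3")]
no_evidence = lift({"h": "u", "l1": "u", "l2": "u", "l3": "u"}, features)
with_evidence = lift({"h": "u", "l1": "u", "l2": "u", "l3": "e"}, features)
```

Without evidence the three leaves collapse into one supernode; giving one leaf a different evidence label splits it off from the other two.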

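Belief Propagation • [Figure: graph connecting features (f) and nodes (x)]

Lifted Belief Propagation • [Figure: graph connecting features (f) and nodes (x)]

Lifted Belief Propagation • [Figure: graph connecting features (f) and nodes (x); two annotated terms are functions of edge counts]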

Weight Learning • Pseudo-likelihood + L-BFGS is fast and robust but can give poor inference results • Voted perceptron: Gradient descent + MAP inference • Problem: Multiple modes • Not alleviated by contrastive divergence • Alleviated by MC-SAT • Start each MC-SAT run at previous end state
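For reference, the quantity these methods approximate can be written out. This is the standard log-likelihood gradient of a log-linear model in the notation $w_i$, $n_i(x)$; the symbols $x^{*}_{w}$ (the MAP state under the current weights) and $x^{(s)}$ (the $s$-th of $S$ MC-SAT samples) are introduced here for illustration only.

```latex
\begin{align}
\frac{\partial}{\partial w_i} \log P_w(x)
  &= n_i(x) - \mathbb{E}_w\!\left[ n_i \right] \\
\mathbb{E}_w\!\left[ n_i \right]
  &\approx n_i\!\left(x^{*}_{w}\right)
  && \text{(voted perceptron: counts in the MAP state)} \\
\mathbb{E}_w\!\left[ n_i \right]
  &\approx \frac{1}{S} \sum_{s=1}^{S} n_i\!\left(x^{(s)}\right)
  && \text{(average over MC-SAT samples)}
\end{align}
```

When the distribution has several modes, a single MAP state can be a poor stand-in for the expectation, which is one reading of why sample averages help.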

Weight Learning (contd.) • Problem: Extreme ill-conditioning • Solvable by quasi-Newton, conjugate gradient, etc. • But line searches require exact inference • Stochastic gradient not applicable because data not i.i.d. • Solution: Scaled conjugate gradient • Use Hessian to choose step size • Compute quadratic form inside MC-SAT • Use inverse diagonal Hessian as preconditioner
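The step-size idea can be sketched under the usual log-linear facts: the Hessian of the negative log-likelihood is the covariance matrix of the formula counts, so the quadratic form dᵀHd along a direction d equals the variance of d·n, which can be estimated from samples without building H. The function below is an illustrative sketch, not the authors' implementation; the name `step_along_direction`, the `damping` term, and the toy numbers are assumptions.

```python
def step_along_direction(data_counts, sampled_counts, direction, damping=0.0):
    """Estimate the gradient and a Hessian-based step size from samples.

    data_counts: n_i(x) in the training data, one entry per formula.
    sampled_counts: one count vector per sample (e.g. from MC-SAT).
    direction: search direction d, one entry per formula.
    Returns (gradient of the negative log-likelihood, step size alpha).
    """
    m = len(sampled_counts)
    k = len(data_counts)
    expected = [sum(s[i] for s in sampled_counts) / m for i in range(k)]
    grad = [expected[i] - data_counts[i] for i in range(k)]  # E[n_i] - n_i(x)

    # d^T H d = Var(d . n): project every sample onto d and take the variance.
    proj = [sum(d * n for d, n in zip(direction, s)) for s in sampled_counts]
    mean_proj = sum(proj) / m
    quad = sum((p - mean_proj) ** 2 for p in proj) / m

    g_dot_d = sum(g * d for g, d in zip(grad, direction))
    d_dot_d = sum(d * d for d in direction)
    # Minimizer of the damped quadratic model along d. The denominator is zero
    # only if the samples show no variance along d and damping is zero.
    alpha = -g_dot_d / (quad + damping * d_dot_d)
    return grad, alpha


# Hypothetical one-formula example: data count 2, sampled counts 0 and 2.
grad, alpha = step_along_direction([2.0], [[0.0], [2.0]], direction=[1.0])
_, damped_alpha = step_along_direction([2.0], [[0.0], [2.0]], [1.0], damping=1.0)
```

Because both the gradient and the quadratic form come from the same set of samples, no separate line search with repeated inference is needed in this sketch.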

Structure Learning • Standard inductive logic programming optimizes the wrong thing • But can be used to overgenerate for L1 pruning • Our approach: ILP + Pseudo-likelihood + Structure priors • For each candidate structure change: Start from current weights & relax convergence • Use subsampling to compute sufficient statistics • Search methods: Beam, shortest-first, etc.

Applications to Date • Natural language processing • Information extraction • Entity resolution • Link prediction • Collective classification • Social network analysis • Robot mapping • Activity recognition • Scene analysis • Computational biology • Probabilistic Cyc • Personal assistants • Etc.

Unsupervised Semantic Parsing • Goal • Microsoft buys Powerset. • BUYS(MICROSOFT,POWERSET) • Challenge • Microsoft buys Powerset • Microsoft acquires semantic search engine Powerset • Powerset is acquired by Microsoft Corporation • The Redmond software giant buys Powerset • Microsoft’s purchase of Powerset, … • Recursively cluster expressions composed of similar subexpressions

USP Evaluation • Extract knowledge from biomedical abstracts and answer questions • Substantially outperforms state of the art • Three times as many correct answers; accuracy 88%

Research Directions • Compact representations • Deep architectures • Boolean decision diagrams • Arithmetic circuits • Unified inference procedure • Learning MLNs with many latent variables • Tighter integration of learning and inference • End-to-end NLP system • Complete agent

Resources • Open-source software/Web site: Alchemy (alchemy.cs.washington.edu) • Learning and inference algorithms • Tutorials, manuals, etc. • MLNs, datasets, etc. • Publications • Book: Domingos & Lowd, Markov Logic, Morgan & Claypool, 2009.
