Markov Logic Networks: A Step Towards a Unified Theory of Learning and Cognition Pedro Domingos Dept. of Computer Science & Eng., University of Washington Joint work with Jesse Davis, Stanley Kok, Daniel Lowd, Hoifung Poon, Matt Richardson, Parag Singla, Marc Sumner
One Algorithm • Observation: The cortex has the same architecture throughout • Hypothesis: A single learning/inference algorithm underlies all cognition • Let’s discover it!
The Neuroscience Approach • Map the brain and figure out how it works • Problem: We don’t know nearly enough
The Engineering Approach • Pick a task (e.g., object recognition) • Figure out how to solve it • Generalize to other tasks • Problem: Any one task is too impoverished
The Foundational Approach • Consider all tasks the brain does • Figure out what they have in common • Formalize, test and repeat • Advantage: Plenty of clues • Where to start? “The grand aim of science is to cover the greatest number of experimental facts by logical deduction from the smallest number of hypotheses or axioms.” — Albert Einstein
Recurring Themes • Noise, uncertainty, incomplete information→ Probability / Graphical models • Complexity, many objects & relations→ First-order logic
Research Plan • Unify graphical models and first-order logic • Develop learning and inference algorithms • Apply to wide variety of problems • This talk: Markov logic networks
Markov Logic Networks • MLN = Set of first-order formulas with weights • Formula = Feature template (Vars → Objects) • E.g., Ising model: Up(x) ^ Neighbor(x,y) => Up(y) • P(X = x) = (1/Z) exp(Σ_i w_i n_i(x)), where w_i = weight of formula i and n_i(x) = no. of true instances of formula i in x • Most graphical models are special cases • First-order logic is the infinite-weight limit
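To make the semantics concrete, here is a minimal sketch (not Alchemy's API; the domain, evidence, and weight are made up) that scores worlds under a one-formula MLN with the Ising-style clause above:

```python
# Toy MLN with the single formula Up(x) ^ Neighbor(x,y) => Up(y), weight w.
import math
from itertools import product

objects = ["A", "B", "C"]             # hypothetical domain
neighbor = {("A", "B"), ("B", "C")}   # hypothetical evidence
w = 1.5                               # formula weight (illustrative)

def n_true(up):
    """Number of true groundings of Up(x) ^ Neighbor(x,y) => Up(y)."""
    count = 0
    for x, y in product(objects, objects):
        body = up[x] and (x, y) in neighbor
        count += (not body) or up[y]  # truth value of the implication
    return count

def unnormalized_score(up):
    return math.exp(w * n_true(up))   # exp(sum_i w_i n_i(x))

# Z sums over all 2^|objects| truth assignments to Up.
worlds = [dict(zip(objects, bits)) for bits in product([False, True], repeat=3)]
Z = sum(unnormalized_score(u) for u in worlds)
print(unnormalized_score({"A": True, "B": True, "C": True}) / Z)
```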
Weighted Satisfiability • SAT: Find truth assignment that makes all formulas (clauses) true • Huge amount of research on this problem • State of the art: Millions of vars/clauses in minutes • MaxSAT: Make as many clauses true as possible • Weighted MaxSAT: Clauses have weights; maximize satisfied weight • MAP inference in MLNs is just weighted MaxSAT • Best current solver: MaxWalkSAT
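A hedged sketch of MaxWalkSAT-style local search (the clause encoding and parameters are illustrative, not the reference implementation): with probability p make a noisy random flip, otherwise greedily flip the variable in a random unsatisfied clause that most reduces the unsatisfied weight.

```python
# Clauses are (weight, [literals]); a literal is (var, is_positive).
import random

def cost(clauses, assign):
    """Total weight of unsatisfied clauses."""
    return sum(w for w, lits in clauses
               if not any(assign[v] == pos for v, pos in lits))

def maxwalksat(clauses, variables, max_flips=10000, p_noise=0.5):
    assign = {v: random.random() < 0.5 for v in variables}
    best = dict(assign)
    for _ in range(max_flips):
        unsat = [c for c in clauses
                 if not any(assign[v] == pos for v, pos in c[1])]
        if not unsat:
            return assign                   # every clause satisfied
        _, lits = random.choice(unsat)      # random unsatisfied clause
        if random.random() < p_noise:
            var = random.choice(lits)[0]    # noisy move: random variable
        else:                               # greedy move: best flip in clause
            def flip_cost(v):
                assign[v] = not assign[v]
                c = cost(clauses, assign)
                assign[v] = not assign[v]
                return c
            var = min((v for v, _ in lits), key=flip_cost)
        assign[var] = not assign[var]
        if cost(clauses, assign) < cost(clauses, best):
            best = dict(assign)
    return best
```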
MC-SAT • Deterministic dependencies break MCMC • In practice, even strong probabilistic ones do • Swendsen-Wang: Introduce auxiliary variables u to represent constraints among x; alternately sample u | x and x | u • But Swendsen-Wang only works for Ising models • MC-SAT: Generalize S-W to arbitrary clauses • Uses a SAT solver to sample x | u • Orders of magnitude faster than Gibbs sampling, etc.
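A hedged sketch of a single MC-SAT step on a toy problem (brute-force enumeration stands in for the SampleSAT sampler used at scale; the clause encoding follows the sketch above):

```python
# Clauses are (weight, [literals]); a literal is (var, is_positive).
import math
import random
from itertools import product

def satisfied(lits, assign):
    return any(assign[v] == pos for v, pos in lits)

def mcsat_step(clauses, variables, assign):
    # Auxiliary variables u: each clause satisfied by the current state
    # is kept as a hard constraint with probability 1 - exp(-w).
    M = [lits for w, lits in clauses
         if satisfied(lits, assign) and random.random() < 1 - math.exp(-w)]
    # Sample uniformly from the assignments satisfying all of M
    # (enumeration here; the real algorithm calls a SAT sampler).
    worlds = [dict(zip(variables, bits))
              for bits in product([False, True], repeat=len(variables))]
    return random.choice([world for world in worlds
                          if all(satisfied(lits, world) for lits in M)])
```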
Lifted Inference • Consider belief propagation (BP) • Often in large problems, many nodes are interchangeable: they send and receive the same messages throughout BP • Basic idea: Group them into supernodes, forming a lifted network • Smaller network → Faster inference • Akin to resolution in first-order logic
[Figure: Belief propagation on a factor graph; features (f) on one side, nodes (x) on the other]
[Figure: Lifted belief propagation; interchangeable features (f) and nodes (x) merged into superfeatures and supernodes]
[Figure: Lifted belief propagation; messages are functions of edge counts between supernodes and superfeatures]
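For reference, the standard BP updates, and the lifted variant in which each message is exponentiated by the number of ground edges it stands for (roughly as in Singla & Domingos, 2008):

```latex
% Standard BP on the factor graph (features f, nodes x):
\mu_{f \to x}(x) = \sum_{\mathbf{x}_f \setminus x} f(\mathbf{x}_f)
    \prod_{y \in nb(f) \setminus \{x\}} \mu_{y \to f}(y), \qquad
\mu_{x \to f}(x) = \prod_{h \in nb(x) \setminus \{f\}} \mu_{h \to x}(x)

% Lifted BP: one message per (supernode, superfeature) pair,
% raised to the edge counts n(x,f) between them:
\mu_{x \to f}(x) = \mu_{f \to x}(x)^{\,n(x,f)-1}
    \prod_{h \in nb(x) \setminus \{f\}} \mu_{h \to x}(x)^{\,n(x,h)}
```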
Weight Learning • Pseudo-likelihood + L-BFGS is fast and robust but can give poor inference results • Voted perceptron: Gradient descent + MAP inference • Problem: Multiple modes • Not alleviated by contrastive divergence • Alleviated by MC-SAT • Start each MC-SAT run at the previous end state
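The gradient being followed is the standard MLN log-likelihood gradient, where n_i are the clause counts from the MLN equation above; voted perceptron approximates the expectation with counts in the MAP state, while MC-SAT estimates it from samples:

```latex
\frac{\partial}{\partial w_i} \log P_w(X = x)
  \;=\; n_i(x) \;-\; \mathbb{E}_w\!\left[\, n_i(X) \,\right]
```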
Weight Learning (contd.) • Problem: Extreme ill-conditioning • Solvable by quasi-Newton, conjugate gradient, etc. • But line searches require exact inference • Stochastic gradient not applicable because data not i.i.d. • Solution: Scaled conjugate gradient • Use Hessian to choose step size • Compute quadratic form inside MC-SAT • Use inverse diagonal Hessian as preconditioner
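One way to read the last three bullets, assuming the damped step-size rule of scaled conjugate gradient (λ is a damping term): minimizing the negative log-likelihood, the Hessian is the covariance of the clause counts, so the quadratic form d^T H d can be estimated from the same MC-SAT samples used for the gradient g:

```latex
\alpha \;=\; \frac{-\,\mathbf{g}^{\top}\mathbf{d}}
                  {\mathbf{d}^{\top}\mathbf{H}\,\mathbf{d}
                   \;+\; \lambda\,\mathbf{d}^{\top}\mathbf{d}},
\qquad
\mathbf{H} \;=\; \operatorname{Cov}_w\!\bigl[\mathbf{n}(X)\bigr]
```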
Structure Learning • Standard inductive logic programming optimizes the wrong thing • But can be used to overgenerate for L1 pruning • Our approach: ILP + Pseudo-likelihood + Structure priors • For each candidate structure change: Start from current weights & relax convergence • Use subsampling to compute sufficient statistics • Search methods: Beam, shortest-first, etc.
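As a hedged illustration of the search loop only (score and revisions are hypothetical stand-ins for the pseudo-likelihood scorer and the clause-revision generator):

```python
# Beam search over candidate MLN structures.
def beam_search(initial_mln, score, revisions, beam_width=5, max_steps=20):
    beam = [initial_mln]
    best = initial_mln
    for _ in range(max_steps):
        # Expand every structure in the beam by all candidate revisions.
        candidates = [child for mln in beam for child in revisions(mln)]
        if not candidates:
            break
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
        if score(beam[0]) <= score(best):
            break                       # no improving revision found
        best = beam[0]
    return best
```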
Applications to Date • Natural language processing • Information extraction • Entity resolution • Link prediction • Collective classification • Social network analysis • Robot mapping • Activity recognition • Scene analysis • Computational biology • Probabilistic Cyc • Personal assistants • Etc.
Unsupervised Semantic Parsing • Goal: “Microsoft buys Powerset.” → BUYS(MICROSOFT, POWERSET) • Challenge: Many surface forms express the same fact: “Microsoft buys Powerset” / “Microsoft acquires semantic search engine Powerset” / “Powerset is acquired by Microsoft Corporation” / “The Redmond software giant buys Powerset” / “Microsoft’s purchase of Powerset, …” • USP: Recursively cluster expressions composed of similar subexpressions • Evaluation: Extract knowledge from biomedical abstracts and answer questions • Substantially outperforms state of the art: three times as many correct answers; accuracy 88%
Research Directions • Compact representations • Deep architectures • Boolean decision diagrams • Arithmetic circuits • Unified inference procedure • Learning MLNs with many latent variables • Tighter integration of learning and inference • End-to-end NLP system • Complete agent
Resources • Open-source software / Web site: Alchemy • Learning and inference algorithms • Tutorials, manuals, etc. • MLNs, datasets, etc. • Publications • Book: Domingos & Lowd, Markov Logic, Morgan & Claypool, 2009 • alchemy.cs.washington.edu