Markov Logic Networks: A Step Towards a Unified Theory of Learning and Cognition Pedro Domingos Dept. of Computer Science & Eng., University of Washington Joint work with Jesse Davis, Stanley Kok, Daniel Lowd, Hoifung Poon, Matt Richardson, Parag Singla, Marc Sumner
One Algorithm • Observation: The cortex has the same architecture throughout • Hypothesis: A single learning/inference algorithm underlies all cognition • Let’s discover it!
The Neuroscience Approach • Map the brain and figure out how it works • Problem: We don’t know nearly enough
The Engineering Approach • Pick a task (e.g., object recognition) • Figure out how to solve it • Generalize to other tasks • Problem: Any one task is too impoverished
The Foundational Approach • Consider all tasks the brain does • Figure out what they have in common • Formalize, test and repeat • Advantage: Plenty of clues • Where to start? “The grand aim of science is to cover the greatest number of experimental facts by logical deduction from the smallest number of hypotheses or axioms.” — Albert Einstein
Recurring Themes • Noise, uncertainty, incomplete information→ Probability / Graphical models • Complexity, many objects & relations→ First-order logic
Research Plan • Unify graphical models and first-order logic • Develop learning and inference algorithms • Apply to wide variety of problems • This talk: Markov logic networks
Markov Logic Networks • MLN = Set of first-order formulas with weights • Formula = Feature template (Vars → Objects) • E.g., Ising model: Up(x) ^ Neighbor(x,y) => Up(y) • P(X = x) = (1/Z) exp(Σ_i w_i n_i(x)), where w_i = weight of formula i and n_i(x) = no. of true instances of formula i in x • Most graphical models are special cases • First-order logic is the infinite-weight limit
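To make the semantics concrete, here is a minimal sketch (not Alchemy's API; the domain, evidence, and weight are made up) that scores worlds under a one-formula MLN with the Ising-style clause above:

```python
# Toy MLN with the single formula Up(x) ^ Neighbor(x,y) => Up(y), weight w.
import math
from itertools import product

objects = ["A", "B", "C"]             # hypothetical domain
neighbor = {("A", "B"), ("B", "C")}   # hypothetical evidence
w = 1.5                               # formula weight (illustrative)

def n_true(up):
    """Number of true groundings of Up(x) ^ Neighbor(x,y) => Up(y)."""
    count = 0
    for x, y in product(objects, objects):
        body = up[x] and (x, y) in neighbor
        count += (not body) or up[y]  # truth value of the implication
    return count

def unnormalized_score(up):
    return math.exp(w * n_true(up))   # exp(sum_i w_i n_i(x))

# Z sums over all 2^|objects| truth assignments to Up.
worlds = [dict(zip(objects, bits)) for bits in product([False, True], repeat=3)]
Z = sum(unnormalized_score(u) for u in worlds)
print(unnormalized_score({"A": True, "B": True, "C": True}) / Z)
```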
Weighted Satisfiability • SAT: Find truth assignment that makes all formulas (clauses) true • Huge amount of research on this problem • State of the art: Millions of vars/clauses in minutes • MaxSAT: Make as many clauses true as possible • Weighted MaxSAT: Clauses have weights; maximize satisfied weight • MAP inference in MLNs is just weighted MaxSAT • Best current solver: MaxWalkSAT
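A hedged sketch of MaxWalkSAT-style local search (the clause encoding and parameters are illustrative, not the reference implementation): with probability p make a noisy random flip, otherwise greedily flip the variable in a random unsatisfied clause that most reduces the unsatisfied weight.

```python
# Clauses are (weight, [literals]); a literal is (var, is_positive).
import random

def cost(clauses, assign):
    """Total weight of unsatisfied clauses."""
    return sum(w for w, lits in clauses
               if not any(assign[v] == pos for v, pos in lits))

def maxwalksat(clauses, variables, max_flips=10000, p_noise=0.5):
    assign = {v: random.random() < 0.5 for v in variables}
    best = dict(assign)
    for _ in range(max_flips):
        unsat = [c for c in clauses
                 if not any(assign[v] == pos for v, pos in c[1])]
        if not unsat:
            return assign                   # every clause satisfied
        _, lits = random.choice(unsat)      # random unsatisfied clause
        if random.random() < p_noise:
            var = random.choice(lits)[0]    # noisy move: random variable
        else:                               # greedy move: best flip in clause
            def flip_cost(v):
                assign[v] = not assign[v]
                c = cost(clauses, assign)
                assign[v] = not assign[v]
                return c
            var = min((v for v, _ in lits), key=flip_cost)
        assign[var] = not assign[var]
        if cost(clauses, assign) < cost(clauses, best):
            best = dict(assign)
    return best
```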
MC-SAT • Deterministic dependencies break MCMC • In practice, even strong probabilistic ones do • Swendsen-Wang: Introduce auxiliary variables u to represent constraints among x; alternately sample u | x and x | u • But Swendsen-Wang only works for Ising models • MC-SAT: Generalize S-W to arbitrary clauses • Uses a SAT solver to sample x | u • Orders of magnitude faster than Gibbs sampling, etc.
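A hedged sketch of a single MC-SAT step on a toy problem (brute-force enumeration stands in for the SampleSAT sampler used at scale; the clause encoding follows the sketch above):

```python
# Clauses are (weight, [literals]); a literal is (var, is_positive).
import math
import random
from itertools import product

def satisfied(lits, assign):
    return any(assign[v] == pos for v, pos in lits)

def mcsat_step(clauses, variables, assign):
    # Auxiliary variables u: each clause satisfied by the current state
    # is kept as a hard constraint with probability 1 - exp(-w).
    M = [lits for w, lits in clauses
         if satisfied(lits, assign) and random.random() < 1 - math.exp(-w)]
    # Sample uniformly from the assignments satisfying all of M
    # (enumeration here; the real algorithm calls a SAT sampler).
    worlds = [dict(zip(variables, bits))
              for bits in product([False, True], repeat=len(variables))]
    return random.choice([world for world in worlds
                          if all(satisfied(lits, world) for lits in M)])
```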
Lifted Inference • Consider belief propagation (BP) • Often in large problems, many nodes are interchangeable: they send and receive the same messages throughout BP • Basic idea: Group them into supernodes, forming a lifted network • Smaller network → Faster inference • Akin to resolution in first-order logic
[Figure: Belief propagation on a factor graph; features (f) on one side, nodes (x) on the other]
[Figure: Lifted belief propagation; interchangeable features (f) and nodes (x) merged into superfeatures and supernodes]
[Figure: Lifted belief propagation; messages are functions of edge counts between supernodes and superfeatures]
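For reference, the standard BP updates, and the lifted variant in which each message is exponentiated by the number of ground edges it stands for (roughly as in Singla & Domingos, 2008):

```latex
% Standard BP on the factor graph (features f, nodes x):
\mu_{f \to x}(x) = \sum_{\mathbf{x}_f \setminus x} f(\mathbf{x}_f)
    \prod_{y \in nb(f) \setminus \{x\}} \mu_{y \to f}(y), \qquad
\mu_{x \to f}(x) = \prod_{h \in nb(x) \setminus \{f\}} \mu_{h \to x}(x)

% Lifted BP: one message per (supernode, superfeature) pair,
% raised to the edge counts n(x,f) between them:
\mu_{x \to f}(x) = \mu_{f \to x}(x)^{\,n(x,f)-1}
    \prod_{h \in nb(x) \setminus \{f\}} \mu_{h \to x}(x)^{\,n(x,h)}
```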
Weight Learning • Pseudo-likelihood + L-BFGS is fast and robust but can give poor inference results • Voted perceptron: Gradient descent + MAP inference • Problem: Multiple modes • Not alleviated by contrastive divergence • Alleviated by MC-SAT • Start each MC-SAT run at the previous end state
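The gradient being followed is the standard MLN log-likelihood gradient, where n_i are the clause counts from the MLN equation above; voted perceptron approximates the expectation with counts in the MAP state, while MC-SAT estimates it from samples:

```latex
\frac{\partial}{\partial w_i} \log P_w(X = x)
  \;=\; n_i(x) \;-\; \mathbb{E}_w\!\left[\, n_i(X) \,\right]
```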
Weight Learning (contd.) • Problem: Extreme ill-conditioning • Solvable by quasi-Newton, conjugate gradient, etc. • But line searches require exact inference • Stochastic gradient not applicable because data not i.i.d. • Solution: Scaled conjugate gradient • Use Hessian to choose step size • Compute quadratic form inside MC-SAT • Use inverse diagonal Hessian as preconditioner
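One way to read the last three bullets, assuming the damped step-size rule of scaled conjugate gradient (λ is a damping term): minimizing the negative log-likelihood, the Hessian is the covariance of the clause counts, so the quadratic form d^T H d can be estimated from the same MC-SAT samples used for the gradient g:

```latex
\alpha \;=\; \frac{-\,\mathbf{g}^{\top}\mathbf{d}}
                  {\mathbf{d}^{\top}\mathbf{H}\,\mathbf{d}
                   \;+\; \lambda\,\mathbf{d}^{\top}\mathbf{d}},
\qquad
\mathbf{H} \;=\; \operatorname{Cov}_w\!\bigl[\mathbf{n}(X)\bigr]
```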
Structure Learning • Standard inductive logic programming optimizes the wrong thing • But can be used to overgenerate for L1 pruning • Our approach: ILP + Pseudo-likelihood + Structure priors • For each candidate structure change: Start from current weights & relax convergence • Use subsampling to compute sufficient statistics • Search methods: Beam, shortest-first, etc.
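As a hedged illustration of the search loop only (score and revisions are hypothetical stand-ins for the pseudo-likelihood scorer and the clause-revision generator):

```python
# Beam search over candidate MLN structures.
def beam_search(initial_mln, score, revisions, beam_width=5, max_steps=20):
    beam = [initial_mln]
    best = initial_mln
    for _ in range(max_steps):
        # Expand every structure in the beam by all candidate revisions.
        candidates = [child for mln in beam for child in revisions(mln)]
        if not candidates:
            break
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
        if score(beam[0]) <= score(best):
            break                       # no improving revision found
        best = beam[0]
    return best
```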
Applications to Date • Natural language processing • Information extraction • Entity resolution • Link prediction • Collective classification • Social network analysis • Robot mapping • Activity recognition • Scene analysis • Computational biology • Probabilistic Cyc • Personal assistants • Etc.
Unsupervised Semantic Parsing • Goal: “Microsoft buys Powerset.” → BUYS(MICROSOFT, POWERSET) • Challenge: Many surface forms express the same fact: “Microsoft buys Powerset” / “Microsoft acquires semantic search engine Powerset” / “Powerset is acquired by Microsoft Corporation” / “The Redmond software giant buys Powerset” / “Microsoft’s purchase of Powerset, …” • USP: Recursively cluster expressions composed of similar subexpressions • Evaluation: Extract knowledge from biomedical abstracts and answer questions • Substantially outperforms state of the art: three times as many correct answers; accuracy 88%
Research Directions • Compact representations • Deep architectures • Boolean decision diagrams • Arithmetic circuits • Unified inference procedure • Learning MLNs with many latent variables • Tighter integration of learning and inference • End-to-end NLP system • Complete agent
Resources • Open-source software / Web site: Alchemy • Learning and inference algorithms • Tutorials, manuals, etc. • MLNs, datasets, etc. • Publications • Book: Domingos & Lowd, Markov Logic, Morgan & Claypool, 2009 • alchemy.cs.washington.edu