Probabilistic Inference in PRISM
Taisuke Sato
Tokyo Institute of Technology
Problem
• Statistical machine learning is a labor-intensive process: a trial-and-error cycle of {modeling, learning, evaluation}*
• Pains of deriving and implementing model-specific learning algorithms and model-specific probabilistic inference
[Figure: Model 1, Model 2, ..., Model n, each paired with its own model-specific learning algorithm EM1, EM2, ..., EMn (EM, VB, MCMC, ...)]
Our solution
• Develop a high-level modeling language that offers universal learning and inference methods applicable to every model
• The user concentrates on modeling, and the rest (learning and inference) is taken care of by the system
[Figure: Model 1, Model 2, ..., Model n are all written in one modeling language, which supplies EM, VB, MCMC, ...]
PRISM (http://sato-www.cs.titech.ac.jp/prism/)
• Logic-based high-level modeling language
• Its generic inference/learning methods subsume standard algorithms such as forward-backward (FB) for HMMs and belief propagation (BP) for Bayesian networks
[Figure: probabilistic models (Bayesian network, HMM, PCFG, new model, ...) are fed to the PRISM system, which provides the learning methods EM/MAP, VT, VB, VBVT and MCMC]
Basic ideas
• Semantics
  • program = Turing machine + probabilistic choice + Dirichlet prior
  • denotation = a probability measure over possible worlds
• Propositionalized probability computation (PPC)
  • programs are written at the predicate logic level
  • probabilities are computed at the propositional logic level
• Dynamic programming for PPC
  • proof search generates a directed graph (explanation graph)
  • probabilities are computed from bottom to top in the graph
• Discriminative use
  • generatively define a model by a PRISM program and use it discriminatively for better prediction performance
ABO blood type program
    values(abo,[a,b,o],[0.5,0.2,0.3]).      % msw(abo,a) is true with prob. 0.5
    btype(X):- gtype(Gf,Gm), pg_table(X,[Gf,Gm]).
    pg_table(X,GT):-
        ( (X=a;X=b),(GT=[X,o];GT=[o,X];GT=[X,X])
        ; X=o,GT=[o,o]
        ; X=ab,(GT=[a,b];GT=[b,a]) ).
    gtype(Gf,Gm):- msw(abo,Gf),msw(abo,Gm). % probabilistic primitives simulate gene inheritance from father (left) and mother (right)
[Figure: father with genes (a, b), blood type AB; mother with genes (a, o), blood type A; child with genes (b, o), blood type B]
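The generative reading of the blood type program can be sketched outside PRISM. The Python below is only an illustration under stated assumptions: `phenotype`, `sample_btype` and `estimate` are hypothetical helper names, not PRISM predicates, and the gene probabilities are the ones declared by `values(abo,...)` on the slide. It draws one gene from each parent, maps the pair to a blood type as `pg_table` does, and estimates the blood type frequencies by forward sampling.

```python
import random
from collections import Counter

GENES = ["a", "b", "o"]
PROBS = [0.5, 0.2, 0.3]  # from values(abo,[a,b,o],[0.5,0.2,0.3])


def phenotype(gf, gm):
    """Map a (father gene, mother gene) pair to a blood type, like pg_table."""
    pair = {gf, gm}
    if pair == {"a", "b"}:
        return "ab"
    if "a" in pair:
        return "a"
    if "b" in pair:
        return "b"
    return "o"


def sample_btype(rng):
    """Draw two genes independently (two msw choices) and return the type."""
    gf, gm = rng.choices(GENES, weights=PROBS, k=2)
    return phenotype(gf, gm)


def estimate(n, seed=0):
    """Relative frequency of each blood type over n forward samples."""
    rng = random.Random(seed)
    counts = Counter(sample_btype(rng) for _ in range(n))
    return {t: counts[t] / n for t in ("a", "b", "o", "ab")}
```

With enough samples, the frequency returned for type `a` should approach the exact value 0.5*0.5 + 2*0.5*0.3 = 0.55.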
Propositionalized probability computation
• The explanation graph for btype(a) explains how btype(a) is proved by the probabilistic choices made by msw atoms:
    btype(a) <=> gtype(a,a) v gtype(a,o) v gtype(o,a)    0.55 = 0.25 + 0.15 + 0.15
    gtype(a,a) <=> msw(abo,a) & msw(abo,a)               0.25 = 0.5 x 0.5
    gtype(a,o) <=> msw(abo,a) & msw(abo,o)               0.15 = 0.5 x 0.3
    gtype(o,a) <=> msw(abo,o) & msw(abo,a)               0.15 = 0.3 x 0.5
• Sum-product computation of probabilities proceeds in a bottom-up manner, using the probabilities assigned to msw atoms (msw(abo,a): 0.5, msw(abo,o): 0.3)
• The explanation graph is acyclic, so dynamic programming (DP) is possible
• PPC + DP subsumes forward-backward, belief propagation and inside-outside computation
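The sum-product pass over the explanation graph for btype(a) can be sketched as follows. This is a minimal Python illustration, not PRISM's implementation: the dictionary encoding of the graph and the function name `probability` are assumptions made for the example. Disjuncts are summed on the assumption that they are mutually exclusive, and msw atoms within a conjunction are multiplied on the assumption that they are independent. Memoized recursion is used here, which on an acyclic graph gives the same values as a bottom-up sweep.

```python
# Each defined atom maps to a list of disjuncts; each disjunct is a
# conjunction (list) of child atoms. This encoding is hypothetical.
GRAPH = {
    "btype(a)": [["gtype(a,a)"], ["gtype(a,o)"], ["gtype(o,a)"]],
    "gtype(a,a)": [["msw(abo,a)", "msw(abo,a)"]],
    "gtype(a,o)": [["msw(abo,a)", "msw(abo,o)"]],
    "gtype(o,a)": [["msw(abo,o)", "msw(abo,a)"]],
}
SWITCH_PROBS = {"msw(abo,a)": 0.5, "msw(abo,b)": 0.2, "msw(abo,o)": 0.3}


def probability(node, graph, leaf_probs, cache=None):
    """Sum over disjuncts of the product over conjuncts, with memoization."""
    if cache is None:
        cache = {}
    if node in leaf_probs:  # msw atom: probability is a parameter
        return leaf_probs[node]
    if node not in cache:
        total = 0.0
        for conjunction in graph[node]:
            p = 1.0
            for child in conjunction:
                p *= probability(child, graph, leaf_probs, cache)
            total += p
        cache[node] = total  # each node is computed once (the DP part)
    return cache[node]
```

Calling `probability("btype(a)", GRAPH, SWITCH_PROBS)` gives approximately 0.55, the value annotated at the root of the graph.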
Learning
• A program defines a joint distribution P(x,y|θ), where x is hidden and y is observed
  • e.g. P(msw(abo,a),..btype(a),… |θa,θb,θo) where θa+θb+θo=1
• Learning θ from observed data y by maximizing
  • P(y|θ): MLE/MAP
  • P(x*,y|θ) where x* = argmax_x P(x,y|θ): Viterbi training (VT)
• From a Bayesian point of view, a program defines the marginal likelihood P(x,y|α) = ∫ P(x,y|θ) p(θ|α) dθ, where α is the hyperparameter of the Dirichlet prior
• We wish to compute
  • the predictive distribution P(x|y,α) = ∫ P(x|y,θ) p(θ|y,α) dθ
  • the marginal likelihood P(y|α) = Σ_x ∫ P(x,y|θ) p(θ|α) dθ
• Both need approximation
  • Variational Bayes: VB, VB-VT
  • MCMC: Metropolis-Hastings
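To make the MLE objective on this slide concrete, here is a minimal EM sketch in Python for the blood type model, where the hidden x is the gene pair behind each observed blood type y. It is an illustration under assumptions, not PRISM's learning routine: the function names and the explicit enumeration of gene pairs are hypothetical, and the iteration count and starting point are arbitrary, so its numbers need not match the output of PRISM's `learn`.

```python
import math
from itertools import product

GENES = ("a", "b", "o")


def phenotype(gf, gm):
    """Blood type produced by a gene pair (same mapping as pg_table)."""
    pair = {gf, gm}
    if pair == {"a", "b"}:
        return "ab"
    if "a" in pair:
        return "a"
    if "b" in pair:
        return "b"
    return "o"


def explanations(btype):
    """All gene pairs (hidden x) that explain an observed blood type (y)."""
    return [(gf, gm) for gf, gm in product(GENES, repeat=2)
            if phenotype(gf, gm) == btype]


def log_likelihood(data, theta):
    """log P(y | theta), summing each observation over its explanations."""
    return sum(
        math.log(sum(theta[gf] * theta[gm] for gf, gm in explanations(y)))
        for y in data
    )


def em_step(data, theta):
    """E-step: expected gene counts; M-step: renormalize them."""
    counts = {g: 0.0 for g in GENES}
    for y in data:
        expls = explanations(y)
        weights = [theta[gf] * theta[gm] for gf, gm in expls]
        total = sum(weights)
        for (gf, gm), w in zip(expls, weights):
            counts[gf] += w / total  # posterior weight of this explanation
            counts[gm] += w / total
    n = sum(counts.values())
    return {g: counts[g] / n for g in GENES}


def em(data, theta, iters=50):
    """Run a fixed number of EM steps; return parameters and the LL trace."""
    history = [log_likelihood(data, theta)]
    for _ in range(iters):
        theta = em_step(data, theta)
        history.append(log_likelihood(data, theta))
    return theta, history
```

For example, `em(["a", "a", "ab", "o"], {"a": 0.5, "b": 0.2, "o": 0.3})` returns parameters that sum to 1 together with a log-likelihood trace that never decreases, which is the defining property of EM.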
Sample session 1 - Expl. graph and prob. computation
(prism/1, show_sw/0, probf/1 and prob/2 are built-in predicates)
    | ?- prism(blood)
    loading::blood.psm.out
    | ?- show_sw
    Switch gene: unfixed_p: a (p: 0.500000000) b (p: 0.200000000) o (p: 0.300000000)
    | ?- probf(btype(a))
    btype(a) <=> gtype(a,a) v gtype(a,o) v gtype(o,a)
    gtype(a,a) <=> msw(gene,a) & msw(gene,a)
    gtype(a,o) <=> msw(gene,a) & msw(gene,o)
    gtype(o,a) <=> msw(gene,o) & msw(gene,a)
    | ?- prob(btype(a),P)
    P = 0.55
Sample session 2 - MLE and Viterbi inference
    | ?- D=[btype(a),btype(a),btype(ab),btype(o)],learn(D)
    Exporting switch information to the EM routine ... done
    #em-iters: 0(4) (Converged: -4.965121886)
    Statistics on learning:
      Graph size: 18
      Number of switches: 1
      Number of switch instances: 3
      Number of iterations: 4
      Final log likelihood: -4.965121886
    | ?- prob(btype(a),P)
    P = 0.598211
    | ?- viterbif(btype(a))
    btype(a) <= gtype(a,a)
    gtype(a,a) <= msw(gene,a) & msw(gene,a)
Sample session 3 - Bayes inference by MCMC
    | ?- D=[btype(a), btype(a), btype(ab), btype(o)],
         marg_mcmc_full(D,[burn_in(1000),end(10000),skip(5)],[VFE,ELM]),
         marg_exact(D,LogM)
    VFE = -5.54836
    ELM = -5.48608
    LogM = -5.48578
    | ?- D=[btype(a), btype(a), btype(ab), btype(o)],
         predict_mcmc_full(D,[btype(a)],[[_,E,_]]),
         print_graph(E,[lr('<=')])
    btype(a) <= gtype(a,a)
    gtype(a,a) <= msw(gene,a) & msw(gene,a)
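For a data set this small, the exact log marginal likelihood that the MCMC estimate is compared against can be computed by brute force. The Python sketch below is an illustration under assumptions, not PRISM's `marg_exact`: it enumerates every joint choice of gene pairs for the observations and integrates the parameters out in closed form under a Dirichlet prior. The function names and the hyperparameter values passed in are hypothetical. The session does not show which prior PRISM used, so the result is not expected to reproduce the LogM value above.

```python
import math
from itertools import product

GENES = ("a", "b", "o")
# Gene pairs (father, mother) that explain each observed blood type.
EXPLANATIONS = {
    "a": [("a", "a"), ("a", "o"), ("o", "a")],
    "b": [("b", "b"), ("b", "o"), ("o", "b")],
    "ab": [("a", "b"), ("b", "a")],
    "o": [("o", "o")],
}


def log_dirichlet_multinomial(counts, alpha):
    """log of the integral of prod_g theta_g^counts[g] under Dirichlet(alpha)."""
    a0 = sum(alpha[g] for g in GENES)
    n = sum(counts[g] for g in GENES)
    result = math.lgamma(a0) - math.lgamma(a0 + n)
    for g in GENES:
        result += math.lgamma(alpha[g] + counts[g]) - math.lgamma(alpha[g])
    return result


def log_marginal_likelihood(data, alpha):
    """log P(y | alpha): sum over all hidden gene assignments x."""
    terms = []
    for assignment in product(*(EXPLANATIONS[y] for y in data)):
        counts = {g: 0 for g in GENES}
        for gf, gm in assignment:
            counts[gf] += 1
            counts[gm] += 1
        terms.append(log_dirichlet_multinomial(counts, alpha))
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))
```

The enumeration grows exponentially with the number of observations (18 joint assignments for the four observations in the session), which is why approximations such as VB and MCMC are needed for realistic data.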
Summary
• PRISM = Probabilistic Prolog for statistical machine learning
• Forward sampling
• Exact probability computation
• Parameter learning
  • MLE/MAP
  • VT
• Bayesian inference
  • VB
  • VBVT
  • MCMC
• Viterbi inference
• Model score (BIC, Cheeseman-Stutz, VFE)
• Smoothing
• Current version 2.1