
Being Bayesian about Network Structure


Presentation Transcript


  1. Being Bayesian about Network Structure Nir Friedman (Hebrew Univ.), Daphne Koller (Stanford Univ.)

  2. Structure Discovery • Current practice: model selection • Pick a single model (of high score) • Use that model to represent domain structure • Enough data ⇒ the “right” model is overwhelmingly likely • But what about the rest of the time? • Many high-scoring models • An answer based on one model is often useless • Bayesian model averaging is the Bayesian ideal: average a feature of G, e.g., the edge X→Y, over all structures (see the formula below)
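The model averaging this slide refers to computes the posterior of a structural feature f (e.g., the indicator that edge X→Y is present) by summing over all structures; a standard statement of that quantity is:

```latex
P(f \mid D) \;=\; \sum_{G} f(G)\, P(G \mid D)
            \;=\; \sum_{G} f(G)\, \frac{P(D \mid G)\, P(G)}{P(D)} .
```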

  3. Model Averaging • Unfortunately, it is intractable: • # of possible structures is superexponential • That’s why no one really does it* • Our contribution: • Closed form solution for fixed ordering over nodes • MCMC over orderings for general case • Faster convergence, robust results. * Exceptions: Madigan & Raftery, Madigan & York; see below

  4. Fixed Ordering Suppose that • We know the ordering of the variables • say, X1 > X2 > X3 > X4 > … > Xn, so the parents of Xi must come from X1,…,Xi-1 • Limit the number of parents per node to k, leaving 2^(k·n·log n) networks consistent with the ordering Intuition: • The order decouples the choices of parents • The choice of parents for X7 does not restrict the choice of parents for X12 • We can exploit this to simplify the form of P(D)

  5. Computing P(D | ≺) • The set of possible parent sets for Xi consistent with ≺ (at most k parents each) is small, on the order of n^k per node • Independence of families: given the ordering, the choice of family for one node does not constrain the others • Together these give an efficient closed-form summation over an exponential number of structures
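A sketch of the closed form this slide points to, writing U_{i,≺} for the candidate parent sets of X_i (size at most k, consistent with ≺) and score(X_i, U : D) for the local family score (e.g., a BDe family marginal likelihood with a modular prior term); the summation over structures decomposes node by node:

```latex
P(D \mid \prec)
  \;=\; \sum_{G\ \text{consistent with}\ \prec} P(G \mid \prec)\, P(D \mid G)
  \;\propto\; \prod_{i=1}^{n} \;\sum_{U \in \mathcal{U}_{i,\prec}} \mathrm{score}(X_i, U : D).
```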

  6. MCMC over Models • Cannot enumerate structures, so sample structures • MCMC sampling • Define a Markov chain over BN models • Run the chain to get samples from the posterior P(G | D) • Possible pitfalls: • huge number of models • mixing rate (and the required burn-in) unknown • islands of high posterior, connected by low bridges
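As a concrete illustration of structure-MCMC (my sketch, not code from the paper), here is a minimal Metropolis-Hastings sampler over DAG adjacency matrices with single-edge toggle moves; log_score is a hypothetical stand-in for log P(D | G) + log P(G). With limited data this is exactly the sampler that exhibits the pitfalls listed above.

```python
import numpy as np

def is_dag(adj):
    """Check acyclicity by repeatedly peeling off nodes with no incoming edges."""
    remaining = list(range(len(adj)))
    while remaining:
        sinks = [i for i in remaining if not adj[remaining, i].any()]
        if not sinks:
            return False                      # a cycle remains
        remaining = [i for i in remaining if i not in sinks]
    return True

def structure_mcmc(data, log_score, n_vars, n_iters, rng):
    """Metropolis-Hastings over DAGs with single-edge add/delete moves.

    log_score(adj, data) is assumed (hypothetical) to return
    log P(D | G) + log P(G).  Returns one sampled adjacency matrix per iteration.
    """
    adj = np.zeros((n_vars, n_vars), dtype=bool)      # start from the empty graph
    current = log_score(adj, data)
    samples = []
    for _ in range(n_iters):
        i, j = rng.choice(n_vars, size=2, replace=False)
        proposal = adj.copy()
        proposal[i, j] = ~proposal[i, j]              # toggle edge i -> j (symmetric move)
        if is_dag(proposal):
            new = log_score(proposal, data)
            if np.log(rng.random()) < new - current:  # accept/reject
                adj, current = proposal, new
        samples.append(adj.copy())
    return samples
```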

  7. ICU Alarm BN: No Mixing • However, with 500 instances: • the runs clearly do not mix [Plot: score of current sample vs. MCMC iteration]

  8. Effects of Non-Mixing • Two MCMC runs over the same 500 instances • Probability estimates for Markov features: • based on 50 networks sampled from the MCMC process • Probability estimates are highly variable and non-robust [Scatter plots comparing the two runs' estimates; initialization: true BN vs. random, and true BN vs. true BN]

  9. Our Approach: Sample Orderings We can write P(f | D) as a sum over orderings (formula below) • Comment: the structure prior P(G) changes • from a uniform prior over structures to a uniform prior over orderings and over structures consistent with a given ordering • Sample orderings and approximate the sum
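The formula missing from this slide is the decomposition of the feature posterior by ordering, approximated by an average over MCMC samples ≺_1,…,≺_K:

```latex
P(f \mid D) \;=\; \sum_{\prec} P(\prec \mid D)\, P(f \mid \prec, D)
            \;\approx\; \frac{1}{K} \sum_{k=1}^{K} P(f \mid \prec_k, D),
\qquad \prec_k \sim P(\prec \mid D)\ \text{via MCMC}.
```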

  10. MCMC Over Orderings Use the Metropolis-Hastings algorithm • Specify a proposal distribution q(≺' | ≺) • flip: (i1 … ij … ik … in) → (i1 … ik … ij … in) • “cut”: (i1 … ij ij+1 … in) → (ij+1 … in i1 … ij) Each iteration: • Sample ≺' from q(≺' | ≺) • Go ≺ → ≺' with probability min(1, P(D | ≺') / P(D | ≺)), since the priors are uniform • Efficient computation!
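A minimal sketch of this order-MCMC loop (my illustration, not the authors' code), using only the flip proposal, which is symmetric, and assuming a hypothetical log_p_data_given_order(order, data) that implements the closed-form P(D | ≺) from slide 5:

```python
import numpy as np

def order_mcmc(data, log_p_data_given_order, n_vars, n_iters, rng):
    """Metropolis-Hastings over variable orderings with 'flip' proposals.

    With a uniform prior over orderings and a symmetric proposal, the
    acceptance probability reduces to min(1, P(D | order') / P(D | order)).
    Returns the list of sampled orderings.
    """
    order = list(rng.permutation(n_vars))                     # random initial ordering
    current = log_p_data_given_order(order, data)
    samples = []
    for _ in range(n_iters):
        j, k = rng.choice(n_vars, size=2, replace=False)
        proposal = order.copy()
        proposal[j], proposal[k] = proposal[k], proposal[j]   # flip two positions
        new = log_p_data_given_order(proposal, data)
        if np.log(rng.random()) < new - current:              # accept/reject
            order, current = proposal, new
        samples.append(order.copy())
    return samples
```

The cut proposal from the slide can be added analogously. The cost per iteration is dominated by re-evaluating the closed-form P(D | ≺'); a flip only changes the candidate parent sets of nodes between the two swapped positions, so only those families need rescoring, which is where the efficiency comes from.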

  11. Why Ordering Helps • Smaller space • Significant reduction in the size of the sample space • Better-structured space • We can get from one ordering to another in a (relatively) small number of steps • Smoother posterior “landscape” • The score of an ordering is a sum over many networks • No ordering is “horrendous”, so there are no “islands” of high posterior separated by a deep blue sea

  12. Mixing with MCMC-Orderings • 4 runs on ICU-Alarm with 500 instances • fewer iterations than MCMC-Nets • approximately the same amount of computation • The process is clearly mixing! [Plot: score of current sample vs. MCMC iteration for the 4 runs]

  13. Mixing of MCMC runs • Two MCMC runs over the same 500 instances • Probability estimates for Markov features: • based on 50 networks sampled from the MCMC process • Probability estimates are very robust [Scatter plots comparing the two runs' estimates, shown for 100 and 1000 instances]

  14. Computing Feature Posteriors: P(f | ≺, D) Edges and Markov blanket features: • Markov blanket: Y is in Z's blanket if Y→Z (or Z→Y), or if both Y and Z are parents of some X • The posteriors of these features are independent and can be computed in closed form (see below) Other features (e.g., existence of a causal path): • Sample networks from the ordering • Estimate the features from the sampled networks
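For a single edge, the closed-form posterior given an ordering follows directly from the decomposition on slide 5: the ratio of the summed family scores that include X_j as a parent of X_i to the sum over all candidate families,

```latex
P(X_j \rightarrow X_i \mid \prec, D)
  \;=\; \frac{\displaystyle\sum_{U \in \mathcal{U}_{i,\prec}\,:\,X_j \in U} \mathrm{score}(X_i, U : D)}
             {\displaystyle\sum_{U \in \mathcal{U}_{i,\prec}} \mathrm{score}(X_i, U : D)} .
```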

  15. Feature Reconstruction (ICU-Alarm): Markov Features • Task: reconstruct the “true” features of the generating network [Plots of false negatives vs. false positives for the Structure, Bootstrap, and Order methods]

  16. Feature Reconstruction (ICU-Alarm): Path Features [Plots comparing the Structure, Bootstrap, and Order methods]

  17. Conclusion • Full Bayesian model averaging is tractable for a known ordering • MCMC over orderings allows a robust approximation to full Bayesian averaging over Bayes nets • rapid and reliable mixing • robust and reliable estimates for the probability of structural features • Crucial for structure discovery in domains with limited data, e.g., biological discovery
