Being Bayesian about Network Structure
Nir Friedman (Hebrew Univ.)  Daphne Koller (Stanford Univ.)
Structure Discovery
• Current practice: model selection
  • Pick a single model (of high score)
  • Use that model to represent the domain structure
• Enough data ⇒ the “right” model is overwhelmingly likely
• But what about the rest of the time?
  • Many high-scoring models
  • An answer based on one model is often useless
• Bayesian model averaging is the Bayesian ideal:
  P(f | D) = Σ_G P(f | G) P(G | D), where f is a feature of G, e.g., X→Y
Model Averaging
• Unfortunately, it is intractable:
  • the # of possible structures is superexponential in the number of variables
  • That’s why no one really does it*
• Our contribution:
  • Closed-form solution for a fixed ordering over the nodes
  • MCMC over orderings for the general case
  • Faster convergence, robust results
* Exceptions: Madigan & Raftery, Madigan & York; see below
Fixed Ordering
Suppose that:
• We know the ordering ≺ of the variables
  • say, X1 ≺ X2 ≺ X3 ≺ X4 ≺ … ≺ Xn ⇒ the parents of Xi must come from X1, …, Xi-1
• We limit the number of parents per node to k
⇒ at most 2^(k·n·log n) networks consistent with ≺
Intuition:
• The order decouples the choice of parents
  • The choice of parents for X7 does not restrict the choice of parents for X12
• We can exploit this to simplify the form of P(D | ≺)
Computing P(D | ≺)
P(D | ≺) = Σ_{G consistent with ≺} P(G | ≺) P(D | G) = Π_i Σ_{U ∈ U_i,≺} score(Xi, U : D)
where U_i,≺, the set of possible parent sets for Xi consistent with ≺, has size at most Σ_{j ≤ k} C(n−1, j) = O(n^k)
• Independence of families
• Small number of potential families per node
⇒ Efficient closed-form summation over an exponential number of structures (see the sketch below)
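A minimal Python sketch of this closed-form summation, assuming a hypothetical `family_score(node, parents, data)` callback (e.g., a log BDe family score) and orderings represented as tuples of node indices; this illustrates the decomposition, not the authors' implementation:

```python
from itertools import combinations
from math import exp, log

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + log(sum(exp(x - m) for x in xs))

def log_prob_data_given_ordering(ordering, data, k, family_score):
    """log P(D | ordering): a product over nodes of the summed scores
    of all parent sets of size <= k drawn from the node's predecessors."""
    total = 0.0
    for pos, node in enumerate(ordering):
        preds = ordering[:pos]                 # parents must precede the node
        candidate_sets = [ps for size in range(k + 1)
                          for ps in combinations(preds, size)]
        # Independence of families turns the sum over all consistent
        # structures into this per-node sum (computed in log space).
        total += log_sum_exp([family_score(node, ps, data)
                              for ps in candidate_sets])
    return total
```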
MCMC over Models
• Cannot enumerate structures, so sample structures
• MCMC sampling (a generic skeleton of this baseline is sketched below):
  • Define a Markov chain over BN models
  • Run the chain to get samples from the posterior P(G | D)
• Possible pitfalls:
  • huge number of models
  • mixing rate (and required burn-in) unknown
  • islands of high posterior, connected by low bridges
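For contrast, a hedged sketch of structure-MCMC in the Madigan & York style; `score` and `propose` are hypothetical callbacks (a log posterior over DAGs, and a neighbor generator via edge addition/deletion/reversal), not APIs from the paper:

```python
import random
from math import exp

def mcmc_over_structures(score, propose, init_graph, n_iter, seed=0):
    """Metropolis-Hastings over DAG structures. `score(G)` is an assumed
    log posterior log P(G | D) up to a constant; `propose(G)` returns a
    neighboring DAG together with the log proposal correction
    log q(G | G') - log q(G' | G)."""
    rng = random.Random(seed)
    G, samples = init_graph, []
    for _ in range(n_iter):
        G_new, log_q_correction = propose(G)
        log_alpha = score(G_new) - score(G) + log_q_correction
        if log_alpha >= 0 or rng.random() < exp(log_alpha):
            G = G_new                  # accept the move
        samples.append(G)              # otherwise keep the current DAG
    return samples
```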
ICU-Alarm BN: No Mixing
• With 500 instances, the runs clearly do not mix.
[Figure: score of current sample vs. MCMC iteration]
Effects of Non-Mixing
• Two MCMC runs over the same 500 instances
• Probability estimates for Markov features:
  • based on 50 networks sampled from the MCMC process
• Probability estimates are highly variable, nonrobust
[Figure panels, labeled by initialization: true BN vs. random; true BN vs. true BN]
Our Approach: Sample Orderings
We can write:
  P(f | D) = Σ_≺ P(f | ≺, D) P(≺ | D)
• Comment: the structure prior P(G) changes; a uniform prior over structures is not the same as a uniform prior over orderings combined with a uniform prior over the structures consistent with a given ordering
• Sample orderings ≺1, …, ≺T and approximate:
  P(f | D) ≈ (1/T) Σ_t P(f | ≺t, D)
MCMC over Orderings
Use the Metropolis-Hastings algorithm:
• Specify a proposal distribution q(≺' | ≺), e.g.:
  • flip: (i1 … ij … ik … in) → (i1 … ik … ij … in)
  • “cut”: (i1 … ij ij+1 … in) → (ij+1 … in i1 … ij)
Each iteration:
• Sample ≺' from q(≺' | ≺)
• Move to ≺' with probability min[1, (P(D | ≺') q(≺ | ≺')) / (P(D | ≺) q(≺' | ≺))]
• Since the priors are uniform and the proposals are symmetric, this reduces to min[1, P(D | ≺') / P(D | ≺)]
Efficient computation: P(D | ≺) is available in closed form! (See the sketch below.)
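A minimal sketch of this sampler, assuming the `log_p_data` callback implements the closed-form P(D | ≺) from the earlier sketch; the flip and cut proposals below are both symmetric, so the acceptance ratio reduces as stated:

```python
import random
from math import exp

def flip(order, rng):
    """Swap the variables at two random positions (a symmetric proposal)."""
    o = list(order)
    i, j = rng.sample(range(len(o)), 2)
    o[i], o[j] = o[j], o[i]
    return tuple(o)

def cut(order, rng):
    """Rotate the ordering at a random cut point (also symmetric)."""
    j = rng.randrange(1, len(order))
    return order[j:] + order[:j]

def mcmc_over_orderings(log_p_data, n_nodes, n_iter, seed=0):
    """Metropolis-Hastings over orderings. With uniform priors and
    symmetric proposals, the acceptance probability is just
    min(1, P(D | order') / P(D | order))."""
    rng = random.Random(seed)
    order = tuple(range(n_nodes))
    cur = log_p_data(order)
    samples = []
    for _ in range(n_iter):
        proposal = flip if rng.random() < 0.5 else cut
        new = proposal(order, rng)
        new_score = log_p_data(new)
        if new_score >= cur or rng.random() < exp(new_score - cur):
            order, cur = new, new_score
        samples.append(order)
    return samples
```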
Why Ordering Helps
• Smaller space
  • Significant reduction in the size of the sample space
• Better-structured space
  • We can get from one ordering to another in a (relatively) small number of steps
• Smoother posterior “landscape”
  • The score of an ordering is a sum over many networks
  • No ordering is “horrendous” ⇒ no “islands” of high posterior separated by a deep blue sea
Mixing with MCMC-Orderings
• 4 runs on ICU-Alarm with 500 instances
  • fewer iterations than MCMC-Nets
  • approximately the same amount of computation
• The process is clearly mixing!
[Figure: score of current sample vs. MCMC iteration]
Mixing of MCMC Runs
• Two MCMC runs over the same 500 instances
• Probability estimates for Markov features:
  • based on 50 networks sampled from the MCMC process
• Probability estimates are very robust
[Figure panels: 100 instances; 1000 instances]
Computing the Feature Posterior P(f | ≺, D)
Edge and Markov-blanket features:
• Y is in Z’s Markov blanket iff Y→Z, Z→Y, or both Y and Z are parents of some X
• The posteriors of these features decompose by family, so they have a closed form given the ordering (sketched below for edges)
Other features (e.g., existence of a causal path):
• Sample networks from the ordering
• Estimate the features from the sampled networks
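A sketch of the closed-form edge-feature posterior under the same assumptions as before: `family_score` is the hypothetical log family score, and P(X→Y | ≺, D) is the score-weighted fraction of Y’s candidate parent sets that contain X.

```python
from itertools import combinations
from math import exp, log

def log_sum_exp(xs):
    m = max(xs)
    return m + log(sum(exp(x - m) for x in xs))

def edge_posterior(x, y, ordering, data, k, family_score):
    """P(x -> y | ordering, D): the score-weighted fraction of y's
    candidate parent sets (size <= k, drawn from y's predecessors)
    that contain x. Closed form because families are independent
    given the ordering."""
    preds = ordering[:ordering.index(y)]
    if x not in preds:
        return 0.0                     # x must precede y in the ordering
    all_sets = [ps for size in range(k + 1)
                for ps in combinations(preds, size)]
    with_x = [ps for ps in all_sets if x in ps]
    if not with_x:                     # only possible when k == 0
        return 0.0
    log_all = log_sum_exp([family_score(y, ps, data) for ps in all_sets])
    log_with = log_sum_exp([family_score(y, ps, data) for ps in with_x])
    return exp(log_with - log_all)
```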
Feature Reconstruction (ICU-Alarm): Markov Features
• Reconstruct the “true” features of the generating network
[Figure: false negatives vs. false positives, with panels for the Structure, Bootstrap, and Order methods]
Feature Reconstruction (ICU-Alarm): Path Features
[Figure: false negatives vs. false positives, with panels for the Structure, Bootstrap, and Order methods]
Conclusion
• Full Bayesian model averaging is tractable for a known ordering
• MCMC over orderings allows a robust approximation to full Bayesian averaging over Bayesian networks:
  • rapid and reliable mixing
  • robust, reliable estimates for the probability of structural features
• Crucial for structure discovery in domains with limited data
  • e.g., biological discovery