
Being Bayesian About Network Structure



  1. Being Bayesian About Network Structure: A Bayesian Approach to Structure Discovery in Bayesian Networks. Nir Friedman and Daphne Koller. CS673

  2. Roadmap
  • Bayesian learning of Bayesian networks
  • Exact vs. approximate learning
  • Markov Chain Monte Carlo method
  • MCMC over structures
  • MCMC over orderings
  • Experimental results
  • Conclusions

  3. Bayesian Networks
  [Figure: example DAG over Earthquake (E), Burglary (B), Radio (R), Alarm (A), Call (C)]
  • Compact representation of probability distributions via conditional independence
  • Qualitative part: a directed acyclic graph (DAG)
    - Nodes – random variables
    - Edges – direct influence
  • Quantitative part: a set of conditional probability distributions
  • Together they define a unique distribution in factored form:
    P(B,E,A,C,R) = P(B) P(E) P(A|B,E) P(R|E) P(C|A)
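The factored form above can be sketched in code. All CPD numbers below are illustrative assumptions, not values from the slides:

```python
# A minimal sketch of the factored joint P(B,E,A,C,R) = P(B)P(E)P(A|B,E)P(R|E)P(C|A).
# All probability values are made-up illustrative numbers.
from itertools import product

P_B = {True: 0.01, False: 0.99}                      # P(Burglary)
P_E = {True: 0.02, False: 0.98}                      # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,      # P(Alarm=True | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_R = {True: 0.9, False: 0.01}                       # P(Radio=True | E)
P_C = {True: 0.7, False: 0.05}                       # P(Call=True | A)

def bernoulli(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, r, c):
    """Joint probability in the factored form defined by the DAG."""
    return (P_B[b] * P_E[e]
            * bernoulli(P_A[(b, e)], a)
            * bernoulli(P_R[e], r)
            * bernoulli(P_C[a], c))

# Because each factor is a proper conditional, the joint sums to 1
# over all 2^5 assignments.
total = sum(joint(*assign) for assign in product([True, False], repeat=5))
```

The point of the factorization is that five small tables replace one table of 2^5 entries.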

  4. Why Learn Bayesian Networks?
  • Conditional independencies & graphical representation capture the structure of many real-world distributions
    - Provide insights into the domain
  • Graph structure allows "knowledge discovery"
    - Is there a direct connection between X & Y?
    - Does X separate two "subsystems"?
    - Does X causally affect Y?
  • Bayesian networks can be used for many tasks
    - Inference, causality, etc.
  • Examples: scientific data mining
    - Disease properties and symptoms
    - Interactions between the expression of genes

  5. Learning Bayesian Networks
  [Figure: Inducer maps Data + Prior Information to a network over E, B, R, A, C]
  • The inducer needs the prior probability distribution P(B)
  • Using Bayesian conditioning, update the prior: P(B) → P(B|D)

  6. Why Struggle for Accurate Structure?
  [Figure: the "true" structure over E, A, B, S, next to versions with an added arc and a missing arc]
  • Adding an arc
    - Increases the number of parameters to be fitted
    - Makes wrong assumptions about causality and domain structure
  • Missing an arc
    - Cannot be compensated for by accurate fitting of parameters
    - Also misses causality and domain structure

  7. Score-Based Learning
  • Define a scoring function that evaluates how well a structure matches the data
  • Search for a structure that maximizes the score
  [Figure: candidate structures over E, B, A compared by score]

  8. Bayesian Score of a Model
  P(G|D) ∝ P(D|G) P(G)
  where the marginal likelihood integrates the likelihood against the prior over parameters:
  P(D|G) = ∫ P(D|G,Θ) P(Θ|G) dΘ
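For the simplest case, one binary variable with a Beta parameter prior, the integral above has a closed form, P(D) = B(α₁+n₁, α₀+n₀) / B(α₁, α₀). This is the building block of Dirichlet-multinomial scores such as BDe, where a full network score multiplies one such term per family. A minimal sketch, with hedged toy counts:

```python
# Closed-form marginal likelihood for one binary variable with a
# Beta(alpha1, alpha0) parameter prior:
#   P(D) = B(alpha1 + n1, alpha0 + n0) / B(alpha1, alpha0)
# where n1, n0 are the observed counts of ones and zeros.
from math import lgamma, exp

def log_beta(a, b):
    """log of the Beta function B(a, b) = Gamma(a)Gamma(b)/Gamma(a+b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal_likelihood(n1, n0, alpha1=1.0, alpha0=1.0):
    """log P(D) for n1 ones and n0 zeros under a Beta(alpha1, alpha0) prior."""
    return log_beta(alpha1 + n1, alpha0 + n0) - log_beta(alpha1, alpha0)

# With a uniform Beta(1,1) prior, a single observed "1" has P(D) = 1/2,
# and two observed "1"s have P(D) = 1/2 * 2/3 = 1/3.
```

Working in log space avoids underflow when counts get large, which is why the sketch returns log P(D) rather than P(D).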

  9. Discovering Structure – Model Selection
  [Figure: posterior P(G|D) concentrated on a single network over E, B, R, A, C]
  • Current practice: model selection
    - Pick a single high-scoring model
    - Use that model to infer domain structure

  10. Discovering Structure – Model Averaging
  [Figure: posterior P(G|D) spread over several candidate networks over E, B, R, A, C]
  • Problem
    - With a small sample size, there are many high-scoring models
    - An answer based on one model is often useless
    - We want features common to many models

  11. Bayesian Approach
  • Estimate the probability of features
    - Edge X → Y
    - Markov edge X – Y
    - Path X → … → Y
    - ...
  • P(f|D) = Σ_G f(G) P(G|D), where f(G) is the indicator function for a feature of G (e.g., the edge X → Y) and P(G|D) is the Bayesian score of G
  • Huge (super-exponential – 2^Θ(n²)) number of networks G
  • Exact learning is intractable

  12. Approximate Bayesian Learning
  • Restrict the search space to Gk, the set of graphs with indegree bounded by k
    - The space is still super-exponential
  • Find a set G of high-scoring structures and estimate P(f|D) over that set
    - Hill-climbing yields a biased sample of structures
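Given a set of high-scoring structures and their log scores, the feature estimate is a score-weighted average. A minimal sketch, with hypothetical graphs (represented as edge sets) and made-up log scores:

```python
# Estimate P(f|D) over a set of high-scoring structures:
#   P(f|D) ~ sum_G f(G) exp(score(G)) / sum_G exp(score(G))
# Graphs and log scores below are hypothetical.
from math import exp

def feature_posterior(graphs, log_scores, feature):
    """Score-weighted average of an indicator feature over a set of structures."""
    m = max(log_scores)                       # subtract the max for numerical stability
    weights = [exp(s - m) for s in log_scores]
    z = sum(weights)
    return sum(w for g, w in zip(graphs, weights) if feature(g)) / z

# Graphs as sets of directed edges; feature: "is there an edge X -> Y?"
graphs = [{("X", "Y"), ("Y", "Z")}, {("Y", "X")}, {("X", "Y")}]
log_scores = [-10.0, -12.0, -10.5]
p_edge = feature_posterior(graphs, log_scores, lambda g: ("X", "Y") in g)
```

The bias the slide mentions shows up here: if hill-climbing only ever visits graphs near one local optimum, the sum runs over an unrepresentative set regardless of how the weights are normalized.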

  13. MCMC Sampling – Markov Chain Monte Carlo over Networks
  • Define a Markov chain over Bayesian networks
  • Perform a walk through the chain to get samples G whose distribution converges to the posterior P(G|D)
  • Possible pitfalls:
    - Still a super-exponential number of networks
    - Time for the chain to converge to the posterior is unknown
    - Islands of high posterior, connected by low bridges
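Structure-MCMC can be sketched as a Metropolis walk over DAGs whose stationary distribution is proportional to exp(log score). The single-edge-flip proposal and the toy two-node score below are illustrative assumptions, not the paper's exact setup:

```python
# Sketch of structure-MCMC: a Metropolis walk over DAGs targeting a
# distribution proportional to exp(log_score(G)). Proposal: pick an ordered
# pair of nodes uniformly and flip that edge; reject moves that create cycles.
import random
from math import exp

def is_acyclic(nodes, edges):
    """DFS-based cycle check for a directed graph given as a set of edges."""
    children = {v: [w for (u, w) in edges if u == v] for v in nodes}
    state = {v: 0 for v in nodes}             # 0 = unvisited, 1 = on stack, 2 = done
    def dfs(v):
        state[v] = 1
        for w in children[v]:
            if state[w] == 1 or (state[w] == 0 and not dfs(w)):
                return False                  # found a back edge -> cycle
        state[v] = 2
        return True
    return all(state[v] == 2 or dfs(v) for v in nodes)

def mcmc_over_structures(nodes, log_score, steps, rng):
    """Return the chain of visited DAGs (each a frozenset of directed edges)."""
    g = frozenset()                           # start from the empty graph
    samples = []
    pairs = [(u, v) for u in nodes for v in nodes if u != v]
    for _ in range(steps):
        u, v = rng.choice(pairs)              # propose flipping one edge
        g2 = g - {(u, v)} if (u, v) in g else g | {(u, v)}
        if is_acyclic(nodes, g2):
            # the flip proposal is symmetric -> plain Metropolis acceptance
            if rng.random() < min(1.0, exp(log_score(g2) - log_score(g))):
                g = frozenset(g2)
        samples.append(g)
    return samples

# Toy score that rewards the single edge A -> B (hypothetical):
nodes = ["A", "B"]
score = lambda g: 2.0 if ("A", "B") in g else 0.0
samples = mcmc_over_structures(nodes, score, steps=2000, rng=random.Random(0))
freq = sum(("A", "B") in g for g in samples) / len(samples)
```

Even this two-node toy shows the "islands" pitfall in miniature: the chain can only leave a high-scoring graph through a low-scoring neighbor, so moves out are accepted rarely.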

  14. Better Approach to Approximate Learning
  • Further constrain the search space
  • Perform model averaging over the structures consistent with some known (fixed) total ordering ≺
  • Ordering of variables: X1 ≺ X2 ≺ … ≺ Xn means the parents of Xi must come from {X1, …, Xi-1}
  • Intuition: the order decouples the choice of parents
    - The choice of Pa(X7) does not restrict the choice of Pa(X12)
  • Can compute efficiently in closed form:
    - Likelihood P(D|≺)
    - Feature probability P(f|D,≺)
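The decoupling is what makes the closed form tractable: given the order, the score factors over families, so each node independently sums over its candidate parent sets among its predecessors (with indegree bounded by k). A sketch of that double sum, where `family_score` is a stand-in for the per-family marginal likelihood:

```python
# Sketch of the closed-form likelihood given a fixed order:
#   P(D | order) = prod_i  sum_{U subset of predecessors(Xi), |U| <= k}  score_i(U)
# Each node's sum is independent of the others' choices, so the whole
# computation is a product of per-node sums rather than a sum over graphs.
from itertools import combinations

def likelihood_given_order(order, family_score, max_indegree):
    """Product over nodes of the sum of family scores over allowed parent sets."""
    total = 1.0
    for i, x in enumerate(order):
        predecessors = order[:i]
        node_sum = 0.0
        for k in range(min(max_indegree, len(predecessors)) + 1):
            for parents in combinations(predecessors, k):
                node_sum += family_score(x, frozenset(parents))
        total *= node_sum
    return total
```

With a constant `family_score` of 1.0 this just counts order-consistent graphs, which is a handy sanity check: over {A, B, C} with unbounded indegree there are 1 × 2 × 4 = 8 order-consistent structures.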

  15. Sample Orderings
  We can write P(f|D) = Σ_≺ P(f|≺,D) P(≺|D), so we can sample orderings ≺1, …, ≺K and approximate P(f|D) ≈ (1/K) Σk P(f|≺k,D)
  MCMC sampling:
  • Define a Markov chain over orderings
  • Run the chain to get samples from the posterior P(≺|D)
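A chain over orderings can be sketched with a swap-two-positions proposal, targeting a distribution proportional to exp(log P(D|≺)). The `log_order_score` below is a hypothetical stand-in for log P(D|≺):

```python
# Sketch of order-MCMC: a Metropolis walk whose states are total orders.
# Proposal: swap two positions chosen uniformly at random (symmetric).
import random
from math import exp

def mcmc_over_orders(variables, log_order_score, steps, rng):
    """Return the chain of visited orderings (each a tuple of variables)."""
    order = list(variables)
    samples = []
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)   # two distinct positions
        proposal = list(order)
        proposal[i], proposal[j] = proposal[j], proposal[i]
        # swap proposals are symmetric -> plain Metropolis acceptance
        if rng.random() < min(1.0, exp(log_order_score(proposal) - log_order_score(order))):
            order = proposal
        samples.append(tuple(order))
    return samples

# Toy score preferring orders that put "A" first (hypothetical):
score = lambda o: 1.5 if o[0] == "A" else 0.0
chain = mcmc_over_orders(["A", "B", "C"], score, steps=3000, rng=random.Random(1))
p_A_first = sum(o[0] == "A" for o in chain) / len(chain)
```

The state space here has n! states instead of a super-exponential number of graphs, and each order already aggregates over all graphs consistent with it, which is why this chain tends to mix better than structure-MCMC.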

  16. Experiments: exact posterior over orders versus order-MCMC

  17. Experiments: convergence

  18. Experiments: structure-MCMC – posterior correlation for two different runs

  19. Experiments: order-MCMC – posterior correlation for two different runs

  20. Conclusion
  • Order-MCMC performs better than structure-MCMC

  21. References
  • N. Friedman and D. Koller. Being Bayesian About Network Structure: A Bayesian Approach to Structure Discovery in Bayesian Networks. Machine Learning, 2002.
  • N. Friedman and D. Koller. Learning Bayesian Networks from Data. NIPS 2001 tutorial.
  • N. Friedman and M. Goldszmidt. Learning Bayesian Networks from Data. AAAI-98 tutorial.
  • D. Heckerman. A Tutorial on Learning with Bayesian Networks. In M. Jordan, ed., Learning in Graphical Models. MIT Press, Cambridge, MA, 1999. Also Technical Report MSR-TR-95-06, Microsoft Research, March 1995; an earlier version appears as Bayesian Networks for Data Mining, Data Mining and Knowledge Discovery, 1:79–119, 1997.
  • C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan. An Introduction to MCMC for Machine Learning. Machine Learning, 2002.
  • S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach.
