
Mind Change Optimal Learning Bayes Nets Structure


Presentation Transcript


  1. Mind Change Optimal Learning Bayes Nets Structure Oliver Schulte, Simon Fraser University, Vancouver, Canada (oschulte@cs.sfu.ca), with Wei Luo (SFU, wluoa@cs.sfu.ca) and Russ Greiner (U of Alberta, greiner@cs.ualberta.ca)

  2. Outline • Language Learning Model for Bayes Net (BN) Structure Learning. • Mind Change Complexity of BN Learning. • Mind Change and Convergence Time Optimality. • NP-hardness of the Optimal Learner.

  3. Bayes Nets: Overview • Very widely used graphical formalism for probabilistic reasoning and knowledge representation in AI and machine learning. • Bayes net structure = directed acyclic graph (DAG). • Nodes = variables of interest. • Arcs = direct “influence” or “association”. • The structure represents conditional dependencies.

  4. Example [Figure: Bayes net DAG over Season, Sprinkler, Rain, Wet, Slippery.] Season depends on Slippery. Sprinkler depends on Rain. Sprinkler does not depend on Wet given Season. Sprinkler depends on Wet given Season, Rain.

  5. Graphs Entail Dependencies [Figure: three DAGs over A, B, C, each shown with the dependencies it entails: one entails Dep(A,B), Dep(A,B|C), Dep(B,C), Dep(B,C|A), Dep(A,C|B); another entails only Dep(A,B), Dep(A,B|C).]
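The entailment relation on this slide is mechanical, so it can be sketched in code. The following is a minimal sketch, not from the deck, using NetworkX's d-separation test (nx.d_separated, available since NetworkX 2.4 and renamed is_d_separator in 3.3); the collider A → B ← C reproduces the five-dependency set above.

```python
from itertools import combinations

import networkx as nx

def entailed_dependencies(dag):
    """All Dep(X,Y|S) statements the DAG entails, i.e. all d-connections."""
    nodes = set(dag.nodes)
    deps = []
    for x, y in combinations(sorted(nodes), 2):
        rest = sorted(nodes - {x, y})
        # every conditioning set S drawn from the remaining nodes
        for r in range(len(rest) + 1):
            for s in combinations(rest, r):
                if not nx.d_separated(dag, {x}, {y}, set(s)):
                    deps.append((x, y, s))
    return deps

# The collider A -> B <- C entails exactly the five dependencies above.
collider = nx.DiGraph([("A", "B"), ("C", "B")])
for x, y, s in entailed_dependencies(collider):
    cond = f"|{','.join(s)}" if s else ""
    print(f"Dep({x},{y}{cond})")
```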

  6. I-maps and Probability Distributions • Defn: Graph G is an I-map of probability distribution P if, whenever Dependent(X,Y|S) holds in P, X is d-connected to Y given S in G. • Example: If Dependent(Father Eye Color, Mother Eye Color | Child Eye Color) in P, then Father EC is d-connected to Mother EC given Child EC in G. • G is an I-map of P ⟺ G entails all conditional dependencies in P. • Theorem: Fix G, P. There is a parameter setting θ for G such that (G, θ) represents P ⟺ G is an I-map of P.
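With dependencies represented as (X, Y, S) triples as in the previous sketch, the I-map definition becomes a one-line check; again this is an illustrative sketch, not the authors' code.

```python
import networkx as nx

def is_imap(dag, deps):
    """G is an I-map of P iff every dependency of P is a d-connection in G."""
    return all(not nx.d_separated(dag, {x}, {y}, set(s)) for x, y, s in deps)
```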

  7. Two Approaches to Learning Bayes Net Structure Aim: find a graph G that represents P with suitable parameters. • Select graph G as a “model” with parameters to be estimated: “search and score”. • Find a G that represents the (in)dependencies in P: test for dependencies, then cover them.

  8. Our Hybrid Approach [Figure: sample → set of (in)dependencies → final output graph.] The final selected graph maximizes a model selection score and covers all observed (in)dependencies.

  9. Definition of Hybrid Criterion [Figure: three sample cases over A, B, C with scores, e.g. S = 10.5.] • Let d be a sample and let S(G,d) be a score function. • Let Dep be a set of conditional dependencies extracted from sample d. Graph G optimizes score S given Dep and sample d ⟺ • G entails the dependencies Dep, and • if any other graph G’ entails Dep, then score(G,d) ≥ score(G’,d).
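The criterion just defined can be written directly as a selection rule. A sketch under stated assumptions (candidate DAGs given as an enumerable collection, deps as (X, Y, S) triples, score any consistent model selection score such as BDeu); the actual search, of course, does not enumerate all graphs.

```python
import networkx as nx

def hybrid_select(candidates, deps, score):
    """Among graphs entailing every dependency in deps, pick a score maximizer."""
    def entails_all(g):
        return all(not nx.d_separated(g, {x}, {y}, set(s)) for x, y, s in deps)
    feasible = [g for g in candidates if entails_all(g)]
    return max(feasible, key=score) if feasible else None
```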

  10. Local Search Heuristics for Constrained Search • There is a general method for adapting any local search heuristic to accommodate observed dependencies. • We present an adaptation of GES search, called IGES.

  11. GES Search (Meek, Chickering) [Figure: grow phase adds edges, shrink phase deletes edges; candidate graphs over A, B, C with scores 5, 7, 8, 8.5 and 9.]
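A structural sketch of the grow/shrink pattern pictured above. It simplifies deliberately: GES proper moves through graph equivalence classes, while this version greedily edits a single DAG; add_edges and delete_edges are hypothetical neighborhood generators.

```python
def greedy_phase(graph, neighbors, score):
    """Move to the best-scoring neighbor until the score stops improving."""
    while True:
        best = max(neighbors(graph), key=score, default=None)
        if best is None or score(best) <= score(graph):
            return graph
        graph = best

def ges_style_search(start_graph, add_edges, delete_edges, score):
    grown = greedy_phase(start_graph, add_edges, score)  # grow phase: add edges
    return greedy_phase(grown, delete_edges, score)      # shrink phase: delete edges
```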

  12. IGES Search [Figure: testing procedure on Cases 1–3; candidate graphs over A, B, C with scores 5 and 7, given Dep(A,B).] Step 1: Extract dependencies from the sample. Then continue the grow phase until all dependencies are covered, and during the shrink phase delete an edge only if the dependencies are still covered.
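Continuing the sketch from the previous slide (same hypothetical helpers), the two modifications on this slide drop in directly: keep growing until the extracted dependencies are covered, and forbid deletions that break coverage.

```python
import networkx as nx

def covers(graph, deps):
    return all(not nx.d_separated(graph, {x}, {y}, set(s)) for x, y, s in deps)

def iges_style_search(start_graph, add_edges, delete_edges, score, deps):
    # Grow phase: as in GES, then keep adding the best edge until all
    # extracted dependencies are covered (assumes some supergraph covers them).
    g = greedy_phase(start_graph, add_edges, score)
    while not covers(g, deps):
        g = max(add_edges(g), key=score)
    # Shrink phase: only offer deletions that keep every dependency covered.
    safe_deletes = lambda h: [d for d in delete_edges(h) if covers(d, deps)]
    return greedy_phase(g, safe_deletes, score)
```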

  13. Asymptotic Equivalence GES = IGES Theorem: Assume that the score function S is consistent and that the joint probability distribution P satisfies the composition principle. Let Dep be a set of dependencies true of P. Then with P-probability 1, GES and IGES+Dep converge to the same output in the sample size limit. • So IGES inherits the convergence properties of GES.

  14. Extracting Dependencies • We use the χ² test (with a cell coverage condition). • Exhaustive testing of all triples Indep(X,Y|S) for cardinality(S) < k, with k chosen by the user. • A more sophisticated testing strategy is coming soon.
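A minimal sketch of such a test, under assumptions the slide leaves open: discrete data in a pandas DataFrame, scipy's chi2_contingency summed over the strata defined by S, and a simple minimum-count rule standing in for the cell coverage condition. Exhaustive extraction would loop this test over all pairs X, Y and all conditioning sets S with cardinality(S) < k.

```python
import pandas as pd
from scipy.stats import chi2, chi2_contingency

def chi2_cond_indep(df, x, y, s=(), alpha=0.05, min_count=5):
    """Test Indep(x, y | s) by summing the chi^2 statistic over strata of s."""
    stat, dof = 0.0, 0
    strata = df.groupby(list(s)) if s else [(None, df)]
    for _, stratum in strata:
        table = pd.crosstab(stratum[x], stratum[y])
        if min(table.shape) < 2 or table.values.min() < min_count:
            continue  # stratum too sparse: cell coverage condition fails
        c2, _, d, _ = chi2_contingency(table)
        stat, dof = stat + c2, dof + d
    # no testable stratum: retain independence by default
    return True if dof == 0 else chi2.sf(stat, dof) > alpha
```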

  15. Simulation Setup: Methods • The hybrid approach is a general schema. Our setup: • Statistical test: χ², significance level 5%. • Score S: BDeu (with Tetrad default settings). • Search method: GES, adapted as above.

  16. Simulation Setup: Graphs and Data • Random DAGs with binary variables. • Number of nodes: 4, 6, 8, 10. • Sample sizes: 100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600. • 10 random samples per graph per sample size; results averaged. • Graphs generated with Tetrad’s random DAG utility.

  17. Show Some Graphs

  18. Conclusion for I-map Learning: The Underfitting Zone [Figure: divergence from the true graph vs. sample size, for standard search + score and constrained search + score; small samples: little significance; medium samples: underfitting of correlations; large samples: convergence zone.] • Although not explicitly designed to cover statistically significant correlations, GES+BDeu does so fairly well. • But not perfectly, so IGES helps add the missing edges (on the order of 5 for 10-node graphs).

  19. Future Work: More Efficient Testing Strategy • Say that a graph G satisfies the Markov condition wrt sample d if, for all X and Y where Y is a nonparental nondescendant of X, we do not find Dep(X,Y|parents(X)). • Given sample d, look for a graph G that maximizes the score and satisfies the MC wrt d. • Requires only (#Var)² tests.
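A sketch of that check, reusing the hypothetical chi2_cond_indep test from the earlier sketch on a NetworkX DAG; the two loops over variable pairs make the at-most-(#Var)² test count explicit.

```python
import networkx as nx

def satisfies_markov_condition(dag, df, alpha=0.05):
    """Markov condition wrt the sample: no Dep(X, Y | parents(X)) is found
    for any nonparental nondescendant Y of X."""
    for x in dag.nodes:
        parents = tuple(dag.predecessors(x))
        nondescendants = set(dag.nodes) - nx.descendants(dag, x) - {x}
        for y in nondescendants - set(parents):
            if not chi2_cond_indep(df, x, y, parents, alpha):
                return False  # a dependency was found: MC violated
    return True
```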

  20. Summary: Hybrid Criterion - Test, Search and Score • Basic idea: base Bayes net learning on dependencies that can be reliably obtained even on small to medium sample sizes. • Hybrid criterion: find the graph that maximizes a model selection score given the constraint of entailing the statistically significant dependencies or correlations. • Theory and simulation evidence suggest that this: • speeds up convergence to the correct graph, and • addresses underfitting on small-to-medium samples. THE END
