Learning Bayes Nets Based on Conditional Dependencies
Oliver Schulte
Department of Philosophy and School of Computing Science
Simon Fraser University, Vancouver, Canada
oschulte@sfu.ca
with Wei Luo (Simon Fraser) and Russ Greiner (U of Alberta)
Outline
• Brief Intro to Bayes Nets
• Combining Dependency Information with Model Selection
• Learning from Dependency Data Only: Learning-Theoretic Analysis
Bayes Nets: Overview
• Bayes net structure = directed acyclic graph (DAG).
• Nodes = variables of interest.
• Arcs = direct “influence”, “association”.
• Parameters = CP tables = probability of child given parents.
• Structure represents (in)dependencies.
• Structure + parameters represent the joint probability distribution over the variables.
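To make the last point concrete, here is a minimal sketch (names and numbers are illustrative, not from the talk) of a Bayes net as a parents map plus CP tables, with the joint probability computed as the product of child-given-parents factors:

```python
# A chain A -> B -> C over binary variables. The joint distribution
# factors as P(A,B,C) = P(A) * P(B|A) * P(C|B).
parents = {"A": [], "B": ["A"], "C": ["B"]}
cpt = {
    # key: (node, value, tuple of parent values) -> probability
    ("A", 1, ()): 0.3,   ("A", 0, ()): 0.7,
    ("B", 1, (1,)): 0.9, ("B", 0, (1,)): 0.1,
    ("B", 1, (0,)): 0.2, ("B", 0, (0,)): 0.8,
    ("C", 1, (1,)): 0.8, ("C", 0, (1,)): 0.2,
    ("C", 1, (0,)): 0.1, ("C", 0, (0,)): 0.9,
}

def joint(assignment):
    """P(assignment) = product over nodes of P(node | its parents)."""
    p = 1.0
    for node, pa in parents.items():
        pa_vals = tuple(assignment[q] for q in pa)
        p *= cpt[(node, assignment[node], pa_vals)]
    return p

print(joint({"A": 1, "B": 1, "C": 0}))  # 0.3 * 0.9 * 0.2 = 0.054
```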
Examples from CIspace (UBC)
Graphs Entail Dependencies
[Figure: example graphs over nodes A, B, C with their entailed dependencies]
• The collider A → B ← C entails Dep(A,B), Dep(A,B|C), Dep(B,C), Dep(B,C|A), Dep(A,C|B).
• The single edge A → B (with C disconnected) entails only Dep(A,B), Dep(A,B|C).
I-maps and Probability Distributions
• Defn: Graph G is an I-map of prob dist P iff: if Dependent(X,Y|S) in P, then X is d-connected to Y given S in G.
• Example: If Dependent(Father Eye Color, Mother Eye Color | Child Eye Color) in P, then Father EC is d-connected to Mother EC given Child EC in G.
• Informally, G is an I-map of P iff G entails all conditional dependencies in P.
• Theorem: Fix G, P. There is a parameter setting θ for G such that (G, θ) represents P iff G is an I-map of P.
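The d-connection relation used in this definition can be decided with the standard moral-graph criterion: X is d-connected to Y given S iff X and Y are still connected after restricting to the ancestors of X, Y, S, moralizing (marrying co-parents, dropping directions), and deleting S. A self-contained sketch (helper names are mine, not from the talk):

```python
from itertools import combinations

def ancestors(parents, nodes):
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_connected(parents, x, y, s):
    keep = ancestors(parents, {x, y} | set(s))
    edges = set()
    for child in keep:
        pas = [p for p in parents.get(child, []) if p in keep]
        edges |= {frozenset((p, child)) for p in pas}
        edges |= {frozenset(pair) for pair in combinations(pas, 2)}  # marry parents
    # breadth-first search in the moral graph, never entering nodes of s
    frontier, seen = {x}, {x}
    while frontier:
        nxt = set()
        for e in edges:
            a, b = tuple(e)
            for u, v in ((a, b), (b, a)):
                if u in frontier and v not in seen and v not in s:
                    nxt.add(v)
                    seen.add(v)
        frontier = nxt
    return y in seen

# Collider A -> B <- C: A and C are d-connected given B, but not marginally.
parents = {"A": [], "C": [], "B": ["A", "C"]}
print(d_connected(parents, "A", "C", set()))  # False
print(d_connected(parents, "A", "C", {"B"}))  # True
```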
Two Approaches to Learning Bayes Net Structure
Aim: find G that represents P with suitable parameters.
• “Search and score”: select a graph G as a “model” with parameters to be estimated.
• “Test and cover”: find a G that covers the dependencies in P.
Our Hybrid Approach
Sample → Set of Dependencies → Final Output Graph
The final selected graph maximizes a model selection score and covers all observed dependencies.
Definition of Hybrid Criterion
• Let d be a sample and let S(G,d) be a score function.
• Let Dep be a set of conditional dependencies extracted from sample d.
• Graph G optimizes score S given Dep and sample d iff:
  1. G entails the dependencies Dep, and
  2. if any other graph G’ entails Dep, then S(G,d) ≥ S(G’,d).
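As a toy illustration (not the paper’s algorithm), the criterion can be phrased as constrained maximization over an explicit candidate set; `entails` and `score` are stand-ins for a d-connection test and a model selection score such as BDeu:

```python
def hybrid_select(candidates, dep, score, entails):
    """Among graphs entailing every observed dependency, take the best score."""
    feasible = [g for g in candidates if all(entails(g, d) for d in dep)]
    return max(feasible, key=score) if feasible else None
```

In practice the candidate set is never enumerated explicitly; the constrained local search described on the next slides plays that role.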
Local Search Heuristics for Constrained Search
• There is a general method for adapting any local search heuristic to accommodate observed dependencies.
• We present an adaptation of GES search, called IGES.
GES Search (Meek, Chickering)
• Growth phase: add edges, moving to the best-scoring neighbor.
• Shrink phase: delete edges while the score improves.
[Figure: example run over graphs on A, B, C with scores 5, 7, 8, 9, 8.5]
IGES Search
• Step 1: extract dependencies from the sample with a testing procedure.
• Continue the Growth Phase until all dependencies are covered.
• During the Shrink Phase, delete an edge only if the dependencies remain covered.
[Figure: example cases; given Dep(A,B), a score-5 graph covering the dependency is kept over a score-7 graph that does not]
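A schematic of that control flow, under assumed helper names (`grow` and `shrink` generate GES-style neighbors, `covers(g, dep)` tests whether g entails the observed dependencies); this is a sketch, not the paper’s implementation:

```python
def iges(g, dep, score, grow, shrink, covers):
    # Growth phase: as in GES, move to the best score-improving neighbor,
    # but keep adding edges until all observed dependencies are covered.
    while True:
        candidates = list(grow(g))
        better = [h for h in candidates if score(h) > score(g)]
        if better:
            g = max(better, key=score)
        elif candidates and not covers(g, dep):
            g = max(candidates, key=score)  # forced addition to cover Dep
        else:
            break
    # Shrink phase: delete edges only while the score improves AND the
    # observed dependencies stay covered.
    while True:
        better = [h for h in shrink(g)
                  if score(h) > score(g) and covers(h, dep)]
        if not better:
            return g
        g = max(better, key=score)
```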
Asymptotic Equivalence GES = IGES
Theorem: Assume that the score function S is consistent and that the joint probability distribution P satisfies the composition principle. Let Dep be a set of dependencies true of P. Then with P-probability 1, GES and IGES+Dep converge to the same output in the sample size limit.
• So IGES inherits the convergence properties of GES.
Extracting Dependencies
• We use the χ² test (with a cell coverage condition).
• Exhaustive testing of all triples Indep(X,Y|S) for cardinality(S) < k, with k chosen by the user (see the sketch below).
• A more sophisticated testing strategy is coming soon.
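A sketch of such an extraction step, assuming pandas/scipy and approximating the cell-coverage condition by skipping strata too sparse to test (my simplification, not the paper’s exact condition):

```python
from itertools import combinations
import pandas as pd
from scipy.stats import chi2, chi2_contingency

def ci_pvalue(df, x, y, s):
    """Stratified chi-square test of Indep(x, y | s) on DataFrame df."""
    stat = dof = 0
    groups = df.groupby(list(s)) if s else [(None, df)]
    for _, stratum in groups:
        table = pd.crosstab(stratum[x], stratum[y])
        if table.shape[0] < 2 or table.shape[1] < 2:
            continue  # stratum too sparse to test
        c, _, d, _ = chi2_contingency(table)
        stat, dof = stat + c, dof + d
    return chi2.sf(stat, dof) if dof else 1.0

def extract_dependencies(df, k, alpha=0.05):
    """All (x, y, s) with |s| < k where the test rejects independence."""
    deps, cols = [], list(df.columns)
    for x, y in combinations(cols, 2):
        rest = [v for v in cols if v not in (x, y)]
        for size in range(k):
            for s in combinations(rest, size):
                if ci_pvalue(df, x, y, s) < alpha:
                    deps.append((x, y, s))
    return deps
```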
Simulation Setup: Methods
• The hybrid approach is a general schema. Our setup:
• Statistical test: χ²
• Score S: BDeu (with Tetrad default settings)
• Search method: GES, adapted as above
Simulation Setup: Graphs and Data
• Random DAGs with binary variables.
• #Nodes: 4, 6, 8, 10.
• Sample sizes: 100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600.
• 10 random samples per graph per sample size; results averaged.
• Graphs generated with Tetrad’s random DAG utility (a toy version is sketched below).
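A toy version of such a data generator (illustrative parameters; the talk uses Tetrad’s utility, not this code): a random DAG over binary variables with edges only from earlier to later nodes, random CP tables, and forward sampling.

```python
import random
from itertools import product

def random_dag(n, p_edge=0.3):
    # parents only from earlier nodes, so the graph is acyclic
    return {j: [i for i in range(j) if random.random() < p_edge]
            for j in range(n)}

def random_cpts(dag):
    # one P(node = 1 | parent assignment) entry per parent configuration
    return {(j, pa): random.random()
            for j in dag for pa in product((0, 1), repeat=len(dag[j]))}

def forward_sample(dag, cpt):
    x = {}
    for node in sorted(dag):  # node order is topological by construction
        pa = tuple(x[p] for p in dag[node])
        x[node] = int(random.random() < cpt[(node, pa)])
    return x

dag = random_dag(6)
cpt = random_cpts(dag)
data = [forward_sample(dag, cpt) for _ in range(800)]
```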
Conclusion for I-map Learning: The Underfitting Zone
• Although not explicitly designed to cover statistically significant correlations, GES+BDeu does so pretty well.
• But not perfectly, so IGES helps to add in missing edges (on the order of 5 for 10-node graphs).
[Figure: divergence from the true graph vs. sample size for standard search+score and constrained search+score; small samples: little significance; medium samples: underfitting of correlations; large samples: convergence zone]
Part II: Learning-Theoretic Model (COLT 2007)
• Learning model: the learner receives an increasing enumeration (list) of conditional dependency statements.
• Data repetition is possible.
• The learner outputs a graph (pattern); it may output “?”.
[Figure: data stream Dep(A,B), Dep(B,C), Dep(A,C|B), … with the learner’s sequence of conjectures]
Criteria for Optimal Learning
1. Convergence: the learner must eventually settle on the true graph.
2. The learner must minimize mind changes.
3. Given 1 and 2, the learner is not dominated in convergence time.
The Optimal Learning Procedure
Theorem: There is a unique optimal learner, defined as follows:
• If there is a unique graph G covering the observed dependencies with a minimum number of adjacencies, output G.
• Otherwise output “?”.
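A brute-force sketch of this learner for tiny graphs, reusing `d_connected` from the d-separation sketch above. It enumerates all DAGs (exponential, illustration only) and, as a simplification of the theorem’s uniqueness condition, checks uniqueness of the skeleton rather than of the full pattern (which would also compare v-structures):

```python
from itertools import combinations, product

def all_dags(nodes):
    # Each unordered pair is absent, oriented one way, or the other;
    # keep only the acyclic orientations.
    pairs = list(combinations(nodes, 2))
    for choice in product((None, ">", "<"), repeat=len(pairs)):
        parents = {v: [] for v in nodes}
        for (a, b), c in zip(pairs, choice):
            if c == ">":
                parents[b].append(a)
            elif c == "<":
                parents[a].append(b)
        if is_acyclic(parents):
            yield parents

def is_acyclic(parents):
    done, active = set(), set()
    def visit(v):
        if v in done:
            return True
        if v in active:
            return False  # back edge: cycle
        active.add(v)
        ok = all(visit(p) for p in parents[v])
        active.discard(v)
        done.add(v)
        return ok
    return all(visit(v) for v in parents)

def skeleton(parents):
    return frozenset(frozenset((c, p)) for c in parents for p in parents[c])

def optimal_learner(nodes, deps):
    covering = [g for g in all_dags(nodes)
                if all(d_connected(g, x, y, set(s)) for x, y, s in deps)]
    if not covering:
        return "?"
    fewest = min(len(skeleton(g)) for g in covering)
    minimal = [g for g in covering if len(skeleton(g)) == fewest]
    return minimal[0] if len({skeleton(g) for g in minimal}) == 1 else "?"

# All dependencies entailed by the collider A -> B <- C:
deps = [("A", "B", ()), ("A", "B", ("C",)), ("B", "C", ()),
        ("B", "C", ("A",)), ("A", "C", ("B",))]
print(optimal_learner(["A", "B", "C"], deps))
# -> {'A': [], 'B': ['A', 'C'], 'C': []}, i.e. the collider, found uniquely
```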
Computational Complexity of the Unique Optimal Learner
Theorem: The following problem is NP-hard: decide if there is a unique edge-minimal map for a set of dependencies D; if yes, output the graph.
Proof: reduction from Unique Exact 3-Set Cover.
Example: universe x1, …, x9 with sets {x1,x2,x3}, {x3,x4,x5}, {x4,x5,x7}, {x2,x4,x5}, {x3,x6,x9}, {x6,x8,x9}; the exact cover is {x1,x2,x3}, {x4,x5,x7}, {x3,x6,x9}.
Hybrid Method and Optimal Learner
• Score-based methods tend to underfit (with discrete variables): they place edges correctly but too few, so they are mind-change optimal but not convergence-time optimal.
• The hybrid method speeds up convergence.
A New Testing Strategy
• Say that a graph G satisfies the Markov condition wrt sample d iff for all X, Y: if Y is a nonparental nondescendant of X, then we do not find Dep(X, Y | parents(X)).
• Given sample d, look for a graph G that satisfies the MC wrt d with a minimum number of adjacencies (see the sketch below).
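A sketch of this check, reusing `ci_pvalue` from the χ² sketch above; note that it needs at most one test per ordered pair of variables, which is what makes the quadratic bound on the next slide plausible:

```python
def descendants(parents, x):
    """All strict descendants of x in the DAG given as a parents map."""
    children = {v: [c for c in parents if v in parents[c]] for v in parents}
    out, stack = set(), [x]
    while stack:
        for c in children[stack.pop()]:
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def satisfies_markov_condition(parents, df, alpha=0.05):
    # For every X and every nonparental nondescendant Y of X, the test
    # must NOT find Dep(X, Y | parents(X)).
    for x in parents:
        banned = descendants(parents, x) | set(parents[x]) | {x}
        for y in parents:
            if y in banned:
                continue
            if ci_pvalue(df, x, y, tuple(parents[x])) < alpha:
                return False
    return True
```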
Future Work
• Use the Markov condition to develop a local search algorithm for score optimization requiring only (#Var)² tests.
• Apply the idea of Markov condition + edge minimization to continuous variable models.
Summary: Hybrid Criterion (test, search and score)
• Basic idea: base Bayes net learning on dependencies that can be reliably obtained even on small to medium sample sizes.
• Hybrid criterion: find the graph that maximizes a model selection score under the constraint of entailing the statistically significant dependencies or correlations.
• Theory + simulation evidence suggests that this:
  • speeds up convergence to the correct graph, and
  • addresses underfitting on small-to-medium samples.
Summary: Learning-Theoretic Analysis
• Learning model: learn a graph from dependencies alone.
• Optimal method: look for the graph that covers the observed dependencies with a minimum number of adjacencies.
• Implementing this method is NP-hard.
References
O. Schulte, W. Luo and R. Greiner (2007). “Mind Change Optimal Learning of Bayes Net Structure”. Conference on Learning Theory (COLT).
THE END