Learning Bayesian Network Structure from Massive Datasets: The “Sparse Candidate” Algorithm
Nir Friedman, Iftach Nachman, and Dana Peer
Announcer: Kyu-Baek Hwang
Abstract
• Learning Bayesian network structure
  • An optimization problem (in machine learning)
  • A constraint-satisfaction problem (in statistics)
• The search space is extremely large.
• The search procedure spends most of its time examining extremely unreasonable candidate structures.
• If we can reduce the search space, faster learning becomes possible.
  • Restrict the set of candidate parent variables for each variable.
• Application: bioinformatics
Learning Bayesian Network Structures
• Constraint satisfaction problem
  • χ²-test of independence
• Optimization problem
  • Scoring metrics: BDe, MDL
  • Learning means finding the structure that maximizes the score.
• Search technique
  • Generally NP-hard
  • Greedy hill-climbing, simulated annealing
  • O(n²) possible local changes at each step
• When both the number of examples and the number of attributes are large, the computational cost becomes too high to obtain a result in reasonable time.
Combining Statistical Properties
• Most of the candidates considered during the search can be eliminated in advance, based on our statistical understanding of the domain.
  • If X and Y are almost independent in the data, we might decide not to consider Y as a parent of X.
  • Measure: mutual information
• Restrict the possible parents of each variable to a small candidate set of size k, with k << n − 1.
• The key idea is to use the network structure found at the previous stage to find better candidate parents (see the sketch below).
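A minimal sketch of the first Restrict step, assuming discrete data held in a pandas DataFrame; the helper names (`mutual_information`, `select_candidates`) and the use of natural logarithms are illustrative, not taken from the paper.

```python
import numpy as np
import pandas as pd

def mutual_information(df: pd.DataFrame, x: str, y: str) -> float:
    """Empirical mutual information I(X;Y) between two discrete columns."""
    joint = df.groupby([x, y]).size() / len(df)    # empirical P(x, y)
    px = df[x].value_counts(normalize=True)        # empirical P(x)
    py = df[y].value_counts(normalize=True)        # empirical P(y)
    mi = 0.0
    for (xv, yv), pxy in joint.items():
        mi += pxy * np.log(pxy / (px[xv] * py[yv]))
    return mi

def select_candidates(df: pd.DataFrame, k: int) -> dict:
    """For each variable X, keep the k variables with highest I(X;Y) as candidate parents."""
    candidates = {}
    for x in df.columns:
        scores = {y: mutual_information(df, x, y) for y in df.columns if y != x}
        candidates[x] = sorted(scores, key=scores.get, reverse=True)[:k]
    return candidates
```

With k << n − 1 this shrinks the pool of potential parents of each variable from all other variables to a fixed-size candidate list.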
Background
• A Bayesian network over X = {X1, X2, …, Xn}
  • B = <G, Θ>
• The problem of learning a Bayesian network
  • Given a training set D = {x1, x2, …, xN},
  • find the network B that best matches D.
  • Scoring metrics: BDe, MDL
  • Both decompose over the variables: Score(G : D) = Σi Score(Xi | Pa(Xi) : N(Xi, Pa(Xi))), where N(Xi, Pa(Xi)) are the sufficient statistics (counts) over Xi and its parents.
• Greedy hill-climbing search
  • At each step, all possible local changes are examined and the change that brings the maximal gain in score is applied.
  • Computing the sufficient statistics is the computational bottleneck.
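A hedged sketch of greedy hill-climbing with a decomposable score, to make the local-move idea concrete. Here `local_score(x, parent_set)` stands for any family score (BDe, MDL, …); passing `candidates[x]` = all other variables gives the unconstrained search, while small candidate sets give the restricted search discussed later. Arc reversal is omitted for brevity.

```python
def greedy_hill_climb(variables, local_score, candidates, max_steps=1000):
    """Greedy hill-climbing over single-arc additions and deletions.

    local_score(x, parent_set) is assumed to be a decomposable family score;
    candidates[x] lists the variables allowed as parents of x.
    """
    parents = {x: set() for x in variables}              # start from the empty graph
    score = {x: local_score(x, parents[x]) for x in variables}

    def is_ancestor(a, b):
        """True if a directed path a -> ... -> b already exists."""
        stack, seen = [b], set()
        while stack:
            v = stack.pop()
            for p in parents[v]:
                if p == a:
                    return True
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return False

    for _ in range(max_steps):
        best_gain, best_move = 0.0, None
        for x in variables:
            for p in list(parents[x]):                   # arc deletions
                gain = local_score(x, parents[x] - {p}) - score[x]
                if gain > best_gain:
                    best_gain, best_move = gain, (x, parents[x] - {p})
            for p in candidates[x]:                      # arc additions within the candidate set
                if p == x or p in parents[x] or is_ancestor(x, p):
                    continue                             # self-loop, duplicate, or would create a cycle
                gain = local_score(x, parents[x] | {p}) - score[x]
                if gain > best_gain:
                    best_gain, best_move = gain, (x, parents[x] | {p})
        if best_move is None:                            # no local change improves the score
            break
        x, new_parents = best_move
        parents[x], score[x] = new_parents, score[x] + best_gain
    return parents, sum(score.values())
```

Because the score decomposes, only the family score of the variable whose parent set changed needs to be recomputed, which is why caching sufficient statistics matters so much.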
Simple Intuitions
• Use mutual information or correlation.
  • If the true structure is X -> Y -> Z, then
  • I(X;Z) > 0, I(Y;Z) > 0, I(X;Y) > 0, and I(X;Z|Y) = 0.
• Basic idea of the “Sparse Candidate” algorithm
  • For each variable X, find a set of variables Y1, Y2, …, Yk that are the most promising candidate parents of X.
  • This gives us a much smaller search space.
• The main drawback of this idea
  • A mistake in the initial stage can lead us to an inferior-scoring network.
  • Remedy: iterate the basic procedure, using the previously constructed network to reconsider the candidate parents (see the sketch below).
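The iteration described above, written as a skeleton. It reuses the `greedy_hill_climb` sketch from the previous slide; `rank_potential_parents` is a hypothetical helper standing in for the mutual-information or discrepancy ranking.

```python
def sparse_candidate(variables, data, local_score, k, max_rounds=10):
    """Alternate a Restrict step (pick k candidate parents per variable) with a
    Maximize step (hill-climb inside the restricted space)."""
    parents = {x: set() for x in variables}
    prev_score = float("-inf")
    for _ in range(max_rounds):
        # Restrict: rank potential parents (mutual information in the first
        # round, a discrepancy measure afterwards) and keep the top k, always
        # retaining the current parents so the score cannot decrease.
        candidates = {}
        for x in variables:
            ranked = rank_potential_parents(x, data, parents)   # hypothetical helper
            kept = list(parents[x]) + [y for y in ranked if y not in parents[x]]
            candidates[x] = kept[:max(k, len(parents[x]))]
        # Maximize: search for a high-scoring network inside the candidate sets.
        parents, score = greedy_hill_climb(variables, local_score, candidates)
        if score <= prev_score:                                 # stopping criterion
            break
        prev_score = score
    return parents
```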
Convergence Properties of the Sparse Candidate Algorithm
• We require that, in the Restrict step, the selected candidates for Xi’s parents include Xi’s current parents:
  • Pa_Gn(Xi) ⊆ Ci^(n+1)
• This requirement implies that the winning network Bn is still a legal structure in the n + 1 iteration, so
  • Score(Bn+1 | D) ≥ Score(Bn | D)
• Stopping criterion
  • Score(Bn) = Score(Bn−1)
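Spelled out as a one-line argument (a sketch; the last step assumes the Maximize step never returns a network scoring worse than the structure Bn it is allowed to start from):

```latex
\mathrm{Pa}_{G_n}(X_i) \subseteq C_i^{\,n+1}\ \ \forall i
\;\Longrightarrow\; B_n \text{ is a legal structure at iteration } n+1
\;\Longrightarrow\; \mathrm{Score}(B_{n+1} \mid D) \;\ge\; \mathrm{Score}(B_n \mid D).
```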
Mutual Information
• Mutual information (see the definition below)
• Example: [figure: a small network over the variables A, B, C, and D]
  • I(A;C) > I(A;D) > I(A;B)
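For reference, the standard empirical mutual information used to rank candidate parents (hats denote empirical frequencies in D):

```latex
I(X;Y) \;=\; \sum_{x,y} \hat{P}(x,y)\,\log\frac{\hat{P}(x,y)}{\hat{P}(x)\,\hat{P}(y)}
```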
Discrepancy Test
• The initial iteration uses mutual information; subsequent iterations use a discrepancy measure between the data and the current network.
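One way to write such a discrepancy measure, as a sketch: the KL divergence between the empirical pairwise distribution and the pairwise distribution the current network B predicts (notation assumed, not copied from the slide):

```latex
M_{\mathrm{Disc}}(X_i, X_j \mid B)
\;=\; D_{\mathrm{KL}}\!\left(\hat{P}(X_i, X_j)\,\big\|\,P_B(X_i, X_j)\right)
\;=\; \sum_{x_i, x_j} \hat{P}(x_i, x_j)\,\log\frac{\hat{P}(x_i, x_j)}{P_B(x_i, x_j)}
```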
Other Tests
• Conditional mutual information
• Penalizing structures with more parameters
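The conditional variant, for reference; taking the conditioning set Z to be the variable's current parents is an assumption here, not a quotation from the slide:

```latex
I(X;Y \mid \mathbf{Z}) \;=\; \sum_{x,y,\mathbf{z}} \hat{P}(x,y,\mathbf{z})\,
\log\frac{\hat{P}(x,y \mid \mathbf{z})}{\hat{P}(x \mid \mathbf{z})\,\hat{P}(y \mid \mathbf{z})}
```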
Learning with Small Candidate Sets
• Standard heuristics
  • Unconstrained
    • Space: O(C(n, k))
    • Time: O(n²)
  • Constrained by small candidate sets
    • Space: O(2^k)
    • Time: O(kn)
• Divide-and-conquer heuristics
Strongly Connected Components
• Decomposing the candidate graph H into strongly connected components takes linear time (a sketch follows below).
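A self-contained linear-time SCC decomposition (Kosaraju's algorithm) that could serve this step; representing the candidate graph H as a dict of successor lists is an assumption made for the sketch.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm. `graph` maps every node to a list of successors
    (every node must appear as a key); returns a list of SCCs as sets."""
    # Pass 1: record nodes in order of completed (post-order) DFS finish time.
    visited, order = set(), []
    for start in graph:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, it = stack[-1]
            advanced = False
            for nxt in it:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    advanced = True
                    break
            if not advanced:
                order.append(node)
                stack.pop()
    # Pass 2: DFS on the transposed graph in reverse finish order.
    transposed = {u: [] for u in graph}
    for u, succs in graph.items():
        for v in succs:
            transposed[v].append(u)
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        component, stack = set(), [start]
        assigned.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in transposed[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components
```

Each component can then be handled separately, since the graph of components is itself acyclic.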
Separator Decomposition
• [Figure: the candidate graph H split by a separator S into components H1 and H2, with sub-clusters H’1, H’2 and variables X, Y]
• The bottleneck is S.
• We can order the variables in S to disallow any cycle in H1 ∪ H2.
Conclusions
• Sparse candidate sets enable an efficient search for good structures.
• A better selection criterion is still needed.
• The authors applied these techniques to Spellman’s cell-cycle data.
• Exploiting the network structure to search within H needs to be improved.