
Learning Bayesian Network Structure from Massive Datasets: The “Sparse Candidate” Algorithm






Presentation Transcript


  1. Learning Bayesian Network Structure from Massive Datasets: The “Sparse Candidate” Algorithm Nir Friedman, Iftach Nachman, and Dana Pe'er Presenter: Kyu-Baek Hwang

  2. Abstract • Learning Bayesian networks: an optimization problem (in machine learning) or a constraint-satisfaction problem (in statistics) • The search space is extremely large. • The search procedure spends most of its time examining extremely unreasonable candidate structures. • If we can reduce the search space, faster learning becomes possible. • Idea: restrict the set of candidate parent variables for each variable. • Application area: bioinformatics

  3. Learning Bayesian Network Structures • Constraint-satisfaction approach • χ²-test • Optimization approach • BDe, MDL scores • Learning means finding the structure that maximizes these scores. • Search technique • Generally NP-hard • Greedy hill-climbing, simulated annealing • O(n²) candidate moves per step • If the number of examples and the number of attributes are large, the computational cost is too high to obtain results tractably.

  4. Combining Statistical Properties • Most of the candidates considered during the search procedure can be eliminated in advance based on our statistical understanding of the domain. • If X and Y are almost independent in the data, we might decide not to consider Y as a parent of X. • Measured by mutual information • Restrict the possible parents of each variable to k candidates, with k ≪ n − 1. • The key idea is to use the network structure found at the previous stage to find better candidate parents (see the sketch below).
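A minimal Python sketch of this Restrict-style filtering, assuming discrete data in a NumPy array with one column per variable; the names mutual_information and select_candidates are illustrative, not from the paper:

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Empirical mutual information between two discrete data columns."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # P(a,b) * log( P(a,b) / (P(a) P(b)) ), using counts c, px[a], py[b]
        mi += (c / n) * np.log(c * n / (px[a] * py[b]))
    return mi

def select_candidates(data, i, k):
    """Pick the k variables with the highest MI with X_i as candidate parents."""
    scores = [(mutual_information(data[:, i], data[:, j]), j)
              for j in range(data.shape[1]) if j != i]
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]
```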

  5. Background • A Bayesian network for X = {X₁, X₂, …, Xₙ} • B = ⟨G, Θ⟩ • The problem of learning a Bayesian network • Given a training set D = {x₁, x₂, …, x_N} of instances, • find the B that best matches D. • BDe, MDL scores • Score(G : D) = Σᵢ Score(Xᵢ | Pa(Xᵢ) : N_{Xᵢ, Pa(Xᵢ)}), where the N terms are sufficient statistics • Greedy hill-climbing search • At each step, all possible local changes are examined and the change that brings the maximal gain in score is selected. • Calculation of sufficient statistics is the computational bottleneck.
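Because the score decomposes over families, a local move only requires re-scoring the one family it changes. A sketch of greedy hill-climbing under that assumption; family_score (a BDe or MDL local score) is assumed given, and edge reversal is omitted for brevity:

```python
def creates_cycle(parents, child, new_parent):
    """Adding new_parent -> child closes a cycle iff child is already
    an ancestor of new_parent."""
    stack, seen = [new_parent], set()
    while stack:
        v = stack.pop()
        if v == child:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False

def greedy_hill_climb(n, family_score, candidates):
    """Greedy search over DAGs whose parents are drawn from candidates[i].
    Only edge additions and deletions are considered in this sketch."""
    parents = {i: frozenset() for i in range(n)}
    score = {i: family_score(i, parents[i]) for i in range(n)}
    while True:
        best_gain, best_move = 0.0, None
        for i in range(n):
            for j in candidates[i]:
                if j in parents[i]:
                    new = parents[i] - {j}          # delete edge j -> i
                elif not creates_cycle(parents, i, j):
                    new = parents[i] | {j}          # add edge j -> i
                else:
                    continue
                gain = family_score(i, new) - score[i]
                if gain > best_gain:
                    best_gain, best_move = gain, (i, new)
        if best_move is None:                       # local maximum reached
            return parents
        i, new = best_move
        parents[i], score[i] = new, score[i] + best_gain
```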

  6. Simple Intuitions • Using mutual information or correlation • If the true structure is X → Y → Z, • I(X;Z) > 0, I(Y;Z) > 0, I(X;Y) > 0, and I(X;Z | Y) = 0 • Basic idea of the “Sparse Candidate” algorithm • For each variable X, find a set of variables Y₁, Y₂, …, Y_k that are the most promising candidate parents for X. • This gives a smaller search space. • The main drawback of this idea • A mistake in the initial stage can lead to an inferior-scoring network. • Remedy: iterate the basic procedure, using the previously constructed network to reconsider the candidate parents.

  7. Outline of the Sparse Candidate Algorithm
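The outline on this slide was a figure that did not survive the transcript. Pieced together from slides 6 and 8, the top-level loop looks roughly like the sketch below (not the authors' pseudocode); it reuses the illustrative greedy_hill_climb and family_score from above, and restrict stands for any of the selection measures on slides 9–11, here also given the current network:

```python
def sparse_candidate(data, k, family_score, restrict, max_iter=20):
    """Alternate a Restrict step (choose <= k candidate parents per variable,
    always keeping the current parents) with a Maximize step (search for the
    best network within those candidates). Stop when the score stops improving."""
    n = data.shape[1]
    network, prev_score = None, float("-inf")
    for _ in range(max_iter):
        # Restrict: candidates[i] must contain X_i's current parents, which
        # guarantees Score(B_{n+1}) >= Score(B_n) across iterations (slide 8).
        # restrict() must handle network=None on the first iteration.
        candidates = [restrict(data, i, k, network) for i in range(n)]
        network = greedy_hill_climb(n, family_score, candidates)
        score = sum(family_score(i, network[i]) for i in range(n))
        if score <= prev_score:        # stopping criterion: no improvement
            break
        prev_score = score
    return network
```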

  8. Convergence Properties of the Sparse Candidate Algorithm • We require that in the Restrict step, the selected candidates for Xᵢ's parents include Xᵢ's current parents: • Pa_{Gₙ}(Xᵢ) ⊆ Cᵢⁿ⁺¹ • This requirement implies that the winning network Bₙ is a legal structure in the (n + 1)-th iteration: • Score(Bₙ₊₁ | D) ≥ Score(Bₙ | D) • Stopping criterion • Score(Bₙ) = Score(Bₙ₋₁)

  9. Mutual Information • Mutual information (definition below) • Example, for a network over the variables A, B, C, D: I(A;C) > I(A;D) > I(A;B) [Figure: example network over A, B, C, D]
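The formula image is missing from the transcript; the measure intended is the standard empirical mutual information:

```latex
I(X;Y) = \sum_{x,y} \hat{P}(x,y)\,\log\frac{\hat{P}(x,y)}{\hat{P}(x)\,\hat{P}(y)}
```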

  10. Discrepancy Test • The initial iteration uses mutual information; subsequent iterations use the discrepancy measure.
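The discrepancy formula is likewise missing from the transcript; as described in the paper, it measures how far the empirical joint distribution of a pair is from the joint distribution the current network B assigns (reconstructed here, so treat the exact notation as approximate):

```latex
M_{\mathrm{disc}}(X_i, X_j \mid B)
  = D_{\mathrm{KL}}\!\left(\hat{P}(X_i, X_j)\,\middle\|\,P_B(X_i, X_j)\right)
```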

  11. Other tests • Conditional mutual information • Penalizing structures with more parameters
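For reference, the conditional mutual information mentioned above is the standard quantity below, where the conditioning set Z would be, e.g., Xᵢ's current parents:

```latex
I(X;Y \mid \mathbf{Z}) = \sum_{\mathbf{z}} \hat{P}(\mathbf{z}) \sum_{x,y}
  \hat{P}(x,y \mid \mathbf{z})\,
  \log\frac{\hat{P}(x,y \mid \mathbf{z})}{\hat{P}(x \mid \mathbf{z})\,\hat{P}(y \mid \mathbf{z})}
```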

  12. Learning with Small Candidate Sets • Standard heuristics • Unconstrained • Space: O(C(n, k)) parent sets per variable • Time: O(n²) moves per step • Constrained by small candidate sets • Space: O(2ᵏ) parent sets per variable • Time: O(kn) moves per step • Divide-and-conquer heuristics (see the example below)
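To make these counts concrete, a quick back-of-the-envelope comparison for, say, n = 100 variables and k = 5 candidates (illustrative numbers, not from the paper):

```python
from math import comb

n, k = 100, 5
print(comb(n - 1, k))  # 71,523,144 size-k parent sets per variable, unconstrained
print(2 ** k)          # 32 parent sets per variable within its candidate set
print(n * (n - 1))     # 9,900 single-edge moves per search step, unconstrained
print(k * n)           # 500 single-edge moves per step with candidate sets
```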

  13. Strongly Connected Components • H is the directed graph over the variables with an edge Y → X whenever Y is a candidate parent of X. • Decomposing H into strongly connected components takes linear time.
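A sketch of this decomposition using networkx; the construction of H follows the definition above, and candidates[i] is the illustrative candidate list from earlier:

```python
import networkx as nx

def scc_decompose(candidates):
    """Build the candidate graph H (edge j -> i when j is a candidate parent
    of X_i) and split it into strongly connected components. Edges between
    different components can never lie on a directed cycle, so each component
    can be searched separately."""
    H = nx.DiGraph()
    for i, cand in enumerate(candidates):
        H.add_node(i)
        for j in cand:
            H.add_edge(j, i)
    return list(nx.strongly_connected_components(H))
```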

  14. Separator Decomposition [Figure: graph decomposed into components H₁, H₁′, H₂, H₂′ joined by a separator S, with variables X and Y] • The bottleneck is the separator S. • We can order the variables in S so as to disallow any cycle in H₁ ∪ H₂.

  15. Experiments on Synthetic Data

  16. Experiments on Real-Life Data

  17. Conclusions • Sparse candidate sets enable an efficient search for good structures. • A better candidate-selection criterion is still needed. • The authors applied these techniques to Spellman's cell-cycle data. • The exploitation of the network structure when searching within H needs to be improved.
