
Finding Optimal Bayesian Networks with Greedy Search

Max Chickering


Presentation Transcript


  1. Finding Optimal Bayesian Networks with Greedy Search Max Chickering

  2. Reasoning Under Uncertainty. Print Troubleshooter (Win95, Win2k, WinXP). [Figure: Bayesian network with nodes Network Up, Net/Local Printing, Correct Local Port, Correct Printer Path, Net Path OK, PC to Printer Transport OK, Local Path OK, Local Cable Connected, Net Cable Connected, Paper Loaded, Printer Data OK, Printer On and Online, Printer Memory Adequate, Print Output OK]

  3. Troubleshooters (Win95 on)

  4. Answer Wizard (Office 95 on)

  5. Machine Learning and Applied Statistics Group Data • Applications • Commerce Server • SQL Server • Spam Detection • Machine Translation • Analysis of Web Data

  6. Outline • Bayesian-Networks • Learning • Greedy Equivalence Search (GES) • Optimality of GES • (Details of Meek’s Conjecture)

  7. Bayesian Networks. Use B = (S,θ) to represent p(X1, …, Xn). Structural component S is a DAG. Parameters θ specify local probability distributions.
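
The representation on this slide can be illustrated with a minimal sketch: the joint probability is the product of each variable's local distribution given its parents in S. The three-variable network, its binary variables, and all the numbers below are made up for illustration and do not come from the talk.

```python
from itertools import product

# Hypothetical structure S: X -> Y -> Z, all variables binary.
parents = {"X": [], "Y": ["X"], "Z": ["Y"]}

# Hypothetical parameters theta: theta[v][parent_values] = P(v = 1 | parents).
theta = {
    "X": {(): 0.3},
    "Y": {(0,): 0.2, (1,): 0.9},
    "Z": {(0,): 0.5, (1,): 0.1},
}

def joint(assignment):
    """p(x1, ..., xn) as the product of p(x_i | parents of x_i)."""
    prob = 1.0
    for v, pa in parents.items():
        p_one = theta[v][tuple(assignment[u] for u in pa)]
        prob *= p_one if assignment[v] == 1 else 1.0 - p_one
    return prob

# Summing the joint over all assignments should give 1.
total = sum(joint(dict(zip("XYZ", vals))) for vals in product([0, 1], repeat=3))
```

Storing one table per node, keyed by parent values, is what makes the parameters "local": changing a node's parents only changes that node's table.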

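  8. Markov Conditions. From factorization: I(X, ND | Par(X)), i.e. X is independent of its non-descendants (ND) given its parents (Par). [Figure: node X with its parents (Par), descendants (Desc), and non-descendants (ND)] Markov Conditions + Graphoid Axioms characterize all independencies.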

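  9. Structure/Distribution Inclusion. p is included in S if there exists θ s.t. B(S,θ) defines p. [Figure: DAG S over X, Y, Z; the distributions included in S shown as a region, containing p, within the set of all distributions]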

  10. Structure/Structure Inclusion: T ≤ S. T is included in S if every p included in T is included in S (S is an I-map of T). [Figure: DAGs S and T over X, Y, Z; the region for T lies inside the region for S within the set of all distributions]

  11. Structure/Structure Equivalence: T ≈ S. [Figure: DAGs S and T over X, Y, Z covering the same region of the set of all distributions] The relation is reflexive, symmetric, and transitive.

  12. Equivalence. [Figure: DAGs over A, B, C, D illustrating a skeleton and a v-structure] Theorem (Verma and Pearl, 1990): S ≈ T ⇔ S and T have the same v-structures and skeletons.
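
The Verma and Pearl characterization can be checked directly on small graphs. The sketch below assumes a DAG is given as a collection of directed `(parent, child)` pairs over a shared node set; the function names are illustrative and not from the talk.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of the graph: a set of unordered node pairs."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Triples (a, child, b) with a -> child <- b and a, b not adjacent."""
    edges = set(edges)
    skel = skeleton(edges)
    nodes = {n for e in edges for n in e}
    found = set()
    for child in nodes:
        pars = sorted(p for (p, c) in edges if c == child)
        for a, b in combinations(pars, 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def equivalent(edges_s, edges_t):
    """Same skeleton and same v-structures."""
    return (skeleton(edges_s) == skeleton(edges_t)
            and v_structures(edges_s) == v_structures(edges_t))

chain = [("A", "B"), ("B", "C")]      # A -> B -> C
fork = [("B", "A"), ("B", "C")]       # A <- B -> C
collider = [("A", "B"), ("C", "B")]   # A -> B <- C
```

Here `equivalent(chain, fork)` holds, while `equivalent(chain, collider)` does not: the collider has the same skeleton but introduces a v-structure at B.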

  13. Learning Bayesian Networks • Learn the structure • Estimate the conditional distributions [Figure: generative distribution p* (a network over X, Y, Z) produces iid samples (observed data: rows of 0/1 values for X, Y, Z), from which the learned model is built]

  14. Learning Structure • Scoring criterion F(S, D) • Search procedure. Identify one or more structures with high values for the scoring function.

  15. Bayesian Criterion. S^h: generative distribution p* has same independence constraints as S. F_Bayes(S,D) = log p(S^h | D) = k + log p(D | S^h) + log p(S^h), where p(D | S^h) is the marginal likelihood (closed form w/ assumptions) and p(S^h) is the structure prior (e.g. prefer simple).

  16. Consistent Scoring Criterion. Criterion favors (in the limit) simplest model that includes the generative distribution p* • S includes p*, T does not include p* ⇒ F(S,D) > F(T,D) • Both include p*, S has fewer parameters ⇒ F(S,D) > F(T,D) [Figure: p* and candidate DAGs over X, Y, Z]

  17. Bayesian Criterion is Consistent • Assume conditionals: unconstrained multinomials, linear regressions • Geiger, Heckerman, King and Meek (2001): network structures = curved exponential models • Haughton (1988): Bayesian criterion is consistent

  18. Locally Consistent Criterion. S and T differ by one edge. [Figure: S and T over X and Y] If I(X,Y|Par(X)) in p* then F(S,D) > F(T,D); otherwise F(S,D) < F(T,D).

  19. Bayesian Criterion is Locally Consistent • Bayesian score approaches BIC + constant • BIC is decomposable • Difference in score is the same for any DAGs that differ by a Y→X edge if X has the same parents [Figure: pairs of DAGs over X and Y, including the complete network (always includes p*)]
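
Decomposability can be made concrete with a small sketch of BIC for discrete data, assuming binary variables and maximum-likelihood multinomial parameters. The function names and the eight data rows are invented for illustration. Because the score is a sum of per-node terms, adding the edge Y→X changes the score by the same amount in any DAG where X otherwise has the same parents.

```python
import math
from collections import Counter

def local_bic(data, child, parents):
    """One node's BIC term: maximized log-likelihood minus (dim / 2) * log n."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())
    dim = 2 ** len(parents)  # binary child: one free parameter per parent configuration
    return loglik - 0.5 * dim * math.log(n)

def bic(data, dag):
    """dag maps each node to a tuple of its parents; the score is a sum over nodes."""
    return sum(local_bic(data, v, pa) for v, pa in dag.items())

rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1),
        (1, 0, 0), (0, 1, 1), (1, 1, 1), (0, 0, 0)]
data = [dict(X=x, Y=y, Z=z) for x, y, z in rows]

# Two pairs of DAGs; within each pair the only difference is the edge Y -> X.
s1, t1 = {"X": (), "Y": (), "Z": ()}, {"X": ("Y",), "Y": (), "Z": ()}
s2, t2 = {"X": (), "Y": (), "Z": ("Y",)}, {"X": ("Y",), "Y": (), "Z": ("Y",)}

diff1 = bic(data, t1) - bic(data, s1)
diff2 = bic(data, t2) - bic(data, s2)  # equals diff1 up to rounding
```

In a search procedure this means an edge change can be scored by recomputing only the term of the node whose parent set changed.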

  20. Bayesian Criterion is Score Equivalent: S ≈ T ⇒ F(S,D) = F(T,D). [Figure: S and T are the two orientations of the edge between X and Y] S^h: no independence constraints. T^h: no independence constraints. Hence S^h = T^h.

  21. Search Procedure • Set of states • Representation for the states • Operators to move between states • Systematic Search Algorithm

  22. Greedy Equivalence Search • Set of states: equivalence classes of DAGs • Representation for the states: essential graphs • Operators to move between states: forward and backward operators • Systematic search algorithm: two-phase greedy

  23. Representation: Essential Graphs. [Figure: a DAG over A, B, C, D, E, F and its essential graph, distinguishing compelled edges from reversible edges]

  24. GES Operators Forward Direction – single edge additions Backward Direction – single edge deletions

  25. Two-Phase Greedy Algorithm • Phase 1: Forward Equivalence Search (FES) • Start with all-independence model • Run Greedy using forward operators • Phase 2: Backward Equivalence Search (BES) • Start with local max from FES • Run Greedy using backward operators
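
The control flow of the two phases can be sketched independently of how states are represented. In the sketch below the states are plain subsets of three items standing in for equivalence classes, and the score is a made-up function chosen so that the forward phase overshoots and the backward phase prunes; none of these names or numbers come from the talk.

```python
def greedy(state, neighbors, score):
    """Move to the best-scoring neighbor until no neighbor improves the score."""
    while True:
        candidates = list(neighbors(state))
        if not candidates:
            return state
        best = max(candidates, key=score)
        if score(best) <= score(state):
            return state
        state = best

def two_phase_greedy(start, forward, backward, score):
    local_max = greedy(start, forward, score)   # Phase 1: forward operators only
    return greedy(local_max, backward, score)   # Phase 2: backward operators only

# Toy stand-in for the search space (hypothetical).
ITEMS = ("a", "b", "c")

def forward(s):
    return [s | {x} for x in ITEMS if x not in s]

def backward(s):
    return [s - {x} for x in sorted(s)]

def toy_score(s):
    return (2 * ("a" in s) + 2 * ("b" in s) + 3 * ("c" in s)
            + 4 * ("a" in s and "b" in s) - 4 * ("b" in s and "c" in s))

after_forward = greedy(frozenset(), forward, toy_score)
result = two_phase_greedy(frozenset(), forward, backward, toy_score)
```

With this toy score the forward phase ends at `{a, b, c}` and the backward phase removes `c`, ending at `{a, b}`. In GES proper, `forward` and `backward` would enumerate neighboring equivalence classes and `score` would be the scoring criterion F.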

  26. Forward Operators • Consider all DAGs in the current state • For each DAG, consider all single-edge additions (acyclic) • Take the union of the resulting equivalence classes
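
For three variables the forward neighbors can be computed by brute force, following the three steps on this slide literally. This is only an illustrative sketch: it identifies an equivalence class by its (skeleton, v-structures) signature rather than by an essential graph, and enumerating all DAGs does not scale beyond toy sizes.

```python
from itertools import combinations, permutations

NODES = ("A", "B", "C")
ALL_EDGES = list(permutations(NODES, 2))  # every directed edge (parent, child)

def is_acyclic(edges):
    """A graph is a DAG iff some node ordering puts every parent before its child."""
    return any(all(order.index(u) < order.index(v) for u, v in edges)
               for order in permutations(NODES))

def class_key(edges):
    """Signature of the equivalence class: (skeleton, v-structures)."""
    skel = frozenset(frozenset(e) for e in edges)
    vs = frozenset(
        (frozenset((a, b)), c)
        for c in NODES
        for a, b in combinations(sorted(p for p, ch in edges if ch == c), 2)
        if frozenset((a, b)) not in skel
    )
    return skel, vs

ALL_DAGS = [frozenset(es) for r in range(len(ALL_EDGES) + 1)
            for es in combinations(ALL_EDGES, r) if is_acyclic(es)]

def forward_neighbors(dag):
    """Classes reachable by one acyclic edge addition to any DAG equivalent to `dag`."""
    current = class_key(dag)
    members = [g for g in ALL_DAGS if class_key(g) == current]
    result = set()
    for g in members:
        for e in ALL_EDGES:
            bigger = g | {e}
            if e not in g and (e[1], e[0]) not in g and is_acyclic(bigger):
                result.add(class_key(bigger))
    return result

from_empty = forward_neighbors(frozenset())
from_ab = forward_neighbors(frozenset({("A", "B")}))
```

Starting from the all-independence state there are three neighboring classes, one per possible adjacency. Starting from the class of A→B there are four: two chain-like classes without v-structures and two classes with a v-structure.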

  27. Forward-Operators Example. [Figure over A, B, C, in four rows: current state; all DAGs in the current state; all DAGs resulting from single-edge addition; union of corresponding essential graphs]

  28. Forward-Operators Example. [Figure: the current state over A, B, C and its neighboring essential graphs]

  29. Backward Operators • Consider all DAGs in the current state • For each DAG, consider all single-edge deletions • Take the union of the resulting equivalence classes

  30. Backward-Operators Example. [Figure over A, B, C, in four rows: current state; all DAGs in the current state; all DAGs resulting from single-edge deletion; union of corresponding essential graphs]

  31. Backward-Operators Example. [Figure: the current state over A, B, C and its neighboring essential graphs]

  32. DAG Perfect. DAG-perfect distribution p: there exists a DAG G such that I(X,Y|Z) in p ⇔ I(X,Y|Z) in G. Non-DAG-perfect distribution q: [Figure: three graphs over A, B, C, D, annotated with the independencies I(A,D|B,C) and I(B,C|A,D)]

  33. Optimality of GES. If p* is DAG-perfect wrt some G*: [Figure: G* (in equivalence class S*) defines p*; n iid samples over X, Y, Z are drawn from p*; GES run on the samples returns state S] For large n, S = S*.

  34. Optimality of GES. [Figure: FES leads from the all-independence state to a state that includes S*; BES leads from there to a state that equals S*] • Proof Outline • After first phase (FES), current state includes S* • After second phase (BES), the current state = S*

  35. FES Maximum Includes S*. Assume: local max S does NOT include S*. Take any DAG G from S. • Markov conditions characterize independencies: in p*, there exists X that is not indep. of its non-descendants given its parents, e.g. ¬I(X,{A,B,C,D} | E) in p* [Figure: DAG G over A, B, C, D, E, X] • p* is DAG-perfect ⇒ composition axiom holds, so some single non-descendant is already dependent, e.g. ¬I(X,C | E) in p* • Locally consistent: adding the C→X edge improves the score, and the resulting EQ class is a neighbor

  36. BES Identifies S* • Current state always includes S*: Local consistency of the criterion • Local Minimum is S*: Meek’s conjecture

  37. Meek’s Conjecture. For any pair of DAGs G, H such that H includes G (G ≤ H), there exists a sequence of (1) covered edge reversals in G and (2) single-edge additions to G such that after each change G ≤ H, and after all changes G = H.
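
The slide does not define a covered edge. The sketch below uses the standard definition, which is an addition here rather than something stated in the transcript: X→Y is covered when Par(Y) = Par(X) ∪ {X}. Reversing a covered edge yields an equivalent DAG, which is why such reversals are "free" moves in the sequence. Function names are illustrative.

```python
def parents(edges, node):
    return {u for (u, v) in edges if v == node}

def is_covered(edges, x, y):
    """Edge x -> y is covered when Par(y) = Par(x) united with {x}."""
    return (x, y) in edges and parents(edges, y) == parents(edges, x) | {x}

def reverse_edge(edges, x, y):
    """Return a new edge set with x -> y replaced by y -> x."""
    return (set(edges) - {(x, y)}) | {(y, x)}

# Complete DAG over A, B, C with ordering A, B, C.
g = {("A", "B"), ("A", "C"), ("B", "C")}
covered = [(x, y) for (x, y) in sorted(g) if is_covered(g, x, y)]
g_reversed = reverse_edge(g, "B", "C")
```

In this example A→B and B→C are covered while A→C is not, since Par(C) also contains B.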

  38. Meek’s Conjecture. [Figure: example over A, B, C, D showing DAG H, the independencies I(A,B) and I(C,B|A,D), and a sequence of four DAGs G]

  39. Meek’s Conjecture and BES (S* ≤ S). Assume: local max S is not S*. Take any DAG H from S and any DAG G from S*. [Figure: a sequence of Add and Rev steps leading from G (in S*) to H (in S), passing through a neighbor of S in BES]

  40. Discussion Points • In practice, GES is as fast as DAG-based search: the neighborhood of essential graphs can be generated and scored very efficiently • When the DAG-perfect assumption fails, we still get optimality guarantees: as long as composition holds in the generative distribution, the local maximum is inclusion-minimal

  41. Thanks! My Home Page: http://research.microsoft.com/~dmax Relevant Papers: • “Optimal Structure Identification with Greedy Search” (JMLR submission): contains detailed proofs of Meek’s conjecture and optimality of GES • “Finding Optimal Bayesian Networks” (UAI02 paper with Chris Meek): contains extension of optimality results of GES when not DAG-perfect

  42. Active Paths • Z-active path between X and Y (non-standard): • Neither X nor Y is in Z • Every pair of colliding edges meets at a member of Z • No other pair of edges meets at a member of Z [Figure: a path between X and Y passing through Z] If G ≤ H, then a Z-active path between X and Y in G implies a Z-active path between X and Y in H.
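
The three conditions can be checked mechanically for a given path. The sketch below assumes the DAG is a set of `(parent, child)` pairs and the path is a list of nodes. It follows the slide's non-standard definition literally, so a collider must itself be in Z, and it adds one check not listed on the slide, namely that consecutive nodes are adjacent.

```python
def is_active(edges, path, z):
    """Check the slide's conditions for `path` being Z-active in the DAG `edges`."""
    edges = set(edges)
    if path[0] in z or path[-1] in z:
        return False                      # neither endpoint may be in Z
    for a, b in zip(path, path[1:]):
        if (a, b) not in edges and (b, a) not in edges:
            return False                  # not a path in the graph at all
    for a, m, b in zip(path, path[1:], path[2:]):
        collider = (a, m) in edges and (b, m) in edges
        if collider != (m in z):
            return False                  # colliders must be in Z, other nodes must not
    return True

collider_graph = {("X", "M"), ("Y", "M")}   # X -> M <- Y
chain_graph = {("X", "M"), ("M", "Y")}      # X -> M -> Y
```

Conditioning on M activates the path X, M, Y in the collider graph and blocks it in the chain graph; with Z empty the situation is reversed.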

  43. Active Paths. [Figure: a path over X, A, Z, W, B, Y; nodes A, B, C] • X-Y: out-of X and in-to Y • X-W: out-of both X and W • Any sub-path between A, B ∉ Z is also active • A – B, B – C, at least one is out-of B ⇒ active path between A and C

  44. Simple Active Paths. If an active path between A and B contains Y→X, then ∃ an active path in which (1) the edge appears exactly once [Figure: path A … Y X … B] OR (2) the edge appears exactly twice [Figure: path A … Y X … X Y … B]. Simplify discussion: assume (1) only; proofs for (2) are almost identical.

  45. Typical Argument: Combining Active Paths. [Figure: graphs G, G’ and H with G ≤ H over A, X, Y, B, Z, where Z is a sink node adjacent to X and Y] Suppose there is an active path (AP) in G’ (X not in the conditioning set, CS) with no corresponding AP in H. Then Z is not in CS.

  46. Proof Sketch. Two DAGs G, H with G < H. Identify either: • a covered edge X→Y in G that has opposite orientation in H • a new edge X→Y to be added to G such that it remains included in H

  47. The Transformation. Choose any node Y that is a sink in H. • Case 1a: Y is a sink in G, ∃X ∈ Par_H(Y) with X ∉ Par_G(Y) • Case 1b: Y is a sink in G, same parents • Case 2a: ∃X s.t. Y→X covered • Case 2b: ∃X s.t. Y→X and W is a parent of Y but not of X • Case 2c: every Y→X has Par(Y) ⊆ Par(X) [Figure: the edge change over X, Y, W made in each case]

  48. Preliminaries (G ≤ H) • The adjacencies in G are a subset of the adjacencies in H • If X→Y←Z is a v-structure in G but not H, then X and Z are adjacent in H • Any new active path that results from adding X→Y to G includes X→Y

  49. Proof Sketch: Case 1 (Y is a sink in G). Case 1a: X ∈ Par_H(Y), X ∉ Par_G(Y). [Figure: X and Y in H and in G] Suppose there’s some new active path between A and B not in H. [Figure: active path between A and B through Z, Y, X] • Y is a sink in G, so it must be in CS • Neither X nor next node Z is in CS • In H: AP(A,Z), AP(X,B), Z→Y←X. Case 1b: Parents identical. Remove Y from both graphs: proof similar.

  50. Proof Sketch: Case 2 (Y is not a sink in G). Case 2a: There is a covered edge Y→X: reverse the edge. Case 2b: There is a non-covered edge Y→X such that W is a parent of Y but not a parent of X. [Figure: W, X, Y in G, G’ and H] Suppose there’s some new active path between A and B not in H. Y must be in CS, else replace W→X by W→Y→X (not new). If X not in CS, then in H the following are active: A-W, X-B, W→Y←X. [Figure: paths between A and B through W, X, Y, Z in G’ and H]
