

  1. University of Washington, Department of Electrical Engineering, EE512 Spring 2006, Graphical Models. Jeff A. Bilmes <bilmes@ee.washington.edu>. Lecture 17 Slides, May 30th, 2006 EE512 - Graphical Models - J. Bilmes

  2. Announcements • READING: • M. Jordan: Chapters 13,14,15 (on Gaussians and Kalman) • Reminder: TA discussions and office hours: • Office hours: Thursdays 3:30-4:30, Sieg Ground Floor Tutorial Center • Discussion Sections: Fridays 9:30-10:30, Sieg Ground Floor Tutorial Center Lecture Room • No more homework this quarter, concentrate on final projects!! • Makeup class, tomorrow Wednesday, 5-7pm, room TBA (watch email). EE512 - Graphical Models - J. Bilmes

  3. Class Road Map • L1: Tues, 3/28: Overview, GMs, Intro BNs. • L2: Thur, 3/30: semantics of BNs + UGMs • L3: Tues, 4/4: elimination, probs, chordal I • L4: Thur, 4/6: chordal, sep, decomp, elim • L5: Tue, 4/11: chordal/elim, MCS, triang, CI props. • L6: Thur, 4/13: MST, CI axioms, Markov props. • L7: Tues, 4/18: Mobius, HC-thm, (F)=(G) • L8: Thur, 4/20: phylogenetic trees, HMMs • L9: Tue, 4/25: HMMs, inference on trees • L10: Thur, 4/27: Inference on trees, start poly • L11: Tues, 5/2: polytrees, start JT inference • L12: Thur, 5/4: Inference in JTs • Tues, 5/9: away • Thur, 5/11: away • L13: Tue, 5/16: JT, GDL, Shenoy-Shafer • L14: Thur, 5/18: GDL, Search, Gaussians I • L15: Mon, 5/22: laptop crash • L16: Tues, 5/23: search, Gaussians I • L17: Thur, 5/25: Gaussians • Mon, 5/29: Holiday • L18: Tue, 5/30 • L19: Thur, 6/1: final presentations EE512 - Graphical Models - J. Bilmes

  4. Final Project Milestone Due Dates • L1: Tues, 3/28: • L2: Thur, 3/30: • L3: Tues, 4/4: • L4: Thur, 4/6: • L5: Tue, 4/11: • L6: Thur, 4/13: • L7: Tues, 4/18: • L8: Thur, 4/20: Team Lists, short abstracts I • L9: Tue, 4/25: • L10: Thur, 4/27: short abstracts II • L11: Tues, 5/2: • L12: Thur, 5/4: abstract II + progress • L--: Tues, 5/9 • L--: Thur, 5/11: 1 page progress report • L13: Tue, 5/16: • L14: Thur, 5/18: 1 page progress report • L15: Tues, 5/23 • L16: Thur, 5/25: 1 page progress report • L17: Tue, 5/30: Today • L18: Wed, 5/31: • L19: Thur, 6/1: final presentations • L20: Tue, 6/6: 4-page papers due (like a conference paper); only .pdf versions accepted. • Team lists, abstracts, and progress reports must be turned in in class, on paper (dead tree versions only). • Final reports must be turned in electronically in PDF (no other formats accepted). • No need to repeat what was on previous progress reports/abstracts; I have those available to refer to. • Progress reports must report who did what so far!! EE512 - Graphical Models - J. Bilmes

  5. Summary of Last Time • Gaussian Graphical Models EE512 - Graphical Models - J. Bilmes

  6. Outline of Today’s Lecture • Other forms of inference. • Structure learning in graphical models EE512 - Graphical Models - J. Bilmes

  7. Books and Sources for Today • Jordan chapters 13-15 • Other references contained in presentation … EE512 - Graphical Models - J. Bilmes

  8. Graphical Models • We start with some probability distribution P • P could be given explicitly, or more likely we have training data consisting of some number of samples. The goal is to learn P or some approximation to it (training) and then use P in some way (inference for making decisions, such as finding the most probable assignment, the max-product semiring, etc.) • The graph G=(V,E) represents “structure” in P • The graph can provide an efficient representation of, and efficient computational inference for, P • There can be multiple graphs that represent a given P (e.g., the complete graph represents any P). • Goal: find a computationally cheap exact or approximate graph cover for P • Once we do this, we just compute probabilities using the junction tree algorithm, a search algorithm, etc. EE512 - Graphical Models - J. Bilmes

  9. Graphical Models & Tree-width • The complexity parameter for G=(V,E) • Def: k-tree: k-nodes, clique of size k. n>k nodes, connect nth node to previous k fully connected nodes • Example: 4-tree note: all separators are of size 4 4-tree with 4 nodes 4-tree with 5 nodes 4-tree with 6 nodes EE512 - Graphical Models - J. Bilmes

  10. Graphical Models & Tree-width • Def: partial k-tree: any sub-graph of a k-tree • Def: tree-width of a graph G is smallest k such that G is a partial k-tree. • Thm: The tree-width decision problem is NP-complete • We mentioned this before, proven by Arnborg, • Thm: exact probabilistic inference (computing probabilities, etc.) is exponential in the tree-width • Time-space tradeoffs can help here, but what if all of the points in the achievable region are intolerably computationally expensive? • The big question, what if exact inference is too expensive? EE512 - Graphical Models - J. Bilmes

  11. When exact inference is too expensive • Two general approaches: either an exact solution to an approximate problem, or an approximate solution to an exact problem. • Exact solution to an approximate problem • Structure learning: find a low tree-width (or otherwise “cheap”) graphical model that is still “high-quality” in some way, and then perform exact inference on the approximate model. • This can be easy or hard depending on the tree-width, on the measure of “high-quality”, and on the learning paradigm. • Approximate solution to an exact problem • Approximate inference tries to approximate what must be computed: loopy belief propagation, sampling/pruning, variational/mean-field methods, and hybrids of the above EE512 - Graphical Models - J. Bilmes

  12. Finding k-trees • How do we score a k-tree? • Maximum likelihood, or a conditional (discriminative) score • May we assume that the truth itself is a k-tree? • Sometimes simplifications can be made if we assume that the truth is part of a known model class, such as a k-tree for some fixed constant k independent of n=|V|, the number of nodes. • How do we find the best 1-tree? EE512 - Graphical Models - J. Bilmes
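For the 1-tree case the maximum-likelihood score has a well-known closed form (notation mine: $\hat{I}$ and $\hat{H}$ denote the empirical mutual information and entropy computed from the N training samples $\mathcal{D}$):

```latex
\max_{\theta_T}\; \log p(\mathcal{D} \mid T, \theta_T)
  \;=\; N \sum_{(i,j) \in E(T)} \hat{I}(X_i; X_j)
  \;-\; N \sum_{i \in V} \hat{H}(X_i).
```

The entropy term does not depend on the tree, so maximizing likelihood over 1-trees reduces to finding a maximum-weight spanning tree under empirical pairwise mutual-information edge weights, which is what the next slides develop.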

  13. Finding 1-trees • Given P, the goal is to find the best 1-tree approximation of P in a maximum-likelihood sense. EE512 - Graphical Models - J. Bilmes
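The standard solution here is the Chow-Liu procedure: weight every variable pair by its empirical mutual information and take a maximum-weight spanning tree. A minimal sketch, assuming discrete data stored in a NumPy array with one column per variable and using networkx for the spanning tree (these implementation choices are mine, not from the slides):

```python
import itertools
import numpy as np
import networkx as nx

def empirical_mi(x, y):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(x)
    joint = {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    px = {a: np.mean(x == a) for a in set(x)}
    py = {b: np.mean(y == b) for b in set(y)}
    return sum((c / n) * np.log((c / n) / (px[a] * py[b]))
               for (a, b), c in joint.items())

def chow_liu_tree(data):
    """data: (num_samples, num_vars) array of discrete values.
    Returns the edge set of the (empirical) maximum-likelihood 1-tree."""
    num_vars = data.shape[1]
    g = nx.Graph()
    for i, j in itertools.combinations(range(num_vars), 2):
        g.add_edge(i, j, weight=empirical_mi(data[:, i], data[:, j]))
    return sorted(nx.maximum_spanning_tree(g, weight="weight").edges())
```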

  14. Finding 1-trees EE512 - Graphical Models - J. Bilmes

  15. Finding 1-trees EE512 - Graphical Models - J. Bilmes

  16. Finding 1-trees EE512 - Graphical Models - J. Bilmes

  17. Finding 1-trees EE512 - Graphical Models - J. Bilmes

  18. Finding 1-trees EE512 - Graphical Models - J. Bilmes

  19. Finding 1-trees EE512 - Graphical Models - J. Bilmes

  20. Plethora of negative results • Chickering 1996, Chickering/Meek/Heckerman 2003: learning Bayesian networks in the ML sense is NP-hard (“is there a BN with a fixed upper bound on in-degree that achieves a given ML score?”) • Dasgupta 1999: learning polytrees in the ML sense is NP-hard (“is there a polytree with fixed upper-bound in-degree achieving a given ML score?”); worse, there is a constant c such that it is NP-complete to decide whether there is a polytree with score <= c*OPT_score. • Meek 2001: learning even a path (a sub-class of trees) in the ML sense is NP-hard. EE512 - Graphical Models - J. Bilmes

  21. Plethora of negative results • Srebro/Karger 2001: learning k-trees in the ML sense is hard. • So, generative model structure learning is likely to be a difficult problem (unless k=1, or P=NP). • We next spend a bit of time talking about the Srebro/Karger result. EE512 - Graphical Models - J. Bilmes

  22. Optimal ML k-trees is NP-complete EE512 - Graphical Models - J. Bilmes

  23. Optimal ML k-trees is NP-complete EE512 - Graphical Models - J. Bilmes

  24. Optimal ML k-trees is NP-complete EE512 - Graphical Models - J. Bilmes

  25. Optimal ML k-trees is NP-complete EE512 - Graphical Models - J. Bilmes

  26. Optimal ML k-trees is NP-complete EE512 - Graphical Models - J. Bilmes

  27. Optimal ML k-trees is NP-complete EE512 - Graphical Models - J. Bilmes

  28. Some good news … • PAC framework: the key difference is that we assume the graph is in the concept class (we learn the class of k-trees). This means that if we have sampled data, we assume the samples come from a truth which is itself a k-tree. • Hoeffgen’93: can robustly (with a number of samples polynomial in n, 1/ε, 1/δ) PAC learn bounded tree-width graphical models, and can robustly and efficiently (algorithm polynomial in the same quantities) PAC learn 1-trees. • Narasimhan&Bilmes2004: can robustly and efficiently PAC learn bounded tree-width graphical models. EE512 - Graphical Models - J. Bilmes

  29. More good news … • Abbeel, Koller, Ng 2005: can robustly and efficiently PAC learn bounded-degree factor graphs • note: this does not come with an inference-complexity guarantee. E.g., n×n grids have bounded degree but not bounded tree-width; a star has unbounded degree but bounded tree-width. Tree-width is what is crucial for computation in general. EE512 - Graphical Models - J. Bilmes

  30. How to PAC-learn such graphs … • Mutual information is symmetric submodular EE512 - Graphical Models - J. Bilmes
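For reference, the definitions behind this claim (notation mine): a set function f on a ground set V is submodular and symmetric when

```latex
f(A) + f(B) \;\ge\; f(A \cup B) + f(A \cap B) \quad \forall\, A, B \subseteq V
\qquad\text{and}\qquad
f(A) \;=\; f(V \setminus A) \quad \forall\, A \subseteq V,
```

and the slide's claim is that $f(A) \triangleq I(X_A;\, X_{V \setminus A})$ satisfies both, which is what enables the optimization results on the following slides.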

  31. How to PAC-learn such graphs … • Submodularity and Optimization (Narasimhan&Bilmes, 2004) EE512 - Graphical Models - J. Bilmes

  32. Another positive result • Since mutual information is symmetric-submodular, we can find optimal (minimum mutual-information) partitions: A* ∈ argmin over nonempty proper subsets A ⊂ V of I(X_A ; X_{V\A}) • This has implications for clustering (Narasimhan, Jojic, Bilmes ’05) and also for structure learning (we can find an optimal 1-step graph decomposition by finding the optimal k-separator). EE512 - Graphical Models - J. Bilmes
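A brute-force sketch that makes the partition objective concrete on a tiny joint table; a real implementation would replace the exhaustive loop with a polynomial-time symmetric-submodular minimizer (e.g., a Queyranne-style algorithm). The function names and the toy distribution are illustrative assumptions:

```python
import itertools
import numpy as np

def mutual_info_partition(p, part_a):
    """I(X_A ; X_B) for a full joint table p over discrete variables,
    where part_a lists the axes placed in A and B holds the rest."""
    part_b = tuple(d for d in range(p.ndim) if d not in part_a)
    pa = p.sum(axis=part_b)          # marginal over A
    pb = p.sum(axis=part_a)          # marginal over B
    mi = 0.0
    for idx in np.ndindex(p.shape):
        if p[idx] == 0:
            continue
        ia = tuple(idx[d] for d in part_a)
        ib = tuple(idx[d] for d in part_b)
        mi += p[idx] * np.log(p[idx] / (pa[ia] * pb[ib]))
    return mi

def best_partition(p):
    """Exhaustively find a nontrivial split (A, B) minimizing I(X_A ; X_B)."""
    n = p.ndim
    best = None
    for r in range(1, n // 2 + 1):
        for a in itertools.combinations(range(n), r):
            score = mutual_info_partition(p, a)
            if best is None or score < best[0]:
                best = (score, a)
    return best

# Toy joint over 4 binary variables in which {X0, X1} is independent of {X2, X3}:
p = np.einsum("ab,cd->abcd",
              np.array([[0.4, 0.1], [0.1, 0.4]]),
              np.array([[0.3, 0.2], [0.2, 0.3]]))
print(best_partition(p))   # expect A = (0, 1) with I(X_A ; X_B) ~ 0
```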

  33. Finding ML decompositions … • Optimal to one level EE512 - Graphical Models - J. Bilmes

  34. Discriminative structure • Goal might be classification using a generative model. • Distinction between parameters & structure • Two possible goals: • 1) find one global structure that classifies well • 2) find class-specific structure (one per class) • In either case, finding a good discriminative structure may render discriminative parameter learning less necessary. EE512 - Graphical Models - J. Bilmes

  35. Optimal discriminative structure procedure … • choose k (for now, let's just assume k=1) • Find the tree that best satisfies a discriminative criterion (developed on the following slides): EE512 - Graphical Models - J. Bilmes

  36. Properties • Options: • can fix structure and train parameters using either maximum likelihood (generative) or maximum conditional likelihood (discriminative) • Can learn discriminative structure, and can train either generatively or discriminatively • In all cases, assume appropriate regularization. • Bad news: KL-divergence not decomposable w.r.t. tree in the discriminative case. • Goal: identify a local discriminative measure on edges in a graph (analogous to mutual information for generative case). EE512 - Graphical Models - J. Bilmes

  37. EAR measure • EAR (explaining-away residual) measure (Bilmes ’98) • Goal is to maximize EAR(X, X') = I(X; X' | C) − I(X; X') • Intuition: prefer pairs of variables that are dependent class-conditionally, but otherwise (marginally) independent • EAR is an approximation to the expected log conditional posterior; it is exact for independent “auxiliary” variables. EE512 - Graphical Models - J. Bilmes
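A minimal sketch of estimating EAR(Xi, Xj) = I(Xi; Xj | C) − I(Xi; Xj) from samples; the function names and the assumption that features and class are coded as small nonnegative integers are mine:

```python
import numpy as np

def _mi_from_joint(pxy):
    """Mutual information (in nats) from a 2-D joint probability table."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def ear(xi, xj, c):
    """EAR(Xi, Xj) = I(Xi; Xj | C) - I(Xi; Xj), estimated from integer-coded samples."""
    xi, xj, c = map(np.asarray, (xi, xj, c))
    n = len(c)
    # Marginal mutual information I(Xi; Xj).
    joint = np.zeros((xi.max() + 1, xj.max() + 1))
    np.add.at(joint, (xi, xj), 1.0 / n)
    mi = _mi_from_joint(joint)
    # Conditional mutual information I(Xi; Xj | C) = sum_c p(c) * I(Xi; Xj | C=c).
    cmi = 0.0
    for cls in np.unique(c):
        mask = c == cls
        jt = np.zeros_like(joint)
        np.add.at(jt, (xi[mask], xj[mask]), 1.0 / mask.sum())
        cmi += mask.mean() * _mi_from_joint(jt)
    return cmi - mi
```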

  38. Conditional mutual information? • Conditional mutual information is not guaranteed to discriminate well. • Building an MST using I(X1;X2|C) as edge weights will not necessarily produce a tree with good classification properties; EAR fixes this in certain cases. • Example: 3 features (X1,X2,X3) and a class C (an illustrative stand-in is sketched below) EE512 - Graphical Models - J. Bilmes
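The slide's original worked example (its joint distribution) is not reproduced in this transcript, so here is a hypothetical stand-in illustrating the same point; it reuses the ear() and empirical_mi() sketches above, and every distributional choice is an assumption of mine, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
c  = rng.integers(0, 2, n)                       # class label
x1 = rng.integers(0, 2, n)                       # independent of C
x2 = np.where(rng.random(n) < 0.9, x1, 1 - x1)   # noisy copy of X1, ignores C
x3 = np.where(rng.random(n) < 0.8, c, 1 - c)     # noisy copy of C

# Since (X1, X2) is independent of C, I(X1;X2|C) = I(X1;X2), which is large
# (~0.37 nats), so a CMI-weighted MST would pick the (X1, X2) edge first even
# though it carries no information about the class.
print("I(X1;X2|C) ~ I(X1;X2) =", round(empirical_mi(x1, x2), 3))
# EAR = I(X1;X2|C) - I(X1;X2) ~ 0 correctly discounts that edge.
print("EAR(X1,X2) =", round(ear(x1, x2, c), 3))
```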

  39. Generative training/structure EE512 - Graphical Models - J. Bilmes

  40. Generative training/structure EE512 - Graphical Models - J. Bilmes

  41. General Structure Learning EE512 - Graphical Models - J. Bilmes
