University of Washington, Department of Electrical Engineering EE512 Spring, 2006 Graphical Models Jeff A. Bilmes <bilmes@ee.washington.edu> Lecture 17 Slides May 30th, 2006 EE512 - Graphical Models - J. Bilmes
Announcements • READING: • M. Jordan: Chapters 13,14,15 (on Gaussians and Kalman) • Reminder: TA discussions and office hours: • Office hours: Thursdays 3:30-4:30, Sieg Ground Floor Tutorial Center • Discussion Sections: Fridays 9:30-10:30, Sieg Ground Floor Tutorial Center Lecture Room • No more homework this quarter, concentrate on final projects!! • Makeup class, tomorrow Wednesday, 5-7pm, room TBA (watch email). EE512 - Graphical Models - J. Bilmes
Class Road Map • L1: Tues, 3/28: Overview, GMs, Intro BNs. • L2: Thur, 3/30: semantics of BNs + UGMs • L3: Tues, 4/4: elimination, probs, chordal I • L4: Thur, 4/6: chordal, sep, decomp, elim • L5: Tue, 4/11: chordal/elim, mcs, triang, ci props. • L6: Thur, 4/13: MST, CI axioms, Markov props. • L7: Tues, 4/18: Möbius, HC-thm, (F)=(G) • L8: Thur, 4/20: phylogenetic trees, HMMs • L9: Tue, 4/25: HMMs, inference on trees • L10: Thur, 4/27: Inference on trees, start poly • L11: Tues, 5/2: polytrees, start JT inference • L12: Thur, 5/4: Inference in JTs • Tues, 5/9: away • Thur, 5/11: away • L13: Tue, 5/16: JT, GDL, Shenoy-Shafer • L14: Thur, 5/18: GDL, Search, Gaussians I • L15: Mon, 5/22: laptop crash • L16: Tues, 5/23: search, Gaussians I • L17: Thur, 5/25: Gaussians • Mon, 5/29: Holiday • L18: Tue, 5/30 • L19: Thur, 6/1: final presentations EE512 - Graphical Models - J. Bilmes
Final Project Milestone Due Dates • L1: Tues, 3/28: • L2: Thur, 3/30: • L3: Tues, 4/4: • L4: Thur, 4/6: • L5: Tue, 4/11: • L6: Thur, 4/13: • L7: Tues, 4/18: • L8: Thur, 4/20: Team lists, short abstracts I • L9: Tue, 4/25: • L10: Thur, 4/27: short abstracts II • L11: Tues, 5/2: • L12: Thur, 5/4: abstract II + progress • L--: Tues, 5/9 • L--: Thur, 5/11: 1-page progress report • L13: Tue, 5/16: • L14: Thur, 5/18: 1-page progress report • L15: Tues, 5/23 • L16: Thur, 5/25: 1-page progress report • L17: Tue, 5/30: Today • L18: Wed, 5/31: • L19: Thur, 6/1: final presentations • L20: Tue, 6/6: 4-page papers due (like a conference paper); only .pdf versions accepted. • Team lists, abstracts, and progress reports must be turned in in class, on paper (dead-tree versions only). • Final reports must be turned in electronically in PDF (no other formats accepted). • No need to repeat what was on previous progress reports/abstracts; I have those available to refer to. • Progress reports must report who did what so far!! EE512 - Graphical Models - J. Bilmes
Summary of Last Time • Gaussian Graphical Models EE512 - Graphical Models - J. Bilmes
Outline of Today’s Lecture • Other forms of inference. • Structure learning in graphical models EE512 - Graphical Models - J. Bilmes
Books and Sources for Today • Jordan chapters 13-15 • Other references contained in presentation … EE512 - Graphical Models - J. Bilmes
Graphical Models • We start with some probability distribution P • P could be specified as a given, but more likely we have training data consisting of some number of samples. The goal is to learn P or some approximation to it (training) and then use P in some way (inference for making decisions, such as finding the most probable assignment, max-product semiring, etc.) • The graph G=(V,E) represents “structure” in P • The graph can provide an efficient representation of, and efficient computational inference for, P • There can be multiple graphs that represent a given P (e.g., the complete graph represents every P). • Goal: find a computationally cheap exact or approximate graph cover for P • Once we have this, we just compute probabilities using the junction tree algorithm, a search algorithm, etc. EE512 - Graphical Models - J. Bilmes
Graphical Models & Tree-width • The complexity parameter for G=(V,E) • Def: k-tree: start with k nodes forming a clique of size k; then, for each n > k, connect the nth node to k existing nodes that are themselves fully connected (a k-clique) • Example: 4-tree (note: all separators are of size 4) 4-tree with 4 nodes 4-tree with 5 nodes 4-tree with 6 nodes EE512 - Graphical Models - J. Bilmes
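To make the construction concrete, here is a small illustrative Python sketch (my own, not from the slides; it assumes the networkx package plus the standard library) that grows a random k-tree exactly as in the definition above: start from a k-clique, then repeatedly attach a new node to an existing k-clique.

import itertools
import random
import networkx as nx

def random_k_tree(n, k, seed=0):
    """Return a random k-tree on n >= k nodes as a networkx Graph."""
    rng = random.Random(seed)
    G = nx.complete_graph(k)            # initial clique of size k
    cliques = [tuple(range(k))]         # k-cliques a new node may attach to
    for v in range(k, n):
        base = rng.choice(cliques)      # pick an existing k-clique
        G.add_edges_from((v, u) for u in base)
        # every k-subset of base + {v} is again a k-clique of the k-tree
        cliques.extend(itertools.combinations(sorted(base + (v,)), k))
    return G

G = random_k_tree(n=6, k=4)
print(G.number_of_nodes(), G.number_of_edges())   # 6 nodes, 6 + 2*4 = 14 edges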
Graphical Models & Tree-width • Def: partial k-tree: any subgraph of a k-tree • Def: the tree-width of a graph G is the smallest k such that G is a partial k-tree. • Thm: the tree-width decision problem is NP-complete • We mentioned this before; it was proven by Arnborg et al. • Thm: exact probabilistic inference (computing probabilities, etc.) is exponential in the tree-width • Time-space tradeoffs can help here, but what if all of the points in the achievable region are intolerably computationally expensive? • The big question: what if exact inference is too expensive? EE512 - Graphical Models - J. Bilmes
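Because deciding tree-width exactly is NP-complete, practical toolkits settle for heuristic upper bounds. As an illustrative sketch (not part of the lecture, and assuming the networkx package), the min-degree heuristic returns an upper bound on the tree-width together with a tree decomposition whose largest bag governs the cost of junction-tree inference:

import networkx as nx
from networkx.algorithms.approximation import treewidth_min_degree

G = nx.grid_2d_graph(4, 4)                  # 4x4 grid; its true tree-width is 4
width, decomposition = treewidth_min_degree(G)
print("tree-width upper bound:", width)
# 'decomposition' is a tree whose nodes are bags (frozensets of vertices of G);
# exact inference is exponential in the size of the largest bag.
print(max(len(bag) for bag in decomposition.nodes()) - 1)   # equals 'width'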
When exact inference is too expensive • Two general approaches: either an exact solution to an approximate problem, or an approximate solution to an exact problem. • Exact solution to approximate problem • Structure learning: find a low tree-width (or “cheap” in some way) graphical model that is still “high-quality” in some way, and then perform exact inference on the approximate model. • This can be easy or hard depending on the tree-width and on the measure of “high-quality”, and on the learning paradigm. • Approximate solution to an exact problem • Approximate inference, tries to approximate in some way what must be computed: Loopy Belief propagation, Sampling/Pruning, Variational/Mean-field, and hybrids between the above EE512 - Graphical Models - J. Bilmes
Finding k-trees • How do we score a k-tree? • Maximum likelihood, or a conditional score • Can we assume that the truth itself is a k-tree? • Sometimes simplifications can be made if we assume that the truth is part of a known model class, such as a k-tree for some fixed constant k independent of n=|V|, the number of nodes. • How do we find the best 1-tree? EE512 - Graphical Models - J. Bilmes
Finding 1-trees • Given P, goal is to find best 1-tree approximation of P in a maximum likelihood sense. EE512 - Graphical Models - J. Bilmes
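The classical answer is the Chow-Liu procedure: use empirical pairwise mutual information as edge weights and take a maximum-weight spanning tree. The following is an illustrative sketch of that idea (my own code, not the slides' derivation; it assumes numpy, networkx, and scikit-learn, and discrete data given as a samples-by-variables array):

import numpy as np
import networkx as nx
from sklearn.metrics import mutual_info_score

def chow_liu_tree(data):
    """data: (num_samples, num_vars) array of discrete values."""
    n_vars = data.shape[1]
    G = nx.Graph()
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            w = mutual_info_score(data[:, i], data[:, j])   # empirical MI in nats
            G.add_edge(i, j, weight=w)
    # A maximum-weight spanning tree maximizes the likelihood over all 1-trees.
    return nx.maximum_spanning_tree(G)

rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, size=2000)
x1 = np.where(rng.random(2000) < 0.1, 1 - x0, x0)   # x0 flipped 10% of the time
x2 = rng.integers(0, 2, size=2000)                  # independent noise
T = chow_liu_tree(np.column_stack([x0, x1, x2]))
print(sorted(T.edges()))    # the strongly dependent pair (0, 1) carries the heaviest edge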
Plethora of negative results • Chickering 1996, Chickering/Meek/Heckerman 2003: learning Bayesian networks in the ML sense is NP-hard (“is there a BN with a fixed upper bound on in-degree that achieves a given ML score?”) • Dasgupta 1999: learning polytrees in the ML sense is NP-hard (“is there a polytree with a fixed upper bound on in-degree achieving a given ML score?”) and, worse, there is a constant c such that it is NP-complete to decide if there is a polytree with score <= c*OPT_score. • Meek 2001: learning even a path (a sub-class of trees) in the ML sense is NP-hard. EE512 - Graphical Models - J. Bilmes
Plethora of negative results • Srebro/Karger 2001: learning k-trees in the ML sense is hard. • So, generative model structure learning is likely to be a difficult problem (unless k=1, or P=NP). • We next spend a bit of time talking about the Srebro/Karger result. EE512 - Graphical Models - J. Bilmes
Optimal ML k-trees is NP-complete EE512 - Graphical Models - J. Bilmes
Some good news … • PAC framework: key difference, assume the graph is in the concept class (learn the class of k-trees). This means that if we have sampled data, we assume the samples come from a true distribution which itself is a k-tree. • Hoeffgen '93: Can robustly (with a number of samples polynomial in n, 1/ε, and 1/δ) PAC-learn bounded tree-width graphical models, and can robustly and efficiently (algorithm polynomial in the same) PAC-learn 1-trees. • Narasimhan & Bilmes 2004: Can robustly and efficiently PAC-learn bounded tree-width graphical models. EE512 - Graphical Models - J. Bilmes
More good news … • Abbeel, Koller, Ng 2005: Can robustly and efficiently PAC-learn bounded-degree factor graphs • note: this does not carry a complexity guarantee for inference. E.g., n×n grids have bounded degree but not bounded tree-width, while a star has unbounded degree but bounded tree-width; tree-width is what is crucial for computation in general. EE512 - Graphical Models - J. Bilmes
How to PAC-learn such graphs … • Mutual information is symmetric submodular EE512 - Graphical Models - J. Bilmes
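For reference, the property being invoked can be written out as follows (a standard formulation stated here for concreteness, not copied from the slide). With f(A) = I(X_A ; X_{V \setminus A}) for A \subseteq V:

f(A) = f(V \setminus A) \quad \text{(symmetry)}, \qquad
f(A) + f(B) \;\ge\; f(A \cup B) + f(A \cap B) \quad \forall\, A, B \subseteq V \quad \text{(submodularity)}.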
How to PAC-learn such graphs … • Submodularity and Optimization (Narasimhan & Bilmes, 2004) EE512 - Graphical Models - J. Bilmes
Another positive result • Since mutual information is symmetric submodular, we can find optimal partitions: the bipartition of V that minimizes the mutual information between the two sides. • This has implications for clustering (Narasimhan, Jojic, Bilmes '05) and also for structure learning (we can find an optimal one-step graph decomposition by finding the optimal k-separator). EE512 - Graphical Models - J. Bilmes
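Written out (again a standard formulation given for concreteness, not the slide's exact equation), the optimal bipartition is

(A^*, V \setminus A^*) \;\in\; \arg\min_{\emptyset \neq A \subsetneq V} \; I(X_A ; X_{V \setminus A}),

and because the objective is symmetric submodular, it can be found exactly by a symmetric-submodular minimizer such as Queyranne's algorithm, using O(|V|^3) function evaluations.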
Finding ML decompositions … • Optimal to one level EE512 - Graphical Models - J. Bilmes
Discriminative structure • Goal might be classification using a generative model. • Distinction between parameters & structure • Two possible goals: • 1) find one global structure that classifies well • 2) find class-specific structure (one per class) • In either case, finding a good discriminative structure may render discriminative parameter learning less necessary. EE512 - Graphical Models - J. Bilmes
Optimal discriminative structure procedure … • choose k (for now, let's just assume k=1) • Find the tree that best satisfies a discriminative objective; the EAR edge score on the next slide gives one such criterion. EE512 - Graphical Models - J. Bilmes
Properties • Options: • can fix structure and train parameters using either maximum likelihood (generative) or maximum conditional likelihood (discriminative) • Can learn discriminative structure, and can train either generatively or discriminatively • In all cases, assume appropriate regularization. • Bad news: KL-divergence not decomposable w.r.t. tree in the discriminative case. • Goal: identify a local discriminative measure on edges in a graph (analogous to mutual information for generative case). EE512 - Graphical Models - J. Bilmes
EAR measure • EAR (explaining-away residual) measure (Bilmes '98) • Goal is to maximize EAR(X_i ; X_j) = I(X_i ; X_j | C) - I(X_i ; X_j) • Intuition: prefer pairs of variables that are dependent class-conditionally but otherwise independent • EAR is an approximation to the expected log conditional posterior; it is exact for independent “auxiliary” variables. EE512 - Graphical Models - J. Bilmes
Conditional mutual information? • Conditional mutual information is not guaranteed to discriminate well. • Building an MST using I(X1;X2|C) as edge weights will not necessarily produce a tree with good classification properties. EAR fixes this in certain cases. • Example: 3 features (X1, X2, X3) and a class C EE512 - Graphical Models - J. Bilmes
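As a hedged numerical illustration (my own toy example, not the slide's): two features that are merely duplicated noise, independent of the class, have large conditional mutual information I(X1;X2|C) yet essentially zero EAR, so a CMI-weighted MST would happily link them even though they carry no class information. The sketch assumes numpy and scikit-learn.

import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
n = 20000
C = rng.integers(0, 2, n)            # class label
Z = rng.integers(0, 2, n)            # noise, independent of C
X1, X2 = Z, Z.copy()                 # two identical copies of the noise

def cond_mi(a, b, c):
    # I(a; b | c) for discrete arrays: within-class MI, weighted by class priors
    return sum((c == v).mean() * mutual_info_score(a[c == v], b[c == v])
               for v in np.unique(c))

cmi = cond_mi(X1, X2, C)                      # large: X1 and X2 are perfectly dependent
ear = cmi - mutual_info_score(X1, X2)         # about 0: the dependence has nothing to do with C
print(f"I(X1;X2|C) = {cmi:.3f} nats, EAR = {ear:.3f} nats")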
Generative training/structure EE512 - Graphical Models - J. Bilmes
General Structure Learning EE512 - Graphical Models - J. Bilmes