
Introduction to Graphical Models Part 2 of 2


Presentation Transcript


  1. Lecture 31 of 41: Introduction to Graphical Models, Part 2 of 2. Wednesday, 03 November 2004. William H. Hsu, Laboratory for Knowledge Discovery in Databases, Department of Computing and Information Sciences, Kansas State University. http://www.kddresearch.org This presentation is based upon: http://www.kddresearch.org/KSU/CIS/Math-20021107.ppt

  2. Graphical Models Overview [1]: Bayesian Networks
  • Conditional Independence
    • X is conditionally independent (CI) of Y given Z (sometimes written X ⊥ Y | Z) iff P(X | Y, Z) = P(X | Z) for all values of X, Y, and Z
    • Example: P(Thunder | Rain, Lightning) = P(Thunder | Lightning), i.e., T ⊥ R | L
  • Bayesian (Belief) Network
    • Acyclic directed graph model B = (V, E, Θ) representing CI assertions over Θ
    • Vertices (nodes) V: denote events (each a random variable)
    • Edges (arcs, links) E: denote conditional dependencies
  • Markov Condition for BBNs (Chain Rule): P(X1, …, Xn) = ∏i P(Xi | Parents(Xi))
  • Example BBN: X1 = Age, X2 = Gender, X3 = Exposure-To-Toxins, X4 = Smoking, X5 = Cancer, X6 = Serum Calcium, X7 = Lung Tumor
    • P(20s, Female, Low, Non-Smoker, No-Cancer, Negative, Negative) = P(T) · P(F) · P(L | T) · P(N | T, F) · P(N | L, N) · P(N | N) · P(N | N)
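
Below is a minimal sketch, in Python, of evaluating the chain-rule factorization for the single assignment above. The numeric CPT entries are illustrative placeholders, not values from the lecture; only the structure of the product follows the slide.

```python
# A minimal sketch of the BBN chain rule P(x1, ..., xn) = Π_i P(xi | Parents(xi)),
# evaluated for the assignment (20s, Female, Low, Non-Smoker, No-Cancer,
# Negative, Negative). All probabilities below are made-up placeholders.

def joint_probability(cpt_entries):
    """Multiply one CPT entry per node: one factor per term in the chain rule."""
    result = 1.0
    for prob in cpt_entries:
        result *= prob
    return result

# One placeholder entry per factor in
# P(T) · P(F) · P(L|T) · P(N|T,F) · P(N|L,N) · P(N|N) · P(N|N):
entries = [
    0.30,  # P(Age = 20s)
    0.50,  # P(Gender = Female)
    0.70,  # P(Exposure = Low | Age = 20s)
    0.80,  # P(Smoking = Non-Smoker | Age = 20s, Gender = Female)
    0.95,  # P(Cancer = None | Exposure = Low, Smoking = Non-Smoker)
    0.90,  # P(SerumCalcium = Negative | Cancer = None)
    0.90,  # P(LungTumor = Negative | Cancer = None)
]

print(joint_probability(entries))  # product of the seven factors ≈ 0.0646
```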

  3. Fusion, Propagation, and Structuring
  • Fusion
    • Methods for combining multiple beliefs
    • Theory more precise than for fuzzy or ANN inference
    • Data and sensor fusion
    • Resolving conflict (vote-taking, winner-take-all, mixture estimation)
    • Paraconsistent reasoning
  • Propagation
    • Modeling the process of evidential reasoning by updating beliefs
    • Source of parallelism
    • Natural object-oriented (message-passing) model
    • Communication: asynchronous – dynamic workpool management problem
    • Concurrency: known Petri net dualities
  • Structuring
    • Learning graphical dependencies from scores, constraints
    • Two parameter estimation problems: structure learning, belief revision
  Adapted from slides by S. Russell, UC Berkeley

  4. Bayesian Learning
  • Framework: Interpretations of Probability [Cheeseman, 1985]
    • Bayesian subjectivist view
      • A measure of an agent’s belief in a proposition
      • Proposition denoted by a random variable (sample space: range)
      • e.g., Pr(Outlook = Sunny) = 0.8
    • Frequentist view: probability is the frequency of observations of an event
    • Logicist view: probability is inferential evidence in favor of a proposition
  • Typical Applications
    • HCI: learning natural language; intelligent displays; decision support
    • Approaches: prediction; sensor and data fusion (e.g., bioinformatics)
  • Prediction: Examples
    • Measure relevant parameters: temperature, barometric pressure, wind speed
    • Make a statement of the form Pr(Tomorrow’s-Weather = Rain) = 0.5
    • College admissions: Pr(Acceptance) = p
      • Plain beliefs: unconditional acceptance (p = 1) or categorical rejection (p = 0)
      • Conditional beliefs: depends on reviewer (use a probabilistic model)

  5. Bayes’s Theorem: Choosing Hypotheses
  • MAP Hypothesis
    • Generally want the most probable hypothesis given the training data
    • Define arg maxx∈Ω f(x): the value of x in the sample space Ω with the highest f(x)
    • Maximum a posteriori hypothesis: hMAP ≡ arg maxh∈H P(h | D) = arg maxh∈H P(D | h) P(h)
  • ML Hypothesis
    • Assume that P(hi) = P(hj) for all pairs i, j (uniform priors, i.e., P(H) ~ Uniform)
    • Can further simplify and choose the maximum likelihood hypothesis: hML ≡ arg maxh∈H P(D | h)
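
The hMAP and hML definitions above can be exercised with a short sketch; the three hypotheses and their likelihoods and priors are made-up placeholders, not data from the lecture.

```python
# Sketch of choosing hypotheses: hMAP = argmax_h P(D | h) P(h);
# under a uniform prior this reduces to hML = argmax_h P(D | h).
# The hypothesis space and all numbers are illustrative placeholders.

likelihood = {"h1": 0.20, "h2": 0.45, "h3": 0.35}   # P(D | h)
prior      = {"h1": 0.60, "h2": 0.10, "h3": 0.30}   # P(h)

h_map = max(likelihood, key=lambda h: likelihood[h] * prior[h])
h_ml  = max(likelihood, key=lambda h: likelihood[h])

print(h_map)  # h1: unnormalized posterior scores are 0.120, 0.045, 0.105
print(h_ml)   # h2: largest likelihood, 0.45
```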

  6. Propagation Algorithm in Singly-Connected Bayesian Networks – Pearl (1983)
  • Upward (child-to-parent) λ messages: λ'(Ci') are modified during the λ message-passing phase
  • Downward (parent-to-child) π messages: P'(Ci') is computed during the π message-passing phase
  • [Figure: singly connected network over nodes C1–C6, with λ messages passed upward and π messages passed downward]
  • Multiply-connected case: exact and approximate inference are #P-complete (a counting problem is #P-complete iff its decision problem is NP-complete)
  Adapted from Neapolitan (1990), Guo (2000)
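
A sketch of the λ/π idea, specialized (as an assumption, for brevity) to the simplest singly connected case: a three-node chain A → B → C with evidence on C. The upward λ message carries the evidence from the child, the downward π message carries the prior from the parent, and their product gives the belief at B. The CPT numbers are placeholders.

```python
import numpy as np

# Chain A -> B -> C, all variables binary; evidence C = 1.
P_A   = np.array([0.6, 0.4])                  # P(A); placeholder values
P_B_A = np.array([[0.7, 0.3],                 # P(B | A=0)
                  [0.2, 0.8]])                # P(B | A=1); rows sum to 1
P_C_B = np.array([[0.9, 0.1],                 # P(C | B=0)
                  [0.4, 0.6]])                # P(C | B=1)

# Downward pi message into B: pi_B(b) = sum_a P(b | a) P(a)
pi_B = P_A @ P_B_A

# Upward lambda message from C into B, given evidence C = 1: lambda_B(b) = P(C=1 | b)
lambda_B = P_C_B[:, 1]

# Belief at B: BEL(b) ∝ lambda_B(b) · pi_B(b)
bel_B = lambda_B * pi_B
bel_B /= bel_B.sum()
print(bel_B)   # posterior P(B | C = 1) ≈ [0.143, 0.857]
```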

  7. Inference by Clustering [1]: Graph Operations (Moralization, Triangulation, Maximal Cliques)
  • [Figure: an example Bayesian network (acyclic digraph) over nodes A1, B2, E3, C4, G5, F6, H7, D8 is transformed in three steps – Moralize, Triangulate, Find Maximal Cliques – yielding cliques Clq1 through Clq6]
  Adapted from Neapolitan (1990), Guo (2000)
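
A sketch of the moralize / triangulate / find-maximal-cliques pipeline on a small stand-in DAG (not the slide's eight-node example). It uses networkx, which is an assumption here (the lecture's tools are BNT/BNJ), and triangulates with a naive elimination ordering rather than an optimized heuristic.

```python
import itertools
import networkx as nx

def moralize(dag):
    """Marry the co-parents of each node, then drop edge directions."""
    moral = dag.to_undirected()
    for node in dag.nodes:
        for u, v in itertools.combinations(dag.predecessors(node), 2):
            moral.add_edge(u, v)
    return moral

def triangulate(graph, order):
    """Naive triangulation: eliminate nodes in `order`, connecting each node's
    remaining neighbors (fill-in edges) before removing it."""
    g, filled = graph.copy(), graph.copy()
    for node in order:
        for u, v in itertools.combinations(list(g.neighbors(node)), 2):
            g.add_edge(u, v)
            filled.add_edge(u, v)
        g.remove_node(node)
    return filled

# A small stand-in DAG (placeholder, not the slide's example).
dag = nx.DiGraph([("A", "C"), ("B", "C"), ("C", "D"), ("B", "D")])
moral = moralize(dag)                        # adds the A-B "marriage" edge
chordal = triangulate(moral, list(dag.nodes))
cliques = list(nx.find_cliques(chordal))     # maximal cliques of the chordal graph
print(cliques)                               # e.g. [['A','B','C'], ['B','C','D']]
```

In practice the elimination ordering chosen during triangulation strongly affects the clique sizes, which is why ordering heuristics matter.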

  8. Inference by Clustering [2]: Junction Tree – Lauritzen & Spiegelhalter (1988)
  • Input: list of cliques of the triangulated, moralized graph Gu
  • Output:
    • Tree of cliques
    • Separator nodes Si, residual nodes Ri, and potential probability ψ(Clqi) for all cliques
  • Algorithm:
    1. Si = Clqi ∩ (Clq1 ∪ Clq2 ∪ … ∪ Clqi-1)
    2. Ri = Clqi − Si
    3. If i > 1 then identify a j < i such that Clqj is a parent of Clqi
    4. Assign each node v to a unique clique Clqi such that {v} ∪ c(v) ⊆ Clqi
    5. Compute ψ(Clqi) = ∏ f(v) over all v assigned to Clqi, where f(v) = P(v | c(v)); ψ(Clqi) = 1 if no v is assigned to Clqi
    6. Store Clqi, Ri, Si, and ψ(Clqi) at each vertex in the tree of cliques
  Adapted from Neapolitan (1990), Guo (2000)
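
A sketch of steps 1–5 of the construction above, run on the clique list that appears on the next slide; the potentials ψ(Clqi) are kept symbolic (lists of CPT names) rather than numeric.

```python
# Junction-tree construction sketch (steps 1-5): compute separators S_i,
# residuals R_i, a parent clique for each Clq_i, assign each node v to a
# clique containing {v} ∪ c(v), and collect the factors P(v | c(v)) into ψ(Clq_i).

cliques = [                       # clique list from the clique-tree example slide
    {"A", "B"}, {"B", "E", "C"}, {"E", "C", "G"},
    {"E", "G", "F"}, {"C", "G", "H"}, {"C", "D"},
]
parents = {                       # c(v): parents of v in the original BBN
    "A": set(), "B": {"A"}, "C": {"B", "E"}, "E": {"F"},
    "F": set(), "G": {"F"}, "H": {"C", "G"}, "D": {"C"},
}

separators, residuals, clique_parent = [], [], []
potentials = [[] for _ in cliques]

for i, clq in enumerate(cliques):
    union_prev = set().union(*cliques[:i]) if i > 0 else set()
    S = clq & union_prev                                        # step 1: separator
    R = clq - S                                                 # step 2: residual
    j = next((k for k in range(i) if S <= cliques[k]), None)    # step 3: parent clique
    separators.append(S)
    residuals.append(R)
    clique_parent.append(j)

for v, pa in parents.items():                 # steps 4-5: assign nodes, build potentials
    i = next(k for k, clq in enumerate(cliques) if {v} | pa <= clq)
    potentials[i].append(f"P({v} | {', '.join(sorted(pa))})" if pa else f"P({v})")

for i, clq in enumerate(cliques):
    print(sorted(clq), "S:", sorted(separators[i]), "R:", sorted(residuals[i]),
          "parent clique:", clique_parent[i], "ψ:", potentials[i] or ["1"])
```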

  9. Inference by Clustering [3]: Clique-Tree Operations
  • Ri: residual nodes; Si: separator nodes; ψ(Clqi): potential probability of clique i
  • Clq1 = {A, B}: R1 = {A, B}, S1 = {}, ψ(Clq1) = P(B | A) P(A)
  • Clq2 = {B, E, C}: R2 = {C, E}, S2 = {B}, ψ(Clq2) = P(C | B, E)
  • Clq3 = {E, C, G}: R3 = {G}, S3 = {E, C}, ψ(Clq3) = 1
  • Clq4 = {E, G, F}: R4 = {F}, S4 = {E, G}, ψ(Clq4) = P(E | F) P(G | F) P(F)
  • Clq5 = {C, G, H}: R5 = {H}, S5 = {C, G}, ψ(Clq5) = P(H | C, G)
  • Clq6 = {C, D}: R6 = {D}, S6 = {C}, ψ(Clq6) = P(D | C)
  Adapted from Neapolitan (1990), Guo (2000)

  10. Inference by Loop Cutset Conditioning
  • Split a vertex in an undirected cycle; condition upon each of its state values
    • e.g., the Age node X1 is split into instances X1,1 (Age ∈ [0, 10)), X1,2 (Age ∈ [10, 20)), …, X1,10 (Age ∈ [100, ∞))
  • Number of network instantiations: product of the arities of the nodes in the minimal loop cutset
  • Posterior: marginal conditioned upon the cutset variable values
  • Deciding the optimal cutset: NP-hard
  • Current open problems
    • Bounded cutset conditioning: ordering heuristics
    • Finding randomized algorithms for loop cutset optimization
  • [Figure: example BBN with nodes X2 (Gender), X3 (Exposure-To-Toxins), X4 (Smoking), X5 (Cancer), X6 (Serum Calcium), X7 (Lung Tumor)]
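
A small sketch of the bookkeeping behind cutset conditioning: the instantiation count is the product of the cutset arities, and the posterior is a mixture of per-instantiation posteriors weighted by the cutset's posterior probability. All numbers are placeholders; in practice the per-instantiation posteriors would come from running a singly connected inference algorithm (e.g., Pearl's) on each instantiation.

```python
from math import prod

# Number of network instantiations = product of cutset node arities.
# Here the cutset is assumed to be {X1} (Age) with 10 states, as on the slide.
cutset_arities = {"X1": 10}
print(prod(cutset_arities.values()))        # 10 instantiations

# Combining results: P(Q | e) = sum_c P(Q | e, c) * P(c | e).
# Placeholder numbers for a two-state query and three cutset instantiations.
posterior_given_cutset = [[0.2, 0.8], [0.5, 0.5], [0.7, 0.3]]   # P(Q | e, c)
cutset_posterior       = [0.6, 0.3, 0.1]                        # P(c | e)

posterior = [sum(w * p[q] for p, w in zip(posterior_given_cutset, cutset_posterior))
             for q in range(2)]
print(posterior)   # mixture posterior P(Q | e) = [0.34, 0.66]
```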

  11. Inference by Variable Elimination [1]: Intuition
  Adapted from slides by S. Russell, UC Berkeley: http://aima.cs.berkeley.edu/

  12. Inference by Variable Elimination [2]: Factoring Operations
  Adapted from slides by S. Russell, UC Berkeley: http://aima.cs.berkeley.edu/

  13. Inference by Variable Elimination [3]: Example
  • [Figure: network with nodes A (Season), B (Sprinkler), C (Rain), D (Manual Watering), F (Wet), G (Slippery)]
  • Factorization: P(A), P(B | A), P(C | A), P(D | B, A), P(F | B, C), P(G | F)
  • Query: P(A | G = 1) = ?
  • Elimination ordering: d = < A, C, B, F, D, G >
  • Evidence bucket for G: λG(f) = ΣG=1 P(G | F)
  Adapted from Dechter (1996), Joehanes (2002)
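
A sketch of bucket (variable) elimination for the query P(A | G = 1). Only the factorization and the ordering come from the slide; the Factor class, the binary coding of the variables, and every CPT number are illustrative assumptions.

```python
from itertools import product
from functools import reduce

class Factor:
    """A table over binary variables: assignment tuple -> value."""
    def __init__(self, variables, table):
        self.vars, self.table = tuple(variables), dict(table)

    def multiply(self, other):
        out_vars = self.vars + tuple(v for v in other.vars if v not in self.vars)
        out = {}
        for assign in product((0, 1), repeat=len(out_vars)):
            a = dict(zip(out_vars, assign))
            out[assign] = (self.table[tuple(a[v] for v in self.vars)]
                           * other.table[tuple(a[v] for v in other.vars)])
        return Factor(out_vars, out)

    def sum_out(self, var):
        keep = tuple(v for v in self.vars if v != var)
        out = {}
        for assign, p in self.table.items():
            key = tuple(val for v, val in zip(self.vars, assign) if v != var)
            out[key] = out.get(key, 0.0) + p
        return Factor(keep, out)

def cpt(variables, probs):
    """Probabilities listed in lexicographic order of the binary assignments."""
    return Factor(variables, zip(product((0, 1), repeat=len(variables)), probs))

# Placeholder CPTs for P(A), P(B|A), P(C|A), P(D|B,A), P(F|B,C), P(G|F),
# plus an indicator factor encoding the evidence G = 1.
factors = [
    cpt(["A"], [0.4, 0.6]),
    cpt(["B", "A"], [0.7, 0.2, 0.3, 0.8]),
    cpt(["C", "A"], [0.5, 0.9, 0.5, 0.1]),
    cpt(["D", "B", "A"], [0.6, 0.3, 0.8, 0.1, 0.4, 0.7, 0.2, 0.9]),
    cpt(["F", "B", "C"], [0.9, 0.5, 0.4, 0.1, 0.1, 0.5, 0.6, 0.9]),
    cpt(["G", "F"], [0.8, 0.2, 0.2, 0.8]),
    Factor(["G"], {(0,): 0.0, (1,): 1.0}),     # evidence G = 1
]

# Process buckets from the end of the ordering d = <A, C, B, F, D, G>
# back toward the query variable A (the G bucket realizes λG(f) above).
for var in ["G", "D", "F", "B", "C"]:
    bucket = [f for f in factors if var in f.vars]
    rest = [f for f in factors if var not in f.vars]
    factors = rest + [reduce(Factor.multiply, bucket).sum_out(var)]

answer = reduce(Factor.multiply, factors)      # remaining factors mention only A
total = sum(answer.table.values())
print({a: p / total for a, p in answer.table.items()})   # P(A | G = 1)
```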

  14. Tools for Building Graphical Models
  • Commercial Tools: Ergo, Netica, TETRAD, Hugin
  • Bayes Net Toolbox (BNT) – Murphy (1997-present)
    • Distribution page: http://http.cs.berkeley.edu/~murphyk/Bayes/bnt.html
    • Development group: http://groups.yahoo.com/group/BayesNetToolbox
  • Bayesian Network tools in Java (BNJ) – Hsu et al. (1999-present)
    • Distribution page: http://bndev.sourceforge.net
    • Development group: http://groups.yahoo.com/group/bndev
  • Current (re)implementation projects for the KSU KDD Lab
    • Continuous state: Minka (2002) – Hsu, Guo, Perry, Boddhireddy
    • Formats: XML BNIF (MSBN), Netica – Guo, Hsu
    • Space-efficient DBN inference – Joehanes
    • Bounded cutset conditioning – Chandak

  15. References [1]: Graphical Models and Inference Algorithms
  • Graphical Models
    • Bayesian (Belief) Networks tutorial – Murphy (2001): http://www.cs.berkeley.edu/~murphyk/Bayes/bayes.html
    • Learning Bayesian Networks – Heckerman (1996, 1999): http://research.microsoft.com/~heckerman
  • Inference Algorithms
    • Junction Tree (Join Tree, L-S, Hugin): Lauritzen & Spiegelhalter (1988), http://citeseer.nj.nec.com/huang94inference.html
    • (Bounded) Loop Cutset Conditioning: Horvitz & Cooper (1989), http://citeseer.nj.nec.com/shachter94global.html
    • Variable Elimination (Bucket Elimination, ElimBel): Dechter (1996), http://citeseer.nj.nec.com/dechter96bucket.html
    • Stochastic Approximation: http://citeseer.nj.nec.com/cheng00aisbn.html
  • Recommended Books
    • Neapolitan (1990) – out of print; see Pearl (1988), Jensen (2001)
    • Castillo, Gutierrez, Hadi (1997)
    • Cowell, Dawid, Lauritzen, Spiegelhalter (1999)

  16. References [2]: Machine Learning, KDD, and Bioinformatics
  • Machine Learning, Data Mining, and Knowledge Discovery
    • K-State KDD Lab: literature survey and resource catalog (2002): http://www.kddresearch.org/Resources
    • Bayesian Network tools in Java (BNJ): Hsu, Barber, King, Meyer, Thornton (2004): http://bndev.sourceforge.net
    • Machine Learning in Java (MLJ): Hsu, Louis, Plummer (2002): http://mldev.sourceforge.net
    • NCSA Data to Knowledge (D2K): Welge, Redman, Auvil, Tcheng, Hsu: http://www.ncsa.uiuc.edu/STI/ALG
  • Bioinformatics
    • European Bioinformatics Institute Tutorial: Brazma et al. (2001): http://www.ebi.ac.uk/microarray/biology_intro.htm
    • Hebrew University: Friedman, Pe’er, et al. (1999, 2000, 2002): http://www.cs.huji.ac.il/labs/compbio/
    • K-State BMI Group: literature survey and resource catalog (2002): http://www.kddresearch.org/Groups/Bioinformatics
