Lecture 31 of 41: Introduction to Graphical Models, Part 2 of 2
Wednesday, 03 November 2004
William H. Hsu
Laboratory for Knowledge Discovery in Databases
Department of Computing and Information Sciences, Kansas State University
http://www.kddresearch.org
This presentation is based upon: http://www.kddresearch.org/KSU/CIS/Math-20021107.ppt
Graphical Models Overview [1]: Bayesian Networks
• Conditional Independence
  • X is conditionally independent (CI) from Y given Z (sometimes written X ⊥ Y | Z) iff P(X | Y, Z) = P(X | Z) for all values of X, Y, and Z
  • Example: P(Thunder | Rain, Lightning) = P(Thunder | Lightning), i.e., T ⊥ R | L
• Bayesian (Belief) Network
  • Acyclic directed graph model B = (V, E, Θ) representing CI assertions over the joint distribution of the variables in V
  • Vertices (nodes) V: denote events (each a random variable)
  • Edges (arcs, links) E: denote conditional dependencies
• Markov Condition for BBNs (Chain Rule): P(X1, X2, …, Xn) = ∏i P(Xi | Parents(Xi))
• Example BBN
  [Figure: example network with X1 = Age, X2 = Gender, X3 = Exposure-To-Toxins, X4 = Smoking, X5 = Cancer, X6 = Serum Calcium, X7 = Lung Tumor; edges X1→X3, X1→X4, X2→X4, X3→X5, X4→X5, X5→X6, X5→X7]
  P(20s, Female, Low, Non-Smoker, No-Cancer, Negative, Negative) = P(20s) · P(Female) · P(Low | 20s) · P(Non-Smoker | 20s, Female) · P(No-Cancer | Low, Non-Smoker) · P(Negative | No-Cancer) · P(Negative | No-Cancer)
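A minimal sketch (in Python) of the chain-rule computation above: it multiplies out the seven factors for the instantiation (20s, Female, Low, Non-Smoker, No-Cancer, Negative, Negative). The probability values are invented for illustration; only the factorization follows the slide's network.

```python
# Minimal sketch of the BBN chain rule for the example instantiation above.
# All probability values are invented for illustration; only the factorization
# P(x1) P(x2) P(x3|x1) P(x4|x1,x2) P(x5|x3,x4) P(x6|x5) P(x7|x5) follows the slide.
p_age_20s                      = 0.3    # P(Age = 20s)
p_female                       = 0.5    # P(Gender = Female)
p_exposure_low_given_20s       = 0.8    # P(Exposure = Low | Age = 20s)
p_nonsmoker_given_20s_female   = 0.7    # P(Smoking = Non-Smoker | 20s, Female)
p_nocancer_given_low_nonsmoker = 0.95   # P(Cancer = No | Low, Non-Smoker)
p_serum_neg_given_nocancer     = 0.9    # P(Serum Calcium = Negative | No-Cancer)
p_tumor_neg_given_nocancer     = 0.98   # P(Lung Tumor = Negative | No-Cancer)

joint = (p_age_20s * p_female
         * p_exposure_low_given_20s
         * p_nonsmoker_given_20s_female
         * p_nocancer_given_low_nonsmoker
         * p_serum_neg_given_nocancer
         * p_tumor_neg_given_nocancer)
print(joint)   # P(20s, Female, Low, Non-Smoker, No-Cancer, Negative, Negative)
```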
Fusion, Propagation, and Structuring
• Fusion
  • Methods for combining multiple beliefs
  • Theory more precise than for fuzzy, ANN inference
  • Data and sensor fusion
  • Resolving conflict (vote-taking, winner-take-all, mixture estimation)
  • Paraconsistent reasoning
• Propagation
  • Modeling process of evidential reasoning by updating beliefs
  • Source of parallelism
  • Natural object-oriented (message-passing) model
  • Communication: asynchronous; a dynamic workpool management problem
  • Concurrency: known Petri net dualities
• Structuring
  • Learning graphical dependencies from scores, constraints
  • Two parameter estimation problems: structure learning, belief revision
Adapted from slides by S. Russell, UC Berkeley
Bayesian Learning
• Framework: Interpretations of Probability [Cheeseman, 1985]
  • Bayesian subjectivist view
    • A measure of an agent's belief in a proposition
    • Proposition denoted by random variable (sample space: range)
    • e.g., Pr(Outlook = Sunny) = 0.8
  • Frequentist view: probability is the frequency of observations of an event
  • Logicist view: probability is inferential evidence in favor of a proposition
• Typical Applications
  • HCI: learning natural language; intelligent displays; decision support
  • Approaches: prediction; sensor and data fusion (e.g., bioinformatics)
• Prediction: Examples
  • Measure relevant parameters: temperature, barometric pressure, wind speed
  • Make statement of the form Pr(Tomorrow's-Weather = Rain) = 0.5
  • College admissions: Pr(Acceptance) ≥ p
    • Plain beliefs: unconditional acceptance (p = 1) or categorical rejection (p = 0)
    • Conditional beliefs: depends on reviewer (use probabilistic model)
Bayes's Theorem: Choosing Hypotheses
• Bayes's Theorem: P(h | D) = P(D | h) P(h) / P(D)
• MAP Hypothesis
  • Generally want most probable hypothesis given the training data
  • Define: arg max_{x ∈ Ω} f(x) ≡ the value of x in the sample space Ω with the highest f(x)
  • Maximum a posteriori hypothesis, hMAP: hMAP ≡ arg max_{h ∈ H} P(h | D) = arg max_{h ∈ H} P(D | h) P(h) / P(D) = arg max_{h ∈ H} P(D | h) P(h)
• ML Hypothesis
  • Assume that P(hi) = P(hj) for all pairs i, j (uniform priors, i.e., P_H ~ Uniform)
  • Can further simplify and choose the maximum likelihood hypothesis, hML: hML ≡ arg max_{h ∈ H} P(D | h)
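A small sketch of choosing hMAP and hML over a toy discrete hypothesis space; the hypothesis names, priors, and likelihoods below are invented for illustration.

```python
# Minimal sketch of choosing h_MAP and h_ML over a small discrete hypothesis
# space. Priors P(h) and likelihoods P(D | h) are invented for illustration.
hypotheses = ["h1", "h2", "h3"]
prior      = {"h1": 0.5, "h2": 0.3, "h3": 0.2}     # P(h)
likelihood = {"h1": 0.02, "h2": 0.10, "h3": 0.05}  # P(D | h) for observed data D

# h_MAP = argmax_h P(D | h) P(h); P(D) is constant over h and can be dropped
h_map = max(hypotheses, key=lambda h: likelihood[h] * prior[h])

# h_ML = argmax_h P(D | h) (equivalent to MAP under uniform priors)
h_ml = max(hypotheses, key=lambda h: likelihood[h])

print(h_map, h_ml)   # h2 maximizes both P(D|h)P(h) and P(D|h) in this toy case
```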
Propagation Algorithm in Singly-Connected Bayesian Networks – Pearl (1983)
• Upward (child-to-parent) λ messages: λ′(Ci) modified during message-passing phase
• Downward (parent-to-child) π messages: π′(Ci) computed during message-passing phase
• [Figure: singly-connected network over nodes C1 – C6]
• Multiply-connected case: exact, approximate inference are #P-complete (the counting analogue of an NP-complete decision problem)
Adapted from Neapolitan (1990), Guo (2000)
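As a sketch of the λ/π message-passing idea on the simplest singly-connected case, the code below updates beliefs on a two-node chain A → B given evidence on B; the chain, the CPT numbers, and the variable names are assumptions made for illustration, not the lecture's example or Pearl's full algorithm.

```python
# Minimal sketch of Pearl-style lambda/pi message passing on a two-node chain
# A -> B with binary variables; all numbers are invented for illustration.
import numpy as np

prior_A = np.array([0.6, 0.4])            # P(A)
cpt_B_given_A = np.array([[0.9, 0.1],     # P(B | A=0)
                          [0.2, 0.8]])    # P(B | A=1)

lambda_B = np.array([0.0, 1.0])           # lambda(B): evidence B = 1

# Upward (child-to-parent) message: lambda_{B->A}(a) = sum_b P(b | a) lambda(b)
lambda_B_to_A = cpt_B_given_A @ lambda_B

# Belief at A combines the prior (pi) with the upward lambda message
belief_A = prior_A * lambda_B_to_A
belief_A /= belief_A.sum()                # P(A | B = 1)

# Downward (parent-to-child) message: pi_{A->B}(a) = P(a) (A has no other children)
pi_A_to_B = prior_A
belief_B = lambda_B * (cpt_B_given_A.T @ pi_A_to_B)
belief_B /= belief_B.sum()                # trivially [0, 1] under this evidence

print(belief_A, belief_B)
```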
Inference by Clustering [1]: Graph Operations (Moralization, Triangulation, Maximal Cliques)
[Figure: a Bayesian network (acyclic digraph) over nodes A1, B2, E3, C4, G5, F6, H7, D8 is moralized, then triangulated, and its maximal cliques Clq1 – Clq6 are identified]
Adapted from Neapolitan (1990), Guo (2000)
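A minimal sketch of the moralization step only (marry co-parents, then drop edge directions); the parent sets are read off the CPT factorization shown with the clique potentials two slides below, and the triangulation and maximal-clique steps are omitted.

```python
# Minimal sketch of moralization: marry every pair of co-parents, then keep all
# edges undirected. Parent sets are read off the CPTs used in the clique
# potentials below (P(B|A), P(C|B,E), P(D|C), P(E|F), P(G|F), P(H|C,G)).
parents = {"A": [], "B": ["A"], "C": ["B", "E"], "D": ["C"],
           "E": ["F"], "F": [], "G": ["F"], "H": ["C", "G"]}

moral_edges = set()
for child, pars in parents.items():
    for p in pars:                                   # undirected version of each arc
        moral_edges.add(frozenset((p, child)))
    for i, p in enumerate(pars):                     # marry every pair of co-parents
        for q in pars[i + 1:]:
            moral_edges.add(frozenset((p, q)))

print(sorted(tuple(sorted(e)) for e in moral_edges))
# adds the moral edges B-E and C-G to the undirected skeleton
```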
Inference by Clustering [2]: Junction Tree – Lauritzen & Spiegelhalter (1988)
• Input: list of cliques of triangulated, moralized graph Gu
• Output:
  • Tree of cliques
  • Separator nodes Si, residual nodes Ri, and potential probability ψ(Clqi) for all cliques
• Algorithm:
  1. Si = Clqi ∩ (Clq1 ∪ Clq2 ∪ … ∪ Clqi-1)
  2. Ri = Clqi - Si
  3. If i > 1, identify a j < i such that Si ⊆ Clqj; make Clqj a parent of Clqi
  4. Assign each node v to a unique clique Clqi such that {v} ∪ c(v) ⊆ Clqi
  5. Compute ψ(Clqi) = ∏ P(v | c(v)) over all v assigned to Clqi (ψ(Clqi) = 1 if no v is assigned to Clqi)
  6. Store Clqi, Ri, Si, and ψ(Clqi) at each vertex in the tree of cliques
Adapted from Neapolitan (1990), Guo (2000)
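A minimal sketch of steps 1-3 of this algorithm, run on the clique ordering used on the next slide; it assumes the cliques are already listed in an order satisfying the running-intersection property, so a suitable parent clique always exists.

```python
# Minimal sketch of steps 1-3: compute separators S_i, residuals R_i, and a
# parent clique for each clique, given an ordered clique list (assumed to
# satisfy the running-intersection property). Cliques follow the next slide.
cliques = [
    ("Clq1", {"A", "B"}),
    ("Clq2", {"B", "E", "C"}),
    ("Clq3", {"E", "C", "G"}),
    ("Clq4", {"E", "G", "F"}),
    ("Clq5", {"C", "G", "H"}),
    ("Clq6", {"C", "D"}),
]

tree = []
for i, (name, clq) in enumerate(cliques):
    union_prev = set().union(*(c for _, c in cliques[:i])) if i > 0 else set()
    S = clq & union_prev                     # step 1: separator S_i
    R = clq - S                              # step 2: residual R_i
    parent = None
    if i > 0:                                # step 3: some earlier clique containing S_i
        j = next(j for j, (_, c) in enumerate(cliques[:i]) if S <= c)
        parent = cliques[j][0]
    tree.append((name, sorted(S), sorted(R), parent))

for entry in tree:
    print(entry)   # e.g. ('Clq2', ['B'], ['C', 'E'], 'Clq1')
```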
Inference by Clustering [3]: Clique-Tree Operations
Ri: residual nodes; Si: separator nodes; ψ(Clqi): potential probability of clique i
• Clq1 = {A, B}: R1 = {A, B}, S1 = {}, ψ(Clq1) = P(B | A) P(A)
• Clq2 = {B, E, C}: R2 = {C, E}, S2 = {B}, ψ(Clq2) = P(C | B, E)
• Clq3 = {E, C, G}: R3 = {G}, S3 = {E, C}, ψ(Clq3) = 1
• Clq4 = {E, G, F}: R4 = {F}, S4 = {E, G}, ψ(Clq4) = P(E | F) P(G | F) P(F)
• Clq5 = {C, G, H}: R5 = {H}, S5 = {C, G}, ψ(Clq5) = P(H | C, G)
• Clq6 = {C, D}: R6 = {D}, S6 = {C}, ψ(Clq6) = P(D | C)
[Figure: tree of cliques Clq1 – Clq6 with separator sets labeling the edges]
Adapted from Neapolitan (1990), Guo (2000)
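Continuing the sketch from the previous slide, steps 4-5 of the algorithm can be illustrated by assigning each CPT P(v | c(v)) to one containing clique and collecting the product symbolically; the parent sets are read off the potentials above, and the output reproduces the ψ(Clqi) listed on this slide.

```python
# Minimal sketch of steps 4-5: assign each variable's CPT P(v | c(v)) to one
# clique containing {v} union c(v), and take the (here symbolic) product as
# the clique potential psi(Clq_i). Cliques and parent sets follow this slide.
cliques = {
    "Clq1": {"A", "B"},
    "Clq2": {"B", "E", "C"},
    "Clq3": {"E", "C", "G"},
    "Clq4": {"E", "G", "F"},
    "Clq5": {"C", "G", "H"},
    "Clq6": {"C", "D"},
}
parents = {"A": set(), "B": {"A"}, "C": {"B", "E"}, "D": {"C"},
           "E": {"F"}, "F": set(), "G": {"F"}, "H": {"C", "G"}}

potentials = {name: [] for name in cliques}          # symbolic psi(Clq_i)
for v, pa in parents.items():
    # find one clique containing {v} union c(v)
    home = next(name for name, clq in cliques.items() if {v} | pa <= clq)
    potentials[home].append(f"P({v}|{','.join(sorted(pa))})" if pa else f"P({v})")

for name, factors in potentials.items():
    print(name, " * ".join(factors) or "1")          # "1" if nothing assigned (Clq3)
```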
Inference by Loop Cutset Conditioning
• Split vertex in undirected cycle; condition upon each of its state values
  • e.g., Age (X1) instantiated as X1,1 = [0, 10), X1,2 = [10, 20), …, X1,10 = [100, ∞)
• Number of network instantiations: product of arity of nodes in minimal loop cutset
• Posterior: marginal conditioned upon cutset variable values
• Deciding optimal cutset: NP-hard
• Current open problems
  • Bounded cutset conditioning: ordering heuristics
  • Finding randomized algorithms for loop cutset optimization
[Figure: the Age / Gender / Exposure-To-Toxins / Smoking / Cancer / Serum Calcium / Lung Tumor network from before, with the Age vertex split to cut the loop]
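A minimal sketch of the conditioning decomposition P(query | evidence) = Σc P(query | evidence, c) P(c | evidence); the cutset, its arities, and the polytree_query() helper are placeholders invented for illustration (the helper is a stub returning fixed numbers), shown only to make the instantiation count and the weighted combination explicit.

```python
# Minimal sketch of loop cutset conditioning as a weighted sum over cutset
# instantiations. The cutset and arities are an assumption (e.g., Age with 10
# bins plus one binary variable); polytree_query() is a stub standing in for
# singly-connected inference on one cut-network instantiation.
from itertools import product

cutset_arities = {"X1": 10, "X4": 2}         # assumed cutset and arities
n_instantiations = 1
for arity in cutset_arities.values():
    n_instantiations *= arity                # product of arities = number of cut networks
print(n_instantiations)                      # 20

def polytree_query(cutset_values):
    """Stub: would return (P(query | evidence, c), P(c | evidence)) for one
    cutset instantiation c via singly-connected propagation."""
    return 0.5, 1.0 / n_instantiations

# P(query | evidence) = sum over c of P(query | evidence, c) * P(c | evidence)
posterior = sum(q * w for q, w in (polytree_query(c) for c in
                product(*(range(a) for a in cutset_arities.values()))))
print(posterior)
```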
Inference by Variable Elimination [1]: Intuition. Adapted from slides by S. Russell, UC Berkeley, http://aima.cs.berkeley.edu/
Inference by Variable Elimination [2]: Factoring Operations. Adapted from slides by S. Russell, UC Berkeley, http://aima.cs.berkeley.edu/
Inference by Variable Elimination [3]: Example
[Figure: network with A = Season; B, C = Sprinkler and Rain; D = Manual Watering; F = Wet; G = Slippery]
• CPTs: P(A), P(B | A), P(C | A), P(D | B, A), P(F | B, C), P(G | F)
• Query: P(A | G = 1) = ?
• Elimination ordering d = < A, C, B, F, D, G >
• Initial bucket placement (buckets processed in reverse order of d): bucket(G): P(G | F), evidence G = 1; bucket(D): P(D | B, A); bucket(F): P(F | B, C); bucket(B): P(B | A); bucket(C): P(C | A); bucket(A): P(A)
• Eliminating G with evidence G = 1 produces λG(F) = Σ_{G=1} P(G | F)
Adapted from Dechter (1996), Joehanes (2002)
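A self-contained sketch of variable (bucket) elimination for the query P(A | G = 1) on this network; the CPT numbers and the helper names (make_factor, multiply, sum_out, bern) are invented for illustration, while the factor structure and the evidence handling follow the slide.

```python
# Minimal sketch of variable (bucket) elimination for P(A | G = 1). CPT numbers
# are invented; only the structure P(A), P(B|A), P(C|A), P(D|B,A), P(F|B,C),
# P(G|F) follows the slide.
from itertools import product

def make_factor(vars_, fn):
    """Factor over binary variables: (vars tuple, {value-tuple: probability})."""
    return (vars_, {vals: fn(dict(zip(vars_, vals)))
                    for vals in product([0, 1], repeat=len(vars_))})

def multiply(f1, f2):
    v1, t1 = f1; v2, t2 = f2
    vars_ = tuple(dict.fromkeys(v1 + v2))         # ordered union of variables
    table = {}
    for vals in product([0, 1], repeat=len(vars_)):
        a = dict(zip(vars_, vals))
        table[vals] = t1[tuple(a[v] for v in v1)] * t2[tuple(a[v] for v in v2)]
    return (vars_, table)

def sum_out(f, var):
    vars_, t = f
    keep = tuple(v for v in vars_ if v != var)
    out = {}
    for vals, p in t.items():
        a = dict(zip(vars_, vals))
        key = tuple(a[v] for v in keep)
        out[key] = out.get(key, 0.0) + p
    return (keep, out)

def bern(p, x):                                   # P(X = x) for a Bernoulli(p)
    return p if x == 1 else 1.0 - p

factors = [
    make_factor(('A',),          lambda a: bern(0.5, a['A'])),
    make_factor(('B', 'A'),      lambda a: bern(0.7 if a['A'] else 0.2, a['B'])),
    make_factor(('C', 'A'),      lambda a: bern(0.4 if a['A'] else 0.6, a['C'])),
    make_factor(('D', 'B', 'A'), lambda a: bern(0.9 if a['B'] else 0.3, a['D'])),
    make_factor(('F', 'B', 'C'), lambda a: bern(0.95 if (a['B'] or a['C']) else 0.05, a['F'])),
    make_factor(('G', 'F'),      lambda a: bern(0.8 if a['F'] else 0.1, a['G'])),
    make_factor(('G',),          lambda a: 1.0 if a['G'] == 1 else 0.0),   # evidence G = 1
]

# Eliminate hidden variables in the slide's bucket order (G, D, F, B, C).
for hidden in ['G', 'D', 'F', 'B', 'C']:
    related = [f for f in factors if hidden in f[0]]
    factors = [f for f in factors if hidden not in f[0]]
    prod = related[0]
    for f in related[1:]:
        prod = multiply(prod, f)
    factors.append(sum_out(prod, hidden))

result = factors[0]
for f in factors[1:]:
    result = multiply(result, f)
vars_, table = result
total = sum(table.values())
print({v: p / total for v, p in table.items()})   # posterior P(A | G = 1)
```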
Tools for Building Graphical Models
• Commercial tools: Ergo, Netica, TETRAD, Hugin
• Bayes Net Toolbox (BNT) – Murphy (1997-present)
  • Distribution page: http://http.cs.berkeley.edu/~murphyk/Bayes/bnt.html
  • Development group: http://groups.yahoo.com/group/BayesNetToolbox
• Bayesian Network tools in Java (BNJ) – Hsu et al. (1999-present)
  • Distribution page: http://bndev.sourceforge.net
  • Development group: http://groups.yahoo.com/group/bndev
• Current (re)implementation projects for KSU KDD Lab
  • Continuous state: Minka (2002) – Hsu, Guo, Perry, Boddhireddy
  • Formats: XML BNIF (MSBN), Netica – Guo, Hsu
  • Space-efficient DBN inference – Joehanes
  • Bounded cutset conditioning – Chandak
References [1]: Graphical Models and Inference Algorithms
• Graphical Models
  • Bayesian (Belief) Networks tutorial – Murphy (2001): http://www.cs.berkeley.edu/~murphyk/Bayes/bayes.html
  • Learning Bayesian Networks – Heckerman (1996, 1999): http://research.microsoft.com/~heckerman
• Inference Algorithms
  • Junction Tree (Join Tree, L-S, Hugin): Lauritzen & Spiegelhalter (1988), http://citeseer.nj.nec.com/huang94inference.html
  • (Bounded) Loop Cutset Conditioning: Horvitz & Cooper (1989), http://citeseer.nj.nec.com/shachter94global.html
  • Variable Elimination (Bucket Elimination, ElimBel): Dechter (1996), http://citeseer.nj.nec.com/dechter96bucket.html
  • Stochastic Approximation: http://citeseer.nj.nec.com/cheng00aisbn.html
• Recommended Books
  • Neapolitan (1990) – out of print; see Pearl (1988), Jensen (2001)
  • Castillo, Gutierrez, Hadi (1997)
  • Cowell, Dawid, Lauritzen, Spiegelhalter (1999)
References [2]: Machine Learning, KDD, and Bioinformatics
• Machine Learning, Data Mining, and Knowledge Discovery
  • K-State KDD Lab: literature survey and resource catalog (2002): http://www.kddresearch.org/Resources
  • Bayesian Network tools in Java (BNJ): Hsu, Barber, King, Meyer, Thornton (2004): http://bndev.sourceforge.net
  • Machine Learning in Java (MLJ): Hsu, Louis, Plummer (2002): http://mldev.sourceforge.net
  • NCSA Data to Knowledge (D2K): Welge, Redman, Auvil, Tcheng, Hsu: http://www.ncsa.uiuc.edu/STI/ALG
• Bioinformatics
  • European Bioinformatics Institute Tutorial: Brazma et al. (2001): http://www.ebi.ac.uk/microarray/biology_intro.htm
  • Hebrew University: Friedman, Pe'er, et al. (1999, 2000, 2002): http://www.cs.huji.ac.il/labs/compbio/
  • K-State BMI Group: literature survey and resource catalog (2002): http://www.kddresearch.org/Groups/Bioinformatics