
Principles and Applications of Probabilistic Learning

Learn the fundamentals of probabilistic modeling, leveraging prior knowledge, handling missing data, and more. Review of probability, graphical models, and case studies provided by P. Smyth, UC Irvine.

Presentation Transcript


  1. Principles and Applications of Probabilistic Learning. Padhraic Smyth, Department of Computer Science, University of California, Irvine, www.ics.uci.edu/~smyth. Probabilistic Learning Tutorial: P. Smyth, UC Irvine, August 2005

  2. NEW New Slides • Original slides created in mid-July for ACM • Some new slides have been added • “new” logo in upper left

  3. UPDATED New Slides • Original slides created in mid-July for ACM • Some new slides have been added • “new” logo in upper left • A few slides have been updated • “updated” logo in upper left • Current slides (including new and updated) at: www.ics.uci.edu/~smyth/talks

  4. NEW From the tutorial Web page: “The intent of this tutorial is to provide a starting point for students and researchers……”

  5. Probabilistic Modeling vs. Function Approximation • Two major themes in machine learning: 1. Function approximation/“black box” methods • e.g., for classification and regression • Learn a flexible function y = f(x) • e.g., SVMs, decision trees, boosting, etc. 2. Probabilistic learning • e.g., for regression, model p(y|x) or p(y,x) • e.g., graphical models, mixture models, hidden Markov models, etc. • Both approaches are useful in general • In this tutorial we will focus only on the 2nd approach, probabilistic modeling

  6. Motivations for Probabilistic Modeling • leverage prior knowledge • generalize beyond data analysis in vector-spaces • handle missing data • combine multiple types of information into an analysis • generate calibrated probability outputs • quantify uncertainty about parameters, models, and predictions in a statistical manner

  7. NEW Learning object models in vision: Weber, Welling, Perona, 2000

  8. NEW Learning object models in vision: Weber, Welling, Perona, 2000

  9. NEW Learning to Extract Information from Documents, e.g., Seymore, McCallum, Rosenfeld, 1999

  10. NEW

  11. NEW Segal, Friedman, Koller, et al., Nature Genetics, 2005

  12. [Diagram: Probabilistic Model → Real World Data, labeled P(Data | Parameters)]

  13. [Diagram: Probabilistic Model → Real World Data, labeled P(Data | Parameters); Real World Data → Probabilistic Model, labeled P(Parameters | Data)]

  14. [Diagram: Probabilistic Model → Real World Data, labeled P(Data | Parameters) (Generative Model); Real World Data → Probabilistic Model, labeled P(Parameters | Data) (Inference)]

  15. Outline • Review of probability • Graphical models • Connecting probability models to data • Models with hidden variables • Case studies: (i) Simulating and forecasting rainfall data; (ii) Curve clustering with cyclone trajectories; (iii) Topic modeling from text documents

  16. Part 1: Review of Probability

  17. Notation and Definitions • X is a random variable • Lower-case x is some possible value for X • “X = x” is a logical proposition: that X takes value x • There is uncertainty about the value of X • e.g., X is the Dow Jones index at 5pm tomorrow • p(X = x) is the probability that the proposition X = x is true • often shortened to p(x) • If the set of possible x’s is finite, we have a probability distribution and $\sum_x p(x) = 1$ • If the set of possible x’s is infinite, p(x) is a density function, and $\int p(x)\,dx = 1$ over the range of X

  18. Example • Let X be the Dow Jones Index (DJI) at 5pm Monday, August 22nd (tomorrow) • X can take real values from 0 to some large number • p(x) is a density representing our uncertainty about X • This density could be constructed from historical data • After 5pm, p(x) becomes infinitely narrow around the true known x (no uncertainty)
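One simple way to construct such a density from historical data is a normalized histogram. The sketch below is illustrative only: it assumes NumPy, uses synthetic stand-in values rather than real DJI closes, and the choice of 50 bins is arbitrary.

```python
import numpy as np

# Synthetic stand-in for historical closing values (NOT real Dow Jones data)
rng = np.random.default_rng(42)
history = 10_500 + 150 * rng.standard_normal(1_000)

# density=True rescales bar heights so that the histogram integrates to 1
heights, edges = np.histogram(history, bins=50, density=True)
widths = np.diff(edges)

area = np.sum(heights * widths)  # numerical integral of p(x) over the range of X
print(f"integral of p(x): {area:.6f}")

# Probability that X falls in a given bin = height * width
k = np.argmax(heights)
print(f"most probable bin: [{edges[k]:.0f}, {edges[k + 1]:.0f}), "
      f"prob = {heights[k] * widths[k]:.3f}")
```

With `density=True`, `np.histogram` scales the bar heights so that height × width sums to 1, the discrete analogue of p(x) integrating to 1.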

  19. Probability as Degree of Belief • Different agents can have different p(x)’s • Your p(x) and the p(x) of a Wall Street expert might be quite different • OR: if we were on vacation, we might not have access to stock market information • we would still be uncertain about X after 5pm • So we should really think of p(x) as $p(x \mid B_I)$ • where $B_I$ is the background information available to agent I • (we will drop the explicit conditioning on $B_I$ in notation) • Thus, p(x) represents the degree of belief that agent I has in the proposition X = x, conditioned on the available background information

  20. Comments on Degree of Belief • Different agents can have different probability models • There is not necessarily a “correct” p(x) • Why? Because p(x) is a model built on whatever assumptions or background information we use • This naturally leads to the notion of updating • $p(x \mid B_I) \rightarrow p(x \mid B_I, C_I)$ • This is the subjective Bayesian interpretation of probability • Generalizes other interpretations (such as frequentist) • Can be used in cases where frequentist reasoning is not applicable • We will use “degree of belief” as our interpretation of p(x) in this tutorial • Note! • Degree of belief is just our semantic interpretation of p(x) • The mathematics of probability (e.g., Bayes’ rule) remains the same regardless of our semantic interpretation
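The update from $p(x \mid B_I)$ to $p(x \mid B_I, C_I)$ is Bayes’ rule: multiply the prior by the likelihood of the new information and renormalize. A minimal sketch, assuming a hypothetical three-valued X and invented prior and likelihood numbers:

```python
import numpy as np

# Hypothetical discrete X: the market goes "down", "flat", or "up" tomorrow
values = ["down", "flat", "up"]

# p(x | B): the agent's belief given background information B (illustrative numbers)
prior = np.array([0.3, 0.4, 0.3])

# p(C | x, B): how likely the new information C is under each value of x (illustrative)
likelihood = np.array([0.1, 0.3, 0.6])

def update(prior, likelihood):
    """Bayes' rule: p(x | B, C) is proportional to p(C | x, B) * p(x | B)."""
    unnormalized = prior * likelihood
    return unnormalized / unnormalized.sum()

posterior = update(prior, likelihood)
for v, p0, p1 in zip(values, prior, posterior):
    print(f"{v:>4}: {p0:.3f} -> {p1:.3f}")
```

The same function applies unchanged whichever semantic interpretation of p(x) one adopts, which is the point of the slide’s final note.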

  21. Multiple Variables • p(x, y, z) • Probability that X = x AND Y = y AND Z = z • Possible values: cross-product of the value sets of X, Y, Z • e.g., X, Y, Z each take 10 possible values • (x, y, z) can take $10^3$ possible values • p(x, y, z) is a 3-dimensional array/table • Defines $10^3$ probabilities • Note the exponential increase as we add more variables • e.g., X, Y, Z are all real-valued • (x, y, z) lives in a 3-dimensional vector space • p(x, y, z) is a non-negative function defined over this space that integrates to 1
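The table view of p(x, y, z) maps directly onto an N-dimensional array. A short NumPy sketch, using a random (purely illustrative) table for three 10-valued variables, that also prints how the number of entries grows as $K^N$:

```python
import numpy as np

K = 10  # values per variable
rng = np.random.default_rng(0)

# p(x, y, z) as a 3-dimensional table: non-negative entries that sum to 1
joint = rng.random((K, K, K))
joint /= joint.sum()

print(joint.shape, joint.size)       # (10, 10, 10) 1000
print(round(float(joint.sum()), 6))  # 1.0
print(joint[2, 5, 7])                # p(X=2, Y=5, Z=7)

# Table size for N variables with K values each: K**N entries
for n in (3, 5, 10, 20):
    print(f"N={n:>2}: {K ** n:,} entries")
```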

  22. Conditional Probability • p(x | y, z) • Probability of x given that Y = y and Z = z • Could be • hypothetical, e.g., “if Y = y and if Z = z” • observational, e.g., we observed values y and z • can also have p(x, y | z), etc. • “all probabilities are conditional probabilities” • Computing conditional probabilities is the basis of many prediction and learning problems, e.g., • p(DJI tomorrow | DJI index last week) • expected value of [DJI tomorrow | DJI index next week] • most likely value of parameter a given observed data

  23. Computing Conditional Probabilities • Variables A, B, C, D • All distributions of interest related to A, B, C, D can be computed from the full joint distribution p(a, b, c, d) • Examples, using the Law of Total Probability: • $p(a) = \sum_{b,c,d} p(a,b,c,d)$ • $p(c,d) = \sum_{a,b} p(a,b,c,d)$ • $p(a,c \mid d) = \sum_{b} p(a,b,c \mid d)$, where $p(a,b,c \mid d) = p(a,b,c,d)/p(d)$ • These are standard probability manipulations; however, we will see how to use them to make inferences about parameters and unobserved variables, given data
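With the full joint stored as a 4-dimensional array, each manipulation above is a sum over axes plus one division. A sketch assuming NumPy and a random, purely illustrative joint over A, B, C, D with K = 3 values each:

```python
import numpy as np

K = 3
rng = np.random.default_rng(1)

# Illustrative full joint p(a, b, c, d): axis 0 = a, 1 = b, 2 = c, 3 = d
p_abcd = rng.random((K, K, K, K))
p_abcd /= p_abcd.sum()

# Law of total probability: sum the joint over the unwanted variables
p_a = p_abcd.sum(axis=(1, 2, 3))   # p(a)   = sum_{b,c,d} p(a,b,c,d)
p_cd = p_abcd.sum(axis=(0, 1))     # p(c,d) = sum_{a,b}   p(a,b,c,d)
p_d = p_abcd.sum(axis=(0, 1, 2))   # p(d)

# Conditioning: p(a,b,c | d) = p(a,b,c,d) / p(d); broadcasting divides each d-slice
p_abc_given_d = p_abcd / p_d
# p(a,c | d) = sum_b p(a,b,c | d); result axes are (a, c, d)
p_ac_given_d = p_abc_given_d.sum(axis=1)

print(p_a, p_a.sum())
print(p_ac_given_d[:, :, 0].sum())  # each conditional slice sums to 1
```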

  24. Conditional Independence • A is conditionally independent of B given C iff p(a | b, c) = p(a | c) (this also implies that B is conditionally independent of A given C) • In words, B provides no information about A if the value of C is known • Example: • a = “patient has upset stomach” • b = “patient has headache” • c = “patient has flu” • Note that conditional independence does not imply marginal independence
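The flu example can be checked numerically. Assuming, for illustration only, that the joint factors as p(c) p(a | c) p(b | c) with made-up numbers, the sketch verifies that p(a | b, c) = p(a | c) while p(a | b) still differs from p(a):

```python
import numpy as np

# Binary variables (0 = no, 1 = yes); all probabilities are illustrative
p_c = np.array([0.9, 0.1])                # c: patient has flu
p_a_given_c = np.array([[0.95, 0.05],     # a: upset stomach, rows indexed by c
                        [0.60, 0.40]])
p_b_given_c = np.array([[0.90, 0.10],     # b: headache, rows indexed by c
                        [0.20, 0.80]])

# Joint p(a, b, c) = p(c) p(a|c) p(b|c), stored with axes (a, b, c)
joint = np.einsum("c,ca,cb->abc", p_c, p_a_given_c, p_b_given_c)

# p(a | b, c): normalize over a for each (b, c)
p_a_given_bc = joint / joint.sum(axis=0, keepdims=True)
# p(a | c) recovered from the joint, axes (a, c)
p_ac = joint.sum(axis=1)
p_a_given_c_joint = p_ac / p_ac.sum(axis=0, keepdims=True)

# A is conditionally independent of B given C: compare after broadcasting over b
print(np.allclose(p_a_given_bc, p_a_given_c_joint[:, None, :]))  # True

# Marginally, B still carries information about A
p_ab = joint.sum(axis=2)                  # axes (a, b)
p_a_given_b = p_ab / p_ab.sum(axis=0, keepdims=True)
p_a = p_ab.sum(axis=1)
print(p_a_given_b[1], p_a[1])             # p(a=1 | b) for b = 0, 1 versus p(a=1)
```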

  25. Two Practical Problems (assume for simplicity each variable takes K values) • Problem 1: Computational Complexity • Conditional probability computations scale as $O(K^N)$ • where N is the number of variables being summed over • Problem 2: Model Specification • To specify a joint distribution we need a table of $O(K^N)$ numbers • Where do these numbers come from?

  26. Two Key Ideas • Problem 1: Computational Complexity • Idea: Graphical models • Structured probability models lead to tractable inference • Problem 2: Model Specification • Idea: Probabilistic learning • General principles for learning from data

  27. Part 2: Graphical Models

  28. “…probability theory is more fundamentally concerned with the structure of reasoning and causation than with numbers.” Glenn Shafer and Judea Pearl, Introduction to Readings in Uncertain Reasoning, Morgan Kaufmann, 1990

  29. Graphical Models • Represent dependency structure with a directed graph • Node <-> random variable • Edges encode dependencies • Absence of edge -> conditional independence • Directed and undirected versions • Why is this useful? • A language for communication • A language for computation • Origins: • Wright, 1920s • Independently developed by Spiegelhalter and Lauritzen in statistics and Pearl in computer science in the late 1980s

  30. Examples of 3-way Graphical Models [Figure: nodes A, B, C with no edges] Marginal independence: p(A,B,C) = p(A) p(B) p(C)

  31. Examples of 3-way Graphical Models [Figure: A → B, A → C] Conditionally independent effects: p(A,B,C) = p(B|A) p(C|A) p(A) • B and C are conditionally independent given A • e.g., A is a disease, and we model B and C as conditionally independent symptoms given A

  32. Examples of 3-way Graphical Models [Figure: A → C ← B] Independent causes: p(A,B,C) = p(C|A,B) p(A) p(B)

  33. Examples of 3-way Graphical Models [Figure: A → B → C] Markov dependence: p(A,B,C) = p(C|B) p(B|A) p(A)
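A quick check of the Markov-dependence factorization: building p(A,B,C) = p(C|B) p(B|A) p(A) from random tables (an illustrative assumption) gives a valid distribution in which C depends on A only through B, i.e., C is conditionally independent of A given B.

```python
import numpy as np

rng = np.random.default_rng(2)
K = 3

def random_cpt(rows, cols):
    """Rows are parent values; each row is a distribution over the child."""
    t = rng.random((rows, cols))
    return t / t.sum(axis=1, keepdims=True)

p_a = random_cpt(1, K)[0]        # p(A)
p_b_given_a = random_cpt(K, K)   # p(B | A)
p_c_given_b = random_cpt(K, K)   # p(C | B)

# Markov dependence: p(A,B,C) = p(C|B) p(B|A) p(A), axes (a, b, c)
joint = p_a[:, None, None] * p_b_given_a[:, :, None] * p_c_given_b[None, :, :]
print(np.isclose(joint.sum(), 1.0))  # True

# C depends on A only through B: p(c | a, b) = p(c | b) for every a
p_c_given_ab = joint / joint.sum(axis=2, keepdims=True)
print(np.allclose(p_c_given_ab, p_c_given_b[None, :, :]))  # True
```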

  34. Real-World Example: Monitoring Intensive-Care Patients [Figure: directed graph over the 37 variables MINVOLSET, KINKEDTUBE, PULMEMBOLUS, INTUBATION, VENTMACH, DISCONNECT, PAP, SHUNT, VENTLUNG, VENITUBE, PRESS, MINOVL, FIO2, VENTALV, PVSAT, ANAPHYLAXIS, ARTCO2, EXPCO2, SAO2, TPR, INSUFFANESTH, HYPOVOLEMIA, LVFAILURE, CATECHOL, LVEDVOLUME, STROEVOLUME, ERRCAUTER, HR, ERRBLOWOUTPUT, HISTORY, CO, CVP, PCWP, HREKG, HRSAT, HRBP, BP; figure courtesy of Kevin Murphy/Nir Friedman] • 37 variables • 509 parameters …instead of $2^{37}$

  35. Directed Graphical Models [Figure: A → C ← B] p(A,B,C) = p(C|A,B) p(A) p(B)

  36. Directed Graphical Models [Figure: A → C ← B] p(A,B,C) = p(C|A,B) p(A) p(B) In general, $p(X_1, X_2, \ldots, X_N) = \prod_{i=1}^{N} p(X_i \mid \mathrm{parents}(X_i))$

  37. Directed Graphical Models [Figure: A → C ← B] p(A,B,C) = p(C|A,B) p(A) p(B) In general, $p(X_1, X_2, \ldots, X_N) = \prod_{i=1}^{N} p(X_i \mid \mathrm{parents}(X_i))$ • Probability model has a simple factored form • Directed edges => direct dependence • Absence of an edge => conditional independence • Also known as belief networks, Bayesian networks, causal networks
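The factored form $\prod_i p(X_i \mid \mathrm{parents}(X_i))$ translates into a few lines of code. A hypothetical sketch for the binary network A → C ← B on these slides; the dictionary layout and all probability values are invented for illustration:

```python
import itertools

# Hypothetical binary network from the slide: A -> C <- B
parents = {"A": (), "B": (), "C": ("A", "B")}

# cpt[X][parent values] = p(X = 1 | parents); numbers are illustrative
cpt = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.95},
}

def joint_prob(assignment):
    """p(x1, ..., xN) = prod_i p(x_i | parents(x_i)) for a full 0/1 assignment."""
    prob = 1.0
    for var, pars in parents.items():
        p_one = cpt[var][tuple(assignment[p] for p in pars)]
        prob *= p_one if assignment[var] == 1 else 1.0 - p_one
    return prob

print(joint_prob({"A": 1, "B": 0, "C": 1}))  # 0.3 * 0.4 * 0.7, about 0.084

# The factored form defines a proper distribution: all 2**3 entries sum to 1
total = sum(joint_prob(dict(zip(parents, vals)))
            for vals in itertools.product((0, 1), repeat=len(parents)))
print(total)
```

Here the three tables hold 1 + 1 + 4 = 6 numbers, versus $2^3 - 1 = 7$ for an unrestricted joint over three binary variables; the same bookkeeping underlies the 509-versus-$2^{37}$ comparison in the monitoring example.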

  38. Example [Figure: directed graph with edges D → B, D → E, B → A, B → C, E → F, E → G]

  39. Example $p(A,B,C,D,E,F,G) = \prod p(\text{variable} \mid \text{parents}) = p(A|B)\,p(C|B)\,p(B|D)\,p(F|E)\,p(G|E)\,p(E|D)\,p(D)$

  40. Example [Figure: the same graph, with C and G observed, shown as lower-case c and g] Say we want to compute p(a | c, g)

  41. Example Direct calculation: $p(a \mid c,g) = \sum_{b,d,e,f} p(a,b,d,e,f \mid c,g)$. Complexity of the sum is $O(K^4)$

  42. Example Reordering (using factorization): $\sum_b p(a \mid b) \sum_d p(b \mid d,c) \sum_e p(d \mid e) \sum_f p(e,f \mid g)$

  43. Example Reordering: $\sum_b p(a \mid b) \sum_d p(b \mid d,c) \sum_e p(d \mid e) \sum_f p(e,f \mid g)$, where $\sum_f p(e,f \mid g) = p(e \mid g)$

  44. Example Reordering: $\sum_b p(a \mid b) \sum_d p(b \mid d,c) \sum_e p(d \mid e)\,p(e \mid g)$, where $\sum_e p(d \mid e)\,p(e \mid g) = p(d \mid g)$

  45. Example Reordering: $\sum_b p(a \mid b) \sum_d p(b \mid d,c)\,p(d \mid g)$, where $\sum_d p(b \mid d,c)\,p(d \mid g) = p(b \mid c,g)$

  46. Example Reordering: $\sum_b p(a \mid b)\,p(b \mid c,g) = p(a \mid c,g)$. Complexity is $O(K)$, compared to $O(K^4)$
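The saving can be reproduced numerically. This NumPy sketch uses random tables (an illustrative assumption) for the example graph, D → B, D → E, B → A, B → C, E → F, E → G, and computes p(a | c, g) two ways: by building and summing the full joint, and by pushing each sum inside the product $p(A|B)\,p(C|B)\,p(B|D)\,p(F|E)\,p(G|E)\,p(E|D)\,p(D)$ one variable at a time. The elimination order f, e, d, b is one reasonable choice, not necessarily the exact sequence intended on the slides.

```python
import numpy as np

K = 4
rng = np.random.default_rng(1)

def cpt(rows):
    """Random conditional table: one row per parent value, each row sums to 1."""
    t = rng.random((rows, K))
    return t / t.sum(axis=1, keepdims=True)

# Illustrative tables: p(D), and p(B|D), p(E|D), p(A|B), p(C|B), p(F|E), p(G|E)
# (first index = parent value, second index = child value)
pD = cpt(1)[0]
pB_D, pE_D, pA_B, pC_B, pF_E, pG_E = (cpt(K) for _ in range(6))
c, g = 0, 2  # observed values of C and G

# Brute force: materialize all K**7 joint entries, then sum out b, d, e, f
joint = np.einsum("d,db,de,ba,bc,ef,eg->abcdefg",
                  pD, pB_D, pE_D, pA_B, pC_B, pF_E, pG_E)
brute = joint[:, :, c, :, :, :, g].sum(axis=(1, 2, 3, 4))
brute /= brute.sum()

# Elimination: push each sum inside the product, innermost variable first
f_msg = pF_E.sum(axis=1)              # sum_f p(f|e), equal to 1 for every e
e_msg = pE_D @ (pG_E[:, g] * f_msg)   # sum_e p(e|d) p(g|e) [...], a function of d
d_msg = (pD * e_msg) @ pB_D           # sum_d p(d) p(b|d) [...], a function of b
b_msg = d_msg * pC_B[:, c]            # multiply in the evidence p(c|b)
elim = b_msg @ pA_B                   # sum_b [...] p(a|b), proportional to p(a, c, g)
elim /= elim.sum()                    # normalize to get p(a | c, g)

print(brute)
print(elim)
```

Each elimination step is a vector-matrix product over K values, so no table larger than K × K is formed, whereas the brute-force version materializes all $K^7$ joint entries.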

  47. A More General Algorithm • Message Passing (MP) Algorithm • Pearl, 1988; Lauritzen and Spiegelhalter, 1988 • Declare 1 node (any node) to be a root • Schedule two phases of message-passing • nodes pass messages up to the root • messages are distributed back to the leaves • In time O(N), we can compute P(….)
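A compact sketch of the two-phase scheme on the same seven-node tree, rooted at D, with evidence on C and G. It is one standard sum-product formulation for tree-structured directed models, written for illustration; the names (`message_passing`, `up`, `down`, `to_parent`) and the random tables are hypothetical, and the result is checked against brute-force enumeration.

```python
import itertools
import numpy as np

# Hypothetical tree-structured network rooted at D (the example graph)
PARENT = {"D": None, "B": "D", "E": "D", "A": "B", "C": "B", "F": "E", "G": "E"}
ORDER = ["D", "B", "E", "A", "C", "F", "G"]   # every parent precedes its children
CHILDREN = {v: [c for c in ORDER if PARENT[c] == v] for v in ORDER}
K = 3
rng = np.random.default_rng(0)

# CPT[v][parent value, v value] = p(v | parent); the root has a single row
CPT = {}
for v in ORDER:
    t = rng.random((1 if PARENT[v] is None else K, K))
    CPT[v] = t / t.sum(axis=1, keepdims=True)

def message_passing(evidence):
    """Two-phase sum-product on the tree; returns p(v | evidence) for every v."""
    lam = {v: np.ones(K) for v in ORDER}          # local evidence indicators
    for v, val in evidence.items():
        lam[v] = np.eye(K)[val]
    # Phase 1: leaves -> root
    up, to_parent = {}, {}
    for v in reversed(ORDER):
        up[v] = lam[v].copy()
        for c in CHILDREN[v]:
            up[v] *= to_parent[c]
        if PARENT[v] is not None:
            to_parent[v] = CPT[v] @ up[v]          # sum over the values of v
    # Phase 2: root -> leaves
    down = {ORDER[0]: CPT[ORDER[0]][0]}
    for v in ORDER:
        for c in CHILDREN[v]:
            ctx = down[v] * lam[v]                 # everything at v except c's message
            for other in CHILDREN[v]:
                if other != c:
                    ctx = ctx * to_parent[other]
            down[c] = ctx @ CPT[c]                 # sum over the values of v
    beliefs = {}
    for v in ORDER:
        b = down[v] * up[v]
        beliefs[v] = b / b.sum()
    return beliefs

def brute_force(query, evidence):
    """Reference answer by summing the full joint (K**7 terms)."""
    post = np.zeros(K)
    for vals in itertools.product(range(K), repeat=len(ORDER)):
        x = dict(zip(ORDER, vals))
        if any(x[v] != val for v, val in evidence.items()):
            continue
        p = 1.0
        for v in ORDER:
            row = 0 if PARENT[v] is None else x[PARENT[v]]
            p *= CPT[v][row, x[v]]
        post[x[query]] += p
    return post / post.sum()

evidence = {"C": 0, "G": 2}
beliefs = message_passing(evidence)
print(beliefs["A"], brute_force("A", evidence))
```

Each phase sends one message per edge, and each message is a K × K matrix-vector product, so for fixed K the total work grows linearly with the number of nodes.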

  48. Sketch of the MP algorithm in action

  49. Sketch of the MP algorithm in action [Figure: message 1]

  50. Sketch of the MP algorithm in action [Figure: messages 1 and 2]
