Pattern Recognition and Machine Learning

ejolin

Presentation Transcript


  1. Pattern Recognition and Machine Learning. Chapter 8: Graphical Models.

  2. Bayesian Networks. Directed Acyclic Graph (DAG).

  3. Bayesian Networks. General Factorization.
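
For reference, the general factorization of a directed graph over $K$ nodes is usually written as below. The symbol $\mathrm{pa}_k$ (the set of parents of $x_k$) follows the textbook's notation and is an assumption here, since the slide's own formula is not in the transcript.

```latex
p(\mathbf{x}) = \prod_{k=1}^{K} p(x_k \mid \mathrm{pa}_k)
```

A node with no parents contributes its marginal $p(x_k)$, so a fully connected DAG recovers the ordinary product rule.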

  4. Bayesian Curve Fitting (1). Polynomial.

  5. Bayesian Curve Fitting (2). Plate.

  6. Bayesian Curve Fitting (3). Input variables and explicit hyperparameters.

  7. Bayesian Curve Fitting: Learning. Condition on data.

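  8. Bayesian Curve Fitting: Prediction. Predictive distribution: $p(\hat{t} \mid \hat{x}, \mathbf{x}, \mathbf{t}, \alpha, \sigma^2) \propto \int p(\hat{t}, \mathbf{t}, \mathbf{w} \mid \hat{x}, \mathbf{x}, \alpha, \sigma^2) \, d\mathbf{w}$, where $p(\hat{t}, \mathbf{t}, \mathbf{w} \mid \hat{x}, \mathbf{x}, \alpha, \sigma^2) = \left[ \prod_{n=1}^{N} p(t_n \mid x_n, \mathbf{w}, \sigma^2) \right] p(\mathbf{w} \mid \alpha) \, p(\hat{t} \mid \hat{x}, \mathbf{w}, \sigma^2)$.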

  9. Generative Models. Causal process for generating images.

  10. Discrete Variables (1). General joint distribution: $K^2 - 1$ parameters. Independent joint distribution: $2(K - 1)$ parameters.

  11. Discrete Variables (2). General joint distribution over $M$ variables: $K^M - 1$ parameters. $M$-node Markov chain: $K - 1 + (M - 1)K(K - 1)$ parameters.
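
The parameter counts on the two slides above can be checked with a few lines of Python. This is only an illustrative sketch; the function names are hypothetical and not part of the presentation.

```python
def general_joint_params(K: int, M: int) -> int:
    """Free parameters of an unrestricted joint over M K-state variables."""
    return K ** M - 1


def independent_params(K: int, M: int) -> int:
    """Free parameters when all M variables are mutually independent."""
    return M * (K - 1)


def markov_chain_params(K: int, M: int) -> int:
    """Free parameters of an M-node chain: one marginal plus M-1 conditional tables."""
    return (K - 1) + (M - 1) * K * (K - 1)


if __name__ == "__main__":
    K, M = 3, 5
    print("general:    ", general_joint_params(K, M))
    print("independent:", independent_params(K, M))
    print("chain:      ", markov_chain_params(K, M))
```

The chain count grows linearly in $M$, whereas the general joint grows exponentially; for $M = 2$ the chain and the general joint coincide, as expected for a fully connected two-node graph.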

  12. Discrete Variables: Bayesian Parameters (1)

  13. Discrete Variables: Bayesian Parameters (2). Shared prior.

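  14. Parameterized Conditional Distributions. If $x_1, \ldots, x_M$ are discrete, K-state variables, $p(y = 1 \mid x_1, \ldots, x_M)$ in general has $O(K^M)$ parameters. The parameterized form $p(y = 1 \mid x_1, \ldots, x_M) = \sigma\left(w_0 + \sum_{i=1}^{M} w_i x_i\right)$ requires only $M + 1$ parameters.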

  15. Linear-Gaussian Models. Directed Graph. Vector-valued Gaussian Nodes. • Each node is Gaussian, with a mean that is a linear function of its parents.

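  16. Conditional Independence. a is independent of b given c: $p(a \mid b, c) = p(a \mid c)$. Equivalently, $p(a, b \mid c) = p(a \mid c) \, p(b \mid c)$. Notation: $a \perp\!\!\!\perp b \mid c$.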

  17. Conditional Independence: Example 1

  18. Conditional Independence: Example 1

  19. Conditional Independence: Example 2

  20. Conditional Independence: Example 2

  21. Conditional Independence: Example 3. Note: this is the opposite of Example 1, with c unobserved.

  22. Conditional Independence: Example 3. Note: this is the opposite of Example 1, with c observed.

  23. “Am I out of fuel?” B = Battery (0 = flat, 1 = fully charged); F = Fuel Tank (0 = empty, 1 = full); G = Fuel Gauge Reading (0 = empty, 1 = full). $p(B = 1) = 0.9$, $p(F = 1) = 0.9$, and hence $p(F = 0) = 0.1$.

  24. “Am I out of fuel?” Probability of an empty tank increased by observing G = 0.

  25. “Am I out of fuel?” Probability of an empty tank reduced by observing B = 0. This is referred to as “explaining away”.
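
The two effects described in the fuel example (the gauge reading raising the probability of an empty tank, and the flat battery then lowering it again) can be reproduced by brute-force enumeration. The sketch below assumes the probability tables of the textbook version of this example, because the numbers themselves are not in the transcript; the variable and function names are illustrative.

```python
from itertools import product

# Assumed priors and gauge model (textbook values, not present in the transcript).
p_B = {1: 0.9, 0: 0.1}                      # p(B)
p_F = {1: 0.9, 0: 0.1}                      # p(F)
p_G1 = {(1, 1): 0.8, (1, 0): 0.2,           # p(G = 1 | B, F), keyed by (B, F)
        (0, 1): 0.2, (0, 0): 0.1}


def joint(b: int, f: int, g: int) -> float:
    """p(B=b, F=f, G=g) = p(B) p(F) p(G | B, F)."""
    pg1 = p_G1[(b, f)]
    return p_B[b] * p_F[f] * (pg1 if g == 1 else 1.0 - pg1)


def posterior_empty_tank(evidence: dict) -> float:
    """p(F = 0 | evidence), where evidence maps 'B' and/or 'G' to observed values."""
    numerator = denominator = 0.0
    for b, f, g in product((0, 1), repeat=3):
        assignment = {"B": b, "F": f, "G": g}
        if any(assignment[name] != value for name, value in evidence.items()):
            continue  # inconsistent with what was observed
        p = joint(b, f, g)
        denominator += p
        if f == 0:
            numerator += p
    return numerator / denominator


if __name__ == "__main__":
    print("prior        p(F=0)           =", posterior_empty_tank({}))
    print("gauge empty  p(F=0 | G=0)     =", posterior_empty_tank({"G": 0}))
    print("also flat    p(F=0 | G=0,B=0) =", posterior_empty_tank({"G": 0, "B": 0}))
```

Under these assumed tables the posterior rises above the prior once G = 0 is observed and falls back toward it once B = 0 is also observed, because the flat battery already accounts for the empty reading.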

  26. D-separation
      • A, B, and C are non-intersecting subsets of nodes in a directed graph.
      • A path from A to B is blocked if it contains a node such that either
          • the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C, or
          • the arrows meet head-to-head at the node, and neither the node nor any of its descendants is in the set C.
      • If all paths from A to B are blocked, A is said to be d-separated from B by C.
      • If A is d-separated from B by C, the joint distribution over all variables in the graph satisfies $A \perp\!\!\!\perp B \mid C$.
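
A minimal sketch of a d-separation test follows. Rather than enumerating paths and applying the blocking rules directly, it uses the equivalent moralized-ancestral-graph criterion: keep only A ∪ B ∪ C and their ancestors, marry the parents of every node, drop edge directions, delete C, and check whether A can still reach B. The `parents` encoding (child mapped to its set of parents) and the function names are assumptions made for illustration.

```python
from collections import deque


def ancestors_of(nodes, parents):
    """Return the given nodes together with all of their ancestors."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def d_separated(A, B, C, parents):
    """True if node set A is d-separated from node set B by node set C."""
    A, B, C = set(A), set(B), set(C)
    keep = ancestors_of(A | B | C, parents)

    # Build the moral graph of the ancestral subgraph (undirected adjacency).
    neighbours = {node: set() for node in keep}
    for child in keep:
        ps = sorted(parents.get(child, ()))
        for p in ps:                      # parent-child links
            neighbours[child].add(p)
            neighbours[p].add(child)
        for i, p in enumerate(ps):        # "marry" every pair of parents
            for q in ps[i + 1:]:
                neighbours[p].add(q)
                neighbours[q].add(p)

    # Breadth-first search from A that is not allowed to pass through C.
    frontier = deque(A - C)
    reached = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in B:
            return False                  # found an unblocked connection
        for nxt in neighbours[node]:
            if nxt not in C and nxt not in reached:
                reached.add(nxt)
                frontier.append(nxt)
    return True


if __name__ == "__main__":
    collider = {"c": {"a", "b"}, "d": {"c"}}   # a -> c <- b, c -> d
    print(d_separated({"a"}, {"b"}, set(), collider))   # nothing observed
    print(d_separated({"a"}, {"b"}, {"d"}, collider))   # descendant of c observed
```

In the head-to-head example, observing either the collider `c` or its descendant `d` unblocks the path, which matches the second blocking rule in the list above.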

  27. D-separation: Example

  28. D-separation: I.I.D. Data

  29. Directed Graphs as Distribution Filters

  30. The Markov Blanket. Factors independent of $x_i$ cancel between numerator and denominator.
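
After the cancellation, the only factors left are the node's own conditional and those of its children, so in a directed graph the Markov blanket consists of the node's parents, its children, and its children's other parents (co-parents). A small sketch, assuming a hypothetical `parents` dictionary that maps each child to its set of parents:

```python
def markov_blanket(node, parents):
    """Parents, children, and co-parents of `node` in a directed graph."""
    children = {child for child, ps in parents.items() if node in ps}
    co_parents = set()
    for child in children:
        co_parents |= set(parents[child])
    blanket = set(parents.get(node, ())) | children | co_parents
    blanket.discard(node)  # the node is not part of its own blanket
    return blanket


if __name__ == "__main__":
    # Hypothetical graph: p1, p2 -> x;  x, s1 -> c1;  x, s2 -> c2;  c1 -> g
    graph = {"x": {"p1", "p2"}, "c1": {"x", "s1"}, "c2": {"x", "s2"}, "g": {"c1"}}
    print(sorted(markov_blanket("x", graph)))
```

The grandchild `g` is excluded: its conditional does not mention `x`, so it is one of the factors that cancel.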

  31. Cliques and Maximal Cliques. Clique; Maximal Clique.

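  32. Joint Distribution. $p(\mathbf{x}) = \frac{1}{Z} \prod_C \psi_C(\mathbf{x}_C)$, where $\psi_C(\mathbf{x}_C)$ is the potential over clique $C$ and $Z = \sum_{\mathbf{x}} \prod_C \psi_C(\mathbf{x}_C)$ is the normalization coefficient; note: $M$ $K$-state variables $\rightarrow$ $K^M$ terms in $Z$. Energies and the Boltzmann distribution: $\psi_C(\mathbf{x}_C) = \exp\{-E(\mathbf{x}_C)\}$.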

  33. Illustration: Image De-Noising (1). Original Image; Noisy Image.

  34. Illustration: Image De-Noising (2)

  35. Illustration: Image De-Noising (3). Noisy Image; Restored Image (ICM).

  36. Illustration: Image De-Noising (4). Restored Image (ICM); Restored Image (Graph cuts).
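
The ICM (iterated conditional modes) restoration mentioned on the de-noising slides can be sketched in pure Python. The transcript does not include the energy function, so the code assumes the textbook's Ising-style energy $E(\mathbf{x}, \mathbf{y}) = h \sum_i x_i - \beta \sum_{\{i,j\}} x_i x_j - \eta \sum_i x_i y_i$ with pixels in $\{-1, +1\}$ and the example values $\beta = 1.0$, $\eta = 2.1$, $h = 0$. All function names are illustrative.

```python
def neighbours(r, c, rows, cols):
    """Yield the 4-connected neighbours of pixel (r, c) that lie inside the grid."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            yield rr, cc


def energy(x, y, beta=1.0, eta=2.1, h=0.0):
    """Assumed energy: h*sum(x_i) - beta*sum(x_i x_j) - eta*sum(x_i y_i)."""
    rows, cols = len(x), len(x[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            total += h * x[r][c] - eta * x[r][c] * y[r][c]
            # Count each neighbouring pair once (right and down only).
            if c + 1 < cols:
                total -= beta * x[r][c] * x[r][c + 1]
            if r + 1 < rows:
                total -= beta * x[r][c] * x[r + 1][c]
    return total


def icm_denoise(y, beta=1.0, eta=2.1, h=0.0, max_sweeps=10):
    """Coordinate-wise energy minimisation, starting from the noisy image y."""
    rows, cols = len(y), len(y[0])
    x = [row[:] for row in y]
    for _ in range(max_sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                nb_sum = sum(x[rr][cc] for rr, cc in neighbours(r, c, rows, cols))
                # The local energy of x_i = s is -s * field, so pick s = sign(field).
                field = beta * nb_sum + eta * y[r][c] - h
                new = 1 if field > 0 else -1 if field < 0 else x[r][c]
                if new != x[r][c]:
                    x[r][c] = new
                    changed = True
        if not changed:
            break  # local minimum of the energy reached
    return x


if __name__ == "__main__":
    noisy = [[1] * 5 for _ in range(5)]
    noisy[2][2] = -1                      # a single flipped pixel
    restored = icm_denoise(noisy)
    print(restored[2][2], energy(noisy, noisy), energy(restored, noisy))
```

ICM only finds a local minimum of the energy, which is consistent with the slides showing a better restoration from graph cuts.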

  37. Converting Directed to Undirected Graphs (1)

  38. Converting Directed to Undirected Graphs (2). Moralizing: “marrying the parents”. Additional links.
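
Moralization can be sketched as follows: link every pair of parents that share a child, then drop the arrow directions. The `parents` dictionary encoding and the function name are assumptions for illustration.

```python
def moralize(parents):
    """Return the undirected edge set of the moral graph of a DAG.

    `parents` maps each child node to the set of its parents.
    Each edge is a frozenset of its two endpoints.
    """
    edges = set()
    for child, ps in parents.items():
        ps = sorted(ps)
        for p in ps:                       # original links, direction dropped
            edges.add(frozenset((p, child)))
        for i, p in enumerate(ps):         # additional links between co-parents
            for q in ps[i + 1:]:
                edges.add(frozenset((p, q)))
    return edges


if __name__ == "__main__":
    # Three parents sharing one child: x1, x2, x3 -> x4
    moral = moralize({"x4": {"x1", "x2", "x3"}})
    print(sorted(tuple(sorted(e)) for e in moral))
```

For three parents of a single child, the moral graph is fully connected over all four nodes, so the conditional $p(x_4 \mid x_1, x_2, x_3)$ fits inside a single clique potential.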

  39. Properties
      • A graph is said to be a D map (dependency map) of a distribution if every conditional independence statement satisfied by the distribution is reflected in the graph.
      • A completely disconnected graph is a trivial D map for any distribution.
      • A graph is said to be an I map (independence map) of a specific distribution if every conditional independence statement implied by the graph is satisfied by that distribution.
      • A fully connected graph is a trivial I map for any distribution.
      • If every conditional independence property of a distribution is reflected in a graph, and vice versa, the graph is said to be a perfect map for the distribution. (A perfect map is both an I map and a D map.)

  40. Directed vs. Undirected Graphs (1). All distributions; set of distributions that can be represented as a directed perfect map; set of distributions that can be represented as an undirected perfect map.

  41. Directed vs. Undirected Graphs (2). A directed graph whose conditional independence properties cannot be expressed using an undirected graph over A, B, and C. An undirected graph whose conditional independence properties cannot be expressed with a directed graph over A, B, C.

  42. Inference in Graphical Models

  43. Inference on a Chain. Naïve: will have to evaluate $K^N$ values.

  44. Observation. The potential $\psi_{N-1,N}(x_{N-1}, x_N)$ is the only one that depends on $x_N$. Performing the sum over $x_N$, we get a function of $x_{N-1}$, which is then combined with the next potential, and so on…

  45. Inference on a Chain

  46. Inference on a Chain

  47. Inference on a Chain

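  48. Inference on a Chain. To compute local marginals:
      • Compute and store all forward messages, $\mu_\alpha(x_n)$.
      • Compute and store all backward messages, $\mu_\beta(x_n)$.
      • Compute $Z$ at any node $x_m$.
      • Compute $p(x_n) = \frac{1}{Z} \mu_\alpha(x_n) \mu_\beta(x_n)$ for all variables required.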
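
The four steps above can be sketched for an undirected chain with pairwise potentials. The data layout is an assumption: `psi[n][i][j]` holds $\psi_{n,n+1}(x_n = i, x_{n+1} = j)$ with zero-based node indices, and the function names are illustrative. A brute-force routine that sums over all $K^N$ configurations is included only as a cross-check.

```python
from itertools import product


def chain_marginals(psi, K):
    """Local marginals p(x_n) and Z on a chain, via forward and backward messages."""
    N = len(psi) + 1
    alpha = [[1.0] * K for _ in range(N)]   # forward messages mu_alpha(x_n)
    beta = [[1.0] * K for _ in range(N)]    # backward messages mu_beta(x_n)
    for n in range(1, N):
        alpha[n] = [sum(psi[n - 1][i][j] * alpha[n - 1][i] for i in range(K))
                    for j in range(K)]
    for n in range(N - 2, -1, -1):
        beta[n] = [sum(psi[n][i][j] * beta[n + 1][j] for j in range(K))
                   for i in range(K)]
    # Z can be evaluated at any node; node 0 is used here.
    Z = sum(a * b for a, b in zip(alpha[0], beta[0]))
    marginals = [[a * b / Z for a, b in zip(alpha[n], beta[n])] for n in range(N)]
    return marginals, Z


def brute_force_marginals(psi, K):
    """Same quantities by explicit summation over all K**N joint states."""
    N = len(psi) + 1
    totals = [[0.0] * K for _ in range(N)]
    Z = 0.0
    for xs in product(range(K), repeat=N):
        weight = 1.0
        for n in range(N - 1):
            weight *= psi[n][xs[n]][xs[n + 1]]
        Z += weight
        for n, state in enumerate(xs):
            totals[n][state] += weight
    return [[v / Z for v in row] for row in totals], Z


if __name__ == "__main__":
    psi = [[[1.0, 2.0], [3.0, 4.0]],
           [[2.0, 1.0], [1.0, 3.0]]]       # arbitrary example potentials, N = 3, K = 2
    print(chain_marginals(psi, K=2))
    print(brute_force_marginals(psi, K=2))
```

The message-passing version costs on the order of $N K^2$ operations, compared with $K^N$ for the explicit sum.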

  49. Trees. Undirected Tree; Directed Tree; Polytree.

  50. Factor Graphs
