
Variational Methods for Graphical Models

Variational Methods for Graphical Models. Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, Lawrence K. Saul. Presented by: Afsaneh Shirazi. Outline: Motivation, Inference in graphical models, Exact inference is intractable, Variational methodology, Sequential approach, Block approach



Presentation Transcript


  1. Variational Methods for Graphical Models. Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, Lawrence K. Saul. Presented by: Afsaneh Shirazi

  2. Outline • Motivation • Inference in graphical models • Exact inference is intractable • Variational methodology • Sequential approach • Block approach • Conclusions

  3. Motivation (Example: Medical Diagnosis) [Figure: network of diseases (top layer) connected to symptoms (bottom layer)] What is the most probable disease?

  4. Motivation • We want to answer queries about our data • A graphical model is a way to model data • Inference in some graphical models is intractable (NP-hard) • Variational methods simplify inference in graphical models by using approximations

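  5. Graphical Models • Directed (Bayesian network): the joint distribution is the product of the local conditionals P(S1) P(S2) P(S3|S1,S2) P(S4|S3) P(S5|S3,S4) • Undirected: the joint distribution is defined through potentials on the cliques (C1), (C2), (C3) [Figure: example directed and undirected graphs over S1 to S5]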

  6. Inference in Graphical Models • Inference: given a graphical model, the process of computing answers to queries • How computationally hard is this problem? • Theorem: Computing P(X = x) in a Bayesian network is NP-hard

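  7. Why Is Exact Inference Intractable? [Figure: diseases (top) connected to symptoms (bottom)] Diagnose the most probable disease

  8. Why Is Exact Inference Intractable? [Figure: diseases connected to symptoms] • Observed symptoms

  9. Why Is Exact Inference Intractable? [Figure: diseases connected to symptoms, with observed symptom values 1, 0, 1] • Noisy-OR model

  10. Why Is Exact Inference Intractable? [Figure: diseases connected to symptoms, with observed symptom values 1, 0, 1] • Noisy-OR model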
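The noisy-OR equation on slides 9 and 10 did not survive extraction. For reference, the form Jordan et al. use for diagnosis networks of this kind is sketched below; the notation is assumed rather than copied from the slide: d_j ∈ {0, 1} marks disease j, f_i is symptom (finding) i, π_i is the set of parent diseases of f_i, q_ij is the probability that disease j alone triggers f_i, and q_i0 is a leak term.

```latex
P(f_i = 0 \mid d) = (1 - q_{i0}) \prod_{j \in \pi_i} (1 - q_{ij})^{d_j}
                  = \exp\Big(-\theta_{i0} - \sum_{j \in \pi_i} \theta_{ij} d_j\Big),
\qquad \theta_{ij} = -\ln(1 - q_{ij}),
\qquad P(f_i = 1 \mid d) = 1 - P(f_i = 0 \mid d)
```

A negative finding (f_i = 0) factorizes over the diseases and is cheap to absorb; a positive finding (f_i = 1) does not, so it couples all of its parent diseases.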

  11. Why Is Exact Inference Intractable?

  12. Why Is Exact Inference Intractable? [Figure: diseases connected to symptoms] • Observed symptoms

  13. Why Is Exact Inference Intractable? [Figure: diseases connected to symptoms] • Observed symptoms
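To make the intractability concrete, here is a minimal brute-force sketch, assuming the noisy-OR form P(f_i = 1 | d) = 1 − exp(−θ_i0 − Σ_j θ_ij d_j) and independent disease priors. All numbers and function names are hypothetical. The exact posterior requires a sum over all 2^n disease configurations, so the cost doubles with every added disease, which rules this out for networks with hundreds of diseases.

```python
import itertools
import math


def p_positive(d, theta0, theta):
    """Noisy-OR: P(f = 1 | d) = 1 - exp(-theta0 - sum_j theta_j * d_j)."""
    return 1.0 - math.exp(-theta0 - sum(t * dj for t, dj in zip(theta, d)))


def brute_force_posterior(prior, theta0, theta, findings):
    """Exact P(evidence) and P(d_j = 1 | evidence) by enumerating all 2**n disease states.

    prior[j]    : P(d_j = 1); diseases are assumed independent a priori
    theta0[i]   : leak term of finding i
    theta[i][j] : weight from disease j to finding i (0 if j is not a parent of i)
    findings[i] : observed value of finding i (0 or 1)
    """
    n = len(prior)
    evidence = 0.0
    unnormalized = [0.0] * n
    for d in itertools.product((0, 1), repeat=n):  # 2**n terms: the bottleneck
        p = 1.0
        for j, dj in enumerate(d):
            p *= prior[j] if dj else 1.0 - prior[j]
        for i, f in enumerate(findings):
            pos = p_positive(d, theta0[i], theta[i])
            p *= pos if f else 1.0 - pos
        evidence += p
        for j, dj in enumerate(d):
            if dj:
                unnormalized[j] += p
    return evidence, [u / evidence for u in unnormalized]


# Hypothetical network: 3 diseases, 3 findings, observed values 1, 0, 1.
prior = [0.1, 0.05, 0.2]
theta0 = [0.01, 0.01, 0.01]
theta = [[1.0, 0.5, 0.0], [0.0, 2.0, 0.3], [0.7, 0.0, 1.2]]
findings = [1, 0, 1]
evidence, posterior = brute_force_posterior(prior, theta0, theta, findings)
```

Exact algorithms can do better than this naive loop by exploiting the factorization of negative findings, but the positive findings still couple the diseases.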

  14. Reducing the Computational Complexity • Variational methods: approximate the probability distribution by exploiting convexity • Result: a simple graph on which exact methods can run

  15. Express a Function Variationally • is a concave function

  16. Express a Function Variationally • is a concave function
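The function on slides 15 and 16 did not survive extraction. The standard first example of this idea in Jordan et al. is the logarithm. Assuming that is the function meant here, its concavity lets it be written as a minimum over linear functions, ln x = min_λ {λx − ln λ − 1}, with the minimum at λ = 1/x. Any fixed λ therefore gives an upper bound that is linear in x. The sketch below checks this numerically; the grid of λ values is an arbitrary choice.

```python
import math


def log_upper_bound(x, lam):
    """For x > 0 and any lam > 0: ln(x) <= lam * x - ln(lam) - 1 (conjugate form of ln)."""
    return lam * x - math.log(lam) - 1.0


x = 3.0
lams = [0.05 * k for k in range(1, 41)]  # arbitrary grid of variational parameters
bounds = [log_upper_bound(x, lam) for lam in lams]
tight = log_upper_bound(x, 1.0 / x)  # the minimizing lam is 1 / x
```

Fixing λ trades exactness for a simpler (linear) function of x; optimizing λ recovers the exact value.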

  17. Express a Function Variationally • If the function is neither convex nor concave: transform it into a suitable form • Example: the logistic function: transformation → approximation → transforming back
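For the logistic function g(x) = 1/(1 + e^(−x)), the transformation step takes the logarithm. ln g(x) is concave, and its conjugate form is ln g(x) = min over 0 < λ < 1 of {λx − H(λ)}, where H(λ) = −λ ln λ − (1 − λ) ln(1 − λ) is the binary entropy. Transforming back by exponentiating gives g(x) ≤ exp(λx − H(λ)), which is tight at λ = 1 − g(x). The following is a minimal numerical sketch; the function names are illustrative.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def binary_entropy(lam):
    return -lam * math.log(lam) - (1.0 - lam) * math.log(1.0 - lam)


def logistic_upper_bound(x, lam):
    """Transformation: ln g(x) is concave, so ln g(x) <= lam * x - H(lam) for 0 < lam < 1.
    Transforming back (exponentiating) gives g(x) <= exp(lam * x - H(lam))."""
    return math.exp(lam * x - binary_entropy(lam))


def optimal_lambda(x):
    """The bound is tight at lam = 1 - g(x)."""
    return 1.0 - logistic(x)
```

The exponential of a linear function of x is what makes the bound useful: it multiplies cleanly into other factors of a graphical model.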

  18. Approaches to Variational Methods • Sequential approach (online): nodes are transformed in an order determined during the inference process • Block approach (offline): used when the model has obvious substructures

  19. Sequential Approach (Two Methods) • Start from the untransformed graph and transform one node at a time until the result is a simple graph for exact methods • Start from the completely transformed graph and reintroduce one node at a time while the graph stays simple enough for exact methods

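  20. Sequential Approach (Example) [Figure: diseases connected to symptoms] • Log concave

  21. Sequential Approach (Example) [Figure: diseases connected to symptoms] • Log concave

  22. Sequential Approach (Example) [Figure: diseases connected to symptoms, one symptom observed as 1]

  23. Sequential Approach (Example) [Figure: diseases connected to symptoms, one symptom observed as 1]

  24. Sequential Approach (Example) [Figure: diseases connected to symptoms, one symptom observed as 1]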
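The "log concave" step can be made concrete for a positive finding. Assume the noisy-OR form P(f = 1 | d) = 1 − exp(−x) with x = θ_0 + Σ_j θ_j d_j. Then f(x) = ln(1 − e^(−x)) is concave, with conjugate f*(λ) = −λ ln λ + (λ + 1) ln(λ + 1). Hence P(f = 1 | d) ≤ exp(λx − f*(λ)) = exp(λθ_0 − f*(λ)) Π_j exp(λθ_j)^(d_j). This product has one factor per disease, so the transformed finding no longer couples its parents. The sketch below checks the bound and its tightness at λ = 1/(e^x − 1); the parameter values and names are hypothetical.

```python
import math


def conjugate(lam):
    """f*(lam) = -lam ln(lam) + (lam + 1) ln(lam + 1), conjugate of f(x) = ln(1 - e^{-x})."""
    return -lam * math.log(lam) + (lam + 1.0) * math.log(lam + 1.0)


def exact_positive(d, theta0, theta):
    """Noisy-OR probability of a positive finding, P(f = 1 | d)."""
    x = theta0 + sum(t * dj for t, dj in zip(theta, d))
    return 1.0 - math.exp(-x)


def transformed_positive(d, theta0, theta, lam):
    """Upper bound exp(lam * x - f*(lam)), computed as a product of per-disease factors.

    Each factor depends on a single d_j, so the finding no longer couples the diseases."""
    value = math.exp(lam * theta0 - conjugate(lam))
    for t, dj in zip(theta, d):
        value *= math.exp(lam * t) ** dj
    return value


def optimal_lam(x):
    """The bound is tight at lam = 1 / (e^x - 1), for x > 0."""
    return 1.0 / math.expm1(x)
```

In the sequential approach, each factor exp(λθ_j d_j) can be absorbed into the prior of disease j, so the transformed node simply disappears from the graph.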

  25. Sequential Approach (Upper Bound and Lower Bound) • We need both a lower bound and an upper bound

  26. How to Compute a Lower Bound for a Concave Function? • Lower bound for concave functions (via Jensen's inequality): the variational parameter is a probability distribution
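One way to realize this lower bound, following the Jensen's-inequality construction in Jordan et al. (the exact slide equation is assumed), is as follows. Let q be a probability distribution over the parents. Write θ_0 + Σ_j θ_j d_j = Σ_j q_j (θ_0 + θ_j d_j / q_j). Concavity then gives f(θ_0 + Σ_j θ_j d_j) ≥ Σ_j q_j f(θ_0 + θ_j d_j / q_j). Here f(x) = ln(1 − e^(−x)) is taken from the noisy-OR model; the names and values are hypothetical.

```python
import math


def f(x):
    """Concave function from the noisy-OR model: f(x) = ln(1 - e^{-x}) for x > 0."""
    return math.log(-math.expm1(-x))


def jensen_lower_bound(d, theta0, theta, q):
    """Lower bound on f(theta0 + sum_j theta_j d_j) via Jensen's inequality.

    q is a probability distribution over the parents (the variational parameter):
    f(sum_j q_j * (theta0 + theta_j d_j / q_j)) >= sum_j q_j * f(theta0 + theta_j d_j / q_j).
    """
    assert abs(sum(q) - 1.0) < 1e-12 and all(qj > 0 for qj in q)
    return sum(qj * f(theta0 + t * dj / qj) for qj, t, dj in zip(q, theta, d))
```

Because each d_j is binary, every term in the sum is linear in its own d_j. After exponentiation the bound again factorizes over the diseases, like the upper bound.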

  27. Block Approach (Overview) • Offline application of the sequential approach • Identify some substructure amenable to exact inference • Define a family of probability distributions by introducing variational parameters • Choose the best approximation based on the evidence

  28. Block Approach (Details) • Choose a family of approximating distributions • Minimize the KL divergence between the approximation and the true distribution • KL divergence
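The sketch below illustrates choosing the member of a family that minimizes D(Q ‖ P) = Σ_s Q(s) ln(Q(s)/P(s)). It assumes a hypothetical target P over two correlated binary variables and a fully factorized family Q(s1, s2) = Q1(s1) Q2(s2). A grid search stands in for the optimization here purely for transparency; it is not how the block approach is solved in practice.

```python
import math


def kl_divergence(q, p):
    """D(Q || P) = sum_s Q(s) ln(Q(s) / P(s)); terms with Q(s) = 0 contribute 0."""
    return sum(qs * math.log(qs / ps) for qs, ps in zip(q, p) if qs > 0)


def factorized(mu1, mu2):
    """Fully factorized Q(s1, s2) = Q1(s1) Q2(s2), in the order (0,0), (0,1), (1,0), (1,1)."""
    return [(1 - mu1) * (1 - mu2), (1 - mu1) * mu2, mu1 * (1 - mu2), mu1 * mu2]


# Hypothetical target P over two strongly correlated binary variables.
p = [0.4, 0.1, 0.1, 0.4]
grid = [k / 100 for k in range(1, 100)]
best_kl, best_mu = min(
    (kl_divergence(factorized(a, b), p), (a, b)) for a in grid for b in grid
)
```

The minimum stays above zero because no factorized Q can represent the correlation in P; the remaining divergence is the price of the simpler family.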

  29. Block Approach (Example – Boltzmann machine) [Figure: Boltzmann machine with units Si and Sj]

  30. Block Approach (Example – Boltzmann machine) [Figure: Boltzmann machine with observed unit Sj = 1]

  31. Block Approach (Example – Boltzmann machine) [Figure: units si and sj]

  32. Block Approach (Example – Boltzmann machine) • Minimize KL divergence [Figure: units si and sj]

  33. Block Approach (Example – Boltzmann machine) • Minimize KL divergence • Mean field equations: solve for the fixed point [Figure: units si and sj]
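The standard mean field treatment of a Boltzmann machine works as follows. It assumes binary units S_i ∈ {0, 1} and P(S) ∝ exp(Σ_{i<j} θ_ij S_i S_j + Σ_i θ_i0 S_i), and uses the factorized family Q(S | μ) = Π_i μ_i^(S_i) (1 − μ_i)^(1 − S_i). Minimizing the KL divergence then yields the fixed-point equations μ_i = σ(Σ_j θ_ij μ_j + θ_i0). The sketch below iterates these updates and checks that the resulting mean field bound stays below the exact ln Z; the weights and function names are hypothetical.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mean_field(W, b, iters=200):
    """Sequential updates mu_i <- sigmoid(sum_j W[i][j] mu_j + b[i]).

    W is symmetric with zero diagonal. Each update exactly minimizes the KL
    divergence in mu_i with the others held fixed, so repeated sweeps solve
    the mean field equations by fixed-point iteration."""
    n = len(b)
    mu = [0.5] * n
    for _ in range(iters):
        for i in range(n):
            mu[i] = sigmoid(sum(W[i][j] * mu[j] for j in range(n)) + b[i])
    return mu


def mean_field_lower_bound(W, b, mu):
    """ln Z >= expected exponent under the factorized Q plus the entropy of Q."""
    n = len(b)
    energy = sum(W[i][j] * mu[i] * mu[j] for i in range(n) for j in range(i + 1, n))
    energy += sum(b[i] * mu[i] for i in range(n))
    entropy = -sum(m * math.log(m) + (1 - m) * math.log(1 - m) for m in mu)
    return energy + entropy


def exact_log_z(W, b):
    """Brute-force ln Z over all 2**n states, for checking small models only."""
    n = len(b)
    terms = []
    for s in itertools.product((0, 1), repeat=n):
        e = sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
        terms.append(e + sum(b[i] * s[i] for i in range(n)))
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Hypothetical 3-unit Boltzmann machine.
W = [[0.0, 1.0, -0.5], [1.0, 0.0, 0.8], [-0.5, 0.8, 0.0]]
b = [0.2, -0.3, 0.1]
mu = mean_field(W, b)
```

Evidence such as Sj = 1 can be handled by clamping that unit, which adds θ_ij to the bias of each neighbor i before the updates run.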

  34. Conclusions • Time or space complexity of exact calculation is unacceptable • Complex graphs can be probabilistically simple • Inference in simplified models provides bounds on probabilities in the original model

  35. Thank You

  36. Extra Slides

  37. Concerns • Approximation accuracy • Strong dependencies can be identified • Not based on convexity transformation • Not able to assure that the framework will transfer to other examples • Not straightforward to develop a variational approximation for new architectures

  38. Justification for KL Divergence • Best lower bound on the probability of the evidence

  39. EM: KL Divergence between Q(H|E) and P(H|E,θ) • Maximum likelihood parameter estimation • The following function is a lower bound on the log likelihood
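Slides 38 and 39 rest on the identity ln P(E|θ) = L(Q) + D(Q(H|E) ‖ P(H|E,θ)), where L(Q) = Σ_H Q(H|E) ln [P(H,E|θ) / Q(H|E)]. The KL term is nonnegative, so L(Q) is a lower bound. Since ln P(E|θ) does not depend on Q, minimizing the KL divergence over a family of Q is the same as maximizing the bound. The sketch below checks the identity on a hypothetical three-state hidden variable.

```python
import math

# Hypothetical joint P(H = h, E = e) for three hidden states and one fixed observation e.
joint = [0.10, 0.05, 0.15]  # sums to P(E = e)
q = [0.5, 0.2, 0.3]         # an arbitrary distribution Q(H | E)

log_evidence = math.log(sum(joint))
posterior = [pj / sum(joint) for pj in joint]

# L(Q) = sum_H Q ln(P(H, E) / Q): the lower bound on ln P(E)
lower_bound = sum(qh * math.log(pj / qh) for qh, pj in zip(q, joint))
# D(Q || P(H | E)) >= 0: the gap between ln P(E) and the bound
kl = sum(qh * math.log(qh / ph) for qh, ph in zip(q, posterior))
# With Q equal to the exact posterior, the gap closes.
tight = sum(ph * math.log(pj / ph) for ph, pj in zip(posterior, joint))
```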

  40. Traditional EM • Maximize the bound with respect to Q (E step) • Fix Q, maximize with respect to θ (M step) • Restricting Q to a tractable family gives an approximation to the EM algorithm
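The two steps can be illustrated with a minimal sketch of ordinary EM for a hypothetical two-component Gaussian mixture with unit variances. Here the E step sets Q to the exact posterior over the hidden component, so the bound touches the log likelihood. In the variational version, the E step would instead minimize the KL divergence within a restricted family of Q. The data, initial values, and function names are all illustrative.

```python
import math


def normal_pdf(x, mean):
    """Unit-variance Gaussian density (variance fixed to 1 for simplicity)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def log_likelihood(data, pi, m0, m1):
    return sum(math.log((1 - pi) * normal_pdf(x, m0) + pi * normal_pdf(x, m1)) for x in data)


def em(data, pi=0.5, m0=-1.0, m1=1.0, iters=50):
    """EM for a two-component mixture; returns (pi, m0, m1) and the log-likelihood trace."""
    trace = [log_likelihood(data, pi, m0, m1)]
    for _ in range(iters):
        # E step: maximize the bound over Q by setting Q(H | x) to the exact posterior.
        r = []
        for x in data:
            a = (1 - pi) * normal_pdf(x, m0)
            c = pi * normal_pdf(x, m1)
            r.append(c / (a + c))  # Q(H = 1 | x)
        # M step: fix Q and maximize the bound with respect to theta = (pi, m0, m1).
        n1 = sum(r)
        n0 = len(data) - n1
        pi = n1 / len(data)
        m0 = sum((1 - ri) * x for ri, x in zip(r, data)) / n0
        m1 = sum(ri * x for ri, x in zip(r, data)) / n1
        trace.append(log_likelihood(data, pi, m0, m1))
    return (pi, m0, m1), trace


data = [-2.1, -1.7, -2.4, -1.9, 1.8, 2.2, 2.5, 1.6, 2.0]
params, trace = em(data)
```

Because each step maximizes the same lower bound, the log likelihood in `trace` never decreases.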

  41. Principle of Inference • DAG → junction tree • Initialization → inconsistent junction tree • Propagation → consistent junction tree • Marginalization

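  42. Example: Create Join Tree • HMM with 2 time steps: X1 → X2, X1 → Y1, X2 → Y2 • Junction tree: cliques X1,Y1; X1,X2; X2,Y2, with separator X1 between X1,Y1 and X1,X2, and separator X2 between X1,X2 and X2,Y2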

  43. Example: Initialization [Figure: junction tree X1,Y1 -[X1]- X1,X2 -[X2]- X2,Y2]

  44. Example: Collect Evidence • Choose an arbitrary clique, e.g. X1,X2, into which all potential functions will be collected • Recursively call neighboring cliques for messages: • 1. Call X1,Y1: • 1. Projection: • 2. Absorption:

  45. Example: Collect Evidence (cont.) • 2. Call X2,Y2: • 1. Projection: • 2. Absorption: [Figure: junction tree X1,Y1 -[X1]- X1,X2 -[X2]- X2,Y2]
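The projection and absorption equations did not survive extraction. In the standard Hugin-style propagation that these step names follow (an assumption about the slides' exact notation), a message from clique C to a neighboring clique D through their separator S is:

```latex
\text{Projection:}\quad \phi_S^{\mathrm{old}} \leftarrow \phi_S,
\qquad \phi_S \leftarrow \sum_{C \setminus S} \psi_C
\\[4pt]
\text{Absorption:}\quad \psi_D \leftarrow \psi_D \, \frac{\phi_S}{\phi_S^{\mathrm{old}}}
\qquad (\text{with } 0/0 := 0)
```

For the call to X2,Y2 on slide 45, C is the clique X2,Y2, S is the separator X2, and D is the collecting clique X1,X2.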

  46. Example: Distribute Evidence • Pass messages recursively to neighboring nodes • Pass message from X1,X2 to X1,Y1: • 1. Projection: • 2. Absorption:

  47. Example: Distribute Evidence (cont.) • Pass message from X1,X2 to X2,Y2: • 1. Projection: • 2. Absorption: [Figure: junction tree X1,Y1 -[X1]- X1,X2 -[X2]- X2,Y2]

  48. Example: Inference with evidence • Assume we want to compute: P(X2|Y1=0,Y2=1) (state estimation) • Assign likelihoods to the potential functions during initialization:

  49. Example: Inference with evidence (cont.) • Repeating the same steps as in the previous case, we obtain:
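The full procedure on slides 42 to 49 can be sketched in a few dozen lines for binary variables: initialization with evidence likelihoods, collect into X1,X2, distribute back, then marginalization. The sketch uses Hugin-style projection and absorption and computes P(X2 | Y1 = 0, Y2 = 1). The CPT values are hypothetical, as is the choice of placing P(X1) in the clique X1,X2. The result is checked against a direct sum.

```python
import itertools

# Hypothetical parameters for a 2-step HMM with binary hidden X1, X2 and observed Y1, Y2.
P_X1 = {0: 0.6, 1: 0.4}
P_X2_GIVEN_X1 = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}  # key: (x1, x2)
P_Y_GIVEN_X = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.7}    # key: (x, y), both steps


def table(fn):
    """Potential over two binary variables: {(a, b): fn(a, b)}."""
    return {(a, b): float(fn(a, b)) for a, b in itertools.product((0, 1), repeat=2)}


def ratio(num, den):
    """Division used in absorption, with the usual convention 0/0 = 0."""
    return 0.0 if den == 0 else num / den


def normalize(dist):
    z = sum(dist.values())
    return {k: v / z for k, v in dist.items()}


def junction_tree_posterior(y1, y2):
    """P(X2 | Y1 = y1, Y2 = y2) on the tree X1,Y1 -[X1]- X1,X2 -[X2]- X2,Y2.

    Returns the answer read from two different cliques, which must agree."""
    # Initialization: each CPT goes into one clique containing its family; the
    # evidence enters as likelihoods (1 if y equals the observation, else 0).
    c1 = table(lambda x1, y: P_Y_GIVEN_X[(x1, y)] * (y == y1))      # clique X1,Y1
    c12 = table(lambda x1, x2: P_X1[x1] * P_X2_GIVEN_X1[(x1, x2)])  # clique X1,X2
    c2 = table(lambda x2, y: P_Y_GIVEN_X[(x2, y)] * (y == y2))      # clique X2,Y2
    s1 = {0: 1.0, 1: 1.0}                                           # separator X1
    s2 = {0: 1.0, 1: 1.0}                                           # separator X2

    # Collect evidence into X1,X2: project each leaf onto its separator, then absorb.
    new = {x: c1[(x, 0)] + c1[(x, 1)] for x in (0, 1)}
    c12 = {(a, b): v * ratio(new[a], s1[a]) for (a, b), v in c12.items()}
    s1 = new
    new = {x: c2[(x, 0)] + c2[(x, 1)] for x in (0, 1)}
    c12 = {(a, b): v * ratio(new[b], s2[b]) for (a, b), v in c12.items()}
    s2 = new

    # Distribute evidence from X1,X2 back to both leaves.
    new = {x: c12[(x, 0)] + c12[(x, 1)] for x in (0, 1)}  # sum out X2
    c1 = {(a, y): v * ratio(new[a], s1[a]) for (a, y), v in c1.items()}
    s1 = new
    new = {x: c12[(0, x)] + c12[(1, x)] for x in (0, 1)}  # sum out X1
    c2 = {(b, y): v * ratio(new[b], s2[b]) for (b, y), v in c2.items()}
    s2 = new

    # Marginalization: every clique now holds P(its variables, evidence).
    from_c12 = {x2: c12[(0, x2)] + c12[(1, x2)] for x2 in (0, 1)}
    from_c2 = {x2: c2[(x2, 0)] + c2[(x2, 1)] for x2 in (0, 1)}
    return normalize(from_c12), normalize(from_c2)


def brute_force_posterior(y1, y2):
    """Direct sum over X1, for comparison."""
    joint = {
        x2: sum(P_X1[x1] * P_X2_GIVEN_X1[(x1, x2)] * P_Y_GIVEN_X[(x1, y1)] * P_Y_GIVEN_X[(x2, y2)]
                for x1 in (0, 1))
        for x2 in (0, 1)
    }
    return normalize(joint)


from_clique, from_leaf = junction_tree_posterior(0, 1)  # P(X2 | Y1 = 0, Y2 = 1)
```

After the distribute phase, the tree is consistent, so any clique containing X2 gives the same posterior.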

  50. Variable Elimination General idea: • Write query in the form • Iteratively • Move all irrelevant terms outside of innermost sum • Perform innermost sum, getting a new term • Insert the new term into the product
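The variable elimination steps above can be sketched generically with factors over binary variables. Each factor is represented as a (variables, table) pair; this representation and all names are illustrative. For each variable in the elimination order, only the factors that mention it are multiplied, so the irrelevant terms stay outside the sum. The variable is then summed out and the new factor is put back. The example query is P(X2 | Y1 = 0, Y2 = 1) in a 2-step HMM with hypothetical CPT values and the evidence already plugged in.

```python
import itertools


def multiply(f, g):
    """Pointwise product of two factors; a factor is (variables, {assignment tuple: value})."""
    fv, ft = f
    gv, gt = g
    vs = tuple(dict.fromkeys(fv + gv))  # union of variables, order preserved
    table = {}
    for a in itertools.product((0, 1), repeat=len(vs)):
        env = dict(zip(vs, a))
        table[a] = ft[tuple(env[v] for v in fv)] * gt[tuple(env[v] for v in gv)]
    return vs, table


def sum_out(f, var):
    """Sum one variable out of a factor (the 'innermost sum')."""
    fv, ft = f
    i = fv.index(var)
    vs = fv[:i] + fv[i + 1:]
    table = {}
    for a, val in ft.items():
        key = a[:i] + a[i + 1:]
        table[key] = table.get(key, 0.0) + val
    return vs, table


def eliminate(factors, order):
    """Eliminate variables in order; assumes every variable in `order` appears in some factor."""
    factors = list(factors)
    for var in order:
        relevant = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]  # stay outside the sum
        prod = relevant[0]
        for f in relevant[1:]:
            prod = multiply(prod, f)
        factors.append(sum_out(prod, var))  # insert the new term into the product
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result


# Hypothetical CPTs, with the evidence Y1 = 0 and Y2 = 1 already plugged in.
factors = [
    (("X1",), {(0,): 0.6, (1,): 0.4}),                                    # P(X1)
    (("X1", "X2"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}),  # P(X2 | X1)
    (("X1",), {(0,): 0.9, (1,): 0.3}),                                    # P(Y1 = 0 | X1)
    (("X2",), {(0,): 0.1, (1,): 0.7}),                                    # P(Y2 = 1 | X2)
]
vs, table = eliminate(factors, ["X1"])
z = sum(table.values())
posterior = {a[0]: p / z for a, p in table.items()}  # P(X2 | Y1 = 0, Y2 = 1)
```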
