Variational Methods for Graphical Models • Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, Lawrence K. Saul • Presented by: Afsaneh Shirazi
Outline • Motivation • Inference in graphical models • Exact inference is intractable • Variational methodology • Sequential approach • Block approach • Conclusions
Motivation (Example: Medical Diagnosis) • [Figure: two-layer network with disease nodes on top and symptom nodes below] • What is the most probable disease?
Motivation • We want to answer queries about our data • A graphical model is a way to model data • Exact inference in some graphical models is intractable (NP-hard) • Variational methods simplify inference in graphical models by approximating it
Graphical Models • Directed (Bayesian network): nodes S1 to S5 with conditional probabilities P(S1), P(S2), P(S3|S1,S2), P(S4|S3), P(S5|S3,S4) • Undirected: potentials defined over the cliques C1, C2, C3
Inference in Graphical Models • Inference: given a graphical model, the process of computing answers to queries • How computationally hard is this decision problem? • Theorem: Computing P(X = x) in a Bayesian network is NP-hard
Why Is Exact Inference Intractable? • [Figure: disease nodes (top layer) connected to symptom nodes (bottom layer)] • Diagnose the most probable disease
Why Is Exact Inference Intractable? • [Figure: disease-symptom network with the observed symptoms marked]
Why Is Exact Inference Intractable? • [Figure: disease-symptom network with parent disease values 1, 0, 1] • Each symptom is given by a noisy-OR model of its parent diseases
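To make the cost of exact inference concrete, here is a minimal sketch of a noisy-OR disease-symptom network with brute-force posterior computation. The network size, the priors, the leak terms, and the activation probabilities `q` are all made-up illustrative values, not taken from the slides. The point is the loop over all 2^n disease configurations, which is what becomes infeasible for networks with hundreds of diseases.

```python
import itertools

# Hypothetical toy network: 3 diseases, 2 symptoms (all numbers are made up).
prior = [0.1, 0.2, 0.05]     # P(d_j = 1)
leak = [0.01, 0.02]          # P(symptom i is on with no disease present)
q = [[0.8, 0.3, 0.0],        # q[i][j] = P(disease j alone turns symptom i on)
     [0.0, 0.6, 0.9]]

def p_symptom_off(i, d):
    """Noisy-OR: symptom i stays off only if the leak and every active parent fail."""
    p = 1.0 - leak[i]
    for j, dj in enumerate(d):
        if dj:
            p *= 1.0 - q[i][j]
    return p

def joint(d, f):
    """P(d, f) = prod_j P(d_j) * prod_i P(f_i | d)."""
    p = 1.0
    for j, dj in enumerate(d):
        p *= prior[j] if dj else 1.0 - prior[j]
    for i, fi in enumerate(f):
        off = p_symptom_off(i, d)
        p *= (1.0 - off) if fi else off
    return p

def posterior_marginals(f):
    """Exact P(d_j = 1 | f) by enumerating all 2^n disease configurations."""
    n = len(prior)
    evidence = 0.0
    marg = [0.0] * n
    for d in itertools.product([0, 1], repeat=n):   # exponential in n
        p = joint(d, f)
        evidence += p
        for j in range(n):
            if d[j]:
                marg[j] += p
    return [m / evidence for m in marg], evidence

# Both symptoms observed to be present.
post, p_f = posterior_marginals((1, 1))
```

With 3 diseases the loop visits 8 configurations; each added disease doubles that count.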
Reducing the Computational Complexity • Goal: a simple graph that exact methods can handle • Variational methods: approximate the probability distribution • Use the role of convexity
Express a Function Variationally • ln(x) is a concave function, so it can be written as a minimum over linear upper bounds: ln(x) = min_λ {λx − ln λ − 1}
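The logarithm is the standard illustration of a variational representation of a concave function: for every λ > 0 the line λx − ln λ − 1 lies above ln(x), and the bound is tight at λ = 1/x. The short check below is a sketch with an arbitrarily chosen evaluation point and grid, intended only to make the "family of upper bounds" picture concrete.

```python
import math

def log_upper_bound(x, lam):
    """Linear-in-x upper bound on ln(x), valid for any lam > 0."""
    return lam * x - math.log(lam) - 1.0

x = 2.5                                   # arbitrary evaluation point
lams = [0.1 * k for k in range(1, 51)]    # grid of variational parameters
bounds = [log_upper_bound(x, lam) for lam in lams]

best_lam = 1.0 / x                        # analytic minimizer of the bound
tight_value = log_upper_bound(x, best_lam)
```

Every entry of `bounds` is at least `math.log(x)`, and `tight_value` matches it up to rounding.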
Express a Function Variationally • If the function is neither convex nor concave: transform it to a desired form • Example: logistic function • Steps: transformation → approximation → transforming back
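For the logistic function, the three steps can be written out as follows. This is a sketch of the usual derivation, in which the transformation is the logarithm and H(λ) denotes the binary entropy; the symbols g and H are notation introduced here.

```latex
\begin{align}
f(x) &= \frac{1}{1 + e^{-x}}, \qquad
g(x) = \ln f(x) = -\ln\left(1 + e^{-x}\right) \quad \text{(transformation; } g \text{ is concave)} \\
g(x) &= \min_{\lambda \in [0,1]} \left\{ \lambda x - H(\lambda) \right\}, \qquad
H(\lambda) = -\lambda \ln \lambda - (1-\lambda)\ln(1-\lambda) \quad \text{(approximation)} \\
f(x) &= \min_{\lambda \in [0,1]} \exp\left( \lambda x - H(\lambda) \right) \quad \text{(transforming back)}
\end{align}
```

For any fixed λ, exp(λx − H(λ)) is an upper bound on f(x); the minimum is attained at λ = 1 − f(x).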
Approaches to Variational Methods • Sequential approach (on-line): nodes are transformed one at a time, in an order determined during the inference process • Block approach (off-line): used when the graph has obvious substructures
Sequential Approach (Two Methods) • Untransformed graph → transform one node at a time → simple graph for exact methods • Completely transformed graph → reintroduce one node at a time → simple graph for exact methods
Sequential Approach (Example) • [Figure: disease-symptom network] • Log concave
Sequential Approach (Example) • [Figure: disease-symptom network with one node value set to 1]
Sequential Approach (Upper Bound and Lower Bound) • We need both a lower bound and an upper bound
How to Compute a Lower Bound for a Concave Function? • Lower bound for concave functions: obtained from Jensen's inequality • The variational parameter is a probability distribution
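A minimal sketch of this lower bound for the logarithm: for positive terms z_j and any distribution q over the terms, Jensen's inequality gives ln(Σ_j z_j) ≥ Σ_j q_j ln(z_j / q_j), with equality when q_j is proportional to z_j. The values of `z` below are arbitrary, and the helper name `jensen_lower_bound` is introduced here for illustration.

```python
import math

def jensen_lower_bound(z, q):
    """Lower bound on ln(sum(z)); q is a probability distribution over the terms."""
    return sum(qj * math.log(zj / qj) for zj, qj in zip(z, q) if qj > 0)

z = [0.5, 1.5, 3.0]                        # arbitrary positive terms
exact = math.log(sum(z))

uniform_q = [1.0 / len(z)] * len(z)        # a valid but loose choice of q
tight_q = [zj / sum(z) for zj in z]        # q proportional to z makes the bound tight

loose = jensen_lower_bound(z, uniform_q)
tight = jensen_lower_bound(z, tight_q)
```

Here the distribution `q` plays the role of the variational parameter: optimizing over it tightens the bound.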
Block Approach (Overview) • Off-line application of the sequential approach • Identify a substructure amenable to exact inference • Define a family of probability distributions by introducing variational parameters • Choose the best approximation based on the evidence
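Block Approach (Details) • Family of approximating distributions Q(H|E, λ), indexed by variational parameters λ • Choose λ to minimize the KL divergence: D(Q||P) = Σ_H Q(H|E, λ) ln [Q(H|E, λ) / P(H|E)]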
Block Approach (Example: Boltzmann machine) • [Figure: units s_i and s_j joined by an edge] • Minimize the KL divergence • Mean field equations: solve for a fixed point
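As a sketch of the fixed-point iteration, the mean field equations for a Boltzmann machine with binary units s_i in {0, 1} take the form μ_i = σ(Σ_j θ_ij μ_j + θ_i0), where σ is the logistic function. The weights and biases below are made-up values, and the asynchronous update order and stopping rule are choices made for this example rather than anything prescribed by the slides.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mean_field(theta, bias, iters=200, tol=1e-10):
    """Iterate mu_i = sigmoid(sum_j theta[i][j] * mu[j] + bias[i]) until it stops changing."""
    n = len(bias)
    mu = [0.5] * n                       # start from an uninformative guess
    for _ in range(iters):
        delta = 0.0
        for i in range(n):               # asynchronous (one unit at a time) updates
            field = bias[i] + sum(theta[i][j] * mu[j] for j in range(n) if j != i)
            new = sigmoid(field)
            delta = max(delta, abs(new - mu[i]))
            mu[i] = new
        if delta < tol:
            break
    return mu

# Hypothetical symmetric weights and biases for a 3-unit machine.
theta = [[0.0, 0.5, -0.3],
         [0.5, 0.0, 0.2],
         [-0.3, 0.2, 0.0]]
bias = [0.1, -0.2, 0.3]

mu = mean_field(theta, bias)             # approximate marginals P(s_i = 1)
```

The cost per sweep is quadratic in the number of units, in contrast to the exponential cost of computing exact marginals by enumeration.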
Conclusions • Time or space complexity of exact calculation is unacceptable • Complex graphs can be probabilistically simple • Inference in simplified models provides bounds on probabilities in the original model
Concerns • Approximation accuracy • Strong dependencies can be identified • Not based on convexity transformation • Not able to assure that the framework will transfer to other examples • Not straightforward to develop a variational approximation for new architectures
Justification for KL Divergence • Best lower bound on the probability of the evidence
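The connection between the KL divergence and the lower bound can be spelled out in a few lines. This is the standard decomposition, with Q(H|E) denoting any distribution over the hidden variables:

```latex
\begin{align}
\ln P(E) &= \sum_{H} Q(H|E) \ln \frac{P(H,E)}{P(H|E)} \\
         &= \sum_{H} Q(H|E) \ln \frac{P(H,E)}{Q(H|E)}
          + \sum_{H} Q(H|E) \ln \frac{Q(H|E)}{P(H|E)} \\
         &\ge \sum_{H} Q(H|E) \ln \frac{P(H,E)}{Q(H|E)}
\end{align}
```

The second sum in the middle line is the KL divergence between Q(H|E) and P(H|E), which is nonnegative. Since ln P(E) does not depend on Q, minimizing the KL divergence is the same as maximizing the lower bound.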
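EM • Maximum likelihood parameter estimation: maximize ln P(E|θ) • The following function is a lower bound on the log likelihood: L(Q, θ) = Σ_H Q(H|E) ln P(H,E|θ) − Σ_H Q(H|E) ln Q(H|E) • The gap ln P(E|θ) − L(Q, θ) is the KL divergence between Q(H|E) and P(H|E, θ)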
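EM • E step: maximize the bound with respect to Q • M step: fix Q, maximize with respect to θ • Traditional EM: the E step maximizes over all distributions Q • Approximation to the EM algorithm: the E step maximizes over a restricted family of Q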
Principle of Inference • DAG → junction tree • Initialization → inconsistent junction tree • Propagation → consistent junction tree • Marginalization
Example: Create Join Tree • HMM with 2 time steps: hidden states X1, X2 and observations Y1, Y2 • Junction tree: cliques (X1,Y1), (X1,X2), (X2,Y2) with separators X1 and X2
Example: Initialization • Junction tree: (X1,Y1) -[X1]- (X1,X2) -[X2]- (X2,Y2)
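Example: Collect Evidence • Choose an arbitrary clique, e.g. (X1,X2), where all potential functions will be collected • Recursively call the neighboring cliques for messages: • 1. Call (X1,Y1): • a. Projection: sum Y1 out of the potential of (X1,Y1) to update the separator X1 • b. Absorption: multiply the potential of (X1,X2) by the ratio of the new to the old separator potential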
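Example: Collect Evidence (cont.) • 2. Call (X2,Y2): • a. Projection: sum Y2 out of the potential of (X2,Y2) to update the separator X2 • b. Absorption: multiply the potential of (X1,X2) by the ratio of the new to the old separator potential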
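Example: Distribute Evidence • Pass messages recursively to neighboring nodes • Pass a message from (X1,X2) to (X1,Y1): • 1. Projection: sum X2 out of the potential of (X1,X2) to update the separator X1 • 2. Absorption: multiply the potential of (X1,Y1) by the ratio of the new to the old separator potential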
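Example: Distribute Evidence (cont.) • Pass a message from (X1,X2) to (X2,Y2): • 1. Projection: sum X1 out of the potential of (X1,X2) to update the separator X2 • 2. Absorption: multiply the potential of (X2,Y2) by the ratio of the new to the old separator potential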
Example: Inference with evidence • Assume we want to compute: P(X2|Y1=0,Y2=1) (state estimation) • Assign likelihoods to the potential functions during initialization:
Example: Inference with evidence (cont.) • Repeating the same steps as in the previous case, we obtain:
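The whole worked example (initialization with evidence, collect, distribute, marginalize) can be sketched in a few lines for the 2-step HMM. The probability tables below are made-up values, since the numbers on the original slides are not available, and the helper names `project`, `absorb`, and `pass_message` are introduced here for illustration. The assignment of P(X1) to the clique (X1,Y1) is one valid choice among several.

```python
# Hypothetical CPTs for a 2-step binary HMM (numbers are illustrative only).
p_x1 = [0.6, 0.4]                      # P(X1)
p_trans = [[0.7, 0.3], [0.2, 0.8]]     # p_trans[x1][x2] = P(X2 | X1)
p_emit = [[0.9, 0.1], [0.3, 0.7]]      # p_emit[x][y] = P(Y | X)
y1_obs, y2_obs = 0, 1                  # evidence: Y1 = 0, Y2 = 1

# Initialization: multiply CPTs into cliques; evidence enters as 0/1 likelihoods.
psi_a = [[p_x1[x1] * p_emit[x1][y1] * (y1 == y1_obs) for y1 in range(2)]
         for x1 in range(2)]                                      # clique (X1,Y1)
psi_b = [[p_trans[x1][x2] for x2 in range(2)] for x1 in range(2)]  # clique (X1,X2)
psi_c = [[p_emit[x2][y2] * (y2 == y2_obs) for y2 in range(2)]
         for x2 in range(2)]                                      # clique (X2,Y2)
phi_x1 = [1.0, 1.0]                    # separator X1
phi_x2 = [1.0, 1.0]                    # separator X2

def project(psi, keep):
    """Projection: sum a 2x2 clique potential down to the variable at index `keep`."""
    if keep == 0:
        return [sum(row) for row in psi]
    return [psi[0][k] + psi[1][k] for k in range(2)]

def absorb(psi, new_sep, old_sep, shared):
    """Absorption: rescale a clique potential by new/old separator potential."""
    for a in range(2):
        for b in range(2):
            s = a if shared == 0 else b
            psi[a][b] *= new_sep[s] / old_sep[s] if old_sep[s] else 0.0

def pass_message(src, keep, sep, dst, shared):
    new_sep = project(src, keep)
    absorb(dst, new_sep, sep, shared)
    return new_sep

# Collect evidence at the root clique (X1,X2).
phi_x1 = pass_message(psi_a, 0, phi_x1, psi_b, 0)
phi_x2 = pass_message(psi_c, 0, phi_x2, psi_b, 1)
# Distribute evidence from the root back to the leaves.
phi_x1 = pass_message(psi_b, 0, phi_x1, psi_a, 0)
phi_x2 = pass_message(psi_b, 1, phi_x2, psi_c, 0)

# Marginalization: the separator X2 now holds P(X2, Y1=0, Y2=1).
p_evidence = sum(phi_x2)
posterior_x2 = [v / p_evidence for v in phi_x2]   # P(X2 | Y1=0, Y2=1)
```

After both passes the tree is consistent: every clique and separator that contains X2 gives the same marginal for it.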
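Variable Elimination • General idea: write the query as a nested sum over a product of factors, e.g. P(X1, e) = Σ_{xk} ... Σ_{x3} Σ_{x2} Π_i P(xi | pa_i) • Iteratively: • 1. Move all irrelevant terms outside of the innermost sum • 2. Perform the innermost sum, getting a new term • 3. Insert the new term into the product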
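The iteration above can be sketched with factors stored as small tables. This is a minimal illustration for binary variables; the factor representation (a tuple of variable names plus a dict from assignments to values), the function names, and the chain A → B → C with its numbers are all assumptions made for the example.

```python
import itertools

def multiply(f, g):
    """Product of two factors; a factor is (variables, {assignment: value})."""
    fv, ft = f
    gv, gt = g
    out_vars = fv + tuple(v for v in gv if v not in fv)
    table = {}
    for assign in itertools.product([0, 1], repeat=len(out_vars)):
        a = dict(zip(out_vars, assign))
        table[assign] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
    return out_vars, table

def sum_out(f, var):
    """Perform the sum over one variable, producing a new, smaller factor."""
    fv, ft = f
    idx = fv.index(var)
    table = {}
    for assign, val in ft.items():
        key = assign[:idx] + assign[idx + 1:]
        table[key] = table.get(key, 0.0) + val
    return fv[:idx] + fv[idx + 1:], table

def eliminate(factors, order):
    """Sum out the variables in `order`, one at a time."""
    for var in order:
        relevant = [f for f in factors if var in f[0]]
        others = [f for f in factors if var not in f[0]]   # moved outside the sum
        if not relevant:
            continue
        prod = relevant[0]
        for f in relevant[1:]:
            prod = multiply(prod, f)
        factors = others + [sum_out(prod, var)]            # insert the new term
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Hypothetical chain A -> B -> C; the query is P(C).
p_a = (("A",), {(0,): 0.6, (1,): 0.4})
p_b_a = (("A", "B"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8})
p_c_b = (("B", "C"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6})

query_vars, p_c = eliminate([p_a, p_b_a, p_c_b], order=["A", "B"])
```

The elimination order affects only the size of the intermediate factors, not the result; on a chain, eliminating from one end keeps every intermediate factor to a single variable.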