
Exact Inference



  1. Exact Inference • Eran Segal, Weizmann Institute

  2. Course Outline

  3. Inference • Markov networks and Bayesian networks represent a joint probability distribution • The networks contain all the information needed to answer any query about the distribution • Inference is the process of answering such queries • Edge direction between variables does not restrict which queries can be asked • Inference combines evidence from all parts of the network

  4. Likelihood Queries • Compute the probability (= likelihood) of the evidence • Evidence: a subset of variables E and an assignment e • Task: compute P(E=e) • Computation: sum the joint over all other variables Y = U − E, i.e., P(E=e) = Σ_{y∈Val(Y)} P(Y=y, E=e)

  5. Conditional Probability Queries • Conditional probability queries • Evidence: a subset of variables E and an assignment e • Query: a subset of variables Y • Task: compute P(Y | E=e) • Applications • Medical and fault diagnosis • Genetic inheritance • Computation: P(Y | E=e) = P(Y, E=e) / P(E=e), where both terms are sums over the joint

  6. Maximum A Posteriori Assignment • Maximum A Posteriori Assignment (MAP) • Evidence: a subset of variables E and an assignment e • Query: a subset of variables Y • Task: compute MAP(Y|E=e) = argmax_y P(Y=y | E=e) • Note 1: there may be more than one possible solution • Note 2: equivalent to computing argmax_y P(Y=y, E=e), since P(Y=y | E=e) = P(Y=y, E=e) / P(E=e) and the denominator does not depend on y • Computation: maximize the joint P(Y=y, E=e) over all assignments y

  7. Most Probable Assignment: MPE • Most Probable Explanation (MPE) • Evidence: a subset of variables E and an assignment e • Query: all other variables Y (Y = U − E) • Task: compute MPE(Y|E=e) = argmax_y P(Y=y | E=e) • Note: there may be more than one possible solution • Applications • Decoding messages: find the most likely transmitted bits • Diagnosis: find a single most likely consistent hypothesis

  8. Most Probable Assignment: MPE • Note: we are searching for the most likely joint assignment to all variables • This may differ from the most likely assignment to each RV separately • Example: network A → B with CPDs P(A) and P(B|A), giving the joint • P(a0, b0) = 0.04 • P(a0, b1) = 0.36 • P(a1, b0) = 0.3 • P(a1, b1) = 0.3 • P(a1) = 0.6 > P(a0) = 0.4 ⇒ MAP(A) = a1 • Yet MPE(A,B) = {a0, b1}, the single largest joint entry (0.36)
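To make the distinction concrete, here is a minimal Python sketch that recomputes the slide's numbers; the joint table is exactly the four entries above, and only the argmax bookkeeping is added.

```python
# Joint distribution from the slide: P(A, B) over binary A and B.
joint = {("a0", "b0"): 0.04, ("a0", "b1"): 0.36,
         ("a1", "b0"): 0.30, ("a1", "b1"): 0.30}

# Marginal P(A): sum the joint over B.
p_a = {}
for (a, b), p in joint.items():
    p_a[a] = p_a.get(a, 0.0) + p

print(max(p_a, key=p_a.get))      # 'a1': P(a1)=0.6 > P(a0)=0.4, so MAP(A) = a1
print(max(joint, key=joint.get))  # ('a0', 'b1') with 0.36, so MPE(A,B) = (a0, b1)
```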

  9. Exact Inference in Graphical Models • Graphical models can be used to answer • Conditional probability queries • MAP queries • MPE queries • Naïve approach • Generate the full joint distribution • Depending on the query, compute the sum/max • ⇒ Exponential blowup • Instead, exploit independencies for efficient inference

  10. Complexity of Bayesnet Inference • Assume the encoding specifies the DAG structure • Assume CPDs are represented as table CPDs • Decision problem: given a network G, a variable X and a value x ∈ Val(X), decide whether P_G(X=x) > 0

  11. Complexity of Bayesnet Inference • Theorem: the decision problem is NP-complete • Proof: • The decision problem is in NP: for an assignment e to all network variables, check whether X=x in e and P(e) > 0 • Reduction from 3-SAT • Binary-valued variables Q1,...,Qn • Clauses C1,...,Ck, where Ci = Li,1 ∨ Li,2 ∨ Li,3 • Each Li,j (i=1,...,k, j=1,2,3) is a literal of the form Qm or ¬Qm • φ = C1 ∧ ... ∧ Ck • Decision problem: is there an assignment to Q1,...,Qn satisfying φ? • Construct a network such that P(X=1) > 0 iff φ is satisfiable

  12. Complexity of Bayesnet Inference • [Network: Q1,...,Qn → C1,...,Ck → A1, A2, ..., Ak-2 → X] • P(Qi=1) = 0.5 • P(Ci=1 | Pa(Ci)) = 1 iff the assignment to Pa(Ci) satisfies clause Ci (a deterministic OR of the clause's literals) • The CPDs of A1,...,Ak-2 and X form a deterministic AND chain over the clause nodes • P(X=1 | q1,...,qn) = 1 iff q1,...,qn satisfies φ • P(X=1) > 0 iff there is a satisfying assignment
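As a sanity check on the reduction, the following sketch computes P(X=1) by brute-force enumeration; the clause encoding (signed integers, +m for Qm and -m for ¬Qm) is my own convention, not from the slides.

```python
from itertools import product

def prob_x_equals_1(n_vars, clauses):
    """P(X=1) in the reduction network: each Qi is an independent fair coin,
    each Ci is a deterministic OR of its literals, and X is a deterministic
    AND of all clauses, so P(X=1) equals the fraction of assignments to
    Q1..Qn that satisfy the formula."""
    satisfying = sum(
        all(any((q[abs(lit) - 1] == 1) == (lit > 0) for lit in clause)
            for clause in clauses)
        for q in product([0, 1], repeat=n_vars))
    return satisfying / 2 ** n_vars

# phi = (Q1 v Q2 v ~Q3) ^ (~Q1 v Q3 v Q2): satisfiable, so P(X=1) > 0.
print(prob_x_equals_1(3, [[1, 2, -3], [-1, 3, 2]]) > 0)  # True
```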

  13. Complexity of Bayesnet Inference • Easy to check • Polynomial number of variables • Each CPD can be described by a small table (at most 8 parameters) • P(X=1) > 0 if and only if there exists a satisfying assignment to Q1,...,Qn • Conclusion: a polynomial reduction from 3-SAT • Implications • We cannot find a general efficient procedure for all networks • We can find provably efficient procedures for particular families of networks • Exploit network structure and independencies • Dynamic programming

  14. Approximate Inference • Rather than computing the exact answer, compute an approximation to it • Approximation metrics for computing P(y|e) • Estimate p has absolute error ε if |P(y|e) − p| ≤ ε • Estimate p has relative error ε if p / (1+ε) ≤ P(y|e) ≤ p(1+ε) • Absolute error is not very useful in probability distributions, since probabilities are often small
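The two guarantees are easy to state as executable predicates; this is a small sketch, with p the estimate and q the true P(y|e), and the numbers below purely illustrative.

```python
def within_absolute_error(p, q, eps):
    # |P(y|e) - p| <= eps
    return abs(q - p) <= eps

def within_relative_error(p, q, eps):
    # p / (1 + eps) <= P(y|e) <= p * (1 + eps)
    return p / (1 + eps) <= q <= p * (1 + eps)

# Why absolute error is weak for small probabilities: the useless estimate
# p = 0 "achieves" absolute error 0.01 for a true value of 0.005, while
# failing any relative-error guarantee.
print(within_absolute_error(0.0, 0.005, 0.01))  # True
print(within_relative_error(0.0, 0.005, 0.01))  # False
```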

  15. Approximate Inference Complexity • Theorem: the following is NP-hard • Given a network G over n variables, a variable X and a value x ∈ Val(X), find a number p that has relative error ε(n) for the query P_G(X=x) • Proof: • Based on the hardness result for exact inference • We showed that deciding P_G(X=x) > 0 is hard • ⇒ An algorithm that returns an estimate p with bounded relative error would return p > 0 iff P_G(X=x) > 0 • ⇒ Approximate inference with relative error is as NP-hard as the original exact inference problem

  16. Approximate Inference Complexity • Theorem: the following is NP-hard for ε < 0.5 • Given a network G over n variables, a variable X, a value x ∈ Val(X), and an observation e ∈ Val(E) for variables E, find a number p that has absolute error ε for P_G(X=x | E=e) • Proof: • Consider the same network construction as above • Strategy of proof: show that given such an approximation, we can determine satisfiability in polynomial time

  17. Approximate Inference Complexity • Proof cont.: • Construction • Use the approximate algorithm to compute the query P(Q1 | X=1) • Assign Q1 the value q with the higher (estimated) posterior probability • Generate a new network without Q1, with correspondingly modified CPDs • Repeat this process for all Qi • Claim: the assignment generated by this process satisfies φ iff φ has a satisfying assignment • Proving the claim • Easy case: if φ has no satisfying assignment, then obviously the resulting assignment does not satisfy φ • Harder case: if φ has a satisfying assignment, we show it has one with Q1=q

  18. Approximate Inference Complexity • Proof cont.: • Proving the claim • Easy case: if φ has no satisfying assignment, then obviously the resulting assignment does not satisfy φ • Harder case: if φ has a satisfying assignment, we show it has one with Q1=q • If φ is satisfiable both with q and with ¬q, we are done • If φ is satisfiable only with one value v of Q1, then P(Q1=¬v | X=1) = 0 and P(Q1=v | X=1) = 1; since our approximation has absolute error ε < 0.5, its estimate for v exceeds its estimate for ¬v, so we necessarily choose q=v, which extends to a satisfying assignment • By induction on all Q variables, the assignment we find must satisfy φ • The construction process is polynomial

  19. Inference Complexity Summary • NP-hard • Exact inference • Approximate inference • with relative error • with absolute error ε < 0.5 (given evidence) • Hopeless? • No: we will see many network structures with provably efficient algorithms, and cases where approximate inference works efficiently with high accuracy

  20. Exact Inference: Variable Elimination • Inference in a simple chain X1 → X2 → X3 • Computing P(X2): P(X2) = Σ_{x1} P(x1) P(X2 | x1) • All the numbers in this computation appear in the CPDs of the original Bayesian network • O(|X1|·|X2|) operations

  21. Exact Inference: Variable Elimination • Inference in a simple chain X1 → X2 → X3 • Computing P(X2) as above • Computing P(X3): P(X3) = Σ_{x2} P(X3 | x2) P(x2) • P(X3 | X2) is a given CPD • P(X2) was computed above • O(|X1|·|X2| + |X2|·|X3|) operations in total

  22. Exact Inference: Variable Elimination • Inference in a general chain X1 → X2 → ... → Xn • Computing P(Xn): compute each P(Xi+1) from P(Xi) • k² operations for each step (assuming |Xi| = k) • O(nk²) operations for the whole inference • Compare to the kⁿ operations required to sum over all entries of the joint distribution over X1,...,Xn • Inference in a general chain can be done in linear time!
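A short sketch of this forward pass, assuming binary variables and hypothetical randomly generated CPTs; each step is one k-vector by k×k-matrix product, which gives the O(nk²) total from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(k=2):
    """A hypothetical CPT P(X_{i+1} | X_i): row a holds P(X_{i+1} | X_i = a)."""
    t = rng.random((k, k))
    return t / t.sum(axis=1, keepdims=True)  # rows sum to 1

n = 5
p = np.array([0.3, 0.7])                  # illustrative prior P(X1)
cpds = [random_cpt() for _ in range(n - 1)]

# Forward pass: P(X_{i+1}=b) = sum_a P(X_i=a) P(X_{i+1}=b | X_i=a),
# a vector-matrix product costing O(k^2) per edge, O(n k^2) overall.
for cpd in cpds:
    p = p @ cpd

print(p, p.sum())  # P(Xn); the entries sum to 1
```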

  23. Exact Inference: Variable Elimination • Chain X1 → X2 → X3 → X4 • P(X4) = Σ_{x3} P(X4 | x3) Σ_{x2} P(x3 | x2) Σ_{x1} P(x2 | x1) P(x1) • Pushing summations inward = dynamic programming

  24. Inference With a Loop • Network: X1 → X2, X1 → X3, X2 → X4, X3 → X4 • Computing P(X4): P(X4) = Σ_{x2} Σ_{x3} P(X4 | x2, x3) Σ_{x1} P(x1) P(x2 | x1) P(x3 | x1)

  25. Efficient Inference in Bayesnets • Properties that allow us to avoid exponential blowup in the joint distribution • Bayesian network structure – some subexpressions depend on a small number of variables • Computing these subexpressions and caching the results avoids generating them exponentially many times

  26. Variable Elimination • The inference algorithm is defined in terms of factors • A factor is a function from value assignments of a set of random variables D to the nonnegative reals ℝ⁺ • The set of variables D is the scope of the factor • Factors generalize the notion of CPDs • Thus, the algorithm we describe applies both to Bayesian networks and to Markov networks

  27. Variable Elimination: Factors • Let X, Y, Z be three disjoint sets of RVs, and let φ1(X,Y) and φ2(Y,Z) be two factors • We define the factor product φ1 × φ2 to be the factor ψ: Val(X,Y,Z) → ℝ given by ψ(X,Y,Z) = φ1(X,Y) · φ2(Y,Z)

  28. Variable Elimination: Factors • Let X be a set of RVs, Y ∉ X a RV, and φ(X,Y) a factor • We define the factor marginalization of Y in φ(X,Y) to be the factor ψ: Val(X) → ℝ given by ψ(X) = Σ_Y φ(X,Y) • Also called summing out • In a Bayesian network, summing out all variables yields 1 • In a Markov network, summing out all variables yields the partition function
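Here is a minimal sketch of table factors supporting the two operations from slides 27 and 28, assuming for simplicity that every variable is binary; a scope is an ordered tuple of variable names and a table maps assignment tuples to nonnegative reals.

```python
from itertools import product as all_assignments

class Factor:
    def __init__(self, scope, table):
        self.scope = tuple(scope)  # ordered variable names
        self.table = table         # dict: assignment tuple -> float

    def __mul__(self, other):
        # Factor product: psi(X,Y,Z) = phi1(X,Y) * phi2(Y,Z), multiplying
        # entries that agree on the shared variables Y.
        scope = self.scope + tuple(v for v in other.scope
                                   if v not in self.scope)
        table = {}
        for asg in all_assignments([0, 1], repeat=len(scope)):
            row = dict(zip(scope, asg))
            table[asg] = (self.table[tuple(row[v] for v in self.scope)] *
                          other.table[tuple(row[v] for v in other.scope)])
        return Factor(scope, table)

    def marginalize(self, var):
        # Summing out: psi(X) = sum_Y phi(X, Y).
        i = self.scope.index(var)
        scope = self.scope[:i] + self.scope[i + 1:]
        table = {}
        for asg, val in self.table.items():
            key = asg[:i] + asg[i + 1:]
            table[key] = table.get(key, 0.0) + val
        return Factor(scope, table)
```

For a Bayesian network, each CPD becomes one such factor; multiplying them all and summing out every variable returns 1, while for a Markov network the same computation returns the partition function.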

  29. Variable Elimination: Factors • Factor products are commutative • φ1 × φ2 = φ2 × φ1 • Summations commute: Σ_X Σ_Y φ(X,Y) = Σ_Y Σ_X φ(X,Y) • Products are associative • (φ1 × φ2) × φ3 = φ1 × (φ2 × φ3) • If X ∉ Scope[φ1] (we used this in the chain elimination above): Σ_X (φ1 × φ2) = φ1 × Σ_X φ2

  30. Inference in Chain by Factors • Chain X1 → X2 → X3 → X4: P(X4) = Σ_{x3} φ_{X4}(x3, X4) Σ_{x2} φ_{X3}(x2, x3) Σ_{x1} φ_{X1}(x1) φ_{X2}(x1, x2) • The scopes of φ_{X3} and φ_{X4} do not contain X1, so the sum over x1 applies only to φ_{X1} × φ_{X2} • The scope of φ_{X4} does not contain X2, so the sum over x2 can likewise be pushed in

  31. Sum-Product Inference • Let Y be the query RVs and Z be all other RVs • The general inference task is P(Y) = Σ_Z Π_{φ∈F} φ • Effective computation: since factor scopes are limited, “push in” some of the summations and perform them over the product of only a subset of the factors

  32. Sum-Product Variable Elimination • Algorithm: sum out the variables one at a time • When summing out a variable, multiply all the factors that mention it, generating a single product factor • Then sum the variable out of the combined factor, generating a new factor that no longer mentions it

  33. Sum-Product Variable Elimination • Theorem • Let X be a set of RVs • Let F be a set of factors such that Scope[φ] ⊆ X for each φ ∈ F • Let Y ⊆ X be a set of query RVs, and let Z = X − Y • ⇒ For any ordering of Z, the algorithm above returns a factor φ*(Y) such that φ*(Y) = Σ_Z Π_{φ∈F} φ • Instantiation for a Bayesian network query P_G(Y) • F consists of all CPDs in G • Each φ_{Xi} = P(Xi | Pa(Xi)) • Apply variable elimination to Z = U − Y
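Given the Factor sketch above, the algorithm of slide 32 is only a few lines; this is a sketch, not an optimized implementation, and it assumes the ordering covers exactly Z = X − Y.

```python
from functools import reduce

def variable_elimination(factors, ordering):
    """Sum-product VE: for each z in the ordering, multiply all factors
    mentioning z and sum z out of the product; finally return the product
    of the remaining factors, a factor over the query variables Y."""
    factors = list(factors)
    for z in ordering:
        touching = [f for f in factors if z in f.scope]
        if not touching:
            continue
        rest = [f for f in factors if z not in f.scope]
        factors = rest + [reduce(lambda a, b: a * b, touching).marginalize(z)]
    return reduce(lambda a, b: a * b, factors)

# Example: P(X2) on the chain X1 -> X2, with illustrative numbers.
phi1 = Factor(("X1",), {(0,): 0.3, (1,): 0.7})           # P(X1)
phi2 = Factor(("X1", "X2"), {(0, 0): 0.9, (0, 1): 0.1,
                             (1, 0): 0.2, (1, 1): 0.8})  # P(X2 | X1)
print(variable_elimination([phi1, phi2], ["X1"]).table)
# {(0,): 0.41, (1,): 0.59} up to float rounding
```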

  34. A More Complex Network • [Network: C → D; D, I → G; I → S; G → L; L, S → J; G, J → H] • Goal: P(J) • Eliminate: C, D, I, H, G, S, L

  35. A More Complex Network • Goal: P(J) • Eliminate: C, D, I, H, G, S, L • Compute (eliminating C): f1(D) = Σ_C P(C) P(D|C)

  36. A More Complex Network • Goal: P(J) • Eliminate: D, I, H, G, S, L • Compute (eliminating D): f2(G, I) = Σ_D P(G|I,D) f1(D)

  37. A More Complex Network • Goal: P(J) • Eliminate: I, H, G, S, L • Compute (eliminating I): f3(G, S) = Σ_I P(I) P(S|I) f2(G, I)

  38. A More Complex Network • Goal: P(J) • Eliminate: H, G, S, L • Compute (eliminating H): f4(G, J) = Σ_H P(H|G,J)

  39. A More Complex Network • Goal: P(J) • Eliminate: G, S, L • Compute (eliminating G): f5(J, L, S) = Σ_G P(L|G) f3(G, S) f4(G, J)

  40. A More Complex Network • Goal: P(J) • Eliminate: S, L • Compute (eliminating S): f6(J, L) = Σ_S P(J|L,S) f5(J, L, S)

  41. A More Complex Network • Goal: P(J) • Eliminate: L • Compute (eliminating L): f7(J) = Σ_L f6(J, L)
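The whole elimination above can be replayed with the Factor and variable_elimination sketches from earlier, using illustrative random binary CPDs (not the course's numbers) for the edge structure shown on slide 34.

```python
import itertools, random
random.seed(0)

def cpd(child, parents):
    """A random binary table factor standing in for P(child | parents)."""
    scope = parents + (child,)
    table = {}
    for pa in itertools.product([0, 1], repeat=len(parents)):
        p = random.random()
        table[pa + (0,)] = p
        table[pa + (1,)] = 1 - p
    return Factor(scope, table)

factors = [cpd("C", ()), cpd("D", ("C",)), cpd("I", ()),
           cpd("G", ("I", "D")), cpd("S", ("I",)), cpd("L", ("G",)),
           cpd("J", ("L", "S")), cpd("H", ("G", "J"))]

p_j = variable_elimination(factors, ["C", "D", "I", "H", "G", "S", "L"])
print(p_j.table)  # P(J): two entries summing to 1
```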

  42. A More Complex Network • Goal: P(J) • Alternative elimination ordering: G, I, S, L, H, C, D • Note: the intermediate factor is large: eliminating G first multiplies P(G|I,D), P(L|G), and P(H|G,J) together, producing f1(I, D, L, J, H)
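How much an ordering matters can be estimated without building any tables: track only the scopes and charge |Val(Xi)| per generated factor. A small sketch, with the scopes taken from the network above and an assumed v = 2 values per variable:

```python
def ve_cost(scopes, ordering, v=2):
    """Sum of |Val(X_i)| over the factors f_i generated by the ordering."""
    scopes = [set(s) for s in scopes]
    total = 0
    for z in ordering:
        touching = [s for s in scopes if z in s]
        if not touching:
            continue
        merged = set().union(*touching)  # scope of f_i, before summing out z
        total += v ** len(merged)
        scopes = [s for s in scopes if z not in s] + [merged - {z}]
    return total

cpd_scopes = [{"C"}, {"C", "D"}, {"I"}, {"I", "D", "G"}, {"I", "S"},
              {"G", "L"}, {"L", "S", "J"}, {"G", "J", "H"}]
print(ve_cost(cpd_scopes, ["C", "D", "I", "H", "G", "S", "L"]))  # 56
print(ve_cost(cpd_scopes, ["G", "I", "S", "L", "H", "C", "D"]))  # 192: G first is costly
```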

  43. Inference With Evidence • Let Y be the query RVs • Let E be the evidence RVs and e their assignment • Let Z be all other RVs (Z = U − Y − E) • The general inference task is P(Y | E=e) = P(Y, e) / P(e), where P(Y, e) = Σ_Z Π_{φ∈F} φ[E=e] (each factor reduced to the evidence)
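With the Factor sketch above, one simple way to condition on evidence is to zero out inconsistent rows before eliminating; this zeroing convention is one of several equivalent choices, not necessarily the one used in the course.

```python
def reduce_to_evidence(factor, evidence):
    """Zero the entries of a Factor that disagree with evidence, a dict
    mapping variable name -> observed value; the scope is unchanged."""
    idx = {v: factor.scope.index(v) for v in evidence if v in factor.scope}
    table = {asg: (val if all(asg[i] == evidence[v] for v, i in idx.items())
                   else 0.0)
             for asg, val in factor.table.items()}
    return Factor(factor.scope, table)
```

Running variable_elimination on the reduced factors yields an unnormalized factor proportional to P(Y, e); its sum over Y is the likelihood P(e), and dividing by that sum gives P(Y | E=e).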

  44. Inference With Evidence • Goal: P(J | H=h, I=i) • Eliminate: C, D, G, S, L • Compute f(J, H=h, I=i) by running variable elimination on the factors reduced to H=h and I=i, then renormalize over J

  45. Complexity of Variable Elimination • Variable elimination consists of • Generating the factors that will be summed out • Summing out • Generating the factor fi = φ1 × ... × φ_{ki} • Let Xi be the scope of fi • Each entry requires ki multiplications to generate • ⇒ Generating factor fi is O(ki · |Val(Xi)|) • Summing out • Addition operations: at most |Val(Xi)| • Per factor: O(kN), where N = max_i |Val(Xi)| and k = max_i ki

  46. Complexity of Variable Elimination • Start with n factors (n = number of variables) • Generate exactly one new factor at each iteration • ⇒ There are at most 2n factors • Generating factors • At most Σ_i |Val(Xi)| · ki ≤ N Σ_i ki ≤ N · 2n multiplications (since each factor is multiplied in exactly once and there are at most 2n factors) • Summing out • Σ_i |Val(Xi)| ≤ N · n additions (since we have n variables to sum out) • Total work is linear in N and n • The exponential blowup is hidden in Ni = |Val(Xi)|, which for factor i can be v^m if the factor has m variables with v values each

  47. VE as Graph Transformation • At each step we compute fi = φ1 × ... × φ_{ki} and sum a variable out of it • Draw an undirected graph with an edge X—Y whenever variables X and Y appear in the scope of the same factor • Note: this is the Markov network of the distribution over the variables that have not yet been eliminated

  48. VE as Graph Transformation • [Initial (moralized) graph over C, D, I, G, S, L, J, H] • Goal: P(J) • Eliminate: C, D, I, H, G, S, L

  49. VE as Graph Transformation • Goal: P(J) • Eliminate: C, D, I, H, G, S, L • Compute (eliminating C): f1(D) = Σ_C P(C) P(D|C)

  50. VE as Graph Transformation • Goal: P(J) • Eliminate: D, I, H, G, S, L • Compute (eliminating D): f2(G, I) = Σ_D P(G|I,D) f1(D) • [Graph after eliminating C: node C and its edges are removed]
