Exact Inference Eran Segal Weizmann Institute
Inference • Markov networks and Bayesian networks represent a joint probability distribution • The network contains all the information needed to answer any query about the distribution • Inference is the process of answering such queries • Edge direction does not restrict which variables can be queried • Inference combines evidence from all parts of the network
Likelihood Queries • Compute the probability (= likelihood) of the evidence • Evidence: a subset of variables E and an assignment e • Task: compute P(E=e) • Computation: P(E=e) = Σ_w P(W=w, E=e), where W = U − E (sum over all assignments to the non-evidence variables)
Conditional Probability Queries • Evidence: a subset of variables E and an assignment e • Query: a subset of variables Y • Task: compute P(Y | E=e) • Applications • Medical and fault diagnosis • Genetic inheritance • Computation: P(Y | E=e) = P(Y, E=e) / P(E=e), where both the numerator and the denominator are likelihood queries
Maximum A Posteriori Assignment • Maximum A Posteriori assignment (MAP) • Evidence: a subset of variables E and an assignment e • Query: a subset of variables Y • Task: compute MAP(Y|E=e) = argmax_y P(Y=y | E=e) • Note 1: there may be more than one possible solution • Note 2: equivalent to computing argmax_y P(Y=y, E=e), since P(Y=y | E=e) = P(Y=y, E=e) / P(E=e) and P(E=e) does not depend on y • Computation: maximize Σ_w P(Y=y, W=w, E=e) over y, where W = U − Y − E
Most Probable Assignment: MPE • Most Probable Explanation (MPE) • Evidence: a subset of variables E and an assignment e • Query: all other variables Y (Y = U − E) • Task: compute MPE(Y|E=e) = argmax_y P(Y=y | E=e) • Note: there may be more than one possible solution • Applications • Decoding messages: find the most likely transmitted bits • Diagnosis: find a single most likely consistent hypothesis
Most Probable Assignment: MPE • Note: we are searching for the most likely joint assignment to all variables • This may differ from the most likely assignment to each RV separately • Example: network A → B with CPDs P(A) and P(B|A), giving the joint P(a0, b0) = 0.04, P(a0, b1) = 0.36, P(a1, b0) = 0.3, P(a1, b1) = 0.3 • Marginally P(a1) = 0.6 > P(a0) = 0.4, so MAP(A) = a1 • Yet MPE(A,B) = {a0, b1}, since 0.36 is the largest joint entry
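A quick way to verify the numbers on this slide is to compute both queries by brute force. A minimal Python sketch (values taken from the slide): marginalize the joint for MAP(A), take the joint argmax for MPE(A,B).

```python
# Joint distribution from the slide: network A -> B with CPDs P(A), P(B|A).
joint = {
    ("a0", "b0"): 0.04,
    ("a0", "b1"): 0.36,
    ("a1", "b0"): 0.30,
    ("a1", "b1"): 0.30,
}

# MAP over A alone: marginalize out B, then take the argmax.
p_a = {}
for (a, b), p in joint.items():
    p_a[a] = p_a.get(a, 0.0) + p
map_a = max(p_a, key=p_a.get)    # a1, since P(a1) = 0.6 > P(a0) = 0.4

# MPE: argmax over the full joint assignment.
mpe = max(joint, key=joint.get)  # ("a0", "b1"), the largest joint entry (0.36)

print(map_a, mpe)                # a1 ('a0', 'b1')
```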
Exact Inference in Graphical Models • Graphical models can be used to answer • Conditional probability queries • MAP queries • MPE queries • Naïve approach • Generate the joint distribution • Depending on the query, compute the sum/max • Exponential blowup • Instead: exploit independencies for efficient inference
Complexity of Bayesnet Inference • Assume the encoding specifies the DAG structure • Assume CPDs are represented as table CPDs • Decision problem: given a network G, a variable X, and a value x ∈ Val(X), decide whether PG(X=x) > 0
Complexity of Bayesnet Inference • Theorem: the decision problem is NP-complete • Proof: • The decision problem is in NP: for an assignment e to all network variables, check whether X=x in e and P(e) > 0 • Reduction from 3-SAT • Binary-valued variables Q1,...,Qn • Clauses C1,...,Ck where Ci = Li,1 ∨ Li,2 ∨ Li,3 • Li,j for i=1,...,k and j=1,2,3 is a literal, which is Qi or ¬Qi • φ = C1 ∧ ... ∧ Ck • Decision problem: is there an assignment to Q1,...,Qn satisfying φ? • Construct a network such that P(X=1) > 0 iff φ is satisfiable
Complexity of Bayesnet Inference • Construction (network: Q1,...,Qn → C1,...,Ck → AND chain A1,...,Ak−2 → X) • P(Qi=1) = 0.5 • Each Ci is a deterministic OR of its literals: P(Ci=1 | Pa(Ci)) = 1 iff the assignment to Pa(Ci) satisfies clause Ci • The CPDs of A1,...,Ak−2,X form a deterministic AND over C1,...,Ck • Thus P(X=1 | q1,...,qn) = 1 iff q1,...,qn satisfies φ • P(X=1) > 0 iff there is a satisfying assignment
Complexity of Bayesnet Inference • Easy to check • Polynomial number of variables • Each CPD can be described by a small table (each node has at most 3 parents, so at most 8 parameters) • P(X=1) > 0 if and only if there exists a satisfying assignment to Q1,...,Qn • Conclusion: a polynomial reduction from 3-SAT • Implications • We cannot find a general efficient procedure for all networks • We can find provably efficient procedures for particular families of networks • Exploit network structure and independencies • Dynamic programming
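The reduction can be checked on a toy instance by brute-force marginalization (exponential, of course; the point of the construction is only that the network itself is polynomial in size). A minimal sketch with a hypothetical two-clause formula:

```python
from itertools import product

# Hypothetical 3-SAT instance over Q1..Q3; each clause lists (variable index,
# polarity) literals, e.g. (2, False) stands for the literal ~Q2.
clauses = [[(1, True), (2, False), (3, True)],
           [(1, False), (2, True), (3, False)]]
n = 3

def clause_satisfied(clause, q):
    # Deterministic OR node Ci: value 1 iff some literal is satisfied by q.
    return any(q[i] == pol for i, pol in clause)

# P(X=1) = sum over assignments q of P(q) * [all clause nodes are 1],
# with P(Qi=1) = 0.5, so P(q) = 2^-n for every assignment q.
p_x1 = 0.0
for bits in product([False, True], repeat=n):
    q = {i + 1: b for i, b in enumerate(bits)}
    x = all(clause_satisfied(c, q) for c in clauses)  # the deterministic AND chain
    p_x1 += (0.5 ** n) * x

print(p_x1 > 0)  # True iff the formula is satisfiable
```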
Approximate Inference • Rather than computing the exact answer, compute an approximation to it • Approximation metrics for computing P(y|e) • Estimate p has absolute error ε if |P(y|e) − p| ≤ ε • Estimate p has relative error ε if p/(1+ε) ≤ P(y|e) ≤ p(1+ε) • Absolute error is not very useful in probability distributions since probabilities are often small: for a rare event, the estimate p = 0 has tiny absolute error yet carries no information
Approximate Inference Complexity • Theorem: the following is NP-hard • Given a network G over n variables, a variable X, and a value x ∈ Val(X), find a number p that has relative error ε(n) for the query PG(X=x) • Proof: • Based on the hardness result for exact inference • We showed that deciding PG(X=x) > 0 is NP-complete • Any estimate p with finite relative error must satisfy p > 0 iff PG(X=x) > 0, so it answers the decision problem • Hence approximate inference with relative error is as NP-hard as exact inference
Approximate Inference Complexity • Theorem: the following is NP-hard for ε < 0.5 • Given a network G over n variables, a variable X, a value x ∈ Val(X), and an observation e ∈ Val(E) for variables E, find a number p that has absolute error ε for PG(X=x | E=e) • Proof: • Consider the same network construction as above • Strategy of proof: show that given such an approximation, we can determine satisfiability in polynomial time
Approximate Inference Complexity • Proof cont.: • Construction • Use the approximate algorithm to compute the query P(Q1 | X=1) • Assign Q1 the value q that has the higher estimated posterior probability • Generate a new network without Q1 and with modified CPDs • Repeat this process for all Qi • Claim: the assignment generated in this process satisfies φ iff φ has a satisfying assignment • Proving the claim • Easy case: if φ does not have a satisfying assignment, then obviously the resulting assignment will not satisfy φ • Harder case: if φ has a satisfying assignment, we show that it has a satisfying assignment with Q1=q
Approximate Inference Complexity • Proof cont.: • Proving the claim • Easy case: if φ does not have a satisfying assignment, then obviously the resulting assignment will not satisfy φ • Harder case: if φ has a satisfying assignment, we show that it has a satisfying assignment with Q1=q • If φ is satisfiable with both Q1=q and Q1=¬q then we are done • If φ is not satisfiable with Q1=v, then P(Q1=v | X=1) = 0 and hence P(Q1=¬v | X=1) = 1; since our approximation has absolute error ε < 0.5, its estimates force us to choose q = ¬v, which does have a satisfying assignment • By induction on all Q variables, the assignment we find must satisfy φ • The construction process is polynomial
Inference Complexity Summary • NP-hard • Exact inference • Approximate inference • with relative error ε • with absolute error ε < 0.5 (given evidence) • Hopeless? • No: we will see many network structures that have provably efficient algorithms, and cases where approximate inference works efficiently with high accuracy
Exact Inference: Variable Elimination • Inference in a simple chain X1 → X2 → X3 • Computing P(X2): P(X2) = Σ_x1 P(x1) P(X2 | x1) • All the numbers for this computation are in the CPDs of the original Bayesian network • O(|Val(X1)|·|Val(X2)|) operations
Exact Inference: Variable Elimination • Inference in a simple chain X1 → X2 → X3 • Computing P(X2) as above • Computing P(X3): P(X3) = Σ_x2 P(X3 | x2) P(x2) • P(X3|X2) is a given CPD • P(X2) was computed above • O(|Val(X1)|·|Val(X2)| + |Val(X2)|·|Val(X3)|) operations in total
Exact Inference: Variable Elimination • Inference in a general chain X1 → X2 → ... → Xn • Computing P(Xn) • Compute each P(Xi+1) from P(Xi): P(Xi+1) = Σ_xi P(Xi+1 | xi) P(xi) • k^2 operations for each step (assuming |Val(Xi)| = k) • O(nk^2) operations for the whole inference • Compare to the k^n operations required to sum over all entries of the joint distribution over X1,...,Xn • Inference in a general chain can be done in linear time!
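A minimal numpy sketch of this forward pass (function and variable names are illustrative): each step turns P(Xi) into P(Xi+1) with one k × k matrix-vector product, so the whole chain costs O(nk^2).

```python
import numpy as np

def chain_marginal(prior, cpds):
    """Forward pass along the chain X1 -> ... -> Xn.
    prior: P(X1) as a length-k vector; cpds[i]: P(X_{i+1} | X_i) as a
    k x k row-stochastic matrix (row = conditioning value of X_i)."""
    p = prior
    for cpd in cpds:      # P(X_{i+1}) = sum_{x_i} P(x_i) P(X_{i+1} | x_i)
        p = p @ cpd       # k^2 operations per step
    return p

# Illustrative random chain with k = 3 states and n = 5 variables.
rng = np.random.default_rng(0)
k, n = 3, 5
prior = np.full(k, 1.0 / k)
cpds = []
for _ in range(n - 1):
    m = rng.random((k, k))
    cpds.append(m / m.sum(axis=1, keepdims=True))  # normalize rows into CPDs

print(chain_marginal(prior, cpds))  # P(X5), vs. k^n = 243 joint entries
```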
Exact Inference: Variable Elimination • Chain X1 → X2 → X3 → X4 • P(X4) = Σ_x3 P(X4 | x3) Σ_x2 P(x3 | x2) Σ_x1 P(x2 | x1) P(x1) • Pushing summations inward = dynamic programming: each inner sum is computed once and its result reused
Inference With a Loop • Network with a loop: X1 → X2, X1 → X3, and X2, X3 → X4 • Computing P(X4): P(X4) = Σ_x2 Σ_x3 P(X4 | x2, x3) Σ_x1 P(x1) P(x2 | x1) P(x3 | x1)
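Assuming the pictured loop is the diamond X1 → X2, X1 → X3, (X2, X3) → X4, the same push-the-sums idea still applies. A small numpy sketch with illustrative random CPDs:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 2
p1 = np.array([0.3, 0.7])                    # P(X1)
p2 = rng.dirichlet(np.ones(k), size=k)       # P(X2|X1), one row per value of X1
p3 = rng.dirichlet(np.ones(k), size=k)       # P(X3|X1)
p4 = rng.dirichlet(np.ones(k), size=(k, k))  # P(X4|X2,X3), indexed [x2, x3, x4]

# Eliminate X1 first: tau(x2, x3) = sum_{x1} P(x1) P(x2|x1) P(x3|x1)
tau = np.einsum("a,ab,ac->bc", p1, p2, p3)
# Then eliminate X2, X3: P(x4) = sum_{x2, x3} tau(x2, x3) P(x4|x2, x3)
p_x4 = np.einsum("bc,bcd->d", tau, p4)

print(p_x4, p_x4.sum())  # a valid distribution over X4 (sums to 1)
```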
Efficient Inference in Bayesnets • Properties that allow us to avoid exponential blowup in the joint distribution • Bayesian network structure – some subexpressions depend on a small number of variables • Computing these subexpressions and caching the results avoids generating them exponentially many times
Variable Elimination • The inference algorithm is defined in terms of factors • A factor is a function from value assignments of a set of random variables D to the nonnegative reals ℝ+ • The set of variables D is the scope of the factor • Factors generalize the notion of CPDs • Thus, the algorithm we describe applies both to Bayesian networks and Markov networks
Variable Elimination: Factors • Let X, Y, Z be three disjoint sets of RVs, and let φ1(X,Y) and φ2(Y,Z) be two factors • We define the factor product φ1 × φ2 to be the factor ψ: Val(X,Y,Z) → ℝ+ with ψ(X,Y,Z) = φ1(X,Y) · φ2(Y,Z)
Variable Elimination: Factors • Let X be a set of RVs, Y ∉ X a RV, and φ(X,Y) a factor • We define the factor marginalization of Y in φ(X,Y) to be the factor ψ: Val(X) → ℝ+ with ψ(X) = Σ_Y φ(X,Y) • Also called summing out • In a Bayesian network, summing out all variables gives 1 • In a Markov network, summing out all variables gives the partition function
Variable Elimination: Factors • Factor products are commutative • φ1 × φ2 = φ2 × φ1 • Summations commute: Σ_X Σ_Y φ(X,Y) = Σ_Y Σ_X φ(X,Y) • Products are associative • (φ1 × φ2) × φ3 = φ1 × (φ2 × φ3) • If X ∉ Scope[φ1] (we used this in the chain elimination above) • Σ_X (φ1 × φ2) = φ1 × Σ_X φ2
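A minimal table-factor sketch of the two operations (the Factor class and helper names are illustrative, not from the slides): the product aligns two factors on their shared variables, and marginalization sums one variable out.

```python
from itertools import product

class Factor:
    """A table factor: maps assignments of `scope` (a tuple of variable
    names, each taking values 0..card-1) to nonnegative reals."""
    def __init__(self, scope, card, table):
        self.scope = tuple(scope)   # e.g. ("X", "Y")
        self.card = dict(card)      # variable -> number of values
        self.table = dict(table)    # assignment tuple -> value

def factor_product(f1, f2):
    # psi(X,Y,Z) = phi1(X,Y) * phi2(Y,Z): multiply entries that agree
    # on the shared variables Y.
    scope = f1.scope + tuple(v for v in f2.scope if v not in f1.scope)
    card = {**f1.card, **f2.card}
    table = {}
    for assign in product(*(range(card[v]) for v in scope)):
        a = dict(zip(scope, assign))
        v1 = f1.table[tuple(a[v] for v in f1.scope)]
        v2 = f2.table[tuple(a[v] for v in f2.scope)]
        table[assign] = v1 * v2
    return Factor(scope, card, table)

def marginalize(f, var):
    # psi(X) = sum_Y phi(X,Y): sum `var` out of the factor.
    scope = tuple(v for v in f.scope if v != var)
    table = {}
    for assign, val in f.table.items():
        key = tuple(x for v, x in zip(f.scope, assign) if v != var)
        table[key] = table.get(key, 0.0) + val
    return Factor(scope, {v: f.card[v] for v in scope}, table)

# Usage: phi1(X,Y) x phi2(Y,Z), then sum out Y -> a factor over (X, Z).
phi1 = Factor(("X", "Y"), {"X": 2, "Y": 2}, {(0, 0): 0.5, (0, 1): 0.8, (1, 0): 0.1, (1, 1): 0.0})
phi2 = Factor(("Y", "Z"), {"Y": 2, "Z": 2}, {(0, 0): 0.5, (0, 1): 0.7, (1, 0): 0.1, (1, 1): 0.2})
psi = marginalize(factor_product(phi1, phi2), "Y")
print(psi.scope, psi.table)
```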
Inference in a Chain by Factors • Chain X1 → X2 → X3 → X4 with factors φ1(X1), φ2(X1,X2), φ3(X2,X3), φ4(X3,X4) • P(X4) = Σ_x3 φ4(x3, X4) Σ_x2 φ3(x2, x3) Σ_x1 φ1(x1) φ2(x1, x2) • Σ_x1 can be pushed in because the scopes of φ3 and φ4 do not contain X1 • Σ_x2 can be pushed in because the scope of φ4 does not contain X2
Sum-Product Inference • Let Y be the query RVs and Z be all other RVs • The general inference task is to compute Σ_Z Π_{φ∈F} φ • Effective computation • Since the scope of each factor is limited, "push in" some of the summations and perform them over the product of only a subset of the factors
Sum-Product Variable Elimination • Algorithm: sum out the variables one at a time • When summing out a variable, multiply all the factors that mention it, generating a product factor • Then sum the variable out of the combined factor, generating a new factor that does not mention it
Sum-Product Variable Elimination • Theorem • Let X be a set of RVs • Let F be a set of factors such that Scope[φ] ⊆ X for each φ ∈ F • Let Y ⊂ X be a set of query RVs and Z = X − Y • For any ordering over Z, the above algorithm returns a factor φ*(Y) such that φ*(Y) = Σ_Z Π_{φ∈F} φ • Instantiation for a Bayesian network query PG(Y) • F consists of all CPDs in G: φXi = P(Xi | Pa(Xi)) for each variable Xi • Apply variable elimination to Z = U − Y
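Putting the pieces together, a sketch of the sum-product variable elimination loop; it reuses the hypothetical factor_product and marginalize helpers from the sketch above. For a Bayesian network query PG(Y), the factor pool holds one CPD factor per variable and the ordering covers U − Y.

```python
def sum_product_ve(factors, elim_order):
    """Sum-product variable elimination over a list of Factor objects."""
    factors = list(factors)
    for z in elim_order:
        # Multiply all factors whose scope mentions z ...
        touched = [f for f in factors if z in f.scope]
        if not touched:
            continue
        prod = touched[0]
        for f in touched[1:]:
            prod = factor_product(prod, f)
        # ... and sum z out of the combined factor.
        factors = [f for f in factors if z not in f.scope]
        factors.append(marginalize(prod, z))
    # Multiply what remains: a factor over the query RVs.
    result = factors[0]
    for f in factors[1:]:
        result = factor_product(result, f)
    return result
```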
A More Complex Network • Network over C, D, I, G, S, L, J, H (the extended Student network) • Goal: P(J) • Elimination ordering: C, D, I, H, G, S, L
A More Complex Network • Goal: P(J) • Eliminate: C, D, I, H, G, S, L • Compute (eliminating C): f1(D) = Σ_C P(C) P(D|C)
A More Complex Network • Goal: P(J) • Eliminate: D, I, H, G, S, L • Compute (eliminating D): f2(G,I) = Σ_D P(G|I,D) f1(D)
A More Complex Network • Goal: P(J) • Eliminate: I, H, G, S, L • Compute (eliminating I): f3(G,S) = Σ_I P(I) P(S|I) f2(G,I)
A More Complex Network • Goal: P(J) • Eliminate: H, G, S, L • Compute (eliminating H): f4(G,J) = Σ_H P(H|G,J)
A More Complex Network • Goal: P(J) • Eliminate: G, S, L • Compute (eliminating G): f5(J,L,S) = Σ_G f3(G,S) f4(G,J) P(L|G)
A More Complex Network • Goal: P(J) • Eliminate: S, L • Compute (eliminating S): f6(J,L) = Σ_S f5(J,L,S) P(J|L,S)
A More Complex Network • Goal: P(J) • Eliminate: L • Compute (eliminating L): f7(J) = Σ_L f6(J,L), and P(J) = f7(J)
A More Complex Network • Goal: P(J) • A different ordering: eliminate G, I, S, L, H, C, D • Note: the first intermediate factor is large: f1(I,D,L,J,H) = Σ_G P(G|I,D) P(L|G) P(H|G,J)
Inference With Evidence • Let Y be the query RVs • Let E be the evidence RVs and e their assignment • Let Z be all other RVs (Z = U − Y − E) • The general inference task is to compute P(Y | E=e) = P(Y, e) / P(e), where P(Y, e) = Σ_Z Π_{φ∈F} φ[E=e] (each factor restricted to the evidence) and P(e) = Σ_Y P(Y, e)
Inference With Evidence • Goal: P(J | H=h, I=i) • Restrict all factors to the evidence H=h, I=i, then eliminate C, D, G, S, L • This yields f(J, H=h, I=i) ∝ P(J, h, i); renormalizing over J gives P(J | H=h, I=i)
Complexity of Variable Elimination • Variable elimination consists of • Generating the factors that will be summed out • Summing out • Generating the factor ψi = φ1 × ... × φki • Let Xi be the scope of ψi • Each entry requires at most ki multiplications to generate • Generating factor ψi is O(ki |Val(Xi)|) • Summing out • Addition operations, at most |Val(Xi)| • Per factor: O(kN) where N = max_i |Val(Xi)|, k = max_i ki
Complexity of Variable Elimination • Start with n factors (n = number of variables) • Generate exactly one new factor at each iteration • so there are at most 2n factors in total • Generating factors • At most Σ_i ki |Val(Xi)| ≤ N Σ_i ki ≤ 2nN (since each factor is multiplied in exactly once and there are at most 2n factors) • Summing out • Σ_i |Val(Xi)| ≤ Nn (since we perform at most n summing-out steps) • Total work is linear in N and n • The exponential blowup hides in N: for a factor i over m variables with v values each, |Val(Xi)| = v^m
VE as Graph Transformation • At each step we are computing fi = Σ_Z Π_j φj over the factors that mention the eliminated variable Z • Draw a graph with an undirected edge X—Y whenever X and Y appear in the scope of the same factor • Note: this is the Markov network of the distribution over the variables that have not been eliminated yet
VE as Graph Transformation • Goal: P(J) • Eliminate: C, D, I, H, G, S, L • Initial graph over C, D, I, G, S, L, J, H: an undirected edge connects any two variables that share a CPD factor (the moralized network)
VE as Graph Transformation • Goal: P(J) • Eliminate: C, D, I, H, G, S, L • Compute (eliminating C): f1(D) = Σ_C P(C) P(D|C); C and its edges are removed from the graph
VE as Graph Transformation • Goal: P(J) • Eliminate: D, I, H, G, S, L • Compute (eliminating D): f2(G,I) = Σ_D P(G|I,D) f1(D); the new factor's scope {G, I} keeps G and I connected in the graph