Variable Elimination for Inference with Bayesian networks Lecture 33 Ch 6.4, 6.4.1 March 28, 2012
Announcements • Assignment 4 due a week from today • Only 2 late days allowed • You can already do Q1, Q2 and Q3 • You will be able to start looking at Q4 after today’s class, and do it after Friday’s class • Practice Exercises • Reminder: they are helpful for staying on top of the material, and for studying for the exam • Exercise 10 is on conditional independence. • Exercise 11 (NEW) is on Bayesian network construction and variable elimination. Helpful for Q3 and Q4
Lecture Overview • Recap Lecture 32 • Inference in Bayesian Networks: variable elimination (VE) • Intro to variable elimination: Factors • VE Algorithm • VE example
How to build a Bayesian network
• Define a total order over the random variables: $(X_1, \dots, X_n)$
• If we apply the chain rule, we have $P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1})$, where $X_1, \dots, X_{i-1}$ are the predecessors of $X_i$ in the total order defined over the variables
• Define as parents of random variable $X_i$ in the belief network a minimal set of its predecessors, $Parents(X_i)$, such that $P(X_i \mid X_1, \dots, X_{i-1}) = P(X_i \mid Parents(X_i))$; that is, $X_i$ is conditionally independent of all its other predecessors given $Parents(X_i)$
• Putting it all together, in a belief network $P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid Parents(X_i))$
• A belief network defines a factorization of the JPD over its variables, based on existing conditional independencies among these variables
Recap: network structure
• Some variable orderings yield more compact structures, some less compact ones
• Compact ones are better
• But all representations resulting from the construction process we discussed are correct
• One extreme: the fully connected network is always correct but rarely the best choice
• It is simply a representation of the chain rule with no simplifications for conditional independencies: $P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1})$
• How can a network structure be wrong?
• If it misses directed edges that are required
• E.g. an edge is missing below: Fire ╨ Alarm | {Tampering, Smoke}
[Figure: network over the variables Tampering, Fire, Alarm, Smoke, Leaving, Report]
Inference in Bayesian Networks
Given:
• A Bayesian network BN, and
• Observations of a subset of its variables E: E=e
• A subset of its variables Y that is queried
Compute: the conditional probability P(Y|E=e)
Types of inference (illustrated on the Fire, Alarm, Leaving network):
• Diagnostic: people are leaving (L=t); P(F|L=t)=?
• Predictive: fire happens (F=t); P(L|F=t)=?
• Intercausal: alarm goes off (P(a) = 1.0) and a person smokes next to the sensor (S=t); P(F|A=t,T=t)=?
• Mixed: people are leaving (L=t) and there is no fire (F=f); P(A|F=f,L=t)=?
Inference in Bayesian Networks Given: • A Bayesian Network BN, and • Observations of a subset of its variables E: E=e • A subset of its variables Y that is queried Compute: The conditional probability P(Y|E=e) • Remember? We can already do this • Compute any entry of the JPD given the CPTs in the network, • then do Inference by Enumeration • BUT that’s extremely inefficient - does not scale. • Variable Elimination (VE) is an algorithm to perform inference in Bayesian networks more efficiently
Inference
• Y: subset of variables that is queried
• E: subset of variables that are observed, E = e
• $Z_1, \dots, Z_k$: remaining variables in the JPD
• By the definition of conditional probability: $P(Y \mid E=e) = \dfrac{P(Y, E=e)}{P(E=e)}$
• We need to compute the numerator for each value $y_i$ of Y, marginalizing over all the variables $Z_1, \dots, Z_k$ not involved in the query: $P(Y=y_i, E=e) = \sum_{Z_1} \cdots \sum_{Z_k} P(Z_1, \dots, Z_k, Y=y_i, E=e)$
• To compute the denominator, marginalize over Y: $P(E=e) = \sum_{y} P(Y=y, E=e)$
• The denominator has the same value for every $P(Y=y_i \mid E=e)$: it is a normalization constant ensuring that $\sum_{y} P(Y=y \mid E=e) = 1$
• All we need to compute is the numerator: the joint probability of the query variable(s) and the evidence!
• Variable Elimination is an algorithm that efficiently performs this operation by casting it as operations between factors, introduced next
Factors
• A factor is a function from a tuple of random variables to the real numbers R
• We write a factor on variables $X_1, \dots, X_j$ as $f(X_1, \dots, X_j)$
• A factor denotes one or more (possibly partial) distributions over the given tuple of variables, e.g.,
• P(X1, X2) is a factor f(X1, X2): a distribution
• P(Z | X,Y) is a factor f(Z,X,Y): a set of distributions, one for each combination of values for X and Y
• P(Z=f | X,Y) is a factor f(X,Y): a set of partial distributions
• Note: factors do not have to sum to one
Operation 1: assigning a variable
• We can make new factors out of an existing factor
• Our first operation: we can assign some or all of the variables of a factor
• What is the result of assigning X = t? $f(X=t, Y, Z) = f(X, Y, Z)_{X=t}$, which is a factor of Y, Z
More examples of assignment
• f(X=t,Y,Z): a factor of Y, Z
• f(X=t,Y,Z=f): a factor of Y
• Assigning all three variables yields a number
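The assignment operation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a factor is represented as a pair (tuple of variable names, dict from value tuples to numbers), all variables are Boolean, and the helper names `make_factor` and `assign` as well as the numbers are hypothetical, not taken from the lecture.

```python
from itertools import product

def make_factor(variables, values):
    """Build a factor over Boolean variables (hypothetical helper).

    `values` lists the entries in the row order produced by
    itertools.product((True, False), repeat=len(variables)).
    """
    rows = product((True, False), repeat=len(variables))
    return tuple(variables), dict(zip(rows, values))

def assign(factor, var, value):
    """Restrict `factor` to var=value and drop var from its domain."""
    variables, table = factor
    i = variables.index(var)
    new_vars = variables[:i] + variables[i + 1:]
    new_table = {row[:i] + row[i + 1:]: p
                 for row, p in table.items() if row[i] == value}
    return new_vars, new_table

# Made-up numbers, chosen only for illustration.
f = make_factor(("X", "Y", "Z"), [0.1, 0.9, 0.2, 0.8, 0.4, 0.6, 0.3, 0.7])
f_yz = assign(f, "X", True)        # a factor of Y, Z
f_y = assign(f_yz, "Z", False)     # a factor of Y
number = assign(f_y, "Y", True)    # empty domain: a single number
```

Each assignment removes one variable from the factor's domain, so assigning every variable leaves a table with one entry.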
Recap
• If we assign variable A=a in factor f7(A,B), what is the correct form for the resulting factor? Options: f(A) / f(B) / f(A,B) / A number
Recap • If we assign variable A=a in factor f7(A,B), what is the correct form for the resulting factor? • f(B). When we assign variable A we remove it from the factor’s domain
Operation 2: Summing out a variable
• Our second operation on factors: we can marginalize out (or sum out) a variable
• Exactly as before. Only difference: factors don't sum to 1
• Marginalizing out a variable X from a factor f(X1, …, Xn) yields a new factor defined on {X1, …, Xn} \ {X}
• E.g., summing out B from f3(A,B,C) yields the factor $(\sum_B f_3)(A, C)$
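A minimal sketch of summing out, assuming the same hypothetical representation of a factor as a pair (tuple of variable names, dict from value tuples to numbers) over Boolean variables. The function name `sum_out` and the numbers in `f3` are illustrative assumptions, not values from the slides.

```python
from itertools import product

def sum_out(factor, var):
    """Marginalize `var` out of `factor`, given as (variables, table)."""
    variables, table = factor
    i = variables.index(var)
    new_vars = variables[:i] + variables[i + 1:]
    new_table = {}
    for row, value in table.items():
        key = row[:i] + row[i + 1:]      # the row with var's column dropped
        new_table[key] = new_table.get(key, 0.0) + value
    return new_vars, new_table

# Illustrative numbers only; rows are ordered over (A, B, C).
rows = product((True, False), repeat=3)
f3 = (("A", "B", "C"),
      dict(zip(rows, [0.03, 0.07, 0.54, 0.36, 0.06, 0.14, 0.48, 0.32])))

f_ac = sum_out(f3, "B")                  # a factor on (A, C)
```

Rows that agree on every variable except B are added together, so the result has half as many entries as `f3`.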
Recap
• If we assign variable A=a in factor f7(A,B), what is the correct form for the resulting factor? f(B)
• If we marginalize variable A out from factor f7(A,B), what is the correct form for the resulting factor? Options: f(A) / f(B) / f(A,B) / A number
Recap • If we assign variable A=a in factor f7(A,B), what is the correct form for the resulting factor? • f(B). When we assign variable A we remove it from the factor’s domain • If we marginalize variable A out from factor f7(A,B), what is the correct form for the resulting factor? • f(B). When we marginalize out variable A we remove it from the factor’s domain
Operation 3: multiplying factors
• The product of factors f1(A, B) and f2(B, C), where B is the variable in common, is the factor (f1 × f2)(A, B, C) defined by: $(f_1 \times f_2)(A, B, C) = f_1(A, B)\, f_2(B, C)$
• Note: A, B and C can be sets of variables; the domain of f1 × f2 is $A \cup B \cup C$
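The product operation can be sketched as follows, again assuming a hypothetical representation of a factor as (tuple of variable names, dict from value tuples to numbers) with Boolean variables only. The name `multiply` and the numbers are assumptions made for illustration.

```python
from itertools import product

def multiply(f1, f2):
    """Pointwise product of two factors over Boolean variables."""
    vars1, t1 = f1
    vars2, t2 = f2
    # The domain of the product is the union of the two domains.
    out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    out = {}
    for row in product((True, False), repeat=len(out_vars)):
        env = dict(zip(out_vars, row))
        r1 = tuple(env[v] for v in vars1)   # matching row of f1
        r2 = tuple(env[v] for v in vars2)   # matching row of f2
        out[row] = t1[r1] * t2[r2]
    return out_vars, out

# Illustrative numbers only.
f1 = (("A", "B"), {(True, True): 0.1, (True, False): 0.9,
                   (False, True): 0.2, (False, False): 0.8})
f2 = (("B", "C"), {(True, True): 0.3, (True, False): 0.7,
                   (False, True): 0.6, (False, False): 0.4})

f12 = multiply(f1, f2)                      # a factor on (A, B, C)
```

Each entry of the result multiplies the two entries that agree on the shared variable B.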
Recap
• If we assign variable A=a in factor f7(A,B), what is the correct form for the resulting factor? f(B)
• If we marginalize variable A out from factor f7(A,B), what is the correct form for the resulting factor? f(B)
• If we multiply factors f4(X,Y) and f6(Z,Y), what is the correct form for the resulting factor? Options: f(X,Z) / f(X) / f(X,Y) / f(X,Y,Z)
Recap
• If we assign variable A=a in factor f7(A,B), what is the correct form for the resulting factor?
• f(B). When we assign variable A we remove it from the factor's domain
• If we marginalize variable A out from factor f7(A,B), what is the correct form for the resulting factor?
• f(B). When we marginalize out variable A we remove it from the factor's domain
• If we multiply factors f4(X,Y) and f6(Z,Y), what is the correct form for the resulting factor?
• f(X,Y,Z). When multiplying factors, the resulting factor's domain is the union of the multiplicands' domains
Recap
• If we assign variable A=a in factor f7(A,B), what is the correct form for the resulting factor? f(B)
• If we marginalize variable A out from factor f7(A,B), what is the correct form for the resulting factor? f(B)
• If we multiply factors f4(X,Y) and f6(Z,Y), what is the correct form for the resulting factor? f(X,Y,Z)
• What is the correct form for $\sum_B f_5(A,B) \times f_6(B,C)$? Options: f(A,B,C) / f(B) / f(A,C) / f(B,C)
• As usual, product before sum: $\sum_B \left( f_5(A,B) \times f_6(B,C) \right)$
Recap: Factors and Operations on Them
• If we assign variable A=a in factor f7(A,B), what is the correct form for the resulting factor?
• f(B). When we assign variable A we remove it from the factor's domain
• If we marginalize variable A out from factor f7(A,B), what is the correct form for the resulting factor?
• f(B). When we marginalize out variable A we remove it from the factor's domain
• If we multiply factors f4(X,Y) and f6(Z,Y), what is the correct form for the resulting factor?
• f(X,Y,Z). When multiplying factors, the resulting factor's domain is the union of the multiplicands' domains
• What is the correct form for $\sum_B f_5(A,B) \times f_6(B,C)$?
• As usual, product before sum: $\sum_B \left( f_5(A,B) \times f_6(B,C) \right)$
• Result of multiplication: f(A,B,C). Then marginalize out B: f'(A,C)
Remember our goal
• Y: subset of variables that is queried
• E: subset of variables that are observed, E = e
• $Z_1, \dots, Z_k$: remaining variables in the JPD
• By the definition of conditional probability: $P(Y \mid E=e) = \dfrac{P(Y, E=e)}{P(E=e)}$
• We need to compute the numerator for each value $y_i$ of Y, marginalizing over all the variables $Z_1, \dots, Z_k$ not involved in the query: $P(Y=y_i, E=e) = \sum_{Z_1} \cdots \sum_{Z_k} P(Z_1, \dots, Z_k, Y=y_i, E=e)$
• To compute the denominator, marginalize over Y: $P(E=e) = \sum_{y} P(Y=y, E=e)$
• The denominator has the same value for every $P(Y=y_i \mid E=e)$: it is a normalization constant ensuring that $\sum_{y} P(Y=y \mid E=e) = 1$
• All we need to compute is the numerator: the joint probability of the query variable(s) and the evidence!
• Variable Elimination is an algorithm that efficiently performs this operation by casting it as operations between factors
Variable Elimination: Intro (1)
• We can express the joint probability as a factor f(Y, E1, …, Ej, Z1, …, Zk), where E1, …, Ej are the observed variables and Z1, …, Zk are the other variables not involved in the query
• We can compute P(Y, E1=e1, …, Ej=ej) by
• Assigning E1=e1, …, Ej=ej
• Marginalizing out variables Z1, …, Zk, one at a time
• The order in which we do this is called our elimination ordering
• Are we done? No, this still represents the whole JPD (as a single factor)! We need to exploit the compactness of Bayesian networks
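Variable Elimination: Intro (2)
• Recall the JPD of a Bayesian network: $P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid Parents(X_i))$, where each $P(X_i \mid Parents(X_i))$ is a factor $f_i$ on $X_i$ and its parents
• We can therefore express the joint factor as a product of factors, one for each conditional probability: $P(Y, E=e) = \sum_{Z_k} \cdots \sum_{Z_1} \left( \prod_{i=1}^{n} f_i \right)_{E=e}$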
Computing sums of products
• Inference in Bayesian networks thus reduces to computing sums of products
• Example: it takes 9 multiplications to evaluate the expression ab + ac + ad + aeh + afh + agh
• How can this expression be evaluated efficiently?
• Factor out the a and then the h, giving a(b + c + d + h(e + f + g))
• This takes only 2 multiplications (same number of additions as above)
• Similarly, how can we compute $\sum_{Z_k} \prod_{i=1}^{n} f_i$ efficiently?
• Factor out those terms that don't involve $Z_k$: the factors that do not contain $Z_k$ can be moved outside the sum, and only the product of the factors that contain $Z_k$ has to be summed over $Z_k$
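As a quick sanity check of the arithmetic example, the snippet below evaluates both forms of the expression with arbitrary integer values (the values themselves are assumptions picked for illustration) and confirms that they agree.

```python
# Arbitrary test values, chosen only for illustration.
a, b, c, d, e, f, g, h = 2, 3, 5, 7, 11, 13, 17, 19

# Expanded form: 3 + 2 + 2 + 2 = 9 multiplications, 5 additions.
naive = a*b + a*c + a*d + a*e*h + a*f*h + a*g*h

# Factored form: 2 multiplications, 5 additions.
factored = a * (b + c + d + h * (e + f + g))
```

Both expressions compute the same number; only the amount of work differs.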
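Summing out a variable efficiently
• To sum out a variable Z from a product f1 × … × fk of factors, partition the factors into
• Those that do not contain Z, say f1, …, fi
• Those that contain Z, say fi+1, …, fk
• We know that $\sum_Z f_1 \times \cdots \times f_k = f_1 \times \cdots \times f_i \times \left( \sum_Z f_{i+1} \times \cdots \times f_k \right)$
• We thus have $\sum_Z f_1 \times \cdots \times f_k = f_1 \times \cdots \times f_i \times f'$, where f' is the new factor obtained by multiplying fi+1, …, fk and then summing out Z
• Now we have summed out Z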
Decompose sum of products
• General case: first separate the factors that do not contain Z1 from the factors that contain Z1; then, among the factors that do not contain Z1, separate the factors that contain Z2 from the factors that contain neither Z2 nor Z1
• Etc.: continue given a predefined simplification ordering of the variables, the variable elimination ordering
The variable elimination algorithm
To compute P(Y=yi | E = e):
1. Construct a factor for each conditional probability.
2. For each factor, assign the observed variables E to their observed values.
3. Given an elimination ordering, decompose the sum of products.
4. Sum out all variables Zi not involved in the query.
5. Multiply the remaining factors (which only involve Y).
6. Normalize by dividing the resulting factor f(Y) by $\sum_{y} f(y)$.
See the algorithm VE_BN in the P&M text, Section 6.4.1, Figure 6.8, p. 254.
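The six steps can be put together in a compact sketch. This is not the VE_BN pseudocode from the P&M text; it is a simplified illustration that assumes Boolean variables and a hypothetical representation of a factor as (tuple of variable names, dict from value tuples to numbers). The three-variable chain A → B → C and its numbers are made up for the example.

```python
from itertools import product

def assign(factor, var, value):
    variables, table = factor
    i = variables.index(var)
    return (variables[:i] + variables[i + 1:],
            {r[:i] + r[i + 1:]: p for r, p in table.items() if r[i] == value})

def sum_out(factor, var):
    variables, table = factor
    i = variables.index(var)
    out = {}
    for r, p in table.items():
        key = r[:i] + r[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return variables[:i] + variables[i + 1:], out

def multiply(f1, f2):
    (v1, t1), (v2, t2) = f1, f2
    out_vars = v1 + tuple(v for v in v2 if v not in v1)
    out = {}
    for row in product((True, False), repeat=len(out_vars)):
        env = dict(zip(out_vars, row))
        out[row] = t1[tuple(env[v] for v in v1)] * t2[tuple(env[v] for v in v2)]
    return out_vars, out

def variable_elimination(factors, evidence, order):
    """Return the normalized factor on the query variable(s)."""
    # Step 2: assign observed variables in every factor that mentions them.
    for var, value in evidence.items():
        factors = [assign(f, var, value) if var in f[0] else f for f in factors]
    # Steps 3-4: for each Z, multiply only the factors containing Z, sum Z out.
    for z in order:
        with_z = [f for f in factors if z in f[0]]
        if not with_z:
            continue
        prod = with_z[0]
        for f in with_z[1:]:
            prod = multiply(prod, f)
        factors = [f for f in factors if z not in f[0]] + [sum_out(prod, z)]
    # Step 5: multiply the remaining factors.
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    # Step 6: normalize.
    total = sum(result[1].values())
    return result[0], {row: p / total for row, p in result[1].items()}

# Step 1: one factor per conditional probability (made-up chain A -> B -> C).
f0 = (("A",), {(True,): 0.3, (False,): 0.7})                    # P(A)
f1 = (("B", "A"), {(True, True): 0.8, (True, False): 0.1,
                   (False, True): 0.2, (False, False): 0.9})    # P(B|A)
f2 = (("C", "B"), {(True, True): 0.9, (True, False): 0.2,
                   (False, True): 0.1, (False, False): 0.8})    # P(C|B)

query_vars, posterior = variable_elimination([f0, f1, f2], {"C": True}, ["B"])
```

Here `posterior` holds P(A | C=true); the only variable to eliminate is B, so the elimination ordering is trivial in this small case.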
Variable elimination example
Compute P(G|H=h1).
$P(G,H) = \sum_{A,B,C,D,E,F,I} P(A,B,C,D,E,F,G,H,I)$
$= \sum_{A,B,C,D,E,F,I} P(A)\,P(B|A)\,P(C)\,P(D|B,C)\,P(E|C)\,P(F|D)\,P(G|F,E)\,P(H|G)\,P(I|G)$
Step 1: Construct a factor for each conditional probability
Compute P(G|H=h1).
$P(G,H) = \sum_{A,B,C,D,E,F,I} P(A)\,P(B|A)\,P(C)\,P(D|B,C)\,P(E|C)\,P(F|D)\,P(G|F,E)\,P(H|G)\,P(I|G)$
$P(G,H) = \sum_{A,B,C,D,E,F,I} f_0(A)\, f_1(B,A)\, f_2(C)\, f_3(D,B,C)\, f_4(E,C)\, f_5(F,D)\, f_6(G,F,E)\, f_7(H,G)\, f_8(I,G)$
Factors so far: f0(A), f1(B,A), f2(C), f3(D,B,C), f4(E,C), f5(F,D), f6(G,F,E), f7(H,G), f8(I,G)
Step 2: Assign to observed variables their observed values
Compute P(G|H=h1).
Previous state: $P(G,H) = \sum_{A,B,C,D,E,F,I} f_0(A)\, f_1(B,A)\, f_2(C)\, f_3(D,B,C)\, f_4(E,C)\, f_5(F,D)\, f_6(G,F,E)\, f_7(H,G)\, f_8(I,G)$
Observe H=h1, so that f7(H,G) becomes f9(G): $P(G,H=h_1) = \sum_{A,B,C,D,E,F,I} f_0(A)\, f_1(B,A)\, f_2(C)\, f_3(D,B,C)\, f_4(E,C)\, f_5(F,D)\, f_6(G,F,E)\, f_9(G)\, f_8(I,G)$
Factors so far: f0(A), f1(B,A), f2(C), f3(D,B,C), f4(E,C), f5(F,D), f6(G,F,E), f7(H,G), f8(I,G), f9(G)
Step 3: Decompose sum of products
Compute P(G|H=h1).
Previous state: $P(G,H=h_1) = \sum_{A,B,C,D,E,F,I} f_0(A)\, f_1(B,A)\, f_2(C)\, f_3(D,B,C)\, f_4(E,C)\, f_5(F,D)\, f_6(G,F,E)\, f_9(G)\, f_8(I,G)$
Elimination ordering A, C, E, I, B, D, F:
$P(G,H=h_1) = f_9(G) \sum_F \sum_D f_5(F,D) \sum_B \sum_I f_8(I,G) \sum_E f_6(G,F,E) \sum_C f_2(C)\, f_3(D,B,C)\, f_4(E,C) \sum_A f_0(A)\, f_1(B,A)$
Factors so far: f0(A), f1(B,A), f2(C), f3(D,B,C), f4(E,C), f5(F,D), f6(G,F,E), f7(H,G), f8(I,G), f9(G)
Step 4: Sum out non-query variables (one at a time)
Compute P(G|H=h1). Elimination order: A, C, E, I, B, D, F
Previous state: $P(G,H=h_1) = f_9(G) \sum_F \sum_D f_5(F,D) \sum_B \sum_I f_8(I,G) \sum_E f_6(G,F,E) \sum_C f_2(C)\, f_3(D,B,C)\, f_4(E,C) \sum_A f_0(A)\, f_1(B,A)$
Eliminate A: perform the product and sum out A, giving $f_{10}(B) = \sum_A f_0(A)\, f_1(B,A)$:
$P(G,H=h_1) = f_9(G) \sum_F \sum_D f_5(F,D) \sum_B f_{10}(B) \sum_I f_8(I,G) \sum_E f_6(G,F,E) \sum_C f_2(C)\, f_3(D,B,C)\, f_4(E,C)$
• f10(B) does not depend on C, E, or I, so we can push it outside of those sums
Factors so far: f0(A), f1(B,A), f2(C), f3(D,B,C), f4(E,C), f5(F,D), f6(G,F,E), f7(H,G), f8(I,G), f9(G), f10(B)
Step 4: Sum out non-query variables (one at a time)
Compute P(G|H=h1). Elimination order: A, C, E, I, B, D, F
Previous state: $P(G,H=h_1) = f_9(G) \sum_F \sum_D f_5(F,D) \sum_B f_{10}(B) \sum_I f_8(I,G) \sum_E f_6(G,F,E) \sum_C f_2(C)\, f_3(D,B,C)\, f_4(E,C)$
Eliminate C: perform the product and sum out C, giving $f_{11}(B,D,E) = \sum_C f_2(C)\, f_3(D,B,C)\, f_4(E,C)$:
$P(G,H=h_1) = f_9(G) \sum_F \sum_D f_5(F,D) \sum_B f_{10}(B) \sum_I f_8(I,G) \sum_E f_6(G,F,E)\, f_{11}(B,D,E)$
Factors so far: f0(A), f1(B,A), f2(C), f3(D,B,C), f4(E,C), f5(F,D), f6(G,F,E), f7(H,G), f8(I,G), f9(G), f10(B), f11(B,D,E)
Step 4: Sum out non-query variables (one at a time)
Compute P(G|H=h1). Elimination order: A, C, E, I, B, D, F
Previous state: $P(G,H=h_1) = f_9(G) \sum_F \sum_D f_5(F,D) \sum_B f_{10}(B) \sum_I f_8(I,G) \sum_E f_6(G,F,E)\, f_{11}(B,D,E)$
Eliminate E: perform the product and sum out E, giving $f_{12}(B,D,F,G) = \sum_E f_6(G,F,E)\, f_{11}(B,D,E)$:
$P(G,H=h_1) = f_9(G) \sum_F \sum_D f_5(F,D) \sum_B f_{10}(B)\, f_{12}(B,D,F,G) \sum_I f_8(I,G)$
Factors so far: f0(A), f1(B,A), f2(C), f3(D,B,C), f4(E,C), f5(F,D), f6(G,F,E), f7(H,G), f8(I,G), f9(G), f10(B), f11(B,D,E), f12(B,D,F,G)
Step 4: Sum out non-query variables (one at a time)
Compute P(G|H=h1). Elimination order: A, C, E, I, B, D, F
Previous state: $P(G,H=h_1) = f_9(G) \sum_F \sum_D f_5(F,D) \sum_B f_{10}(B)\, f_{12}(B,D,F,G) \sum_I f_8(I,G)$
Eliminate I: perform the product and sum out I, giving $f_{13}(G) = \sum_I f_8(I,G)$:
$P(G,H=h_1) = f_9(G)\, f_{13}(G) \sum_F \sum_D f_5(F,D) \sum_B f_{10}(B)\, f_{12}(B,D,F,G)$
Factors so far: f0(A), f1(B,A), f2(C), f3(D,B,C), f4(E,C), f5(F,D), f6(G,F,E), f7(H,G), f8(I,G), f9(G), f10(B), f11(B,D,E), f12(B,D,F,G), f13(G)
Step 4: Sum out non-query variables (one at a time)
Compute P(G|H=h1). Elimination order: A, C, E, I, B, D, F
Previous state: $P(G,H=h_1) = f_9(G)\, f_{13}(G) \sum_F \sum_D f_5(F,D) \sum_B f_{10}(B)\, f_{12}(B,D,F,G)$
Eliminate B: perform the product and sum out B, giving $f_{14}(D,F,G) = \sum_B f_{10}(B)\, f_{12}(B,D,F,G)$:
$P(G,H=h_1) = f_9(G)\, f_{13}(G) \sum_F \sum_D f_5(F,D)\, f_{14}(D,F,G)$
Factors so far: f0(A), f1(B,A), f2(C), f3(D,B,C), f4(E,C), f5(F,D), f6(G,F,E), f7(H,G), f8(I,G), f9(G), f10(B), f11(B,D,E), f12(B,D,F,G), f13(G), f14(D,F,G)
Step 4: Sum out non-query variables (one at a time)
Compute P(G|H=h1). Elimination order: A, C, E, I, B, D, F
Previous state: $P(G,H=h_1) = f_9(G)\, f_{13}(G) \sum_F \sum_D f_5(F,D)\, f_{14}(D,F,G)$
Eliminate D: perform the product and sum out D, giving $f_{15}(F,G) = \sum_D f_5(F,D)\, f_{14}(D,F,G)$:
$P(G,H=h_1) = f_9(G)\, f_{13}(G) \sum_F f_{15}(F,G)$
Factors so far: f0(A), f1(B,A), f2(C), f3(D,B,C), f4(E,C), f5(F,D), f6(G,F,E), f7(H,G), f8(I,G), f9(G), f10(B), f11(B,D,E), f12(B,D,F,G), f13(G), f14(D,F,G), f15(F,G)
Step 4: Sum out non-query variables (one at a time)
Compute P(G|H=h1). Elimination order: A, C, E, I, B, D, F
Previous state: $P(G,H=h_1) = f_9(G)\, f_{13}(G) \sum_F f_{15}(F,G)$
Eliminate F: sum out F, giving $f_{16}(G) = \sum_F f_{15}(F,G)$:
$P(G,H=h_1) = f_9(G)\, f_{13}(G)\, f_{16}(G)$
Factors so far: f0(A), f1(B,A), f2(C), f3(D,B,C), f4(E,C), f5(F,D), f6(G,F,E), f7(H,G), f8(I,G), f9(G), f10(B), f11(B,D,E), f12(B,D,F,G), f13(G), f14(D,F,G), f15(F,G), f16(G)
Step 5: Multiply remaining factors
Compute P(G|H=h1). Elimination order: A, C, E, I, B, D, F
Previous state: $P(G,H=h_1) = f_9(G)\, f_{13}(G)\, f_{16}(G)$
Multiply the remaining factors (all in G): $P(G,H=h_1) = f_{17}(G)$
Factors so far: f0(A), f1(B,A), f2(C), f3(D,B,C), f4(E,C), f5(F,D), f6(G,F,E), f7(H,G), f8(I,G), f9(G), f10(B), f11(B,D,E), f12(B,D,F,G), f13(G), f14(D,F,G), f15(F,G), f16(G), f17(G)
Step 6: Normalize
Compute P(G|H=h1).
$P(G=g \mid H=h_1) = \dfrac{P(G=g, H=h_1)}{\sum_{g'} P(G=g', H=h_1)} = \dfrac{f_{17}(g)}{\sum_{g'} f_{17}(g')}$
Factors so far: f0(A), f1(B,A), f2(C), f3(D,B,C), f4(E,C), f5(F,D), f6(G,F,E), f7(H,G), f8(I,G), f9(G), f10(B), f11(B,D,E), f12(B,D,F,G), f13(G), f14(D,F,G), f15(F,G), f16(G), f17(G)
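The domains of the intermediate factors in this example can be checked mechanically without any numbers. The sketch below tracks only which variables each factor mentions; the function name `eliminate_scopes` and the encoding of a domain as a string of one-letter variable names are assumptions made for illustration.

```python
def eliminate_scopes(scopes, order):
    """Track factor domains (as frozensets) through variable elimination."""
    scopes = [frozenset(s) for s in scopes]
    created = []
    for z in order:
        with_z = [s for s in scopes if z in s]
        # New factor: union of the domains that mention z, minus z itself.
        new_scope = frozenset().union(*with_z) - {z}
        scopes = [s for s in scopes if z not in s] + [new_scope]
        created.append(new_scope)
    return created, scopes

# Domains after observing H=h1: f0, f1, f2, f3, f4, f5, f6, f9, f8.
scopes = ["A", "BA", "C", "DBC", "EC", "FD", "GFE", "G", "IG"]
created, remaining = eliminate_scopes(scopes, "ACEIBDF")

# Domains of f10, ..., f16 in creation order, as sorted strings.
trace = ["".join(sorted(s)) for s in created]
```

With this ordering the largest intermediate factor is f12(B,D,F,G), on four variables; a different ordering could produce larger or smaller intermediate factors.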
Learning Goals For Today’s Class
• Variable elimination
• Understanding factors and their operations
• Carry out variable elimination by using factors and the related operations
• Practice Exercises
• Reminder: they are helpful for staying on top of the material, and for studying for the exam
• Exercise 10 is on conditional independence
• Exercise 11 is on variable elimination
• Assignment 4 is due in one week
• You should now be able to solve questions 1, 2, 3 and start looking at question 4
Variable elimination ordering
• E.g., suppose we have $P(G,D=t) = \sum_{A,B,C} f(A,G)\, f(B,A)\, f(C,G)\, f(B,C)$
• Is there only one way to simplify? No, any order of the relevant variables can be chosen
• If we choose order C, B, A, we get $P(G,D=t) = \sum_A f(A,G) \sum_B f(B,A) \sum_C f(C,G)\, f(B,C)$
• If we choose order B, C, A, we get $P(G,D=t) = \sum_A f(A,G) \sum_C f(C,G) \sum_B f(B,A)\, f(B,C)$