Inference in Bayesian Networks
Agenda
• Efficient inference in Bayesian Networks
• Reading off independence relationships
• Variable elimination
• Monte-Carlo methods
BN from Last Lecture
• [Figure: Burglary, Earthquake (causes) → Alarm → JohnCalls, MaryCalls (effects)]
• Intuitive meaning of an arc from X to Y: “X has direct influence on Y”
• Directed acyclic graph
BN from Last Lecture
• [Figure: the same alarm network, annotated with its CPTs]
• Size of the CPT for a node with k parents: 2^k
• 10 probabilities, instead of 31
Top-Down inference
• Suppose we want to compute P(Alarm)
• P(A) = Σ_{b,e} P(A,b,e)
• P(A) = Σ_{b,e} P(A|b,e) P(b) P(e)
• P(A) = P(A|B,E)P(B)P(E) + P(A|B,¬E)P(B)P(¬E) + P(A|¬B,E)P(¬B)P(E) + P(A|¬B,¬E)P(¬B)P(¬E)
• P(A) = 0.95*0.001*0.002 + 0.94*0.001*0.998 + 0.29*0.999*0.002 + 0.001*0.999*0.998 = 0.00252
Top-Down inference
• Now, suppose we want to compute P(MaryCalls)
• P(M) = P(M|A)P(A) + P(M|¬A)P(¬A)
• P(M) = 0.70*0.00252 + 0.01*(1-0.00252) = 0.0117
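A minimal sketch of these two computations in Python, using the CPT values from the slides:

```python
# Top-down inference in the alarm network: P(Alarm), then P(MaryCalls).
P_B = 0.001                       # P(Burglary)
P_E = 0.002                       # P(Earthquake)
P_A_given = {                     # P(Alarm=1 | B, E)
    (1, 1): 0.95, (1, 0): 0.94,
    (0, 1): 0.29, (0, 0): 0.001,
}
P_M_given_A = {1: 0.70, 0: 0.01}  # P(MaryCalls=1 | Alarm)

# P(A) = Σ_{b,e} P(A|b,e) P(b) P(e)
P_A = sum(P_A_given[(b, e)]
          * (P_B if b else 1 - P_B)
          * (P_E if e else 1 - P_E)
          for b in (0, 1) for e in (0, 1))
print(round(P_A, 5))              # 0.00252

# P(M) = P(M|A)P(A) + P(M|¬A)P(¬A)
P_M = P_M_given_A[1] * P_A + P_M_given_A[0] * (1 - P_A)
print(round(P_M, 4))              # 0.0117
```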
Querying the BN
• [Figure: Cavity → Toothache]
• The BN gives P(T|C)
• What about P(C|T)?
Bayes’ Rule
• P(A,B) = P(A|B) P(B) = P(B|A) P(A)
• So… P(A|B) = P(B|A) P(A) / P(B)
• A convenient way to manipulate probability equations
Applying Bayes’ Rule
• Let A be a cause, B be an effect, and let’s say we know P(B|A) and P(A) (conditional probability tables)
• What’s P(B)?
• P(B) = Σ_a P(B,A=a) [marginalization]
• P(B,A=a) = P(B|A=a) P(A=a) [conditional probability]
• So, P(B) = Σ_a P(B|A=a) P(A=a)
Applying Bayes’ Rule
• Same setup: we know P(B|A) and P(A). What’s P(A|B)?
• P(A|B) = P(B|A) P(A) / P(B) [Bayes’ rule]
• P(B) = Σ_a P(B|A=a) P(A=a) [last slide]
• So, P(A|B) = P(B|A) P(A) / [Σ_a P(B|A=a) P(A=a)]
How do we read this?
• P(A|B) = P(B|A) P(A) / [Σ_a P(B|A=a) P(A=a)]
• [An equation that holds for all values A can take on, and all values B can take on]
• Instantiated naively: P(A=a|B=b) = P(B=b|A=a) P(A=a) / [Σ_a P(B=b|A=a) P(A=a)] — are these the same a? NO!
• Correct form: P(A=a|B=b) = P(B=b|A=a) P(A=a) / [Σ_{a′} P(B=b|A=a′) P(A=a′)]
• Be careful about indices!
Querying the BN
• [Figure: Cavity → Toothache]
• The BN gives P(T|C)
• What about P(C|T)?
• P(Cavity|Toothache) = P(Toothache|Cavity) P(Cavity) / P(Toothache) [Bayes’ rule]
• Querying a BN is just applying Bayes’ rule on a larger scale…
• The denominator is computed by summing out the numerator over Cavity and ¬Cavity
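To make the pattern concrete, here is a tiny sketch of this query. The slides give no numbers for the Cavity network, so the CPT values below (0.1, 0.8, 0.05) are illustrative placeholders:

```python
# Bayes' rule query P(Cavity | Toothache) from P(Toothache | Cavity), P(Cavity).
P_C = 0.1                        # assumed P(Cavity) -- placeholder value
P_T_given_C = {1: 0.8, 0: 0.05}  # assumed P(Toothache=1 | Cavity) -- placeholder

# Denominator: sum out the numerator over Cavity and ¬Cavity.
P_T = sum(P_T_given_C[c] * (P_C if c else 1 - P_C) for c in (0, 1))

P_C_given_T = P_T_given_C[1] * P_C / P_T
print(round(P_C_given_T, 3))     # 0.64 with these placeholder numbers
```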
Arcs do not necessarily encode causality!
• [Figure: two three-node networks over A, B, C with their arcs oriented in opposite directions]
• Two BNs that encode the same joint probability distribution
Reading off independence relationships
• [Figure: chain A → B → C]
• Given B, does the value of A affect the probability of C?
• P(C|B,A) = P(C|B)?
• No!
• C’s parent (B) is given, so C is independent of its non-descendants (A)
• Independence is symmetric: C ⊥ A | B ⇒ A ⊥ C | B
What does the BN encode?
• [Figure: alarm network]
• Burglary ⊥ Earthquake
• JohnCalls ⊥ MaryCalls | Alarm
• JohnCalls ⊥ Burglary | Alarm
• JohnCalls ⊥ Earthquake | Alarm
• MaryCalls ⊥ Burglary | Alarm
• MaryCalls ⊥ Earthquake | Alarm
• A node is independent of its non-descendants, given its parents
Reading off independence relationships
• How about Burglary ⊥ Earthquake | Alarm?
• No! Why?
• P(B,E|A) = P(A|B,E) P(B) P(E) / P(A) ≈ 0.00075
• P(B|A) P(E|A) ≈ 0.086
• The two differ, so Burglary and Earthquake are dependent given Alarm
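A short sketch that reproduces these two numbers from the CPTs used earlier:

```python
# Check that B and E are dependent given A: P(B,E|A) ≠ P(B|A) P(E|A).
from itertools import product

P_B, P_E = 0.001, 0.002
P_A_given = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}

def joint(b, e, a):
    """P(B=b, E=e, A=a) via the chain rule of the network."""
    pa = P_A_given[(b, e)]
    return ((P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
            * (pa if a else 1 - pa))

P_A = sum(joint(b, e, 1) for b, e in product((0, 1), repeat=2))
P_BE_given_A = joint(1, 1, 1) / P_A
P_B_given_A = sum(joint(1, e, 1) for e in (0, 1)) / P_A
P_E_given_A = sum(joint(b, 1, 1) for b in (0, 1)) / P_A
print(P_BE_given_A)                # ≈ 0.00075
print(P_B_given_A * P_E_given_A)   # ≈ 0.086 — not the same!
```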
Reading off independence relationships
• How about Burglary ⊥ Earthquake | JohnCalls?
• No! Why?
• Knowing JohnCalls affects the probability of Alarm, which makes Burglary and Earthquake dependent
Independence relationships
• Rough intuition (this holds for tree-like graphs, polytrees):
• Evidence on the (directed) road between two variables makes them independent
• Evidence on an “A” node (a common ancestor) makes its descendants independent
• Evidence on a “V” node (a common descendant), or below the V, makes the ancestors dependent (otherwise they are independent)
• Formal property in the general case: d-separation ⇒ independence (see R&N)
Performing Inference
• Variables X
• Have evidence set E=e, query variable Q
• Want to compute the posterior probability distribution over Q, given E=e
• Let the non-evidence variables be Y (= X \ E)
• Straightforward method:
• 1. Compute the joint P(Y, E=e)
• 2. Marginalize to get P(Q, E=e)
• 3. Divide by P(E=e) to get P(Q | E=e)
Inference in the Alarm Example
• P(J|MaryCalls) = ?? (evidence E=e: MaryCalls; query Q: JohnCalls)
• P(x1,…,xn) = Π_{i=1..n} P(xi | parents(Xi)) [full joint distribution]
• 1. P(J,A,B,E,MaryCalls) = P(J|A) P(MaryCalls|A) P(A|B,E) P(B) P(E) — a table of 2^4 entries
• 2. P(J,MaryCalls) = Σ_{a,b,e} P(J,A=a,B=b,E=e,MaryCalls) — 2 entries: one for JohnCalls, the other for ¬JohnCalls
• 3. P(J|MaryCalls) = P(J,MaryCalls) / P(MaryCalls) = P(J,MaryCalls) / (Σ_j P(j,MaryCalls))
How expensive?
• P(X) = P(x1,…,xn) = Π_{i=1..n} P(xi | parents(Xi))
• Straightforward method:
• 1. Use the above to compute P(Y, E=e)
• 2. P(Q, E=e) = Σ_{y1} … Σ_{yk} P(Y, E=e), summing out the non-query, non-evidence variables
• 3. P(E=e) = Σ_q P(Q=q, E=e) — a normalization factor, no big deal once we have P(Q, E=e)
• Step 1 builds a table with O(2^{n−|E|}) entries!
• Can we do better?
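A minimal sketch of this straightforward method on the alarm example, computing P(JohnCalls | MaryCalls=1). The JohnCalls CPT values (0.90, 0.05) are not shown on these slides; they are the standard R&N alarm-network figures, assumed here:

```python
# Inference by enumerating the full joint: exponential in the number of
# non-evidence variables, exactly as the cost analysis above says.
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}
P_J = {1: 0.90, 0: 0.05}   # P(JohnCalls=1 | Alarm) -- assumed R&N values
P_M = {1: 0.70, 0: 0.01}   # P(MaryCalls=1 | Alarm)

def bern(p, x):
    return p if x else 1 - p

def joint(b, e, a, j, m):
    """P(b,e,a,j,m) = P(b) P(e) P(a|b,e) P(j|a) P(m|a)."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

# P(J=1, M=1): sum out the 2^3 assignments of the hidden variables B, E, A.
num = sum(joint(b, e, a, 1, 1) for b, e, a in product((0, 1), repeat=3))
# P(M=1): additionally sum out J.
den = sum(joint(b, e, a, j, 1) for b, e, a, j in product((0, 1), repeat=4))
print(num / den)           # P(JohnCalls=1 | MaryCalls=1) ≈ 0.18
```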
Variable Elimination
• Consider the linear network X1 → X2 → X3
• P(X) = P(X1) P(X2|X1) P(X3|X2)
• P(X3) = Σ_{x1} Σ_{x2} P(x1) P(x2|x1) P(X3|x2)
• Rearrange: = Σ_{x2} P(X3|x2) Σ_{x1} P(x1) P(x2|x1) = Σ_{x2} P(X3|x2) P(x2)
• The inner sum P(x2) is computed once for each value of X2, then cached and reused for both values of X3
• How many * and + saved? *: 2*4*2 = 16 vs 4+4 = 8; +: 2*3 = 6 vs 2+2 = 4
• Can lead to huge gains in larger networks
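A sketch of this rearrangement in code. The chain’s CPT values below are illustrative assumptions, not from the slides:

```python
# Variable elimination on the chain X1 -> X2 -> X3: eliminate X1 first,
# cache the resulting factor P(X2), then reuse it for every value of X3.
P_X1 = {1: 0.6, 0: 0.4}                       # assumed values
P_X2_given_X1 = {(1, 1): 0.7, (0, 1): 0.3,    # keys: (x2, x1)
                 (1, 0): 0.2, (0, 0): 0.8}
P_X3_given_X2 = {(1, 1): 0.9, (0, 1): 0.1,    # keys: (x3, x2)
                 (1, 0): 0.5, (0, 0): 0.5}

# P(x2) = Σ_{x1} P(x1) P(x2|x1) -- computed once per value of X2.
P_X2 = {x2: sum(P_X1[x1] * P_X2_given_X1[(x2, x1)] for x1 in (0, 1))
        for x2 in (0, 1)}

# P(X3) = Σ_{x2} P(X3|x2) P(x2) -- the cached factor is reused here.
P_X3 = {x3: sum(P_X3_given_X2[(x3, x2)] * P_X2[x2] for x2 in (0, 1))
        for x3 in (0, 1)}
print(P_X3)   # with these numbers: P(X3=1) = 0.7, P(X3=0) = 0.3
```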
VE in Alarm Example
• P(E|j,m) = P(E,j,m) / P(j,m)
• P(E,j,m) = Σ_a Σ_b P(E) P(b) P(a|E,b) P(j|a) P(m|a)
• = P(E) Σ_b P(b) Σ_a P(a|E,b) P(j|a) P(m|a)
• = P(E) Σ_b P(b) P(j,m|E,b) [inner sum computed for all values of E, b]
• = P(E) P(j,m|E) [computed for all values of E]
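The same elimination order as a sketch in code, reusing the alarm CPTs (with the JohnCalls values again assumed from R&N):

```python
# VE for P(E | j=1, m=1): eliminate A innermost, then B, then normalize.
P_Bp, P_Ep = 0.001, 0.002           # priors P(Burglary), P(Earthquake)
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}
P_J = {1: 0.90, 0: 0.05}            # assumed R&N values
P_M = {1: 0.70, 0: 0.01}

def bern(p, x):
    return p if x else 1 - p

# f(e,b) = Σ_a P(a|e,b) P(j=1|a) P(m=1|a)   -- one entry per (e,b)
f = {(e, b): sum(bern(P_A[(b, e)], a) * P_J[a] * P_M[a] for a in (0, 1))
     for e in (0, 1) for b in (0, 1)}
# g(e) = Σ_b P(b) f(e,b)                    -- one entry per e
g = {e: sum(bern(P_Bp, b) * f[(e, b)] for b in (0, 1)) for e in (0, 1)}
# Unnormalized P(E=e, j, m) = P(e) g(e); normalize over e.
unnorm = {e: bern(P_Ep, e) * g[e] for e in (0, 1)}
Z = sum(unnorm.values())
print(unnorm[1] / Z)   # P(Earthquake=1 | j=1, m=1) ≈ 0.18
```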
What order to perform VE?
• For tree-like BNs (polytrees), order so parents come before children
• Each intermediate probability table then has at most 2^(# of parents of a node) entries
• If the number of parents of each node is bounded, VE runs in linear time!
• Other networks: intermediate factors may become large
Non-polytree networks
• [Figure: diamond network A → B, A → C, B → D, C → D]
• P(D) = Σ_a Σ_b Σ_c P(a) P(b|a) P(c|a) P(D|b,c) = Σ_b Σ_c P(D|b,c) Σ_a P(a) P(b|a) P(c|a)
• No more simplifications…
Approximate Inference Techniques • Based on the idea of Monte Carlo simulation • Basic idea: • To estimate the probability of a coin flipping heads, I can flip it a huge number of times and count the fraction of heads observed • Conditional simulation: • To estimate the probability P(H) that a coin picked out of bucket B flips heads, I can: • Pick a coin C out of B (occurs with probability P(C)) • Flip C and observe whether it flips heads (occurs with probability P(H|C)) • Put C back and repeat from step 1 many times • Return the fraction of heads observed (estimate of P(H))
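A sketch of the bucket-of-coins simulation described above, with an assumed two-coin bucket:

```python
# Conditional simulation: estimate P(H) by repeatedly drawing a coin
# from the bucket and flipping it once.
import random

coins = [0.5, 0.9]          # assumed heads-probabilities of the coins in B
n, heads = 100_000, 0
for _ in range(n):
    c = random.choice(coins)        # pick a coin C out of B: P(C)
    heads += random.random() < c    # flip C, observe heads: P(H|C)
print(heads / n)                    # estimate of P(H); here ≈ 0.7
```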
Approximate Inference: Monte-Carlo Simulation
• [Figure: alarm network]
• Sample from the joint distribution, e.g. (B=0, E=0, A=0, J=1, M=0)
Approximate Inference: Monte-Carlo Simulation
• As more samples are generated, the distribution of the samples approaches the joint distribution!
• Samples so far: (B=0, E=0, A=0, J=1, M=0), (B=0, E=0, A=0, J=0, M=0), (B=0, E=0, A=0, J=0, M=0), (B=1, E=0, A=1, J=1, M=0)
Approximate Inference: Monte-Carlo Simulation
• Inference: given evidence E=e (e.g., J=1)
• Remove the samples that conflict with the evidence
• Kept: (B=0, E=0, A=0, J=1, M=0), (B=1, E=0, A=1, J=1, M=0); discarded: the two samples with J=0
• The distribution of the remaining samples approximates the conditional distribution!
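A sketch of this sample-and-reject scheme (rejection sampling) on the alarm network, estimating P(Burglary | JohnCalls=1); the JohnCalls CPT values are again the assumed R&N figures:

```python
# Rejection sampling: sample from the joint, keep only samples consistent
# with the evidence J=1, and read the query off the survivors.
import random

P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}
P_J = {1: 0.90, 0: 0.05}            # assumed R&N values
P_M = {1: 0.70, 0: 0.01}

def sample_joint():
    """Sample (b,e,a,j,m) parents-first, exactly as the BN factorizes."""
    b = random.random() < P_B
    e = random.random() < P_E
    a = random.random() < P_A[(b, e)]
    j = random.random() < P_J[a]
    m = random.random() < P_M[a]
    return b, e, a, j, m

kept = hits = 0
for _ in range(1_000_000):
    b, e, a, j, m = sample_joint()
    if j:                 # discard samples that conflict with J=1
        kept += 1
        hits += b         # tally the query variable, Burglary
print(hits / kept)        # estimate of P(B=1 | J=1) ≈ 0.016
```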
How many samples?
• The error of the estimate, for N samples, is on average O(1/√N)
• Variance-reduction techniques can reduce the number of samples needed
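A quick empirical check of the 1/√N scaling (estimating a fair coin’s heads probability; the average error should shrink by about √10 per tenfold increase in samples):

```python
# Average absolute error of a Monte-Carlo estimate of P(heads)=0.5,
# over 200 trials per sample size: expect roughly a 1/sqrt(N) decay.
import random
import statistics

for n in (100, 1_000, 10_000):
    errs = [abs(sum(random.random() < 0.5 for _ in range(n)) / n - 0.5)
            for _ in range(200)]
    print(n, round(statistics.mean(errs), 4))   # ≈ 0.04, 0.013, 0.004
```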