Inference in Bayesian Networks

Inference in Bayesian Networks

Agenda • Reading off independence assumptions • Efficient inference in Bayesian Networks • Top-down inference • Variable elimination • Monte-Carlo methods

Some Applications of BN • Medical diagnosis • Troubleshooting of hardware/software systems • Fraud/uncollectible debt detection • Data mining • Analysis of genetic sequences • Data interpretation, computer vision, image understanding

Battery Gas Radio SparkPlugs Starts Moves More Complicated Singly-Connected Belief Net

Region = {Sky, Tree, Grass, Rock} R1 Above R2 R4 R3

BN to evaluate insurance risks

Burglary Earthquake causes Alarm effects JohnCalls MaryCalls BN from Last Lecture Intuitive meaning of arc from x to y: “x has direct influence on y” Directed acyclic graph

Arcs do not necessarily encode causality! A C B B C A 2 BN’s that can encode the same joint probability distribution

Reading off independence relationships • Given B, does the value of A affect the probability of C? • P(C|B,A) = P(C|B)? • No! • C parent’s (B) are given, and so it is independent of its non-descendents (A) • Independence is symmetric:C  A | B => A  C | B A B C

Burglary Earthquake Alarm JohnCalls MaryCalls What does the BN encode? Burglary  Earthquake JohnCallsMaryCalls | Alarm JohnCalls Burglary | Alarm JohnCalls Earthquake | Alarm MaryCalls Burglary | Alarm MaryCalls Earthquake | Alarm A node is independent of its non-descendents, given its parents

Burglary Earthquake Alarm JohnCalls MaryCalls Reading off independence relationships • How about Burglary Earthquake | Alarm ? • No! Why?

Burglary Earthquake Alarm JohnCalls MaryCalls Reading off independence relationships • How about Burglary  Earthquake | Alarm ? • No! Why? • P(BE|A) = P(A|B,E)P(BE)/P(A) = 0.00075 • P(B|A)P(E|A) = 0.086

Burglary Earthquake Alarm JohnCalls MaryCalls Reading off independence relationships • How about Burglary  Earthquake | JohnCalls? • No! Why? • Knowing JohnCalls affects the probability of Alarm, which makes Burglary and Earthquake dependent

Independence relationships • Rough intuition (this holds for tree-like graphs, polytrees): • Evidence on the (directed) road between two variables makes them independent • Evidence on an “A” node makes descendants independent • Evidence on a “V” node, or below the V, makes the ancestors of the variables dependent (otherwise they are independent) • Formal property in general case : D-separation  independence (see R&N)

Benefits of Sparse Models • Modeling • Fewer relationships need to be encoded (either through understanding or statistics) • Large networks can be built up from smaller ones • Intuition • Dependencies/independencies between variables can be inferred through network structures • Tractable inference

Burglary Earthquake Alarm JohnCalls MaryCalls Top-Down inference Suppose we want to compute P(Alarm)

Burglary Earthquake Alarm JohnCalls MaryCalls Top-Down inference Suppose we want to compute P(Alarm) P(Alarm) = Σb,eP(A,b,e) P(Alarm) = Σb,e P(A|b,e)P(b)P(e)

Burglary Earthquake Alarm JohnCalls MaryCalls Top-Down inference • Suppose we want to compute P(Alarm) • P(Alarm) = Σb,eP(A,b,e) • P(Alarm) = Σb,e P(A|b,e)P(b)P(e) • P(Alarm) = P(A|B,E)P(B)P(E) + P(A|B, E)P(B)P(E) + P(A|B,E)P(B)P(E) +P(A|B,E)P(B)P(E)

Burglary Earthquake Alarm JohnCalls MaryCalls Top-Down inference • Suppose we want to compute P(Alarm) • P(A) = Σb,eP(A,b,e) • P(A) = Σb,e P(A|b,e)P(b)P(e) • P(A) = P(A|B,E)P(B)P(E) + P(A|B, E)P(B)P(E) + P(A|B,E)P(B)P(E) +P(A|B,E)P(B)P(E) • P(A) = 0.95*0.001*0.002 + 0.94*0.001*0.998 + 0.29*0.999*0.002 + 0.001*0.999*0.998 = 0.00252

Burglary Earthquake Alarm JohnCalls MaryCalls Top-Down inference Now, suppose we want to compute P(MaryCalls)

Burglary Earthquake Alarm JohnCalls MaryCalls Top-Down inference Now, suppose we want to compute P(MaryCalls) P(M) = P(M|A)P(A) + P(M|A) P(A)

Burglary Earthquake Alarm JohnCalls MaryCalls Top-Down inference Now, suppose we want to compute P(MaryCalls) P(M) = P(M|A)P(A) + P(M|A) P(A) P(M) = 0.70*0.00252 + 0.01*(1-0.0252) = 0.0117

Burglary Earthquake Alarm JohnCalls MaryCalls Top-Down inference with Evidence Suppose we want to compute P(Alarm|Earthquake)

Burglary Earthquake Alarm JohnCalls MaryCalls Top-Down inference with Evidence • Suppose we want to compute P(A|e) • P(A|e) = Σb P(A,b|e) • P(A|e) = Σb P(A|b,e)P(b) • P(A|e) = 0.95*0.001 +0.29*0.999 + = 0.29066

Top-Down inference • Only works if the graph of ancestors of a variable is a polytree • Evidence given on ancestor(s) of the query variable • Efficient: • O(d 2k) time, where d is the number of ancestors of a variable, with k a bound on # of parents • Evidence on an ancestor cuts off influence of portion of graph above evidence node

Cavity Toothache Querying the BN • The BN gives P(T|C) • What about P(C|T)?

Bayes’ Rule • P(AB) = P(A|B) P(B) = P(B|A) P(A) • So… P(A|B) = P(B|A) P(A) / P(B)

Applying Bayes’ Rule • Let A be a cause, B be an effect, and let’s say we know P(B|A) and P(A) (conditional probability tables) • What’s P(B)?

Applying Bayes’ Rule • Let A be a cause, B be an effect, and let’s say we know P(B|A) and P(A) (conditional probability tables) • What’s P(B)? • P(B) = Sa P(B,A=a) [marginalization] • P(B,A=a) = P(B|A=a)P(A=a) [conditional probability] • So, P(B) = SaP(B | A=a) P(A=a)

Applying Bayes’ Rule • Let A be a cause, B be an effect, and let’s say we know P(B|A) and P(A) (conditional probability tables) • What’s P(A|B)?

How do we read this? • P(A|B) = P(B|A)P(A) / [SaP(B | A=a) P(A=a)] • [An equation that holds for all values A can take on, and all values B can take on] • P(A=a|B=b) =

Cavity Toothache Querying the BN • The BN gives P(T|C) • What about P(C|T)? • P(Cavity|Toothache) = P(Toothache|Cavity) P(Cavity) P(Toothache)[Bayes’ rule] • Querying a BN is just applying Bayes’ rule on a larger scale… Denominator computed by summing out numerator over Cavity and Cavity

Performing Inference • Variables X • Have evidence set E=e, query variable Q • Want to compute the posterior probability distribution over Q, given E=e • Let the non-evidence variables be Y (= X \ E) • Straight forward method: • Compute joint P(YE=e) • Marginalize to get P(Q,E=e) • Divide by P(E=e) to get P(Q|E=e)

Burglary Earthquake Alarm JohnCalls MaryCalls Inference in the Alarm Example P(J|M) = ?? Evidence E=e Query Q

Burglary Earthquake Alarm JohnCalls MaryCalls Inference in the Alarm Example P(J|MaryCalls) = ?? 2 entries:one for JohnCalls,the other for JohnCalls 1. P(J,A,B,E,MaryCalls) =P(J|A)P(MaryCalls|A)P(A|B,E)P(B)P(E) 2. P(J,MaryCalls) =Sa,b,e P(J,A=a,B=b,E=e,MaryCalls)

Burglary Earthquake Alarm JohnCalls MaryCalls Inference in the Alarm Example P(J|MaryCalls) = ?? 1. P(J,A,B,E,MaryCalls) =P(J|A)P(MaryCalls|A)P(A|B,E)P(B)P(E) 2. P(J,MaryCalls) =Sa,b,e P(J,A=a,B=b,E=e,MaryCalls) 3. P(J|MaryCalls) = P(J,MaryCalls)/P(MaryCalls) = P(J,MaryCalls)/(SjP(j,MaryCalls))

How expensive? • P(X) = P(x1x2…xn) = Pi=1,…,n P(xi|parents(Xi)) Straightforward method: • Use above to compute P(Y,E=e) • P(Q,E=e) = Sy1 … Syk P(Y,E=e) • P(E=e) = Sq P(Q,E=e) • Step 1: O( 2n-|E| ) entries! Normalization factor – no big deal once we have P(Q,E=e) Can we do better?

Variable Elimination • Consider linear network X1X2X3 • P(X) = P(X1) P(X2|X1) P(X3|X2) • P(X3) = Σx1Σx2 P(x1) P(x2|x1) P(X3|x2)

VE in Alarm Example • P(E|j,m)=P(E,j,m)/P(j,m) • P(E,j,m) = ΣaΣb P(E) P(b) P(a|E,b) P(j|a) P(m|a)

Inference in Bayesian Networks