Inference in Bayesian Networks
Agenda
• Reading off independence assumptions
• Efficient inference in Bayesian Networks
• Top-down inference
• Variable elimination
• Monte-Carlo methods
Some Applications of BN
• Medical diagnosis
• Troubleshooting of hardware/software systems
• Fraud/uncollectible debt detection
• Data mining
• Analysis of genetic sequences
• Data interpretation, computer vision, image understanding
More Complicated Singly-Connected Belief Net
[Figure: belief net over Battery, Gas, Radio, SparkPlugs, Starts, Moves]
Region = {Sky, Tree, Grass, Rock}
[Figure: image regions R1, R2, R3, R4 with an "Above" relation]
BN from Last Lecture
[Figure: Burglary and Earthquake (causes) → Alarm → JohnCalls and MaryCalls (effects)]
• Intuitive meaning of an arc from x to y: “x has direct influence on y”
• The graph is a directed acyclic graph
Arcs do not necessarily encode causality!
[Figure: two networks over A, B, C with different arc directions]
• 2 BNs that can encode the same joint probability distribution
Reading off independence relationships
[Figure: chain A → B → C]
• Given B, does the value of A affect the probability of C?
• No! P(C|B,A) = P(C|B)
• C's parent (B) is given, so C is independent of its non-descendants (A)
• Independence is symmetric: C ⊥ A | B ⇒ A ⊥ C | B
What does the BN encode?
• Burglary ⊥ Earthquake
• JohnCalls ⊥ MaryCalls | Alarm
• JohnCalls ⊥ Burglary | Alarm
• JohnCalls ⊥ Earthquake | Alarm
• MaryCalls ⊥ Burglary | Alarm
• MaryCalls ⊥ Earthquake | Alarm
• A node is independent of its non-descendants, given its parents
Reading off independence relationships
• How about Burglary ⊥ Earthquake | Alarm?
• No! Why?
• P(B,E|A) = P(A|B,E) P(B,E) / P(A) = 0.00075
• P(B|A) P(E|A) = 0.086
• The two values differ, so Burglary and Earthquake are dependent given Alarm
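
The comparison above can be checked numerically. The following is a minimal sketch, assuming the CPT entries that appear in the slides' own arithmetic (0.001, 0.002, 0.95, 0.94, 0.29, 0.001); the variable and function names are illustrative only.

```python
# CPT entries as they appear in the slides' arithmetic.
P_B = {True: 0.001, False: 0.999}   # P(Burglary)
P_E = {True: 0.002, False: 0.998}   # P(Earthquake)
P_A_GIVEN_BE = {                    # P(Alarm=true | Burglary, Earthquake)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}

def joint_with_alarm(b, e):
    """P(Burglary=b, Earthquake=e, Alarm=true)."""
    return P_A_GIVEN_BE[(b, e)] * P_B[b] * P_E[e]

values = (True, False)
p_a = sum(joint_with_alarm(b, e) for b in values for e in values)

# Left-hand side: P(B, E | A)
p_be_given_a = joint_with_alarm(True, True) / p_a

# Right-hand side: P(B | A) * P(E | A)
p_b_given_a = sum(joint_with_alarm(True, e) for e in values) / p_a
p_e_given_a = sum(joint_with_alarm(b, True) for b in values) / p_a
product = p_b_given_a * p_e_given_a

print(f"P(B,E|A)     = {p_be_given_a:.5f}")
print(f"P(B|A)P(E|A) = {product:.3f}")
```

The two printed quantities differ by roughly two orders of magnitude, which is the dependence the slide points out. Small differences in the last digit relative to the slide come from rounding P(A) to 0.00252 before dividing.
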
Reading off independence relationships
• How about Burglary ⊥ Earthquake | JohnCalls?
• No! Why?
• Knowing JohnCalls affects the probability of Alarm, which makes Burglary and Earthquake dependent
Independence relationships
• Rough intuition (this holds for tree-like graphs, polytrees):
  • Evidence on the (directed) road between two variables makes them independent
  • Evidence on an “A” node (a common ancestor) makes its descendants independent
  • Evidence on a “V” node (a common descendant), or below the V, makes the ancestors of that node dependent (otherwise they are independent)
• Formal property in the general case: d-separation (see R&N)
Benefits of Sparse Models
• Modeling
  • Fewer relationships need to be encoded (either through understanding or statistics)
  • Large networks can be built up from smaller ones
• Intuition
  • Dependencies/independencies between variables can be inferred through network structures
• Tractable inference
Top-Down inference
• Suppose we want to compute P(Alarm)
• P(A) = Σ_{b,e} P(A,b,e)
• P(A) = Σ_{b,e} P(A|b,e) P(b) P(e)
• P(A) = P(A|B,E)P(B)P(E) + P(A|B,¬E)P(B)P(¬E) + P(A|¬B,E)P(¬B)P(E) + P(A|¬B,¬E)P(¬B)P(¬E)
• P(A) = 0.95*0.001*0.002 + 0.94*0.001*0.998 + 0.29*0.999*0.002 + 0.001*0.999*0.998 = 0.00252
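
The sum over b and e can be written as a short enumeration. This is a sketch only, using the CPT entries from the slide's arithmetic; the dictionary layout and names are assumptions made for illustration.

```python
from itertools import product

# CPT entries read off the slide's arithmetic.
P_B = {True: 0.001, False: 0.999}   # P(Burglary)
P_E = {True: 0.002, False: 0.998}   # P(Earthquake)
P_A_GIVEN_BE = {                    # P(Alarm=true | Burglary, Earthquake)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}

# P(A) = sum over b, e of P(A | b, e) P(b) P(e)
p_alarm = sum(
    P_A_GIVEN_BE[(b, e)] * P_B[b] * P_E[e]
    for b, e in product((True, False), repeat=2)
)
print(f"P(Alarm) = {p_alarm:.5f}")  # prints P(Alarm) = 0.00252
```
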
Top-Down inference
• Now, suppose we want to compute P(MaryCalls)
• P(M) = P(M|A)P(A) + P(M|¬A)P(¬A)
• P(M) = 0.70*0.00252 + 0.01*(1-0.00252) = 0.0117
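
The same top-down step can be sketched for any binary child with one binary parent. The helper name `marginal_of_child` is hypothetical; the numbers are the slide's P(M|A) = 0.70, P(M|¬A) = 0.01 and the previously computed P(A) = 0.00252, so P(¬A) = 1 - 0.00252.

```python
def marginal_of_child(p_child_given_parent, p_parent_true):
    """P(child=true) for a binary child with a single binary parent."""
    return (p_child_given_parent[True] * p_parent_true
            + p_child_given_parent[False] * (1.0 - p_parent_true))

P_M_GIVEN_A = {True: 0.70, False: 0.01}  # P(MaryCalls=true | Alarm)
p_alarm = 0.00252                        # P(Alarm), computed top-down earlier

p_mary = marginal_of_child(P_M_GIVEN_A, p_alarm)
print(f"P(MaryCalls) = {p_mary:.4f}")    # prints P(MaryCalls) = 0.0117
```
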
Top-Down inference with Evidence
• Suppose we want to compute P(Alarm|Earthquake), written P(A|e)
• P(A|e) = Σ_b P(A,b|e)
• P(A|e) = Σ_b P(A|b,e)P(b), since Burglary is independent of Earthquake and so P(b|e) = P(b)
• P(A|e) = 0.95*0.001 + 0.29*0.999 = 0.29066
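
With evidence on Earthquake, only Burglary remains to be summed out. A minimal sketch, using the slide's CPT entries and a hypothetical function name:

```python
P_B = {True: 0.001, False: 0.999}   # P(Burglary)
P_A_GIVEN_BE = {                    # P(Alarm=true | Burglary, Earthquake)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}

def p_alarm_given_earthquake(e):
    """P(Alarm=true | Earthquake=e) = sum over b of P(A | b, e) P(b)."""
    return sum(P_A_GIVEN_BE[(b, e)] * P_B[b] for b in (True, False))

p_a_given_e = p_alarm_given_earthquake(True)
print(f"P(A|e) = {p_a_given_e:.5f}")  # prints P(A|e) = 0.29066
```

Note that P(Earthquake) never enters the computation: evidence on an ancestor fixes its value, so its prior is not needed.
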
Top-Down inference
• Only works if the graph of ancestors of the query variable is a polytree
• Evidence must be given on ancestor(s) of the query variable
• Efficient:
  • O(d·2^k) time, where d is the number of ancestors of the variable and k is a bound on the number of parents
  • Evidence on an ancestor cuts off the influence of the portion of the graph above the evidence node
Querying the BN
[Figure: Cavity → Toothache]
• The BN gives P(T|C)
• What about P(C|T)?
Bayes’ Rule
• P(A,B) = P(A|B) P(B) = P(B|A) P(A)
• So… P(A|B) = P(B|A) P(A) / P(B)
Applying Bayes’ Rule
• Let A be a cause, B be an effect, and let’s say we know P(B|A) and P(A) (conditional probability tables)
• What’s P(B)?
• P(B) = Σ_a P(B,A=a) [marginalization]
• P(B,A=a) = P(B|A=a)P(A=a) [conditional probability]
• So, P(B) = Σ_a P(B|A=a) P(A=a)
Applying Bayes’ Rule
• Let A be a cause, B be an effect, and let’s say we know P(B|A) and P(A) (conditional probability tables)
• What’s P(A|B)?
• P(A|B) = P(B|A)P(A)/P(B) [Bayes’ rule]
• P(B) = Σ_a P(B|A=a) P(A=a) [last slide]
• So, P(A|B) = P(B|A)P(A) / [Σ_a P(B|A=a) P(A=a)]
How do we read this?
• P(A|B) = P(B|A)P(A) / [Σ_a P(B|A=a) P(A=a)]
• [An equation that holds for all values A can take on, and all values B can take on]
• First attempt: P(A=a|B=b) = P(B=b|A=a)P(A=a) / [Σ_a P(B=b|A=a) P(A=a)]
• Are these the same a? NO! The a in the numerator is the queried value; the a in the sum is a bound summation index
• Correct: P(A=a|B=b) = P(B=b|A=a)P(A=a) / [Σ_{a'} P(B=b|A=a') P(A=a')]
• Be careful about indices!
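
The index distinction is easy to see in code, where the summation variable needs its own name. This is a sketch with made-up numbers: the prior and likelihood values below are hypothetical and do not come from the slides, and `posterior` is an illustrative helper.

```python
def posterior(prior, likelihood, b):
    """Return P(A=a | B=b) for every value a.

    prior[a]         = P(A=a)
    likelihood[a][b] = P(B=b | A=a)
    """
    # Denominator: a_prime is a separate summation index, not the queried a.
    p_b = sum(likelihood[a_prime][b] * prior[a_prime] for a_prime in prior)
    return {a: likelihood[a][b] * prior[a] / p_b for a in prior}

# Hypothetical numbers for illustration only.
prior = {"cavity": 0.2, "no_cavity": 0.8}
likelihood = {
    "cavity": {"toothache": 0.6, "no_toothache": 0.4},
    "no_cavity": {"toothache": 0.1, "no_toothache": 0.9},
}

post = posterior(prior, likelihood, "toothache")
print(post)
```

The denominator is computed once and shared by every queried value a, which is why the posterior entries sum to 1.
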
Querying the BN
[Figure: Cavity → Toothache]
• The BN gives P(T|C)
• What about P(C|T)?
• P(Cavity|Toothache) = P(Toothache|Cavity) P(Cavity) / P(Toothache) [Bayes’ rule]
• Denominator computed by summing out the numerator over Cavity and ¬Cavity
• Querying a BN is just applying Bayes’ rule on a larger scale…
Performing Inference
• Variables X
• Have evidence set E=e, query variable Q
• Want to compute the posterior probability distribution over Q, given E=e
• Let the non-evidence variables be Y (= X \ E)
• Straightforward method:
  1. Compute joint P(Y, E=e)
  2. Marginalize to get P(Q, E=e)
  3. Divide by P(E=e) to get P(Q|E=e)
Inference in the Alarm Example
• P(J|M) = ?? (evidence E=e: MaryCalls; query Q: JohnCalls)
• Full joint distribution: P(x1,x2,…,xn) = Π_{i=1,…,n} P(xi|parents(Xi))
• 1. P(J,A,B,E,MaryCalls) = P(J|A)P(MaryCalls|A)P(A|B,E)P(B)P(E) [full joint distribution table, 2^4 entries]
• 2. P(J,MaryCalls) = Σ_{a,b,e} P(J,A=a,B=b,E=e,MaryCalls) [2 entries: one for JohnCalls, the other for ¬JohnCalls]
• 3. P(J|MaryCalls) = P(J,MaryCalls)/P(MaryCalls) = P(J,MaryCalls)/(Σ_j P(j,MaryCalls))
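
The three steps can be sketched as a brute-force enumeration. The values for P(B), P(E), P(A|B,E) and P(M|A) are those used in the slides; the values P(J|A) = 0.90 and P(J|¬A) = 0.05 do not appear in the slides and are assumed here from the R&N textbook version of this network. All function names are illustrative.

```python
from itertools import product

P_B = 0.001                               # P(Burglary=true)
P_E = 0.002                               # P(Earthquake=true)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(Alarm=true | B, E)
P_M = {True: 0.70, False: 0.01}           # P(MaryCalls=true | Alarm)
P_J = {True: 0.90, False: 0.05}           # P(JohnCalls=true | Alarm), assumed from R&N

def bern(p_true, value):
    """Probability that a binary variable with P(true)=p_true takes `value`."""
    return p_true if value else 1.0 - p_true

def joint(j, m, a, b, e):
    """Step 1: one entry of the full joint, via the BN factorization."""
    return (bern(P_J[a], j) * bern(P_M[a], m) * bern(P_A[(b, e)], a)
            * bern(P_B, b) * bern(P_E, e))

def john_given_mary(m=True):
    # Step 2: sum out a, b, e for each value of J (2 entries).
    unnormalized = {
        j: sum(joint(j, m, a, b, e)
               for a, b, e in product((True, False), repeat=3))
        for j in (True, False)
    }
    # Step 3: divide by P(m) = sum over j of P(j, m).
    p_m = sum(unnormalized.values())
    return {j: p / p_m for j, p in unnormalized.items()}

result = john_given_mary(True)
print(f"P(JohnCalls | MaryCalls) = {result[True]:.3f}")
```

This touches all 2^4 joint entries for the fixed evidence, which is exactly the cost that variable elimination tries to avoid.
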
How expensive?
• P(X) = P(x1,x2,…,xn) = Π_{i=1,…,n} P(xi|parents(Xi))
• Straightforward method:
  1. Use above to compute P(Y,E=e)
  2. P(Q,E=e) = Σ_{y1} … Σ_{yk} P(Y,E=e)
  3. P(E=e) = Σ_q P(Q,E=e)
• Step 1: O(2^(n-|E|)) entries!
• Step 3: normalization factor, no big deal once we have P(Q,E=e)
• Can we do better?
Variable Elimination
• Consider linear network X1 → X2 → X3
• P(X) = P(X1) P(X2|X1) P(X3|X2)
• P(X3) = Σ_{x1} Σ_{x2} P(x1) P(x2|x1) P(X3|x2)
• = Σ_{x2} P(X3|x2) Σ_{x1} P(x1) P(x2|x1) [rearrange equation]
• = Σ_{x2} P(X3|x2) P(x2) [P(x2) computed for each value of X2; cache P(x2) for both values of X3!]
• How many * and + saved?
  • *: 2*4*2 = 16 vs 4+4 = 8
  • +: 2*3 = 6 vs 2+1 = 3
• Can lead to huge gains in larger networks
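
The rearrangement can be sketched on a binary chain. The CPT numbers below are hypothetical (the slides give none for this network), and the function names are illustrative; the point is only that eliminating X1 once and caching P(x2) gives the same answer as the full double sum.

```python
# Hypothetical CPTs for a binary chain X1 -> X2 -> X3 (values 0 and 1).
P_X1 = [0.6, 0.4]                          # P(X1=x1)
P_X2_GIVEN_X1 = [[0.7, 0.3], [0.2, 0.8]]   # indexed [x1][x2]
P_X3_GIVEN_X2 = [[0.9, 0.1], [0.5, 0.5]]   # indexed [x2][x3]

def naive(x3):
    """P(X3=x3) by summing the full product over all (x1, x2) pairs."""
    return sum(
        P_X1[x1] * P_X2_GIVEN_X1[x1][x2] * P_X3_GIVEN_X2[x2][x3]
        for x1 in (0, 1) for x2 in (0, 1)
    )

def eliminated():
    """P(X3) after pushing the sum over x1 inward."""
    # Eliminate X1 once: P(x2) = sum over x1 of P(x1) P(x2 | x1).
    # This table is cached and reused for both values of X3.
    p_x2 = [sum(P_X1[x1] * P_X2_GIVEN_X1[x1][x2] for x1 in (0, 1))
            for x2 in (0, 1)]
    # Eliminate X2: P(x3) = sum over x2 of P(x3 | x2) P(x2).
    return [sum(P_X3_GIVEN_X2[x2][x3] * p_x2[x2] for x2 in (0, 1))
            for x3 in (0, 1)]

p_x3 = eliminated()
print(p_x3, [naive(0), naive(1)])
```
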
VE in Alarm Example
• P(E|j,m) = P(E,j,m)/P(j,m)
• P(E,j,m) = Σ_a Σ_b P(E) P(b) P(a|E,b) P(j|a) P(m|a)
• = P(E) Σ_b P(b) Σ_a P(a|E,b) P(j|a) P(m|a)
• = P(E) Σ_b P(b) P(j,m|E,b) [compute for all values of E, b]
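
The nested sums above translate directly into nested loops, with the inner factor P(j,m|E,b) computed once per (E, b) pair. This is a sketch rather than a general variable-elimination routine. P(J|A) = 0.90 and P(J|¬A) = 0.05 are not given in the slides and are assumed from the R&N textbook network; the remaining CPT entries are those used in the slides.

```python
P_B = {True: 0.001, False: 0.999}         # P(Burglary)
P_E = {True: 0.002, False: 0.998}         # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(Alarm=true | B, E)
P_M = {True: 0.70, False: 0.01}           # P(MaryCalls=true | Alarm)
P_J = {True: 0.90, False: 0.05}           # P(JohnCalls=true | Alarm), assumed from R&N

def p_alarm(a, b, e):
    """P(Alarm=a | Burglary=b, Earthquake=e)."""
    p_true = P_A[(b, e)]
    return p_true if a else 1.0 - p_true

def earthquake_given_calls():
    """P(E | j, m) with JohnCalls = MaryCalls = true."""
    unnormalized = {}
    for e in (True, False):
        total = 0.0
        for b in (True, False):
            # Inner factor: P(j, m | e, b) = sum over a of P(a|e,b) P(j|a) P(m|a)
            p_jm = sum(p_alarm(a, b, e) * P_J[a] * P_M[a]
                       for a in (True, False))
            total += P_B[b] * p_jm
        unnormalized[e] = P_E[e] * total   # P(e, j, m)
    p_jm_total = sum(unnormalized.values())  # P(j, m)
    return {e: p / p_jm_total for e, p in unnormalized.items()}

post_e = earthquake_given_calls()
print(f"P(Earthquake | j, m) = {post_e[True]:.3f}")
```
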