Bayesian Networks
Some Applications of BN
• Medical diagnosis
• Troubleshooting of hardware/software systems
• Fraud/uncollectible debt detection
• Data mining
• Analysis of genetic sequences
• Data interpretation, computer vision, image understanding
More Complicated Singly-Connected Belief Net
[Figure: belief net with nodes Battery, Gas, Radio, SparkPlugs, Starts, and Moves]
[Figure: image-understanding network over regions R1–R4, each with domain Region = {Sky, Tree, Grass, Rock}, linked by Above relations (e.g., R1 Above R2)]
Calculation of Joint Probability
[Figure: Burglary, Earthquake → Alarm → JohnCalls, MaryCalls]
The product of the CPTs defines the full joint distribution table:
P(x1, x2, …, xn) = Πi=1,…,n P(xi | parents(Xi))
P(J ∧ M ∧ A ∧ ¬B ∧ ¬E)
= P(J|A) P(M|A) P(A|¬B,¬E) P(¬B) P(¬E)
= 0.9 × 0.7 × 0.001 × 0.999 × 0.998
= 0.00062
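To make the arithmetic concrete, here is a minimal Python sketch of this chain-rule computation. Only five numbers appear on the slide; the remaining CPT entries below (e.g., P(A|B,E) = 0.95) are the standard textbook values for this network and should be read as assumptions.

```python
# Chain-rule computation of one entry of the joint distribution.
# CPTs for the burglary network; values beyond the slide's five numbers
# are the usual textbook ones (an assumption, not from the slide).
P_B = 0.001                      # P(Burglary)
P_E = 0.002                      # P(Earthquake)
P_A = {                          # P(Alarm=true | Burglary, Earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls=true | Alarm)

# P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J|A) P(M|A) P(A|¬B,¬E) P(¬B) P(¬E)
p = P_J[True] * P_M[True] * P_A[(False, False)] * (1 - P_B) * (1 - P_E)
print(p)  # ≈ 0.00062
```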
What does the BN encode?
[Figure: Burglary, Earthquake → Alarm → JohnCalls, MaryCalls]
Burglary ⊥ Earthquake
JohnCalls ⊥ MaryCalls | Alarm
JohnCalls ⊥ Burglary | Alarm
JohnCalls ⊥ Earthquake | Alarm
MaryCalls ⊥ Burglary | Alarm
MaryCalls ⊥ Earthquake | Alarm
A node is independent of its non-descendants, given its parents
Probabilistic Inference
Probabilistic inference is the following problem:
• Given:
  • A belief state P(X1, …, Xn) in some form (e.g., a Bayes net or a joint probability table)
  • A query variable indexed by q
  • A subset of evidence variables indexed by e1, …, ek
• Find:
  • P(Xq | Xe1, …, Xek)
Top-Down Inference: Recursive Computation of All Marginals Downstream of Evidence
[Figure: Burglary, Earthquake → Alarm → JohnCalls, MaryCalls; evidence on Earthquake]
P(A|E) = P(A|B,E) P(B) + P(A|¬B,E) P(¬B)
P(J|E) = P(J|A,E) P(A|E) + P(J|¬A,E) P(¬A|E)
P(M|E) = P(M|A,E) P(A|E) + P(M|¬A,E) P(¬A|E)
Top-Down Inference
• Only works if the graph of ancestors of the query variable is a polytree
• Evidence must be given on ancestor(s) of the query variable
• Efficient: O(d·2^k) time, where d is the number of ancestors of the query variable and k is a bound on the number of parents per node
• Evidence on an ancestor cuts off the influence of the portion of the graph above the evidence node
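A minimal sketch of this top-down recursion for the query P(J | E=true), reusing the assumed textbook CPTs from the joint-probability sketch above: each marginal is computed from the marginals of its parents, working downstream from the evidence.

```python
# Top-down inference with evidence Earthquake = true.
P_B = 0.001
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=true | A)

# P(A|E) = P(A|B,E) P(B) + P(A|¬B,E) P(¬B)
p_a = P_A[(True, True)] * P_B + P_A[(False, True)] * (1 - P_B)

# P(J|E) = P(J|A) P(A|E) + P(J|¬A) P(¬A|E)   (J ⊥ E | A)
p_j = P_J[True] * p_a + P_J[False] * (1 - p_a)
print(p_a, p_j)   # ≈ 0.291, ≈ 0.297
```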
Querying the BN
[Figure: Cavity → Toothache]
• The BN gives P(T|C)
• P(C|T) can be computed using Bayes' rule:
• P(A|B) = P(B|A) P(A) / P(B)
Querying the BN
[Figure: Cavity → Toothache]
• The BN gives P(T|C)
• What about P(C|T)?
• P(Cavity|Toothache) = P(Toothache|Cavity) P(Cavity) / P(Toothache)   [Bayes' rule]
• The denominator is computed by summing the numerator over Cavity and ¬Cavity
• Querying a BN is just applying Bayes' rule on a larger scale…
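A quick numeric sketch of this query; the deck gives no numbers for the Cavity network, so the CPT values below are hypothetical.

```python
# Bayes' rule on Cavity -> Toothache with hypothetical numbers.
P_cavity = 0.2
P_tooth = {True: 0.6, False: 0.1}   # P(Toothache=true | Cavity)

# Denominator: sum the numerator over Cavity and ¬Cavity
p_t = P_tooth[True] * P_cavity + P_tooth[False] * (1 - P_cavity)

p_c_given_t = P_tooth[True] * P_cavity / p_t
print(p_c_given_t)   # 0.12 / 0.20 = 0.6
```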
Naïve Bayes Models
P(Cause, Effect1, …, Effectn) = P(Cause) Πi P(Effecti | Cause)
[Figure: Cause → Effect1, Effect2, …, Effectn]
Naïve Bayes Classifier
P(Class, Feature1, …, Featuren) = P(Class) Πi P(Featurei | Class)
Given the features, what class?
P(C | F1, …, Fk) = P(C, F1, …, Fk) / P(F1, …, Fk) = (1/Z) P(C) Πi P(Fi | C)
[Figure: Class (e.g., Spam / Not Spam, or English / French / Latin …) → Feature1, Feature2, …, Featuren (e.g., word occurrences)]
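A minimal sketch of such a classifier. The function, the smoothing floor for unseen features, and the tiny spam example are illustrative assumptions, not taken from the deck; logs are used so the product over many features does not underflow.

```python
import math

def naive_bayes_posterior(features, prior, likelihood):
    """P(C | F1..Fk) ∝ P(C) ∏i P(Fi | C), computed in log space."""
    log_scores = {}
    for c in prior:
        log_p = math.log(prior[c])
        for f in features:
            # Tiny floor for features unseen in class c (illustrative smoothing)
            log_p += math.log(likelihood[c].get(f, 1e-9))
        log_scores[c] = log_p
    # Normalize: this is the 1/Z step
    m = max(log_scores.values())
    scores = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

# Hypothetical word-occurrence features for Spam / Not Spam
prior = {"spam": 0.4, "ham": 0.6}
likelihood = {"spam": {"free": 0.30, "meeting": 0.02},
              "ham":  {"free": 0.03, "meeting": 0.20}}
print(naive_bayes_posterior(["free"], prior, likelihood))  # spam ≈ 0.87
```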
Comments on Naïve Bayes models
• Very scalable (thousands or millions of features!) and easy to implement
• Handles missing data easily: just ignore the missing feature
• The conditional independence assumption on the features is the main weakness: what if two features are actually correlated? What if many are?
Variable Elimination: Probabilistic Inference in General Networks
Basic idea: eliminate "nuisance" variables one at a time via marginalization
Example: compute P(J), with elimination order C, D, I, H, G, S, L
[Figure: network Coherence → Difficulty; Difficulty, Intelligence → Grade; Intelligence → SAT; Grade → Letter; SAT, Letter → Job; Grade, Job → Happy]
[Figure: the same network annotated with its CPTs: P(C), P(D|C), P(I), P(G|I,D), P(S|I), P(L|G), P(J|S,L), P(H|G,J)]
Eliminating C
[Figure: factors P(C) and P(D|C), the only factors mentioning C, are highlighted]
C is Eliminated, giving a new factor over D
P(D) = Σc P(D|c) P(c)
[Figure: Coherence removed from the network]
Eliminating D
[Figure: factors P(D) and P(G|I,D) highlighted]
D is Eliminated, giving a new factor over G, I
P(G|I) = Σd P(G|I,d) P(d)
[Figure: Difficulty removed]
Eliminating I
[Figure: factors P(I), P(G|I), and P(S|I) highlighted]
I is Eliminated, producing a new fill edge and a factor over G and S
P(G,S) = Σi P(i) P(G|i) P(S|i)
[Figure: Intelligence removed; new undirected fill edge between Grade and SAT]
Eliminating H
fGJ(G,J) = Σh P(h|G,J) = 1
(summing a CPT over its child variable always yields 1)
[Figure: factor P(H|G,J) highlighted]
H is Eliminated, producing a new fill edge and a factor over G, J
fGJ(G,J)
[Figure: Happy removed; new undirected fill edge between Grade and Job]
Eliminating G
[Figure: factors P(G,S), P(L|G), and fGJ(G,J) highlighted]
G is Eliminated, making a new ternary factor over S, L, J and a new fill edge
fSLJ(S,L,J) = Σg P(g,S) P(L|g) fGJ(g,J)
[Figure: Grade removed; fill edges connecting SAT, Letter, and Job]
Eliminating S
[Figure: factors fSLJ(S,L,J) and P(J|S,L) highlighted]
S is Eliminated, creating a new factor over L, J
fLJ(L,J) = Σs fSLJ(s,L,J) P(J|s,L)
[Figure: SAT removed]
Eliminating L
[Figure: factor fLJ(L,J) highlighted]
L is Eliminated, giving a new factor over J, which is exactly P(J)
P(J) = Σl fLJ(l,J)
[Figure: only Job remains, carrying the final factor P(J)]
Joint distribution
• P(X) = P(C) P(D|C) P(I) P(G|I,D) P(S|I) P(L|G) P(J|L,S) P(H|G,J)
• Apply elimination ordering C, D, I, H, G, S, L
Going through VE
• P(X) = P(C) P(D|C) P(I) P(G|I,D) P(S|I) P(L|G) P(J|L,S) P(H|G,J)
• Apply elimination ordering C, D, I, H, G, S, L
• fD(D) = ΣC P(C) P(D|C)
Going through VE
• ΣC P(X) = fD(D) P(I) P(G|I,D) P(S|I) P(L|G) P(J|L,S) P(H|G,J)
• Apply elimination ordering C, D, I, H, G, S, L
• fD(D) = ΣC P(C) P(D|C)
Going through VE
• ΣC P(X) = fD(D) P(I) P(G|I,D) P(S|I) P(L|G) P(J|L,S) P(H|G,J)
• Apply elimination ordering C, D, I, H, G, S, L
• fGI(G,I) = ΣD fD(D) P(G|I,D)
Going through VE
• ΣC,D P(X) = fGI(G,I) P(I) P(S|I) P(L|G) P(J|L,S) P(H|G,J)
• Apply elimination ordering C, D, I, H, G, S, L
• fGI(G,I) = ΣD fD(D) P(G|I,D)
Going through VE
• ΣC,D P(X) = fGI(G,I) P(I) P(S|I) P(L|G) P(J|L,S) P(H|G,J)
• Apply elimination ordering C, D, I, H, G, S, L
• fGS(G,S) = ΣI fGI(G,I) P(I) P(S|I)
Going through VE
• ΣC,D,I P(X) = fGS(G,S) P(L|G) P(J|L,S) P(H|G,J)
• Apply elimination ordering C, D, I, H, G, S, L
• fGS(G,S) = ΣI fGI(G,I) P(I) P(S|I)
Going through VE
• ΣC,D,I P(X) = fGS(G,S) P(L|G) P(J|L,S) P(H|G,J)
• Apply elimination ordering C, D, I, H, G, S, L
• fGJ(G,J) = ΣH P(H|G,J)
What values does this factor store?
Going through VE
• ΣC,D,I,H P(X) = fGS(G,S) P(L|G) P(J|L,S) fGJ(G,J)
• Apply elimination ordering C, D, I, H, G, S, L
• fGJ(G,J) = ΣH P(H|G,J)
Going through VE
• ΣC,D,I,H P(X) = fGS(G,S) P(L|G) P(J|L,S) fGJ(G,J)
• Apply elimination ordering C, D, I, H, G, S, L
• fSLJ(S,L,J) = ΣG fGS(G,S) P(L|G) fGJ(G,J)
Going through VE
• ΣC,D,I,H,G P(X) = fSLJ(S,L,J) P(J|L,S)
• Apply elimination ordering C, D, I, H, G, S, L
• fSLJ(S,L,J) = ΣG fGS(G,S) P(L|G) fGJ(G,J)
Going through VE
• ΣC,D,I,H,G P(X) = fSLJ(S,L,J) P(J|L,S)
• Apply elimination ordering C, D, I, H, G, S, L
• fLJ(L,J) = ΣS fSLJ(S,L,J) P(J|L,S)
Going through VE
• ΣC,D,I,H,G,S P(X) = fLJ(L,J)
• Apply elimination ordering C, D, I, H, G, S, L
• fLJ(L,J) = ΣS fSLJ(S,L,J) P(J|L,S)
Going through VE
• ΣC,D,I,H,G,S P(X) = fLJ(L,J)
• Apply elimination ordering C, D, I, H, G, S, L
• fJ(J) = ΣL fLJ(L,J)
Going through VE
• ΣC,D,I,H,G,S,L P(X) = fJ(J)
• Apply elimination ordering C, D, I, H, G, S, L
• fJ(J) = ΣL fLJ(L,J)
• Summing the joint over every variable except J leaves the marginal, so P(J) = fJ(J)
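The whole procedure can be written compactly. Below is a sketch of variable elimination with tabular factors, assuming binary variables for simplicity (the deck does not fix domain sizes); the names multiply, sum_out, and eliminate are illustrative, not a library API.

```python
from itertools import product

# A factor is (scope, table): a tuple of variable names and a dict
# mapping assignments (tuples of bools, in scope order) to numbers.

def multiply(f1, f2):
    """Pointwise product of two factors over the union of their scopes."""
    s1, t1 = f1
    s2, t2 = f2
    scope = s1 + tuple(v for v in s2 if v not in s1)
    table = {}
    for assign in product([False, True], repeat=len(scope)):
        a = dict(zip(scope, assign))
        table[assign] = (t1[tuple(a[v] for v in s1)] *
                         t2[tuple(a[v] for v in s2)])
    return scope, table

def sum_out(f, var):
    """Marginalize var out of factor f."""
    scope, table = f
    i = scope.index(var)
    new_table = {}
    for assign, p in table.items():
        key = assign[:i] + assign[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + p
    return scope[:i] + scope[i + 1:], new_table

def eliminate(factors, order):
    """Eliminate variables in order; return the product of what remains."""
    for var in order:
        touching = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        if not touching:
            continue
        prod = touching[0]
        for f in touching[1:]:
            prod = multiply(prod, f)
        factors.append(sum_out(prod, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Tiny demo with made-up numbers: eliminating C from {P(C), P(D|C)} leaves P(D).
p_c = (("C",), {(True,): 0.4, (False,): 0.6})
p_d_c = (("D", "C"), {(True, True): 0.7, (True, False): 0.2,
                      (False, True): 0.3, (False, False): 0.8})
print(eliminate([p_c, p_d_c], ["C"]))   # P(D=true) = 0.4, P(D=false) = 0.6
```

Running eliminate over the eight CPTs of the example network with the order C, D, I, H, G, S, L would create exactly the factors fD, fGI, fGS, fGJ, fSLJ, fLJ, fJ traced above, leaving P(J).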
Order matters
If we were to eliminate G first, we would create a factor over D, I, L, H, and J (their distributions become coupled), since G appears in P(G|I,D), P(L|G), and P(H|G,J).
[Figure: the network, with Grade and its neighbors highlighted]
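For a sense of scale, assume binary variables (an illustrative assumption): eliminating G first multiplies P(G|I,D), P(L|G), and P(H|G,J) into a factor with 2^5 = 32 entries over D, I, L, H, J, whereas the largest factor created by the ordering C, D, I, H, G, S, L above is fSLJ(S,L,J), with only 2^3 = 8 entries.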