Learn about message passing and belief propagation algorithms for optimization and learning. Understand how to interpret VE steps as passing messages along a graph, exact and approximate inference, and cluster graphs.
CS b553: Algorithms for Optimization and Learning
Message Passing / Belief Propagation
Message Passing / Belief Propagation
• Interpretation of VE steps as passing "messages" along a graph
• Exact for polytrees
• Arbitrary graphs → clique trees
• Loopy belief propagation
• Approximate inference
Cluster Graphs
• An undirected graph G
• Each node i contains a scope C_i ⊆ X
• Each factor φ in the BN has Scope[φ] ⊆ C_i for some C_i
• Two adjacent nodes have non-empty intersection
• Running intersection property: for each variable X, the set of all nodes whose scope C_i contains X forms a connected path in G
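As an illustration, the running intersection property can be checked mechanically. The dict/edge-list representation below is my own assumption, not from the slides:

```python
from collections import defaultdict, deque

def has_running_intersection(scopes, edges):
    """Return True iff, for every variable X, the clusters whose scope
    contains X form a connected subgraph."""
    adj = defaultdict(set)
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    for x in set().union(*scopes.values()):
        holders = {c for c, s in scopes.items() if x in s}
        # BFS restricted to clusters that contain x
        start = next(iter(holders))
        seen, frontier = {start}, deque([start])
        while frontier:
            c = frontier.popleft()
            for n in (adj[c] & holders) - seen:
                seen.add(n)
                frontier.append(n)
        if seen != holders:
            return False
    return True

# The chain cluster tree from the slides: clusters B, AB, A
scopes = {"B": {"B"}, "AB": {"A", "B"}, "A": {"A"}}
edges = [("B", "AB"), ("AB", "A")]
print(has_running_intersection(scopes, edges))  # True
```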
Cluster Trees
[Figure: the Bayes net B → A, with CPTs P(B) and P(A|B)]
[Figure: the corresponding cluster tree, with clusters B, AB, A in a chain]

Message Passing Interpretation of VE
[Chain of clusters B, AB, A with initial factors λ_B = P(B), λ_{AB} = P(A|B), λ_A = 1; A is the query variable]
Cluster Trees & Belief Propagation
With λ_B = P(B), λ_{AB} = P(A|B), λ_A = 1, and A the query variable:
• B sends the "message" δ_{B→AB} = λ_B to node AB
• Node AB computes δ_{AB→A} = Σ_B λ_{AB} δ_{B→AB} and sends it to A
• A computes β_A = λ_A · δ_{AB→A} (= P(A)). Done.
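Concretely, with made-up numbers for P(B) and P(A|B) (binary variables; the values are for illustration only), this pass can be carried out with numpy:

```python
import numpy as np

# Hypothetical CPTs for the chain B - AB - A
lam_B = np.array([0.7, 0.3])                   # λ_B = P(B)
lam_AB = np.array([[0.8, 0.2],                 # λ_AB[b, a] = P(A=a | B=b)
                   [0.1, 0.9]])
lam_A = np.array([1.0, 1.0])                   # λ_A = 1 (no evidence on A)

delta_B_AB = lam_B                                       # δ_{B→AB} = λ_B
delta_AB_A = (lam_AB * delta_B_AB[:, None]).sum(axis=0)  # Σ_B λ_AB δ_{B→AB}
beta_A = lam_A * delta_AB_A                              # β_A = λ_A · δ_{AB→A}
print(beta_A)   # P(A) = [0.59, 0.41]
```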
Passing Up-Stream
Now let B be the query variable:
• A sends the message δ_{A→AB} = λ_A
• Node AB computes δ_{AB→B} = Σ_A λ_{AB} δ_{A→AB} (= 1)
• B computes β_B = λ_B · δ_{AB→B} (= P(B))
Message Passing Rules in Cluster Trees
• Init:
  • Each node C_i holds a factor λ_i = Π φ, where the product is over the factors φ assigned to C_i
  • Each directed edge maintains a message δ_{i→j} (initially nil)
• Repeat while some message into the query variable is nil:
  • Pick a node C_i that is "ready" to send to C_j: it has received messages from all neighbors except C_j
  • Compute and send the message δ_{i→j}
• ComputeMessage(i, j):
  • Let S_{i,j} = C_i ∩ C_j
  • Return Σ_{C_i − S_{i,j}} λ_i Π_{k ∈ Adj(i) − {j}} δ_{k→i}
[Figure: neighbors k1, k2, k3 send messages into node i, which sends δ_{i→j} to j]
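A minimal Python sketch of these rules. The dict-based factor representation and function names are my own, and all variables are assumed binary:

```python
import itertools
from collections import defaultdict

def multiply(scope, factors):
    """Pointwise product of factors, each given as (scope, table); tables
    map assignment tuples (over that scope, binary values) to floats."""
    table = {}
    for assign in itertools.product([0, 1], repeat=len(scope)):
        env = dict(zip(scope, assign))
        v = 1.0
        for fs, ft in factors:
            v *= ft[tuple(env[x] for x in fs)]
        table[assign] = v
    return table

def marginalize(scope, table, keep):
    """Sum out all variables not in `keep`; returns (keep, new_table)."""
    keep = [x for x in scope if x in keep]
    out = defaultdict(float)
    for assign, v in table.items():
        env = dict(zip(scope, assign))
        out[tuple(env[x] for x in keep)] += v
    return keep, dict(out)

def compute_message(scopes, potentials, messages, i, j):
    """δ_{i→j}: product of λ_i with all incoming messages except the one
    from j, marginalized onto the sepset S_ij = C_i ∩ C_j."""
    ci = scopes[i]
    facs = [(ci, potentials[i])]
    for (k, tgt), (ms, mt) in messages.items():
        if tgt == i and k != j:
            facs.append((ms, mt))
    prod = multiply(ci, facs)
    sep = [x for x in ci if x in scopes[j]]
    return marginalize(ci, prod, sep)

# The chain example: clusters B, AB, A with hypothetical CPT values
scopes = {"B": ["B"], "AB": ["B", "A"], "A": ["A"]}
potentials = {
    "B": {(0,): 0.7, (1,): 0.3},                              # P(B)
    "AB": {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.9},  # P(A|B)
    "A": {(0,): 1.0, (1,): 1.0},                              # λ_A = 1
}
messages = {}
messages[("B", "AB")] = compute_message(scopes, potentials, messages, "B", "AB")
messages[("AB", "A")] = compute_message(scopes, potentials, messages, "AB", "A")
print(messages[("AB", "A")][1])   # {(0,): 0.59, (1,): 0.41} = P(A)
```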
Branching Cluster Tree
Consider the alarm network B, E → A → J, M, with cluster tree nodes B, E, ABE, A, AM, AJ, M, J and J as the query variable. The messages are sent in order:
• δ_{B→ABE} = λ_B (= P(B))
• δ_{E→ABE} = λ_E (= P(E))
• δ_{M→AM} = λ_M (= 1)
• δ_{ABE→A} = Σ_{B,E} λ_{ABE} δ_{B→ABE} δ_{E→ABE} (= P(A))
• δ_{AM→A} = Σ_M λ_{AM} δ_{M→AM} (= Σ_M P(M|A) = 1)
• δ_{A→AJ} = λ_A δ_{ABE→A} δ_{AM→A} (= P(A))
• δ_{AJ→J} = Σ_A λ_{AJ} δ_{A→AJ} (= Σ_A P(J|A) P(A) = P(J))
• Finally β_J = λ_J · δ_{AJ→J} = P(J)
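The branching-tree pass can be checked numerically. The CPT values below are assumed for illustration (the slides do not give numbers); they follow the standard burglary/earthquake alarm network, all variables binary with index 1 meaning "true":

```python
import numpy as np

P_B = np.array([0.999, 0.001])                       # P(B)
P_E = np.array([0.998, 0.002])                       # P(E)
P_A = np.array([[[0.999, 0.001], [0.71, 0.29]],      # P(A|B,E), axes [b,e,a]
                [[0.06, 0.94], [0.05, 0.95]]])
P_J = np.array([[0.95, 0.05], [0.10, 0.90]])         # P(J|A), axes [a,j]
P_M = np.array([[0.99, 0.01], [0.30, 0.70]])         # P(M|A), axes [a,m]

# Messages toward the query variable J, in the order of the schedule
d_B = P_B                                            # δ_{B→ABE}
d_E = P_E                                            # δ_{E→ABE}
d_M = np.ones(2)                                     # δ_{M→AM} = λ_M = 1
d_ABE_A = np.einsum('bea,b,e->a', P_A, d_B, d_E)     # = P(A)
d_AM_A = np.einsum('am,m->a', P_M, d_M)              # = 1
d_A_AJ = np.ones(2) * d_ABE_A * d_AM_A               # = P(A)
d_AJ_J = np.einsum('aj,a->j', P_J, d_A_AJ)           # = P(J)
beta_J = np.ones(2) * d_AJ_J                         # β_J = λ_J · δ_{AJ→J}
print(beta_J)
```

A quick sanity check: the same marginal obtained by summing out the full joint in one einsum agrees with the message-passing result.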
Cluster Tree Calibration
• Run message passing until there are no more messages to send
• Then for all nodes, β_i = λ_i Π_{k ∈ Adj(i)} δ_{k→i} is the unconditional distribution over C_i
• Much faster than running a separate query for each node!
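On the small chain B, AB, A, calibration means sending messages in both directions and then reading off every belief. A numpy sketch with the same hypothetical CPT values as before:

```python
import numpy as np

# Calibrating the chain B - AB - A (hypothetical CPTs, binary variables)
lam_B = np.array([0.7, 0.3])                     # P(B)
lam_AB = np.array([[0.8, 0.2], [0.1, 0.9]])      # P(A|B), rows indexed by B
lam_A = np.array([1.0, 1.0])                     # λ_A = 1

# Downstream and upstream messages
d_B_AB = lam_B
d_AB_A = (lam_AB * d_B_AB[:, None]).sum(axis=0)   # = P(A)
d_A_AB = lam_A
d_AB_B = (lam_AB * d_A_AB[None, :]).sum(axis=1)   # = 1

# Beliefs: β_i = λ_i · product of incoming messages
beta_B = lam_B * d_AB_B                                 # = P(B)
beta_A = lam_A * d_AB_A                                 # = P(A)
beta_AB = lam_AB * d_B_AB[:, None] * d_A_AB[None, :]    # = P(A, B)
print(beta_B, beta_A)
print(beta_AB.sum())   # a proper joint: sums to 1
```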
Calibration
After the query pass for J, the tree holds messages δ_{B→ABE} = P(B), δ_{E→ABE} = P(E), δ_{ABE→A} = P(A), δ_{M→AM} = 1, δ_{AM→A} = 1, δ_{A→AJ} = P(A), δ_{AJ→J} = P(J), and belief β_J = P(J); the remaining edges are still uncalibrated. Sending the remaining messages:
• δ_{J→AJ} = λ_J = 1, so β_{AJ} = λ_{AJ} δ_{A→AJ} δ_{J→AJ} = P(J|A) P(A) · 1 = P(A,J)
• δ_{AJ→A} = Σ_J λ_{AJ} δ_{J→AJ} = 1, so β_A = λ_A δ_{ABE→A} δ_{AJ→A} δ_{AM→A} = P(A) · 1 · 1 = P(A)
• δ_{A→ABE} = λ_A δ_{AJ→A} δ_{AM→A} = 1, so β_{ABE} = λ_{ABE} δ_{B→ABE} δ_{E→ABE} δ_{A→ABE} = P(A|B,E) P(B) P(E) · 1 = P(A,B,E)
• δ_{A→AM} = λ_A δ_{ABE→A} δ_{AJ→A} = P(A), so β_{AM} = λ_{AM} δ_{A→AM} δ_{M→AM} = P(M|A) P(A) · 1 = P(M,A)
• δ_{AM→M} = Σ_A λ_{AM} δ_{A→AM} = Σ_A P(M|A) P(A) = P(M)
Incorporating Evidence?
[Figure: the same cluster tree, with evidence observed at J]
• To condition on an observed value J = j, replace λ_J by the indicator of J = j before passing messages; the rest of the algorithm is unchanged
Cluster Graphs & Belief Propagation
• Variables send "influence" to other variables through messages
• Information about C_i ∩ C_j divides the tree into conditionally independent pieces
• Exact inference when the cluster graph is a tree
• Any graph can be converted into a cluster tree through a VE ordering (a clique tree)
• But the factors may be large: what about non-trees?
Cluster Graphs With Loops
[Figure: a cluster graph with a loop, with clusters A, AB, AC, B, C, BCD, D and factor φ_{BCD} on cluster BCD]
• In a loop, no node ever becomes "ready," so continue as if we had the n−1 incoming messages…
• Do it again…
• Now send revised messages from A given the current messages
• And repeat…
• Does the product of messages into X approach P(X) as more iterations are performed?
Loopy Belief Propagation
• In many problems, yes!
• But one can construct problems where it doesn't
• Generalizes to other probabilistic graphical models (used in physics & materials science, computer vision, sensor networks, …)
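A minimal sketch of loopy BP iteration on the simplest loopy model, a 3-cycle of binary variables with pairwise potentials. The potentials and the variable-level (rather than cluster-level) formulation are my own illustrative choices, not from the slides:

```python
import numpy as np

psi = np.array([[2.0, 1.0], [1.0, 2.0]])          # ψ(x_i, x_j) on every edge
phi = {0: np.array([1.0, 3.0]),                   # unary bias toward X0 = 1
       1: np.ones(2), 2: np.ones(2)}
edges = [(0, 1), (1, 2), (2, 0)]
directed = edges + [(j, i) for i, j in edges]

# Messages m[(i, j)](x_j), initialized uniform; iterate synchronously
m = {e: np.ones(2) for e in directed}
for _ in range(100):
    new = {}
    for (i, j) in directed:
        prod = phi[i].copy()
        for (k, tgt) in directed:           # incoming messages, except from j
            if tgt == i and k != j:
                prod = prod * m[(k, tgt)]
        msg = psi.T @ prod                  # Σ_{x_i} ψ(x_i, x_j) · prod(x_i)
        new[(i, j)] = msg / msg.sum()       # normalize for stability
    m = new

# Approximate marginal ("belief") at X0
belief0 = phi[0].copy()
for (k, tgt) in directed:
    if tgt == 0:
        belief0 = belief0 * m[(k, tgt)]
belief0 /= belief0.sum()
print(belief0)   # biased toward X0 = 1
```

On this attractive model the iteration converges; in general, loopy BP's fixed point is only an approximation to the true marginal.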
Application to Error-Correcting Codes
• Send a 3-bit message A B C through a noisy channel (each bit arrives correctly with probability, say, p = 0.9)
• Add 3 checksum bits:
  • D = A xor B
  • E = B xor C
  • F = D xor E
• Observe the 6 received bits X1…X6
[Figure: A, B, C feed XOR gates computing D and E, which feed a third XOR computing F]
Application to Error-Correcting Codes
• What is the probability of A, B, C given X1…X6?
• Clever checksums + clever circuits that perform loopy BP = turbo codes
• Used widely in communications (3G, NASA missions, some WiFi standards)
• Closer to the Shannon limit than all prior practical codes!
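For this toy 3-bit code the posterior over messages can be computed exactly by brute-force enumeration (real decoders use loopy BP because enumeration is infeasible at scale). The observed bit string below is made up for illustration:

```python
import itertools

p = 0.9   # probability that each transmitted bit arrives unflipped

def codeword(a, b, c):
    """The 6 transmitted bits: the message plus three XOR checksums."""
    d = a ^ b
    e = b ^ c
    f = d ^ e
    return (a, b, c, d, e, f)

def posterior(observed):
    """P(A,B,C | X1..X6), assuming a uniform prior over the 8 messages
    and independent bit flips with probability 1 - p."""
    scores = {}
    for a, b, c in itertools.product([0, 1], repeat=3):
        like = 1.0
        for sent, seen in zip(codeword(a, b, c), observed):
            like *= p if sent == seen else 1 - p
        scores[(a, b, c)] = like
    z = sum(scores.values())
    return {msg: s / z for msg, s in scores.items()}

post = posterior((1, 0, 1, 1, 1, 0))       # noiseless codeword for (1, 0, 1)
print(max(post, key=post.get))             # most probable message
```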