CS b553: Algorithms for Optimization and Learning

Learn about message passing and belief propagation: how to interpret variable elimination (VE) steps as passing messages along a graph, exact and approximate inference, and cluster graphs.

ljenkins

Presentation Transcript


  1. CS b553: Algorithms for Optimization and Learning
     Message Passing / Belief Propagation

  2. Message Passing / Belief Propagation
     • Interpretation of variable elimination (VE) steps as passing “messages” along a graph
     • Exact for polytrees
     • Arbitrary graphs → clique trees
     • Loopy belief propagation
     • Approximate inference

  3. Cluster Graphs
     • An undirected graph G
     • Each node i contains a scope $C_i \subseteq X$
     • Each factor $\phi$ in the BN has $\mathrm{Scope}[\phi] \subseteq C_i$ for some $C_i$
     • Two adjacent nodes have a non-empty intersection
     • Running intersection property: for each variable X, the set of all nodes whose $C_i$ contains X forms a connected path in G
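These conditions can be checked mechanically. The sketch below uses a hypothetical helper, `is_cluster_tree_valid` (our name, not from the slides), and reads "connected path" as "the clusters containing X form a connected subgraph of G". The example clusters and edges (B-ABE, E-ABE, ABE-A, A-AM, AM-M, A-AJ, AJ-J) are our reading of the branching cluster tree used later in the lecture.

```python
def is_cluster_tree_valid(clusters, edges, factor_scopes):
    """clusters: dict name -> set of variables; edges: list of (name, name) pairs;
    factor_scopes: list of variable sets, one per BN factor."""
    # Every factor's scope must fit inside some cluster
    if not all(any(s <= c for c in clusters.values()) for s in factor_scopes):
        return False
    # Adjacent clusters must share at least one variable
    if not all(clusters[i] & clusters[j] for i, j in edges):
        return False
    # Running intersection: the clusters containing X must be connected
    # through edges whose two endpoints both contain X
    for x in set().union(*clusters.values()):
        holders = {n for n, c in clusters.items() if x in c}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for i, j in edges:
                for a, b in ((i, j), (j, i)):
                    if a == u and b in holders and b not in seen:
                        seen.add(b)
                        stack.append(b)
        if seen != holders:
            return False
    return True

# Assumed branching cluster tree from the alarm-style example
clusters = {"B": {"B"}, "E": {"E"}, "ABE": {"A", "B", "E"}, "A": {"A"},
            "AM": {"A", "M"}, "M": {"M"}, "AJ": {"A", "J"}, "J": {"J"}}
edges = [("B", "ABE"), ("E", "ABE"), ("ABE", "A"), ("A", "AM"),
         ("AM", "M"), ("A", "AJ"), ("AJ", "J")]
scopes = [{"B"}, {"E"}, {"A", "B", "E"}, {"A", "M"}, {"A", "J"}]  # P(B), P(E), P(A|B,E), P(M|A), P(J|A)
ok = is_cluster_tree_valid(clusters, edges, scopes)                # True

# The chain AB - BC - AC violates running intersection: A is at both ends but not in the middle
bad = is_cluster_tree_valid({"AB": {"A", "B"}, "BC": {"B", "C"}, "AC": {"A", "C"}},
                            [("AB", "BC"), ("BC", "AC")], [])      # False
```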

  4. Cluster Trees
     [Figure: Bayesian network B → A with CPDs P(B) and P(A|B)]

  5. Cluster Trees
     [Figure: cluster tree with clusters {B}, {A,B}, {A} connected in a chain B - AB - A]

  6. Message Passing Interpretation of VE
     [Figure: chain B - AB - A with cluster factors $\psi_B = P(B)$, $\psi_{AB} = P(A \mid B)$, $\psi_A = 1$; A is the query variable]

  7. Cluster Trees & Belief Propagation
     B sends the “message” $\delta_{B \to AB} = \psi_B$.

  8. Cluster Graphs & Belief Propagation
     AB computes $\delta_{AB \to A} = \sum_B \psi_{AB}\, \delta_{B \to AB}$ and sends it to A.

  9. Cluster Graphs & Belief Propagation
     A computes $b_A = \psi_A\, \delta_{AB \to A}$. Done.
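The three steps of slides 7 to 9 can be sketched numerically. This assumes binary variables (index 0 = false, 1 = true) and made-up CPT values: P(B=1) = 0.3, P(A=1|B=0) = 0.1, P(A=1|B=1) = 0.8. The belief at A matches the VE result $\sum_B P(B) P(A \mid B)$.

```python
# Hypothetical CPTs for the chain B -> A; index 0 = false, 1 = true
psi_B = [0.7, 0.3]                  # psi_B = P(B)
psi_AB = [[0.9, 0.1],               # psi_AB[b][a] = P(A=a | B=b)
          [0.2, 0.8]]
psi_A = [1.0, 1.0]                  # no factor assigned to A, so psi_A = 1

# B is a leaf: its message is just its own factor
delta_B_to_AB = psi_B
# AB multiplies in the incoming message and sums out B (B is not in the sepset {A})
delta_AB_to_A = [sum(psi_AB[b][a] * delta_B_to_AB[b] for b in (0, 1)) for a in (0, 1)]
# Belief at the query node A
b_A = [psi_A[a] * delta_AB_to_A[a] for a in (0, 1)]
print(b_A)  # approximately [0.69, 0.31], i.e. P(A)
```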

  10. Passing Up-Stream
     [Figure: the chain B - AB - A from slide 6, now with B as the query variable]

  11. Passing Up-Stream B=P(B) B A,B AB=P(A|B) A A=1 Sends message dA->AB = A

  12. Passing Up-Stream B=P(B) B Computes dAB->B = SAAB dA->AB (= 1) A,B AB=P(A|B) A A=1 Sends message dA->AB = A

  13. Passing Up-Stream
     B computes $b_B = \psi_B\, \delta_{AB \to B}$ (= $P(B)$).
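The upstream pass toward B, sketched with the same hypothetical numbers (P(B=1) = 0.3, P(A=1|B=0) = 0.1, P(A=1|B=1) = 0.8). The message out of AB is all ones because each row of a CPT sums to 1, so $b_B$ reduces to $P(B)$.

```python
# Hypothetical CPTs for the chain B -> A; index 0 = false, 1 = true
psi_B = [0.7, 0.3]                      # P(B)
psi_AB = [[0.9, 0.1], [0.2, 0.8]]       # psi_AB[b][a] = P(A=a | B=b)
psi_A = [1.0, 1.0]

# A is now a leaf sending toward the query B
delta_A_to_AB = psi_A
# AB sums out A; each CPT row sums to 1, so this message is [1, 1]
delta_AB_to_B = [sum(psi_AB[b][a] * delta_A_to_AB[a] for a in (0, 1)) for b in (0, 1)]
# Belief at the query node B
b_B = [psi_B[b] * delta_AB_to_B[b] for b in (0, 1)]
print(b_B)  # approximately [0.7, 0.3], i.e. P(B)
```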

  14. Message Passing Rules in Cluster Trees
     • Init:
        • Each node $C_i$ contains a factor $\psi_i = \prod \phi$, where the product is taken over the factors assigned to $C_i$
        • Each directed edge maintains a message $\delta_{i \to j}$ (initially nil)
     • Repeat while some message into the query variable is nil:
        • Pick a node $C_i$ that is “ready” to send to $C_j$: it has received messages from all neighbors except $C_j$
        • Compute and send the message $\delta_{i \to j}$
     • ComputeMessage(i, j):
        • Let $S_{i,j} = C_i \cap C_j$
        • Return $\sum_{C_i - S_{i,j}} \psi_i \prod_{k \in \mathrm{Adj}(i) - \{j\}} \delta_{k \to i}$
     [Figure: neighbors k1, k2, k3 send messages into node i, which sends $\delta_{i \to j}$ to node j]
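A direct transcription of these rules, as a sketch. Factors are assumed to be `(variables, table)` pairs over binary variables, with tables keyed by assignment tuples; `multiply_and_sum_out`, `compute_message`, and `query` are hypothetical helper names. The scheduling loop sends any ready message, so it may compute a few messages pointing away from the query; that is wasteful but does not change the answer.

```python
from itertools import product

def multiply_and_sum_out(factors, keep):
    """Multiply (vars, table) factors over binary variables, then sum out every
    variable not in `keep`. Tables map assignment tuples to numbers."""
    all_vars = []
    for vs, _ in factors:
        all_vars += [v for v in vs if v not in all_vars]
    out = {}
    for values in product((0, 1), repeat=len(all_vars)):
        env = dict(zip(all_vars, values))
        val = 1.0
        for vs, table in factors:
            val *= table[tuple(env[v] for v in vs)]
        key = tuple(env[v] for v in keep)
        out[key] = out.get(key, 0.0) + val
    return tuple(keep), out

def compute_message(i, j, clusters, psi, adj, msgs):
    """delta_{i->j}: sum over C_i - S_ij of psi_i times the messages from Adj(i) - {j}."""
    sepset = [v for v in clusters[i] if v in clusters[j]]      # S_ij = C_i ∩ C_j
    incoming = [msgs[(k, i)] for k in adj[i] if k != j]
    return multiply_and_sum_out([psi[i]] + incoming, sepset)

def query(q, clusters, psi, adj):
    """Send ready messages until every message into q exists; return q's belief."""
    msgs = {}                                                  # a missing key is a nil message
    while any((k, q) not in msgs for k in adj[q]):
        for i in adj:
            for j in adj[i]:
                ready = all((k, i) in msgs for k in adj[i] if k != j)
                if ready and (i, j) not in msgs:
                    msgs[(i, j)] = compute_message(i, j, clusters, psi, adj, msgs)
    return multiply_and_sum_out([psi[q]] + [msgs[(k, q)] for k in adj[q]], clusters[q])

# Chain B - AB - A with hypothetical CPTs; each psi is defined over its whole cluster
clusters = {"B": ("B",), "AB": ("A", "B"), "A": ("A",)}
adj = {"B": ["AB"], "AB": ["B", "A"], "A": ["AB"]}
psi = {
    "B": (("B",), {(0,): 0.7, (1,): 0.3}),                                      # P(B)
    "AB": (("A", "B"), {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8}),  # P(A | B), keys (a, b)
    "A": (("A",), {(0,): 1.0, (1,): 1.0}),                                      # no factor: all ones
}
_, belief_A = query("A", clusters, psi, adj)   # approximately {(0,): 0.69, (1,): 0.31}
```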

  15. Branching Cluster Tree
     [Figure: cluster tree with edges B - ABE, E - ABE, ABE - A, A - AM, AM - M, A - AJ, AJ - J]

  16. Branching Cluster Tree
     Query variable: J

  17. Branching Cluster Tree
     B sends $\delta_{B \to ABE} = \psi_B$ (= $P(B)$).

  19. Branching Cluster Tree
     E sends $\delta_{E \to ABE} = \psi_E$ (= $P(E)$).

  21. Branching Cluster Tree
     M sends $\delta_{M \to AM} = \psi_M$ (= 1).

  23. Branching Cluster Tree
     ABE sends $\delta_{ABE \to A} = \sum_{B,E} \psi_{ABE}\, \delta_{B \to ABE}\, \delta_{E \to ABE}$ (= $P(A)$).

  25. Branching Cluster Tree
     AM sends $\delta_{AM \to A} = \sum_M \psi_{AM}\, \delta_{M \to AM}$ (= $\sum_M P(M \mid A) = 1$).

  26. Branching Cluster Tree
     A sends $\delta_{A \to AJ} = \psi_A\, \delta_{ABE \to A}\, \delta_{AM \to A}$ (= $P(A)$).

  27. Branching Cluster Tree
     AJ sends $\delta_{AJ \to J} = \sum_A \psi_{AJ}\, \delta_{A \to AJ}$ (= $\sum_A P(J \mid A) P(A) = P(J)$).
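The whole upstream pass toward J, sketched with binary variables and hypothetical CPT numbers (not from the slides). Each line mirrors one message on slides 17 to 27, and the final belief can be checked against brute-force summation of the joint distribution.

```python
from itertools import product

bits = (0, 1)
# Hypothetical CPTs; index 0 = false, 1 = true
P_B = [0.99, 0.01]
P_E = [0.98, 0.02]
pA1 = {(0, 0): 0.001, (0, 1): 0.29, (1, 0): 0.94, (1, 1): 0.95}   # P(A=1 | B=b, E=e)
P_J = [[0.95, 0.05], [0.10, 0.90]]                                 # P_J[a][j] = P(J=j | A=a)
P_M = [[0.99, 0.01], [0.30, 0.70]]                                 # P_M[a][m] = P(M=m | A=a)

def P_A(a, b, e):
    return pA1[(b, e)] if a == 1 else 1 - pA1[(b, e)]

# Leaf messages: each leaf sends its own factor
d_B_ABE = P_B                      # = P(B)
d_E_ABE = P_E                      # = P(E)
d_M_AM = [1.0, 1.0]                # psi_M = 1

# Interior messages, in the order used on the slides
d_ABE_A = [sum(P_A(a, b, e) * d_B_ABE[b] * d_E_ABE[e] for b, e in product(bits, bits))
           for a in bits]                                            # = P(A)
d_AM_A = [sum(P_M[a][m] * d_M_AM[m] for m in bits) for a in bits]   # = 1
d_A_AJ = [1.0 * d_ABE_A[a] * d_AM_A[a] for a in bits]               # psi_A = 1, so = P(A)
d_AJ_J = [sum(P_J[a][j] * d_A_AJ[a] for a in bits) for j in bits]   # = P(J)
b_J = [1.0 * d_AJ_J[j] for j in bits]                               # psi_J = 1
```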

  28. Cluster Tree Calibration
     • Run message passing until there are no more messages to send
     • For every node: $b_i = \psi_i \prod_{k \in \mathrm{Adj}(i)} \delta_{k \to i}$ is the unconditional distribution over $C_i$
     • Much faster than running a separate query for each node!
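Calibration amounts to sending one message on every directed edge, $2(n-1)$ messages for a tree with n nodes, instead of a full pass per query. In this sketch, factors are `(variables, table)` pairs over binary variables, `calibrate` and `multiply_and_sum_out` are hypothetical names, the tree is our reading of the branching example, and the CPT values are made up.

```python
from itertools import product

def multiply_and_sum_out(factors, keep):
    """Multiply (vars, table) factors over binary variables; sum out vars not in `keep`."""
    all_vars = []
    for vs, _ in factors:
        all_vars += [v for v in vs if v not in all_vars]
    out = {}
    for values in product((0, 1), repeat=len(all_vars)):
        env = dict(zip(all_vars, values))
        val = 1.0
        for vs, table in factors:
            val *= table[tuple(env[v] for v in vs)]
        key = tuple(env[v] for v in keep)
        out[key] = out.get(key, 0.0) + val
    return tuple(keep), out

def calibrate(clusters, psi, adj):
    """Send delta_{i->j} on every directed edge once i has heard from all neighbors
    except j, then return b_i = psi_i * (product of all incoming messages)."""
    msgs = {}
    pending = [(i, j) for i in adj for j in adj[i]]
    while pending:
        for i, j in list(pending):
            if all((k, i) in msgs for k in adj[i] if k != j):
                sepset = [v for v in clusters[i] if v in clusters[j]]
                incoming = [msgs[(k, i)] for k in adj[i] if k != j]
                msgs[(i, j)] = multiply_and_sum_out([psi[i]] + incoming, sepset)
                pending.remove((i, j))
    return {i: multiply_and_sum_out([psi[i]] + [msgs[(k, i)] for k in adj[i]], clusters[i])[1]
            for i in adj}

def bern(p1, x):
    return p1 if x == 1 else 1 - p1

# Hypothetical CPTs for the branching example
pA1 = {(0, 0): 0.001, (0, 1): 0.29, (1, 0): 0.94, (1, 1): 0.95}   # P(A=1 | B=b, E=e)
pJ1 = {0: 0.05, 1: 0.90}                                           # P(J=1 | A=a)
pM1 = {0: 0.01, 1: 0.70}                                           # P(M=1 | A=a)
one = {(0,): 1.0, (1,): 1.0}
clusters = {"B": ("B",), "E": ("E",), "ABE": ("A", "B", "E"), "A": ("A",),
            "AM": ("A", "M"), "M": ("M",), "AJ": ("A", "J"), "J": ("J",)}
edges = [("B", "ABE"), ("E", "ABE"), ("ABE", "A"), ("A", "AM"),
         ("AM", "M"), ("A", "AJ"), ("AJ", "J")]
adj = {n: [] for n in clusters}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)
psi = {
    "B": (("B",), {(b,): bern(0.01, b) for b in (0, 1)}),
    "E": (("E",), {(e,): bern(0.02, e) for e in (0, 1)}),
    "ABE": (("A", "B", "E"), {(a, b, e): bern(pA1[(b, e)], a) for a, b, e in product((0, 1), repeat=3)}),
    "AM": (("A", "M"), {(a, m): bern(pM1[a], m) for a, m in product((0, 1), repeat=2)}),
    "AJ": (("A", "J"), {(a, j): bern(pJ1[a], j) for a, j in product((0, 1), repeat=2)}),
    "A": (("A",), one), "M": (("M",), one), "J": (("J",), one),
}
beliefs = calibrate(clusters, psi, adj)   # beliefs["AJ"] is P(A, J), beliefs["ABE"] is P(A, B, E), ...
```

With no evidence, every belief already sums to 1 and equals the corresponding marginal of the joint.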

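  29. Calibration
     [Figure legend: uncalibrated vs. calibrated nodes]
     After the upstream pass: $\delta_{B \to ABE} = P(B)$, $\delta_{E \to ABE} = P(E)$, $\delta_{ABE \to A} = P(A)$, $\delta_{M \to AM} = 1$, $\delta_{AM \to A} = 1$, $\delta_{A \to AJ} = P(A)$, $\delta_{AJ \to J} = P(J)$
     J is calibrated: $b_J = P(J)$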

  30. Calibration
     J sends $\delta_{J \to AJ} = \psi_J = 1$.

  31. Calibration
     $b_{AJ} = \psi_{AJ}\, \delta_{A \to AJ}\, \delta_{J \to AJ} = P(J \mid A) P(A) \cdot 1 = P(A,J)$

  32. Calibration
     AJ sends $\delta_{AJ \to A} = \sum_J \psi_{AJ}\, \delta_{J \to AJ}$ (= 1).

  33. Calibration
     $b_A = \psi_A\, \delta_{ABE \to A}\, \delta_{AM \to A}\, \delta_{AJ \to A} = P(A) \cdot 1 \cdot 1 = P(A)$

  34. Calibration
     A sends $\delta_{A \to ABE} = \psi_A\, \delta_{AM \to A}\, \delta_{AJ \to A} = 1$.

  35. Calibration
     $b_{ABE} = \psi_{ABE}\, \delta_{B \to ABE}\, \delta_{E \to ABE}\, \delta_{A \to ABE} = P(A \mid B,E) P(E) P(B) \cdot 1 = P(A,B,E)$

  36. Calibration
     A sends $\delta_{A \to AM} = \psi_A\, \delta_{ABE \to A}\, \delta_{AJ \to A} = P(A)$.

  37. Calibration
     $b_{AM} = \psi_{AM}\, \delta_{A \to AM}\, \delta_{M \to AM} = P(M \mid A) P(A) \cdot 1 = P(M,A)$

  38. Calibration
     AM sends $\delta_{AM \to M} = \sum_A \psi_{AM}\, \delta_{A \to AM} = \sum_A P(M \mid A) P(A) = P(M)$.

  39. Incorporating Evidence?
     [Figure: the branching cluster tree, with evidence observed on J]
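The slide leaves this as a question. One standard answer, stated here as an assumption rather than as the lecture's own method, is to multiply an indicator factor for the observed value into a cluster containing J, run the same messages, and normalize the resulting belief. A sketch for $P(B \mid J = 1)$ with hypothetical CPT numbers:

```python
from itertools import product

bits = (0, 1)
# Hypothetical CPTs; index 0 = false, 1 = true
P_B = [0.99, 0.01]
P_E = [0.98, 0.02]
pA1 = {(0, 0): 0.001, (0, 1): 0.29, (1, 0): 0.94, (1, 1): 0.95}   # P(A=1 | B=b, E=e)
P_J = [[0.95, 0.05], [0.10, 0.90]]                                 # P_J[a][j] = P(J=j | A=a)
P_M = [[0.99, 0.01], [0.30, 0.70]]                                 # P_M[a][m] = P(M=m | A=a)

def P_A(a, b, e):
    return pA1[(b, e)] if a == 1 else 1 - pA1[(b, e)]

# Evidence J = 1 enters as an indicator factor at node J (psi_J was all ones before)
psi_J = [0.0, 1.0]

# Messages flowing toward the query node B
d_J_AJ = psi_J
d_AJ_A = [sum(P_J[a][j] * d_J_AJ[j] for j in bits) for a in bits]     # = P(J=1 | A)
d_M_AM = [1.0, 1.0]
d_AM_A = [sum(P_M[a][m] * d_M_AM[m] for m in bits) for a in bits]     # = 1
d_A_ABE = [d_AJ_A[a] * d_AM_A[a] for a in bits]                       # psi_A = 1
d_E_ABE = P_E
d_ABE_B = [sum(P_A(a, b, e) * d_E_ABE[e] * d_A_ABE[a] for a, e in product(bits, bits))
           for b in bits]
b_B = [P_B[b] * d_ABE_B[b] for b in bits]          # = P(B, J=1), unnormalized
P_B_given_J1 = [x / sum(b_B) for x in b_B]         # normalize to get P(B | J=1)
```

The beliefs no longer sum to 1 after evidence is added; their total is $P(J = 1)$, which is why the last step normalizes.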

  40. Cluster Graphs & Belief Propagation
     • Variables send “influence” to other variables through messages
     • Information about $C_i \cap C_j$ divides the tree into conditionally independent pieces
     • Inference is exact when the cluster graph is a tree
     • Any graph can be converted into a cluster tree (a clique tree) using a VE ordering
     • But the factors may be large: what about non-trees?
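The VE-ordering conversion can be sketched by running VE symbolically: each elimination step creates a cluster (the union of the scopes it multiplies) and links it to the clusters that produced the factors it consumes. `clique_tree_from_ordering` is a hypothetical helper; the example uses the factor scopes of the branching example with the ordering M, J, E, B, A.

```python
def clique_tree_from_ordering(scopes, order):
    """Symbolic VE: returns (cliques, edges), where edges link the clique that created
    an intermediate factor to the clique that later consumes it."""
    factors = [(frozenset(s), None) for s in scopes]   # (scope, index of producing clique)
    cliques, edges = [], []
    for x in order:
        used = [f for f in factors if x in f[0]]
        factors = [f for f in factors if x not in f[0]]
        clique = frozenset().union(*(s for s, _ in used))
        k = len(cliques)
        cliques.append(clique)
        edges += [(p, k) for _, p in used if p is not None]
        factors.append((clique - {x}, k))               # the new factor after summing out x
    return cliques, edges

scopes = [{"B"}, {"E"}, {"A", "B", "E"}, {"A", "J"}, {"A", "M"}]
cliques, edges = clique_tree_from_ordering(scopes, ["M", "J", "E", "B", "A"])
# cliques: {A,M}, {A,J}, {A,B,E}, {A,B}, {A}
# edges:   (2, 3), (0, 4), (1, 4), (3, 4)
```

The resulting clusters need not be maximal ({A,B} sits inside {A,B,E}); such a cluster can be merged into a neighbor that contains it without breaking the running intersection property.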

  41. Cluster Graphs With Loops
     [Figure: cluster graph containing a loop, over clusters A, AB, AC, B, C, BCD, D]

  43. Cluster Graphs With Loops
     Continue as if we had n-1 messages…

  44. Cluster Graphs With Loops
     Do it again…

  45. Cluster Graphs With Loops
     Now send revised messages from A given the current messages.

  46. Cluster Graphs With Loops
     And repeat…
     Does the product of messages into X approach P(X) as more iterations are performed?

  47. Loopy Belief Propagation
     • In many problems, yes!
     • But one can construct problems where it doesn’t
     • Generalizes to other probabilistic graphical models (used in physics and materials science, computer vision, sensor networks, …)
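A sketch of loopy BP on the cluster graph of slide 41. The edges (A-AB, A-AC, AB-B, AC-C, B-BCD, C-BCD, BCD-D), the underlying network A → B, A → C, (B, C) → D, and all CPT numbers are assumptions chosen for illustration. Messages start at 1 and are all recomputed each round. Here the beliefs at A and B come out exact, but the belief at D does not: the loop makes BP treat B and C as independent even though both depend on A.

```python
from itertools import product

def multiply_and_sum_out(factors, keep):
    """Multiply (vars, table) factors over binary variables; sum out vars not in `keep`."""
    all_vars = []
    for vs, _ in factors:
        all_vars += [v for v in vs if v not in all_vars]
    out = {}
    for values in product((0, 1), repeat=len(all_vars)):
        env = dict(zip(all_vars, values))
        val = 1.0
        for vs, table in factors:
            val *= table[tuple(env[v] for v in vs)]
        key = tuple(env[v] for v in keep)
        out[key] = out.get(key, 0.0) + val
    return tuple(keep), out

def normalize(factor):
    vs, table = factor
    z = sum(table.values())
    return vs, {k: v / z for k, v in table.items()}

def loopy_bp(clusters, psi, adj, iters=50):
    """Synchronous loopy BP: every message starts at 1; each round recomputes all
    messages from the previous round's messages, normalized to avoid underflow."""
    sep = {(i, j): [v for v in clusters[i] if v in clusters[j]] for i in adj for j in adj[i]}
    msgs = {e: (tuple(s), {k: 1.0 for k in product((0, 1), repeat=len(s))}) for e, s in sep.items()}
    for _ in range(iters):
        msgs = {(i, j): normalize(multiply_and_sum_out(
                    [psi[i]] + [msgs[(k, i)] for k in adj[i] if k != j], sep[(i, j)]))
                for (i, j) in sep}
    return {i: normalize(multiply_and_sum_out(
                [psi[i]] + [msgs[(k, i)] for k in adj[i]], clusters[i]))[1]
            for i in adj}

def bern(p1, x):
    return p1 if x == 1 else 1 - p1

def p_d1(b, c):                       # hypothetical P(D=1 | B=b, C=c): D is likely 1 when B != C
    return 0.9 if b != c else 0.1

one = {(0,): 1.0, (1,): 1.0}
clusters = {"A": ("A",), "AB": ("A", "B"), "AC": ("A", "C"), "B": ("B",),
            "C": ("C",), "BCD": ("B", "C", "D"), "D": ("D",)}
edges = [("A", "AB"), ("A", "AC"), ("AB", "B"), ("AC", "C"),
         ("B", "BCD"), ("C", "BCD"), ("BCD", "D")]
adj = {n: [] for n in clusters}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)
pairs = list(product((0, 1), repeat=2))
psi = {
    "A": (("A",), {(a,): bern(0.3, a) for a in (0, 1)}),
    "AB": (("A", "B"), {(a, b): bern(0.9 if a else 0.1, b) for a, b in pairs}),
    "AC": (("A", "C"), {(a, c): bern(0.9 if a else 0.1, c) for a, c in pairs}),
    "BCD": (("B", "C", "D"), {(b, c, d): bern(p_d1(b, c), d) for b, c, d in product((0, 1), repeat=3)}),
    "B": (("B",), one), "C": (("C",), one), "D": (("D",), one),
}
beliefs = loopy_bp(clusters, psi, adj)
# Exact P(D=1) by enumerating the joint, for comparison
exact_d1 = sum(bern(0.3, a) * bern(0.9 if a else 0.1, b) * bern(0.9 if a else 0.1, c) * p_d1(b, c)
               for a, b, c in product((0, 1), repeat=3))
print(beliefs["D"][(1,)], exact_d1)   # loopy BP gives about 0.459; the exact value is 0.244
```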

  48. Application to Error-Correcting Codes
     • Send a 3-bit message ABC through a noisy channel (say p = 0.9)
     • 3 checksums:
        • D = A xor B
        • E = B xor C
        • F = D xor E
     • Observe 6 bits X1…X6
     [Figure: XOR circuit computing D, E, F from A, B, C; the six transmitted bits are observed as X1…X6]

  49. Application to Error-Correcting Codes
     • Probability of A, B, C?
     • Clever checksums + clever circuits that perform loopy BP = turbo codes
     • Used widely in communications (3G, NASA, some Wi-Fi standards)
     • Closer to the Shannon limit than all prior codes!
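For a 3-bit message, the question "Probability of A, B, C?" can be answered exactly by enumerating all 8 messages, which is what this sketch does. It is not loopy BP, and enumeration grows exponentially with message length. It assumes p = 0.9 is the probability that each bit arrives unflipped, a uniform prior over ABC, and that X1…X6 are the received copies of A, B, C, D, E, F in that order; `encode` and `posterior` are hypothetical names.

```python
from itertools import product

P_CORRECT = 0.9   # assumption: "p = 0.9" read as the chance each bit arrives unflipped

def encode(a, b, c):
    """Append the three XOR checksums from the slide to the message bits."""
    d = a ^ b
    e = b ^ c
    f = d ^ e
    return (a, b, c, d, e, f)

def posterior(x):
    """Exact P(A, B, C | X1..X6 = x) by enumerating all 8 messages, assuming a uniform
    prior on ABC and independent bit flips."""
    weights = {}
    for abc in product((0, 1), repeat=3):
        w = 1.0
        for sent, received in zip(encode(*abc), x):
            w *= P_CORRECT if sent == received else 1 - P_CORRECT
        weights[abc] = w
    z = sum(weights.values())
    return {abc: w / z for abc, w in weights.items()}

# Send 101, whose codeword is (1, 0, 1, 1, 1, 0), and flip the second bit in transit
post = posterior((1, 1, 1, 1, 1, 0))
best = max(post, key=post.get)   # (1, 0, 1): the single flipped bit is corrected
```

Every pair of distinct codewords here differs in at least 3 positions, so any single flipped bit leaves the sent message as the most probable one.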

  50. Have a good spring break!
