Gibbs sampling for motif finding

Presentation Transcript


  1. Gibbs sampling for motif finding Yves Moreau

  2. Overview • Markov Chain Monte Carlo • Gibbs sampling • Motif finding in cis-regulatory DNA • Biclustering microarray data

  3. Markov Chain Monte Carlo • Markov chain with transition matrix T (row: current state, column: next state)

          A       C       G       T
     A  0.0643  0.8268  0.0659  0.0430
     C  0.0598  0.0484  0.8515  0.0403
     G  0.1602  0.3407  0.1736  0.3255
     T  0.1507  0.1608  0.3654  0.3231

     [Figure: state diagram with nodes X=A, X=C, X=G, X=T]
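As a minimal sketch of how such a chain generates sequences, the snippet below draws each next nucleotide from the row of T that belongs to the current nucleotide. The function name `sample_chain`, the start state, and the seed are illustrative assumptions, not part of the slides.

```python
import random

STATES = "ACGT"
# Transition matrix from slide 3: T[current][j] = P(next = STATES[j] | current).
T = {
    "A": [0.0643, 0.8268, 0.0659, 0.0430],
    "C": [0.0598, 0.0484, 0.8515, 0.0403],
    "G": [0.1602, 0.3407, 0.1736, 0.3255],
    "T": [0.1507, 0.1608, 0.3654, 0.3231],
}

def sample_chain(start, length, rng):
    """Generate `length` symbols, beginning with `start` (hypothetical helper)."""
    seq = [start]
    while len(seq) < length:
        # Draw the next state using the row of the current state as weights.
        seq.append(rng.choices(STATES, weights=T[seq[-1]])[0])
    return "".join(seq)

rng = random.Random(0)  # fixed seed only for reproducibility
for _ in range(3):
    print(sample_chain("A", 20, rng))
```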

  4. Markov Chain Monte Carlo • Markov chains can sample from complex distributions

     ACGCGGTGTGCGTTTGACGA
     ACGGTTACGCGACGTTTGGT
     ACGTGCGGTGTACGTGTACG
     ACGGAGTTTGCGGGACGCGT
     ACGCGCGTGACGTACGCGTG
     AGACGCGTGCGCGCGGACGC
     ACGGGCGTGCGCGCGTCGCG
     AACGCGTTTGTGTTCGGTGC
     ACCGCGTTTGACGTCGGTTC
     ACGTGACGCGTAGTTCGACG
     ACGTGACACGGACGTACGCG
     ACCGTACTCGCGTTGACACG
     ATACGGCGCGGCGGGCGCGG
     ACGTACGCGTACACGCGGGA
     ACGCGCGTGTTTACGACGTG
     ACGTCGCACGCGTCGGTGTG
     ACGGCGGTCGGTACACGTCG
     ACGTTGCGACGTGCGTGCTG
     ACGGAACGACGACGCGACGC
     ACGGCGTGTTCGCGGTGCGG

     [Figure: % of A, C, G, T at each position]
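The per-position percentages that the figure on this slide appears to plot can be recomputed from the listed sequences. The sketch below is one way to do that; the helper name `position_frequencies` is an assumption, and the exact plot layout of the original figure is not recoverable from the transcript.

```python
# The 20 sequences listed on slide 4.
SEQUENCES = """
ACGCGGTGTGCGTTTGACGA ACGGTTACGCGACGTTTGGT ACGTGCGGTGTACGTGTACG
ACGGAGTTTGCGGGACGCGT ACGCGCGTGACGTACGCGTG AGACGCGTGCGCGCGGACGC
ACGGGCGTGCGCGCGTCGCG AACGCGTTTGTGTTCGGTGC ACCGCGTTTGACGTCGGTTC
ACGTGACGCGTAGTTCGACG ACGTGACACGGACGTACGCG ACCGTACTCGCGTTGACACG
ATACGGCGCGGCGGGCGCGG ACGTACGCGTACACGCGGGA ACGCGCGTGTTTACGACGTG
ACGTCGCACGCGTCGGTGTG ACGGCGGTCGGTACACGTCG ACGTTGCGACGTGCGTGCTG
ACGGAACGACGACGCGACGC ACGGCGTGTTCGCGGTGCGG
""".split()

def position_frequencies(seqs, alphabet="ACGT"):
    """Fraction of each nucleotide at every position (hypothetical helper)."""
    freqs = []
    for pos in range(len(seqs[0])):
        column = [seq[pos] for seq in seqs]
        freqs.append({base: column.count(base) / len(seqs) for base in alphabet})
    return freqs

freqs = position_frequencies(SEQUENCES)
for pos, row in enumerate(freqs, start=1):
    print(pos, " ".join(f"{base}={100 * row[base]:.0f}%" for base in "ACGT"))
```

Every listed sequence starts with A, so the first position is 100% A; later positions spread out as the chain forgets its starting state.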

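  5. Markov Chain Monte Carlo • Let us look at the transition after two steps: $P(X_{t+2}=j \mid X_t=i) = \sum_k T_{ik} T_{kj} = (T^2)_{ij}$ • Similarly, after n steps: $P(X_{t+n}=j \mid X_t=i) = (T^n)_{ij}$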
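Multi-step transition probabilities of a Markov chain are given by powers of the transition matrix. The sketch below computes the two-step matrix and a high power of the slide-3 matrix in plain Python; `matmul` and `matpow` are illustrative helper names, and the choice of 100 steps is arbitrary.

```python
# Transition matrix from slide 3, states ordered A, C, G, T (rows sum to 1).
T = [
    [0.0643, 0.8268, 0.0659, 0.0430],
    [0.0598, 0.0484, 0.8515, 0.0403],
    [0.1602, 0.3407, 0.1736, 0.3255],
    [0.1507, 0.1608, 0.3654, 0.3231],
]

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(t, n):
    """n-th power of t by repeated multiplication (fine for small n)."""
    result = [[1.0 if i == j else 0.0 for j in range(len(t))]
              for i in range(len(t))]
    for _ in range(n):
        result = matmul(result, t)
    return result

T2 = matpow(T, 2)      # entry [i][j]: probability of going from i to j in two steps
T100 = matpow(T, 100)  # after many steps the rows become nearly identical

for row in T2:
    print(" ".join(f"{x:.4f}" for x in row))
for row in T100:
    print(" ".join(f"{x:.4f}" for x in row))
```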

  6. Markov Chain Monte Carlo • Stationary distribution p • If the samples are generated according to the distribution p, the samples at the next step will also be generated according to p • p is a left eigenvector of T with eigenvalue 1: $pT = p$ • Equilibrium distribution • Rows of $T^\infty = \lim_{n \to \infty} T^n$ are stationary distributions • From an arbitrary initial condition and after a sufficient number of steps (burn-in), the successive states of the Markov chain are samples from the stationary distribution
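One simple way to approximate the stationary distribution is to push an arbitrary starting distribution through the chain repeatedly (power iteration) and then check that one more step leaves it unchanged. The sketch below does this for the slide-3 matrix; the helper names and the number of steps are assumptions chosen for illustration.

```python
# Transition matrix from slide 3, states ordered A, C, G, T.
T = [
    [0.0643, 0.8268, 0.0659, 0.0430],
    [0.0598, 0.0484, 0.8515, 0.0403],
    [0.1602, 0.3407, 0.1736, 0.3255],
    [0.1507, 0.1608, 0.3654, 0.3231],
]

def step(p, t):
    """One step of the chain applied to a distribution: the row vector p T."""
    n = len(p)
    return [sum(p[i] * t[i][j] for i in range(n)) for j in range(n)]

def stationary(t, n_steps=200):
    """Approximate the stationary distribution by power iteration."""
    p = [1.0] + [0.0] * (len(t) - 1)  # arbitrary start: all mass on the first state
    for _ in range(n_steps):          # these steps play the role of burn-in
        p = step(p, t)
    return p

p = stationary(T)
print(" ".join(f"{s}={x:.4f}" for s, x in zip("ACGT", p)))
# Stationarity check: applying T once more should not change p.
print(max(abs(a - b) for a, b in zip(step(p, T), p)))
```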

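  7. Detailed balance • A sufficient condition for p to be a stationary distribution of the Markov chain is that p and T satisfy the condition of detailed balance: $p_i T_{ij} = p_j T_{ji}$ for all states i, j • Proof: summing over i gives $\sum_i p_i T_{ij} = p_j \sum_i T_{ji} = p_j$, that is, $pT = p$ • Problem: disjoint regions in probability space (the chain may never move between them, so detailed balance alone does not guarantee convergence to p)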
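Detailed balance, $p_i T_{ij} = p_j T_{ji}$, can be checked numerically on a small example. The sketch below builds a chain that satisfies it by construction, using a Metropolis rule with a uniform proposal; this construction is swapped in for illustration and does not appear in the slides. It then shows the "disjoint regions" problem on a hand-made block-diagonal chain. The target distribution and all helper names are hypothetical.

```python
def metropolis_matrix(p):
    """Transition matrix of a Metropolis chain with a uniform proposal."""
    n = len(p)
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Propose j with probability 1/(n-1), accept with min(1, p_j/p_i).
                t[i][j] = min(1.0, p[j] / p[i]) / (n - 1)
        t[i][i] = 1.0 - sum(t[i])  # rejected proposals stay in state i
    return t

def satisfies_detailed_balance(p, t, tol=1e-12):
    n = len(p)
    return all(abs(p[i] * t[i][j] - p[j] * t[j][i]) <= tol
               for i in range(n) for j in range(n))

def is_stationary(p, t, tol=1e-12):
    n = len(p)
    return all(abs(sum(p[i] * t[i][j] for i in range(n)) - p[j]) <= tol
               for j in range(n))

target = [0.1, 0.2, 0.3, 0.4]  # hypothetical target distribution
T_rev = metropolis_matrix(target)
print(satisfies_detailed_balance(target, T_rev), is_stationary(target, T_rev))

# Two disjoint regions: states {0, 1} and {2, 3} never communicate.
T_split = [[0.5, 0.5, 0.0, 0.0],
           [0.5, 0.5, 0.0, 0.0],
           [0.0, 0.0, 0.5, 0.5],
           [0.0, 0.0, 0.5, 0.5]]
for cand in ([0.25, 0.25, 0.25, 0.25], [0.4, 0.4, 0.1, 0.1]):
    print(satisfies_detailed_balance(cand, T_split))
```

Both candidate distributions satisfy detailed balance for the split chain, so the condition does not single out one equilibrium when the state space falls apart into regions the chain cannot cross.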

  8. Gibbs sampling • Markov chain for Gibbs sampling

  9. Gibbs sampling • Detailed balance • Detailed balance for the Gibbs sampler • Prove detailed balance • Bayes’ rule • Q.E.D.
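To make the idea concrete, here is a minimal Gibbs sampler for two discrete variables: each step resamples one variable from its conditional distribution given the current value of the other. The joint table `JOINT`, the sample counts, and the function names are assumptions made up for this sketch; the slides' own derivation is not recoverable from the transcript.

```python
import random

# Hypothetical joint distribution P(x, y), x in {0, 1}, y in {0, 1, 2}.
JOINT = [[0.10, 0.20, 0.10],
         [0.30, 0.05, 0.25]]

def sample_x_given_y(y, rng):
    # P(x | y) is proportional to P(x, y); choices() normalizes the weights.
    return rng.choices(range(2), weights=[JOINT[x][y] for x in range(2)])[0]

def sample_y_given_x(x, rng):
    # P(y | x) is proportional to P(x, y).
    return rng.choices(range(3), weights=JOINT[x])[0]

def gibbs(n_samples, burn_in, rng):
    """Return empirical frequencies of (x, y) after discarding burn-in steps."""
    x, y = 0, 0  # arbitrary initial state
    counts = [[0] * 3 for _ in range(2)]
    for it in range(burn_in + n_samples):
        x = sample_x_given_y(y, rng)
        y = sample_y_given_x(x, rng)
        if it >= burn_in:
            counts[x][y] += 1
    return [[c / n_samples for c in row] for row in counts]

estimate = gibbs(20000, 1000, random.Random(0))
for row_true, row_est in zip(JOINT, estimate):
    print(row_true, [round(v, 3) for v in row_est])
```

The empirical frequencies should approach the joint table as the number of samples grows, which is the behaviour the detailed-balance argument on this slide is meant to guarantee.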

  10. Data augmentation Gibbs sampling • Introducing unobserved variables often simplifies the expression of the likelihood • A Gibbs sampler can then be set up • Samples from the Gibbs sampler can be used to estimate parameters

  11. Pros and cons • Pros • Clear probabilistic interpretation • Bayesian framework • “Global optimization” • Cons • Mathematical details not easy to work out • Relatively slow

  12. Motif finding

  13. Gibbs sampler • Gibbs sampling for motif finding • Set up a Gibbs sampler for the joint probability of the motif matrix and the alignment given the sequences • Sequence by sequence • Lawrence et al. • One motif of fixed length • One occurrence per sequence • Background model based on single nucleotides • Too sensitive to noise • Lots of parameter tuning

  14. [Figure; labels: 500 bp, Translation start]

  15. Gibbs motif finding • Initialization • Sequences • Random motif matrix • Iteration • Sequence scoring • Alignment update • Motif instances • Motif matrix • Termination • Convergence of the alignment and of the motif matrix

  22. Gibbs motif finding • Initialization • Sequences • Random motif matrix • Iteration • Sequence scoring • Alignment update • Motif instances • Motif matrix • Termination • Stabilization of the motif matrix (not of the alignment)
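The loop outlined on these slides can be sketched as a basic site sampler under the simplest assumptions listed earlier: one motif of fixed width, one occurrence per sequence, and a single-nucleotide background. This is an illustrative sketch, not the presenter's implementation. It starts from a random alignment (which implies a random motif matrix), uses pseudocounts of 1, and stops after a fixed number of sweeps instead of testing for stabilization; all function names and the planted test motif are hypothetical.

```python
import random

ALPHABET = "ACGT"

def motif_matrix(seqs, starts, width, skip=None, pseudo=1.0):
    """Per-position nucleotide frequencies from the current motif instances,
    leaving out sequence number `skip`."""
    counts = [[pseudo] * 4 for _ in range(width)]
    for k, (seq, start) in enumerate(zip(seqs, starts)):
        if k == skip:
            continue
        for pos in range(width):
            counts[pos][ALPHABET.index(seq[start + pos])] += 1
    return [[c / sum(col) for c in col] for col in counts]

def background(seqs, pseudo=1.0):
    """Single-nucleotide background frequencies."""
    counts = [pseudo] * 4
    for seq in seqs:
        for ch in seq:
            counts[ALPHABET.index(ch)] += 1
    return [c / sum(counts) for c in counts]

def site_scores(seq, width, matrix, bg):
    """Sequence scoring: motif/background likelihood ratio of every start."""
    scores = []
    for start in range(len(seq) - width + 1):
        ratio = 1.0
        for pos in range(width):
            idx = ALPHABET.index(seq[start + pos])
            ratio *= matrix[pos][idx] / bg[idx]
        scores.append(ratio)
    return scores

def gibbs_motif(seqs, width, n_sweeps, rng):
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]  # initialization
    bg = background(seqs)
    for _ in range(n_sweeps):
        for k, seq in enumerate(seqs):  # sequence by sequence
            matrix = motif_matrix(seqs, starts, width, skip=k)
            scores = site_scores(seq, width, matrix, bg)
            # Alignment update: sample a new motif instance, do not take the max.
            starts[k] = rng.choices(range(len(scores)), weights=scores)[0]
    return starts, motif_matrix(seqs, starts, width)

# Toy data: a hypothetical motif planted once in each random sequence.
rng = random.Random(0)
planted = "TTGACAGC"
seqs = []
for _ in range(10):
    noise = "".join(rng.choice(ALPHABET) for _ in range(40))
    at = rng.randrange(40 - len(planted) + 1)
    seqs.append(noise[:at] + planted + noise[at + len(planted):])

starts, matrix = gibbs_motif(seqs, len(planted), 200, rng)
for seq, start in zip(seqs, starts):
    print(seq[start:start + len(planted)])
```

Because the sampler is stochastic, a run may settle on a shifted version of the planted motif or miss it entirely; in practice several runs with different seeds are compared.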

  23. Motif Sampler (extended Gibbs sampling) • Model • One motif of fixed length per round • Several occurrences per sequence • Each sequence has a discrete probability distribution over the number of copies of the motif (under a maximum bound) • Multiple motifs found in successive rounds by masking occurrences of previous motifs • Improved background model based on oligonucleotides • Gapped motifs
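One common reading of a background model "based on oligonucleotides" is a higher-order Markov model, in which the probability of each nucleotide depends on the few nucleotides before it. The sketch below estimates such a model from oligonucleotide counts and scores a sequence with it. Whether this matches the Motif Sampler's actual background model in detail is an assumption; the function names, the pseudocount, and the uniform fallback for unseen contexts are all illustrative choices.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_background(seqs, order, pseudo=1.0):
    """Estimate P(next nucleotide | previous `order` nucleotides)
    from counts of oligonucleotides of length order + 1."""
    counts = defaultdict(lambda: [pseudo] * 4)
    for seq in seqs:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][ALPHABET.index(seq[i])] += 1
    return {ctx: [c / sum(row) for c in row] for ctx, row in counts.items()}

def log_prob(seq, model, order):
    """Log-probability of seq[order:] given its first `order` nucleotides."""
    total = 0.0
    for i in range(order, len(seq)):
        row = model.get(seq[i - order:i], [0.25] * 4)  # unseen context: uniform
        total += math.log(row[ALPHABET.index(seq[i])])
    return total

# Toy training data (hypothetical): a strongly alternating sequence.
model = train_background(["ACACACACAC"], order=1)
print(model["A"])
print(log_prob("ACAC", model, order=1), log_prob("AAAA", model, order=1))
```

In this toy model the alternating sequence scores higher than the run of A's, which is the kind of context dependence a single-nucleotide background cannot express.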

  24. [Figure; labels: 500 bp, Translation start]
