Gibbs sampling for motif finding Yves Moreau
Overview • Markov Chain Monte Carlo • Gibbs sampling • Motif finding in cis-regulatory DNA • Biclustering microarray data
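Markov Chain Monte-Carlo • Markov chain with transition matrix T (row = current state, column = next state; each row sums to 1):

     A       C       G       T
A    0.0643  0.8268  0.0659  0.0430
C    0.0598  0.0484  0.8515  0.0403
G    0.1602  0.3407  0.1736  0.3255
T    0.1507  0.1608  0.3654  0.3231

[Figure: state diagram of the chain with states X=A, X=C, X=G, X=T]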
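To make the table concrete, here is a minimal sketch (standard library only) that draws a nucleotide sequence from the chain by repeatedly sampling the next state from the row of the current state. The helper name `sample_sequence`, the fixed seed and the choice of A as starting state are illustrative assumptions, not part of the slides.

```python
import random

STATES = "ACGT"
# Transition matrix from the slide: T[i][j] = P(X_{t+1} = STATES[j] | X_t = STATES[i])
T = [
    [0.0643, 0.8268, 0.0659, 0.0430],  # from A
    [0.0598, 0.0484, 0.8515, 0.0403],  # from C
    [0.1602, 0.3407, 0.1736, 0.3255],  # from G
    [0.1507, 0.1608, 0.3654, 0.3231],  # from T
]

def sample_sequence(T, start, length, rng):
    """Generate a sequence by drawing each next state from the row of the current state."""
    i = STATES.index(start)
    seq = [start]
    for _ in range(length - 1):
        i = rng.choices(range(len(STATES)), weights=T[i])[0]
        seq.append(STATES[i])
    return "".join(seq)

rng = random.Random(0)  # fixed seed for reproducibility (assumption)
for _ in range(3):
    print(sample_sequence(T, "A", 20, rng))
```

Because A→C and C→G have high probability, most sampled sequences begin with "ACG".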
Markov Chain Monte-Carlo • Markov chains can sample from complex distributions • Example sampled sequences: ACGCGGTGTGCGTTTGACGA ACGGTTACGCGACGTTTGGT ACGTGCGGTGTACGTGTACG ACGGAGTTTGCGGGACGCGT ACGCGCGTGACGTACGCGTG AGACGCGTGCGCGCGGACGC ACGGGCGTGCGCGCGTCGCG AACGCGTTTGTGTTCGGTGC ACCGCGTTTGACGTCGGTTC ACGTGACGCGTAGTTCGACG ACGTGACACGGACGTACGCG ACCGTACTCGCGTTGACACG ATACGGCGCGGCGGGCGCGG ACGTACGCGTACACGCGGGA ACGCGCGTGTTTACGACGTG ACGTCGCACGCGTCGGTGTG ACGGCGGTCGGTACACGTCG ACGTTGCGACGTGCGTGCTG ACGGAACGACGACGCGACGC ACGGCGTGTTCGCGGTGCGG
[Figure: percentage of A, C, G and T at each position of the sampled sequences]
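The position-by-position percentages summarized in the figure can be computed directly from the sampled sequences. This sketch uses the first five sequences listed on the slide; the function name `position_frequencies` is a hypothetical helper.

```python
from collections import Counter

# The first five sampled sequences from the slide (all of length 20)
samples = [
    "ACGCGGTGTGCGTTTGACGA",
    "ACGGTTACGCGACGTTTGGT",
    "ACGTGCGGTGTACGTGTACG",
    "ACGGAGTTTGCGGGACGCGT",
    "ACGCGCGTGACGTACGCGTG",
]

def position_frequencies(seqs, alphabet="ACGT"):
    """For each position, the percentage of sequences carrying each nucleotide."""
    n = len(seqs)
    table = []
    for column in zip(*seqs):  # column = nucleotides of all sequences at one position
        counts = Counter(column)
        table.append({b: 100.0 * counts[b] / n for b in alphabet})
    return table

freqs = position_frequencies(samples)
for pos in range(4):
    print(pos, freqs[pos])
```

The first three positions are fully determined (A, C, G) in this subset, reflecting the strong A→C and C→G transitions; variability appears from position 3 onward.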
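Markov Chain Monte-Carlo • Let us look at the transition after two steps: $P(X_{t+2} = j \mid X_t = i) = \sum_k T_{ik} T_{kj} = (T^2)_{ij}$ • Similarly, after n steps: $P(X_{t+n} = j \mid X_t = i) = (T^n)_{ij}$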
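The n-step formula can be checked numerically with plain matrix multiplication. This is a minimal sketch using the transition matrix from the earlier slide; the helpers `mat_mul` and `n_step` are hypothetical names.

```python
# Transition matrix from the slide: T[i][j] = P(X_{t+1} = j | X_t = i), states A, C, G, T
T = [
    [0.0643, 0.8268, 0.0659, 0.0430],
    [0.0598, 0.0484, 0.8515, 0.0403],
    [0.1602, 0.3407, 0.1736, 0.3255],
    [0.1507, 0.1608, 0.3654, 0.3231],
]

def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def n_step(T, n):
    """T^n: entry (i, j) is P(X_{t+n} = j | X_t = i)."""
    size = len(T)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]  # T^0 = identity
    for _ in range(n):
        result = mat_mul(result, T)
    return result

T2 = n_step(T, 2)
# Two-step probability A -> G, summed over the intermediate state k
print(round(T2[0][2], 4))

T100 = n_step(T, 100)
for row in T100:
    print([round(x, 4) for x in row])  # the rows become (nearly) identical
```

After many steps the starting state no longer matters: all rows of $T^n$ approach the same distribution.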
Markov Chain Monte-Carlo • Stationary distribution p • If the samples are generated according to the distribution p, the samples at the next step will also be distributed according to p: $pT = p$ • p is a left eigenvector of T with eigenvalue 1 • Equilibrium distribution • For an ergodic chain, every row of $T^n$ converges to p as n → ∞ • From an arbitrary initial condition and after a sufficient number of steps (burn-in), the successive states of the Markov chain are samples from the stationary distribution
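The two views of the stationary distribution can be compared in a short sketch: iterating $p \leftarrow pT$ from an arbitrary start, and running the chain itself after a burn-in and counting visits. The helper names, the run length (100,000 steps), the burn-in (1,000 steps) and the seed are assumptions for illustration.

```python
import random

STATES = "ACGT"
# Transition matrix from the slide: T[i][j] = P(X_{t+1} = j | X_t = i)
T = [
    [0.0643, 0.8268, 0.0659, 0.0430],
    [0.0598, 0.0484, 0.8515, 0.0403],
    [0.1602, 0.3407, 0.1736, 0.3255],
    [0.1507, 0.1608, 0.3654, 0.3231],
]

def step(p, T):
    """Propagate a distribution one step: (pT)_j = sum_i p_i T_ij."""
    n = len(p)
    return [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]

def stationary_by_iteration(T, n_steps=1000):
    """Approximate p with pT = p by iterating from an arbitrary initial condition."""
    p = [1.0] + [0.0] * (len(T) - 1)  # all probability mass on A
    for _ in range(n_steps):
        p = step(p, T)
    return p

def empirical_frequencies(T, n_steps, burn_in, rng):
    """Run the chain, discard the burn-in, and return the fraction of visits to each state."""
    i = 0
    counts = [0] * len(T)
    for t in range(burn_in + n_steps):
        i = rng.choices(range(len(T)), weights=T[i])[0]
        if t >= burn_in:
            counts[i] += 1
    return [c / n_steps for c in counts]

p = stationary_by_iteration(T)
freqs = empirical_frequencies(T, 100_000, 1_000, random.Random(0))
for s, exact, observed in zip(STATES, p, freqs):
    print(s, round(exact, 4), round(observed, 4))
```

The visit frequencies of a single long run should agree with the left eigenvector up to Monte Carlo error.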
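Detailed balance • A sufficient condition for p to be the stationary distribution of the Markov chain is that the chain satisfies detailed balance: $p_i T_{ij} = p_j T_{ji}$ for all states i, j • Proof: $\sum_i p_i T_{ij} = \sum_i p_j T_{ji} = p_j \sum_i T_{ji} = p_j$, i.e. $pT = p$ • Problem: if the probability space contains disjoint high-probability regions, the chain may fail to move between them, and the samples will not represent p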
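Detailed balance and stationarity can both be checked numerically. In this sketch the two small chains are assumed toy examples, not from the slides: a random walk on three states that satisfies detailed balance, and a deterministic 3-cycle that has a stationary distribution without satisfying detailed balance (the condition is sufficient, not necessary).

```python
def satisfies_detailed_balance(p, T, tol=1e-12):
    """Check p_i T_ij == p_j T_ji for every pair of states."""
    n = len(p)
    return all(abs(p[i] * T[i][j] - p[j] * T[j][i]) <= tol for i in range(n) for j in range(n))

def is_stationary(p, T, tol=1e-12):
    """Check (pT)_j == p_j for every state j."""
    n = len(p)
    return all(abs(sum(p[i] * T[i][j] for i in range(n)) - p[j]) <= tol for j in range(n))

# Toy example 1 (assumed): random walk on a path of three states
T_walk = [[0.5, 0.5, 0.0],
          [0.25, 0.5, 0.25],
          [0.0, 0.5, 0.5]]
p_walk = [0.25, 0.5, 0.25]

# Toy example 2 (assumed): deterministic cycle 0 -> 1 -> 2 -> 0
T_cycle = [[0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0],
           [1.0, 0.0, 0.0]]
p_cycle = [1 / 3] * 3

print(satisfies_detailed_balance(p_walk, T_walk), is_stationary(p_walk, T_walk))
print(satisfies_detailed_balance(p_cycle, T_cycle), is_stationary(p_cycle, T_cycle))
```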
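Gibbs sampling • Markov chain for Gibbs sampling: for $x = (x_1, \ldots, x_n)$, update one variable at a time by drawing $x_i \sim p(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$, cycling over i = 1, …, n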
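As an illustration of the update rule, here is a minimal Gibbs sampler for a target that is not from the slides: a standard bivariate normal with correlation rho, whose full conditionals are $x \mid y \sim N(\rho y, 1 - \rho^2)$ and symmetrically for y. The value rho = 0.8, the burn-in and the seed are assumptions.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in, rng):
    """Gibbs sampler for (x, y) ~ N(0, [[1, rho], [rho, 1]]), one variable at a time."""
    sd = math.sqrt(1.0 - rho ** 2)  # standard deviation of each full conditional
    x, y = 0.0, 0.0                 # arbitrary initial condition
    samples = []
    for t in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)  # draw x from p(x | y)
        y = rng.gauss(rho * x, sd)  # draw y from p(y | x)
        if t >= burn_in:
            samples.append((x, y))
    return samples

def correlation(pairs):
    """Sample correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    vx = sum((x - mx) ** 2 for x, _ in pairs) / n
    vy = sum((y - my) ** 2 for _, y in pairs) / n
    return cov / math.sqrt(vx * vy)

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000, burn_in=500, rng=random.Random(42))
print(round(correlation(samples), 3))
```

The sampler never evaluates the joint density; it only needs the two one-dimensional conditionals, which is what makes Gibbs sampling attractive when those conditionals are easy to sample.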
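Gibbs sampling • Detailed balance for the Gibbs sampler: an update of variable i moves x to x' with $x'_{-i} = x_{-i}$, with transition probability $T(x \to x') = p(x'_i \mid x_{-i})$ • Prove detailed balance using Bayes’ rule, $p(x) = p(x_i \mid x_{-i})\,p(x_{-i})$: $p(x)\,T(x \to x') = p(x_i \mid x_{-i})\,p(x_{-i})\,p(x'_i \mid x_{-i}) = p(x'_i \mid x_{-i})\,p(x_{-i})\,p(x_i \mid x_{-i}) = p(x')\,T(x' \to x)$ • Q.E.D.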
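The proof can be verified exactly on a small example. This sketch builds the single-variable Gibbs transition kernel for an assumed joint distribution over two binary variables (the probabilities 0.1, 0.2, 0.3, 0.4 are arbitrary) and checks detailed balance and stationarity with exact rational arithmetic.

```python
from fractions import Fraction

# Assumed joint distribution p(x1, x2) over two binary variables (illustration only)
joint = {
    (0, 0): Fraction(1, 10),
    (0, 1): Fraction(2, 10),
    (1, 0): Fraction(3, 10),
    (1, 1): Fraction(4, 10),
}
states = list(joint)

def replace(state, i, value):
    """Copy of `state` with variable i set to `value`."""
    s = list(state)
    s[i] = value
    return tuple(s)

def conditional(i, value, state):
    """p(x_i = value | x_{-i}), obtained from the joint by Bayes' rule."""
    numerator = joint[replace(state, i, value)]
    denominator = sum(joint[replace(state, i, v)] for v in (0, 1))
    return numerator / denominator

def gibbs_kernel(i, x, y):
    """T_i(x -> y): resample variable i from its full conditional, keep the others fixed."""
    if any(x[k] != y[k] for k in range(len(x)) if k != i):
        return Fraction(0)
    return conditional(i, y[i], x)

def detailed_balance_holds(i):
    return all(joint[x] * gibbs_kernel(i, x, y) == joint[y] * gibbs_kernel(i, y, x)
               for x in states for y in states)

def is_stationary(i):
    return all(sum(joint[x] * gibbs_kernel(i, x, y) for x in states) == joint[y]
               for y in states)

for i in (0, 1):
    print(f"update of x{i + 1}: detailed balance {detailed_balance_holds(i)}, stationary {is_stationary(i)}")
```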
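Data augmentation Gibbs sampling • Introducing unobserved variables often simplifies the expression of the likelihood • A Gibbs sampler can then be set up that alternates between sampling the unobserved variables given the parameters and sampling the parameters given the unobserved variables • Samples from the Gibbs sampler can be used to estimate parameters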
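A minimal data-augmentation sketch on a model that is not from the slides: a two-component Gaussian mixture with equal weights, unit variances and N(0, 10²) priors on the two means. The hidden component labels are the unobserved variables; given them, each mean has a normal posterior. The model, the synthetic data and all names are assumptions for illustration.

```python
import math
import random

def gibbs_two_gaussians(data, n_iter, burn_in, rng, prior_sd=10.0):
    """Data-augmentation Gibbs sampler for the assumed mixture
    x ~ 0.5 N(mu_0, 1) + 0.5 N(mu_1, 1), with N(0, prior_sd^2) priors on the means."""
    mu = [min(data), max(data)]  # initial means, ordered so that mu_0 < mu_1
    kept = []
    for it in range(n_iter):
        # Step 1: sample each hidden label z_n from p(z_n | x_n, mu)
        groups = ([], [])
        for x in data:
            l0 = -0.5 * (x - mu[0]) ** 2
            l1 = -0.5 * (x - mu[1]) ** 2
            m = max(l0, l1)  # subtract the maximum for numerical stability
            p1 = math.exp(l1 - m) / (math.exp(l0 - m) + math.exp(l1 - m))
            groups[1 if rng.random() < p1 else 0].append(x)
        # Step 2: sample each mean from its normal posterior given the assigned points
        for k in (0, 1):
            precision = len(groups[k]) + 1.0 / prior_sd ** 2
            mu[k] = rng.gauss(sum(groups[k]) / precision, 1.0 / math.sqrt(precision))
        if it >= burn_in:
            kept.append(tuple(mu))
    # Parameter estimates: average of the retained samples (posterior means)
    return [sum(sample[k] for sample in kept) / len(kept) for k in (0, 1)]

rng = random.Random(7)
data = [rng.gauss(-2.0, 1.0) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
estimate = gibbs_two_gaussians(data, n_iter=300, burn_in=50, rng=rng)
print([round(m, 2) for m in estimate])
```

The motif finder follows the same pattern: the unobserved variables are the motif positions, and the parameters are the motif matrix.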
Pros and cons • Pros • Clear probabilistic interpretation • Bayesian framework • “Global optimization” • Cons • Mathematical details not easy to work out • Relatively slow
Gibbs sampler • Gibbs sampling for motif finding • Set up a Gibbs sampler for the joint probability of the motif matrix and the alignment given the sequences • Update the alignment sequence by sequence • Original algorithm (Lawrence et al.) • One motif of fixed length • One occurrence per sequence • Background model based on single nucleotides • Drawbacks: too sensitive to noise; lots of parameter tuning
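The two model components named on this slide, a motif matrix and a single-nucleotide background, combine into a score for every candidate window of a sequence. In this sketch the pseudocount of 1, the toy motif instances and the toy sequence are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def motif_matrix(instances, pseudocount=1.0):
    """Column-wise nucleotide probabilities q[k][b] of aligned motif instances,
    with an assumed pseudocount added to every cell."""
    total = len(instances) + pseudocount * len(ALPHABET)
    matrix = []
    for column in zip(*instances):
        counts = Counter(column)
        matrix.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return matrix

def background(seqs, pseudocount=1.0):
    """Single-nucleotide (order-0) background frequencies."""
    counts = Counter("".join(seqs))
    total = sum(counts[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
    return {b: (counts[b] + pseudocount) / total for b in ALPHABET}

def window_scores(seq, q, bg):
    """P(window | motif) / P(window | background) for every start position."""
    width = len(q)
    scores = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        log_ratio = sum(math.log(q[k][b] / bg[b]) for k, b in enumerate(window))
        scores.append(math.exp(log_ratio))
    return scores

instances = ["TATAAT", "TATGAT", "TACAAT"]  # toy aligned motif instances
seq = "GGCGTATAATCCG"                        # toy sequence to scan
scores = window_scores(seq, motif_matrix(instances), background([seq]))
best = max(range(len(scores)), key=scores.__getitem__)
print(best, seq[best:best + 6])  # prints: 4 TATAAT
```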
[Figure: upstream sequences of 500 bp ending at the translation start]
Gibbs motif finding • Initialization • Sequences • Random motif matrix • Iteration • Sequence scoring • Alignment update • Motif instances • Motif matrix • Termination • Convergence of the alignment and of the motif matrix
Gibbs motif finding • Initialization • Sequences • Random motif matrix • Iteration • Sequence scoring • Alignment update • Motif instances • Motif matrix • Termination • Stabilization of the motif matrix (not of the alignment)
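The iteration above can be sketched as a site sampler in the spirit of Lawrence et al.: one fixed-width motif, exactly one occurrence per sequence, an order-0 background with pseudocounts. Simplifications and assumptions: it runs a fixed number of sweeps instead of testing for stabilization of the motif matrix, it has no phase-shift moves, and the planted motif `TTGACGCA`, the synthetic data and all function names are hypothetical.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"

def motif_matrix(sites, pseudocount=0.5):
    """Column-wise nucleotide probabilities of the aligned motif instances."""
    total = len(sites) + pseudocount * len(ALPHABET)
    matrix = []
    for column in zip(*sites):
        counts = Counter(column)
        matrix.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return matrix

def site_sampler(seqs, width, n_sweeps, rng):
    """Return one sampled motif start position per sequence."""
    # Order-0 background from all sequences (pseudocount 1 avoids zero probabilities)
    counts = Counter("".join(seqs))
    total = sum(counts[b] for b in ALPHABET) + len(ALPHABET)
    bg = {b: (counts[b] + 1) / total for b in ALPHABET}
    # Initialization: a random alignment (one random start per sequence)
    sites = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(n_sweeps):
        for i, seq in enumerate(seqs):  # sequence by sequence
            # Motif matrix from the current motif instances in all other sequences
            others = [s[p:p + width] for j, (s, p) in enumerate(zip(seqs, sites)) if j != i]
            q = motif_matrix(others)
            # Sequence scoring: motif/background likelihood ratio for every window
            weights = []
            for start in range(len(seq) - width + 1):
                window = seq[start:start + width]
                weights.append(math.exp(sum(math.log(q[k][b] / bg[b]) for k, b in enumerate(window))))
            # Alignment update: draw the new start proportionally to its score
            sites[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return sites

def plant_motif(motif, n_seqs, length, rng):
    """Synthetic data: random sequences, each with one copy of `motif` at a random position."""
    seqs, starts = [], []
    for _ in range(n_seqs):
        s = [rng.choice(ALPHABET) for _ in range(length)]
        p = rng.randrange(length - len(motif) + 1)
        s[p:p + len(motif)] = motif
        seqs.append("".join(s))
        starts.append(p)
    return seqs, starts

rng = random.Random(3)
seqs, planted = plant_motif("TTGACGCA", n_seqs=10, length=50, rng=rng)
found = site_sampler(seqs, width=8, n_sweeps=100, rng=rng)
print(planted)
print(found)
print([s[p:p + 8] for s, p in zip(seqs, found)])
```

Because the sampler only moves one site at a time, it can lock onto a copy of the motif shifted by a few positions; this is one reason the alignment itself need not converge even when the motif matrix has stabilized.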
Motif Sampler (extended Gibbs sampling) • Model • One motif of fixed length per round • Several occurrences per sequence • Each sequence has a discrete probability distribution over the number of copies of the motif (up to a maximum number) • Multiple motifs are found in successive rounds by masking the occurrences of previous motifs • Improved background model based on oligonucleotides (higher-order Markov model) • Gapped motifs
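A background model based on oligonucleotides can be realized as a k-th order Markov model that conditions each nucleotide on the k preceding ones. This sketch is one possible form under that assumption (toy training sequences, pseudocount 1, first k bases skipped for simplicity, hypothetical names). It shows that an order-0 background scores two windows with the same composition identically, while an order-1 background can tell them apart.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov_background(seqs, order, pseudocount=1.0):
    """Estimate P(base | preceding `order` bases) from background sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1.0

    def prob(context, base):
        seen = counts.get(context, {})
        total = sum(seen.values()) + pseudocount * len(ALPHABET)
        return (seen.get(base, 0.0) + pseudocount) / total

    return prob

def log_likelihood(seq, prob, order):
    """log P(seq | background); the first `order` bases are skipped for simplicity."""
    return sum(math.log(prob(seq[i - order:i], seq[i])) for i in range(order, len(seq)))

training = ["ACACACACACACACAC", "CACACACACACA"]  # toy background with a dinucleotide repeat
order0 = train_markov_background(training, order=0)
order1 = train_markov_background(training, order=1)
for window in ("ACACAC", "AACCCA"):
    print(window, round(log_likelihood(window, order0, 0), 2), round(log_likelihood(window, order1, 1), 2))
```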