Gibbs sampling for motif finding

Gibbs sampling for motif finding Yves Moreau

Overview • Markov Chain Monte Carlo • Gibbs sampling • Motif finding in cis-regulatory DNA • Biclustering microarray data

Markov Chain Monte-Carlo • Markov chain with transition matrix T (rows: current state, columns: next state)

        A       C       G       T
A  0.0643  0.8268  0.0659  0.0430
C  0.0598  0.0484  0.8515  0.0403
G  0.1602  0.3407  0.1736  0.3255
T  0.1507  0.1608  0.3654  0.3231

[Figure: state diagram with states X=A, X=C, X=G, X=T]

Markov Chain Monte-Carlo • Markov chains can sample from complex distributions

ACGCGGTGTGCGTTTGACGA
ACGGTTACGCGACGTTTGGT
ACGTGCGGTGTACGTGTACG
ACGGAGTTTGCGGGACGCGT
ACGCGCGTGACGTACGCGTG
AGACGCGTGCGCGCGGACGC
ACGGGCGTGCGCGCGTCGCG
AACGCGTTTGTGTTCGGTGC
ACCGCGTTTGACGTCGGTTC
ACGTGACGCGTAGTTCGACG
ACGTGACACGGACGTACGCG
ACCGTACTCGCGTTGACACG
ATACGGCGCGGCGGGCGCGG
ACGTACGCGTACACGCGGGA
ACGCGCGTGTTTACGACGTG
ACGTCGCACGCGTCGGTGTG
ACGGCGGTCGGTACACGTCG
ACGTTGCGACGTGCGTGCTG
ACGGAACGACGACGCGACGC
ACGGCGTGTTCGCGGTGCGG

[Figure: percentage of A, C, G and T at each position]
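Sequences like the ones above can be generated by repeatedly drawing the next base from the row of T that belongs to the current base. The following is a minimal sketch, not the code used for the slide; the function name `sample_chain`, the fixed seed, and the choice to start every sequence at "A" are assumptions made for illustration.

```python
import random

STATES = "ACGT"
# Transition matrix from the slide: key = current base, list = P(next base) in A, C, G, T order.
T = {
    "A": [0.0643, 0.8268, 0.0659, 0.0430],
    "C": [0.0598, 0.0484, 0.8515, 0.0403],
    "G": [0.1602, 0.3407, 0.1736, 0.3255],
    "T": [0.1507, 0.1608, 0.3654, 0.3231],
}

def sample_chain(start, length, rng):
    """Generate `length` bases, beginning with `start`, by following T."""
    seq = [start]
    while len(seq) < length:
        # Draw the next base from the row of T for the current base.
        seq.append(rng.choices(STATES, weights=T[seq[-1]])[0])
    return "".join(seq)

rng = random.Random(0)  # fixed seed: an arbitrary choice for reproducibility
for _ in range(3):
    print(sample_chain("A", 20, rng))
```

Because A is most likely followed by C, and C by G, runs such as "ACG" should appear often in the output.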

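Markov Chain Monte-Carlo • Let us look at the transition after two steps: $P(X_{t+2}=j \mid X_t=i) = \sum_k T_{ik} T_{kj} = (T^2)_{ij}$ • Similarly, after n steps: $P(X_{t+n}=j \mid X_t=i) = (T^n)_{ij}$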
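The n-step transition probabilities are the entries of the matrix power T^n. A small sketch using the transition matrix from the earlier slide follows; the helper names `matmul` and `matpow` are illustrative, and the plain-list implementation is chosen only to avoid external dependencies.

```python
# Transition matrix from the slide (state order A, C, G, T; rows are the current state).
T = [
    [0.0643, 0.8268, 0.0659, 0.0430],
    [0.0598, 0.0484, 0.8515, 0.0403],
    [0.1602, 0.3407, 0.1736, 0.3255],
    [0.1507, 0.1608, 0.3654, 0.3231],
]

def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(t, n):
    """n-step transition matrix T^n, by repeated multiplication."""
    result = [[float(i == j) for j in range(len(t))] for i in range(len(t))]
    for _ in range(n):
        result = matmul(result, t)
    return result

for n in (1, 2, 200):
    print(f"T^{n}")
    for row in matpow(T, n):
        print("  " + "  ".join(f"{v:.4f}" for v in row))
```

For large n the rows of T^n become practically identical, which means the starting state no longer matters.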

Markov Chain Monte-Carlo • Stationary distribution p • If the samples are generated according to the distribution p, the samples at the next step will also be distributed according to p: $pT = p$ • p is a left eigenvector of T (with eigenvalue 1) • Equilibrium distribution • As $n \to \infty$, every row of $T^n$ converges to the stationary distribution • From an arbitrary initial condition and after a sufficient number of steps (burn-in), the successive states of the Markov chain are samples from the stationary distribution
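One way to see the stationary distribution numerically is to push an arbitrary starting distribution through T many times, which mimics burn-in. The sketch below assumes the transition matrix from the earlier slide; `step`, `stationary`, and the step count of 500 are illustrative choices.

```python
# Transition matrix from the slide (state order A, C, G, T; rows are the current state).
T = [
    [0.0643, 0.8268, 0.0659, 0.0430],
    [0.0598, 0.0484, 0.8515, 0.0403],
    [0.1602, 0.3407, 0.1736, 0.3255],
    [0.1507, 0.1608, 0.3654, 0.3231],
]

def step(p, t):
    """One step of the chain on distributions: the row vector p times T."""
    n = len(p)
    return [sum(p[i] * t[i][j] for i in range(n)) for j in range(n)]

def stationary(t, start, n_steps=500):
    """Approximate the stationary distribution by iterating p <- pT."""
    p = list(start)
    for _ in range(n_steps):
        p = step(p, t)
    return p

p_from_a = stationary(T, [1.0, 0.0, 0.0, 0.0])   # chain started in state A
p_from_uniform = stationary(T, [0.25] * 4)        # chain started uniformly
print([round(v, 4) for v in p_from_a])
print([round(v, 4) for v in p_from_uniform])
```

Both starting points should give the same vector p, and applying `step` to it once more should leave it unchanged, which is the left-eigenvector property.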

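Detailed balance • A sufficient condition for p to be a stationary distribution of the Markov chain is that p and T satisfy the condition of detailed balance: $p(x)\,T(x \to y) = p(y)\,T(y \to x)$ for all states x, y • Proof: sum both sides over x and use $\sum_x T(y \to x) = 1$, which gives $\sum_x p(x)\,T(x \to y) = p(y)$ • Problem: disjoint regions in probability space (if the chain cannot move between them, it may never reach p from some starting points)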
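The problem of disjoint regions can be illustrated with a toy chain. The 4-state matrix below is hypothetical (it is not from the slides): it satisfies detailed balance with respect to the uniform distribution, yet a chain started in the first block never visits the second.

```python
# Hypothetical chain with two blocks, {0, 1} and {2, 3}, and no transitions between them.
T = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]
p = [0.25, 0.25, 0.25, 0.25]  # target distribution

def detailed_balance(p, t, tol=1e-12):
    """Check p[i] * T[i][j] == p[j] * T[j][i] for every pair of states."""
    n = len(p)
    return all(abs(p[i] * t[i][j] - p[j] * t[j][i]) < tol
               for i in range(n) for j in range(n))

def step(dist, t):
    """One step of the chain on distributions: the row vector dist times T."""
    n = len(dist)
    return [sum(dist[i] * t[i][j] for i in range(n)) for j in range(n)]

dist = [1.0, 0.0, 0.0, 0.0]  # start in state 0
for _ in range(100):
    dist = step(dist, T)

print(detailed_balance(p, T))  # True
print(dist)                    # [0.5, 0.5, 0.0, 0.0]
```

The chain settles on a distribution confined to the starting block rather than on p, even though detailed balance holds.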

Gibbs sampling • Markov chain for Gibbs sampling
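As a concrete illustration of a Gibbs chain, the sketch below alternately resamples each of two binary variables from its conditional distribution given the other. The joint distribution `JOINT`, the burn-in length, and the sample count are made-up values for illustration only.

```python
import random
from collections import Counter

# Hypothetical joint distribution p(x, y) over two binary variables.
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def conditional(var, other_value):
    """Return [p(var=0 | other), p(var=1 | other)] for var in {'x', 'y'}."""
    if var == "x":
        weights = [JOINT[(v, other_value)] for v in (0, 1)]
    else:
        weights = [JOINT[(other_value, v)] for v in (0, 1)]
    total = sum(weights)
    return [w / total for w in weights]

def gibbs(n_samples, burn_in, rng):
    """Alternate x ~ p(x | y) and y ~ p(y | x); discard the burn-in sweeps."""
    x, y = 0, 0  # arbitrary initial condition
    samples = []
    for sweep in range(burn_in + n_samples):
        x = rng.choices((0, 1), weights=conditional("x", y))[0]
        y = rng.choices((0, 1), weights=conditional("y", x))[0]
        if sweep >= burn_in:
            samples.append((x, y))
    return samples

samples = gibbs(20000, 1000, random.Random(0))
counts = Counter(samples)
for state, prob in sorted(JOINT.items()):
    print(state, "target", prob, "empirical", round(counts[state] / len(samples), 3))
```

The empirical frequencies should approach the target probabilities as the number of samples grows, although only the conditionals were ever sampled.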

Gibbs sampling • Detailed balance • Detailed balance for the Gibbs sampler • Prove detailed balance • Bayes’ rule • Q.E.D.
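The detailed-balance property of a single Gibbs update can also be checked numerically. The sketch below builds the transition kernels "resample x given y" and "resample y given x" for a hypothetical two-variable joint distribution (the values are made up) and tests $p(s)\,K(s \to s') = p(s')\,K(s' \to s)$ for every pair of states.

```python
from itertools import product

# Hypothetical joint distribution p(x, y) over two binary variables.
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
STATES = list(product((0, 1), repeat=2))

def kernel_update_x(s, s_new):
    """K(s -> s_new) when only x is resampled from p(x | y)."""
    (x, y), (x_new, y_new) = s, s_new
    if y_new != y:
        return 0.0  # y is held fixed by this update
    return JOINT[(x_new, y)] / (JOINT[(0, y)] + JOINT[(1, y)])

def kernel_update_y(s, s_new):
    """K(s -> s_new) when only y is resampled from p(y | x)."""
    (x, y), (x_new, y_new) = s, s_new
    if x_new != x:
        return 0.0  # x is held fixed by this update
    return JOINT[(x, y_new)] / (JOINT[(x, 0)] + JOINT[(x, 1)])

def satisfies_detailed_balance(kernel, tol=1e-12):
    """Check p(a) K(a -> b) == p(b) K(b -> a) for all pairs of states."""
    return all(abs(JOINT[a] * kernel(a, b) - JOINT[b] * kernel(b, a)) < tol
               for a in STATES for b in STATES)

print(satisfies_detailed_balance(kernel_update_x))  # True
print(satisfies_detailed_balance(kernel_update_y))  # True
```

Each single-variable update satisfies detailed balance on its own, so each leaves p stationary, and so does a sweep that applies one update after the other.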

Data augmentation Gibbs sampling • Introducing unobserved variables often simplifies the expression of the likelihood • A Gibbs sampler can then be set up • Samples from the Gibbs sampler can be used to estimate parameters

Pros and cons • Pros • Clear probabilistic interpretation • Bayesian framework • “Global optimization” • Cons • Mathematical details not easy to work out • Relatively slow

Motif finding

Gibbs sampler • Gibbs sampling for motif finding • Set up a Gibbs sampler for the joint probability of the motif matrix and the alignment given the sequences • Sequence by sequence • Lawrence et al. • One motif of fixed length • One occurrence per sequence • Background model based on single nucleotides • Too sensitive to noise • Lots of parameter tuning

[Figure: 500 bp sequence region with the translation start marked]

Gibbs motif finding • Initialization • Sequences • Random motif matrix • Iteration • Sequence scoring • Alignment update • Motif instances • Motif matrix • Termination • Convergence of the alignment and of the motif matrix

Gibbs motif finding • Initialization • Sequences • Random motif matrix • Iteration • Sequence scoring • Alignment update • Motif instances • Motif matrix • Termination • Stabilization of the motif matrix (not of the alignment)
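The loop outlined above can be sketched as a simple site sampler with one motif of fixed length, one occurrence per sequence, and a single-nucleotide background. This is a simplified illustration under several assumptions, not the implementation behind the slides: it starts from random positions (which implies a random motif matrix), uses pseudocounts of 1, and stops after a fixed number of sweeps instead of testing whether the motif matrix has stabilized. The sequences in `SEQS` are made up.

```python
import random

BASES = "ACGT"

def motif_matrix(seqs, positions, width, skip=None, pseudo=1.0):
    """Per-column base frequencies of the motif instances, leaving out sequence `skip`."""
    counts = [[pseudo] * 4 for _ in range(width)]
    for i, (seq, pos) in enumerate(zip(seqs, positions)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][BASES.index(seq[pos + j])] += 1
    return [[c / sum(col) for c in col] for col in counts]

def background(seqs, pseudo=1.0):
    """Single-nucleotide background frequencies over all sequences."""
    counts = [pseudo] * 4
    for seq in seqs:
        for base in seq:
            counts[BASES.index(base)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def gibbs_motif(seqs, width, n_sweeps, rng):
    # Initialization: random alignment.
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    bg = background(seqs)
    for _ in range(n_sweeps):
        for i, seq in enumerate(seqs):
            # Motif matrix from the instances in all other sequences.
            pwm = motif_matrix(seqs, positions, width, skip=i)
            # Sequence scoring: motif-to-background likelihood ratio at each start.
            weights = []
            for start in range(len(seq) - width + 1):
                ratio = 1.0
                for j in range(width):
                    b = BASES.index(seq[start + j])
                    ratio *= pwm[j][b] / bg[b]
                weights.append(ratio)
            # Alignment update: sample a new position in proportion to its score.
            positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions, motif_matrix(seqs, positions, width)

# Made-up sequences, each containing the word TTGACA.
SEQS = [
    "ACGTACGATTGACAGCTAGC",
    "GGCATTGACATCGATCGTAA",
    "TTGACACGCGTATAGCTAGG",
    "CGATCGGCTAGTTGACATCA",
    "AGCTTTGACAGGCTAACGTC",
]
positions, pwm = gibbs_motif(SEQS, width=6, n_sweeps=50, rng=random.Random(0))
print(positions)
print([seq[p:p + 6] for seq, p in zip(SEQS, positions)])
```

Because positions are sampled rather than chosen greedily, the alignment can keep changing between sweeps even when the motif matrix is nearly stable, which is consistent with the termination criterion on the slide.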

Motif Sampler (extended Gibbs sampling) • Model • One motif of fixed length per round • Several occurrences per sequence • Each sequence has a discrete probability distribution over the number of copies of the motif (up to a maximum bound) • Multiple motifs found in successive rounds by masking occurrences of previous motifs • Improved background model based on oligonucleotides • Gapped motifs
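One common way to build a background model from oligonucleotides is a Markov model of order k, in which each base is conditioned on the k bases before it. The sketch below assumes that reading of the slide; the function names, the pseudocount of 1, the uniform fallback for unseen contexts, and the sequences in `SEQS` are all illustrative.

```python
import math
from collections import defaultdict

BASES = "ACGT"

def train_background(seqs, order, pseudo=1.0):
    """Estimate P(base | previous `order` bases) from the sequences, with pseudocounts."""
    counts = defaultdict(lambda: [pseudo] * 4)
    for seq in seqs:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][BASES.index(seq[i])] += 1
    return {ctx: [c / sum(row) for c in row] for ctx, row in counts.items()}

def log_prob(seq, start, width, model, order):
    """Background log-probability of seq[start:start+width] given the preceding bases.

    Bases that lack a full context (the first `order` bases of seq) are skipped,
    which is a simplification.
    """
    total = 0.0
    for i in range(max(start, order), start + width):
        probs = model.get(seq[i - order:i], [0.25] * 4)  # unseen context: uniform
        total += math.log(probs[BASES.index(seq[i])])
    return total

# Made-up sequences for illustration.
SEQS = [
    "ACGTACGATTGACAGCTAGC",
    "GGCATTGACATCGATCGTAA",
    "TTGACACGCGTATAGCTAGG",
]
model = train_background(SEQS, order=2)
print(log_prob(SEQS[0], start=8, width=6, model=model, order=2))
```

With `order=0` the context is empty and the model reduces to the single-nucleotide background of the basic Gibbs sampler.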

[Figure: 500 bp sequence region with the translation start marked]
