A Very Basic Gibbs Sampler for Motif Detection

A Very Basic Gibbs Sampler for Motif Detection Frances Tong July 28, 2004 Southern California Bioinformatics Summer Institute

What? • What is a motif? • What is the biological point? • What is the goal? • What is a limitation with dynamic programming? • What is Gibbs sampling? • What is the program going to do? • What is the program missing? • What is a much better program?

What is a motif? • A motif is a sequence pattern that occurs repeatedly in a group of related dna or protein sequences.

What is the biological point? • Encoding the “structural motif” of a protein (dna in exons) • Prediction of function (protein) • Protein binding sites (dna) • Transcription binding factors

What is the goal? • Perform local multiple sequence alignment to find consensus sequences (motifs)

What is a limitation with dynamic programming? • Memory and time complexity issues • Size of search space = (L-W+1)^N L = length of a sequence W = width of motif N = number of sequences • ex: L=30, W=7, N=10 (30 -7+1)^10 = 1333735776850284124449081472843776 • Alignment of even four such sequences will take a few hours ~10^4 seconds

What is Gibbs sampling? • Stochastic optimization method • Works well with local multiple alignment without gaps (motif searching) • Searches for the statistically most probable motifs by sampling random positions instead of going through entire search space

What is the program going to do? • Ask user for : • file containing multiple dna or protein sequences • motif width • how many motifs wanted • Calculate the background frequencies of A,C,G,T from all the sequences. [0.34951456310679613, 0.17799352750809061, 0.21035598705501618, 0.23300970873786409]

What is the program going to do? • Generate random start positions for the motif in each sequence. ex: 10 sequences, 30 bp in length, motif width of 7 start = [2, 6, 9, 14, 5, 7, 20, 20, 6, 22] >> random.uniform(0,ceiling) where ceiling=len(sequence)-width

What is the program going to do? 4. Construct position specific score matrix from all sequences except one.

What is the program going to do? 5. Score the left-out sequence according to the position specific score matrix:

What is the program going to do? Example: Use the position specific matrix and background from before: [A: 0.34951456310679613, C: 0.17799352750809061, G: 0.21035598705501618, T: 0.23300970873786409] GATTACA:

What is the program going to do? 6. Randomly generate another start position of the motif for that left-out sequence. 7. Score that sequence with its new start position. 8. Compare this new score with its original score. 9. If newscore >= oldscore, then jump to that new start position, else jump to that new start position with probability =

What is the program going to do? 10. Start all over again with this updated start position with another sequence left out Do this many many times! ~ 1000 iterations Gibbs will converge to a stationary distribution of the start positions => a probable alignment of the multiple sequences

What is the program missing? • Doesn’t do reinitializations in the middle to get out of local maxima • Doesn’t optimize the width (you have to specify width explicitly) • Doesn’t do the Bayesian approach – just frequentist (easier for me and for you to understand!) • Doesn’t read in fasta files • Doesn’t do error checking! • And other things that don’t know they are missing yet!

What is a much better program? • Gibbs Motif Sampler • http://bayesweb.wadsworth.org/gibbs/gibbs.html • AlignAce • http://atlas.med.harvard.edu/cgi-bin/alignace.pl

That’s it!

A Very Basic Gibbs Sampler for Motif Detection

A Very Basic Gibbs Sampler for Motif Detection

Presentation Transcript

Process Thought: A Very Basic Introduction

Edge Detection Enhancement Using Gibbs Sampler

Motif identification with Gibbs Sampler

Remote Homology detection : A motif based approach

INCIDENT RESPONSE A very basic Overview

Phylogenetic Models for Motif Detection

Basic Detection Techniques

Motif detection and Screening

Evaluating alignments using motif detection

Gibbs sampling for motif finding

Gibbs Sampler in Local Multiple Alignment

Very Basic Statistics

Very Basic AJAX

A Very Basic Fiber Optic Tutorial

Introduction to MCMC methods, the Gibbs Sampler, and Data Augmentation

Motif finding with Gibbs sampling

The Gibbs Motif Sampler

Gibbs Sampling in Motif Finding

Epidemiology: a very basic start….

A Very Basic Gibbs Sampler for Motif Detection