Finding Transcription Factor Motifs
Adapted from a lab created by Prof. Terry Speed

Cell Cycle Data Set
Spellman et al. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization.
Yeast cell populations were synchronized using three independent methods (alpha factor arrest, elutriation, and arrest of a cdc15 temperature-sensitive mutant).
RNA was extracted and used in microarray experiments to determine the expression of ~6000 genes over 18 time points.
See http://cellcycle-www.stanford.edu

Outline
Read the cell cycle data into R.
Cluster the cell cycle data using hierarchical clustering.
Visualize the cell cycle clusters.
Find motifs in these clusters and visualize them using sequence logos.

Experimental Data
783 genes involved in the yeast cell cycle.
Expression levels measured at 18 time points.
Read the data into R:
> dat <- read.table("ccdata.txt", header=TRUE, sep="\t")
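
After loading, it can help to confirm that the table has the expected shape before clustering. The following is a minimal sketch, not part of the original lab: it assumes a tab-delimited file with a header row and gene names in the first column, and the toy file it writes (toyFile, toyDat) is made up to stand in for ccdata.txt.

```r
# Toy stand-in for ccdata.txt (hypothetical contents): 5 genes x 3 time points.
toy <- data.frame(gene = paste0("gene", 1:5),
                  t1 = c(0.1, -0.3, NA, 0.8, 0.2),
                  t2 = c(0.4, -0.1, 0.5, 0.9, 0.0),
                  t3 = c(-0.2, 0.3, 0.6, NA, 0.1))
toyFile <- tempfile(fileext = ".txt")
write.table(toy, toyFile, sep = "\t", quote = FALSE, row.names = FALSE)

# row.names = 1 takes the gene names from the first column (an assumption
# about the file layout), leaving only numeric expression columns.
toyDat <- read.table(toyFile, header = TRUE, sep = "\t", row.names = 1)

dim(toyDat)                        # genes x time points
all(sapply(toyDat, is.numeric))    # every column should be numeric
sum(is.na(toyDat))                 # number of missing measurements
```

For the real data set, dim(dat) should report 783 rows if the file matches the description on this slide.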

Hierarchical Clustering
> distMat <- dist(dat)
> clustObj <- hclust(distMat)
> plot(clustObj)
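
The calls above rely on the defaults of dist() and hclust(), which are Euclidean distance and complete linkage. A small sketch on simulated data (the matrix sim is made up, not the cell cycle data) spells those defaults out and checks that the result is unchanged:

```r
set.seed(1)
# Simulated expression matrix (made-up data): 20 genes x 18 time points.
sim <- matrix(rnorm(20 * 18), nrow = 20,
              dimnames = list(paste0("gene", 1:20), NULL))

# Defaults, as in the slide.
hcDefault <- hclust(dist(sim))

# The same calls with the defaults written explicitly.
hcExplicit <- hclust(dist(sim, method = "euclidean"), method = "complete")

# Both runs produce the same merge order and merge heights.
identical(hcDefault$merge, hcExplicit$merge)
all.equal(hcDefault$height, hcExplicit$height)
```

Other settings, such as method = "average" in hclust(), generally give a different tree; which one suits expression profiles best is a judgment call.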

Create Gene Expression Clusters
Let's cut the dendrogram into 16 clusters:
> cutObj <- cutree(clustObj, k=16)
> print(table(cutObj))
Write the gene names in each cluster to a separate text file:
for( i in 1:16 ){
  cluster.genes <- row.names(dat)[cutObj == i]
  fileName <- paste("cluster", i, ".txt", sep="")
  write(cluster.genes, fileName)
}
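
The same grouping can also be done with split(), which makes it easy to check that every gene lands in exactly one file. This is a sketch on simulated data with 4 clusters rather than 16; the names sim, simCut, genesByCluster and outDir are made up for illustration.

```r
set.seed(2)
# Simulated expression matrix (made-up data): 30 genes x 18 time points.
sim <- matrix(rnorm(30 * 18), nrow = 30,
              dimnames = list(paste0("gene", 1:30), NULL))
simCut <- cutree(hclust(dist(sim)), k = 4)

# split() groups the gene names by cluster label in one step.
genesByCluster <- split(names(simCut), simCut)

# Each list element should have as many genes as table() reports.
clusterSizes <- table(simCut)
lengths(genesByCluster)

# Write one file per cluster into a temporary directory.
outDir <- tempfile("clusters")
dir.create(outDir)
for (i in names(genesByCluster)) {
  write(genesByCluster[[i]], file.path(outDir, paste0("cluster", i, ".txt")))
}
list.files(outDir)
```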

What Do These Clusters Look Like?
Let's plot the first 8 clusters:
par(mfrow=c(2,4))   # 2 x 4 grid, one panel per cluster
for( i in 1:8 ){
  titleLab <- paste("Cluster ", i, sep="")
  expr.prof <- as.matrix(dat[cutObj == i,])
  # draw the first gene to set up the axes, then overlay every gene in the cluster
  plot(expr.prof[1,], ylim=range(expr.prof, na.rm=TRUE), type="l", xlab="Time", ylab="Expression", main=titleLab)
  apply(expr.prof, 1, lines)
}

What Do These Clusters Look Like?
The remaining 8 clusters:
par(mfrow=c(2,4))   # 2 x 4 grid, one panel per cluster
for( i in 9:16 ){
  titleLab <- paste("Cluster ", i, sep="")
  expr.prof <- as.matrix(dat[cutObj == i,])
  # draw the first gene to set up the axes, then overlay every gene in the cluster
  plot(expr.prof[1,], ylim=range(expr.prof, na.rm=TRUE), type="l", xlab="Time", ylab="Expression", main=titleLab)
  apply(expr.prof, 1, lines)
}
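
Since the two plotting loops differ only in the cluster range, they could be folded into one helper. The sketch below uses matplot() instead of plot() plus lines(), which is a swapped-in alternative rather than the lab's own code; plotClusters is a hypothetical helper name and the data are simulated.

```r
# Hypothetical helper: draw one panel per cluster id, all genes overlaid.
plotClusters <- function(dat, cutObj, ids) {
  nGenes <- integer(0)
  for (i in ids) {
    expr.prof <- as.matrix(dat[cutObj == i, , drop = FALSE])
    # matplot() draws one line per column, so transpose to time x genes.
    matplot(t(expr.prof), type = "l", lty = 1, col = "black",
            xlab = "Time", ylab = "Expression",
            main = paste("Cluster", i))
    nGenes <- c(nGenes, nrow(expr.prof))
  }
  # Return the number of genes drawn in each panel, without printing it.
  invisible(setNames(nGenes, ids))
}

set.seed(3)
sim <- matrix(rnorm(40 * 18), nrow = 40)   # made-up data: 40 genes x 18 time points
simCut <- cutree(hclust(dist(sim)), k = 8)

pdf(NULL)                 # null device so the sketch runs without a display
par(mfrow = c(2, 4))
sizes <- plotClusters(sim, simCut, 1:8)
dev.off()
```

With the real data, the two slides would correspond to plotClusters(dat, cutObj, 1:8) and plotClusters(dat, cutObj, 9:16), each preceded by par(mfrow=c(2,4)).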

Picking Clusters for TF Motifs
> barplot(table(cutObj), main="Cluster Sizes", xlab="Cluster", ylab="Number of Genes")
We want to select a cluster with a reasonably large number of genes in which to look for upstream TF binding site motifs.
Co-expression suggests co-regulation. Hence we look to the promoter regions to see if we can elucidate common sequence patterns (expressible as regular expressions).
Statistically over-represented patterns are potential transcription factor binding sites.
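
To make "statistically over-represented" concrete, here is a minimal sketch that counts how many promoters in a cluster match a pattern, compares that with a background set, and applies a one-sided Fisher's exact test. The sequences and the candidate pattern are made up for illustration, and the motif-finding tools listed later use more refined background models than this.

```r
# Toy promoter sequences (made up for illustration).
clusterSeqs <- c("TTACGCGTAA", "GGACGCGTCC", "ACGCGTTTTT",
                 "CCCCACGCGT", "TTTTTTTTTT")
backgroundSeqs <- c("TTTTACGCGT", "GGGGGGGGGG", "ATATATATAT", "CCCCCCCCCC",
                    "AGAGAGAGAG", "TCTCTCTCTC", "GATTACAGAT", "CATCATCATC",
                    "TGCATGCATG", "AAAAAAAAAA")

pattern <- "ACGCGT"   # candidate motif, written as a regular expression

# Count sequences containing at least one match.
hitCluster <- sum(grepl(pattern, clusterSeqs))
hitBackground <- sum(grepl(pattern, backgroundSeqs))

# 2 x 2 table: rows = cluster / background, columns = match / no match.
counts <- matrix(c(hitCluster, length(clusterSeqs) - hitCluster,
                   hitBackground, length(backgroundSeqs) - hitBackground),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("cluster", "background"),
                                 c("match", "no match")))

# One-sided Fisher's exact test for enrichment in the cluster.
ft <- fisher.test(counts, alternative = "greater")
counts
ft$p.value
```

A small p-value here only flags the pattern as a candidate; it is not evidence of binding on its own.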

Extracting Promoter Sequences
Promoter sequences can be retrieved using RSAT (Regulatory Sequence Analysis Tools): http://rsat.ulb.ac.be/rsat/genome-scale-dna-pattern_form.cgi

TF Motif Finding Tools
MEME: http://meme.sdsc.edu/meme/meme.html
BioProspector: http://ai.stanford.edu/~xsliu/BioProspector/
Improbizer: http://www.cse.ucsc.edu/~kent/improbizer/improbizer.html
Verbumculus: http://wwwdbl.dei.unipd.it/cgi-bin/verb/family.cgi
OligoAnalysis: http://embnet.cifn.unam.mx/~jvanheld/rsa-tools/oligo-analysis_form.cgi
Mobydick: http://genome.ucsf.edu/mobydick/

TF Motif Finding Tools
MDScan: http://ai.stanford.edu/~xsliu/MDscan/
Weeder: http://159.149.109.16:8080/weederWeb/index2.html
Gibbs Motif Sampler: http://bayesweb.wadsworth.org/gibbs/gibbs.html
AlignACE: http://atlas.med.harvard.edu/cgi-bin/alignace.pl
CONSENSUS: http://bifrost.wustl.edu/consensus/html/Html/interface.html

Making Sequence Logos
WebLogo: http://weblogo.berkeley.edu/logo.cgi
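
A sequence logo stacks the bases at each position of a set of aligned sites, with the stack height given by the information content of that position. The sketch below computes those quantities in base R for a toy alignment; the sites are made up, and no small-sample correction is applied, so the numbers may differ slightly from what a logo tool reports.

```r
# Toy aligned binding sites (made up for illustration), all the same length.
sites <- c("ACGCGT", "ACGCGT", "ACGCGA", "TCGCGT")
bases <- c("A", "C", "G", "T")

# Split into a sites x positions character matrix.
siteMat <- do.call(rbind, strsplit(sites, ""))

# Position frequency matrix: rows = bases, columns = positions.
freqMat <- apply(siteMat, 2, function(col) {
  table(factor(col, levels = bases)) / length(col)
})
rownames(freqMat) <- bases

# Information content per position in bits: 2 + sum(p * log2(p)),
# treating 0 * log2(0) as 0.
infoContent <- apply(freqMat, 2, function(p) {
  2 + sum(ifelse(p > 0, p * log2(p), 0))
})

# Letter heights in a logo: base frequencies scaled by the column's information.
letterHeights <- sweep(freqMat, 2, infoContent, `*`)
round(infoContent, 2)
```

Fully conserved positions reach the maximum of 2 bits, while positions with mixed bases score lower.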