Promoter Analysis using Bioinformatics: Putting the Predictions to the Test
Amy Creekmore
Ansci 490M
November 19, 2002
Problems in predicting promoters/transcription factor binding sites
• Transcription factors often recognize relatively short and degenerate sequences.
• These sequences are commonly found throughout the genome of the species.
• Induction often depends on the spacing and frequency of transcription factor binding sites within a sequence.
• Binding sites are not always in the upstream region.
(Figures: Markstein et al., 2002, Figures 1 and 2)
Different Approaches to Promoter Analysis
• Saeed Tavazoie, et al. “Systematic determination of genetic network architecture.” Nature Genetics 22: 281-285.
• Discovery of transcriptional regulation sub-networks, or genes that are under the control of similar promoters.
• De novo discovery of cis-regulatory elements in yeast using expression clustering of microarray data and AlignACE.
Different Approaches to Promoter Analysis
• Michele Markstein, et al. “Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo.” Proceedings of the National Academy of Sciences USA 99(2): 763-768.
• Identify genes (known and unknown) that are regulated by the characterized transcription factor Dorsal.
• Used FLYENHANCER to screen for clusters of known Dorsal response elements.
Different Approaches to Promoter Analysis
• Benjamin P. Berman, et al. “Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome.” Proceedings of the National Academy of Sciences USA 99(2): 757-762.
• Evaluated the extent to which the clustering of transcription factor binding sites can be used as the computational basis for identifying cis-regulatory modules.
• Used the program PATSER to search the genome for consensus binding sites of five different developmental transcription factors, then used CIS-ANALYST to visualize and compute the results.
First Approach
• “Systematic determination of genetic network architecture”
• Saeed Tavazoie, Jason D. Hughes, Michael J. Campbell, Raymond J. Cho, and George M. Church. Nature Genetics 22: 281-285.
Method
• Used microarray data from Cho et al. 1998, consisting of expression data for 6000 genes at 15 time points during two S. cerevisiae mitotic cell cycles.
• Analyzed the 3000 “most variable ORFs” and normalized the data by subtracting, for each gene, its mean expression level across all time points.
• Clustered genes by expression pattern using the k-means algorithm with a Euclidean distance metric.
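The normalization and clustering steps above can be sketched roughly as follows. This is a minimal pure-Python illustration and not the authors' pipeline: the function names (`mean_center`, `kmeans`), the toy expression profiles, the fixed iteration count, and the naive seeding from the first k profiles are all assumptions made for the example.

```python
import math

def mean_center(profile):
    """Subtract the gene's mean expression across all time points."""
    mean = sum(profile) / len(profile)
    return [x - mean for x in profile]

def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(profiles, k, iterations=20):
    """Plain k-means. Seeding with the first k profiles is an assumption
    chosen to keep the example deterministic."""
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each profile goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in profiles
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy profiles: four genes, four time points.
raw = {
    "geneA": [1.0, 5.0, 1.0, 5.0],
    "geneC": [5.0, 1.0, 5.0, 1.0],
    "geneB": [11.0, 15.0, 11.0, 15.0],
    "geneD": [25.0, 21.0, 25.0, 21.0],
}
names = list(raw)
centered = [mean_center(raw[n]) for n in names]
labels, _ = kmeans(centered, k=2)
clusters = dict(zip(names, labels))
```

After mean-centering, geneA and geneB share the same shape despite very different absolute levels, so they land in one cluster while geneC and geneD land in the other. That is the point of subtracting each gene's mean before computing Euclidean distances.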
• Partitioned the 3000 ORFs into 30 clusters and assigned the genes in each cluster to functional categories.
• Determined the statistical significance of each cluster's enrichment for a particular functional category.
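The slide does not say which test was used to score enrichment. One common choice for this kind of question is the upper tail of the hypergeometric distribution, sketched below under that assumption. The function name `enrichment_p_value` and all the counts in the example are hypothetical.

```python
from math import comb

def enrichment_p_value(population, category_size, cluster_size, overlap):
    """P(X >= overlap) when cluster_size genes are drawn without replacement
    from `population` genes, of which category_size belong to the category."""
    upper = min(category_size, cluster_size)
    total = comb(population, cluster_size)
    tail = sum(
        comb(category_size, i) * comb(population - category_size, cluster_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 3000 ORFs, a 100-gene functional category,
# and a 100-gene cluster that contains 20 genes from that category.
p = enrichment_p_value(3000, 100, 100, 20)
```

With these made-up numbers only about three category members would be expected in the cluster by chance, so an overlap of 20 gives a very small p-value. With 30 clusters tested against many categories, some correction for multiple testing would also be needed; that step is left out here.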
Used AlignACE to align the 600 bp upstream regions of the genes in each cluster in order to identify common nucleotide motifs.
Results
• Found 18 motifs in 12 different clusters.
• Seven are characterized transcription factor binding sites that are known to regulate many of the genes in their respective clusters.
• In clusters with known regulons, the known cis-regulatory element emerged as the highest-scoring motif in every case.
  • Examples include the MCB and SCB cell-cycle boxes.
• Motifs that have not been previously described correlate strongly with clusters that are enriched for genes with specific functions.
  • Example: cluster 3 motifs M3a and M3b and their association with RNA- and translation-related genes within and outside of cluster 3.
• “Half of the 30 clusters were significantly enriched for functional categories or had significant motifs.”
Second Approach
• “Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo”
• Michele Markstein, Peter Markstein, Vicky Markstein, and Michael S. Levine. Proceedings of the National Academy of Sciences USA 99(2): 763-768.
Dorsal Transcription Factor
• Drosophila transcription factor involved in dorsal-ventral patterning during development.
• Dorsal can either inhibit or induce transcription, depending on the promoter.
• Induction of transcription is concentration dependent.
(Figure panels: Zen, Sog)
Used a degenerate Dorsal consensus sequence to scan the entire Drosophila genome with FLYENHANCER.
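A scan of this kind can be sketched as a degenerate-consensus search followed by a clustering filter. The code below is an illustration only and does not reproduce FLYENHANCER: the consensus string `GGGWWWWCCM`, the thresholds (3 sites within 400 bp), the toy sequence, and the function names are assumptions for the example, and only the forward strand is searched.

```python
import re

# IUPAC degenerate nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "M": "[AC]", "K": "[GT]",
    "R": "[AG]", "Y": "[CT]", "D": "[AGT]", "N": "[ACGT]",
}

def consensus_to_regex(consensus):
    """Compile a degenerate consensus into a regex.
    The lookahead makes overlapping matches visible to finditer."""
    return re.compile("(?=" + "".join(IUPAC[base] for base in consensus) + ")")

def find_sites(sequence, consensus):
    """Return 0-based start positions of every forward-strand match."""
    return [m.start() for m in consensus_to_regex(consensus).finditer(sequence)]

def find_clusters(positions, min_sites, window):
    """Return (first_start, last_start) for each run of min_sites consecutive
    sites whose start positions span at most `window` bp."""
    positions = sorted(positions)
    clusters = []
    for i in range(len(positions) - min_sites + 1):
        j = i + min_sites - 1
        if positions[j] - positions[i] <= window:
            clusters.append((positions[i], positions[j]))
    return clusters

# Toy sequence: three sites close together, one isolated site far away.
spacer = "T" * 20
sequence = ("GGGAATTCCA" + spacer + "GGGTTAACCC" + spacer +
            "GGGATATCCA" + "T" * 500 + "GGGAAAACCC")
sites = find_sites(sequence, "GGGWWWWCCM")
clusters = find_clusters(sites, min_sites=3, window=400)
```

Here the three tightly spaced sites form one cluster and the isolated fourth site is ignored, which is the behavior that distinguishes a cluster search from a plain motif search. A fuller version would also scan the reverse complement.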
Results
• Computational searches successfully identified genes that are activated at high (Phm), intermediate (Ady), and low (Sog) levels of Dorsal.
• At least 33% (5/15) are known, or indicated, to be regulated by Dorsal.
Third Approach
• “Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome”
• Benjamin P. Berman, Yutaka Nibu, Barret D. Pfeiffer, Pavel Tomancak, Susan E. Celniker, Michael Levine, Gerald M. Rubin, and Michael B. Eisen. Proceedings of the National Academy of Sciences USA 99(2): 757-762.
Methods
• Assigned consensus sequences to each of the five transcription factors using MEME and previously described binding sites.
• Used the program PATSER to search the genome for matching sequences and visualized the matches with the program CIS-ANALYST (developed by the authors).
• Used CIS-ANALYST to analyze the distribution of the sites and to define windows that contained clusters of transcription factor binding sites.
Results
• Examined novel clusters with 15 or more binding sites per 700 bp.
• Identified 28 clusters that met this criterion; these clusters contain binding sites for at least two of the factors.
  • 23 fall in upstream regions
  • 3 fall in intron regions
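The density criterion above (at least 15 sites per 700 bp, from at least two factors) can be expressed as a simple window filter over a list of predicted sites. This is a sketch under stated assumptions rather than how CIS-ANALYST is implemented: the function name `dense_windows`, the choice to start a window at each site, the placeholder factor names, and the toy coordinates are all hypothetical.

```python
def dense_windows(hits, window=700, min_sites=15, min_factors=2):
    """hits: list of (position, factor_name) pairs.
    Returns (window_start, n_sites, sorted_factor_names) for every window
    that starts at a hit and satisfies both thresholds."""
    hits = sorted(hits)
    results = []
    for i, (start, _) in enumerate(hits):
        # Collect every hit that falls inside [start, start + window).
        in_window = [(p, f) for p, f in hits[i:] if p < start + window]
        factors = {f for _, f in in_window}
        if len(in_window) >= min_sites and len(factors) >= min_factors:
            results.append((start, len(in_window), sorted(factors)))
    return results

# Toy data: 10 sites for one factor and 6 for another packed into ~400 bp,
# plus one isolated site far downstream.
hits = (
    [(40 * i, "factorA") for i in range(10)]
    + [(20 + 40 * i, "factorB") for i in range(6)]
    + [(5000, "factorA")]
)
windows = dense_windows(hits)

# A dense run of sites for a single factor fails the two-factor requirement.
single_factor = dense_windows([(10 * i, "factorA") for i in range(20)])
```

Overlapping windows that pass are reported separately here; merging them into a single cluster call is left out to keep the sketch short.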
Results
• Examined the 49 genes that could be regulated by these clusters using in situ hybridization and DNA microarray analysis.
• Ten of the 28 clusters were upstream of, or in the first intron of, genes expressed in an anterior-posterior pattern.
• ~36% correct predictions (10/28)
Conclusions (from papers)
• Clustering can be used to successfully determine cis-regulatory elements and can be applied to other systems.
• Clustering is more efficient when done using prior knowledge of the transcription factor binding site(s).
• Computational identification of cis-regulatory DNA regions improves when two or more different classes of recognition sequences (motifs) are used.
• “The grammar of the cis-regulatory code is clearly more complex than simply the density of transcription factor binding sites.” (Berman et al., 2002)
Conclusions (overall)
• Promoter prediction is a powerful tool that can be used for low-cost screens for transcriptional regulatory sites.
• Success depends on a number of factors:
  • the specific transcription factor (specificity of binding)
  • previous characterization
  • parameters used (window size)
  • annotation of the genome being used