Identification of Transcription Factor Binding Sites Lior Harpaz Ofer Shany 09/05/2004
Goal: find transcription factor binding sites (TFBSs)! [Figure: input → output]
Importance • Transcription factors (TFs) regulate gene expression. • Identification of TFs can teach us about: • The mapping of regulatory pathways • Potential functions of genes
Experimental Methods • Footprinting • EMSA (electrophoretic mobility shift assay) Problems: • Time-consuming • Do not scale up to whole genomes
Computational Methods - Goals • Identifying known TFBSs in previously unknown locations. • Identifying unknown TFBSs.
Computational Methods • Basic idea - locate TFBS using sequence-searching Problems: • Short sequences (5-15 bp) • Degenerate sequences • Location • Biological reality
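The basic sequence-searching idea, and the degeneracy problem, can be illustrated with a minimal sketch. It assumes the site is described by an IUPAC degenerate consensus string; the helper names (`find_sites`, `reverse_complement`) and the example consensus are hypothetical and not taken from the talk.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(consensus, sequence):
    """Return sorted (start, strand) pairs for every match of a degenerate consensus.

    Starts are 0-based positions on the forward strand. A lookahead is used
    so that overlapping matches are also reported.
    """
    body = "".join(IUPAC[c] for c in consensus.upper())
    pattern = re.compile("(?=(" + body + "))")
    sequence = sequence.upper()
    n, k = len(sequence), len(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(sequence)]
    # Scan the reverse strand and map coordinates back to the forward strand.
    rc = reverse_complement(sequence)
    hits += [(n - m.start() - k, "-") for m in pattern.finditer(rc)]
    return sorted(hits)


# Illustrative consensus only; N matches any base.
print(find_sites("TGANTCA", "CCTGACTCAGG"))
```

Because sites are short and degenerate, a scan like this over a whole genome returns many matches by chance alone, which is why the additional evidence discussed on the next slides is needed.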
Computational Methods Possible solutions: Conservation = functional importance • mRNA expression pattern • Phylogenetic footprinting • Network-level conservation
Phylogenetic footprinting • Identify orthologous genes. • Concentrate on conserved non-coding regions (possible regulatory regions). • Look for conserved motifs.
Why should it work? • About 40% of the human and mouse genomes can be aligned with each other. • 80% of mouse genes have orthologs in the human genome. • Only 1%-5% of the human genome encodes proteins.
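The "concentrate on conserved non-coding regions" step can be sketched as a sliding-window identity scan over a pairwise alignment. This is a minimal illustration that assumes the two upstream regions have already been aligned (gaps written as `-`); the function name, window size and threshold are hypothetical choices, not values from the talk.

```python
def conserved_windows(aligned_a, aligned_b, window=10, min_identity=0.9):
    """Return (start, end, identity) for merged conserved alignment blocks.

    A window counts as conserved when the fraction of identical, non-gap
    columns is at least min_identity. Overlapping or adjacent conserved
    windows are merged into one block; end is exclusive.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    match = [a == b and a != "-"
             for a, b in zip(aligned_a.upper(), aligned_b.upper())]
    blocks = []
    for start in range(len(match) - window + 1):
        identity = sum(match[start:start + window]) / window
        if identity >= min_identity:
            end = start + window
            if blocks and start <= blocks[-1][1]:
                blocks[-1][1] = end  # extend the previous block
            else:
                blocks.append([start, end])
    return [(s, e, sum(match[s:e]) / (e - s)) for s, e in blocks]


# Toy alignment: only the middle six columns are identical.
species_a = "AAAAATTGACAGGGGG"
species_b = "CCCCCTTGACATTTTT"
print(conserved_windows(species_a, species_b, window=6, min_identity=1.0))
```

The blocks returned here would then be passed to a motif-search step; the scan itself says nothing about which factor binds there.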
Things to consider… • Choosing genomes. • Locating transcriptional start site. • Alignment method.
More things to consider… • Different evolution rates for different regions in the genome. • PSSM score cut-off • Note - TFBSs within ORFs are not detected.
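The role of the PSSM score cut-off can be made concrete with a small sketch. It assumes a log-odds position-specific scoring matrix built from a handful of aligned example sites, with a uniform background and a pseudocount of 1; the example sites, function names and cut-off value are all illustrative assumptions.

```python
import math

BASES = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PSSM (one dict per column) from equal-length sites."""
    total = len(sites) + 4 * pseudocount
    pssm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        pssm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pssm


def scan(pssm, sequence, cutoff):
    """Return (start, word, score) for every window scoring at least cutoff."""
    k = len(pssm)
    hits = []
    for start in range(len(sequence) - k + 1):
        word = sequence[start:start + k]
        if any(ch not in BASES for ch in word):
            continue  # skip windows containing ambiguous characters
        score = sum(col[ch] for col, ch in zip(pssm, word))
        if score >= cutoff:
            hits.append((start, word, round(score, 2)))
    return hits


# Hypothetical training sites, for illustration only.
example_sites = ["TTGACA", "TTGACA", "TTGACT", "TTTACA"]
pssm = build_pssm(example_sites)
print(scan(pssm, "GGGTTGACAGGG", cutoff=5.0))
```

Lowering `cutoff` reports more candidate sites at the price of more false positives; raising it does the opposite, which is why the choice of cut-off is listed as something to consider.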
Phylogenetic footprinting in proteobacterial genomes • Study set of 190 E. coli genes with known TFBSs. • Orthologs were searched for in eight other bacteria. • Motif search by Bayesian Gibbs sampling.
Bayesian Gibbs sampling • Algorithm for motif search. • Each motif is assigned a maximum a posteriori (MAP) value.
Bayesian Gibbs sampling • Parameters and extensions: • Model sequence • Palindromic patterns • Background pattern • Distribution of spacing between TFBSs and translation start site
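To give a feel for how Gibbs sampling searches for a motif, here is a deliberately simplified site sampler. It is not the Bayesian sampler used in the study: it assumes exactly one site of fixed width per sequence, a uniform background, and sequences containing only A, C, G and T, and it reports a plain log-odds score rather than a MAP value. None of the extensions listed above (palindromic models, spacing distributions) are included, and all names are hypothetical.

```python
import math
import random

BASES = "ACGT"


def profile_from(sites, pseudocount=1.0):
    """Column-wise base probabilities estimated from equal-length sites."""
    total = len(sites) + 4 * pseudocount
    return [
        {b: (sum(s[i] == b for s in sites) + pseudocount) / total for b in BASES}
        for i in range(len(sites[0]))
    ]


def word_weight(profile, word, background=0.25):
    """Likelihood ratio of a word under the profile versus the background."""
    weight = 1.0
    for col, ch in zip(profile, word):
        weight *= col[ch] / background
    return weight


def gibbs_motif_search(sequences, width, iterations=200, seed=0):
    """Return (starts, score) for the best-scoring set of sites seen."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in sequences]
    best_starts, best_score = list(starts), -math.inf
    for _ in range(iterations):
        for i, seq in enumerate(sequences):
            # Build the profile from every sequence except the current one.
            others = [s[p:p + width]
                      for j, (s, p) in enumerate(zip(sequences, starts)) if j != i]
            profile = profile_from(others)
            # Sample a new start in proportion to its likelihood ratio.
            weights = [word_weight(profile, seq[p:p + width])
                       for p in range(len(seq) - width + 1)]
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
        sites = [s[p:p + width] for s, p in zip(sequences, starts)]
        profile = profile_from(sites)
        score = sum(math.log2(word_weight(profile, w)) for w in sites)
        if score > best_score:
            best_starts, best_score = list(starts), score
    return best_starts, best_score


# Toy upstream regions, for illustration only.
toy = ["ACGTTGACATTT", "GGTTGACAGGCC", "CCCATTGACAAG", "TTGACACGCGCG"]
print(gibbs_motif_search(toy, width=6, iterations=50, seed=1))
```

The sampler is stochastic, so different seeds can converge to different site sets; in practice several runs are compared and the best-scoring alignment is kept.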
Results • Overall: in 146/184 sets, motifs matched known regulatory sequences. • In the 18 sets with only 1 ortholog, only 67% of known sites were matched, and with low MAP values. • In the 166 sets with >= 2 orthologs, 81% of motifs matched known regulatory sequences.
Results • Out of the 166 sets (with >= 2 orthologs): • 131 corresponded to known TFBSs. • 3 corresponded to known stem-loop structures. • 32 data sets contained predictions with large MAP values: these could be undocumented sites! • Documented sites were found in 138 sites without using palindromic models.
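The figures on the two Results slides can be cross-checked with a few lines of arithmetic. The sketch assumes that the 81% figure counts both the known TFBSs and the known stem-loop structures as matches, and that the single-ortholog count is the overall total minus the multi-ortholog matches; both are readings of the slides rather than statements from them.

```python
single_ortholog_sets = 18
multi_ortholog_sets = 166
total_sets = single_ortholog_sets + multi_ortholog_sets

known_tfbs = 131
known_stem_loops = 3
possible_new_sites = 32

# Assumption: "matched known regulatory sequences" = TFBSs + stem-loops.
matched_multi = known_tfbs + known_stem_loops
matched_overall = 146
matched_single = matched_overall - matched_multi

print(total_sets)                                        # 184
print(matched_multi + possible_new_sites)                # 166
print(round(100 * matched_multi / multi_ortholog_sets))  # 81
print(round(100 * matched_single / single_ortholog_sets))  # 67
print(round(100 * matched_overall / total_sets))         # 79
```

Under these assumptions the counts are mutually consistent, and the overall success rate works out to roughly 79%.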
Identification of a new TF • New site found near fabA, fabB & yqfA. • YijC binds to these sites. • Site location, protein structure & previous experimental results suggest that YijC is a repressor of the fab genes. • Indication of yqfA's involvement in fatty acid metabolism.
Genomic scale phylogenetic footprinting • 2113 ORFs of E. coli used. • 187 new sites identified as probable sites for 46 known TFs. • Remaining sites are expected to represent unknown TFBSs. • MAP values of predicted sites were lower.
[Figure: Ortholog distribution, study set vs. full set]
Conclusions • New sites for known TFs were found. • Regulatory stem-loops are conserved. • New sites for unknown TFs are predicted. • A new TF was identified (YijC). • A gene function was predicted (yqfA).
Network level conservation • Each TF regulates the expression of many genes (20-400). • Conservation of global gene expression requires the conservation of regulatory mechanisms.
Data analysis • Total motifs: 80,000 • P-value filter: 12,000 • Low-complexity filter: 7,673 • Hierarchical clustering: 1,269
Validation • 34/48 known sites discovered. • Large fraction of matches for significant p-values.
Biological Significance • Functional coherence • Expression coherence
Characteristic Features • Conservation of binding affinity • Conservation of position & orientation
References • Bulyk M. Computational prediction of transcription-factor binding site locations. Genome Biol. 2003 5:201. • McCue L, Thompson W, Carmack C, Ryan MP, Liu JS, Derbyshire V, Lawrence CE. Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Nucleic Acids Res. 2001 29:774-782. • Pritsker M, Liu YC, Beer MA, Tavazoie S. Whole-genome discovery of transcription factor binding sites by network-level conservation. Genome Res. 2004 14:99-108.