Identification of Transcription Factor Binding Sites

Lior Harpaz, Ofer Shany. 09/05/2004.

Presentation Transcript


  1. Identification of Transcription Factor Binding Sites • Lior Harpaz, Ofer Shany • 09/05/2004

  2. Goal: find transcription factor binding sites (TFBSs) • [Figure: input and output]

  3. Importance • TFs regulate gene expression. • Identification of TFs can teach us about: • Mapping of regulatory pathways • Potential functions of genes

  4. Experimental Methods • Footprinting • EMSA (electrophoretic mobility shift assay) Problems: • Time-consuming • Do not scale up to whole genomes

  5. Computational Methods - Goals • Identifying known TFBSs in previously unknown locations. • Identifying unknown TFBSs.

  6. Computational Methods • Basic idea: locate TFBSs by sequence searching Problems: • Short sequences (5-15 bp) • Degenerate sequences • Location • Biological reality
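
The sequence-searching idea on this slide can be sketched as a scan for a degenerate consensus written in IUPAC nucleotide codes. This is a minimal illustration only: the function name `find_sites`, the consensus `TGANTCA`, and the test sequence are hypothetical and do not come from the presentation.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_sites(consensus, sequence):
    """Return 0-based start positions where the degenerate consensus matches."""
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # Wrapping the pattern in a lookahead lets finditer report overlapping hits.
    regex = re.compile("(?=(" + pattern + "))")
    return [m.start() for m in regex.finditer(sequence.upper())]

# Hypothetical consensus and sequence, chosen only for illustration.
hits = find_sites("TGANTCA", "CCTGACTCAGGTGAGTCATT")
print(hits)  # [2, 11]
```

The example also shows why short, degenerate sites are a problem: a 7-mer with one `N` fixes only six positions, so under a uniform base composition it matches a random position with probability 4^-6 (about 1 in 4,096), which yields many chance hits on a genome scale.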

  7. Computational Methods Possible solutions: Conservation = functional importance • mRNA expression pattern • Phylogenetic footprinting • Network-level conservation

  8. Phylogenetic footprinting • Identify ortholog genes • Concentrate on conserved non-coding regions (possible regulatory regions) • Look for conserved motifs.
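
As a rough sketch of the last step (looking for conserved motifs), the snippet below keeps only the k-mers that occur in every orthologous upstream region. It assumes exact matches and unaligned sequences, which is a strong simplification of real phylogenetic footprinting; `conserved_kmers` and the three sequences are hypothetical.

```python
def conserved_kmers(upstream_seqs, k):
    """Return the set of k-mers present in every sequence (exact matches only)."""
    if not upstream_seqs:
        return set()
    kmer_sets = [
        {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for seq in upstream_seqs
    ]
    return set.intersection(*kmer_sets)

# Hypothetical upstream regions of three orthologous genes.
orthologs = ["AAGTTGACATCC", "CGTTGACAGGAT", "TTTGACAAACGG"]
print(conserved_kmers(orthologs, 6))  # {'TTGACA'}
```

Exact intersection misses degenerate sites, so practical methods replace it with alignment or a probabilistic motif model.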

  9. Why should it work? • 40% alignment between the human and mouse genomes • 80% of mouse genes have orthologs in the human genome • Only 1%-5% of the human genome encodes proteins.

  10. Things to consider… • Choosing genomes. • Locating transcriptional start site. • Alignment method.

  11. More things to consider… • Different evolution rates for different regions in the genome. • PSSM score cut-off • Note - TFBSs within ORFs are not detected.
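
To make the PSSM score cut-off concrete, here is a minimal sketch that builds a log-odds position-specific scoring matrix from a few aligned sites and reports every window scoring at or above a threshold. The sites, the uniform background of 0.25, the pseudocount, and the function names are assumptions for illustration; sequences are assumed to contain only uppercase A, C, G, T.

```python
import math

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM (one dict per column) from aligned, equal-length sites."""
    width = len(sites[0])
    pssm = []
    for col in range(width):
        counts = {b: pseudocount for b in "ACGT"}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(counts[b] / total / background) for b in "ACGT"})
    return pssm

def scan(pssm, sequence, cutoff):
    """Return (start, window, score) for every window scoring >= cutoff."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[base] for col, base in zip(pssm, window))
        if score >= cutoff:
            hits.append((start, window, score))
    return hits

# Hypothetical known sites and target sequence.
pssm = build_pssm(["TGACA", "TGACA", "TGATA", "TGACT"])
target = "GGTGACAGGCCTTACA"
print([h[0] for h in scan(pssm, target, cutoff=5.0)])  # [2]
print([h[0] for h in scan(pssm, target, cutoff=3.0)])  # [2, 11]
```

Lowering the cut-off from 5.0 to 3.0 admits the weaker window `TTACA`: a looser threshold finds more true sites but also more false positives.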

  12. Phylogenetic footprinting in proteobacterial genomes • Study set of 190 E. coli genes with known TFBSs. • Orthologs were searched for in eight other bacteria. • Motif search by Bayesian Gibbs sampling.

  13. Bayesian Gibbs sampling • Algorithm for motif search. • Each motif is assigned a MAP (maximum a posteriori probability) value.

  14. Bayesian Gibbs sampling • Parameters and extensions: • Model sequence • Palindromic patterns • Background pattern • Distribution of spacing between TFBSs and translation start site
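
For orientation, the sketch below shows a basic Gibbs site sampler with one motif occurrence per sequence. It is not the Bayesian variant used in the study: it has no MAP scoring, palindromic model, background model, or spacing prior, and it assumes a uniform background. The function name, parameters, and sequences are hypothetical.

```python
import random

def gibbs_motif_search(seqs, width, iterations=200, pseudocount=1.0, seed=0):
    """Minimal site sampler: returns one sampled motif start per sequence."""
    rng = random.Random(seed)
    # Start from a random site in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Position-specific counts from the current sites of all other sequences.
            counts = [{b: pseudocount for b in "ACGT"} for _ in range(width)]
            for j, other in enumerate(seqs):
                if j != i:
                    site = other[starts[j]:starts[j] + width]
                    for col, base in enumerate(site):
                        counts[col][base] += 1
            total = len(seqs) - 1 + 4 * pseudocount
            # Weight each candidate start by its likelihood ratio against
            # a uniform background (0.25 per base).
            weights = []
            for start in range(len(seq) - width + 1):
                ratio = 1.0
                for col, base in enumerate(seq[start:start + width]):
                    ratio *= (counts[col][base] / total) / 0.25
                weights.append(ratio)
            # Resample the site of sequence i in proportion to the weights.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

# Hypothetical upstream sequences, each containing the planted word TTGACA.
seqs = ["ACGTTGACATGC", "TTGACAGGCATC", "GGCATTGACACG", "CATGCCTTGACA"]
starts = gibbs_motif_search(seqs, width=6)
print([s[p:p + 6] for s, p in zip(seqs, starts)])
```

Because the sampler is stochastic, the sites it reports depend on the seed and the number of iterations; production tools run several chains and score the resulting alignments.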

  15. Results • Overall: in 146/184 sets, motifs matched known regulatory sequences. • In 18 genes (with 1 ortholog), only 67% of known sites were matched, and with low MAP values. • In 166 sets (with >= 2 orthologs), 81% of motifs matched known regulatory sequences.

  16. Results • Out of the 166 sets (with >= 2 orthologs): • 131 corresponded to known TFBSs. • 3 corresponded to known stem-loop structures. • 32 data sets contained predictions with large MAP values: these could be undocumented sites! • Documented sites were found in 138 cases without using palindromic models.
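
The counts on the two Results slides can be cross-checked with a few lines of arithmetic. The check assumes that the 81% figure counts both the 131 known TFBSs and the 3 stem-loop structures as matches to known regulatory sequences; that reading is an inference from the slides, not something they state.

```python
# Counts taken from the two Results slides.
single_ortholog_sets = 18
multi_ortholog_sets = 166
total_sets = single_ortholog_sets + multi_ortholog_sets  # 184

matched_single = round(0.67 * single_ortholog_sets)      # 67% of 18 -> 12
known_tfbs = 131
stem_loops = 3
matched_multi = known_tfbs + stem_loops                   # assumed reading: 134
novel_predictions = 32

# The three categories account for all 166 multi-ortholog sets.
assert matched_multi + novel_predictions == multi_ortholog_sets
# Matches from both groups add up to the overall 146 of 184.
assert matched_single + matched_multi == 146

print(f"{matched_multi / multi_ortholog_sets:.0%}")  # 81%
print(f"{146 / total_sets:.0%}")                     # 79%
```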

  17. Identification of a new TF • New site found near fabA, fabB & yqfA. • YijC binds to these sites. • Site location, protein structure & previous experimental results suggest YijC is a repressor of the fab genes. • Indication of yqfA's involvement in fatty-acid metabolism.

  18. Genome-scale phylogenetic footprinting • 2113 E. coli ORFs used. • 187 new sites identified as probable sites for 46 known TFs. • Remaining sites are expected to represent unknown TFBSs. • MAP values of predicted sites were lower.

  19. MAP values left-shift

  20. Ortholog distribution • [Figure: study set vs. full set]

  21. Conclusions • New sites for known TFs were found. • Conservation of regulatory stem-loops. • New sites for unknown TFs are predicted. • New TF identified (YijC). • Predicted gene function (yqfA).

  22. Break

  23. Network level conservation • Each TF regulates the expression of many genes (20-400). • Conservation of global gene expression requires the conservation of regulatory mechanisms.
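
One way to quantify this idea is to ask whether the genes carrying a motif in one genome have orthologs carrying it in a second genome more often than chance allows. The sketch below uses a hypergeometric tail probability for that overlap. The choice of statistic is an assumption made for illustration and is not confirmed by the slides, and all counts are hypothetical.

```python
from math import comb

def hypergeom_tail(total, marked_a, marked_b, overlap):
    """P(X >= overlap) for the overlap of two gene sets drawn from `total` ortholog pairs.

    marked_a: pairs whose genome-A upstream region contains the motif
    marked_b: pairs whose genome-B upstream region contains the motif
    """
    upper = min(marked_a, marked_b)
    favourable = sum(
        comb(marked_a, x) * comb(total - marked_a, marked_b - x)
        for x in range(overlap, upper + 1)
    )
    return favourable / comb(total, marked_b)

# Hypothetical counts: 2000 ortholog pairs, motif upstream of 60 genes in
# genome A and 50 in genome B, with 20 ortholog pairs sharing it.
p_value = hypergeom_tail(2000, 60, 50, 20)
print(f"{p_value:.3g}")
```

A very small p-value indicates that the motif's target gene set is conserved between the genomes, which is the signal used to separate functional motifs from chance occurrences.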

  24. Data analysis • Total motifs: 80,000 • P-value filter: 12,000 • Low-complexity filter: 7,673 • Hierarchical clustering: 1,269

  25. Validation • 34/48 known sites discovered. • Large fraction of matches for significant p-values.

  26. Identification of known binding sites

  27. Biological Significance • Functional coherence • Expression coherence

  28. Characteristic Features • Conservation of binding affinity • Conservation of position & orientation

  29. References • Bulyk M. Computational prediction of transcription-factor binding site locations. Genome Biol. 2003 5:201. • McCue L, Thompson W, Carmack C, Ryan MP, Liu JS, Derbyshire V, Lawrence CE. Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Nucleic Acids Res. 2001 29:774-782. • Pritsker M, Liu YC, Beer MA, Tavazoie S. Whole-genome discovery of transcription factor binding sites by network-level conservation. Genome Res. 2004 14:99-108.

  30. Sensitivity Vs. Specificity
