
10/26/05 Promoter Prediction (really!)




  1. 10/26/05 Promoter Prediction (really!) • D Dobbs ISU - BCB 444/544X: Promoter Prediction (really!)

  2. Announcements • BCB Link for Seminar Schedules (updated) • http://www.bcb.iastate.edu/seminars/index.html • Seminar (Fri Oct 28) • 12:10 PM BCB Faculty Seminar in E164 Lagomarcino • Assembly and Alignment of Genomic DNA Sequence, Xiaoqiu Huang, ComS • http://www.bcb.iastate.edu/courses/BCB691-F2005.html#Oct%2028 • Mark your calendars: • 1:10 PM Nov 14, Baker Seminar in Howe Hall Auditorium • "Discovering transcription factor binding sites" • Douglas Brutlag, Dept of Biochemistry & Medicine • Stanford University School of Medicine

  3. Announcements • BCB 544 Projects - Important Dates: • Nov 2 Wed noon - Project proposals due to David/Drena • Nov 4 Fri 10 AM - Approvals/responses to students • Dec 2 Fri noon - Written project reports due • Dec 5, 7, 8, 9 class/lab - Oral presentations (20 min) • (Dec 15 Thurs = Final Exam)

  4. Announcements • Lab 9 - due Wed noon (today) • Exam 2 - this Friday • Posted online: Exam 2 Study Guide, 544 Reading Assignment (2 papers), Lab Keys (today) • Thurs: No Lab - Extra Office Hrs instead: David 1-3 PM in 209 Atanasoff; Drena 1-3 PM in 106 MBB

  5. Promoter Prediction & RNA Structure/Function Prediction • Mon: Quite a few more words re: gene prediction • Wed: Promoter prediction • Next Mon: RNA structure & function (RNA structure prediction; 2' & 3' structure prediction; miRNA & target prediction)

  6. Optional - but very helpful reading: (that's a hint!) • Zhang MQ (2002) Computational prediction of eukaryotic protein-coding genes. Nat Rev Genet 3:698-709 http://proxy.lib.iastate.edu:2103/nrg/journal/v3/n9/full/nrg890_fs.html • Wasserman WW & Sandelin A (2004) Applied bioinformatics for identification of regulatory elements. Nat Rev Genet 5:276-287 http://proxy.lib.iastate.edu:2103/nrg/journal/v5/n4/full/nrg1315_fs.html • Check this out: http://www.phylofoot.org/NRG_testcases/

  7. Reading Assignment (for Mon) • Mount Bioinformatics • Chp 8 Prediction of RNA Secondary Structure • pp. 327-355 • Check errata: http://www.bioinformaticsonline.org/help/errata2.html • Cates (Online) RNA Secondary Structure Prediction Module • http://cnx.rice.edu/content/m11065/latest/

  8. Review last lecture: • Flowchart for gene prediction • Performance assessment measures • Correction re: slide 10/24 #27 • Promoters

  9. Gene prediction flowchart: Fig 5.15 Baxevanis & Ouellette 2005

  10. Evaluation of Splice Site Prediction: What do measures really mean? Fig 5.11 Baxevanis & Ouellette 2005

  11. Correction re: last lecture: GeneSeqer Performance Graphs • Brendel et al (2004) Bioinformatics 20: 1157

  12. Performance? [Figure: splice-site prediction performance plots; panels: Human GT site, Human AG site, A. thaliana GT site, A. thaliana AG site; axis label: Sn] • Note: these are not ROC curves (ROC curves plot Sn vs (1 - Sp)) • But plots such as these (& ROCs) are much better than a "single number" for comparing different methods • Both types of plots illustrate the trade-off: Sn vs Sp • Brendel 2005
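For contrast with the plots above, here is how an ROC curve is built. It plots Sn (the true-positive rate) against 1 - Sp (the false-positive rate, with Sp = TN/(TN+FP)) as the score threshold is lowered. This is a minimal Python sketch. It assumes the predictor outputs one score per candidate site; the scores and labels are made up for illustration.

```python
def roc_points(scores, labels):
    """Return (1 - Sp, Sn) pairs, sweeping the threshold from strict to lenient.

    scores: predictor scores (higher = more likely a real site)
    labels: True for real sites, False for decoys (both classes must be present)
    """
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # strictest threshold: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((fp / neg, tp / pos))  # (false-positive rate, Sn)
    return points

# Hypothetical scores for 4 real sites and 4 decoy sites
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [True, True, False, True, False, True, False, False]
for fpr, sn in roc_points(scores, labels):
    print(f"1-Sp={fpr:.2f}  Sn={sn:.2f}")
```

Each step down in threshold gains either a TP (the curve moves up) or an FP (the curve moves right). That is the Sn vs Sp trade-off the slide describes.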

  13. Fig 2 - Brendel et al (2004) Bioinformatics 20: 1157

  14. Bayes Factor as Decision Criterion • 2-class model: • 7-class model: • H0: • H=T: • Brendel 2005
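The general idea behind a Bayes-factor decision: compare how likely a candidate sequence is under a "true site" hypothesis and under a "false site" hypothesis, and call a site when the ratio B = P(x | true) / P(x | false) exceeds a threshold. This minimal sketch uses two hypothetical independent-position models over a 3-nt window. The probabilities are invented, and this is not the 2-class or 7-class model from Brendel et al.

```python
import math

# Hypothetical per-position nucleotide probabilities for a 3-nt window
# (invented for illustration; NOT taken from Brendel et al.).
TRUE_MODEL = [
    {"A": 0.6, "C": 0.1, "G": 0.2, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
# "False site" hypothesis: uniform background at every position
FALSE_MODEL = [{"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25} for _ in range(3)]

def log_bayes_factor(seq, h_true=TRUE_MODEL, h_false=FALSE_MODEL):
    """log B = log P(seq | H_true) - log P(seq | H_false), positions independent."""
    return sum(math.log(t[b]) - math.log(f[b])
               for b, t, f in zip(seq.upper(), h_true, h_false))

def is_site(seq, log_threshold=0.0):
    """Call a site when log B exceeds the threshold (0.0 corresponds to B > 1)."""
    return log_bayes_factor(seq) > log_threshold

print(is_site("AGT"), is_site("CCC"))
```

Working in log space turns the product of per-position ratios into a sum. Raising `log_threshold` above 0 trades sensitivity for fewer false positives.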

  15. Evaluation of Splice Site Prediction
                        Actual True     Actual False
      Predicted True    TP              FP              PP = TP + FP
      Predicted False   FN              TN              PN = FN + TN
                        AP = TP + FN    AN = FP + TN
      • Sensitivity (= Coverage):
      • Specificity:
      • Misclassification rates:
      • Normalized specificity:
      Brendel 2005

  16. Careful: different definitions for "Specificity"!
      • Brendel definitions (same confusion matrix as slide 15): • Sensitivity: • Specificity:
      • cf. Guigó definitions:
        Sn: Sensitivity = TP/(TP+FN)
        Sp: Specificity = TN/(TN+FP) = Sp⁻
        AC: Approximate Correlation = 0.5 x ((TP/(TP+FN)) + (TP/(TP+FP)) + (TN/(TN+FP)) + (TN/(TN+FN))) - 1
      • Other measures? Predictive values, correlation coefficient
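The Sn, Sp and AC formulas listed above can be computed directly from confusion-matrix counts. This is a minimal Python sketch that follows the definitions as written on this slide; other authors define Sp differently. The counts in the usage line are hypothetical.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Sn, Sp, predictive values and AC from confusion-matrix counts.

    Follows the definitions listed on the slide; every margin
    (TP+FN, TN+FP, TP+FP, TN+FN) must be non-zero.
    """
    sn = tp / (tp + fn)    # sensitivity
    sp = tn / (tn + fp)    # specificity, as defined here (other authors use TP/(TP+FP))
    ppv = tp / (tp + fp)   # positive predictive value
    npv = tn / (tn + fn)   # negative predictive value
    ac = 0.5 * (sn + ppv + sp + npv) - 1  # approximate correlation
    return {"Sn": sn, "Sp": sp, "PPV": ppv, "NPV": npv, "AC": ac}

m = accuracy_measures(tp=80, fp=20, fn=10, tn=90)  # hypothetical counts
print({k: round(v, 3) for k, v in m.items()})
```

AC is 1 for a perfect prediction. Because it averages four ratios, it stays defined even when one of the margins used by the correlation coefficient would be zero.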

  17. Best measures for comparing different methods? • ROC curves (Receiver Operating Characteristic?!!) • http://www.anaesthetist.com/mnm/stats/roc/ • "The Magnificent ROC" - has fun applets & quotes: "There is no statistical test, however intuitive and simple, which will not be abused by medical researchers" • Correlation coefficient (Matthews correlation coefficient, MCC): • MCC = 1 for a perfect prediction • 0 for a completely random assignment • -1 for a "perfectly incorrect" prediction • Do not memorize this!
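For reference, the standard Matthews correlation coefficient is MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). Below is a minimal Python sketch. Returning 0 when a margin is empty is a common convention, not the only one.

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient: +1 perfect, 0 random, -1 perfectly wrong."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the confusion matrix is empty
    return (tp * tn - fp * fn) / denom

print(mcc(10, 0, 0, 10), mcc(5, 5, 5, 5), mcc(0, 10, 10, 0))
```

The three calls illustrate the slide's three reference points: perfect, random, and "perfectly incorrect" predictions.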

  18. Promoters: What signals are there? Simple ones in prokaryotes • Brown Fig 9.17, BIOS Scientific Publishers Ltd, 1999

  19. Prokaryotic promoters • RNA polymerase complex recognizes promoter sequences located very close to & on the 5’ side (“upstream”) of the initiation site • RNA polymerase complex binds directly to these, with no requirement for “transcription factors” • Prokaryotic promoter sequences are highly conserved: • -10 region • -35 region
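The -10/-35 idea can be turned into a naive scanner. This minimal Python sketch uses the E. coli sigma70 consensus boxes (-35 TTGACA, -10 TATAAT), separated by a spacer of roughly 17 bp. The allowed spacer range (15-19 bp) and mismatch limit are assumptions chosen for illustration. It scans the forward strand only.

```python
MINUS35 = "TTGACA"  # E. coli sigma70 -35 consensus
MINUS10 = "TATAAT"  # E. coli sigma70 -10 consensus (Pribnow box)

def mismatches(a, b):
    """Count positions where two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def find_sigma70_like(seq, max_mm=1, spacer=range(15, 20)):
    """Return (start_of_-35, start_of_-10, total_mismatches) for each candidate.

    max_mm: mismatches allowed per box (an arbitrary assumption)
    spacer: allowed gap (bp) between the end of the -35 box and the start of the -10 box
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 6 + 1):
        mm35 = mismatches(seq[i:i + 6], MINUS35)
        if mm35 > max_mm:
            continue
        for gap in spacer:
            j = i + 6 + gap
            if j + 6 > len(seq):
                break  # larger gaps would run off the end as well
            mm10 = mismatches(seq[j:j + 6], MINUS10)
            if mm10 <= max_mm:
                hits.append((i, j, mm35 + mm10))
    return hits

test_seq = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG"
print(find_sigma70_like(test_seq))
```

Real promoters rarely match both consensus boxes exactly. Loosening `max_mm` quickly floods a genome with hits, which is the false-positive problem discussed below.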

  20. What signals are there? Complex ones in eukaryotes! Fig 9.13 Mount 2004

  21. Simpler view of complex promoters in eukaryotes: Fig 5.12 Baxevanis & Ouellette 2005

  22. Eukaryotic genes are transcribed by 3 different RNA polymerases, which recognize different types of promoters & enhancers: Brown Fig 9.18, BIOS Scientific Publishers Ltd, 1999

  23. Eukaryotic promoters & enhancers • Promoters located “relatively” close to initiation site (but can be located within gene, rather than upstream!) • Enhancers also required for regulated transcription (these control expression in specific cell types, developmental stages, in response to environment) • RNA polymerase complexes do not specifically recognize promoter sequences directly • Transcription factors bind first and serve as “landmarks” for recognition by RNA polymerase complexes

  24. Eukaryotic transcription factors • Transcription factors (TFs) are DNA binding proteins that also interact with RNA polymerase complex to activate or repress transcription • TFs contain characteristic “DNA binding motifs” http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.table.7039 • TFs recognize specific short DNA sequence motifs, called “transcription factor binding sites” • Several databases for these, e.g. TRANSFAC http://www.generegulation.com/cgibin/pub/databases/transfac

  25. Zinc finger-containing transcription factors • Common in eukaryotic proteins • Estimated 1% of mammalian genes encode zinc-finger proteins • In C. elegans, there are 500! • Can be used as highly specific DNA binding modules • Potentially valuable tools for directed genome modification (esp. in plants) & human gene therapy • Brown Fig 9.12, BIOS Scientific Publishers Ltd, 1999

  26. New Today: Promoter Prediction • Predicting regulatory regions (focus on promoters) • Brief review of promoters & enhancers • Predicting promoters: eukaryotes vs prokaryotes • Next week: RNA structure & function

  27. Predicting Promoters • Overview of strategies • What sequence signals can be used? • What other types of information can be used? • Algorithms • Promoter prediction software • 3 major types • many, many programs!

  28. Promoter prediction: Eukaryotes vs prokaryotes • Promoter prediction is easier in microbial genomes. Why? • Highly conserved • Simpler gene structures • More sequenced genomes! (for comparative approaches) • Methods? Previously, again mostly HMM-based; now: similarity-based, comparative methods, because so many genomes are available

  29. Predicting promoters: Steps & Strategies • Closely related to gene prediction! • Obtain genomic sequence • Use sequence similarity-based comparison (BLAST, MSA) to find related genes • But: "regulatory" regions are much less well conserved than coding regions • Locate ORFs • Identify TSS (if possible!) • Use promoter prediction programs • Analyze motifs, etc. in sequence (TRANSFAC)
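The "Locate ORFs" step can be sketched as a scan of the three forward reading frames for stretches running from ATG to an in-frame stop codon. This is a deliberately naive Python sketch: forward strand only, standard stop codons, and a hypothetical `min_codons` cutoff. It is not how gene-prediction programs actually delimit genes.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=30):
    """Forward-strand ORFs as (start, end) 0-based, end exclusive, stop codon included.

    Uses the first ATG after the previous stop, so each ORF is the longest one
    ending at its stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs

print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))
```

Once ORFs (and ideally a TSS) are placed, the region upstream of each ORF is the natural input for the promoter prediction programs in the later steps.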

  30. Predicting promoters: Steps & Strategies • Identify TSS, if possible? • One of the biggest problems is determining the exact TSS! Not very many full-length cDNAs! • Good starting point? (human & vertebrate genes) Use FirstEF, found within the UCSC Genome Browser, or submit to the FirstEF web server • Fig 5.10 Baxevanis & Ouellette 2005

  31. Automated promoter prediction strategies • Pattern-driven algorithms • Sequence-driven algorithms • Combined "evidence-based" • BEST RESULTS? Combined, sequential

  32. Promoter Prediction: Pattern-driven algorithms • Success depends on availability of collections of annotated binding sites (TRANSFAC & PROMO) • Tend to produce huge numbers of FPs. Why? • Binding sites (BS) for specific TFs are often variable • Binding sites are short (typically 5-15 bp) • Interactions between TFs (& other proteins) influence affinity & specificity of TF binding • One binding site is often recognized by multiple TFs • Biology is complex: promoters are often specific to organism/cell/stage/environmental condition
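A back-of-the-envelope calculation shows why short sites yield so many FPs. In random sequence with equal base frequencies, one specific k-bp site is expected about 2(L - k + 1)/4^k times in L bp, counting both strands. This Python sketch assumes a 3 Gb genome (roughly human-sized) and uniform base composition. Under those assumptions, even one exact 6-bp site is expected roughly 1.5 million times by chance.

```python
def expected_random_hits(site_len, genome_len, both_strands=True):
    """Expected exact matches of one specific site in random, uniform i.i.d. sequence."""
    p = 0.25 ** site_len                  # chance that a given window matches exactly
    windows = genome_len - site_len + 1   # windows per strand
    return windows * p * (2 if both_strands else 1)

for k in (6, 8, 10, 12):
    print(k, round(expected_random_hits(k, 3_000_000_000)))
```

Real TFs also tolerate mismatches, so the number of chance matches to a degenerate site is larger still.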

  33. Promoter Prediction: Pattern-driven algorithms • Solutions to the problem of too many FP predictions? • Take sequence context/biology into account • Eukaryotes: clusters of TFBSs are common • Prokaryotes: knowledge of σ factors helps • Probability of a "real" binding site increases if an annotated transcription start site (TSS) is nearby • But: What about enhancers? (no TSS nearby!) & only a small fraction of TSSs have been experimentally mapped • Do the wet lab experiments! • But: Promoter-bashing is tedious
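One way to take context into account in eukaryotes is to keep only the predicted sites that fall in clusters. This is a minimal Python sketch of a sliding-window density filter. The `window` and `min_sites` values are illustrative assumptions, not parameters from any particular program, and the hit positions are hypothetical.

```python
def clustered_hits(positions, window=100, min_sites=3):
    """Keep predicted TFBS positions that lie in a cluster.

    A hit is kept if at least `min_sites` hits (itself included) fall within
    some stretch spanning at most `window` bp that contains it.
    Returns sorted, de-duplicated positions.
    """
    pos = sorted(positions)
    keep = set()
    lo = 0
    for hi in range(len(pos)):
        # shrink the window from the left until it spans <= `window` bp
        while pos[hi] - pos[lo] > window:
            lo += 1
        if hi - lo + 1 >= min_sites:
            keep.update(pos[lo:hi + 1])
    return sorted(keep)

hits = [10, 50, 90, 500, 2000, 2030, 2060, 2090]  # hypothetical hit positions
print(clustered_hits(hits, window=100, min_sites=3))
```

The isolated hit at position 500 is discarded, while the two dense groups survive. This trades some sensitivity for far fewer isolated false positives.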

  34. Promoter Prediction: Sequence-driven algorithms • Assumption: common functionality can be deduced from sequence conservation • Alignments of co-regulated genes should highlight elements involved in regulation • Careful: How to determine co-regulation? • Orthologous genes from different species • Genes experimentally determined to be co-regulated (using microarrays??) • Comparative promoter prediction: "Phylogenetic footprinting" - more later…
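Here is a phylogenetic-footprinting-style sketch in Python. Given a pairwise alignment of orthologous promoter regions (e.g., human vs mouse), it reports the windows whose percent identity meets a cutoff. The window size, cutoff, and toy sequences are assumptions. Real footprinting tools use more careful alignment and statistics.

```python
def conserved_windows(aln_a, aln_b, window=20, min_identity=0.8):
    """Return (start, identity) for alignment windows at or above min_identity.

    aln_a, aln_b: two rows of a pairwise alignment (same length, '-' = gap).
    Gap columns count as mismatches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [a == b and a != "-" for a, b in zip(aln_a.upper(), aln_b.upper())]
    out = []
    for start in range(len(matches) - window + 1):
        ident = sum(matches[start:start + window]) / window
        if ident >= min_identity:
            out.append((start, ident))
    return out

print(conserved_windows("AAAAACCCCC", "AAAAAGGGGG", window=5, min_identity=0.8))
```

The approach only helps when conserved elements stand out above the background. If the whole region is highly conserved, every window passes and the footprint is uninformative (see the next slide).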

  35. Promoter Prediction: Sequence-driven algorithms • Problems: • Need sets of co-regulated genes • For comparative (phylogenetic) methods: • Must choose appropriate species • Different genomes evolve at different rates • Classical alignment methods have trouble with translocations and inversions in the order of functional elements • If the entire region is highly conserved (high background conservation), comparison is useless • Not enough data (Prokaryotes >>> Eukaryotes) • Biology is complex: many (most?) regulatory elements are not conserved across species!

  36. Examples of promoter prediction/characterization software • Lab: used MATCH, MatInspector, TRANSFAC, MEME & MAST, BLAST, etc. • Others? FirstEF, Dragon Promoter Finder (these are links in PPTs); also see Dragon Genome Explorer (has specialized promoter software for GC-rich DNA, finding CpG islands, etc.), JASPAR
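CpG islands are frequently used as promoter landmarks in GC-rich vertebrate DNA. This Python sketch flags windows that pass the classic Gardiner-Garden & Frommer style thresholds: 200 bp windows, GC fraction > 0.5, and observed/expected CpG > 0.6, where observed/expected = (#CG × length) / (#C × #G). It is a simple window scan under those assumed thresholds. It is not the method used by Dragon or any other specific tool.

```python
def cpg_stats(seq):
    """GC fraction and observed/expected CpG ratio for a sequence."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # CG occurrences cannot overlap, so str.count is exact
    gc = (c + g) / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc, obs_exp

def cpg_island_windows(seq, window=200, step=1, min_gc=0.5, min_oe=0.6):
    """Start positions of windows that exceed both thresholds."""
    hits = []
    for i in range(0, len(seq) - window + 1, step):
        gc, oe = cpg_stats(seq[i:i + window])
        if gc > min_gc and oe > min_oe:
            hits.append(i)
    return hits

print(cpg_stats("CG" * 100), cpg_island_windows("CG" * 100))
```

Overlapping windows that pass would normally be merged into one island. That step is left out to keep the sketch short.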

  37. TRANSFAC matrix entry for TATA box • Fields: • Accession & ID • Brief description • TFs associated with this entry • Weight matrix • Number of sites used to build (How many here?) • Other info • Fig 5.13 Baxevanis & Ouellette 2005
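A count matrix like the one in a TRANSFAC entry is typically turned into a position weight matrix of log-odds scores and then slid along a sequence. This Python sketch makes several assumptions. The counts below are a hypothetical, loosely TATA-like 4-position matrix, not the values of any real TRANSFAC entry. It also assumes a pseudocount of 0.5, a uniform background, and forward-strand scanning only.

```python
import math

# Hypothetical count matrix (rows = positions, counts of A, C, G, T);
# NOT the values of any real TRANSFAC entry.
COUNTS = [
    {"A": 1, "C": 0, "G": 0, "T": 9},
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
    {"A": 8, "C": 0, "G": 1, "T": 1},
]

def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position counts to log2-odds scores against a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append({b: math.log2((col[b] + pseudocount) / total / background)
                    for b in "ACGT"})
    return pwm

def scan(seq, pwm, threshold=0.0):
    """Return (position, score) for every forward-strand window scoring >= threshold."""
    w = len(pwm)
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - w + 1):
        window = seq[i:i + w]
        if any(b not in "ACGT" for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits

pwm = log_odds_matrix(COUNTS)
print(scan("GGTATAGG", pwm, threshold=4.0))
```

The number of sites behind the matrix matters: with few sites, the pseudocount dominates and the scores are less reliable. The threshold choice directly sets the FP rate discussed on the pattern-driven slides.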

  38. Global alignment of human & mouse obese gene promoters (200 bp upstream from TSS): Fig 5.14 Baxevanis & Ouellette 2005

  39. Check out optional review & try associated tutorial: • Wasserman WW & Sandelin A (2004) Applied bioinformatics for identification of regulatory elements. Nat Rev Genet 5:276-287 http://proxy.lib.iastate.edu:2103/nrg/journal/v5/n4/full/nrg1315_fs.html • Check this out: http://www.phylofoot.org/NRG_testcases/

  40. Annotated lists of promoter databases & promoter prediction software • URLs from Mount Chp 9, available online: Table 9.12 http://www.bioinformaticsonline.org/links/ch_09_t_2.html • Table in Wasserman & Sandelin Nat Rev Genet article http://proxy.lib.iastate.edu:2103/nrg/journal/v5/n4/full/nrg1315_fs.htm • URLs for Baxevanis & Ouellette, Chp 5: http://www.wiley.com/legacy/products/subject/life/bioinformatics/ch05.htm#links • More lists: • http://www.softberry.com/berry.phtml?topic=index&group=programs&subgroup=promoter • http://bioinformatics.ubc.ca/resources/links_directory/?subcategory_id=104 • http://www3.oup.co.uk/nar/database/subcat/1/4/

  41. Reading Assignment (for Mon) • Mount Bioinformatics • Chp 8 Prediction of RNA Secondary Structure • pp. 327-355 • Check errata: http://www.bioinformaticsonline.org/help/errata2.html • Cates (Online) RNA Secondary Structure Prediction Module • http://cnx.rice.edu/content/m11065/latest/
