10/26/05 Promoter Prediction (really!)
D Dobbs ISU - BCB 444/544X: Promoter Prediction (really!)
Announcements
• BCB Link for Seminar Schedules (updated)
• http://www.bcb.iastate.edu/seminars/index.html
• Seminar (Fri Oct 28)
• 12:10 PM BCB Faculty Seminar in E164 Lagomarcino
• Assembly and Alignment of Genomic DNA Sequence, Xiaoqiu Huang, ComS
• http://www.bcb.iastate.edu/courses/BCB691-F2005.html#Oct%2028
• Mark your calendars:
• 1:10 PM Nov 14 Baker Seminar in Howe Hall Auditorium
• "Discovering transcription factor binding sites"
• Douglas Brutlag, Dept of Biochemistry & Medicine
• Stanford University School of Medicine
Announcements
BCB 544 Projects - Important Dates:
Nov 2 Wed noon - Project proposals due to David/Drena
Nov 4 Fri 10A - Approvals/responses to students
Dec 2 Fri noon - Written project reports due
Dec 5,7,8,9 class/lab - Oral Presentations (20')
(Dec 15 Thurs = Final Exam)
Announcements
Lab 9 - due Wed noon (today)
Exam 2 - this Friday
Posted Online:
Exam 2 Study Guide
544 Reading Assignment (2 papers)
Lab Keys (today)
Thurs No Lab - Extra Office Hrs instead:
David 1-3 PM in 209 Atanasoff
Drena 1-3 PM in 106 MBB
Promoter Prediction; RNA Structure/Function Prediction
Mon: Quite a few more words re: Gene prediction
Wed: Promoter prediction
Next Mon: RNA structure & function
• RNA structure prediction
• 2° & 3° structure prediction
• miRNA & target prediction
Optional - but very helpful reading: (that's a hint!)
• Zhang MQ (2002) Computational prediction of eukaryotic protein-coding genes. Nat Rev Genet 3:698-709
• http://proxy.lib.iastate.edu:2103/nrg/journal/v3/n9/full/nrg890_fs.html
• Wasserman WW & Sandelin A (2004) Applied bioinformatics for identification of regulatory elements. Nat Rev Genet 5:276-287
• http://proxy.lib.iastate.edu:2103/nrg/journal/v5/n4/full/nrg1315_fs.html
Check this out: http://www.phylofoot.org/NRG_testcases/
Reading Assignment (for Mon)
• Mount Bioinformatics
• Chp 8 Prediction of RNA Secondary Structure
• pp. 327-355
• Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html
• Cates (Online) RNA Secondary Structure Prediction Module
• http://cnx.rice.edu/content/m11065/latest/
Review last lecture:
• Flowchart for Gene Prediction
• Performance Assessment Measures
• Correction re: slide 10/24 # 27
• Promoters
Gene prediction flowchart
Fig 5.15 Baxevanis & Ouellette 2005
Evaluation of Splice Site Prediction
What do measures really mean?
Fig 5.11 Baxevanis & Ouellette 2005
Correction re: last lecture: GeneSeqer Performance Graphs
Brendel et al (2004) Bioinformatics 20: 1157
Performance?
[Figure: four panels (Human GT site, Human AG site, A. thaliana GT site, A. thaliana AG site), each with an Sn axis]
• Note: these are not ROC curves (plots of Sn vs (1-Sp), with Sp = TN/(TN+FP))
• But plots such as these (& ROCs) are much better than using a "single number" to compare different methods
• Both types of plots illustrate the trade-off: Sn vs Sp
Brendel 2005
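To make the ROC idea concrete, here is a minimal Python sketch that turns a list of prediction scores into ROC points. It assumes the usual convention (Sn plotted against 1 - Sp, with Sp = TN/(TN+FP)); the function name `roc_points` and the example scores are hypothetical and are not taken from Brendel's figures.

```python
def roc_points(scores, labels):
    """Return (1 - Sp, Sn) pairs, one per score threshold, strictest threshold first."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical splice site scores; label 1 = real site, 0 = decoy.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
for false_positive_rate, sensitivity in roc_points(scores, labels):
    print(f"1-Sp = {false_positive_rate:.2f}   Sn = {sensitivity:.2f}")
```

Each threshold gives one point, so the whole curve shows the Sn vs Sp trade-off that a single number hides.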
Fig 2 - Brendel et al (2004) Bioinformatics 20: 1157
Bayes Factor as Decision Criterion
[Equations not recovered: 2-class model; 7-class model; hypotheses H0 and H=T]
Brendel 2005
Evaluation of Splice Site Prediction
                   Actual True    Actual False
Predicted True     TP             FP              PP = TP+FP
Predicted False    FN             TN              PN = FN+TN
                   AP = TP+FN     AN = FP+TN
• Sensitivity (= Coverage):
• Specificity:
• Misclassification rates:
• Normalized specificity:
Brendel 2005
Careful: different definitions for "Specificity"
                   Actual True    Actual False
Predicted True     TP             FP              PP = TP+FP
Predicted False    FN             TN              PN = FN+TN
                   AP = TP+FN     AN = FP+TN
Brendel definitions
• Sensitivity (= Coverage):
• Specificity:
cf. Guigó definitions
• Sn: Sensitivity = TP/(TP+FN)
• Sp: Specificity = TN/(TN+FP) = Sp-
• AC: Approximate Correlation = 0.5 x ((TP/(TP+FN)) + (TP/(TP+FP)) + (TN/(TN+FP)) + (TN/(TN+FN))) - 1
Other measures? Predictive Values, Correlation Coefficient
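The measures written out on this slide can be computed directly from the four confusion matrix counts. The Python sketch below follows the formulas as given above (Sn, Sp = TN/(TN+FP), and AC); the function name `performance` and the example counts are hypothetical. The two predictive values are included because they are the remaining terms of AC.

```python
def performance(tp, fp, fn, tn):
    """Compute the measures listed on the slide from confusion matrix counts."""
    sn = tp / (tp + fn)    # sensitivity: fraction of actual sites that were found
    sp = tn / (tn + fp)    # specificity as written on the slide: TN/(TN+FP)
    ppv = tp / (tp + fp)   # positive predictive value
    npv = tn / (tn + fn)   # negative predictive value
    ac = 0.5 * (sn + ppv + sp + npv) - 1   # AC as written on the slide
    return {"Sn": sn, "Sp": sp, "PPV": ppv, "NPV": npv, "AC": ac}


# Hypothetical counts: 100 real sites, 900 non-sites.
result = performance(tp=80, fp=20, fn=20, tn=880)
for name, value in result.items():
    print(f"{name} = {value:.3f}")
```

With these counts Sn is 0.80 while Sp is about 0.98, which illustrates how a large number of true negatives inflates TN/(TN+FP) even when one in five predictions is wrong.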
Best measures for comparing different methods?
• ROC curves (Receiver Operating Characteristic?!!)
• http://www.anaesthetist.com/mnm/stats/roc/
• "The Magnificent ROC" - has fun applets & quotes:
• "There is no statistical test, however intuitive and simple, which will not be abused by medical researchers"
• Correlation Coefficient
• Matthews correlation coefficient (MCC)
• MCC = 1 for a perfect prediction
• 0 for a completely random assignment
• -1 for a "perfectly incorrect" prediction
Do not memorize this!
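The MCC formula itself did not survive in this transcript. As a sketch, the standard definition is MCC = (TP x TN - FP x FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), which the Python function below implements. Returning 0 when the denominator is zero is a common convention assumed here, not something stated on the slide.

```python
import math


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


print(mcc(50, 0, 0, 50))    # perfect prediction
print(mcc(25, 25, 25, 25))  # no better than random
print(mcc(0, 50, 50, 0))    # "perfectly incorrect" prediction
```

The three calls reproduce the three reference values listed on the slide.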
Promoters
What signals are there? Simple ones in prokaryotes
Brown Fig 9.17, BIOS Scientific Publishers Ltd, 1999
Prokaryotic promoters
• RNA polymerase complex recognizes promoter sequences located very close to & on 5' side ("upstream") of initiation site
• RNA polymerase complex binds directly to these, with no requirement for "transcription factors"
• Prokaryotic promoter sequences are highly conserved
• -10 region
• -35 region
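Because the -35 and -10 regions are well conserved, a very simple scan already finds candidate prokaryotic promoters. The Python sketch below is an illustration only: the consensus hexamers TTGACA (-35) and TATAAT (-10) and the 15-19 bp spacer are the textbook values for E. coli sigma-70 promoters, assumed here because the slide does not list them, and `find_promoters` is a hypothetical helper name.

```python
def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq, box35="TTGACA", box10="TATAAT",
                   spacers=range(15, 20), max_mismatch=2):
    """Return (pos35, pos10, total_mismatches) for each candidate promoter.

    max_mismatch is applied to each box separately.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(box35) + 1):
        m35 = mismatches(seq[i:i + len(box35)], box35)
        if m35 > max_mismatch:
            continue
        for gap in spacers:
            j = i + len(box35) + gap
            site = seq[j:j + len(box10)]
            if len(site) < len(box10):
                break  # ran off the end; longer spacers cannot fit either
            m10 = mismatches(site, box10)
            if m10 <= max_mismatch:
                hits.append((i, j, m35 + m10))
    return hits


# Hypothetical test sequence: both consensus boxes separated by 17 bp.
seq = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GGCC"
print(find_promoters(seq))
```

Allowing mismatches is what makes such a scan useful on real sequence, and also what makes it produce false positives.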
What signals are there? Complex ones in eukaryotes!
Fig 9.13 Mount 2004
Simpler view of complex promoters in eukaryotes:
Fig 5.12 Baxevanis & Ouellette 2005
Eukaryotic genes are transcribed by 3 different RNA polymerases
These recognize different types of promoters & enhancers:
Brown Fig 9.18, BIOS Scientific Publishers Ltd, 1999
Eukaryotic promoters & enhancers
• Promoters located "relatively" close to initiation site (but can be located within gene, rather than upstream!)
• Enhancers also required for regulated transcription (these control expression in specific cell types, developmental stages, in response to environment)
• RNA polymerase complexes do not specifically recognize promoter sequences directly
• Transcription factors bind first and serve as "landmarks" for recognition by RNA polymerase complexes
Eukaryotic transcription factors
• Transcription factors (TFs) are DNA binding proteins that also interact with RNA polymerase complex to activate or repress transcription
• TFs contain characteristic "DNA binding motifs" http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.table.7039
• TFs recognize specific short DNA sequence motifs: "transcription factor binding sites"
• Several databases for these, e.g. TRANSFAC http://www.generegulation.com/cgibin/pub/databases/transfac
Zinc finger-containing transcription factors
• Common in eukaryotic proteins
• Estimated 1% of mammalian genes encode zinc-finger proteins
• In C. elegans, there are 500!
• Can be used as highly specific DNA binding modules
• Potentially valuable tools for directed genome modification (esp. in plants) & human gene therapy
Brown Fig 9.12, BIOS Scientific Publishers Ltd, 1999
New Today: Promoter Prediction
• Predicting regulatory regions (focus on promoters)
• Brief review of promoters & enhancers
• Predicting promoters: eukaryotes vs prokaryotes
Next week:
• RNA structure & function
Predicting Promoters
• Overview of strategies
• What sequence signals can be used?
• What other types of information can be used?
• Algorithms
• Promoter prediction software
• 3 major types
• many, many programs!
Promoter prediction: Eukaryotes vs prokaryotes
Promoter prediction is easier in microbial genomes
Why?
• Promoter sequences are highly conserved
• Simpler gene structures
• More sequenced genomes! (for comparative approaches)
Methods?
• Previously, again mostly HMM-based
• Now: similarity-based, comparative methods, because so many genomes are available
Predicting promoters: Steps & Strategies
• Closely related to gene prediction!
• Obtain genomic sequence
• Use sequence-similarity based comparison (BLAST, MSA) to find related genes
• But: "regulatory" regions are much less well-conserved than coding regions
• Locate ORFs
• Identify TSS (if possible!)
• Use promoter prediction programs
• Analyze motifs, etc. in sequence (TRANSFAC)
Predicting promoters: Steps & Strategies
Identify TSS - if possible?
• One of the biggest problems is determining the exact TSS! Not very many full-length cDNAs!
• Good starting point? (human & vertebrate genes) Use FirstEF, found within the UCSC Genome Browser, or submit to the FirstEF web server
Fig 5.10 Baxevanis & Ouellette 2005
Automated promoter prediction strategies
• Pattern-driven algorithms
• Sequence-driven algorithms
• Combined "evidence-based"
• BEST RESULTS? Combined, sequential
Promoter Prediction: Pattern-driven algorithms
• Success depends on availability of collections of annotated binding sites (TRANSFAC & PROMO)
• Tend to produce huge numbers of FPs
• Why?
• Binding sites (BS) for specific TFs are often variable
• Binding sites are short (typically 5-15 bp)
• Interactions between TFs (& other proteins) influence affinity & specificity of TF binding
• One binding site is often recognized by multiple TFs
• Biology is complex: promoters are often specific to organism/cell/stage/environmental condition
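A back-of-the-envelope calculation shows why short binding sites generate so many false positives. The Python sketch below estimates how many times an exact k-mer is expected to occur by chance, assuming (as a simplification not made on the slide) independent bases at equal frequencies of 0.25; `expected_chance_hits` is a hypothetical helper name.

```python
def expected_chance_hits(genome_length, site_length, both_strands=False):
    """Expected number of exact matches to one fixed site in random sequence.

    Assumes independent, equally frequent bases (p = 0.25 each).
    """
    start_positions = genome_length - site_length + 1
    expected = start_positions * 0.25 ** site_length
    return 2 * expected if both_strands else expected


# Sites at the short and long ends of the 5-15 bp range, in a 3 Gb genome.
for k in (5, 6, 10, 15):
    print(k, round(expected_chance_hits(3_000_000_000, k, both_strands=True), 1))
```

Under these assumptions a 6 bp site is expected about 1.5 million times on both strands of a 3 Gb genome, and tolerating variable positions in the site raises that number further.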
Promoter Prediction: Pattern-driven algorithms
Solutions to problem of too many FP predictions?
• Take sequence context/biology into account
• Eukaryotes: clusters of TFBSs are common
• Prokaryotes: knowledge of factors helps
• Probability of "real" binding site increases if annotated transcription start site (TSS) nearby
• But: What about enhancers? (no TSS nearby!) & Only a small fraction of TSSs have been experimentally mapped
• Do the wet lab experiments!
• But: Promoter-bashing is tedious
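One way to use the observation that eukaryotic TFBSs tend to cluster is to keep only predicted sites that have neighbors. The Python sketch below reports regions where at least `min_sites` predicted sites fall within `window` bp of each other; the function name, the 200 bp window, and the example positions are all hypothetical choices for illustration.

```python
def find_clusters(positions, window=200, min_sites=3):
    """Return (start, end) regions holding >= min_sites hits within `window` bp."""
    positions = sorted(positions)
    clusters = []
    start = 0
    for end in range(len(positions)):
        # Shrink the window from the left until it spans at most `window` bp.
        while positions[end] - positions[start] > window:
            start += 1
        if end - start + 1 >= min_sites:
            span = (positions[start], positions[end])
            if clusters and span[0] <= clusters[-1][1]:
                # Overlaps the previous cluster: extend it instead of adding a new one.
                clusters[-1] = (clusters[-1][0], span[1])
            else:
                clusters.append(span)
    return clusters


# Hypothetical predicted binding site positions (bp) along one sequence.
hits = [100, 150, 220, 1500, 3000, 3050, 3120, 3190]
print(find_clusters(hits))
```

The isolated hit at 1500 is dropped, while the two dense groups are reported as candidate regulatory regions.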
Promoter Prediction: Sequence-driven algorithms
• Assumption: common functionality can be deduced from sequence conservation
• Alignments of co-regulated genes should highlight elements involved in regulation
Careful: How to determine co-regulation?
• Orthologous genes from different species
• Genes experimentally determined to be co-regulated (using microarrays??)
• Comparative promoter prediction: "Phylogenetic footprinting" - more later…
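The conservation assumption can be illustrated with a sliding-window identity profile over an existing pairwise alignment of two orthologous promoter regions. This Python sketch is not a phylogenetic footprinting tool; the function names, the 10-column window, the 0.8 threshold, and the toy alignment are hypothetical.

```python
def window_identity(aln_a, aln_b, window=10):
    """Fraction of identical, non-gap columns in each window of a pairwise alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    matches = [int(a == b and a != "-")
               for a, b in zip(aln_a.upper(), aln_b.upper())]
    return [sum(matches[i:i + window]) / window
            for i in range(len(matches) - window + 1)]


def conserved_windows(aln_a, aln_b, window=10, threshold=0.8):
    """Start columns of windows whose identity is at or above the threshold."""
    profile = window_identity(aln_a, aln_b, window)
    return [i for i, fraction in enumerate(profile) if fraction >= threshold]


# Toy alignment: first 10 columns identical, last 10 columns all different.
species_1 = "ACGTACGTAAGGGGGGGGGG"
species_2 = "ACGTACGTAATTTTTTTTTT"
print(window_identity(species_1, species_2))
print(conserved_windows(species_1, species_2))
```

Windows that stand out against the background identity are candidates for regulatory elements; if every window scores high, the comparison says little.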
Promoter Prediction: Sequence-driven algorithms
Problems:
• Need sets of co-regulated genes
• For comparative (phylogenetic) methods:
• Must choose appropriate species
• Different genomes evolve at different rates
• Classical alignment methods have trouble with translocations, inversions in order of functional elements
• If the entire region is highly conserved (high background conservation), comparison is useless
• Not enough data (Prokaryotes >>> Eukaryotes)
• Biology is complex: many (most?) regulatory elements are not conserved across species!
Examples of promoter prediction/characterization software
Lab: used
• MATCH, MatInspector
• TRANSFAC
• MEME & MAST
• BLAST, etc.
Others?
• FIRST EF
• Dragon Promoter Finder (these are links in PPTs)
• also see Dragon Genome Explorer (has specialized promoter software for GC-rich DNA, finding CpG islands, etc)
• JASPAR
TRANSFAC matrix entry: for TATA box
• Fields:
• Accession & ID
• Brief description
• TFs associated with this entry
• Weight matrix
• Number of sites used to build (How many here?)
• Other info
Fig 5.13 Baxevanis & Ouellette 2005
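To show how a weight matrix like the one in such an entry is used, here is a minimal Python sketch that converts a count matrix into log-odds scores and scans a sequence with it. The counts below are invented for illustration (20 hypothetical sites, consensus TATAAA) and are not the TRANSFAC TATA box entry; the pseudocount of 1 and the uniform background of 0.25 are also assumptions.

```python
import math

# Hypothetical count matrix: counts[base][i] = times base was seen at position i.
COUNTS = {
    "A": [2, 18, 1, 19, 14, 17],
    "C": [1, 0, 1, 0, 0, 1],
    "G": [1, 1, 0, 0, 0, 1],
    "T": [16, 1, 18, 1, 6, 1],
}


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2(observed frequency / background) per position."""
    matrix = []
    for i in range(len(counts["A"])):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        matrix.append({b: math.log2((counts[b][i] + pseudocount) / total / background)
                       for b in "ACGT"})
    return matrix


def scan(seq, matrix):
    """Score every window of the sequence; windows with non-ACGT bases are skipped."""
    seq = seq.upper()
    width = len(matrix)
    results = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if set(window) <= set("ACGT"):
            results.append((i, sum(matrix[k][base] for k, base in enumerate(window))))
    return results


matrix = log_odds_matrix(COUNTS)
position, score = max(scan("GGCGCTATAAAGGCGC", matrix), key=lambda hit: hit[1])
print(position, round(score, 2))
```

In practice a score cutoff decides which windows are reported, and that cutoff is exactly where the Sn vs Sp trade-off appears.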
Global alignment of human & mouse obese gene promoters (200 bp upstream from TSS)
Fig 5.14 Baxevanis & Ouellette 2005
Check out optional review & try associated tutorial:
• Wasserman WW & Sandelin A (2004) Applied bioinformatics for identification of regulatory elements. Nat Rev Genet 5:276-287
• http://proxy.lib.iastate.edu:2103/nrg/journal/v5/n4/full/nrg1315_fs.html
Check this out: http://www.phylofoot.org/NRG_testcases/
Annotated lists of promoter databases & promoter prediction software
• URLs from Mount Chp 9, available online: Table 9.12 http://www.bioinformaticsonline.org/links/ch_09_t_2.html
• Table in Wasserman & Sandelin Nat Rev Genet article http://proxy.lib.iastate.edu:2103/nrg/journal/v5/n4/full/nrg1315_fs.htm
• URLs for Baxevanis & Ouellette, Chp 5: http://www.wiley.com/legacy/products/subject/life/bioinformatics/ch05.htm#links
More lists:
• http://www.softberry.com/berry.phtml?topic=index&group=programs&subgroup=promoter
• http://bioinformatics.ubc.ca/resources/links_directory/?subcategory_id=104
• http://www3.oup.co.uk/nar/database/subcat/1/4/