10/24/05 Promoter Prediction RNA Structure & Function Prediction


Presentation Transcript


  1. 10/24/05 Promoter Prediction; RNA Structure & Function Prediction. D Dobbs ISU - BCB 444/544X: Promoter Prediction

  2. Announcements. Seminar (Mon Oct 24) (several additional seminars listed in email sent to class): 12:10 PM IG Faculty Seminar in 101 Ind Ed II, "Laser capture microdissection-facilitated transcriptional profiling of abscission zones in Arabidopsis", Coralie Lashbrook, EEOB http://www.bb.iastate.edu/%7Emarit/GEN691.html Mark your calendars: 1:10 PM Nov 14 Baker Seminar in Howe Hall Auditorium, "Discovering transcription factor binding sites", Douglas Brutlag, Dept of Biochemistry & Medicine, Stanford University School of Medicine

  3. Announcements • 544 Semester Projects • Thanks to all who sent already! • Others: Information needed today! • ddobbs@iastate.edu • Briefly describe: • Your background & current grad research • Is there a problem related to your research you would like to learn more about & develop as project for this course? • or • What would your ‘dream’ project be?

  4. Announcements. Exam 2 - this Friday. Posted Online: Exam 2 Study Guide; 544 Reading Assignment (2 papers). Office Hours: David Mon 1-2 PM in 209 Atanasoff; Drena Tues 10-11 AM in 106 MBB; Michael - none this week. Thurs No Lab - Extra Office Hrs instead: David 1-3 PM in 209 Atanasoff; Drena 1-3 PM in 106 MBB

  5. Announcements • Updated PPTs & PDFs for Gene Prediction lectures (covered on Exam 2) will be posted today (changes are minor) • Is everyone on BCB 444/544 mailing list? Auditors?

  6. Promoter Prediction & RNA Structure/Function Prediction. Mon: Quite a few more words re: Gene prediction; Promoter prediction. Wed: RNA structure & function; RNA structure prediction; 2° & 3° structure prediction; miRNA & target prediction. Thurs: No Lab. Fri: Exam 2

  7. Reading Assignment - previous • Mount Bioinformatics • Chp 9 Gene Prediction & Regulation • pp 361-401 • Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html • * Brown Genomes 2 (NCBI textbooks online) • Sect 9 Overview: Assembly of Transcription Initiation Complex • http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.chapter.7002 • Sect 9.1-9.3 DNA binding proteins, Transcription initiation • http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.section.7016 • *NOTEs: Don’t worry about the details!! • See Study Guide for Exam 2 re: Sections covered

  8. Optional - but very helpful reading: (that's a hint!) • Zhang MQ (2002) Computational prediction of eukaryotic protein-coding genes. Nat Rev Genet 3:698-709 http://proxy.lib.iastate.edu:2103/nrg/journal/v3/n9/full/nrg890_fs.html • Wasserman WW & Sandelin A (2004) Applied bioinformatics for identification of regulatory elements. Nat Rev Genet 5:276-287 http://proxy.lib.iastate.edu:2103/nrg/journal/v5/n4/full/nrg1315_fs.html Check this out: http://www.phylofoot.org/NRG_testcases/

  9. Reading Assignment (for Wed) • Mount Bioinformatics • Chp 8 Prediction of RNA Secondary Structure • pp. 327-355 • Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html • Cates (Online) RNA Secondary Structure Prediction Module • http://cnx.rice.edu/content/m11065/latest/

  10. Review last lecture: Gene Prediction (formerly Gene Prediction - 3) • Overview of steps & strategies • Algorithms • Gene prediction software

  11. Predicting Genes - Basic steps: • Obtain genomic DNA sequence • Translate in all 6 reading frames • Compare with protein sequence database • Also perform database similarity search with EST & cDNA databases, if available • Use gene prediction programs to locate genes • Analyze gene regulatory sequences • Note: Several important details missing above: 1. Mask to "remove" repetitive elements (ALUs, etc.) 2. Perform database search on translated DNA (BlastX, TFasta) 3. Use several programs to predict genes (GenScan, GeneMark.hmm) 4. Translate putative ORFs and search for functional motifs (Blocks, Motifs, etc.) & regulatory sequences
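
The "translate in all 6 reading frames" step can be sketched in a few lines of Python. This is only an illustration, assuming the standard genetic code and plain A/C/G/T input; the function names (`reverse_complement`, `translate`, `six_frame_translate`) are hypothetical and do not come from the lecture or from any particular tool.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: nuclear code).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translate(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translate("ATGGCCTAA").items():
        print(label, protein)
```

Each of the six protein strings could then be compared against a protein database, which is the idea behind translated searches such as BlastX.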

  12. Gene prediction flowchart Fig 5.15 Baxevanis & Ouellette 2005

  13. Overview of gene prediction strategies • What sequence signals can be used? • Transcription: TF binding sites, promoter, initiation site, terminator • Processing signals: splice donor/acceptors, polyA signal • Translation: start (AUG = Met) & stop (UGA, UAA, UAG) • ORFs, codon usage • What other types of information can be used? • cDNAs & ESTs (pairwise alignment) • homology (sequence comparison, BLAST)
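
As a rough illustration of how the translation signals (a start codon followed in frame by a stop codon) define an ORF, here is a minimal Python sketch. It scans only the forward strand, works on DNA (so ATG, TAA, TAG, TGA rather than the RNA codons), and ignores introns entirely; the function name `find_orfs` and the `min_codons` parameter are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) pairs for forward-strand ORFs.

    start is the index of the A in ATG; end is one past the stop codon.
    min_codons counts the codons before the stop, including ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAACCCTAGGG"))
```

Real gene finders combine this kind of signal with codon usage statistics and splice-site models, since eukaryotic ORFs are interrupted by introns.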

  14. Examples of gene prediction software • Similarity-based or Comparative • BLAST • SGP2 (extension of GeneID) • Ab initio = "from the beginning" • GeneID - (used in lab last week) • GENSCAN - (used in lab last week) • GeneMark.hmm - (should try this!) • Combined "evidence-based" • GeneSeqer (Brendel et al., ISU) BEST? GENSCAN, GeneMark.hmm, GeneSeqer, but depends on organism & specific task

  15. Annotated lists of gene prediction software • URLs from Mount Chp 9, available online: Table 9.1 http://www.bioinformaticsonline.org/links/ch_09_t_1.html • from Pevsner Chps 14 & 16: http://www.bioinfbook.org/chapt14.htm - prokaryotic; http://www.bioinfbook.org/chapt16.htm - eukaryotic • Table in Zhang Nat Rev Genet article: http://proxy.lib.iastate.edu:2103/nrg/journal/v3/n9/full/nrg890_fs.html • Another list: Kozar, Stanford http://cmgm.stanford.edu/classes/genefind/ • Performance Evaluation? Guigó, Barcelona (& sites above) http://www1.imim.es/courses/SeqAnalysis/GeneIdentification/Evaluation.html

  16. Gene prediction: Eukaryotes vs prokaryotes. Gene prediction is easier in microbial genomes. Methods? Previously, mostly HMM-based. Now: similarity-based methods, because so many genomes are available; see Mount Fig 9.7 (E. coli gene). Many microbial genomes have been fully sequenced & whole-genome "gene structure" and "gene function" annotations are available, e.g., GeneMark.hmm, TIGR Comprehensive Microbial Resource (CMR), NCBI Microbial Genomes

  17. UCSC Browser view of 1000 kb region (Human URO-D gene) Fig 5.10 Baxevanis & Ouellette 2005

  18. GeneSeqer - Brendel et al. Spliced Alignment Algorithm http://deepc2.psi.iastate.edu/cgi-bin/gs.cgi (figure: intron bounded by a GT donor splice site and an AG acceptor splice site) • Perform pairwise alignment with large gaps in one sequence (due to introns) • Align genomic DNA with cDNA, ESTs, protein sequences • Score semi-conserved sequences at splice junctions, using a Bayesian model • Score coding constraints in translated exons, using a Bayesian model. Brendel et al (2004) Bioinformatics 20: 1157; Brendel 2005
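
To make the GT donor / AG acceptor picture concrete, the sketch below enumerates every candidate intron bounded by those dinucleotides. It is not GeneSeqer's algorithm (which scores sites with Bayesian models inside a spliced alignment); it is only a naive enumeration, and the name `candidate_introns` and the length limits are assumptions chosen for the example.

```python
def candidate_introns(dna, min_len=4, max_len=10000):
    """Return (start, end) pairs where dna[start:end] begins with GT and ends with AG."""
    dna = dna.upper()
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    acceptors = [j for j in range(len(dna) - 1) if dna[j:j + 2] == "AG"]
    introns = []
    for i in donors:
        for j in acceptors:
            end = j + 2  # one past the G of the acceptor AG
            if min_len <= end - i <= max_len:
                introns.append((i, end))
    return introns


if __name__ == "__main__":
    seq = "AAGTAAAAAGCC"
    for start, end in candidate_introns(seq):
        print(start, end, seq[start:end])
```

Because GT and AG occur by chance very often, almost all candidates from such a scan are false sites, which is why the surrounding sequence context has to be scored as well.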

  19. Brendel - Spliced Alignment I: Compare with cDNA or EST probes (figure: genomic DNA with start codon and stop codon, aligned to an mRNA with Cap, 5’-UTR, start codon, stop codon, 3’-UTR, and Poly(A)) Brendel 2005

  20. Brendel - Spliced Alignment II: Compare with protein probes (figure: genomic DNA with start codon and stop codon, aligned to a protein) Brendel 2005

  21. Splice Site Detection • Extent of Splice Signal Window: Do DNA sequences surrounding splice "consensus" sequences contribute to splicing signal? YES. Information Content I_i: i: ith position in sequence; Ī: avg information content over all positions >20 nt from splice site; σ_Ī: avg sample standard deviation of Ī. Brendel 2005
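
The slide does not reproduce the formula for I_i, so the sketch below uses the common Shannon-based definition for DNA, I_i = 2 + Σ_b f_ib log2 f_ib, as an assumption. It assumes a uniform background, applies no small-sample correction, and the function name `information_content` is hypothetical.

```python
import math
from collections import Counter


def information_content(aligned_sites):
    """Return per-position information content (bits) for equal-length DNA sites."""
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all aligned sites must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(site[i].upper() for site in aligned_sites)
        total = sum(counts.values())
        # Shannon entropy of the observed base frequencies in column i
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        profile.append(2.0 - entropy)  # 2 bits is the maximum for a 4-letter alphabet
    return profile


if __name__ == "__main__":
    print(information_content(["GT", "GT", "GT", "GA"]))
```

A fully conserved column scores 2 bits and a column with equal base frequencies scores 0, so positions near a splice site that stay well above the background average carry signal.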

  22. Information content vs position (figure panels: Human T2_GT, Human T2_AG) Which sequences are exons & which are introns? How can you tell? Brendel et al (2004) Bioinformatics 20: 1157; Brendel 2005

  23. Bayesian Splice Site Prediction. Let $S = s_{-l}\, s_{-l+1}\, s_{-l+2} \ldots s_{-1}\, \mathrm{GT}\, s_1\, s_2\, s_3 \ldots s_r$, where H indexes the hypotheses of GT or AG at - True site in reading phase 1, 2, or 0 - False within-exon site in reading phase 1, 2, or 0 - False within-intron site. Brendel et al (2004) Bioinformatics 20: 1157; Brendel 2005

  24. Bayes Factor as Decision Criterion. H0: H = T. 2-class model; 7-class model (equations shown as images on the slide). Brendel et al (2004) Bioinformatics 20: 1157; Brendel 2005
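
The slide equations did not survive in this transcript. As a reminder of the general idea only, the generic textbook forms of Bayes' rule over a set of hypotheses and of a two-class Bayes factor are given below. These are assumptions written in the notation of the preceding slide (S for the sequence window, H for the hypothesis, T and F for true and false site); they are not transcribed from Brendel et al.

```latex
% Posterior probability of hypothesis h given the sequence window S
P(H = h \mid S) = \frac{P(S \mid H = h)\, P(H = h)}{\sum_{h'} P(S \mid H = h')\, P(H = h')}

% Two-class Bayes factor comparing "true site" (T) against "false site" (F)
BF = \frac{P(S \mid H = T)}{P(S \mid H = F)}
```

In the two-class case a site would be called true when BF exceeds a chosen threshold; with seven classes the sum in the denominator runs over all seven hypotheses listed on the previous slide.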

  25. PG PG (1-PG)(1-PD(n+1)) en en+1 (1-PG)PD(n+1) PA(n)PG (1-PG)PD(n+1) in in+1 1-PA(n) Markov Model for Spliced Alignment Brendel 2005 D Dobbs ISU - BCB 444/544X: Promoter Prediction

  26. Evaluation of Splice Site Prediction
                          Actual True     Actual False
       Predicted True     TP              FP              PP = TP + FP
       Predicted False    FN              TN              PN = FN + TN
                          AP = TP + FN    AN = FP + TN
     Measures defined on this table: • Sensitivity (= Coverage) • Specificity • Misclassification rates • Normalized specificity. Brendel 2005

  27. Performance? (figure panels: Human GT site, Human AG site, A. thaliana GT site, A. thaliana AG site; axis label Sn) • Note: these are not ROC curves (plots of Sn vs (1-Sp), with Sp = TN/(TN+FP)) • But plots such as these (& ROCs) are much better than using a "single number" to compare different methods • Both types of plots illustrate the trade-off: Sn vs Sp. Brendel 2005
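
For comparison with the plots on this slide, a conventional ROC curve plots the true positive rate (Sn) against the false positive rate (1 - Sp, with Sp = TN/(TN+FP)) as the score threshold is lowered. The sketch below computes those points from scored predictions; the function name `roc_points` and the toy scores are assumptions for illustration.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    scores: prediction scores, higher means "more likely a true site".
    labels: 1 for an actual true site, 0 for an actual false site.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


if __name__ == "__main__":
    for fpr, tpr in roc_points([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]):
        print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Each point corresponds to one threshold, which is why a whole curve says more about a method than a single Sn or Sp value.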

  28. Evaluation of Splice Site Prediction: What do measures really mean? Fig 5.11 Baxevanis & Ouellette 2005

  29. Careful: different definitions for "Specificity"
                          Actual True     Actual False
       Predicted True     TP              FP              PP = TP + FP
       Predicted False    FN              TN              PN = FN + TN
                          AP = TP + FN    AN = FP + TN
     Brendel definitions: • Sensitivity (= Coverage) • Specificity. cf. Guigó definitions: Sn: Sensitivity = TP/(TP+FN); Sp: Specificity = TN/(TN+FP) = Sp-; AC: Approximate Correlation = 0.5 x ((TP/(TP+FN)) + (TP/(TP+FP)) + (TN/(TN+FP)) + (TN/(TN+FN))) - 1. Other measures? Predictive Values, Correlation Coefficient
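
The measures written out on this slide can be computed directly from the four cells of the table. The sketch below implements only the formulas that appear in the slide text (Sn, Sp = TN/(TN+FP), and AC) plus the two predictive values that AC is built from; the function name `evaluation_measures` and the dictionary keys are assumptions, and no zero-division handling is included.

```python
def evaluation_measures(tp, fp, fn, tn):
    """Compute Sn, Sp, predictive values, and AC from confusion-matrix counts."""
    sn = tp / (tp + fn)    # sensitivity: fraction of actual true sites found
    sp = tn / (tn + fp)    # specificity in the TN/(TN+FP) sense
    ppv = tp / (tp + fp)   # positive predictive value
    npv = tn / (tn + fn)   # negative predictive value
    ac = 0.5 * (sn + ppv + sp + npv) - 1  # AC as written on the slide
    return {"Sn": sn, "Sp": sp, "PPV": ppv, "NPV": npv, "AC": ac}


if __name__ == "__main__":
    for name, value in evaluation_measures(tp=8, fp=2, fn=2, tn=88).items():
        print(f"{name}: {value:.4f}")
```

Note that TP/(TP+FP), labeled PPV here, is itself called "specificity" in part of the gene prediction literature, which is the ambiguity this slide warns about.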

  30. Best measures for comparing different methods? • ROC curves (Receiver Operating Characteristic?!!) • http://www.anaesthetist.com/mnm/stats/roc/ • "The Magnificent ROC" - has fun applets & quotes: • "There is no statistical test, however intuitive and simple, which will not be abused by medical researchers" • Correlation Coefficient • Matthews correlation coefficient (MCC) • MCC = 1 for a perfect prediction • 0 for a completely random assignment • -1 for a "perfectly incorrect" prediction. Do not memorize this!
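
The slide gives the interpretation of MCC but not its formula. The standard definition is MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); the short sketch below applies it and checks the three reference values listed above. Returning 0 when the denominator is zero is a common convention adopted here as an assumption, and the function name is hypothetical.

```python
import math


def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    print(matthews_corrcoef(5, 0, 0, 5))   # perfect prediction
    print(matthews_corrcoef(5, 5, 5, 5))   # no better than random
    print(matthews_corrcoef(0, 5, 5, 0))   # "perfectly incorrect"
```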

  31. Performance of GeneSeqer vs other methods? • Comparison with ab initio gene prediction (e.g., GENSCAN) • Depends on: • Availability of ESTs • Availability of protein homologs. Other Performance Evaluations? Guigó http://www1.imim.es/courses/SeqAnalysis/GeneIdentification/Evaluation.html Brendel 2005

  32. GeneSeqer vs GENSCAN (Exon prediction) (plot: Exon (Sn + Sp) / 2, from 0.00 to 1.00, vs Target protein alignment score, from 0 to 100; curves for GeneSeqer, NAP, and GENSCAN) GENSCAN - Burge, MIT. Brendel 2005

  33. GeneSeqer vs GENSCAN (Intron prediction) (plot: Intron (Sn + Sp) / 2, from 0.00 to 1.00, vs Target protein alignment score, from 0 to 100; curves for GeneSeqer, NAP, and GENSCAN) GENSCAN - Burge, MIT. Brendel 2005

  34. Other Resources • Current Protocols in Bioinformatics • http://www.4ulr.com/products/currentprotocols/bioinformatics.html • Finding Genes • 4.1 An Overview of Gene Identification: Approaches, Strategies, and Considerations • 4.2 Using MZEF To Find Internal Coding Exons • 4.3 Using GENEID to Identify Genes • 4.4 Using GlimmerM to Find Genes in Eukaryotic Genomes • 4.5 Prokaryotic Gene Prediction Using GeneMark and GeneMark.hmm • 4.6 Eukaryotic Gene Prediction Using GeneMark.hmm • 4.7 Application of FirstEF to Find Promoters and First Exons in the Human Genome • 4.8 Using TWINSCAN to Predict Gene Structures in Genomic DNA Sequences • 4.9 GrailEXP and Genome Analysis Pipeline for Genome Annotation • 4.10 Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences

  35. New Today: Promoter Prediction • A few more words about Gene prediction • Predicting regulatory regions (focus on promoters) • Brief review promoters & enhancers • Predicting in eukaryotes vs prokaryotes • Introduction to RNA • Structure & function

  36. Predicting Promoters: What signals are there? Algorithms. Promoter prediction software

  37. What signals are there? Simple ones in prokaryotes. Brown Fig 9.17, BIOS Scientific Publishers Ltd, 1999

  38. Prokaryotic promoters • RNA polymerase complex recognizes promoter sequences located very close to & on 5’ side (“upstream”) of initiation site • RNA polymerase complex binds directly to these, with no requirement for “transcription factors” • Prokaryotic promoter sequences are highly conserved • -10 region • -35 region
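
Because the -35 and -10 regions are so well conserved, even a naive pattern scan illustrates the idea of prokaryotic promoter prediction. The sketch below assumes the commonly cited E. coli sigma-70 consensus hexamers (TTGACA at -35, TATAAT at -10) and a 16-18 bp spacer; those values are not given on the slide, and the function names and mismatch threshold are hypothetical.

```python
MINUS_35 = "TTGACA"  # assumed sigma-70 consensus, not taken from the slide
MINUS_10 = "TATAAT"  # assumed sigma-70 consensus, not taken from the slide


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoter_like_sites(dna, max_mismatches=1, spacer_range=(16, 18)):
    """Return (pos_minus35, pos_minus10, total_mismatches) for each candidate."""
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - 5):
        mm35 = mismatches(dna[i:i + 6], MINUS_35)
        if mm35 > max_mismatches:
            continue
        for spacer in range(spacer_range[0], spacer_range[1] + 1):
            j = i + 6 + spacer  # start of the candidate -10 hexamer
            if j + 6 > len(dna):
                break
            mm10 = mismatches(dna[j:j + 6], MINUS_10)
            if mm10 <= max_mismatches:
                hits.append((i, j, mm35 + mm10))
    return hits


if __name__ == "__main__":
    seq = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG"
    print(find_promoter_like_sites(seq, max_mismatches=0))
```

Real promoters rarely match both hexamers exactly, so practical tools replace the mismatch count with weight matrices or HMMs.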

  39. What signals are there? Complex ones in eukaryotes! Fig 9.13 Mount 2004

  40. Simpler view of complex promoters in eukaryotes: Fig 5.12 Baxevanis & Ouellette 2005

  41. Eukaryotic genes are transcribed by 3 different RNA polymerases. Recognize different types of promoters & enhancers: Brown Fig 9.18, BIOS Scientific Publishers Ltd, 1999

  42. Eukaryotic promoters & enhancers • Promoters located “relatively” close to initiation site (but can be located within gene, rather than upstream!) • Enhancers also required for regulated transcription (these control expression in specific cell types, developmental stages, in response to environment) • RNA polymerase complexes do not specifically recognize promoter sequences directly • Transcription factors bind first and serve as “landmarks” for recognition by RNA polymerase complexes

  43. Eukaryotic transcription factors • Transcription factors (TFs) are DNA binding proteins that also interact with RNA polymerase complex to activate or repress transcription • TFs contain characteristic “DNA binding motifs” http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.table.7039 • TFs recognize specific short DNA sequence motifs, “transcription factor binding sites” • Several databases for these, e.g., TRANSFAC http://www.generegulation.com/cgibin/pub/databases/transfac
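
Short binding-site motifs like these are usually represented as position weight matrices (PWMs) rather than exact strings. The sketch below builds a log-odds PWM from a handful of aligned sites and scans a sequence with it. The example sites, the pseudocount of 1, the uniform background of 0.25, and the function names are all assumptions for illustration; the code expects only A/C/G/T.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Return a list of {base: log2-odds score} dicts, one per motif position."""
    length = len(sites[0])
    denominator = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        column = [site[i].upper() for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / denominator) / background)
            for base in "ACGT"
        })
    return pwm


def scan(dna, pwm):
    """Return (position, score) for every window of the motif width."""
    dna = dna.upper()
    width = len(pwm)
    return [
        (pos, sum(pwm[k][dna[pos + k]] for k in range(width)))
        for pos in range(len(dna) - width + 1)
    ]


if __name__ == "__main__":
    # Made-up TATA-like sites, used only to exercise the functions.
    pwm = build_pwm(["TATAAA", "TATAAT", "TATATA", "TATAAA"])
    best = max(scan("GGGTATAAAGGG", pwm), key=lambda hit: hit[1])
    print(best)
```

Windows scoring above a chosen cutoff are reported as putative binding sites; because the motifs are short, such scans produce many false positives unless combined with other evidence such as cross-species conservation.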

  44. Zinc finger-containing transcription factors • Common in eukaryotic proteins • Estimated 1% of mammalian genes encode zinc-finger proteins • In C. elegans, there are 500! • Can be used as highly specific DNA binding modules • Potentially valuable tools for directed genome modification (esp. in plants) & human gene therapy. Brown Fig 9.12, BIOS Scientific Publishers Ltd, 1999

  45. Global alignment of human & mouse obese gene promoters (200 bp upstream from TSS) Fig 5.14 Baxevanis & Ouellette 2005

  46. Reading Assignment (for Wed) • Mount Bioinformatics • Chp 8 Prediction of RNA Secondary Structure • pp. 327-355 • Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html • Cates (Online) RNA Secondary Structure Prediction Module • http://cnx.rice.edu/content/m11065/latest/