Outline for today


ayame

Presentation Transcript


  1. Outline for today (Lec 06, Slide 132)
     • Gene prediction: What are genes? Where are genes? Why do we care about a definition?
     • Prokaryotic vs. eukaryotic gene models
     • Introns/exons
     • Splicing
     • Alternative splicing
     • Genes-in-genes, genes-ad-genes
     • Multi-subunit proteins
     • Gene identification
     • Homology-based gene prediction
     • Similarity searches (e.g. BLAST, BLAT)
     • Genome browsers
     • RNA evidence (ESTs)
     • Ab initio gene prediction
     • Gene prediction programs: prokaryotes, eukaryotes
     • Promoter prediction
     • PolyA-signal prediction
     • Splice site, start/stop-codon predictions

  2. Alternative splicing (Lec 06, Slide 133)
     • Alternative splicing can be either constitutive or regulated.
     • Constitutive alternative splicing: more than one product is always made from a pre-mRNA.
     • Regulated alternative splicing: different forms of mRNA are produced at different times, under different conditions, or in different cell or tissue types.
     • Alternative splicing is regulated by activators and repressors.
     • The regulating sequences are exonic or intronic splicing enhancers (ESE and ISE) or silencers (ESS and ISS). Enhancers promote splicing and silencers repress it.
     • Proteins that regulate splicing bind to these specific sites to act.
     Source: Mo Chen & James L. Manley (2009), Nature Reviews Molecular Cell Biology 10, 741-754.

  3. Alternative splicing (Lec 06, Slide 134)
     • Alternative splicing can generate tens of thousands of mRNAs from a single primary transcript. It generates segments of mRNA variability that can insert or remove amino acids, shift the reading frame, or introduce a termination codon.
     • The typical human gene contains an average of 8 exons.
     • Up to 59% of human genes generate multiple mRNAs by alternative splicing, and ~80% of alternative splicing results in changes in the encoded protein.
     • A large fraction of alternative splicing undergoes cell-specific regulation, in which splicing pathways are modulated according to cell type, developmental stage, gender, or in response to external stimuli.
     [Figure: pre-mRNA with exons 1 2 3 4 5; heart muscle mRNA with exons 1 2 3 5; uterine muscle mRNA with exons 1 3 4 5]
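The exon choices in the figure on this slide can be sketched in a few lines of Python. This is only an illustration: the exon sequences are invented placeholders, the `splice` helper is hypothetical, and the assignment of the two exon patterns to heart and uterine muscle follows one reading of the slide's figure labels.

```python
# Placeholder exon sequences, invented for this sketch (not real gene data).
EXONS = {1: "AUGGCU", 2: "GAAUUC", 3: "CCGGAU", 4: "UUGACA", 5: "GGCUAA"}


def splice(exon_order, exons=EXONS):
    """Join the selected exons, in the given order, into one mature mRNA."""
    return "".join(exons[n] for n in exon_order)


# Exon patterns as read from the slide's figure (label assignment assumed).
PRE_MRNA = [1, 2, 3, 4, 5]
HEART_ISOFORM = [1, 2, 3, 5]    # exon 4 skipped
UTERINE_ISOFORM = [1, 3, 4, 5]  # exon 2 skipped

heart_mrna = splice(HEART_ISOFORM)
uterine_mrna = splice(UTERINE_ISOFORM)

if __name__ == "__main__":
    print("heart  :", heart_mrna)
    print("uterine:", uterine_mrna)
```

Both products come from the same five-exon transcript, yet they differ in sequence, which is the sense in which one gene yields more than one mRNA.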

  4. Alternative splicing (Lec 06, Slide 135)
     Alternative splicing is the process where one gene produces more than one type of mRNA.
     [Figure: DNA to mRNA, influenced by environment and development; mRNA isoform proportions: cell type 1, 80% and 20%; cell type 2, 10% and 90%; cell type 3, absent and 100%]
     • The phenotype is determined by the proteome and transcriptome.
     • Selection acts on the phenotype and is blind to the genotype.
     • Therefore, two species or individuals that have different forms of a protein will be selected differently, even if the gene's DNA sequence is identical.

  5. Alternative splicing (Lec 06, Slide 136)
     • Alternative splicing can generate mRNAs encoding proteins with different, even opposite, functions.
     [Figure: alternative splicing of the Fas apoptosis receptor. Fas pre-mRNA with exons 5, 6, 7 (introns 1 and 2 between them): the 5-6-7 mRNA encodes membrane-associated Fas, which binds Fas ligand, (+) apoptosis; the 5-7 mRNA encodes soluble Fas, (-) apoptosis]
     • Therefore, understanding the mechanism of RNA splicing in normal cells, and how it is regulated in different tissues and at different stages of an organism's development, is essential for developing strategies to correct aberrant splicing in human pathologies.

  6. Pathologies resulting from aberrant splicing can be grouped in two major categories (Lec 06, Slide 137)
     • Mutations affecting a specific messenger RNA and disturbing its normal splicing pattern. Examples:
         • β-Thalassemia
         • Duchenne Muscular Dystrophy
         • Cystic Fibrosis
         • Frasier Syndrome
         • Frontotemporal Dementia and Parkinsonism
     • Mutations affecting proteins that are involved in splicing. Examples:
         • Spinal Muscular Atrophy
         • Retinitis Pigmentosa
         • Myotonic Dystrophy

  7. Splice variant detection (Lec 06, Slide 138)
     [Figure: splice variants built from exons 1, 2A, 2B and 3]
     • PCR method: simple and sensitive, and accurate enough when a standard curve is used; however, only internal changes are detectable and the method cannot be scaled up.
     • Capture probe: very sensitive and accurate, but probe design is complicated and the method is expensive.
     • Microarray method: can be scaled up to an entire genome (high throughput), so any type of splice variant is detectable, but it is not very accurate and is complex and expensive.

  8. Outline for today (Lec 06, Slide 139)
     • Gene prediction: What are genes? Where are genes? Why do we care about a definition?
     • Prokaryotic vs. eukaryotic gene models
     • Introns/exons
     • Splicing
     • Alternative splicing
     • Genes-in-genes, genes-ad-genes
     • Multi-subunit proteins
     • Gene identification
     • Homology-based gene prediction
     • Similarity searches (e.g. BLAST, BLAT)
     • Genome browsers
     • RNA evidence (ESTs)
     • Ab initio gene prediction
     • Gene prediction programs: prokaryotes, eukaryotes
     • Promoter prediction
     • PolyA-signal prediction
     • Splice site, start/stop-codon predictions

  9. Bidirectional and partially overlapping genes (Lec 06, Slide 140)
     • Not very common in the human genome.
     • Provide the possibility of common regulation of a gene pair.
     • Partially overlapping genes are usually encoded by opposite DNA strands.
     • Found in gene-dense regions, such as the HLA class III complex on 6p21.3.
     • May form a sense-antisense pair in which one gene encodes an mRNA and the other is non-coding.

  10. Genes within genes (Lec 06, Slide 141)
     Neurofibromatosis gene (NF1): nested intronic genes
     • OMGP: oligodendrocyte myelin glycoprotein.
     • EVI2A and EVI2B: homologues of ecotropic viral integration sites in mouse.
     Overlapping genes in mtDNA
     • Two overlapping genes are encoded by the same strand of mtDNA (a unique example).
     • The two independent AUG codons lie in different reading frames relative to each other; the second stop codon is formed from TA plus an A from the poly-A tail.

  11. Gene prediction: comparative genomics (Lec 06, Slide 142)
     • When we BLAST a sequence, is that comparative genomics?
     • Entire genomes are compared to other entire genomes.
     • Use information from many genomes to learn more about the individual genes.
     • What are some questions that comparative genomics can address?
         • How has the organism evolved?
         • What differentiates species?
         • Which genes are required for organisms to survive in a certain environment?
         • Which non-coding regions are important?

  12. Gene prediction through comparative genomics (Lec 06, Slide 143)
     Different questions require different comparisons
     • Highly similar (conserved) regions between two genomes are probably useful to the organism; otherwise they would have diverged.
     • If genomes are too closely related, all regions are similar, not just genes.
     • If genomes are too far apart, analogous regions may be too dissimilar to be found.
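One way to picture "conserved regions" is to measure identity in windows along a pairwise alignment. The sketch below assumes the two sequences are already aligned to equal length (gaps written as "-"); the function name, window size and step are illustrative choices, not part of the lecture.

```python
def window_identity(aligned_a, aligned_b, window=10, step=5):
    """Return (start, fraction_identical) for each window of an alignment.

    Both inputs must be pre-aligned strings of equal length. A column
    counts as identical only if both residues match and neither is a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    results = []
    for start in range(0, len(aligned_a) - window + 1, step):
        pairs = zip(aligned_a[start:start + window],
                    aligned_b[start:start + window])
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        results.append((start, matches / window))
    return results


if __name__ == "__main__":
    # Toy alignment: first half identical, second half fully diverged.
    a = "AAAAAAAAAACCCCCCCCCC"
    b = "AAAAAAAAAAGGGGGGGGGG"
    for start, identity in window_identity(a, b, window=10, step=10):
        print(start, identity)
```

With two very closely related genomes nearly every window scores high, and with very distant ones nearly every window scores low, so in both cases the windows stop discriminating genes from the rest.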

  13. Prokaryotic gene prediction (Lec 06, Slide 144)
     • NCBI ORF Finder: http://www.ncbi.nlm.nih.gov/gorf/gorf.html
     • ORF Finder identifies all possible ORFs in a DNA sequence by locating the standard and alternative start and stop codons.
     • The deduced amino acid sequences can then be used in a BLAST search against GenBank. ORF Finder is also packaged with the sequence submission software Sequin.
     • NCBI ORF Finder identified 90 ORFs in Contig3 (28715 bp).
     • On its own, this method is still not a proper way to identify genes!
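The idea behind an ORF scan can be sketched as follows. This is a minimal illustration and not NCBI ORF Finder's implementation: it assumes only ATG as a start codon and the three standard stop codons (no alternative codons), takes the first ATG in each open stretch, and uses hypothetical function names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons=2):
    """Yield (start, end) for ATG..stop ORFs in the three frames of seq.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    min_codons counts the stop codon as well.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None


def find_orfs(seq, min_codons=2):
    """Scan all six reading frames; return (strand, start, end, orf_seq).

    Minus-strand coordinates refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    found = [("+", s, e, seq[s:e]) for s, e in orfs_in_strand(seq, min_codons)]
    found += [("-", s, e, rc[s:e]) for s, e in orfs_in_strand(rc, min_codons)]
    return found


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATAGGG"):
        print(orf)
```

A scan like this reports every open stretch between a start and a stop, which is why the ORF count for a contig can far exceed the number of real genes.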

  14. Prokaryotic gene prediction (Lec 06, Slide 145)
     Gene calling anomalies
     • Short genes: a gene is called 'short' when it has been truncated significantly at the 5'-end. Such genes are significantly shorter than their homologs in other species. Often this truncation removes important functional domains, so that the called gene would in theory have lost its function.

  15. Prokaryotic gene prediction (Lec 06, Slide 146)
     Gene calling anomalies
     • Long genes: a gene is called 'long' when it has been extended at the 5'-end. Such genes are significantly longer than their homologs in other species. A long gene can overlap neighbouring features, so that neighbouring genes are called short or features in the flanking intergenic regions are missed.

  16. Prokaryotic gene prediction (Lec 06, Slide 147)
     Gene calling anomalies
     • Unique gene: a gene is called 'unique' when it has no known homologs in other species. For such genes, BLAST comparisons at the amino acid level with genes in other organisms return no hits. Often, such a gene call is an anomaly which, in turn, causes other anomalies, e.g. neighbouring genes called short.
     [Figure: DdesDRAFT_0263 is a unique gene. If DdesDRAFT_0264 were detected as a short gene, DdesDRAFT_0263 would actually be responsible for this short call.]
     • Dubious (uncertain) gene: a gene called as unique that is too short to be a functional gene is classified as 'dubious'. In practice, very few (1-10) dubious genes are found in the gene calls. When present, both unique and dubious genes are included when searching intergenic regions for missed genes.
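The short, long and unique labels can be approximated by comparing a called gene's length with the lengths of its homologs. The sketch below is a simplification: it looks only at total length rather than at the 5'-end specifically, and the 20% tolerance, the use of the median, and the function name are assumptions made for illustration.

```python
from statistics import median


def classify_gene_call(gene_length, homolog_lengths, tolerance=0.2):
    """Label a gene call by comparing its length to homolog lengths.

    Returns 'unique' when there are no homologs, 'short' or 'long' when
    the gene deviates from the median homolog length by more than the
    given fractional tolerance, and 'ok' otherwise.
    """
    if not homolog_lengths:
        return "unique"
    reference = median(homolog_lengths)
    if gene_length < reference * (1 - tolerance):
        return "short"
    if gene_length > reference * (1 + tolerance):
        return "long"
    return "ok"


if __name__ == "__main__":
    homologs = [500, 520, 480]  # hypothetical homolog lengths in codons
    for length in (300, 510, 700):
        print(length, classify_gene_call(length, homologs))
    print(300, classify_gene_call(300, []))
```

A real check would also align the gene to its homologs to confirm that the missing or extra residues sit at the 5'-end, as the definitions on these slides require.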

  17. Prokaryotic gene prediction (Lec 06, Slide 148)
     Gene calling anomalies
     • Split genes interrupted by frame-shifts and stop codons: a reported split gene could be a good gene that is interrupted by frame-shifts or stop codons. Such a gene is called as a series of consecutive smaller genes, all of which have many BLAST hits in common.
     [Figure: split genes DdesDRAFT_1032 and DdesDRAFT_1033, interrupted by a frame-shift]
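Since split genes show up as consecutive calls that share many BLAST hits, candidates can be flagged by comparing the hit sets of neighbouring genes. The sketch below assumes the hits have already been collected into sets; the gene and hit identifiers, the 50% sharing threshold and the function name are all hypothetical.

```python
def find_split_candidates(genes_in_order, blast_hits, min_shared=0.5):
    """Return adjacent gene pairs whose BLAST hit sets overlap strongly.

    genes_in_order: gene IDs in genome order.
    blast_hits: dict mapping gene ID -> set of hit (subject) IDs.
    A pair is reported when the shared hits make up at least min_shared
    of the smaller of the two hit sets.
    """
    candidates = []
    for left, right in zip(genes_in_order, genes_in_order[1:]):
        hits_left = blast_hits.get(left, set())
        hits_right = blast_hits.get(right, set())
        if not hits_left or not hits_right:
            continue  # genes without hits cannot be compared this way
        shared = len(hits_left & hits_right)
        if shared / min(len(hits_left), len(hits_right)) >= min_shared:
            candidates.append((left, right))
    return candidates


if __name__ == "__main__":
    # Hypothetical gene order and hit sets, for illustration only.
    order = ["geneA", "geneB", "geneC"]
    hits = {
        "geneA": {"hit1", "hit2", "hit3"},
        "geneB": {"hit2", "hit3"},
        "geneC": {"hit9"},
    }
    print(find_split_candidates(order, hits))
```

Flagged pairs are only candidates; confirming a split gene still means inspecting the alignment for the frame-shift or stop codon that separates the two calls.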

  18. Prokaryotic gene prediction (Lec 06, Slide 149)
     Gene calling anomalies
     • Missed genes: gene prediction programs often miss genes.
     [Figure: no genes had been predicted in the region between DdesDRAFT_0231 and DdesDRAFT_0232; however, an alignment of this region indicates the presence of a perfectly good gene.]

  19. Free gene prediction software (Lec 06, Slide 150)
     • GeneMark: Georgia Institute of Technology, Atlanta, Georgia, USA. http://exon.biology.gatech.edu
     • GeneMark predicted 14 genes in Contig3 (28715 bp).

  20. Free gene prediction software (Lec 06, Slide 151)
     • Softberry (FGENESB): bacterial operon and gene prediction. http://linux1.softberry.com/berry.phtml
     • Softberry FGENESB predicted 6 genes in Contig3 (28715 bp).

  21. Free gene prediction software (Lec 06, Slide 152)
     • EasyGene: gene finding in prokaryotes (1.2b Server). http://www.cbs.dtu.dk/services/EasyGene/
     • EasyGene 1.2b Server predicted 6 genes in Contig3 (28715 bp).

  22. Free gene prediction software (Lec 06, Slide 153)
     • Glimmer: NCBI Microbial Genome Annotation Tools. http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi
     • Glimmer predicted 81 genes in Contig3 (28715 bp).
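The counts reported on the slides for Contig3 are easier to compare as predictions per kilobase. The short sketch below uses only the numbers quoted in this lecture; the variable and function names are illustrative, and the ORF Finder figure counts ORFs rather than predicted genes.

```python
CONTIG3_LENGTH_BP = 28715

# Counts as reported on the slides for Contig3.
PREDICTIONS = {
    "NCBI ORF Finder (ORFs)": 90,
    "GeneMark": 14,
    "Softberry FGENESB": 6,
    "EasyGene 1.2b": 6,
    "Glimmer": 81,
}


def per_kilobase(count, length_bp=CONTIG3_LENGTH_BP):
    """Convert a raw prediction count into predictions per 1000 bp."""
    return count / (length_bp / 1000)


if __name__ == "__main__":
    for tool, count in PREDICTIONS.items():
        print(f"{tool:24s} {count:3d} calls  {per_kilobase(count):.2f} per kb")
```

The spread between 6 and 90 calls on the same 28715 bp contig shows how strongly the result depends on the tool and on how permissive its criteria are.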
