
The Havana-Gencode annotation

The Havana-Gencode annotation. GENCODE CONSORTIUM. Loci annotated in the 44 ENCODE regions. Experimental validations of the manual annotations. The annotations produced by the Havana team at Sanger are being verified experimentally through RT-PCRs and RACEs (University of Geneva).


Presentation Transcript


  1. The Havana-Gencode annotation. GENCODE CONSORTIUM

  2. Loci annotated in the 44 ENCODE regions

  3. Experimental validations of the manual annotations. The annotations produced by the Havana team at Sanger are being verified experimentally through RT-PCRs and RACEs (University of Geneva). Diagram labels: Initial annotation; Experimental validations; Experimental validation of the annotated single exons; 5' RACEs to obtain full-length mRNA(s); RT-PCRs to check 360 junctions; Updated annotation; New set of confirmed genes; Bidirectional RACEs to obtain full-length mRNAs.

  4. Experimental validations of the manual annotations: 5' RACEs to extend Known and Novel protein genes. 214 of 426 loci gave a positive RACE for at least one primer (50%). About 10% of the successful RACEs extend the loci at the 5' end, and some provide new exon junctions (some RACE products are still being analysed).

  5. Experimental validations of the manual annotations: RT-PCRs on VEGA Novel_transcript and Putative loci. When more than one junction was submitted for the same transcript, the results for all junctions agreed in 2/3 of the cases (mostly all junctions negative). The Novel_transcript loci have a higher success rate than the Putative loci, in accordance with their definitions.

  6. Experimental validations of the manual annotations: RT-PCRs on non-canonical splice sites. 43 non-canonical splice sites (neither GT-AG nor GC-AG) were detected in the 13 training ENCODE regions; 32 could be tested by RT-PCR (for the others, the exons were too short for primer picking). Of these 32: 1 was confirmed, but it is actually a canonical U12 intron (AT-AC); 6 gave canonical junctions (already present in other annotated splice forms); 25 were negative. => None of the non-canonical splice sites could be validated experimentally (83 other splice sites are being checked in the 31 other regions).
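The splice-site categories used on slide 6 can be illustrated with a minimal sketch. It assumes the intron sequence is given on the transcribed strand, and the function name `classify_intron` is hypothetical; it only looks at the terminal dinucleotides and is not the consortium's actual pipeline.

```python
def classify_intron(intron_seq: str) -> str:
    """Classify an intron by its donor and acceptor dinucleotides.

    Assumption: intron_seq is the full intron on the transcribed strand,
    so the first two bases are the donor site and the last two the acceptor.
    """
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to hold donor and acceptor dinucleotides")
    pair = f"{seq[:2]}-{seq[-2:]}"
    if pair in ("GT-AG", "GC-AG"):
        return "canonical"
    if pair == "AT-AC":
        # Treated on the slide as a canonical U12-type intron.
        return "U12 (AT-AC)"
    return "non-canonical"


if __name__ == "__main__":
    for example in ("GTAAGTCCCAG", "ATATCCTTAC", "GGAAACCTT"):
        print(example, classify_intron(example))
```

A dinucleotide check like this only flags candidates; as the slide reports, the flagged sites still needed RT-PCR to decide whether they were real.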

  7. Gene predictions outside of Havana-Gencode annotations. Methods: 6 computational gene prediction programs (geneid, genscan, SGP, twinscan, fgenesh, exonify) and 3 EST-based methods (acembly, Ecgene, Ensembl EST). In 13 ENCODE regions, 1255 introns predicted by one or more of the 9 methods are not annotated in VEGA: (1) 380 (30%) extend VEGA objects; (2) 530 (42%) are in introns of VEGA objects; (3) 11 (1%) link exons from distinct VEGA objects; (4) 334 (27%) are completely outside of VEGA annotations. (Diagram: Havana-Gencode annotation versus predictions for cases (1) to (4).)
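As a quick consistency check on slide 7, the four category counts can be summed and converted to percentages. The counts come from the slide; the dictionary keys are shortened labels chosen here for illustration.

```python
# Counts of predicted introns not annotated in VEGA, by category (from slide 7).
counts = {
    "extend VEGA objects": 380,
    "in introns of VEGA objects": 530,
    "link exons from distinct VEGA objects": 11,
    "completely outside VEGA annotations": 334,
}

total = sum(counts.values())

# Share of each category, rounded to the nearest whole percent.
shares = {label: round(100 * n / total) for label, n in counts.items()}

if __name__ == "__main__":
    print("total predicted introns:", total)
    for label, pct in shares.items():
        print(f"{label}: {counts[label]} ({pct}%)")
```

The categories add up to the 1255 introns quoted on the slide, and the rounded shares match the percentages given there.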

  8. Gene predictions outside of Havana-Gencode annotations: RT-PCRs on exon junctions. 1255 predicted introns tested: => 16 RT-PCRs confirmed the predicted junction and 9 gave another junction (excluding pseudogenes). => Only 3 are intergenic (new loci?); these are being extended by RACE. (Table footnote: 1 = RT-PCR successful; 2 = RT-PCR gave a product with a wrong exon junction.)

  9. Gene predictions outside of Havana-Gencode annotations: the 31 remaining regions. About 3500 introns predicted by standard programs from UCSC tracks are outside of the Havana-Gencode annotation (about 900 intergenic). Very few of those may correspond to real positives (=> need to prioritize). Additionally, the EGASP predictions add about 7000 other new introns (about 1000 intergenic).

  10. Description of the annotations: gene density

  11. Description of the annotations: alternative splicing. Average: 4.2 transcripts per locus; 6.7 exons per transcript.

  12. Description of the annotations: coding loci. 424 coding loci in 44 ENCODE regions. On average, 44.6% of the transcripts are annotated as coding.

  13. Description of the annotations: lengths of exons, introns, cds, utrs…

  14. Comparison between Havana-Gencode annotation and other sets: ENSEMBL, REFSEQ, MGC, CCDS

  15. Gene level => Most of the genes from the other sets are contained in the Havana-Gencode annotation (fewer for ENSEMBL)

  16. Transcript level => Very few full transcripts are exactly identical. The coding part of the transcripts is better conserved.

  17. Relaxed criterion: allows transcripts from the other sets to be included in Havana-Gencode transcripts. (Diagram: a Havana-Gencode transcript compared with transcripts from other sets, some supporting the annotated transcript and some not.) => Few transcripts are exactly identical, but most of the transcripts from other sets are included in transcripts from Havana-Gencode, especially MGC genes (transcripts not as extended as the annotation).
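One way to read the relaxed criterion on slide 17 is sketched below. The slide does not spell out the exact rule, so this is an assumption: a transcript from another set counts as included when its introns form a contiguous run of the annotated transcript's introns and its two outer ends do not reach beyond the corresponding annotated exons. The names `introns` and `is_included` are hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end) on the genome, start < end


def introns(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Return introns as (end of upstream exon, start of downstream exon)."""
    ordered = sorted(exons)
    return [(a[1], b[0]) for a, b in zip(ordered, ordered[1:])]


def is_included(other: List[Exon], annotated: List[Exon]) -> bool:
    """Assumed relaxed criterion: 'other' is contained in 'annotated'."""
    if not other:
        raise ValueError("transcript has no exons")
    other, annotated = sorted(other), sorted(annotated)
    o_introns, a_introns = introns(other), introns(annotated)
    if not o_introns:
        # Single-exon transcript: it must fit inside one annotated exon.
        start, end = other[0]
        return any(a_s <= start and end <= a_e for a_s, a_e in annotated)
    n = len(o_introns)
    for i in range(len(a_introns) - n + 1):
        if a_introns[i:i + n] == o_introns:
            # Outer ends may be shorter than the annotation, never longer.
            first_exon, last_exon = annotated[i], annotated[i + n]
            return other[0][0] >= first_exon[0] and other[-1][1] <= last_exon[1]
    return False


if __name__ == "__main__":
    annotation = [(100, 200), (300, 400), (500, 650)]
    shorter = [(150, 200), (300, 400), (500, 600)]  # same introns, shorter ends
    skipping = [(150, 200), (500, 600)]             # skips the middle exon
    print(is_included(shorter, annotation))   # included under the assumed rule
    print(is_included(skipping, annotation))  # not included: intron differs
```

Under this reading, a transcript that is simply less extended than the annotation (as the slide notes for MGC) is counted as supporting it, while one with a different junction is not.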

  18. Transcript level: relaxed criterion

  19. Exon/intron level => More introns than exons are shared between the sets: this could be explained by the fact that most differences are in UTRs (last exons)
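The reasoning on slide 19 can be made concrete with a toy example. The coordinates below are invented for illustration: two transcripts share every splice junction but differ in where their terminal (UTR) exons end, so all introns match while only the internal exon matches exactly.

```python
from typing import List, Set, Tuple

Exon = Tuple[int, int]  # (start, end), hypothetical coordinates


def intron_set(exons: List[Exon]) -> Set[Tuple[int, int]]:
    """Introns as (end of upstream exon, start of downstream exon)."""
    ordered = sorted(exons)
    return {(a[1], b[0]) for a, b in zip(ordered, ordered[1:])}


# Same splice junctions, different transcript ends (UTR lengths differ).
annotated = [(100, 200), (300, 400), (500, 650)]
other = [(150, 200), (300, 400), (500, 600)]

shared_exons = set(annotated) & set(other)
shared_introns = intron_set(annotated) & intron_set(other)

if __name__ == "__main__":
    print(f"shared exons: {len(shared_exons)} of {len(annotated)}")
    print(f"shared introns: {len(shared_introns)} of {len(intron_set(annotated))}")
```

Because an exact exon match requires both boundaries to agree, first and last exons with different UTR ends never count as shared, whereas the introns they flank still do.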

  20. Nucleotide level. Conclusions: Havana-Gencode annotation is richer than the other data sets. REFSEQ, MGC and CCDS are almost completely contained in Havana-Gencode, especially CCDS (smaller set). ENSEMBL contains more "false positives" (bigger set). Transcripts from the other sets are less extended than transcripts from Havana-Gencode annotations, especially MGC (very few transcripts are completely identical).

  21. Exon pair level (exon-intron-exon)
