ASTD

rmaynard

Presentation Transcript


  1. ASTD ‘Alternative Splicing and Transcript Diversity database’

  2. What/who are we?
     • Firstly, AltExtron
     • Secondly, ASD (Alternative Splicing Database) and the AltSplice pipeline
         • database of alternative splice events and the resulting isoform splice patterns of genes from human and other model species
     • Thirdly, for grant purposes, ATD (Alternative Transcript Diversity database) and the AltTrans pipeline
         • formation of transcript isoforms on a genome-wide scale, by creating a value-added database of full-length alternative transcripts from human and other model species
     • We also host the AEdb database of manual annotations
     • The two, ASD and ATD, have been blended into one pipeline, so now we are ASTD, the Alternative Splicing and Transcript Diversity database: www.ebi.ac.uk/astd

  3. Pipeline in a nutshell
     1. Ensembl gene slices & EMBL EST/mRNA/HTC/HInv download
     2. Immunoglobulin filtering (Blast)
     3. Redundant gene filtering (Blat)
     4. Genes vs EST/mRNA alignment (Blast)
     5. HSP collection
     6. Intron/exon delineation
     7. Splice pattern delineation
     8. Event prediction
     9. Data generation
     Additional pipelines: Poly(A) Pipeline, TSS Pipeline, Peptide Pipeline, SNP Pipeline, Conservation Pipeline
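Steps 5 and 6 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the HSPs collected for one EST/mRNA are available as genomic (start, end) intervals with 1-based inclusive coordinates; overlapping or adjacent HSPs are merged into exons and the gaps between consecutive exons are reported as introns. The function name `delineate` is hypothetical, and the actual ASTD pipeline may apply further rules (for example splice-site checks) that are not shown here.

```python
def delineate(hsps):
    """Derive exons and introns from HSP intervals of a single EST/mRNA.

    hsps: iterable of (start, end) genomic coordinates, 1-based inclusive.
    This is an assumed data model for illustration, not the ASTD schema.
    """
    exons = []
    for start, end in sorted(hsps):
        if exons and start <= exons[-1][1] + 1:
            # Overlapping or directly adjacent HSPs belong to the same exon.
            exons[-1] = (exons[-1][0], max(exons[-1][1], end))
        else:
            exons.append((start, end))
    # An intron is the gap between two consecutive exons.
    introns = [
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(exons, exons[1:])
    ]
    return exons, introns


if __name__ == "__main__":
    exons, introns = delineate([(100, 200), (500, 600), (190, 250)])
    print("exons:", exons)
    print("introns:", introns)
```

Sorting first keeps the merge a single pass, which matters little for one transcript but helps when the same routine is run over every alignment in a genome.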

  4. Limitations of the pipeline …
     • Pipeline defines consensus splice sites
     • True biology is removed:
         • dicistronic transcripts
         • nested genes
         • single-exon genes
         • small exons
         • large introns
     Manual annotation would resolve these issues …

  5. Improvements …
     • New, user-friendly web interfaces
     • New database schema that is normalised, extendable and maintainable
     • Pipeline improvements: some steps are now automated, bugs have been corrected, and Blat replaces Blast for filtering redundant genes
     • Database allows external features (Ensembl and VEGA annotations) to be included for comparison with our transcripts
     • Schema allows export of data in standard formats: GTF2 and GFF3, EMBL flat file format, FASTA format, and Excel spreadsheet
     • Transcripts for the complete genome, not restricted to those with alternative splice events
     • Introduction of unique identifiers
     • Addition of datasets as input to the pipeline: HTC and HInv
     • Extension of 5’ and 3’ UTRs to capture more TSS and poly(A) sites
     • Annotation of TSS (by aligning 5’ capped mRNAs from human and mouse to transcripts) and poly(A) sites to generate full-length transcripts
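To make the GFF3 export concrete, here is a minimal sketch of how one transcript and its exons could be written as GFF3 records (nine tab-separated columns: seqid, source, type, start, end, score, strand, phase, attributes). The transcript identifier, coordinates and function names are hypothetical and chosen only for illustration; they do not reflect the real ASTD identifier format or export code.

```python
def gff3_line(seqid, source, ftype, start, end, strand, attrs):
    """Format one GFF3 feature line; score and phase are left undefined (".")."""
    attr_str = ";".join(f"{key}={value}" for key, value in attrs.items())
    return "\t".join(
        [seqid, source, ftype, str(start), str(end), ".", strand, ".", attr_str]
    )


def transcript_to_gff3(seqid, transcript_id, strand, exons):
    """Return GFF3 text for one transcript given its exon (start, end) pairs."""
    exons = sorted(exons)
    start = min(s for s, _ in exons)
    end = max(e for _, e in exons)
    lines = ["##gff-version 3"]
    lines.append(
        gff3_line(seqid, "ASTD", "mRNA", start, end, strand, {"ID": transcript_id})
    )
    for number, (exon_start, exon_end) in enumerate(exons, start=1):
        attrs = {"ID": f"{transcript_id}.exon{number}", "Parent": transcript_id}
        lines.append(
            gff3_line(seqid, "ASTD", "exon", exon_start, exon_end, strand, attrs)
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # "EXAMPLE_TRANSCRIPT_1" is a made-up identifier, not a real ASTD accession.
    print(transcript_to_gff3("chr1", "EXAMPLE_TRANSCRIPT_1", "+",
                             [(100, 250), (500, 600)]))
```

The sketch does not escape reserved characters (such as ";" or "=") in attribute values, which a real exporter would need to do.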

  6. www.ebi.ac.uk/astd - Query tools
     Three query tools are available to retrieve entries:
     • Simple text search on the main page
     • Genome browsing
     • Advanced search

  7. Gene information

  8. Genomic region information … 1

  9. Genomic region information … 2

  10. Transcript information

  11. Evidence for transcript … 1

  12. Evidence for transcript … 2

  13. Expression information

  14. Splice event … 1

  15. Splice event … 2

  16. Peptide information

  17. Statistics
                                                                              Human  Mouse    Rat
      Number of genes with an ASTD transcript                                 16715  16491  10424
      Number of genes with an ASTD transcription_start_site                    4936    948    503
      Number of genes with an ASTD polyA_site                                 15376  13556   8842
      Number of genes with an ASTD splicing event                             11316   9474   2865
      Number of genes with multiple ASTD transcripts                          14101  13028   6344
      Proportion of genes undergoing alternative splicing                      68 %   57 %   27 %
      Proportion of genes undergoing alternative polyadenylation               92 %   82 %   85 %
      Proportion of genes undergoing alternative transcription_start_sites     30 %    6 %    5 %
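The slide does not say how the proportions were derived. As a quick check, the sketch below divides each feature count by the number of genes with an ASTD transcript for the same species and rounds to a whole percentage; under that assumption the counts on the slide reproduce the reported figures. The dictionary keys and the `proportion` helper are names introduced here for illustration only.

```python
# Counts copied from the statistics slide; the key names are illustrative.
COUNTS = {
    "Human": {"transcript": 16715, "transcription_start_site": 4936,
              "polyA_site": 15376, "splicing_event": 11316},
    "Mouse": {"transcript": 16491, "transcription_start_site": 948,
              "polyA_site": 13556, "splicing_event": 9474},
    "Rat": {"transcript": 10424, "transcription_start_site": 503,
            "polyA_site": 8842, "splicing_event": 2865},
}


def proportion(species, feature):
    """Percentage of genes with an ASTD transcript that also carry `feature`.

    Assumes the denominator is the per-species count of genes with an ASTD
    transcript; the slides do not state this explicitly.
    """
    counts = COUNTS[species]
    return round(100 * counts[feature] / counts["transcript"])


if __name__ == "__main__":
    for species in COUNTS:
        for feature in ("splicing_event", "polyA_site", "transcription_start_site"):
            print(f"{species:5s} {feature:25s} {proportion(species, feature):3d} %")
```

If the assumption holds, the "alternative polyadenylation" and "alternative transcription_start_sites" percentages count genes with at least one annotated site of that type rather than genes with several alternative sites.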

  18. Graph of human growth

  19. Controlled vocabularies/ontologies
      • GO
      • SOFA
      • eVOC
      • Splice event ontology
      • MeSH terms

  20. Future … 1
      • Addition of new species
      • Experimental validation of transcript structure and alternative poly(A) sites
      • Use of EMBL CDS as another source of alignments to the genome
      • More frequent releases: every 3 months
      • Addition of regulatory motifs: ESS, ESE, ISS and ISE
      • microRNA target sites from the EURASNET NoE (University of Basel)

  21. Future … 2
      • Introduction of unique identifiers means:
          • Addition as xrefs in EMBL, so transcripts in the INSDC can be grouped into one gene
          • Addition into UniParc, so translations can be linked to UniProt IsoIds and again grouped as variants of one gene
          • UniParc translations also undergo a full InterPro scan plus TM and SignalP predictions, so the data is precomputed rather than generated on the fly

  22. Future … 3
      • The EBI sequence database group and Ensembl have merged, forming the Hinxton Sequencing Forum (HSF)
      • The outcome is that ASTD will be the vehicle to augment the Ensembl transcript views
          • Full-length transcripts with TSS, splice events and poly(A) sites
          • Definition of the ‘major transcript set’ using annotation of features to transcripts, e.g. expression state, exon array and splice junction array evidence
          • VEGA/Havana annotations also included
      • Time scale: within 2 years

  23. Acknowledgements
      • The ASTD Team:
          • Gautier Koscielny
          • Vincent Le Texier
          • Eleanor Whitfield
          • Chellapa Gopalakrishnan
          • Vasudev Kumanduri
      • Sequence Database Group and External Services
      • ASD consortium (Stefan Stamm for AEdb)
      • ATD consortium (Daniel Gautheret for AltPAS)
      • EURASNET consortium
      • The ASTD project at EBI is supported by a grant from the EC: Eurasnet Network of Excellence (LSHG-CT-2005-518238).
