Ensembl

Presentation Transcript


  1. Ensembl Genome Repository

  2. Main Data Repositories
     • Ensembl: sequence search with BLAST or BLAT
     • UCSC: BLAT
     • NCBI (Entrez): BLAST
     • Ensembl, NCBI, and UCSC use the same human genome assembly, which is generated by NCBI

  3. Ensembl
     • Provides automatic annotation of sequenced genomes
     • Integrates the annotation with other biological data
     • Makes the data available through:
       • the web interface (Genome Browser)
       • BioMart
       • direct database access and the Perl API

  4. Outline
     • Where the data comes from
     • Questions that can be answered

  5. Ensembl Genomes

  6. Genome Annotation
     • Identify elements on the genome
     • Attach biological information to the elements
     • Automatic annotation (Ensembl) and manual curation (Vega/HAVANA)

  7. Annotation
     • Addition of positional, functional, regulatory, and evolutionary datasets to a raw assembled genome
     • Genes, exon-intron boundaries, protein products, miRNAs, alternative splicing, transcription start sites, expression, orthologs, paralogs, repeats, structural features, syntenic relationships, ChIP-chip data, ...
     • Based on experimental data and computational predictions

  8. Genebuild
     • Align species-specific proteins to the genome to create CDS models (targeted build)
     • Align proteins from closely related species to locate additional CDS models (similarity build)
     • Add UTRs using cDNA/EST evidence and ditag data
     • Cluster transcripts into genes
     • Classify transcripts
     • Name genes
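The "Cluster transcripts into genes" step can be illustrated with a simplified sketch. It assumes that transcripts on the same chromosome and strand that share at least one overlapping exon belong to the same gene, and that overlap is transitive. The `Transcript` class and `cluster_into_genes` function are hypothetical; the actual Ensembl pipeline is likely more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    strand: int  # +1 or -1
    exons: tuple[tuple[int, int], ...]  # inclusive (start, end) pairs


def exons_overlap(a: Transcript, b: Transcript) -> bool:
    """True if both transcripts lie on the same chromosome and strand and share an overlapping exon."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return False
    return any(s1 <= e2 and s2 <= e1 for s1, e1 in a.exons for s2, e2 in b.exons)


def cluster_into_genes(transcripts: list[Transcript]) -> list[list[str]]:
    """Group transcript names into gene clusters using union-find over exon overlap."""
    parent = list(range(len(transcripts)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(transcripts)):
        for j in range(i + 1, len(transcripts)):
            if exons_overlap(transcripts[i], transcripts[j]):
                parent[find(i)] = find(j)

    clusters: dict[int, list[str]] = {}
    for i, t in enumerate(transcripts):
        clusters.setdefault(find(i), []).append(t.name)
    return sorted(clusters.values())


if __name__ == "__main__":
    example = [
        Transcript("t1", "1", 1, ((100, 200), (300, 400))),
        Transcript("t2", "1", 1, ((350, 450),)),   # overlaps t1's second exon
        Transcript("t3", "1", -1, ((100, 200),)),  # opposite strand: separate gene
        Transcript("t4", "1", 1, ((1000, 1100),)),
    ]
    print(cluster_into_genes(example))
```

Union-find makes the grouping transitive: if t1 overlaps t2 and t2 overlaps a third transcript, all three end up in one gene even if t1 and the third transcript do not overlap directly.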

  9. Human/Mouse Genebuild
     • Includes additional steps not in the standard Ensembl genebuild
     • For both species, transcripts from the Consensus Coding Sequence (CCDS) set are imported directly and are not altered by the genebuild process
     • Where manual curation is available for a transcript, the Ensembl and HAVANA transcript models are compared
     • The Ensembl and HAVANA models are merged when they agree on the same coding sequence
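The Ensembl/HAVANA merge rule can be sketched as follows, assuming that "the same coding sequence" is approximated by identical CDS coordinates. The `Model` class, the `merge_models` function, the choice to keep the HAVANA name for a merged model, and the `ensembl_havana` source label are all illustrative assumptions, not a description of the real pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    source: str  # "ensembl" or "havana" (merged models get "ensembl_havana" here)
    cds: tuple[tuple[int, int], ...]  # sorted coding segments, inclusive (start, end)


def merge_models(ensembl: list[Model], havana: list[Model]) -> list[Model]:
    """Merge Ensembl and HAVANA models with identical CDS; pass all other models through unchanged."""
    havana_by_cds = {m.cds: m for m in havana}
    used_havana: set[str] = set()
    result: list[Model] = []
    for e in ensembl:
        h = havana_by_cds.get(e.cds)
        if h is not None and h.name not in used_havana:
            # Same coding sequence: emit one merged model (keeping the HAVANA name is an assumption).
            result.append(Model(h.name, "ensembl_havana", h.cds))
            used_havana.add(h.name)
        else:
            result.append(e)
    # HAVANA models with no matching Ensembl model are kept as they are.
    result.extend(h for h in havana if h.name not in used_havana)
    return result


if __name__ == "__main__":
    ens = [Model("E1", "ensembl", ((100, 200), (300, 350))), Model("E2", "ensembl", ((500, 600),))]
    hav = [Model("H1", "havana", ((100, 200), (300, 350))), Model("H2", "havana", ((800, 900),))]
    for m in merge_models(ens, hav):
        print(m.name, m.source)
```

Keying HAVANA models by their CDS tuple makes each comparison a dictionary lookup rather than a pairwise scan; models that disagree on the CDS are never merged.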

  10. Ensembl Identifiers
     • Format: ENS + Species + Type + ID
     • Species: blank for human; for all other species, a three-letter code (e.g. MUS for mouse)
     • Type: G (gene), T (transcript), P (protein)
     • ID: 11-digit number
     • Mouse examples: ENSMUST00000118022 (transcript), ENSMUSP00000113891 (protein), ENSMUSG00000021944 (gene)
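The identifier layout above can be checked mechanically. This sketch is a regular expression built only from the rules on the slide: an optional three-letter species code, one of the type letters G/T/P, and an 11-digit number. The names `parse_stable_id` and `StableId` are hypothetical, and versioned IDs (with a `.N` suffix) or other feature types are deliberately rejected.

```python
import re
from typing import NamedTuple

# ENS + optional 3-letter species code + type letter + 11-digit number (per the slide).
STABLE_ID = re.compile(r"^ENS([A-Z]{3})?([GTP])(\d{11})$")
TYPE_NAMES = {"G": "gene", "T": "transcript", "P": "protein"}


class StableId(NamedTuple):
    species_code: str  # "" for human
    feature_type: str  # "gene", "transcript" or "protein"
    number: str        # kept as a string to preserve leading zeros


def parse_stable_id(identifier: str) -> StableId:
    """Split an Ensembl-style stable ID into species code, feature type and number."""
    match = STABLE_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a recognised Ensembl stable ID: {identifier!r}")
    species, type_code, number = match.groups()
    return StableId(species or "", TYPE_NAMES[type_code], number)


if __name__ == "__main__":
    for ident in ("ENSMUST00000118022", "ENSMUSP00000113891", "ENSMUSG00000021944"):
        print(ident, parse_stable_id(ident))
```

For a human ID the optional species group simply matches nothing, so `species_code` comes back as an empty string.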

  11. Ensembl Organization
     • Views are organized into four classes:
       • Gene
       • Transcript
       • Location (Genome Browser)
       • Variation

  12. Questions
     • Are there splice variants?
     • How do I find orthologs and paralogs?
     • Are there variations in the genomic sequence?
     • How can I download different parts of the mRNA sequence?
     • What protein domains exist?
     • Gene Ontology
     • Can I download sets of data (DNA, cDNA, protein) for a species? (BioMart question)
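For the BioMart question, a query can be written as XML and sent to a BioMart web service. The sketch below only builds the query and URL and does not fetch anything. The endpoint `http://www.ensembl.org/biomart/martservice`, the dataset `mmusculus_gene_ensembl`, and the attribute/filter names in the example are assumptions to verify against the current Ensembl BioMart documentation; `build_biomart_query` and `biomart_url` are hypothetical helpers.

```python
from typing import Optional
from urllib.parse import quote
from xml.etree import ElementTree as ET

# Assumed endpoint; check the current Ensembl BioMart documentation before use.
MARTSERVICE = "http://www.ensembl.org/biomart/martservice"


def build_biomart_query(dataset: str, attributes: list[str],
                        filters: Optional[dict[str, str]] = None) -> str:
    """Return BioMart query XML asking for tab-separated output with a header row."""
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="1", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", name=name, value=value)
    for name in attributes:
        ET.SubElement(ds, "Attribute", name=name)
    return ET.tostring(query, encoding="unicode")


def biomart_url(query_xml: str) -> str:
    """Build (but do not fetch) a GET URL carrying the URL-encoded query."""
    return f"{MARTSERVICE}?query={quote(query_xml)}"


if __name__ == "__main__":
    xml = build_biomart_query(
        "mmusculus_gene_ensembl",                     # assumed dataset name (mouse genes)
        ["ensembl_gene_id", "ensembl_transcript_id"],  # assumed attribute names
        {"chromosome_name": "19"},                     # assumed filter name
    )
    print(xml)
    print(biomart_url(xml))
```

Fetching the printed URL (for example with `urllib.request.urlopen`) should return one gene/transcript ID pair per line, which also answers the splice-variant question by counting transcripts per gene; that network step is left out here.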

  13. Resources
     • Ensembl Tutorials: http://www.ensembl.org/info/website/tutorials/index.html
     • Ensembl 2009, Nucleic Acids Research, PMID: 19033362
     • Bert Overduin, Ph.D., Ensembl: http://www.ebi.ac.uk/~bert/workshops/london_080509/browser_london_080509.pdf
