Ensembl Genome Repository
Main Data Repositories • Ensembl - BLAST or BLAT • UCSC - BLAT • NCBI (Entrez) - BLAST • Ensembl, NCBI, and UCSC all use the same human genome assembly, which is generated by NCBI
Ensembl • Provides automatic annotation of sequenced genomes • Integrates the annotation with other biological data • Makes the data available via the web • Genome Browser • Web interface • BioMart • Direct database access • Perl API
Outline • Where the data comes from • Questions that can be answered
Genome Annotation • Identify elements on the genome • Attach biological information to the elements • Automatic annotation and manual curation (Vega/Havana)
Annotation • Addition of positional, functional, regulatory and evolutionary datasets to a raw assembled genome. • Genes, exon-intron boundaries, protein products, miRNAs, alternative splicing, transcriptional start sites, expression, orthologs, paralogs, repeats, structural features, syntenic relationships, ChIP-chip data ... • Based on experimental data and computational predictions.
Genebuild • Align species-specific proteins to the genome to create CDS models (targeted build) • Align proteins from closely related species to locate additional CDS models (similarity build) • Add UTRs using cDNA/EST evidence and ditag data • Cluster transcripts into genes • Classify transcripts • Name genes
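The "cluster transcripts into genes" step can be illustrated with a small sketch. This is not the Ensembl genebuild code: the Transcript class, the function names and the rule used here (transcripts on the same strand that share any overlapping exon end up in one gene) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical minimal transcript model (not an Ensembl API object)."""
    name: str
    strand: int    # +1 or -1
    exons: tuple   # ((start, end), ...) with inclusive coordinates


def exons_overlap(a, b):
    """True if a and b are on the same strand and share an overlapping exon."""
    if a.strand != b.strand:
        return False
    return any(s1 <= e2 and s2 <= e1
               for s1, e1 in a.exons
               for s2, e2 in b.exons)


def cluster_transcripts(transcripts):
    """Single-linkage clustering: each returned list is one candidate gene."""
    clusters = []
    for t in transcripts:
        hits, rest = [], []
        # Split existing clusters into those t overlaps and those it does not.
        for c in clusters:
            (hits if any(exons_overlap(t, o) for o in c) else rest).append(c)
        # t joins, and thereby merges, every cluster it overlaps.
        merged = [t] + [o for c in hits for o in c]
        clusters = rest + [merged]
    return clusters


example = [
    Transcript("a", 1, ((100, 200), (300, 400))),
    Transcript("b", 1, ((150, 250), (500, 600))),
    Transcript("c", 1, ((1000, 1100),)),
    Transcript("d", -1, ((120, 180),)),
]
genes = cluster_transcripts(example)
```

With the example data, a and b fall into one gene because their first exons overlap, c stands alone, and d stays separate because it lies on the opposite strand.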
Human/Mouse Genebuild • Includes additional steps that are not part of the standard Ensembl build. • For both species, transcripts from the Consensus Coding Sequence (CCDS) set are imported directly and are not altered by the genebuild process. • In addition, where manual curation is available for a transcript, the Ensembl and HAVANA transcript models are compared. • The Ensembl and HAVANA models are merged when they agree on the same coding sequence
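The comparison rule for Ensembl and HAVANA models can be sketched as follows. The dictionary layout, the function name and the "merged" label are hypothetical, "same coding sequence" is approximated here as identical CDS exon coordinates, and the sketch deliberately does not say what happens when the two models disagree.

```python
def merge_if_same_cds(ensembl_model, havana_model):
    """Return one merged model if both models have the same CDS, else None."""
    ensembl_cds = sorted(ensembl_model["cds"])
    havana_cds = sorted(havana_model["cds"])
    if ensembl_cds != havana_cds:
        return None  # disagreement is outside the scope of this sketch
    return {
        "name": havana_model["name"],
        "source": "merged",  # hypothetical label for an agreed model
        "cds": havana_cds,
    }


# Two models with identical CDS exons (order of exons does not matter).
automatic = {"name": "auto_1", "cds": [(100, 200), (300, 400)]}
manual = {"name": "manual_1", "cds": [(300, 400), (100, 200)]}
merged = merge_if_same_cds(automatic, manual)
```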
Ensembl Identifiers • Format: ENS[Species][Type]00000[ID] • Species: blank for human; a three-letter code for all other species (e.g. MUS for mouse) • Type: G (gene), T (transcript), P (protein) • ID: six-digit number • Examples: ENSMUST00000118022 (transcript) • ENSMUSP00000113891 (protein) • ENSMUSG00000021944 (gene)
Ensembl Organization • Views are organized into four classes • Gene • Transcript • Location (Genome Browser) • Variation
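The identifier scheme above can be checked with a short parser. This is a minimal sketch that covers only what the slide describes: an optional three-letter species code, one of the three type letters, and an 11-digit number (the five zeros plus the six-digit ID). The names STABLE_ID, FEATURE_TYPES and parse_stable_id are made up for this example, and identifiers that use other type letters or longer species codes are rejected.

```python
import re

# ENS + optional three-letter species code + type letter + 11 digits.
STABLE_ID = re.compile(
    r"ENS(?P<species>[A-Z]{3})?(?P<type>[GTP])(?P<number>\d{11})"
)
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "protein"}


def parse_stable_id(stable_id):
    """Split an identifier into species code, feature type and number."""
    match = STABLE_ID.fullmatch(stable_id)
    if match is None:
        raise ValueError(f"not a recognised identifier: {stable_id!r}")
    return {
        "species_code": match.group("species") or "",  # blank means human
        "type": FEATURE_TYPES[match.group("type")],
        "number": int(match.group("number")),
    }


mouse_transcript = parse_stable_id("ENSMUST00000118022")
mouse_gene = parse_stable_id("ENSMUSG00000021944")
```

For ENSMUST00000118022 the parser reports species code MUS, type transcript and number 118022.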
Questions • Are there splice variants? • How do I find orthologs and paralogs? • Are there variations in the genomic sequence? • How can I download different parts of the mRNA sequence? • What protein domains exist? • Gene Ontology • Can I download sets of data (DNA, cDNA, protein) for a species? • BioMart question
Resources • Ensembl Tutorials http://www.ensembl.org/info/website/tutorials/index.html • Ensembl 2009 Nucleic Acids Research PMID: 19033362 • Bert Overduin, Ph.D. Ensembl http://www.ebi.ac.uk/~bert/workshops/london_080509/browser_london_080509.pdf