
Dr. Giulietta M. Spudich European Bioinformatics Institute


Presentation Transcript


  1. The Ensembl Browser • Dr. Giulietta M. Spudich, European Bioinformatics Institute

  2. Today • Introduction to the Ensembl project and gene set • Walk-through of the browser • Hands-on: Browser • BioMart • Lunch • BioMart hands-on • Comparative Genomics + hands-on • Variation & Functional Genomics + hands-on

  3. Course Objectives • How to browse information about a gene • How to choose a transcript • Where to find sequence variations • How to view multiple alignments • How to use BioMart • Where to go for help

  4. Introduction to Ensembl • Why do we have genome browsers? • Why Ensembl? • Ensembl genes and genomes • Where to go for help?

  5. Genome browsers provide a map [Figure: features along a genomic region, labelled histone modification, DNase I sensitive site, conserved sequence, gene and allele. Adapted from the ENCODE project, www.nature.com/nature/focus/encode/]

  6. Genome Browsers • Ensembl Genome browser http://www.ensembl.org • NCBI Map Viewer http://www.ncbi.nlm.nih.gov/mapview/ • UCSC Genome Browser http://genome.ucsc.edu

  7. Ensembl Features • The gene set: automatic annotation based on mRNA and protein information, plus manual annotation (the GENCODE set) • BioMart (data export tool) • Comparative analysis (gene trees) • Variation and functional genomics • Integration with other databases (DAS) • Programmatic access via the Perl API (open source)

  8. Subjects • Why do we have genome browsers? • Why Ensembl? • Ensembl genes and genomes • Where to go for help?

  9. To meet a challenge… Ensembl's aim: to provide annotation for the biological community that is freely available and of high quality • Started in 2000 • Joint project between the EBI and the Sanger Institute • Funded primarily by the Wellcome Trust, with additional funding from EMBL, NIH-NIAID, the EU, BBSRC and MRC

  10. Genome annotation • Genome annotation is the process of attaching biological information to sequences. It consists of two main steps: (1) identifying genes on the genome; (2) attaching biological information to genes and the genome, for example the effects of sequence variation.
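To make the two steps concrete, here is a minimal Python sketch of step 2 for one kind of information: given genes already placed on the genome (step 1), report which genes a sequence variant falls in. The `Gene` record, the `genes_at` helper, the gene IDs and the coordinates are all hypothetical; the sketch assumes 1-based, inclusive coordinates and ignores strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """A gene placed on the genome (step 1). Coordinates are 1-based, inclusive."""
    gene_id: str
    chrom: str
    start: int
    end: int


def genes_at(genes, chrom, pos):
    """Step 2 sketch: return IDs of genes whose span covers a variant at chrom:pos."""
    return [g.gene_id for g in genes if g.chrom == chrom and g.start <= pos <= g.end]


# Hypothetical gene set; IDs and coordinates are made up for illustration.
genes = [
    Gene("GENE_A", "1", 1000, 5000),
    Gene("GENE_B", "1", 4500, 9000),
    Gene("GENE_C", "2", 100, 800),
]

print(genes_at(genes, "1", 4700))  # ['GENE_A', 'GENE_B']
```

A linear scan is enough for a handful of genes; a whole-genome annotation would use an interval index instead.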

  11. Ensembl Annotates Vertebrate Genomes • 50 species, including the non-chordates D. melanogaster, C. elegans and S. cerevisiae

  12. Extending Ensembl across the taxonomic space • 48 chordates, including human, mouse, zebrafish, chicken, chimpanzee, pig and platypus • 21 species, including Drosophila (12), Caenorhabditis (5) and Anopheles gambiae • 8 species, including Arabidopsis thaliana, Arabidopsis lyrata and Oryza sativa • 8 Aspergillus species • 2 yeasts: S. cerevisiae, S. pombe • 134 species: 6 bacterial clades, 1 prokaryotic clade • 3 Plasmodium species: P. falciparum, P. knowlesi, P. vivax • Slide design by Jeff Almeida-King • F. D. Ciccarelli, T. Doerks, C. von Mering, C. J. Creevey, B. Snel & P. Bork. Towards automatic reconstruction of a highly resolved tree of life. Science, 3 March 2006.

  13. Exploring genomes • Vertebrates focus: www.ensembl.org • Other species: www.ensemblgenomes.org

  14. Subjects • Why do we have genome browsers? • Why Ensembl? • Ensembl genes and genomes • Where to go for help?

  15. What is known? Genomic assemblies from sequencing consortia

  16. What is known? Proteins and cDNA/mRNA sequences from the research community, found in: • UniProtKB/Swiss-Prot (manually curated) and UniProtKB/TrEMBL: www.uniprot.org • NCBI RefSeq (manually curated): www.ncbi.nlm.nih.gov/RefSeq • Note: see pages 55 and 56 of the course booklet

  17. Combining genes and genomes [Figure: a three-exon gene model with coding, untranslated and untranslated+coding regions, placed on the genomic sequence …tgcctgttag...]
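Combining a gene model with the genome amounts to cutting the exons out of the genomic sequence and joining them. The sketch below assumes a forward-strand gene, 1-based inclusive exon coordinates and a toy genome string with the intron in lower case; `spliced_sequence` is a hypothetical helper, not an Ensembl function.

```python
def spliced_sequence(genome, exons):
    """Join exon sequences cut from `genome`.

    `exons` is a list of (start, end) pairs, 1-based and inclusive.
    Assumes a forward-strand gene; a reverse-strand gene would also need
    reverse-complementing, which this sketch does not do.
    """
    # Sort by start so exons are joined in genomic order, then convert
    # 1-based inclusive coordinates to Python's 0-based half-open slices.
    return "".join(genome[start - 1:end] for start, end in sorted(exons))


# Toy genome: exon 1 (1-6), intron in lower case (7-17), exon 2 (18-23).
genome = "ATGAAAgtaagtattagCCCTAA"

print(spliced_sequence(genome, [(1, 6), (18, 23)]))  # ATGAAACCCTAA
```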

  18. Too many pieces… [Figure: the genome with aligned cDNA and protein evidence over three exons, showing coding, untranslated and untranslated+coding regions]

  19. Ensembl shows one transcript with underlying evidence
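One way to picture the step from "too many pieces" to a single transcript is to collapse overlapping evidence blocks into non-redundant exon intervals. This is only a toy, assuming all blocks are 1-based inclusive (start, end) pairs from one locus on one strand. A real gene build must also respect splice sites and keep alternative isoforms apart, which simple merging does not do; `merge_evidence` and the coordinates are hypothetical.

```python
def merge_evidence(blocks):
    """Collapse overlapping or abutting (start, end) blocks, 1-based inclusive."""
    merged = []
    for start, end in sorted(blocks):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical aligned pieces from several cDNAs/proteins at one locus.
evidence = [(100, 180), (150, 220), (400, 520), (410, 500), (800, 900)]
print(merge_evidence(evidence))  # [(100, 220), (400, 520), (800, 900)]
```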

  20. Ensembl Compared with Swiss-Prot and NCBI RefSeq sequences

  21. Is there any consensus? • NCBI RefSeq set ≠ UniProt set • Ensembl combines these sets • UCSC has its own gene set • How do we come up with a consensus gene set between all these?

  22. CCDS • Reaching a consensus coding sequence (CCDS) set for human and mouse • 19,851 (ENS human), 17,679 (ENS mouse) (as of Sept 2009) • If you see a CCDS ID, the coding sequence is agreed upon. • Genome Res. 2009 Jul;19(7):1316-23. Epub 2009 Jun 4
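The idea that a coding sequence is "agreed upon" can be illustrated as a set intersection: a CDS counts as consensus only if every annotation set contains exactly the same coding exon coordinates. This is a simplified sketch, assuming each set maps a transcript ID to a tuple of (start, end) coding exons; the helper, IDs and coordinates are made up, and the real CCDS process is more involved.

```python
def consensus_cds(*annotation_sets):
    """Return CDS structures that appear, identically, in every annotation set.

    Each set maps a transcript ID to a tuple of (start, end) coding exons.
    Only the coordinates are compared; transcript IDs differ between groups.
    """
    structures = [set(s.values()) for s in annotation_sets]
    return set.intersection(*structures)


# Hypothetical annotation sets from two groups.
ensembl_like = {"ens_1": ((100, 200), (300, 400)), "ens_2": ((100, 200), (350, 400))}
refseq_like = {"ref_1": ((100, 200), (300, 400)), "ref_2": ((500, 600),)}

print(consensus_cds(ensembl_like, refseq_like))  # {((100, 200), (300, 400))}
```

Comparing coordinates rather than IDs is the key design point: each group names its transcripts differently, so only the structure itself can show agreement.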

  23. VEGA/Havana • Ensembl: automatic annotation pipeline, building genes for the whole genome at once • Havana: manual curation on a case-by-case basis • VEGA: Vertebrate Genome Annotation

  24. Genes and Transcripts in Ensembl High Quality: • CCDS transcripts • Ensembl/Havana merged transcripts

  25. Ensembl/Havana • Transcripts come from Ensembl, from Havana, or from an Ensembl/Havana merge
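When choosing a transcript, the quality tiers on slides 24 and 25 suggest a simple preference order. The Python sketch below encodes one such order: a CCDS ID first, then an Ensembl/Havana merged transcript, then, as an arbitrary tie-breaker of our own, the longer transcript. The `Transcript` fields, source labels and example records are hypothetical and do not mirror Ensembl's database schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    transcript_id: str
    source: str             # hypothetical labels: "ensembl", "havana", "ensembl_havana"
    ccds_id: Optional[str]  # set when the coding sequence has a CCDS ID
    length: int


def pick_transcript(transcripts):
    """Prefer a CCDS ID, then an Ensembl/Havana merge, then the longest transcript."""
    def score(t):
        # Tuples compare element by element, so earlier criteria dominate.
        return (t.ccds_id is not None, t.source == "ensembl_havana", t.length)
    return max(transcripts, key=score)


candidates = [
    Transcript("tx1", "ensembl", None, 3000),
    Transcript("tx2", "ensembl_havana", None, 2000),
    Transcript("tx3", "havana", "CCDS_hypothetical", 1500),
]
print(pick_transcript(candidates).transcript_id)  # tx3
```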

  26. Gene Names in Ensembl • ENSG### Ensembl gene ID • ENST### Ensembl transcript ID • ENSP### Ensembl peptide ID • ENSE### Ensembl exon ID • For species other than human, a species code is inserted after "ENS": MUS (Mus musculus) for mouse gives ENSMUSG###, DAR (Danio rerio) for zebrafish gives ENSDARG###, etc.
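The ID convention above is regular enough to parse. This sketch assumes only what the slide states: "ENS", an optional species code, one feature letter (G, T, P or E), then digits. The species table holds just the examples given, so other codes are reported as unknown, and IDs with anything after the digits are rejected. `parse_stable_id` is a hypothetical helper and the example IDs are merely format-valid.

```python
import re

# Feature letters and species codes from the slide; not an exhaustive list.
FEATURE = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}
SPECIES = {"": "Homo sapiens", "MUS": "Mus musculus", "DAR": "Danio rerio"}

# "ENS", optional species code, one feature letter, then digits.
# The greedy [A-Z]* backtracks, so the feature letter is the last letter before the digits.
STABLE_ID = re.compile(r"^ENS([A-Z]*)([GTPE])(\d+)$")


def parse_stable_id(stable_id):
    """Return (species, feature type) for an Ensembl-style ID, or None if it doesn't match."""
    m = STABLE_ID.match(stable_id)
    if m is None:
        return None
    code, letter, _digits = m.groups()
    return SPECIES.get(code, f"unknown species code {code!r}"), FEATURE[letter]


print(parse_stable_id("ENSG00000139618"))     # ('Homo sapiens', 'gene')
print(parse_stable_id("ENSMUST00000000001"))  # ('Mus musculus', 'transcript')
```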

  27. How is all this information organised? • Ensembl views (website) • Ensembl database (open source) • BioMart (data-mining tool)

  28. What other annotation? • Non-coding (nc)RNAs • IDs in other databases • Microarray probes, clone sets, BAC maps • Other features of the genome: repeats, CpG islands • Homologues and whole-genome alignments: orthologues and paralogues, protein families, syntenic regions • Variation data: single nucleotide polymorphisms (SNPs), indels, CNVs • Regulatory data (a first guess at promoter and enhancer elements) • Data from external sources (DAS)
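Some of these features can be computed directly from sequence. For CpG islands, the sketch below applies the classic Gardiner-Garden and Frommer thresholds (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6) to a single window. These thresholds are an illustrative assumption, not necessarily the method or parameters Ensembl uses to annotate CpG islands; the helper names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA window."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    # str.count is non-overlapping, which is fine: "CG" cannot overlap itself.
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n
    # Expected CpG count under independence is c * g / n.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the length, GC-content and CpG observed/expected thresholds."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and obs_exp > min_obs_exp


print(cpg_stats("CG" * 100))              # (1.0, 2.0)
print(looks_like_cpg_island("AT" * 100))  # False
```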

  29. Subjects • Why do we have genome browsers? • Why Ensembl? • Ensembl genes and genomes • Where to go for help?

  30. Help and Information • Comments and questions? helpdesk@ensembl.org • Check out our tutorials page: www.ensembl.org/info/website/tutorials/index.html • Videos: http://www.youtube.com/user/EnsemblHelpdesk • Mailing list: ensembl-announce@ebi.ac.uk • Come visit our blog! http://ensembl.blogspot.com/ • FTP site: ftp://ftp.ensembl.org • Amazon Web Services: http://aws.amazon.com/publicdatasets

  31. Ensembl Team • Ensembl's 10th year. Nucleic Acids Res. 2010. http://www.ncbi.nlm.nih.gov/pubmed/19906699
