
Genomes with Ensembl


Presentation Transcript


  1. Genomes with Ensembl Dr. Giulietta M. Spudich European Bioinformatics Institute Hinxton, UK

  2. Today • Introduction to the Ensembl project • Walk-through of the browser • BioMart • Variation • Comparative Genomics

  3. Introduction to Ensembl Why do we have genome browsers? Why Ensembl? Ensembl genes and genomes Help and tutorials

  4. Genome browsers provide a map [figure labels: histone modification, DNase I sensitive site, conserved sequence, gene, allele] Figure adapted from the ENCODE project www.nature.com/nature/focus/encode/

  5. Genome Browsers • Ensembl Genome browser http://www.ensembl.org • NCBI Map Viewer http://www.ncbi.nlm.nih.gov/mapview/ • UCSC Genome Browser http://genome.ucsc.edu

  6. What Distinguishes Ensembl from the UCSC and NCBI Browsers? • The gene set. Automatic annotation based on mRNA and protein information. • Programmatic access via the Perl API (open source) • BioMart • Integration with other databases (DAS) • Comparative analysis (gene trees)

  7. Subjects Why do we have genome browsers? Why Ensembl? How can we extract data from Ensembl? Where can I find help?

  8. To meet a challenge… Ensembl’s AIM: To provide annotation for the biological community that is freely available and of high quality • Started in 2000 • Joint project between EBI and Sanger • Funded primarily by the Wellcome Trust, additional funding by EMBL, NIH-NIAID, EU, BBSRC and MRC

  9. Vertebrates are available • Non-chordates: D. melanogaster, C. elegans, S. cerevisiae • Extension to other genomes (plants, microorganisms, …): www.ensemblgenomes.org

  10. Extending Ensembl across the taxonomic space [tree-of-life figure; domain labels: Archaea, Eukaryota, Bacteria] • 48 chordates, including: human, mouse, zebrafish, chicken, chimpanzee, pig, platypus • 21 species: Drosophila (12), Caenorhabditis (5), Anopheles gambiae • 8 species: Arabidopsis thaliana, Arabidopsis lyrata, Oryza sativa • 8 Aspergillus • 2 yeasts: S. cerevisiae, S. pombe • 134 species • 6 bacterial clades • 1 prokaryotic clade • 3 Plasmodium: falciparum, knowlesi, vivax • Slide design by Jeff Almeida-King • Tree from F. D. Ciccarelli, T. Doerks, C. von Mering, C. J. Creevey, B. Snel & P. Bork. Towards automatic reconstruction of a highly resolved tree of life. Science, 3 March 2006.

  11. Exploring genomes • Vertebrates focus: www.ensembl.org • Other species: www.ensemblgenomes.org

  12. Subjects Why do we have genome browsers? Why Ensembl? Ensembl (vertebrate) genes & genomes Help and tutorials

  13. What is known? Genomic assemblies from sequencing consortia

  14. What is known? Proteins and cDNA/mRNA sequences from the research community found in: • UniProt/Swiss-Prot (manually curated) • UniProt/TrEMBL www.uniprot.org • NCBI RefSeq (manually curated) www.ncbi.nlm.nih.gov/RefSeq

  15. Combining genes and genomes [figure: three exons over the genomic sequence …tgcctgttag...; legend: coding, untranslated, untranslated+coding]

  16. Too many pieces… [figure: cDNA and protein aligned to the genome; three exons; legend: coding, untranslated, untranslated+coding]

  17. Ensembl shows one transcript with underlying evidence

  18. VEGA/Havana • Ensembl: automatic annotation pipeline, gene building all at once (whole genome) • Havana: manual curation on a case-by-case basis • VEGA: Vertebrate Genome Annotation

  19. HAVANA http://www.sanger.ac.uk/HGP/havana/

  20. Genes and Transcripts in Ensembl • Ensembl known transcripts • Ensembl novel transcripts • Ensembl merged transcripts (Havana) • EST clusters • More manual curation (SGD, WormBase, FlyBase)

  21. Ensembl/Havana • Transcripts are labelled as one of: Ensembl, Havana, Ensembl/Havana merge

  22. Names in Ensembl • ENSG### Ensembl Gene ID • ENST### Ensembl Transcript ID • ENSP### Ensembl Peptide ID • ENSE### Ensembl Exon ID • For species other than human, a species code is added after ENS: MUS (Mus musculus) for mouse: ENSMUSG###; DAR (Danio rerio) for zebrafish: ENSDARG###; etc.
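
The naming scheme on this slide can be checked mechanically. The following is a minimal Python sketch, not part of the Ensembl API: the function name `parse_stable_id`, the regular expression, and the lookup tables are assumptions made for illustration. It covers only the four feature letters and the two species codes listed above, and any other species code is returned unchanged.

```python
import re

# Feature-type letters listed on the slide (G, T, P, E).
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# Species codes named on the slide; human IDs carry no code.
# Any other code is passed through as-is (assumption for this sketch).
SPECIES_CODES = {"": "human", "MUS": "mouse", "DAR": "zebrafish"}

# ENS + optional species code + feature letter + digits.
STABLE_ID = re.compile(r"^ENS([A-Z]{3,4})?([GTPE])(\d+)$")


def parse_stable_id(stable_id):
    """Split an Ensembl-style stable ID into species, feature type and number."""
    match = STABLE_ID.match(stable_id)
    if match is None:
        raise ValueError(f"not a recognised stable ID: {stable_id!r}")
    code = match.group(1) or ""
    return {
        "species": SPECIES_CODES.get(code, code),
        "feature": FEATURE_TYPES[match.group(2)],
        "number": match.group(3),
    }


if __name__ == "__main__":
    for example in ("ENSG00000139618", "ENSMUSG00000017167", "ENSDART00000000042"):
        print(example, parse_stable_id(example))
```

The example IDs in the `__main__` block are there only to exercise the pattern; the sketch does not check that they exist in any Ensembl release.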

  23. Low-coverage genomes • High-coverage sequencing is time-consuming and expensive • BAC clones (>10x): Human, Mouse, Zebrafish • Whole Genome Shotgun (6x): Chimp, Rat, Chicken,... • Low (~2x) coverage genome sequencing • Faster, cheaper, but only useful when annotated • Assembled into lots of “scaffolds” • “Classic” Ensembl gene-build would result in many partial and fragmented genes
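
To see why ~2x coverage leaves an assembly in many pieces, a rough estimate helps. The sketch below is not from the slides: it assumes reads land uniformly at random (the standard Lander-Waterman idealisation), under which the expected fraction of bases covered by at least one read at coverage c is 1 - e^(-c). The function name is hypothetical.

```python
import math


def expected_fraction_covered(coverage):
    """Expected fraction of the genome hit by at least one read,
    assuming reads are placed uniformly at random (Poisson model)."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # Coverage depths mentioned on the slide: ~2x, 6x and 10x.
    for depth in (2, 6, 10):
        fraction = expected_fraction_covered(depth)
        print(f"{depth}x coverage: about {fraction:.2%} of bases covered")
```

Under this idealised model, roughly one base in seven is missed at 2x, which is consistent with the slide's point that such assemblies break into many scaffolds. Real data deviate from the model because of repeats and sequencing bias.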

  24. Some 2X genomes

  25. Low-Coverage Gene-Build • Whole-genome alignment to an annotated high-quality reference genome • Guided re-ordering of scaffolds • Annotation of longer, more complete gene structures

  26. 2X Genebuild [figure labels: human genome, human gene, cat scaffold 1, cat scaffold 2, NNNNNN, human or dog gene (projected)]
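
The idea of guided scaffold re-ordering can be illustrated with a toy example. This is only a sketch of the general idea, not the Ensembl gene-build pipeline: the function names, the input format (scaffold name paired with its start position on the reference) and the sequences are all made up for illustration. Scaffolds are sorted by where they align on the reference and then joined with a run of N characters marking the unsequenced gap, as in the NNNNNN label on the slide.

```python
def order_scaffolds(alignments):
    """Return scaffold names sorted by their start position on the reference.

    `alignments` is a list of (scaffold_name, reference_start) pairs,
    a hypothetical summary of a whole-genome alignment.
    """
    return [name for name, _start in sorted(alignments, key=lambda pair: pair[1])]


def join_with_gaps(sequences, ordered_names, gap="NNNNNN"):
    """Concatenate scaffold sequences in the given order, separated by N gaps."""
    return gap.join(sequences[name] for name in ordered_names)


if __name__ == "__main__":
    # Toy data: two cat scaffolds with invented reference positions.
    alignments = [("cat_scaffold_2", 5000), ("cat_scaffold_1", 1200)]
    sequences = {"cat_scaffold_1": "ACGT", "cat_scaffold_2": "TTGA"}

    order = order_scaffolds(alignments)
    print(order)
    print(join_with_gaps(sequences, order))
```

A gene model projected from the reference can then span both scaffolds, which is what allows longer, more complete structures than annotating each scaffold in isolation. A real pipeline would also need to handle strand, overlapping alignments and rearrangements, which this sketch ignores.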

  27. What other annotation? • Non-coding (nc)RNAs • IDs in other databases • microarray probes, clonesets, BAC maps • Other features of the genome: • repeats, CpG islands • Comparative data: • orthologues and paralogues, protein families, whole genome alignments, syntenic regions • Variation data: • SNPs, InDels • Regulatory data (a first guess at promoter and enhancer elements) • Data from external sources (DAS)

  28. Sources of Variation • NCBI dbSNP • Import: alleles, flanking sequence, frequencies • Calculate: position, transcript effect • http://www.ncbi.nlm.nih.gov/SNP/ • For human also: • HGVbase • Affy GeneChip 100K and 500K Mapping Array • Affy Genome-Wide SNP array 6.0 • Ensembl-called SNPs (from Celera reads and the genomes of Jim Watson and Craig Venter) • For mouse, rat, dog and chicken also: • Sanger- and Ensembl-called SNPs (other strains / breeds) • STAR Project for rat, other projects

  29. External Sources: large-scale variations in… • DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources • DGV loci: Database of Genomic Variants (CNVs, inversions, InDels)

  30. Subjects Why do we have genome browsers? Why Ensembl? Ensembl genes and genomes Help and tutorials

  31. How is this information organised? • Ensembl Views (Website) • Ensembl Database (open source) • BioMart ‘DataMining tool’

  32. Help and Information • Comments and questions? helpdesk@ensembl.org • Check out our tutorials page: www.ensembl.org/info/website/tutorials/index.html • Videos: http://www.youtube.com/user/EnsemblHelpdesk • Mailing list: ensembl-announce@ebi.ac.uk • Come visit our blog! http://ensembl.blogspot.com/ • FTP site: ftp://ftp.ensembl.org • Amazon Web Services: http://aws.amazon.com/publicdatasets

  33. Ensembl Team

  34. The Wellcome Trust Genome Campus
