Genomes with Ensembl. Dr. Giulietta M. Spudich European Bioinformatics Institute Hinxton, UK. Today. Introduction to the Ensembl project Walk-through of the browser BioMart Variation Comparative Genomics. Introduction to Ensembl. Why do we have genome browsers? Why Ensembl?
Genomes with Ensembl Dr. Giulietta M. Spudich European Bioinformatics Institute Hinxton, UK
Today • Introduction to the Ensembl project • Walk-through of the browser • BioMart • Variation • Comparative Genomics
Introduction to Ensembl Why do we have genome browsers? Why Ensembl? Ensembl genes and genomes Help and tutorials
Genome browsers provide a map. Figure labels: histone modification, DNase I sensitive site, conserved sequence, gene, allele. Figure adapted from the ENCODE project www.nature.com/nature/focus/encode/
Genome Browsers • Ensembl Genome browser http://www.ensembl.org • NCBI Map Viewer http://www.ncbi.nlm.nih.gov/mapview/ • UCSC Genome Browser http://genome.ucsc.edu
What Distinguishes Ensembl from the UCSC and NCBI Browsers? • The gene set. Automatic annotation based on mRNA and protein information. • Programmatic access via the Perl API (open source) • BioMart • Integration with other databases (DAS) • Comparative analysis (gene trees)
Subjects Why do we have genome browsers? Why Ensembl? How can we extract data from Ensembl? Where can I find help?
To meet a challenge… Ensembl’s AIM: To provide annotation for the biological community that is freely available and of high quality • Started in 2000 • Joint project between EBI and Sanger • Funded primarily by the Wellcome Trust, additional funding by EMBL, NIH-NIAID, EU, BBSRC and MRC
Vertebrates are available Extension to other genomes: Plants, Microorganisms,… www.ensemblgenomes.org Non-chordates: D. melanogaster C. elegans S. cerevisiae
Extending Ensembl across the taxonomic space Archaea 48 Chordates including: Human, Mouse, Zebrafish, Chicken, Chimpanzee, Pig, Platypus 21 species Drosophila (12), Caenorhabditis (5), Anopheles gambiae 8 species Arabidopsis thaliana, Arabidopsis lyrata, Oryza sativa • 8 Aspergillus • 2 yeast • S. cerevisiae • S. pombe Eukaryota • 134 species • 6 bacterial clades • 1 prokaryotic clade 3 Plasmodia: falciparum, knowlesi, vivax Bacteria Slide design by Jeff Almeida-King. Tree: F. D. Ciccarelli, T. Doerks, C. von Mering, C. J. Creevey, B. Snel & P. Bork. Towards automatic reconstruction of a highly resolved tree of life. Science, 3 March 2006.
Exploring genomes • Vertebrates focus: www.ensembl.org • Other species: www.ensemblgenomes.org
Subjects Why do we have genome browsers? Why Ensembl? Ensembl (vertebrate) genes & genomes Help and tutorials
What is known? Genomic assemblies from sequencing consortia
What is known? Proteins and cDNA/mRNA sequences from the research community found in: • UniProt/Swiss-Prot (manually curated) • UniProt/TrEMBL www.uniprot.org • NCBI RefSeq (manually curated) www.ncbi.nlm.nih.gov/RefSeq
Combining genes and genomes. Figure labels: Exon, Coding, Untranslated, Untranslated+Coding; genomic sequence …tgcctgttag...
Too many pieces… Figure labels: Genome, Aligned cDNA and protein, Exon, Coding, Untranslated, Untranslated+Coding
Ensembl shows one transcript with underlying evidence
Ensembl and VEGA/Havana • Ensembl: automatic annotation pipeline, gene building all at once (whole genome) • Havana: manual curation on a case-by-case basis • VEGA: Vertebrate Genome Annotation
HAVANA http://www.sanger.ac.uk/HGP/havana/
Genes and Transcripts in Ensembl • Ensembl known transcripts • Ensembl novel transcripts • Ensembl merged transcripts (Havana) • EST clusters • More manual curation (SGD, WormBase, FlyBase)
Ensembl/Havana • Transcripts are labelled: Ensembl Havana Ensembl/Havana merge
Names in Ensembl • ENSG### Ensembl Gene ID • ENST### Ensembl Transcript ID • ENSP### Ensembl Peptide ID • ENSE### Ensembl Exon ID • For species other than human, a species code is inserted after ENS: MUS (Mus musculus) for mouse: ENSMUSG###; DAR (Danio rerio) for zebrafish: ENSDARG###, etc.
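The naming scheme above can be taken apart mechanically. The following is a minimal sketch, assuming the pattern is "ENS", an optional three-letter species code, one feature letter (G, T, P or E) and a run of digits. The function name `parse_stable_id` is hypothetical, and the species table holds only the two codes shown on the slide; Ensembl uses many more.

```python
import re

# Feature letters listed on the slide.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# Only the codes shown on the slide; an empty code means human.
SPECIES_CODES = {"": "Homo sapiens", "MUS": "Mus musculus", "DAR": "Danio rerio"}

# Assumption: ENS + optional three-letter species code + feature letter + digits.
STABLE_ID = re.compile(r"^ENS([A-Z]{3})?([GTPE])(\d+)$")


def parse_stable_id(stable_id):
    """Split an Ensembl-style stable ID into species, feature type and number."""
    match = STABLE_ID.match(stable_id)
    if match is None:
        raise ValueError("not an Ensembl-style stable ID: " + stable_id)
    code = match.group(1) or ""
    return {
        "species": SPECIES_CODES.get(code, "unknown species code " + code),
        "feature": FEATURE_TYPES[match.group(2)],
        "number": match.group(3),
    }


if __name__ == "__main__":
    # The numeric parts below are arbitrary illustrations of the format.
    for example in ("ENSG00000000001", "ENSMUSG00000000001", "ENSDART00000000001"):
        print(example, parse_stable_id(example))
```

A lookup table keeps the species mapping explicit, so an unrecognised code is reported rather than guessed.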
Low-coverage genomes • High-coverage sequencing is time-consuming and expensive • BAC clones (>10x): Human, Mouse, Zebrafish • Whole Genome Shotgun (6x): Chimp, Rat, Chicken,... • Low (~2x) coverage genome sequencing • Faster, cheaper, but only useful when annotated • Assembled into lots of “scaffolds” • “Classic” Ensembl gene-build would result in many partial and fragmented genes
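To see why ~2x coverage leaves fragmented assemblies, a rough calculation helps. The sketch below assumes the idealised Lander-Waterman model, in which reads land uniformly at random, so the expected fraction of bases never sequenced at coverage c is e^(-c). This model is an assumption added for illustration, not something stated on the slide, and real sequencing is less uniform.

```python
import math


def coverage(num_reads, read_length, genome_size):
    """Average sequencing depth: total bases read divided by genome size."""
    return num_reads * read_length / genome_size


def uncovered_fraction(depth):
    """Expected fraction of bases hit by no read, under the Poisson model."""
    return math.exp(-depth)


if __name__ == "__main__":
    for depth in (2, 6, 10):
        share = uncovered_fraction(depth)
        print(f"{depth}x coverage: about {share:.2%} of bases expected to be missed")
```

Under this model roughly 13.5% of bases are missed at 2x, which is enough to break most genes across several scaffolds.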
Low-Coverage Gene-Build Whole Genome Alignment to an annotated high-quality reference genome Guided re-ordering of scaffolds Annotation of longer, more complete gene structures
2X Genebuild. Figure labels: Human genome, Human gene, Cat scaffold 1, NNNNNN, Cat scaffold 2, Human or dog gene (projected)
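The guided re-ordering step can be pictured as sorting scaffolds by where they align on the reference genome and joining them with a run of Ns. This is only a toy sketch: the `ScaffoldHit` type, the function names and all coordinates are hypothetical, and reading the NNNNNN in the figure as a gap filler is an assumption. The actual pipeline works from whole genome alignments and is considerably more involved.

```python
from typing import NamedTuple


class ScaffoldHit(NamedTuple):
    """Hypothetical record: where one scaffold aligns on the reference."""
    scaffold: str
    ref_chrom: str
    ref_start: int
    ref_end: int


def order_scaffolds(hits):
    """Return scaffold names sorted by reference chromosome and start position."""
    ranked = sorted(hits, key=lambda hit: (hit.ref_chrom, hit.ref_start))
    return [hit.scaffold for hit in ranked]


def join_scaffolds(sequences, order, gap="NNNNNN"):
    """Concatenate scaffold sequences in the given order, separated by a gap of Ns."""
    return gap.join(sequences[name] for name in order)


if __name__ == "__main__":
    hits = [
        ScaffoldHit("cat_scaffold_2", "chr1", 5000, 9000),
        ScaffoldHit("cat_scaffold_1", "chr1", 1000, 4000),
    ]
    sequences = {"cat_scaffold_1": "ACGT", "cat_scaffold_2": "TTGA"}
    order = order_scaffolds(hits)
    print(order)
    print(join_scaffolds(sequences, order))
```

Once the scaffolds are in reference order, a gene from the reference can be projected across the joined sequence, which is how longer gene structures become possible.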
What other annotation? • Non-coding (nc)RNAs • IDs in other databases • microarray probes, clonesets, BAC maps • Other features of the genome: • repeats, CpG islands • Comparative data: • orthologues and paralogues, protein families, whole genome alignments, syntenic regions • Variation data: • SNPs, InDels • Regulatory data (a first guess at promoter and enhancer elements) • Data from external sources (DAS)
Sources of Variation • NCBI dbSNP • Import: alleles, flanking sequence, frequencies • Calculate: position, transcript effect • http://www.ncbi.nlm.nih.gov/SNP/ • For human also: • HGVbase • Affy GeneChip 100K and 500K Mapping Array • Affy Genome-Wide SNP array 6.0 • Ensembl-called SNPs (from Celera reads and Jim Watson’s and Craig Venter’s genomes) • For mouse, rat, dog and chicken also: • Sanger- and Ensembl-called SNPs (other strains / breeds) • STAR Project for rat, other projects
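"Calculate: transcript effect" means working out what a variant does to the encoded protein. A much simplified sketch of the idea, for a single-base change inside one codon, is shown below using the standard genetic code. The function name `snp_effect` and the four category labels are illustrative assumptions; Ensembl's own consequence calculation covers many more cases (splice sites, UTRs, frameshifts and so on).

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def snp_effect(ref_codon, offset, alt_base):
    """Classify a single-base substitution at `offset` (0, 1 or 2) in a codon."""
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gained"
    if ref_aa == "*":
        return "stop lost"
    return "non-synonymous"


if __name__ == "__main__":
    print(snp_effect("GAA", 2, "G"))  # GAA -> GAG
    print(snp_effect("GAG", 1, "T"))  # GAG -> GTG
    print(snp_effect("TGG", 2, "A"))  # TGG -> TGA
```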
External Sources Large-scale variations in… DECIPHER • Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources DGV loci • Database of Genomic Variants • CNVs, Inversions, InDels
Subjects Why do we have genome browsers? Why Ensembl? Ensembl genes and genomes Help and tutorials
How is this information organised? • Ensembl Views (Website) • Ensembl Database (open source) • BioMart ‘DataMining tool’
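BioMart can also be queried without the web interface by sending it a small XML document that names a dataset, some filters and the attributes to return. The sketch below only builds such a document and does not contact any server. The dataset, filter and attribute names in the example (`hsapiens_gene_ensembl`, `chromosome_name`, `ensembl_gene_id`, `external_gene_name`) are assumptions for illustration and should be checked against the BioMart instance in use; `build_biomart_query` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Build a BioMart-style XML query string (nothing is sent over the network)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    dataset_node = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict which records are returned.
    for name, value in filters.items():
        ET.SubElement(dataset_node, "Filter", {"name": name, "value": value})
    # Attributes are the columns of the result table.
    for name in attributes:
        ET.SubElement(dataset_node, "Attribute", {"name": name})
    header = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return header + ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    print(build_biomart_query(
        "hsapiens_gene_ensembl",                    # assumed dataset name
        {"chromosome_name": "1"},                   # assumed filter name
        ["ensembl_gene_id", "external_gene_name"],  # assumed attribute names
    ))
```

The resulting string is the kind of document the BioMart web interface can export from a query built by hand, so building one in the browser first is a good way to confirm the names.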
Help and Information • Comments and questions? helpdesk@ensembl.org • Check out our tutorials page: www.ensembl.org/info/website/tutorials/index.html • Videos: http://www.youtube.com/user/EnsemblHelpdesk • Mailing list: ensembl-announce@ebi.ac.uk • Come visit our blog! http://ensembl.blogspot.com/ • FTP site: ftp://ftp.ensembl.org • Amazon Web Services: http://aws.amazon.com/publicdatasets