How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November 2004
Schedule Today: • Introduction to the Ensembl system • Hands-on examples to introduce the system • Evaluating genes and transcripts • Variation in Ensembl (SNPs, haplotypes) Tomorrow: • Data mining with EnsMart • Comparative genomics and proteomics in Ensembl • BioMart • Advanced topics (Upload your own data, DAS)
Assembly • From 325,109 initial contigs to 26,720 overlapping clones • Other ordering data • Non-redundant, “virtual contig” view
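The idea of a non-redundant “virtual contig” view over overlapping clones can be illustrated with a small sketch. This is not Ensembl's assembly code: the clone names, coordinates and the `virtual_contig` function are hypothetical, and the rule used here (walk the clones left to right and keep only the part of each clone that is not already covered) is a deliberate simplification.

```python
from typing import List, Tuple

# (clone name, start, end) in map coordinates; end is exclusive.
# All names and coordinates below are made up for illustration.
Clone = Tuple[str, int, int]


def virtual_contig(clones: List[Clone]) -> List[Clone]:
    """Return a non-redundant path through overlapping clones.

    Clones are visited in order of start position. For each clone only
    the stretch beyond what earlier clones already cover is kept, so
    every base of the region appears exactly once in the result.
    """
    path: List[Clone] = []
    covered_to = None  # right-most coordinate already in the path
    for name, start, end in sorted(clones, key=lambda c: (c[1], c[2])):
        if covered_to is None or start > covered_to:
            use_from = start        # first clone, or a gap before this clone
        else:
            use_from = covered_to   # skip the part that overlaps earlier clones
        if end > use_from:          # clone contributes new sequence
            path.append((name, use_from, end))
            covered_to = end
    return path


example_clones = [
    ("cloneA", 0, 150),
    ("cloneB", 100, 260),
    ("cloneC", 120, 200),   # fully contained in cloneB, contributes nothing
    ("cloneD", 250, 400),
]
example_path = virtual_contig(example_clones)
```

In this toy input, `cloneC` lies entirely inside sequence already covered and is dropped, while the other three clones each contribute one non-overlapping stretch.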
Mapping and Sequencing the human genome. Diagram labels: BACs (bacterial artificial chromosomes, avg size 150 kb); fragment; map; pUCs (avg size 2-4 kb); WGS sequence assembly; draft; finished BAC. References on slide: Shizuya et al 1992; Dib et al 1996; Deloukas et al 1998; Osoegawa et al 2001; Bentley et al 2001; Bruls et al 2001; McPherson et al 2001; Montgomery et al 2001; Tilford et al 2001
Status of the human sequence • finished (red/orange) ~96% (99.999% accurate) • 30-40% repetitive elements (e.g. Alpha satellite, Alu repeats) • All known genes correctly identified (99.74%) • heterochromatin (grey) ~4% • Assembled draft sequence totals 2.85 Gb
Human genome: Current status • 22,287 ‘gene loci’ defined, consisting of 19,599 protein-coding genes in the human genome and 2,188 additional DNA segments ‘predicted’ to be protein-coding genes • 1,183 genes ‘were born’ in the last 60-100 My • ~30 genes ‘died’ in a similar time period. Source: Finishing the euchromatic sequence of the human genome, Nature 431:931-45 (2004)
Ensembl - project aims • funded to provide metazoan genomes to the world • aims to provide the world’s best automated genome annotation • a leading group for human and mouse analysis • all software, data and results freely available
Ensembl - project background • group split between EBI and Sanger • mainly Wellcome Trust funded • largest dedicated compute in biology in Europe • developer community > 100 people, including companies
Ensembl – Open source • Freely available • Community development • >51 Ensembl installs worldwide • Both public and commercial, e.g. Gramene (CSHL), Fugu-sg (ICMB), Ciona-sg (Temasek)
Diagram labels: SNP; Manual Annotation; Ensembl; Supporting Databases; Final DB; Analysis DB; CPU
Genome browsing: why present the whole genome? • Explore what is in a chromosome region • See features in and around a specific gene • Search & retrieve across the whole genome • Investigate genome organization • Compare to other genomes
Genome browsers • Ensembl – public site + installable system: http://www.ensembl.org • UCSC Human Genome Browser: http://genome.ucsc.edu • NCBI Map Viewer: http://www.ncbi.nlm.nih.gov/mapview
Introduction to the Ensembl web site Ensembl … • takes genomic sequence assemblies (human build 34, mouse, rat, Fugu, mosquito) • adds annotation and links (automated process) • presents all the data on a web site
Annotation: genes Known genes: • where? • genomic structure? • transcript(s)? • protein(s)? • orthologues? • attach useful links Novel genes: • how to predict? • require evidence • transcript(s)? • protein(s)? • orthologues? • attach useful links
Annotation: other features • markers and SNPs • cytogenetic bands • repeated sequences • ESTs & other sequence records: where do they show sequence similarity? • regions homologous to other species
How to get started … • Species homepage • Site map • Map View • Text search • BLAST • SSAHA • Disease View
AnchorView MapView
Regions, maps and markers • ContigView • CytoView • SyntenyView • MultiContigView • MarkerView • SNPView
ContigView close-up • Customising & short cuts • Transcripts: red & black (Ensembl predictions), blue (Vega) • Evidence • Pop-up menu
ContigView - Chromosome 20 close-up • Manual annotation via Vega • Forward strand • Ensembl predictions • Reverse strand • Ensembl EST-based predictions. Other chromosomes with manual annotation from http://vega.sanger.ac.uk: 6, 7, 9, 10, 13, 14, 20, 22, X
MarkerView SNPView
Genes & gene products • GeneView • TransView • ExonView • ProteinView • FamilyView • DomainView • GOView • DiseaseView
ExonView TransView
Data retrieval • EnsMart • Export View • Data sets on ftp site • MySQL queries of databases • Perl API access to databases
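As a rough illustration of what a direct database query for “what is in this chromosome region” might look like, here is a minimal sketch. It uses Python's built-in SQLite module as a stand-in for MySQL so that it runs without a server. The `gene` table, its columns and the example rows are hypothetical and do not reproduce the Ensembl schema; consult the Ensembl schema documentation before writing real queries.

```python
import sqlite3


def genes_in_region(conn, chromosome, start, end):
    """Return (name, chr_start, chr_end) for genes overlapping a region.

    Two intervals overlap when each one starts before the other ends,
    which is what the WHERE clause expresses.
    """
    sql = (
        "SELECT name, chr_start, chr_end FROM gene "
        "WHERE chromosome = ? AND chr_start <= ? AND chr_end >= ? "
        "ORDER BY chr_start"
    )
    return conn.execute(sql, (chromosome, end, start)).fetchall()


# Hypothetical toy table; NOT the real Ensembl schema or real data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene ("
    "name TEXT, chromosome TEXT, chr_start INTEGER, chr_end INTEGER)"
)
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?, ?)",
    [
        ("geneA", "20", 1000, 5000),
        ("geneB", "20", 8000, 12000),
        ("geneC", "21", 2000, 3000),
    ],
)

hits = genes_in_region(conn, "20", 4000, 9000)
```

The query parameters are passed separately from the SQL text rather than pasted into the string, which is the usual way to avoid quoting mistakes when region coordinates come from user input.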
Mouse differences • Genomic sequence assembly based on whole genome shotgun, with finished ‘stitched’ BACs • BACs are shown in CytoView (FPC map), but for most no sequence is available
Help! • context sensitive help pages - click • access other documentation via generic home page • email the helpdesk (HelpDesk / Suggestions)
Thanks Ensembl Team