How to access genomic information using Ensembl August 2005
Status of the human sequence
• Finished (red/orange): ~96% (99.999% accurate)
• 30-40% repetitive elements (e.g. alpha satellite, Alu repeats)
• All known genes correctly identified (99.74%)
• Heterochromatin (grey): ~4%
• Assembled draft sequence totals 2.85 Gb
Finishing the euchromatic sequence of the human genome, Nature 431:931-45 (2004)
[Figure: diagram labelled SNP, Manual Annotation, Ensembl, Supporting Databases, Final DB, Analysis DB, CPU]
Genome browsing: why present the whole genome?
• Explore what is in a chromosome region
• See features in and around a specific gene
• Search & retrieve across the whole genome
• Investigate genome organization
• Compare to other genomes
Genome browsers
• Ensembl (public site + installable system): http://www.ensembl.org
• UCSC Human Genome Browser: http://genome.ucsc.edu
• NCBI Map Viewer: http://www.ncbi.nlm.nih.gov/mapview
Introduction to the Ensembl web site
Ensembl …
• takes genomic sequence assemblies (human build 35, mouse, rat, mosquito…)
• adds annotation and links (automated process)
• presents all the data on a web site
Basic Genome Annotation
• Genes
• Genomic location
• Gene model structures
• Exons
• Introns
• UTRs
• Transcript(s)
• Pseudogenes
• Non-coding RNA
• Protein(s)
• Links to other sources of information
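The relationship between genes, transcripts, exons and introns listed above can be sketched as a small data structure. This is only an illustration, not how Ensembl stores its data: the class names, the gene name `GENE_X` and all coordinates are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Transcript:
    name: str
    # Exons as (start, end) genomic coordinates, sorted by start position.
    exons: List[Tuple[int, int]]

    def introns(self) -> List[Tuple[int, int]]:
        """Return the gaps between consecutive exons."""
        return [
            (prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(self.exons, self.exons[1:])
        ]


@dataclass
class Gene:
    name: str
    chromosome: str
    transcripts: List[Transcript] = field(default_factory=list)

    def location(self) -> Tuple[str, int, int]:
        """Genomic location spanning every exon of every transcript."""
        starts = [start for t in self.transcripts for start, _ in t.exons]
        ends = [end for t in self.transcripts for _, end in t.exons]
        return (self.chromosome, min(starts), max(ends))


# Hypothetical gene with made-up coordinates, for illustration only.
gene = Gene(
    "GENE_X",
    "20",
    [
        Transcript("GENE_X-001", [(100, 200), (301, 400), (551, 700)]),
        Transcript("GENE_X-002", [(100, 200), (551, 700)]),
    ],
)

print(gene.location())
print(gene.transcripts[0].introns())
```

One gene can carry several transcripts that share exons, which is why the genomic location is computed over all of them rather than stored once per transcript.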
Advanced Genome Annotation
• Cytogenetic bands
• Polymorphic markers
• Sequence Tagged Sites (STS)
• Genetic variation
• Single Nucleotide Polymorphisms (SNPs)
• Deletion-Insertion Polymorphisms (DIPs)
• Short Tandem Repeats (STRs)
• Repetitive sequences
• Expressed Sequence Tags (ESTs)
• cDNAs or mRNAs from related species
• Regions of sequence homology
How to get started …
• Species homepage
• Map View
• Text search
• BLAST
• SSAHA
BLAST and SSAHA
See BLAST hit on genome
BLAST and SSAHA practical
Query sequence: http://genome.imim.es/~nlopez/UVIC/seq.fas
Practical:
• On which chromosome do you get the best hit?
• Explore the alignment of the query sequence with the genome.
• Is this the sequence of a gene? If so, which one?
• Explore the region around this sequence.
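Picking the "best hit" from a list of search results can be sketched as below. This is a minimal sketch under stated assumptions: the hit records, field names and all values are made up for illustration and are not the output of the practical's query sequence, and "best" is assumed to mean the lowest E-value, with the higher score breaking ties.

```python
from typing import Dict, List

# Hypothetical hits; chromosome names, scores and E-values are invented.
HITS: List[Dict] = [
    {"chromosome": "2", "start": 1500, "end": 1790, "score": 310.0, "e_value": 1e-80},
    {"chromosome": "7", "start": 42000, "end": 42600, "score": 1120.0, "e_value": 0.0},
    {"chromosome": "X", "start": 900, "end": 1010, "score": 95.0, "e_value": 2e-18},
]


def best_hit(hits: List[Dict]) -> Dict:
    """Return the hit with the lowest E-value; a higher score breaks ties."""
    return min(hits, key=lambda h: (h["e_value"], -h["score"]))


top = best_hit(HITS)
print(f"Best hit on chromosome {top['chromosome']}: {top['start']}-{top['end']}")
```

The coordinates of the chosen hit are what you would then use to explore the surrounding region in the browser.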
Regions, maps and markers: ContigView, CytoView, SyntenyView, MultiContigView, MarkerView, SNPView, GeneSNPView
ContigView close-up
• Transcripts: red & black (Ensembl predictions), blue (Vega)
• Pop-up menu
ContigView - Chromosome 20 close-up
• Manual annotation via Vega
• Ensembl predictions
• Ensembl EST-based predictions
Chromosomes with manual annotation (http://vega.sanger.ac.uk): 1, 6, 7, 9, 10, 13, 14, 16, 18, 19, 20, 22, X and Y
Genes & gene products: GeneView, TransView, ExonView, ProteinView, FamilyView, DomainView, GOView, DiseaseView
TransView ExonView
Ensembl practical
• Type the name of your favorite gene (e.g. BRCA2) and explore all the sections of Ensembl for this gene.
• Does this gene have an ortholog in mouse?
• How many different transcripts are known for this gene?
• How many exons does the longest transcript have?
• Which functional annotations does this gene have? (Hint: check the GO annotations.)
• Can you find SNPs in this gene?
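Once the transcript and exon tables have been read off the web pages, the transcript questions above reduce to simple counting. The sketch below assumes the exon coordinates have been copied by hand into a dictionary. The transcript names and coordinates are hypothetical and are not real BRCA2 data, and "longest" is assumed to mean the largest total exon length.

```python
from typing import Dict, List, Tuple

# Hypothetical transcripts: name -> list of (start, end) exon coordinates.
TRANSCRIPTS: Dict[str, List[Tuple[int, int]]] = {
    "T1": [(1, 100), (201, 300), (401, 450)],
    "T2": [(1, 100), (201, 500)],
}


def transcript_length(exons: List[Tuple[int, int]]) -> int:
    """Total length of all exons, with inclusive coordinates."""
    return sum(end - start + 1 for start, end in exons)


def summarize(transcripts: Dict[str, List[Tuple[int, int]]]) -> Dict:
    """Count the transcripts and report the exon count of the longest one."""
    longest = max(transcripts, key=lambda name: transcript_length(transcripts[name]))
    return {
        "n_transcripts": len(transcripts),
        "longest": longest,
        "exons_in_longest": len(transcripts[longest]),
    }


print(summarize(TRANSCRIPTS))
```

Note that the longest transcript is not necessarily the one with the most exons, which is why the two are computed separately.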