Ensembl Training Xosé Mª Fernández European Bioinformatics Institute Sept 2008 April 2006
Ensembl Training • Ensembl • What can we offer • EBI Training • Logistics
Ensembl - Project • Joint project • EMBL – European Bioinformatics Institute (EBI) • Wellcome Trust Sanger Institute • Produce accurate, automatic genome annotation • Focused on selected eukaryotic genomes • Integrate external (distributed) biological data • Presentation of the analysis to all via the Web at http://www.ensembl.org • Open distribution of the analysis to the community • Development of open, collaborative software (databases and APIs)
Current Status • 21,916 protein-coding genes in the human genome (Ensembl release 50), with additional segments ‘predicted’ to be protein-coding genes • Assemblies: NCBI34, NCBI35, NCBI36
The era of sequencing genomes • Anopheles gambiae • Aedes aegypti • Drosophila melanogaster • Dasypus novemcinctus • Loxodonta africana • Echinops telfairi • Tupaia belangeri • Homo sapiens • Pan troglodytes • Macaca mulatta • Otolemur garnettii • Mus musculus • Rattus norvegicus • Spermophilus tridecemlineatus • Cavia porcellus • Oryctolagus cuniculus • Erinaceus europaeus • Myotis lucifugus • Canis familiaris • Felis catus • Bos taurus • Monodelphis domestica • Ornithorhynchus anatinus • Gallus gallus • Xenopus tropicalis • Gasterosteus aculeatus • Oryzias latipes • Takifugu rubripes • Tetraodon nigroviridis • Danio rerio • Ciona intestinalis • Ciona savignyi • Caenorhabditis elegans • Saccharomyces cerevisiae
How can we help? • We want to know how we can help you to make the most of Ensembl: • Workshops to train users • What data do you use? (e.g. clinical cytogeneticists use Ensembl to design FISH probes; exploring adding additional DAS tracks) • Help you share information using DAS • Publish ‘case study’ or ‘protocol’ papers in journals widely used by the community • Attend conferences with hands-on sessions • Share bookmarks and configurations by setting up groups (with specific profiles for clinical molecular geneticists)
1800 bps in chr 11… ............................cccgtggagccacaccctagggttggccaatc tactcccaggagcagggagggcaggagccagggctgggcataaaagtcagggcagagcca tctattgcttgcaggagccagggctgggcataaaagtcagggcagagccatctattgctt ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATC TGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAG TTGGTGGTGAGGCCCTGGGCAGGGTTGGTATCAAGGTTACAAGACAGGTTTAAGGAGACC AATAGAAACTGGGCATGTGGAGACAGAGAAGACTCTTGGGTTTCTGATAGGCACTGACTC TCTCTGCCTATTGGTCTATTTTCCCACCCTTAGCTGCTGGTGGTCTACCCTTGGACCCAG AGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAG GTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGAC AACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT CCTGAGAACTTCAGGgtgagtctatgggacgcttgatgttttctttccccttcttttcta tggttaagttcatgtcataggaaggggataagtaacagggtacagtttagaatgggaaac agacgaatgattgcatcagtgtggaagtctcaggatcgttttagtttcttttatttgctg ttcataacaattgttttcttttgtttaattcttgctttctttttttttcttctccgcaat ttttactattatacttaatgccttaacattgtgtataacaaaaggaaatatctctgagat acattaagtaacttaaaaaaaaactttacacagtctgcctagtacattactatttggaat atatgtgtgcttatttgcatattcataatctccctactttattttcttttatttttaatt gatacataatcattatacatatttatgggttaaagtgtaatgttttaatatgtgtacaca tattgaccaaatcagggtaattttgcatttgtaattttaaaaaatgctttcttcttttaa tatacttttttgtttatcttatttctaatactttccctaatctctttctttcagggcaat aatgatacaatgtatcatgcctctttgcaccattctaaagaataacagtgataatttctg ggttaaggcaatagcaatatctctgcatataaatatttctgcatataaattgtaactgat gtaagaggtttcatattgctaatagcagctacaatccagctaccattctgcttttatttt atggttgggataaggctggattattctgagtccaagctaggcccttttgctaatcatgtt catacctcttatcttcctcccacagCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCA TCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGG TGTGGCTAATGCCCTGGCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATTTCT ATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAAGGGCC TTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGCaatgatgtatttaa attatttctgaatattttactaaaaagggaatgtgggaggtcagtg.............. 
............................cccgtggagccacaccctagggttggccaatc tactcccaggagcagggagggcaggagccagggctgggcataaaagtcagggcagagcca tctattgcttgcaggagccagggctgggcataaaagtcagggcagagccatctattgctt acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcatc tgactcctgaggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaag ttggtggtgaggccctgggcagggttggtatcaaggttacaagacaggtttaaggagacc aatagaaactgggcatgtggagacagagaagactcttgggtttctgataggcactgactc tctctgcctattggtctattttcccacccttagctgctggtggtctacccttggacccag aggttctttgagtcctttggggatctgtccactcctgatgctgttatgggcaaccctaag gtgaaggctcatggcaagaaagtgctcggtgcctttagtgatggcctggctcacctggac aacctcaagggcacctttgccacactgagtgagctgcactgtgacaagctgcacgtggat cctgagaacttcagggtgagtctatgggacgcttgatgttttctttccccttcttttcta tggttaagttcatgtcataggaaggggataagtaacagggtacagtttagaatgggaaac agacgaatgattgcatcagtgtggaagtctcaggatcgttttagtttcttttatttgctg ttcataacaattgttttcttttgtttaattcttgctttctttttttttcttctccgcaat ttttactattatacttaatgccttaacattgtgtataacaaaaggaaatatctctgagat acattaagtaacttaaaaaaaaactttacacagtctgcctagtacattactatttggaat atatgtgtgcttatttgcatattcataatctccctactttattttcttttatttttaatt gatacataatcattatacatatttatgggttaaagtgtaatgttttaatatgtgtacaca tattgaccaaatcagggtaattttgcatttgtaattttaaaaaatgctttcttcttttaa tatacttttttgtttatcttatttctaatactttccctaatctctttctttcagggcaat aatgatacaatgtatcatgcctctttgcaccattctaaagaataacagtgataatttctg ggttaaggcaatagcaatatctctgcatataaatatttctgcatataaattgtaactgat gtaagaggtttcatattgctaatagcagctacaatccagctaccattctgcttttatttt atggttgggataaggctggattattctgagtccaagctaggcccttttgctaatcatgtt catacctcttatcttcctcccacagctcctgggcaacgtgctggtctgtgtgctggccca tcactttggcaaagaattcaccccaccagtgcaggctgcctatcagaaagtggtggctgg tgtggctaatgccctggcccacaagtatcactaagctcgctttcttgctgtccaatttct attaaaggttcctttgttccctaagtccaactactaaactgggggatattatgaagggcc ttgagcatctggattctgcctaataaaaaacatttattttcattgcaatgatgtatttaa attatttctgaatattttactaaaaagggaatgtgggaggtcagtg..............
1800 bps in chr 11… • CAP • Enhancer • Promoter • Poly(A)
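The annotated copy of the 1800 bp sequence above mixes upper- and lower-case letters, whereas the plain copy is all lower case. The slides do not say what the case encodes; a common convention, assumed here, is that upper case marks annotated (for example exonic) sequence. The following is a minimal sketch of how such runs could be located. The function name `case_runs` and the shortened `toy` string are illustrative only and are not part of the slides.

```python
from itertools import groupby


def case_runs(seq):
    """Split a sequence into consecutive upper-case and lower-case runs.

    Returns (start, end, is_upper) tuples with 0-based, end-exclusive
    coordinates. Non-letter characters (such as the '.' padding on the
    slide) are dropped first, so coordinates refer to letters only.
    """
    letters = "".join(c for c in seq if c.isalpha())
    runs = []
    start = 0
    for is_upper, group in groupby(letters, key=str.isupper):
        length = sum(1 for _ in group)
        runs.append((start, start + length, is_upper))
        start += length
    return runs


# Shortened toy string in the style of the slide (not the full sequence).
toy = "..ccgtACATTTGCTTgtgagtctCTCCTGGGCaatg.."

for start, end, is_upper in case_runs(toy):
    label = "upper" if is_upper else "lower"
    print(f"{start:>3} {end:>3} {label}")
```

Pasting the full slide sequence in place of `toy` would list every upper-case block with its coordinates, which can then be compared against the CAP, promoter and poly(A) labels.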
Compara Multiple Alignments Constrained elements Syntenies
Functional Genomics • Integrates diverse genome-wide functional and epigenetic data to annotate the active genome • Components: • Ensembl Regulatory Build • ChIP-seq analysis • DNA Methylation resources • External collaborative projects
ENCODE: Providing a map of the genome • Pilot project completed in 2007: 1% of human genome • Assess every possible computational and experimental approach • Comparative genomics, sequencing, expression, ChIP-chip, etc. • Summary of results: • Majority of human bases are transcribed • Identification of many novel non-protein-coding transcripts • Identification of transcription start sites • Deciphering enhancer and regulatory regions of the genome • Regulatory elements are on either side of the transcription start site • Chromatin accessibility and histone modification patterns are very predictive of presence and activity of transcription start sites • DNA replication timing correlates with chromatin structure • ENCODE2 (started 2008) extended to 100% of genome
Gene concept post-ENCODE Gene as discrete unit • Union of genomic sequences encoding a coherent set of potentially overlapping functional products. • Statistical model to help interpret and provide concise summarisation to potentially noisy experimental data.
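The "union of genomic sequences" wording can be read, in a deliberately simplified way, as an interval union: take the genomic intervals that encode each functional product and merge those that overlap. The sketch below illustrates only that reading. The function name `union_of_intervals` and the coordinates are made up for illustration, and the real definition also requires the products to form a coherent set, which this code does not model.

```python
def union_of_intervals(intervals):
    """Merge overlapping or directly adjacent (start, end) intervals.

    Coordinates are 1-based and inclusive, so (100, 250) and (251, 300)
    are adjacent and are merged into (100, 300).
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical intervals for overlapping functional products.
products = [(100, 250), (200, 400), (900, 1000), (950, 1200), (401, 450)]

footprint = union_of_intervals(products)
total_bases = sum(end - start + 1 for start, end in footprint)

print(footprint)
print(total_bases)
```

Sorting first keeps the merge to a single pass, since each interval only ever needs to be compared with the most recently merged one.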
SNPs in Ensembl • GeneSNPView • Gene Variation Report • Variations in region of gene • Variations and consequences
Jim, Craig, YanHuang No 1, Marjolein… Jimome vs Craigome • Craig Venter: • Sequence & analysis since 2003 • 32 mill seq (20 billion bp) • More variability than anticipated • Jim Watson: • 454 technology (7.4x) • 100 mill unpaired reads (25 billion bps) • $1,000,000 • “The Diploid Genome Sequence of an Individual Human” PLoS Biology 5(10): 2113-2144 (2007) • “The Complete Genome of an Individual by Massively Parallel DNA Sequencing” Nature 452: 872-876 (2008)
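The read and base counts quoted above allow some quick back-of-envelope arithmetic. The sketch below derives the mean read length for each project and the genome size implied by the quoted 7.4x coverage. It uses the rounded slide figures only, and it assumes that "32 mill seq" means 32 million sequence reads, so the results are rough illustrations rather than published values. The function names are illustrative.

```python
def mean_read_length(total_bases, n_reads):
    """Average number of bases per read."""
    return total_bases / n_reads


def implied_genome_size(total_bases, fold_coverage):
    """Genome size at which total_bases would give the stated fold coverage."""
    return total_bases / fold_coverage


# Rounded figures as quoted on the slide.
venter_reads, venter_bases = 32_000_000, 20_000_000_000
watson_reads, watson_bases = 100_000_000, 25_000_000_000
watson_coverage = 7.4

venter_len = mean_read_length(venter_bases, venter_reads)
watson_len = mean_read_length(watson_bases, watson_reads)
watson_genome = implied_genome_size(watson_bases, watson_coverage)

print(f"Venter mean read length: {venter_len:.0f} bp")
print(f"Watson mean read length: {watson_len:.0f} bp")
print(f"Genome size implied by 7.4x: {watson_genome / 1e9:.2f} Gb")
```

The contrast in mean read length reflects the different sequencing technologies: fewer, longer reads for the Venter genome against many more, shorter 454 reads for the Watson genome.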
Spot the difference • Venter TTCTTCATTGGGCCGAACTTTCTGGTCCTCATCCAACAGCTCTTCTATCAYGTGTTCGAAAGTGTCAGCCAATGATGTCAAGCCTCTTGAACCTGCCTTGGGCCCATTCACGCTCTCCAGAGTCCCATGGGTCCGCACACCTGGGTAGGCCAAGCCACCTTGTCCTCGGATGTTTGCTTCTTTCATGGGGGCAGCCTTCATGCAACCAAAGTATGAAATAACCATAGTAAGGAAAAGGATGGTCATCACTCTTCTCACCTGGTGGAACTGTAGGGAGAAAGCAGAAACAAGACAGAAAACTGGTTAGGGCTTTCTTTCACCGGGATGCCATGTGGCCCATCTGATTGTAATTCCAGGCCATTCT • Watson TTCTTCATTGGGCCGAACTTTCTGGTCCTCATCCAACAGCTCTTCTATCATGTGTTCGAAAGTGTCAGCCAATGATGTCAAGCCTCTTGAACCTGCCTTGGGCCCATTCACGCTCTCCAGAGTCCCATGGGTCCGCACACCTGGGTAGGCCAAGCCACCTTGTCCTCGGATGTTTGCTTCTTTCATGGGGGCAGCCTTCATGCAACCAAAGTATGAAATAACCATAGTAAGGAAAAGGATGGTCATCACTCTTCTCACCTGGTGGAACTGTAGGGAGAAAGCAGAAACAAGACAGAAAACTGGTTAGGGCTTTCTTTCACCGGGATGCCATGTGGCCCATCTGATTGTAATTCCAGGCCATTCT
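One way to spot the difference is to compare the two sequences position by position. In the short excerpt used below, taken from the start of the sequences above, the Venter sequence carries a Y where the Watson sequence has T. Y is the IUPAC ambiguity code for "C or T", which would be consistent with a heterozygous position in a diploid genome, although the slide itself does not state this. The function names are illustrative; the `IUPAC` table is the standard nucleotide code.

```python
# Standard IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def differences(seq_a, seq_b):
    """Return (index, symbol_a, symbol_b) for every mismatching position.

    Indices are 0-based. The sequences must already be aligned and of
    equal length; no gaps are handled.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def compatible(symbol_a, symbol_b):
    """True if the two IUPAC symbols share at least one possible base."""
    return bool(set(IUPAC[symbol_a]) & set(IUPAC[symbol_b]))


# Short excerpts around the differing position.
venter = "TCTTCTATCAYGTGTTCG"
watson = "TCTTCTATCATGTGTTCG"

for index, a, b in differences(venter, watson):
    note = "compatible" if compatible(a, b) else "incompatible"
    print(f"position {index}: Venter {a} vs Watson {b} ({note})")
```

Running the same comparison on the full sequences only requires replacing the two excerpt strings.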
TranscriptSNPView • SNP in different strains • Variations and consequences • Individual genotypes • Variations in region of gene
HapMap “The International HapMap Project” Nature 426, 789-796 (18 Dec 2003)
European Genotype Archive http://www.ebi.ac.uk/ega/
Ensembl Training • Ensembl • What can we offer • EBI Training • Logistics
Databases at EBI • Literature and ontologies: CitExplore, GO • Nomenclature: HGNC • Genomes: Ensembl, Integr8 • Nucleotide sequence: EMBL Archive • Proteomes: UniProt, PRIDE • Gene expression: ArrayExpress • Protein structure: ePDB • Protein families, motifs and domains: InterPro • Chemical entities: ChEBI • Protein interactions: IntAct • Pathways: Reactome • Systems: BioModels
A tripartite user-training programme • Training any time, anywhere, at any pace • Training comes to you • Hands-on user training on all our core data resources for lab-based researchers
Interactive training for all levels of experience • Hands-on training in our purpose-built IT training suite at EMBL-EBI, Hinxton, Cambridge • Learn from the EBI’s experts through a combination of talks and practical exercises • Take a two-day tour of all our core data resources, or focus in on specific data types • Full programme at www.ebi.ac.uk/training/handson
Coming up in our Hands-on Training • 2008 • 6–8 October: A two-day dip into the EBI’s resources • 24–27 November: Programmatic access in Java: webservices and workflows • 2009 • 19–22 January: Transcriptomics resources and data analysis • 23–26 February: Bioinformatics resources for protein structure • 16–18 March: Sequence to genes: genome informatics • 27–29 April: Programmatic access to biological databases • 11–15 May: A walk through EBI Bioinformatics Resources
The Bioinformatics Roadshow • Supported under the EU Integrated Infrastructures Initiative FELICS (www.felics.org) • FELICS provides access to many of Europe’s most widely used data resources: EBI, Swiss Institute of Bioinformatics, BRENDA, and the European Patent Office • We provide hands-on training in a wide variety of data resources and tools, where you want it, when you want it and targeted to your organization’s needs • For more information see www.ebi.ac.uk/training/roadshow or e-mail copeland@ebi.ac.uk
eLearning platform • Courses available: • Ensembl • Sequence searching • Courses under development: • ArrayExpress • UniProt • MSD/PDBe • PRIDE • Gene Ontology • Literature searching and mining • Patent searching
Each course is modular • A course contains 3–5 modules (~30 min each) • Modules contain… • Video tutorial: learn by watching and listening • Print tutorial: learn by reading • Quiz: learn by testing your understanding • Reflective task: learn by practicing
Roadshow modules • Genomes: Ensembl, EMBL-Bank • Structures: MSD, PDBSum, ProFunc • Transcriptomes: ArrayExpress, Expression Profiler • Proteomes: UniProt, InterPro, IntAct, PRIDE, OLS • Mini modules: Web services; BioMart; SRS; Chemistry; GO/GOA; Alignments; Literature • Pathways: Reactome, BioModels
Ensembl Training • Ensembl • What can we offer • EBI Training • Logistics
Workshops 2007-2008 • UK: 32 • US: 13 (+) • Germany: 7 • Netherlands: 6 • Portugal: 6 • China: 5 • Belgium: 5 • Kenya: 3 • South Africa: 3 (+) • Spain: 2 • Norway: 2