
Investigating Genomes with Ensembl

Drs. Bert Overduin and Giulietta Spudich



  1. Investigating Genomes with Ensembl Drs. Bert Overduin and Giulietta Spudich

  2. Overview of the day • Introduction and website walk-through • Hands-on exercises (the browser) • Tea/Coffee • Introduction to BioMart • Hands-on exercises (BioMart) • Lunch • Determining the gene set • Hands-on exercises (gene set) • Tea/Coffee • Variations presentation and hands-on

  3. Introducing… • Genome browsing: a comparison • Consensus genes • Ensembl annotation and software • How to find help

  4. Sequencing the genome [Figure labels: histone modification, DNase I sensitive site, conserved sequence, gene, SNP]

  5. What can we learn about genomes? • Within one genome: regulatory elements, gene order, chromatin structure… • Through comparative studies: Evolution, conserved regions, rearrangements… Gene quality and prediction.

  6. Genome Browsers Today • Ensembl Genome browser http://www.ensembl.org • NCBI Map Viewer http://www.ncbi.nlm.nih.gov/mapview/ • UCSC Genome Browser http://genome.ucsc.edu

  7. Ensembl Genome Browser

  8. NCBI Map Viewer

  9. UCSC Genome Browser

  10. What Distinguishes Ensembl from the UCSC and NCBI Browsers? • The gene set. Automatic annotation based on mRNA and protein information. • Programmatic access via the Perl API (open source) • BioMart • Integration with other databases (DAS) • Comparative analysis (gene trees)

  11. Challenges of genome browsers • Increasing sequence information: 198,879,188,987 nt (Aug 2007)

  12. Challenges of genome browsers • Increasing annotation: ENCODE • Pilot project completed in 2007: 1% of the human genome • Discovered that promoter elements lie on either side of the transcription start site

  13. To meet a challenge… Ensembl’s AIM: To provide annotation for the biological community that is freely available and of high quality • Started in 1999 • Joint project between EBI and Sanger • Funded primarily by the Wellcome Trust, additional funding by EMBL, NIH-NIAID, EU, BBSRC and MRC • Team of ca. 40 people, led by Ewan Birney (EBI) and Tim Hubbard (Sanger)

  14. The Ensembl gene set • All Ensembl genes start from a known protein or mRNA • An initial alignment of protein and mRNA to the genome begins the ‘Genebuild’. [Diagram labels: sequence assembly, mRNAs, protein, Ensembl gene set]

  15. Have you heard of… • Ensembl – strives for best possible gene set www.ensembl.org • Havana (VEGA) – same goal http://vega.sanger.ac.uk • HGNC – a unique name and symbol for every gene in human http://www.genenames.org/ • UniProt – focus on proteins, and functional information www.uniprot.org

  16. Ensembl vs Havana annotation. All genes at once (Ensembl Genebuild): • Quick, keeps current • Consistent annotation • Can apply rules to more species. Gene by gene (Havana/VEGA): • Flexible, can deal with inconsistencies • Consult publications as well as databases • ‘Out of the Ordinary’ Biology • However… Slow, Expensive

  17. Merging sets • Havana transcripts are incorporated into Ensembl • UniProt proteins are aligned to the genome in the Ensembl genebuild • UniProt imports Ensembl peptides for human • HGNC moved to Hinxton… coordination

  18. Consensus across genome browsers: the CCDS set http://www.ensembl.org/info/about/docs/ccds.html • A protein is deposited into the ‘Consensus CDS protein set’, or CCDS set, if NCBI, UCSC, Havana and Ensembl have all determined the same sequence.
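
The agreement rule on this slide can be sketched in a few lines of Python. This is only a toy illustration, not the actual CCDS pipeline: the function name `consensus_cds`, the locus labels and the short sequences are all invented for the example, and "same sequence" is modelled as plain string equality.

```python
from typing import Dict

# The four groups named on the slide.
GROUPS = ("NCBI", "UCSC", "Havana", "Ensembl")


def consensus_cds(annotations: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Return the loci for which all four groups report an identical sequence.

    `annotations` maps a group name to a {locus: sequence} dictionary.
    This data layout is an assumption made for the sketch.
    """
    # A locus can only be a consensus entry if every group annotated it.
    shared_loci = set.intersection(*(set(annotations[g]) for g in GROUPS))

    consensus = {}
    for locus in sorted(shared_loci):
        sequences = {annotations[g][locus] for g in GROUPS}
        if len(sequences) == 1:  # all four groups agree exactly
            consensus[locus] = sequences.pop()
    return consensus


# Invented toy data: locusA agrees everywhere, locusB differs in Havana,
# locusC is annotated by Ensembl only.
toy_annotations = {
    "NCBI": {"locusA": "ATGAAATAG", "locusB": "ATGCCCTAG"},
    "UCSC": {"locusA": "ATGAAATAG", "locusB": "ATGCCCTAG"},
    "Havana": {"locusA": "ATGAAATAG", "locusB": "ATGCCGTAG"},
    "Ensembl": {"locusA": "ATGAAATAG", "locusB": "ATGCCCTAG", "locusC": "ATGTAG"},
}

print(consensus_cds(toy_annotations))
```

With this toy input only locusA qualifies, because it is the only locus that every group annotated with the same sequence.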

  19. More about Ensembl… • Genome browsing: a comparison • Consensus genes • Ensembl annotation and software • How to find help

  20. Ensembl Genes – biological basis. All Ensembl gene predictions are based on proteins and mRNAs in: • UniProt/Swiss-Prot (manually curated) • UniProt/TrEMBL • NCBI RefSeq (manually curated) [Diagram labels: protein/mRNA, sequence assembly, Ensembl genes]

  21. Genes and Transcripts in Ensembl • Ensembl known genes or transcripts • Ensembl novel genes or transcripts • Ensembl EST genes or transcripts. Non-Ensembl genes: • Imports for yeast, C. elegans, fly, mosquito, Takifugu and Tetraodon

  22. Names in Ensembl • ENSG### Ensembl Gene ID • ENST### Ensembl Transcript ID • ENSP### Ensembl Peptide ID • ENSE### Ensembl Exon ID • For species other than human, a species code is added after ENS: MUS (Mus musculus) for mouse: ENSMUSG###; DAR (Danio rerio) for zebrafish: ENSDARG###, etc.
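
As a rough illustration of this naming scheme, the Python sketch below splits an identifier into its species code, feature type and number. The function `parse_ensembl_id`, the regular expression and the tiny species table are assumptions made for the example (only the three species named on the slide are included), and the numeric parts in the usage lines are placeholders rather than real identifiers.

```python
import re
from typing import NamedTuple, Optional

# Feature letters listed on the slide.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# Only the species mentioned on the slide; an empty code means human.
SPECIES_CODES = {"": "human", "MUS": "mouse", "DAR": "zebrafish"}

# ENS + optional species code + one feature letter + digits.
# The greedy [A-Z]* backtracks so that the last letter is the feature type.
_ID_PATTERN = re.compile(r"^ENS([A-Z]*)([GTPE])(\d+)$")


class EnsemblId(NamedTuple):
    species_code: str
    species: Optional[str]  # None when the code is not in SPECIES_CODES
    feature: str
    number: str


def parse_ensembl_id(stable_id: str) -> EnsemblId:
    """Split an Ensembl-style identifier into its parts."""
    match = _ID_PATTERN.match(stable_id)
    if match is None:
        raise ValueError(f"not an Ensembl-style identifier: {stable_id!r}")
    code, letter, digits = match.groups()
    return EnsemblId(code, SPECIES_CODES.get(code), FEATURE_TYPES[letter], digits)


# Placeholder numbers, used only to show the shape of the identifiers.
print(parse_ensembl_id("ENSG00000000001"))
print(parse_ensembl_id("ENSMUSG00000000001"))
print(parse_ensembl_id("ENSDART00000000001"))
```

Species codes outside the small table still parse, but their `species` field is left as `None` rather than guessed.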

  23. Gene Structure in Ensembl • Calmodulin, chicken: no UTRs • Calmodulin, human: UTRs annotated

  24. What annotation is available? • Gene/transcript/peptide models (coding and noncoding (ncRNAs)) • IDs in other databases • Mapped cDNAs, peptides, microarray probes, BAC clones etc. • Cytogenetic bands, markers, repeats etc. • Comparative data: orthologues and paralogues, protein families, whole genome alignments, syntenic regions • Variation data: Single Nucleotide Polymorphisms (SNPs) • Regulatory data: “best guess” set of regulatory elements from ENCODE • Data from external sources (DAS)

  25. Specific data sources • Microarrays (Affymetrix, Illumina, Agilent) • GO (Gene Ontology: functional classes) http://www.geneontology.org/ • OMIM (human diseases and phenotypes) http://www.ncbi.nlm.nih.gov/sites/entrez?db=OMIM • Identifiers in Entrez, UniProt, RefSeq, etc. • PDB, MSD (structural databases) http://www.rcsb.org/pdb/ http://www.ebi.ac.uk/msd/

  26. InterPro • Collection of protein data: sequences, motifs, structures http://www.ebi.ac.uk/interpro/

  27. How is this information organised? • Ensembl Views (Website) • Ensembl Database (open source) (Perl API, FTP site) • BioMart ‘DataMining tool’

  28. Ensembl – Open Source • Data and software freely available • More than 50 installs worldwide • Academia and industry • Local or available via the web • Mirrors with Ensembl data, e.g. http://ensembl.genome.tugraz.at/index.html http://ensembl.genomics.org.cn/, or user projects with own data

  29. Powered by Ensembl

  30. Help and Information • Use our helpdesk! helpdesk@ensembl.org • View our help pages! (the ‘using Ensembl’ link) • View our animated tutorials http://www.ensembl.org/common/Workshops_Online • Mailing lists: ensembl-announce@ebi.ac.uk • Come visit our blog! http://ensembl.blogspot.com/

  31. Ensembl Team
