1 / 74

Dave Clements GMOD Help Desk US National Evolutionary Synthesis Center (NESCent)

Database Tools for Biologists. Pre-conference workshop. Dave Clements GMOD Help Desk US National Evolutionary Synthesis Center (NESCent) clements@nescent.org. Sponsored by. Bioinformatics Australia 2009 28 October 2009, 2-6 pm. This talk is available in two formats. PowerPoint

michi
Download Presentation

Dave Clements GMOD Help Desk US National Evolutionary Synthesis Center (NESCent)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Database Tools for Biologists Pre-conference workshop Dave Clements GMOD Help Desk US National Evolutionary Synthesis Center (NESCent) clements@nescent.org Sponsored by Bioinformatics Australia 2009 28 October 2009, 2-6 pm

  2. This talk is available in two formats • PowerPoint • This format has notes and it works particularly well for: • the "Software" slide that lists all components covered in this talk • the "Chado Modules" slide Both of these have animation, and notes that go with it. ftp://ftp.gmod.org/pub/gmod/Meetings/2009/BA/BA2009GMODWorkshop.ppt • PDF ftp://ftp.gmod.org/pub/gmod/Meetings/2009/BA/BA2009GMODWorkshop.pdf

  3. Agenda

  4. GMOD is … • A set of interoperable open-source software components for visualizing, annotating, and managing biological data. • An active community of developers and users asking diverse questions, and facing common challenges, with their biological data.

  5. Workshop Page http://gmod.org/wiki/BA2009

  6. Agenda

  7. Software GMOD components can be categorised as Visualization V Data Management D Annotation A

  8. Software Annotation MAKER A A DIYA A Galaxy A Ergatis A GBrowse V Visualization JBrowse V V CMap V Chado D Tripal A V GBrowse_syn GMODWeb V V Sybil V Textpresso SynView A V Apollo A Table Edit V A BioMart D InterMine Data Management D D GMOD Has You have Sequence Gene models Mapping data Alternative transcripts Expression SNP / variation Methylation GO terms Stocks / lines Publications / Attribution Orthology

  9. GBrowse GMOD's leading genome browser Landing page for E. coli example Overview: chromosome / contig wide Region: intermediate zoom Details: current area Tracks: current configuration The generic genome browser: a building block for a model organism system database. Stein LD et al. (2002) Genome Res 12: 1599-610

  10. GBrowse Example: modENCODE • Uses GBrowse 2 http://www.modencode.org/gb2/gbrowse/fly/

  11. GBrowse Tutorials • GBrowse User Tutorial at OpenHelix • Flash based, has handouts, very snazzy and thorough • Great resource for your users • GBrowse Admin Tutorial • HTML based, written by Lincoln Stein, mostly • Excellent way to learn how to configure GBrowse • GBrowse Admin Tutorial w/ VMware Image • From the 2009 GMOD Summer Schools • Gives you a system to start with • NGS in GBrowse and SAMtools tutorial w/ VMware • New, today http://gmod.org/wiki/GBrowse_Tutorial

  12. SAMtools Platform neutral set of programs and file formats specifically for short reads. Heng Li, et al., http://samtools.sf.net

  13. SAM and BAM • SAM file format • Platform neutral mapping / alignment data • Text, human readable • Many mapping algorithms now produce SAM output • BAM file format • Binary, compressed, indexed, and streamable version of SAM. • SAMtools provides scripts to manipulate these

  14. Visualising NGS Data in GBrowse w/ SAMtools • Worked example • Start with a system that has prerequisite software already installed. • Including default GBrowse 2 install • Using VMware • Sometime in the next 4 weeks, I'll post • VMware image we started with today • VMware image we ended with today • Detailed instructions on how we got there. http://gmod.org/wiki/GBrowse_NGS_Tutorial

  15. GBrowse Future Plans • Circular genome support • 2.0, Release in 2010 • Database and rendering multiplexing • Asynchronous track loading • GBrowse in the cloud • User authentication • 1.x has a few more maintenance releases left.

  16. GBrowse Resources

  17. JBrowse • GMOD's 2nd generation genome browser • It's fast • Completely new • Client side rendering • Heavily AJAX • JSON, Nested Containment Lists JBrowse: A next-generation genome browser, Mitchell E. Skinner, Andrew V. Uzilov, Lincoln D. Stein, Christopher J. Mungall and Ian H. Holmes, Genome Res. 2009. 19: 1630-1638

  18. JBrowse Demo http://jbrowse.org

  19. JBrowse Future Plans • Tools for migrating from GBrowse • An ecosystem comparable to GBrowse • Glyph library, user defined glyphs, callbacks, track sharing, … • Comparative genomics (more on that later) • Community Annotation • User authentication • User uploadable and sharable tracks and annotation

  20. JBrowse Resources

  21. GBrowse or JBrowse GBrowse Robust ecosystem Feature rich Large and growing user base Track sharing JBrowse Very fast Rapidly growing user base Lots of future development Easy to configure

  22. GBrowse_syn • GBrowse based comparative genomics viewer • Shows a reference sequence compared to 2 or more others • Can also show any GBrowse-based annotations Examle comparing C. elegans to 4 other species at Wormbase Sheldon McKay, Cold Spring Harbor Laboratory

  23. GBrowse_syn Future Work • Integration with GBrowse 2 • High-level graphical overview • AJAX based user interface and navigation. • Submitting grant next week proposing implementing a JBrowse based synteny browser

  24. GBrowse_syn Resources

  25. SynView and Sybil Sybil SynView Whole Genome Gradient Display Cluster Report SynView: a GBrowse-compatible approach to visualizing comparative genome data. Haiming Wang, et al.; in Bioinformatics 22 (18) Sybil: Methods and Software for Multiple Genome Comparison and Visualization. Crabtree, et al.; in Gene Function Analysis, ed. by Michael F. Ochs (2007)

  26. GBrowse_syn or Sybil or SynView? GBrowse_syn Most actively developed Scalable Familiar interface Extensive documentation Growing user community SynView Scalable Runs inside GBrowse Sybil Scalable Whole genome and other unique visualizations Built on Chado

  27. CMap Web based comparative map viewer CMap is data type agnostic: Can link sequence, genetic, physical, QTL, deletion, optical, … Particularly popular in plant community CMap 1.01: A comparative mapping application for the Internet, Ken Youens-Clark, Ben Faga, Immanuel V. Yap, Lincoln Stein and Doreen Ware, Bioinformatics, doi:10.1093/bioinformatics/btp458

  28. CMap Future Work • Streamline the database • Faster access • Display in SVG • Save in Circos / MizBee format

  29. CMap Resources

  30. Coffee / Tea Break! http://www.cafepress.org/GenericMod

  31. Agenda

  32. Chado: A database schema for biological data • A schema is a database design • Blueprint for a database, a way of organizing data • Independent of specific data • Chado provides structure • You provide the hard work and data + =

  33. Why use Chado? • Very good at genomic data • Widely used • AphidBase, BeetleBase, dictyBase, FlyBase, SGN, SpBase, VectorBase, wFleaBase, … • Integrates with other GMOD tools • Community of support • Modular, flexible and extensible

  34. Chado Modules general organism pub sequence mage genetic cv companalysis

  35. CVs and Ontologies in Chado • Controlled vocabularies and ontologies are key in Chado • Maximally used for • Integrity • Interoperability • Can create your own, but … • Please use standard ontologies when they exist • See OBO: http://www.obofoundry.org/ CV

  36. Chado Future Developments Flexibility means core schema changes slowly That's a feature. • Natural Diversity module • Better support for phenotypes, crosses, individuals, geolocation, … • Based on GDPDM from Cornell University, Terry Casstevens, et al. (http://www.maizegenetics.net/gdpdm/) • Expression / Anatomy / Cell Fate Atlas support • Aniseed (http://aniseed-ibdm.univ-mrs.fr/) converting to Chado and extending it to better support atlases • Will have a web front end for atlases

  37. Chado Resources

  38. Chado Web Front Ends • Chado is a schema, a server side technology • It is not a web front end or a desktop client • Options for Chado web front ends: • Do it yourself • GMODWeb • Tripal

  39. GMODWeb • A Chado specific set of templates for the generic Turnkey web site generation system • Written in Perl • Lots of Perl module dependencies ParameciumDB, a website built with GMODWeb http://paramecium.cgm.cnrs-gif.fr/ GMODWeb: a web framework for the generic model organism database, O'Connor et al., Genome Biology 2008, 9:R102.

  40. Tripal • Added to GMOD this year • Set of Drupal modules • Feature, Organism, Library, Analysis • Modules roughly correspond to Chado modules • Easy to create new modules • Includes user authentication, job management, and data entry support • Developed by Clemson University Genomics Institute MarineGenomics.org Stephen Ficklin, Meg Staton, Chun-Huai Cheng, … Clemson University Genomics Institute

  41. Tripal Resources

  42. Chado Web: DIY or GMODWeb or Tripal? GMODWeb Complete Requires some tuning Perl Tripal User authentication Data entry Actively developed Well documented Easy to extend Drupal Do It Yourself More work Get exactly what you want What really made us decide to switch over to Drupal was that we needed authentication mechanisms, customized data entry mechanisms, and the ability to add social networking features and other non-biological components to our sites. Drupal supported all of this and was widely used, well documented, and well supported. Stephen Ficklin, CUGI

  43. BioMart and InterMine • Chado well-suited for setting up organism databases that have • Easy to use query interface to support common types of questions • Unified, coherent presentation of information • BioMart and InterMine • Allow users to ask complex queries on all data • At the expense of having to do more work

  44. BioMart • Query oriented data integration system • Uses distributed data warehousing ideas. • Complex query builder • Gives users access to every field • Any BioMart can be configured to access other BioMart instances • Both a database and a web front-end BioMart – biological queries made easy, Damian Smedley, Syed Haider, Benoit Ballester, Richard Holland, Darin London, Gudmundur Thorisson and Arek Kasprzyk, BMC Genomics 2009, 10:22

  45. BioMart

  46. BioMart Resources

  47. InterMine • Similar idea to BioMart, different approach • Uses Sequence Ontology as central organising principle • Supports • Complex queries and access to all fields • Predefined queries that can be tuned • Predefined datasets (e.g., "Most enriched genes in adult fly brain") • Developing federation capabilities FlyMine: an integrated database for Drosophila and Anopheles genomics, Rachel Lyne, et al., Genome Biology 2007, 8:R129

  48. InterMine Resources

  49. GFF3 • The common file format of GMOD for genomic annotation • Supported by Chado, GBrowse, JBrowse, CMap, Apollo, ….

  50. Agenda

More Related