
GUS: A Functional Genomics Data Management System

GUS: A Functional Genomics Data Management System. Chris Stoeckert, Ph.D. Center for Bioinformatics and Dept. of Genetics University of Pennsylvania ASM Conference on Functional Genomics and Bioinformatics Approaches to Infectious Disease Research October 8, 2004 Portland, Oregon.


Presentation Transcript


  1. GUS: A Functional Genomics Data Management System Chris Stoeckert, Ph.D. Center for Bioinformatics and Dept. of Genetics University of Pennsylvania ASM Conference on Functional Genomics and Bioinformatics Approaches to Infectious Disease Research October 8, 2004 Portland, Oregon

  2. Database Options for Integrated Functional Genomics • Requirements • Covers genomics and functional genomics • Active and open developer community • Options • GUS: Genomics Unified Schema • Chado: generic model organism database (GMOD http://www.gmod.org)

  3. Java Servlets Oracle RDBMS Object Layer for Data Loading DoTS RAD TESS SRES Core A Few GUS Web Sites U. Penn Sanger Institute U. Georgia U. Toronto U. Chicago Flora Centromere Database Phytophthora sojae genome GUS Virginia Bioinformatics Institute

  4. GUS (Genomics Unified Schema) http://www.gusdb.org • Namespace: Domain; Features • DoTS: Sequence and annotation; EST clusters, Gene models • RAD: Gene Expression; MIAME/MAGE-OM • TESS: Gene Regulation; TFBS organization • Sres: Shared Resources; Ontologies • Core: Data Provenance; Documentation

  5. Identify shared TF binding sites Genomic alignment and comparative sequence analysis SRES BioMaterial annotation RAD EST clustering and assembly DoTS TESS

  6. Examples of GUS users • Large sequencing center • GeneDB: Pathogen Sequencing Unit at the Sanger Institute • Lightly staffed genomics project • CryptoDB: Kissinger Lab, University of Georgia • Data mining project • Multiple plant species: Brett Tyler, Virginia Bioinformatics Institute and collaborators • Expression based project • dbDirt: Allen Okey, University of Toronto • Bioinformatics Core Facility • University of Pennsylvania Bioinformatics Core Facility

  7. GUS Project Goals • Provide: • A platform for broad genomics data integration • An infrastructure system for functional genomics • Support: • Websites with advanced query capabilities • Research driven queries and mining

  8. GUS components • Data sources: Your data, GenBank, NRDB, dbEST, SNPs, Genetraps, MicroArrays, Phenotypes, Pathways, Orthologs, Taxonomy, GO, SO, EC, More… • Pipeline API • Plugins (data loaders) • Web Development Kit • Data Load API • Perl Object Layer • Queries and analysis • Warehouse (Oracle or PostgreSQL)

  9. Functional genomics with GUS • Diagram areas: Sequence & Features, Central Dogma, Expression (RAD), Regulation (TESS), Proteomics, ImmunoHistChem, In Situ Hybridization, Interaction, Functional Annotation of the Genome • Labels repeated across the diagram: Study, Sample, Image Analysis, Statistical Processing • Standards: MIAME, MISFISHIE, MIAPE • www.mged.org, psidev.sf.net, www.scgap.org

  10. GUS versus chado • GUS represents biology in the database tables • Forces applications to load and retrieve data consistently • Chado represents biology in the applications • Allows flexibility in what can be stored but applications may not be consistent
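The contrast on slide 10 can be made concrete with a small sketch. The table and column names below are hypothetical simplifications, not the actual GUS or Chado schemas. The sketch assumes only that a GUS-style design gives each biological concept its own table, while a Chado-style design stores rows in a generic feature table whose type is chosen by the loading application. Chado in practice draws feature types from controlled vocabularies, which this sketch ignores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- GUS-style (hypothetical): one table per biological concept.
CREATE TABLE gene_feature (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    na_sequence_id INTEGER NOT NULL
);
-- Chado-style (hypothetical): one generic table, type chosen by the loader.
CREATE TABLE feature (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT NOT NULL
);
""")

# Typed table: every loader has to store a gene in the same place.
conn.executemany(
    "INSERT INTO gene_feature (name, na_sequence_id) VALUES (?, ?)",
    [("geneA", 1), ("geneB", 1)],
)

# Generic table: two independent loaders label the same concept differently.
conn.execute("INSERT INTO feature (name, type) VALUES ('geneA', 'gene')")
conn.execute("INSERT INTO feature (name, type) VALUES ('geneB', 'coding_region')")

typed_count = conn.execute("SELECT COUNT(*) FROM gene_feature").fetchone()[0]
generic_count = conn.execute(
    "SELECT COUNT(*) FROM feature WHERE type = 'gene'"
).fetchone()[0]
print(typed_count, generic_count)
```

In this toy setup the typed table returns both genes, while the query against the generic table misses the row that a second loader labeled differently. That is the consistency trade-off the slide describes; the flexibility side of the trade-off is that the generic table accepts new kinds of features without a schema change.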

  11. Central dogma and sequences GeneFeature RNAFeature ProteinFeature NA Sequence AA Sequence

  12. Central dogma and sequences Gene RNA Protein GeneFeature RNAFeature ProteinFeature NA Sequence AA Sequence

  13. Central dogma and sequences Gene RNA Protein RNA Multiple sequences (experimental variety) Gene 1 Gene 2 Multiple genes genome NA Sequence AA Sequence

  14. Central dogma and sequences Gene RNA Protein GeneInstance RNAInstance ProteinInstance GeneFeature RNAFeature ProteinFeature NA Sequence AA Sequence
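One possible reading of slides 11 to 14, sketched as Python data classes. The class and field names are hypothetical and follow the slide labels only loosely; the actual GUS tables are not shown in the transcript. The assumption is that a Gene is a biological entity, a GeneFeature is an annotation located on one particular NA sequence, and a GeneInstance links one gene to one feature, so that a single gene can be annotated on several sequences. The same pattern would apply to RNA and Protein.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NASequence:
    """A nucleic acid sequence, e.g. a genomic contig or a clone."""
    source_id: str


@dataclass
class GeneFeature:
    """An annotation of a gene on one particular NA sequence."""
    na_sequence: NASequence
    start: int
    end: int


@dataclass
class Gene:
    """The gene as a biological entity, independent of any one sequence."""
    name: str
    instances: List["GeneInstance"] = field(default_factory=list)


@dataclass
class GeneInstance:
    """Links one Gene to one GeneFeature (hypothetical reading of slide 14)."""
    gene: Gene
    feature: GeneFeature


def add_instance(gene: Gene, feature: GeneFeature) -> GeneInstance:
    """Record that `feature` is an occurrence of `gene` on some sequence."""
    instance = GeneInstance(gene=gene, feature=feature)
    gene.instances.append(instance)
    return instance


# The same gene annotated on two different sequences.
gene = Gene(name="example_gene")
contig = NASequence(source_id="contig_1")
clone = NASequence(source_id="clone_7")
add_instance(gene, GeneFeature(contig, start=100, end=900))
add_instance(gene, GeneFeature(clone, start=5, end=805))

sequences = sorted(i.feature.na_sequence.source_id for i in gene.instances)
print(sequences)
```

Keeping the entity separate from its sequence-bound features is what would let one gene collect evidence from several experimental sequences, as slide 13 suggests.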

  15. Obtaining and Using GUS • www.gusdb.org • More info at www.gusdb.org/documentation • Active gusdev mailing list • Relatively straightforward to install • Loading data a struggle for new users • Growing number of tools available • Addressing how to use and write tools with visits • Web Development Kit (WDK) to generate web sites on GUS

  16. Current GUS Developers At Penn • Steve Fischer: Project manager, WDK • Elisabetta Manduchi: RAD project manager, RAD study annotator • Angel Pizarro: Schema development, proteomics, MAGE export • Mike Saffitz: DBA, web services, Postgres • Dave Barkan: WDK, GO pipeline, Apollo interface • Thomas Gan: WDK, genomic alignments pipeline • John Iodice: ApiDoTS pipeline, data loading • Li Li: OrthoMCL pipeline • Junmin Liu: RAD websites, expression displays • Debbie Pinney: Data loaders, Hum and MusDoTS pipeline • Jonathan Schug: TESS, architecture and schema development • Trish Whetzel: Data loading, RAD, schema development • Plus rest of group contributes through various GUS-based projects • Pathogen Sequencing Unit, Sanger Institute • Kissinger Group, U. of Georgia • Terry Clark, U. of Chicago

  17. WDKTestSite Developed in collaboration with Adrian Tivey & Marie-Adele Rajandream (PSU, Sanger Institute)

  18. The PlasmoDB Team Shailesh Date Kobby Essien Martin Fraunholz Bindu Gajria Greg Grant John Iodice Jessie Kissinger Philip Labo Li Li Jules Milgram David Roos Chris Stoeckert Trish Whetzel NIAID grant: R01 AI058515

  19. GUS supports a wide variety of queries

  20. Suppose you want to find all kinases in P. falciparum
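Slide 20 presumably showed a query form on the web site, which the transcript does not capture. As a rough sketch of the kind of query that could sit behind such a form, the following uses an in-memory SQLite database with hypothetical tables and made-up rows. It is not the GUS schema, and a text match on product descriptions is only one of several ways to define "kinase".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables for illustration only.
CREATE TABLE taxon (
    taxon_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY,
    taxon_id INTEGER NOT NULL REFERENCES taxon(taxon_id),
    source_id TEXT NOT NULL,
    product TEXT NOT NULL
);
""")
conn.executemany(
    "INSERT INTO taxon (taxon_id, name) VALUES (?, ?)",
    [(1, "Plasmodium falciparum"), (2, "Plasmodium yoelii")],
)
# Made-up example rows, not real annotations.
conn.executemany(
    "INSERT INTO gene (gene_id, taxon_id, source_id, product) VALUES (?, ?, ?, ?)",
    [
        (1, 1, "demo_0001", "protein kinase, putative"),
        (2, 1, "demo_0002", "hypothetical protein"),
        (3, 2, "demo_0003", "serine/threonine kinase"),
    ],
)


def find_genes(conn, species, keyword):
    """Return source IDs of genes in `species` whose product mentions `keyword`."""
    rows = conn.execute(
        """
        SELECT g.source_id
        FROM gene AS g
        JOIN taxon AS t ON t.taxon_id = g.taxon_id
        WHERE t.name = ? AND g.product LIKE ?
        ORDER BY g.source_id
        """,
        (species, f"%{keyword}%"),
    ).fetchall()
    return [row[0] for row in rows]


kinases = find_genes(conn, "Plasmodium falciparum", "kinase")
print(kinases)
```

The species and keyword are passed as bound parameters rather than pasted into the SQL string, which is the usual way to keep a web-facing query form safe from injection.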

  21. Gene Report Pages Integrate Genomics and Functional Genomics

  22. RAD Study-Annotator • Covers the MIAME checklist and exploits the MGED Ontology • Allows entering of very specific details of an experiment • Web-based forms: • Modular structure • Written in PHP • Front-end data integrity checks using JavaScript • Manages Data Privacy based on Project/Group selections present in GUS schema • Manduchi et al. 2004 Bioinformatics 20:452-459.

  23. Vision for GUS • Installable for every lab • Improve install scripts, documentation • Postgres version • Extendable to all areas of functional genomics • Sequence, array-based expression experiments • Array CGH, 2-D gel electrophoresis, mass spectrometry, yeast 2-hybrids • In situ hybridizations, metabolites • Interoperable with other GUS installations and with common tools • Exchange files and scripts, MAGE-ML (use community standards) • Web services (exchange objects) • Interface with open source tools such as Gbrowse, Artemis, Apollo

  24. Standards and Ontologies for Functional Genomics 2 • October 23-26, 2004 • Held at the University of Pennsylvania Medical School • www.jax.org/courses/events • Co-Hosted by The Jackson Laboratory, University of Pennsylvania, European Bioinformatics Institute • Student Scholarships Available • Funded in part by NHGRI, NCRR, NERC, GSK • Photo by R. Kennedy, B Trist, R. Tarver, for GPTMC
