GUS: A Functional Genomics Data Management System. Chris Stoeckert, Ph.D. Center for Bioinformatics and Dept. of Genetics University of Pennsylvania ASM Conference on Functional Genomics and Bioinformatics Approaches to Infectious Disease Research October 8, 2004 Portland, Oregon.
Database Options for Integrated Functional Genomics • Requirements • Covers genomics and functional genomics • Active and open developer community • Options • GUS: Genomics Unified Schema • Chado: generic model organism database (GMOD http://www.gmod.org)
[Diagram: GUS • Java Servlets • Oracle RDBMS • Object Layer for Data Loading • DoTS • RAD • TESS • SRES • Core]
A Few GUS Web Sites • U. Penn • Sanger Institute • U. Georgia • U. Toronto • U. Chicago • Flora Centromere Database • Phytophthora sojae genome • Virginia Bioinformatics Institute
GUS (Genomics Unified Schema) http://www.gusdb.org
Namespace | Domain | Features
DoTS | Sequence and annotation | EST clusters, gene models
RAD | Gene Expression | MIAME/MAGE-OM
TESS | Gene Regulation | TFBS organization
SRES | Shared Resources | Ontologies
Core | Data Provenance | Documentation
[Diagram: example analyses across GUS namespaces (SRES, RAD, DoTS, TESS) • Identify shared TF binding sites • Genomic alignment and comparative sequence analysis • BioMaterial annotation • EST clustering and assembly]
Examples of GUS users • Large sequencing center • GeneDB: Pathogen Sequencing Unit at the Sanger Institute • Lightly staffed genomics project • CryptoDB: Kissinger Lab, University of Georgia • Data mining project • Multiple plant species: Brett Tyler, Virginia Bioinformatics Institute and collaborators • Expression based project • dbDirt: Allen Okey, University of Toronto • Bioinformatics Core Facility • University of Pennsylvania Bioinformatics Core Facility
GUS Project Goals • Provide: • A platform for broad genomics data integration • An infrastructure system for functional genomics • Support: • Websites with advanced query capabilities • Research driven queries and mining
GUS components • Data sources: your data, GenBank, NRDB, dbEST, SNPs, Genetraps, MicroArrays, Phenotypes, Pathways, Orthologs, Taxonomy, GO, SO, EC, More… • Loading: Pipeline API, Plugins (data loaders), Data Load API • Perl Object Layer • Warehouse (Oracle or PostgreSQL) • Web Development Kit • Queries and analysis
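The loading path named on this slide (plugins writing through a load API into the warehouse) can be illustrated with a toy loader. This is a sketch only: the slide names a Perl object layer, and the Python class, method, and table names below are hypothetical and do not mirror the GUS plugin API. SQLite stands in for the Oracle or PostgreSQL warehouse.

```python
import sqlite3
from abc import ABC, abstractmethod


class LoaderPlugin(ABC):
    """Hypothetical plugin interface: parse one input format into table rows."""

    table = ""      # target table name (illustrative)
    columns = ()    # target column names (illustrative)

    @abstractmethod
    def rows(self, text):
        """Yield one tuple per record, ordered like `columns`."""


class TaxonPlugin(LoaderPlugin):
    """Toy loader for tab-separated 'id<TAB>name' lines."""

    table = "taxon"
    columns = ("taxon_id", "name")

    def rows(self, text):
        for line in text.strip().splitlines():
            taxon_id, name = line.split("\t")
            yield int(taxon_id), name


def run_plugin(conn, plugin, text):
    """Insert everything the plugin yields in a single transaction."""
    placeholders = ", ".join("?" for _ in plugin.columns)
    sql = (
        f"INSERT INTO {plugin.table} ({', '.join(plugin.columns)}) "
        f"VALUES ({placeholders})"
    )
    with conn:  # commits on success, rolls back if any row fails
        cursor = conn.executemany(sql, plugin.rows(text))
    return cursor.rowcount


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    # Toy identifiers, not real taxonomy IDs.
    text = "1\tPlasmodium falciparum\n2\tCryptosporidium parvum\n"
    loaded = run_plugin(conn, TaxonPlugin(), text)
    names = [r[0] for r in conn.execute("SELECT name FROM taxon ORDER BY taxon_id")]
    return loaded, names
```

Keeping the parsing in the plugin and the insert logic in one shared runner means each new data source only needs a new `rows` method, which is the general appeal of a plugin-based loader.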
Functional genomics with GUS [diagram labels] • Sequence & Features • Central Dogma • Expression (RAD) • Regulation (TESS) • Functional Annotation of the Genome • Proteomics • ImmunoHistChem • In Situ Hybridization • Interaction • Study • Sample • Image Analysis • Statistical Processing • Standards: MIAME, MISFISHIE, MIAPE • www.mged.org • psidev.sf.net • www.scgap.org
GUS versus Chado • GUS represents biology in the database tables • Forces applications to load and retrieve data consistently • Chado represents biology in the applications • Allows flexibility in what can be stored, but applications may not be consistent
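The trade-off on this slide can be sketched with two toy schemas. This is an illustration only: the table and column names are simplified stand-ins invented for the example, not the actual GUS or Chado definitions, and SQLite is used purely for convenience. In the typed style the database itself rejects an RNA record that points at no gene; in the generic style the same mistake is accepted, so every application has to enforce the rule itself.

```python
import sqlite3


def build_typed_schema(conn):
    """Typed style: one table per biological concept, rules live in the schema."""
    conn.executescript("""
        CREATE TABLE gene_feature (
            gene_feature_id INTEGER PRIMARY KEY,
            name            TEXT NOT NULL
        );
        CREATE TABLE rna_feature (
            rna_feature_id  INTEGER PRIMARY KEY,
            gene_feature_id INTEGER NOT NULL
                            REFERENCES gene_feature(gene_feature_id),
            name            TEXT NOT NULL
        );
    """)


def build_generic_schema(conn):
    """Generic style: one feature table, the meaning of 'type' lives in the apps."""
    conn.executescript("""
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY,
            type       TEXT NOT NULL,
            parent_id  INTEGER REFERENCES feature(feature_id),
            name       TEXT NOT NULL
        );
    """)


def typed_rejects_orphan_rna():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    build_typed_schema(conn)
    try:
        conn.execute(
            "INSERT INTO rna_feature (gene_feature_id, name) VALUES (?, ?)",
            (999, "orphan_rna"),  # no gene_feature row 999 exists
        )
    except sqlite3.IntegrityError:
        return True
    return False


def generic_accepts_orphan_rna():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    build_generic_schema(conn)
    conn.execute(
        "INSERT INTO feature (type, parent_id, name) VALUES (?, ?, ?)",
        ("mRNA", None, "orphan_rna"),  # parent is optional, so this is stored
    )
    return conn.execute("SELECT COUNT(*) FROM feature").fetchone()[0] == 1
```

Both functions return `True`: the typed schema refuses the orphan RNA, while the generic schema stores it without complaint.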
Central dogma and sequences GeneFeature RNAFeature ProteinFeature NA Sequence AA Sequence
Central dogma and sequences Gene RNA Protein GeneFeature RNAFeature ProteinFeature NA Sequence AA Sequence
Central dogma and sequences Gene RNA Protein RNA Multiple sequences (experimental variety) Gene 1 Gene 2 Multiple genes genome NA Sequence AA Sequence
Central dogma and sequences Gene RNA Protein GeneInstance RNAInstance ProteinInstance GeneFeature RNAFeature ProteinFeature NA Sequence AA Sequence
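One way to read the final build of this diagram is as three layers: abstract central-dogma entities (Gene, RNA, Protein), features annotated on a particular sequence (GeneFeature, RNAFeature, ProteinFeature), and Instance records that link the two. The sketch below models only the gene column under that reading. The field names and the direction of the links are assumptions made for illustration, not the GUS table definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NASequence:
    source_id: str  # e.g. a genomic contig or an assembled transcript


@dataclass(frozen=True)
class GeneFeature:
    """An annotation located on one specific nucleic acid sequence."""
    name: str
    sequence: NASequence


@dataclass(frozen=True)
class Gene:
    """The abstract gene, independent of any one sequence."""
    symbol: str


@dataclass(frozen=True)
class GeneInstance:
    """Links an abstract gene to one feature that is evidence for it."""
    gene: Gene
    feature: GeneFeature


def sequences_for(gene: Gene, instances: List[GeneInstance]) -> List[str]:
    """List the sequences on which a gene has been annotated."""
    return [i.feature.sequence.source_id for i in instances if i.gene == gene]


def demo() -> List[str]:
    gene = Gene("geneA")
    on_genome = GeneFeature("geneA_model", NASequence("contig_1"))
    on_transcript = GeneFeature("geneA_est", NASequence("est_assembly_7"))
    instances = [GeneInstance(gene, on_genome), GeneInstance(gene, on_transcript)]
    return sequences_for(gene, instances)
```

The separate instance layer is what lets a single gene be tied to several sequences (the "multiple sequences" case on the earlier build) without duplicating the gene itself.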
Obtaining and Using GUS • www.gusdb.org • More info at www.gusdb.org/documentation • Active gusdev mailing list • Relatively straightforward to install • Loading data a struggle for new users • Growing number of tools available • Addressing how to use and write tools with visits • Web Development Kit (WDK) to generate web sites on GUS
Current GUS Developers At Penn • Steve Fischer: Project manager, WDK • Elisabetta Manduchi: RAD project manager, RAD study annotator • Angel Pizarro: Schema development, proteomics, MAGE export • Mike Saffitz: DBA, web services, Postgres • Dave Barkan: WDK, GO pipeline, Apollo interface • Thomas Gan: WDK, genomic alignments pipeline • John Iodice: ApiDoTS pipeline, data loading • Li Li: OrthoMCL pipeline • Junmin Liu: RAD websites, expression displays • Debbie Pinney: Data loaders, Hum and MusDoTS pipeline • Jonathan Schug: TESS, architecture and schema development • Trish Whetzel: Data loading, RAD, schema development • Plus the rest of the group, who contribute through various GUS-based projects
Outside Penn • Pathogen Sequencing Unit, Sanger Institute • Kissinger Group, U. of Georgia • Terry Clark, U. of Chicago
WDKTestSite • Developed in collaboration with Adrian Tivey & Marie-Adele Rajandream (PSU, Sanger Institute)
The PlasmoDB Team • Shailesh Date, Kobby Essien, Martin Fraunholz, Bindu Gajria, Greg Grant, John Iodice, Jessie Kissinger, Philip Labo, Li Li, Jules Milgram, David Roos, Chris Stoeckert, Trish Whetzel • NIAID grant: R01 AI058515
Gene Report Pages Integrate Genomics and Functional Genomics
RAD Study-Annotator • Covers the MIAME checklist and exploits the MGED Ontology • Allows entering of very specific details of an experiment • Web-based forms: • Modular structure • Written in PHP • Front-end data integrity checks using JavaScript • Manages Data Privacy based on Project/Group selections present in GUS schema • Manduchi et al. 2004 Bioinformatics 20:452-459.
Vision for GUS • Installable for every lab • Improve install scripts, documentation • Postgres version • Extendable to all areas of functional genomics • Sequence, array-based expression experiments • Array CGH, 2-D gel electrophoresis, mass spectrometry, yeast 2-hybrids • In situ hybridizations, metabolites • Interoperable with other GUS installations and with common tools • Exchange files and scripts, MAGE-ML (use community standards) • Web services (exchange objects) • Interface with open source tools such as GBrowse, Artemis, Apollo
Standards and Ontologies for Functional Genomics 2 • October 23-26, 2004 • Held at the University of Pennsylvania Medical School • www.jax.org/courses/events • Co-hosted by The Jackson Laboratory, University of Pennsylvania, European Bioinformatics Institute • Student scholarships available • Funded in part by NHGRI, NCRR, NERC, GSK • Photo by R. Kennedy, B Trist, R. Tarver, for GPTMC