The Functional Genomics Experiment Model (FuGE)
Andy Jones
School of Computer Science and Faculty of Life Sciences, University of Manchester
History
• Data sharing for ‘omics data tackled by various groups:
  • MAGE format for microarrays (MGED 2002)
  • PEDRo for proteomics (U. Man 2003)
• Problems for functional genomics:
  • Common parts modelled differently
  • Labs performing both techniques must create 2 complex applications to describe similar concepts
  • Difficult to integrate data
• Two efforts to merge MAGE and PEDRo (2004)
  • Merged models even more complex
  • Did not cover other techniques e.g. metabolomics
• But, significant advantages if upstream details can be described only once!
Introduction to FuGE
• Functional Genomics Experiment model (FuGE)
• Models common components across functional genomics experiments
  • Sample description, experimental variables, protocols, multidimensional data
Three uses of FuGE:
1. A data format for representing laboratory workflows
2. A supplement to existing data formats, adding metadata that describes their context within a workflow
3. A framework for building new data formats
FuGE structure
FuGE exists as:
1. Object model (UML)
2. XML schema
...and Java STK, Hibernate relational DB binding etc.
• Common:
  • General data format management
  • Auditing
  • Referencing external resources
  • Protocols
• Bio:
  • Investigation structure
  • Data
  • Materials (organisms, solutions, compounds)
  • Theoretical molecules e.g. sequences, metabolites stored in a database
[Diagram: FuGE package structure. Common: Audit, Description, Measurement, Ontology, Protocol, Reference. Bio: Data, Investigation, Material, Conceptual Molecule. A second panel carries the labels UML and XML Schema.]
Use 1: Experiment Workflow
[Diagram: a workflow in which Material and Data nodes (the inputs and outputs) are linked by ProtocolApplication steps labelled Treatment, Data Acquisition and Data Transformation.]
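The workflow idea on this slide can be sketched in a few lines of Python. This is a minimal illustration only, not the FuGE object model or the Java STK API: the class names `Material`, `Data` and `ProtocolApplication` echo the slide labels, while their fields, the helper `trace_protocols` and the sample names are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Material:
    """A physical input or output, e.g. a sample (illustrative, not the FuGE class)."""
    name: str


@dataclass(frozen=True)
class Data:
    """A data input or output, e.g. raw or processed measurements."""
    name: str


Node = Union[Material, Data]


@dataclass
class ProtocolApplication:
    """One workflow step: a protocol applied to inputs, producing outputs."""
    protocol: str
    inputs: List[Node] = field(default_factory=list)
    outputs: List[Node] = field(default_factory=list)


def trace_protocols(target: Node, steps: List[ProtocolApplication]) -> List[str]:
    """Walk backwards from `target`, listing the protocols that led to it."""
    history: List[str] = []
    visited = set()          # indices of steps already recorded
    frontier = [target]
    while frontier:
        node = frontier.pop()
        for index, step in enumerate(steps):
            if index not in visited and node in step.outputs:
                visited.add(index)
                history.append(step.protocol)
                frontier.extend(step.inputs)
    return history


# Hypothetical sample and data names, chosen only for the example.
sample = Material("tissue sample")
extract = Material("protein extract")
raw = Data("raw spectra")
processed = Data("peak list")

steps = [
    ProtocolApplication("Treatment", [sample], [extract]),
    ProtocolApplication("Data Acquisition", [extract], [raw]),
    ProtocolApplication("Data Transformation", [raw], [processed]),
]

provenance = trace_protocols(processed, steps)
print(provenance)  # ['Data Transformation', 'Data Acquisition', 'Treatment']
```

Because every step records its inputs and outputs explicitly, the full history of a data set can be recovered by following the links backwards, which is the property the workflow diagram is meant to convey.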
Use 2: Tie Together External Formats
[Diagram: a ProtocolApplication whose inputMaterial points to a Material and whose outputData points to an ExternalData, which in turn references an mzData file.]
Diagram annotations:
• Material can be used to describe the sample. This connects the MS data with a separation workflow
• File format definition
• Parser will exist to extract data / parameters from mzData file
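As a rough sketch of this second use, the Python below links a sample description to a data file that stays in its own format. It is an assumption-laden illustration rather than the FuGE schema: `input_materials` and `output_data` mirror the slide's inputMaterial and outputData labels, but the fields `location` and `file_format`, the helper `files_for_material` and the file name are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Material:
    """Describes the sample that went into the instrument (illustrative only)."""
    name: str


@dataclass(frozen=True)
class ExternalData:
    """Pointer to a file kept in its native format rather than copied into the record."""
    location: str     # hypothetical path or URI of the data file
    file_format: str  # name of the file format definition, e.g. "mzData"


@dataclass
class ProtocolApplication:
    """A step that consumes materials and yields references to external files."""
    protocol: str
    input_materials: List[Material]
    output_data: List[ExternalData]


def files_for_material(material: Material,
                       applications: List[ProtocolApplication]) -> List[str]:
    """Return the locations of all external files produced from `material`."""
    return [
        data.location
        for app in applications
        if material in app.input_materials
        for data in app.output_data
    ]


# Hypothetical names, used only to exercise the sketch.
fraction = Material("gel fraction 3")
ms_run = ProtocolApplication(
    protocol="mass spectrometry",
    input_materials=[fraction],
    output_data=[ExternalData("run_03.mzData.xml", "mzData")],
)

linked_files = files_for_material(fraction, [ms_run])
print(linked_files)  # ['run_03.mzData.xml']
```

The record stores only a reference and a format name, so the mass spectrometry data stays in the mzData file and a format-specific parser can be applied to it separately.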
FuGE Status
• Milestone 1 (Sept 2005)
• Milestone 2 (Dec 2005)
• Milestone 3 (May 2006)
• Beta Java software toolkit
  • M2 (March 2006); M3 (Sept 2006)
• FuGE v1 (candidate)
  • Currently in PSI standards process
  • Expected to stabilise through this process by March/April 2007
Formats extending from FuGE
• MAGE version 2 (MGED)
• GelML and GelInfoML (PSI)
• analysisXML (PSI)
• spML (PSI / MSI)
• NMR (FuGE being evaluated by MSI)
• Planned migration for mzData and other PSI formats
• Upstream workflow description for all groups
  • Investigation structure and variables, sample description etc.
• Allows assembly of studies that cross technology boundaries in one data format
Conclusions
• FuGE accepted by MGED, PSI and MSI
  • for developing future data formats
  • for describing parts of experiments common across technologies
• Moving toward convergence of data formats
• Simplifies the process of developing new data standards
• Will facilitate data integration and submission of data to public repositories
• Improves the uniformity of data sets in public repositories, thus facilitating querying
Web: http://fuge.sourceforge.net/
Acknowledgements
• FuGE development: Angel Pizarro (UPenn), Michael Miller (Rosetta), Paul Spellman (Lawrence Berkeley); MGED, PSI, Fred Hutchinson CRC, Genologics
• PSI: Chris Taylor, Henning Hermjakob, Randy Julian
• MSI: Nigel Hardy and Helen Jenkins (Aber)
• Work on FuGE in Manchester is funded by the BBSRC
Email: ajones@cs.man.ac.uk
Web: http://fuge.sourceforge.net/