The Functional Genomics Experiment Object Model (FuGE). Andrew Jones, School of Computer Science, University of Manchester. MGED Society.
What is FuGE? • Various groups have tried in the past to merge MAGE (microarrays) and PEDRo (proteomics) into one model • Such a combined model would be difficult to manage • FuGE is a model of the components common to functional genomics experiments • Aims to help the development of data standards • Should allow some cross-compatibility between different ‘omics experiments • Microarray and proteomics standards will use parts of FuGE for some of their data formats
So, what is FuGE? • An object model in UML (close to its first stable release) • An XML Schema (in development) • A software API (to be created from the UML model) • FuGE uses ontologies extensively, such as the MGED Ontology or its successor (FuGO) • Developed by members of MGED and PSI, with input from cross-omics experimentalists, e.g. RSBI
What is FuGE not…? • Not an effort to create one data standard for all lab techniques • This problem is hard at the technical level, and very hard in terms of getting agreement from all groups • Not a model for metabolomics metadata • But it might help in the development of one • …and we would like to encourage input from the metabolomics community
FuGE Structure • 2 sections: Common and Bio • Common – components that aid the development of a rich data standard • Protocols, external references, auditing and security settings • Bio – biology-specific components • Biological (or chemical) materials, biological sequences • Summary of an investigation structure • References to the data models specific to each domain
Protocols • Protocols have a set of ordered atomic actions • Actions are user-entered text or ontology terms • Protocols can be associated with Software and Equipment • Protocols, Software and Equipment can have a set of defined Parameters • Mechanism for defining a standard protocol, and an instance of a protocol (date, operator…) • Nested protocols can be defined for representing complex procedures • An Action can be a reference to another Protocol
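A minimal Python sketch of the protocol structure described on this slide, assuming plain dataclasses. The names `Parameter`, `Equipment`, `Protocol`, `ProtocolApplication` and `flatten` are illustrative only and are not FuGE's generated API; ontology terms are stood in for by plain strings, Software would be modelled like `Equipment`, and the date, operator and values are placeholders. It shows ordered actions, nesting by letting an action be another `Protocol`, and a separate instance record carrying date and operator.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Union


@dataclass
class Parameter:
    """A named setting declared by a Protocol, Software or Equipment."""
    name: str
    unit: Optional[str] = None


@dataclass
class Equipment:
    """Hypothetical stand-in; Software would be modelled the same way."""
    name: str
    parameters: list[Parameter] = field(default_factory=list)


@dataclass
class Protocol:
    """A standard (reusable) protocol made of ordered atomic actions."""
    name: str
    # Each action is user-entered text or an ontology term (both plain strings
    # here), or a reference to another Protocol, which gives nesting.
    actions: list[Union[str, Protocol]] = field(default_factory=list)
    parameters: list[Parameter] = field(default_factory=list)
    equipment: list[Equipment] = field(default_factory=list)

    def flatten(self) -> list[str]:
        """Expand nested protocols into one ordered list of atomic actions."""
        steps: list[str] = []
        for action in self.actions:
            if isinstance(action, Protocol):
                steps.extend(action.flatten())
            else:
                steps.append(action)
        return steps


@dataclass
class ProtocolApplication:
    """One instance of a protocol: when it was run, by whom, with what values."""
    protocol: Protocol
    performed_on: date
    operator: str
    parameter_values: dict[str, float] = field(default_factory=dict)


# Illustrative usage with placeholder values.
extraction = Protocol("RNA extraction", actions=["lyse cells", "purify RNA"])
labelling = Protocol(
    "Extraction and labelling",
    actions=[extraction, "label with dye"],
    parameters=[Parameter("incubation time", unit="min")],
)
run = ProtocolApplication(labelling, date(2006, 1, 1), "operator-1",
                          {"incubation time": 30.0})
print(labelling.flatten())  # ['lyse cells', 'purify RNA', 'label with dye']
```

Keeping the reusable `Protocol` separate from `ProtocolApplication` means one standard protocol can be shared by many runs, each with its own date, operator and parameter values.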
FuGE Workflow [diagram: Materials pass through Treatments to give further Materials; Data Acquisition produces Data, and Data Transformation produces further Data. Legend: inputs and outputs of Protocols; instance of some Protocol]
FuGE Workflow • Materials defined using terms from ontologies • Treatments defined by Protocols • Data represented in domain-specific formats • FuGE is the “glue” for sticking components together
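The workflow can be read as a graph in which protocol instances consume and produce Materials and Data. The sketch below is a hypothetical Python illustration of that reading, not part of FuGE: `Node`, `Step` and `provenance` are invented names and the materials are placeholders. It shows how recording inputs and outputs on every step lets a data item be traced back to the materials it came from.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A Material or a Data item flowing through the workflow."""
    kind: str  # "Material" or "Data"
    name: str


@dataclass
class Step:
    """An instance of some protocol (Treatment, Data Acquisition, ...)."""
    protocol: str
    inputs: list[Node]
    outputs: list[Node]


def provenance(target: Node, steps: list[Step]) -> set[Node]:
    """Return every upstream node that `target` was derived from."""
    # Index which steps produced each node.
    producers: dict[Node, list[Step]] = {}
    for step in steps:
        for out in step.outputs:
            producers.setdefault(out, []).append(step)
    # Walk backwards from the target through producing steps.
    seen: set[Node] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for step in producers.get(node, []):
            for inp in step.inputs:
                if inp not in seen:
                    seen.add(inp)
                    stack.append(inp)
    return seen


tissue = Node("Material", "tissue sample")
extract = Node("Material", "RNA extract")
raw = Node("Data", "scanned image")
normalised = Node("Data", "normalised values")
steps = [
    Step("Treatment: extraction", [tissue], [extract]),
    Step("Data Acquisition: scanning", [extract], [raw]),
    Step("Data Transformation: normalisation", [raw], [normalised]),
]
print(sorted(n.name for n in provenance(normalised, steps)))
```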
Other useful components • Each object can be tagged with audit info: • Who made a change, when, and what type of change • Security information: • Users and groups for accessing/changing data • Consistent mechanism for identifying objects • Life Science Identifiers (LSIDs) used to uniquely identify components • Objects can be referenced across documents • Mechanism for linking to external databases, literature references and ontologies
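A minimal Python sketch of per-object audit and security information plus LSID handling, assuming a simple model in which each object keeps its own audit trail and a set of principals allowed to change it. `Identifiable`, `AuditEntry`, `record_change` and `parse_lsid` are hypothetical names, and group membership is not resolved here. The layout `urn:lsid:authority:namespace:object[:revision]` follows the LSID URN syntax; the `example.org` authority and the user name are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEntry:
    """Who made a change, when, and what type of change it was."""
    who: str
    when: datetime
    change: str  # e.g. "creation", "modification", "deletion"


@dataclass
class Identifiable:
    lsid: str
    audit: list[AuditEntry] = field(default_factory=list)
    # Users or groups allowed to change this object (group lookup not shown).
    writers: set[str] = field(default_factory=set)

    def record_change(self, who: str, change: str) -> None:
        """Check write permission, then append an audit entry."""
        if who not in self.writers:
            raise PermissionError(f"{who} may not change {self.lsid}")
        self.audit.append(AuditEntry(who, datetime.now(timezone.utc), change))


def parse_lsid(lsid: str) -> dict[str, Optional[str]]:
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    return {
        "authority": parts[2],
        "namespace": parts[3],
        "object": parts[4],
        "revision": parts[5] if len(parts) == 6 else None,
    }


protocol = Identifiable("urn:lsid:example.org:protocol:p1", writers={"alice"})
protocol.record_change("alice", "modification")
print(parse_lsid(protocol.lsid)["authority"])  # example.org
```

Because the identifier is a globally unique string rather than a document-local key, another document can refer to `protocol` simply by quoting its LSID.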
Investigation model • Stores a summary of the investigation to facilitate queries • Purpose of the investigation (hypothesis) • Design of the investigation • e.g. strain differences, gene knockout, drug doses, time course • Stores the important variables • Values from ontologies, e.g. gene names, units, etc. • Links from variables to relevant data items
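To illustrate how an investigation summary can support queries, here is a hypothetical Python sketch in which each important variable holds ontology-backed values (plain strings here) and each value links to the data items recorded under it. `Investigation`, `Variable`, `VariableValue` and `data_for` are invented names; the hypothesis, doses and LSIDs are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VariableValue:
    value: str                   # ontology-backed value, e.g. a dose or gene name
    unit: Optional[str] = None   # unit term, if the value is a quantity
    data_items: list[str] = field(default_factory=list)  # ids of linked data


@dataclass
class Variable:
    name: str                    # e.g. "drug dose", "time", "strain"
    values: list[VariableValue] = field(default_factory=list)


@dataclass
class Investigation:
    hypothesis: str
    design: str                  # e.g. "time course", "gene knockout"
    variables: list[Variable] = field(default_factory=list)

    def data_for(self, variable: str, value: str) -> list[str]:
        """Return the data items linked to one value of one variable."""
        for var in self.variables:
            if var.name == variable:
                for vv in var.values:
                    if vv.value == value:
                        return list(vv.data_items)
        return []


study = Investigation(
    hypothesis="the drug changes expression in a dose-dependent way",
    design="drug doses",
    variables=[Variable("drug dose", [
        VariableValue("0", "mg/kg", ["urn:lsid:example.org:data:d1"]),
        VariableValue("10", "mg/kg", ["urn:lsid:example.org:data:d2",
                                      "urn:lsid:example.org:data:d3"]),
    ])],
)
print(study.data_for("drug dose", "10"))
```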
Benefits of shared components • Queries over common annotation • Samples, hypotheses, protocols • Shared software for experimental annotation and analysis • Microarrays, proteomics and metabolomics (and other experiments!) performed in the same lab • Developing standards for each technique is a hard problem • Shared resources could ease these problems (audit, security, identifying objects, ontologies)
Using FuGE in Practice • Import parts of the UML model or XML Schema and extend them with domain-specific components • Example: attempting to integrate FuGE with our Manchester metabolomics database • Reference a FuGE entry for the investigation structure and biological samples • Define ontologies and use FuGE as it is for experimental metadata • This would not include a format for mass spec or NMR data, which would also be needed
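One way the integration described on this slide might look in code is a local metabolomics record that stores only references to FuGE entries and keeps the raw spectra in a separate file. This is a hypothetical Python sketch: `MetabolomicsRun`, `resolve` and the in-memory `store` dictionary are assumptions, not part of FuGE or of the Manchester database, and all identifiers and paths are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MetabolomicsRun:
    """Hypothetical local record; FuGE content is referenced, not copied."""
    run_id: str
    investigation_lsid: str   # FuGE entry for the investigation structure
    sample_lsids: list[str]   # FuGE entries for the biological samples
    protocol_lsid: str        # experimental metadata described with FuGE as-is
    raw_data_uri: str         # mass spec / NMR file in a separate format


def resolve(run: MetabolomicsRun, fuge_store: dict[str, dict]) -> dict:
    """Join a local run with the FuGE entries it references."""
    refs = [run.investigation_lsid, run.protocol_lsid, *run.sample_lsids]
    missing = [lsid for lsid in refs if lsid not in fuge_store]
    if missing:
        raise KeyError(f"unresolved FuGE references: {missing}")
    return {
        "run": run.run_id,
        "investigation": fuge_store[run.investigation_lsid],
        "protocol": fuge_store[run.protocol_lsid],
        "samples": [fuge_store[lsid] for lsid in run.sample_lsids],
        "raw_data": run.raw_data_uri,
    }


store = {
    "urn:lsid:example.org:investigation:i1": {"design": "time course"},
    "urn:lsid:example.org:protocol:p1": {"name": "metabolite extraction"},
    "urn:lsid:example.org:sample:s1": {"organism": "yeast"},
}
run = MetabolomicsRun("run-1", "urn:lsid:example.org:investigation:i1",
                      ["urn:lsid:example.org:sample:s1"],
                      "urn:lsid:example.org:protocol:p1",
                      "file:///data/run-1.raw")
print(resolve(run, store)["investigation"])  # {'design': 'time course'}
```

The raw data stays behind a plain URI because, as the slide notes, a mass spec or NMR format would still have to be defined separately.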
Conclusions • FuGE was created to solve the general problem: • What are the common requirements for a “functional genomics” data standard? • MGED will use FuGE for generating MAGE version 2 • PSI is evaluating FuGE for a protein separation standard format • FuGE-based systems are being implemented by a number of organisations • FuGE could help develop a metabolome format http://fuge.sourceforge.net
Acknowledgements • FuGE has been developed in collaboration with many groups, including: • Angel Pizarro (U Penn) • Paul Spellman (Lawrence Berkeley) • Michael Miller (Rosetta) • Members of Fred Hutchinson CRC, Seattle • RSBI • Various other members of MGED and PSI http://fuge.sourceforge.net
Common.Description • Many classes inherit from Describable • Link to Audit / Security details • URI and text description
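A minimal sketch of the shared base class idea, assuming Python dataclasses: every subclass of `Describable` gets a URI, free-text descriptions and audit/security links without redeclaring them, so a generic tool such as the invented `summarise` function works on any of them. Field names and the `Equipment`/`Software` subclasses are illustrative, audit and security details are reduced to plain strings, and the URI is a placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Describable:
    """Shared base: URI, text descriptions and audit/security links."""
    uri: Optional[str] = None
    descriptions: list[str] = field(default_factory=list)
    audit: list[str] = field(default_factory=list)   # stand-in for Audit objects
    security: Optional[str] = None                   # stand-in for a Security link


@dataclass
class Equipment(Describable):
    # Subclass fields need defaults because the base class fields have them.
    name: str = ""


@dataclass
class Software(Describable):
    name: str = ""
    version: str = ""


def summarise(item: Describable) -> str:
    """Works for any Describable subclass, thanks to the shared base."""
    text = "; ".join(item.descriptions) or "(no description)"
    return f"{type(item).__name__} <{item.uri or 'no URI'}>: {text}"


scanner = Equipment(name="scanner", uri="http://example.org/equipment/1",
                    descriptions=["two-channel array scanner"])
print(summarise(scanner))
print(summarise(Software(name="normaliser")))
```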
Common.Data • Ordered set of Dimensions • Data stored in Matrix • Matrix must be extended with subclasses
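A hypothetical Python sketch of the data structure on this slide: an abstract `Matrix` over an ordered list of `Dimension`s that cannot be used until a subclass supplies storage. `DenseMatrix`, its flat row-major layout and the `get` method are assumptions for illustration; the slide itself only states that Matrix must be extended with subclasses. The gene and sample labels are placeholders.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Dimension:
    name: str
    labels: list[str]


class Matrix(ABC):
    """Abstract data holder; concrete storage is left to subclasses."""

    def __init__(self, dimensions: list[Dimension]):
        self.dimensions = list(dimensions)  # order matters: axis 0, axis 1, ...

    @property
    def shape(self) -> tuple[int, ...]:
        return tuple(len(d.labels) for d in self.dimensions)

    @abstractmethod
    def get(self, *labels: str) -> float:
        """Look up one value by giving one label per dimension."""


class DenseMatrix(Matrix):
    """Hypothetical subclass storing values in a flat, row-major list."""

    def __init__(self, dimensions: list[Dimension], values: list[float]):
        super().__init__(dimensions)
        expected = 1
        for n in self.shape:
            expected *= n
        if len(values) != expected:
            raise ValueError(f"expected {expected} values, got {len(values)}")
        self.values = list(values)

    def get(self, *labels: str) -> float:
        if len(labels) != len(self.dimensions):
            raise ValueError("one label per dimension is required")
        # Row-major offset: ((i0 * n1) + i1) * n2 + i2 ...
        index = 0
        for dim, label in zip(self.dimensions, labels):
            index = index * len(dim.labels) + dim.labels.index(label)
        return self.values[index]


m = DenseMatrix([Dimension("gene", ["g1", "g2"]),
                 Dimension("sample", ["s1", "s2", "s3"])],
                [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(m.shape, m.get("g2", "s1"))  # (2, 3) 4.0
```

Keeping the dimensions ordered on the abstract base means any subclass, dense or sparse, can interpret label lookups the same way.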