The Functional Genomics Experiment Object Model (FuGE)

Andrew Jones, School of Computer Science, University of Manchester; MGED Society

What is FuGE? • Various groups have tried to fuse MAGE and PEDRo in the past • Such a model would be difficult to manage • FuGE is a model of the common components of functional genomics experiments • Aims to help the development of data standards • Should allow some cross-compatibility between different ‘omics experiments • Microarray & proteome standards will use parts of FuGE for some data formats

So, what is FuGE? • An object model in UML (close to 1st stable release) • An XML Schema (in development) • A software API (will be created from the UML) • FuGE uses ontologies extensively, such as the MGED Ontology or its successor (FuGO) • Developed by members of MGED / PSI with input from cross-omics experimentalists, e.g. RSBI

What is FuGE not…? • Not an effort to create one data standard for all lab techniques • This problem is hard at a technical level, and reaching agreement among all groups is very hard • Not a model for metabolomics metadata • But it might help in the development of one • …and we would like to encourage input from the metabolomics community

FuGE Structure • 2 sections: Common and Bio • Common – components that aid the development of a rich data standard • Protocols, external references, auditing and security settings • Bio – biology-specific components • Biological (or chemical) materials, bio sequences • Summary of an investigation structure • References to the data models specific to each domain

Protocols • Protocols have a set of ordered atomic actions • Actions are user-entered text or ontology terms • Protocols can be associated with Software and Equipment • Protocols, Software and Equipment can have a set of defined Parameters • Mechanism for defining a standard protocol, and an instance of a protocol (date, operator…) • Nested protocols can be defined for representing complex procedures • An Action can be a reference to another Protocol
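
The protocol structure on this slide can be sketched in a few lines of Python. This is only an illustration of the ideas (ordered actions, nested protocols, and a separate record for each time a protocol is applied); the class and field names are assumptions made for this sketch and are not taken from the FuGE UML.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Parameter:
    name: str
    unit: Optional[str] = None  # ideally an ontology term


@dataclass
class Action:
    # An action is either free text / an ontology term,
    # or a reference to another (nested) Protocol.
    text: Optional[str] = None
    child_protocol: Optional[Protocol] = None


@dataclass
class Protocol:
    """The standard definition of a procedure (hypothetical names)."""
    name: str
    actions: list[Action] = field(default_factory=list)  # order matters
    parameters: list[Parameter] = field(default_factory=list)
    software: list[str] = field(default_factory=list)
    equipment: list[str] = field(default_factory=list)

    def flatten(self) -> list[str]:
        """Expand nested protocols into one ordered list of step texts."""
        steps: list[str] = []
        for action in self.actions:
            if action.child_protocol is not None:
                steps.extend(action.child_protocol.flatten())
            elif action.text is not None:
                steps.append(action.text)
        return steps


@dataclass
class ProtocolApplication:
    """One concrete run of a Protocol: who ran it, when, with which values."""
    protocol: Protocol
    performed_on: date
    operator: str
    parameter_values: dict[str, float] = field(default_factory=dict)


# Illustrative data only: a sample-prep protocol that nests a digest protocol.
digest = Protocol("tryptic digest",
                  actions=[Action("add trypsin"), Action("incubate overnight")],
                  parameters=[Parameter("temperature", "degree Celsius")])
prep = Protocol("sample prep",
                actions=[Action("lyse cells"),
                         Action(child_protocol=digest),
                         Action("desalt")])
run = ProtocolApplication(prep, date(2006, 1, 15), "operator-1",
                          {"temperature": 37.0})
print(prep.flatten())
```

Keeping the definition (Protocol) apart from its use (ProtocolApplication) means the same standard protocol can be referenced by many runs, each with its own date, operator and parameter values.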

FuGE Workflow [Diagram: Materials pass through successive Treatments, then Data Acquisition and Data Transformation, yielding Data. Legend: one symbol marks the inputs and outputs of Protocols, the other marks an instance of some Protocol.]

FuGE Workflow • Materials defined using terms from ontologies • Treatments defined by Protocols • Data represented in domain specific format • FuGE is the “glue” for sticking components together
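
One way to picture the "glue" role is as a small provenance graph in which each applied protocol links its inputs to its outputs. The sketch below is a simplification under that assumption; `Node`, `Step` and `trace_sources` are hypothetical names, not FuGE classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str  # "Material" or "Data"
    name: str


@dataclass
class Step:
    protocol: str  # e.g. "Treatment", "Data Acquisition", "Data Transformation"
    inputs: list
    outputs: list


def trace_sources(target, steps):
    """Walk backwards from target to the nodes that no step produced."""
    produced_by = {out: step for step in steps for out in step.outputs}
    sources, stack, seen = [], [target], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        step = produced_by.get(node)
        if step is None:
            sources.append(node.name)  # nothing produced it: a starting point
        else:
            stack.extend(step.inputs)
    return sorted(sources)


# Illustrative chain: material -> material -> material -> data -> data.
tissue = Node("Material", "tissue")
extract = Node("Material", "extract")
labelled = Node("Material", "labelled extract")
raw = Node("Data", "raw data")
normalised = Node("Data", "normalised data")

steps = [
    Step("Treatment", [tissue], [extract]),
    Step("Treatment", [extract], [labelled]),
    Step("Data Acquisition", [labelled], [raw]),
    Step("Data Transformation", [raw], [normalised]),
]
print(trace_sources(normalised, steps))
```

The data nodes themselves would hold content in a domain-specific format; the graph only records how they connect.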

Other useful components • Each object can be tagged with audit info: • Who made a change, when, what type of change • Security information: • users, groups for accessing/changing data • Consistent mechanism for identifying objects • Life sciences IDs (LSIDs) used to uniquely ID components • Objects can be referenced across documents • Mechanism for linking to external databases, literature refs and ontologies
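
As a rough sketch of how identification and auditing might fit together, the code below parses an LSID of the general form `urn:lsid:authority:namespace:object[:revision]` and attaches an audit trail to an identified object. The class names and the example identifier are assumptions for illustration, and the audit fields simply mirror the "who, when, what type of change" wording of the slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Lsid:
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    @classmethod
    def parse(cls, text: str) -> Lsid:
        parts = text.split(":")
        well_formed = (
            len(parts) in (5, 6)
            and parts[0].lower() == "urn"
            and parts[1].lower() == "lsid"
            and all(parts[2:])  # no empty components
        )
        if not well_formed:
            raise ValueError(f"not an LSID: {text!r}")
        return cls(*parts[2:])

    def __str__(self) -> str:
        fields = [self.authority, self.namespace, self.object_id]
        if self.revision is not None:
            fields.append(self.revision)
        return "urn:lsid:" + ":".join(fields)


@dataclass
class AuditEntry:
    who: str
    when: datetime
    action: str  # e.g. "creation", "modification"


@dataclass
class Identifiable:
    """Hypothetical base for any object that carries an ID and audit info."""
    lsid: Lsid
    audit_trail: list[AuditEntry] = field(default_factory=list)

    def record(self, who: str, action: str) -> None:
        self.audit_trail.append(
            AuditEntry(who, datetime.now(timezone.utc), action))


# Illustrative identifier; example.org is a placeholder authority.
protocol = Identifiable(Lsid.parse("urn:lsid:example.org:Protocol:42"))
protocol.record("a.curator", "creation")
print(protocol.lsid, len(protocol.audit_trail))
```

Because the identifier is a plain string with a fixed shape, another document can reference the same object simply by quoting it.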

Investigation model • Stores a summary of the investigation to facilitate queries • Purpose of investigation (hypothesis) • Design of the investigation • e.g. strain differences, gene knockout, drug doses, time course • Stores the important variables • Values from ontology e.g. gene names, units etc… • Links from variables to relevant data items
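
The summary described here can be thought of as a small index from experimental variables to data items. The following sketch assumes hypothetical class names (`Investigation`, `FactorValue`) and invented example values; it only shows the kind of query such a summary is meant to make easy.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FactorValue:
    factor: str                  # the variable, e.g. "dose"
    value: str                   # ideally an ontology term
    unit: Optional[str] = None
    data_ids: list[str] = field(default_factory=list)  # links to data items


@dataclass
class Investigation:
    hypothesis: str
    design_types: list[str]
    factor_values: list[FactorValue] = field(default_factory=list)

    def data_for(self, factor: str, value: str) -> list[str]:
        """Return the data items recorded under one value of one variable."""
        hits: list[str] = []
        for fv in self.factor_values:
            if fv.factor == factor and fv.value == value:
                hits.extend(fv.data_ids)
        return hits


# Illustrative content only.
study = Investigation(
    hypothesis="Compound X changes the liver proteome in a dose-dependent way",
    design_types=["drug dose", "time course"],
    factor_values=[
        FactorValue("dose", "10", "mg/kg", ["data-1", "data-2"]),
        FactorValue("dose", "50", "mg/kg", ["data-3"]),
    ],
)
print(study.data_for("dose", "50"))
```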

Benefits of shared components • Queries over common annotation • Samples, hypotheses, protocols • Shared software for experimental annotation and analysis • Microarrays, proteomics and metabolomics (and other experiments!) performed in same lab • Developing standards for each technique is a hard problem • Shared resources could alleviate the problems (audit, security, identifying objects, ontologies)

Using FuGE in Practice • Import parts of the UML or XML Schema and extend them with domain-specific components • Example: Attempting to integrate FuGE with our Manchester metabolomics database • Reference a FuGE entry for investigation structure and bio samples • Define ontologies and use FuGE as it is for experimental metadata • This would not include a format for mass spec or NMR data, which would also be needed
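
To make the division of labour concrete, here is a minimal sketch of a domain-specific record that points at FuGE entries for the investigation and samples and adds only what FuGE does not cover (the instrument data file). Everything named here (`FugeReference`, `MetabolomicsRun`, the identifiers and file name) is hypothetical and does not describe the Manchester database.

```python
from dataclasses import dataclass


@dataclass
class FugeReference:
    """Pointer to an entry held in a FuGE document, by identifier only."""
    identifier: str
    kind: str  # "Investigation" or "Material"


@dataclass
class MetabolomicsRun:
    """Hypothetical domain-specific record layered on FuGE metadata."""
    investigation: FugeReference
    samples: list
    # The parts FuGE does not provide: the instrument data itself.
    instrument_type: str  # "mass spec" or "NMR"
    raw_data_file: str

    def unresolved(self, known_ids):
        """Return referenced identifiers missing from the known FuGE entries."""
        refs = [self.investigation, *self.samples]
        return [r.identifier for r in refs if r.identifier not in known_ids]


# Illustrative identifiers and file name.
run = MetabolomicsRun(
    investigation=FugeReference("inv-1", "Investigation"),
    samples=[FugeReference("mat-1", "Material"),
             FugeReference("mat-2", "Material")],
    instrument_type="mass spec",
    raw_data_file="run_001.dat",
)
print(run.unresolved({"inv-1", "mat-1"}))
```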

Conclusions • FuGE was created to solve the general problem: • What are the common requirements for a “functional genomics” data standard? • MGED will use FuGE for generating MAGE version 2 • PSI evaluating FuGE for protein separation standard format • FuGE-based systems being implemented by a number of organisations • FuGE could help develop a metabolome format http://fuge.sourceforge.net

Acknowledgements • FuGE has been developed in collaboration with many groups, including: • Angel Pizarro (U Penn) • Paul Spellman (Lawrence Berkeley) • Michael Miller (Rosetta) • Members of Fred Hutchinson CRC, Seattle • RSBI • Various other members of MGED and PSI

Describable / Identifiable

Common.Description • Many classes inherit from Describable • Link to Audit / Security details • URI and text description

Protocol

Audit

Investigation

Material

Common.Data • Ordered set of Dimensions • Data stored in Matrix • Matrix must be extended with subclasses
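
The three bullets above suggest an abstract matrix whose storage is left to subclasses. The sketch below is one possible reading of that idea: `Dimension`, `Matrix` and `InMemoryMatrix` are names chosen for this example, and the row-major in-memory layout is an assumption rather than anything specified by FuGE.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from math import prod


@dataclass
class Dimension:
    name: str
    labels: list  # one label per position along this dimension


class Matrix(ABC):
    """Abstract data holder: knows its ordered dimensions, not its storage."""

    def __init__(self, dimensions):
        self.dimensions = list(dimensions)  # order is significant

    @property
    def shape(self):
        return tuple(len(d.labels) for d in self.dimensions)

    @abstractmethod
    def get(self, *index):
        """Return the value at one position, one index per dimension."""


class InMemoryMatrix(Matrix):
    """One possible subclass: values held in a flat, row-major list."""

    def __init__(self, dimensions, values):
        super().__init__(dimensions)
        values = list(values)
        if len(values) != prod(self.shape):
            raise ValueError("number of values does not match the dimensions")
        self._values = values

    def get(self, *index):
        if len(index) != len(self.dimensions):
            raise IndexError("one index per dimension is required")
        offset = 0
        for i, dim in zip(index, self.dimensions):
            if not 0 <= i < len(dim.labels):
                raise IndexError(f"index {i} out of range for {dim.name}")
            offset = offset * len(dim.labels) + i
        return self._values[offset]


# Illustrative 2 x 3 matrix: samples by features.
m = InMemoryMatrix(
    [Dimension("sample", ["s1", "s2"]), Dimension("feature", ["f1", "f2", "f3"])],
    [1, 2, 3, 4, 5, 6],
)
print(m.shape, m.get(1, 2))
```

A domain-specific format could supply its own subclass, for example one that reads values from an external file, without changing how the dimensions are described.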
