ArrayExpress www.ebi.ac.uk/arrayexpress Ugis Sarkans EMBL - EBI
Outline • why the domain model is not simple • ArrayExpress object model • ArrayExpress implementation status • future developments
Underlying principles • must be able to accommodate the needs of a technology that is under constant development • must be able to manage data in the absence of standard measurement units and of standards for reliability information • gene expression data are meaningful only in the context of the experimental conditions under which they were obtained • controlled vocabularies and ontologies are needed for unambiguous sample annotation • MIAME-compliant
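A minimal sketch of how a controlled vocabulary makes sample annotation checkable, assuming annotations are simple category/value pairs. The vocabulary, its categories, and the function `invalid_annotations` are hypothetical and invented for illustration; they are not ArrayExpress or MGED terms.

```python
# Hypothetical controlled vocabulary: allowed terms per annotation category.
CONTROLLED_VOCABULARY = {
    "organism": {"Homo sapiens", "Mus musculus", "Saccharomyces cerevisiae"},
    "sex": {"male", "female", "unknown"},
}

def invalid_annotations(annotation, vocabulary=CONTROLLED_VOCABULARY):
    """Return (category, value) pairs whose category is unknown or whose
    value is not an allowed term for that category."""
    problems = []
    for category, value in annotation.items():
        allowed = vocabulary.get(category)
        if allowed is None or value not in allowed:
            problems.append((category, value))
    return problems

sample = {"organism": "Homo sapiens", "sex": "M", "tissue": "liver"}
print(invalid_annotations(sample))  # [('sex', 'M'), ('tissue', 'liver')]
```

Free-text values such as "M" are rejected rather than guessed at, which is what keeps annotations comparable across submissions.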
Motivation for 2 object models • many spots - one gene • raw data - cleaned-up data - ratios - normalizations - higher-level analysis • how detailed does the sample description need to be? • for data mining we need ways to unify several datasets: • array features across different array platforms • samples from different experiments • various raw and derived measurements
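One way to picture "many spots - one gene" and cross-platform unification is a sketch that collapses per-feature values to per-gene values and then keeps only genes present in every dataset. The feature identifiers, the gene mappings, and the choice of a plain mean are all assumptions made for illustration, not the ArrayExpress data-mining model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical feature-to-gene maps for two array platforms.
PLATFORM_A = {"A_001": "geneX", "A_002": "geneX", "A_003": "geneY"}
PLATFORM_B = {"B_17": "geneX", "B_42": "geneY"}

def unify_by_gene(measurements, feature_to_gene):
    """Collapse per-feature values into one value per gene (mean over its spots)."""
    per_gene = defaultdict(list)
    for feature, value in measurements.items():
        gene = feature_to_gene.get(feature)
        if gene is not None:  # skip features with no gene mapping
            per_gene[gene].append(value)
    return {gene: mean(values) for gene, values in per_gene.items()}

def merge_experiments(*gene_tables):
    """Keep genes measured in every experiment, as one tuple of values per gene."""
    common = set.intersection(*(set(table) for table in gene_tables))
    return {gene: tuple(table[gene] for table in gene_tables) for gene in sorted(common)}

exp1 = unify_by_gene({"A_001": 1.0, "A_002": 3.0, "A_003": 0.5}, PLATFORM_A)
exp2 = unify_by_gene({"B_17": 2.5, "B_42": 0.25}, PLATFORM_B)
print(merge_experiments(exp1, exp2))  # {'geneX': (2.0, 2.5), 'geneY': (0.5, 0.25)}
```

The gene identifier acts as the shared key; choosing how to summarise several spots per gene is exactly the kind of decision that separates a raw-data model from an analysis-oriented one.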
Scope of ArrayExpress object models • usable for a public repository as well as a laboratory database (e.g., as a part of LIMS) • implementation of "intermediate" models possible • mapping to RDBMS tables - not necessarily straightforward • models and documentation available at www.ebi.ac.uk/arrayexpress
ArrayExpress - features • able to import data in MAML format • can deal with both raw and processed data • independent of: • experimental platforms • image analysis methods • data normalization methods • object model-based query mechanism • will support the upcoming OMG standard for expression data
Key constructs in the AE object model • structured sample descriptions • notion of ExpressionValueSet • several dimensions for ExpressionValues • Transformations working on ExpressionValueSets and their dimensions
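A minimal sketch of an ExpressionValueSet with several dimensions and a Transformation acting on it. The three dimensions used here (feature, hybridization, quantitation type), the field names, and the `green_red_ratio` step are assumptions for illustration; the slide does not specify the actual dimensions or API of the AE object model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical key: (feature, hybridization, quantitation type).
Key = Tuple[str, str, str]

@dataclass
class ExpressionValueSet:
    """Assumed structure: expression values indexed along three dimensions."""
    features: Tuple[str, ...]
    hybridizations: Tuple[str, ...]
    quantitation_types: Tuple[str, ...]
    values: Dict[Key, float]
    derived_by: str = ""  # name of the Transformation that produced this set, if any

    def get(self, feature: str, hyb: str, qtype: str) -> float:
        return self.values[(feature, hyb, qtype)]

@dataclass
class Transformation:
    """A named operation mapping one ExpressionValueSet to a new one."""
    name: str
    func: Callable[[ExpressionValueSet], ExpressionValueSet]

    def apply(self, source: ExpressionValueSet) -> ExpressionValueSet:
        result = self.func(source)
        result.derived_by = self.name  # record which step produced the derived set
        return result

def green_red_ratio(evs: ExpressionValueSet) -> ExpressionValueSet:
    """Collapse the quantitation-type dimension ("green", "red") into one "ratio"."""
    values = {
        (f, h, "ratio"): evs.get(f, h, "green") / evs.get(f, h, "red")
        for f in evs.features
        for h in evs.hybridizations
    }
    return ExpressionValueSet(evs.features, evs.hybridizations, ("ratio",), values)

raw = ExpressionValueSet(
    features=("spot1", "spot2"),
    hybridizations=("hyb1",),
    quantitation_types=("green", "red"),
    values={
        ("spot1", "hyb1", "green"): 200.0, ("spot1", "hyb1", "red"): 100.0,
        ("spot2", "hyb1", "green"): 50.0, ("spot2", "hyb1", "red"): 100.0,
    },
)
ratios = Transformation("green/red ratio", green_red_ratio).apply(raw)
print(ratios.get("spot1", "hyb1", "ratio"), ratios.get("spot2", "hyb1", "ratio"))  # 2.0 0.5
```

Keeping the dimensions explicit means a transformation can change one of them (here the quantitation types) while leaving the others untouched, and the input set is never modified.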
Structured representation of sample and treatment relations • [diagram: sample source, a new state of sample source, primary samples 1 and 2, derived samples 1 and 2, extracts 1 and 2, labeled extracts 1 and 2, hybridization; linked by treatment, extraction and labeling steps]
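The sample and treatment relations can be sketched as a graph in which each material records what it was made from and by which step. The classes `BioMaterial` and `Hybridization`, and the exact chain used in the example, are hypothetical simplifications of the diagram, not the AE sample submodel itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class BioMaterial:
    """Hypothetical node: a sample source, sample, extract or labeled extract."""
    name: str
    kind: str
    made_from: List["BioMaterial"] = field(default_factory=list)
    protocol: Optional[str] = None  # step that produced it, e.g. "treatment"

@dataclass
class Hybridization:
    name: str
    labeled_extracts: List[BioMaterial]

def lineage(material: BioMaterial) -> List[BioMaterial]:
    """Depth-first walk back through made_from links, visiting each node once."""
    order, stack, seen = [], [material], set()
    while stack:
        m = stack.pop()
        if id(m) in seen:
            continue
        seen.add(id(m))
        order.append(m)
        stack.extend(reversed(m.made_from))
    return order

def sample_sources(hyb: Hybridization) -> Set[str]:
    """Names of root materials (made from nothing) behind a hybridization."""
    return {m.name for ext in hyb.labeled_extracts for m in lineage(ext) if not m.made_from}

source = BioMaterial("sample source", "source")
p1 = BioMaterial("primary sample 1", "sample", [source], "treatment")
p2 = BioMaterial("primary sample 2", "sample", [source], "treatment")
e1 = BioMaterial("extract 1", "extract", [p1], "extraction")
e2 = BioMaterial("extract 2", "extract", [p2], "extraction")
l1 = BioMaterial("labeled extract 1", "labeled extract", [e1], "labeling")
l2 = BioMaterial("labeled extract 2", "labeled extract", [e2], "labeling")
hyb = Hybridization("hybridization 1", [l1, l2])

print([m.name for m in lineage(l1)])
print(sample_sources(hyb))  # {'sample source'}
```

Because every step is an explicit link, a measured value can always be traced back to the experimental conditions of its sample source.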
Microarray expression value representation • [diagram: expression value types: primary measurements, derived values (e.g., green/red ratios); levels: primary images, composite images, primary spots, composite spots]
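A sketch of how derived values might flow from primary spots to composite spots: compute a log ratio per primary spot, then combine replicate spots into one composite value. The spot-to-composite mapping and the use of an average of log2 ratios are assumptions for illustration; the slide names the levels but not the combination rule.

```python
import math
from collections import defaultdict

# Hypothetical layout: replicate primary spots grouped under composite spots.
SPOT_TO_COMPOSITE = {"s1": "c1", "s2": "c1", "s3": "c2"}

def primary_log_ratios(intensities):
    """Derived value per primary spot: log2(green / red)."""
    return {spot: math.log2(green / red) for spot, (green, red) in intensities.items()}

def composite_values(log_ratios, spot_to_composite):
    """Combine replicate primary spots by averaging their log ratios."""
    groups = defaultdict(list)
    for spot, value in log_ratios.items():
        groups[spot_to_composite[spot]].append(value)
    return {comp: sum(vals) / len(vals) for comp, vals in groups.items()}

intensities = {"s1": (400.0, 100.0), "s2": (100.0, 100.0), "s3": (50.0, 200.0)}
composites = composite_values(primary_log_ratios(intensities), SPOT_TO_COMPOSITE)
print(composites)  # {'c1': 1.0, 'c2': -2.0}
```

Averaging in log space treats two-fold up- and down-regulation symmetrically; other summaries (median, weighted mean) would fit the same structure.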
Current status • object model - stable, supports current MIAME • physical database schema • MAML data loader • populated with one dataset from EMBL • currently accessible through SQL
In development • data loader - updated as MAML evolves • annotation & MAML export tool • Web interface to ArrayExpress • programmatic interface will follow
Proposed architecture • [diagram: MAML data, data submission & curation database, curation pipeline, ArrayExpress data warehouse, application server, Web server, image server?]
Future developments • will support upcoming OMG standard for gene expression data (XML, queries) • diagrammatic interface to sample description submodel • integration with other databases • analytical tools running on top of ArrayExpress • data curation pipeline development
Acknowledgements • MGED - MIAME, MAML • Incyte - Genomic Knowledge Platform • OMG gene expression data proposal submitters - Rosetta & NetGenics