SysMO-DB and ISA • Katy Wolstencroft, University of Manchester, UK
Data Exchange in SysMO • Data types: Microarray, Metabolomics, Proteomics, Single Cell Data, Metadata • Public data sources • model organism databases (e.g. SGD) • BRENDA …. • Data produced by SysMO • SABIO-RK, iChiP, MeMo …. • Local databases & files • Excel spreadsheets • The most common format for experimental data • Variable descriptions of data • Little adoption of community controlled vocabulary terms
Challenges • Enable data to be easily exchanged & integrated • Preserving project autonomy • Working with existing resources • Wikis; CMS: Alfresco, eGroupWare, MediaWiki; Databases: BASE, maxD; Files and spreadsheets • Falling in with common work practices • Exploiting existing resources in the community
Extracting Data • Diagram: JERM extracts data from each project's data store • SysMOLab: Wiki • COSMIC: Alfresco • MOSES: Wiki • BaCell-SysMO: Alfresco • Another project: a data store
JERM • JERM: “Just Enough Results Model” • Minimum information to exchange data • What type of data is it? • Microarray, growth curve, enzyme activity… • What was measured? • Gene expression, OD, metabolite concentration… • What do the values in the datasets mean? • Units, time series, repeats… • Which experiment does it relate to? • How was the data created? • SOPs and protocols
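A minimal sketch of how the JERM questions above could map onto a record type. The class and field names are hypothetical; the slides list the questions a JERM answers, not an actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JermRecord:
    """Hypothetical 'just enough' description of one dataset."""
    data_type: str       # what type of data it is, e.g. "microarray"
    measured: str        # what was measured, e.g. "gene expression"
    units: str           # what the values in the dataset mean
    experiment_id: str   # which experiment the data relates to
    sop_refs: List[str] = field(default_factory=list)  # how the data was created

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        required = ("data_type", "measured", "units", "experiment_id")
        missing = [name for name in required if not getattr(self, name).strip()]
        if not self.sop_refs:
            missing.append("sop_refs")
        return missing


record = JermRecord("growth curve", "OD", "", "exp-1")
print(record.missing_fields())
```

The point of the sketch is that a dataset is exchangeable only when every question has an answer, so the check reports which answers are still missing.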
The Idea • 1. For each data type… • Transcriptomics, Proteomics, Metabolomics, Single Cell Data • 2. Define a JERM… (ISA-TAB) • Top-down analysis of standards • Bottom-up analysis of practice • 3. Generate and apply… • JERM template • JERM extractor for data host • Subset registered in SEEK • Access / export through JERM interface / template
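The "JERM extractor for data host" step could be sketched as one small adapter per kind of host, all returning records in the same shape. Everything here is an assumption for illustration: the class names, the in-memory stand-ins for a wiki and a file store, and the record keys are not taken from SysMO-DB.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class JermExtractor(ABC):
    """Hypothetical interface: one subclass per kind of data host."""

    @abstractmethod
    def extract(self) -> List[Dict[str, str]]:
        """Return one JERM-style record per dataset found on the host."""


class InMemoryWikiExtractor(JermExtractor):
    """Stand-in for a wiki host: page title -> metadata found on the page."""

    def __init__(self, pages: Dict[str, Dict[str, str]]):
        self.pages = pages

    def extract(self) -> List[Dict[str, str]]:
        return [{"source": title, **meta} for title, meta in sorted(self.pages.items())]


class InMemoryFileStoreExtractor(JermExtractor):
    """Stand-in for a file store: (file name, data type) pairs."""

    def __init__(self, files: List[Tuple[str, str]]):
        self.files = files

    def extract(self) -> List[Dict[str, str]]:
        return [{"source": name, "data_type": dtype} for name, dtype in self.files]


def collect(extractors: List[JermExtractor]) -> List[Dict[str, str]]:
    """Gather records from every host, leaving the data where it lives."""
    records: List[Dict[str, str]] = []
    for extractor in extractors:
        records.extend(extractor.extract())
    return records
```

Keeping the host-specific logic behind a common interface is one way to preserve project autonomy: each project keeps its own store, and only the extracted subset is registered centrally.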
For publishing • JERM data needs to be related to SOPs, experimental context and other data • JERM must be “MIBBI” compliant for exporting to public repositories • e.g. Microarray data needs to be MIAME compliant
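A compliance check before export could look roughly like the sketch below. The key names are a simplified paraphrase of the six MIAME elements, chosen for this example; the mapping and function are hypothetical, not part of MIBBI or SysMO-DB.

```python
from typing import Dict, List, Optional, Tuple

# Simplified, hypothetical keys standing in for the six MIAME elements.
CHECKLISTS: Dict[str, Tuple[str, List[str]]] = {
    "microarray": (
        "MIAME",
        [
            "raw_data",
            "processed_data",
            "sample_annotation",
            "experimental_design",
            "array_annotation",
            "protocols",
        ],
    ),
}


def check_compliance(
    data_type: str, metadata: Dict[str, str]
) -> Tuple[Optional[str], List[str]]:
    """Return (checklist name, missing items); (None, []) if no checklist is known."""
    if data_type not in CHECKLISTS:
        return None, []
    name, required = CHECKLISTS[data_type]
    missing = [key for key in required if not metadata.get(key)]
    return name, missing


print(check_compliance("microarray", {"raw_data": "array1.cel", "protocols": "sop-3"}))
```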
Minimum Information Models (MIBBI portal: http://www.mibbi.org/index.php/MIBBI_portal)
• CIMR: Core Information for Metabolomics Reporting
• MIABE: Minimal Information About a Bioactive Entity
• MIACA: Minimal Information About a Cellular Assay
• MIAME: Minimum Information About a Microarray Experiment
• MIAME/Env: MIAME / Environmental transcriptomic experiment
• MIAME/Nutr: MIAME / Nutrigenomics
• MIAME/Plant: MIAME / Plant transcriptomics
• MIAME/Tox: MIAME / Toxicogenomics
• MIAPA: Minimum Information About a Phylogenetic Analysis
• MIAPAR: Minimum Information About a Protein Affinity Reagent
• MIAPE: Minimum Information About a Proteomics Experiment
• MIARE: Minimum Information About an RNAi Experiment
• MIASE: Minimum Information About a Simulation Experiment
• MIENS: Minimum Information about an ENvironmental Sequence
• MIFlowCyt: Minimum Information for a Flow Cytometry Experiment
• MIGen: Minimum Information about a Genotyping Experiment
• MIGS: Minimum Information about a Genome Sequence
• MIMIx: Minimum Information about a Molecular Interaction Experiment
• MIMPP: Minimal Information for Mouse Phenotyping Procedures
• MINI: Minimum Information about a Neuroscience Investigation
• MINIMESS: Minimal Metagenome Sequence Analysis Standard
• MINSEQE: Minimum Information about a high-throughput SeQuencing Experiment
• MIPFE: Minimal Information for Protein Functional Evaluation
• MIQAS: Minimal Information for QTLs and Association Studies
• MIqPCR: Minimum Information about a quantitative Polymerase Chain Reaction experiment
• MIRIAM: Minimal Information Required In the Annotation of biochemical Models
• MISFISHIE: Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments
• STRENDA: Standards for Reporting Enzymology Data
• TBC: Tox Biology Checklist
• BioPAX: Biological Pathways Exchange (http://www.biopax.org/)
• FuGE: Functional Genomics Experiment
• MGED: Microarray Experimental Conditions
ISA-TAB • Relating data and its experimental context • Investigation, Study, Assay • TAB = tabular • A format suitable for spreadsheets
“assists in the reporting and local management of experimental metadata (i.e. sample characteristics, technologies used, type of measurements) from studies employing one or a combination of technologies • facilitates submission to international public repositories of genomics, transcriptomics and proteomics studies” • Originally developed for multiple ‘omics data
Current situation @ EBI • NO common representation of complex studies • Independent databases, different metadata representation, format, diverse terminologies etc. • Existing production systems • Submission: MIAMExpress, MAGE-TAB, MAGE-ML (MGED standards); Proteome Harvest, PSI-XML(s) (HUPO-PSI standards) • Storage: ArrayExpress (transcriptomics data files + required experimental descriptors); PRIDE (proteomics data files + required experimental descriptors) • Retrieval
ISA Provides.... • A common framework for describing how your data relates to its experimental context • A common framework for relating different types of data
ISA Provides • Cross-walking between the omics data stores • Relating microarray and proteomics data, etc., if they are part of the same study • Providing a single mechanism for submission to multiple data silos
ISA Defined • Investigation: high level description of the area and the main aims of a project • Study: a particular biological hypothesis or analysis • Assay: specific, individual experiments required to be undertaken together in order to address the study hypotheses
ISA in SysMO • Investigation: main aims of SysMO projects • Analysis of Central Carbon Metabolism of Sulfolobus solfataricus under varying temperatures • Study: a collection of experiments designed to answer a particular biological question • Comparison of S. solfataricus grown at 70 and 80 °C • Assay: individual experiments in the study • Comparison of transcriptome at 70 and 80 °C (cDNA microarray) • Comparison of proteome at 70 and 80 °C (Protein expression profiling) • Enzyme activity tests for S. solfataricus (Assay types) • Intracellular metabolomics of S. solfataricus at 70 and 80 °C (Metabolomics)
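The Investigation / Study / Assay nesting in this example can be sketched as three small record types. The class names and the `outline` helper are illustrative assumptions and do not reproduce the ISA-TAB file format; the titles come from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    title: str
    assay_type: str


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)

    def outline(self) -> List[str]:
        """Render the hierarchy as indented lines, one per level."""
        lines = [f"Investigation: {self.title}"]
        for study in self.studies:
            lines.append(f"  Study: {study.title}")
            for assay in study.assays:
                lines.append(f"    Assay: {assay.title} ({assay.assay_type})")
        return lines


investigation = Investigation(
    "Analysis of Central Carbon Metabolism of Sulfolobus solfataricus "
    "under varying temperatures",
    [
        Study(
            "Comparison of S. solfataricus grown at 70 and 80 degrees C",
            [
                Assay("Comparison of transcriptome", "cDNA microarray"),
                Assay("Comparison of proteome", "Protein expression profiling"),
                Assay("Intracellular metabolomics", "Metabolomics"),
            ],
        )
    ],
)
print("\n".join(investigation.outline()))
```

Because the transcriptome and proteome assays hang off the same study, the structure itself records that their data belong together.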
ISA in SysMO • Assays linked to data files • Data files linked together • Assays and data files linked to protocols and SOPs • ISA data is available to everyone in the consortium • Data files and SOPs may be shared or kept private
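The sharing rule on this slide (ISA metadata visible to the whole consortium, data files and SOPs shared or private) could be sketched as follows. The types, the `shared` flag, and the owner-based rule are assumptions made for the example, not a description of how SEEK implements access control.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Asset:
    """A data file or SOP; hypothetical owner/shared model."""
    name: str
    owner: str
    shared: bool = False


@dataclass
class AssayRecord:
    title: str
    data_files: List[Asset] = field(default_factory=list)
    sops: List[Asset] = field(default_factory=list)


def consortium_view(assay: AssayRecord, viewer: str) -> Dict[str, object]:
    """Assay metadata is always visible; assets only if shared or owned by the viewer."""

    def visible(assets: List[Asset]) -> List[str]:
        return [a.name for a in assets if a.shared or a.owner == viewer]

    return {
        "title": assay.title,
        "data_files": visible(assay.data_files),
        "sops": visible(assay.sops),
    }


assay = AssayRecord(
    "Comparison of proteome",
    data_files=[Asset("proteome_70.xls", "alice"), Asset("proteome_80.xls", "alice", shared=True)],
    sops=[Asset("protein_extraction_sop.doc", "bob", shared=True)],
)
print(consortium_view(assay, "bob"))
```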
Advantages • A common structure across the consortium • Can be bundled together with data files to produce a common export format • Allows automated submission to public omics stores • ArrayExpress, PRIDE, etc. • SysMO consortium members need to record metadata only once
SEEK + JERM • JERM: experimental data; homogenised terminology and values in the datasets themselves • Metadata, based on ISA-TAB: Investigation, Study, Assay; experimental conditions; factors studied • Also in the diagram: People, Projects, Models, SOPs, Workflows
Acknowledgements • SysMO-DB Team • SysMO-PALS • myGrid, EML and JWS Online teams • OMII-UK, Uni Southampton • EMBL-EBI, MCISB