This document discusses the ArrayExpress production experience, including data acquisition, validation, extension, and downloads. It also covers the long-term future of ArrayExpress and provides a tutorial on submitting in MAGE-TAB format.
MAGE-TAB - The ArrayExpress Production Experience Helen Parkinson, PhD
Content • All change at ArrayExpress • Data acquisition • Validation • Extension • Downloads • Long Term Future • Tutorial – submitting in MAGETAB format
[Diagram: data flow with components labelled MAGEML, M.EXPRESS, MAGETABULATOR, MAGETAB, Tracking, AE, MIGRATION and AE2]
Data acquisition • MAGETAB data acquisition is integrated with existing tab2mage submissions • MAGETAB export is being added to the MIAMExpress system • All MAGE-ML submissions will be converted to MAGETAB • We will unify data acquisition on MAGETAB • We decided to do most curation/validation/ontology matching at the end for MAGETAB submissions • MAGETAB makes curator edits and user updates much easier • Human-readable tab-delimited formats = efficient curation • 1600 experiments processed (1600/3700) • All curated • A subset of ArrayExpress MAGETAB data will be re-curated at migration
Automated processing and validation • Sections • MAGETAB Column Headers • MAGETAB Column Orders • MAGETAB Content – length, terms • External data files – released monthly • vs. ArrayExpress content • MIAME score • DW candidates
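The header, column-order and content-length checks listed on this slide can be illustrated with a minimal sketch. This is not the ArrayExpress validation code: the function name `validate_sdrf`, the short list of node columns, the attribute pattern and the 255-character limit are all assumptions chosen for illustration, covering only a small subset of MAGE-TAB SDRF headers.

```python
import csv
import io
import re

# Assumed subset of SDRF node columns, in the left-to-right order expected here.
NODE_ORDER = [
    "Source Name",
    "Sample Name",
    "Extract Name",
    "Labeled Extract Name",
    "Hybridization Name",
    "Array Data File",
]

# Assumed attribute columns of the form "Name[qualifier]".
ATTRIBUTE_PATTERN = re.compile(
    r"^(Characteristics|Factor Value|Comment|Parameter Value)\s*\[[^\]]+\]$"
)

# Illustrative content-length limit, not a value taken from the specification.
MAX_CELL_LENGTH = 255


def validate_sdrf(text):
    """Return a list of error messages for a tab-delimited SDRF-like table."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ["file is empty"]

    errors = []
    header = [name.strip() for name in rows[0]]

    # Header check and column-order check.
    last_pos = -1
    for col, name in enumerate(header, start=1):
        if name in NODE_ORDER:
            pos = NODE_ORDER.index(name)
            if pos < last_pos:
                errors.append(f"column {col}: '{name}' is out of order")
            else:
                last_pos = pos
        elif not ATTRIBUTE_PATTERN.match(name):
            errors.append(f"column {col}: unrecognised header '{name}'")

    # Content checks: row width and cell length.
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            errors.append(
                f"row {row_no}: expected {len(header)} cells, found {len(row)}"
            )
        for cell in row:
            if len(cell) > MAX_CELL_LENGTH:
                errors.append(f"row {row_no}: cell longer than {MAX_CELL_LENGTH}")
    return errors


good = (
    "Source Name\tCharacteristics[organism]\tSample Name\tFactor Value[dose]\n"
    "liver 1\tHomo sapiens\tsample 1\t10\n"
)
bad = "Sample Name\tSource Name\tOrganism\nS1\tsrc1\n"

print(validate_sdrf(good))
for message in validate_sdrf(bad):
    print(message)
```

Returning a list of messages rather than stopping at the first problem fits the workflow described in the talk, where curation and validation are done at the end and a curator wants to see every issue in one pass.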
Extensibility • Solexa data • Proteomics • Metabolomics • Array Genotype data (Gen2Phen) • Association study data (Gen2Phen, Engage) • Locus specific SNP data • Clinical Data • …..
Downloads • All ArrayExpress data will be available in MAGETAB format (exported directly from AE) • ~90% is currently available and passes checks (the remainder has issues with MAGE-OM to MAGETAB conversion) • More ontology term sources will be added incrementally – NCI Thesaurus/OBI/ArrayExpress Factor Ontology • Beta MAGETAB ArrayExpress Bioconductor Module (Huber, Kauffman) • All MAGETAB generation code is available • All validation code is available
Ontologies • Working to develop OBI to replace the MGED Ontology • Generating a sample/factor ontology for ArrayExpress based on data content • Developed in Protégé/OWL format • Will be served from OLS • Also mapping to external ontologies for samples, e.g. NCI Thesaurus • Text mining to annotate external data using dictionaries based on NCI Thesaurus and some custom ones (GEOimporter, tab2mage->MAGETAB) • Data import, meta-analysis
Future: ArrayExpress and Community • ArrayExpress submission in MAGETAB ADF format • All ArrayExpress ADFs in MAGETAB format • Alpha ArrayExpress-MAGETAB Bioconductor MAGETAB importer • AE2 • AE2 data migration • More people post their MAGETAB examples and we agree on a gold standard validated set for typical cases • Community lists of MAGETAB supportive tools where people can register their interests and describe their applications (like GO tools) • Addressing HLA • MAGETAB model, firm up the spec • Decide what factors really are, and whether the MAGE case is still valid – controlled vs uncontrolled variables instead? • Issues with global variables: inter-experiment comparison of compounds needs to know the dose even if the dose doesn’t vary within an experiment
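The global-variable issue in the last bullet can be made concrete with a small sketch. It assumes, purely for illustration, that each sample is a dictionary of annotations and that a "factor" is any attribute whose value varies within the experiment; the function name `split_variables` and the example values are hypothetical.

```python
def split_variables(samples):
    """Split sample attributes into those that vary and those that stay constant.

    samples: list of dicts mapping attribute name -> value.
    Returns (varying, constant) as two dicts.
    """
    varying, constant = {}, {}
    names = sorted({name for sample in samples for name in sample})
    for name in names:
        values = {sample.get(name) for sample in samples}
        if len(values) > 1:
            # Varies within the experiment: would be recorded as a factor.
            varying[name] = sorted(values, key=str)
        else:
            # Constant within the experiment: easily lost if only factors are kept.
            constant[name] = next(iter(values))
    return varying, constant


# Hypothetical experiment: two compounds given at the same dose.
experiment = [
    {"compound": "compound A", "dose": "10 mg/kg"},
    {"compound": "compound B", "dose": "10 mg/kg"},
]

varying, constant = split_variables(experiment)
print(varying)
print(constant)
```

In this sketch the dose ends up among the constant attributes, so a comparison across experiments that reads only the varying factors would never see it. That is the case the slide raises for recording such values explicitly.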
Acknowledgments • Anna Farne • Ele Holloway • James Malone • Margus Lukk ArrayExpress Production Team • Helen Parkinson • Tim Rayner • Faisal Rezwan • Eleanor Williams • Mengyao Zhao • Holly Zheng • Mohammad Shojatalab ArrayExpress Development Team • Funding EC - FELICS, EMERALD, Gen2Phen, MUGEN NIH - MAGE grant
Tutorial • Creation of MAGETAB templates • Completion of a pre-made template • Curation • Scoring and validation templates • Viewing Data in ArrayExpress • Backend of the template generation/tracking system • www.ebi.ac.uk/~parkinso/MAGETAB_tutorial/