mzTab: Proposal for a Simple Data Format for Proteomics Results
Current Situation
• The need for standard data formats has become generally accepted
• Proteomics techniques are constantly evolving
• Proposed standard formats had to become very complex to adequately capture proteomics data
  • mzIdentML for identification data
  • mzQuantML for quantitative data
• Using these data formats effectively requires sophisticated bioinformatics knowledge
• Many researchers are still used to “looking” at their data in MS Excel
Communication of Proteomics Results
• Proteomics resources require a mechanism to exchange basic proteomics results simply and efficiently
• Collaboration with colleagues from other scientific fields is increasingly important
• Necessity to share proteomics results with researchers outside of proteomics
• Need to make proteomics data easily accessible
Potential Current Problems
• Currently proposed standard formats are difficult to use without the Java APIs
• “Complete” standard formats are too complex and too large for quickly sharing the essential results
• Quick scripts (e.g. in Perl) for specific research questions are not easily possible
• A large amount of potential innovation could be lost
• Reading files requires special software
• Further processing of the data (e.g. with statistical tools) is not easily possible
• No standard tools to read / write mz*ML files are available
• Custom-built software is required for many use cases otherwise fulfilled by “Excel & friends”
mzTab – Aim
• To provide a simple and efficient way of exchanging proteomics data
• Which protein / peptide was identified in a given experimental setting
• Easy to update and maintain
• Easy to use for the proteomics community, systems biologists, and providers of knowledge bases
mzTab – Target Audience
• Proteomics repositories (e.g. PRIDE, PeptideAtlas)
• Knowledge base resources (e.g. UniProt, HPRD)
• Researchers outside of proteomics
• Researchers analyzing proteomics data with limited bioinformatics knowledge / support
mzTab – Proposed Concept
• A tab-delimited file format
• Goals
  • Content should be “readable” using MS Excel
  • Should contain the minimal information that proteomics repositories / knowledge bases need to exchange data
  • Data should be easily accessible using, e.g., scripting languages
  • One file should be able to contain multiple experiments / proteins from different resources
• Aim: to represent the result of a query to, e.g., PRIDE using this format
• Provide a simple summary of proteomics results
• Every entry contains a reference to the source data (in mzIdentML / mzQuantML format)
mzTab – Proposed Concept
• What the format does NOT aim to do:
  • Replace mzIdentML or mzQuantML
  • Contain the complete data of a proteomics experiment
  • Provide detailed evidence for the data
  • Allow a researcher to recreate the process which led to the results
  • Conform to reporting requirements (MIAPE, journal guidelines, etc.)
• In short: be complete in any way
mzTab – Possible Format Specification
• Three sections
  • (Optional) Metadata section
  • (Required) Protein section
  • (Optional) Peptide section
• Can report proteomics data at different levels
  • Single experiments
  • Multiple (possibly linked) experiments
  • Data generated as the result of a query (possibly to multiple resources)
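The three-section layout can be illustrated with a short Python sketch that splits a file into its sections. It assumes the `----name` / `----END` delimiters seen in the example slides of this proposal; the function names `split_sections` and `missing_required` are hypothetical and not part of any specification.

```python
# Sketch only: section delimiters are taken from the slide examples
# (e.g. "----proteins" ... "----END"); helper names are hypothetical.

REQUIRED_SECTIONS = ("proteins",)  # per the slide, only the protein section is required


def split_sections(text):
    """Return {section_name: [content lines]} for each ----name / ----END block."""
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "----END":
            # Must be checked before the generic "----" test below.
            current = None
        elif stripped.startswith("----"):
            current = stripped[4:].strip().lower()
            sections[current] = []
        elif current is not None and stripped:
            # Keep the original line so tab separators are preserved.
            sections[current].append(line)
    return sections


def missing_required(sections):
    """List the required sections that are absent from a parsed file."""
    return [name for name in REQUIRED_SECTIONS if name not in sections]
```

A caller could reject a file when `missing_required(...)` returns a non-empty list and treat the metadata and peptide sections as optional.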
mzTab – Metadata Section
----metadata
PRIDE_16649-title: The Synaptic Proteome during Development and Plasticity of the Mouse Visual Cortex
PRIDE_16649-species: [NEWT, 10090, Mouse,]
PRIDE_16649-tissue: [EFO, EFO:0000916, visual cortex,]
PRIDE_16649-instrument[1]-type: [MS, MS:1000287, TOF-MS,]
PRIDE_16649-search_engine: [MS, MS:1001207, Mascot, ]
PRIDE_16649-contact[1]-name: August B Smit
PRIDE_16649-contact[1]-email: guus.smit@cncr.vu.nl
PRIDE_16649-url: http://www.ebi.ac.uk/pride/q.do?accession=16649
----END
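As a rough illustration of how such metadata lines could be read from a script, the sketch below splits each line into a unit identifier (e.g. `PRIDE_16649`), a field name, and a value. Interpreting the bracketed values as four comma-separated slots, and the slot names `cv`, `accession`, `name`, and `value`, are assumptions made for this example; the slides do not name them. The function `parse_metadata_line` is hypothetical.

```python
# Sketch only: the four slot names are assumptions, not taken from the slides.
PARAM_SLOTS = ("cv", "accession", "name", "value")


def parse_metadata_line(line):
    """Split 'UNIT-field: value' into (unit, field, value).

    Bracketed values such as '[NEWT, 10090, Mouse,]' become a dict with
    the assumed slot names; anything else is returned as a plain string.
    """
    key, _, value = line.partition(":")      # first colon ends the key
    unit, _, field = key.strip().partition("-")
    value = value.strip()
    if value.startswith("[") and value.endswith("]"):
        # Naive split: a comma inside a term name would break this.
        parts = [p.strip() for p in value[1:-1].split(",")]
        parts += [""] * (len(PARAM_SLOTS) - len(parts))
        value = dict(zip(PARAM_SLOTS, parts))
    return unit, field, value
```

Splitting on the first colon only keeps URL values such as the `PRIDE_16649-url` entry intact.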
mzTab – Protein Section
----proteins
Accession … reliability peptides … ambiguity_members
P12345 4 2 P12346,P123457 …
----END
• A table holding the basic identification information
• Suggestions on how to include
  • quantitative data
  • multiple search engine scores
  • ambiguous modification positions
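Because the protein section is a plain tab-delimited table, it can be read with a few lines of standard-library Python. The sketch below is only illustrative: it uses just the four column names visible on the slide (the elided columns are ignored), assumes the example values map to those columns in order, and assumes that `ambiguity_members` is a comma-separated list of accessions. The function name `read_proteins` is hypothetical.

```python
import csv


def read_proteins(lines):
    """Parse tab-delimited protein rows (header line first) into dicts.

    Only the columns visible on the slide are used; how they are typed
    (integers, comma-separated list) is an assumption for this sketch.
    """
    proteins = []
    for row in csv.DictReader(lines, delimiter="\t"):
        members = row.get("ambiguity_members") or ""
        proteins.append({
            "accession": row["Accession"],
            "reliability": int(row["reliability"]),
            "peptides": int(row["peptides"]),
            "ambiguity_members": [m for m in members.split(",") if m],
        })
    return proteins
```

Here `lines` is any iterable of strings, for example the content lines found between `----proteins` and `----END`.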
mzTab – Peptide Table
----peptides
sequence accession unit unique … reliability …
DIIL O00160 PRIDE_3381 false 5 …
VESVDL O00160 PRIDE_3381 true 4 …
----END
• A table holding the basic peptide information
• Suggestions on how to include
  • quantitative data
  • multiple search engine scores
  • ambiguous modification positions
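A typical quick script over the peptide table might group peptides by the protein they were assigned to. The following Python sketch assumes the five column names visible on the slide (elided columns are ignored) and that `unique` is written as `true` / `false`; the function name `peptides_by_protein` is hypothetical.

```python
import csv
from collections import defaultdict


def peptides_by_protein(lines):
    """Group tab-delimited peptide rows (header line first) by protein accession."""
    grouped = defaultdict(list)
    for row in csv.DictReader(lines, delimiter="\t"):
        grouped[row["accession"]].append({
            "sequence": row["sequence"],
            "unit": row["unit"],
            # Assumed encoding: the literal strings "true" / "false".
            "unique": row["unique"].strip().lower() == "true",
            "reliability": int(row["reliability"]),
        })
    return dict(grouped)
```

With the two example rows from the slide, both peptides end up under accession `O00160`, and only `VESVDL` is flagged as unique.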