PSI-Proteome Informatics update Andy Jones PSI 2013 Liverpool
PSI-PI outputs • Formats and guidelines for proteome informatics • Standard formats: • mzIdentML • mzQuantML • mzTab • Reporting guidelines • MIAPE MSI • MIAPE Quant
MIAPE documents • Originally one MIAPE document: • MIAPE Mass Spectrometry Informatics (MSI), containing both identification and quantification guidelines • Now split into MIAPE MSI (identification only) and MIAPE Quant • MIAPE MSI status • MIAPE MSI 1.1 published in 2008 • Working group 2011-2012: minor updates to requirements and removal of quant parts • MIAPE MSI 1.2 still needs to be re-submitted to the PSI process • Plan for meeting: • Final issues with MIAPE MSI and alignment with mzIdentML?
MIAPE Quant timeline • Work started in Dec 2010 by ProteoRed expert groups • Shared with PSI working groups in March 2011 • Revision at PSI meeting (Heidelberg) April 2011 • PSI review: • Public and external review ended in August 2012 • Major revision accepted in October 2012 • Journal of Proteomics: • Submitted on 15th February • Accepted on 27th February after minor revision • Martínez-Bartolomé, S., Deutsch, E. W., Binz, P.-A., Jones, A. R., Eisenacher, M., Mayer, G., Campos, A., Canals, F., Bech-Serra, J.-J., Carrascal, M., Gay, M., Paradela, A., Navajas, R., Marcilla, M., Hernáez, M. L., Gutiérrez-Blázquez, M. D., Velarde, L. F. C., Aloria, K., Beaskoetxea, J., Medina-Aunon, J. A., and Albar, J. P. Guidelines for reporting quantitative mass spectrometry based experiments in proteomics. Journal of Proteomics, 2013, in press. http://www.sciencedirect.com/science/article/pii/S1874391913001024 • No planned work for meeting
mzIdentML • Timeline: • Original 1.0 version in Aug 2009 • Version 1.1 stable (Aug 2011) • Manuscript published in MCP in 2012 • PSI 2013 To do list: • Updates to protein grouping • PTM localisation / ambiguity scoring • General discussion of data compression issues • Jones, A. R., Eisenacher, M., Mayer, G., Kohlbacher, O., Siepen, J., Hubbard, S., Selley, J., Searle, B., Shofstahl, J., Seymour, S., Julian, R., Binz, P.-A., Deutsch, E. W., Hermjakob, H., Reisinger, F., Griss, J., Vizcaino, J. A., Chambers, M., Pizarro, A., and Creasy, D. (2012) The mzIdentML data standard for mass spectrometry-based proteomics results. Molecular & Cellular Proteomics 11, M111.014381.
Formats • mzQuantML • Output of quantitative software • Quantitative values for proteins, protein groups, peptides and features (quantified regions in the mass spectrometry data), and also small molecules • Relative or absolute values for single samples (Assays) or groups of replicates (StudyVariables)
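The Assay / StudyVariable relationship described on this slide can be pictured with a small Python sketch. This is only an illustration of the grouping idea, not the mzQuantML schema or any PSI API: the class names, fields and the choice of the mean as the summary statistic are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Assay:
    """One quantified sample (hypothetical, simplified model)."""
    assay_id: str
    # Maps a protein accession to its quantitative value in this assay.
    values: dict = field(default_factory=dict)


@dataclass
class StudyVariable:
    """A named group of replicate assays (hypothetical, simplified model)."""
    name: str
    assays: list = field(default_factory=list)

    def summarise(self, accession):
        """Average the value of one protein over the assays that report it.

        The mean is an assumption for illustration; real software may use
        a different summary statistic.
        """
        observed = [a.values[accession] for a in self.assays
                    if accession in a.values]
        if not observed:
            raise KeyError(f"{accession} not quantified in {self.name}")
        return mean(observed)


# Two replicate assays grouped into one study variable (made-up values).
control = StudyVariable("control", [
    Assay("assay_1", {"P12345": 10.0}),
    Assay("assay_2", {"P12345": 14.0}),
])
control_mean = control.summarise("P12345")
```

Values reported per Assay stay attached to a single sample, while the StudyVariable carries the replicate-level summary, which mirrors the two levels of reporting named on the slide.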
mzQuantML status • Version 1.0 rc-1 submitted to the PSI process October 2011 • Version 1.0 rc-2 June 2012 • Re-submitted to PSI process in October 2012 and manuscript submitted to MCP; minor corrections received • Completed PSI process in Feb 2013: version 1.0 release • Supports label-free (intensity), label-free (spectral counting), MS2 tag techniques (e.g. iTRAQ) and MS1 label techniques (e.g. SILAC) • Schema is fixed, with each technique defined by separate semantic rules, implemented in validator software • Manuscript re-submitted to MCP, awaiting outcome • Implementations: • Java API for creating example files (version 1.0 release): http://code.google.com/p/jmzquantml/ • Java-based validator (version 1.0 release): http://code.google.com/p/mzquantml-validator/ • Software for converting output files from MaxQuant and Progenesis: • Qi, D. et al. OMICS 16(9): 489-495; http://code.google.com/p/maxquant-mzquantml-convertor/ • Implementation in OpenMS for some techniques • Beta Java library of routines, including an mzTab exporter: http://code.google.com/p/mzq-lib/ • Beta Excel to mzQuantML converter for spectral count data: http://code.google.com/p/tsv-or-csv-mzquantml-converter/ • mzQuantML to-do list: • Need to add SRM support • Local testing of SRM encoding and conversion from Skyline • Need wider input on our mapping and on writing semantic rules for software • Need to check whether protein grouping and modification scoring map onto the format
mzTab • To provide a simple and efficient way of exchanging results from MS approaches. • Simple summary “final” report of the experimental results; Peptides and proteins identified and quantified • Small molecules included (metabolomics) • Technical and biological metadata • Spectra can be referenced in optional columns. • Set of mandatory and optional attributes (very flexible). • Four sections: • (Optional) Metadata section • (Optional) Protein section • (Optional) Peptide section • (Optional) Small Molecule section (metabolomics) • Can report MS derived data at different levels: • Single experiments • Multiple (possibly linked) experiments (merged files) • Data generated as a result of a query to a bioinformatics resource • Possible to add a reliability score for each identification • Easy to parse and use by the research community, systems biologists as well as providers of knowledge bases. • It can be used by non-experts in bioinformatics and/or proteomics. http://mztab.googlecode.com
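As a rough illustration of the "easy to parse" claim, the following Python sketch splits a tab-separated mzTab-style text into its metadata, protein, peptide and small molecule sections. It assumes the three-letter line prefixes used by the mzTab specification (MTD for metadata, PRH/PRT for the protein header and rows, PEH/PEP for peptides, SMH/SML for small molecules, COM for comments); the function name, the sample rows and the metadata key are made up for the example and have not been checked against a released version of the specification.

```python
# Header-line prefix -> prefix of the data rows that follow it
# (assumed from the mzTab specification; verify against the release used).
HEADER_TO_ROW = {"PRH": "PRT", "PEH": "PEP", "SMH": "SML"}


def parse_mztab(text):
    """Split mzTab-style text into a metadata dict and per-section row lists.

    Hypothetical helper for illustration: it does no validation and ignores
    any line whose prefix it does not recognise.
    """
    metadata = {}
    headers = {}   # row prefix -> list of column names
    tables = {}    # row prefix -> list of {column name: value} dicts
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        prefix, rest = fields[0], fields[1:]
        if prefix == "COM":
            continue  # comment line
        if prefix == "MTD" and len(rest) >= 2:
            metadata[rest[0]] = rest[1]
        elif prefix in HEADER_TO_ROW:
            headers[HEADER_TO_ROW[prefix]] = rest
        elif prefix in headers:
            tables.setdefault(prefix, []).append(dict(zip(headers[prefix], rest)))
    return metadata, tables


# Made-up example content; column names and the metadata key are illustrative.
SAMPLE = "\n".join([
    "COM\tillustrative file, not a validated mzTab document",
    "MTD\ttitle\tExample experiment",
    "PRH\taccession\tdescription",
    "PRT\tP12345\tExample protein",
    "PEH\tsequence\taccession",
    "PEP\tPEPTIDEK\tP12345",
])

metadata, tables = parse_mztab(SAMPLE)
```

Because every section is optional, the sketch builds a table only for the sections it actually encounters; a file with metadata and peptides alone would simply return no protein or small molecule entries.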
mzTab status • Submitted to the PSI document process in May 2012. • TO DO: Now addressing the remaining (minor) comments after the second round of review. • We hope that version 1.0 will soon be formalised. • Publication (revised version) under review in MCP. • Current implementations: • jmzTab (Java API): 2 versions have been developed. Version 2.0 (Q.W. Xu, nearly finished, with more functionality) will be the maintained version. Version 1.0 (J. Griss) will not be maintained further. • mzTabValidator, PRIDE XML to mzTab converter and mzTab merger in beta status. • PRIDE Converter 2. • OpenMS (version 1.10) • R/Bioconductor package MSnbase (L. Gatto, Cambridge University) • LipidDataAnalyzer (University of Graz) • MetaboLights (EBI) and COSMOS EU project: a slightly modified version is currently in use. We are in contact with them.
PSI-PI work done • mzIdentML • Minor schema issues: • Optional attribute (dBSequence_ref) on ProteinDetectionHypothesis (would be better if mandatory) • Update spec doc encouraging best practice • Pre-fractionation: • Update spec doc encouraging best practice: one SpectrumIdentificationList where possible • Retention time reporting: • Update spec doc encouraging best practice; align with mzML CVs • Support for cross-linking results • Sketched a possible reporting format that looks to cover most simple cases • Needs (considerable) further testing in local implementations and follow-up by calls • Modification localisation • Sketched some possible encodings • Needs follow-up calls and implementation in software • Keen to build this support into mzIdentML 1.1, but the model is going to be a work-around. • Protein grouping • Reported back current progress of working group (key members not present here) • New members will join the working group
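Since the reference from ProteinDetectionHypothesis to its database sequence is optional in the schema, software that wants to follow the best-practice recommendation has to check for it explicitly. A minimal Python sketch of such a check is given below. The namespace URI and the attribute spelling (dBSequence_ref) are written as recalled from the mzIdentML 1.1 schema and should be verified against it; the function name and the XML fragment are made up for illustration, and the fragment is not a schema-valid document.

```python
import xml.etree.ElementTree as ET

# Assumed mzIdentML 1.1 namespace; verify against the schema in use.
MZID_NS = "http://psidev.info/psi/pi/mzIdentML/1.1"


def hypotheses_missing_dbsequence_ref(xml_text):
    """Return ids of ProteinDetectionHypothesis elements lacking dBSequence_ref.

    Hypothetical helper: it only inspects the attribute and performs no
    schema or semantic validation.
    """
    root = ET.fromstring(xml_text)
    tag = f"{{{MZID_NS}}}ProteinDetectionHypothesis"
    return [el.get("id") for el in root.iter(tag)
            if not el.get("dBSequence_ref")]


# Illustrative fragment only; ids and references are invented.
SAMPLE = f"""<MzIdentML xmlns="{MZID_NS}">
  <ProteinAmbiguityGroup id="PAG_1">
    <ProteinDetectionHypothesis id="PDH_1" dBSequence_ref="DBSeq_1" passThreshold="true"/>
    <ProteinDetectionHypothesis id="PDH_2" passThreshold="true"/>
  </ProteinAmbiguityGroup>
</MzIdentML>"""

missing = hypotheses_missing_dbsequence_ref(SAMPLE)
```

A check like this could run alongside a validator to flag files that are schema-valid but omit the reference.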
PSI-PI work done • mzQuantML • Sketched SRM example files for label-based encoding • Need sketched example for label-free but seems straightforward • Plan to build export software very soon from Skyline (prototype already done) and mProphet • Write up semantic encoding rules • Submit to PSI doc process as a Community Practice document
PSI-PI work done • mzTab • Finalised minor issues from second round of PSI doc process review • Deadline 1st May for re-submitting final “release 1.0” document • Needs minor updates to document and example files