
10. Standards in Proteomics

MS bioinformatics analysis for proteomics. Salvador Martínez de Bartolomé (smartinez@proteored.org), Bioinformatics support, ProteoRed Proteomics Facility, National Center for Biotechnology, Madrid.


Presentation Transcript


  1. 10. Standards in Proteomics MS bioinformatics analysis for proteomics Salvador Martínez de Bartolomé smartinez@proteored.org Bioinformatics support – ProteoRed Proteomics Facility, National Center for Biotechnology, Madrid

  2. Index • Need of standards in Proteomics • HUPO-PSI • Organization • Standard data formats • MIAPEs • PEFF: A Common Sequence Database Format in Proteomics • PRIDE • Standard data format converters

  3. Index • Need of standards in Proteomics • HUPO-PSI • Organization • Standard data formats • MIAPEs • PEFF: A Common Sequence Database Format in Proteomics • PRIDE • Standard data format converters

  4. Need of standards in Proteomics • Proteomics data is often made available only as arbitrarily formatted PDF tables, which carries important limitations: • Source data (mass spectra) are not made available • No peer-review validation is possible • Very little raw material is available for testing innovative in silico techniques • Automated (re-)processing of the identifications is impossible, which rules out objective comparison of techniques

  5. Thoughts on Standards • Bradshaw RA, Burlingame AL, Carr S, Aebersold R. Reporting protein identification data: the next generation of guidelines. Mol Cell Proteomics. 2006 May;5(5):787-8. • Wilkins et al. Guidelines for the next 10 years of proteomics. Proteomics. 2006 Jan;6(1):4-8. • Nature Biotechnology 2006, Nov: • Editorial: Standards Operating Procedures • Burgoon LD. The need for standards, not guidelines, in biological data reporting and sharing. • Ball C. Are we stuck in standards? • Nature Biotechnology: Planned focus issue and Community Consultation on Standards: http://www.nature.com/nbt/consult/index.html

  6. Need of standards in Proteomics • Proteomics: no standardized reporting, no standard database submission • Proteomics data is generated at a high rate, and lost at a high rate • Experiments are repeated unnecessarily, and the field advances more slowly than necessary

  7. Need of standards in Proteomics • Standards are needed to: • Store data • Review data • Reproduce results • Compare data • Exchange data

  8. Index • Need of standards in Proteomics • HUPO-PSI • Organization • Standard data formats • CVs • MIAPEs • PEFF: A Common Sequence Database Format in Proteomics • PRIDE • Standard data format converters

  9. Index • Need of standards in Proteomics • HUPO-PSI • Organization • Standard data formats • CVs • MIAPEs • PEFF: A Common Sequence Database Format in Proteomics • PRIDE • Standard data format converters

  10. HUPO PSI Proteomics Standards Initiative http://www.psidev.info

  11. HUPO PSI Proteomics Standards Initiative Meetings http://www.psidev.info

  12. HUPO PSI Proteomics Standards Initiative http://psidev.info The Proteomics Standards Initiative (PSI) aims to define community standards for data representation in proteomics and to facilitate data comparison, exchange and verification. Proteomics 2003, 3 (7): The proteomics standards initiative. Orchard, S., Hermjakob, H., Apweiler, R. • Open community initiative • Develop data format standards • Data representation and annotation standards • Involve data producers, database providers, software producers, publishers

  13. HUPO PSI structure • The main unit is the workgroup: • Gel Electrophoresis • Molecular Interactions • Sample Processing • Mass spectrometry • Proteomic Informatics (MS oriented) • Protein Modifications • Transversal activities: • One Steering Group • Controlled vocabulary • MIAPE guidelines

  14. HUPO PSI structure • No permanent funding; active members work in their “spare time” • Annual workshop, reporting activity at annual HUPO, conference calls, dedicated workshops • Website (http://psidev.info) and mailing lists • PSI Document process • Vizcaino, J.A., Martens, L., Hermjakob, H., Julian, R.K. and Paton, N.W. (2007) The PSI formal document process and its implementation on the PSI website. Proteomics 7: 2355-2357.

  15. HUPO PSI document process Community consultation at: http://www.nature.com/nbt/consult/

  16. HUPO PSI structure

  17. HUPO-PSI • Project status

  18. HUPO-PSI PSI deliverables • Formats (XML schema, instance docs, specification docs) • Controlled Vocabularies • MIAPE docs (representation and annotation standards) • Data formats • MIML • mzML • AnalysisXML • gelML • giML • spML • MIAPE minimal reporting requirements • One parent document - The minimum information about a proteomics experiment (MIAPE), Nature Biotechnology 25, 887-893 (2007) • MIAPE MI, MS, MSI, GE, GI, CC, CE, SP

  19. Index • Need of standards in Proteomics • HUPO-PSI • Organization • Standard data formats • CVs • MIAPEs • PEFF: A Common Sequence Database Format in Proteomics • PRIDE • Standard data format converters

  20. Standard data formats for Experimental data: spectra, acquisition parameters, acquisition equipment, ... Analyzed data: identifications, quantitations, data analysis software ...

  21. Standard data formats Experimental data: spectra, acquisition parameters, acquisition equipment, ... • mzML is a data format capturing peak list information. • Its aim is to unite the large number of current formats (pkl, dta, mgf, ...) into one. • It is NOT a substitute for the raw file formats of the instrument vendors. Some vendors, if not all, will provide software that transforms their raw files into that standard. [Diagram labels: mzXML 2.0, mzXML 3.0, mzXML 4.0 (Seattle Proteome Center at the Institute for Systems Biology); mzXML 1.05, mzXML 2.0 (HUPO-PSI); mzML 1.0] mzML: released on June 1st, 2008
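The idea of uniting several peak list formats can be illustrated with a minimal sketch. It assumes the commonly described plain-text layouts of `.dta` files (first line: precursor (M+H)+ mass and charge) and `.pkl` files (first line: precursor m/z, precursor intensity and charge), each followed by "m/z intensity" rows. The function names and the dictionary layout are hypothetical and are not part of mzML.

```python
def _read_rows(text):
    """Split a peak list file into whitespace-separated columns, skipping blank lines."""
    return [line.split() for line in text.strip().splitlines() if line.strip()]


def _read_peaks(rows):
    """Convert 'm/z intensity' rows into (m/z, intensity) float pairs."""
    return [(float(row[0]), float(row[1])) for row in rows]


def parse_dta(text):
    """Assumed .dta layout: '(M+H)+ mass  charge', then peak rows."""
    rows = _read_rows(text)
    return {
        "precursor_value": float(rows[0][0]),
        "precursor_type": "MH+",
        "charge": int(rows[0][1]),
        "peaks": _read_peaks(rows[1:]),
    }


def parse_pkl(text):
    """Assumed .pkl layout: 'precursor m/z  intensity  charge', then peak rows."""
    rows = _read_rows(text)
    return {
        "precursor_value": float(rows[0][0]),
        "precursor_type": "m/z",
        "charge": int(rows[0][2]),
        "peaks": _read_peaks(rows[1:]),
    }


# Made-up example spectra, used only to show that both readers
# end up with the same in-memory peak list.
DTA_TEXT = "1046.52 2\n175.119 320.0\n276.167 1500.5\n"
PKL_TEXT = "523.764 8421.0 2\n175.119 320.0\n276.167 1500.5\n"

from_dta = parse_dta(DTA_TEXT)
from_pkl = parse_pkl(PKL_TEXT)
```

Once every reader returns the same structure, a single writer for the standard format is enough, which is the practical argument for one common peak list format.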

  22. Sample instance document mzML 1.0
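In mzML instance documents, the m/z and intensity values of a spectrum are stored as base64-encoded binary arrays. The following is a minimal sketch of that encoding, assuming uncompressed little-endian 64-bit floats (mzML also allows 32-bit floats and zlib compression, declared through CV terms). The function names and the example values are illustrative only.

```python
import base64
import struct


def encode_array(values):
    """Pack floats as little-endian 64-bit doubles and base64-encode them."""
    raw = struct.pack("<%dd" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text):
    """Reverse encode_array: base64 text back to a list of floats."""
    raw = base64.b64decode(text)
    count = len(raw) // 8  # 8 bytes per 64-bit float
    return list(struct.unpack("<%dd" % count, raw))


# Made-up peak list for illustration.
mz_values = [445.12, 446.12, 519.14]
intensities = [1200.0, 340.5, 87.25]

encoded_mz = encode_array(mz_values)
encoded_intensity = encode_array(intensities)

# A reader pairs the two decoded arrays back into (m/z, intensity) peaks.
peaks = list(zip(decode_array(encoded_mz), decode_array(encoded_intensity)))
```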

  23. Standard data formats for Experimental data: spectra, acquisition parameters, acquisition equipment, ... Analyzed data: identifications, quantitations, data analysis software ...

  24. Standard data formats Analyzed data: identifications, quantitations, data analysis software ... • AnalysisXML describes the results of identification and quantitation processes for proteins, peptides and protein modifications from mass spectrometry. [Diagram labels: pepXML, protXML (Seattle Proteome Center at the Institute for Systems Biology); AnalysisXML (HUPO-PSI)] AnalysisXML: v1.0 candidate (Dec 08)

  25. Sample instance document AnalysisXML (beta)

  26. Standard data formats Other data:

  27. Standard data formats [Diagram labels: mass spectrometer A, mass spectrometer B, proprietary format, converter, mzML, search engine A, search engine B, analysisXML, public repository]

  28. Index • Need of standards in Proteomics • HUPO-PSI • Organization • Standard data formats • CVs • MIAPEs • PEFF: A Common Sequence Database Format in Proteomics • PRIDE • Standard data format converters

  29. Controlled Vocabularies The Controlled Vocabularies (CVs) of the Proteomics Standards Initiative (PSI) provide a consensus annotation system to standardize the meaning, syntax and formalism of terms used across proteomics, as required by the PSI Working Groups. Each PSI working group develops the CVs required by the technology or data type it aims to standardize, following common recommendations for development and maintenance. At the PSI meeting in Washington (Sept 06), it was decided that all PSI working groups should adopt the same CVs for standardizing some overlapping concepts (units and resources).

  30. Controlled Vocabularies What is a CV? [Diagram: one term (100173) with its synonyms; labels: TOF, T.O.F., time-of-flight, time of flight]

  31. Controlled Vocabularies • PSI CVs are composed of two documents: • a design principle description • the implementation of the CVs in OBO (Open Biomedical Ontologies) • Developing CVs is a process of collecting and, if necessary, defining terms. • Every effort must be made to adopt and re-use existing ontologies or CVs where they exist, to avoid “re-inventing the wheel”.
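To make the OBO implementation more concrete, here is a minimal sketch that reads a `[Term]` stanza in the OBO flat-file style and maps every spelling of a term to a single identifier. The stanza is hypothetical: the `EX:100173` identifier and the choice of "time-of-flight" as the primary name are assumptions for illustration, and the parser handles only the `id`, `name` and `synonym` tags.

```python
import re

# Hypothetical stanza; "EX:" is an illustrative prefix, not a real PSI accession.
OBO_TEXT = """\
[Term]
id: EX:100173
name: time-of-flight
synonym: "TOF" EXACT []
synonym: "T.O.F." EXACT []
synonym: "time of flight" EXACT []
"""


def parse_obo(text):
    """Collect id, name and synonyms from each [Term] stanza."""
    terms, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {"id": None, "name": None, "synonyms": []}
            terms.append(current)
        elif line.startswith("["):
            current = None  # other stanza types are ignored in this sketch
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag in ("id", "name"):
                current[tag] = value
            elif tag == "synonym":
                quoted = re.match(r'"([^"]*)"', value)
                if quoted:
                    current["synonyms"].append(quoted.group(1))
    return terms


def build_index(terms):
    """Map each lower-cased name or synonym to its term identifier."""
    index = {}
    for term in terms:
        for label in [term["name"]] + term["synonyms"]:
            index[label.strip().lower()] = term["id"]
    return index


def resolve(label, index):
    """Return the identifier for a free-text label, or None if unknown."""
    return index.get(label.strip().lower())


terms = parse_obo(OBO_TEXT)
index = build_index(terms)
```

With such an index, `resolve("T.O.F.", index)` and `resolve("Time of Flight", index)` return the same identifier, which is the point of annotating data with CV terms instead of free text.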

  32. Ontology Lookup Service http://www.ebi.ac.uk/ontology-lookup/ • The OLS provides a web service interface to query multiple ontologies from a single location with a unified output format.

  33. Ontology Lookup Service http://www.ebi.ac.uk/ontology-lookup/

  34. Index • Need of standards in Proteomics • HUPO-PSI • Organization • Standard data formats • CVs • MIAPEs • PEFF: A Common Sequence Database Format in Proteomics • PRIDE • Standard data format converters

  35. MIAPE: Minimum Information About a Proteomics Experiment Taylor, C.F., Paton, N.W., Lilley, K.S., Binz, P.A., Julian, R.K., Jr., Jones, A.R., Zhu, W., Apweiler, R., Aebersold, R., Deutsch, E.W., Dunn, M.J., Heck, A.J., Leitner, A., Macht, M., Mann, M., Martens, L., Neubert, T.A., Patterson, S.D., Ping, P., Seymour, S.L., Souda, P., Tsugita, A., Vandekerckhove, J., Vondriska, T.M., Whitelegge, J.P., Wilkins, M.R., Xenarios, I., Yates, J.R., 3rd and Hermjakob, H. (2007) The minimum information about a proteomics experiment (MIAPE). Nat Biotechnol 25: 887-893. • Sufficiency and practicability • Unambiguous description of the experimental context • Allow understanding of the results and their interpretation • Sufficient to permit a critical evaluation • In principle allow recreation of the work

  36. MIAPE guidelines • It is: – Describing a list of information and data to provide when an experiment is reported (it is a content descriptor) • Peptide sequence, scores, modifications, mass errors, etc. – Helping to assess quality control • Number of replicates, expected error rate

  37. MIAPE guidelines • It is not: – Describing the way to run an experiment • does not specify the use of a particular search engine • does not force the use of one protocol – Describing the data representation • e.g. “Use excel to create a table with these five following columns:…” – Including any quality judgment • e.g. “need 30% sequence coverage to identify a protein” • “The absence of thorough validation of both analytical and biological results, including error analysis should result in rejection” • “Authors should justify the use of a very small database or database that excludes common contaminants, since this may generate misleading assignments”
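Because MIAPE is a content descriptor rather than a data format or a quality threshold, a compliance check only needs to ask whether each required item is present. A minimal sketch follows, assuming a hypothetical item list drawn from the examples on the slides above (the real MIAPE modules define their own, longer lists) and a report held as a plain dictionary.

```python
# Hypothetical item names taken from the slide examples, not from a MIAPE module.
REQUIRED_ITEMS = (
    "peptide_sequence",
    "score",
    "modifications",
    "mass_error",
    "replicates",
    "expected_error_rate",
)


def missing_items(report):
    """List required items that the report does not provide at all.

    Only presence is checked: no value is judged as good or bad,
    and nothing is assumed about how the report is stored on disk.
    """
    return [item for item in REQUIRED_ITEMS if report.get(item) is None]


# Made-up report for illustration; an empty modification list still counts
# as reported, because the information was provided.
example_report = {
    "peptide_sequence": "ELVISLIVESK",
    "score": 57.3,
    "modifications": [],
    "mass_error": 2.1,
    "replicates": 3,
}

missing = missing_items(example_report)
```

The check reports that `expected_error_rate` is absent but makes no statement about whether a score of 57.3 is acceptable, mirroring the distinction between reporting requirements and quality judgment.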

  38. MIAPE guidelines • MIAPE Gel Electrophoresis (GE) v1.4 • MIAPE Gel Informatics (GI) v0.5 • MIAPE Mass Spectrometry (MS) v2.22 • MIAPE Mass Spectrometry Informatics (MSI) v0.8 • MIAPE Column Chromatography (CC) v1.0 • MIAPE Capillary Electrophoresis (CE) v0.7 • MIAPE Sample Preparation and handling (SP) v0.2 • MIAPE Molecular Interactions (MI) v1.1.2

  39. Online tool to generate and store MIAPE documents http://www.proteored.org

  40. A MIAPE generator tool (ProteoRed server) • Fill in all minimal information by hand • Or fill in only the changes or new items by hand, and automatically add static information from previous MIAPE documents

  41. A MIAPE generator tool http://www.proteored.org

  42. A MIAPE generator tool

  43. A MIAPE generator tool

  44. A MIAPE generator tool

  45. A MIAPE generator tool

  46. HUPO-PSI: MIAPE Gel Electrophoresis v1.2

  47. Edit document Delete document Generate report Generate XML

  48. MIAPE Reports Generate report
