140 likes | 162 Views
Bottom-Up Proteomics Data collection. Ruedi Aebersold, Ph.D Institute of Molecular Systems Biology, ETH-Zürich; Faculty of Science, University of Zürich. The proteome: The ensemble of all biochemical reactions. Steps of Bottom-up proteomics. protein identifications. protein sample.
E N D
Bottom-Up ProteomicsData collection Ruedi Aebersold, Ph.D Institute of Molecular Systems Biology, ETH-Zürich; Faculty of Science, University of Zürich
Steps of Bottom-up proteomics protein identifications protein sample Database Protein level A B C D A B C Peptide grouping/ validation enzymatic digestion Peptide level Quantitation Validation Database search peptide mixture peptide identifications LC/MS/MS MS/MS spectrum level MS/MS spectra Protein Inference assumptions?? -- many possible, none is right
The proteome as seen by a mass spectrometer: Possibly:10exp6- 10exp8 features m/z 1200 1100 1000 900 800 700 600 500 400 min 0 10 20 30 40 50 60 70 80 90 100 110
The global (quantitative) analysis of the proteins expressed in a cell at a time Enumerate all the components of a proteome Detect dynamic changes in proteome following external or internal perturbations - Proteome as Database -Analytic chemistry slant - Proteomics as Biol. or clin. Assay - Biology slant Proteome analyzed once Proteome analyzed multiple (infinite) times Proteomics: Haynes P, Gygi S, Figeys D, and Aebersold R. (1998) Proteome analysis: biological assay or data archive? Electrophoresis 19:1862-1871
The global (quantitative) analysis of the proteins expressed in a cell at a time Enumerate all the components of a proteome Detect dynamic changes in proteome following external or internal perturbations Proteome as database: Proteomics as Biol. or clin. assay: Proteome analyzed once Proteome analyzed multiple (infinite) times Proteomics:
Human PeptideAtlas 2013-2015 14,274 13,230 +34 +3 +4 +110 +377 +516 2013 2014 2015 2013 2014 2015 2013 2013
Human PeptideAtlas 2013-2015 14,274 13,230 +34 +3 +4 +110 +377 +516 True new identifications or statistical noise? 2013 2014 2015 2013 2014 2015 2013 2013
Open questions re: Proteome catalogue… • When have we reached an endpoint in proteome cataloguing? • Why do we reach apparent saturation before hitting all predicted ORF’s? • What are relevant endpoints? (one representative per ORF?, all proteoforms? Other?) • How do we quantify proteins? • How do errors propagate in large datasets and how do we control FDR at peptide and protein level? • How do we best complete the catalogue? • What (biology) can we learn from the (complete) catalogue?
The global (quantitative) analysis of the proteins expressed in a cell at a time Enumerate all the components of a proteome Detect dynamic changes in proteome following external or internal perturbations Proteome as database: Proteomics as Biol. or clin. assay: Proteome analyzed once Proteome analyzed multiple (infinite) times Proteomics:
Data and the scientific method Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise. But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
Data matrix supporting analyses via association • Accurately and reproducibly quantifyproteotypes across samples and conditions • Conditions: • Clinical cohorts • Time courses • Dosage courses • Samples with structured genomes Conditions 1-n Proteins 1--n
Open questions re: Association studies • How many proteins are enough? • Which ones? • How precisely do proteins need to be quantified? • Which peptides are best suited to quantify a protein? • Should proteins be considered as independent actors (like transcripts) or as parts of modules? • What factors affect protein modules and how? • How do errors propagate in large datasets and how do we control FDR at peptide and protein level? • Are data reliable, robust and accessible enough? • Data integration, dissemination of methods.