
Paulo Costa Carvalho Laboratory for Proteomics and Protein Engineering Fiocruz - PR

Analyzing shotgun proteomic data. Paulo Costa Carvalho Laboratory for Proteomics and Protein Engineering Fiocruz - PR. pcarvalho.com. Outline. Shotgun proteomics Motivation for studying proteomics. What is shotgun proteomics. Data analysis Protein identification


Presentation Transcript


  1. Analyzing shotgun proteomic data Paulo Costa Carvalho Laboratory for Proteomics and Protein Engineering Fiocruz - PR pcarvalho.com

  2. Outline • Shotgun proteomics • Motivation for studying proteomics. • What is shotgun proteomics. • Data analysis • Protein identification • Label-free quantitation • PatternLab for proteomics • Final Considerations

  3. Motivations J. Proteome Res., 2011, 10 (1), pp 153–160 DOI: 10.1021/pr100677g

  4. Computational Proteomics Editorial “There has been an unprecedented improvement in the quality and quantity of commercial proteomics data generation technologies, making data generation more accessible to many researchers. However, more and more discoveries will be led by researchers in command of the skills necessary to mine and extensively interpret the volumes of data. Already the ability to generate data vastly outpaces our ability to interpret it, and the lack of expertise in interpreting data is the current gating factor in the advancement of proteomics sciences. Proteomics scientists with training solely in data generation techniques will be shut out of more and more research opportunities.” Nuno Bandeira, July 2011

  5. Too many roads not taken Edwards AM, Nature, Feb 2011

  6. Outline • Shotgun proteomics • Motivation for studying proteomics. • What is shotgun proteomics. • Data analysis • Protein identification • Label-free quantitation • PatternLab for proteomics • Final Considerations

  7. Proteomics has revolutionized biochemical research

  8. pcarvalho.com

  9. LC/MS shotgun proteomic data [Figure: LC/MS map; axes: Time, Mass/Charge]

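  10. Fragmentation into (B) ions, NH2 side, and (Y) ions, COOH side [Figure: peptide A F Y L K, (precursor)2+; axis: m/z]

  11. (B) and (Y) ions [Figure build: fragments A F and Y L K, (precursor)2+; axis: m/z]

  12. (B) and (Y) ions [Figure build: fragments A, A F, A F Y and F Y L K, Y L K, L K, (precursor)2+; axis: m/z]

  13. (B) and (Y) ions [Figure build: complete ladder, fragments A, A F, A F Y, A F Y L and F Y L K, Y L K, L K, K, (precursor)2+; axis: m/z]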
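
The B/Y ladder drawn on slides 10 to 13 can be reproduced numerically. What follows is a minimal sketch, assuming the peptide on the slides is AFYLK, monoisotopic residue masses, and singly charged fragments. The function names are illustrative and are not taken from any particular tool.

```python
# Monoisotopic residue masses (Da) for the residues of the assumed peptide AFYLK.
MONO = {"A": 71.03711, "F": 147.06841, "Y": 163.06333,
        "L": 113.08406, "K": 128.09496}
WATER = 18.010565   # H2O, added to the C-terminal (Y) fragments
PROTON = 1.007276   # mass of the charge carrier


def b_ions(peptide):
    """Singly charged B ions: N-terminal prefixes b1 .. b(n-1)."""
    out, total = [], 0.0
    for aa in peptide[:-1]:
        total += MONO[aa]
        out.append(total + PROTON)
    return out


def y_ions(peptide):
    """Singly charged Y ions: C-terminal suffixes y1 .. y(n-1)."""
    out, total = [], WATER
    for aa in reversed(peptide[1:]):
        total += MONO[aa]
        out.append(total + PROTON)
    return out


def precursor_mz(peptide, charge):
    """m/z of the intact peptide carrying `charge` protons."""
    neutral = sum(MONO[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


if __name__ == "__main__":
    for i, mz in enumerate(b_ions("AFYLK"), start=1):
        print(f"b{i}: {mz:.4f}")
    for i, mz in enumerate(y_ions("AFYLK"), start=1):
        print(f"y{i}: {mz:.4f}")
    print(f"(precursor)2+: {precursor_mz('AFYLK', 2):.4f}")
```

Each B ion and its complementary Y ion add up to the neutral peptide mass plus two protons, which is a quick sanity check on any ladder computed this way.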

  14. Outline • Shotgun proteomics • Motivation for studying proteomics. • What is shotgun proteomics. • Data analysis • Protein identification • Label-free quantitation • PatternLab for proteomics • Final Considerations

  15. Strategies for protein identification by mass spectrometry • Peptide sequence match • Advantage: most sensitive (when the protein is in the DB) • Disadvantage: sequence must be in the DB; needs to specify PTMs a priori. • De novo sequencing • Advantage: does not require a database • Disadvantage: most error prone. • Sequence Tag Search • Advantages: no need to specify PTM a priori; tolerant to small changes in the sequence • Disadvantages: not as sensitive as PSM when the protein is in the DB

  16. De novo sequencing • Advantage: does not require a database • Disadvantage: most error prone [Figure: MS/MS spectrum (intensity vs. m/z) annotated with residues A L T H P V T E G G K E F S I L L V E Q D S G V K S D I G V V A]
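
The core idea of de novo sequencing, reading residues off the mass differences between neighbouring fragment peaks, can be sketched as follows. This is a toy illustration under strong assumptions: a clean, complete ladder of a single ion type, no noise, and a fixed mass tolerance. Real spectra violate all of these, which is why the approach is error prone. The function name `read_ladder` is hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks, tol=0.01):
    """Turn consecutive peak-to-peak mass differences into residue calls.

    Returns one entry per gap: the matching residue(s), joined with "/"
    when several residues fit (L and I are isobaric), or "?" when none fits.
    """
    peaks = sorted(peaks)
    calls = []
    for low, high in zip(peaks, peaks[1:]):
        diff = high - low
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - diff) <= tol]
        calls.append("/".join(matches) if matches else "?")
    return calls


if __name__ == "__main__":
    # Singly charged B-ion ladder b1..b4 of the example peptide AFYLK.
    ladder = [72.04439, 219.11280, 382.17613, 495.26019]
    print(read_ladder(ladder))
```

Note that the first residue is not recovered from differences alone, and leucine and isoleucine cannot be told apart by mass.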

  17. Sequence Tag Search • Advantages: no need to specify PTM a priori; tolerant to small sequence changes • Disadvantages: not as sensitive as PSM when the protein is in the DB Na S et al., MCP, 2008

  18. Peptide sequence match • Advantage: most sensitive (when the protein is in the DB) • Disadvantage: sequence must be in the DB; needs to specify PTMs a priori

  19. Protein Identification using a database • ProLuCID • X!Tandem • OMSSA • Andromeda • SEQUEST • Mascot • …

  20. Interpreting MS/MS Proteomics Results Brian C. Searle Proteome Software Inc. Portland, Oregon USA Brian.Searle@ProteomeSoftware.com NPC Progress Meeting (February 2nd, 2006) Illustrated by Toni Boudreault

  21. All these peaks are seen together simultaneously and we don’t even know… [Figure: B-type, A-type, and Y-type ions and H2O for R I T P E A; axes: intensity, m/z]

  22. …what type of ion they are, making the mass differences approach even more difficult. Finally, as with all analytical techniques… [Figure axes: intensity, m/z]

  23. …there’s noise, producing a final spectrum that looks like… [Figure axes: intensity, m/z]

  24. …this, on a good day. And so it’s actually fairly difficult to… [Figure axes: intensity, m/z]

  25. XCalibur :: Show experimental data

  26. Known Ion Types • B-type ions • A-type ions • Y-type ions We knew a couple of things about peptide fragmentation. Not only do we know to expect B, A, and Y ions, but…

  27. Known Ion Types • B-type ions: 100% • A-type ions: 20% • Y-type ions: 100% • B- or Y-type +2H ions: 50% • B- or Y-type -NH3 ions: 20% • B- or Y-type -H2O ions: 20% …likelihood of seeing each type of ion, where generally B and Y ions are most prominent.

  28. If we know the amino acid sequence of a peptide, we can guess what the spectra should look like! So it’s actually pretty easy to guess what a spectrum should look like if we know what the peptide sequence is.

  29. Model Spectrum So as an example, consider the peptide ELVIS LIVES K (ELVISLIVESK) that was synthesized by Rich Johnson in Seattle. *Courtesy of Dr. Richard Johnson http://www.hairyfatguy.com/

  30. Model Spectrum We can create a hypothetical spectrum based on our rules

  31. • B/Y type ions (100%) • B/Y +2H type ions (50%) • A type ions, B/Y -NH3/-H2O (20%) Where B and Y ions are estimated at 100%, +2H ions are estimated at 50%, and other stragglers are at 20%.
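
The intensity rules on this slide are enough to build a hypothetical spectrum for ELVISLIVESK. The sketch below is one possible reading, not the code behind the slides: it assumes monoisotopic masses, treats the "+2H" ions as doubly charged versions of the B and Y fragments, and models A ions as B ions minus CO. The function name `model_spectrum` and the peak labels are illustrative.

```python
# Monoisotopic residue masses (Da) for the residues of ELVISLIVESK.
MONO = {"E": 129.04259, "L": 113.08406, "V": 99.06841,
        "I": 113.08406, "S": 87.03203, "K": 128.09496}
WATER = 18.010565
AMMONIA = 17.026549
CO = 27.994915
PROTON = 1.007276


def model_spectrum(peptide):
    """Return sorted (m/z, relative intensity %, label) tuples.

    Intensities follow the slide: B/Y 100, doubly charged B/Y 50,
    A ions and B/Y neutral losses (-NH3, -H2O) 20.
    """
    n = len(peptide)
    peaks = []
    prefix = 0.0
    for i in range(1, n):                      # one cleavage site per bond
        prefix += MONO[peptide[i - 1]]
        suffix = sum(MONO[aa] for aa in peptide[i:])
        b = prefix + PROTON
        y = suffix + WATER + PROTON
        peaks.append((b, 100, f"b{i}"))
        peaks.append((y, 100, f"y{n - i}"))
        peaks.append((b - CO, 20, f"a{i}"))
        for name, mz in ((f"b{i}", b), (f"y{n - i}", y)):
            # Assumption: "+2H" means the same fragment with two protons.
            peaks.append(((mz + PROTON) / 2, 50, name + "(2+)"))
            peaks.append((mz - AMMONIA, 20, name + "-NH3"))
            peaks.append((mz - WATER, 20, name + "-H2O"))
    return sorted(peaks)


if __name__ == "__main__":
    for mz, intensity, label in model_spectrum("ELVISLIVESK"):
        if intensity == 100:
            print(f"{label:>4}  {mz:9.4f}  {intensity}%")
```

Applying every neutral loss to every fragment overstates what a real peptide produces, but it matches the deliberately simple rules the slide describes.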

  32. Model Spectrum So if we consider the spectrum that was derived from the ELVIS LIVES K peptide…

  33. Model Spectrum We can find where the overlap is between the hypothetical and the actual spectra…

  34. Model Spectrum And say conclusively based on the evidence that the spectrum does belong to the ELVIS LIVES K peptide.

  35. Sequencing Explosion • 1977 Shotgun sequencing invented, bacteriophage φX174 sequenced • 1989 Yeast Genome project announced • 1990 Human Genome project announced • 1992 First chromosome (Yeast) sequenced • 1995 H. influenzae sequenced • 1996 Yeast Genome sequenced • 2000 Human Genome draft … In 1994 Jimmy Eng and John Yates published a technique to exploit genome sequencing for use in tandem mass spectrometry. And the idea was … Eng, J. K.; McCormack, A. L.; Yates, J. R. III J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.

  36. SEQUEST …instead of searching all possible peptide sequences, search only those in genome databases. Now, in the post-genomic world this seems like a pretty trivial idea, but back then there was a lot of assumption placed on the idea that we’d actually have a complete Human genome in a reasonable amount of time.
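
Restricting the search to database sequences can be sketched as: digest each protein in silico, then keep only peptides whose mass matches the measured precursor. The sketch below assumes tryptic digestion (cleavage after K or R, not before P), no missed cleavages, no modifications, and a fixed mass tolerance. None of these details are stated on the slide, and the function names and the toy protein are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def tryptic_peptides(protein):
    """Split after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def candidates(precursor_mass, proteins, tol=0.02):
    """Database peptides whose mass lies within `tol` Da of the precursor."""
    hits = []
    for protein in proteins:
        for pep in tryptic_peptides(protein):
            if abs(peptide_mass(pep) - precursor_mass) <= tol:
                hits.append(pep)
    return hits


if __name__ == "__main__":
    database = ["MAFYLKELVISLIVESKGGR"]          # toy, made-up protein
    target = peptide_mass("ELVISLIVESK")
    print(candidates(target, database))
```

Only the surviving candidates need to be scored against the spectrum, which is what makes the database approach tractable compared with enumerating every possible sequence.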

  37. SEQUEST Model Spectrum For a scoring function they decided to use Cross-Correlation, which basically sums the peaks that overlap between hypothetical and the actual spectra. Like so.

  38. SEQUEST Model Spectrum And then they shifted the spectra back and…

  39. SEQUEST Model Spectrum …forth so that the peaks shouldn’t align. They used this number, also called the Auto-Correlation, as their background.

  40. SEQUEST XCorr This is another representation of the Cross Correlation and the Auto Correlation. [Figure: correlation score vs. offset (AMU), showing the Cross Correlation (direct comparison) and the Auto Correlation (background)] Gentzel M. et al. Proteomics 3 (2003) 1597-1610

  41. SEQUEST XCorr The XCorr score is the Cross Correlation divided by the average of the auto correlation over a 150 AMU range. The XCorr is high if the direct comparison is significantly greater than the background, which is obviously good for peptide identification. XCorr = Cross Correlation (direct comparison) / average Auto Correlation (background) over a 150 AMU range [Figure: correlation score vs. offset (AMU)] Gentzel M. et al. Proteomics 3 (2003) 1597-1610
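
The score described on slides 37 to 41 can be sketched in a few lines. This is a simplified illustration of the ratio as the slide words it, not the SEQUEST implementation: it assumes 1 AMU bins, no intensity normalisation or other preprocessing, and takes the 150 AMU range as offsets from -75 to +75 excluding zero. The function names are hypothetical.

```python
def bin_spectrum(peaks, size):
    """Place (m/z, intensity) peaks into 1 AMU bins, keeping the tallest per bin."""
    bins = [0.0] * size
    for mz, intensity in peaks:
        idx = int(round(mz))
        if 0 <= idx < size:
            bins[idx] = max(bins[idx], intensity)
    return bins


def correlation(a, b, offset):
    """Sum of products of overlapping bins with `b` shifted by `offset` bins."""
    total = 0.0
    for i, value in enumerate(a):
        j = i + offset
        if 0 <= j < len(b):
            total += value * b[j]
    return total


def xcorr_ratio(observed, model, size=2000, window=75):
    """Direct comparison divided by the mean background over +/- `window` AMU."""
    a = bin_spectrum(observed, size)
    b = bin_spectrum(model, size)
    direct = correlation(a, b, 0)
    background = [correlation(a, b, t)
                  for t in range(-window, window + 1) if t != 0]
    mean_bg = sum(background) / len(background)
    if mean_bg == 0:
        # No chance overlap at any offset: the ratio is undefined, so report
        # 0 for no match at all and infinity otherwise (a choice made here).
        return 0.0 if direct == 0 else float("inf")
    return direct / mean_bg


if __name__ == "__main__":
    observed = [(100.0, 1.0), (150.0, 1.0), (300.0, 1.0)]
    print(xcorr_ratio(observed, observed))                       # matching model
    print(xcorr_ratio(observed, [(120.0, 1.0), (170.0, 1.0)]))   # shifted model
```

The sketch follows the slide in dividing by the background. The formulation in the Eng, McCormack and Yates paper cited earlier is usually described as subtracting the mean background rather than dividing by it, so treat the ratio form as the presenter's simplification.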

  42. SEQUEST DeltaCn And this XCorr is actually a pretty robust method for estimating how accurate the match is, and so far, there really haven’t been any significant improvements on it. The DeltaCn is another score that scientists often use. It measures how good the XCorr is relative to the next best match. As you can see, this is actually a pretty crude calculation.
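
DeltaCn is commonly computed as the relative drop from the best XCorr to the next best one for the same spectrum. A minimal sketch follows, assuming that definition; the function name and the value returned when fewer than two candidates exist are choices made for this example.

```python
def delta_cn(xcorrs):
    """Relative gap between the top two XCorr values for one spectrum.

    (best - second) / best. Returns 0.0 when there are fewer than two
    candidates or the best score is not positive (an assumption made here;
    tools differ in how they report these edge cases).
    """
    ranked = sorted(xcorrs, reverse=True)
    if len(ranked) < 2 or ranked[0] <= 0:
        return 0.0
    return (ranked[0] - ranked[1]) / ranked[0]


if __name__ == "__main__":
    # Hypothetical XCorr values for the candidate peptides of one spectrum.
    print(round(delta_cn([3.0, 2.4, 1.0]), 3))
```

A value near zero means the runner-up scored almost as well as the top match, so the identification is ambiguous even if the XCorr itself is high.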

  43. Raw Xtractor/ Pause for search * Show an MS2 file

  44. ProLuCID ProLuCID is a fast and sensitive tandem mass spectra-based protein identification program recently developed in the Yates laboratory at The Scripps Research Institute.

  45. ProLuCID runner Show ProLuCID Runner Carvalho PC et al; unpublished

  46. Protein Identification Search Engine (e.g. ProLuCID, SEQUEST, etc) MS PSM Workflow Database

  47. The Challenge: How to pinpoint trustworthy identifications 1 spectrum = 1 identification!

  48. Filtering data

  49. In the beginning… Spectra were sorted according to some score and then a threshold value was set. Different programs have different scoring schemes, so SEQUEST, Mascot, and X!Tandem use different thresholds. Different thresholds may also be needed for different charge states, sample complexity, and database size. • SEQUEST: XCorr > 2.5, dCn > 0.1 • Mascot: Score > 45 • X!Tandem: Score < 0.01 [Figure: spectrum scores sorted by match score; labels: protein, peptide]
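
The sort-and-threshold approach described on this slide can be sketched as below, using the SEQUEST cutoffs quoted there (XCorr > 2.5, dCn > 0.1) as defaults. The `PSM` record, its field names, and the sample values are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """One peptide-spectrum match (hypothetical record layout)."""
    spectrum: str
    peptide: str
    xcorr: float
    delta_cn: float


def filter_psms(psms, min_xcorr=2.5, min_delta_cn=0.1):
    """Keep matches above both thresholds, best XCorr first."""
    kept = [p for p in psms
            if p.xcorr > min_xcorr and p.delta_cn > min_delta_cn]
    return sorted(kept, key=lambda p: p.xcorr, reverse=True)


if __name__ == "__main__":
    # Made-up scores for illustration only.
    psms = [
        PSM("scan_1", "ELVISLIVESK", 3.2, 0.25),
        PSM("scan_2", "AFYLK", 2.1, 0.30),
        PSM("scan_3", "GGR", 2.8, 0.05),
    ]
    for p in filter_psms(psms):
        print(p.spectrum, p.peptide, p.xcorr, p.delta_cn)
```

Because the thresholds are plain parameters, separate cutoffs per charge state or per search engine can be passed in, which is the kind of adjustment the slide says is often needed.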
