Protein Sequencing and Identification by Mass Spectrometry

1. Protein Sequencing and Identification by Mass Spectrometry

2. Masses of Amino Acid Residues

3. Protein Backbone

4. Peptide Fragmentation: Peptides tend to fragment along the backbone. Fragments can also lose neutral chemical groups like NH3 and H2O.

5. Breaking Protein into Peptides and Peptides into Fragment Ions: Proteases, e.g. trypsin, break a protein into peptides. A tandem mass spectrometer further breaks the peptides down into fragment ions and measures the mass of each piece. The mass spectrometer accelerates the fragment ions; heavier ions accelerate more slowly than lighter ones. The mass spectrometer measures the mass/charge ratio of an ion.
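Because the instrument reports mass/charge rather than mass, it can help to see how the two relate. The sketch below assumes positive-mode ions formed by adding protons to a neutral molecule; the function name `mass_to_charge` is hypothetical and is not part of the slides.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons


def mass_to_charge(neutral_mass: float, charge: int) -> float:
    """Return m/z for an ion made by adding `charge` protons to a neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# The same 1000 Da fragment appears at different m/z values
# depending on how many protons it carries.
singly_charged = mass_to_charge(1000.0, 1)
doubly_charged = mass_to_charge(1000.0, 2)
```

A doubly charged ion therefore shows up at roughly half the m/z of the singly charged ion of the same fragment.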

6. N- and C-terminal Peptides

7. Terminal peptides and ion types



10. N- and C-terminal Peptides

11. N- and C-terminal Peptides

12. Peptide Fragmentation

13. Mass Spectra: The peaks in the mass spectrum: prefix fragments; fragments with neutral losses (-H2O, -NH3); noise and missing peaks.

14. Protein Identification with MS/MS

15. Tandem Mass-Spectrometry

16. Breaking Proteins into Peptides

17. Mass Spectrometry

18. Tandem Mass Spectrometry

19. Protein Identification by Tandem Mass Spectrometry

20. Tandem Mass Spectrum: Tandem mass spectrometry (MS/MS) mainly generates partial N- and C-terminal peptides. The spectrum consists of different ion types because peptides can be broken in several places. Chemical noise often complicates the spectrum. The spectrum is represented in 2-D: mass/charge axis vs. intensity axis.

21. De Novo vs. Database Search

22. De Novo vs. Database Search: A Paradox. The database of all peptides is huge: O(20^n) for peptides of length n. The database of all known peptides is much smaller: O(10^8). However, de novo algorithms can be much faster, even though their search space is much larger! A database search scans every peptide in the database of known peptides to find the best one. De novo sequencing eliminates the need to scan the database of all peptides by modeling the problem as a graph search.
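To make the size comparison concrete, the short sketch below finds the peptide length at which the number of all possible peptides over 20 amino acids first exceeds 10^8. The constant and function names are illustrative only.

```python
KNOWN_PEPTIDES = 10**8  # order of magnitude quoted for the database of known peptides


def count_all_peptides(length: int) -> int:
    """Number of distinct peptides of a given length over 20 amino acids."""
    return 20**length


# Smallest length at which the space of all peptides outgrows the known database.
crossover = next(n for n in range(1, 50) if count_all_peptides(n) > KNOWN_PEPTIDES)
```

Under these numbers the crossover happens at length 7, well below the length of a typical tryptic peptide.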

23. De novo Peptide Sequencing

24. Theoretical Spectrum

25. Theoretical Spectrum (cont'd)

26. Theoretical Spectrum (cont'd)

27. Building Spectrum Graph: How to create vertices (from masses). How to create edges (from mass differences). How to score paths. How to find the best path.

28. S E Q U E N C E

35. MS/MS Spectrum

36. Some Mass Differences between Peaks Correspond to Amino Acids

37. Ion Types: Some masses correspond to fragment ions, others are just random noise. Knowing the ion types Δ = {d1, d2, ..., dk} lets us distinguish fragment ions from noise. We can learn ion types di and their probabilities qi by analyzing a large test sample of annotated spectra.

38. Example of Ion Type: Δ = {d1, d2, ..., dk}. Ion types {b, b-NH3, b-H2O} correspond to Δ = {0, 17, 18}. *Note: In reality the d value of ion type b is -1, but we will "hide" it for the sake of simplicity.

39. Match between Spectra and the Shared Peak Count: The match between two spectra is the number of masses (peaks) they share (Shared Peak Count, or SPC). In practice, mass spectrometrists use a weighted SPC that reflects the intensities of the peaks. The match between experimental and theoretical spectra is defined similarly.
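A minimal sketch of the unweighted shared peak count follows. It assumes integer (nominal) residue masses and a simplified theoretical spectrum made only of prefix and suffix residue-mass sums, ignoring the water and proton offsets of real b- and y-ions. The function names are hypothetical.

```python
# Nominal residue masses in daltons (L/I and K/Q are indistinguishable here).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def prefix_masses(peptide: str) -> list[int]:
    """Cumulative residue masses of every non-empty prefix of the peptide."""
    total = 0
    masses = []
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def theoretical_spectrum(peptide: str) -> set[int]:
    """Simplified spectrum: masses of all prefixes and all suffixes."""
    suffixes = prefix_masses(peptide[::-1])  # prefixes of the reversed peptide
    return set(prefix_masses(peptide)) | set(suffixes)


def shared_peak_count(spectrum_a, spectrum_b) -> int:
    """Number of masses present in both spectra (exact integer match)."""
    return len(set(spectrum_a) & set(spectrum_b))


experimental = {57, 128, 300}  # made-up peaks for illustration
spc = shared_peak_count(experimental, theoretical_spectrum("GAS"))
```

With real-valued masses, the exact set intersection would need to be replaced by matching within a mass tolerance.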

40. Peptide Sequencing Problem: Goal: find a peptide with maximal match between an experimental and theoretical spectrum. Input: S, the experimental spectrum; Δ, the set of possible ion types; m, the parent mass. Output: P, a peptide with mass m whose theoretical spectrum best matches the experimental spectrum S.

41. Vertices of Spectrum Graph: Vertices are the masses of potential N-terminal peptides. They are generated by reverse shifts corresponding to ion types Δ = {d1, d2, ..., dk}. Every N-terminal peptide of mass m can generate up to k ions m-d1, m-d2, ..., m-dk. Every mass s in an MS/MS spectrum generates k vertices V(s) = {s+d1, s+d2, ..., s+dk} corresponding to potential N-terminal peptides. Vertices of the spectrum graph: {initial vertex} ∪ V(s1) ∪ V(s2) ∪ ... ∪ V(sm) ∪ {terminal vertex}.
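The vertex construction can be sketched as follows, assuming integer masses, the initial vertex at mass 0, and the terminal vertex at the parent mass. The function name and the idea of returning a support count per vertex are illustrative choices, not taken from the slides.

```python
def spectrum_graph_vertices(spectrum, offsets, parent_mass):
    """Map each candidate N-terminal peptide mass to the number of peaks supporting it."""
    support = {0: 0, parent_mass: 0}  # initial and terminal vertices
    for s in spectrum:
        for d in offsets:
            v = s + d  # reverse shift: undo the ion-type offset
            if 0 < v < parent_mass:
                support[v] = support.get(v, 0) + 1
    return support


# Peaks for b, b-NH3 and b-H2O ions of a prefix with mass 57 (made-up example).
peaks = [39, 40, 57]
support = spectrum_graph_vertices(peaks, offsets=(0, 17, 18), parent_mass=128)
```

Here the vertex at mass 57 is supported by all three peaks, while the other candidate vertices are each produced by a single shift and are more likely to be spurious.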

42. Reverse Shifts: Two peaks, b-H2O and b, are given by the mass spectrum. If a +H2O shift of the first peak makes the two peaks coincide, that mass is a possible vertex.

43. Reverse Shifts

44. Edges of Spectrum Graph: Two vertices whose mass difference corresponds to an amino acid A are connected with an edge labeled by A. Gap edges are added for di- and tri-peptides.
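A minimal sketch of edge construction for single amino acids is shown below, assuming integer vertex masses and nominal residue masses. Gap edges for di- and tri-peptides are left out, and the function name is hypothetical.

```python
# Nominal residue masses in daltons (L/I and K/Q share a mass).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def spectrum_graph_edges(vertices, residue_mass=RESIDUE_MASS):
    """Return (u, v, label) for every vertex pair whose mass difference is a residue mass."""
    # Group residues that share a mass so the edge label lists all of them.
    by_mass = {}
    for aa, mass in residue_mass.items():
        by_mass.setdefault(mass, []).append(aa)

    vertex_set = set(vertices)
    edges = []
    for u in sorted(vertex_set):
        for mass, residues in by_mass.items():
            if u + mass in vertex_set:
                edges.append((u, u + mass, "/".join(sorted(residues))))
    return edges


edges = spectrum_graph_edges([0, 57, 128, 215])
```

Every edge points from a smaller mass to a larger one, so the resulting graph is acyclic, which is what makes the later path search tractable.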

45. Paths: Paths in the labeled graph spell out amino acid sequences. There are many paths, so how do we find the correct one? We need scoring to evaluate paths.

46. Path Score: $p(P, S)$ is the probability that peptide P produces spectrum $S = \{s_1, s_2, \ldots, s_q\}$, and $p(P, s)$ is the probability that peptide P generates a peak s. Scoring means computing these probabilities: $p(P, S) = \prod_{s \in S} p(P, s)$.

47. Peak Score: For a position t that represents ion type $d_j$: $p(P, s_t) = q_j$ if a peak is generated at t, and $p(P, s_t) = 1 - q_j$ otherwise.

48. Peak Score (cont'd): For a position t that is not associated with an ion type: $p_R(P, s_t) = q_R$ if a peak is generated at t, and $p_R(P, s_t) = 1 - q_R$ otherwise. Here $q_R$ is the probability of a noisy peak that does not correspond to any ion type.

49. Finding Optimal Paths in the Spectrum Graph: For a given MS/MS spectrum S, find a peptide P' maximizing p(P, S) over all possible peptides P. Peptides correspond to paths in the spectrum graph, so P' is the optimal path in the spectrum graph.
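Because edges always lead from a smaller mass to a larger one, the spectrum graph is a DAG and the best path can be found by dynamic programming over vertices in order of increasing mass. The sketch below assumes additive vertex scores (for example, log-probability ratios) and uses hypothetical names; it is one possible formulation, not necessarily the one used in the slides.

```python
def best_path(vertex_score, edges, source, sink):
    """Return (score, sequence) of the highest-scoring source-to-sink path, or None.

    vertex_score: dict mapping vertex mass -> additive score
    edges: iterable of (u, v, label) with u < v
    """
    incoming = {}
    for u, v, label in edges:
        incoming.setdefault(v, []).append((u, label))

    # best[v] = (best score of any path source -> v, labels spelled along it)
    best = {source: (vertex_score.get(source, 0.0), "")}
    for v in sorted(vertex_score):  # increasing mass is a topological order
        if v == source:
            continue
        candidates = [
            (best[u][0] + vertex_score[v], best[u][1] + label)
            for u, label in incoming.get(v, [])
            if u in best  # skip predecessors unreachable from the source
        ]
        if candidates:
            best[v] = max(candidates)
    return best.get(sink)


scores = {0: 0.0, 57: 2.0, 71: 0.5, 128: 1.0, 215: 0.0}  # made-up vertex scores
graph = [(0, 57, "G"), (0, 71, "A"), (57, 128, "A"), (71, 128, "G"), (128, 215, "S")]
result = best_path(scores, graph, source=0, sink=215)
```

In this toy graph both GAS and AGS reach the terminal vertex, and the path through the better-supported vertex at mass 57 wins.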

50. Ions and Probabilities: Tandem mass spectrometry is characterized by a set of ion types {d1, d2, ..., dk} and their probabilities {q1, ..., qk}. The di-ions of a partial peptide are produced independently with probabilities qi.

51. Ions and Probabilities: Since the ions are produced independently, a peptide has all k peaks with probability $\prod_{i=1}^{k} q_i$ and no peaks with probability $\prod_{i=1}^{k} (1 - q_i)$. A peptide also produces "random noise" with uniform probability $q_R$ in any position.

52. Ratio Test Scoring for Partial Peptides: Incorporates premiums for observed ions and penalties for missing ions. Example: for k=4, assume that for a partial peptide P' we only see ions d1, d2, d4. The score is calculated as: $\frac{q_1}{q_R} \cdot \frac{q_2}{q_R} \cdot \frac{1 - q_3}{1 - q_R} \cdot \frac{q_4}{q_R}$
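A minimal sketch of the ratio test for one partial peptide follows. It assumes that each observed ion contributes a premium q_j / q_R and each missing ion a penalty (1 - q_j) / (1 - q_R), where q_R is the noise probability; the function name and the example probabilities are made up for illustration.

```python
def ratio_test_score(q, observed, q_noise):
    """Likelihood ratio of the ion model versus the noise model for one partial peptide.

    q: probabilities q_1..q_k of the k ion types
    observed: booleans, True where the corresponding ion's peak was seen
    q_noise: probability q_R of a random noise peak at any position
    """
    if len(q) != len(observed):
        raise ValueError("q and observed must have the same length")
    score = 1.0
    for q_j, seen in zip(q, observed):
        if seen:
            score *= q_j / q_noise              # premium for an observed ion
        else:
            score *= (1 - q_j) / (1 - q_noise)  # penalty for a missing ion
    return score


# k = 4 with ions d1, d2 and d4 observed and d3 missing.
score = ratio_test_score([0.8, 0.5, 0.4, 0.2], [True, True, False, True], q_noise=0.1)
```

A score above 1 means the observed peaks are better explained by a real partial peptide than by noise alone.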

53. Scoring Peptides: T is the set of all positions. $T_i = \{t_{d_1}, t_{d_2}, \ldots, t_{d_k}\}$ is the set of positions that represent ions of partial peptide $P_i$. A peak at position $t_{d_j}$ is generated with probability $q_j$. $R = T - \bigcup_i T_i$ is the set of positions that are not associated with any partial peptide (noise).

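54. Probabilistic Model: For a position $t_{d_j} \in T_i$, the probability $p(t, P, S)$ that peptide P produces a peak at position t is $q_j$ if a peak is generated at t, and $1 - q_j$ otherwise. Similarly, for $t \in R$, the probability that P produces a random noise peak at t is $p_R(t, P, S) = q_R$ if a peak is generated at t, and $1 - q_R$ otherwise.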

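55. Probabilistic Score: For a peptide P with n amino acids, the score for the whole peptide is expressed by the following ratio test: $\frac{p(P, S)}{p_R(S)} = \prod_{i=1}^{n} \prod_{j=1}^{k} \frac{p(t_{i d_j}, P, S)}{p_R(t_{i d_j}, P, S)}$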
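For a whole peptide, the per-position ratios multiply across all partial peptides, so in practice it is convenient to sum their logarithms. The sketch below assumes the same premium and penalty ratios as for a single partial peptide and takes one row of observed/missing flags per partial peptide; the names are hypothetical.

```python
import math


def peptide_log_score(q, observed_by_partial_peptide, q_noise):
    """Sum of log likelihood ratios over all partial peptides and ion types.

    q: probabilities q_1..q_k of the k ion types
    observed_by_partial_peptide: one sequence of k booleans per partial peptide
    q_noise: probability q_R of a random noise peak at any position
    """
    total = 0.0
    for observed in observed_by_partial_peptide:
        for q_j, seen in zip(q, observed):
            ratio = q_j / q_noise if seen else (1 - q_j) / (1 - q_noise)
            total += math.log(ratio)
    return total


# Two partial peptides, one ion type: the ion is seen for the first, missing for the second.
log_score = peptide_log_score([0.5], [[True], [False]], q_noise=0.25)
```

Working in log space keeps the score additive, which is the form a longest-path search over the spectrum graph expects.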

56. De Novo vs. Database Search
