Protein Sequencing and Mass Spectrometry
Sample Preparation
Enzymatic digestion (trypsin) + fractionation
Single-Stage MS
Mass spectrometry. LC-MS: 1 MS spectrum per second.
Tandem MS
Secondary fragmentation of the ionized parent peptide.
The Peptide Backbone
The peptide backbone breaks to form fragments with characteristic masses.
H-...-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-...-OH
(N-terminus on the left, C-terminus on the right; the three amino-acid residues shown are i-1, i and i+1.)
Ionization
The peptide backbone breaks to form fragments with characteristic masses. The ionized parent peptide carries an added proton (H+):
H+ H-...-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-...-OH
(N-terminus to C-terminus; residues i-1, i and i+1.)
Fragment Ion Generation
The peptide backbone breaks to form fragments with characteristic masses. The protonated parent peptide breaks at one of its CO-NH backbone bonds, giving an ionized peptide fragment:
H+ H-...-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-...-OH
(N-terminus to C-terminus; residues i-1, i and i+1.)
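Tandem MS for Peptide ID
Example peptide: SGFLEEDELK. Fragment masses (m/z, singly charged):

Residue   S     G     F     L     E     E     D     E     L     K
b ions    88    145   292   405   534   663   778   907   1020  1166
y ions    1166  1080  1022  875   762   633   504   389   260   147

Each b value covers the residues from the N-terminus up to and including its column (b1 = 88 is S); each y value covers the residues from its column to the C-terminus (y1 = 147 is K). The last b entry, 1166, is the singly protonated intact peptide [M+H]+; a true b10 ion (P + 1) would sit near 1148.
(Figure: MS/MS spectrum, % intensity (0 to 100) vs. m/z (0 to 1000), with the doubly charged precursor [M+2H]2+ marked.)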
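As a check on the table, the ladders can be recomputed from residue masses. This is a minimal Python sketch, assuming nominal (integer) residue masses and the offsets b = P + 1 and y = S + 19 from the ion-types slide; `RESIDUE_MASS` and `b_y_ladders` are illustrative names. With nominal masses y9 comes out as 1079 rather than the slide's 1080 (the monoisotopic value is about 1079.5).

```python
# Nominal (integer) residue masses in Da; an assumption for illustration.
RESIDUE_MASS = {"S": 87, "G": 57, "F": 147, "L": 113, "E": 129, "D": 115, "K": 128}


def b_y_ladders(peptide):
    """Return (b ions, y ions, [M+H]+) for a singly charged peptide.

    b_i covers the first i residues (P + 1); y_i covers the last i residues (S + 19).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + 1 for i in range(1, n)]        # b_1 .. b_{n-1}
    y = [sum(masses[n - i:]) + 19 for i in range(1, n)]   # y_1 .. y_{n-1}
    precursor = sum(masses) + 19                          # intact peptide, [M+H]+
    return b, y, precursor


b, y, mh = b_y_ladders("SGFLEEDELK")
print(b)   # [88, 145, 292, 405, 534, 663, 778, 907, 1020]
print(y)   # [147, 260, 389, 504, 633, 762, 875, 1022, 1079]
print(mh)  # 1166
```

Note that the y list here runs from y1 upward, i.e. right to left along the table's y row.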
Peak Assignment
The same SGFLEEDELK spectrum (b and y masses as in the table above), with peaks assigned to ion types: b3, b4, b5, b6, b7, b8, b9 and y2, y3, y4, y5, y6, y7, y8, y9, plus the precursor [M+2H]2+.
(Figure: annotated MS/MS spectrum, % intensity (0 to 100) vs. m/z (0 to 1000).)
Peak assignment implies sequence (residue tag) reconstruction!
Database Searching for Peptide ID
• For every peptide from a database:
  • Generate a hypothetical spectrum.
  • Compute a correlation between the observed and the hypothetical spectra.
• Choose the best-scoring peptide.
• Database searching is very powerful and is the de facto standard for peptide identification by MS.
• Tools: Sequest, Mascot, and many others.
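A toy version of this search loop in Python, as a sketch under stated assumptions: the "correlation" is replaced by a simple shared-peak count within a 0.5 Da tolerance (Sequest and Mascot use more elaborate scores), masses are nominal, and the four-peptide database is made up for illustration. `theoretical_spectrum`, `shared_peaks` and `search` are hypothetical helpers.

```python
# Nominal residue masses (Da) for the residues used below; an assumption.
RESIDUE_MASS = {"G": 57, "S": 87, "P": 97, "T": 101, "L": 113, "I": 113,
                "D": 115, "K": 128, "E": 129, "F": 147, "R": 156}


def theoretical_spectrum(peptide):
    """Hypothetical spectrum: the b and y ions of every backbone cleavage."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    peaks, prefix = [], 0
    for m in masses[:-1]:
        prefix += m
        peaks.append(prefix + 1)            # b ion: P + 1
        peaks.append(total - prefix + 19)   # complementary y ion: S + 19
    return sorted(peaks)


def shared_peaks(observed, theoretical, tol=0.5):
    """Simplified score: theoretical peaks with an observed peak within tol."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)


def search(observed, database, tol=0.5):
    """Score every candidate peptide; return (best_peptide, best_score)."""
    scored = [(shared_peaks(observed, theoretical_spectrum(p), tol), p)
              for p in database]
    best_score, best_peptide = max(scored)
    return best_peptide, best_score


# Labeled b and y peaks from the SGFLEEDELK spectrum.
OBSERVED = [88, 145, 292, 405, 534, 663, 778, 907, 1020,
            147, 260, 389, 504, 633, 762, 875, 1022, 1080]
# A made-up toy database of candidate peptides.
DATABASE = ["PEPTIDEK", "GSFLEEDELK", "SGFLEEDELR", "SGFLEEDELK"]

print(search(OBSERVED, DATABASE))  # ('SGFLEEDELK', 17)
```

The near-miss GSFLEEDELK, which only swaps the first two residues, still scores 16 against the true peptide's 17, so a realistic score needs to separate close candidates more sharply than a raw peak count.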
Spectra: The Real Story
• Noise peaks
• Peaks are ions (with mass offsets), not bare prefixes and suffixes
• Positions are mass-to-charge ratios, not masses
• Multiply charged ions
• Isotope patterns, not single peaks
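Peptide Fragmentation Possibilities (Ion Types)
(Figure: the backbone -HN-CH(R_i)-CO-NH-CH(R_{i+1})-..., with each cleavage site labelled by the ion types it produces.)
Cleaving the backbone between residues i and i+1 gives an N-terminal fragment (a_i, b_i or c_i) and a C-terminal fragment (x_{n-i}, y_{n-i} or z_{n-i}); the d, v and w ions involve additional side-chain cleavage. Low-energy fragmentation gives mostly b and y (and some a) ions, while the side-chain ions d, v and w are high-energy fragments.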
Ion Types and Offsets
• P = prefix residue mass
• S = suffix residue mass
• b-ions = P + 1
• y-ions = S + 19
• a-ions = P - 27
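Reading these offsets in reverse turns one observed peak into candidate prefix residue masses, one per assumed ion type. A minimal sketch, assuming singly charged ions, nominal masses, and P + S = M with M the peptide's total residue mass; `prm_candidates` is a hypothetical helper.

```python
def prm_candidates(peak, total_mass):
    """Candidate prefix residue masses (PRMs) for one singly charged peak.

    Inverts b = P + 1, a = P - 27, y = S + 19, and uses P + S = total_mass.
    """
    return {
        "b": peak - 1,                  # peak read as a b-ion
        "a": peak + 27,                 # peak read as an a-ion
        "y": total_mass - (peak - 19),  # peak read as a y-ion: P = M - S
    }


# SGEK: total residue mass 87 + 57 + 129 + 128 = 401.
print(prm_candidates(88, 401))   # {'b': 87, 'a': 115, 'y': 332}
print(prm_candidates(147, 401))  # {'b': 146, 'a': 174, 'y': 273}
```

The b1 peak of SGEK (88) yields the true PRM 87 when read as a b-ion, but 332 when misread as a y-ion; this is exactly the kind of ambiguity the spectral graph has to resolve.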
Mass-to-Charge Ratio
• The x-axis is (M+Z)/Z.
• Z = 1 implies that the peak is at M+1.
• Z = 2 implies that the peak is at (M+2)/2.
• M = 1000, Z = 2: the peak is at 501.
• Suppose you see a peak at 501. Is the mass 500 (Z = 1) or 1000 (Z = 2)?
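The slide's convention can be written as two small functions: one places a peak for a given mass and charge, the other lists every mass consistent with an observed peak. A sketch using the slide's simplified (M + Z)/Z rule (1 Da per proton); the function names are illustrative.

```python
def peak_position(mass, charge):
    """m/z of a peptide of mass M carrying Z protons: (M + Z) / Z."""
    return (mass + charge) / charge


def candidate_masses(mz, max_charge=3):
    """Every mass M that would put a peak at mz, for charges 1..max_charge."""
    return {z: mz * z - z for z in range(1, max_charge + 1)}


print(peak_position(1000, 2))    # 501.0
print(candidate_masses(501, 2))  # {1: 500, 2: 1000}
```

The isotope patterns mentioned earlier are one way to break the tie: isotope peaks of a charge-Z ion are spaced about 1/Z apart on the m/z axis.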
Spectral Graph
• Each prefix residue mass (PRM) corresponds to a node.
• Two nodes are connected by an edge if their mass difference is a residue mass; for example, nodes 87 and 144 are joined by an edge labelled G, since 144 - 87 = 57 is the residue mass of glycine.
• A path in the graph is a de novo interpretation of the spectrum.
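Spectral Graph
(Figure: spectral graph for the peptide SGEK on a mass axis from 0 to 400. Nodes 0, 87, 144, 273 and 401 are the true PRMs along S, G, E, K; nodes 146, 275 and 332 are the alternative PRMs obtained by reading the same peaks as the other ion type.)
• Each peak, when assigned to a prefix or suffix ion type, generates a unique prefix residue mass.
• Spectral graph:
  • Each node u defines a putative prefix residue mass M(u).
  • (u,v) ∈ E if M(v) - M(u) is the residue mass of an amino acid (tag) or 0.
  • Paths in the spectral graph correspond to an interpretation.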
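A minimal Python sketch of building the edge set for the SGEK node list in the figure. It assumes nominal residue masses for the 20 standard amino acids, so L/I and Q/K share an edge label; the slide's "or 0" edges are left out, and `spectral_graph` is a hypothetical helper.

```python
from itertools import combinations

# Nominal residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
                "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
                "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}

# Group residues that share a nominal mass (e.g. L/I, Q/K).
LABELS = {}
for aa, m in RESIDUE_MASS.items():
    LABELS.setdefault(m, []).append(aa)


def spectral_graph(prms):
    """Nodes are PRMs; (u, v) is an edge when v - u equals a residue mass."""
    nodes = sorted(set(prms))
    edges = [(u, v, "/".join(LABELS[v - u]))
             for u, v in combinations(nodes, 2) if v - u in LABELS]
    return nodes, edges


nodes, edges = spectral_graph([0, 87, 144, 146, 273, 275, 332, 401])
for edge in edges:
    print(edge)
```

The path 0 -> 87 -> 144 -> 273 -> 401 spells S, G, E, Q/K. The graph also contains spurious edges such as 144 -> 275 (M) and 87 -> 273 (W), which is why an interpretation needs a scoring rule and not just any path.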
Re-defining De Novo Interpretation
(Figure: the SGEK spectral graph.)
• Find a subset of nodes in the spectral graph such that:
  • 0 and M are included;
  • each peak contributes at most one node (interpretation) (*);
  • each adjacent pair (when sorted by mass) is connected by an edge (a valid residue mass);
  • an appropriate objective function (e.g., the number of peaks interpreted) is maximized.
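A brute-force Python sketch of this definition on the SGEK graph: it enumerates every path from 0 to M, discards paths in which one peak contributes two nodes, and keeps the path interpreting the most peaks. The edge list uses nominal residue masses, and the peak labels p1, p2, p3 (pairing 87/332, 144/275, 146/273) are hypothetical: we assume each mirror pair of PRMs comes from one peak. Path enumeration is exponential in general.

```python
# Edges of the SGEK spectral graph (nominal residue masses), as (lower, higher) PRMs.
EDGES = [(0, 87), (87, 144), (87, 273), (144, 273), (144, 275),
         (146, 275), (146, 332), (273, 401), (275, 332)]
TOTAL = 401  # M, the total residue mass
# Hypothetical peak labels: both PRMs of a pair come from the same peak.
PEAK_OF = {87: "p1", 332: "p1", 144: "p2", 275: "p2", 146: "p3", 273: "p3"}


def paths_from(node, path):
    """Yield every path from node to TOTAL, following edges upward in mass."""
    if node == TOTAL:
        yield path
        return
    for a, b in EDGES:
        if a == node:
            yield from paths_from(b, path + [b])


def best_interpretation():
    """Exhaustively pick the valid path that interprets the most peaks."""
    best = (-1, None)
    for path in paths_from(0, [0]):
        peaks = [PEAK_OF[x] for x in path if x in PEAK_OF]
        if len(peaks) != len(set(peaks)):
            continue  # one peak contributed two nodes: not a valid interpretation
        if len(peaks) > best[0]:
            best = (len(peaks), path)
    return best


print(best_interpretation())  # (3, [0, 87, 144, 273, 401])
```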
Two Problems
(Figure: the SGEK spectral graph.)
• Too many nodes.
  • Only a small fraction correspond to b/y ions (leading to true PRMs) (learning problem).
• Even if the b/y ions were correctly predicted, each peak generates multiple possibilities, only one of which is correct. We need to find a path that uses each peak only once (algorithmic problem).
  • In general, the forbidden pairs problem is NP-hard.
However...
• The b and y ions have a special non-interleaving property.
• Consider two pairs (b1, y1) and (b2, y2), each produced by a single cleavage.
• If b1 < b2, then y1 > y2.
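The property follows from the offsets: the b and y ions of one cleavage satisfy (P + 1) + (S + 19) = M + 20, a constant, so a larger b forces a smaller y. A small Python check on SGEK with nominal masses; `cleavage_pairs` and `non_interleaving` are illustrative names.

```python
from itertools import combinations


def cleavage_pairs(masses):
    """(b, y) ion pair of each backbone cleavage, using b = P + 1, y = S + 19."""
    total = sum(masses)
    pairs = []
    for i in range(1, len(masses)):
        prefix = sum(masses[:i])
        pairs.append((prefix + 1, total - prefix + 19))
    return pairs


def non_interleaving(pairs):
    """True if, for every two pairs, b1 < b2 holds exactly when y1 > y2."""
    return all((b1 < b2) == (y1 > y2)
               for (b1, y1), (b2, y2) in combinations(pairs, 2))


pairs = cleavage_pairs([87, 57, 129, 128])  # S, G, E, K
print(pairs)                                # [(88, 333), (145, 276), (274, 147)]
print(non_interleaving(pairs))              # True
```

Every pair here sums to 421 = 401 + 20, which is the whole reason the check passes.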
Non-Intersecting Forbidden Pairs
(Figure: forbidden pairs in the SGEK spectral graph, mass axis 0 to 400.)
• If we consider only b and y ions, 'forbidden' node pairs are non-intersecting.
• The de novo problem can then be solved efficiently using a dynamic programming technique.
The Forbidden Pairs Method
• There may be many paths that avoid forbidden pairs.
• We choose a path that maximizes an objective function,
  • e.g., the number of peaks interpreted.
The Forbidden Pairs Method
(Figure: PRMs on a mass axis from 0 to 400, showing a node u and its forbidden partner f(u).)
• Sort the PRMs by increasing mass.
• For each node u, f(u) denotes its forbidden partner.
• Let m(u) denote the mass value of the PRM u.
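Assuming, as in the SGEK example, that every node comes from reading one peak p either as a b-ion or as a y-ion (offsets b = P + 1, y = S + 19, and P + S = M), the forbidden partner has a closed form:

```latex
\begin{aligned}
P_b &= p - 1 && \text{(peak read as a b-ion)}\\
S &= p - 19, \qquad P_y = M - S = M - p + 19 && \text{(peak read as a y-ion)}\\
f(P_b) &= P_y = M + 18 - P_b
\end{aligned}
```

Because f(u) = M + 18 - u decreases as u increases, sorting the PRMs by mass reverses the order of their partners, which is the non-intersecting property. For SGEK (M = 401): f(87) = 332, f(144) = 275, f(146) = 273.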
D.P. for Forbidden Pairs
(Figure: nodes u and v on a mass axis from 0 to 400.)
• Consider all pairs u, v with
  • m[u] <= M/2, m[v] > M/2.
• Define S(u,v) as the best score of a forbidden-pair-free combination of a path 0->u and a path v->M.
• Is it sufficient to compute S(u,v) for all u, v?
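D.P. for Forbidden Pairs
(Figure: nodes u and v on a mass axis from 0 to 400.)
• Note that the best interpretation is given by $\max_{(u,v) \in E} S(u,v)$, over pairs u, v with m[u] <= M/2 < m[v]: the two half-paths must be joined by an edge.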
D.P. for Forbidden Pairs
(Figure: nodes u, f(u) and v on a mass axis from 0 to 400.)
• Note that we have one of two cases:
  • either u < f(v) (and f(u) > v),
  • or u > f(v) (and f(u) < v).
• Case 1:
  • Extend u; do not touch f(v).
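The Complete Algorithm
S[0,M] = 0
for all u /* increasing mass values from 0 to M/2 */
    for all v /* decreasing mass values from M to M/2 */
        if (u > f[v])
            S[u,v] = max over (u',u) ∈ E of S[u',v] + 1   /* extend the prefix path to u */
        else if (u < f[v])
            S[u,v] = max over (v,v') ∈ E of S[u,v'] + 1   /* extend the suffix path to v */
        /* u = f[v] would use both nodes of a forbidden pair: S[u,v] = -∞ */
        if ((u,v) ∈ E)
            maxI = max{maxI, S[u,v]}   /* maxI is the score of the best interpretation */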
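A minimal, runnable Python sketch of the dynamic program on the SGEK example. The branch bodies are our reconstruction from the case analysis, not necessarily the exact form on the original slides: when u > f(v), non-interleaving gives f(u) < v, so adding u to a shorter prefix path cannot clash with the suffix path v->M; symmetrically, when u < f(v), adding v cannot clash with the prefix path. Assumptions: the score is the number of interpreted peaks, every forbidden pair straddles M/2, f(0) = M and f(M) = 0, and the hypothetical `partner` map pairs 87/332, 144/275 and 146/273.

```python
import math

NEG = -math.inf


def forbidden_pairs_dp(nodes, edges, partner, total_mass):
    """Best number of interpreted peaks over paths 0 -> total_mass that never
    contain both nodes of a forbidden pair, plus one optimal path."""
    half = total_mass / 2
    left = sorted(x for x in nodes if x <= half)                  # u: 0 .. M/2
    right = sorted((x for x in nodes if x > half), reverse=True)  # v: M .. M/2
    preds = {x: [a for a, b in edges if b == x] for x in nodes}
    succs = {x: [b for a, b in edges if a == x] for x in nodes}
    gain = {x: 0 if x in (0, total_mass) else 1 for x in nodes}  # one per peak

    S, back = {}, {}
    for u in left:
        for v in right:
            if (u, v) == (0, total_mass):
                S[u, v], back[u, v] = 0, None      # empty prefix and suffix paths
            elif u == partner[v]:
                S[u, v], back[u, v] = NEG, None    # both nodes of one forbidden pair
            else:
                if u > partner[v]:
                    # f(u) < v: extend a prefix path ending at a predecessor w of u.
                    options = [((w, v), S[w, v]) for w in preds[u]]
                    added = gain[u]
                else:
                    # f(v) > u: extend a suffix path starting at a successor w of v.
                    options = [((u, w), S[u, w]) for w in succs[v]]
                    added = gain[v]
                prev, val = max(options, key=lambda o: o[1], default=(None, NEG))
                S[u, v] = val + added              # -inf stays -inf
                back[u, v] = prev if val > NEG else None

    # Join the two halves through an edge (u, v) that crosses M/2.
    best, state = NEG, None
    for a, b in sorted(edges):
        if (a, b) in S and S[a, b] > best:
            best, state = S[a, b], (a, b)
    path = set()
    while state is not None:
        path.update(state)
        state = back[state]
    return best, sorted(path)


nodes = [0, 87, 144, 146, 273, 275, 332, 401]
edges = [(0, 87), (87, 144), (87, 273), (144, 273), (144, 275),
         (146, 275), (146, 332), (273, 401), (275, 332)]
partner = {0: 401, 401: 0, 87: 332, 332: 87, 144: 275, 275: 144, 146: 273, 273: 146}
print(forbidden_pairs_dp(nodes, edges, partner, 401))  # (3, [0, 87, 144, 273, 401])
```

The loop order matters: u increases and v decreases, so S[w, v] (w < u) and S[u, w] (w > v) are already filled when S[u, v] is computed. The table has one cell per (u, v) pair, so the work is polynomial in the number of nodes and edges rather than exponential in the number of paths; the recovered path 0, 87, 144, 273, 401 spells S, G, E, K.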