500 likes | 514 Views
This course covers protein sequencing methods like mass spectrometry, ionization, fragmentation, and database searching. Learn about identifying peptide fragments and interpreting spectra for peptide reconstruction. Explore isotopic peaks and isotopic distributions. Gain insights into handling isotopes in peptide analysis and exploring the isotope profile to determine the charge on a peptide.
E N D
CSE182-L7 Protein sequencing and Mass Spectrometry CSE182
Announcements • Midterm 1: Nov 1, in class. • Assignment 2: Online, due October 20. CSE182
Trivia Quiz • What research won the Nobel prize in Chemistry in 2004? • In 2002? CSE182
Nobel Citation 2002 CSE182
Nobel Citation, 2002 CSE182
Mass Spectrometry CSE182
Enzymatic Digestion (Trypsin) + Fractionation Sample Preparation CSE182
Single Stage MS Mass Spectrometry LC-MS: 1 MS spectrum / second CSE182
Tandem MS Secondary Fragmentation Ionized parent peptide CSE182
The peptide backbone The peptide backbone breaks to form fragments with characteristic masses. H...-HN-CH-CO-NH-CH-CO-NH-CH-CO-…OH Ri-1 Ri Ri+1 C-terminus N-terminus AA residuei-1 AA residuei+1 AA residuei CSE182
Ionization The peptide backbone breaks to form fragments with characteristic masses. H+ H...-HN-CH-CO-NH-CH-CO-NH-CH-CO-…OH Ri-1 Ri Ri+1 C-terminus N-terminus AA residuei-1 AA residuei+1 AA residuei Ionized parent peptide CSE182
Fragment ion generation The peptide backbone breaks to form fragments with characteristic masses. H+ H...-HN-CH-CONH-CH-CO-NH-CH-CO-…OH Ri-1 Ri Ri+1 C-terminus N-terminus AA residuei-1 AA residuei AA residuei+1 Ionized peptide fragment CSE182
Tandem MS for Peptide ID 88 145 292 405 534 663 778 907 1020 1166 b ions S G F L E E D E L K 1166 1080 1022 875 762 633 504 389 260 147 y ions 100 % Intensity [M+2H]2+ 0 250 500 750 1000 m/z CSE182
Peak Assignment 88 145 292 405 534 663 778 907 1020 1166 b ions S G F L E E D E L K 1166 1080 1022 875 762 633 504 389 260 147 y ions y6 100 Peak assignment implies Sequence (Residue tag) Reconstruction! y7 % Intensity [M+2H]2+ y5 b3 b4 y2 y3 b5 y4 y8 b8 b9 b6 b7 y9 0 250 500 750 1000 m/z CSE182
Database Searching for peptide ID • For every peptide from a database • Generate a hypothetical spectrum • Compute a correlation between observed and experimental spectra • Choose the best • Database searching is very powerful and is the de facto standard for MS. • Sequest, Mascot, and many others CSE182
Spectra: the real story • Noise Peaks • Ions, not prefixes & suffixes • Mass to charge ratio, and not mass • Multiply charged ions • Isotope patterns, not single peaks CSE182
xn-i yn-i yn-i-1 vn-i wn-i zn-i -HN-CH-CO-NH-CH-CO-NH- CH-R’ Ri i+1 ai R” i+1 bi bi+1 ci di+1 low energy fragments high energy fragments Peptide fragmentation possibilities(ion types) CSE182
Ion types, and offsets • P = prefix residue mass • S = Suffix residue mass • b-ions = P+1 • y-ions = S+19 • a-ions = P-27 CSE182
Mass-Charge ratio • The X-axis is (M+Z)/Z • Z=1 implies that peak is at M+1 • Z=2 implies that peak is at (M+2)/2 • M=1000, Z=2, peak position is at 501 • Suppose you see a peak at 501. Is the mass 500, or is it 1000? CSE182
Isotopic peaks • Ex: Consider peptide SAM • Mass = 308.12802 • You should see: • Instead, you see 308.13 308.13 310.13 CSE182
Isotopes • C-12 is the most common. Suppose C-13 occurs with probability 1% • EX: SAM • Composition: C11 H22 N3 O5 S1 • What is the probability that you will see a single C-13? • Note that C,S,O,N all have isotopes. Can you compute the isotopic distribution? CSE182
All atoms have isotopes • Isotopes of atoms • O16,18, C-12,13, S32,34…. • Each isotope has a frequency of occurrence • If a molecule (peptide) has a single copy of C-13, that will shift its peak by 1 Da • With multiple copies of a peptide, we have a distribution of intensities over a range of masses (Isotopic profile). • How can you compute the isotopic profile of a peak? CSE182
Nc=50 +1 Isotope Calculation • Denote: • Nc : number of carbon atoms in the peptide • Pc : probability of occurrence of C-13 (~1%) • Then Nc=200 +1 CSE182
Isotope Calculation Example • Suppose we consider Nitrogen, and Carbon • NN: number of Nitrogen atoms • PN: probability of occurrence of N-15 • Pr(peak at M) • Pr(peak at M+1)? • Pr(peak at M+2)? How do we generalize? How can we handle Oxygen (O-16,18)? CSE182
General isotope computation • Definition: • Let pi,a be the abundance of the isotope with mass i Da above the least mass • Ex: P0,C : abundance of C-12, P2,O: O-18 etc. • Characteristic polynomial • Prob{M+i}: coefficient of xi in (x) (a binomial convolution) CSE182
Isotopic Profile Application • In DxMS, hydrogen atoms are exchanged with deuterium • The rate of exchange indicates how buried the peptide is (in folded state) • Consider the observed characteristic polynomial of the isotope profile t1, t2, at various time points. Then • The estimates of p1,H can be obtained by a deconvolution • Such estimates at various time points should give the rate of incorporation of Deuterium, and therefore, the accessibility. CSE182
Quiz • How can you determine the charge on a peptide? • Difference between the first and second isotope peak is 1/Z • Proposal: • Given a mass, predict a composition, and the isotopic profile • Do a ‘goodness of fit’ test to isolate the peaks corresponding to the isotope • Compute the difference CSE182
Tandem MS summary • The basics of peptide ID using tandem MS is simple. • Correlate experimental with theoretical spectra • In practice, there might be many confounding problems. • A toolkit that resolves some of these problems will be useful. CSE182
MS Quiz: • Why aren’t all tandem MS peaks of the same intensity? • Do the intensities for a peptide vary from spectrum to spectrum? CSE182
De novo interpretation of mass spectra • The so called de novo algorithms focus exclusively on the D module. • There is no database (I/F). • Limited scoring and validation CSE182
Computing possible prefixes • We know the parent mass M=401. • Consider a mass value 88 • Assume that it is a b-ion, or a y-ion • If b-ion, it corresponds to a prefix of the peptide with residue mass 88-1 = 87. • If y-ion, y=M-P+19. • Therefore the prefix has mass • P=M-y+19= 401-88+19=332 • Compute all possible Prefix Residue Masses (PRM) for all ions. CSE182
Putative Prefix Masses Prefix Mass M=401 b y 88 87 332 145 144 275 147 146 273 276 275 144 • Only a subset of the prefix masses are correct. • The correct mass values form a ladder of amino-acid residues S G E K 0 87 144 273 401 CSE182
Spectral Graph • Each prefix residue mass (PRM) corresponds to a node. • Two nodes are connected by an edge if the mass difference is a residue mass. • A path in the graph is a de novo interpretation of the spectrum 87 G 144 CSE182
0 273 332 401 87 144 146 275 100 200 300 S G E K Spectral Graph • Each peak, when assigned to a prefix/suffix ion type generates a unique prefix residue mass. • Spectral graph: • Each node u defines a putative prefix residue M(u). • (u,v) in E if M(v)-M(u) is the residue mass of an a.a. (tag) or 0. • Paths in the spectral graph correspond to a interpretation CSE182
0 273 332 401 87 144 146 275 100 200 300 S G E K Re-defining de novo interpretation • Find a subset of nodes in spectral graph s.t. • 0, M are included • Each peak contributes at most one node (interpretation)(*) • Each adjacent pair (when sorted by mass) is connected by an edge (valid residue mass) • An appropriate objective function (ex: the number of peaks interpreted) is maximized 87 G 144 CSE182
0 273 332 401 87 144 146 275 100 200 300 S G E K Two problems • Too many nodes. • Only a small fraction are correspond to b/y ions (leading to true PRMs) (learning problem) • Even if the b/y ions were correctly predicted, each peak generates multiple possibilities, only one of which is correct. We need to find a path that uses each peak only once (algorithmic problem). • In general, the forbidden pairs problem is NP-hard CSE182
However,.. • The b,y ions have a special non-interleaving property • Consider pairs (b1,y1), (b2,y2) • If (b1 < b2), then y1 > y2 CSE182
100 0 400 200 Non-Intersecting Forbidden pairs 332 300 87 S • If we consider only b,y ions, ‘forbidden’ node pairs are non-intersecting, • The de novo problem can be solved efficiently using a dynamic programming technique. G E K CSE182
The forbidden pairs method • There may be many paths that avoid forbidden pairs. • We choose a path that maximizes an objective function, • EX: the number of peaks interpreted CSE182
332 100 300 0 400 200 87 The forbidden pairs method • Sort the PRMs according to increasing mass values. • For each node u, f(u) represents the forbidden pair • Let m(u) denote the mass value of the PRM. f(u) u CSE182
D.P. for forbidden pairs • Consider all pairs u,v • m[u] <= M/2, m[v] >M/2 • Define S(u,v) as the best score of a forbidden pair path from 0->u, v->M • Is it sufficient to compute S(u,v) for all u,v? 332 100 300 0 400 200 87 u v CSE182
D.P. for forbidden pairs • Note that the best interpretation is given by 332 100 300 0 400 200 87 u v CSE182
D.P. for forbidden pairs • Note that we have one of two cases. • Either u < f(v) (and f(u) > v) • Or, u > f(v) (and f(u) < v) • Case 1. • Extend u, do not touch f(v) 100 300 0 f(u) 400 200 u v CSE182
The complete algorithm for all u /*increasing mass values from 0 to M/2 */ for all v /*decreasing mass values from M to M/2 */ if (u > f[v]) else if (u < f[v]) If (u,v)E /*maxI is the score of the best interpretation*/ maxI = max {maxI,S[u,v]} CSE182
De Novo: Second issue • Given only b,y ions, a forbidden pairs path will solve the problem. • However, recall that there are MANY other ion types. • Typical length of peptide: 15 • Typical # peaks? 50-150? • #b/y ions? • Most ions are “Other” • a ions, neutral losses, isotopic peaks…. CSE182
De novo: Weighting nodes in Spectrum Graph • Factors determining if the ion is b or y • Intensity • Support ions • Isotopic peaks (InsPecT’) CSE182
De novo: Weighting nodes • A probabilistic network to model support ions (Pepnovo) CSE182
De Novo Interpretation Summary • The main challenge is to separate b/y ions from everything else (weighting nodes), and separating the prefix ions from the suffix ions (Forbidden Pairs). • As always, the abstract idea must be supplemented with many details. • Noise peaks, incomplete fragmentation • In reality, a PRM is first scored on its likelihood of being correct, and the forbidden pair method is applied subsequently. CSE182