With the advent of high-throughput proteomics, data is being generated at an astonishing rate. It has become clear that validating peptide sequence assignments generated by database search engines is an increasingly important, but often overlooked, aspect of protein identification using tandem mass spectrometry. In this tutorial you will learn about some of the factors important in low-energy peptide fragmentation and how to use this information to accept or reject database search engine peptide sequence assignments.

Why is validating data important?

These two fragment ion spectra from a multi-dimensional LC proteomics experiment using a quadrupole ion trap were searched with Mascot. They received similar scores (Mascot Score = 101 and Mascot Score = 102). Are these two spectra equivalent?

One is a high quality spectrum with good signal-to-noise and all of the dominant ions assigned. The other is a poor quality spectrum with low signal-to-noise, in which fragment ions are most likely randomly assigned to noise peaks.

• Every database search, for a variety of reasons, generates false positive and false negative assignments. We would like to at least reduce, if not eliminate, these incorrect hits.
• Decisions, often involving a great deal of money for bioassays, will be made downstream of our peptide identification. In this era of tight funding, it is crucial that the data on which these decisions rest are accurate.

How do peptides break apart?

• In order to assess whether a sequence assignment is correct or not, it is important to understand how and why peptides break apart.
• Under low energy dissociation conditions, peptides primarily fragment at the C-N (amide) bond.
• If the charge is retained on the N-terminal end of the peptide, the ion is known as a b-type ion.
• If the charge is retained on the C-terminal end, the ion is termed a y-type ion.
• The fragmentation energy in some instruments, especially triple quadrupoles or quadrupole-time-of-flight hybrids (Q-TOFs), is often sufficient to generate cleavage at the C-C bond as well, causing loss of CO from the b ion. These ions are known as a-type ions.

Peptides dissociate into nested sets of fragments

[Diagram: H2N - A B C D E F G - COOH, with b1 to b6 counted from the N-terminus and y1 to y6 counted from the C-terminus.]

If A, B, C, D, E, F and G represent different amino acids, this peptide can dissociate to form the following fragments:

b1 = H2N-A+            y1 = +G-COOH
b2 = H2N-AB+           y2 = +FG-COOH
b3 = H2N-ABC+          y3 = +EFG-COOH
b4 = H2N-ABCD+         y4 = +DEFG-COOH
b5 = H2N-ABCDE+        y5 = +CDEFG-COOH
b6 = H2N-ABCDEF+       y6 = +BCDEFG-COOH
b7 = H2N-ABCDEFG+      y7 = +ABCDEFG-COOH
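The nested fragment sets can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original tutorial: it assumes nominal (integer) residue masses, singly charged fragments, and the offsets used in the slide arithmetic (b = residue sum + 1, y = residue sum + 19). The name `fragment_ladders` and the peptide GASVK are hypothetical, chosen only for the example.

```python
# Nominal (integer) residue masses in amu. This table is an assumption for
# illustration; real work would use monoisotopic or average masses.
NOMINAL_RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def fragment_ladders(peptide):
    """Return (b_ions, y_ions): nominal m/z of singly charged b1..b(n-1) and y1..y(n-1)."""
    masses = [NOMINAL_RESIDUE_MASS[aa] for aa in peptide]

    b_ions = []
    total = 0
    for mass in masses[:-1]:            # grow from the N-terminus
        total += mass
        b_ions.append(total + 1)        # +1 for the ionizing proton

    y_ions = []
    total = 0
    for mass in reversed(masses[1:]):   # grow from the C-terminus
        total += mass
        y_ions.append(total + 19)       # +18 for water, +1 for the proton

    return b_ions, y_ions


b_ions, y_ions = fragment_ladders("GASVK")  # hypothetical tryptic peptide
print("b:", b_ions)  # b: [58, 129, 216, 315]
print("y:", y_ions)  # y: [147, 246, 333, 404]
```

Each list is a running sum, which is why neighbouring peaks in a series differ by exactly one residue mass.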

Mass difference reflects peptide sequence

[Figure: schematic spectrum of relative abundance against m/z for the peptide H2N - A B C D E F G - COOH, with peaks labeled b2 to b6 and y2 to y6. The spacings between successive peaks in one series spell out A, B, C, D, E, F, G; in the other series they spell out the same residues in reverse order.]

The peptide amino acid sequence can be deduced by calculating the difference in mass between peaks. If the mass difference corresponds to an amino acid residue (a table is shown in the next slide), then that amino acid is assigned to the peak representing the difference.

• The largest y-type ion will appear between 57 and 186 amu below the mass of the precursor ion.
• The smallest y ion will appear at the amino acid residue mass plus 19 amu.
• The largest b ion will appear below the precursor by the residue mass plus 18 amu.
• The smallest b ion will appear at the residue mass plus 1 amu.

Using this information, we can interpret fragmentation spectra to deduce the amino acid sequence.
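The four rules on this slide translate directly into arithmetic. The sketch below is an illustration under stated assumptions: nominal masses, a singly protonated precursor, and hypothetical function names (`first_fragment_windows`, `smallest_ions`). The bounds 57 and 186 are the residue masses of glycine and tryptophan, the lightest and heaviest common residues.

```python
# Nominal residue masses that bound the search windows.
LIGHTEST_RESIDUE = 57    # glycine
HEAVIEST_RESIDUE = 186   # tryptophan


def first_fragment_windows(mh):
    """Return m/z windows for the largest y ion and the largest b ion.

    mh is the singly protonated precursor mass, (M+H)+.
    """
    y_window = (mh - HEAVIEST_RESIDUE, mh - LIGHTEST_RESIDUE)
    # The largest b ion lacks the C-terminal residue and the C-terminal
    # water (18 amu), so its window sits 18 amu lower.
    b_window = (mh - 18 - HEAVIEST_RESIDUE, mh - 18 - LIGHTEST_RESIDUE)
    return y_window, b_window


def smallest_ions(n_term_residue_mass, c_term_residue_mass):
    """Return nominal m/z of b1 and y1 for the given terminal residue masses."""
    return n_term_residue_mass + 1, c_term_residue_mass + 19


y_window, b_window = first_fragment_windows(1450)  # example precursor (M+H)+
print(y_window, b_window)       # (1264, 1393) (1246, 1375)
print(smallest_ions(113, 156))  # (114, 175): Leu/Ile at the N-terminus, Arg at the C-terminus
```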

The table of common amino acids provides molecular weights for the residues, structures of the side chains, and masses for the low mass immonium ions that result from side chain loss. These amino acids have chemical properties that need to be considered when validating sequence assignments.

[Table: common amino acids, with residue masses, side chain structures, and immonium ion masses. Annotations on the table include: basic; acidic; lose ammonia; lose water; -H2S=34; -CH3SH=48; suppress b; isobaric; typically dominant; abundant y.]

Immonium ions are not usually observed in ion traps due to the low mass cutoff. The same is true of b1/y1.

Other things to keep in mind

• Basic amino acids can generate doubly-charged ions.
• Ion signal can be intense for cleavages C-terminal to acidic amino acids. These residues also tend to lose water and cyclize to randomly eject portions of the sequence.
• Isobaric amino acids cannot be differentiated using low energy fragmentation instruments.
• Loss of water from threonine is particularly intense if the amino acid is near a terminal end of the peptide.
• If a peptide is tryptic, y1 will either be lysine at 147 or arginine at 175.

Some pairs of amino acids add up to the mass of a different amino acid. The same can happen with acetylated amino acids, a common modification.

G-G = 114 = N
G-A = 128 = K/Q
V-G = 156 = R
G-E = 186 = W
A-D = 186 = W
S-V = 186 = W
AcG = 99 = V
AcA = 113 = L/I
AcS = 129 = E
AcN = 156 = R
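The list of mass coincidences can be reproduced by brute force. The sketch below is illustrative only: it assumes nominal residue masses and a nominal acetyl mass of 42, it writes L for leucine/isoleucine, and the function names are hypothetical.

```python
from itertools import combinations_with_replacement

# Nominal residue masses; "L" stands for both leucine and isoleucine.
MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
ACETYL = 42  # nominal mass added by acetylation


def residues_with_mass(mass):
    """Return the sorted residue codes whose nominal mass equals mass."""
    return sorted(aa for aa, m in MASS.items() if m == mass)


def confusable_pairs():
    """Map each residue pair to the single residues with the same nominal mass."""
    hits = {}
    for a, b in combinations_with_replacement(sorted(MASS), 2):
        same = residues_with_mass(MASS[a] + MASS[b])
        if same:
            hits[a + b] = same
    return hits


def confusable_acetylations():
    """Map each acetylated residue to the plain residues with the same nominal mass."""
    hits = {}
    for aa in sorted(MASS):
        same = residues_with_mass(MASS[aa] + ACETYL)
        if same:
            hits["Ac" + aa] = same
    return hits


print(confusable_pairs())
print(confusable_acetylations())
```

At unit mass resolution this search returns the six pairs and four acetylated residues listed on the slide and no others.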

We're going to manually interpret an MS/MS spectrum generated by a quadrupole ion trap. The deconvoluted mass of the precursor is 1449.38 (the observed ion was doubly charged).

We'll start by looking at the dominant peaks that are below the mass of the precursor ion. We'll look for possible y ions between 57 and 186 amu below 1450 (the (M+H)+ ion) and possible b ions in a window offset by another 18 amu.

• The first ion is at 1337.5. 1450-1337=113=Lxx, therefore we assign the residue lost to form the largest y ion as either leucine or isoleucine.
• Next we look at the ion at 1238. Assuming it is a y ion, we note that 1337-1238=99=Val, so we assign the next y ion as Val.
• The next ion is at 1276. 1450-18-1276=156=Arg, therefore we assign the residue lost to form the largest b ion as arginine. This is consistent with the digest: since the sample was digested with trypsin, we would expect lysine or arginine as the first y ion.
• Continuing in this manner, we find b ions at 1276-1219=57=Gly and 1219-1091=128=Lys/Gln, and y ions at 1238-1139=99=Val and 1139-992=147=Phe.

Residues assigned so far: Lxx Val Val Phe (from the N-terminus) and K/Q Gly Arg (to the C-terminus).

[Spectrum: labeled peaks at 992.4, 1091.4, 1139.4, 1219.4, 1238.6, 1276.5, and 1337.5.]
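The subtraction steps on this slide can be written as a small ladder walk. This is a sketch under stated assumptions: nominal residue masses, a matching tolerance of 0.5 amu chosen for illustration, and (M+H)+ taken as the deconvoluted mass plus 1. The names `label_for_gap` and `walk_ladder` are hypothetical.

```python
# Nominal residue masses mapped to the labels used on the slides.
# Lxx = leucine or isoleucine; K/Q = lysine or glutamine.
LABEL_BY_MASS = {
    57: "Gly", 71: "Ala", 87: "Ser", 97: "Pro", 99: "Val", 101: "Thr",
    103: "Cys", 113: "Lxx", 114: "Asn", 115: "Asp", 128: "K/Q", 129: "Glu",
    131: "Met", 137: "His", 147: "Phe", 156: "Arg", 163: "Tyr", 186: "Trp",
}


def label_for_gap(gap, tol=0.5):
    """Return the residue label whose nominal mass is closest to gap, or None."""
    nearest = min(LABEL_BY_MASS, key=lambda mass: abs(mass - gap))
    return LABEL_BY_MASS[nearest] if abs(nearest - gap) <= tol else None


def walk_ladder(start, peaks, tol=0.5):
    """Label each step from start down through peaks (given in descending m/z)."""
    labels = []
    previous = start
    for peak in peaks:
        labels.append(label_for_gap(previous - peak, tol))
        previous = peak
    return labels


mh = 1449.38 + 1.0  # deconvoluted mass plus one (nominal) proton
y_labels = walk_ladder(mh, [1337.5, 1238.6, 1139.4, 992.4])
b_labels = walk_ladder(mh - 18, [1276.5, 1219.4, 1091.4])  # b series starts 18 amu lower
print(y_labels)  # ['Lxx', 'Val', 'Val', 'Phe']
print(b_labels)  # ['Arg', 'Gly', 'K/Q']
```

A gap that matches no residue returns `None`, which is the signal to try a different peak or a different ion series.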

To verify the high mass y ion assignments, we look for the complementary low mass b ions. Since the data is from an ion trap, we will probably not see b1. Therefore, we start by looking for b2.

• Since the largest y ion was either leucine or isoleucine and the next y ion in the series was valine, we look for the complementary ion at 113+1+99=213=b2.
• The next b ion will result from addition of another valine, therefore we'd expect a signal at 213+99=312=b3.
• When phenylalanine is added, we find the b4 ion at 312+147=459.

Residues assigned so far: Lxx Val Val Phe (from the N-terminus) and K/Q Gly Arg (to the C-terminus).

[Spectrum: labeled peaks at 213.0, 232.0, 311.8, 360.2, 443.3, 459.0, 863.4, 927.3, 944.3, 975.4, 992.4, 1091.4, 1139.4, 1219.4, 1238.6, 1276.5, and 1337.5.]

To verify the high mass b ion assignments, we look for the complementary low mass y ions. Since the data is from an ion trap, we will probably not see y1; however, the assignment of arginine makes sense given that the sample was digested using trypsin. Therefore, we start by looking for y2.

• Assuming arginine is y1, we look for a signal resulting from the addition of glycine at 156+1+18+57=232=y2. We add the 19 amu (1+18) to account for the proton and the C-terminal water.
• The next complementary y ion should be at 232+128=360=y3, resulting from the addition of either lysine or glutamine. Since the sample was digested using trypsin, it is unlikely that the amino acid is lysine, since we would have expected a cleavage there.

Residues assigned so far: Lxx Val Val Phe (from the N-terminus) and Gln Gly Arg (to the C-terminus), with K/Q resolved to Gln.

[Spectrum: same labeled peaks as on the previous slide.]
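The complement checks on the last two slides follow one rule. With the slide arithmetic (b = residues + 1, y = residues + 19, (M+H)+ = all residues + 19), a singly charged b ion and its complementary y ion sum to (M+H)+ + 1. The sketch below applies that rule to the peak labels read from the slide. The 0.5 amu tolerance and the function names are assumptions made for illustration.

```python
def complement(mz, mh):
    """Return the m/z of the ion complementary to a singly charged fragment."""
    return mh + 1 - mz


def find_complement(mz, mh, peaks, tol=0.5):
    """Return the observed peak closest to the complement of mz, or None."""
    target = complement(mz, mh)
    nearest = min(peaks, key=lambda peak: abs(peak - target))
    return nearest if abs(nearest - target) <= tol else None


# Peak labels read from the spectrum on the slide.
PEAKS = [213.0, 232.0, 311.8, 360.2, 443.3, 459.0, 863.4, 927.3, 944.3,
         975.4, 992.4, 1091.4, 1139.4, 1219.4, 1238.6, 1276.5, 1337.5]
MH = 1449.38 + 1.0  # deconvoluted mass plus one (nominal) proton

for high_mass_ion in (1337.5, 1238.6, 1139.4, 992.4, 1276.5, 1219.4, 1091.4):
    print(high_mass_ion, "->", find_complement(high_mass_ion, MH, PEAKS))
```

The two `None` results, for 1337.5 and 1276.5, correspond to b1 and y1, which the slides note are not expected in ion trap data.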

We look for the next largest b ion below 1091. There are three choices:

1091-975=116
1091-944=147=Phe
1091-927=164

Only the difference to the signal at 944 corresponds to an amino acid residue mass, therefore the next b ion is assigned as Phe.

The next largest y ion will appear below 992. Since 992-863=129=Glu, we assign the next y ion as glutamic acid.

Validating our high mass ion assignments, we expect to find b5 resulting from addition of glutamic acid at 459+129=588=b5, and y4 resulting from addition of phenylalanine at 360+147=507=y4.

The next dominant peak should correspond to a b ion, so we look below 944 and note that 944-830=114=Asn.

Residues assigned so far: Lxx Val Val Phe Glu (from the N-terminus) and Asn Phe Gln Gly Arg (to the C-terminus).

[Spectrum: labeled peaks at 443.3, 489.0, 507.2, 543.3, 588.0, 830.2, 863.4, 927.3, 944.3, 975.4, 992.4, 1091.4, 1139.4, 1219.4, 1238.6, 1276.5, and 1337.5; ion labels b2, b3, b4, y2, and y3.]

The next largest y ion will appear below 863. Since 863-750=113=Lxx, we assign the next y ion as leucine/isoleucine.

The next dominant peak should correspond to a b ion, so we look below 830 and note that 830-701=129=Glu.

The next y ion will appear below 750. Since 750-621=129=Glu, we assign the next y ion as glutamic acid. However, since this is complementary to the b ion we just assigned, our sequence is complete.

Complete sequence: Lxx Val Val Phe Glu Lxx Glu Asn Phe Gln Gly Arg

[Spectrum: labeled peaks at 443.3, 489.0, 543.3, 621.3, 701.1, 750.3, 830.2, 863.4, 927.3, 944.3, 975.4, 992.4, 1091.4, 1139.4, 1219.4, 1238.6, 1276.5, and 1337.5; ion labels b2, b3, b4, b5, y2, y3, and y4.]

In this example we have observed a complete series of complementary b and y ions. Purple bars indicate b ions while orange bars are for y ions.

[Spectrum: annotated with b and y ion labels (b2 through b11 and y2 through y11) for the sequence L V V F E L E N F Q G R; other labeled peaks at 443.3, 489.0, and 543.3.]

Incorrect Identification

 #   Rank/Sp   Id#  (M+H)+     deltCn  XCorr   Sp      Ions   Reference   Peptide
---  --------  ---  ---------  ------  ------  ------  -----  ----------  -------
 1.   1 /  1    0   1450.7694  0.0000  4.7567  2559.1  20/22  CRB1_HUMAN  R.LVVFELENFQGR.R
 2.   2 /  2    0   1451.7534  0.0254  4.6357  2541.5  20/22  CRB1_HUMAN  R.LVVFELEN*FQGR.R
 3.   3 /  2    0   1451.7534  0.0571  4.4851  2541.5  20/22  CRB1_HUMAN  R.LVVFELENFQ*GR.R
 4.   4 /  3    0   1452.7374  0.2804  3.4230  2036.0  18/22  CRB1_HUMAN  R.LVVFELEN*FQ*GR.R
 5.   5 /  6    0   1451.7569  0.4038  2.8358  1057.7  15/22  TP3B_HUMAN  K.LN*M#VKFLQ*VEGR.G
 6.   6 /  5    0   1450.7729  0.4619  2.5595  1426.3  17/22  TP3B_HUMAN  K.LN*M#VKFLQVEGR.G
 7.   7 / 13    0   1450.6549  0.4654  2.5430   840.0  13/22  NEUM_HUMAN  R.TKQ*VEKN*DDDQ*K.I
 8.   8 / 13    0   1449.6709  0.4752  2.4964   840.0  13/22  NEUM_HUMAN  R.TKQ*VEKNDDDQ*K.I
 9.   9 / 15    0   1449.6709  0.4757  2.4941   817.4  13/22  NEUM_HUMAN  R.TKQ*VEKN*DDDQK.I
10.  10 / 11    0   1451.7055  0.5042  2.3586   843.6  13/22  ING_HUMAN   K.S]VETIKEDM#NVK.F
11.  11 / 16    0   1451.8011  0.5145  2.3093   817.3  14/22  GGT5_HUMAN  R.VNVYHHLVETLK.F
12.  12 /  4    0   1451.7494  0.5179  2.2932  1458.5  16/20  DESP_HUMAN  R.LTYEIEDEKRR.R

Sequest tentatively matches each spectrum to 12 peptides. Matches are rated with the XCorr value (Sequest's score criterion). Here is a poorly rated match. Although most of the dominant ions were assigned, they were not assigned to b and y ions but to water loss and other ions that are generally less abundant.

Correct Identification

[Sequest output: identical to the table on the previous slide.]

When we compare the incorrect with the correct identification, we see that in the correct identification all of the dominant ions are assigned to b and y ions, with water losses accounting for many of the lower abundance peaks. The rank is 1 and the XCorr is high (4.7567, for R.LVVFELENFQGR.R) in the Sequest output file for the correct ID.

Score vs. Spectral Quality

As we have seen, a high score may be an indicator that an identification is correct. However, this does not hold true in all cases. In the next few examples, we will see instances where good scores actually corresponded to bad identifications, and bad scores corresponded to good identifications. The importance of spectral quality is demonstrated and the case of questionable identification is reviewed.

[Figure: grid of spectral quality (good or bad) against score (good or bad); cells are labeled Good ID, Bad ID, or ? ID.]

Example 1: Proteomics Data, Mascot Search

[Figure: spectral quality versus score grid, with this example marked Bad ID.]

Great score, nice spectrum

The mass spectrum has a lot of fragment ions and has good signal to noise. Oftentimes, good quality spectra like this provide good search results.

Mascot output:
1. IPI00001661   Mass: 45425   Total score: 111   Peptides matched: 1
   Tax_Id=9606 Regulator of chromosome condensation

The Mascot score is 111. Scores over 40 or 50 typically generate correct identifications. The score is well beyond the 95% confidence level and is well separated from the other possibilities, generally a positive indicator of a correct ID.

The masses of the expected sequence ions are summarized in the data table. Masses labeled in red were observed in the experiment. As we can see, we have a fairly long run of contiguous sequence for the y ions and a shorter run for the b ions. There is some overlap of the b and y ions, however, so we have a complementary set.

BUT… Can't account for dominant ions! Incorrect ID.

Despite the many factors that seemed to point to a correct identification, it is wrong. Since this is a good quality spectrum, it would be worth pursuing other interpretation options. For one, the peptide may be modified, and it would be worth searching the data again while allowing for modifications such as phosphorylation or glycosylation, among others. Several searches may be required. De novo sequencing would also be a possible approach if modification searches do not provide acceptable results.

Example 2

[Figure: spectral quality versus score grid, with this example marked Bad ID.]

All dominant ions are unidentified. The spectrum is of good quality, with good signal to noise and well-separated fragment ions, but the Mascot score is below generally accepted thresholds.

Mascot output:
No significant hits to report
Unassigned queries: (no details means no match)

Query  Observed  Mr(expt)  Mr(calc)  Delta  Miss  Score  Rank  Peptide
1      868.07    1734.13   1730.98   3.15   2     29     1     KGVASTDNTLIARSLGK

There are only short runs of contiguous sequence, and there is little complementarity between the b and y ions. With the dominant ions unidentified, the assignment is clearly incorrect. Since the spectrum is of good quality, the next step should be to consider modifications.

Example 3

[Figure: spectral quality versus score grid, with this example marked Good ID.]

All dominant ions accounted for
Largest peak corresponds to y5 (cleavage at proline)
BUT… Score = 31, below threshold

No water loss is seen for b ions until b9, when Thr appears. Water loss for y ions also starts after the threonine. Possibly the presence of the basic histidine and the acidic glutamic acid inhibits the water loss at y3 and y4.

• Chemistry is plausible
• A Sequest search provided the same identification as the Mascot search
• Likely correct assignment

Example 4

[Figure: spectral quality versus score grid, with this example marked Bad ID.]

Score = 18
Very weak spectrum

There is no baseline, thus we must assume this is a noise spectrum. The ions are likely assigned by random chance.

Although there are some complementary runs of contiguous sequence ions, it is unlikely they are significant, given the quality of the mass spectrum. We do see a dominant fragment ion at the y17 proline; however, other large signals do not correspond to expected cleavages C-terminal of the acidic amino acids. The assignment of doubly-charged ions without the presence of a basic residue is highly unlikely, therefore this identification is incorrect. This spectrum could likely be discarded.

Example 5

[Figure: spectral quality versus score grid, with this example marked ? ID.]

Score = 38
Score slightly below threshold, spectral quality OK but fragmentation is limited: questionable ID

The most abundant peak results from loss of water from serine, not from proline cleavage as expected. The doubly-charged y8 ion is reasonable, given the presence of the arginine. However, the doubly-charged b2 ion is unlikely.

The identification could possibly be disregarded if the protein assignment was confirmed by another peptide.

Example 6

[Figure: spectral quality versus score grid, with this example marked Good ID.]

Score = 79
Complete b and y series
Plausible ion chemistry
Abundant ions at D and W cleavages
Correct ID!

The following examples compare spectra that were acquired on a Q-TOF and an ion trap. Both correct and incorrect identifications are shown. The different appearance of the fragmentation spectra, and the presence of immonium ions and low mass fragment ions in the Q-TOF, means that different considerations may need to be taken into account when different instruments are being used.

Correct Identification

Note the presence of y1 and the immonium ion for F in the QTOF spectrum. The QTOF spectrum shows b ion suppression except for b2.

[Figure: QTOF and ion trap spectra.]

Incorrect Identification

Both spectra have many dominant ions unaccounted for. The dominant ions are assigned as b type in the QTOF spectrum, which is unlikely.

[Figure: QTOF and ion trap spectra.]

Look at the next two slides. Can you guess which are the correct identifications and which are incorrect?

[Figure: QTOF and ion trap spectra.]

Summary

When interpreting the results of search engines it is important to:

• Look at the score
• Look at the sequence runs
• Consider the ion fragmentation chemistry
• Consider the instrument
• Ask whether it all makes sense

www.proteomesoftware.com
