The peptide de novo sequencing from MS/MS spectrum Kaizhong Zhang Department of Computer Science University of Western Ontario London, Ontario, Canada Joint work with Bin Ma, Gilles Lajoie, Amanda Doherty-Kirby, Chengzhi Liang, Ming Li
Introduction • Tandem mass spectrometry (MS/MS) now plays a very important role in protein identification because of its speed and high sensitivity. • Deriving the sequence of a peptide from its MS/MS spectrum is an important task in proteomics. • Derivation without the help of a protein database is called de novo sequencing, which is especially important for identifying unknown proteins.
Introduction (2) • The basic experimental steps of this method are the following: • 1. The proteins are digested with an enzyme to produce peptides; • 2. The peptides are charged (ionized) and separated according to their mass-to-charge (m/z) ratios; • 3. Each peptide is fragmented into fragment ions, and the m/z values of the fragment ions are measured.
Introduction (3) • Steps 2 and 3 are both performed within a tandem mass spectrometer. • Since many copies of each peptide are fragmented and the fragmentation can occur anywhere along the peptide, a spectrum of the observed m/z values is obtained.
Mass spectrum • For each possible fragment ion there could be a peak at the corresponding m/z value. • The height of the peak is proportional to the frequency with which that m/z value is observed by the mass spectrometer. • In general, proteins consist of 20 different types of amino acids, most of which have different masses (the exception is the pair leucine and isoleucine).
Mass spectrum (2) • Consequently different peptides usually produce different spectra. • It is therefore possible, and now a common practice, to use the spectrum of a peptide to determine its sequence.
Peptide fragmentation • A charged peptide may be fragmented into two pieces in three ways, which may produce a pair of a- and x-ions, a pair of b- and y-ions, or a pair of c- and z-ions. • Theoretically, a fragmentation can occur at any place in a peptide and a spectrum is expected to contain all the possible ion peaks. • In practice, due to uneven strength of the bonds at different positions, different ions occur with different frequencies.
Peptide fragmentation (3) • The most abundant ions are y-ions, which often form the complete series in a spectrum. • The next are a- and b-ions, of which many are not observed. • The c-, x-, and z-ions occur much less frequently. • In addition, these ions can often form new ions due to loss of water or loss of ammonia.
The approximate masses of some atoms that appear in peptides, where C13 is an isotope of C
Atom            C    C13   H    O    N
Mass (Dalton)   12   13    1    16   14
Mass of an amino acid • For any amino acid a, we use $\|a\|$ to denote the mass of C2H2RNO, i.e., the amino acid a after the loss of a water molecule. • For $P = a_1 a_2 \dots a_k$ being a sequence of amino acids, let $\|P\| = \sum_{1 \le j \le k} \|a_j\|$. • Therefore the actual mass of peptide P is $18 + \|P\|$, because of the extra H2O in it.
The approximate masses of the 20 amino acids
Amino acid      A        R        N        D
Mass (Dalton)   71.04    156.10   114.04   115.03
Amino acid      C        E        Q        G
Mass (Dalton)   103.01   129.04   128.06   57.02
Amino acid      H        I        L        K
Mass (Dalton)   137.06   113.08   113.08   128.09
Amino acid      M        F        P        S
Mass (Dalton)   131.04   147.07   97.05    87.03
Amino acid      T        W        Y        V
Mass (Dalton)   101.05   186.08   163.06   99.07
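The mass definitions above can be sketched in a few lines of Python. This is a minimal illustration using the approximate table values; the names `MASS`, `residue_mass` and `peptide_mass` are hypothetical and do not come from the slides.

```python
# Approximate residue masses in Daltons, copied from the table above.
MASS = dict(zip("ARNDCEQGHILKMFPSTWYV", [
    71.04, 156.10, 114.04, 115.03, 103.01, 129.04, 128.06, 57.02,
    137.06, 113.08, 113.08, 128.09, 131.04, 147.07, 97.05, 87.03,
    101.05, 186.08, 163.06, 99.07]))


def residue_mass(peptide):
    """||P||: the sum of the residue masses ||a_j|| of the sequence."""
    return sum(MASS[a] for a in peptide)


def peptide_mass(peptide):
    """Actual peptide mass: ||P|| plus 18 for the extra H2O."""
    return 18 + residue_mass(peptide)


if __name__ == "__main__":
    print(f"{residue_mass('GA'):.2f}")  # 128.06
    print(f"{peptide_mass('GA'):.2f}")  # 146.06
```

Because I and L share the same mass, `residue_mass` cannot distinguish peptides that differ only by swapping these two residues.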
The hypothetical spectrum of P • Let $A = a_1 a_2 \dots a_n$ be a sequence of amino acids. We introduce two notations: $\|A\|_b = 1 + \|A\|$ and $\|A\|_y = 19 + \|A\|$.
The hypothetical spectrum of P (2) • Let $b_i$ be the mass of the b-ion of P with i amino acids; then $b_i = \|a_1 a_2 \dots a_i\|_b$ $(1 \le i < k)$. • Let $y_i$ be the mass of the y-ion of P with i amino acids; then $y_i = \|a_{k-i+1} \dots a_k\|_y$ $(1 \le i < k)$. • Clearly, $y_{k-i} + b_i = 20 + \|P\|$.
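The b-ion and y-ion definitions can be checked numerically. The sketch below assumes singly charged ions and the approximate residue masses from the table; `b_ions` and `y_ions` are hypothetical helper names.

```python
# Approximate residue masses in Daltons, from the amino acid table.
MASS = dict(zip("ARNDCEQGHILKMFPSTWYV", [
    71.04, 156.10, 114.04, 115.03, 103.01, 129.04, 128.06, 57.02,
    137.06, 113.08, 113.08, 128.09, 131.04, 147.07, 97.05, 87.03,
    101.05, 186.08, 163.06, 99.07]))


def b_ions(peptide):
    """Return [b_1, ..., b_{k-1}] with b_i = 1 + ||a_1 ... a_i||."""
    out, prefix = [], 0.0
    for a in peptide[:-1]:
        prefix += MASS[a]
        out.append(1 + prefix)
    return out


def y_ions(peptide):
    """Return [y_1, ..., y_{k-1}] with y_i = 19 + ||a_{k-i+1} ... a_k||."""
    out, suffix = [], 0.0
    for a in reversed(peptide[1:]):
        suffix += MASS[a]
        out.append(19 + suffix)
    return out


if __name__ == "__main__":
    p = "GAS"
    k = len(p)
    b, y = b_ions(p), y_ions(p)
    total = sum(MASS[a] for a in p)
    for i in range(1, k):
        # Complementary ions: y_{k-i} + b_i should equal 20 + ||P||.
        print(i, round(b[i - 1] + y[k - i - 1], 2), round(20 + total, 2))
```

The identity holds because the prefix of length i and the suffix of length k-i together cover every residue of P exactly once, and the two ion offsets 1 and 19 add up to 20.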
The hypothetical spectrum of P (3) • Around each y-ion peak, it is possible to have other peaks. • For each y-ion with mass x, the corresponding x-ion and z-ion weigh x+26 and x-17. • An ion may lose a water molecule to generate a peak at mass x-18. • An ion with mass x usually has a peak at x+1 corresponding to the isotopic ion that contains a C13.
The hypothetical spectrum of P (4) • Therefore, for each y-ion with mass x, there are possible peaks at the masses in the following set. • $Y(x) = \{x-18,\ x-17,\ x,\ x+1,\ x+26\}$ • Similarly, for each b-ion with mass x, the possible masses are from the following set. • $B(x) = \{x-28,\ x-18,\ x,\ x+1,\ x+17\}$
The hypothetical spectrum of P (5) • Therefore, the hypothetical spectrum of the peptide P has peaks at each mass in the following set, where k is the number of amino acids in P. • $S(P) = \bigcup_{0 < i < k} \left( B(b_i) \cup Y(y_i) \right)$
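The set S(P) can be built directly from the offsets in B(x) and Y(x). The following is a minimal sketch under the same approximate masses; `hypothetical_spectrum`, `B_OFFSETS` and `Y_OFFSETS` are hypothetical names, and rounding to two decimals is an assumption made only to avoid floating-point near-duplicates.

```python
# Approximate residue masses in Daltons, from the amino acid table.
MASS = dict(zip("ARNDCEQGHILKMFPSTWYV", [
    71.04, 156.10, 114.04, 115.03, 103.01, 129.04, 128.06, 57.02,
    137.06, 113.08, 113.08, 128.09, 131.04, 147.07, 97.05, 87.03,
    101.05, 186.08, 163.06, 99.07]))

B_OFFSETS = (-28, -18, 0, 1, 17)   # offsets that define B(x)
Y_OFFSETS = (-18, -17, 0, 1, 26)   # offsets that define Y(x)


def hypothetical_spectrum(peptide):
    """Return S(P) as a sorted list of masses rounded to 0.01 Da."""
    total = sum(MASS[a] for a in peptide)
    masses, prefix = set(), 0.0
    for a in peptide[:-1]:            # one cut per i with 0 < i < k
        prefix += MASS[a]
        b = 1 + prefix                # b_i
        y = 19 + (total - prefix)     # y_{k-i}, the complementary ion
        masses.update(round(b + d, 2) for d in B_OFFSETS)
        masses.update(round(y + d, 2) for d in Y_OFFSETS)
    return sorted(masses)


if __name__ == "__main__":
    print(hypothetical_spectrum("GA"))
```

Iterating over the k-1 cut positions and emitting both the b-ion and its complementary y-ion visits every b_i and every y_i exactly once.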
The de novo sequencing problem • Let P be a peptide and $M = \|P\| + 20$. • Given a solution containing peptide P, a tandem mass spectrometer can measure a peak list L. • L is a set of pairs $\{(x_i, h_i) \mid 1 \le i \le n\}$, where $0 < x_1 < \dots < x_n$ are the masses and $h_i$ is the intensity of the peak at $x_i$. • The total mass of P, which equals $M - 2$, can also be measured.
The de novo sequencing problem (2) • The masses given by the spectrometer are not accurate. • The maximum error varies from 0.01 dalton to 0.5 dalton depending on the type of spectrometer used.
The de novo sequencing problem (3) • Let $\delta$ be the error bound of the spectrometer. • Let S be a set of masses. We say a peak (x,h) in L is supported by S if there is a y in S such that $|x - y| < \delta$. • The subset of peaks in L supported by S is denoted by $L_S$. • $L_S = \{(x,h) \in L \mid \text{there is } y \in S \text{ s.t. } |x - y| < \delta\}$
The de novo sequencing problem (4) • Therefore $L_{S(P)}$ consists of all the peaks in L that are supported by the masses of the hypothetical ions of P. • The more high-intensity peaks $L_{S(P)}$ contains, the more likely it is that L is the mass spectrum of P.
The de novo sequencing problem (5) • For any peak list L', we define $h(L') = \sum_{(x,h) \in L'} h$. • The de novo sequencing problem is defined as follows. • Given a mass spectrum L, a positive number M, and an error bound $\delta$, construct a peptide P such that $\left| \|P\| + 20 - M \right| < \delta$ and $h(L_{S(P)})$ is maximized.
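The objective can be sketched as two small functions: one selects the peaks supported by a set of masses, the other sums their intensities. The names `supported_peaks` and `total_intensity` and the example numbers are illustrative assumptions, not data from the slides.

```python
def supported_peaks(peaks, masses, delta):
    """Peaks (x, h) for which some mass y in `masses` has |x - y| < delta."""
    return [(x, h) for x, h in peaks
            if any(abs(x - y) < delta for y in masses)]


def total_intensity(peaks):
    """h(L'): the sum of the intensities of the peaks in L'."""
    return sum(h for _, h in peaks)


if __name__ == "__main__":
    # Made-up peak list of (mass, intensity) pairs and candidate ion masses.
    peaks = [(58.0, 10.0), (90.1, 5.0), (200.0, 7.0)]
    masses = {58.02, 90.04}
    print(total_intensity(supported_peaks(peaks, masses, 0.05)))
    print(total_intensity(supported_peaks(peaks, masses, 0.1)))
```

Each peak contributes its intensity at most once, however many masses in the set fall within the error bound of it.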
Algorithms • There are two major difficulties in the de novo sequencing problem. • First, each fragmentation may produce a pair of ions. • This means that both ends of the spectrum must be considered at the same time.
Algorithms (2) • Second, the types of the peaks are unknown, and a peak may be matched by zero, one or two different types of ions. • When a peak is matched by two ions, the height of the peak can only be counted once.
Algorithms (3) • The straightforward approach of “growing” the peptide from one terminal to the other does not work. • We use a more sophisticated dynamic programming algorithm for the de novo sequencing problem. • Our algorithm gradually “grows” a prefix and a suffix of the optimal solution along a carefully designed pathway until the prefix and the suffix are sufficiently long to form the optimal solution.
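The slides do not spell out the dynamic programming algorithm, so the sketch below is not that algorithm. It is a naive exhaustive search, exponential in the peptide length, included only to make the problem statement concrete for very short peptides. All names (`spectrum`, `score`, `brute_force`) are hypothetical.

```python
# Approximate residue masses in Daltons, from the amino acid table.
MASS = dict(zip("ARNDCEQGHILKMFPSTWYV", [
    71.04, 156.10, 114.04, 115.03, 103.01, 129.04, 128.06, 57.02,
    137.06, 113.08, 113.08, 128.09, 131.04, 147.07, 97.05, 87.03,
    101.05, 186.08, 163.06, 99.07]))


def spectrum(peptide):
    """Hypothetical masses S(P), built from the B(x) and Y(x) offsets."""
    total = sum(MASS[a] for a in peptide)
    masses, prefix = set(), 0.0
    for a in peptide[:-1]:
        prefix += MASS[a]
        b, y = 1 + prefix, 19 + (total - prefix)
        masses.update(b + d for d in (-28, -18, 0, 1, 17))
        masses.update(y + d for d in (-18, -17, 0, 1, 26))
    return masses


def score(peptide, peaks, delta):
    """h(L_S(P)): each supported peak is counted once."""
    s = spectrum(peptide)
    return sum(h for x, h in peaks if any(abs(x - m) < delta for m in s))


def brute_force(peaks, M, delta, alphabet="ARNDCEQGHILKMFPSTWYV"):
    """Try every sequence whose mass fits M; return (best score, peptide)."""
    best = (float("-inf"), None)

    def extend(prefix, mass):
        nonlocal best
        if prefix and abs(mass + 20 - M) < delta:
            sc = score(prefix, peaks, delta)
            if sc > best[0]:
                best = (sc, prefix)
        for a in alphabet:
            # Prune branches that already exceed the target mass.
            if mass + MASS[a] + 20 <= M + delta:
                extend(prefix + a, mass + MASS[a])

    extend("", 0.0)
    return best


if __name__ == "__main__":
    # Noise-free synthetic peak list generated from the peptide GAS.
    peaks = [(m, 1.0) for m in sorted(spectrum("GAS"))]
    print(brute_force(peaks, 235.09, 0.05))
```

Since I and L have the same mass, a search of this kind cannot tell them apart. When two such candidates tie, it reports whichever it reaches first in `alphabet` order.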
Experiments • Our model and algorithm account for most of the ion types that have been observed in practice. • Overlaps of two different ions are correctly modeled. • The algorithm tolerates mass error and handles missing ions in the spectrum.
Experiments (2) • Experimental results demonstrated that our algorithm performed extremely well. • The program has been integrated into a software package, PEAKS, which is now available online at http://www.BioinformaticsSolutions.com