PEAKS: De Novo Sequencing using MS/MS spectra • Bin Ma, U. Western Ontario, Canada • Kaizhong Zhang, U. Western Ontario, Canada • Chengzhi Liang, Bioinformatics Solutions Inc., Canada
Outline • Background • Tandem Mass Spectrometry • De novo sequencing • Problem Definition and Algorithm • Software implementation – PEAKS • Future work
Background • Humans have 100,000 different proteins. Because of post-translational modifications, each protein can exist in many different versions. • Diseases are closely related to abnormal proteins or to the expression levels of proteins. • Given a tissue, identifying the proteins (and their modified versions) in it is a fundamental problem for drug design.
Proteins and Peptides • A protein is a sequence of 20 different types of amino acids, i.e. a string over an alphabet of size 20. • A peptide is a substring of the protein. • The 20 amino acids have 19 distinct masses. • I and L have the same mass and are difficult to distinguish by MS/MS. • Regard them as the same letter.
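The mass claim above can be checked with a small sketch. The table holds the standard monoisotopic residue masses in daltons; the helper name `canonical` is hypothetical and only illustrates "regard I and L as the same letter".

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def canonical(peptide: str) -> str:
    """Hypothetical helper: replace I by L, since the two have the same mass."""
    return peptide.replace("I", "L")


# 20 letters, but I and L collide, leaving 19 distinct masses.
distinct_masses = set(RESIDUE_MASS.values())
print(len(RESIDUE_MASS), len(distinct_masses))

# The peptide from the slides, with I merged into L.
print(canonical("GTDIMNEMR"))
```

Note that K and Q differ by only about 0.036 Da, so they are distinct here but would also collide at integer (nominal) mass resolution.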
Tandem Mass Spectrometry • MS/MS is the only reliable way to identify proteins. • [Figure labels: tissue, protein, gel, fraction, protein sequence …VITK | GTDIMNEMR | SMW…, peptide]
[Figure: peptide sequence LGSSEVEQVQLVVDGVK → tandem mass spectrometer → MS/MS spectrum → database search or de novo sequencing → LGSSEVEQVQLVVDGVK]
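How Does a Peptide Fragment? • For a peptide A1A2A3A4, the b-ions come from prefixes and the y-ions from suffixes: • m(b1) = 1 + m(A1) • m(b2) = 1 + m(A1) + m(A2) • m(b3) = 1 + m(A1) + m(A2) + m(A3) • m(y1) = 19 + m(A4) • m(y2) = 19 + m(A4) + m(A3) • m(y3) = 19 + m(A4) + m(A3) + m(A2)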
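The fragment-mass formulas above can be sketched in a few lines. The function name `fragment_masses` is an assumption for illustration; the offsets 1 and 19 are the nominal values used on the slide, and the residue masses are the standard monoisotopic values.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def fragment_masses(peptide):
    """Return (b_ions, y_ions) with b_i = 1 + prefix mass, y_i = 19 + suffix mass."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    prefix = 0.0
    for mass in masses[:-1]:            # b1 .. b(n-1), growing from the N-terminus
        prefix += mass
        b_ions.append(1.0 + prefix)
    suffix = 0.0
    for mass in reversed(masses[1:]):   # y1 .. y(n-1), growing from the C-terminus
        suffix += mass
        y_ions.append(19.0 + suffix)
    return b_ions, y_ions


b_ions, y_ions = fragment_masses("GASP")
total = sum(RESIDUE_MASS[aa] for aa in "GASP")
print(b_ions)
print(y_ions)
```

With these offsets, a b-ion and its complementary y-ion always sum to m(P) + 20, which is why one peak can be explained either by a prefix or by a suffix.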
De Novo Sequencing • For any peptide P = a1…an, m(P) = Σi m(ai). • Problem: given a spectrum and a mass value m, compute a sequence P such that m(P) = m and the matching score score(P) is maximized.
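To make the problem statement concrete, here is a brute-force baseline, not the PEAKS algorithm. It assumes integer (nominal) masses, a five-letter subset of the alphabet, a spectrum given as a hypothetical dictionary from mass to intensity, and a toy score that adds the intensity of every peak explained by a b- or y-ion, counting each peak once.

```python
# Illustrative subset of the alphabet with nominal (integer) residue masses.
NOMINAL_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99}


def ion_masses(peptide):
    """Set of b- and y-ion masses, using the offsets 1 and 19 from the slides."""
    masses = [NOMINAL_MASS[aa] for aa in peptide]
    total = sum(masses)
    ions = set()
    prefix = 0
    for mass in masses[:-1]:
        prefix += mass
        ions.add(1 + prefix)            # b-ion of this prefix
        ions.add(19 + total - prefix)   # y-ion of the complementary suffix
    return ions


def score(peptide, spectrum):
    """Toy score: total intensity of matched peaks, each peak counted once."""
    return sum(spectrum.get(mass, 0.0) for mass in ion_masses(peptide))


def candidates(target):
    """Enumerate every sequence whose residue masses sum to target."""
    if target == 0:
        yield ""
        return
    for aa, mass in NOMINAL_MASS.items():
        if mass <= target:
            for rest in candidates(target - mass):
                yield aa + rest


def de_novo_brute_force(spectrum, target):
    """Return the sequence P with m(P) = target that maximizes score(P)."""
    return max(candidates(target), key=lambda p: score(p, spectrum), default=None)


# Ideal spectrum of GASP (m = 312): b-ions 58, 129, 216 and y-ions 116, 203, 274.
spectrum = {58: 1.0, 129: 1.0, 216: 1.0, 116: 1.0, 203: 1.0, 274: 1.0}
best = de_novo_brute_force(spectrum, 312)
print(best)
```

The number of candidates grows exponentially with m, which is the motivation for a dynamic programming approach.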
Y-ions Determined By a Suffix • The y-ions y1, y2, y3 are determined by a suffix: each y-ion mass is 19 plus the mass of the suffix. • score(Q) can be defined for a suffix Q.
Strategies • Consider a prefix R and a suffix Q simultaneously, as a pair. • Consider only those pairs (R,Q) that satisfy a nice property, which we call “chummy”. • Chummy pairs have two useful properties: • The score of a chummy pair can be computed recursively from a smaller chummy pair. • There is a series of chummy pairs that grows to the optimal solution.
Dynamic Programming • Combining Lemmas A and B, we can compute DP(u,v). • Suppose (R,Q) is the pair maximizing DP(u,v) under the condition m(R)+m(Q)+m(a)=m. Then RaQ is the optimal peptide.
Comparison of PEAKS and Lutefisk • [Table of sequencing results; red = correct]
Implementation Particulars • More accurate scoring: sum of the logarithmic intensities; many other ion types; coexisting ions, e.g., x2, y2, z2 • Deconvolution: converting multiply-charged peaks to singly-charged ones • Recalibration: compressing or stretching the spectrum to correct calibration error • Noise reduction
Acknowledgement • Bin Ma and Kaizhong Zhang were supported by NSERC. • Chengzhi Liang was supported by BSI. • Thanks to the development team at BSI for the software development.
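Two of the items above can be sketched briefly. This is not the PEAKS implementation: it assumes the charge of each peak is already known (real deconvolution has to infer it, for example from isotope spacing), and the function names are hypothetical. The conversion follows from m/z = (M + z·p)/z, where p is the proton mass.

```python
import math

PROTON_MASS = 1.007276  # daltons


def to_singly_charged(mz, charge):
    """Convert the m/z of a peak with the given charge to its singly-charged m/z."""
    # (M + z*p)/z = mz  =>  M + p = z*mz - (z - 1)*p
    return mz * charge - (charge - 1) * PROTON_MASS


def log_intensity_score(matched_intensities):
    """Sum of logarithmic intensities of the matched peaks (intensities must be > 0)."""
    return sum(math.log(intensity) for intensity in matched_intensities)


# A doubly-charged peak at m/z 500 corresponds to a singly-charged peak near 998.99.
print(to_singly_charged(500.0, 2))
print(log_intensity_score([1.0, math.e, math.e ** 2]))
```

Taking logarithms keeps a few very intense peaks from dominating the score.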
Tandem Mass Spectrometer • [Figure: schematic of a tandem mass spectrometer. Labels: precursor ions (MPSER, PAK, SG…), mass analyzer, fragment, fragment ions (P, PA, AK, K), mass analyzer, detector, de novo sequencing]
Algorithm Sandwich • DP(0,0) = 0; DP(u,v) = −∞ for (u,v) ≠ (0,0); • for u from 1 to m/2 do • for v from u − max(a) to u + max(a) do • for a in Σ do • if u < v then … else … • find u, v, a such that u + v + m(a) = m and DP(u,v) is maximized; • backtracking;
Dynamic Programming • for u from 0 to m • backtracking
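For intuition only, here is a simplified one-sided dynamic program over prefix masses. It is a swapped-in textbook-style technique, not Algorithm Sandwich and not the recurrence from the slides: it rewards each cut point independently, so a peak that is both a b-ion of one prefix and a y-ion of another suffix would be counted twice, which is the difficulty the chummy-pair formulation is meant to handle. Integer masses, the five-letter alphabet, and the dictionary spectrum are all assumptions.

```python
# Illustrative subset of the alphabet with nominal (integer) residue masses.
NOMINAL_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99}


def simple_prefix_dp(spectrum, total):
    """Best-scoring sequence of mass `total` under a per-cut-point reward."""
    neg_inf = float("-inf")
    best = [neg_inf] * (total + 1)   # best[u]: best score of a prefix of mass u
    back = [None] * (total + 1)      # back[u]: last letter of that prefix
    best[0] = 0.0
    for u in range(1, total + 1):
        # Reward for cutting at prefix mass u: b-ion at 1+u, y-ion at 19+(total-u).
        # The full peptide (u == total) produces no fragment ion.
        reward = 0.0
        if u < total:
            reward = spectrum.get(1 + u, 0.0) + spectrum.get(19 + total - u, 0.0)
        for aa, mass in NOMINAL_MASS.items():
            if mass <= u and best[u - mass] != neg_inf:
                candidate = best[u - mass] + reward
                if candidate > best[u]:
                    best[u] = candidate
                    back[u] = aa
    if best[total] == neg_inf:
        return None                  # no sequence has this mass
    # Backtracking: follow the stored letters from mass `total` down to 0.
    letters = []
    u = total
    while u > 0:
        letters.append(back[u])
        u -= NOMINAL_MASS[back[u]]
    return "".join(reversed(letters))


# Ideal spectrum of GASP (m = 312): b-ions 58, 129, 216 and y-ions 116, 203, 274.
spectrum = {58: 1.0, 129: 1.0, 216: 1.0, 116: 1.0, 203: 1.0, 274: 1.0}
result = simple_prefix_dp(spectrum, 312)
print(result)
```

The table has m + 1 entries and each is filled in time proportional to the alphabet size, so the run time is linear in m for a fixed alphabet.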
Dynamic Programming • We hope that DP(u,v) for u+v=m gives the optimal prefix and suffix. • The optimal solution can then be obtained by concatenating the prefix and the suffix.
Chummy Pairs • Two strings Ra and bQ are called a chummy pair iff either of two conditions, (C1) or (C2), is true. • Examples: (LGE, LVR) satisfies (C2); (LGE, VR) satisfies (C1); (LGE, R) satisfies (C1); (LG, VR) is not chummy.
Chummy Pairs • Lemma A – Suppose Ra and bQ are a chummy pair, and let u=m(Ra), v=m(bQ). If (C1) is true, … If (C2) is true, …
Chummy Pairs • Lemma B – Let P be the optimal solution. Then there is a chummy pair (R,Q) and a letter a such that P=RaQ. Also, there is a series of chummy pairs such that …