PEAKS: De Novo Sequencing using Tandem Mass Spectrometry Bin Ma Dept. of Computer Science University of Western Ontario
Outline • Background • Sandwich algorithm for de novo sequencing • Software implementation – PEAKS
Background • Diseases are closely related to abnormal proteins. • Given a tissue, identifying the proteins in it (and their post-translational modifications) is a fundamental problem in proteomics. • Tandem mass spectrometry (MS/MS) is the most common method for protein identification.
Sample Preparation • [Figure: sample preparation workflow. Labels: tissue, gel fraction, add trypsin, peptides (GTDIMR, PAK, MPSER, ……), HPLC, to MS/MS.]
Tandem Mass Spectrometer • [Figure: QTOF instrument. Labels: ESI, quadrupole mass analyzer, parent ions (e.g. MPSER, PAK), collision, fragment ions (e.g. P, PA, AK, K), TOF mass analyzer, detector, peptide sequencing.]
• [Figure: tandem mass spectrometry turns the peptide sequence LGSSEVEQVQLVVDGVK into an MS/MS spectrum; a database search or de novo sequencing recovers LGSSEVEQVQLVVDGVK from the spectrum.]
How Does a Peptide Fragment? • For a peptide A1A2A3A4, the b-ions are: m(b1)=1+m(A1); m(b2)=1+m(A1)+m(A2); m(b3)=1+m(A1)+m(A2)+m(A3). • The y-ions are: m(y1)=19+m(A4); m(y2)=19+m(A4)+m(A3); m(y3)=19+m(A4)+m(A3)+m(A2).
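As a minimal sketch of the fragmentation rule on this slide, the following Python computes b- and y-ion masses with the slide's offsets 1 and 19. The `RESIDUE_MASS` table (nominal integer residue masses) and the function names are assumptions made for illustration; they are not taken from PEAKS.

```python
# Nominal (integer) residue masses; an illustrative table, not PEAKS' own.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def b_ion_masses(peptide, offset=1):
    """Masses of b1..b(n-1): offset plus the mass of each proper prefix."""
    masses, total = [], 0
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(offset + total)
    return masses


def y_ion_masses(peptide, offset=19):
    """Masses of y1..y(n-1): offset plus the mass of each proper suffix."""
    masses, total = [], 0
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        masses.append(offset + total)
    return masses


if __name__ == "__main__":
    print(b_ion_masses("PAK"))  # b1, b2
    print(y_ion_masses("PAK"))  # y1, y2
```

For the peptide PAK this gives b-ions 98 and 169 and y-ions 147 and 218. Each complementary pair (b1 with y2, b2 with y1) sums to m(PAK)+20, which is what lets a prefix determine y-ions and a suffix determine b-ions once the total mass is known.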
De Novo Sequencing • De Novo Sequencing (Dancik et al., JCB 6:327-342.) • Given a spectrum and a mass value M, compute a sequence P such that m(P)=M and the matching score is maximized. • We take the matching score of P to be the sum of the scores of the matched peaks. • To illustrate PEAKS’ algorithm, we use the intensity of a peak as its score.
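The problem statement can be made concrete with an exhaustive baseline. This is a sketch under stated assumptions, not PEAKS' algorithm: masses are nominal integers, a peak matches only on exact mass, the spectrum is a hypothetical `dict` from peak mass to intensity, and the search is exponential, so it is only useful for tiny examples or for checking a faster method.

```python
# Illustrative nominal residue masses (subset); an assumption for this sketch.
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99,
                "T": 101, "L": 113, "K": 128, "E": 129}


def ion_masses(peptide):
    """Set of b- and y-ion masses of a peptide (offsets 1 and 19)."""
    total = sum(RESIDUE_MASS[aa] for aa in peptide)
    ions, prefix = set(), 0
    for aa in peptide[:-1]:
        prefix += RESIDUE_MASS[aa]
        ions.add(1 + prefix)            # b-ion of this prefix
        ions.add(19 + total - prefix)   # y-ion of the remaining suffix
    return ions


def matching_score(peptide, spectrum):
    """Sum of intensities of matched peaks; the set counts each peak once."""
    return sum(spectrum.get(m, 0) for m in ion_masses(peptide))


def brute_force_de_novo(spectrum, total_mass, alphabet):
    """Try every sequence over `alphabet` with mass `total_mass`."""
    best_peptide, best_score = None, float("-inf")
    stack = [("", 0)]
    while stack:
        prefix, mass = stack.pop()
        if mass == total_mass and prefix:
            score = matching_score(prefix, spectrum)
            if score > best_score:
                best_peptide, best_score = prefix, score
            continue
        for aa in alphabet:
            new_mass = mass + RESIDUE_MASS[aa]
            if new_mass <= total_mass:
                stack.append((prefix + aa, new_mass))
    return best_peptide, best_score


if __name__ == "__main__":
    spectrum = {98: 10, 169: 4, 147: 5, 218: 2}
    print(brute_force_de_novo(spectrum, 296, "PAK"))
```

With the toy spectrum above and M=296, the only candidates are the six orderings of P, A, and K, and PAK matches all four peaks.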
Spectrum Graph Approach • Convert the peak list to a graph. A peptide sequence corresponds to a path in the graph. • Bartels (1990), Biomed. Environ. Mass Spectrom 19:363-368. • Taylor and Johnson (1997). Rapid Comm. Mass Spec. 11:1067-1075. (Lutefisk) • Dancik et al. (1999), JCB 6:327-342. • Chen et al. (2001), JCB 8:325-337. • ……
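One common way to build such a graph, shown here as a simplified sketch rather than the construction of any one cited paper: interpret every peak both as a b-ion and as a y-ion, convert each interpretation to a prefix mass, and connect two prefix masses when they differ by a residue mass. The nominal mass table, exact integer matching, and the function names are assumptions for illustration.

```python
# Illustrative nominal residue masses (subset); an assumption for this sketch.
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99,
                "T": 101, "L": 113, "K": 128, "E": 129}


def build_spectrum_graph(peaks, total_mass):
    """Nodes are candidate prefix masses; edges are labelled by a residue."""
    nodes = {0, total_mass}
    for peak in peaks:
        # As a b-ion the prefix mass is peak-1; as a y-ion the suffix mass
        # is peak-19, so the prefix mass is total_mass-(peak-19).
        for prefix_mass in (peak - 1, total_mass - (peak - 19)):
            if 0 < prefix_mass < total_mass:
                nodes.add(prefix_mass)
    edges = {u: [] for u in sorted(nodes)}
    for u in edges:
        for aa, mass in RESIDUE_MASS.items():
            if u + mass in nodes:
                edges[u].append((u + mass, aa))
    return edges


def enumerate_paths(edges, total_mass):
    """All residue strings spelled by paths from node 0 to node total_mass."""
    results, stack = [], [(0, "")]
    while stack:
        node, seq = stack.pop()
        if node == total_mass:
            results.append(seq)
            continue
        for nxt, aa in edges[node]:
            stack.append((nxt, seq + aa))
    return sorted(results)


if __name__ == "__main__":
    graph = build_spectrum_graph([98, 147], 296)
    print(enumerate_paths(graph, 296))
```

With peaks at 98 and 147 and a total residue mass of 296, the graph has the nodes 0, 97, 146, 168, 217, and 296, and the only path from 0 to 296 spells PAK.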
The Score of a Suffix • [Figure: a suffix Q with its y-ions y1, y2, y3; the y-ion offset is 19.] • Let Q be a suffix of the peptide. Q determines some of the y-ions. • score(Q) is the sum of the scores of those y-ions.
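Recursive Computation of DP(m) • [Figure: the suffix Q split into its first letter a and the shorter suffix Q’; the y-ion offset is 19.] • Suppose Q is such that DP(m)=score(Q). Then score(Q’)=DP(m(Q’)). • We do not know a, so every letter has to be tried.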
Dynamic Programming • for m from 0 to M • backtracking
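The recurrence itself did not survive in this transcript, so the sketch below uses an assumed form that follows from the two preceding slides: DP(m) is the best y-ion score of any suffix of mass m, obtained by trying every first letter a and adding the intensity of the peak at 19+m. Nominal integer masses, exact peak matching, and all names are assumptions for illustration.

```python
# Illustrative nominal residue masses (subset); an assumption for this sketch.
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99,
                "T": 101, "L": 113, "K": 128, "E": 129}


def suffix_dp(spectrum, total_mass, alphabet):
    """Best peptide of mass total_mass when only y-ions are scored."""
    neg_inf = float("-inf")
    dp = [neg_inf] * (total_mass + 1)   # dp[m]: best score of a suffix of mass m
    first = [None] * (total_mass + 1)   # first[m]: first letter of that suffix
    dp[0] = 0.0
    for m in range(1, total_mass + 1):
        # The whole peptide is not a fragment, so mass M earns no y-ion score.
        gain = spectrum.get(19 + m, 0.0) if m < total_mass else 0.0
        for aa in alphabet:
            prev = m - RESIDUE_MASS[aa]
            if prev >= 0 and dp[prev] != neg_inf and dp[prev] + gain > dp[m]:
                dp[m] = dp[prev] + gain
                first[m] = aa
    if dp[total_mass] == neg_inf:
        return None, neg_inf
    # Backtracking: peel off the first letter of the best suffix each time,
    # which reads the peptide from left to right.
    letters, m = [], total_mass
    while m > 0:
        letters.append(first[m])
        m -= RESIDUE_MASS[first[m]]
    return "".join(letters), dp[total_mass]


if __name__ == "__main__":
    print(suffix_dp({147: 5.0, 218: 2.0}, 296, "PAK"))
```

This one-sided version ignores b-ions entirely. Scoring b- and y-ions together without counting a peak twice is what motivates working with a prefix and a suffix at the same time.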
Good News • [Figure: the ions y1, y2, y3 shown together with bn-3, bn-2, bn-1.]
Ions Determined By a Pair • Example: P=LGEY, Q=LLVR. • score(P,Q) is the sum of the matched peak intensities. • A peak can count only once.
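A small sketch of score(P,Q), assuming the reading that a prefix P fixes its own b-ions plus the complementary y-ions, and a suffix Q fixes its own y-ions plus the complementary b-ions, once the total mass M is known. Collecting the ion masses in a set is what enforces "a peak can count only once". Nominal integer masses, exact matching, and the function names are assumptions for illustration.

```python
# Illustrative nominal residue masses (subset); an assumption for this sketch.
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99,
                "T": 101, "L": 113, "K": 128, "E": 129}


def ions_of_pair(prefix, suffix, total_mass):
    """Ion masses determined by a prefix P and a suffix Q of a peptide of mass M."""
    ions = set()
    mass = 0
    for aa in prefix:                       # every nonempty prefix of P
        mass += RESIDUE_MASS[aa]
        ions.add(1 + mass)                  # its b-ion
        ions.add(19 + total_mass - mass)    # the complementary y-ion
    mass = 0
    for aa in reversed(suffix):             # every nonempty suffix of Q
        mass += RESIDUE_MASS[aa]
        ions.add(19 + mass)                 # its y-ion
        ions.add(1 + total_mass - mass)     # the complementary b-ion
    return ions


def pair_score(prefix, suffix, total_mass, spectrum):
    """Sum of matched peak intensities, each peak counted once."""
    return sum(spectrum.get(m, 0) for m in ions_of_pair(prefix, suffix, total_mass))


if __name__ == "__main__":
    spectrum = {98: 10, 169: 4, 147: 5, 218: 2}
    print(pair_score("P", "K", 296, spectrum))
    print(pair_score("PA", "K", 296, spectrum))
```

In the second call the prefix PA and the suffix K both determine the peaks at 147 and 169. The set keeps the score at 21, whereas a plain list of ion masses would count those two peaks twice and report 30.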
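Chummy Pairs • Two strings P and Q are called a chummy pair iff either of the following two conditions holds: (C1), (C2). [The two conditions were formulas on the original slide and are not preserved in this transcript.]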
Recursive Computation of score(P,Q) • Example: P=LGEY, Q=LLVR, with u=m(P) and v=m(Q).
Chummy Pairs • Lemma 1 – Suppose P and Q are a chummy pair, with u=m(P) and v=m(Q). If (C1) is true, [formula not preserved in this transcript]. If (C2) is true, [formula not preserved in this transcript].
Chummy Pairs • Lemma 2 – Let (P,Q) be a chummy pair and let a be a letter. • If (C1) holds, (P,aQ) is a chummy pair but (Pa,Q) is not. • If (C2) holds, (Pa,Q) is a chummy pair but (P,aQ) is not. • Lemma 3 – Let S be the optimal solution. Then there is a chummy pair (P,Q) and a letter a such that S=PaQ. Also, there is a series of chummy pairs such that [formula not preserved in this transcript].
Dynamic Programming • Combining Lemmas 1, 2, and 3, we can compute [formula not preserved in this transcript]. • Suppose (P,Q) is the pair maximizing DP(u,v) under the condition m(P)+m(Q)+m(a)=M. Then PaQ is the optimal peptide.
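Algorithm Sandwich
• DP(0,0) = 0; DP(u,v) = -infinity for (u,v) != (0,0);
• for u from 1 to M/2 step d do
    for v from u-m(W) to u+m(W) step d do
      for a in Σ do
        if u<v then [update not preserved in this transcript] else [update not preserved in this transcript]
• find u, v, a such that u+v+m(a)=M and DP(u,v) is maximized;
• backtracking;
• Time: [bound not preserved in this transcript]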
Comparison • LCQ data (ion trap instrument): generously provided by Dr. Richard Johnson. 144 spectra. • Micromass Q-Tof data: measured in UWO’s Protein ID lab. 61 spectra. • Sciex Q-Star data: provided by U. Victoria’s Genome BC Proteomics Centre. 13 good/okay spectra.
PEAKS vs. Lutefisk • Completely correct sequences: 38/144 vs. 15/144 • Correct amino acids: 1067/1702 vs. 767/1702 • Partially correct sequences with 5 or more contiguous correct amino acids: 94/144 vs. 64/144
PEAKS vs. Micromass PLGS • Completely correct sequences: 13/61 vs. 7/61 • Correct amino acids: 456/764 vs. 232/764 • Partially correct sequences with 5 or more contiguous correct amino acids: 38/61 vs. 24/61
PEAKS vs. Sciex BioAnalyst • Completely correct sequences: 7/13 vs. 1/13 • Correct amino acids: 115/150 vs. 86/150 • Partially correct sequences with 5 or more contiguous correct amino acids: 12/61 vs. 7/61
Users The company logos have been deleted from the original presentation. Please visit http://www.bioinformaticssolutions.com for a list of users.
Other Techniques Used by PEAKS • Preprocessing of the MS/MS spectra: deconvolution, noise reduction, and signal enhancement. It does a better job than the spectrometer vendors’ software. • Recalibration: compress or stretch the spectrum to correct for calibration error. • Positional confidence: estimate the confidence level of individual amino acids.
Sophisticated Ion Matching Score • Score of one peak matching a b-ion
PEAKS 2.x’s Additional Feature • Identify the proteins by matching the de novo (partial) sequences. • Then further match the spectra with the peptides of the proteins.
Collaborators and References • Sandwich algorithm: • B. Ma, K. Zhang, C. Liang, CPM’03. (sandwich algorithm) • PEAKS: • B. Ma, K. Zhang, C. Hendrie, C. Liang, M. Li, A. Doherty-Kirby, G. Lajoie, Rapid Comm. Mass Spec. (software feature, score function, experiments) • Acknowledgement: • PEAKS development team. (Bioinformatics Solutions Inc.).