Data Processing Algorithms for Analysis of High Resolution MSMS Spectra of Peptides with Complex Patterns of Posttranslational Modifications Shenheng Guan and Alma L. Burlingame
Problem • Input: An MS/MS spectrum of a mixture of peptides from a heavily modified protein: • Same amino acid sequence • Same PTM type • Same total number of PTMs • Different PTM configurations • Example • Two peptides with two methylations each: LATK[+32]AARKSAE and LATK[+16]AARK[+16]SAE • Problem: • Identify the PTM configurations • Estimate their relative abundances
Peptide identification • Input • A deisotoped MS/MS spectrum of a mixture of peptides • An identified peptide, the type of PTMs and the number of PTMs. • Example • Peptide: LATKAARKSAPATGGVKKPHRYRPGTVALRE • PTM: Methylation • #PTM: 4 • Problem • Identify the PTM configurations • Estimate their relative abundance
All possible configurations • Assumptions: • All methylations are on lysine residues • Each lysine residue carries at most 3 methyl groups.
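The enumeration implied by these two assumptions can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `enumerate_configurations` and the tuple representation of a configuration are assumptions made here.

```python
from itertools import product


def enumerate_configurations(sequence, total_methyls, max_per_lysine=3):
    """List every way to place `total_methyls` methyl groups on the lysines.

    Returns (lysine_positions, configurations), where each configuration is a
    tuple giving the methyl count on each lysine, in sequence order.
    """
    # 0-based positions of the lysine residues in the peptide.
    lysine_positions = [i for i, aa in enumerate(sequence) if aa == "K"]
    configurations = [
        counts
        for counts in product(range(max_per_lysine + 1), repeat=len(lysine_positions))
        if sum(counts) == total_methyls
    ]
    return lysine_positions, configurations


# The peptide from the slides: 4 methylations over its lysines.
positions, configs = enumerate_configurations("LATKAARKSAPATGGVKKPHRYRPGTVALRE", 4)
print(positions, len(configs))
```

Brute-force enumeration is adequate here because the number of lysines in a single peptide is small.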
Configuration identification • Score of a spectrum-configuration pair • Spectrum S: ETD peak list • Configuration C: theoretical peak list (c-ions) • Sc(S,C) is the number of peaks matched between the observed peak list and the theoretical peak list. • Greedy algorithm • Compute the matching score for each configuration • Remove the configuration with the highest score from the configuration set and remove the peaks in S that are matched to it • Repeat the above steps until all remaining configurations have score 0
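The greedy loop above can be sketched as follows. This is an illustration under stated assumptions: theoretical c-ion peak lists are supplied as plain lists of m/z values, matching uses a fixed absolute tolerance `tol`, and ties are broken by dictionary order. None of these details come from the slides.

```python
def matched_peaks(observed, theoretical, tol):
    """Observed peaks lying within `tol` of at least one theoretical peak."""
    return {p for p in observed if any(abs(p - t) <= tol for t in theoretical)}


def greedy_identify(spectrum, theoretical_lists, tol=0.01):
    """Repeatedly pick the best-scoring configuration and remove its peaks.

    `theoretical_lists` maps a configuration name to its theoretical peak list.
    Returns a list of (configuration name, score) in selection order.
    """
    remaining_peaks = set(spectrum)
    remaining_configs = dict(theoretical_lists)
    selected = []
    while remaining_configs:
        # Score every remaining configuration against the remaining peaks.
        matches = {
            name: matched_peaks(remaining_peaks, peaks, tol)
            for name, peaks in remaining_configs.items()
        }
        best = max(matches, key=lambda name: len(matches[name]))
        if not matches[best]:
            break  # every remaining configuration scores 0
        selected.append((best, len(matches[best])))
        remaining_peaks -= matches[best]
        del remaining_configs[best]
    return selected


# Toy data (hypothetical m/z values, for illustration only).
toy_spectrum = [100.0, 200.0, 300.0, 400.0]
toy_theoretical = {"A": [100.0, 200.0, 300.0], "B": [100.0, 200.0, 400.0], "C": [100.0, 200.0]}
result = greedy_identify(toy_spectrum, toy_theoretical)
print(result)
```

In the toy run, configuration C is never selected because all of its peaks are already explained by A, which is the effect of removing matched peaks after each round.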
Estimation of relative abundance • We have four identified configurations C1, C2, C3, C4. • x1, x2, x3, x4 are their relative abundances • They sum to 1 • Consider the ith c-ion with charge z • Five possible peaks p0, …, p4 • Suppose p2 is matched to C1 and C2 • Observed peak intensity I(p2) • Theoretical peak intensity • Compute the observed and theoretical peak intensity pair for each matched c-ion
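The slide's formula for the theoretical intensity is not preserved in this transcript. One plausible reading, stated here as an assumption rather than the authors' definition, is that a peak shared by several configurations receives the sum of their abundances, scaled by the total observed intensity of the five candidate peaks of that c-ion:

```latex
% Assumed form: p_2 is matched to C_1 and C_2, so it collects x_1 + x_2
\hat{I}(p_2) = (x_1 + x_2) \sum_{m=0}^{4} I(p_m),
\qquad x_1 + x_2 + x_3 + x_4 = 1, \quad x_c \ge 0 .
```

Under this reading, each matched peak contributes one pair consisting of the observed intensity I(p2) and the theoretical intensity above.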
Estimation of relative abundance • Find x1, x2, x3, x4 that minimize the sum of squared errors over these intensity pairs. • Standard non-negative least-squares procedure
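A dependency-free sketch of the fitting step is given below. It uses projected gradient descent, which is a simple stand-in and not the standard non-negative least-squares solver the slide refers to. The matrix layout (one row per matched peak, one column per configuration, entry 1 when the peak is matched to that configuration), the normalized intensities, and the final rescaling to sum to 1 are all assumptions made for illustration.

```python
def nnls_projected_gradient(A, b, iterations=5000):
    """Minimize 0.5 * ||A x - b||^2 subject to x >= 0.

    A is a list of rows and must contain at least one nonzero entry.
    """
    n_rows, n_cols = len(A), len(A[0])
    # The squared Frobenius norm bounds the largest eigenvalue of A^T A,
    # so this step size is small enough for the iteration to converge.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [1.0 / n_cols] * n_cols
    for _ in range(iterations):
        residual = [
            sum(A[r][c] * x[c] for c in range(n_cols)) - b[r] for r in range(n_rows)
        ]
        gradient = [
            sum(A[r][c] * residual[r] for r in range(n_rows)) for c in range(n_cols)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[c] - step * gradient[c]) for c in range(n_cols)]
    return x


# Toy example with two configurations and three matched peaks:
# peak 1 belongs to C1 only, peak 2 to C2 only, peak 3 to both.
membership = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
observed = [0.7, 0.3, 1.0]  # hypothetical normalized intensities
x = nnls_projected_gradient(membership, observed)
abundances = [v / sum(x) for v in x]  # rescale so the abundances sum to 1
print(abundances)
```

For real data a library solver such as an active-set NNLS routine would be the usual choice; the loop above only shows what is being minimized and how the non-negativity constraint enters.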
A Novel Approach for Untargeted Post-translational Modification Identification Using Integer Linear Optimization and Tandem Mass Spectrometry Richard C. Baliban, Peter A. DiMaggio, Mariana D. Plazas-Mayorca, Nicolas L. Young, Benjamin A. Garcia and Christodoulos A. Floudas
Bottom up PTM identification • Two approaches • Tags • Non-tags • Restricted • Unrestricted • PILOT_PTM
Preprocessing • Remove all peaks related to the precursor ion • Keep only locally significant peaks • Deisotope • Remove a neutral offset peak if it does not have a complementary peak. • Each candidate peak has a list of supporting peaks.
ILP Model • Input • A preprocessed, deisotoped spectrum S = { a1, a2, …, am } • A peptide (theoretical b-ion peak list) P = { b1, b2, …, bn } • A list of all known PTMs • Theoretical peak bk • CSk is the set of all peaks (indices) in S that bk can be matched to with PTMs • Real peak aj • Posj is the set of all peaks (indices) in P that aj can be matched to with PTMs • Supportj is the set of all peaks (indices) supporting peak j in S • Multj is the set of all peaks (indices) that peak j supports
ILP Model • Binary variables • pj,k = 1 if peak aj in S is matched to bk in P, otherwise pj,k = 0 • yj = 1 if peak aj is a supporting peak or a matched peak, otherwise yj = 0
ILP Model • Objective • Subject to • One peak in P can only match one peak in S • One peak in S can only match one peak in P
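The two matching constraints stated in words above can be written with the binary variables and index sets already defined. This is a reconstruction from the verbal description, not a copy of the slide's formulas, and the objective function is not preserved in this transcript, so it is left out:

```latex
% Each theoretical peak b_k is matched to at most one observed peak
\sum_{j \in CS_k} p_{j,k} \le 1 \qquad \forall k \in \{1, \dots, n\}

% Each observed peak a_j is matched to at most one theoretical peak
\sum_{k \in Pos_j} p_{j,k} \le 1 \qquad \forall j \in \{1, \dots, m\}
```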
ILP Model Subject to: • No three consecutive missing peaks • The intensity of peak i is counted iff there exists a peak j such that peak i supports j and peak j is a matched peak.
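One way to express these two conditions with the variables defined earlier is sketched below. Both forms are assumptions reconstructed from the wording. In particular, the second line gives only an upper bound on y_i and lets a matched peak count itself, on the expectation that an objective rewarding counted intensity would push y_i to 1 whenever that is allowed:

```latex
% At least one of any three consecutive theoretical peaks is matched
\sum_{k'=k}^{k+2} \; \sum_{j \in CS_{k'}} p_{j,k'} \ge 1
\qquad \forall k \in \{1, \dots, n-2\}

% Peak i may be counted only if it is matched itself
% or supports some matched peak j \in Mult_i
y_i \le \sum_{k \in Pos_i} p_{i,k}
      + \sum_{j \in Mult_i} \; \sum_{k \in Pos_j} p_{j,k}
\qquad \forall i \in \{1, \dots, m\}
```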
ILP Model • Solve using CPLEX • Report the top 10 variable assignments • Existing problem • No constraint requires that the distance between two neighboring matched peaks match the mass of a residue (with PTM)
New constraints • For each pj,k • Set of candidate ion peaks j’ with respect to k’ such that no valid jump exists between j and j’ • The maximum and minimum masses that can be reached from j, respectively
New constraints • Neighboring matched peaks do not conflict • Conflicting matched peaks must have a matched peak between them • The distance between two matched peaks should be bounded
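The notion of a valid jump between two neighboring matched peaks can be illustrated with a small check. This is a sketch under stated assumptions: peaks are treated as singly charged b-ions, only a handful of residues are listed, the allowed PTM shifts are passed in by the caller, and the tolerance is arbitrary. The function name `is_valid_jump` is hypothetical.

```python
# Monoisotopic residue masses in Da (subset, for illustration).
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "T": 101.04768,
    "L": 113.08406,
    "K": 128.09496,
    "R": 156.10111,
}

METHYLATION = 14.01565  # monoisotopic mass shift of one methyl group, in Da


def is_valid_jump(mass_low, mass_high, residue, ptm_shifts, tol=0.01):
    """True if the gap between two peaks equals the residue mass plus one
    of the allowed PTM shifts (0.0 stands for the unmodified residue)."""
    gap = mass_high - mass_low
    return any(abs(gap - (RESIDUE_MASS[residue] + shift)) <= tol for shift in ptm_shifts)


allowed = [0.0, METHYLATION, 2 * METHYLATION, 3 * METHYLATION]
print(is_valid_jump(300.0, 428.09496, "K", allowed))  # unmodified lysine
print(is_valid_jump(300.0, 442.11061, "K", allowed))  # monomethylated lysine
print(is_valid_jump(300.0, 450.0, "K", allowed))      # no residue/PTM fits
```

Pairs of candidate matches that fail such a check are the conflicting pairs the new constraints refer to.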
Postprocessing • Re-score the 10 candidate modified peptides • Cross-correlation score • Recheck modifications if there are unmatched peaks indicating non-modification
Test data sets • Test set A: 44 CID spectra (Ion trap) and 174 ETD spectra (Orbitrap) of chemically synthesized phosphopeptides, manually validated • Test set B: 58 ECD spectra (FTICR) of Histone H3-(1–50) N-terminal Tail, manually validated • Test set C: 553 CID spectra (Orbitrap) of Propionylated Histone Fragments, manually validated • Test set D: 525 modified and 6025 unmodified CID spectra (Orbitrap) from a chromatin fraction. Identified by SEQUEST and validated by MASCOT; low-quality spectra removed manually • Test set E: 36 (Ion trap), 37 (Q-TOF), and 4061 (Orbitrap) unmodified CID spectra. Validated in the same way as test set D
Comparison on test sets C and D1 • Peptide and residue prediction accuracy
Comparison on test sets C and D1 • Subsequence prediction accuracy