
Problem

Data Processing Algorithms for Analysis of High Resolution MSMS Spectra of Peptides with Complex Patterns of Posttranslational Modifications. Shenheng Guan and Alma L. Burlingame.


Presentation Transcript


  1. Data Processing Algorithms for Analysis of High Resolution MSMS Spectra of Peptides with Complex Patterns of Posttranslational Modifications. Shenheng Guan and Alma L. Burlingame

  2. Problem • Input: An MS/MS spectrum of a mixture of peptides: • Heavily modified protein • Same amino acid sequence • Same PTM type • Same total number of PTMs • Different PTM configurations • Example • Two peptides with two methylations each (+14 Da per methyl group): LATK[+28]AARKSAE and LATK[+14]AARK[+14]SAE • Problem: • Identify the PTM configurations • Estimate their relative abundances

  3. Work flow

  4. Peptide identification • Input • A deisotoped MS/MS spectrum of a mixture of peptides • An identified peptide, the type of PTMs and the number of PTMs. • Example • Peptide: LATKAARKSAPATGGVKKPHRYRPGTVALRE • PTM: Methylation • #PTM: 4 • Problem • Identify the PTM configurations • Estimate their relative abundance

  5. All possible configurations • Assumptions: • All methylations are on lysine residues • Each lysine residue carries at most 3 methyl groups.
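
The two assumptions on this slide make the configuration set small enough to enumerate directly. The following is a minimal sketch, not the authors' implementation: the function name `enumerate_configurations` and its parameters are hypothetical, and the peptide and PTM count are taken from the example on slide 4.

```python
from itertools import product

def enumerate_configurations(peptide, n_ptm, residue="K", max_per_site=3):
    """Return every way to place n_ptm groups on the given residue type.

    Each configuration is a dict {0-based residue position: group count}.
    Names and defaults here are illustrative assumptions.
    """
    sites = [i for i, aa in enumerate(peptide) if aa == residue]
    configs = []
    # Try every per-site count in 0..max_per_site and keep exact totals.
    for counts in product(range(max_per_site + 1), repeat=len(sites)):
        if sum(counts) == n_ptm:
            configs.append({s: c for s, c in zip(sites, counts) if c > 0})
    return configs

peptide = "LATKAARKSAPATGGVKKPHRYRPGTVALRE"
configs = enumerate_configurations(peptide, n_ptm=4)
print(len(configs), "configurations over", peptide.count("K"), "lysines")
```

For this peptide, which has four lysines, the sketch yields 31 configurations of four methyl groups.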

  6. Configuration identification • Score of a spectrum-configuration pair • Spectrum S: ETD peak list • Configuration C: theoretical peak list (c-ions) • Sc(S,C) is the number of peaks matched between the observed peak list and the theoretical peak list. • Greedy algorithm • Compute the matching score for each configuration • Remove the configuration with the highest score from the configuration set and remove the peaks in S that are matched to it • Repeat the above steps until all remaining configurations have score 0
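
The greedy loop described on this slide can be sketched as follows. This is an illustration under stated assumptions: peaks are plain m/z values, the matching tolerance `tol` is a hypothetical parameter, and the toy peak lists are made up for the example rather than taken from real spectra.

```python
def matched_peaks(spectrum, theoretical, tol=0.01):
    """Observed peaks lying within tol of any theoretical peak."""
    return [p for p in spectrum if any(abs(p - t) <= tol for t in theoretical)]

def greedy_identify(spectrum, configurations, tol=0.01):
    """Repeatedly pick the best-scoring configuration and consume its peaks.

    configurations maps a name to its theoretical peak list.
    Returns [(name, score), ...] in the order the configurations were picked.
    """
    remaining_peaks = list(spectrum)
    remaining = dict(configurations)
    identified = []
    while remaining:
        hits = {name: matched_peaks(remaining_peaks, peaks, tol)
                for name, peaks in remaining.items()}
        best = max(hits, key=lambda name: len(hits[name]))
        if not hits[best]:
            break  # every remaining configuration scores 0
        identified.append((best, len(hits[best])))
        used = set(hits[best])
        remaining_peaks = [p for p in remaining_peaks if p not in used]
        del remaining[best]
    return identified

# Toy data: illustrative m/z values only.
spectrum = [100.0, 200.0, 300.0, 350.0, 400.0]
configs = {
    "C1": [100.0, 200.0, 300.0],
    "C2": [100.0, 350.0, 400.0],
    "C3": [100.0, 200.0],
}
result = greedy_identify(spectrum, configs)
print(result)
```

In the toy run, C1 is picked first and consumes three peaks, C2 then explains the two peaks that are left, and C3 scores 0 and is never reported.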

  7. Configuration identification results

  8. Estimation of relative abundance • We have four identified configurations C1, C2, C3, C4. • x1, x2, x3, x4 are their relative abundances • They sum to 1 • Consider the ith c-ion with charge z • Five possible peaks p0, …, p4 • Suppose p2 is matched to C1 and C2 • Observed peak intensity I(p2) • Theoretical peak intensity • Compute the observed and theoretical peak intensity pair for each matched c-ion

  9. Estimation of relative abundance • Find x1, x2, x3, x4 such that the sum of squared errors over these intensity pairs is minimized. • Standard non-negative least-squares procedure
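
A small numerical sketch of the fitting step follows. The slides do not spell out the theoretical intensity, so the model here is an assumption: each matched peak's normalized intensity is taken to equal the sum of the abundances of the configurations matched to it. The solver is a plain coordinate-descent routine standing in for a library non-negative least-squares call, and the matrix and intensities are invented test values.

```python
def nnls_coordinate_descent(A, b, sweeps=2000):
    """Minimize ||A x - b||^2 subject to x >= 0 by cyclic coordinate descent."""
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    residual = list(b)  # equals b - A x, with x starting at 0
    col_norm = [sum(A[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    for _ in range(sweeps):
        for j in range(n_cols):
            if col_norm[j] == 0:
                continue
            grad = sum(A[i][j] * residual[i] for i in range(n_rows))
            new_xj = max(0.0, x[j] + grad / col_norm[j])  # clip at zero
            delta = new_xj - x[j]
            if delta:
                for i in range(n_rows):
                    residual[i] -= A[i][j] * delta
                x[j] = new_xj
    return x

# Row = one matched peak; column k is 1 if configuration C(k+1) matches it.
A = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
# Assumed observed intensities, normalized within each c-ion.
b = [0.7, 0.3, 0.4, 0.6, 0.9, 0.1]

x = nnls_coordinate_descent(A, b)
total = sum(x)
abundances = [v / total for v in x]  # rescale so the abundances sum to 1
print([round(v, 3) for v in abundances])
```

The intensities in `b` were generated from abundances 0.4, 0.3, 0.2 and 0.1, so the fit should recover those values. With real, noisy spectra the fit would only be approximate.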

  10. A Novel Approach for Untargeted Post-translational Modification Identification Using Integer Linear Optimization and Tandem Mass Spectrometry. Richard C. Baliban, Peter A. DiMaggio, Mariana D. Plazas-Mayorca, Nicolas L. Young, Benjamin A. Garcia and Christodoulos A. Floudas

  11. Bottom-up PTM identification • Two approaches • Tags • Non-tags • Restricted • Unrestricted • PILOT_PTM

  12. Preprocessing • Remove all peaks related to the precursor ion • Keep only locally significant peaks • Deisotope • Remove a neutral offset peak if it does not have a complementary peak. • Each candidate peak has a list of supporting peaks.
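
The slide does not define "locally significant". One plausible reading, sketched below purely as an assumption, is to keep a peak only if it ranks among the most intense peaks in a window around its own m/z. The function name, the `window` and `top_k` parameters, and the toy peaks are all hypothetical.

```python
def keep_locally_significant(peaks, window=50.0, top_k=2):
    """Keep peaks that rank in the top_k intensities within +/- window m/z.

    peaks is a list of (mz, intensity) tuples; input order is preserved.
    """
    kept = []
    for mz, intensity in peaks:
        neighbours = [i for m, i in peaks if abs(m - mz) <= window]
        stronger = sum(1 for i in neighbours if i > intensity)
        if stronger < top_k:
            kept.append((mz, intensity))
    return kept

# Toy peak list: illustrative values only.
peaks = [(100.0, 5.0), (110.0, 50.0), (120.0, 8.0), (130.0, 40.0), (300.0, 3.0)]
kept = keep_locally_significant(peaks)
print(kept)
```

Note that the weak peak at m/z 300 survives because it has no neighbours within the window, which is the intended difference between a local filter and a global intensity threshold.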

  13. ILP Model • Input • A preprocessed, deisotoped spectrum S = {a1, a2, …, am} • A peptide (theoretical b-ion peak list) P = {b1, b2, …, bn} • A list of all known PTMs • Theoretical peak bk • CSk is the set of all peaks (indices) in S that bk can be matched to with PTMs • Real peak aj • Posj is the set of all peaks (indices) in P that aj can be matched to with PTMs • Supportj is the set of all peaks (indices) in S that support peak j • Multj is the set of all peaks (indices) that peak j supports

  14. ILP Model • Binary variables • pj,k = 1 if peak aj in S is matched to bk in P, otherwise pj,k = 0 • yj = 1 if peak aj is a supporting peak or a matched peak, otherwise yj = 0

  15. ILP Model • Objective • Subject to • One peak in P can only match one peak in S • One peak in S can only match one peak in P
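
The two matching constraints on this slide can be written with the variables and index sets from slides 13 and 14. This is a reconstruction from the verbal description, reading "only one" as "at most one", and not necessarily the exact form used by the authors:

```latex
\begin{align}
\sum_{j \in CS_k} p_{j,k} &\le 1 \quad \text{for all } k = 1,\dots,n
  && \text{(each peak in $P$ matches at most one peak in $S$)} \\
\sum_{k \in Pos_j} p_{j,k} &\le 1 \quad \text{for all } j = 1,\dots,m
  && \text{(each peak in $S$ matches at most one peak in $P$)}
\end{align}
```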

  16. ILP Model Subject to: • No three consecutive missing peaks • The intensity of peak i is counted iff there exists a peak j such that peak i supports j and peak j is a matched peak.
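
In the notation of slides 13 and 14, these two conditions might be expressed as below. This is a sketch from the verbal description only. It assumes that Multi contains i itself, so that a matched peak counts as supporting itself, and the authors' actual formulation may differ:

```latex
\begin{align}
\sum_{k'=k}^{k+2} \; \sum_{j \in CS_{k'}} p_{j,k'} &\ge 1
  && \text{for all } k = 1,\dots,n-2
  && \text{(no three consecutive missing peaks)} \\
y_i &\le \sum_{j \in Mult_i} \; \sum_{k \in Pos_j} p_{j,k}
  && \text{for all } i
  && \text{(counted only if a supported peak is matched)} \\
y_i &\ge p_{j,k}
  && \text{for all } i,\; j \in Mult_i,\; k \in Pos_j
  && \text{(counted whenever a supported peak is matched)}
\end{align}
```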

  17. ILP Model • Solve using CPLEX • Report the top 10 variable assignments • Existing problem • No constraint requires that the distance between two neighboring matched peaks match the mass of a residue (with PTM)

  18. New constraints • For each pj,k • Set of candidate ion peaks j’ with respect to k’ such that no valid jump exists between j and j’ • The maximum and minimum masses that can be reached from j, respectively

  19. New constraints • Neighboring matched peaks do not conflict • Conflicting matched peaks must have a matched peak between them • The distance between two matched peaks should be bounded

  20. Postprocessing • Re-score the 10 candidate modified peptides • Cross-correlation score • Recheck modifications if there are unmatched peaks indicating non-modification

  21. Test data sets • Test set A: 44 CID spectra (Ion trap) and 174 ETD spectra (Orbitrap) of chemically synthesized phosphopeptides, manually validated • Test set B: 58 ECD spectra (FTICR) of Histone H3-(1–50) N-terminal Tail, manually validated • Test set C: 553 CID spectra (Orbitrap) of Propionylated Histone Fragments, manually validated • Test set D: 525 modified and 6025 unmodified CID spectra (Orbitrap) from a chromatin fraction, identified by SEQUEST and validated by MASCOT, with low-quality spectra removed manually • Test set E: 36 (Ion trap), 37 (Q-TOF), and 4061 (Orbitrap) unmodified CID spectra, validated in the same way as test set D

  22. Residue prediction accuracy

  23. Peptide prediction accuracy

  24. Comparison on test sets C and D1: Peptide and residue prediction accuracy

  25. Comparison on test sets C and D1: Subsequence prediction accuracy

  26. Running time

  27. Q & A
