
What is mass?

Eat Raw & Fresh: Introducing isotopic Mass-to-charge Ratio and Envelope Fingerprinting (iMEF) and ProteinGoggle for Protein Database Search. Zhixin (Michael) Tian, CNCP, 11/15/2012. What is mass? Monoisotopic mass (m/z, z=+1). L. C. Dias, et al. J. Org. Chem. 2012, 77, 4046.


Presentation Transcript


  1. Eat Raw & Fresh: Introducing isotopic Mass-to-charge Ratio and Envelope Fingerprinting (iMEF) and ProteinGoggle for Protein Database Search
     Zhixin (Michael) Tian, CNCP, 11/15/2012

  2. What is mass? Monoisotopic mass (m/z, z=+1) L. C. Dias, et al. J. Org. Chem. 2012, 77, 4046.

  3. Missing monoisotopic mass in protein
     • Monoisotopic mass: most significant & accurate
     • Mass of the most abundant isotope: error of ±1 Da or more (mis-assignment of the number of contributing heavy isotopes)
     • Average mass: error of ±1 u at 16,000 u (variability of the 13C/12C ratio)
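To illustrate why the monoisotopic peak tends to go missing for proteins, the following is a minimal sketch that builds a nominal-mass isotopic envelope for the ubiquitin composition quoted on slide 14 (C378H630N105O118S1). The abundance table uses rounded natural abundances, and the function names (`convolve`, `element_power`, `isotopic_envelope`) are hypothetical helpers, not part of ProteinGoggle.

```python
# Natural isotope abundances indexed by extra neutrons (0, +1, +2, ...).
# Rounded textbook values; treat them as approximate.
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def convolve(a, b, max_peaks):
    """Multiply two distributions, keeping only the first max_peaks terms."""
    out = [0.0] * min(len(a) + len(b) - 1, max_peaks)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def element_power(dist, count, max_peaks):
    """Distribution for `count` atoms of one element, by repeated squaring."""
    result = [1.0]
    base = list(dist)
    while count:
        if count & 1:
            result = convolve(result, base, max_peaks)
        base = convolve(base, base, max_peaks)
        count >>= 1
    return result


def isotopic_envelope(composition, max_peaks=30):
    """Fraction of molecules at monoisotopic + 0, +1, +2, ... neutrons."""
    envelope = [1.0]
    for element, count in composition.items():
        per_element = element_power(ISOTOPES[element], count, max_peaks)
        envelope = convolve(envelope, per_element, max_peaks)
    return envelope


ubiquitin = {"C": 378, "H": 630, "N": 105, "O": 118, "S": 1}
envelope = isotopic_envelope(ubiquitin)
most_abundant = max(range(len(envelope)), key=envelope.__getitem__)
print(f"monoisotopic peak fraction: {envelope[0]:.4f}")
print(f"most abundant peak: monoisotopic + {most_abundant}")
```

Under these assumptions the monoisotopic peak carries less than 1% of the envelope, and the most abundant peak sits several neutrons higher, which is why picking the wrong peak shifts the assigned mass by whole daltons.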

  4. Deisotoping (Deconvolution)
     Algorithms: AID-MS, ESI-ISOCONV, LASSO, MapQuant, MasSPIKE, MATCHING, msInspect, Peplist, quadratic deisotoping, RAPID, THRASH, Wang’s method, Zhang’s program, and ZSCORE
     Steps:
     • Calculate background noise level
     • Determine charge state using FT/Patterson technique
     • Calculate theoretical profile
     • Fit with observed isotopic profile
     • Monoisotopic mass
     Search Engines: ProSightPC, SEQUEST, Mascot, X!Tandem, InsPecT, OMSSA, Andromeda, pFind

  5. Peptide Mass Fingerprinting (PMF)
     [Flowchart. Input: protein database; RAW file with MS spectrum (iE) and MS/MS spectra (iE). Search engine (steps A1/P1, A1/P2, A2/P3, A2/P4): parent theo. mass vs. parent exp. mass; fragments theo. mass vs. fragments exp. mass; candidates. Output: initial IDs, final IDs.]

  6. Ubiquitin - MS spectrum (profile)

  7. Ubiquitin – MS/MS (ETD) Spectrum (Profile)

  8. Database search with PMF using ProSightPC
     NMFs = 92
     NUMFs = 219
     P score = 4.86E-98

  9. Definition of P_Score
     • f: the total number of observed fragments (NMFs + NUMFs)
     • n: the number of matching fragments (NMFs)
     • x: the mean probability that the mass of an observed fragment ion will randomly match one from a generic protein
     • 111.1: the mass of the average amino acid, weighted for its occurrence in proteins
     • 2: the number of fragment ions generated from each bond cleavage, assumed to be 2 (b- and y-type ions or c- and z•-type ions)
     • Ma: the mass accuracy (a Ma of ±1 Da translates to a 2 Da window)
     Neil L. Kelleher, et al. Nat. Biotechnol. 2001, 19, 952
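The slide lists the symbols, but the transcript does not carry the equation itself. The following is a minimal sketch that assumes the Poisson form commonly quoted for this kind of score: x = 2 × (mass window in Da) / 111.1, and P = probability of at least n random matches among f fragments with mean x·f. Both the form and the function names are assumptions made for illustration, so the output should not be expected to reproduce the ProSightPC value on the previous slide.

```python
import math


def random_match_probability(window_da, avg_residue_mass=111.1,
                             ions_per_cleavage=2):
    """Assumed form of x: chance that one fragment mass matches at random."""
    return ions_per_cleavage * window_da / avg_residue_mass


def p_score(f, n, x):
    """Poisson upper tail: P(at least n random matches), mean x * f.

    The tail is summed directly instead of computing 1 - CDF, so that
    very small probabilities are not lost to floating-point rounding.
    """
    lam = x * f
    if n <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    log_term = -lam + n * math.log(lam) - math.lgamma(n + 1)
    term = math.exp(log_term)
    total = 0.0
    i = n
    while True:
        total += term
        i += 1
        term *= lam / i
        if term <= total * 1e-16:
            break
    return total


# Hypothetical window of 0.01 Da, with the fragment counts from slide 8.
x = random_match_probability(0.01)
print(p_score(f=311, n=92, x=x))
```

More matching fragments (larger n) at a fixed f and x push the score toward zero, which is the sense in which a smaller P score means a more confident identification.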

  10. Is “MFs” really good?

  11. Is “NUMFs” really good?
     RAPID (28+49=77)
     THRASH (92+219=311)
     • PeakPicking: SNRThreshold = 3.0, BackgroundRatio = 5.0, FitType = Lorentzian
     • DeconvPep: MaxCharge = 25, ThScore = 0.0
     • AdvDeconv: MaxAbundancePeak = 3, ScanNoModifier = 0, MaxMissPeak = 3, MassErr = 1.0E-05, ThClustExt = 0.0, IntsRangeErr = 0.5
     Better “deisotoping”? NO “deisotoping”?

  12. What is a mass spectrum? MS of Ubiquitin

  13. The nature of the iE of an ion: x, y coordinates (profile; centroid)

  14. What is in a protein database?
     Sequence: MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG
     Elemental composition: C378H630N105O118S1
     iE: x, y coordinates (centroid)
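As a small worked example of going from the elemental composition on this slide to a mass, the sketch below computes the monoisotopic and average masses of C378H630N105O118S1. The atomic masses are standard rounded values, and `parse_formula` and `formula_mass` are hypothetical helper names.

```python
import re

# Mass of the lightest (most abundant) isotope of each element, in u.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "S": 31.972071}
# Abundance-weighted average atomic masses, in u (rounded).
AVERAGE = {"C": 12.0107, "H": 1.00794, "N": 14.0067,
           "O": 15.9994, "S": 32.065}


def parse_formula(formula):
    """Turn 'C378H630N105O118S1' into {'C': 378, 'H': 630, ...}."""
    composition = {}
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        composition[element] = composition.get(element, 0) + int(count or 1)
    return composition


def formula_mass(formula, table):
    """Sum atomic masses from `table` over the parsed composition."""
    return sum(table[el] * n for el, n in parse_formula(formula).items())


formula = "C378H630N105O118S1"
mono = formula_mass(formula, MONOISOTOPIC)
avg = formula_mass(formula, AVERAGE)
print(f"monoisotopic: {mono:.4f}  average: {avg:.2f}")
```

The two values differ by about 5 u for this composition, which is the gap the isotopic envelope spans between its first peak and its centre of mass.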

  15. iMEF (isotopic m/z & Envelope Fingerprinting)
     [Flowchart, drawn over the PMF scheme. Input: protein database; RAW file with MS spectrum (iE) and MS/MS spectra (iE). PMF path (A1/P1, A1/P2, A2/P3, A2/P4): parent theo. mass vs. parent exp. mass; fragments theo. mass vs. fragments exp. mass. iMEF path (A/P1, A/P2): parent (theo. iE); fragments (theo. iE). Search; candidates. Output: initial IDs, final IDs.]

  16. iMEF = iMF (A1) + iEF (A2)
     [Flowchart with Y/N branches. Top-down Screening – MS/MS2 (Targeted Screening – MS2). A1/F1: the 1st, 2nd and 3rd isotopic peaks are looked up in the DB, with an isotopic peak exclusion list; result: preliminary protein candidates. A2/F2: parent ion theo. iE vs. parent ion exp. iE; result: protein candidates. A2/F3: fragment ion theo. iEs vs. fragment ion exp. iEs; result: preliminary protein IDs (NMFs, PTM_Scores). Norm. isotopic peaks removed; initial protein ID(s); remove duplicates; combined initial protein IDs; final IDs.]

  17. Pre-Step 1: Customized database (MS: precursor ions; MS/MS: fragment ions)

  18. Pre-Step 2: Noise level determination

  19. Ubiquitin - MS spectrum (profile)

  20. Ubiquitin – MS/MS (HCD) spectrum (profile)

  21. Step 1: Profile to centroid (MS & MS2)
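The slide does not say how profile data are centroided. As one simple possibility, the sketch below picks local maxima above a noise level and reports an intensity-weighted m/z over the apex and its two neighbours. The function name `centroid_peaks` and the three-point window are assumptions for illustration, not the ProteinGoggle procedure.

```python
def centroid_peaks(mz, intensity, noise_level=0.0):
    """Reduce a profile spectrum to (centroid m/z, apex intensity) pairs.

    mz and intensity are parallel lists of profile points sorted by m/z.
    """
    peaks = []
    for i in range(1, len(mz) - 1):
        is_apex = (intensity[i] > noise_level
                   and intensity[i] >= intensity[i - 1]
                   and intensity[i] > intensity[i + 1])
        if not is_apex:
            continue
        # Intensity-weighted mean m/z over the apex and its two neighbours.
        window = range(i - 1, i + 2)
        total = sum(intensity[j] for j in window)
        center = sum(mz[j] * intensity[j] for j in window) / total
        peaks.append((center, intensity[i]))
    return peaks


profile_mz = [100.00, 100.01, 100.02, 100.03, 100.04]
profile_int = [0.0, 5.0, 10.0, 5.0, 0.0]
print(centroid_peaks(profile_mz, profile_int))
```

Raising `noise_level` to the value found in the noise-determination pre-step would drop apexes that sit in the baseline.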

  22. Step 2: iMF of precursor ion candidates (Top-down Screening)
     857.47461 (4 ppm)
     IPMD ≤ 15 ppm
     Isolation window (±3 m/z units)
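The m/z fingerprinting step can be sketched as a tolerance lookup in a database sorted by theoretical m/z, using bisection to find the window quickly. The sketch reads IPMD as the allowed isotopic peak m/z deviation in ppm, which the slides do not spell out. The database entries and labels below are hypothetical, and only the experimental value 857.47461 comes from the slide.

```python
from bisect import bisect_left, bisect_right


def ppm_error(experimental_mz, theoretical_mz):
    """Signed deviation of the experimental m/z, in parts per million."""
    return (experimental_mz - theoretical_mz) / theoretical_mz * 1e6


def imf_candidates(experimental_mz, database, ipmd_ppm=15.0):
    """Return database entries whose m/z lies within ipmd_ppm.

    database: list of (theoretical_mz, label), sorted by theoretical_mz.
    """
    keys = [mz for mz, _ in database]
    tol = ipmd_ppm * 1e-6
    # Theoretical m/z values that put the experimental peak inside the window.
    lo = bisect_left(keys, experimental_mz / (1 + tol))
    hi = bisect_right(keys, experimental_mz / (1 - tol))
    return [(mz, label) for mz, label in database[lo:hi]
            if abs(ppm_error(experimental_mz, mz)) <= ipmd_ppm]


# Hypothetical database entries, sorted by m/z.
database = [(857.4712, "candidate A"),
            (857.4740, "candidate B"),
            (857.5100, "candidate C")]
hits = imf_candidates(857.47461, database)
print(hits)
```

In a real search the key list would be built once and reused across peaks; it is rebuilt per call here only to keep the sketch short.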

  23. Step 3: iEF of precursor ion candidates
     IPACO ≤ 5%
     IPMD ≤ 15 ppm
     IPAD ≤ 30%
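A minimal sketch of an envelope fingerprinting check under the three thresholds follows. It assumes a particular reading of the acronyms, which the slides do not expand: IPACO as a relative-abundance cutoff below which theoretical peaks are ignored, IPMD as the allowed m/z deviation in ppm, and IPAD as the allowed relative-abundance deviation in percent. The normalisation to the most abundant theoretical peak and the function name `ief_match` are also assumptions.

```python
def ief_match(theoretical, experimental, ipaco=5.0, ipmd_ppm=15.0, ipad=30.0):
    """Check an experimental envelope against a theoretical one.

    theoretical: list of (mz, relative abundance in %), top peak = 100.
    experimental: list of (mz, intensity) centroids.
    """
    if not theoretical or not experimental:
        return False

    def nearest(mz):
        # Closest experimental centroid, accepted only within IPMD.
        best = min(experimental, key=lambda p: abs(p[0] - mz))
        ppm = abs(best[0] - mz) / mz * 1e6
        return best if ppm <= ipmd_ppm else None

    # Anchor intensities on the most abundant theoretical peak.
    top_mz, _ = max(theoretical, key=lambda p: p[1])
    anchor = nearest(top_mz)
    if anchor is None or anchor[1] <= 0:
        return False

    for mz, rel in theoretical:
        if rel < ipaco:
            continue  # too weak to be required
        hit = nearest(mz)
        if hit is None:
            return False
        observed_rel = hit[1] / anchor[1] * 100.0
        if abs(observed_rel - rel) / rel * 100.0 > ipad:
            return False
    return True


# Hypothetical envelopes for illustration.
theo = [(1000.0, 60.0), (1000.5, 100.0), (1001.0, 80.0), (1001.5, 3.0)]
good = [(1000.002, 590.0), (1000.503, 1000.0), (1001.001, 850.0)]
bad = [(1000.002, 200.0), (1000.503, 1000.0), (1001.001, 850.0)]
print(ief_match(theo, good), ief_match(theo, bad))
```

Tightening `ipmd_ppm` or `ipad` makes the check stricter, which is the knob behind the "as-strict-as-you-choose" wording later in the deck.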

  24. Step 4: iMF of fragment ion candidates (Targeted Screening)
     IPMD ≤ 10 ppm
     277.13278 (5 ppm)
     C1;MAX_MZ=149.07431&C2;MAX_MZ=277.132888&C3;MAX_MZ=390.216952&C4;MAX_MZ=537.285366&C5;MAX_MZ=636.353779&C6;MAX_MZ=764.448743&C7;…

  25. Step 5: iEF of fragment ion candidates
     IPACO ≤ 5%
     IPMD ≤ 10 ppm
     IPAD ≤ 50%

  26. Exemplary PTM_Score assignment Human histone H4_S1acK16acK20me2

  27. ID of ubiquitin from ETD
     NMFs = 91
     IPACO = 10, IPMD = 15, IPAD = 100
     IPMDO = 20, IPMDOM = 30, IPADO = 20, IPADOM = 200
     Plots: NMFs vs. IPACO; NMFs vs. IPMD; NMFs vs. IPAD

  28. Pros and Cons
     • Pros:
       • As-strict-as-you-choose confidence
       • Strict quality control (QC)
       • Fine discrimination of close iEs
       • In-situ unwrapping of overlapped iEs
     • Cons:
       • More complex and bigger database
       • More data points for fingerprinting

  29. Pros: As-strict-as-you-choose confidence (comparison with ProSightPC)

  30. Layman’s choice of parameters: default values with statistical significance!

  31. Pros: Fine discrimination of close iEs

  32. Pros: In-situ unwrapping of overlapped iEs
     Proportional partition
     • k: # of overlapped isotopic peaks
     • m: # of isotopic peaks in each iE
     • n: # of overlapped iEs
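The partition formula itself is not in the transcript. One plausible reading of "proportional partition" is that the intensity of an overlapped isotopic peak is split among the n overlapping envelopes in proportion to what each envelope is expected to contribute there. The sketch below implements only that reading, and the function name and the example numbers are hypothetical.

```python
def partition_overlapped_peak(observed_intensity, expected_contributions):
    """Split one overlapped peak among envelopes, proportionally.

    expected_contributions: one value per overlapping envelope, e.g. the
    envelope's scale (estimated from its non-overlapped peaks) times the
    theoretical relative abundance of the peak that falls at this m/z.
    """
    total = sum(expected_contributions)
    if total <= 0:
        raise ValueError("expected contributions must sum to a positive value")
    return [observed_intensity * c / total for c in expected_contributions]


# Two envelopes share one peak; they are expected to contribute 300 and 100.
shares = partition_overlapped_peak(440.0, [300.0, 100.0])
print(shares)
```

Repeating this for each of the k overlapped peaks gives every envelope a full set of intensities that can then go through the same envelope check as a non-overlapped one.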

  33. Other improvements and utilities
     • Improvements:
       • Bi-section method for fast indexing of candidates
       • LASSO-like approach to untangle overlapped iEs
     • Additional utilities:
       • A comprehensive confidence score
       • False discovery rate (FDR)
       • Customized ion types to look for new dissociation channels
       • Customized MODs for the search of new modifications or labeled proteins
       • MS/MS spectrum annotation with matching fragments

  34. Conclusions
     • An as-confident-as-you-choose protein database search algorithm, iMEF, has been created and implemented in the search engine ProteinGoggle
     • The principle of iMEF with ProteinGoggle is demonstrated with the identification of ubiquitin from its tandem mass spectrum using ETD
     • iMEF as implemented in ProteinGoggle has been able to unwrap complex overlapping isotopic envelopes and confidently provide embedded fragment ions
     • iMEF could be adapted for peptide and glycan database search with customized databases

  35. Acknowledgements DNL2003 Li Li Bo Wang Jing Li Xu Zhao The KENES. Co. Ltd. Miao Zhou Shijin Liu Bin Yang Funding: DICP “Research Start” China “Youth 1000-talents Theme”

  36. Thank you very much!
