Eat Raw & Fresh: Introducing isotopic Mass-to-charge Ratio and Envelope Fingerprinting (iMEF) and ProteinGoggle for Protein Database Search. Zhixin (Michael) Tian, CNCP, 11/15/2012
What is mass? Monoisotopic mass (m/z, z=+1) L. C. Dias, et al. J. Org. Chem. 2012, 77, 4046.
Missing monoisotopic mass in protein
• Monoisotopic mass: most significant & accurate
• Mass of the most abundant isotope: error of ±1 Da or more (mis-assignment of the number of contributing heavy isotopes)
• Average mass: error of ±1 u at 16,000 u (variability of the 13C/12C ratio)
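To make the gap between these mass definitions concrete, here is a minimal sketch that computes the monoisotopic and average mass of an elemental formula. The function name and the rounded atomic masses are illustrative choices, not taken from the slides; the formula is the ubiquitin composition shown on the protein-database slide, and the electron mass is ignored.

```python
# Rounded standard atomic masses (u): lightest isotope vs. natural-abundance average.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "S": 31.9720707}
AVG = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "S": 32.065}

def formula_mass(formula, table):
    """Sum atomic masses over a composition given as {element: count}."""
    return sum(table[element] * count for element, count in formula.items())

# Composition as written on the slide (C378H630N105O118S1); electron mass ignored.
ubiquitin = {"C": 378, "H": 630, "N": 105, "O": 118, "S": 1}
mono = formula_mass(ubiquitin, MONO)
avg = formula_mass(ubiquitin, AVG)
print(f"monoisotopic {mono:.4f} u, average {avg:.4f} u, gap {avg - mono:.2f} u")
```

For a protein of this size the two values differ by several mass units, which is why the choice of mass definition matters for database search.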
Deisotoping (Deconvolution)
Algorithms: AID-MS, ESI-ISOCONV, LASSO, MapQuant, MasSPIKE, MATCHING, msInspect, Peplist, quadratic deisotoping, RAPID, THRASH, Wang’s method, Zhang’s program, and ZSCORE
Steps:
• Calculate background noise level
• Determine charge state using FT/Patterson technique
• Calculate theoretical profile
• Fit with observed isotopic profile
• Monoisotopic mass
Search Engines: ProSightPC, SEQUEST, Mascot, X!Tandem, InsPecT, OMSSA, Andromeda, pFind
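The charge-state step relies on the spacing between adjacent isotopic peaks, which is roughly 1.00335/z in m/z units. The slide names the FT/Patterson technique; the sketch below is a simpler stand-in that reads the charge from the median peak spacing. The function name and the example peak list are hypothetical.

```python
from statistics import median

C13_C12_DELTA = 1.00335  # approximate 13C - 12C mass difference, in u

def charge_from_spacing(mz_peaks):
    """Estimate the charge from the median spacing of adjacent isotopic peaks."""
    mzs = sorted(mz_peaks)
    if len(mzs) < 2:
        raise ValueError("need at least two isotopic peaks")
    spacing = median(b - a for a, b in zip(mzs, mzs[1:]))
    return round(C13_C12_DELTA / spacing)

# Hypothetical centroided envelope with about 0.1003 m/z between peaks.
peaks = [857.0740, 857.1743, 857.2746, 857.3749, 857.4752]
print(charge_from_spacing(peaks))
```

This only works on a clean, isolated envelope; overlapped envelopes are exactly the case where spacing-based charge assignment becomes unreliable.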
Peptide Mass Fingerprinting (PMF)
[Workflow diagram. Input: protein database; RAW file with MS spectrum (iE) and MS/MS spectra (iE). Search engine: A1/P1 and A1/P2, parent theo. mass vs. parent exp. mass, giving candidates; A2/P3 and A2/P4, fragments theo. mass vs. fragments exp. mass, giving initial IDs. Output: final IDs.]
Database search with PMF using ProSightPC: NMFs = 92; NUMFs = 219; P score = 4.86E-98
Definition of P_Score
• f: the total number of observed fragments (NMFs + NUMFs)
• n: the number of matching fragments (NMFs)
• x: the mean probability that the mass of an observed fragment ion will randomly match one from a generic protein
• 111.1: the mass of the average amino acid, weighted for its occurrence in proteins
• 2: the number of fragment ions generated from each bond cleavage, assumed to be 2 (b- and y-type ions or c- and z•-type ions)
• Ma: the mass accuracy (a Ma of ±1 Da translates to a 2 Da window)
Neil L. Kelleher, et al. Nat. Biotechnol. 2001, 19, 952
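The formula itself did not survive in this transcript. The sketch below assumes the Poisson form P = (x·f)^n · e^(−x·f) / n! with x = 2 × Ma / 111.1, where Ma is taken as the full mass window in Da. That form is an assumption about how the symbols above combine and should be checked against the cited paper; the function and parameter names are hypothetical.

```python
import math

AVG_RESIDUE_MASS = 111.1  # Da, from the slide
IONS_PER_CLEAVAGE = 2     # from the slide

def p_score(n_matched, n_observed, window_da):
    """Poisson probability of n_matched random hits among n_observed fragments.

    ASSUMPTION: P = (x*f)**n * exp(-x*f) / n!, with x = 2 * window_da / 111.1.
    """
    if n_observed <= 0 or window_da <= 0:
        raise ValueError("n_observed and window_da must be positive")
    x = IONS_PER_CLEAVAGE * window_da / AVG_RESIDUE_MASS
    lam = x * n_observed
    # Log space keeps large n from overflowing the factorial.
    log_p = n_matched * math.log(lam) - lam - math.lgamma(n_matched + 1)
    return math.exp(log_p)

# Hypothetical window of 0.02 Da; f = NMFs + NUMFs from the ProSightPC slide.
print(p_score(n_matched=92, n_observed=311, window_da=0.02))
```

Under this form, every unmatched fragment raises f and therefore moves the score, which is what the next slide questions.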
Is “NUMFs” really good?
• Fragment counts (NMFs + NUMFs = total): RAPID (28 + 49 = 77); THRASH (92 + 219 = 311)
• PeakPicking: SNRThreshold = 3.0; BackgroundRatio = 5.0; FitType = Lorentzian
• DeconvPep: MaxCharge = 25; ThScore = 0.0
• AdvDeconv: MaxAbundancePeak = 3; ScanNoModifier = 0; MaxMissPeak = 3; MassErr = 1.0E-05; ThClustExt = 0.0; IntsRangeErr = 0.5
Better “deisotoping”? NO “deisotoping”?
What is a mass spectrum? MS of Ubiquitin
The nature of the iE of an ion: x, y coordinates (profile or centroid)
What is in a protein database?
• Sequence: MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG
• Elemental composition: C378H630N105O118S1
• Theoretical iE: x, y coordinates (centroid)
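The step from sequence to elemental composition can be sketched as follows. The residue table uses standard amino acid residue compositions; the function name and the `protons` parameter are hypothetical. With one added proton, the ubiquitin sequence above gives the composition shown on the slide.

```python
# Elemental composition (C, H, N, O, S) of each amino acid residue (amino acid minus H2O).
RESIDUES = {
    "G": (2, 3, 1, 1, 0), "A": (3, 5, 1, 1, 0), "S": (3, 5, 1, 2, 0),
    "P": (5, 7, 1, 1, 0), "V": (5, 9, 1, 1, 0), "T": (4, 7, 1, 2, 0),
    "C": (3, 5, 1, 1, 1), "L": (6, 11, 1, 1, 0), "I": (6, 11, 1, 1, 0),
    "N": (4, 6, 2, 2, 0), "D": (4, 5, 1, 3, 0), "Q": (5, 8, 2, 2, 0),
    "K": (6, 12, 2, 1, 0), "E": (5, 7, 1, 3, 0), "M": (5, 9, 1, 1, 1),
    "H": (6, 7, 3, 1, 0), "F": (9, 9, 1, 1, 0), "R": (6, 12, 4, 1, 0),
    "Y": (9, 9, 1, 2, 0), "W": (11, 10, 2, 1, 0),
}

def composition(sequence, protons=0):
    """Elemental composition of an unmodified protein plus `protons` extra H."""
    total = {"C": 0, "H": 2 + protons, "N": 0, "O": 1, "S": 0}  # terminal H2O
    for residue in sequence:
        for element, count in zip("CHNOS", RESIDUES[residue]):
            total[element] += count
    return total

UBIQUITIN = ("MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG")
print(composition(UBIQUITIN, protons=1))
```

The neutral protein has one hydrogen fewer (H629), so the slide's H630 is consistent with the singly protonated species.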
iMEF (isotopic m/z & Envelope Fingerprinting)
[Workflow diagram. Input: protein database; RAW file with MS spectrum (iE) and MS/MS spectra (iE). Search: parent theo. iE and fragments theo. iE (A/P1, A/P2) take the place of the theoretical and experimental masses used in PMF (A1/P1, A1/P2, A2/P3, A2/P4), giving candidates and initial IDs. Output: final IDs.]
iMEF = iMF (A1) + iEF (A2)
[Flowchart with Y/N decision branches. Top-down Screening – MS/MS2 (Targeted Screening – MS2).
• A1/F1: DB lookup of the 1st, 2nd, and 3rd isotopic peaks, with an isotopic peak exclusion list, giving preliminary protein candidates
• A2/F2: parent ion theo. iE vs. parent ion exp. iE, giving protein candidates
• A2/F3: fragment ion theo. iEs vs. fragment ion exp. iEs, giving preliminary protein IDs (NMFs, PTM_Scores)
• Norm. isotopic peaks removed, giving initial protein ID(s); remove duplicates, giving combined initial protein IDs, then final IDs]
Pre-Step 1: Customized database (MS: precursor ions; MS/MS: fragment ions)
Step 2: iMF of precursor ion candidates (Top-down Screening). IPMD: 15 ppm; isolation window: ±3 m/z units. Example: 857.47461 (4 ppm)
Step 3: iEF of precursor ion candidates. IPACO: 5%; IPMD: 15 ppm; IPAD: 30%
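The slides do not spell out the three parameters. The sketch below reads IPACO as a relative-abundance cutoff under which theoretical peaks are not required, IPMD as the allowed m/z deviation in ppm, and IPAD as the allowed relative-abundance deviation in percent. These readings, the function name, the normalization to the most intense peak, and the example peaks are all assumptions.

```python
def envelope_matches(theo, exp, ipaco=5.0, ipmd=15.0, ipad=30.0):
    """True if every theoretical peak above the cutoff has an experimental partner.

    theo, exp: lists of (m/z, intensity); each list is normalized so that its
    most intense peak is 100 % before abundances are compared.
    """
    def normalize(peaks):
        top = max(intensity for _, intensity in peaks)
        return [(mz, 100.0 * intensity / top) for mz, intensity in peaks]

    theo_n, exp_n = normalize(theo), normalize(exp)
    for t_mz, t_ab in theo_n:
        if t_ab < ipaco:
            continue  # below the abundance cutoff: not required to be observed
        found = any(
            abs(e_mz - t_mz) / t_mz * 1e6 <= ipmd
            and abs(e_ab - t_ab) / t_ab * 100.0 <= ipad
            for e_mz, e_ab in exp_n
        )
        if not found:
            return False
    return True

# Hypothetical envelopes around m/z 857.
theo = [(857.37, 60.0), (857.47, 100.0), (857.57, 70.0), (857.67, 3.0)]
exp = [(857.371, 55.0), (857.472, 100.0), (857.569, 75.0)]
print(envelope_matches(theo, exp))
```

Tightening any of the three thresholds can only turn a match into a non-match, which is the sense in which the confidence is user-chosen.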
Step 4: iMF of fragment ion candidates (Targeted Screening). IPMD: 10 ppm. Example: 277.13278 (5 ppm). Database entry: C1;MAX_MZ=149.07431&C2;MAX_MZ=277.132888&C3;MAX_MZ=390.216952&C4;MAX_MZ=537.285366&C5;MAX_MZ=636.353779&C6;MAX_MZ=764.448743&C7;…
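A minimal sketch of this lookup, using only the first three entries of the string above. Reading MAX_MZ as the m/z of the most intense isotopic peak of each c ion is an assumption, and both function names are hypothetical.

```python
def parse_index(entry):
    """Parse 'C1;MAX_MZ=149.07431&C2;MAX_MZ=...' into [(label, mz), ...]."""
    pairs = []
    for item in entry.split("&"):
        label, field = item.split(";")
        key, value = field.split("=")
        if key != "MAX_MZ":
            raise ValueError(f"unexpected field {key!r}")
        pairs.append((label, float(value)))
    return pairs

def match_fragments(observed_mz, index, ipmd_ppm=10.0):
    """Labels of indexed fragments within ipmd_ppm of the observed m/z."""
    return [
        label for label, mz in index
        if abs(observed_mz - mz) / mz * 1e6 <= ipmd_ppm
    ]

entry = "C1;MAX_MZ=149.07431&C2;MAX_MZ=277.132888&C3;MAX_MZ=390.216952"
index = parse_index(entry)
print(match_fragments(277.13278, index))
```

With the 10 ppm IPMD of this step, the observed peak at 277.13278 is assigned to the C2 entry only.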
Step 5: iEF of fragment ion candidates. IPACO: 5%; IPMD: 10 ppm; IPAD: 50%
Exemplary PTM_Score assignment Human histone H4_S1acK16acK20me2
ID of ubiquitin from ETD: NMFs = 91 with IPACO = 10, IPMD = 15, IPAD = 100, IPMDO = 20, IPMDOM = 30, IPADO = 20, IPADOM = 200
[Plots: NMFs vs. IPACO; NMFs vs. IPMD; NMFs vs. IPAD]
Pros and Cons
• Pros:
  • As-strict-as-you-choose confidence
  • Strict quality control (QC)
  • Fine discrimination of close iEs
  • In-situ unwrapping of overlapped iEs
• Cons:
  • More complex and bigger database
  • More data points for fingerprinting
Pros: As-strict-as-you-choose confidence Comparison with ProSightPC
Layman’s choice of parameters Default values with statistical significance!
Pros: In-situ unwrapping of overlapped iEs
Proportional partition, where
• k: # of overlapped isotopic peaks
• m: # of isotopic peaks in each iE
• n: # of overlapped iEs
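The partition formula is not preserved in this transcript. One plausible reading, sketched below, splits the observed intensity of a shared peak among the n overlapped iEs in proportion to the intensity each iE is expected to contribute at that position. The function name and the way expected contributions are supplied are assumptions.

```python
def partition_overlapped_peak(observed_intensity, expected_intensities):
    """Split one observed peak among overlapped iEs in proportion to expectation."""
    total = sum(expected_intensities)
    if total <= 0:
        raise ValueError("expected intensities must sum to a positive value")
    return [observed_intensity * e / total for e in expected_intensities]

# Hypothetical case: two iEs share a peak, expected contributions in a 1:3 ratio.
print(partition_overlapped_peak(100.0, [1.0, 3.0]))
```

Applied to each of the k overlapped peaks in turn, this yields a separate experimental iE for every ion, which can then go through the same iEF check as an isolated envelope.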
Other improvements and utilities
• Improvements:
  • Bi-section method for fast indexing of candidates
  • LASSO-like approach to untangle overlapped iEs
• Additional utilities:
  • A comprehensive confidence score
  • False discovery rate (FDR)
  • Customized ion types to look for new dissociation channels
  • Customized MODs for the search of new modifications or labeled proteins
  • MS/MS spectrum annotation with matching fragments
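The bi-section idea can be sketched with Python's standard `bisect` module: with theoretical m/z values kept sorted, the candidates inside a ppm window are found by two binary searches instead of a full scan. The function name is hypothetical, and the window is computed around the observed m/z for simplicity; the m/z values are the first five from the Step 4 slide.

```python
from bisect import bisect_left, bisect_right

def candidates_in_window(sorted_mz, target_mz, tol_ppm):
    """Indices of entries in sorted_mz within tol_ppm of target_mz."""
    delta = target_mz * tol_ppm * 1e-6
    lo = bisect_left(sorted_mz, target_mz - delta)
    hi = bisect_right(sorted_mz, target_mz + delta)
    return list(range(lo, hi))

db = [149.07431, 277.132888, 390.216952, 537.285366, 636.353779]
print(candidates_in_window(db, 277.13278, tol_ppm=10.0))
```

Each lookup costs O(log N) comparisons, which matters because an iE-based database holds many more data points than a mass-only one.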
Conclusions
• An as-confident-as-you-choose protein database search algorithm, iMEF, has been created and implemented in the search engine ProteinGoggle
• The principle of iMEF with ProteinGoggle is demonstrated with the identification of ubiquitin from its tandem mass spectrum using ETD
• iMEF as implemented in ProteinGoggle has been able to unwrap complex overlapping isotopic envelopes and confidently provide embedded fragment ions
• iMEF could be adapted for peptide and glycan database search with customized databases
Acknowledgements: DNL2003; Li Li; Bo Wang; Jing Li; Xu Zhao; The KENES. Co. Ltd.; Miao Zhou; Shijin Liu; Bin Yang. Funding: DICP “Research Start”; China “Youth 1000-talents Theme”