Modification Site Localization • Why is this a problem? • Calculating localization reliability • Ways of representing reliability • Modification ambiguity
PTM Analysis: An Exploding Field • Large-scale PTM characterization studies are now common • Phosphorylation • O-GlcNAcylation • Acetylation • … • Database search engines can identify modified peptides and report a measure of reliability for peptide IDs • Peptide Level: p-value; e-value • Dataset Level: false discovery rate (FDR) • Most search engines do not assess the reliability of modification site assignments • No standard method for calculating the false localization rate (FLR)
Search Engine Performance for Site Assignment • Database search engines are optimized for peptide identification • Optimal parameters for discriminating between correct and random answers are not the same as for site identification • More peaks may be needed for site assignment • Reliability of modified peptide identifications is higher than that of PTM site assignments • What most search engines do: • Report a site consistent with the data • There may be more than one site equally consistent with the data • No information about how reliable the site assignment is Bradshaw et al. J Mass Spectrom (2010) 45 10 1095-1097
There are Mistakes In The Literature • There are several large-scale PTM datasets where site assignment was ‘by manual verification’. • Did the authors carefully look at 1000+ spectra? • Results from publications are used to populate other databases: Phosphosite, SwissProt
Evidence for Serine 486 Phosphorylation • Spectrum from publication reporting unambiguous assignment of serine 4 (serine 487) phosphorylation. Annotated spectra associated with publications are useful!
Why I highlighted this example • I found this modification site in my own data in 2006 • Figure: SwissProt entry of this protein in 2006
Site Assignment Scoring Methods (1) • Probability of randomly observing a given peak • A-Score (Gygi) • PTM Score (Mann) • Probability calculation based on unit mass measurement and assuming all masses equally possible at random: • e.g. if considering 4 peaks per 100 Da, then probability of random match of a given peak is 4% • A-score is a number; PTM score reports a probability • How valid are these assumptions? • Nominal mass may be appropriate for poor mass accuracy ion trap data, but not for high mass accuracy data • Could adjust probability calculation to more mass ‘bins’ • All masses are not equally probable; e.g. for b ions: • 201 – EA, LP, IP, TV 204 – Not possible • 202 – NS 205 – FG, CT • 203 – MA, CV, TT 206 – Not possible
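The random-match reasoning on this slide can be sketched as a cumulative binomial calculation. This is only a minimal sketch, assuming unit-mass bins and independent peaks as the slide describes. The function names and the example of 6 matched ions out of 8 site-determining ions are hypothetical, and the published A-Score and PTM Score implementations may differ in detail.

```python
from math import comb, log10

def random_match_probability(peaks_per_100_da: int) -> float:
    """Chance that one theoretical peak matches at random, assuming
    unit-mass bins and all masses equally likely (the slide's assumption)."""
    return peaks_per_100_da / 100.0

def cumulative_binomial(n: int, k: int, p: float) -> float:
    """Probability that at least k of n site-determining ions match by chance."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 4 peaks considered per 100 Da gives a 4% random-match chance per peak.
p = random_match_probability(4)

# Hypothetical spectrum: 6 of 8 site-determining ions were observed.
p_chance = cumulative_binomial(8, 6, p)

# Reported on a -10*log10 scale, so a larger score means a random
# explanation is less likely.
score = -10 * log10(p_chance)
```

Moving to more mass bins, as the slide suggests for high mass accuracy data, would lower `p` and raise the score for the same number of matched ions.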
Site Assignment Scoring Methods (2) • Score/probability difference • Compare search engine probabilities for peptide IDs with different site assignments • Mascot Delta Score • SLIP Score • e.g. Top scoring assignment: E-value: 1E-5 • Next best site assignment: E-value 1E-4; SLIP score=10 • Next best site assignment: E-value 1E-3; SLIP score=20 • Advantages: • Can be calculated as part of database search • Accounts for variation of probability of observing different masses • If search engine makes use of mass accuracy, score will adjust to data of different mass accuracy
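The worked example on this slide is consistent with a score defined as the difference between two E-values on a -10*log10 scale. The sketch below reproduces only that arithmetic. The function name `slip_score` is illustrative, and the actual Protein Prospector calculation may include details not shown on the slide.

```python
from math import log10

def slip_score(best_e_value: float, alt_e_value: float) -> float:
    """Difference between two site assignments on a -10*log10(E-value) scale.

    best_e_value: E-value of the top scoring site assignment.
    alt_e_value:  E-value of the next best assignment with a different site.
    """
    return 10 * log10(alt_e_value / best_e_value)

# The two cases from the slide: the alternative is 10x or 100x less likely.
print(round(slip_score(1e-5, 1e-4)))
print(round(slip_score(1e-5, 1e-3)))
```

Because the inputs are the search engine's own E-values, any mass accuracy or fragment-probability modelling the engine performs carries through to the score automatically.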
Assessing Reliability of Site Localization Scoring • Data from 180 synthetic phosphopeptides • Tested with a wide range of fragmentation data (CID, HCD, ETD, MSA…) • Comparison of Mascot Delta Score to A-score • SLIP Score in Protein Prospector • PhosphoRS used a different set of synthetic phosphopeptides Savitski et al. Mol Cell Proteomics (2011) M110.003830
SLIP Score vs A-Score vs MD-Score • Dataset: QTOF Micro CID data of 180 synthetic phosphopeptides • Modification sites known • Data searched by Mascot: 2174 correct spectrum matches • Data searched by PP: 2334 correct spectrum matches Baker et al. Mol Cell Proteomics (2011) M111.008078
Decoy Sites for Estimating PEP (Local FLR) • Test Dataset: Synaptic phosphopeptides acquired in LTQ-Orbitrap Velos (IT-CID): 70,000 phosphopeptide spectra identified • Altered Batch-Tag to allow for phosphorylation of Pro and Glu • Filtered results to only phosphopeptide IDs containing one S, T or Y • Modification site known • Local FLR: SLIP score of 6 = 95% correct • Global FLR (matches to phosphoP and phosphoE) similar to QTOF Micro data. • Similar score threshold appropriate for ion trap CID and quadrupole CID data
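The decoy-site idea can be sketched as a per-score-bin error estimate. This is a minimal sketch that assumes, as on the slide, that each peptide contains exactly one S, T or Y, so any phosphorylation assigned to Pro or Glu is a known wrong localization. The function name, the bin width and the toy input are hypothetical and are not data from the study.

```python
from collections import defaultdict

# Residues that were allowed in the search but cannot be the real site here.
DECOY_RESIDUES = {"P", "E"}

def local_flr_by_score(assignments, bin_width=2):
    """Estimate local FLR per score bin.

    assignments: iterable of (slip_score, assigned_residue) pairs.
    Returns {bin_start: fraction of assignments in the bin on decoy residues}.
    """
    totals = defaultdict(int)
    decoys = defaultdict(int)
    for score, residue in assignments:
        bin_start = int(score // bin_width) * bin_width
        totals[bin_start] += 1
        if residue in DECOY_RESIDUES:
            decoys[bin_start] += 1
    return {b: decoys[b] / totals[b] for b in sorted(totals)}

# Made-up toy input, for illustration only.
toy = [(1, "S"), (1, "P"), (3, "T"), (3, "E"), (3, "S"), (3, "S"), (7, "S"), (7, "Y")]
flr = local_flr_by_score(toy)
```

A score threshold can then be chosen as the lowest bin where the estimated local FLR falls below the level considered acceptable.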
Representing Ambiguity • VATVSVLATR – Singly phosphorylated • Phospho@5=3: Best site assignment with associated score. No information as to which is the second-best site. Example software: A-Score; Mascot Delta Score; SLIP Score • Phospho@3|5: Indicates inability to differentiate between two sites, either due to no information or confidence below a defined threshold. Example software: SLIP Score; VML Score • VAT(0.1)VS(0.89)VLAT(0.01)R: Probabilities for all potential site assignments within the peptide are reported. Example software: PTM Score / MaxQuant; PhosphoRS
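The per-residue probability notation is straightforward to parse, which makes it easy to convert into the other representations. The sketch below is an illustrative parser for strings of the form shown on the slide. The function name is hypothetical, and real tool output may use additional conventions not handled here.

```python
import re

def parse_site_probabilities(annotated: str):
    """Split e.g. 'VAT(0.1)VS(0.89)VLAT(0.01)R' into the bare sequence
    and a {1-based residue position: probability} map."""
    sequence = []
    probabilities = {}
    # Each match is one residue letter, optionally followed by '(probability)'.
    for residue, prob in re.findall(r"([A-Z])(?:\(([0-9.]+)\))?", annotated):
        sequence.append(residue)
        if prob:
            probabilities[len(sequence)] = float(prob)
    return "".join(sequence), probabilities

seq, probs = parse_site_probabilities("VAT(0.1)VS(0.89)VLAT(0.01)R")

# The most probable site, comparable to the 'Phospho@5' style of reporting.
best_site = max(probs, key=probs.get)
```

For a singly modified peptide the probabilities sum to 1, so the map also shows directly which site is second best, information the single-score notation omits.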
Representing Ambiguity • VATVSVLATR – Doubly phosphorylated • Phospho@3=12; Phospho@5=3: Best site assignments with associated scores. A separate score is calculated for each site assignment. Each score is in comparison to the best assignment not containing that particular modification site; i.e. the score for @3 is relative to when residues 5 and 9 are modified. • Phospho@3=12; Phospho@5|9: One site has a confidence measure; the other site does not. • VAT(0.95)VS(0.9)VLAT(0.15)R: Probabilities are combination probabilities for one of the two modifications.
Site-Level or Peptide-Level Assessment for Localization Reliability All current software reports reliability for individual site localizations, but software could in theory calculate a reliability for the combination of modifications reported: e.g. VAT(0.95)VS(0.9)VLAT(0.15)R could be reported as VAT(phospho)VS(phospho)VLATR with probability (0.95 x 0.9 =) 0.86
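The peptide-level figure in this example is the product of the two highest site probabilities. The sketch below reproduces that arithmetic under the assumption that the site probabilities can be treated as independent, which is the slide's simplification and may not hold in practice. The function name is hypothetical.

```python
from math import prod

def peptide_level_probability(site_probabilities, n_mods):
    """Multiply the n_mods highest site probabilities.

    Assumes the individual site probabilities are independent.
    """
    top = sorted(site_probabilities, reverse=True)[:n_mods]
    return prod(top)

# Doubly phosphorylated VATVSVLATR with site probabilities at T3, S5 and T9.
p = peptide_level_probability([0.95, 0.9, 0.15], n_mods=2)
```

The result is about 0.855, which the slide rounds to 0.86. It is lower than either individual site probability, which is why a peptide-level measure is stricter than a site-level one.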
Modification Ambiguity • Some modifications are isobaric • Acetyl vs Trimethyl; Phospho vs Sulfo; Ser->Thr vs Methyl • Some combinations of modifications are isobaric/isomeric with a single modification • Methyl + Methyl vs Dimethyl • Carbamidomethyl + Carbamidomethyl vs GlyGly (ubiquitin) • Carbamidomethyl + methyl vs propionamide (acrylamide) • Acetyl + K+/Ca2+ adduct vs phospho
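How close these alternatives are in mass can be checked with a few lines of arithmetic. The sketch below uses standard monoisotopic mass shifts of the kind listed in Unimod. The values and the helper name are additions for illustration and do not come from the slides.

```python
# Monoisotopic mass shifts in Da (standard values; not taken from the slides).
MONO = {
    "Acetyl": 42.010565,
    "Trimethyl": 42.046950,
    "Phospho": 79.966331,
    "Sulfo": 79.956815,
    "Methyl": 14.015650,
    "Dimethyl": 28.031300,
    "Carbamidomethyl": 57.021464,
    "GlyGly": 114.042927,
    "Propionamide": 71.037114,
}

def delta_mda(left, right):
    """Mass difference in mDa between two combinations of modifications."""
    return 1000 * (sum(MONO[m] for m in left) - sum(MONO[m] for m in right))

pairs = [
    (["Acetyl"], ["Trimethyl"]),
    (["Phospho"], ["Sulfo"]),
    (["Methyl", "Methyl"], ["Dimethyl"]),
    (["Carbamidomethyl", "Carbamidomethyl"], ["GlyGly"]),
    (["Carbamidomethyl", "Methyl"], ["Propionamide"]),
]
for left, right in pairs:
    print(" + ".join(left), "vs", " + ".join(right), f"{delta_mda(left, right):.3f} mDa")
```

Pairs that differ by tens of mDa can be separated with sufficient mass accuracy, whereas combinations with the same elemental composition differ by essentially zero and can only be distinguished by fragmentation evidence.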
Modification Ambiguity • Much of the published site localization software was written specifically for phosphorylation, so it will not work for other PTMs. • Site localization scoring based on search engine results should work for all modifications • SLIP score; Mascot Delta score; VML score • However, these scores will only be meaningful if the competing modification alternatives were considered in the initial database search • If carbamidomethyl modification of lysines or N-termini in addition to cysteines was not considered, then two carbamidomethyl modifications may not be considered as an alternative to ubiquitination. • Knowledge of which modifications were considered is relevant to evaluating site localization reliability
PTMs in Crosslinked Peptides For crosslinked peptides, ambiguity may be between peptides: CAMKER TMAKER Oxidation could be on methionine in either peptide.
What is an Acceptable FLR? • The 2012 iPRG study involved identification of modified peptides • Participants were asked to return results with 1% FDR at the PSM level • They were asked to indicate for which peptides they thought PTM site assignments were reliable • Modified peptides were spiked in, so correct site localizations were known • What was the reliability of the reported results?