Building and Using Reference Libraries of Peptide Mass Spectra

Lisa Kilpatrick, Michael Mautner, Pedi Neta, Jeri Roth, Steve Stein. Mass Spectrometry Data Center.

Presentation Transcript


  1. Building and Using Reference Libraries of Peptide Mass Spectra • Lisa Kilpatrick, Michael Mautner, Pedi Neta, Jeri Roth, Steve Stein • Mass Spectrometry Data Center

  2. Old Chemical Sources • Source: www.phoenixmarcom.co.uk

  3. Traditional EI Library Search

  4. New Chemical Sources • Source: www.uic.edu/.../bios100/summer2002/lect05.htm

  5. Peptide MS/MS Search

  6. Reference Library Concerns • Reproducibility • MS/MS - Bad Reputation • Coverage • False Negatives

  7. Measuring Reproducibility • Widely used measure of spectral similarity • Dot product of replicate spectra • Expressed as normalized vectors • a.k.a. INCOS Match Factor, Cosine of Angle, Contrast Angle, ‘Correlation’, … • Varies with signal to noise • Maximum/median abundance • Robust measure for raw spectra
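
The two measures named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes both spectra are already binned onto the same m/z grid, and it assumes the median in "maximum/median abundance" is taken over nonzero peaks only. The function names and the example abundances are hypothetical.

```python
import math
from statistics import median

def normalize(abundances):
    """Scale an abundance vector to unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in abundances))
    if norm == 0:
        raise ValueError("spectrum has no signal")
    return [a / norm for a in abundances]

def dot_product(spec_a, spec_b):
    """Cosine of the angle between two spectra binned on the same m/z grid."""
    if len(spec_a) != len(spec_b):
        raise ValueError("spectra must share the same m/z bins")
    return sum(x * y for x, y in zip(normalize(spec_a), normalize(spec_b)))

def signal_to_noise(abundances):
    """Maximum/median abundance over nonzero peaks (assumed definition)."""
    peaks = [a for a in abundances if a > 0]
    return max(peaks) / median(peaks)

# Hypothetical replicate spectra of the same peptide ion.
replicate_1 = [10.0, 0.0, 50.0, 5.0, 100.0]
replicate_2 = [12.0, 1.0, 45.0, 0.0, 100.0]

score = dot_product(replicate_1, replicate_2)
snr = signal_to_noise(replicate_1)
```

A score near 1 means the replicates agree; identical spectra give exactly 1 and spectra with no peaks in common give 0.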

  8. High S/N Spectra are Reproducible (panels: Quad IT, Linear IT; peptide ion AEFVEVTK/+1)

  9. Low S/N Spectra are not so Reproducible (panels: S/N ~ 10, S/N ~ 5)

  10. Low S/N = Variable Abundance + Noise (panels: S/N ~ 10, S/N > 100)

  11. Variability Depends on S/N • ~9,000 Radiodurans peptides (PNNL/NCRR)

  12. High Correlation at Higher S/N

  13. Most IDs Involve Reproducible Spectra

  14. False Negatives • Identified peptides must be in the library • Or derived from library • Fortunately, number of digest peptides limited by genome and concentration • 100,000 – 1,000,000 OK • And, well-identified peptides are observed multiple times • Replicates allow spurious peak removal

  15. Reliable IDs are Made Multiple Times

  16. Library Construction • Extract spectra • Extract spectra that identify peptides from results of LC-MS/MS experiments • Create ‘consensus’ spectra • Process all spectra that identified a peptide ion • Assign probability of being correct • For each spectrum, based on input and computed scores • Create searchable library • Check consistency and annotate

  17. Create Consensus Spectrum from Replicate Spectra (figure: Spectrum 1, Spectrum 2, Consensus Spectrum)

  18. Select Best Spectra • Cluster using dot product • Find closest spectrum pair, accept nearby spectra • Limit spectra from single source • Extract peaks that appear in more than one half of spectra with sufficient S/N
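
The selection and consensus steps on this slide might look roughly like the sketch below. It is an illustration under stated assumptions: spectra are dicts of integer m/z bin to abundance, the 0.7 acceptance threshold is arbitrary, consensus abundances are simple averages, and the slide's S/N filter and per-source limit are left out. None of these choices come from the presentation.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Dot product of two spectra (dicts of m/z bin -> abundance) as unit vectors."""
    shared = set(a) & set(b)
    num = sum(a[k] * b[k] for k in shared)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def select_cluster(spectra, min_score=0.7):
    """Seed with the closest pair, then accept spectra similar to either seed."""
    if len(spectra) < 2:
        return list(spectra)
    i, j = max(combinations(range(len(spectra)), 2),
               key=lambda p: cosine(spectra[p[0]], spectra[p[1]]))
    seeds = (spectra[i], spectra[j])
    return [s for s in spectra
            if max(cosine(s, seed) for seed in seeds) >= min_score]

def consensus(spectra):
    """Keep peaks present in more than half of the spectra; average their abundance."""
    counts = {}
    for s in spectra:
        for mz in s:
            counts[mz] = counts.get(mz, 0) + 1
    kept = sorted(mz for mz, n in counts.items() if n > len(spectra) / 2)
    return {mz: sum(s.get(mz, 0.0) for s in spectra) / counts[mz] for mz in kept}

# Hypothetical replicates: three similar spectra and one unrelated spectrum.
replicates = [
    {175: 30.0, 304: 100.0, 417: 55.0},
    {175: 28.0, 304: 100.0, 417: 60.0, 250: 4.0},
    {175: 33.0, 304: 100.0, 417: 50.0},
    {120: 100.0, 500: 80.0},
]
cluster = select_cluster(replicates)
result = consensus(cluster)
```

In this example the unrelated spectrum is rejected by the clustering step, and the peak at m/z 250, seen in only one of three accepted replicates, is dropped as spurious.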

  19. Confirm Peptide Identity • Develop measures of spectrum/structure consistency • separate true and false positive identifications • Derive confidence probabilities • Use later for protein identification

  20. Discrimination Factors • P1 = N1(true)/N1(false) • P2 = P1 × N2(true)/N2(false) • P = Π Pn (if independent) • N1 = sequence search score • N2 = match theoretical spectrum • N3 = unassigned abundance • N4 = Y/B correlations • N5 = Y/B sequence length
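
One way to read this slide is as a product of true/false ratios, one per factor, which is only valid when the factors are independent. The sketch below follows that reading. The counts are invented for illustration, and the final conversion from odds to a probability (odds / (1 + odds)) is an assumption added here; the slide itself only gives the ratios.

```python
from math import prod

def likelihood_ratio(n_true, n_false):
    """Ratio of true to false identifications observed at a given factor value."""
    if n_false == 0:
        raise ValueError("no false identifications in this bin; ratio undefined")
    return n_true / n_false

def combined_ratio(ratios):
    """Product of per-factor ratios, valid only if the factors are independent."""
    return prod(ratios)

def odds_to_probability(odds):
    """Convert true:false odds into a probability (assumed conversion)."""
    return odds / (1.0 + odds)

# Hypothetical counts of true and false identifications for three factors.
ratios = [
    likelihood_ratio(900, 100),  # N1: sequence search score
    likelihood_ratio(300, 100),  # N2: match to theoretical spectrum
    likelihood_ratio(200, 100),  # N3: unassigned abundance
]
total = combined_ratio(ratios)
probability = odds_to_probability(total)
```

With these made-up counts the combined ratio is 9 × 3 × 2 = 54, which corresponds to a probability of 54/55 under the assumed conversion.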

  21. Sequence Match Scores • Match target peptide sequence against measured peaks (m/z) • Mascot, OMSSA, SEQUEST, X!Tandem, …

  22. Tryptic Only Probabilities (plot: true and false distributions vs. sequence score)

  23. Tryptic/Semitryptic Probabilities

  24. Theoretical Spectrum • For each AA • Ratio of cleavage left vs. right for each AA • Abundance of left + right fragments • 2 sets: H+ > # Bases and H+ <= # Bases • Derive 80 parameters from reliably identified peptide ions • Disfavor fragmentation involving nearby charges • Account for cleavage position trends

  25. Observed/Theory Match at Fixed Sequence Score

  26. Distribution of Unassigned Abundances

  27. ROC Plot (Before/After)

  28. Create Library • Resolve inconsistencies • Similar spectra assigned to different ions, … • Refine probability • IDs per protein, … • Create library from consensus spectra • for P > 0.95 • Weight semitryptic, missed cleavage peptides • Include source information • For incomplete consensus spectra include selected spectra (best of replicates) – 10% • Include high score, singular spectra – 10%
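
The P > 0.95 cut and the "include source information" step could be sketched as follows. The record layout, the field names, and the rule of keeping the most probable entry per peptide ion are all assumptions made for this example; the slide does not say how duplicates or inconsistencies are resolved.

```python
from dataclasses import dataclass

@dataclass
class ConsensusEntry:
    peptide: str        # peptide sequence
    charge: int         # precursor charge state
    probability: float  # estimated probability that the identification is correct
    source: str         # where the spectra came from (hypothetical label)

def build_library(entries, min_probability=0.95):
    """Keep entries with probability above the threshold, one per peptide ion."""
    library = {}
    for e in entries:
        if e.probability > min_probability:
            key = (e.peptide, e.charge)
            # If two entries claim the same ion, keep the more probable one.
            if key not in library or e.probability > library[key].probability:
                library[key] = e
    return library

# Hypothetical candidate entries.
candidates = [
    ConsensusEntry("AAAINIIPTSTGAAK", 2, 0.97, "series_A"),
    ConsensusEntry("AAAINIIPTSTGAAK", 2, 0.99, "series_B"),
    ConsensusEntry("AEFVEVTK", 1, 0.90, "series_A"),
]
library = build_library(candidates)
```

Here only the AAAINIIPTSTGAAK/2+ entry from the hypothetical "series_B" survives, because the AEFVEVTK/1+ entry falls below the threshold.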

  29. Spectrum Similarity • Σ(all peaks) w_search × w_lib • w = abundance × weight • Σ w² = 1 • Reduce significance for: • isotopic peaks • common loss from precursor • low m/z
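
The weighted similarity on this slide can be sketched as below. The weighting function is purely hypothetical: it halves peaks under an arbitrary low-m/z cutoff and peaks at a listed precursor-loss m/z, and it does not handle isotopic peaks at all. Only the overall form (w = abundance × weight, normalized so that the squared weights sum to 1, then summed over shared peaks) comes from the slide.

```python
import math

def weighted_vector(peaks, weight_for):
    """Return {m/z: w} with w = abundance * weight, scaled so sum(w**2) == 1."""
    raw = {mz: ab * weight_for(mz) for mz, ab in peaks.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    if norm == 0:
        raise ValueError("no weighted signal")
    return {mz: w / norm for mz, w in raw.items()}

def similarity(search, library, weight_for):
    """Sum of w_search * w_lib over the peaks the two spectra share."""
    ws = weighted_vector(search, weight_for)
    wl = weighted_vector(library, weight_for)
    return sum(ws[mz] * wl[mz] for mz in ws.keys() & wl.keys())

def make_weight(precursor_loss_mz, low_mz_cutoff=200, penalty=0.5):
    """Hypothetical weighting: down-weight low m/z and precursor-loss peaks."""
    def weight_for(mz):
        w = 1.0
        if mz < low_mz_cutoff:
            w *= penalty
        if mz in precursor_loss_mz:
            w *= penalty
        return w
    return weight_for

# Hypothetical spectra; m/z 640 stands in for a common loss from the precursor.
weight_for = make_weight(precursor_loss_mz={640})
search = {147: 40.0, 304: 100.0, 417: 60.0, 640: 80.0}
library = {147: 35.0, 304: 100.0, 417: 65.0, 640: 20.0}

score = similarity(search, library, weight_for)
unweighted = similarity(search, library, lambda mz: 1.0)
```

In this example the two spectra disagree mainly at the precursor-loss peak, so down-weighting that peak raises the score relative to the unweighted dot product.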

  30. Spectrum Match (screenshot panels: Search Spectrum, Hit Distribution, Hit List, Library (Consensus) Spectrum; example ion AAAINIIPTSTGAAK/2+)

  31. Locate Similar Spectra (hits shown: AAAINIIPTSTGAAK/2+, AAAINIIPTSTGAAK/1+, AAALNI…)

  32. Spectrum/Sequence Scores Vary More than Spectrum/Spectrum Scores (plot axis: sequence score)

  33. Spectrum/Spectrum Score Separates True and False Positives

  34. Sample Application: Mycobacterium smegmatis (OPD) • Created a library of 2739 consensus spectra from 28 series of 2D LC-MS/MS analyses • For one series: 948 different peptide ions identified by a popular search method • Library search results: • 924 of these peptides were re-matched • 332 peptides not identified in that series were identified • 24 peptides not re-matched were poor matches • Doubled the number of spectra that match peptides
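
The counts on this slide can be checked with a little arithmetic. The numbers are taken from the slide; the derived rates are simple ratios computed here, not figures the authors reported.

```python
identified_by_sequence_search = 948  # peptide ions found by the sequence search
rematched = 924                      # of those, re-matched by the library search
not_rematched = 24                   # of those, poor matches that were not re-matched
newly_identified = 332               # found by the library search only

# The re-matched and not re-matched peptides account for all 948.
assert rematched + not_rematched == identified_by_sequence_search

rematch_rate = rematched / identified_by_sequence_search
library_total = rematched + newly_identified
gain = newly_identified / identified_by_sequence_search
```

At the peptide-ion level this works out to a re-match rate of about 97.5% and about 35% additional identifications (1256 in total). The "doubled" figure on the slide counts matching spectra, not distinct peptide ions.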

  35. State of the Library

  36. Applications • Identification of previously identified peptides • Pre- or post-processing sequence search • msec, reliable, annotation • Find recurring, unidentified compounds • Build libraries of unidentified spectra • Derive modified peptide spectra • Target peptides/proteins • Internal standards • General resource for peptide fragmentation

  37. Thanks! • For the Data • Open Proteomics Database – Feasibility • Peptide Atlas/ISB • Global Proteome Machine • DOE/PNNL/NCRR • HUPO/PPP • Markey/Wilmarth/Gygi/Hogue/Kolker/… • NIH

  38. Spectrum variability due to S/N • Most replicates are similar

  39. PeptideAtlas.org (ISB, Seattle)

  40. Consecutive Y or B Ions (Mascot Score = 25)

  41. PNNL: Radiodurans

  42. >50% of IDs have low confidence • Inadequate separation between true and false identifications

  43. Are Spectra Reproducible? • A widespread concern – even for a single instrument • Variations in energy, time, collision gas, matrix, …

  44. Variability Depends on S/N

  45. Y/B Ion Abundance Correlation
