Building and Using Reference Libraries of Peptide Mass Spectra

Lisa Kilpatrick, Michael Mautner, Pedi Neta, Jeri Roth, Steve Stein • Mass Spectrometry Data Center

Old Chemical Sources Source: www.phoenixmarcom.co.uk

Traditional EI Library Search

New Chemical Sources Source: www.uic.edu/.../bios100/ summer2002/lect05.htm

Peptide MS/MS Search

Reference Library Concerns • Reproducibility • MS/MS - Bad Reputation • Coverage • False Negatives

Measuring Reproducibility • Widely Used Measure of Spectral Similarity • Dot Product of Replicate Spectra • Expressed as normalized vectors • a.k.a. INCOS Match Factor, Cosine of Angle, Contrast Angle, ‘Correlation’, … • Varies with Signal To Noise • Maximum/Median Abundance • Robust measure for raw spectra
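
The dot-product measure and the maximum/median S/N estimate listed above can be sketched in a few lines. This is only an illustration, assuming each spectrum is a dict that maps a binned m/z value to an abundance; the function names and the binning are assumptions made for this example and are not taken from the presentation.

```python
import math
from statistics import median

def dot_product_similarity(spec_a, spec_b):
    """Cosine of the angle between two spectra treated as abundance vectors."""
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    # Only peaks present in both spectra contribute to the dot product.
    shared = spec_a.keys() & spec_b.keys()
    return sum(spec_a[mz] * spec_b[mz] for mz in shared) / (norm_a * norm_b)

def signal_to_noise(spec):
    """S/N estimated as maximum abundance divided by median abundance."""
    abundances = list(spec.values())
    mid = median(abundances)
    if mid == 0:
        return float("inf")
    return max(abundances) / mid

# Hypothetical replicate spectra (binned m/z -> abundance).
replicate_1 = {175: 20.0, 276: 100.0, 375: 55.0, 504: 10.0}
replicate_2 = {175: 25.0, 276: 100.0, 375: 50.0, 603: 5.0}
print(dot_product_similarity(replicate_1, replicate_2))
print(signal_to_noise(replicate_1))
```

Two identical spectra score 1, and spectra with no peaks in common score 0.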

High S/N Spectra are Reproducible (figure panels: Quad IT, Linear IT; peptide AEFVEVTK/+1)

Low S/N Spectra are not so Reproducible (figure panels: S/N ~ 10, S/N ~ 5)

Low S/N = Variable Abundance + Noise (figure panels: S/N ~ 10, S/N > 100)

Variability Depends on S/N ~9,000 Radiodurans Peptides (PNNL/NCRR)

High Correlation at Higher S/N

Most IDs Involve Reproducible Spectra

False Negatives • Identified peptides must be in the library • Or derived from library • Fortunately, number of digest peptides limited by genome and concentration • 100,000 – 1,000,000 OK • And, well-identified peptides are observed multiple times • Replicates allow spurious peak removal

Reliable IDs are Made Multiple Times

Library Construction • Extract spectra • Extract spectra that identify peptides from results of LC-MS/MS experiments • Create ‘consensus’ spectra • Process all spectra that identified a peptide ion • Assign probability of being correct • For each spectrum, based on input and computed scores • Create searchable library • Check consistency and annotate

Create Consensus Spectrum from Replicate Spectra (figure panels: Spectrum 1, Spectrum 2, Consensus Spectrum)

Select Best Spectra • Cluster using dot product • Find closest spectrum pair, accept nearby spectra • Limit spectra from single source • Extract peaks that appear in more than one half of spectra with sufficient S/N
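
The peak-selection rule in the last bullet can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the replicates have already been clustered and are given as dicts of binned m/z to abundance, it averages abundances across replicates (an assumption), and it leaves out the S/N check. The name `consensus_spectrum` is hypothetical.

```python
from collections import defaultdict

def consensus_spectrum(replicates, min_fraction=0.5):
    """Keep peaks seen in more than `min_fraction` of replicates.

    The abundance of each kept peak is the mean over the replicates
    in which the peak appears (an assumption of this sketch).
    """
    seen = defaultdict(list)
    for spec in replicates:
        for mz, abundance in spec.items():
            seen[mz].append(abundance)
    threshold = min_fraction * len(replicates)
    return {
        mz: sum(vals) / len(vals)
        for mz, vals in sorted(seen.items())
        if len(vals) > threshold
    }

# Hypothetical replicates: m/z 999 and 375 each appear only once and are dropped.
replicates = [
    {175: 20.0, 276: 100.0, 999: 3.0},
    {175: 30.0, 276: 100.0},
    {276: 90.0, 375: 40.0},
]
print(consensus_spectrum(replicates))
```

Requiring a peak in more than half of the replicates is what removes spurious peaks that show up in only one acquisition.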

Confirm Peptide Identity • Develop measures of spectrum/structure consistency • separate true and false positive identifications • Derive confidence probabilities • Use later for protein identification

Discrimination Factors • P1 = N1(true)/N1(false) • P2 = P1 * N2(true)/N2(false) • P = Π Pn (if independent) • N1 = sequence search score • N2 = match theoretical spectrum • N3 = unassigned abundance • N4 = Y/B correlations • N5 = Y/B sequence length
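
Under the independence assumption, combining the factors amounts to multiplying the per-factor true/false ratios. The sketch below assumes each ratio has already been read off the corresponding true and false distributions. The conversion of the combined ratio to a probability assumes equal prior odds; that step is an assumption of this example and is not stated in the presentation. Both function names are hypothetical.

```python
import math

def combined_ratio(ratios):
    """Product of per-factor true/false ratios (valid only if factors are independent)."""
    return math.prod(ratios)

def ratio_to_probability(ratio, prior_odds=1.0):
    """Turn a true/false ratio into a probability, given assumed prior odds."""
    odds = ratio * prior_odds
    return odds / (1.0 + odds)

# Hypothetical ratios for three discrimination factors.
ratios = [4.0, 2.5, 2.0]
total = combined_ratio(ratios)
print(total, ratio_to_probability(total))
```

If the factors are correlated, the product overstates the evidence, which is why the independence condition matters.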

Sequence Match Scores • Match Target Peptide Sequence against measured peaks (m/z) • Mascot, OMSSA, SEQUEST, X!Tandem, …

Tryptic Only Probabilities (figure: True and False distributions vs. Sequence Score)

Tryptic/Semitryptic Probabilities

Theoretical Spectrum • For each AA • Ratio of cleavage left vs. right for each AA • Abundance of left + right fragments • 2 sets: H+ > # Bases and H+ <= # Bases • Derive 80 parameters from reliably identified peptide ions • Disfavor fragmentation involving nearby charges • Account for cleavage position trends

Observed/Theory Match at Fixed Sequence Score

Distribution of Unassigned Abundances

ROC Plot (Before/After)

Create Library • Resolve inconsistencies • Similar spectra assigned to different ions, … • Refine probability • IDs per protein, … • Create library from consensus spectra • for P > 0.95 • Weight semitryptic, missed cleavage peptides • Include source information • For incomplete consensus spectra include selected spectra (best of replicates) – 10% • Include high score, singular spectra – 10%

Spectrum Similarity • Similarity = Σ (all peaks) w_search * w_lib • w = abundance * weight • Σ w² = 1 • Reduce significance for: • isotopic peaks • common loss from precursor • low m/z
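
A minimal sketch of this weighted similarity, assuming spectra are dicts of binned m/z to abundance. The weighting function `example_weight` is purely hypothetical: it only down-weights low m/z peaks below an assumed cutoff and does not handle isotopic peaks or precursor losses, and the cutoff and weight values are invented for illustration.

```python
import math

def weighted_vector(spec, weight_for):
    """Return w = abundance * weight per peak, scaled so that sum(w^2) = 1."""
    raw = {mz: abundance * weight_for(mz) for mz, abundance in spec.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    if norm == 0:
        return {}
    return {mz: w / norm for mz, w in raw.items()}

def spectrum_similarity(search, library, weight_for):
    """Sum over shared peaks of w_search * w_lib."""
    w_search = weighted_vector(search, weight_for)
    w_lib = weighted_vector(library, weight_for)
    return sum(w_search[mz] * w_lib[mz] for mz in w_search.keys() & w_lib.keys())

def example_weight(mz, low_mz_cutoff=200, low_mz_weight=0.5):
    """Hypothetical weighting: reduce the significance of low m/z peaks."""
    return low_mz_weight if mz < low_mz_cutoff else 1.0

# Hypothetical search and library spectra.
search = {147: 80.0, 276: 100.0, 375: 55.0}
library = {147: 60.0, 276: 100.0, 375: 50.0, 504: 10.0}
print(spectrum_similarity(search, library, example_weight))
```

Because each weighted vector is normalized to unit length, the score stays between 0 and 1 for non-negative abundances.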

Spectrum Match (figure panels: Search Spectrum, Hit Distribution, Hit List, Library (Consensus) Spectrum; peptide AAAINIIPTSTGAAK/2+)

Locate Similar Spectra (hit list: AAAINIIPTSTGAAK/2+, AAAINIIPTSTGAAK/1+, AAALNI…)

Spectrum/Sequence Scores Vary More than Spectrum/Spectrum Scores (figure axis: Sequence score)

Spectrum/Spectrum Score Separates True and False Positives

Sample Application: Mycobacterium smegmatis (OPD) • Created a library of 2,739 consensus spectra from 28 series of 2D LC-MS/MS analyses • For one series: 948 different peptide ions identified by a popular search method • Library search results: • 924 of these peptides were re-matched • 332 peptides not identified in that series were identified • 24 peptides not re-matched were poor matches • Doubled the number of spectra that match peptides
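
The counts reported for the single series can be checked with a few lines of arithmetic. The variable names are chosen for this example; the derived re-match rate and library total are computed here from the reported counts and do not appear in the presentation.

```python
# Counts reported for one series of analyses.
identified_by_sequence_search = 948
rematched = 924
newly_identified = 332

# Peptide ions found by sequence search but not re-matched by the library.
not_rematched = identified_by_sequence_search - rematched

# Fraction of sequence-search identifications recovered by library search.
rematch_rate = rematched / identified_by_sequence_search

# Total peptide ions identified by library search in this series.
library_total = rematched + newly_identified

print(not_rematched, round(rematch_rate, 3), library_total)
```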

State of the Library

Applications • Identification of previously identified peptides • Pre- or post-processing sequence search • msec, reliable, annotation • Find recurring, unidentified compounds • Build libraries of unidentified spectra • Derive modified peptide spectra • Target peptides/proteins • Internal standards • General resource for peptide fragmentation

Thanks! • For the Data • Open Proteomics Database – Feasibility • Peptide Atlas/ISB • Global Proteome Machine • DOE/PNNL/NCRR • HUPO/PPP • Markey/Wilmarth/Gygi/Hogue/Kolker/… • NIH

Spectrum variability due to S/N Most replicates are similar

PeptideAtlas.org (ISB, Seattle)

Consecutive Y or B Ions Mascot Score = 25

PNNL: Radiodurans

>50% of IDs have low confidence • Inadequate Separation Between True and False Identifications

Are Spectra Reproducible? A Widespread Concern – even for a single instrument Variations in energy, time, collision gas, matrix, …

Variability Depends on S/N

Y/B Ion Abundance Correlation

bioinformatics.icmb.utexas/OPD ~400,000 peptide mass spectra
