490 likes | 503 Views
Building and Using Reference Libraries of Peptide Mass Spectra. Lisa Kilpatrick, Michael Mautner, Pedi Neta, Jeri Roth, Steve Stein. Mass Spectrometry Data Center. Old Chemical Sources. Source: www.phoenixmarcom.co.uk. Traditional EI Library Search. New Chemical Sources.
E N D
Building and Using Reference Libraries of Peptide Mass Spectra Lisa Kilpatrick, Michael Mautner, Pedi Neta, Jeri Roth, Steve Stein Mass Spectrometry Data Center
Old Chemical Sources Source: www.phoenixmarcom.co.uk
New Chemical Sources Source: www.uic.edu/.../bios100/ summer2002/lect05.htm
Reference Library Concerns • Reproducibility • MS/MS - Bad Reputation • Coverage • False Negatives
Measuring Reproducibility • Widely Used Measure of Spectral Similarity • Dot Product of Replicate Spectra • Expressed normalized vectors • a.k.a, INCOS Match Factor, Cosine of Angle, Contrast Angle, ‘Correlation’, … • Varies with Signal To Noise • Maximum/Median Abundance • Robust measure for raw spectra
High S/N Spectra are Reproducible Quad IT Linear IT AEFVEVTK/+1
Low S/N Spectra are not so Reproducible S/N ~ 10 S/N ~ 5
Low S/N = Variable Abundance + Noise S/N ~ 10 S/N > 100
Variability Depends on S/N ~9,000 Radiodurans Peptides (PNNL/NCRR)
False Negatives • Identified peptides must be in the library • Or derived from library • Fortunately, number of digest peptides limited by genome and concentration • 100,000 – 1,000,000 OK • And, well-identified peptides are observed multiple times • Replicates allow spurious peak removal
Library Construction • Extract spectra • Extract spectra that identify peptides from results of LC-MS/MS experiments • Create ‘consensus’ spectra • Process all spectra that identified a peptide ion • Assign probability of being correct • For each spectrum, based on input and computed scores • Create searchable library • Check consistency and annotate
Spectrum 1 Spectrum 2 ConsensusSpectrum Create Consensus Spectrum from Replicate Spectra
Select Best Spectra • Cluster using dot product • Find closest spectrum pair, accept nearby spectra • Limit spectra from single source • Extract peaks that appear in more than one half of spectra with sufficient S/N
Confirm Peptide Identity • Develop measures of spectrum/structure consistency • separate true and false positive identifications • Derive confidence probabilities • Use later for protein identification
Discrimination Factors • P1 = N1(true)/N2(false) • P2 = P1 * N2(true)/N2(false) • P = P Pn(if independent) • N1 = sequence search score • N2 = match theoretical spectrum • N3 = unassigned abundance • N4 = Y/B correlations • N5 = Y/B sequence length
Sequence Match Scores • Match Target Peptide Sequence against measured peaks (m/z) • Mascot, Omssa, Sequest, X!Tandem, ….
Tryptic Only Probabilities True False Sequence Score
Theoretical Spectrum • For each AA • Ratio of cleavage left vs. right for each AA • Abundance of left + right fragments • 2 sets: H+ > # Bases and H+ <= # Bases • Derive 80 parameters from reliably identified peptide ions • Disfavor fragmentation involving nearby charges • Account for cleavage position trends
Create Library • Resolve inconsistencies • Similar spectra assigned to different ions, … • Refine probability • IDs per protein, … • Create library from consensus spectra • for P > 0.95 • Weight semitryptic, missed cleavage peptides • Include source information • For incomplete consensus spectra include selected spectra (best of replicates) – 10% • Include high score, singular spectra – 10%
Spectrum Similarity Sall peaks w search w lib • w = abundance * weight • S w2 = 1 • Reduce significance for: • isotopic peaks • common loss from precursor • low m/z
Spectrum Match Search Spectrum Hit Distribution Hit List Library (Consensus) Spectrum AAAINIIPTSTGAAK/2+
Locate Similar Spectra AAAINIIPTSTGAAK/2+ AAAINIIPTSTGAAK/1+ AAALNI…
Spectrum/Sequence Scores Vary More than Spectrum/Spectrum Scores Sequence score
Sample ApplicationMycobacterium Smegmatis (OPD) Created a library 2739 consensus spectra from 28 series of 2D LC-MS/MS analyses: For One Series: 948 Different peptide ions identified by popular search method Library Search Results: 924 Of these peptides were re-matched 332 Peptides not identified in that series were identified 24 Peptides not re-matched were poor matches Doubled the number of spectra that match peptides
Applications • Identify of previously identified peptides • Pre- or post-processing sequence search • msec, reliable, annotation • Find recurring, unidentified compounds • Build libraries of unidentified spectra • Derive modified peptide spectra • Target peptides/proteins • Internal standards • General resource for peptide fragmentation
Thanks! • For the Data • Open Proteomics Database – Feasibility • Peptide Atlas/ISB • Global Proteome Machine • DOE/PNNL/NCRR • HUPO/PPP • Markey/Wilmarth/Gygi/Hogue/Kolker/… • NIH
Spectrum variability due to S/N Most replicates are similar
Consecutive Y or B Ions Mascot Score = 25
>50% of Ids have low confidence Inadequate Separation Between True and False Identifications
Are Spectra Reproducible? A Widespread Concern – even for a single instrument Variations in energy, time, collision gas, matrix, …