490 likes | 508 Views
Learn about building and utilizing reference libraries of peptide mass spectra for accurate protein identification. Strategies for enhancing library reliability, reproducibility, and sensitivity are discussed, along with tips for constructing and validating a robust spectral database.
E N D
Building and Using Reference Libraries of Peptide Mass Spectra Lisa Kilpatrick, Michael Mautner, Pedi Neta, Jeri Roth, Steve Stein Mass Spectrometry Data Center
Old Chemical Sources Source: www.phoenixmarcom.co.uk
New Chemical Sources Source: www.uic.edu/.../bios100/ summer2002/lect05.htm
Reference Library Concerns • Reproducibility • MS/MS - Bad Reputation • Coverage • False Negatives
Measuring Reproducibility • Widely Used Measure of Spectral Similarity • Dot Product of Replicate Spectra • Expressed normalized vectors • a.k.a, INCOS Match Factor, Cosine of Angle, Contrast Angle, ‘Correlation’, … • Varies with Signal To Noise • Maximum/Median Abundance • Robust measure for raw spectra
High S/N Spectra are Reproducible Quad IT Linear IT AEFVEVTK/+1
Low S/N Spectra are not so Reproducible S/N ~ 10 S/N ~ 5
Low S/N = Variable Abundance + Noise S/N ~ 10 S/N > 100
Variability Depends on S/N ~9,000 Radiodurans Peptides (PNNL/NCRR)
False Negatives • Identified peptides must be in the library • Or derived from library • Fortunately, number of digest peptides limited by genome and concentration • 100,000 – 1,000,000 OK • And, well-identified peptides are observed multiple times • Replicates allow spurious peak removal
Library Construction • Extract spectra • Extract spectra that identify peptides from results of LC-MS/MS experiments • Create ‘consensus’ spectra • Process all spectra that identified a peptide ion • Assign probability of being correct • For each spectrum, based on input and computed scores • Create searchable library • Check consistency and annotate
Spectrum 1 Spectrum 2 ConsensusSpectrum Create Consensus Spectrum from Replicate Spectra
Select Best Spectra • Cluster using dot product • Find closest spectrum pair, accept nearby spectra • Limit spectra from single source • Extract peaks that appear in more than one half of spectra with sufficient S/N
Confirm Peptide Identity • Develop measures of spectrum/structure consistency • separate true and false positive identifications • Derive confidence probabilities • Use later for protein identification
Discrimination Factors • P1 = N1(true)/N2(false) • P2 = P1 * N2(true)/N2(false) • P = P Pn(if independent) • N1 = sequence search score • N2 = match theoretical spectrum • N3 = unassigned abundance • N4 = Y/B correlations • N5 = Y/B sequence length
Sequence Match Scores • Match Target Peptide Sequence against measured peaks (m/z) • Mascot, Omssa, Sequest, X!Tandem, ….
Tryptic Only Probabilities True False Sequence Score
Theoretical Spectrum • For each AA • Ratio of cleavage left vs. right for each AA • Abundance of left + right fragments • 2 sets: H+ > # Bases and H+ <= # Bases • Derive 80 parameters from reliably identified peptide ions • Disfavor fragmentation involving nearby charges • Account for cleavage position trends
Create Library • Resolve inconsistencies • Similar spectra assigned to different ions, … • Refine probability • IDs per protein, … • Create library from consensus spectra • for P > 0.95 • Weight semitryptic, missed cleavage peptides • Include source information • For incomplete consensus spectra include selected spectra (best of replicates) – 10% • Include high score, singular spectra – 10%
Spectrum Similarity Sall peaks w search w lib • w = abundance * weight • S w2 = 1 • Reduce significance for: • isotopic peaks • common loss from precursor • low m/z
Spectrum Match Search Spectrum Hit Distribution Hit List Library (Consensus) Spectrum AAAINIIPTSTGAAK/2+
Locate Similar Spectra AAAINIIPTSTGAAK/2+ AAAINIIPTSTGAAK/1+ AAALNI…
Spectrum/Sequence Scores Vary More than Spectrum/Spectrum Scores Sequence score
Sample ApplicationMycobacterium Smegmatis (OPD) Created a library 2739 consensus spectra from 28 series of 2D LC-MS/MS analyses: For One Series: 948 Different peptide ions identified by popular search method Library Search Results: 924 Of these peptides were re-matched 332 Peptides not identified in that series were identified 24 Peptides not re-matched were poor matches Doubled the number of spectra that match peptides
Applications • Identify of previously identified peptides • Pre- or post-processing sequence search • msec, reliable, annotation • Find recurring, unidentified compounds • Build libraries of unidentified spectra • Derive modified peptide spectra • Target peptides/proteins • Internal standards • General resource for peptide fragmentation
Thanks! • For the Data • Open Proteomics Database – Feasibility • Peptide Atlas/ISB • Global Proteome Machine • DOE/PNNL/NCRR • HUPO/PPP • Markey/Wilmarth/Gygi/Hogue/Kolker/… • NIH
Spectrum variability due to S/N Most replicates are similar
Consecutive Y or B Ions Mascot Score = 25
>50% of Ids have low confidence Inadequate Separation Between True and False Identifications
Are Spectra Reproducible? A Widespread Concern – even for a single instrument Variations in energy, time, collision gas, matrix, …