MS/MS Libraries of Identified Peptides and Recurring Spectra in Protein Digests. Lisa Kilpatrick, Jeri Roth, Paul Rudnick, Xiaoyu Yang, Steve Stein. Mass Spectrometry Data Center. Library searching is not new. Organize for Reuse. MS Library Searching.
MS/MS Libraries of Identified Peptides and Recurring Spectra in Protein Digests Lisa Kilpatrick, Jeri Roth, Paul Rudnick, Xiaoyu Yang, Steve Stein Mass Spectrometry Data Center
Library searching is not new • Organize for Reuse
MS Library Searching • Hertz, Hites and Biemann, Anal. Chem. (1971) • PBM: McLafferty, Hertel, Villwock, Org. Mass Spectrom. (1974) • SISCOM: Damen, Henneberg, Weimann, Anal. Chim. Acta (1978) • INCOS: Sokolow, Karnofsky, Gustafson, Finnigan Application Report 2 (March 1978) • Stein, Scott, J. Amer. Soc. Mass Spectrom. (1994)
‘Dot Product’ (cosine of the ‘angle’ between a pair of spectra) • Measured = f(m/z, abundance) • Reference = f(m/z, abundance) • f(abundance): weight as you like • Sum over all peaks in common • Normalize
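The dot product described on this slide can be sketched in a few lines. This is a minimal illustration, not the scoring used by any particular search program: the function names, the m/z matching tolerance, and the default weighting exponents (square root of abundance, no m/z weighting) are all assumptions chosen for the example.

```python
import math

def weight(mz, abundance, mz_power=0.0, abundance_power=0.5):
    # f(m/z, abundance): "weight as you like". The default exponents
    # here are illustrative assumptions, not values from the slides.
    return (mz ** mz_power) * (abundance ** abundance_power)

def dot_product(measured, reference, tolerance=0.5, **powers):
    """Cosine of the angle between two spectra given as (m/z, abundance) lists."""
    m = [(mz, weight(mz, a, **powers)) for mz, a in measured]
    r = [(mz, weight(mz, a, **powers)) for mz, a in reference]
    shared = 0.0
    used = set()
    for mz_m, w_m in m:
        # Pair each measured peak with the nearest unused reference
        # peak that lies within the m/z tolerance (assumed value).
        best = None
        for j, (mz_r, w_r) in enumerate(r):
            if j in used:
                continue
            delta = abs(mz_m - mz_r)
            if delta <= tolerance and (best is None or delta < best[0]):
                best = (delta, j, w_r)
        if best is not None:
            used.add(best[1])
            shared += w_m * best[2]  # sum over all peaks in common
    # Normalize by the lengths of both weighted spectra.
    norm = math.sqrt(sum(w * w for _, w in m) * sum(w * w for _, w in r))
    return shared / norm if norm else 0.0
```

With this sketch, a spectrum compared against itself scores 1, and two spectra with no peaks in common score 0.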
Variability Depends on S/N • ~7,000 Radiodurans peptides, LCQ (PNNL/NCRR) • [Plot: medians]
Library Searching for Peptides • LIBQUEST (Yates): Yates et al., Anal. Chem. 1998, 70, 3557 • X!Hunter (Beavis): Craig et al., J. Proteome Res. 2006, 5, 1843 • BiblioSpec (MacCoss): Frewen et al., Anal. Chem. 2006, 78, 5678 • Spectral Comparison (Kearney): Liu et al., Proteome Science 2007, 5:3 • SpectraST (Aebersold): Lam et al., Proteomics 2007, 7, 655-667 • NIST Peptide Ion Fragmentation Library: June 2006 release (US-HUPO – March 2004)
Why Spectrum Libraries? • More sensitive • Better scoring • Faster • Annotation • Unrestricted precursor ion
Identification by Spectrum Matching is More Sensitive than by Spectrum/Sequence Matching • Simple Protein Mix
Spectrum/Spectrum Scores are More Robust than Sequence/Spectrum Scores • [Plot labels: 99% Confidence; Sequence score]
Matching Spectra is Faster than Matching Sequence • 0.005 s vs. 6.2 s per query spectrum
Reference Library Building • Extract identified spectra from sequence search • Multiple search engines • Instrument-class specific • Create ‘consensus’ spectra • Two or more matching spectra, also save best • Assign probability of being correct • Refine confidence starting from decoy FDR • Classify peptides – tryptic, missed cleavage, semi, mods • Create searchable spectral library • Resolve conflicts, add annotation
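The "create consensus spectra" step above might look roughly like the following. This is a sketch under stated assumptions, not the procedure used to build the library: the fixed-width m/z binning, the base-peak normalization, the median abundance, and the `min_fraction` rule are all illustrative choices, and `consensus_spectrum` is a hypothetical name.

```python
from collections import defaultdict
from statistics import median

def consensus_spectrum(replicates, bin_width=1.0, min_fraction=0.5):
    """Merge replicate spectra, each a non-empty list of (m/z, abundance)."""
    if len(replicates) < 2:
        # The slide calls for two or more matching spectra.
        raise ValueError("need two or more matching spectra")
    bins = defaultdict(list)
    for spectrum in replicates:
        base = max(a for _, a in spectrum)  # base peak of this replicate
        strongest = {}
        for mz, a in spectrum:
            key = round(mz / bin_width)
            # Keep only the strongest peak per bin within one replicate,
            # as an abundance relative to the base peak.
            strongest[key] = max(strongest.get(key, 0.0), a / base)
        for key, rel in strongest.items():
            bins[key].append(rel)
    # Keep peaks seen in at least min_fraction of the replicates
    # (an assumed rule) and report their median relative abundance.
    needed = min_fraction * len(replicates)
    return sorted(
        (key * bin_width, median(values))
        for key, values in bins.items()
        if len(values) >= needed
    )
```

A peak that shows up in only one of three replicates is dropped under the default `min_fraction`, which is one simple way to suppress noise that does not recur.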
Three Classes of Libraries I. Conventional Target Identification • Peptides (Proteins) II. Identifiable • By unconventional searching III. Not Identifiable • Account for all recurring spectra • QA/QC
I. OMSSA overlap with MS/MS Library Search • Identified spectra (1% FDR) for 1-D Yeast, NCI/CPTAC – Vanderbilt • [Venn diagram values: 1350, 747, 353, 1752, 318, 833; labels: 78K 6/07, 34K 6/06]
II. Identify What We Can • Derive Class-specific FDR • Tryptic • Simple • Expected missed cleavages • Unexpected missed cleavages • Semitryptic (cleaved tryptic) • No missed cleavage • In source (with parent at same retention) • In sample • Missed cleavage • In source (with parent) • In sample (obey rules) • Uncommon – reject • Others …
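A class-specific FDR can be estimated by counting decoy and target matches separately for each peptide class. The sketch below assumes a simple decoys/targets estimate with an equal-sized decoy set; the function name, the tuple layout, and the class labels are hypothetical and only illustrate the idea.

```python
from collections import Counter

def class_specific_fdr(matches, threshold):
    """Estimate FDR per peptide class.

    matches: iterable of (peptide_class, score, is_decoy) tuples.
    Returns {peptide_class: decoys / targets} for matches scoring at or
    above the threshold, or None for a class with no target matches.
    """
    targets, decoys = Counter(), Counter()
    for peptide_class, score, is_decoy in matches:
        if score >= threshold:
            if is_decoy:
                decoys[peptide_class] += 1
            else:
                targets[peptide_class] += 1
    classes = set(targets) | set(decoys)
    return {
        c: (decoys[c] / targets[c] if targets[c] else None)
        for c in classes
    }
```

Because each class gets its own estimate, a rarer class such as semitryptic peptides can be held to a stricter score threshold than simple tryptic peptides to reach the same FDR.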
Atypical Peptide Ions • Use Sequence Search Method • Tryptic only with many mods • Less common: Methylation, Phosphorylation, … • Artifacts: Na, K, Carbamyl • InsPecT/Pevzner (Unidentified, +70) • High charge states, >2 missed cleavages • Use class specific score thresholds
HSA/Fibrinogen/Transferrin Mix • 6124 Consensus Peptide Spectra, IT, Qtof, TofTof • Ion Trap Peptide Ions: 1300 HSA, 1100 Fibrinogen, 700 Transferrin
III. Library of Recurring, Unidentified Spectra • Create consensus spectra • From similar spectra from an experiment • Combine from multiple experiments • Identify spectra in other experiments • QA/QC: Artifacts, in standards, … • Apply other sequencing methods
Assign all Spectra • Identified Spectrum • Matches library peptide or unidentified spectrum • Subset of peaks match library spectrum (impure) • Similar to a matched spectrum (cluster) • Not a Peptide • Low S/N • Maximum/Median <15 • High charge state (many large peaks) • Proteins, large fragments, … • One dominant peak • Stable ion, not peptide • Singly charged (high/low abund < 1.2) • Probable artifact, lower probability of identification • Narrow m/z range • Peptide?
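Two of the "not a peptide" checks above lend themselves to a short sketch. Only the Maximum/Median < 15 cutoff comes from the slide; the `dominant_fraction` threshold, the function name, and the returned labels are assumptions made for illustration, and the other criteria (charge state, m/z range) are not covered.

```python
from statistics import median

def prefilter(abundances, max_median_cutoff=15.0, dominant_fraction=0.9):
    """Rough pre-classification of one spectrum from its peak abundances."""
    if not abundances:
        return "empty"
    top = max(abundances)
    mid = median(abundances)
    # Low S/N: the largest peak barely stands out from the median peak
    # (Maximum/Median < 15, the cutoff given on the slide).
    if mid > 0 and top / mid < max_median_cutoff:
        return "low S/N"
    # One dominant peak: a single peak carries nearly all the abundance.
    # The 0.9 fraction is an assumed threshold, not from the slides.
    if top / sum(abundances) >= dominant_fraction:
        return "one dominant peak"
    return "candidate peptide"
```

Spectra labeled "candidate peptide" would go on to library or sequence searching, while the other labels mark spectra that are unlikely to be identified.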
Library Pipeline of the Future • [Flow diagram labels: Mass spectrometer; Garbage filter; Pep. Lib; Unass. Lib; Sequence Search, De Novo, Theoretical Spec, Similarity, ...; No ID; assigned; unassigned]
NCI/NIH - CPTAC: Clinical Proteomic Technology Assessment for Cancer http://proteomics.cancer.gov Technology assessment; develop standard protocols and clinical reference sets; and evaluate methods to ensure data reproducibility. Broad Institute of MIT and Harvard, Memorial Sloan-Kettering Cancer Center, Purdue University, University of California, San Francisco, and Vanderbilt University School of Medicine. NCI grants (U24CA126476-01, U24CA126485-01, U24CA126480-01, U24CA126477-01, and U24CA126479-01).
Lab-to-Lab Chromatography • Peptide: YICENQDSISSK • [Panels: INCAPS LTQ, Broad Orbitrap, Purdue LTQ, Vandy Orbitrap, Vandy LTQ, NYU Orbitrap, NIST LTQ]
Measures of Reproducibility • Identified ions • Unique peptides, Ions, Spectrum counts • Unidentified components • Classify by type, link to origin • Ion cluster analysis • MS1 linked to MS2 • Chromatography • Time evolution of ion clusters
Components in Replicate Runs • [Plot legend: ▲▼ run 1, 2; ■ in both; series: total, sampled, identified]