1 / 1

Building The Library

Extract Common Features. Consensus Spectrum. Spectrum 1. Spectrum 2. A ‘consensus spectrum’ is composed of peaks present in the majority of spectra in which the peak could have been generated. Consensus Spectrum.

hei
Download Presentation

Building The Library

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Extract Common Features Consensus Spectrum Spectrum 1 Spectrum 2 A ‘consensusspectrum’ is composed of peaks present in the majority of spectra in which the peak could have been generated ConsensusSpectrum These plots illustrate quantitative relations for factors a) and b) that can aid in the separation of true and false positive identifications Correct identifications have lower fractions of unassigned abundance than incorrect identifications True positive spectra match theoretical spectra better than false positive spectra. • Building The Library • Extract spectra and analysis information for reliable peptide identifications from spectrum/sequence matches. • Create a ‘consensus spectrum’ from all spectra assigned to a single peptide ion. • 3) For each consensus spectrum, perform spectrum/sequence consistency check. Also examine reliable, single hit identifications. • 4) Assign reliability measures to each spectrum and build library. Library Construction Flow Diagram LC-MS/MS Results All Data for an Identified Peptide Ion Building and Using Libraries of Peptide Ion Fragmentation SpectraS.E. Stein, L.E. Kilpatrick, M. Mautner, P. Neta, J. RothNational Institute of Standards and Technology, Gaithersburg, MD/Charleston, SC Consensus Spectra Derived ConfirmatoryInformation Spectra Sequence Method: Spectra that identify peptides by current sequence/spectrum matching methods are first added to an archive along with associated information. For each identified peptide ion that has been identified more than once, an annotated ‘consensus spectrum’ is derived from the spectra used to make those identifications. Consensus spectra are then subjected to spectrum/sequence consistency tests to assign a measure of identification reliability and remove false positive identifications. This employs sequence/spectrum correlations reported in the literature and found in our work as well as other quantities found to be correlated with identification reliability. Examples are the similarity of original spectra, unassigned abundance in the consensus spectrum and scores of the original sequence/spectrum match. Related consensus spectra are intercompared to find further errors and finally combined to form an annotated, searchable reference library. FragmentationRules Spectrum/Sequence Consistency Analysis • Source Spectra • Online repositories- PeptideAtlas- the Global Proteome Machine - Open Proteomics Database- NCRR Proteomics Resource • Collaborators/Contributors- NIH/LNT (Markey/Geer/Kowalik…)- Institute of Systems Biology (Nesvizshskii/King/Aebersold/…) - theGPM (Beavis)- Blueprint Initiative (Hogue) • NIST MeasurementsSingle Protein Digests/IT, 3Q ReferenceArchive ReferenceLibrary Annotated ReferenceSpectrum Spectrum Matching Algorithms Algorithms: Measures of spectrum matching have been adapted from algorithms used for electron ionization spectra. Peaks are weighted by their significance: - Reduce significance of common impurity ions (e.g., neutral loss from parent ion) - Adjust Y/B weighting for instrument and sequence - Reduce weight for uncertain and isotopic peaks - Use confidence of library spectrum Speed: Straightforward indexing leads to very fast identification (<< sec) even for very large libraries. Spectrum Variability: Effective peptide identification by matching spectra in a reference library requires that spectra are reproducible. The degree of variability has been measured using sets of spectra identifying the same precursor ion. Variations typical of ion trap spectra are shown below. 2) Build Consensus Spectra Reject noise and spurious signals/Measure and report variability Replicate spectra at moderate to low S/N commonly show spurious peaks from neutral losses by impurity ions of uncertain origin as well as seemingly random noise spikes. Identify matching m/z peaks in input spectra. Find and exclude outlier spectra (use dot product of each pair) when multiple sources are available , limit the number of spectra from a single source Omit peaks occurring in ½ or fewer of spectra consider only spectra with sufficiently high S/N to have generated the peak of interest Compute average abundance, report variance Create annotated spectruminclude information such as spectrum origin, retention, median dot product using all peaks and only consensus peaks, fraction of abundance not at consensus m/z, … Sample Application: Mycobacterium Smegmatis [data from the Open Proteomics Database/R.Wang, 27 sets of experiments] - Created library of 2739 Peptide Ion Consensus Spectra - 95% of identified spectra matched peptides identified by other spectra In one series of LC-MS/MS experiments: 1551 Spectra were identified by sequence search engine 1527 Of the above spectra were re-matched by consensus library searching 1067 Spectra not identified by sequence search were identified by library search 948 Different peptides were originally identified by sequence search engine 924 Peptides were re-matched 332 Peptides not identified by sequence search in this series were identified 24 Peptides not re-matched were rich in false positives 983 Consensus spectra of spectra unmatched by library search were derived Pairs of spectra identified as originating from a given peptide ion have been compared to each other – variations in abundance have been found to depend primarily on the level of signal/noise in the measured spectra. Overview: MS/MS spectra that serve to identify peptides by sequence/spectrum matching can be of value for more reliably identifying those peptides in later studies. Our work involves the development of methods for refining and reusing information contained in these spectra to create mass spectral reference libraries, not dissimilar from those routinely employed in other fields of mass spectrometry. Above S/N of 40, spectra are quite reproducible (0.7 or better median dot product). Most spectra identified by sequence search methods have dot products above 0.7. Ways of Using the Library Identification Confirmation (post-processing)Confirm/reject peptides identified by sequence search programs Find peptides (pre-processing) Search all spectra against libraryThen search unmatched peptides using sequence search and other methods Create library of ‘unidentified spectra’ from their consensus spectraCompare to identified peptides, use de novo methods, find biomarkers, … Target identificationInternal standards, biomarkers, target proteins Transmit peptide analysis informationDifficult-to-identify, unusual, manually identified, special meaning 3) Spectrum/Sequence ConsistencyFind likelihood that a consensus spectrum originated from assigned sequence.Each factor serves to refine probability. Factors identified as discriminating (a and b are illustrated below): Match with theoretical spectrum based on individual amino acid fragmentation behavior Fraction of unassigned abundance abundance for peaks not originating from a known fragmentation path Y/B ion correlations and ratio of Y/B ion abundance sums Unexpected major peaks formally consistent with rules include tests for reasonableness of neutral losses Amino acid specific rules (Proline, Glycine, Aspartic Acid, …) Predicted/observed fragment ion charge state abundance ratios Similarity of pairs of spectra originating from the same peptide ion are measured by their ‘dot product’, where spectra are expressed as normalized vectors. Here, S/N is measured as the maximum to median abundance in a typical spectrum (assumes most peaks are noise). Occurrence Distribution: Most peptide identifications are made multiple times. Next Steps Enhance QA/QC Goal: Distinguish unexpected fragmentations from misidentifications Add spectra from reliable singly identified peptides to Library Build comprehensive Libraries Test and optimize spectrum matching algorithms Provide annotated libraries for mass spectrometry data systems Roughly 10% of identifications were made more than 100 times In two well-studied cases, about 5% of identifications were made by only one spectrum. Approximately 1/3 of peptides identified were identified only once (not shown). These will be separately processed, requiring higher scores for acceptance in library. • Please Recycle Your Spectra! • - Your MS/MS Data Files contain valuable, reusable information that may be very helpful to others (and maybe even you) after recycling. • - Please let us know if we may recycle them (steve.stein@nist.gov / 301-975-2505). Raw data files are fine, with or without identifications. • Sources of original spectra are cited as part of consensus spectrum annotation (if you wish). • We also encourage submission to on-line spectral repositories (www.peptideatlas.org, www.thegpm.org, bioinformatics.icmb.utexas.edu/OPD/) These plots use D. Radiodurans spectra from NCRR, false ids obtained by searching Human sequences. True and false identifications had the same average sequence ‘score’. 4) Overall Identification Confidence [under development] Influential factors employed Spectrum/Sequence Consistency (described above) Original Score (degree of sequence match) Peptide sequence re-identificationsNumber of different spectra, experiments, modifications, charge states Number of peptides per protein Distribution of Number of Identifications Per Identified Peptide for D. Radiodurans (spectra from NCRR, http://ncrr.pnl.gov/data/) and M. Smegmatis (Open Proteomics Database)

More Related