Analysis of Complex Proteomic Datasets Using Scaffold

Presentation Transcript


  1. Analysis of Complex Proteomic Datasets Using Scaffold
     The free Scaffold Viewer can be downloaded at www.proteomesoftware.com.

  2. Scaffold: Why do we need it?
     Shotgun proteomics → analysis of complex mixtures
     Whole-cell extract: 10,000+ proteins, 600,000 peptides, 1.2 million spectra!
     • Beyond the realm of manual interpretation
     • How do we determine what is a valid protein identification?

  3. Statistical Analysis Using Scaffold
     • All search engines use different scoring algorithms → results cannot be compared directly
     • Many search engine results are described by more than one value. Examples:
       Mascot → Ion Score and Identity Score
       SEQUEST → XCorr and DeltaCn

  4. Statistical Analysis Using Scaffold: Peptide Prophet*
     • Creates a universal score (the discriminant score) for each search engine result (e.g., XCorr and DeltaCn are compressed into one score for SEQUEST results; Ion Score and Identity Score for Mascot results)
     • Plots a histogram of the discriminant scores and fits a bimodal mixture of standard statistical distributions to differentiate between correct and incorrect hits
     • Computes the probability that a match is correct at a given discriminant score
     *Nesvizhskii, A. I. et al., Anal. Chem. 2003, 75, 4646-4658
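Collapsing several scores into one discriminant score can be sketched as a weighted linear sum. This is a minimal illustration only: the weights below are hypothetical placeholders, not PeptideProphet's published coefficients, and the real model is charge-specific and uses additional terms.

```python
from dataclasses import dataclass


@dataclass
class SequestHit:
    xcorr: float
    delta_cn: float


# HYPOTHETICAL weights chosen for illustration; PeptideProphet derives its own
# charge-specific coefficients and includes further terms not modeled here.
WEIGHTS = {"intercept": -3.0, "xcorr": 1.5, "delta_cn": 8.0}


def discriminant_score(hit: SequestHit, weights: dict = WEIGHTS) -> float:
    """Compress XCorr and DeltaCn into a single score via a weighted linear sum."""
    return (weights["intercept"]
            + weights["xcorr"] * hit.xcorr
            + weights["delta_cn"] * hit.delta_cn)


# SEQUEST scores from the "good" and "bad" spectrum slides (23 and 24).
good_score = discriminant_score(SequestHit(xcorr=2.61, delta_cn=0.4))
bad_score = discriminant_score(SequestHit(xcorr=2.26, delta_cn=0.2))
```

Because both values end up in one number, a single histogram (and a single mixture model) can be built per search engine.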

  5. Statistical Analysis Using Scaffold
     [Figure: histogram of discriminant scores; x-axis: discriminant score (D), from -3.9 to 7.3; y-axis: number of spectra in each bin, 0 to 200]

  6. Statistical Analysis Using Scaffold
     [Figure: histogram of discriminant scores (D) vs. number of spectra in each bin]
     Assumes a mixture of standard statistical distributions: “incorrect” and “correct”

  7. Statistical Analysis Using Scaffold
     [Figure: histogram of discriminant scores (D) vs. number of spectra in each bin, with a peptide probability threshold separating the “incorrect” and “correct” distributions]
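The mixture-model idea on slides 6 and 7 can be sketched with expectation-maximization (EM). This is a simplified sketch, assuming both the "incorrect" and "correct" distributions are Gaussian (PeptideProphet uses other distribution shapes for some engines) and running on synthetic scores rather than real data. The probability that a match with score D is correct is π·f₁(D) / (π·f₁(D) + (1 − π)·f₀(D)), and a probability threshold then maps to a discriminant-score cutoff.

```python
import math
import random


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_two_gaussians(scores, iterations=100):
    """Fit a two-component Gaussian mixture by EM.

    Returns (pi_correct, (mu0, sd0), (mu1, sd1)); component 1 ("correct")
    is initialized from the upper half of the scores.
    """
    s = sorted(scores)
    n = len(s)
    lower, upper = s[: n // 2], s[n // 2:]
    mu0, mu1 = sum(lower) / len(lower), sum(upper) / len(upper)
    sd0 = sd1 = max((s[-1] - s[0]) / 4, 1e-3)
    pi1 = 0.5
    for _ in range(iterations):
        # E-step: responsibility of the "correct" component for each score.
        resp = []
        for x in scores:
            a = pi1 * normal_pdf(x, mu1, sd1)
            b = (1 - pi1) * normal_pdf(x, mu0, sd0)
            resp.append(a / (a + b) if a + b > 0 else 0.0)
        # M-step: update the mixing weight, means, and standard deviations.
        w1 = max(sum(resp), 1e-12)
        w0 = max(n - w1, 1e-12)
        pi1 = w1 / n
        mu1 = sum(r * x for r, x in zip(resp, scores)) / w1
        mu0 = sum((1 - r) * x for r, x in zip(resp, scores)) / w0
        sd1 = max(math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, scores)) / w1), 1e-3)
        sd0 = max(math.sqrt(sum((1 - r) * (x - mu0) ** 2 for r, x in zip(resp, scores)) / w0), 1e-3)
    return pi1, (mu0, sd0), (mu1, sd1)


def probability_correct(d, model):
    """Posterior probability that a match with discriminant score d is correct."""
    pi1, (mu0, sd0), (mu1, sd1) = model
    a = pi1 * normal_pdf(d, mu1, sd1)
    b = (1 - pi1) * normal_pdf(d, mu0, sd0)
    if a + b == 0:  # both densities underflowed far out in a tail
        return 1.0 if d > (mu0 + mu1) / 2 else 0.0
    return a / (a + b)


def score_cutoff(model, threshold, step=0.01):
    """Smallest score between the two means whose probability reaches threshold."""
    _, (mu0, _), (mu1, _) = model
    d = mu0
    while d <= mu1:
        if probability_correct(d, model) >= threshold:
            return d
        d += step
    return None


# Demo on SYNTHETIC scores only: 700 "incorrect" and 300 "correct" matches.
rng = random.Random(0)
demo_scores = ([rng.gauss(-1.0, 1.0) for _ in range(700)]
               + [rng.gauss(4.0, 1.0) for _ in range(300)])
demo_model = fit_two_gaussians(demo_scores)
cutoff_95 = score_cutoff(demo_model, 0.95)
```

With well-separated components the fit recovers roughly 30% "correct" matches, and a 95% probability threshold lands between the two peaks. With too few spectra the two components cannot be fitted reliably, which is why slide 26 warns that enough data is needed.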

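  8. Statistical Analysis Using Scaffold
     One search engine may not be enough
     [Figure: Venn diagram of identifications from SEQUEST, X!Tandem, and Mascot; the seven regions are labeled 9%, 22%, 4%, 34%, 19%, 7%, and 5%. Source: www.proteomesoftware.com]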
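The overlap regions in such a Venn diagram can be computed from per-engine identification lists. A minimal sketch; the engine names match the slide, but the protein IDs in the example are hypothetical.

```python
def venn_regions(engine_ids: dict[str, set[str]]) -> dict[frozenset, set[str]]:
    """Map each combination of engines to the IDs found by exactly that combination."""
    regions: dict[frozenset, set[str]] = {}
    for pid in set().union(*engine_ids.values()):
        found_by = frozenset(name for name, ids in engine_ids.items() if pid in ids)
        regions.setdefault(found_by, set()).add(pid)
    return regions


def region_percentages(regions: dict[frozenset, set[str]]) -> dict[frozenset, float]:
    """Express each region as a percentage of all IDs, as in the Venn diagram."""
    total = sum(len(ids) for ids in regions.values())
    return {key: 100.0 * len(ids) / total for key, ids in regions.items()}


# Hypothetical identifications, for illustration only.
example = {
    "SEQUEST": {"P1", "P2", "P3"},
    "Mascot": {"P2", "P3", "P4"},
    "X!Tandem": {"P3", "P5"},
}
regions = venn_regions(example)
percentages = region_percentages(regions)
```

Any ID that lands in a single-engine region would have been missed had only one of the other engines been used.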

  9. Statistical Analysis Using Scaffold
     • Peptide Prophet statistics are applied separately to each search engine's results (i.e., Mascot, SEQUEST, and X!Tandem)
     • Scaffold Merger combines the peptide probabilities from each search engine to generate a protein probability:
       probability of identifying a spectrum + probability of agreement between search engines → protein probability
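One simple way to turn peptide probabilities into a protein probability is the "noisy-OR" form that ProteinProphet starts from: the protein is present if at least one of its peptides is correctly identified, assuming the peptide identifications are independent. This is an illustrative simplification, not Scaffold's published model. In particular, the engine-merging helper below is hypothetical and ignores the agreement term on the slide. Note also that this naive formula gives 0.95 for a single 95% peptide, whereas slide 27 reports <90% in Scaffold, so Scaffold evidently applies further adjustments.

```python
from math import prod


def best_engine_probability(engine_probs: dict[str, float]) -> float:
    """HYPOTHETICAL merge: keep the highest probability any engine gave a spectrum.

    Scaffold's actual merge also rewards agreement between engines.
    """
    return max(engine_probs.values())


def protein_probability(peptide_probs: list[float]) -> float:
    """Noisy-OR: probability that at least one peptide identification is correct,
    assuming the peptide identifications are independent."""
    return 1.0 - prod(1.0 - p for p in peptide_probs)


two_peptides = protein_probability([0.95, 0.95])  # about 0.9975
one_peptide = protein_probability([0.95])          # about 0.95
```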

  10. Statistical Analysis Using Scaffold: Advantages of Using Scaffold
      • Lets you choose a statistical error rate by setting probability thresholds
      • Lets you compare and combine results from different experiments and different search engines
      • Allows sharing of raw data and search results
      • Accepted as a suitable statistical method for validating large datasets
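Setting a probability threshold fixes an expected error rate. If the probabilities are well calibrated, the expected number of incorrect identifications among those accepted is the sum of (1 − p), so the expected false discovery rate is its average. A minimal sketch under that calibration assumption; the probabilities in the example are made up.

```python
def expected_fdr(probabilities: list[float], threshold: float) -> float:
    """Expected false discovery rate among identifications with p >= threshold,
    assuming each p is the true chance that the identification is correct."""
    accepted = [p for p in probabilities if p >= threshold]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)


# Made-up peptide probabilities, for illustration only.
probs = [0.99, 0.97, 0.50, 0.20]
fdr_at_95 = expected_fdr(probs, 0.95)  # (0.01 + 0.03) / 2 = 0.02
fdr_at_0 = expected_fdr(probs, 0.0)    # (0.01 + 0.03 + 0.50 + 0.80) / 4 = 0.335
```

Raising the threshold trades fewer identifications for a lower expected error rate.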

  11. This is the Samples view

  12. List of all the proteins found in your samples. Homologous proteins (proteins matched to the same peptides) are shown. You can link out directly to database entries.

  13. How does Scaffold deal with peptides that can be assigned to more than one protein?
      General rule → explain the spectral data with the smallest set of proteins
      [Figure: peptides mapped to Protein A and Protein B]
      Protein A and Protein B share all the same peptides, so they will be grouped together.

  14. How does Scaffold deal with peptides that can be assigned to more than one protein?
      General rule → explain the spectral data with the smallest set of proteins
      [Figure: peptides mapped to Protein A and Protein B]
      Protein A and Protein B each have one unique peptide → they will be listed separately only if the peptide probability is > 50%.

  15. How does Scaffold deal with peptides that can be assigned to more than one protein?
      General rule → explain the spectral data with the smallest set of proteins
      [Figure: peptides mapped to Protein A and Protein B]
      Protein B has two unique peptides → it will be listed separately.
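The "smallest set of proteins" rule on slides 13 to 15 is a set-cover problem. A minimal sketch, assuming (1) proteins with identical peptide evidence are merged into one group, (2) only peptides above a probability cutoff (0.5 here, echoing slide 14) count as evidence, and (3) the minimal set is approximated greedily. Scaffold's own grouping algorithm may differ, and all protein and peptide names are hypothetical.

```python
def parsimonious_groups(protein_peptides: dict[str, set[str]],
                        peptide_prob: dict[str, float],
                        min_prob: float = 0.5) -> list[tuple[str, ...]]:
    """Return protein groups that explain all confident peptides (greedy set cover)."""
    # Keep only peptides whose probability exceeds the cutoff.
    evidence = {prot: {pep for pep in peps if peptide_prob.get(pep, 0.0) > min_prob}
                for prot, peps in protein_peptides.items()}
    # Proteins with identical evidence are indistinguishable: group them.
    groups: dict[frozenset, list[str]] = {}
    for prot, peps in evidence.items():
        if peps:
            groups.setdefault(frozenset(peps), []).append(prot)
    # Greedily pick the group that explains the most still-unexplained peptides.
    unexplained = set().union(*groups)
    chosen = []
    while unexplained:
        best = max(groups, key=lambda peps: len(peps & unexplained))
        chosen.append(tuple(sorted(groups[best])))
        unexplained -= best
    return chosen


# Hypothetical peptide probabilities; "b_low" falls below the 50% cutoff.
example_probs = {"p1": 0.95, "p2": 0.95, "a1": 0.9, "b1": 0.9, "b2": 0.9, "b_low": 0.4}
shared = parsimonious_groups({"A": {"p1", "p2"}, "B": {"p1", "p2"}}, example_probs)
```

Here `shared` is `[("A", "B")]`, the grouped case from slide 13; giving each protein a confident unique peptide splits them into separate entries, as on slide 14.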

  16. Scaffold will extract GO terms from NCBI annotations

  17. Gene Ontology (“GO”) terms
      • Controlled vocabulary containing consistent descriptions of gene products in different databases
      • Describe gene products in terms of their associated biological processes, cellular components, and molecular functions in a species-independent manner
      Gene Ontology Project: http://www.geneontology.org/GO.doc.shtml

  18. List of samples

  19. Probability thresholds for peptide and protein identifications, and the required number of unique peptides, can be defined. Results are color coded to represent the probability that a protein identification is correct.

  20. This is the Proteins view

  21. The spectrum of each peptide is labeled with y and b ions, which can be used for manual validation.
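The b- and y-ion labels come from simple mass arithmetic: a singly charged b ion is the sum of the N-terminal residue masses plus a proton, and a y ion is the sum of the C-terminal residue masses plus water plus a proton. A minimal sketch for unmodified peptides using standard monoisotopic masses; it also flags the y ion produced by cleavage N-terminal to proline, which slide 22 notes tends to be dominant. The function names are illustrative.

```python
# Standard monoisotopic residue masses (Da) for unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b1..b(n-1) and y1..y(n-1) m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions, y_ions


def proline_y_ions(peptide: str) -> list[int]:
    """y-ion numbers from cleavage N-terminal to proline (expected to be intense)."""
    n = len(peptide)
    return [n - i for i in range(1, n) if peptide[i] == "P"]


b, y = fragment_ions("IAELAGFSVPENTK")  # peptide from the "good spectrum" slide
```

For IAELAGFSVPENTK, `proline_y_ions` returns `[5]`: y5 (PENTK) is the fragment expected to stand out. Complementary ions also check each other, since each b_i and y_(n−i) pair sums to the [M+2H] mass.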

  22. Manual Spectrum Evaluation
      • Search engine scores → Is the peptide found by more than one search engine?
        Mascot Ion Score > 40
        SEQUEST XCorr > 2 (+2 ion), > 2.5 (+3 ion)
        DeltaCn > 0.2
      • Good signal-to-noise
      • Long stretches of y and/or b ions
      • All dominant peaks are assigned as y or b ions
      • Fragmentation chemistry
        N-terminal cleavage at P → dominant y-ion
        C-terminal cleavage at D and E → dominant b-ion
        Peptides containing W → abundant y-ions
        S and T → tend to lose water (-18 Da)
        R, N, and Q → tend to lose ammonia (-17 Da)
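The numeric score screens on this slide are easy to automate; signal-to-noise, ion-series coverage, and fragmentation chemistry still need to be judged by eye. A minimal sketch that applies only the listed cut-offs. The slide gives no XCorr cut-off for other charge states, so that check is skipped for them, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# XCorr cut-offs by parent-ion charge, as listed on the slide.
XCORR_MIN = {2: 2.0, 3: 2.5}


@dataclass
class PeptideMatch:
    charge: int
    mascot_ion_score: Optional[float] = None
    sequest_xcorr: Optional[float] = None
    sequest_delta_cn: Optional[float] = None


def score_checks(m: PeptideMatch) -> dict[str, bool]:
    """Apply the slide's score cut-offs; returns one pass/fail flag per check."""
    checks = {"multiple_engines": m.mascot_ion_score is not None and m.sequest_xcorr is not None}
    if m.mascot_ion_score is not None:
        checks["mascot_ion_score"] = m.mascot_ion_score > 40
    if m.sequest_xcorr is not None and m.charge in XCORR_MIN:
        checks["xcorr"] = m.sequest_xcorr > XCORR_MIN[m.charge]
    if m.sequest_delta_cn is not None:
        checks["delta_cn"] = m.sequest_delta_cn > 0.2
    return checks


# Scores from the "good" and "bad" spectrum slides (23 and 24).
good_checks = score_checks(PeptideMatch(2, mascot_ion_score=60.1, sequest_xcorr=2.61, sequest_delta_cn=0.4))
bad_checks = score_checks(PeptideMatch(3, mascot_ion_score=9.93, sequest_xcorr=2.26, sequest_delta_cn=0.2))
```

The good spectrum passes every check; the bad one was found by both engines but fails all three score cut-offs.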

  23. Good Spectrum
      • Good coverage of y and b ion series
      • Dominant y-ion at N-terminal cleavage of P
      • Good signal-to-noise
      Peptide sequence: IAELAGFSVPENTK, +2 charge on parent peptide
      Mascot: Ion Score = 60.1, Identity Score = 37.3
      SEQUEST: XCorr = 2.61, DeltaCn = 0.4

  24. Bad Spectrum
      • Poor signal-to-noise
      • Multiple unassigned peaks
      • Poor coverage of y and b ion series
      Peptide sequence: YPLADYALTPDMAIVDANLVMDMPK, +3 charge on parent peptide
      Mascot: Ion Score = 9.93, Identity Score = 37.3
      SEQUEST: XCorr = 2.26, DeltaCn = 0.2

  25. This is the Statistics view

  26. Scaffold Statistics View
      Score histogram: blue indicates “incorrect” proteins; red indicates “correct” proteins.
      Important! There must be enough data to fit two distributions for the statistics to be valid.
      A protein is “correct” if it passes the peptide probability, protein probability, and minimum number of peptides filters.
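The "correct" rule on this slide is a conjunction of three filters. A minimal sketch, assuming each protein carries its protein probability and the best probability of each unique peptide. The 95% and two-peptide defaults are illustrative values echoing slide 27, not Scaffold defaults, and the accessions and peptide names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProteinHit:
    accession: str
    protein_prob: float
    # Unique peptide sequence -> best probability observed for that peptide.
    peptide_probs: dict[str, float] = field(default_factory=dict)


def passes_filters(hit: ProteinHit,
                   min_protein_prob: float = 0.95,
                   min_peptide_prob: float = 0.95,
                   min_unique_peptides: int = 2) -> bool:
    """True if the protein meets the protein-probability, peptide-probability,
    and minimum-unique-peptide filters."""
    confident = [seq for seq, p in hit.peptide_probs.items() if p >= min_peptide_prob]
    return hit.protein_prob >= min_protein_prob and len(confident) >= min_unique_peptides


# Hypothetical hits, for illustration only.
hits = [
    ProteinHit("HYPOTHETICAL_1", 0.99, {"PEPTIDE_A": 0.98, "PEPTIDE_B": 0.96}),
    ProteinHit("HYPOTHETICAL_2", 0.99, {"PEPTIDE_A": 0.98}),
]
accepted = [h.accession for h in hits if passes_filters(h)]
```

Only the first hit survives the two-peptide requirement; relaxing `min_unique_peptides` to 1 would accept both.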

  27. Scaffold Statistics View
      With at least 2 unique peptides (95% peptide probability), the maximum protein probability is ~100%.
      With only 1 unique peptide (95% peptide probability), the maximum protein probability is <90%.

  28. Scaffold Statistics View: SEQUEST only (missed IDs)

  29. Scaffold Statistics View: Mascot only (missed IDs)

  30. Scaffold Statistics View: both vs. Mascot only vs. SEQUEST only
      Using both Mascot and SEQUEST results in more “correct” protein identifications.

  31. This is the Publish View

  32. Publication Guidelines for Proteomic Data
      Molecular & Cellular Proteomics: http://www.mcponline.org/misc/ParisReport_Final.shtml

  33. Publication Guidelines for Proteomic Data: Data Analysis
      • Name and version of the software used to extract the peak list
      • Name and version of the database searching software (Mascot, SEQUEST, Spectrum Mill, or X!Tandem)
      • Values of all search parameters used (enzyme, modifications, mass tolerance, etc.)
      • Name and size of the database searched (Swiss-Prot or NCBI, and the number of sequence entries)
      • Name and version of any additional software used for statistical analysis, and an explanation of the analysis (Scaffold, # peptide requirements, probability settings)

  34. Publication Guidelines for Proteomic Data
      Each peptide identified:
      • Peptide sequence, noting any modifications or missed cleavages
      • Parent peptide ion mass and charge
      • All search engine scores
      Each protein identified:
      • Accession number
      • Sequence coverage and total number of unique peptides
