
Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy

Bin Ma, CTO, Bioinformatics Solutions Inc. June 5, 2011.



Presentation Transcript


  1. Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy. Bin Ma, CTO, Bioinformatics Solutions Inc. June 5, 2011.

  2. The Sensitivity and Accuracy Dilemma [Figure: distributions labeled "false" and "true" plotted against score.]

  3. Publication Guideline • Earlier experiments paid too much attention to sensitivity and not enough to accuracy. • MCP started its guideline in 2004 to ensure accuracy.

  4. “People are generally over-optimistic about how reliable their results are.” – ABRF iPRG 2011. 30 out of 45 submissions have an FDR much higher than the required 1%. [Figure: estimated FDR upper bound and lower bound per submission against the 1% line; iPRG/ABRF 2011 Study.]

  5. PEAKS Achieved both Sensitivity and Accuracy [Figure: the 1% FDR line with the PEAKS submissions marked; axis: more peptides in submission.]

  6. Outline • FDR – pitfalls and solutions • De novo sequencing assisted database search • Three essential examinations to ensure result quality.

  7. 1. FDR – pitfalls and solutions

  8. FDR Estimation [Diagram: a protein DB containing target and decoy sequences is given to the search engine, which outputs the identified peptides.] # false target hits ≈ # decoy hits.
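
The target-decoy estimate on this slide can be sketched in a few lines. This is a minimal illustration, not the PEAKS implementation: the function names, the `(score, is_decoy)` tuple layout, and the choice of FDR = decoy hits / target hits above a score threshold are assumptions made for the example.

```python
from typing import List, Optional, Tuple

Hit = Tuple[float, bool]  # (score, is_decoy) -- hypothetical layout for this sketch


def estimate_fdr(hits: List[Hit], threshold: float) -> float:
    """Estimate FDR at a score threshold as (#decoy hits) / (#target hits).

    Relies on the assumption that the number of false target hits is
    approximately equal to the number of decoy hits.
    """
    targets = sum(1 for score, is_decoy in hits if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in hits if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def lowest_threshold_at_fdr(hits: List[Hit], max_fdr: float = 0.01) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is within max_fdr."""
    best = None
    for threshold in sorted({score for score, _ in hits}, reverse=True):
        if estimate_fdr(hits, threshold) <= max_fdr:
            best = threshold
    return best


hits = [(90, False), (80, False), (70, False), (60, True), (50, False)]
print(estimate_fdr(hits, 50))         # 1 decoy over 4 targets
print(lowest_threshold_at_fdr(hits))  # lowest score that still meets 1% FDR
```

The estimate is only as good as the assumption behind it, which is why the pitfalls that follow all concern situations where false target hits and decoy hits stop being equally likely.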

  9. Pitfall 1 – Multiple Round Search. Round 1: Fast Search. Round 2: More Sensitive Search. The second round sees more targets than decoys, so # false target hits > # decoy hits: FDR underestimation. • Craig and Beavis 2004. Bioinformatics 20, 1466–67. • Everett et al. 2010. J Proteome Res. 9, 700–707. • Bern and Kil 2011. J Proteome Res. 10, 2123–27.

  10. Our Solution: Decoy Fusion. A decoy sequence is appended to each target protein, so targets and decoys stay equal in both the fast search and the more sensitive search, and # false target hits ≈ # decoy hits. PEAKS DB paper: submitted.
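
A minimal sketch of the fusion idea follows. The slide only says that a decoy sequence is appended to each target protein; generating the decoy by reversing the target is an assumption made here for illustration, and all function names are hypothetical.

```python
from typing import Dict


def fuse_decoy(target_seq: str) -> str:
    """Append a decoy to a target protein sequence.

    Assumption: the decoy is the reversed target. The slide does not say
    how PEAKS generates its decoy sequences.
    """
    return target_seq + target_seq[::-1]


def build_fused_db(proteins: Dict[str, str]) -> Dict[str, str]:
    """Fuse a decoy onto every protein in the database."""
    return {name: fuse_decoy(seq) for name, seq in proteins.items()}


def is_decoy_position(fused_seq: str, start: int) -> bool:
    """A peptide starting in the second half of a fused entry is a decoy hit.

    Peptides spanning the target/decoy junction are ignored in this sketch.
    """
    return start >= len(fused_seq) // 2


db = build_fused_db({"P1": "MKTAYR"})
print(db["P1"])  # target half followed by its reversed decoy half
```

Because every entry carries its own decoy, any subset of proteins kept after a fast first round still contains equal amounts of target and decoy sequence.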

  11. Pitfall 2 – Mix Protein and Peptide ID. Idea: Peptides on a multi-hit protein get a bonus on their scores to increase sensitivity. A weak hit is “saved” due to the bonus. Pitfall: So is a weak false hit. There are more multi-hit proteins from the target DB, so more false hits are “saved” from the target DB than from the decoy DB: FDR underestimation. [Figure legend: target false hit, decoy hit.]

  12. Our Solution: Decoy Fusion Weak false hits are “saved” with approx. equal probabilities in target and decoy. Get the sensitivity, but still estimate the FDR correctly.

  13. Pitfall 3 – Machine Learning with Decoy. Idea: Re-train the coefficients of the scoring function for every search after knowing the decoy hits. Pitfall: Risk of over-fitting. Machine learning experts only. [Diagram: search, then adjust the scoring function to remove decoy hits after the search.] Fewer target false hits than decoy hits are removed: FDR underestimation.

  14. Solutions • Don’t use it: judges cannot be players. • Or only use it for very large datasets. • Or train the coefficients once and reuse them; don’t re-train for every search.

  15. PEAKS 5.3 • PEAKS DB used all these techniques (and many more) to ensure the accuracy while maximizing sensitivity. • Reliable FDR estimation is the top priority in PEAKS DB design.

  16. 2. De novo sequencing assisted database search

  17. An Idea to Improve Score Function. Idea: If de novo matches a DB peptide, it is likely to be correct. [Figure: distributions labeled "false" and "true" plotted against score.]

  18. De Novo Assisted DB Search [Scatter plot: x = DB search score; y = # matched amino acids between de novo and DB search. Best separation line: x + 4y.]
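
The separation line on this slide suggests a combined score of the form x + 4y. The sketch below illustrates that combination only; counting matched amino acids by position-wise agreement is an assumption made for the example (the slide does not say how PEAKS aligns the two sequences), and the function names are hypothetical.

```python
def matched_residues(de_novo: str, db_peptide: str) -> int:
    """Count positions where the de novo and DB sequences agree.

    Assumption: a simple position-wise comparison of the two strings.
    """
    return sum(1 for a, b in zip(de_novo, db_peptide) if a == b)


def combined_score(db_score: float, n_matched: int, weight: float = 4.0) -> float:
    """Combine the DB search score x with the match count y as x + weight * y.

    The default weight of 4 is the slope read off the slide's separation line.
    """
    return db_score + weight * n_matched


y = matched_residues("LGEYGFQNALK", "LGEYGFQNAIK")  # differs at one position
print(y, combined_score(20.0, y))
```

A DB hit with a modest search score is pushed above the threshold when the independent de novo result agrees with most of its residues, while a random hit with little agreement is not.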

  19. Including de novo matching as a feature gives the score function better discriminative power. [Figure: "false" and "true" score distributions, before and after.] This is just one example of the many new features in PEAKS 5.3 for improving the score function.

  20. “… far better than what I could ever squeeze out of my data” – Stefano Gotta, Siena Biotech

  21. PEAKS DB Workflow. De novo both helps to improve DB search and reports novel peptides. [Flowchart: All Spectra; De Novo; DB search; Found? Yes: DB peptides. No: De novo only.]

  22. 3. Three essential examinations to ensure result quality.

  23. Don’t Trust Software Blindly! • Google “Don’t trust software blindly” returned 5,140,000 results. • As you quality control your experiments, quality control the software’s results too.

  24. Essential Examination 1: #decoy ≈ #target in the low score region; low #decoy in the high score region.
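
Outside of a tool that plots this for you, the first examination can be approximated by counting target and decoy hits per score bin. The sketch below is one way to do that under assumed inputs: the `(score, is_decoy)` tuples, the bin width, and the function name are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_by_score_bin(
    hits: List[Tuple[float, bool]], bin_width: float = 10.0
) -> Dict[int, Tuple[int, int]]:
    """Return {bin index: (#target hits, #decoy hits)} for fixed-width score bins."""
    targets: Counter = Counter()
    decoys: Counter = Counter()
    for score, is_decoy in hits:
        bin_index = int(score // bin_width)
        if is_decoy:
            decoys[bin_index] += 1
        else:
            targets[bin_index] += 1
    bins = sorted(set(targets) | set(decoys))
    return {b: (targets[b], decoys[b]) for b in bins}


hits = [(5, False), (7, True), (12, True), (14, False), (55, False), (58, False)]
for bin_index, (n_target, n_decoy) in count_by_score_bin(hits).items():
    print(bin_index, n_target, n_decoy)
```

In a healthy result the two counts are close in the low bins and the decoy count drops toward zero in the high bins; a large surplus of targets at low scores is a hint that the decoys are no longer a fair model of false hits.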

  25. Essential Examination 2: High-scoring peptides should have low precursor error. Precursor errors start to scatter below the score threshold.
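
The second examination can be sketched as a check on precursor mass error expressed in parts per million. The ppm formula is standard; the tuple layout, the thresholds, and the function names are assumptions chosen for the example and should be set to suit the instrument.

```python
from typing import List, Tuple

Psm = Tuple[float, float, float]  # (score, observed m/z, theoretical m/z) -- hypothetical layout


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Precursor mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def high_score_outliers(psms: List[Psm], score_threshold: float, max_abs_ppm: float) -> List[Psm]:
    """Return PSMs above the score threshold whose precursor error is unexpectedly large."""
    return [
        psm
        for psm in psms
        if psm[0] >= score_threshold and abs(ppm_error(psm[1], psm[2])) > max_abs_ppm
    ]


psms = [(80, 500.001, 500.0), (85, 500.02, 500.0), (20, 500.03, 500.0)]
print(high_score_outliers(psms, score_threshold=50, max_abs_ppm=10))
```

If many high-scoring identifications show up in this list, the score threshold is probably too permissive or the FDR estimate is off.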

  26. Essential Examination 3 • Spectrum annotation around score threshold.

  27. Take Home Message • Another year of dedicated work on PEAKS. • Ensured accuracy; maximized sensitivity. • Do the three essential examinations. • They are simple … at least in PEAKS.

  28. “a big step forward” – Christian Schmelzer, Martin Luther University Enjoy! http://www.bioinfor.com/peaks-download-a-pricing
