
False-Discovery-Rate Aware Protein Inference by Generalized Protein Parsimony

Nathan Edwards, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center.


Presentation Transcript


  1. False-Discovery-Rate Aware Protein Inference by Generalized Protein Parsimony Nathan Edwards Department of Biochemistry and Molecular & Cellular Biology Georgetown University Medical Center

  2. Peptide-Spectrum Matches • Sigma49: 32,691 LTQ MS/MS spectra of 49 human protein standards; searched against IPI Human • Yeast: 162,420 LTQ MS/MS spectra from a yeast cell lysate; searched against SGD • X!Tandem E-value (no refinement), 1% FDR • Spectra used in: Zhang, B.; Chambers, M. C.; Tabb, D. L. 2007.

  3. Traditional Protein Parsimony • Select the smallest set of proteins that explains all identified peptides. • A sensible principle; it implies eliminating equivalent/subset proteins • Equivalent proteins are problematic: which one to choose? • Peptides unique to one protein force that protein into the solution • True for most tools, even probability-based ones • Bad consequences for FDR-filtered identifications
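
The principle on this slide can be read as a minimum set cover problem. The following is a minimal sketch under that reading, not the author's implementation: the function name `minimum_protein_cover` and the toy protein and peptide labels are hypothetical, and exhaustive enumeration is used only because the example is tiny.

```python
from itertools import combinations


def minimum_protein_cover(protein_peptides):
    """Return a smallest set of proteins that explains every identified peptide.

    protein_peptides maps a protein name to the set of identified peptides
    it contains. Subsets are tried in order of increasing size, so the first
    full cover found has minimum size.
    """
    all_peptides = set().union(*protein_peptides.values())
    proteins = sorted(protein_peptides)
    for size in range(1, len(proteins) + 1):
        for subset in combinations(proteins, size):
            covered = set().union(*(protein_peptides[p] for p in subset))
            if covered == all_peptides:
                return set(subset)
    return set()


# Hypothetical toy data (not from the talk).
example = {
    "A": {"p1", "p2", "p3"},
    "B": {"p2", "p3"},   # contained in A, so never needed
    "C": {"p4"},         # p4 occurs only in C
    "D": {"p3", "p5"},   # p5 occurs only in D
}
cover = minimum_protein_cover(example)
```

In this toy case D enters the cover solely because of the single peptide p5, which is the behaviour the slide flags as a problem when some PSMs are false.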

  4. Many proteins are easy • Eliminate equivalent / dominated proteins • Sigma49: 277 → 60 proteins • Yeast: 1226 → 1085 proteins • Many components have a single protein: • Sigma49: 52 (3 multi-protein) • Yeast: 994 (43 multi-protein) • "Unique" peptides force protein inclusion • Sigma49: 16 single-peptide proteins • Yeast: 476 single-peptide proteins
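
The reduction described on this slide can be sketched as two steps: merge proteins with identical peptide sets, drop proteins whose peptide set is strictly contained in another, then group the survivors into connected components by shared peptides. This is an illustrative sketch only; `collapse_proteins`, `components`, and the toy data are hypothetical names and values, not taken from the talk or its software.

```python
def collapse_proteins(protein_peptides):
    """Merge equivalent proteins and drop dominated (strictly contained) ones.

    Returns a dict mapping each surviving peptide set (a frozenset) to the
    sorted list of equivalent proteins that share exactly that set.
    """
    groups = {}
    for protein, peptides in protein_peptides.items():
        groups.setdefault(frozenset(peptides), []).append(protein)
    return {
        peptides: sorted(proteins)
        for peptides, proteins in groups.items()
        if not any(peptides < other for other in groups)  # strict subset test
    }


def components(kept):
    """Group surviving peptide sets into components linked by shared peptides."""
    remaining = list(kept)
    result = []
    while remaining:
        component = [remaining.pop()]
        peptides = set(component[0])
        changed = True
        while changed:
            changed = False
            for other in remaining[:]:
                if peptides & other:
                    component.append(other)
                    peptides |= other
                    remaining.remove(other)
                    changed = True
        result.append(component)
    return result


# Hypothetical toy data.
toy = {
    "A": {"p1", "p2", "p3"},
    "A2": {"p1", "p2", "p3"},  # equivalent to A
    "B": {"p2", "p3"},         # dominated by A
    "C": {"p3", "p4"},         # shares p3 with A
    "D": {"p9"},               # isolated single-protein component
}
kept = collapse_proteins(toy)
component_sizes = sorted(len(c) for c in components(kept))
```

Here five proteins collapse to three groups, forming one single-protein component and one multi-protein component.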

  5. Must eliminate redundancy • Contained proteins should not be selected [Figure: 37 distinct peptides]

  6. Must eliminate redundancy • Contained proteins should not be selected • Even if they have some probability mass • The number of sibling peptides matters less if they are shared. [Figure: single amino-acid difference; values shown: 1.0, 1.0, 0.8, 0.7, 0.0, 1.0]

  7. Must ignore some PSMs • A single additional peptide should not force a protein into the solution [Figure: single amino-acid difference; values shown: 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]

  8. Example from Yeast • "Inosine monophosphate dehydrogenase" • 4-gene family • Contained proteins should not be selected • Single-peptide evidence for YML056C [Figure: values shown: 1.0, 0.6, 0.0, 1.0]

  9. Must ignore some PSMs • Improving peptide identification sensitivity makes things worse! • False PSMs don't cluster [Figure: PSMs vs. proteins; labels: 2x, 10%]

  10. Must ignore some PSMs • Improving peptide identification sensitivity makes things worse! • False PSMs don't cluster [Figure: select proteins to explain the true PSM %; labels: 90%, 90%]

  11. Must ignore some PSMs • How do we choose? • Maximize # peptides? • Minimize FDR (naïve model)? • Maximize # PSMs?

  12. Generalized Protein Parsimony • Weight peptides by number of PSMs • Constrain unique peptides per protein • Maximize explained peptides (PSMs) • Match PSM filtering FDR to % uncovered PSMs • Readily solved by branch-and-bound • Permits complex protein/peptide constraints • Reduces to traditional protein parsimony
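
One possible reading of this slide, sketched under stated assumptions: each selected protein must explain at least `min_unique` peptides that no other selected protein explains, the objective is the total PSM count of explained peptides, and ties are broken in favour of fewer proteins. The slide names branch-and-bound as the solver; the sketch below substitutes plain exhaustive enumeration, which is only practical for toy inputs. The function name, parameter names, and data are all hypothetical.

```python
from itertools import combinations


def generalized_parsimony(protein_peptides, psm_counts, min_unique=2):
    """Pick proteins that maximize explained PSMs under a uniqueness constraint.

    Returns (selected proteins, explained PSM count, uncovered PSM fraction).
    Exhaustive search stands in for branch-and-bound in this sketch.
    """
    proteins = sorted(protein_peptides)
    best_key, best_subset = (0, 0), ()
    for size in range(1, len(proteins) + 1):
        for subset in combinations(proteins, size):
            feasible = True
            for protein in subset:
                # Peptides explained by the other selected proteins.
                others = set().union(
                    *(protein_peptides[q] for q in subset if q != protein)
                )
                if len(protein_peptides[protein] - others) < min_unique:
                    feasible = False
                    break
            if not feasible:
                continue
            covered = set().union(*(protein_peptides[p] for p in subset))
            explained = sum(psm_counts[pep] for pep in covered)
            key = (explained, -size)  # more PSMs first, then fewer proteins
            if key > best_key:
                best_key, best_subset = key, subset
    total = sum(psm_counts.values())
    uncovered = 1 - best_key[0] / total if total else 0.0
    return set(best_subset), best_key[0], uncovered


# Hypothetical toy data: peptide -> number of PSMs.
counts = {"p1": 5, "p2": 3, "p3": 4, "p4": 1, "p5": 1}
toy = {"A": {"p1", "p2", "p3"}, "B": {"p3", "p4"}, "C": {"p5"}}

strict = generalized_parsimony(toy, counts, min_unique=2)
loose = generalized_parsimony(toy, counts, min_unique=1)
```

With `min_unique=2` only A is selected and 2 of the 14 PSMs are left uncovered; with `min_unique=1` every peptide is explained by the smallest irredundant cover, which matches the slide's remark that the formulation reduces to traditional parsimony.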

  13. Match FDR to uncovered PSMs Traditional Parsimony at 1% FDR: 1085 (609 2+-Unique) Proteins

  14. Software • Filter multi-acquisition identifications by: • FDR, E-value, probability • Rewrite PSMs to reflect parsimony analysis • PepXML, CSV, Excel • Component-wise Peptide-Protein matrix: • Selected, Dominant, Equivalent, Contained • Selected protein accessions: • …plus equivalents

  15. Conclusions • Many components are clear • Doesn't matter what technique is used • Traditional techniques do not handle the second protein in a component well • A single additional peptide should not force a protein into the solution • Explain only the true PSM %: • Determine protein criteria first • Adjust PSM filter until explained peptides match
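
The last two bullets describe a tuning loop: fix the protein criteria, then vary the PSM filter until the fraction of PSMs left unexplained is close to the filter's FDR. A rough sketch of that loop follows, assuming each PSM carries a q-value and that protein inference is supplied as a callable. The names `match_fdr_to_uncovered` and `toy_infer`, the q-values, and the stand-in inference rule are hypothetical and chosen only to keep the example self-contained.

```python
def match_fdr_to_uncovered(psms, candidate_fdrs, infer):
    """Find the FDR threshold whose uncovered PSM fraction is closest to it.

    psms: list of (peptide, q_value) pairs.
    infer: callable taking {peptide: PSM count} and returning the set of
           peptides explained by the selected proteins.
    Returns (fdr, uncovered_fraction), or None if no threshold passes any PSM.
    """
    best = None
    for fdr in candidate_fdrs:
        counts = {}
        for peptide, q_value in psms:
            if q_value <= fdr:
                counts[peptide] = counts.get(peptide, 0) + 1
        total = sum(counts.values())
        if total == 0:
            continue
        explained = sum(counts[p] for p in infer(counts) if p in counts)
        uncovered = 1 - explained / total
        gap = abs(uncovered - fdr)
        if best is None or gap < best[0]:
            best = (gap, fdr, uncovered)
    return None if best is None else (best[1], best[2])


def toy_infer(counts):
    """Stand-in for protein inference: explain peptides with 2 or more PSMs."""
    return {peptide for peptide, n in counts.items() if n >= 2}


# Hypothetical PSM list: (peptide, q-value).
psms = (
    [("p1", 0.001)] * 50
    + [("p2", 0.005)] * 48
    + [("p3", 0.01)]
    + [(f"x{i}", 0.05) for i in range(5)]
)
chosen = match_fdr_to_uncovered(psms, [0.01, 0.05], toy_infer)
```

In this toy case the 1% threshold leaves 1 of 99 PSMs uncovered, a closer match than the 5% threshold, which leaves 6 of 104 uncovered.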
