Generalized Protein Parsimony and Spectral Counting for Functional Enrichment Analysis. Nathan Edwards, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center
Systems Biology • molecular biology ↕ phenotype • molecular biology ↕ biology • Structured High-Throughput Experiments: • Proteomics • Sequencing • Microarrays • Metabolomics • Knowledge Databases: • Localization • Function • Process • Interactions • Pathway • Mutation • Mathematical Models • Functional Annotation Enrichment
Functional Annotation Enrichment • Draw 10 of 50! • In any draw, we expect: • ~5 "evens", ~2 "≤ 10", etc. • Each ball is equally likely • Balls are independent • p-value measures surprise! • For transcriptomics: • Genes ↔ Balls • Genome ↔ Tumbler • Diff. Expr. ↔ Draw • Annotation ↔ "evens",…
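A minimal sketch of the lottery arithmetic, assuming the standard hypergeometric model (balls drawn without replacement). `expected_count` and `hypergeom_upper_tail` are illustrative names, not taken from any particular tool. For transcriptomics, read N as the genome's gene count, K as the genes carrying an annotation, n as the differentially expressed genes, and x as the annotated genes among them.

```python
from math import comb


def expected_count(N: int, K: int, n: int) -> float:
    """Mean number of annotated balls when drawing n of N balls, K of them annotated."""
    return n * K / N


def hypergeom_upper_tail(x: int, N: int, K: int, n: int) -> float:
    """One-sided p-value P(X >= x) for X ~ Hypergeometric(N, K, n).

    Sums exact integer terms C(K, i) * C(N - K, n - i) for i = x..min(K, n)
    and divides by C(N, n) once, so intermediate values stay exact.
    """
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(x, 0), min(K, n) + 1))
    return numerator / comb(N, n)


# Lottery from the slide: draw 10 of 50 balls numbered 1..50
N, n = 50, 10
print(expected_count(N, 25, n))           # 25 even balls -> 5.0 expected
print(expected_count(N, 10, n))           # 10 balls numbered <= 10 -> 2.0 expected
print(hypergeom_upper_tail(8, N, 25, n))  # surprise at 8+ evens, about 0.0369
```

Summing exact integers and dividing once avoids the underflow a naive float product of probabilities would hit for genome-sized N.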
Why not in proteomics? • Double counting and false positives… • …due to traditional protein inference • Proteomics cannot see all proteins… • …proteins are not equally likely to be drawn • Good relative abundance estimates are hard… • …extra chemistries, workflows, and software • …missing values are particularly problematic
In proteomics… • Double counting and false positives… • Use generalized protein parsimony • Proteomics cannot see all proteins… • Use identified proteins as background • Good relative abundance estimates are hard… • Model differential spectral counts directly
Ignore some PSMs • FDR filtering leaves some false PSMs • Enforce strict protein inference criteria • Leave some PSMs uncovered • [Figure: PSMs ↔ Proteins (10%, 90%)]
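The deck does not spell out the generalized parsimony algorithm, so the following is only a hypothetical illustration of "enforce strict criteria, leave some PSMs uncovered": a greedy elimination that keeps proteins with at least k peptides unique among the surviving proteins (compare the 3-unique and 2-unique criteria used later in this deck). The function name, input format, and tie-breaking rule are all assumptions, and PSM-level FDR filtering is assumed to have happened upstream.

```python
from collections import defaultdict


def k_unique_proteins(peptide_to_proteins: dict[str, set[str]], k: int) -> tuple[set[str], set[str]]:
    """Hypothetical k-unique filter (not necessarily the deck's algorithm).

    Repeatedly drops the protein with the fewest unique peptides until every
    remaining protein has >= k peptides that map to no other remaining protein.
    Returns (kept proteins, peptides left uncovered by any kept protein).
    """
    proteins = set().union(*peptide_to_proteins.values())
    while True:
        unique = defaultdict(int)  # protein -> number of peptides unique to it
        for prots in peptide_to_proteins.values():
            alive = prots & proteins
            if len(alive) == 1:
                unique[next(iter(alive))] += 1
        weak = [p for p in proteins if unique[p] < k]
        if not weak:
            break
        # Remove the weakest protein; ties broken by accession for determinism.
        proteins.remove(min(weak, key=lambda p: (unique[p], p)))
    uncovered = {pep for pep, prots in peptide_to_proteins.items() if not (prots & proteins)}
    return proteins, uncovered


peptides = {
    "PEPA1": {"P1"}, "PEPA2": {"P1"}, "PEPA3": {"P1"},
    "PEPS": {"P1", "P2"},  # shared between P1 and P2
    "PEPB1": {"P2"},
    "PEPC1": {"P3"}, "PEPC2": {"P3"},
}
kept, uncovered = k_unique_proteins(peptides, k=2)
print(sorted(kept), sorted(uncovered))  # ['P1', 'P3'] ['PEPB1']
```

Dropping P2 turns the shared peptide into a unique one for P1, so it is not double counted, while PEPB1 (and its PSMs) is deliberately left uncovered.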
Plasma membrane enrichment • Pellicle enrichment of plasma membrane • Choksawangkarn et al. JPR 2013 (Fenselau Lab) • Six replicate LC-MS/MS analyses each • Cell lysate (44,861 MS/MS) • Fe₃O₄-Al₂O₃ pellicle (21,871 MS/MS) • 625 3-unique proteins to match 10% FDR: • Lysate: 18,976 PSMs; Pellicle: 13,723 PSMs • 89 proteins with significantly increased counts (p < 10⁻⁵)
Plasma membrane enrichment • Na⁺/K⁺-ATPase subunit alpha-1 (P05023): • Lysate: 1; Pellicle: 90; p-value: 5.2 × 10⁻³³ • Transferrin receptor protein 1 (P02786): • Lysate: 17; Pellicle: 63; p-value: 2.0 × 10⁻¹¹ • DAVID Bioinformatics analysis (89/625): • Plasma membrane (GO:0005886): 29 (5.2 × 10⁻⁵) • Transmembrane (SwissProtKW): 24 (1.3 × 10⁻⁶) • Transmembrane (SwissProtKW): • Lysate: 524; Pellicle: 1335; p-value: 2.6 × 10⁻¹⁵⁸
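The slides do not state the exact test, but a one-sided hypergeometric test on spectral counts, treating a protein's spectra as a draw from all lysate and pellicle PSMs, is one plausible reading of "model differential spectral counts directly". With the PSM totals from the previous slide (18,976 lysate, 13,723 pellicle), this sketch gives values close to the reported p-values for P05023 and P02786. `spectral_count_pvalue` and its parameter names are illustrative assumptions.

```python
from math import comb


def spectral_count_pvalue(count_b: int, count_a: int, total_b: int, total_a: int) -> float:
    """One-sided hypergeometric p-value that sample B holds >= count_b of a protein's
    count_a + count_b spectra, when samples A and B contain total_a and total_b PSMs."""
    N = total_a + total_b  # all PSMs in both samples
    K = count_a + count_b  # spectra assigned to this protein
    n = total_b            # PSMs that fall in sample B
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(count_b, min(K, n) + 1))
    return numerator / comb(N, n)  # exact big-integer ratio, rounded once to float


LYSATE_PSMS, PELLICLE_PSMS = 18_976, 13_723
# Na/K-ATPase subunit alpha-1 (P05023): lysate 1, pellicle 90
print(spectral_count_pvalue(90, 1, PELLICLE_PSMS, LYSATE_PSMS))   # about 5.2e-33
# Transferrin receptor protein 1 (P02786): lysate 17, pellicle 63
print(spectral_count_pvalue(63, 17, PELLICLE_PSMS, LYSATE_PSMS))  # about 2.0e-11
```

Conditioning on the protein's total count means no separate normalization step is needed: the sample PSM totals set the null split.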
Nascent polypeptide-associated complex subunit alpha: 7.3 × 10⁻⁸
Pyruvate kinase isozymes M1/M2: 2.5 × 10⁻⁵
Summary • Functional annotation enrichment for proteomics too: • Careful counting (generalized parsimony) • Differential abundance by spectral counts • Use (multivariate-)hypergeometric model for • Differential abundance by spectral counts • Proteoform detection
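For more than two samples, the hypergeometric model generalizes to the multivariate hypergeometric distribution. The sketch below only evaluates its probability mass for one observed split of a protein's spectra across samples. How the deck applies it to proteoform detection is not shown here, and the function name and toy numbers are illustrative.

```python
from math import comb, prod


def multivariate_hypergeom_pmf(counts: list[int], totals: list[int]) -> float:
    """P(a protein's sum(counts) spectra split across samples exactly as `counts`),
    when sample j holds totals[j] of all sum(totals) PSMs."""
    if len(counts) != len(totals):
        raise ValueError("counts and totals must have the same length")
    K = sum(counts)  # spectra assigned to the protein
    N = sum(totals)  # PSMs across all samples
    return prod(comb(t, c) for t, c in zip(totals, counts)) / comb(N, K)


# Three hypothetical samples with 3, 4 and 5 PSMs; a protein seen in 2, 1 and 1 of them
print(multivariate_hypergeom_pmf([2, 1, 1], [3, 4, 5]))  # 60/495, about 0.1212
```

With two samples this reduces to the ordinary hypergeometric probability used for two-way spectral count comparisons.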
HER2/Neu Mouse Model of Breast Cancer • Paulovich et al. JPR 2007 • Study of normal and tumor mammary tissue by LC-MS/MS • 1.4 million MS/MS spectra • Peptide-spectrum assignments: • Normal samples (N_n): 161,286 (49.7%) • Tumor samples (N_t): 163,068 (50.3%) • 4,270 proteins identified in total • 2-unique generalized protein parsimony