Georgetown University
Improving the Sensitivity of Peptide Identification from Tandem Mass Spectra using Meta-Search, Grid-Computing, and Machine-Learning
Nathan J. Edwards, Georgetown University Medical Center
Annual Meeting, 2009
Introduction

The PepArML meta-search engine provides:
• A unified MS/MS search interface for Mascot, X!Tandem, KScore, OMSSA, and MyriMatch,
• Search job scheduling on multiple large-scale heterogeneous computational grids,
• Unsupervised, model-free result combining using machine-learning (PepArML [1]).

The PepArML meta-search engine improves peptide identification sensitivity, significantly increasing the number of peptide ids at 10% FDR.

Peptide Identification Meta-Search via Grid-Computing

• Meta-search with five search engines; automatic target & decoy searches.
• Search engine labels in the diagram: X!Tandem, KScore, OMSSA, MyriMatch, Mascot (1 core); Tandem, KScore, OMSSA.
• Heterogeneous compute resources
• Secure communication
• Scales to 250+ simultaneous searches
• Edwards Lab Scheduler & 48+ CPUs
• NSF TeraGrid 1000+ CPUs
• Free, instant registration
• Simple search description
• Job management
• Result combining

Unified MS/MS Search Interface

• Automatic search engine configuration and execution, parameterized by:
  - Instrument & proteolytic agent
  - Fixed and variable modifications
  - Protein sequence database & MS/MS spectra file
  - Peptide candidate selection

MS/MS Spectra Reformatting

• Charge and precursor enumeration for peptide candidate selection (for charge & 13C peak correction)
• Search engine formatting constraints (MGF/mzXML)
• Consistent MS/MS spectrum identifier tracking
• Spectrum file “chunking”

Peptide Candidate Selection

• Missed cleavages, specific or semi-specific proteolysis
• Precursor matching parameters, including
  - Precursor mass tolerance & 13C peak correction
  - Charge state guessing and/or enumeration

Real Data: Peptide Atlas – A8_IP

Peptide Atlas A8_IP LTQ MS/MS Dataset
• Tryptic search of Human ESTs using PepSeqDB [2]
• 107084 spectra searched ~ 26 times:
  - Target + 2 decoys, 5 engines, 1+ vs 2+/3+ charge
• 8685 search jobs, 25.7 days of total CPU time.
• 5211 TeraGrid TKO jobs in < 2 hours (143 machines)
• Total elapsed time (Mascot bottleneck): < 26 hours.

PepArML – Unsupervised Machine-Learning Combiner

[Figure panels: Q-TOF, MALDI, LTQ. Curve labels: U*-TMO, C-TMO, U-TMO, Heuristic.]

Legend: Tandem, Mascot, OMSSA: T, M, O; Mascot w/ Peptide Prophet: M*; Heuristic: H; Classifier w/ 5-fold-CV: C-T, C-M, C-O, C-TM, C-TO, C-MO, C-TMO; Unsupervised classifier w/ 5-fold-CV: U-TMO; Unsupervised classifier w/ no-CV: U*-TMO.

Conclusions

• Application of meta-search, grid-computing, and machine-learning can significantly improve the sensitivity of peptide identification.
• The PepArML meta-search engine is publicly available, free of charge, on-line from: http://edwardslab.bmcb.georgetown.edu

References

[1] N. Edwards, X. Wu, and C.-W. Tseng. "An Unsupervised, Model-Free, Machine-Learning Combiner for Peptide Identifications from Tandem Mass Spectra." Clinical Proteomics 5.1 (2009).
[2] N.J. Edwards. "Novel Peptide Identification using Expressed Sequence Tags and Sequence Database Compression." Molecular Systems Biology 3.102 (2007).
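The poster lists spectrum file "chunking" and consistent MS/MS spectrum identifier tracking as reformatting steps but does not show how they are done. The following is a minimal sketch of one way they could fit together, assuming MGF input in which each spectrum is delimited by `BEGIN IONS` and `END IONS`. The function names `iter_mgf_spectra` and `chunk_spectra` are hypothetical, as is the choice of a zero-based running index as the tracked identifier; none of this is taken from the PepArML code.

```python
from typing import Iterable, Iterator, List, Tuple

Spectrum = List[str]  # the lines of one BEGIN IONS ... END IONS block


def iter_mgf_spectra(lines: Iterable[str]) -> Iterator[Spectrum]:
    """Yield each BEGIN IONS ... END IONS block as a list of lines."""
    block = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip() == "BEGIN IONS":
            block = [line]            # start a new spectrum
        elif block is not None:
            block.append(line)
            if line.strip() == "END IONS":
                yield block           # spectrum complete
                block = None


def chunk_spectra(
    lines: Iterable[str], chunk_size: int
) -> Iterator[List[Tuple[int, Spectrum]]]:
    """Group spectra into chunks of at most chunk_size.

    Each spectrum is paired with its index in the original file
    (an assumed identifier scheme), so results from separately
    searched chunks can be mapped back to the same spectrum.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk: List[Tuple[int, Spectrum]] = []
    for index, block in enumerate(iter_mgf_spectra(lines)):
        chunk.append((index, block))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk                   # final, possibly shorter, chunk
```

Because the index is assigned before the file is split, every chunk carries identifiers that stay valid whichever search engine or grid node processes it.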
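The poster reports peptide ids at 10% FDR and searches each spectrum against a target and two decoys, but it does not state the estimator used. As an illustration only, the sketch below assumes the common target-decoy estimate in which the decoy count above a score threshold is divided by the number of decoy searches and then by the target count. The function name `ids_at_fdr` and its parameters are hypothetical.

```python
from typing import Sequence


def ids_at_fdr(
    target_scores: Sequence[float],
    decoy_scores: Sequence[float],
    max_fdr: float = 0.10,
    decoy_searches: int = 2,
) -> int:
    """Count target ids at the most permissive score threshold whose
    estimated FDR is at most max_fdr. Higher scores are assumed better.

    Assumed estimator: FDR = (decoys / decoy_searches) / targets,
    with both counts taken at or above the threshold.
    """
    if decoy_searches < 1:
        raise ValueError("decoy_searches must be at least 1")
    hits = [(s, False) for s in target_scores]
    hits += [(s, True) for s in decoy_scores]
    hits.sort(key=lambda h: h[0], reverse=True)

    targets = decoys = best = 0
    i = 0
    while i < len(hits):
        score = hits[i][0]
        # Consume every hit tied at this score before evaluating,
        # so a threshold never splits a group of equal scores.
        while i < len(hits) and hits[i][0] == score:
            if hits[i][1]:
                decoys += 1
            else:
                targets += 1
            i += 1
        if targets and (decoys / decoy_searches) / targets <= max_fdr:
            best = targets
    return best
```

Scanning all thresholds and keeping the largest passing target count, instead of stopping at the first failure, allows for the estimated FDR dipping back under the limit at lower scores.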