
Georgetown University. Improving the Sensitivity of Peptide Identification from Tandem Mass Spectra using Meta-Search, Grid-Computing, and Machine-Learning. Nathan J. Edwards, Georgetown University Medical Center. Annual Meeting, 2009. Introduction.



Presentation Transcript


Georgetown University

Improving the Sensitivity of Peptide Identification from Tandem Mass Spectra using Meta-Search, Grid-Computing, and Machine-Learning
Nathan J. Edwards, Georgetown University Medical Center
Annual Meeting, 2009

Introduction

The PepArML meta-search engine provides:
• A unified MS/MS search interface for Mascot, X!Tandem, KScore, OMSSA, and MyriMatch,
• Search job scheduling on multiple large-scale heterogeneous computational grids,
• Unsupervised, model-free result combining using machine-learning (PepArML [1])

The PepArML meta-search engine improves peptide identification sensitivity, significantly increasing the number of peptide ids at 10% FDR.

Peptide Identification Meta-Search via Grid-Computing

[Diagram labels]
• Meta-search with five search engines; automatic target & decoy searches.
• X!Tandem, KScore, OMSSA, MyriMatch, Mascot (1 core).
• Tandem, KScore, OMSSA.
• Heterogeneous compute resources
• Secure communication
• Scales to 250+ simultaneous searches
• Edwards Lab Scheduler & 48+ CPUs
• NSF TeraGrid 1000+ CPUs
• Free, instant registration
• Simple search description
• Job management
• Result combining

Unified MS/MS Search Interface

• Automatic search engine configuration and execution, parameterized by:
  • Instrument & proteolytic agent
  • Fixed and variable modifications
  • Protein sequence database & MS/MS spectra file
  • Peptide candidate selection

MS/MS Spectra Reformatting

• Charge and precursor enumeration for peptide candidate selection (for charge & 13C peak correction)
• Search engine formatting constraints (MGF/mzXML)
• Consistent MS/MS spectrum identifier tracking
• Spectrum file “chunking”

Peptide Candidate Selection

• Missed cleavages, specific or semi-specific proteolysis
• Precursor matching parameters, including
  • Precursor mass tolerance & 13C peak correction
  • Charge state guessing and/or enumeration

Real Data: Peptide Atlas – A8_IP

Peptide Atlas A8_IP LTQ MS/MS Dataset
• Tryptic search of Human ESTs using PepSeqDB [2]
• 107084 spectra searched ~ 26 times:
  - Target + 2 decoys, 5 engines, 1+ vs 2+/3+ charge
• 8685 search jobs, 25.7 days of total CPU time.
• 5211 TeraGrid TKO jobs in < 2 hours (143 machines)
• Total elapsed time (Mascot bottleneck): < 26 hours.

PepArML – Unsupervised Machine-Learning Combiner

[Figure panels: Q-TOF, MALDI, LTQ. Labeled curves: U*-TMO, C-TMO, U-TMO, Heuristic.]

Legend: Tandem, Mascot, OMSSA: T, M, O; Mascot w/ Peptide Prophet: M*; Heuristic: H; Classifier w/ 5-fold-CV: C-T, C-M, C-O, C-TM, C-TO, C-MO, C-TMO; Unsupervised classifier w/ 5-fold-CV: U-TMO; Unsupervised classifier w/ no-CV: U*-TMO.

Conclusions

• Application of meta-search, grid-computing, and machine-learning can significantly improve the sensitivity of peptide identification.
• The PepArML meta-search engine is publicly available, free of charge, on-line from: http://edwardslab.bmcb.georgetown.edu

References

[1] N. Edwards, X. Wu, and C.-W. Tseng. "An Unsupervised, Model-Free, Machine-Learning Combiner for Peptide Identifications from Tandem Mass Spectra." Clinical Proteomics 5.1 (2009).
[2] N.J. Edwards. "Novel Peptide Identification using Expressed Sequence Tags and Sequence Database Compression." Molecular Systems Biology 3.102 (2007).
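The spectrum file “chunking” step listed under MS/MS Spectra Reformatting can be illustrated with a minimal sketch. The poster does not describe how PepArML splits files, so everything here is an assumption for illustration: the function name `chunk_mgf`, the `spectra_per_chunk` parameter, and the choice to split MGF input on its standard `BEGIN IONS` / `END IONS` delimiters so that each chunk could be submitted as a separate search job.

```python
from typing import Iterator, List


def chunk_mgf(lines: List[str], spectra_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of MGF lines, each holding at most spectra_per_chunk spectra.

    Hypothetical helper, not PepArML's implementation. Lines outside
    BEGIN IONS / END IONS blocks (blank lines, file-level headers) are dropped.
    """
    if spectra_per_chunk < 1:
        raise ValueError("spectra_per_chunk must be at least 1")
    chunk: List[str] = []
    count = 0
    in_spectrum = False
    for line in lines:
        stripped = line.strip()
        if stripped == "BEGIN IONS":
            in_spectrum = True
        if in_spectrum:
            chunk.append(line)
        if stripped == "END IONS":
            in_spectrum = False
            count += 1
            if count == spectra_per_chunk:
                # Chunk is full: hand it off and start a new one.
                yield chunk
                chunk, count = [], 0
    if chunk:
        # Final, possibly smaller, chunk.
        yield chunk


# Usage: five toy spectra split into chunks of two.
mgf: List[str] = []
for i in range(5):
    mgf += ["BEGIN IONS", f"TITLE=scan{i}", "PEPMASS=500.0", "100.0 1.0", "END IONS", ""]
chunks = list(chunk_mgf(mgf, 2))
```

Because each spectrum keeps its own `TITLE` line inside its chunk, results from separately searched chunks can still be traced back to the original spectrum, which is the kind of consistent identifier tracking the poster lists alongside chunking.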
