Improving the Sensitivity of Peptide Identification from Tandem Mass Spectra using Meta-Search, Grid-Computing, and Machine-Learning

Nathan J. Edwards, Georgetown University Medical Center
Annual Meeting, 2009

Introduction

The PepArML meta-search engine provides:
• A unified MS/MS search interface for Mascot, X!Tandem, KScore, OMSSA, and MyriMatch,
• Search job scheduling on multiple large-scale heterogeneous computational grids,
• Unsupervised, model-free result combining using machine-learning (PepArML [1])

The PepArML meta-search engine improves peptide identification sensitivity, significantly increasing the number of peptide ids at 10% FDR.

Peptide Identification Meta-Search via Grid-Computing

• Meta-search with five search engines; automatic target & decoy searches
• Heterogeneous compute resources
• Secure communication
• Scales to 250+ simultaneous searches
• Edwards Lab Scheduler & 48+ CPUs
• NSF TeraGrid 1000+ CPUs
• Free, instant registration
• Simple search description
• Job management
• Result combining

Figure labels: X!Tandem, KScore, OMSSA, MyriMatch, Mascot (1 core); Tandem, KScore, OMSSA, MyriMatch.

PeptideMapper Web-Service

• Ad-hoc, one-click mapping of peptides to protein and transcript sequence evidence, and genomic loci.
• Interactive, SOAP, HTTP → CSV, XML, BED

Unified MS/MS Search Interface

• Automatic search engine configuration and execution, parameterized by:
  • Instrument & proteolytic agent
  • Fixed and variable modifications
  • Protein sequence database & MS/MS spectra file
  • Peptide candidate selection

MS/MS Spectra Reformatting

• Charge and precursor enumeration for peptide candidate selection (for charge & 13C peak correction)
• Search engine formatting constraints (MGF/mzXML)
• Consistent MS/MS spectrum identifier tracking
• Spectrum file “chunking”

Peptide Candidate Selection

• Missed cleavages, specific or semi-specific proteolysis
• Precursor matching parameters, including
  • Precursor mass tolerance & 13C peak correction
  • Charge state guessing and/or enumeration

PepArML: Unsupervised Machine-Learning Combiner

Peptide Atlas A8_IP LTQ MS/MS Dataset

• Tryptic search of Human ESTs using PepSeqDB [2]
• 107084 spectra searched ~ 26 times:
  - Target + 2 decoys, 5 engines, 1+ vs 2+/3+ charge
• 8685 search jobs, 25.7 days of total CPU time.
• 5211 TeraGrid TKO jobs in < 2 hours (143 nodes)
• Total elapsed time (Mascot bottleneck): < 26 hours.

Conclusions

• Application of meta-search, grid-computing, and machine-learning can significantly improve the sensitivity of peptide identification.
• The PepArML meta-search engine is publicly available, free of charge, on-line from: http://edwardslab.bmcb.georgetown.edu

References

[1] N. Edwards, X. Wu, and C.-W. Tseng. "An Unsupervised, Model-Free, Machine-Learning Combiner for Peptide Identifications from Tandem Mass Spectra." Clinical Proteomics 5.1 (2009).
[2] N.J. Edwards. "Novel Peptide Identification using Expressed Sequence Tags and Sequence Database Compression." Molecular Systems Biology 3.102 (2007).
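Illustration: missed cleavages in peptide candidate selection

The candidate-selection parameters listed on the poster include missed cleavages under specific proteolysis. The following is a minimal sketch of what that parameter controls, not the PepArML implementation. The function name `tryptic_peptides`, its arguments, and the "cleave after K or R unless followed by P" rule are assumptions made for illustration; semi-specific proteolysis is not covered.

```python
import re


def tryptic_peptides(protein, max_missed=1, min_len=1):
    """Enumerate fully tryptic peptides with up to max_missed missed cleavages.

    Hypothetical helper: assumes cleavage after K or R, except before P.
    """
    # End positions of every cleavage site inside the sequence.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    # Fragment boundaries: sequence start, internal sites, sequence end.
    bounds = [0] + [s for s in sites if s < len(protein)] + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Joining k+1 adjacent fragments corresponds to k missed cleavages.
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            pep = protein[bounds[i]:bounds[j]]
            if len(pep) >= min_len:
                peptides.append(pep)
    return peptides


if __name__ == "__main__":
    for pep in tryptic_peptides("AAAKCCCRPDDDKEEE", max_missed=1):
        print(pep)
```

Raising `max_missed` enlarges the candidate set for every spectrum, which is one reason it is exposed as a search parameter rather than fixed.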
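Illustration: charge enumeration and 13C peak correction

The poster lists charge state enumeration and 13C peak correction among the precursor matching parameters. The sketch below shows one way such an enumeration could work: each assumed charge and each assumed number of 13C atoms yields a different candidate neutral mass for the same observed m/z. The function names, default charges, and tolerance are illustrative assumptions, not PepArML settings.

```python
PROTON = 1.007276     # proton mass in Da
C13_DELTA = 1.003355  # mass difference between 13C and 12C in Da


def candidate_masses(mz, charges=(2, 3), max_c13=1):
    """Return (charge, n_13C, neutral monoisotopic mass) for each hypothesis."""
    out = []
    for z in charges:
        neutral = (mz - PROTON) * z  # neutral mass if the peak is charge z
        for k in range(max_c13 + 1):
            # If the selected peak carried k 13C atoms, the monoisotopic
            # mass is lower by k * C13_DELTA.
            out.append((z, k, neutral - k * C13_DELTA))
    return out


def precursor_matches(peptide_mass, mz, tol_da=0.5, charges=(2, 3), max_c13=1):
    """True if any enumerated hypothesis is within tol_da of the peptide mass."""
    return any(
        abs(mass - peptide_mass) <= tol_da
        for _, _, mass in candidate_masses(mz, charges, max_c13)
    )


if __name__ == "__main__":
    for z, k, mass in candidate_masses(500.0):
        print(f"charge {z}+, {k} x 13C: {mass:.4f} Da")
```

Enumerating hypotheses this way lets a narrow mass tolerance be used without losing spectra whose precursor was picked from the first isotope peak or assigned an uncertain charge.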
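Illustration: counting peptide ids at a fixed FDR

The poster reports sensitivity as the number of peptide ids at 10% FDR, using automatic target and decoy searches. The sketch below shows a common, simple target-decoy estimate (decoy hits divided by target hits above a score cutoff). It assumes a single decoy search of the same size as the target search and scores where higher is better; the poster's setup used two decoys, and PepArML's own estimator may differ. The function name `fdr_cutoff` is hypothetical.

```python
def fdr_cutoff(psms, max_fdr=0.10):
    """Find the most permissive score cutoff with estimated FDR <= max_fdr.

    psms: iterable of (score, is_decoy) pairs, higher score = better match.
    Returns (cutoff_score, accepted_target_count), or None if no cutoff works.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among everything scoring at least `score`.
        if targets and decoys / targets <= max_fdr:
            best = (score, targets)
    return best


if __name__ == "__main__":
    # Toy data: ten high-scoring targets, then a mix of decoys and targets.
    psms = [(s, False) for s in range(20, 10, -1)]
    psms += [(10, True), (9, False), (8, True)]
    print(fdr_cutoff(psms, max_fdr=0.10))
```

Comparing the accepted target count from one engine against the count from combined results, at the same `max_fdr`, is the kind of sensitivity comparison the poster describes.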