20 likes | 189 Views
Georgetown University. D 1. D 2. D 3. D 4. I 0. I 1. I 2. I 4. I 3. Begin. b 1. y 1. b 2. y 2. End. Improving the Sensitivity of Peptide Identification With Meta-Search and Machine Learning. Nathan J. Edwards 1 , Xue Wu 2 , Chau-Wen Tseng 2.
E N D
Georgetown University D1 D2 D3 D4 I0 I1 I2 I4 I3 Begin b1 y1 b2 y2 End Improving the Sensitivity of Peptide Identification With Meta-Search and Machine Learning Nathan J. Edwards1, Xue Wu2, Chau-Wen Tseng2 1Georgetown University Medical Center; 2University of Maryland, College Park Annual Meeting, 2008 Introduction Peptide Identification Meta-Search HMMatch Spectral Matching • We use a variety of techniques, from sequence enumeration and meta-search to machine learning to increase the number high-confidence peptide identifications from large tandem mass-spectra datasets. • These techniques seek to improve the number of peptide identifications made at a given level of statistical significance. • We show that these techniques can improve identification sensitivity significantly. Meta-search with four search engines;Target & decoy searches automatically. 11% 17% 6% 94% 8% 0% 11% 86% 17% 0% 6% 92% 19% Heterogeneous compute resources I0 I1 y1 I2 b2 I3 y2 I4 b3 I5 y3 I6 b1 Edwards Lab Scheduler & 48+ CPUs NSF TeraGrid 1000+ CPUs Scales to 100’s of simultaneous searches Free, instantregistration Simple search description Secure communication Peptide Sequence Databases UMIACS 250+ CPUs Web-service API for all data All peptide sequences from: • Six-frame translation of EST and HTC sequences; • Three-frame translation of mRNA sequences; • All IPI, RefSeq, Genbank, Vega, EMBL, HInvDB, SwissProt and TrEMBL proteins; • SwissProt variants, splices, conflicts, mature isoforms grouped by gene-cluster & compressed, as FASTA. PepArML - Unsupervised Machine-Learning Combiner Conclusions Q-TOF • Peptide sequence databases, meta-search engine, machine-learning combiner available from:http://edwardslab.bmcb.georgetown.edu • Application of enumeration, meta-search, and machine-learning can significantly improve the sensitivity of peptide identification. PepSeqDB Release 1.2 LTQ References MALDI • Edwards. Novel Peptide Identification using Expressed Sequence Tags and Sequence Database Compression. Mol. Sys. Biol. 2007. • Wu, Tseng, Edwards. HMMatch: Peptide Identification by Spectral Matching of Tandem Mass Spectra using Hidden Markov Models. J. Comp. Biol.2007. • Wu, Tseng, Rudnick, Balgley, Edwards. PepArML: An Unsupervised, Model-Free, Combining, Peptide Identification Arbiter for Tandem Mass Spectra via Machine Learning. In preparation. U*-TMO U-TMO C-TMO Schedule: Automated rebuild every few months. Coming soon: Fast peptide to gene and source sequence mapping using suffix-trees and gene sequence-groups. H Legend: Heuristic: H; Classifier w/ 5-fold-CV: C-T, C-M, C-O, C-TM, C-TO, C-MO, C-TMO; Unsupervised classifier w/ 5-fold-CV: U-TMO; Unsupervised classifier w/ no-CV: U*-TMO. False Positive Rate Iteration For five columns, line up guides with these boxes For three columns, line up guides with these boxes