Top-down characterization of proteins in bacteria with unsequenced genomes Nathan Edwards Georgetown University Medical Center
Microorganism Identification • Homeland-security/defense applications • Long history of fingerprinting approaches • Clinical applications in strain identification: • Selection of treatment and/or antibiotics • New applications in microbiome analysis: • Bacterial colonies in gut, .... • Chronic wound infections • Compete with genomic approaches? • PCR, Next-gen sequencing • Primary sales-pitch is speed.
Microorganism Identifications • Match spectra with proteome (or genome) sequence for (species) identity • Provides robust match with respect to instrumentation and sample prep • Many bacteria will never be sequenced or "finished"... • Pathogen simulants, for example • ...but many have – about 2500 to date. • Can we use the available sequence to identify proteins from unknown, unsequenced bacteria? • Yes, for some proteins in some organisms!
Intact protein LC-MS/MS • Crude cell lysate • Capillary HPLC, C8 column • LTQ-Orbitrap XL • Precursor scan: resolution 30,000 @ 400 m/z • Data-dependent precursor selection: 5 most abundant ions, 10 second dynamic exclusion, charge state +3 or greater • CAD product ion scan: resolution 15,000 @ 400 m/z
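The data-dependent selection rule on this slide (5 most abundant ions, 10 second dynamic exclusion, charge state +3 or greater) can be sketched in a few lines. This is an illustrative model only, not the instrument's acquisition logic: the function name `select_precursors`, the peak tuple layout, and the `mz_tol` window used to decide whether two m/z values are "the same" ion are all assumptions.

```python
def select_precursors(peaks, now, excluded, top_n=5, min_charge=3,
                      exclusion_s=10.0, mz_tol=0.01):
    """Pick up to top_n precursors for fragmentation from one precursor scan.

    peaks    : list of (mz, intensity, charge) tuples
    now      : scan time in seconds
    excluded : dict mapping a previously selected m/z to its selection time
    mz_tol   : hypothetical m/z window for dynamic exclusion (an assumption)
    """
    def is_excluded(mz):
        # An ion is excluded if it was selected less than exclusion_s ago.
        return any(abs(mz - old_mz) <= mz_tol and now - t < exclusion_s
                   for old_mz, t in excluded.items())

    # Keep only ions with sufficient charge that are not dynamically excluded.
    eligible = [p for p in peaks
                if p[2] >= min_charge and not is_excluded(p[0])]
    # Most abundant first, then keep the top_n.
    chosen = sorted(eligible, key=lambda p: p[1], reverse=True)[:top_n]
    for mz, _, _ in chosen:
        excluded[mz] = now  # start (or restart) the exclusion clock
    return chosen
```

Calling the function on successive scans with the same `excluded` dictionary shows the effect of dynamic exclusion: an abundant ion is fragmented once, skipped for the next 10 seconds, and becomes eligible again afterwards.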
Enterobacteriaceae Protein Sequences • Exhaustive set of all Enterobacteriaceae family protein sequences from • Swiss-Prot, TrEMBL, RefSeq, GenBank, and [CMR] • ...plus Glimmer3 predictions on RefSeq Enterobacteriaceae genomes • Primary and alternative translation start-sites • Filter for intact mass in range 1 kDa – 20 kDa • 253,626 distinct protein sequences, 256 species • Derived from "Rapid Microorganism Identification Database" (RMIDb.org) infrastructure.
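The intact-mass filter (1 kDa to 20 kDa) can be illustrated with a minimal sketch. The slide does not say whether monoisotopic or average masses were used, nor how the database was actually built, so the choice of monoisotopic residue masses and the function names `monoisotopic_mass` and `filter_by_intact_mass` are assumptions made for illustration.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O is added for the free N- and C-termini


def monoisotopic_mass(sequence):
    """Neutral monoisotopic mass of an unmodified protein sequence."""
    return sum(MONO[aa] for aa in sequence) + WATER


def filter_by_intact_mass(sequences, low=1000.0, high=20000.0):
    """Keep sequences whose intact mass lies in [low, high] Da."""
    return [s for s in sequences if low <= monoisotopic_mass(s) <= high]
```

Sequences containing non-standard residue codes (for example `X` or `U`) would raise a `KeyError` here; a real pipeline would need a policy for those.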
ProSightPC 2.0 • Product ion scan decharging • Enabled by high-resolution fragment ion measurements • THRASH algorithm implementation • Absolute mass search mode • 15 ppm fragment ion match tolerance • 250 Da precursor ion match tolerance • "Single-click" analysis of entire LC-MS/MS datafile.
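The two tolerances on this slide work on different scales: fragment ions are compared in parts per million, the precursor in absolute Daltons. The sketch below shows both comparisons on decharged (neutral) masses; it is not ProSightPC code, and the function names are hypothetical.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def fragment_matches(observed, theoretical, tol_ppm=15.0):
    """True if a fragment mass agrees within tol_ppm (15 ppm on the slide)."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def precursor_matches(observed, theoretical, tol_da=250.0):
    """True if a precursor mass agrees within tol_da (250 Da on the slide)."""
    return abs(observed - theoretical) <= tol_da
```

The wide precursor window lets a database protein be considered even when the observed intact mass differs from it by a substitution or modification, while the tight fragment window keeps individual fragment matches specific.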
Other tools • Explored using standard search engines: • Decharge and format as charge +1 spectrum • X!Tandem scoring plugin (ProSight, delta M) • OMSSA, Mascot, etc… • MS-Tools: • MS-Deconv, MS-TopDown • MS-Align, MS-Align+, MS-Align-E!
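"Decharge and format as charge +1 spectrum" amounts to converting each multiply charged peak to its neutral mass and then re-expressing it as a singly protonated ion. A minimal sketch of that arithmetic follows, assuming the charge state of each peak is already known (in practice it comes from a deconvolution step such as THRASH); the function names are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da


def neutral_mass(mz, charge):
    """Neutral mass of an ion observed at m/z with the given positive charge."""
    return (mz - PROTON) * charge


def to_singly_charged(mz, charge):
    """m/z the same species would have as a singly protonated (+1) ion."""
    return neutral_mass(mz, charge) + PROTON


def decharge_spectrum(peaks):
    """Convert (mz, intensity, charge) peaks to (+1 m/z, intensity), sorted by m/z."""
    return sorted((to_singly_charged(mz, z), inten) for mz, inten, z in peaks)
```

A spectrum reformatted this way can be handed to a search engine that expects singly charged fragments, which is the workaround the slide describes.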
CID Protein Fragmentation Spectrum from Y. rohdei • Match to Y. pestis 50S Ribosomal Protein L32
Phylogeny: Protein vs DNA • Figure panels: Protein Sequence; 16S-rRNA Sequence
Identified E. herbicola proteins • 30S Ribosomal Protein S19 • m/z 686.39, z 15+, E-value 1.96e-16, Δ 0.007 • Six proteins identified with |Δ| < 0.02
Identified E. herbicola proteins • DNA-binding protein HU-alpha • m/z 732.71, z 13+, E-value 7.5e-26, Δ −14.128 • Eight proteins identified with "large" |Δ|
Identified E. herbicola proteins • DNA-binding protein HU-alpha • m/z 732.71, z 13+, E-value 1.91e-58 • Use "Sequence Gazer" to find mass shift • ΔM mode can "tolerate" one shift for free!
ProSightPC: ΔM mode • Match a single "blind" mass-shift for free! • Diagram: the experimental precursor differs from the protein sequence mass by ΔM; fragments are matched as b- and y-ions (unshifted) and as b'- and y'-ions (shifted by ΔM) • Also: PIITA - Tsai et al. 2009
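The idea behind ΔM mode can be sketched as follows: in addition to the ordinary b- and y-ion masses, also accept fragments offset by the precursor mass difference ΔM, so that one unexplained mass shift anywhere in the sequence costs nothing. This is a simplified illustration, not ProSightPC's scoring. It assumes decharged neutral fragment masses, with b = sum of residue masses and y = sum of residue masses + H2O; the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def fragment_masses(sequence):
    """Neutral b- and y-ion masses; b[i] and y[i] cover i + 1 residues."""
    residues = [MONO[aa] for aa in sequence]
    b, y = [], []
    total = 0.0
    for mass in residues[:-1]:            # prefixes from the N-terminus
        total += mass
        b.append(total)
    total = WATER
    for mass in reversed(residues[1:]):   # suffixes from the C-terminus
        total += mass
        y.append(total)
    return b, y


def count_matches(observed, sequence, delta_m=0.0, tol_ppm=15.0):
    """Count observed fragment masses explained by b/y ions, optionally
    also allowing every fragment to carry the precursor shift delta_m."""
    b, y = fragment_masses(sequence)
    theoretical = b + y
    if delta_m:
        theoretical = theoretical + [m + delta_m for m in b + y]  # b', y'
    return sum(
        any(abs(obs - theo) / theo * 1e6 <= tol_ppm for theo in theoretical)
        for obs in observed
    )
```

Comparing `count_matches(observed, seq)` with `count_matches(observed, seq, delta_m=precursor_delta)` shows how many extra fragments the single blind shift explains. Which fragments switch from unshifted to shifted also brackets where in the sequence the shift sits.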
Identified E. herbicola proteins • DNA-binding protein HU-alpha • m/z 732.71, z 13+, E-value 7.5e-26, Δ −14.128 • Extract N- and C-terminus sequence supported by at least 3 b- or y-ions
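One way to read "supported by at least 3 b- or y-ions" is sketched below. The slide does not spell out the rule, so this interpretation is an assumption: an N-terminal prefix counts as supported when at least three matched, unshifted b-ions are at least as long as that prefix (each such ion confirms the total mass of the residues it contains), and likewise for the C-terminal suffix and y-ions. The function name and inputs are hypothetical.

```python
def supported_termini(sequence, b_lengths, y_lengths, min_ions=3):
    """Return (N-terminal prefix, C-terminal suffix) of the database sequence
    that is covered by at least min_ions matched b- or y-ions.

    b_lengths : residue counts of matched (unshifted) b-ions
    y_lengths : residue counts of matched (unshifted) y-ions
    """
    def extent(lengths):
        # Longest length L such that at least min_ions ions have length >= L.
        ordered = sorted(set(lengths), reverse=True)
        return ordered[min_ions - 1] if len(ordered) >= min_ions else 0

    n_len = extent(b_lengths)
    c_len = extent(y_lengths)
    return sequence[:n_len], sequence[len(sequence) - c_len:]
```

Under this reading, the extracted prefix and suffix are the parts of the database sequence that the spectrum confirms for the unsequenced organism, and those pieces are what would be passed on to phylogenetic analysis.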
Phylogenetic placement of E. herbicola • Figure panels: Cladogram; Phylogram • phylogeny.fr – "One-Click"
Genome annotation errors • UniProt: E. coli Cell division protein ZapB • 22 (371) E. coli strains • Need ±1500 Da precursor tolerance… • MQFRRGMTMSLEVFEKLEAKVQQAIDTITL… • 3 (204) 17 (166) 0 (2)
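A short calculation suggests why a wider precursor window is needed here. If translation actually starts at a downstream methionine, the observed protein is lighter than the annotated one by the mass of the skipped N-terminal residues. The sketch below computes that difference for each internal Met in the sequence fragment shown on the slide. It uses monoisotopic residue masses and ignores initiator-Met removal and any modifications, all of which are simplifying assumptions; the function name is hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def alternative_start_offsets(sequence):
    """For each internal Met, return (0-based position, mass in Da of the
    residues that would be missing if translation started at that Met)."""
    offsets = []
    skipped = 0.0
    for pos, aa in enumerate(sequence):
        if aa == "M" and pos > 0:
            offsets.append((pos, skipped))
        skipped += MONO[aa]
    return offsets


# N-terminal fragment as shown on the slide (the sequence continues beyond it).
ZAPB_N_TERM = "MQFRRGMTMSLEVFEKLEAKVQQAIDTITL"
for position, mass in alternative_start_offsets(ZAPB_N_TERM):
    print(f"start at Met {position + 1}: annotated form is heavier by {mass:.2f} Da")
```

Both differences computed for this fragment exceed the 250 Da precursor tolerance used elsewhere in the talk, which is consistent with the slide's call for a ±1500 Da window.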
Conclusions • Protein identification for unsequenced organisms. • Identification and localization for sequence mutations and post-translational modifications. • Extraction of confidently established sequence suitable for phylogenetic analysis. • Genome annotation correction. • New paradigm for phylogenetic analysis?
Acknowledgements • Dr. Catherine Fenselau • Avantika Dhabaria, Joe Cannon*, Colin Wynne* • University of Maryland Biochemistry • Dr. Yan Wang • University of Maryland Proteomics Core • Dr. Art Delcher • University of Maryland CBCB • Funding: NIH/NCI