Top-down characterization of proteins in bacteria with unsequenced genomes

Nathan Edwards, Georgetown University Medical Center

Microorganism Identification • Homeland-security/defense applications • Long history of fingerprinting approaches • Clinical applications in strain identification: • Selection of treatment and/or antibiotics • New applications in microbiome analysis: • Bacterial colonies in gut, ... • Chronic wound infections • Compete with genomic approaches? • PCR, Next-gen sequencing • Primary sales-pitch is speed.

Microorganism Identifications • Match spectra with proteome (or genome) sequence for (species) identity • Provides robust match with respect to instrumentation and sample prep • Many bacteria will never be sequenced or "finished"... • Pathogen simulants, for example • ...but many have – about 2500 to date. • Can we use the available sequence to identify proteins from unknown, unsequenced bacteria? • Yes, for some proteins in some organisms!

Intact protein LC-MS/MS • Crude cell lysate • Capillary HPLC, C8 column • LTQ-Orbitrap XL • Precursor scan: 30,000 @ 400 m/z • Data-dependent precursor selection: • 5 most abundant ions • 10 second dynamic exclusion • Charge-state +3 or greater • CAD product ion scan: 15,000 @ 400 m/z
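The data-dependent selection settings above (5 most abundant ions, 10 second dynamic exclusion, charge +3 or greater) can be illustrated with a minimal sketch. This is not the instrument's acquisition logic; the `Peak` type, the `select_precursors` function, and the 0.01 m/z exclusion tolerance are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float
    intensity: float
    charge: int


def select_precursors(peaks, now, excluded, top_n=5, exclusion_s=10.0,
                      min_charge=3, mz_tol=0.01):
    """Pick up to top_n precursors from one survey scan.

    `excluded` maps a selected m/z to the time (seconds) it was selected
    and is updated in place. mz_tol is a hypothetical exclusion width.
    """
    # Forget exclusions that are older than the dynamic exclusion window.
    for mz in [m for m, t in excluded.items() if now - t >= exclusion_s]:
        del excluded[mz]
    # Keep only sufficiently charged ions that are not currently excluded.
    candidates = [
        p for p in peaks
        if p.charge >= min_charge
        and not any(abs(p.mz - m) <= mz_tol for m in excluded)
    ]
    # Most abundant first, then truncate to the top N.
    chosen = sorted(candidates, key=lambda p: p.intensity, reverse=True)[:top_n]
    for p in chosen:
        excluded[p.mz] = now
    return chosen
```

Calling the function once per survey scan with a shared `excluded` dictionary lets lower-abundance ions be picked while the most abundant ones sit in the exclusion window.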

CID Protein Fragmentation Spectrum from Y. rohdei

Enterobacteriaceae Protein Sequences • Exhaustive set of all Enterobacteriaceae family protein sequences from • Swiss-Prot, TrEMBL, RefSeq, Genbank, and [CMR] • ...plus Glimmer3 predictions on RefSeq Enterobacteriaceae genomes • Primary and alternative translation start-sites • Filter for intact mass in range 1 kDa – 20 kDa • 253,626 distinct protein sequences, 256 species • Derived from "Rapid Microorganism Identification Database" (RMIDb.org) infrastructure.
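The intact-mass filter and the reduction to distinct sequences can be sketched as follows. The slides do not say whether monoisotopic or average masses were used; this sketch assumes monoisotopic masses, and the function names are hypothetical. Sequences containing non-standard residue codes are simply skipped, which is also an assumption.

```python
WATER = 18.01056  # monoisotopic mass of H2O in Da

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def monoisotopic_mass(seq):
    """Unmodified intact mass, or None if seq has a non-standard letter."""
    try:
        return sum(RESIDUE_MASS[aa] for aa in seq) + WATER
    except KeyError:
        return None


def filter_by_mass(sequences, low=1000.0, high=20000.0):
    """Return the distinct sequences whose mass lies in [low, high] Da."""
    kept = set()
    for seq in sequences:
        mass = monoisotopic_mass(seq)
        if mass is not None and low <= mass <= high:
            kept.add(seq)
    return sorted(kept)
```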

ProSightPC 2.0 • Product ion scan decharging • Enabled by high-resolution fragment ion measurements • THRASH algorithm implementation • Absolute mass search mode • 15 ppm fragment ion match tolerance • 250 Da precursor ion match tolerance • "Single-click" analysis of entire LC-MS/MS datafile.
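The two tolerances above work on different scales: fragments are matched in parts per million, precursors in absolute Daltons. The sketch below shows the arithmetic only. It is not the THRASH algorithm (which works from isotopic distributions), it assumes the charge state is already known, and the function names are hypothetical.

```python
PROTON = 1.007276466  # mass of a proton in Da


def neutral_mass(mz, z):
    """Neutral mass of an ion observed at m/z with z protons attached."""
    return z * (mz - PROTON)


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def fragment_matches(observed, theoretical, tol_ppm=15.0):
    """Fragment ion match within a relative (ppm) tolerance."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def precursor_matches(observed, theoretical, tol_da=250.0):
    """Precursor match within an absolute (Da) tolerance."""
    return abs(observed - theoretical) <= tol_da
```

For a precursor at m/z 686.39 with charge 15+, `neutral_mass` gives about 10280.74 Da. A 250 Da precursor window is wide enough to admit a candidate that differs from the observed mass by roughly 14 Da, while the 15 ppm fragment tolerance stays tight.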

Other tools • Explored using standard search engines: • Decharge and format as charge +1 spectrum • X!Tandem scoring plugin (ProSight, delta M) • OMSSA, Mascot, etc… • MS-Tools: • MS-Deconv, MS-TopDown, • MS-Align, MS-Align+, MS-Align-E!
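"Decharge and format as charge +1 spectrum" can be sketched as a conversion to Mascot Generic Format (MGF), which the listed search engines commonly read. The slides do not state which file format was used, so MGF, the function names, and the number formatting are assumptions; the fragment charge states are taken as already assigned.

```python
PROTON = 1.007276466  # mass of a proton in Da


def to_singly_charged(mz, z):
    """m/z the same ion would have if it carried a single proton."""
    return z * mz - (z - 1) * PROTON


def format_mgf(title, precursor_mz, precursor_z, fragments):
    """Render one spectrum as an MGF entry with every ion at charge +1.

    `fragments` is an iterable of (mz, charge, intensity) tuples.
    """
    lines = [
        "BEGIN IONS",
        f"TITLE={title}",
        f"PEPMASS={to_singly_charged(precursor_mz, precursor_z):.5f}",
        "CHARGE=1+",
    ]
    # Convert each fragment and list peaks in increasing m/z order.
    peaks = sorted((to_singly_charged(mz, z), inten) for mz, z, inten in fragments)
    lines += [f"{mz:.5f} {inten:.1f}" for mz, inten in peaks]
    lines.append("END IONS")
    return "\n".join(lines)
```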

CID Protein Fragmentation Spectrum from Y. rohdei Match to Y. pestis 50S Ribosomal Protein L32

Exact match sequence…

Phylogeny: Protein vs DNA Protein Sequence 16S-rRNA Sequence

What about mixtures?

Shared Small Ribosomal Proteins

Identified E. herbicola proteins • 30S Ribosomal Protein S19 • m/z 686.39, z 15+, E-value 1.96e-16, Δ 0.007 • Six proteins identified with |Δ| < 0.02

Identified E. herbicola proteins • DNA-binding protein HU-alpha • m/z 732.71, z 13+, E-value 7.5e-26, Δ −14.128 • Eight proteins identified with "large" |Δ|

Identified E. herbicola proteins • DNA-binding protein HU-alpha • m/z 732.71, z 13+, E-value 1.91e-58 • Use "Sequence Gazer" to find mass shift • ΔM mode can "tolerate" one shift for free!

ProSightPC: ΔM mode • Match a single "blind" mass-shift for free! • [Diagram: the experimental precursor mass differs from the protein sequence mass by ΔM; fragments are matched as b- and y-ions and as b'- and y'-ions shifted by ΔM] • Also: PIITA - Tsai et al. 2009
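The idea behind ΔM mode can be sketched in a few lines: take ΔM as the difference between the observed precursor mass and the candidate sequence mass, then accept fragments that match either the unshifted b- and y-ions or the same ions shifted by ΔM. This is an illustrative sketch, not ProSightPC's implementation or scoring; it assumes decharged (neutral) monoisotopic fragment masses, and the function names are hypothetical.

```python
WATER = 18.01056  # monoisotopic mass of H2O in Da

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def fragment_masses(seq):
    """Neutral b- and y-type masses for every backbone cleavage site."""
    residues = [RESIDUE_MASS[aa] for aa in seq]
    total = sum(residues)
    b, running = [], 0.0
    for m in residues[:-1]:
        running += m
        b.append(running)            # prefix ending at this cleavage
    y = [total - bm + WATER for bm in b]  # complementary suffix
    return b, y


def count_matches(seq, obs_precursor, obs_fragments, tol_ppm=15.0,
                  delta_mode=False):
    """Count observed fragments explained by seq within tol_ppm."""
    b, y = fragment_masses(seq)
    theoretical = b + y
    if delta_mode:
        # One "blind" shift: observed minus theoretical precursor mass.
        delta = obs_precursor - (sum(RESIDUE_MASS[aa] for aa in seq) + WATER)
        theoretical = theoretical + [m + delta for m in b + y]  # b' and y'

    def hit(obs):
        return any(abs(obs - t) <= abs(t) * tol_ppm * 1e-6 for t in theoretical)

    return sum(1 for obs in obs_fragments if hit(obs))
```

With a single unknown modification, fragments on one side of the modified residue match unshifted and those containing it match only after adding ΔM, so the shifted ions recover matches that an absolute-mass search would miss.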

Identified E. herbicola proteins • DNA-binding protein HU-alpha • m/z 732.71, z 13+, E-value 7.5e-26, Δ −14.128 • Extract N- and C-terminus sequence supported by at least 3 b- or y-ions

E. herbicola protein sequences

E. herbicola sequences found in other species

Phylogenetic placement of E. herbicola Cladogram Phylogram phylogeny.fr – "One-Click"

Genome annotation errors • UniProt: E. coli Cell division protein ZapB • 22 (371) E. coli strains • Need ±1500 Da precursor tolerance… • [Figure: N-terminal sequence MQFRRGMTMSLEVFEKLEAKVQQAIDTITL… annotated with counts 3 (204), 17 (166), 0 (2)]
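One way to see why a wider precursor tolerance is needed: if the annotated start codon is wrong, the true protein lacks some N-terminal residues, and the mass of those residues can exceed 250 Da. The sketch below computes that mass for each internal methionine in the N-terminal prefix shown on the slide. Treating every internal Met as a candidate start site is an assumption made for illustration, the function name is hypothetical, and only the visible prefix is used.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def start_site_offsets(prefix):
    """Map each internal Met position (0-based) to the mass in Da of the
    residues that precede it, i.e. the mass lost if translation
    actually starts at that Met instead of at position 0."""
    offsets = {}
    running = 0.0
    for i, aa in enumerate(prefix):
        if aa == "M" and i > 0:
            offsets[i] = running
        running += RESIDUE_MASS[aa]
    return offsets


PREFIX = "MQFRRGMTMSLEVFEKLEAKVQQAIDTITL"  # visible prefix from the slide
OFFSETS = start_site_offsets(PREFIX)
```

For this prefix the candidate starts at positions 6 and 8 differ from the annotated start by about 775 Da and 1007 Da respectively, both outside a 250 Da window but inside ±1500 Da.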

Conclusions • Protein identification for unsequenced organisms. • Identification and localization for sequence mutations and post-translational modifications. • Extraction of confidently established sequence suitable for phylogenetic analysis. • Genome annotation correction. • New paradigm for phylogenetic analysis?

Acknowledgements • Dr. Catherine Fenselau • Avantika Dhabaria, Joe Cannon*, Colin Wynne* • University of Maryland Biochemistry • Dr. Yan Wang • University of Maryland Proteomics Core • Dr. Art Delcher • University of Maryland CBCB • Funding: NIH/NCI
