Structural Genomics of Pathogenic Protozoa

Protein Production and Crystallization Workshop 2004
Christopher Mehlin
cmehlin@u.washington.edu
WWW.SGPP.ORG

The SGPP is focused on protozoa that cause human disease:
• Malaria – Plasmodium falciparum, P. vivax
• Leishmaniasis – Leishmania major + 8 others
• African sleeping sickness – Trypanosoma brucei
• Chagas’ disease – Trypanosoma cruzi
These diseases afflict ~500 million people per year; roughly half the world’s population is at risk.

These targets are challenging!
• Eukaryotic organisms
• Leishmania
  • Only the L. major sequence is known (more coming…)
• Plasmodium falciparum
  • 80% AT-rich genome
  • Requires cDNA – intron prediction is difficult
  • Floppy loops, e.g. CDK-2 has 83 asparagines in a row

Primers-to-Protein: Normally ~5% Overall Yield
Data from 1318 L. major and 368 P. falciparum targets:
• L. major: 5.2%
• P. falciparum: 4.9%
>85% of our effort is put into cloning, screening, and expressing this 5%.
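As a rough check, the per-species yields can be turned into approximate protein counts and a pooled yield. This is a minimal sketch; the counts it produces are back-calculated from the rounded percentages, so they are approximations, not reported values, and the function names are illustrative.

```python
# Target counts and primers-to-protein yields as reported on the slide.
TARGETS = {"L. major": 1318, "P. falciparum": 368}
YIELDS = {"L. major": 0.052, "P. falciparum": 0.049}


def approx_successes(targets, yields):
    """Back-calculate the approximate number of purified proteins per species."""
    return {sp: targets[sp] * yields[sp] for sp in targets}


def pooled_yield(targets, yields):
    """Overall yield across species, weighting each species by its target count."""
    successes = approx_successes(targets, yields)
    return sum(successes.values()) / sum(targets.values())


if __name__ == "__main__":
    for species, n in approx_successes(TARGETS, YIELDS).items():
        print(f"{species}: ~{n:.0f} proteins")
    print(f"Pooled yield: {pooled_yield(TARGETS, YIELDS):.1%}")
```

With the rounded percentages this gives roughly 69 L. major and 18 P. falciparum proteins and a pooled yield of about 5.1%, in line with the ~5% headline.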

Protein Variants Increase the Odds
• Multiple species variants
  • Especially Leishmania
• “Chunking”
  • Computational domain prediction
  • Random truncation


Primers designed for L. major can fish out homologues from other species
[Figure: PCR success using L. major primers vs. sequence homology to L. major, for L. major, L. aethiopica, L. infantum, L. donovani, L. tropica, L. mexicana, L. guyanensis, L. naiffi, L. braziliensis, L. tarentolae, and E. scheideri; human pathogens marked. Homology spans 97% to 60%; PCR success spans 83% to 10%.]

Multiple species targeted with a list of 40 high-value targets (enzymes with known inhibitors)
[Table: proteins obtained per organism for targets 1–7. P. falciparum: 4; L. major: 4; homologues marked.]
Two species gave us eight proteins and 7/40 (18%) of the targets.

Multiple species targeted with a list of 40 high-value targets (enzymes with known inhibitors)
[Table: proteins obtained per organism for targets 1–10. P. falciparum: 4; L. major: 4; L. infantum: 3.]
L. infantum and L. major homologues are 95% identical, yet the targets obtained from the two species do not overlap! Small changes in sequence make an enormous difference in the behavior of the protein.

Multiple species targeted with a list of 40 high-value targets (enzymes with known inhibitors)
[Table: proteins obtained per organism for targets 1–14. P. falciparum: 4; L. major: 4; L. infantum: 3; L. mexicana: 3; L. guyanensis: 2; L. braziliensis: 2; L. tarentolae: 1.]
TOTAL: 19 proteins, covering 14 of 40 targets (35%)
10 targets would not have been obtained otherwise
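The coverage figures on these slides come down to counting proteins per species and taking the union of the targets they hit. A minimal sketch follows; the slides do not say which target each protein came from, so the example uses toy species and target IDs that are purely hypothetical, not SGPP results.

```python
from typing import Dict, List, Set, Tuple


def coverage(hits: Dict[str, Set[int]], n_targets: int) -> Tuple[int, int, float]:
    """Return (#proteins, #distinct targets, fraction of the target list covered).

    Each species contributes one protein per target ID in its set.
    """
    n_proteins = sum(len(targets) for targets in hits.values())
    covered = set().union(*hits.values())
    return n_proteins, len(covered), len(covered) / n_targets


def newly_covered(hits: Dict[str, Set[int]], order: List[str]) -> Dict[str, Set[int]]:
    """Targets each species adds beyond those already obtained from species
    earlier in `order`, i.e. what each extra species variant buys."""
    seen: Set[int] = set()
    added: Dict[str, Set[int]] = {}
    for species in order:
        added[species] = hits[species] - seen
        seen |= hits[species]
    return added


if __name__ == "__main__":
    # Toy data: hypothetical species and target IDs.
    toy = {"species A": {1, 2, 3}, "species B": {3, 4}, "species C": {5}}
    print(coverage(toy, n_targets=40))  # (6, 5, 0.125)
    print(newly_covered(toy, ["species A", "species B", "species C"]))
    # Slide totals: 14 of 40 targets covered.
    print(f"{14 / 40:.0%}")  # 35%
```

Here species B contributes two proteins but only one new target, which is the kind of overlap the "HOMOLOGUES" marking on the two-species slide points to.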

Multiple species variants help crystallization, too!
Residues 1–60
Lmaj001686  MSRLMPHYSKGKTAFLCVDLQEAFSKRIENFANCVFVANRLARLHELVPENTKYIVTEHY
Ldon001686  MSRLMPHYSKGKTAFLCVDLQEAFSKRIENFANCVFVANRLARLHEVVPENTKYIVTEHY
Residues 61–120
Lmaj001686  PKGLGRIVPGITLPQTAHLIEKTRFSCIVPQVEELLEDVDNAVVFGIEGHACILQTVADL
Ldon001686  PKGLGRIVPEITLPKTAHLIEKTRFSCVVPQVEELLEDVDNAVVFGIEGHACILQTVADL
Residues 121–180
Lmaj001686  LDMNERVFLPKDGLGSQKKTDFKAAMKLMGSWSPNCEITTSESILLQMTKDAMDPDFKKI
Ldon001686  LDMNKRVFLPKDGLGSQKKTDFKAAIKLMSSWGPNCEITTSESILLQMTKDAMDPNFKRI
Residues 181–193
Lmaj001686  SKLLKEEPPIPL.
Ldon001686  SKLLKEEPPIPL.
95% IDENTITY
• Lmaj001686AAA: nice crystals, no diffraction
• Ldon001686AAA: “huge” crystals, 2.7 Å diffraction
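The 95% identity figure can be checked directly from the alignment above. Because the alignment has no gaps, identity is simply the fraction of matching positions. A minimal sketch; the helper names are illustrative, and the trailing "." end marker is left out of the sequences.

```python
# The two aligned sequences from the slide (gap-free, equal length).
LMAJ001686 = (
    "MSRLMPHYSKGKTAFLCVDLQEAFSKRIENFANCVFVANRLARLHELVPENTKYIVTEHY"
    "PKGLGRIVPGITLPQTAHLIEKTRFSCIVPQVEELLEDVDNAVVFGIEGHACILQTVADL"
    "LDMNERVFLPKDGLGSQKKTDFKAAMKLMGSWSPNCEITTSESILLQMTKDAMDPDFKKI"
    "SKLLKEEPPIPL"
)
LDON001686 = (
    "MSRLMPHYSKGKTAFLCVDLQEAFSKRIENFANCVFVANRLARLHEVVPENTKYIVTEHY"
    "PKGLGRIVPEITLPKTAHLIEKTRFSCVVPQVEELLEDVDNAVVFGIEGHACILQTVADL"
    "LDMNKRVFLPKDGLGSQKKTDFKAAIKLMSSWGPNCEITTSESILLQMTKDAMDPNFKRI"
    "SKLLKEEPPIPL"
)


def substitutions(a: str, b: str):
    """List (position, residue_a, residue_b) for each mismatch, 1-based."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions with identical residues (no gaps assumed)."""
    return 100.0 * (len(a) - len(substitutions(a, b))) / len(a)


if __name__ == "__main__":
    for pos, x, y in substitutions(LMAJ001686, LDON001686):
        print(f"{x}{pos}{y}")
    print(f"{percent_identity(LMAJ001686, LDON001686):.1f}% identity")
```

The sketch finds 10 substitutions over 192 residues (L47V, G70E, Q75K, I88V, E125K, M146I, G150S, S153G, D176N, K179R), about 94.8% identity, which rounds to the 95% quoted.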

The concept of chunking…
Consider a 3-domain protein: standard chunks would be the entire protein, each individual domain, and any contiguous series of domains. An N-domain protein therefore yields $N(N+1)/2$ chunks, so a 3-domain protein becomes 6 chunks.
[Figure: chunks of a 3-domain protein: full length, adjacent domains, single domains]
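The counting argument can be sketched in a few lines: with domains listed in sequence order, a chunk is fixed by choosing its first and last domain, which gives $N(N+1)/2$ choices. The domain boundaries in the example are hypothetical, and chunk ends here simply reuse domain boundaries rather than linker cut points.

```python
from typing import List, Tuple

Domain = Tuple[int, int]  # (first residue, last residue), 1-based inclusive


def chunks(domains: List[Domain]) -> List[Domain]:
    """Every contiguous run of domains: single domains, adjacent groups,
    and the full-length protein. Domains must be in sequence order."""
    result = []
    for i in range(len(domains)):          # first domain of the run
        for j in range(i, len(domains)):   # last domain of the run
            result.append((domains[i][0], domains[j][1]))
    return result


if __name__ == "__main__":
    # Hypothetical boundaries for a 3-domain protein.
    three = [(1, 120), (121, 250), (251, 400)]
    for start, end in chunks(three):
        print(start, end)
    n = len(three)
    assert len(chunks(three)) == n * (n + 1) // 2 == 6
```

For the 3-domain example this prints the full-length protein (1–400), the two adjacent-domain pairs (1–250, 121–400), and the three single domains.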

Domain Parsing using GINZU (David Kim, UW)
Step 1: PSI-BLAST the target sequence against the PDB
Step 2: Use consensus fold-recognition methods to find remote PDB matches
Step 3: Search the Pfam database for preassigned modular “chunks”
Step 4: Identify new modular “chunk” regions in a multiple sequence alignment (MSA)
Step 5: Identify parse points in Rosetta structure predictions
Final step: Select cut points in linker regions using assigned boundaries and coil predictions
[Figure: target sequence annotated in order of decreasing confidence by PDB, fold recognition, Pfam, MSA, and Rosetta assignments, feeding chunk generation]
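The confidence-ordered idea behind these steps can be sketched as a greedy pass: each source, from most to least confident, claims only regions that no earlier source has assigned. This is not GINZU's code; the rule that a hit is accepted only if its whole span is still unassigned, the method labels, and the example hits are all hypothetical simplifications.

```python
from typing import List, Optional, Tuple

Hit = Tuple[str, int, int]  # (method, start, end), 1-based inclusive

# Sources in decreasing order of confidence, following the slide.
METHODS = ["PDB", "FoldRecognition", "Pfam", "MSA", "Rosetta"]


def assign_regions(length: int, hits: List[Hit]) -> List[Optional[str]]:
    """Label each residue with the most confident source that claims it.

    Simplifying assumption (not GINZU's actual rule): a hit is accepted only
    if its whole span is still unassigned when its method's turn comes.
    """
    labels: List[Optional[str]] = [None] * length
    for method in METHODS:
        for source, start, end in hits:
            if source != method:
                continue
            span = range(start - 1, end)
            if all(labels[i] is None for i in span):
                for i in span:
                    labels[i] = method
    return labels


def segments(labels: List[Optional[str]]) -> List[Tuple[Optional[str], int, int]]:
    """Collapse per-residue labels into (label, start, end) runs, 1-based."""
    out = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            out.append((labels[start], start + 1, i))
            start = i
    return out


if __name__ == "__main__":
    # Hypothetical hits on a 300-residue target.
    hits = [("Pfam", 100, 200), ("PDB", 1, 140), ("Pfam", 150, 260), ("MSA", 261, 300)]
    for seg in segments(assign_regions(300, hits)):
        print(seg)
```

In the example the PDB hit claims residues 1–140 first, so the overlapping Pfam hit (100–200) is skipped; the unassigned stretch 141–149 is the kind of linker region in which the final step would place a cut point.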

Pfal006650AAA Example: tRNA Synthetase (David Kim, UW)
[Figure: Ginzu parse results with PFAM, PDB, and MSA coverage. Ginzu domains: PDB hit to 1nyqA (threonyl-tRNA synthetase); PFAM hit to PF01411 (tRNA synthetases class II (A)); an MSA-based assignment; and a remaining region with no assignment but still based on the MSA. The multiple sequence alignment comes from PSI-BLAST against the non-redundant (NR) sequence database.]

CHUNKING L. major PROTEINS WITH GINZU
71 ORFs → 205 chunks (not counting full length)
• 15 chunks solubly expressed (7%)
  • 11 ORFs had 1 soluble chunk
  • 2 ORFs had 2 soluble chunks
• 5 full-length ORFs solubly expressed (7%)
  • 2/16 chunks of soluble ORFs were soluble (both from the same ORF)
  • 1 chunk of a soluble but non-crystallizing ORF crystallized
• 12/66 inaccessible proteins have had at least one soluble chunk (18%)
• 17/71 proteins accessible via this technique (24%)
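The totals on this slide follow from its per-ORF counts; the sketch below re-derives them. Its one inference, flagged in a comment, is that the soluble ORF whose two chunks were soluble is one of the 13 ORFs with soluble chunks, so only 12 of those 13 were previously inaccessible. The function name is illustrative.

```python
def chunking_summary(orfs=71, chunks=205, soluble_full_length=5,
                     orfs_with_one_soluble_chunk=11, orfs_with_two_soluble_chunks=2):
    """Re-derive the slide's totals (as (count, denominator) pairs) from its per-ORF counts."""
    soluble_chunks = orfs_with_one_soluble_chunk + 2 * orfs_with_two_soluble_chunks
    orfs_with_soluble_chunk = orfs_with_one_soluble_chunk + orfs_with_two_soluble_chunks
    inaccessible = orfs - soluble_full_length
    # Inference from the slide: both soluble chunks of already-soluble ORFs
    # came from one ORF, so one of the ORFs with soluble chunks was not a rescue.
    rescued = orfs_with_soluble_chunk - 1
    accessible = soluble_full_length + rescued
    return {
        "soluble chunks": (soluble_chunks, chunks),
        "soluble full-length ORFs": (soluble_full_length, orfs),
        "rescued inaccessible ORFs": (rescued, inaccessible),
        "accessible ORFs": (accessible, orfs),
    }


if __name__ == "__main__":
    for label, (k, n) in chunking_summary().items():
        print(f"{label}: {k}/{n} ({k / n:.0%})")
```

This prints 15/205 (7%), 5/71 (7%), 12/66 (18%), and 17/71 (24%), matching the slide.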

Superchunking: for high-value targets
Step 1: Determine the functional domain of the protein by comparison to a known protein.
Step 2: Determine 10 truncation sites on each side of the functional domain; make 20 primers.
Step 3: Run 10 × 10 = 100 PCRs, clone the products, and screen for soluble expression and crystallizability.
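Step 3's 10 × 10 grid is a Cartesian product of N-terminal start sites (forward primers) and C-terminal end sites (reverse primers). A minimal sketch, assuming evenly spaced truncation sites; the protein length, functional-domain boundaries, and spacing rule are hypothetical, since the slide does not say how sites were chosen.

```python
from itertools import product
from typing import List, Tuple


def evenly_spaced(lo: int, hi: int, n: int) -> List[int]:
    """n residue positions spread evenly from lo to hi inclusive."""
    if n == 1:
        return [lo]
    return [lo + round(k * (hi - lo) / (n - 1)) for k in range(n)]


def superchunk_constructs(starts: List[int], ends: List[int]) -> List[Tuple[int, int]]:
    """Pair every N-terminal start (forward primer) with every C-terminal end
    (reverse primer): one PCR per pair."""
    return [(s, e) for s, e in product(starts, ends) if s < e]


if __name__ == "__main__":
    # Hypothetical protein: 540 residues, functional domain 60-480.
    starts = evenly_spaced(1, 60, 10)    # 10 forward primers, N-terminal flank
    ends = evenly_spaced(480, 540, 10)   # 10 reverse primers, C-terminal flank
    constructs = superchunk_constructs(starts, ends)
    print(len(starts) + len(ends), "primers,", len(constructs), "PCRs")
```

The design point is that primer count grows linearly (10 + 10) while construct count grows multiplicatively (10 × 10), so 20 primers buy 100 variants.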

Superchunking: Thioredoxin Reductase from P. falciparum (Erica Boni)
► TR is a 60.7 kDa enzyme with a high degree of domain interaction
► PCR success 100%; the full-length PCR product was used as template
► 20 different soluble proteins from 90 cloned constructs

Superchunking: Thioredoxin Reductase (Erica Boni)
[Figure: native protein]

Superchunking: Thioredoxin Reductase (Erica Boni)
[Figure: truncation constructs: 7 off N-terminus; 16 off C-terminus; 18 off N-terminus & 8 off C-terminus]

Conclusions: Relatively small changes in protein sequence can have dramatic effects on the behavior of proteins in expression and crystallization. Multiple species and chunking are two promising methods for obtaining protein variants.

Acknowledgements:
• University of Washington
  • Jamie Andreyka, Erica Boni, Tiffany Feist, Lutfiyah Haji, Colleen Liu, Natascha Mueller
  • Fred Buckner, Mike Gelb, Wes Van Voorhis, Kevin Bauer
  • David Baker, David Kim, Erkang Fan, Stan Fields Group
  • Wim Hol and Hol group
• Seattle Biomedical Research Institute
  • Liz Worthey, Ellen Sisk, Peter Myler
• Hauptman-Woodward Medical Research Institute
  • George DeTitta, Joe Luft, Nancy Fehrman, Angela Lauricella et al.
• Seattle Crystallization and Structure Determination Units
  • Oleksandr Kalyuzhniy, Lori Anderson
  • Ethan Merritt, Isolde Le Trong, Mark Robien
• Collaborators:
  • SSRL Stanford
  • ALS Berkeley
NIH/NIGMS/NIAID
