Protein Engineering and Directed Evolution. May 24, 2011. Protein Engineering. A linear combination of 20 amino acids. Protein sequence dictated by gene sequence. Generate variants through alteration of DNA sequence. Sequence. Structural information can be a guide.
Protein Engineering and Directed Evolution May 24, 2011
Protein Engineering
• Sequence: A linear combination of 20 amino acids. Protein sequence is dictated by gene sequence. Generate variants through alteration of the DNA sequence.
• Structure: The protein structure is dictated by its linear sequence. Structural information can be a guide.
• Function: Protein function is dictated by its three-dimensional structure. Optimize functions of proteins/enzymes.
Enzymes in Food Processing • Starch processing • Increasing the quality of beer. The enzymes alpha-amylase, glucoamylase and glucose isomerase convert starch to high fructose corn syrup (HFCS). Alpha-amylase is used to liquefy starch slurry so that the starch is solubilized and readied for the next steps. Alpha-amylase splits the large amylose and amylopectin molecules that make up the starch into soluble dextrin fragments. www.genencor.com
What is protein engineering?
• Optimize the properties of enzymes/proteins through changes in protein sequence: asking enzymes to function more efficiently, tolerate harsh conditions, last longer, etc.
  • Industrial applications
  • Chemical applications
  • Agricultural applications
  • Pharmaceutical applications
  • Many more
• Introduce new properties into enzymes. Go beyond the biological context.
  • Perform catalysis under completely foreign conditions
  • Catalyze what is not observed in nature
  • De novo design of enzyme function (difficult, although progressing)
Which properties can we improve?
Engineer proteins for: Chemical Synthesis, Bioremediation, Chemical Sensors, Pharmaceuticals, Metabolic Control
Biological Functions: • Catalysts • Immune Response • Control Units • Structural Scaffolds
GOOD Properties: • Catalyze many classes of reactions • High specificity / selectivity • Low energy input
BAD Properties: • Low activities for non-natural reactions • Marginally stable • Industrial conditions • Low expression in heterologous hosts
Enzymes are inherently designable
• Only “optimized” in the context of their biological systems.
• Naturally evolvable and evolved. Divergent evolution is nature’s way of generating diversity.
• Nature evolves new sequences, behaviors and biological functions.
Example: These enzymes also function using the same catalytic residues (serine, histidine and aspartic acid). However, they catalyze the cleavage of different substrates (divergence), differing in the residue preferred at P1 (cleavage site): large (e.g. tryptophan), small (e.g. alanine) or positive (e.g. lysine).
Evolution (mutation, recombination, and natural selection) has generated a fantastic array of functional molecules.
We can use the evolution algorithm to create “new” enzymes Evolutionary approaches are generally more powerful as long as a suitable search strategy is available. Cirino
We can design enzymes to accommodate our needs: - novel specificity / activity / stability New functions can be achieved if: • It is physically possible • It is evolutionarily feasible (a path of functional enzymes exists in sequence space) • We can generate genetic diversity • We can select / screen for improvements Cirino
Sequence Space. Examples: 2 residues, 2 amino acids; 3 residues, 2 amino acids; 500 residues, 20 amino acids. Huge: $20^{500} \approx 10^{650}$ sequences. Highly dimensional: $19 \times 500 \approx 10^{4}$ dimensions. Cirino
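As a rough check on the numbers for a 500-residue protein built from 20 amino acids, the sketch below computes the size of sequence space on a log scale and counts the single-substitution directions available from any one sequence. The function names are illustrative and do not come from the slides.

```python
import math


def log10_sequence_count(length: int, alphabet: int = 20) -> float:
    """Return log10 of the number of possible sequences, alphabet ** length."""
    return length * math.log10(alphabet)


def landscape_dimensions(length: int, alphabet: int = 20) -> int:
    """Each position can change to (alphabet - 1) other residues."""
    return (alphabet - 1) * length


# Toy cases from the slide: 2 letters over 2 and 3 positions.
print(2 ** 2, 2 ** 3)
# 500 residues, 20 amino acids: exponent of 10 and number of dimensions.
print(round(log10_sequence_count(500), 1), landscape_dimensions(500))
```

Working with the logarithm avoids printing a 651-digit integer, although Python integers could hold `20 ** 500` exactly.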
The Fitness Landscape. [Figure: fitness plotted over sequence space, marking a starting point, local maxima and a finish point.] Fitness landscape: the mapping from genotype (target sequence) to phenotype (fitness, as measured in the experiment). Directed evolution is an optimization on the fitness landscape. Arnold, Nat. Mol Cell Bio. 2009. Evolution is a random walk on a fitness landscape in sequence space, survival of the fittest. Cirino
Ruggedness in Proteins. Mutations can be beneficial and can also be deleterious. Two single mutations may each be deleterious, but the combination of the two may be beneficial. The red and green residues are interacting. [Figure: fitness of wild-type, intermediate mutant A, intermediate mutant B and improved mutant AB.] Cirino
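The situation on this slide, where two individually deleterious mutations combine into an improved double mutant, can be expressed numerically. The sketch below is only an illustration: the fitness values are made up, and the helper names are hypothetical.

```python
def epistasis(f_wt: float, f_a: float, f_b: float, f_ab: float) -> float:
    """Deviation of the double mutant from the sum of the single-mutant effects."""
    additive_expectation = (f_a - f_wt) + (f_b - f_wt)
    return (f_ab - f_wt) - additive_expectation


def crosses_fitness_valley(f_wt: float, f_a: float, f_b: float, f_ab: float) -> bool:
    """True when both single mutants are worse than wild-type but AB is better."""
    return f_a < f_wt and f_b < f_wt and f_ab > f_wt


# Hypothetical fitness values (arbitrary units), chosen only for illustration.
wild_type, mutant_a, mutant_b, mutant_ab = 1.0, 0.6, 0.7, 1.5
print(round(epistasis(wild_type, mutant_a, mutant_b, mutant_ab), 2))
print(crosses_fitness_valley(wild_type, mutant_a, mutant_b, mutant_ab))
```

A positive epistasis value together with a fitness valley is what makes a landscape rugged for a walk that accepts only one beneficial point mutation at a time.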
What can directed evolution do? Example, herbicide tolerance: more than 75% of genetically modified plants are engineered for herbicide tolerance. Glyphosate: very effective herbicide, but toxic towards most crops and decreases crop yield. Acetyl-glyphosate: not herbicidal.
A very robust enzyme: applications in transgenic plant development
GFP can be evolved into other fluorescent proteins (FPs). Tsien, Annu Rev. Biochem 1998
Red Fluorescent Protein (RFP) Isolated from nonbioluminescent reef corals Tsien, Nature Biotechnology, 2000
Classic mutagenesis: chemical mutagenesis with mutagens. Works mostly at the whole-cell level. • Deamination with nitrous acid: C → U, which pairs with A instead of G; A → H (hypoxanthine), which pairs with C instead of T. • Alkylation with EMS or nitrosoguanidine: G → O6-alkylguanine, which pairs with T instead of C.
Classic mutagenesis by radiation: UV crosslinks two neighboring pyrimidine bases. Errors and mutations are introduced during DNA repair by host enzymes.
Directed Mutagenesis/Evolution Strategies
• Most mutagenesis strategies rely on PCR.
• Most of the time, knowing the gene sequence is essential.
• Site-directed mutagenesis (point mutation): introduce a specific mutation at a specified location in the gene.
• Random mutagenesis: introduce random mutations at a specified position or throughout the gene of interest.
• DNA shuffling: shuffle mutants of the same gene to achieve diversity.
• DNA family shuffling: shuffle homologous genes from different species to explore a large sequence space.
• Genome shuffling: shuffle genomes through homologous recombination.
Site-directed mutations • To probe the importance of a specific amino acid in a protein sequence. • Is the amino acid involved in catalysis? • Does the amino acid dictate specificity? • Is the amino acid essential for protein function? • If the importance of the amino acid is known (from crystal structure, biochemical analysis) • Mutate the amino acid to enhance enzyme properties • Alter the size of the amino acid to tighten/loosen enzyme substrate specificity
Subtilisin stability can be improved by point mutations • What is subtilisin: • A serine protease from Bacillus bacteria • Broadly specific for proteins that commonly soil clothing • Used widely as the “enzymatic additive” in commercial laundry detergents • Wild-type subtilisin is easily inactivated • In the presence of bleach, the protein becomes inactive very quickly (~90% inactivation) • The inactivation is due to oxidation of the methionine at position 222 (M222)
Second Generation Subtilisin • M222 was systematically mutated to each of the 19 other amino acids and the stability of the mutant enzymes was investigated (Genencor). (Estell, JBC, 1985) [Figure: percent enzyme activity vs. time (min) in 1 M H2O2.]
Site-directed mutagenesis is limited • Site-directed mutagenesis is limited in its scope. • It is difficult to predict which substitution will be beneficial. • More than one residue can contribute to enzyme activity and stability. • “Key” residues are often unknown. • The availability of the crystal structure helps, but does not allow a reliable prediction of what/where the mutations should be. • Protein engineers therefore need to generate all possible amino acid changes at one residue, or at a combination of residues. • How do we modify the PCR-based mutagenesis procedures to 1) generate all possible mutations at a single position? 2) introduce multiple mutations?
Degenerate Oligonucleotides. During normal primer synthesis, the desired nucleotide is added to the growing oligonucleotide. Each nucleotide pool (A, C, G, T) is 100% pure. To make a degenerate primer, a mixed nucleotide pool (N: 25% A, 25% C, 25% G, 25% T) is used in addition to the four pure pools. During DNA synthesis, N can be added to the oligonucleotide instead of one of the four pure nucleotides.
Normal primer: 5’ ACG GTC GAT GTA CCA GGG CCC AAC 3’
Degenerate primer: 5’ ACG GTC GAT GTA NNN GGG CCC AAC 3’
NNN gives 64 possible combinations, covering all 20 amino acids as well as the stop codons.
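To make the "64 combinations" count concrete, the following sketch expands a degenerate primer containing N into every concrete sequence and translates the NNN codon with the standard genetic code. The helper names are illustrative; only the primer sequence is taken from the slide.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Only the symbols needed here; N stands for an equal mix of all four bases.
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def expand(sequence: str) -> list[str]:
    """List every concrete DNA sequence encoded by a degenerate sequence."""
    return ["".join(p) for p in product(*(DEGENERATE[base] for base in sequence))]


codons = expand("NNN")
translated = [CODON_TABLE[c] for c in codons]
amino_acids = set(translated) - {"*"}
print(len(codons), len(amino_acids), translated.count("*"))

primers = expand("ACGGTCGATGTANNNGGGCCCAAC")
print(len(primers), "ACGGTCGATGTACCAGGGCCCAAC" in primers)
```

Because the genetic code is redundant, the 64 codons are not evenly spread over the 20 amino acids, so some residues are overrepresented in an NNN library.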
Saturation mutagenesis example. These authors found 5 residues that interact with the substrate directly. Saturation mutagenesis was performed simultaneously at all five positions. Library size: 20 × 20 × 20 × 20 × 20 = 3.2 million possible combinations. The desired mutant contained mutations in four of the five residues. The new enzyme property cannot be achieved with single-residue mutations. [Figure: M. jannaschii TyrRS bound to tyrosine.] Wang and Schultz, 2003
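The library size quoted here is a simple power, and the same arithmetic shows how much larger the library is at the DNA level. The sketch below assumes fully randomized NNN codons at each site, which is an assumption for illustration and not a statement about the codon scheme the authors actually used.

```python
def protein_library_size(n_sites: int, n_amino_acids: int = 20) -> int:
    """Number of distinct protein variants when n_sites are fully randomized."""
    return n_amino_acids ** n_sites


def dna_library_size(n_sites: int, codons_per_site: int = 64) -> int:
    """Number of distinct DNA variants, assuming NNN (64 codons) at each site."""
    return codons_per_site ** n_sites


print(protein_library_size(5))  # five saturated positions at the protein level
print(dna_library_size(5))      # the same five positions counted as codons
```

The gap between the two numbers matters for screening, because clones are sampled from the DNA library, not directly from the set of protein variants.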
Error-prone PCR: Random Mutagenesis
• Altering the PCR conditions to make amplification prone to errors → random incorporation of substitutions.
• Normal PCR reaction: MgCl2, 0.2 mM dNTPs, template DNA, primers, DNA polymerase, thermal cycling (95 °C, 55 °C, 72 °C)
• Taq polymerase error rate: $2 \times 10^{-4}$
• Pfu polymerase error rate: $7 \times 10^{-7}$
• Error-prone PCR conditions which INCREASE the error rate of Taq polymerase and accumulate mutations:
  • Staggered dNTP concentrations (0.2 mM dATP & dGTP, 1.0 mM dCTP & dTTP)
  • Addition of MnCl2 (affects Taq error rate)
  • Increase the number of PCR cycles
  • Increase the length of the molecule to be amplified
Cirino
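The last two conditions (more cycles, longer templates) follow from simple arithmetic on the error rate. The sketch below is a back-of-the-envelope estimate under stated assumptions: it treats the quoted error rates as errors per base per template doubling (the slide does not give units), counts every doubling, and models the number of mutations per gene copy as Poisson distributed. The gene length and doubling count are made-up inputs.

```python
import math


def expected_mutations(error_rate: float, gene_length_bp: int, doublings: int) -> float:
    """Mean mutations per gene copy, assuming errors per base per doubling."""
    return error_rate * gene_length_bp * doublings


def fraction_unmutated(mean_mutations: float) -> float:
    """Poisson probability that a gene copy carries no mutation at all."""
    return math.exp(-mean_mutations)


# Hypothetical 1000 bp gene amplified through 20 effective doublings.
for name, rate in [("Taq", 2e-4), ("Pfu", 7e-7)]:
    mean = expected_mutations(rate, 1000, 20)
    print(name, round(mean, 3), round(fraction_unmutated(mean), 3))
```

Under these assumptions the mean scales linearly with both gene length and number of doublings, which is why both appear in the list of ways to raise the mutation load.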
Library creation
1. Error-prone PCR of the gene of interest (GOI): mutation and amplification. Add primers containing restriction sites.
2. Cut the PCR product with NdeI and BamHI; purify the insert library.
3. Insert into the NdeI/BamHI sites of the pET28 expression plasmid to give a library of mutant expression plasmids.
4. Screen for desired properties.
Error-Prone PCR: Subtilisin Example • Goal: To have subtilisin function in a nonaqueous solvent. • Unlike the previous example, this property cannot be predicted and one has no idea where to start. • Solution: error-prone PCR. 10 successive rounds of mutagenesis were performed. In each round, the improved mutant was selected, and the gene encoding that mutant served as the template for the next round of error-prone PCR. [Figure: change in activity, log scale.] Chen and Arnold, PNAS, 1993, p5618
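The round-by-round procedure (mutate, screen, keep the improved mutant as the next template) can be sketched as a simple loop. Everything below is a toy model under loud assumptions: mutations are applied directly to the protein sequence instead of to DNA, and the fitness function just counts matches to a hidden target, standing in for a real activity assay. All names and parameter values are hypothetical.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 amino acids, one-letter codes


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Replace each residue with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else aa for aa in seq)


def toy_fitness(seq: str, target: str) -> int:
    """Stand-in for a screen: number of positions matching a hidden target."""
    return sum(a == b for a, b in zip(seq, target))


def directed_evolution(parent, target, rounds=10, library_size=200, rate=0.05, seed=0):
    """Run successive rounds of random mutagenesis and selection of the best clone."""
    rng = random.Random(seed)
    history = [toy_fitness(parent, target)]
    for _ in range(rounds):
        library = [mutate(parent, rate, rng) for _ in range(library_size)]
        best = max(library, key=lambda s: toy_fitness(s, target))
        if toy_fitness(best, target) > toy_fitness(parent, target):
            parent = best  # the improved mutant becomes the next template
        history.append(toy_fitness(parent, target))
    return parent, history


final, history = directed_evolution("A" * 20, "ACDEFGHIKLMNPQRSTVWY")
print(final, history)
```

Keeping only the single best clone per round mirrors the slide; it also means the walk can stall once no single round of mutagenesis produces an improvement.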
Aiming for great sequence diversity • If the fitness landscape is rugged, point mutations alone are likely to lead to local optima. • Point mutations are too gradual to allow the block changes that are required for continued sequence evolution. [Figure: the fitness landscape, fitness over sequence space.]
DNA shuffling recombines different mutants to allow greater sequence space exploration Cirino
DNA shuffling recombines mutants • DNA recombination allows us to look at a larger portion of sequence space (compared to what point mutagenesis allows). • The sequences being explored are already “solutions” (i.e., they already correspond to fold and function, at least in another protein) → reduction in search space. • Combines additive mutations and removes deleterious mutations (e.g., after several rounds of error-prone PCR). • More likely to result in “new” functions (compared to accumulating single point mutations).
DNA Shuffling
1. DNase I digestion: digest PCR products of homologous genes. Create a pool of ssDNA fragments (short single-strand DNA).
2. Perform “primerless” PCR to reassemble genes.
3. Cut and clone reassembled genes for expression.
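The net effect of fragmentation followed by primerless reassembly is a chimeric gene whose segments come from different parents. The sketch below models only that outcome, as template switching between pre-aligned, equal-length parents at random fragment boundaries. It does not simulate DNase I digestion, annealing or homology requirements, and the fragment lengths and names are assumptions.

```python
import random


def shuffle_once(parents: list[str], mean_fragment: int = 50, rng=None) -> str:
    """Build one chimera by copying successive fragments from random parents."""
    rng = rng or random.Random()
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("this sketch assumes aligned parents of equal length")
    pieces, pos = [], 0
    while pos < length:
        # Fragment length varies around the mean, as after a random digest.
        frag_len = rng.randint(mean_fragment // 2, mean_fragment * 3 // 2)
        template = rng.choice(parents)
        pieces.append(template[pos:pos + frag_len])
        pos += frag_len
    return "".join(pieces)


# Two artificial parents so the crossovers are easy to see in the output.
rng = random.Random(1)
parents = ["A" * 300, "C" * 300]
library = [shuffle_once(parents, rng=rng) for _ in range(5)]
print(library[0])
```

Shorter fragments give more crossovers per gene in this model, which is the knob the real protocol turns by controlling the extent of digestion.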
Genetic recombination assay. Wild-type LacZα on the pUC18 plasmid: transform E. coli in the presence of X-Gal. LacZα mutants (stop codons, 75 b.p.): transform E. coli in the presence of X-Gal. Stemmer, W. P. PNAS Vol. 91 pp. 10747-10751 1994
Genetic recombination assay (cont.) Mutant 1: white. Mutant 2: white. Recombine the mutant genes: white and blue colonies. Transform E. coli in the presence of X-Gal. Count blue and white colonies to measure recombination frequency. The ratio of active recombinant colonies after assembling 50-100 bp fragments was 24% (n=386).
Negative mutations are suppressed. Starting mutants may have both positive and negative mutations. The net change of the mutant may be positive (negative mutations masked). DNA shuffling generates all possible combinations of point mutants → large library. Backcrosses with the wild-type region can remove negative mutations. Recombinants with a large number of negative mutations are eliminated from the next round of DNA shuffling. Positive mutants are selected to go to the next round of shuffling.
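To see why a library of all combinations lets selection discard the negative mutations, the sketch below enumerates every subset of a small set of point mutations and scores each one. The mutation names and their effects are invented, and the effects are assumed to add up independently, which is a simplification that ignores interactions between residues.

```python
from itertools import combinations

# Hypothetical per-mutation effects on fitness: two positive, two negative.
EFFECTS = {"m1": 0.5, "m2": -0.3, "m3": 0.8, "m4": -0.1}


def all_combinations(mutations) -> list[tuple[str, ...]]:
    """Every subset of the mutations, from wild-type () to all of them."""
    names = sorted(mutations)
    return [c for k in range(len(names) + 1) for c in combinations(names, k)]


def additive_fitness(combo, effects, wild_type: float = 1.0) -> float:
    """Fitness of a recombinant, assuming effects simply add up."""
    return wild_type + sum(effects[m] for m in combo)


library = all_combinations(EFFECTS)
best = max(library, key=lambda c: additive_fitness(c, EFFECTS))
print(len(library), best)
```

With n point mutations the full library has 2^n members, so the "large library" on the slide grows quickly as more starting mutations are pooled.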
Error-prone PCR and shuffling together are powerful protein engineering techniques. [Figure labels: random mutagenesis, further shuffling.]
Family shuffling Key: the starting genes are already nature’s solutions after natural evolution. They contain functional domains.
Example of family shuffling. Goal: Increase the activity of cephalosporinase towards moxalactam (an antibiotic).
1. Select four related cephalosporinases from different species.
2. Generate point mutants of each gene and shuffle the mutants of each gene separately (8-fold improvement in activity for each cephalosporinase).
3. Combine all the mutants from all four genes and perform family shuffling.
4. The best mutants from family shuffling were 270-540 fold more active.
Nature, 391, 1998, p288
Genome Shuffling of Antibiotic Producing Streptomyces Strains • Streptomyces are important industrial organisms for the production of antibiotics, anticancer drugs and other small-molecule pharmaceutical compounds • Examples: tetracyclines, erythromycin, daunorubicin, mithramycin, lovastatin (Mevacor) • Streptomyces are soil-borne, gram-positive bacteria that live under unfavorable conditions (starvation, among a population of other bacteria) • The antibiotics are produced as secondary metabolites, mostly for self-defense.
Classic mutagenesis is often used to find high-producing mutant strains • How do we find a mutant strain of Streptomyces fradiae that produces higher amounts of the antibiotic tylosin (Eli Lilly)? • The directed evolution of microorganisms has traditionally been carried out through the asexual process of classical strain improvement (CSI): sequential random mutagenesis and screening. • The sequential mutagenesis is performed using mutagens and UV radiation. • Most of the time, the nature of the mutation is not important (black box approach). • Although CSI is the method of choice in pharmaceutical companies, the process is inefficient and usually takes decades and $$$$ to isolate a significantly improved mutant.
CSI vs. Genome shuffling. In CSI, a large number of mutants can be recovered during one round of mutagenesis. Usually, only the best-performing mutant strain is selected and subjected to additional mutagenesis. Genome shuffling takes all the mutants that show improvement over the parent strain and shuffles their genomes together to generate combinations of mutations (mimicking the natural evolution of species). This process is analogous to DNA shuffling, but on a much grander scale (genomes vs. genes). Maxygen, Nature, 2002
How is genome shuffling possible? • Combine the cellular contents of several mutant strains through protoplast fusion. • During protoplast fusion, homologous recombination between homologous chromosomal regions will take place, allowing mutations to be passed from one strain to another. • Fused protoplasts can be regenerated into single cells carrying shuffled genomes.
Screening / Selecting Improved Variants • (generally considered the hard part) • Key Point: You get what you screen for! And other properties or functions not selected for may be lost. • Some Concerns: • How well does your screen reflect your desired function? • Sensitivity of the screen (what is the background – how well can you identify small improvements?) • Screening capabilities / sampling of library / library size • Equipment requirements (robotics, cell sorter, imaging)
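The concern about library size versus screening capacity can be put into numbers. The sketch below estimates how many clones must be screened for a given chance of seeing any one particular variant, assuming every variant is equally represented in the library, which real libraries rarely achieve. The function names are illustrative, and the 3.2 million figure is the five-site saturation library size quoted earlier.

```python
import math


def probability_sampled(library_size: int, clones_screened: int) -> float:
    """Chance that one particular variant appears at least once in the screen."""
    # 1 - (1 - 1/V)^T, written with log1p/expm1 for numerical accuracy.
    return -math.expm1(clones_screened * math.log1p(-1.0 / library_size))


def clones_needed(library_size: int, coverage: float = 0.95) -> int:
    """Clones to screen so a given variant is seen with probability `coverage`."""
    return math.ceil(math.log(1.0 - coverage) / math.log1p(-1.0 / library_size))


needed = clones_needed(3_200_000, coverage=0.95)
print(needed, round(probability_sampled(3_200_000, needed), 3))
```

For 95% coverage the answer comes out at roughly three times the library size, which is why plate-based screens of a few thousand clones can only sample small libraries thoroughly.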