Protein Design Crystal structure of top7 – A novel protein structure created with RosettaDesign. CS273: Final Project Charles Kou charlesk@stanford.edu http://rosettadesign.med.unc.edu/
What is Protein Design • The inverse of structure prediction: determine a low-energy sequence that yields a given structure. • Computationally difficult: • The search space has 20^n sequences, where n is the sequence length and 20 is the number of amino acids. • Major algorithms: dead-end elimination, genetic algorithms, Monte Carlo, branch & bound. http://www.stanford.edu/class/cs273/project/project.html
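To give a feel for the 20^n growth mentioned above, here is a minimal Python sketch. The function name `sequence_space_size` is an illustrative choice, not something from the original slides.

```python
def sequence_space_size(n_residues: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of the given length (alphabet_size ** n)."""
    return alphabet_size ** n_residues


# Report how many decimal digits the search-space size has for a few lengths.
for n in (10, 50, 100):
    size = sequence_space_size(n)
    print(f"n = {n:3d}: 20^n has {len(str(size))} decimal digits")
```

Even for a short chain the count is far too large to enumerate, which is why the algorithms on the following slides either sample or prune.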
Major Algorithms • Trade-off between thoroughness and computational speed. • Monte Carlo / Genetic Algorithm: • Can sample a space with an infinite number of solutions. • Side-chain identity can be varied, and side-chain orientation and backbone structure can be varied continuously. • No guarantee of reaching the global energy minimum. • Dead-End Elimination: • Allows only discrete conformations. • A rejection criterion is used to prune the search space. Desjarlais JR, Clarke ND. Computer search algorithms in protein modification and design. Curr Opin Struct Biol. 1998 Aug;8(4):471-5.
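Review: Energy Landscape • The energy is defined over a high-dimensional conformation space with coordinates q1, q2, ..., qi, qj, ..., qN-1, qN. [Figure: energy landscape over the conformation coordinates] JC Latombe, Energy2.ppt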
Review: Example Energy Function • E = Σ bonded terms + Σ non-bonded terms + Σ solvation terms • E = (E_S + E_Q + E_{S-B} + E_{Tor}) + (E_{vdW} + E_{dipole}) • Bonded terms: relatively few. • Non-bonded terms: depend on distances between pairs of atoms, so they cost O(n^2) and are expensive to compute. • Solvation terms: may require computing the molecular surface. JC Latombe, Energy2.ppt
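The O(n^2) cost of the non-bonded terms comes from visiting every pair of atoms. The sketch below illustrates this with a 12-6 Lennard-Jones term as a stand-in for E_vdW. The choice of potential and the parameters `epsilon` and `sigma` are assumptions for illustration; the slides do not specify the functional form.

```python
import math
from itertools import combinations


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones energy at distance r (assumed stand-in for E_vdW)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def nonbonded_energy(coords, epsilon=1.0, sigma=1.0):
    """Sum the pair term over all n*(n-1)/2 atom pairs: O(n^2) work."""
    total = 0.0
    for a, b in combinations(coords, 2):
        total += lennard_jones(math.dist(a, b), epsilon, sigma)
    return total


# Three collinear atoms spaced one sigma apart (hypothetical coordinates).
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(nonbonded_energy(atoms))
```

Practical codes usually cut this cost with distance cutoffs or neighbor lists; the plain double loop is kept here only to show where the quadratic term comes from.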
Review: Monte Carlo Simulation (MCS) • Random walk through conformation space. • At each cycle: • Perturb the current conformation at random. • Accept the step with probability P = min(1, exp(-ΔE/kT)), where ΔE is the energy change of the step (Metropolis acceptance criterion). • The conformations generated by an arbitrarily long MCS are Boltzmann distributed, i.e., #conformations in V ~ ∫_V exp(-E(q)/kT) dq. JC Latombe, Energy2.ppt
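A minimal Python sketch of the cycle described above, using the standard Metropolis rule (always accept downhill moves, accept uphill moves with probability exp(-ΔE/kT)). The one-dimensional coordinate, the `double_well` energy, and all parameter values are hypothetical choices for illustration, not taken from the slides.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng):
    """Metropolis criterion: accept with probability min(1, exp(-delta_e / kT))."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def monte_carlo(energy, start, step_size, kT, n_steps, seed=0):
    """Random walk over a 1-D coordinate; returns the lowest-energy point seen."""
    rng = random.Random(seed)
    current, e_current = start, energy(start)
    best, e_best = current, e_current
    for _ in range(n_steps):
        # Perturb the current conformation at random.
        trial = current + rng.uniform(-step_size, step_size)
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e_current, kT, rng):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


def double_well(q):
    """Toy energy with two minima (hypothetical, for illustration only)."""
    return (q * q - 1.0) ** 2 + 0.3 * q


best_q, best_e = monte_carlo(double_well, start=2.0, step_size=0.5, kT=0.2, n_steps=5000)
print(best_q, best_e)
```

Raising `kT` makes uphill moves more likely, which helps the walk leave a local minimum at the price of spending more time at high energy.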
Monte Carlo Simulation • Tends to waste time in local minima. • A run may consist of millions of steps. • The energy must be evaluated frequently, which is computationally expensive. • Use ChainTree to improve performance. Lotan I, Schwarzer F, Halperin D, Latombe JC. Efficient maintenance and self-collision testing for kinematic chains. In: Symposium on Computational Geometry (2002) 43-52. Koehl P, Levitt M. De novo protein design. I. In search of stability and specificity. Journal of Molecular Biology, 293, 1161-1181 (1999).
Genetic Algorithm • Starts with a first-generation pool. • Iteratively applies genetic operators (selection, recombination, mutation). • Evolves toward a better solution (lower energy). S. M. Larson, J. England, J. DesJarlais, and V. S. Pande. Thoroughly sampling sequence space: large-scale protein design of structural ensembles. Protein Science 11 2804-281 (2002).
Selection • The selection function takes the value of the fitness function into account. This gives priority to “fit” organisms but still gives “less fit” organisms a chance. http://en.wikipedia.org/wiki/Genetic_algorithm
Selection Method • Roulette method: the probability of selection is proportional to the value of the fitness function. • Tournament: pick k individuals at random (k is the tournament size) and choose the best of them with probability p, the second best with probability p*(1-p), the third best with probability p*(1-p)^2, and so on. • Higher k = less chance for a weaker individual. http://en.wikipedia.org/wiki/Roulette_wheel_selection http://en.wikipedia.org/wiki/Tournament_selection
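The two selection methods can be sketched in a few lines of Python. The function names and the integer "individuals" in the usage lines are illustrative assumptions. Roulette selection as written needs non-negative fitness values, so an energy would first have to be mapped to a fitness (for example with a Boltzmann weight).

```python
import random


def roulette_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    weights = [fitness(ind) for ind in population]
    return rng.choices(population, weights=weights, k=1)[0]


def tournament_select(population, fitness, k, p, rng):
    """Draw k individuals, then walk them from best to worst.

    The best is chosen with probability p, the second best with p*(1-p),
    the third with p*(1-p)**2, and so on; the last one takes what remains.
    """
    contestants = sorted(rng.sample(population, k), key=fitness, reverse=True)
    for individual in contestants[:-1]:
        if rng.random() < p:
            return individual
    return contestants[-1]


# Hypothetical usage: individuals are numbers and fitness is the number itself.
rng = random.Random(42)
pool = [1, 2, 3, 4, 5, 6, 7, 8]
print(roulette_select(pool, lambda x: x, rng))
print(tournament_select(pool, lambda x: x, k=3, p=0.8, rng=rng))
```

With `p = 1.0` the tournament is deterministic and always returns the fittest of the k contestants, which makes the effect of a larger k easy to see.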
Recombination, Mutation • Recombination: different segments of the structure that were optimized in parallel can be recombined into the same model. Recombination occurs with a set probability; otherwise, the individual is propagated unchanged to the next generation. • Mutation: helps escape local minima by mutating the child with a set probability. • As with MC, there is no guarantee of converging to the global minimum. http://en.wikipedia.org/wiki/Genetic_operator
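A minimal sketch of recombination and mutation acting on amino acid sequences, assuming single-point crossover and per-residue mutation. The `toy_energy` function, the truncation-style parent choice, and the elitism step (copying the best individual unchanged) are simplifying assumptions added for illustration; they are not described in the slides.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def recombine(parent_a, parent_b, rng):
    """Single-point crossover: head of one parent joined to the tail of the other."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(seq, rate, rng):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq
    )


def next_generation(population, energy, p_recombine, p_mutate, rng):
    """Build one new generation (needs at least two individuals)."""
    ranked = sorted(population, key=energy)
    parents = ranked[: max(2, len(ranked) // 2)]  # assumed: keep the better half
    children = [ranked[0]]  # assumed elitism: the best individual survives as-is
    while len(children) < len(population):
        a, b = rng.sample(parents, 2)
        # Recombine with a set probability, otherwise propagate a parent.
        child = recombine(a, b, rng) if rng.random() < p_recombine else a
        children.append(mutate(child, p_mutate, rng))
    return children


def toy_energy(seq):
    """Hypothetical stand-in energy: number of non-hydrophobic residues."""
    return sum(aa not in "AILMFVW" for aa in seq)


rng = random.Random(0)
pool = ["".join(rng.choice(AMINO_ACIDS) for _ in range(12)) for _ in range(20)]
for _ in range(30):
    pool = next_generation(pool, toy_energy, p_recombine=0.7, p_mutate=0.05, rng=rng)
print(min(pool, key=toy_energy))
```

Because of the elitism step the best energy in the pool never gets worse from one generation to the next, but nothing here guarantees that the global minimum is reached.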
Genome@home • Genome@home combines distributed computing with a genetic algorithm. • It also incorporates backbone flexibility using Monte Carlo (random perturbations with RMSD < 1.0 Å), which improves the results. http://www.stanford.edu/group/pandegroup/genome/
Dead-end Elimination • Discrete conformational search. • Functionally equivalent to exhaustive search. • Uses rejection criteria to prune the search space. • Robustness depends on how finely the conformations are discretized and on the rejection criteria used. • Guaranteed convergence to the global minimum. • Initially used for side-chain placement. Protein design is more difficult because of the higher number of degrees of freedom. Looger LL, Hellinga HW. Generalized dead-end elimination algorithms make large-scale protein side-chain structure prediction tractable: implications for protein design and structural genomics. J Mol Biol. 2001 Mar 16;307(1):429-45.
Energy of conformation • Reformulation of the side-chain placement problem: amino acid identity is used instead of rotamer. • Generalized DEE handles up to 300 residues. • The energy of a conformation is defined as the sum of the interactions among side chains plus the sum of the interactions between each side chain and the backbone. • The rejection criteria are applied iteratively until no more rotamers can be eliminated. Either the search converges, or the problem is reduced sufficiently for exhaustive search.
DEE filter: Rejection Criteria • Simple Criterion: if the lowest-energy structure that can be found using a given side-chain rotamer (a low-energy side-chain conformation) is higher in energy than the highest-energy structure with a different rotamer at the same position, the first rotamer is eliminated.
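The simple criterion can be sketched as one elimination pass over a table of energies. The data layout is an assumption made for this example: `self_e[i][r]` holds the side chain-backbone energy of rotamer r at position i, and `pair_e[(i, j)][r][s]` (with i < j, one entry for every pair of positions) holds the side chain-side chain energy. The numbers in the usage lines are made up.

```python
def pair_energy(pair_e, i, r, j, s):
    """Look up the pair term regardless of the order of the two positions."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def best_case(self_e, pair_e, alive, i, r):
    """Lowest energy rotamer r at position i could contribute."""
    return self_e[i][r] + sum(
        min(pair_energy(pair_e, i, r, j, s) for s in alive[j])
        for j in alive if j != i
    )


def worst_case(self_e, pair_e, alive, i, t):
    """Highest energy rotamer t at position i could contribute."""
    return self_e[i][t] + sum(
        max(pair_energy(pair_e, i, t, j, s) for s in alive[j])
        for j in alive if j != i
    )


def simple_dee_pass(self_e, pair_e, alive):
    """One pass: drop r if its best case is worse than some t's worst case."""
    kept = {}
    for i, rotamers in alive.items():
        kept[i] = [
            r for r in rotamers
            if not any(
                t != r
                and best_case(self_e, pair_e, alive, i, r)
                > worst_case(self_e, pair_e, alive, i, t)
                for t in rotamers
            )
        ]
    return kept


# Hypothetical two-position example with two rotamers each.
self_e = {0: [0.0, 10.0], 1: [0.0, 0.0]}
pair_e = {(0, 1): [[1.0, 2.0], [1.0, 2.0]]}
alive = {0: [0, 1], 1: [0, 1]}
print(simple_dee_pass(self_e, pair_e, alive))
```

In practice the pass is repeated on its own output until nothing more is eliminated, since each removal can tighten the minima and maxima seen by the other positions.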
DEE filter: Rejection Criteria • Goldstein Criterion: if the energy of a structure containing one rotamer is always lowered by changing to a second rotamer, whatever the rest of the structure looks like, the first rotamer is eliminated.
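A minimal sketch of the Goldstein test, iterated until no rotamer can be removed. It assumes the same hypothetical layout as a plain energy table: `self_e[i][r]` for the single-rotamer term and `pair_e[(i, j)][r][s]` with i < j for every pair of positions. The example energies are invented and chosen so that this test eliminates rotamers the simple criterion would keep.

```python
def pair_energy(pair_e, i, r, j, s):
    """Look up the pair term regardless of the order of the two positions."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def goldstein_eliminates(self_e, pair_e, alive, i, r, t):
    """True if switching r -> t at position i lowers the energy in every context."""
    gap = self_e[i][r] - self_e[i][t]
    for j in alive:
        if j != i:
            gap += min(
                pair_energy(pair_e, i, r, j, s) - pair_energy(pair_e, i, t, j, s)
                for s in alive[j]
            )
    return gap > 0.0


def run_goldstein_dee(self_e, pair_e):
    """Apply the test repeatedly until a full sweep removes nothing."""
    alive = {i: list(range(len(e))) for i, e in self_e.items()}
    changed = True
    while changed:
        changed = False
        for i in alive:
            for r in list(alive[i]):
                if any(
                    t != r and goldstein_eliminates(self_e, pair_e, alive, i, r, t)
                    for t in alive[i]
                ):
                    alive[i].remove(r)
                    changed = True
    return alive


# Hypothetical example: rotamer 1 at each position is always the worse choice.
self_e = {0: [0.0, 1.0], 1: [0.0, 0.0]}
pair_e = {(0, 1): [[0.0, 5.0], [0.0, 5.0]]}
print(run_goldstein_dee(self_e, pair_e))
```

If a single rotamer survives at every position, that assignment is the global minimum of the discrete problem; otherwise the surviving rotamers define a smaller problem for another search method.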
DEE filter: Rejection Criteria • Generalized Criterion: residues are added in groups. Clusters of rotamers in those groups that have already been eliminated may be excluded from the minimum operator, in addition to those that form dead-end clusters with c.
Mean Field Theory • Reduces the search space. • Self-consistency is sought by placing amino acids at pre-selected positions in a given structure. • The energy function is minimized in the mean field. Voigt CA, Mayo SL, Arnold FH, Wang ZG. Computational method to reduce the search space for directed protein evolution. Proc Natl Acad Sci U S A. 2001 Mar 27;98(7):3778-83.
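One common way to seek self-consistency is to keep a probability for every choice at every position, compute each choice's energy in the average field of the other positions, and update the probabilities with Boltzmann weights until they stop changing. The sketch below follows that generic scheme; it is not taken from the cited paper, and the energy tables, `kT`, and the `damping` factor are assumptions for illustration.

```python
import math


def pair_energy(pair_e, i, r, j, s):
    """Look up the pair term regardless of the order of the two positions."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def mean_field(self_e, pair_e, kT=1.0, n_iter=100, damping=0.5):
    """Iterate per-position probabilities toward self-consistency."""
    probs = {i: [1.0 / len(e)] * len(e) for i, e in self_e.items()}
    for _ in range(n_iter):
        updated = {}
        for i, e_i in self_e.items():
            # Energy of each choice at i, averaged over the other positions.
            field = []
            for r in range(len(e_i)):
                e = e_i[r]
                for j, p_j in probs.items():
                    if j != i:
                        e += sum(
                            p_j[s] * pair_energy(pair_e, i, r, j, s)
                            for s in range(len(p_j))
                        )
                field.append(e)
            lowest = min(field)  # shift before exponentiating, for stability
            weights = [math.exp(-(e - lowest) / kT) for e in field]
            z = sum(weights)
            # Damped update: mix the old probabilities with the new estimate.
            updated[i] = [
                damping * old + (1.0 - damping) * w / z
                for old, w in zip(probs[i], weights)
            ]
        probs = updated
    return probs


# Hypothetical two-position example with two choices per position.
self_e = {0: [0.0, 3.0], 1: [0.0, 3.0]}
pair_e = {(0, 1): [[0.0, 0.0], [0.0, 0.0]]}
print(mean_field(self_e, pair_e))
```

Choices whose converged probability is negligible can then be dropped, which is one way such a calculation reduces the search space.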
Review: Branch & Bound • The set of solutions can be partitioned into subsets (branch). • An upper limit on a subset’s solution can be computed fast (bound). • Select the subset with the best possible bound. • Subdivide it, and compute a bound for each subset. S. Batzoglou, Threading2.ppt
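The select-and-subdivide loop can be sketched with a priority queue. Because the example minimizes an energy rather than maximizing a score, the bound is a lower limit here instead of the upper limit named on the slide. The energy tables (`self_e[i][r]`, and `pair_e[(i, j)][r][s]` with i < j for every pair of positions) and the particular bound are assumptions chosen for illustration.

```python
import heapq


def pair_energy(pair_e, i, r, j, s):
    """Look up the pair term regardless of the order of the two positions."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def branch_and_bound(self_e, pair_e):
    """Best-first search over assignments; returns (energy, {position: choice})."""
    positions = sorted(self_e)
    n = len(positions)
    # Smallest entry of each pair table, used for still-unassigned pairs.
    min_pair = {key: min(min(row) for row in tbl) for key, tbl in pair_e.items()}

    def lower_bound(assigned):
        k = len(assigned)
        bound = 0.0
        for a in range(k):  # exact energy of the assigned part
            i = positions[a]
            bound += self_e[i][assigned[a]]
            for b in range(a + 1, k):
                bound += pair_energy(pair_e, i, assigned[a], positions[b], assigned[b])
        for a in range(k, n):  # optimistic estimate for the rest
            i = positions[a]
            bound += min(
                self_e[i][r]
                + sum(
                    pair_energy(pair_e, i, r, positions[b], assigned[b])
                    for b in range(k)
                )
                for r in range(len(self_e[i]))
            )
            for b in range(a + 1, n):
                bound += min_pair[(i, positions[b])]
        return bound

    heap = [(lower_bound(()), ())]
    while heap:
        # Select the subset with the best (lowest) bound.
        bound, assigned = heapq.heappop(heap)
        if len(assigned) == n:
            return bound, dict(zip(positions, assigned))
        # Subdivide it and compute a bound for each child subset.
        i = positions[len(assigned)]
        for r in range(len(self_e[i])):
            child = assigned + (r,)
            heapq.heappush(heap, (lower_bound(child), child))
    return None


# Hypothetical three-position example with two choices per position.
self_e = {0: [0.0, 1.0], 1: [2.0, 0.5], 2: [0.3, 0.0]}
pair_e = {
    (0, 1): [[1.0, -1.0], [0.0, 2.0]],
    (0, 2): [[0.5, 0.0], [-2.0, 1.0]],
    (1, 2): [[0.0, 0.0], [1.5, -0.5]],
}
print(branch_and_bound(self_e, pair_e))
```

The bound never overestimates and is exact for a complete assignment, so the first complete assignment taken from the queue is optimal for the discrete problem.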
Rosetta Design • The initial backbone is designed without regard to side-chain packing. • Iterates between sequence design and backbone optimization using Monte Carlo. • Each move perturbs the backbone, either by a random change in the torsion angles of 1-5 randomly chosen residues or by substituting the backbone torsion angles of 1-3 consecutive residues with torsion angles from a structure in the PDB. The side chains are then optimized, and the move is accepted or rejected using the Metropolis criterion. • 1.17 Å backbone-atom RMSD between the model and the crystal structure. Crystal structure of top7 – A novel protein structure created with RosettaDesign. http://rosettadesign.med.unc.edu/ Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D. Design of a novel globular protein fold with atomic-level accuracy. Science. 2003 Nov 21;302(5649):1364-8.
Using Rosetta Design • Red: PDB 1A1M, MHC class I molecule B*5301 complexed with peptide TYPDINQML from Gag protein of HIV2. • Blue: structure designed with RosettaDesign. • Visualized with DeepView / Swiss-PdbViewer. http://us.expasy.org/spdbv/ http://www.rcsb.org/pdb/cgi/explore.cgi?pid=195321117535569&pdbId=1A1M
b.e.a.n.s. • A simple OpenGL-based program was developed to test Monte Carlo and genetic algorithms on designing a “chain of jelly beans.” • The user can vary the initial structure of the “beans” and compare the efficiency of the algorithms with a built-in timer.