Protein Structure Prediction on a Lattice Model via Multimodal Optimization Techniques Ka-Chun Wong, Kwong-Sak Leung, Man-Hon Wong Department of Computer Science & Engineering The Chinese University of Hong Kong, HKSAR, China {kcwong, ksleung, mhwong}@cse.cuhk.edu.hk
Outline • Introduction • Background • Objective • Related Works • Paper Contributions • Apply multimodal optimization techniques • Propose a novel mutation method • Experiments • Conclusion
Introduction • A protein is: • a sequence of amino acid residues folded into a 3D structure • important for life: • Transporting materials across cells • Catalyzing metabolic reactions • Defending the body against viruses
Introduction • Protein Function: • Substantially depends on its 3D structure http://www.pdb.org/pdb/explore/explore.do?structureId=2X7M
Introduction • Protein Structure Determination • “Wet-lab” experiments exist • X-ray crystallography • NMR spectroscopy • …… • But they are: • Labor intensive • Not scalable • Expensive
Introduction • Complementary Twins • “Wet-lab” experiments for Protein Structure Determination are: • Costly • Time-consuming • Not scalable • Accurate • Computational approaches for Protein Structure Prediction are: • Less costly • Fast • Scalable • Less accurate • Wet labs for fine-tuning, computation for coarse-tuning
Introduction • Protein Structure Prediction (PSP) • Input: An amino acid sequence • Output: The 3D structure of the sequence • Divided into two classes: • Using similar sequences & their structures • Not using similar sequences & their structures [Figure: sequence ……YDVAEGCKVV…… → Prediction, with or without similar sequences & their structures]
Introduction • This paper focuses on: • De novo protein structure prediction on the 3D HP lattice model using evolutionary algorithms * • De novo means that the input to the method contains only the sequence to be predicted * N. Krasnogor, W.E. Hart, J. Smith, and D. Pelta. Protein structure prediction with evolutionary algorithms. In Banzhaf, Daida, Eiben, Garzon, Honavar, Jakiela, and Smith, editors, International Genetic and Evolutionary Computation Conference (GECCO99), pages 1569-1601. Morgan Kaufmann, 1999.
Background • 3D HP lattice model • Assumes that the main driving force of folding is the interaction among hydrophobic amino acid residues • Every amino acid residue is classified, based on experiments, as either hydrophobic (H) or polar (P).
Background • 3D HP lattice model • An amino acid sequence is represented as a string in {H,P}+ • The sequence is folded in a restricted space, a cubic lattice
Background • Amino acid residue: bead • Peptide bond: straight line • Example sequence: HPHPPHHPHPPHPHHPPHPH (H: red, P: blue)
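A minimal Python sketch of the bead-and-line picture: residues are placed on cubic lattice points by following unit moves. The move letters (E, W, N, S, U, D), the function names, and the self-avoidance check are assumptions made for illustration. Self-avoidance is the usual requirement in HP lattice models, but the slides do not state it explicitly.

```python
# Assumed move alphabet: one letter per unit step on the cubic lattice.
MOVES = {
    "E": (1, 0, 0), "W": (-1, 0, 0),
    "N": (0, 1, 0), "S": (0, -1, 0),
    "U": (0, 0, 1), "D": (0, 0, -1),
}


def place_beads(moves):
    """Return the lattice coordinates of every bead, starting at the origin."""
    pos = (0, 0, 0)
    coords = [pos]
    for m in moves:
        dx, dy, dz = MOVES[m]
        pos = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
        coords.append(pos)
    return coords


def is_self_avoiding(coords):
    """True if no lattice point is occupied by more than one bead."""
    return len(set(coords)) == len(coords)


if __name__ == "__main__":
    sequence = "HPHPPHHPHPPHPHHPPHPH"        # example sequence from the slide
    moves = "E" * (len(sequence) - 1)        # a fully stretched chain
    coords = place_beads(moves)
    print(len(coords), is_self_avoiding(coords))
```

A chain of n residues needs n - 1 moves, one per peptide bond.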
Objective • Find the conformation with the minimal energy • Equivalently, maximize the number of H-H bonds formed between two non-sequence-adjacent residues (non-local H-H bonds)
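The objective can be sketched as a short Python function that counts non-local H-H bonds. The function name, the coordinate format (a list of integer triples), and the choice of -1 per bond are assumptions for illustration, not taken from the authors' code.

```python
def hp_energy(sequence, coords):
    """Energy of a conformation: -1 for every non-local H-H bond.

    sequence: string over {"H", "P"}
    coords:   list of (x, y, z) lattice points, one per residue
    """
    energy = 0
    n = len(sequence)
    for i in range(n):
        # j starts at i + 2, so sequence-adjacent residues are never checked.
        for j in range(i + 2, n):
            if sequence[i] == "H" and sequence[j] == "H":
                # Two beads are lattice neighbours when their Manhattan
                # distance is exactly 1.
                dist = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
                if dist == 1:
                    energy -= 1
    return energy


if __name__ == "__main__":
    square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    print(hp_energy("HPPH", square))  # first and last residue touch
```

Lower values are better, so minimizing this energy is the same as maximizing the number of non-local H-H bonds.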
Objective • Mathematically, it is to minimize an energy function * made up of: • Bond Energy • Distance Function • Only non-sequence-adjacent residues are checked * H. Li, R. Helling, C. Tang, and N. Wingreen. Emergence of Preferred Structures in a Simple Model of Protein Folding. Science, 273(5275):666–669, 1996.
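One common way to write an energy function with these three ingredients is given here as a sketch. The symbols (σ for residue types, r for lattice positions, n for chain length) and the bond energy values are assumptions chosen to match the "maximize non-local H-H bonds" objective; the exact notation on the original slide is not recoverable.

```latex
H(x) = \sum_{i=1}^{n-2} \sum_{j=i+2}^{n} E_{\sigma_i \sigma_j} \, \Delta(r_i - r_j),
\qquad
E_{\sigma_i \sigma_j} =
\begin{cases}
-1 & \text{if } \sigma_i = \sigma_j = \mathrm{H} \\
0 & \text{otherwise}
\end{cases},
\qquad
\Delta(r_i - r_j) =
\begin{cases}
1 & \text{if } \lVert r_i - r_j \rVert = 1 \\
0 & \text{otherwise}
\end{cases}
```

Here E is the bond energy, Δ is the distance function, and starting the inner sum at j = i + 2 ensures that only non-sequence-adjacent residues are checked.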
Related Works • Unger et al. were the first to apply a hybridized genetic algorithm to the problem [1] • Patton et al. used a standard genetic algorithm [2] [1] Unger, R. and Moult, J. 1993. Genetic Algorithm for 3D Protein Folding Simulations. In Proceedings of the 5th International Conference on Genetic Algorithms, S. Forrest, Ed. Morgan Kaufmann Publishers, San Francisco, CA, 581-588. [2] Patton, A. L., Punch, W. F., and Goodman, E. D. 1995. A Standard GA Approach to Native Protein Conformation Prediction. In Proceedings of the 6th International Conference on Genetic Algorithms (July 15 - 19, 1995), L. J. Eshelman, Ed. Morgan Kaufmann Publishers, San Francisco, CA, 574-581.
Related Works • Berger et al. proved that the problem is NP-complete [1] • Krasnogor et al. published a study of the basic algorithmic factors affecting the problem [2] [1] Berger, B. and Leighton, T. 1998. Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete. In Proceedings of the Second Annual International Conference on Computational Molecular Biology. RECOMB '98. ACM, New York, NY, 30-39. [2] N. Krasnogor, W.E. Hart, J. Smith, and D. Pelta. Protein structure prediction with evolutionary algorithms. In Banzhaf, Daida, Eiben, Garzon, Honavar, Jakiela, and Smith, editors, International Genetic and Evolutionary Computation Conference (GECCO99), pages 1569-1601. Morgan Kaufmann, 1999.
Related Works • Since then, many related algorithms have been proposed. Some examples: • Multimeme algorithm by Krasnogor et al. • Guided genetic algorithm by Hoque et al. • Ant colony algorithm by Shmygelska et al. • Differential Evolution by Bitello et al. • Immune Algorithm by Cutello et al. • EDA by Santana et al.
Paper Contributions • Observation: • Some diversity preserving techniques are incorporated in most algorithms • Duplicate predator [1] • Aging operator [2] • Additional renormalization of the pheromone [3] [1] G. A. Cox, T. V. Mortimer-Jones, R. P. Taylor, and R. L. Johnston. Development and optimisation of a novel genetic algorithm for studying model protein folding. Theoretical Chemistry Accounts: Theory, Computation, and Modeling, 112(3):163–178, 2004. [2] V. Cutello, G. Nicosia, M. Pavone, and J. Timmis. An immune algorithm for protein structure prediction on lattice models. IEEE Transactions on Evolutionary Computation, 11(1):101–117, Feb. 2007. [3] A. Shmygelska and H. Hoos. An ant colony optimisation algorithm for the 2d and 3d hydrophobic polar protein folding problem. BMC Bioinformatics, 6(1):30, 2005.
Paper Contributions • Observation • Unger et al. have observed that there can be multiple conformations for each energy value [1] • A study also indicates the fitness landscapes of the problem are multimodal [2] [1] R. Unger and J. Moult. Genetic algorithms for protein folding simulations. J. Mol. Biol., 231:75–81, May 1993. [2] S. D. Flores and J. Smith. Study of fitness landscapes for the HP model of protein structure prediction. In Evolutionary Computation, 2003. CEC ’03. pages 2338–2345, Dec. 2003.
Paper Contributions • In this paper: • Apply multimodal optimization techniques to solve the PSP problem • Fitness Sharing (SharingGA) [1] • Species Conserving (SCGA) [2] • Crowding (CGA) [3] [1] Goldberg, D. E. and Richardson, J. 1987. Genetic algorithms with sharing for multimodal function optimization. In Proceedings of the Second International Conference on Genetic Algorithms and their Application, 41-49. [2] Li, J., Balazs, M. E., Parks, G. T., and Clarkson, P. J. 2002. A species conserving genetic algorithm for multimodal function optimization. Evol. Comput. 10, 3 (Sep. 2002), 207-234. [3] De Jong, K. A. 1975. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. Doctoral Thesis. UMI Order Number: AAI7609381. University of Michigan.
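To make one of these techniques concrete, here is a minimal Python sketch of a De Jong-style crowding replacement step with Hamming distance. The function names, the crowding factor parameter, and the string genome format are assumptions for illustration. The crowding variant actually used in the paper may differ.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length genomes differ."""
    return sum(x != y for x, y in zip(a, b))


def crowding_replace(population, offspring, crowding_factor, rng):
    """Insert offspring by replacing the most similar of a few random members.

    A random subset of `crowding_factor` individuals is drawn, and the one
    closest to the offspring (in Hamming distance) is overwritten in place.
    Returns the index that was replaced.
    """
    candidates = rng.sample(range(len(population)), crowding_factor)
    closest = min(candidates, key=lambda k: hamming(population[k], offspring))
    population[closest] = offspring
    return closest


if __name__ == "__main__":
    rng = random.Random(1)
    pop = ["FFFF", "LLLL", "RRRR"]
    crowding_replace(pop, "LLLF", crowding_factor=3, rng=rng)
    print(pop)
```

Because an offspring displaces a similar individual rather than the worst one, distinct groups of conformations can survive side by side, which is the diversity-preserving effect multimodal optimization aims for.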
Paper Contributions • In this paper: • Propose a novel mutation method • Mix two types of mutation • Sometimes use RM, sometimes use AM • Apply it in CGA (called CGA-mixed) RM: Mutation in Relative Encoding AM: Mutation in Absolute Encoding
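The idea of mixing RM and AM can be sketched in Python as follows. Everything here is an assumption for illustration: the relative alphabet F/L/R/U/D, the fixed first bond along +x, the 50/50 switching probability `p_relative`, and all function names. The slides only say that RM is used sometimes and AM at other times. Neither operator checks self-avoidance.

```python
import random

ABS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
REL = "FLRUD"  # forward, left, right, up, down (relative to the last bond)


def neg(a):
    return (-a[0], -a[1], -a[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def turn(heading, up, move):
    """Return the (heading, up) frame after applying one relative move."""
    left = cross(up, heading)
    return {"F": (heading, up), "L": (left, up), "R": (neg(left), up),
            "U": (up, neg(heading)), "D": (neg(up), heading)}[move]


def rel_to_abs(rel):
    """Decode relative moves into absolute bond directions (first bond is +x)."""
    heading, up = (1, 0, 0), (0, 0, 1)
    dirs = [heading]
    for m in rel:
        heading, up = turn(heading, up, m)
        dirs.append(heading)
    return dirs


def abs_to_rel(dirs):
    """Encode absolute bond directions as relative moves, or None if impossible."""
    heading, up = (1, 0, 0), (0, 0, 1)
    if dirs[0] != heading:
        return None
    rel = []
    for d in dirs[1:]:
        match = [m for m in REL if turn(heading, up, m)[0] == d]
        if not match:  # d reverses the previous bond
            return None
        rel.append(match[0])
        heading, up = turn(heading, up, match[0])
    return "".join(rel)


def mutate_relative(rel, rng):
    """RM: change one relative move; the rest of the chain rotates with it."""
    i = rng.randrange(len(rel))
    return rel[:i] + rng.choice([m for m in REL if m != rel[i]]) + rel[i + 1:]


def mutate_absolute(rel, rng):
    """AM: change one absolute bond direction; all other bonds keep theirs."""
    dirs = rel_to_abs(rel)
    i = rng.randrange(1, len(dirs))  # keep the fixed first bond
    banned = {dirs[i], neg(dirs[i - 1])}
    if i + 1 < len(dirs):
        banned.add(neg(dirs[i + 1]))  # avoid an immediate reversal
    dirs[i] = rng.choice([d for d in ABS if d not in banned])
    return abs_to_rel(dirs)


def mutate_mixed(rel, rng, p_relative=0.5):
    """Pick RM with probability p_relative, otherwise AM."""
    if rng.random() < p_relative:
        return mutate_relative(rel, rng)
    return mutate_absolute(rel, rng)
```

The two operators move through the search space differently. RM changes one letter of the genome but reorients every later bond, while AM changes a single bond direction and leaves the orientation of the rest of the chain untouched.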
Experiments • Experimental setup: • Relative Encoding [1] • Hamming Distance • 100 Individuals (Overlapping) • Uniform Deterministic (Parent Selection) • Truncation (Survival Selection) • 50 runs • 10^5 and 5 × 10^6 energy evaluations • UN [2] as a control algorithm [1] N. Krasnogor, W.E. Hart, J. Smith, and D. Pelta. Protein structure prediction with evolutionary algorithms. In Banzhaf, Daida, Eiben, Garzon, Honavar, Jakiela, and Smith, editors, International Genetic and Evolutionary Computation Conference (GECCO99), pages 1569-1601. Morgan Kaufmann, 1999. [2] K.A. De Jong. Evolutionary computation: a unified approach. MIT Press, Cambridge MA, 2006.
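A minimal Python sketch of one generation under these selection settings. It reads "uniform deterministic" as every parent producing exactly one offspring, and "overlapping" as parents competing with offspring for survival. Both readings, along with the function and parameter names, are assumptions for illustration.

```python
def one_generation(population, energy, vary, size=100):
    """Run one generation with deterministic parents and truncation survival.

    population: list of genomes
    energy:     function genome -> number (lower is better)
    vary:       function genome -> new genome (e.g. a mutation operator)
    size:       number of survivors kept for the next generation
    """
    # Uniform deterministic parent selection: each parent is used exactly once.
    offspring = [vary(parent) for parent in population]
    # Overlapping generations: parents and offspring compete together.
    pool = population + offspring
    # Truncation survival selection: keep the `size` lowest-energy genomes.
    return sorted(pool, key=energy)[:size]


if __name__ == "__main__":
    toy_energy = lambda g: -g.count("F")   # toy objective, not the HP energy
    toy_vary = lambda g: "F" + g[1:]       # toy variation operator
    print(one_generation(["LLL", "FLL"], toy_energy, toy_vary, size=2))
```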
Experiments • 10^5 energy evaluations over 50 runs H(x): The lowest energy over 50 runs mean ± σ: The lowest energy of a run, averaged over 50 runs (σ: standard deviation)
Experiments • 5 × 10^6 energy evaluations over 50 runs H(x): The lowest energy over 50 runs mean ± σ: The lowest energy of a run, averaged over 50 runs (σ: standard deviation)
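The two reported quantities can be computed from per-run results as in this Python sketch. The function name and the use of the sample standard deviation are assumptions (the slides do not say which estimator is used), and the numbers in the example are made-up toy values, not experimental results.

```python
from statistics import mean, stdev


def summarize_runs(best_energy_per_run):
    """Return (lowest energy over all runs, mean of per-run lowest, its std)."""
    lowest = min(best_energy_per_run)
    return lowest, mean(best_energy_per_run), stdev(best_energy_per_run)


if __name__ == "__main__":
    toy_runs = [-10, -11, -9, -11]  # toy values only
    lowest, avg, sigma = summarize_runs(toy_runs)
    print(f"H(x) = {lowest}, mean = {avg:.2f}, sigma = {sigma:.2f}")
```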
Experiments • The experimental results reported in the following papers are compared under the same termination condition: • Santana, R., Larranaga, P., and Lozano, J.A. "Protein Folding in Simplified Models With Estimation of Distribution Algorithms," IEEE Transactions on Evolutionary Computation, vol. 12, no. 4, pp. 418-438, Aug. 2008 • Cutello, V., Nicosia, G., Pavone, M., and Timmis, J. "An Immune Algorithm for Protein Structure Prediction on Lattice Models," IEEE Transactions on Evolutionary Computation, vol. 11, no. 1, pp. 101-117, Feb. 2007
Experiments • 10^5 energy evaluations over 50 runs H(x): The lowest energy over 50 runs mean ± σ: The lowest energy of a run, averaged over 50 runs (σ: standard deviation)
Experiments • 5 × 10^6 energy evaluations over 50 runs H(x): The lowest energy over 50 runs mean ± σ: The lowest energy of a run, averaged over 50 runs (σ: standard deviation)
Conclusion • In this paper, we: • Apply multimodal optimization techniques to PSP • Propose a novel mutation method for PSP • Obtain some results comparable with those of state-of-the-art algorithms • The source code can be downloaded at: http://pc89075.cse.cuhk.edu.hk:8080/myapp/GECCO2010-PSP-LatticeModels.zip
Q&A
Paper Contributions • Proposed mutation method • Applied in CGA (called CGA-mixed)