Evolving L-Systems to Capture Protein Structure Native Conformations
Gabi Escuela (1), Gabriela Ochoa (2) and Natalio Krasnogor (3)
(1, 2) Department of Computer Science, Universidad Simon Bolivar, Caracas, Venezuela; gabiescuela@netuno.net.ve, gabro@ldc.usb.ve
(3) School of Computer Science and I.T., University of Nottingham; Natalio.Krasnogor@nottingham.ac.uk
Content
• Proteins
• Protein Structure Prediction (PSP)
• The HP model
• EA approaches to PSP: current encoding
• L-Systems
• Why a grammatical encoding?
• Methods and Results
• Discussion and Future Work
[Figure: 3D structure of myoglobin, showing coloured alpha helices]
Proteins
• Linear chains of ~30-400 units drawn from 20 different amino acids
• Fold into a unique functional structure: the native state or tertiary structure
• Show repeated substructures: alpha helices and beta sheets
[Figure: 3D structure of 1A8M]
Protein Structure Prediction (PSP)
• Goal: determine the 3D structure of a protein from its amino acid sequence
• Strategy: find the amino acid chain's state of minimum energy
• A solution would have practical consequences in medicine, drug development and agriculture
The 2D HP Model
• Two amino acid types: hydrophobic (H) and polar or hydrophilic (P)
• The hydrophobic effect is the main force governing folding
• q ∈ {H, P}+; each letter of q has to be placed on a vertex of a given lattice L (at each point: turn 90° left or right, or continue ahead)
• Scoring function: adds -1 for each "contact" between two Hs that are adjacent in the lattice but not consecutive in q
• Objective: find the organization (embedding) of q in L of minimum score (maximum contacts)
[Figure: HPHPPHHPHPPHPHHPPHPH embedded in the square lattice; 9 H-H bonds, score = -9]
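
The scoring function can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the names `walk` and `hp_score` are hypothetical, the fold is supplied as a string of absolute moves (U, D, L, R), and the y axis is assumed to point up. The move string used here is the absolute encoding given for this 20-residue protein on the encoding slide.

```python
# Unit steps for absolute moves on the 2D square lattice (assumption: y points up).
STEPS = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def walk(moves, start=(0, 0)):
    """Return the lattice coordinates visited by a string of absolute moves."""
    x, y = start
    coords = [(x, y)]
    for m in moves:
        dx, dy = STEPS[m]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords


def hp_score(sequence, coords):
    """Add -1 for each pair of H residues that are lattice neighbours
    but not consecutive in the sequence."""
    if len(sequence) != len(coords):
        raise ValueError("one coordinate per residue is required")
    if len(set(coords)) != len(coords):
        raise ValueError("embedding is not self-avoiding")
    index_at = {pos: i for i, pos in enumerate(coords)}
    score = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only to the right and upwards so each contact is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                score -= 1
    return score


protein = "HPHPPHHPHPPHPHHPPHPH"
fold = "RDDLULDLDLUURULURRD"  # 19 absolute moves for 20 residues
print(hp_score(protein, walk(fold)))  # -9
```

The fold reproduces the score of -9 (9 H-H contacts) quoted for this sequence. Rejecting non-self-avoiding walks is a design choice of the sketch; an EA could instead penalise collisions.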
EA approaches to PSP: Current (Direct) Encoding
• EAs and other stochastic methods: global optimization of a suitable energy function
• Encoding: Cartesian coordinates, distance geometries, internal coordinates
• Absolute: the structure is encoded as a string of symbols. For example, in the 2D square lattice: s ∈ {Up, Down, Left, Right}+
• Relative: each move is interpreted in terms of the previous one: s ∈ {Forward, TurnLeft, TurnRight}+
Protein: HPHPPHHPHPPHPHHPPHPH (L = 20)
• Absolute encoding: RDDLULDLDLUURULURRD (L = 19); the first position is fixed
• Relative encoding: RFRRLLRLRRFRLLRRFR (L = 18); the first and second positions are fixed
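
The relation between the two encodings can be made concrete with a short conversion sketch. The function names are hypothetical, and two conventions are assumed rather than taken from the slides: the y axis points up, and the fixed first move is R (as in the absolute string above).

```python
# Headings as unit vectors (assumption: y points up), keyed by absolute move.
HEADINGS = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
LETTER = {v: k for k, v in HEADINGS.items()}


def turn(heading, move):
    """Apply a relative move (F, L or R) to a heading vector."""
    dx, dy = heading
    if move == "F":
        return (dx, dy)
    if move == "L":  # 90 degrees counter-clockwise
        return (-dy, dx)
    if move == "R":  # 90 degrees clockwise
        return (dy, -dx)
    raise ValueError(f"unknown relative move: {move!r}")


def relative_to_absolute(relative, first_move="R"):
    """Convert a relative move string into an absolute one.
    The first absolute move is fixed (assumed to be R here)."""
    heading = HEADINGS[first_move]
    absolute = [first_move]
    for move in relative:
        heading = turn(heading, move)
        absolute.append(LETTER[heading])
    return "".join(absolute)


def absolute_to_relative(absolute):
    """Express every move after the first relative to the previous one."""
    relative = []
    for prev, cur in zip(absolute, absolute[1:]):
        for move in "FLR":
            if turn(HEADINGS[prev], move) == HEADINGS[cur]:
                relative.append(move)
                break
        else:
            raise ValueError("a move reverses direction, which the chain cannot do")
    return "".join(relative)


print(relative_to_absolute("RFRRLLRLRRFRLLRRFR"))   # RDDLULDLDLUURULURRD
print(absolute_to_relative("RDDLULDLDLUURULURRD"))  # RFRRLLRLRRFRLLRRFR
```

Each conversion drops or restores exactly one symbol, which is why the lengths above go 20 (residues), 19 (absolute moves), 18 (relative moves).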
L-Systems (Lindenmayer, 1968)
• A model of morphogenesis, based on formal grammars
• Rewriting: define complex objects by replacing parts of a simple object using a set of productions
• Symbols: F, f, +, -, [, ]
• Axiom (S): F
• Production (replacement) rules: r1: F → F+f; r2: f → F
• Derivation: start: F; step 1: F+f; step 2: F+f+F; step 3: F+f+F+F+f
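
Parallel rewriting is easy to sketch in Python. The rule set below is an inferred reading: the derivation F, F+f, F+f+F, F+f+F+F+f shown on this slide is consistent with the productions F → F+f and f → F, and that is what the example assumes. The function names are illustrative only.

```python
def rewrite(word, rules):
    """One derivation step: replace every symbol in parallel.
    Symbols without a production (here '+') are copied unchanged."""
    return "".join(rules.get(symbol, symbol) for symbol in word)


def derive(axiom, rules, steps):
    """Return the list of words from the axiom up to the given step."""
    words = [axiom]
    for _ in range(steps):
        words.append(rewrite(words[-1], rules))
    return words


# Assumed productions, inferred from the derivation on the slide.
rules = {"F": "F+f", "f": "F"}
for step, word in enumerate(derive("F", rules, 3)):
    print(step, word)
# 0 F
# 1 F+f
# 2 F+f+F
# 3 F+f+F+F+f
```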
Why a Grammatical Encoding?
• Specifies how to construct the phenotype
• Can achieve greater scalability through self-similar and hierarchical structure
• Proteins exhibit a high degree of regularity and repeated motifs
• The current encoding may not be suitable for crossover and building block transfer between individuals
[Figures: protein structure; 3D L-system]
Method
• Proof of principle: can a folded protein be captured (encoded) by an L-system?
• How to find that L-system: an EA is used to evolve an L-system that captures a folded protein (inverse problem)
• Input: folded structure in relative coordinates, e.g. RFRRLLRLRRFRLLRRFR
• Output: an L-system L that, once derived, will produce the target string RFRRLLRLRRFRLLRRFR, e.g. Axiom = 01F, Rules = {0:RFR1, 1:2L2, 2:R0L}
Proposed Grammatical Encoding
• D0L-system (deterministic and context-free)
• Alphabet: Σ = Σt ∪ Σnt
  • Σt = {F, L, R}: terminal symbols (relative coordinates)
  • Σnt = {0, 1, 2, ..., m-1}: non-terminal symbols (rewriting rules), m = maximum number of rules
• Axiom: α ∈ Σ*
• Rewriting rules: i: wi, where i ∈ Σnt and wi ∈ Σ*
• Example: axiom R2; rules 0:R03F; 1:R01L; 2:F310; 3:LRL3
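
One possible in-memory representation of this encoding is sketched below. The class name `D0LSystem` and its layout are assumptions made for illustration, and the sketch further assumes m ≤ 10 so that every non-terminal is a single digit character.

```python
from dataclasses import dataclass

TERMINALS = frozenset("FLR")  # relative coordinates


@dataclass(frozen=True)
class D0LSystem:
    """Axiom plus one successor string per non-terminal (hypothetical layout).
    rules[i] is the successor w_i of non-terminal i; assumes at most 10 rules."""
    axiom: str
    rules: tuple

    def __post_init__(self):
        # Every symbol must be a terminal or the index of an existing rule.
        allowed = TERMINALS | {str(i) for i in range(len(self.rules))}
        for word in (self.axiom, *self.rules):
            bad = set(word) - allowed
            if bad:
                raise ValueError(f"symbols outside the alphabet: {sorted(bad)}")

    def step(self, word):
        """Rewrite all non-terminals in parallel; terminals are copied."""
        return "".join(self.rules[int(s)] if s.isdigit() else s for s in word)


# The example from the slide: axiom R2; rules 0:R03F; 1:R01L; 2:F310; 3:LRL3
system = D0LSystem("R2", ("R03F", "R01L", "F310", "LRL3"))
first = system.step(system.axiom)
second = system.step(first)
print(first)   # RF310
print(second)  # RFLRL3R01LR03F
```

Because the system is deterministic and context-free, a rule table indexed by non-terminal is all the state that is needed.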
Evolutionary Algorithm
• Generational, with rank-based selection
• Randomly generated initial population
• Preset maximum number of rules
• Axiom and rules: randomly generated strings of preset maximum length
• Genetic operators
  • Uniform-like (homologous) recombination (rate = 1.0): complete production rules are interchanged
  • Per-symbol mutation in both axioms and rules: deletion (30%), insertion (10%), modification (60%)
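
The operators listed above might look as follows in Python. This is a sketch under stated assumptions, not the authors' implementation: the per-symbol mutation rate is not given on the slide and is left as a parameter, the rank weighting is assumed to be linear, inserted symbols are placed after the mutated position, and the maximum string length is not enforced. All function names are hypothetical.

```python
import random


def mutate(word, alphabet, rate, rng):
    """Per-symbol mutation. Each symbol mutates with probability `rate`
    (assumed parameter); the kind is deletion 30%, insertion 10%, modification 60%."""
    out = []
    for symbol in word:
        if rng.random() >= rate:
            out.append(symbol)  # symbol left untouched
            continue
        kind = rng.choices(("delete", "insert", "modify"), weights=(30, 10, 60))[0]
        if kind == "insert":
            out.extend((symbol, rng.choice(alphabet)))  # keep symbol, add one after it
        elif kind == "modify":
            out.append(rng.choice(alphabet))
        # "delete": the symbol is simply dropped
    return "".join(out)


def recombine(rules_a, rules_b, rng):
    """Uniform-like homologous recombination: rule i of the child is the
    complete rule i of one of the two parents."""
    return [rng.choice(pair) for pair in zip(rules_a, rules_b)]


def rank_select(population, fitnesses, rng):
    """Rank-based selection with linear weights (assumption):
    the worst individual gets weight 1, the best gets weight n."""
    order = sorted(range(len(population)), key=lambda i: fitnesses[i])
    weights = [0] * len(population)
    for rank, i in enumerate(order, start=1):
        weights[i] = rank
    return rng.choices(population, weights=weights)[0]


rng = random.Random(0)
parent_a = ["3LL2", "R0RL", "RRF", "RFR1"]
parent_b = ["R03F", "R01L", "F310", "LRL3"]
child = [mutate(rule, "FLR0123", 0.1, rng) for rule in recombine(parent_a, parent_b, rng)]
print(child)
```

Exchanging whole rules keeps each production intact, which is the property the slide highlights for building block transfer.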
Derivation and Fitness Function
• Derivation: from genotype (axiom and rules) to phenotype (folded structure)
• Post-processing: pruning of non-terminal symbols
• Fitness calculation: number of matches between the target string and the solution (min. = 0, max. = length of the desired folding)
Example
• Genotype: Axiom = 31, Rules = {0:3LL2; 1:R0RL; 2:RRF; 3:RFR1}
• Axiom: 31
• 1st step (rewriting 3, 1): RFR1 R0RL
• 2nd step (rewriting 1, 0): RFR R0RL R 3LL2 RL
• 3rd step (rewriting 0, 3, 2): RFRR 3LL2 RL R RFR1 LL RRF RL
• Post-processing gives the phenotype: RFRRLLRLRRFRLLRRFR
• Fitness = 18
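
The genotype-to-fitness pipeline for this example can be sketched as below. The function names are hypothetical, and two details are assumptions rather than facts from the slides: the number of derivation steps is fixed at 3 to mirror the example, and symbols beyond the target's length are ignored when counting matches. Under these assumptions the pruned string carries one trailing L beyond the 18 target symbols.

```python
def derive(axiom, rules, steps):
    """Apply the D0L rewriting rules in parallel for a fixed number of steps."""
    word = axiom
    for _ in range(steps):
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return word


def prune(word, terminals="FLR"):
    """Post-processing: drop every non-terminal symbol."""
    return "".join(symbol for symbol in word if symbol in terminals)


def fitness(phenotype, target):
    """Number of positions where phenotype and target agree.
    Assumption: positions past the shorter string do not count."""
    return sum(p == t for p, t in zip(phenotype, target))


axiom = "31"
rules = {"0": "3LL2", "1": "R0RL", "2": "RRF", "3": "RFR1"}
target = "RFRRLLRLRRFRLLRRFR"

derived = derive(axiom, rules, 3)
phenotype = prune(derived)
print(derived)                     # RFRR3LL2RLRRFR1LLRRFRL
print(phenotype)                   # RFRRLLRLRRFRLLRRFRL
print(fitness(phenotype, target))  # 18
```

The intermediate words match the three steps listed in the example, and the fitness reaches its maximum of 18, the length of the desired folding.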
Results (2)
[Figure: evolutionary progression towards the target structure]
Discussion
• The proposed EA discovered L-systems that capture a target folding under the HP model in 2D lattices
• We are not solving the PSP yet, but we are proposing a novel and potentially useful generative encoding for evolutionary approaches to PSP
Future work
• Incorporate problem knowledge about secondary structures (beta turn, beta sheet, alpha helix)
• Explore longer chains and 3D lattices