Structural Bioinformatics
Elodie Laine
Master BIM-BMC, Semester 3, 2017-2018
Laboratoire de Biologie Computationnelle et Quantitative (LCQB)
e-documents: http://www.lgm.upmc.fr/laine/STRUCT
e-mail: elodie.laine@upmc.fr

Lecture 4 – Tertiary Structure
Elodie Laine – 24.10.2017
Tertiary structure prediction
We seek to determine the native structure of a protein, i.e. the structure of minimum free energy. The energy function serves to discriminate between the (many) possible conformations.
How can the conformational space be explored?
Conformational space
[Figure: conformations of a protein observed by X-ray crystallography (RX) or molecular dynamics (MD): inactive autoinhibited (RX), inactive intermediate (MD), active phosphorylated (RX), inactive non-autoinhibited (RX) and inactive bound to Imatinib (RX). Transitions are labelled phosphorylation, inhibitor and mutation. Panel labels: equilibrium population of protein conformations; protein native state energy landscape.]
Energy landscape
The energy landscape is a multidimensional hypersurface obtained by mapping all possible conformations of a system to their corresponding (free) energy levels.
For a system of N atoms, the number of degrees of freedom is 3N. Generally, only a section of the hypersurface along a reaction coordinate is represented.
[Figure: example of cyclohexane.]
Energy landscape
The energy landscape is a funneled, rugged hypersurface:
• many non-native local minima
• a deep free energy minimum with steep walls
Tertiary structure prediction
How can we predict a protein's three-dimensional coordinates among an astronomical number of conformations?
• Ab initio modeling: search among all possible conformations and select based on an energy function.
• Comparative modeling or fold recognition: assume that the structure of a protein can be modelled based on known structures serving as templates.
WHEN WE KNOW THE STRUCTURE OF A CLOSE HOMOLOG
Comparative modeling principles
Comparative modelling is based on the assumption that the space of structures adopted by proteins is much smaller than the sequence space: many different sequences adopt similar structures. Typically, proteins sharing more than 30% sequence identity adopt similar structures.
Comparative modeling principles
Query or target: a protein with unknown structure.
• Search for homologs in a database of known protein structures.
• Choose one or more homologs to serve as template(s).
• Model the target structure based on the template structure(s).
Comparative modeling (1/10)
1/ Choice of a template and sequence alignment
The template must be chosen based on:
• the sequence similarity between target and template
• the quality of the template experimental structure
• the environmental conditions (pH, salt concentration…)
[Figure (Joosten R P et al. NAR 2011, http://swift.cmbi.ru.nl/gv/facilities/) with zones labelled: no or very seldom manual intervention required; sequence alignment may require slight adjustments; manual intervention required to optimize sequence alignment; no template identified by classical methods.]
Comparative modeling (2/10)
[Figure: two examples relating the known template structure to the predicted target structure.]
The sequence alignment step is crucial: a locally wrong alignment can lead to a wrong assignment of structural correspondence.
Comparative modeling (3/10)
2/ Building the backbone
• Rigid-body assembly (Greer 1981)
  • Based on the assumption that proteins can be divided into (i) conserved core regions, (ii) variable loops and (iii) side-chains. The most conserved regions in the alignment are assembled as rigid bodies.
  • Multiple templates can be used: their structures are superimposed; structurally similar regions are identified as rigid bodies; the sequence alignment between the templates and the target assigns these rigid bodies to the target.
• Segment-matching (Jones & Thirup 1986)
  • Uses positions derived from the alignment as a guide to find matching segments in a representative database of all known protein structures.
  • The database contains short segments of protein structure that are selected using energy or geometry rules, or a combination of these criteria.
Comparative modeling (4/10)
• Spatial constraints (Sali & Blundell 1993)
  • Spatial constraints are derived from the selected template(s) (bond lengths, angle values, interatomic distances…) and then transferred to the target through the sequence alignment. The constructed structural model must guarantee minimal violation of the constraints.
  • The constraints can be obtained from various sources (not only homology-based), e.g. NMR data.
[Figure: MODELLER workflow.]
Comparative modeling (5/10)
3/ Building the loops
A very delicate task: loops at the surface of proteins can be artificially constrained in the crystal, and some loops are inherently flexible. Loop modeling is feasible up to 8 amino acids.
Insertions / deletions: loops from the target and the template have different lengths => ab initio modeling
KELV----LVLYDY QEKSPRELSQTI KKGDILTLLNSTNKDWWKVE
KDLVGNDRLVLYDY QDKSIREL--TI KTGDILTLLNSTQKDWWKVH
• Search the database for loops with similar sequence length and end-to-end distance
• Molecular mechanics / molecular dynamics refinement
Comparative modeling (6/10)
4/ Building the side-chains
• Identical residues between target and template: refinement of the side-chains
• Different residues between target and template: ab initio modeling
[Figure: backbone and side-chains.]
Comparative modeling (7/10)
4/ Building the side-chains
Use of rotamer libraries: side-chain conformations observed in known structures.
Generally, the backbone limits the conformational freedom of the side-chain.
Comparative modeling (8/10)
4/ Building the side-chains
The correct prediction of rotamers for every residue is a combinatorial problem.
[Figure: backbone with, at each side-chain position, the possible rotamers and the selected rotamer.]
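The combinatorial nature of rotamer assignment can be illustrated with a minimal sketch. The rotamer counts and the toy energy function below are hypothetical placeholders, not values taken from a real rotamer library, and the exhaustive enumeration is only tractable for a handful of residues.

```python
from itertools import product
from math import prod

# Hypothetical number of rotamers available at each of four positions.
rotamers_per_position = [3, 9, 27, 81]

def n_combinations(counts):
    """Number of distinct rotamer assignments for the whole chain."""
    return prod(counts)

def toy_energy(assignment):
    """Placeholder energy: penalise neighbours that share a rotamer index."""
    return sum(1.0 for a, b in zip(assignment, assignment[1:]) if a == b)

def best_assignment(counts, energy):
    """Exhaustive search over all combinations (small problems only)."""
    return min(product(*(range(c) for c in counts)), key=energy)

print(n_combinations(rotamers_per_position))  # 59049 assignments for 4 residues
print(best_assignment([2, 2, 2], toy_energy))
```

Even with these small counts the number of assignments grows multiplicatively with chain length, which is why practical side-chain placement relies on pruning or heuristic search rather than enumeration.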
Comparative modeling (9/10)
5/ Validation of the model
The quality of the model decreases as the similarity between the target and the template decreases.
• What can we expect?
  • id > 90%: as good as an X-ray structure
  • 50% < id < 90%: global RMSD around 1.5 Å
  • id < 25%: many errors due to failure of the sequence alignment
• What types of error?
  • side-chain packing errors (critical in catalytic or binding sites)
  • distortion or shift in correctly aligned regions
  • errors in regions lacking a template (ab initio modeling)
  • errors from incorrect alignment
  • wrong template (id < 25%)
Comparative modeling (10/10)
[Figure: native structure shown alongside the SWISS-MODEL backbone model, the SWISS-MODEL full model and the MODELLER model.]
WHEN WE KNOW A STRUCTURE COMPATIBLE WITH THE SEQUENCE
Fold recognition
Fold recognition is used when no protein with a known structure and a sequence similar to the target is found. Instead of enumerating all possible conformations, one assumes that the protein adopts a known fold.
Threading: the target sequence is threaded along structures taken from a library of known folds.
• HHpred/HHsearch: http://toolkit.tuebingen.mpg.de/hhpred
• I-TASSER: http://zhanglab.ccmb.med.umich.edu/I-TASSER/
• 3D-PSSM (Phyre): http://www.sbg.bio.ac.uk/~3dpssm
• Threader: http://bioinf.cs.ucl.ac.uk/threader/threader.html
I-TASSER protocol (Yang et al. Nature Methods 2015)
Query or target: a protein with unknown structure.
• Search for templates by threading (8 programs used!)
• Excise continuously aligned fragments from the templates
• Build unaligned regions from scratch (MC simulations)
• Re-assemble the structure
• Identify low-energy models by structure clustering (centroid models)
• Build all-atom models: minimization and quality assessment
Fold recognition
The sequence-structure associations are ranked according to their energies.
The quality of the final model can be evaluated through a Z-score, the classical RMSD measure, or the more recent TM-score.
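As an illustration of the two structure-based measures, here is a minimal sketch assuming the model and the native structure are already superimposed and aligned residue by residue (one point per residue, e.g. Cα atoms). The TM-score uses the commonly quoted length-dependent scale d0 = 1.24 (L − 15)^(1/3) − 1.8; the floor of 0.5 Å for short chains and the fact that no search over superpositions is performed are simplifying assumptions of this sketch.

```python
import math

def rmsd(model, native):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(model, native))
    return math.sqrt(squared / len(model))

def tm_score(model, native):
    """TM-score for one fixed superposition, normalised by the native length."""
    length = len(native)
    if len(model) != length or length == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Length-dependent distance scale; 0.5 Å floor for short chains (assumption).
    d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8 if length > 21 else 0.5
    total = sum(1.0 / (1.0 + (math.dist(p, q) / d0) ** 2)
                for p, q in zip(model, native))
    return total / length

# Toy data: a straight 30-residue chain and a copy shifted by 3 Å.
native = [(3.8 * i, 0.0, 0.0) for i in range(30)]
shifted = [(x, y + 3.0, z) for x, y, z in native]
print(rmsd(shifted, native), tm_score(shifted, native))
```

Unlike the RMSD, the TM-score is bounded between 0 and 1 and down-weights large local deviations, so a few badly placed residues do not dominate the score.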
WHEN WE KNOW NOTHING
Ab initio prediction principles
Ab initio modelling consists in searching through the conformational space of the protein. As in the previously described approaches, one assumes that all the information necessary to determine the protein fold is encoded in its sequence.
• Explore the protein conformational space
• Search for the minimum of a scoring function
The scoring function generally corresponds to the interatomic interaction energy.
Ab initio prediction workflow
Query or target: a protein with unknown structure.
• Choose a (simplified) structural representation and a discretization of the conformational space
• Choose an energy function adapted to the level of simplification
• Choose an algorithm to search the conformational space
Ab initio prediction (1/18)
1/ Structural representation and space discretization
[Figure: all-atom level, simplified side-chains and omitted side-chains, arranged along SPEED and PRECISION axes.]
Ab initio prediction (2/18)
Conformations can be represented by:
• Cartesian coordinates: the constraints of the polypeptide chain must be explicitly addressed (bond length, bond angle, torsion…)
• Internal coordinates representing torsion angles φ, ψ, ω, the bond angles and lengths being fixed: a small change induces a global adjustment of the downstream residues (steric clashes)
• Interatomic or interresidue distances: collective variables that reduce the number of dimensions and must be carefully chosen
• Regular lattices (cubic, tetrahedral…): almost impossible to represent a real protein structure unless the grid spacing is very small
Ab initio prediction (3/18)
Regular lattice (2D, 3D): poor realism; helices cannot be represented, although compensated motions are permitted.
Irregular lattice: secondary structure elements can be represented perfectly (maybe too well), but compensated motions are not permitted.
Ab initio prediction (4/18)
2/ Scoring function – Energy function
Each conformation is given a score representing its free energy of stability. The energy function must be adapted to the level of simplification of the protein representation.
3/ Conformational sampling
Sampling of the conformational space can be performed using:
• Systematic search
• Molecular dynamics
• Hierarchical construction
• Random sampling
Ab initio prediction (5/18)
• Systematic search
Assumes equal probabilities for non-native states. The conformational space is systematically explored by regular changes of the values of the degrees of freedom.
[Figure: golf-course model of the energy landscape versus rugged funneled view of the energy landscape.]
Ab initio prediction (6/18)
• Example
  • search for the optimal combination of φ and ψ torsion angle values
  • bond lengths and angles remain fixed
  • ω is set to 180° for all residues
  • side-chains are omitted
  • φ and ψ are varied from 0 to 360°, with an increment fixed to θ
• Combinatorial explosion: for N residues, the number of conformations being tested is $(360/\theta)^{2(N-1)}$. With θ = 30°:
  • N=2: 144 conformations
  • N=3: about 21,000 conformations
  • N=5: about 430,000,000 conformations
• How to improve search efficiency and reduce computation time?
  • Quality check on partially built structures (structures displaying wrong or high-energy substructures are deleted)
  • Trade-off between the resolution of the search grid and computational time
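The counts quoted in this example are consistent with 360/θ grid values for each of 2(N − 1) free φ/ψ angles. The short sketch below reproduces them under that assumption; the function name is illustrative only.

```python
def n_conformations(n_residues, theta_deg):
    """Grid points for 2(N-1) free phi/psi angles sampled every theta degrees."""
    values_per_angle = 360 // theta_deg
    return values_per_angle ** (2 * (n_residues - 1))

for n in (2, 3, 5):
    print(n, n_conformations(n, 30))
# 2 144
# 3 20736
# 5 429981696
```

Halving θ multiplies the count by 2^(2(N − 1)), which illustrates the trade-off between grid resolution and computation time.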
Ab initio prediction (7/18)
• Molecular dynamics (Newton's equations of motion)
  • Initial model: completely unfolded polypeptide or first rough guess for the native state
  • Calculate molecular forces acting on each atom: molecular mechanics potential energy function
  • Move each atom according to those forces: numerical integration algorithm (Verlet, leap-frog…)
  • Advance simulation time by a given time step: the time step must be smaller than the fastest relaxation time
In principle, with long simulations the entire conformational space can be explored (ergodicity).
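The force / move / advance-time loop can be sketched with a velocity Verlet integrator, one member of the Verlet family. This is a toy example, assuming a single particle in one dimension and a harmonic force in reduced units; it is not a molecular mechanics force field.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equation of motion for one particle in 1D."""
    trajectory = [(x, v)]
    f = force(x)                          # forces on the initial model
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass  # half-step velocity update
        x = x + dt * v_half               # move the particle
        f = force(x)                      # recompute forces at the new position
        v = v_half + 0.5 * dt * f / mass  # complete the velocity update
        trajectory.append((x, v))         # time has advanced by dt
    return trajectory

def harmonic_force(x, k=1.0):
    """Toy force deriving from the potential 0.5 * k * x**2."""
    return -k * x

def total_energy(x, v, mass=1.0, k=1.0):
    return 0.5 * mass * v * v + 0.5 * k * x * x

# dt is much smaller than the oscillation period (2*pi in these units).
traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force, mass=1.0,
                       dt=0.01, n_steps=1000)
print(traj[-1][0], math.cos(10.0))  # numerical vs analytical position at t = 10
```

With a time step that is too large relative to the fastest motion, the same loop becomes unstable and the total energy drifts, which is the reason for the time-step constraint stated above.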
Ab initio prediction (8/18)
• Molecular dynamics
  • Solvent representation: explicit or implicit
  • High-frequency vibrations frozen: longer time step
  • Boundary conditions: treating long-range interactions
  • Statistical ensemble:
    • microcanonical (NVE)
    • canonical (NVT)
    • isothermal-isobaric (NPT)
    • generalized (replica-exchange)
Ab initio prediction (9/18)
• Hierarchical construction
Structures are decomposed into fragments with well-defined 3D structures, e.g. secondary structure elements. Those fragments are then assembled to build the entire protein 3D structure.
Strong assumption: independence of the conformations of the different fragments. How valid is that assumption?
Solution: retain several conformations for each fragment, which should cover the ensemble of structures observed in databases.
Ab initio prediction (10/18)
[Figure: the BriX Collection of Canonical Protein Fragments.]
Ab initio prediction (11/18)
• Random search
Enables jumping from one region of the energy landscape to another in one step.
• Random changes can be applied to:
  • Cartesian coordinates of randomly chosen atoms/residues
  • randomly chosen backbone torsion angles (internal coordinates)
  • randomly chosen distances between atoms or residues
• The conformation selected for the next iteration can be:
  • the one generated at the previous step
  • the least frequently picked one among those previously generated
  • the lowest-energy one among those previously generated
What makes the simulation stop?
Ab initio prediction (12/18)
• Random search: Monte Carlo
Metropolis criterion:
if $\Delta E < 0$ then accept
else if $\varepsilon < \exp(-\Delta E / kT)$ then accept
where ε is a random number between 0 and 1, k is the Boltzmann constant and T is the temperature.
The probability of accepting high-energy conformations increases with the temperature T.
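A minimal sketch of the Metropolis acceptance test follows. It assumes reduced units (`k_b = 1.0` by default); the helper `acceptance_rate` is a hypothetical convenience for checking the behaviour empirically, not part of the criterion itself.

```python
import math
import random

def metropolis_accept(delta_e, temperature, k_b=1.0, rng=random):
    """Return True if a move with energy change delta_e is accepted."""
    if delta_e < 0:
        return True                      # downhill moves are always accepted
    epsilon = rng.random()               # uniform random number in [0, 1)
    return epsilon < math.exp(-delta_e / (k_b * temperature))

def acceptance_rate(delta_e, temperature, n_trials=20000, seed=0):
    """Empirical acceptance frequency for a fixed uphill energy change."""
    rng = random.Random(seed)
    hits = sum(metropolis_accept(delta_e, temperature, rng=rng)
               for _ in range(n_trials))
    return hits / n_trials

print(acceptance_rate(1.0, 1.0))  # close to exp(-1)
print(acceptance_rate(1.0, 5.0))  # higher temperature, more uphill moves accepted
```

The second call shows the temperature dependence stated above: the same uphill move is accepted more often when T is larger.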
Ab initio prediction (12/18)
• Random search: Monte Carlo + simulated annealing
The temperature T varies during the simulation.
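One way to vary the temperature is a geometric cooling schedule, sketched below on a toy one-dimensional rugged funnel. The energy function, the schedule parameters and the step size are all hypothetical choices for illustration, with k = 1 in reduced units.

```python
import math
import random

def rugged_energy(x):
    """Toy 1D funnel with local minima (not a protein energy function)."""
    return 0.1 * x * x + math.sin(5.0 * x)

def simulated_annealing(energy, x0, t_start=5.0, t_end=0.01, cooling=0.99,
                        step=0.5, seed=0):
    """Metropolis Monte Carlo with a temperature lowered at every step."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        x_new = x + rng.uniform(-step, step)   # random move
        e_new = energy(x_new)
        delta = e_new - e
        # Metropolis criterion at the current temperature
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling                            # geometric cooling schedule
    return best_x, best_e

best_x, best_e = simulated_annealing(rugged_energy, x0=8.0)
print(best_x, best_e)
```

At high temperature the walker crosses barriers between local minima; as T decreases, uphill moves become rare and the search settles into a low-energy basin.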
Ab initio prediction (13/18)
Reversibility
At equilibrium, there are as many transitions from state n to state m as from state m to state n:
$N_{mn} = N_{nm}$
The transition rate is the product of the population density in state m, $\rho_m$, with the element mn of the transition matrix, $\pi_{mn}$:
$N_{mn} = \rho_m \pi_{mn}$
As the population follows a Boltzmann distribution:
$\frac{\pi_{mn}}{\pi_{nm}} = \frac{\rho_n}{\rho_m} = \exp\left(-\frac{E_n - E_m}{kT}\right)$
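The reversibility condition can be checked numerically for the Metropolis rule. The sketch below assumes Boltzmann populations and a symmetric proposal, and compares the probability flux in both directions between two states with arbitrary toy energies (reduced units, `kT = 1.0` by default).

```python
import math

def boltzmann_weight(energy, kT=1.0):
    """Unnormalised Boltzmann population of a state."""
    return math.exp(-energy / kT)

def metropolis_prob(e_from, e_to, kT=1.0):
    """Metropolis acceptance probability for a move e_from -> e_to."""
    delta = e_to - e_from
    return 1.0 if delta <= 0 else math.exp(-delta / kT)

def flux(e_from, e_to, kT=1.0):
    """Population of the starting state times the transition probability."""
    return boltzmann_weight(e_from, kT) * metropolis_prob(e_from, e_to, kT)

print(flux(0.3, 1.7), flux(1.7, 0.3))  # equal fluxes: detailed balance holds
```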
Ab initio prediction (14/18)
• Random search: evolutionary algorithms
The search proceeds like a biological evolutionary process:
• a population of conformations is created
• members of the population are evaluated using an energy/scoring function that measures their quality/fitness
• the population evolves (changes), hopefully toward better solutions, through recombination and mutation
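A minimal sketch of such an algorithm is given below. It assumes a conformation encoded as a list of discretised torsion-angle bins and a toy energy counting mismatches with a hypothetical target; the selection scheme (truncation with elitism), crossover and mutation rate are illustrative choices rather than a specific published method.

```python
import random

N_BINS = 12                               # e.g. torsion angles every 30 degrees
TARGET = [3, 3, 3, 3, 7, 7, 7, 1, 1, 1]   # hypothetical low-energy conformation
N_ANGLES = len(TARGET)

def energy(conf):
    """Toy score: number of angles that differ from the hypothetical target."""
    return sum(1 for a, b in zip(conf, TARGET) if a != b)

def recombine(p1, p2, rng):
    """One-point crossover between two parent conformations."""
    cut = rng.randrange(1, N_ANGLES)
    return p1[:cut] + p2[cut:]

def mutate(conf, rng, rate=0.1):
    """Replace each angle by a random bin with probability `rate`."""
    return [rng.randrange(N_BINS) if rng.random() < rate else a for a in conf]

def evolve(pop_size=40, n_generations=60, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(N_BINS) for _ in range(N_ANGLES)]
           for _ in range(pop_size)]
    history = []
    for _ in range(n_generations):
        pop.sort(key=energy)                 # evaluate and rank the population
        history.append(energy(pop[0]))
        parents = pop[: pop_size // 2]       # truncation selection
        children = [mutate(recombine(rng.choice(parents),
                                     rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children             # elitism: parents survive unchanged
    pop.sort(key=energy)
    history.append(energy(pop[0]))
    return pop[0], history

best, history = evolve()
print(energy(best), history[0])
```

Because the best half of each generation is kept unchanged, the best energy in `history` can never increase from one generation to the next.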
Ab initio prediction (15/18)
• Random search: ant colony (Colorni, Dorigo & Maniezzo 1991)
Inspired by the observation that ants tend to find the shortest path between their nest and a food source.
• Ants initially wander randomly and, upon finding food, return to their colony while laying down pheromone trails.
• If other ants find such a path, they are likely to stop travelling at random and instead follow the trail, returning and reinforcing it if they eventually find food.
• Over time, however, the pheromone trail starts to evaporate, thus reducing its attractive strength. As a consequence, the pheromone density on the shortest path becomes the highest.
Ab initio prediction (17/18)
• Random search: ant colony (Colorni, Dorigo & Maniezzo 1991)
• Artificial ant colony set-up: each ant builds a conformation for the protein of unknown structure
• Conformations are evaluated using a scoring function and pheromones are deposited for the best solutions
• The next generation of ants builds new structures while accounting for the deposited pheromones
[Figure: predicted structure compared with experimental data.]
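The three steps above can be sketched as follows. This is a toy illustration, assuming a conformation encoded as one discrete bin per position, a pheromone table indexed by (position, bin) and a hypothetical score that counts matches with a made-up target; the evaporation rate and deposit rule are arbitrary choices.

```python
import random

N_BINS = 6
TARGET = [0, 1, 2, 3, 4, 5, 0, 1]   # hypothetical low-energy conformation
N_POS = len(TARGET)

def score(conf):
    """Toy score to maximise: positions matching the hypothetical target."""
    return sum(1 for a, b in zip(conf, TARGET) if a == b)

def build(pheromone, rng):
    """One ant picks a bin per position with probability proportional to pheromone."""
    return [rng.choices(range(N_BINS), weights=row)[0] for row in pheromone]

def ant_colony(n_ants=20, n_generations=50, evaporation=0.1, seed=0):
    rng = random.Random(seed)
    pheromone = [[1.0] * N_BINS for _ in range(N_POS)]
    best, best_score = None, -1
    for _ in range(n_generations):
        ants = [build(pheromone, rng) for _ in range(n_ants)]
        leader = max(ants, key=score)            # best ant of this generation
        if score(leader) > best_score:
            best, best_score = leader, score(leader)
        # evaporation weakens every trail
        pheromone = [[(1.0 - evaporation) * p for p in row] for row in pheromone]
        # the best ant reinforces the choices it made
        for pos, chosen_bin in enumerate(leader):
            pheromone[pos][chosen_bin] += score(leader) / N_POS
    return best, best_score, pheromone

best, best_score, pheromone = ant_colony()
print(best, best_score)
```

Evaporation keeps early, possibly poor, choices from dominating, while the deposit biases later generations toward bins used by good solutions.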