A stochastic approach to Molecular Replacement. Nicholas M. Glykos & Michael Kokkinidis, IMBB, FORTH, Heraklion, Crete, GREECE.
“Why ? What’s wrong with AMoRe ?” “Interesting. Can we now go back to the AMoRe .log file ?”
stochastic adj. • determined by a random distribution of probabilities. • (of a process) characterized by a sequence of random variables. • governed by the laws of probability. Etymology : Gk stokhastikos, f. stokhazomai aim at, guess, f. stokhos aim.
crystal2~ file Stochastic.ppt
Stochastic.ppt : c program text with garbage
crystal2~
“The classical approach to the problem of placing n copies of a search model in the asymmetric unit of a target crystal structure, is to divide this 6n-dimensional optimisation problem into a succession of three-dimensional searches.” Acta Cryst. (2000), D56, 169-174
The method(s) : • Treat all translational & orientational parameters of all molecules as variables whose values are to be simultaneously and independently determined.
The method(s) : • Assume that the correct solution corresponds to the (pronounced) global minimum of a suitable (?) statistic (like the R-factor, or the linear correlation coefficient between Fo’s and Fc’s, or Fo² and Fc², or …).
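The two statistics named above can be sketched as follows. This is a minimal illustration, not the program's own code: the function names, the single overall scale factor sum(Fo)/sum(Fc), and the amplitudes are all assumptions made for the example. If the correlation coefficient were used as the target, one would presumably minimise 1 − Corr.

```python
import math

def r_factor(f_obs, f_calc):
    """R-factor after scaling Fc to Fo with one overall scale (an assumed choice)."""
    scale = sum(f_obs) / sum(f_calc)
    return sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc)) / sum(f_obs)

def correlation(x, y):
    """Linear (Pearson) correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical amplitudes, for illustration only.
f_obs = [10.0, 20.0, 30.0, 40.0]
f_calc = [5.0, 10.0, 15.0, 20.0]          # exactly half of f_obs

r_value = r_factor(f_obs, f_calc)          # model agrees perfectly once scaled
cc_amplitudes = correlation(f_obs, f_calc)
# The same function applied to Fo² and Fc²:
cc_intensities = correlation([f * f for f in f_obs], [f * f for f in f_calc])
```

Both statistics are insensitive to the overall scale of the Fc’s here, which is why the half-sized model still scores as a perfect match.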
The method(s) : • Use simulated annealing (in the form of a modified reverse Monte Carlo method) to explore the 6n-dimensional parameter space. Other published optimisation techniques include : a genetic algorithm approach (Chang & Lewis, 1997), an evolutionary search methodology (Kissinger et al., 1999) and a systematic 6D search (Sheriff et al., 1999).
The program : • Name : “Queen of Spades” • Availability : absolutely free, no warranties whatsoever. • The distribution includes source code plus pre-compiled executables for Irix, OSF, Linux, Solaris, VMS & windoze. • Download the latest version via http://origin.imbb.forth.gr/~glykos/ • Current stable version : α , Release 0.9.
The reverse Monte Carlo method: • (1) Assign random initial positions & orientations to all molecules present in the asymmetric unit of the target crystal structure. Calculate Fc’s from this arrangement. • (2) Calculate the R-factor between the Fo’s and the Fc’s. Call this Rold.
The reverse Monte Carlo method: • (3) Randomly choose and alter the orientation and position of one of the molecules. Calculate the R-factor resulting from the new arrangement (Rnew). • (4) If Rnew < Rold, then the new arrangement is accepted and we start again from (3). • (5) If the new R-factor is worse, we still accept the move with probability exp[ –(Rnew – Rold) / T ].
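The accept/reject loop described above can be sketched as below. This is only an illustration under stated assumptions: `toy_score` is a hypothetical stand-in for the R-factor (a real run would compute Fc’s from the molecular arrangement), and the two parameters, step count, temperature and move size are arbitrary choices, not values from the program.

```python
import math
import random

def metropolis_accept(r_old, r_new, temperature, rng):
    """Always accept improvements; accept worse moves with probability exp[-(Rnew - Rold)/T]."""
    if r_new < r_old:
        return True
    return rng.random() < math.exp(-(r_new - r_old) / temperature)

def toy_score(params):
    """Hypothetical stand-in for the R-factor, with its minimum (0.0) at (0.3, 0.7)."""
    return abs(params[0] - 0.3) + abs(params[1] - 0.7)

def minimise(n_steps=20000, temperature=0.01, max_move=0.05, seed=1):
    rng = random.Random(seed)
    # Random starting arrangement and its score (Rold).
    current = [rng.random(), rng.random()]
    r_old = toy_score(current)
    best, r_best = list(current), r_old
    for _ in range(n_steps):
        # Randomly choose one parameter and alter it.
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-max_move, max_move)
        r_new = toy_score(trial)
        # Accept or reject the move.
        if metropolis_accept(r_old, r_new, temperature, rng):
            current, r_old = trial, r_new
            if r_old < r_best:
                best, r_best = list(current), r_old
    return best, r_best

best, r_best = minimise()
```

Because worse moves are sometimes accepted, the walk can climb out of local minima; the sketch therefore keeps track of the best arrangement seen rather than relying on the final one.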
Speeding it up : • Avoid FFTs : calculate and store (in core) the molecular transform of the search model. • Keep a table containing the contribution of each molecule to each reflection. • CPU time per step ~ Number of reflections in P1.
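The table of per-molecule contributions can be pictured with the sketch below. It assumes (this is not shown in the slides) that each molecule's contribution to each reflection is stored as a complex number, and that the new contribution of the moved molecule has already been obtained from the stored molecular transform. The function name and the numbers are hypothetical.

```python
def update_total(f_total, table, molecule, new_contrib):
    """Swap one molecule's contribution in place; the cost is one pass over the reflections."""
    old_contrib = table[molecule]
    for j in range(len(f_total)):
        f_total[j] += new_contrib[j] - old_contrib[j]
    table[molecule] = new_contrib

# Hypothetical contributions: 2 molecules x 3 reflections.
table = [
    [1 + 2j, 0 - 1j, 3 + 0j],
    [2 - 1j, 1 + 1j, -1 + 2j],
]
f_total = [sum(column) for column in zip(*table)]

# Molecule 1 is moved; only its row changes, the other molecule is untouched.
update_total(f_total, table, 1, [0 + 0j, 2 + 2j, 1 - 1j])
amplitudes = [abs(f) for f in f_total]   # the |Fc| values that enter the R-factor
```

Only the moved molecule's row is recomputed, so the work per step grows with the number of reflections and not with the number of molecules.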
Annealing schedules : • Constant temperature run. • Linear temperature gradient (slow cooling). • “Heating bath” mode.
At T=0.3125000, average R=0.59937
At T=0.1562500, average R=0.59707
At T=0.0781250, average R=0.59861
At T=0.0390625, average R=0.59028
At T=0.0195312, average R=0.58783
At T=0.0097656, average R=0.57545
At T=0.0048828, average R=0.55527
At T=0.0024414, average R=0.53016
At T=0.0012207, average R=0.52038
At T=0.0006104, average R=0.51799
At T=0.0003052, average R=0.51524
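The first two schedules can be sketched as simple temperature lists. The function names and the numerical values are assumptions chosen for illustration; the “heating bath” mode is left out because the slides do not define it.

```python
def constant_schedule(temperature, n_steps):
    """Constant temperature run: the same T at every step."""
    return [temperature] * n_steps

def linear_schedule(t_start, t_end, n_steps):
    """Linear temperature gradient (slow cooling) from t_start down to t_end."""
    if n_steps == 1:
        return [t_start]
    decrement = (t_start - t_end) / (n_steps - 1)
    return [t_start - i * decrement for i in range(n_steps)]

# Hypothetical values; t_end is kept above zero so that exp[-(Rnew - Rold)/T] stays defined.
constant_temps = constant_schedule(0.01, 5)
cooling_temps = linear_schedule(0.020, 0.004, 5)
```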
Move size control : • Constant move size : max(Δt) = d_min / max(a,b,c), max(Δκ) = d_min (in degrees). • Move size linearly dependent on current R-factor and time step : max(Δt) = 0.5 R (1.0 - t/t_total), max(Δκ) = π R (1.0 - t/t_total).
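A short sketch of the two move-size rules follows. The function names are hypothetical, and two readings are assumed rather than stated in the slides: that max(Δt) is a fraction of the unit cell, and that the π in the second rule means that max(Δκ) is in radians there.

```python
import math

def constant_move_size(d_min, a, b, c):
    """Constant limits: max(dt) = d_min / max(a, b, c); max(dk) = d_min, read as degrees."""
    return d_min / max(a, b, c), d_min

def adaptive_move_size(r_factor, step, total_steps):
    """Limits shrinking with the current R-factor and the elapsed fraction of the run:
    max(dt) = 0.5 R (1 - t/t_total), max(dk) = pi R (1 - t/t_total)."""
    remaining = 1.0 - step / total_steps
    return 0.5 * r_factor * remaining, math.pi * r_factor * remaining

# Hypothetical example: 4 Å data in a 50 x 60 x 80 Å cell.
max_dt, max_dk = constant_move_size(4.0, 50.0, 60.0, 80.0)
start_dt, start_dk = adaptive_move_size(0.60, 0, 1000)     # start of the run
end_dt, end_dk = adaptive_move_size(0.45, 1000, 1000)      # last step: no movement left
```

With the second rule the moves become finer both as the fit improves and as the run approaches its end.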
Scaling : To B or not to B ? The default is to scale |Fc|’s to |Fo|’s using both a scale and a temperature factor, but … 0.32±0.02 23±5
Bulk solvent correction : • The absence of a bulk-solvent correction from this type of calculation is a serious problem : it introduces systematic errors in the low-resolution data (out to about 5Å) and makes a low-resolution cutoff necessary. • The exponential scaling model algorithm allows a computationally efficient and model-independent correction to be applied : Fcorrected = Fp { 1.0 – ksol exp[ –Bsol / d² ] }
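The correction formula can be evaluated directly, as in the sketch below. The function name is hypothetical, the formula is taken exactly as written on the slide (with d the resolution of the reflection in Å), and the ksol and Bsol values are illustrative assumptions, not parameters quoted in the presentation.

```python
import math

def solvent_corrected(f_protein, d, k_sol, b_sol):
    """Fcorrected = Fp * (1 - k_sol * exp(-b_sol / d**2)), with d the resolution in Å."""
    return f_protein * (1.0 - k_sol * math.exp(-b_sol / d ** 2))

# Illustrative parameter values only.
K_SOL, B_SOL = 0.8, 200.0

# The same protein amplitude at decreasing d-spacing (increasing resolution).
corrected = {d: solvent_corrected(100.0, d, K_SOL, B_SOL) for d in (20.0, 8.0, 4.0, 2.0)}
for d, f in corrected.items():
    print(f"d = {d:5.1f} Å   Fcorrected = {f:7.2f}")
```

The exponential term dies away quickly as d decreases, so the correction acts almost entirely on the low-resolution reflections and leaves the high-resolution ones essentially unchanged.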
Bulk solvent correction ? Acta Cryst. (2000), D56, 1070-1072
Using the program : • Input : a .pdb file, and a formatted file containing h,k,l,F,σ(F). • Running the program : $ Qs -auto 1, or, $ Qs -auto 2, etc. (no scripts), or, $ Qs <script.file> • Output : .pdb files containing the final coordinates for each model, plus a packing diagram for each solution.
Examples : A 5D problem. • One molecule of lysozyme per a.u. • Monoclinic space group (C2), 4Å data. • rms deviation of model 1.4Å. • Up to ±20% noise added to error-free data. • About 90 seconds of CPU time per minimisation.
Examples : A 6D problem (1). • Target structure 1bvx, search model 2lz2 (rms deviation 1.3Å). • One molecule of lysozyme per a.u. • Tetragonal space group (P4₃2₁2). • Real 15-4Å data. • About 3.8 hours of CPU time per minimisation.
Examples : A 6D problem (2). • Target structure 1b6q. • 30% solvent. • Search model : incomplete poly-Ala. • One monomer of Rop per a.u. • Orthorhombic space group (C222₁). • Real 15-4Å data. • About 40 minutes of CPU time per run.
Examples : An 11D problem. • Target structure 1lys, model 2ihl (rmsd 1.52 & 1.56Å). • Two molecules of lysozyme per asymmetric unit. • Monoclinic space group (P2₁), 4Å data. • ±20% noise added to error-free data. • Solutions appear after ~3.8 hours of CPU time.
Disadvantages : • In most cases, treating the problem as 6n-dimensional is a waste of CPU time. • You can only have one search model (i.e. you cannot search simultaneously with your DNA & protein models). • The structure of the search model is kept fixed throughout the calculation.
Disadvantages : • The (putative) evidence from the self-rotation function and/or the native Patterson function is ignored (but, in a way, for n > 1 it is also ignored by the traditional methods). • When the starting model deviates significantly from the target structure, (i) there is no guarantee that the global minimum of any chosen statistic will correspond to the correct solution, (ii) traditional methods may be more sensitive in identifying the correct solution.
Advantages : • If there are just one or two molecules per asymmetric unit and CPU time is not a problem, the method can be used as a last ditch effort to conclusively show that there is no such thing as a pronounced global minimum (or otherwise ?). • The automatic (black box) mode is really black: no keywords, no scripts, just a .pdb file containing the model and an ASCII file containing h,k,l,F,σ(F).
Advantages : • The computational procedures differ so much from those used in conventional methods, that the results obtained can be considered as independent. • The method is honest in the sense that it is rather unlikely to find a wrong solution which will give a simultaneous sudden drop of both the R and Rfree leading to a solution with a reasonable packing arrangement.
A word of caution …
Res     R      Corr
0.020   0.57   0.66
0.030   0.68   0.43
0.040   0.61   0.58
0.050   0.66   0.43
0.060   0.64   0.50
0.111   0.61   0.42
-----   ----   ----
        0.64   0.51