Stochastic Molecular Replacement. Nicholas M. Glykos, MBG, DUTH, Alexandroupolis, Greece.
Molecular Replacement as a rigid-body refinement problem : Determine the orientations and positions of the search models for which the agreement between the observed and calculated structure factor amplitudes is maximised. If the structures of the search models are sufficiently accurate, these orientations and positions will correspond to the correct molecular replacement solution.
Molecular Replacement as a rigid-body refinement problem :
Systematic search : Examine all unique combinations of positions and orientations of all search models. The best solution will be the molecular replacement solution.
Stochastic searches : Use a global optimisation method (usually non-deterministic) to efficiently search the multidimensional parameter space.
Basic equations : The structure factor
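The equation on this slide is not reproduced in the text. A standard textbook form for n rigid search models is sketched below as an assumption; the notation on the original slide may differ. Here x_j are the fractional coordinates of atom j in model m, f_j is its scattering factor (including any temperature factor), Ω_m and t_m are the rotation (expressed in the fractional frame) and translation applied to model m, and (R_s, T_s) are the space-group symmetry operators.

```latex
% Assumed standard form, not taken from the slide.
F_c(\mathbf{h}) \;=\; \sum_{m=1}^{n} \sum_{s=1}^{N_{\mathrm{sym}}} \sum_{j=1}^{N_m}
  f_j(\mathbf{h}) \,
  \exp\!\Big\{ 2\pi i \, \mathbf{h} \cdot
  \big[ \mathbf{R}_s \left( \boldsymbol{\Omega}_m \mathbf{x}_j + \mathbf{t}_m \right) + \mathbf{T}_s \big] \Big\}
```

The search parameters are the Ω_m and t_m of each model, which gives up to six parameters per molecule.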
Basic equations : The target function(s)
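The formulas on this slide are not reproduced in the text either. The two statistics named elsewhere in the talk, the R-factor and 1.0-Corr(Fo,Fc), have the following conventional definitions, given here as an assumption about what the slide showed (k is a scale factor, and the sums run over the selected reflections):

```latex
% Conventional definitions, assumed rather than copied from the slide.
R \;=\; \frac{\sum_{\mathbf{h}} \big|\, |F_o(\mathbf{h})| - k\,|F_c(\mathbf{h})| \,\big|}
             {\sum_{\mathbf{h}} |F_o(\mathbf{h})|}

1 - \mathrm{Corr}(F_o, F_c) \;=\; 1 -
  \frac{\sum_{\mathbf{h}} \big(|F_o| - \langle |F_o| \rangle\big)\big(|F_c| - \langle |F_c| \rangle\big)}
       {\sqrt{\sum_{\mathbf{h}} \big(|F_o| - \langle |F_o| \rangle\big)^2 \;
              \sum_{\mathbf{h}} \big(|F_c| - \langle |F_c| \rangle\big)^2}}
```

Both are written so that a smaller value means better agreement, which lets the same minimiser be used for either.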
The search algorithms :
• Genetic algorithms.
• Evolutionary programming (EPMR).
• Simulated annealing (Qs).
The Metropolis algorithm :
(1) Assign random initial positions & orientations to all molecules present in the asymmetric unit of the target crystal structure. Calculate the Fc's from this arrangement.
(2) Calculate the R-factor between the Fo's and the Fc's. Call this Rold.
(3) Randomly choose one of the molecules and alter its orientation and position. Calculate the R-factor resulting from the new arrangement (Rnew).
(4) If Rnew < Rold, the new arrangement is accepted and we start again from (3).
(5) If the new R-factor is worse, we still accept the move with probability exp[ -(Rnew - Rold) / T ], and then start again from (3).
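The acceptance rule above can be sketched in a few lines of Python. This is a minimal illustration on a toy one-dimensional target, not the Qs implementation: the names `metropolis`, `toy_target` and `toy_move` are hypothetical, and the quadratic function merely stands in for the R-factor.

```python
import math
import random

def metropolis(target, state, propose, temperature, steps, rng):
    """Minimise target(state) using the Metropolis acceptance rule.

    Returns the best state seen and its target value. temperature must be > 0.
    """
    r_old = target(state)
    best_state, best_r = state, r_old
    for _ in range(steps):
        candidate = propose(state, rng)
        r_new = target(candidate)
        # Better moves are always accepted; worse moves are accepted
        # with probability exp[-(Rnew - Rold) / T].
        if r_new < r_old or rng.random() < math.exp(-(r_new - r_old) / temperature):
            state, r_old = candidate, r_new
            if r_old < best_r:
                best_state, best_r = state, r_old
    return best_state, best_r

def toy_target(x):
    """Stand-in for the R-factor (assumption): minimum at x = 2."""
    return (x - 2.0) ** 2

def toy_move(x, rng):
    """Random perturbation of the single toy parameter."""
    return x + rng.uniform(-0.5, 0.5)

best_x, best_r = metropolis(toy_target, -3.0, toy_move,
                            temperature=0.01, steps=5000,
                            rng=random.Random(0))
```

In a real molecular replacement run the state would hold the rotational and translational parameters of every molecule, and each proposal would alter one molecule only.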
Annealing schedules :
• Constant temperature run.
• Linear temperature gradient (slow cooling).
• Boltzmann annealing (logarithmic schedule).
• "Heating bath" mode. The temperature is automatically adjusted so as to keep the fraction of moves performed against the gradient of the target function constant and equal to a user-defined value.
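The four schedules might look like the sketch below. The exact functional forms used by Qs are not given in the slides, so everything here is an assumption: the logarithmic form T0 / ln(e + k), the multiplicative `gain` used in the heating-bath update, and all function names are illustrative only.

```python
import math

def constant_schedule(t_start, step, total_steps):
    """Constant temperature run."""
    return t_start

def linear_schedule(t_start, step, total_steps):
    """Slow cooling: T falls linearly from t_start towards 0.

    For step in range(total_steps) the temperature stays strictly positive.
    """
    return t_start * (1.0 - step / total_steps)

def boltzmann_schedule(t_start, step, total_steps):
    """Logarithmic schedule T(k) = T0 / ln(e + k), so that T(0) = T0."""
    return t_start / math.log(math.e + step)

def heating_bath_update(temperature, uphill_fraction, target_fraction, gain=0.05):
    """Nudge T so the observed fraction of uphill moves approaches the target.

    Too few uphill moves: heat up. Too many: cool down.
    """
    if uphill_fraction < target_fraction:
        return temperature * (1.0 + gain)
    if uphill_fraction > target_fraction:
        return temperature * (1.0 - gain)
    return temperature
```

The first three functions share one signature so that a minimiser can take any of them as a parameter; the heating-bath update needs feedback from the run and is therefore called with the measured uphill fraction instead.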
Temperature determination :
At T=0.3125000, average R=0.59937
At T=0.1562500, average R=0.59707
At T=0.0781250, average R=0.59861
At T=0.0390625, average R=0.59028
At T=0.0195312, average R=0.58783
At T=0.0097656, average R=0.57545
At T=0.0048828, average R=0.55527
At T=0.0024414, average R=0.53016
At T=0.0012207, average R=0.52038
At T=0.0006104, average R=0.51799
At T=0.0003052, average R=0.51524
Scaling & bulk solvent correction :
• The default is to scale the |Fc|'s to the |Fo|'s using both a scale factor and a temperature factor, even at the relatively low resolution used for molecular replacement calculations.
• The program implements the exponential scaling model algorithm, which allows a computationally efficient and model-independent bulk solvent correction to be applied :
Fcorrected = Fp { 1.0 - ksol exp[ -Bsol / d² ] }
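As a worked illustration of the correction formula, the sketch below applies it to a single structure factor. The function name is hypothetical and the ksol and Bsol values in the example are arbitrary illustrative numbers, not defaults taken from Qs.

```python
import math

def bulk_solvent_correct(f_protein, d_spacing, k_sol, b_sol):
    """Return Fp * (1 - ksol * exp(-Bsol / d^2)).

    f_protein may be a real amplitude or a complex structure factor;
    d_spacing is the resolution of the reflection in Angstrom.
    """
    return f_protein * (1.0 - k_sol * math.exp(-b_sol / d_spacing ** 2))

# Illustrative values only (assumption): ksol = 0.75, Bsol = 50.
low_res = bulk_solvent_correct(100.0, 15.0, 0.75, 50.0)   # strongly attenuated
high_res = bulk_solvent_correct(100.0, 3.5, 0.75, 50.0)   # almost unchanged
```

The correction is largest for low-resolution reflections and fades quickly as d decreases, which is why it matters at the resolution ranges used for molecular replacement.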
Speeding it up :
• Avoid FFTs : calculate and store (in core) the molecular transform of the search model.
• Keep a table containing the contribution of each molecule to each reflection.
• CPU time per step ~ Number of reflections in P1.
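The per-molecule contribution table can be sketched as follows. This is an assumed data layout, not the one used inside Qs: `contrib[m][i]` holds the complex contribution of molecule m to reflection i, and both function names are hypothetical. When one molecule moves, only its row changes, so the totals are patched in a single pass over the reflections instead of being recomputed from all molecules.

```python
def total_fc(contrib):
    """Sum the per-molecule contributions for every reflection."""
    n_refl = len(contrib[0])
    return [sum(mol[i] for mol in contrib) for i in range(n_refl)]

def update_molecule(fc_total, contrib, mol_index, new_contrib):
    """Replace one molecule's row and patch the totals in place.

    Cost is proportional to the number of reflections, independent of
    how many molecules are present.
    """
    old = contrib[mol_index]
    for i in range(len(fc_total)):
        fc_total[i] += new_contrib[i] - old[i]
    contrib[mol_index] = list(new_contrib)

# Two molecules, two reflections (made-up numbers for illustration).
table = [[1 + 2j, 0.5 - 1j],
         [2 - 1j, 1 + 1j]]
fc = total_fc(table)
update_molecule(fc, table, 0, [0j, 2 + 2j])
```

If a move is rejected, the old row has to be restored, so a caller would keep a copy of it until the acceptance test has been made.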
The program :
• Name : "Queen of Spades" (Qs).
• Availability : free, open source software, BSD-like license.
• The distribution includes source code, plenty of documentation, plus pre-compiled executables for Linux, Irix, OSF, Solaris, VMS & windoze.
• Download the latest version via http://www.mbg.duth.gr/~glykos/ or from the various CCP14 mirrors.
• Current stable version : 1.3.
Using the program :
• Input : .pdb files containing the models, and a formatted (ASCII) file containing h,k,l,F,σ(F).
• Output : .pdb files containing the final coordinates for each model, plus a packing diagram for each solution.
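A reflection file of this kind could be read with something like the sketch below. The exact layout expected by Qs is not stated in the slides, so the parser assumes one reflection per line with five whitespace-separated columns (h, k, l, F, σ(F)); the function name is hypothetical.

```python
def parse_reflections(text):
    """Parse lines of 'h k l F sigF' into (h, k, l, F, sigF) tuples.

    Assumes whitespace-separated columns; blank or short lines are skipped.
    """
    reflections = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or incomplete lines
        h, k, l = (int(value) for value in fields[:3])
        f_obs, sig_f = float(fields[3]), float(fields[4])
        reflections.append((h, k, l, f_obs, sig_f))
    return reflections

example = "1 0 0 120.5 3.2\n\n0 2 -1 45.0 1.1\n"
data = parse_reflections(example)
```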
Running the program (1) :
$ Qs -auto 1
or,
$ Qs -auto 2
etc.
Running the program (2) :
##########################################################
# Target function (can be R-FACTOR, CORR-1 or CORR-2) and
# number of minimisations and steps.
#
TARGET R-FACTOR
CYCLES 5
STEPS 100000000
############################################################
# Annealing schedule & move size control.
#
BOLTZMANN
START 0.06800
############################################################
# Reflection selection.
#
KEEP 0.70
AMPLIT_CUTOFF 1.0
SIGMA_CUTOFF 2.0
RESOLUTION 15.0 3.5
. . . . . . .
Examples : A 17D problem.
• Target structure 1a02, NFAT-Fos-Jun-DNA.
• Treated Fos-Jun as one model.
• Monoclinic space group (P2₁), experimental 19-4 Å data.
• Models deviated by 1.1, 1.5 and 2.2 Å.
• Three days per run on an Intel PIV at 1.8 GHz.
Examples : A 23D problem.
• Target structure : monoclinic form of the A31P Rop mutant.
• Model : one poly-Ala helix (13% of atoms).
• Four helices per asymmetric unit.
• Space group C2, 15-3.5 Å data.
• Target function : 1.0-Corr(Fo,Fc).
• 36 hours per run on an Intel PIII at 800 MHz.
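The dimensionalities quoted for the two examples are not derived on the slides. They are presumably obtained as follows: each rigid model contributes three rotational and three translational parameters, and in a polar space group such as P2₁ or C2 the origin along the unique axis is arbitrary, which removes one translational parameter.

```latex
% Presumed parameter count (an assumption, not stated on the slides).
N_{\mathrm{dim}} = 6n - 1 \quad \text{(polar space group, } n \text{ models)}

\text{1a02 (NFAT, Fos-Jun, DNA; } n = 3\text{)}: \quad 6 \times 3 - 1 = 17

\text{A31P Rop (four helices; } n = 4\text{)}: \quad 6 \times 4 - 1 = 23
```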
Disadvantages :
• In most cases, treating the molecular replacement problem as 6n-dimensional is like shooting a sparrow with a cannon.
• The structures of the search models are kept fixed throughout the calculation.
• The (putative) evidence from the self-rotation function and/or the native Patterson function is not actively used.
Disadvantages :
• When the starting models deviate significantly from the target structures,
(i) there is no guarantee that the global minimum of any chosen statistic will correspond to the correct solution,
(ii) traditional methods may be more sensitive in identifying the correct solution.
Advantages :
• All information (data + structures) is used right from the beginning of the calculations.
• If there are just one or two molecules per asymmetric unit and CPU time is not a problem, the method can be used as a last-ditch effort to conclusively show that there is no such thing as a pronounced global minimum (or otherwise ?).
• The computational procedures differ so much from those used in the other methods that the results obtained can be considered as independent.
Advantages :
• The method's only requirement is that the global minimum of the target function (for the given models and data) corresponds to the correct solution.
• The method does not assume that the self- and cross-vectors are topologically segregated in the Patterson function, and is, thus, expected to be more robust in the case of closely-packed structures, or when the molecules deviate significantly from being approximately spherical.
Conclusion :
• Substituting computing for thinking will almost certainly fail for n ≥ 5.