Developments ‘08…
• Inclusion of intermolecular degrees of freedom
• Changes of the genetic algorithm
• Constrained Sampling: q_i^min < q_i ≤ q_i^max
• Hybrid Islands, Electrostatic Forcing
• Dynamic Tabus
• Buffered Migrations
• ProCheck structure selection (folding)
• Divide & Conquer “Planetary” strategy
• BestEffort deployment
• Simulation Results
Intermolecular degrees of freedom
• Loose fragments are detected and treated as ligands
• Chromosomes now include real values!
  • Torsional angles
  • 3 Euler angles/ligand
  • 3 Translations/ligand
• Chromosome layout: q1 q2 q3 … qn
• The site must contain at least one fixed atom.
• Translations (mapped onto [0…359.99], for homogeneity with the angular genes) position the topological center of the ligand within the box occupied by the free atoms of the site, unless a site_def.gen file is provided.
• All Euler angles may evolve between 0 and 360°
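A minimal sketch of how a ligand's six rigid-body genes could be decoded, assuming a plain linear rescaling of the translation genes from the angular scale onto the site box. The slide only states that translations are mapped onto [0…359.99]; the linear form, the function names and the box representation are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Range = Tuple[float, float]


def gene_to_coordinate(gene_deg: float, box_min: float, box_max: float) -> float:
    """Rescale a gene stored on the angular scale [0, 360) to [box_min, box_max).

    Assumption: the mapping is linear; the slides do not give its exact form.
    """
    fraction = (gene_deg % 360.0) / 360.0
    return box_min + fraction * (box_max - box_min)


def decode_ligand_genes(genes: Sequence[float], box: Sequence[Range]):
    """Split 6 ligand genes into (Euler angles, position of the ligand center).

    genes: 3 Euler angles followed by 3 translation genes, all in degrees.
    box:   ((xmin, xmax), (ymin, ymax), (zmin, zmax)) of the site's free atoms.
    """
    euler = tuple(g % 360.0 for g in genes[:3])
    center = tuple(
        gene_to_coordinate(g, lo, hi) for g, (lo, hi) in zip(genes[3:6], box)
    )
    return euler, center
```

Storing every gene on the same 0 to 360 scale lets the same mutation and crossover operators act on torsions, rotations and translations alike.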
One or two islands are allowed to use “heavy” alternative heuristics.
• At each generation, there is a total (tunable) probability p_hyb of using “directed” rather than classical mutations:
  • Torsional driving (Explorers), with a frequency of (p_hyb)^2, or
  • Electrostatic Forcing, with a frequency of p_hyb(1 - p_hyb), replacing the former, time-consuming Monte Carlo simulation.
• Electrostatic Forcing steps:
  • Randomly increase the weight of electrostatic interactions
  • Perform gradient relaxation with the perturbed Hamiltonian
  • Reset the Hamiltonian and reoptimize
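The two frequencies add up to the total: (p_hyb)^2 + p_hyb(1 - p_hyb) = p_hyb. A minimal sketch of the operator choice on a hybrid island, assuming a single uniform random draw per mutation; the function name and the returned labels are hypothetical.

```python
import random


def choose_mutation(p_hyb: float, rng: random.Random) -> str:
    """Pick the mutation operator for one mutation event on a hybrid island."""
    u = rng.random()  # uniform in [0, 1)
    if u < p_hyb ** 2:
        return "torsional_driving"      # frequency p_hyb^2
    if u < p_hyb:
        return "electrostatic_forcing"  # frequency p_hyb - p_hyb^2 = p_hyb(1 - p_hyb)
    return "classical"                  # frequency 1 - p_hyb
```

With p_hyb = 0.5, roughly a quarter of the mutations use torsional driving, a quarter use electrostatic forcing, and half stay classical.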
Dynamic Tabus
• A geometry is within a tabu zone if, for all degrees of freedom i, the differences D_i to the declared tabu geometry are below the minimal significant torsional differences s_i, i.e. max(D_i/s_i) < 1
• However, making a binary “tabu or not tabu” decision for the current geometry does not suggest any way to escape the tabu zone.
• A smooth and differentiable tabu penalty, decreasing with increasing max(D_i/s_i), might permit escaping the tabu area by following the gradients
• A differentiable approximation DMAX(D_i/s_i) ≈ max(D_i/s_i) - 1 was defined
• If the energy e of the current geometry falls below the energy e_t of the declared tabu structure, then there is no more interest in leaving the tabu zone, which had been set overhastily!
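A minimal sketch of the tabu test and of one possible smooth surrogate. The slides do not give the functional form of DMAX; the log-sum-exp approximation of the maximum used here is a swapped-in standard technique, and all function names and the `sharpness` parameter are assumptions. The surrogate is smooth in the ratios D_i/s_i; the angular difference itself still has kinks at 0° and 180°.

```python
import math
from typing import List, Sequence


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees, in [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def scaled_differences(geometry: Sequence[float], tabu: Sequence[float],
                       s: Sequence[float]) -> List[float]:
    """Ratios D_i / s_i for every degree of freedom i."""
    return [angular_difference(g, t) / si for g, t, si in zip(geometry, tabu, s)]


def in_tabu_zone(geometry, tabu, s) -> bool:
    """Binary test from the slide: max(D_i / s_i) < 1."""
    return max(scaled_differences(geometry, tabu, s)) < 1.0


def smooth_max(values: Sequence[float], sharpness: float = 20.0) -> float:
    """Log-sum-exp approximation of max(values).

    Never below the true maximum, and above it by at most ln(n) / sharpness.
    """
    m = max(values)
    return m + math.log(sum(math.exp(sharpness * (v - m)) for v in values)) / sharpness


def dmax(geometry, tabu, s, sharpness: float = 20.0) -> float:
    """Assumed stand-in for DMAX: smooth max(D_i / s_i) - 1, negative inside the zone."""
    return smooth_max(scaled_differences(geometry, tabu, s), sharpness) - 1.0


def tabu_still_applies(energy: float, tabu_energy: float) -> bool:
    """The tabu is dropped once the current geometry is more stable than the tabu one."""
    return energy >= tabu_energy
```

A penalty built on `dmax` can be differentiated with respect to the scaled differences, so a gradient step can push the geometry out of the zone instead of merely rejecting it.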
Buffered Migrations
• Stalled evolution of an island triggers a population reset (apocalypse) in order to let the sampler move to other phase space zones.
• If a migrant, likely related to the former population, enters the island, it will be the fittest among the primitive post-reset individuals
• It will have a lot of children and drive the natives to extinction
• Strategy change: incoming migrants enter a buffer zone* and are released into the population as soon as its evolutionary dynamics seems to slow down
• After 20 successive generations without progress, an island “opens” to migrants (in the meantime, natives should have become comparable to migrants; if not, they deserve extinction)
*Hortefeux, B., Sarkozy, N., “L’Immigration Choisie”, pp. 1-29 in The Alien Menace, Le Pen, J.M. Ed., Vichy Press (2007)
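A minimal sketch of the buffering logic, assuming that "progress" means a new best energy on the island and that the stall counter restarts once the buffer is released. Individuals are reduced to their energies; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

OPEN_AFTER_STALLED = 20  # generations without progress before the island opens


@dataclass
class Island:
    population: List[float] = field(default_factory=list)  # energies stand in for individuals
    buffer: List[float] = field(default_factory=list)      # migrants waiting outside
    best_energy: float = float("inf")
    stalled: int = 0

    def receive_migrant(self, migrant: float) -> None:
        """Incoming migrants never enter the population directly."""
        self.buffer.append(migrant)

    def end_generation(self) -> bool:
        """Update the stall counter; return True if buffered migrants were released."""
        current_best = min(self.population, default=float("inf"))
        if current_best < self.best_energy:
            self.best_energy = current_best
            self.stalled = 0
        else:
            self.stalled += 1
        if self.stalled >= OPEN_AFTER_STALLED and self.buffer:
            self.population.extend(self.buffer)
            self.buffer.clear()
            self.stalled = 0  # assumption: counting restarts after the release
            return True
        return False
```

Keeping migrants out until the natives stop improving gives a freshly reset population time to explore before it has to compete with already optimized individuals.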
ProCheck used to discard misfolded protein conformers…
• Discard a structure if it:
  • Has more than one residue in a forbidden Ramachandran area
  • Has a goodness factor < -1.0
  • Has no minimal contiguous sequence of secondary structure elements (AAA or BBB.*BBB)
• Torsions of residues outside core regions are discarded from the list of preferential values used in seeding
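A minimal sketch of the three discard rules, assuming the ProCheck results have already been parsed into a count of residues in forbidden areas, a goodness factor, and a per-residue string of letters in which the pattern from the slide can be searched. The function name and input representation are hypothetical; no ProCheck output format is implied.

```python
import re

# Pattern taken from the slide: three consecutive A's, or two BBB runs in one chain.
SS_PATTERN = re.compile(r"AAA|BBB.*BBB")


def keep_conformer(n_forbidden_residues: int, goodness_factor: float,
                   ss_string: str) -> bool:
    """Return False if any of the three discard rules applies."""
    if n_forbidden_residues > 1:
        return False  # more than one residue in a forbidden Ramachandran area
    if goodness_factor < -1.0:
        return False  # goodness factor too low
    if SS_PATTERN.search(ss_string) is None:
        return False  # no minimal run of secondary structure elements
    return True
```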
Divide-and-Conquer Planetary Strategy
• Allocates a number of nodes to be used for global (NG) and local (NL) sampling.
• Global searches return a set of diverse low-energy conformers, representing potentially interesting cells.
• Once such cells have been found and stored in the Open Cell Repository (OCR), they are eligible for local sampling.
• From the fifth local search onward, a cell is closed (added to the Closed Cell Repository, CCR) if the current run failed to discover a more stable geometry.
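A minimal sketch of the cell-closing rule, assuming the repositories are plain dictionaries keyed by a cell identifier and that the rule applies to the fifth and every later local search. The `Cell` class, the function name and the returned labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

MIN_LOCAL_SEARCHES = 5  # a cell cannot be closed before its fifth local search


@dataclass
class Cell:
    best_energy: float
    local_searches: int = 0


def integrate_local_result(cell_id: str, new_energy: float,
                           ocr: Dict[str, Cell], ccr: Dict[str, Cell]) -> str:
    """Record one finished local search and move the cell to the CCR if it stalled."""
    cell = ocr[cell_id]
    cell.local_searches += 1
    improved = new_energy < cell.best_energy
    if improved:
        cell.best_energy = new_energy
    if cell.local_searches >= MIN_LOCAL_SEARCHES and not improved:
        ccr[cell_id] = ocr.pop(cell_id)  # closed cells are no longer sampled locally
        return "closed"
    return "open"
```

A cell that keeps yielding more stable geometries stays open indefinitely; only a run without improvement, at or after the fifth search, closes it.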
Dispatcher: Detect Running Jobs, Detect Result Type
Launching Mode
• Submit Global Search
  • Set WALLTIME
  • Select SEED and TABU from CCR (if enough entries)
  • Select operational parameters
• Submit Local Search
  • Set WALLTIME/l
  • Pick a cell from OCR
  • Use ICR as SEED
Result Integration Mode
• Global Search
  • Assign found geometries to cells
  • Merge entries into OCR (keep the stablest geometry per cell)
  • Update the table of sampling success vs. operational parameters
• Open Cell
  • Add to OCR
  • Add geometries to ICR
• Closed Cell
  • Add to CCR
BestEffort Deployment
• The scheduler now runs on a regularly reserved node, no longer on the frontend machine.
• It checks the list of currently deployed besteffort nodes and decides which jobs are to be assigned to each of them.
• The panspermia strategy (selecting seeds and tabus) and the selection of the next cell to be submitted to local sampling (based on energy and diversity criteria) may now be performed without the risk of overloading the frontend machine
• BestEffort nodes run waiting loops, expecting a job assignment file (global or local search, tabus, seeds, cell to explore, etc.).
• The frontend runs a meta-scheduler that checks the state of the nodes every 2 minutes and tries to restart terminated tasks.
Conclusions & Perspectives
• The divide-and-conquer planetary strategy apparently works better than any strategy used before
  • 1L2Y folded in <24 h; several days were needed before
• However, there are no resources to conduct any decent benchmarking concerning the choice of Kmax, NG/NL, etc.
• It is practically out of the question to use GRID 5000 for docking experiments on various systems!!
• BestEffort deployment sucks!
  • Having jobs killed is not the worst thing that may happen
  • Having the one regular reservation (for the node running the scheduler) postponed lets all the other nodes do… nothing in BestEffort mode: they run an empty loop waiting for jobs no one submits!
  • The scheduler cannot run in besteffort mode: getting it killed while it accesses the result databases may corrupt everything
• We need about 100 dedicated nodes in order to make real progress.
Ab initio folding of Trp cage 1L2Y: native structure (reproducibly) found and ranked as most stable. The planetary model used max. 20 nodes for 4…5 days. [Figure label: PDB]
Ab initio folding of the Villin headpiece 1VII: helical parts are seen to fold in a matter of days (40 nodes), although they are not properly oriented. [Figure label: PDB]
Good news for the β-hairpin of Chignolin: out of the top 10 best ranked conformers, 8 are native-like • Number one is not, but in this case that may not be a problem. [Figure labels: #1, #5, PDB]
The Trp Zipper 1LE1 β-sheet is not the absolute energy minimum according to the current setup! [Figure label: PDB] • However, proper folding of 1LE1 could be achieved (though not reproducibly!) with previous force field versions. Is the current setup too helix-specific?
Docking simulations in presence of flexible loops, such as the hinge region of Casein Kinase 2 (3BQC) • The pose of the ligand emodin and the loop geometry are correctly predicted (3BQC is not in the FF training set). [Figure labels: PDB, #1]
Conclusion, Status & Needs…
• We have working sampling & docking software, which must now be
  • Fine-tuned
  • Helped to reduce the scope of the search, by exploiting experimental (preferred rotamers, etc.) or empirical knowledge (required key interactions, fingerprints, etc.)
  • Also exploited in other 3D chemoinformatics approaches, with higher throughput than docking
• Need our own CLUSTER (~100 nodes or more)
  • Invest in existing platforms for privileged access
  • Buy our own (with system manager)
• One postdoc