A complete 16-stage workflow for in silico docking of ZINC database compounds against key Plasmodium falciparum enzymes. The slides cover the workflow, the software tools, the metadata management, and the data processing used to estimate inhibitory potential more accurately.
Docking and molecular dynamics – complete 16 stage workflow with gLite WMS
Astrid Maaß, Jisamma Kallumadikal
Fraunhofer Institute SCAI, St. Augustin, Germany
Application Background
Wide In Silico Docking on Malaria
[Figure: Plasmodium falciparum in human red blood cells; plasmepsin (plm)]
ZINC compound db (500,000) → docking → hits (5,000) → further analysis
Targets
• P.f. PLM
• P.f. GST
• P.f. DHFR
• P.f. Tub
• P.v. DHFR
{astrid.maass, jisamma.kallumadikal}@scai.fraunhofer.de
Motivation
• Docking information
  • binding mode for drug design
  • estimate of inhibitory potential
• Pro:
  • reduce number of candidate molecules rapidly!
• Contra:
  • results may be inaccurate in detail
• For promising candidates:
  • combine docking with a more realistic scoring function: All-in-one workflow = FlexX + Amber
  • gain accuracy, maintain all information!
Reference Data
Quality of predictions:
• correlation R of calculated with experimental affinities (R ≤ 1)
• average rmsd (quality of placements), rmsd ≤ 1.5 Å
[Figure: calculated vs. experimental affinities]
Preliminary Results
• Current status:
  • Test run completed successfully for avidin with default settings: R: 0.51 → 0.71, rmsd: 2.91 Å → 2.51 Å
  • reranking in progress
[Figure: ZINC01073995]
Workflow
The workflow consists of 16 stages,
• where each stage depends on the previous stage
• which execute the following software:
  • FlexX 2.0.0
    • Commercial product
    • Licence manager FLEXlm is used
    • Limited access control
  • Amber 8 (Sander & Tleap + Elan)
  • Convert (in-house development)
  • Cluster (in-house development)
[Figure: stage sequence … FlexX, Tleap, Cluster, Convert, Sander]
Architecture
• Components
  • Input XML
  • Software Applications
  • Intermediate Job
  • Meta-data management
  • gLite UI
  • gLite WMS
  • gLite CE
  • Storage Element
  • EGEE Infrastructure
Middleware
• gLite 3.0.2
• gLite WMS provides the 'DAG' job feature
  • Procedures become nodes of the DAG
    • FlexX, Tleap, Sander, Convert, Cluster and Intermediate
  • Dependency feature
    • The output of the previous job becomes the input of the next job
    • Each procedure is responsible for uploading its outputs to, and downloading its inputs from, an appropriate storage element
• Distributed Data Management
  • Data can be accessed irrespective of the location of the sites where the jobs are running
[Figure: Storage Element, Computing Element]
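The dependency idea above can be illustrated with a minimal Python sketch. It is not gLite JDL and does not call any gLite API; it only models a linear chain of procedures as a DAG and derives a valid execution order. The helper name `build_chain` and the shortened five-stage chain are assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_chain(stages):
    """Map each stage to the set of stages it depends on (its predecessor).

    Hypothetical helper: models 'output of the previous job is the input
    of the next job' as a simple linear dependency chain.
    """
    deps = {stages[0]: set()}
    for prev, nxt in zip(stages, stages[1:]):
        deps[nxt] = {prev}
    return deps


# Shortened, illustrative chain; the real workflow has 16 stages.
stages = ["FlexX", "Tleap", "Sander", "Convert", "Cluster"]
deps = build_chain(stages)

# A node may only run once all of its predecessors have finished.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

In a real DAG job each stage would additionally fan out into several subjobs per stage; the sketch keeps one node per stage to show only the dependency mechanism.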
Workflow
• The 'BIO' Workflow
  • The number of conformations FlexX writes is not known in advance
  • Cluster reduces the number of conformations
  • The output of each stage is unpredictable
  • Dynamic distribution of conformations required
  • Organisation of the huge quantity of produced data
Processing
• HOW IT WORKS (Grid)
Workflow
• The 'All in One' Workflow
  • The intermediate job provides dynamic division of the jobs
    • Higher performance
    • Organisation of data
  • The intermediate job
    • goes through each ligand
    • keeps a count of the generated conformations
    • divides this count by the number of nodes
    • packs the input
    • registers the input on the Grid (LFC)
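The division step of the intermediate job can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `split_conformations`, the input dictionary, and the ligand names are hypothetical, and packing and LFC registration are left out because they depend on the grid environment.

```python
def split_conformations(counts, n_nodes):
    """Distribute conformations evenly over n_nodes.

    counts: dict mapping ligand name -> number of generated conformations
            (hypothetical input format).
    Returns a list of n_nodes chunks, each a list of (ligand, index) pairs,
    whose sizes differ by at most one.
    """
    # Go through each ligand and enumerate its conformations.
    items = [(lig, i) for lig, n in counts.items() for i in range(n)]

    # Divide the total count by the number of nodes.
    base, extra = divmod(len(items), n_nodes)

    chunks, start = [], 0
    for k in range(n_nodes):
        size = base + (1 if k < extra else 0)  # first 'extra' nodes get one more
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Illustrative numbers only: two ligands with 3 and 4 conformations, 3 nodes.
chunks = split_conformations({"lig1": 3, "lig2": 4}, 3)
print([len(c) for c in chunks])  # [3, 2, 2]
```

Splitting by conformation rather than by ligand is what keeps the nodes equally loaded when the number of conformations per ligand varies.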
Workflow
• The 'All in One' Workflow
  • Subjobs: 9 (reserved)
  • Input: 1 protein, 10 ligands
  • FlexX write: 25 (varies)
  • The dynamic 'Bio' workflow is combined with the gLite 'DAG' feature
  • Nodes are equally loaded
[Figure: conformations per node: 24, 25, 24, 24, 25, 24, 24]
User Input
• XML (user input)
  • Experiment name
  • VO name
  • Middleware
  • SE, LFC
  • Proteins, ligands, sites & receptor
• Job information
  • Job type
  • Subjobs required (approximate)
  • Input script (batch files)
Monitoring / Meta-data Management
• Meta-data management
  • Invokes the web service (request) through the handle http://scaimeta.scai.fraunhofer.de/ogsa/services/ogsadai/flemm
  • Runs the query
    • Select, Insert or Update (Id, JobId, NodeName, JobType, InputLfn, OutputLfn, Destination, Hostname, StartTime, StopTime, CPUTime, SizeOfInput, SizeOfOutput, SizeOfApps, SizeOfSandboxStart, SizeOfSandboxStop)
  • Returns the response to the request (result set)
[Figure: request to and response from scaimeta.scai.fraunhofer.de]
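The insert, update and select pattern on the job record can be illustrated with a local stand-in. The sketch below uses Python's built-in sqlite3 with an in-memory database instead of the OGSA-DAI web service; the table name `job_metadata`, the column types, and all values are assumptions made for illustration, while the column names are the ones listed above.

```python
import sqlite3

# In-memory stand-in for the remote metadata store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE job_metadata (
           Id INTEGER PRIMARY KEY, JobId TEXT, NodeName TEXT, JobType TEXT,
           InputLfn TEXT, OutputLfn TEXT, Destination TEXT, Hostname TEXT,
           StartTime TEXT, StopTime TEXT, CPUTime REAL,
           SizeOfInput INTEGER, SizeOfOutput INTEGER, SizeOfApps INTEGER,
           SizeOfSandboxStart INTEGER, SizeOfSandboxStop INTEGER)"""
)

# Insert: a node records its start (all values are made up).
conn.execute(
    "INSERT INTO job_metadata (JobId, NodeName, JobType, InputLfn, StartTime) "
    "VALUES (?, ?, ?, ?, ?)",
    ("example-job", "flexx_0", "FlexX", "lfn:/example/input.tar", "00:00:00"),
)

# Update: the same node records its stop time and CPU time in seconds.
conn.execute(
    "UPDATE job_metadata SET StopTime = ?, CPUTime = ? "
    "WHERE JobId = ? AND NodeName = ?",
    ("00:06:00", 360.0, "example-job", "flexx_0"),
)

# Select: monitoring reads back the result set.
row = conn.execute(
    "SELECT JobType, StopTime, CPUTime FROM job_metadata WHERE JobId = ?",
    ("example-job",),
).fetchone()
print(row)
```

Parameterised queries are used throughout so that job names and LFNs never need to be escaped by hand.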
Estimation
• Stages
  • Considering 10 ligands (20 conformations) as input,
  • the number of subjobs (nodes) for each stage is set to 5
  • The DAG consists of 80 nodes
    • 16 stages * 5 = 80
  • Each node gets 2 ligands
    • 10 ligands / 5 nodes = 2
• The approximate time required for a workflow to complete is 15 hours
  • without getting stuck in long queues or at failed sites
  • without any application / software failures
  • without any grid problems
• Achieved on DECH VO by extending the proxy limit to 72 hours
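The arithmetic behind these figures can be written out as a short Python sketch. The function names are hypothetical and exist only to make the two calculations explicit.

```python
def dag_size(n_stages, subjobs_per_stage):
    """Total number of DAG nodes when every stage has the same number of subjobs."""
    return n_stages * subjobs_per_stage


def ligands_per_node(n_ligands, nodes_per_stage):
    """Ligands assigned to each node of a stage; any remainder goes to the first nodes."""
    base, extra = divmod(n_ligands, nodes_per_stage)
    return [base + (1 if k < extra else 0) for k in range(nodes_per_stage)]


print(dag_size(16, 5))          # 80
print(ligands_per_node(10, 5))  # [2, 2, 2, 2, 2]
```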
Estimation

Job Name        Jobs  Input/job  Appr. time/L  Appr. time/job
-------------------------------------------------------------
Flexx           5     2L         3 min         6 min
Tleap           5     2L         15 min        30 min
Sander(Min1)    5     2L         21 min        42 min
Convert         5     2L         42 sec        2 min
Cluster         5     2L         6 sec         12 sec
Sander(Min2)    5     2L         3.5 min       7 min
Sander(Md1)     5     2L         1.5 hrs       3 hrs
Sander(Md2)     5     2L         3.8 hrs       8 hrs
Convert         5     2L         6 sec         12 sec
Tleap2          5     2L         9 min         18 min
Sander(Min3)    5     2L         6 sec         12 sec
Convert         5     2L         12 sec        24 sec
Tleap3          5     2L         8 min         16 min
Sander(Min4)    5     2L         7 min         14 min
Tleap4          5     2L         6 min         12 min
Sander(Min4)    5     2L         6 min         12 min
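As a cross-check, the per-job times in the table can be summed. The sketch below assumes that the stages run strictly one after another and that the 5 subjobs of a stage run in parallel, so each stage contributes its time per job once; queueing and data transfer are not included.

```python
MIN, HOUR = 60, 3600  # seconds

# Approximate time per job for the 16 stages, in table order (seconds).
stage_seconds = [
    6 * MIN,    # Flexx
    30 * MIN,   # Tleap
    42 * MIN,   # Sander(Min1)
    2 * MIN,    # Convert
    12,         # Cluster
    7 * MIN,    # Sander(Min2)
    3 * HOUR,   # Sander(Md1)
    8 * HOUR,   # Sander(Md2)
    12,         # Convert
    18 * MIN,   # Tleap2
    12,         # Sander(Min3)
    24,         # Convert
    16 * MIN,   # Tleap3
    14 * MIN,   # Sander(Min4)
    12 * MIN,   # Tleap4
    12 * MIN,   # Sander(Min4), last row as listed in the table
]

total = sum(stage_seconds)
hours, rest = divmod(total, HOUR)
minutes = rest // MIN
print(f"{hours} h {minutes} min")  # 13 h 40 min
```

Under these assumptions the pure compute time is about 13 hours 40 minutes, which is consistent with the stated estimate of roughly 15 hours once scheduling overhead is added.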
Limitation
• Time constraint (Biomed)
  • Limited proxy (24 hrs)
  • To run the workflow successfully, we split it into 3 parts and submitted them one after the other so that each finished within the available time frame
• Successfully run on DECH VO
  • Proxy was extended to 72 hours
• gLite updates create problems that require a restart of the WMS, which aborts the running jobs
  • Updates are often not tested for complex jobs
• Long black list
  • Jobs are forwarded to failed sites that report a queue length of 4444; these jobs stay in the scheduled state for a long time
• Jobs are aborted due to the following errors
  • Job got an error while in the CondorG queue.
  • File not available. Cannot read JobWrapper output, both from Condor and from Maradona.
  • Got a job held event, reason: Repeated submit attempts (GAHP reports:)
  • 93 the gatekeeper failed to find the requested service
  • Got a job held event, reason: Unspecified gridmanager error
  • unable to register job – system load too high
Challenges
• Developing a GUI
  • for the ease of the user
• Integrating gLite API
• Automatic restart of aborted jobs
• Control of FlexX licences
Thank You
Thanks to...
• the EGEE team at SCAI:
  • Jiri Kraus
  • Andre Gemünd
  • Christoph Lenzen
  • Horst Schwichtenberg
  • Klaere Cassirer
• BioSolveIT for providing FlexX licences