Virtual Laboratory: Enabling Distributed Molecular Modelling for Drug Discovery on the Grid

Rajkumar Buyya, Grid Computing and Distributed Systems (GRIDS) Lab, The University of Melbourne, Melbourne, Australia. www.gridbus.org/vlab/

Presentation Transcript


  1. WW Grid Virtual Laboratory: Enabling Distributed Molecular Modelling for Drug Discovery on the Grid • Rajkumar Buyya • Grid Computing and Distributed Systems (GRIDS) Lab., The University of Melbourne, Melbourne, Australia • www.gridbus.org/vlab/

  2. Agenda • Introduction • Molecular Docking Application Needs • Virtual Lab Architecture • Grid Enabling CDB (chemical databases) • Application Composition • Scheduling Experiments • Conclusions

  3. Drug Design: Data-Intensive Computing on the Grid • It involves screening millions of chemical compounds (molecules) in the Chemical DataBase (CDB) to identify those with the potential to serve as drug candidates. [Figure: a protein and molecules drawn from Chemical Databases (legacy, in .MOL2 format)] [Collaboration with WEHI for Medical Science, Melbourne]

  4. Using Basic Job Submission Commands • Do it all yourself (manually)! • Total cost: $???

  5. Build Distributed Application & Scheduler • Build the application on a case-by-case basis • Complicated construction, e.g., MPI-based • Total cost: $???

  6. Rapid Parameterisation and Deployment Using the Gridbus and Nimrod-G Tools Compose, Submit, & Play!

  7. Docking Application Requirements [Figure: Chemical Databases (legacy, in .MOL2 format)] • It is compute intensive: • Each docking job can take from a few minutes to hours, depending on the structural complexity. • It is data intensive: • The databases are huge (MBs to GBs) and each contains thousands of molecules. Screening all molecules in all databases is a real data challenge! • CDBs are distributed. • It is a killer application for the Grid.

  8. DataGrid Brokering [Figure: the Nimrod/G computational Grid broker (broker algorithms Algorithm1 . . . AlgorithmN) works with a CDB broker, a Data Replica Catalogue, the Grid Information Service, CDB services, and Grid Service Providers GSP1 . . . GSPn. Messages shown: "Screen 2K molecules in 30 min. for $10"; "advise CDB source?"; "CDB replicas please?"; "Is GSP4 healthy?"; "selection & advise: use GSP4!"; "Screen mol.5 please?"; "mol.5 please?"; "process & send results".]
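The replica-selection step in the brokering figure (asking for CDB replicas, checking whether a GSP is healthy, then advising which one to use) can be sketched as a filter-then-rank function. This is a minimal sketch, not the Data Grid Broker's actual algorithm: the `Replica` fields (`healthy`, `latency_ms`, `price`), the latency-first ranking, and the host names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    host: str          # hypothetical CDB service host name
    healthy: bool      # answer to "Is <host> healthy?" from an information service
    latency_ms: float  # assumed measured network latency to this replica
    price: float       # assumed access price per request, in G$


def select_replica(replicas: List[Replica], max_price: float) -> Optional[Replica]:
    """Return the lowest-latency healthy replica within the price limit, or None."""
    candidates = [r for r in replicas if r.healthy and r.price <= max_price]
    if not candidates:
        return None
    # Break latency ties by preferring the cheaper replica.
    return min(candidates, key=lambda r: (r.latency_ms, r.price))


if __name__ == "__main__":
    replicas = [
        Replica("gsp1.example.org", healthy=False, latency_ms=5, price=1),
        Replica("gsp2.example.org", healthy=True, latency_ms=80, price=1),
        Replica("gsp3.example.org", healthy=True, latency_ms=10, price=9),
        Replica("gsp4.example.org", healthy=True, latency_ms=20, price=2),
    ]
    print(select_replica(replicas, max_price=5).host)  # gsp4.example.org
```

Here gsp1 is rejected as unhealthy and gsp3 as too expensive, so the nearest remaining replica wins.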

  9. Software Tools • Molecular Modelling Application (DOCK) • Parameter Modelling Tools (Nimrod/enFusion) • Grid Resource Broker (Nimrod-G) • Data Grid Broker • Chemical DataBase (CDB) Management and Intelligent Access Tools • PDB database Lookup/Index Table Generation. • PDB and associated index-table Replication. • PDB Replica Catalogue (that helps in Resource Discovery). • PDB Servers (that serve PDB clients' requests). • PDB Brokering (Replica Selection). • PDB Clients for fetching Molecule Records (Data Movement). • Grid Middleware (Globus and GrACE) • Grid Fabric Management (Fork/LSF/Condor/Codine/…)
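The index-table generation and record-fetch tools listed above rest on one idea: scan a database file once, remember where each molecule starts, and later seek straight to molecule n instead of rereading the file. A minimal Python sketch, assuming the records are in the MOL2 format mentioned earlier, where each molecule begins with a `@<TRIPOS>MOLECULE` line; `build_index` and `fetch_record` are hypothetical names, not the Virtual Lab's actual tools.

```python
import io
from typing import BinaryIO, List

MOLECULE_TAG = b"@<TRIPOS>MOLECULE"  # MOL2 record type that starts each molecule


def build_index(f: BinaryIO) -> List[int]:
    """Scan a MOL2 file once and return the byte offset of every molecule record."""
    f.seek(0)
    offsets, pos = [], 0
    for line in iter(f.readline, b""):
        if line.startswith(MOLECULE_TAG):
            offsets.append(pos)
        pos += len(line)
    return offsets


def fetch_record(f: BinaryIO, offsets: List[int], n: int) -> str:
    """Return molecule record n (1-based) by seeking straight to its offset."""
    if not 1 <= n <= len(offsets):
        raise IndexError(f"molecule {n} not in index")
    start = offsets[n - 1]
    f.seek(start)
    if n < len(offsets):
        data = f.read(offsets[n] - start)
    else:
        data = f.read()  # the last record runs to the end of the file
    return data.decode()


# Tiny hypothetical two-molecule file (real records carry many more sections).
SAMPLE = (
    b"@<TRIPOS>MOLECULE\nmol_1\n@<TRIPOS>ATOM\n"
    b"@<TRIPOS>MOLECULE\nmol_2\n@<TRIPOS>ATOM\n"
)

if __name__ == "__main__":
    cdb = io.BytesIO(SAMPLE)
    index = build_index(cdb)
    print(index)                                        # [0, 38]
    print(fetch_record(cdb, index, 2).splitlines()[1])  # mol_2
```

Because the index is just a list of offsets, it is small enough to replicate alongside each database copy.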

  10. The Virtual Lab: Software Stack • APPLICATIONS: Molecular Modelling for Drug Design (CDB, PDB) • PROGRAMMING TOOLS: Nimrod and Virtual Lab Tools [parametric programming language, GUI tools, and CDB indexer] • USER LEVEL MIDDLEWARE: Nimrod-G and CDB Data Broker [task farming engine, scheduler, dispatcher, agents, CDB (chemical database) server] • CORE MIDDLEWARE: Globus [security, information, job submission] • FABRIC: Worldwide Grid [distributed computers and databases with different architectures, OSes, and local resource management systems]

  11. V-Lab Components Interaction [Figure: components spread across a User Node, a Grid Node, and a Compute Node. Components: Grid Tools and Applications (user process); Nimrod-G Grid Broker (Task Farming Engine, Grid Scheduler, Grid Dispatcher, Grid Trade Server); Grid Info Server; Process Server and Local Resource Manager; Nimrod Agent; Docking Process; CDB Client; CDB Service on Grid (CDB Server, File Server, Index and CDB1 . . . CDBm). Interactions shown: "Do this in 30 min. for $10?"; "Get molecule 'n' record from 'abc' CDB"; "Molecule 'n' location?"; "Get mol. record"; file access; molecule 'n' returned.]
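The CDB client/server exchange in this figure ("Get molecule 'n' record from 'abc' CDB") can be illustrated with a tiny TCP service. This is a sketch under loudly stated assumptions: the one-line `GET <database> <n>` protocol, the in-memory `CDB` dictionary, and the function names are all hypothetical, and the real CDB server (which the plan file reaches via `CDB_SERVER` and `CDB_PORT_NO`) reads records from indexed database files rather than memory.

```python
import socket
import socketserver
import threading

# Hypothetical in-memory CDB: database name -> molecule records (requested 1-based).
CDB = {"abc": ["record for molecule 1", "record for molecule 2"]}


class CDBHandler(socketserver.StreamRequestHandler):
    """Serve one request line of the assumed form 'GET <database> <n>'."""

    def handle(self):
        try:
            verb, db, n = self.rfile.readline().decode().split()
            index = int(n)
            if verb != "GET" or index < 1:
                raise ValueError(verb)
            reply = CDB[db][index - 1]
        except (ValueError, KeyError, IndexError):
            reply = "ERROR"
        self.wfile.write(reply.encode())  # the server closes the connection afterwards


def start_server(host="127.0.0.1"):
    """Start the CDB server on a free port in a background thread."""
    server = socketserver.ThreadingTCPServer((host, 0), CDBHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def get_molecule(host, port, db, n):
    """CDB client: request record n of database db and read until the server closes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(f"GET {db} {n}\n".encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()


if __name__ == "__main__":
    server, port = start_server()
    print(get_molecule("127.0.0.1", port, "abc", 2))  # record for molecule 2
    server.shutdown()
    server.server_close()
```

Each docking job then only moves the one molecule it needs, rather than a whole database, onto the compute node.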

  12. DOCK code* (enhanced by WEHI, U of Melbourne) • A program to evaluate the chemical and geometric complementarity between a small molecule and a macromolecular binding site. • It explores ways in which two molecules, such as a drug and an enzyme or protein receptor, might fit together. • Compounds that dock to each other well, like pieces of a three-dimensional jigsaw puzzle, have the potential to bind. • So, why is it important to be able to identify small molecules that may bind to a target macromolecule? • A compound that binds to a biological macromolecule may inhibit its function, and thus act as a drug. • E.g., disabling the ability of a virus (such as HIV) to attach itself to a molecule/protein! • With system-specific code changes, we have been able to compile it for Sun Solaris, PC Linux, SGI IRIX, and Compaq Alpha/OSF1. * Original code: University of California, San Francisco: http://www.cmpharm.ucsf.edu/kuntz/

  13. Dock input file (the molecule to be screened is given by ligand_atom_file)
      score_ligand yes
      minimize_ligand yes
      multiple_ligands no
      random_seed 7
      anchor_search no
      torsion_drive yes
      clash_overlap 0.5
      conformation_cutoff_factor 3
      torsion_minimize yes
      match_receptor_sites no
      random_search yes
      . . .
      maximum_cycles 1
      ligand_atom_file S_1.mol2
      receptor_site_file ece.sph
      score_grid_prefix ece
      vdw_definition_file parameter/vdw.defn
      chemical_definition_file parameter/chem.defn
      chemical_score_file parameter/chem_score.tbl
      flex_definition_file parameter/flex.defn
      flex_drive_file parameter/flex_drive.tbl
      ligand_contact_file dock_cnt.mol2
      ligand_chemical_file dock_chm.mol2
      ligand_energy_file dock_nrg.mol2
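Each line of the Dock input file above is a keyword followed by a single value, which is what makes the file straightforward to parameterise. A small Python sketch that reads such lines into a dictionary; it assumes one keyword and one value per line and keeps values as strings, and `parse_dock_input` is a hypothetical helper, not part of DOCK.

```python
from typing import Dict


def parse_dock_input(text: str) -> Dict[str, str]:
    """Parse 'keyword value' lines into a dict; values are kept as strings."""
    params = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if not parts:
            continue  # skip blank lines
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


# A few lines taken from the input file on this slide.
EXAMPLE = """\
score_ligand yes
random_seed 7
clash_overlap 0.5
ligand_atom_file S_1.mol2
receptor_site_file ece.sph
"""

if __name__ == "__main__":
    params = parse_dock_input(EXAMPLE)
    print(params["ligand_atom_file"])      # S_1.mol2
    print(float(params["clash_overlap"]))  # 0.5
```

Keeping values as strings leaves type conversion to whoever consumes a given keyword.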

  14. 1. Parameterize the Dock input file (use Nimrod tools: GUI/language); the molecule to be screened is now ${ligand_number}.mol2
      score_ligand $score_ligand
      minimize_ligand $minimize_ligand
      multiple_ligands $multiple_ligands
      random_seed $random_seed
      anchor_search $anchor_search
      torsion_drive $torsion_drive
      clash_overlap $clash_overlap
      conformation_cutoff_factor $conformation_cutoff_factor
      torsion_minimize $torsion_minimize
      match_receptor_sites $match_receptor_sites
      random_search $random_search
      . . .
      maximum_cycles $maximum_cycles
      ligand_atom_file ${ligand_number}.mol2
      receptor_site_file $HOME/dock_inputs/${receptor_site_file}
      score_grid_prefix $HOME/dock_inputs/${score_grid_prefix}
      vdw_definition_file vdw.defn
      chemical_definition_file chem.defn
      chemical_score_file chem_score.tbl
      flex_definition_file flex.defn
      flex_drive_file flex_drive.tbl
      ligand_contact_file dock_cnt.mol2
      ligand_chemical_file dock_chm.mol2
      ligand_energy_file dock_nrg.mol2
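Turning the parameterised file into a concrete input for one job amounts to replacing each `$name` or `${name}` with that job's value, which is what the plan's `node:substitute dock_base dock_run` step is for. Python's `string.Template` happens to use the same placeholder syntax, so it serves as a stand-in here; this approximates Nimrod's substitution and is not its implementation. Leaving `$HOME` untouched for the node to expand is an assumption of this sketch, and `make_dock_run` is a hypothetical helper.

```python
from string import Template

# Fragment of the parameterised Dock input file (dock_base) from this slide.
DOCK_BASE = """\
score_ligand $score_ligand
random_seed $random_seed
ligand_atom_file ${ligand_number}.mol2
receptor_site_file $HOME/dock_inputs/${receptor_site_file}
"""


def make_dock_run(template_text: str, job_params: dict) -> str:
    """Fill $name / ${name} placeholders with one job's parameter values.

    safe_substitute leaves placeholders it has no value for (here $HOME)
    unchanged instead of raising KeyError.
    """
    return Template(template_text).safe_substitute(job_params)


if __name__ == "__main__":
    job = {"score_ligand": "yes", "random_seed": 7,
           "ligand_number": 42, "receptor_site_file": "ece.sph"}
    print(make_dock_run(DOCK_BASE, job))
```

For ligand number 42 this produces `ligand_atom_file 42.mol2` and `receptor_site_file $HOME/dock_inputs/ece.sph`.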

  15. 2. Create Docking Plan: Define Variables and Their Values (ligand_number defines the molecules to be screened)
      parameter database_name label "database_name" text select oneof "aldrich" "maybridge" "maybridge_300" "asinex_egc" "asinex_epc" "asinex_pre" "available_chemicals_directory" "inter_bioscreen_s" "inter_bioscreen_n" "inter_bioscreen_n_300" "inter_bioscreen_n_500" "biomolecular_research_institute" "molecular_science" "molecular_diversity_preservation" "national_cancer_institute" "IGF_HITS" "aldrich_300" "molecular_science_500" "APP" "ECE" default "aldrich_300";
      parameter CDB_SERVER text default "bezek.dstc.monash.edu.au";
      parameter CDB_PORT_NO text default "5001";
      parameter score_ligand text default "yes";
      parameter minimize_ligand text default "yes";
      parameter multiple_ligands text default "no";
      parameter random_seed integer default 7;
      parameter anchor_search text default "no";
      parameter torsion_drive text default "yes";
      parameter clash_overlap float default 0.5;
      parameter conformation_cutoff_factor integer default 5;
      parameter torsion_minimize text default "yes";
      parameter match_receptor_sites text default "no";
      . . .
      parameter maximum_cycles integer default 1;
      parameter receptor_site_file text default "ece.sph";
      parameter score_grid_prefix text default "ece";
      parameter ligand_number integer range from 1 to 2000 step 1;
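With these declarations, every parameter except `ligand_number` takes a single value, so the plan yields one job per molecule: 2,000 jobs in all. The sketch below assumes that jobs are formed from the cross product of parameter values and uses only a subset of the parameters above; `inclusive_range` and `expand_jobs` are hypothetical helpers, not Nimrod functions.

```python
from itertools import product
from typing import Dict, List


def inclusive_range(start: int, stop: int, step: int) -> List[int]:
    """Mimic 'range from <start> to <stop> step <step>', which includes <stop>."""
    return list(range(start, stop + 1, step))


# A subset of the plan's parameters; single-valued ones carry their defaults.
PARAMETERS: Dict[str, list] = {
    "database_name": ["aldrich_300"],  # 'select oneof': a single chosen value
    "random_seed": [7],
    "clash_overlap": [0.5],
    "ligand_number": inclusive_range(1, 2000, 1),  # molecules to be screened
}


def expand_jobs(parameters: Dict[str, list]) -> List[dict]:
    """One job per combination of parameter values (cross product)."""
    names = list(parameters)
    return [dict(zip(names, combo)) for combo in product(*parameters.values())]


if __name__ == "__main__":
    jobs = expand_jobs(PARAMETERS)
    print(len(jobs))  # 2000
    print(jobs[0])
```

If a second parameter were given, say, two values, the same expansion would double the job count, which is how a plan grows a sweep without any new code.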

  16. 3. Create Docking Plan File: Define the Tasks That Jobs Need to Do
      task nodestart
          copy ./parameter/vdw.defn node:.
          copy ./parameter/chem.defn node:.
          copy ./parameter/chem_score.tbl node:.
          copy ./parameter/flex.defn node:.
          copy ./parameter/flex_drive.tbl node:.
          copy ./dock_inputs/get_molecule node:.
          copy ./dock_inputs/dock_base node:.
      endtask
      task main
          node:substitute dock_base dock_run
          node:substitute get_molecule get_molecule_fetch
          node:execute sh ./get_molecule_fetch
          node:execute $HOME/bin/dock.$OS -i dock_run -o dock_out
          copy node:dock_out ./results/dock_out.$jobname
          copy node:dock_cnt.mol2 ./results/dock_cnt.mol2.$jobname
          copy node:dock_chm.mol2 ./results/dock_chm.mol2.$jobname
          copy node:dock_nrg.mol2 ./results/dock_nrg.mol2.$jobname
      endtask
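The plan separates work done once per node (`task nodestart`, which stages the shared parameter files and scripts) from work done once per job (`task main`, which builds the job's input, fetches the molecule, runs DOCK, and copies the results back under `$jobname`). The dry run below only lists the actions instead of performing them. It assumes `nodestart` runs the first time a node is used and that `$OS` names the node's platform so the matching DOCK binary is chosen; the node names, OS strings, and helper functions are hypothetical.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, ...]

# Shared inputs staged by 'task nodestart' (paths from the plan file).
NODESTART_FILES = [
    "./parameter/vdw.defn", "./parameter/chem.defn", "./parameter/chem_score.tbl",
    "./parameter/flex.defn", "./parameter/flex_drive.tbl",
    "./dock_inputs/get_molecule", "./dock_inputs/dock_base",
]
# Outputs copied back by 'task main', suffixed with the job name.
RESULT_FILES = ["dock_out", "dock_cnt.mol2", "dock_chm.mol2", "dock_nrg.mol2"]


def main_task(jobname: str, os_name: str) -> List[Action]:
    """Actions of 'task main' for one job, in plan order."""
    actions: List[Action] = [
        ("substitute", "dock_base", "dock_run"),
        ("substitute", "get_molecule", "get_molecule_fetch"),
        ("execute", "sh ./get_molecule_fetch"),
        ("execute", f"$HOME/bin/dock.{os_name} -i dock_run -o dock_out"),
    ]
    actions += [("copy", f"node:{f}", f"./results/{f}.{jobname}") for f in RESULT_FILES]
    return actions


def dry_run(assignment: Dict[str, Tuple[str, str]]) -> List[Action]:
    """assignment maps jobname -> (node, node OS); each action is prefixed by its node."""
    actions: List[Action] = []
    started = set()
    for jobname, (node, os_name) in assignment.items():
        if node not in started:  # nodestart: once per node
            actions += [(node, "copy", src, "node:.") for src in NODESTART_FILES]
            started.add(node)
        actions += [(node,) + a for a in main_task(jobname, os_name)]
    return actions


if __name__ == "__main__":
    plan = dry_run({"j1": ("node1", "linux"), "j2": ("node1", "linux"),
                    "j3": ("node2", "solaris")})
    for action in plan:
        print(action)
```

Two jobs on one node cost 7 staging copies plus 8 actions per job (23 in total), while spreading them over two nodes repeats the staging (30 in total).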

  17. Gridbus Visual Tool for Parametric Application Creation (e.g., Docking)

  18. Chemical DataBase (CDB) • Databases consist of small molecules from commercially available organic synthesis libraries and natural product databases. • There is also the ability to screen virtual combinatorial databases in their entirety. • This methodology allows only the required compounds to be subjected to physical screening and/or synthesis, reducing both time and expense.

  19. Target Testcase • The target for the test case: endothelin converting enzyme (ECE). It is involved in “heart stroke” and other transient ischemia. • Is·che·mi·a: a decrease in the blood supply to a bodily organ, tissue, or part caused by constriction or obstruction of the blood vessels.

  20. Docking Deployment on The World Wide Grid

  21. Scheduling Molecular Docking Application on Grid: Experiment • Workload: docking 200 molecules with ECE • 200 jobs, each needing on the order of 3 minutes, depending on molecule weight. • Deadline: 60 min.; budget: 50,000 G$ (tokens) • Strategy: minimise time / cost • Execution cost: • Optimise cost: 14,277 G$ (finished in 59.30 min.) • Optimise time: 17,702 G$ (finished in 34 min.) • In this experiment, time-optimised scheduling cost about 3.4K G$ more than cost-optimised scheduling (17,702 − 14,277 = 3,425 G$). • Users can now trade off time against cost.
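The two strategies can be contrasted with a static, simplified sketch of deadline-and-budget-constrained (DBC) scheduling: cost optimisation fills the cheapest resources first as far as the deadline allows, while time optimisation gives each job to whichever affordable resource would finish it earliest. The resources, per-job prices, and job times below are hypothetical, and the real Nimrod-G scheduler adapts its choices during execution rather than planning once up front; the sketch reproduces only the qualitative trade-off reported above, not the experiment's numbers.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Resource:
    name: str
    slots: int        # CPUs usable in parallel (assumed fixed)
    job_time: float   # minutes per docking job (assumed uniform)
    price: float      # G$ per job (assumed flat rate)


def finish_time(jobs: int, r: Resource) -> float:
    """Time to run `jobs` jobs on r in waves of r.slots."""
    return math.ceil(jobs / r.slots) * r.job_time


def cost_of(plan: Dict[str, int], resources: List[Resource]) -> float:
    prices = {r.name: r.price for r in resources}
    return sum(n * prices[name] for name, n in plan.items())


def makespan_of(plan: Dict[str, int], resources: List[Resource]) -> float:
    by_name = {r.name: r for r in resources}
    return max((finish_time(n, by_name[name]) for name, n in plan.items() if n), default=0.0)


def cost_optimise(n_jobs, resources, deadline, budget):
    """Fill the cheapest resources first, as far as the deadline allows."""
    plan, remaining = {}, n_jobs
    for r in sorted(resources, key=lambda r: r.price):
        take = min(remaining, r.slots * int(deadline // r.job_time))
        if take:
            plan[r.name] = take
            remaining -= take
    if remaining or cost_of(plan, resources) > budget:
        raise ValueError("cannot meet the deadline within the budget")
    return plan


def time_optimise(n_jobs, resources, deadline, budget):
    """Give each job to the affordable resource that would finish it earliest."""
    plan = {r.name: 0 for r in resources}
    spent = 0.0
    for i in range(n_jobs):
        per_job_budget = (budget - spent) / (n_jobs - i)  # keep enough for the rest
        affordable = [r for r in resources if r.price <= per_job_budget]
        if not affordable:
            raise ValueError("budget exhausted")
        best = min(affordable, key=lambda r: (finish_time(plan[r.name] + 1, r), r.price))
        plan[best.name] += 1
        spent += best.price
    if makespan_of(plan, resources) > deadline:
        raise ValueError("deadline cannot be met")
    return plan


RESOURCES = [  # hypothetical resources
    Resource("A", slots=10, job_time=3, price=50),
    Resource("B", slots=20, job_time=3, price=100),
    Resource("C", slots=10, job_time=2, price=150),
]

if __name__ == "__main__":
    for strategy in (cost_optimise, time_optimise):
        plan = strategy(200, RESOURCES, deadline=60, budget=50_000)
        print(strategy.__name__, plan, cost_of(plan, RESOURCES), makespan_of(plan, RESOURCES))
```

With these made-up resources, cost optimisation puts all 200 jobs on the cheapest resource (10,000 G$, finishing right at the 60-minute deadline), whereas time optimisation spreads them over all three and finishes in 15 minutes for 21,000 G$.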

  22. WW Grid (WWG) Setup • Australia: Melbourne+Monash U: VPAC, Physics • North America: ANL: SGI/Sun/SP2; NCSA: Cluster; Wisc: PC/cluster; NRC, Canada; many others • Europe: ZIB: T3E/Onyx; AEI: Onyx; CNR: Cluster; CUNI/CZ: Onyx; Poznan: SGI/SP2; Vrije U: Cluster; Cardiff: Sun E6500; Portsmouth: Linux PC; Manchester: O3K; Cambridge: SGI; many others • Asia: AIST, Japan: Solaris Cluster; Osaka University: Cluster; Doshisha: Linux cluster; Korea: Linux cluster • Also shown: Gridbus+Nimrod-G, GMonitor, MEG Visualisation, Solaris WS, Internet, Grid Market Directory

  23. Resources Selected & Price/CPU-sec.

  24. DBC Scheduling for Time Optimization – No. of Jobs in Exec.

  25. DBC Scheduling for Cost Optimization – No. of Jobs in Exec.

  26. Summary and Conclusion • Applications can be Grid-enabled and deployed on the Grid with minimal effort, but they need the right set of Grid tools. • Distributed docking demonstrates that Nimrod-G and Gridbus tools: • enable rapid Grid application software engineering; • provide powerful runtime machinery for optimal deployment of applications on the Grid. • Easy-to-use tools for composing applications to run on the Grid are essential for attracting the application community and getting it on board. • Integration with our Data Grid Broker to support dynamic selection of CDB nodes is in progress.

  27. Thanks http://www.gridbus.org/vlab

  28. DBC Time Opt. Scheduling

  29. DBC Scheduling for Time Optimization – No. of Jobs Finished

  30. DBC Scheduling for Time Optimization – Budget Spent

  31. DBC Cost Opt. Scheduling

  32. DBC Scheduling for Cost Optimization – No. of Jobs Finished

  33. DBC Scheduling for Cost Optimization – Budget Spent

  34. Parametric Processing • Multiple runs • Same program • Multiple data • Killer application for the Grid! [Figure: parameters fed into a "Magic Engine for Manufacturing Humans!"] Courtesy: Anand Natrajan, University of Virginia
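The "multiple runs, same program, multiple data" pattern is a task farm: one function applied independently to many parameter values. A minimal Python sketch using a local thread pool; `mock_dock` is a hypothetical stand-in that returns a made-up deterministic score rather than running DOCK, and a real farm would dispatch each run to a remote Grid node instead of a local thread.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Tuple


def mock_dock(ligand_number: int) -> Tuple[int, int]:
    """Hypothetical stand-in for one docking run: returns (molecule, fake score)."""
    return ligand_number, (ligand_number * 37) % 101


def farm(values: Iterable[int], workers: int = 4) -> Dict[int, int]:
    """Run the same program over many inputs concurrently; results keyed by input."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(mock_dock, values))


if __name__ == "__main__":
    scores = farm(range(1, 11))
    best = min(scores, key=scores.get)  # assumes a lower score is better
    print(best, scores[best])  # 3 10
```

Because the runs share nothing, adding workers (or Grid nodes) shortens the wall-clock time without changing the program itself.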
