WW Grid Virtual Laboratory: Enabling Distributed Molecular Modelling for Drug Discovery on the Grid. Rajkumar Buyya, Grid Computing and Distributed Systems (GRIDS) Lab, The University of Melbourne, Melbourne, Australia. www.gridbus.org/vlab/
Agenda • Introduction • Molecular Docking Application Needs • Virtual Lab Architecture • Grid Enabling CDB (chemical databases) • Application Composition • Scheduling Experiments • Conclusions
Drug Design: Data Intensive Computing on Grid • It involves screening millions of chemical compounds (molecules) in the Chemical DataBase (CDB) to identify those with the potential to serve as drug candidates. [Figure labels: Molecules; Protein; Chemical Databases (legacy, in .MOL2 format)] [Collaboration with WEHI for Medical Science, Melbourne]
Using Basic Job Submission Commands • Do it all yourself! (manually) • Total Cost: $???
Build Distributed Application & Scheduler • Build the application on a case-by-case basis • Complicated construction • E.g., MPI based • Total Cost: $???
Rapid Parameterisation and Deployment Using the Gridbus and Nimrod-G Tools Compose, Submit, & Play!
Docking Application Requirements • It is compute intensive: • Each docking job can take from a few minutes to hours, depending on the structural complexity. • It is data intensive: • The databases are huge (MBs to GBs) and each contains thousands of molecules. Screening all molecules in all databases is a real data challenge! • CDBs are distributed. • It is a killer application for the Grid. [Figure label: Chemical Databases (legacy, in .MOL2 format)]
DataGrid Brokering [Diagram: the user request “Screen 2K molecules in 30 min. for $10” goes to the Nimrod/G Computational Grid Broker (Algorithm1 . . . AlgorithmN), which works with the CDB Broker, the Data Replica Catalogue, the Grid Info. Service, and the CDB Services hosted on Grid Service Providers GSP1, GSP2, GSP3, GSP4, . . . GSPm, GSPn. Messages shown, numbered 1 to 7 in the figure: “CDB replicas please?”, “advise CDB source?”, “process & send results”, “selection & advise: use GSP4!”, “Screen mol.5 please?”, “Is GSP4 healthy?”, “mol.5 please?”]
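The replica-selection part of the exchange above (ask the catalogue for CDB replicas, check which provider is healthy, advise a source) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_cdb_source`, the dictionary-based catalogue, the health table, and the latency figures are all hypothetical, and choosing the lowest-latency healthy replica is an assumed policy, not the CDB Broker's documented algorithm.

```python
def select_cdb_source(database, replica_catalogue, is_healthy, latency_ms):
    """Pick one provider that holds a replica of `database`.

    replica_catalogue: dict mapping database name -> list of provider names
    is_healthy:        callable(provider) -> bool (stands in for the info service)
    latency_ms:        callable(provider) -> number (assumed selection metric)
    """
    candidates = [p for p in replica_catalogue.get(database, []) if is_healthy(p)]
    if not candidates:
        raise LookupError(f"no healthy replica of {database!r}")
    # Assumed policy: prefer the healthy replica with the lowest latency.
    return min(candidates, key=latency_ms)


# Hypothetical data loosely mirroring the figure: GSP2 and GSP4 hold replicas.
catalogue = {"aldrich_300": ["GSP2", "GSP4"]}
health = {"GSP2": False, "GSP4": True}
latency = {"GSP2": 20, "GSP4": 35}

chosen = select_cdb_source("aldrich_300", catalogue, health.get, latency.get)
print(chosen)
```

With this made-up data the unhealthy GSP2 is filtered out first, so GSP4 is advised even though its latency is higher.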
Software Tools • Molecular Modelling Application (DOCK) • Parameter Modelling Tools (Nimrod/enFusion) • Grid Resource Broker (Nimrod-G) • Data Grid Broker • Chemical DataBase (CDB) Management and Intelligent Access Tools • PDB database Lookup/Index Table Generation. • PDB and associated index-table Replication. • PDB Replica Catalogue (that helps in Resource Discovery). • PDB Servers (that serve PDB client requests). • PDB Brokering (Replica Selection). • PDB Clients for fetching Molecule Records (Data Movement). • Grid Middleware (Globus and GrACE) • Grid Fabric Management (Fork/LSF/Condor/Codine/…)
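As an illustration of the lookup/index table and molecule-record fetching listed above, the following Python sketch indexes a multi-molecule MOL2 file by byte offset, so that a single record can be read without scanning the whole database. It relies on the MOL2 convention that each record starts with the `@<TRIPOS>MOLECULE` tag; the function names and the in-memory sample are hypothetical, and the actual Virtual Lab indexer may work differently.

```python
import io

RECORD_MARKER = b"@<TRIPOS>MOLECULE"  # start-of-record tag in the MOL2 format


def build_index(stream):
    """Return the byte offset at which each molecule record starts."""
    offsets = []
    stream.seek(0)
    while True:
        position = stream.tell()
        line = stream.readline()
        if not line:  # end of file
            break
        if line.startswith(RECORD_MARKER):
            offsets.append(position)
    return offsets


def fetch_molecule(stream, offsets, n):
    """Return record number n (1-based) as bytes, using the index to seek."""
    if not 1 <= n <= len(offsets):
        raise IndexError(f"molecule {n} is not in the index")
    start = offsets[n - 1]
    stream.seek(start)
    if n < len(offsets):
        return stream.read(offsets[n] - start)  # stop at the next record
    return stream.read()  # last record runs to end of file


# Tiny stand-in for a chemical database file (real records are much longer).
sample = io.BytesIO(
    b"@<TRIPOS>MOLECULE\nmol_1\n"
    b"@<TRIPOS>MOLECULE\nmol_2\n"
    b"@<TRIPOS>MOLECULE\nmol_3\n"
)
index = build_index(sample)
record = fetch_molecule(sample, index, 2)
```

Because the index is just a list of offsets, it is small compared with the database and could be replicated alongside it, which is consistent with the "associated index-table Replication" item.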
The Virtual Lab. – Software Stack • APPLICATIONS: Molecular Modelling for Drug Design (figure labels: CDB, PDB) • PROGRAMMING TOOLS: Nimrod and Virtual Lab Tools [parametric programming language, GUI tools, and CDB indexer] • USER LEVEL MIDDLEWARE: Nimrod-G and CDB Data Broker [task farming engine, scheduler, dispatcher, agents, CDB (chemical database) server] • CORE MIDDLEWARE: Globus [security, information, job submission] • FABRIC: Worldwide Grid [distributed computers and databases with different architectures, operating systems, and local resource management systems]
V-Lab Components Interaction [Diagram spanning a User Node, a Grid Node, and a Compute Node. Components shown: User Process (Grid Tools and Applications) asking “Do this in 30 min. for $10?”; Nimrod-G Grid Broker with Task Farming Engine, Grid Scheduler, and Grid Dispatcher; Grid Info Server; Grid Trade Server; Process Server; Local Resource Manager; Nimrod Agent; File Server (file access); Docking Process; CDB Client; CDB Service on Grid with CDB Server, Index, and CDB1 . . . CDBm. Messages shown: “Get molecule “n” record from “abc” CDB”, “Molecule “n” Location?”, “Get mol. record”.]
DOCK code* (Enhanced by WEHI, U of Melbourne) • A program to evaluate the chemical and geometric complementarity between a small molecule and a macromolecular binding site. • It explores ways in which two molecules, such as a drug and an enzyme or protein receptor, might fit together. • Compounds which dock to each other well, like pieces of a three-dimensional jigsaw puzzle, have the potential to bind. • So, why is it important to be able to identify small molecules which may bind to a target macromolecule? • A compound which binds to a biological macromolecule may inhibit its function, and thus act as a drug. • E.g., disabling the ability of a virus (such as HIV) to attach itself to a molecule/protein! • With system-specific code changes, we have been able to compile it for Sun Solaris, PC Linux, SGI IRIX, and Compaq Alpha/OSF1. * Original Code: University of California, San Francisco: http://www.cmpharm.ucsf.edu/kuntz/
Dock input file [Figure label: Molecule to be screened]
score_ligand yes
minimize_ligand yes
multiple_ligands no
random_seed 7
anchor_search no
torsion_drive yes
clash_overlap 0.5
conformation_cutoff_factor 3
torsion_minimize yes
match_receptor_sites no
random_search yes
. . . . . . . . . . . .
maximum_cycles 1
ligand_atom_file S_1.mol2
receptor_site_file ece.sph
score_grid_prefix ece
vdw_definition_file parameter/vdw.defn
chemical_definition_file parameter/chem.defn
chemical_score_file parameter/chem_score.tbl
flex_definition_file parameter/flex.defn
flex_drive_file parameter/flex_drive.tbl
ligand_contact_file dock_cnt.mol2
ligand_chemical_file dock_chm.mol2
ligand_energy_file dock_nrg.mol2
1. Parameterize Dock input file (use Nimrod Tools: GUI/language) [Figure label: Molecule to be screened]
score_ligand $score_ligand
minimize_ligand $minimize_ligand
multiple_ligands $multiple_ligands
random_seed $random_seed
anchor_search $anchor_search
torsion_drive $torsion_drive
clash_overlap $clash_overlap
conformation_cutoff_factor $conformation_cutoff_factor
torsion_minimize $torsion_minimize
match_receptor_sites $match_receptor_sites
random_search $random_search
. . . . . . . . . . . .
maximum_cycles $maximum_cycles
ligand_atom_file ${ligand_number}.mol2
receptor_site_file $HOME/dock_inputs/${receptor_site_file}
score_grid_prefix $HOME/dock_inputs/${score_grid_prefix}
vdw_definition_file vdw.defn
chemical_definition_file chem.defn
chemical_score_file chem_score.tbl
flex_definition_file flex.defn
flex_drive_file flex_drive.tbl
ligand_contact_file dock_cnt.mol2
ligand_chemical_file dock_chm.mol2
ligand_energy_file dock_nrg.mol2
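To make the parameterisation step concrete, here is a minimal Python sketch that fills `$name` and `${name}` placeholders in a shortened copy of the input file above, using the standard library's `string.Template`. It only imitates the effect of the substitution; it is not Nimrod's implementation. The helper name `render_dock_input` and the value given for `HOME` are assumptions made for the example.

```python
from string import Template

# Shortened excerpt of the parameterised Dock input file.
DOCK_BASE = """\
score_ligand $score_ligand
random_seed $random_seed
clash_overlap $clash_overlap
ligand_atom_file ${ligand_number}.mol2
receptor_site_file $HOME/dock_inputs/${receptor_site_file}
"""


def render_dock_input(template_text, values):
    """Replace every placeholder; raises KeyError if a value is missing."""
    return Template(template_text).substitute(values)


values = {
    "score_ligand": "yes",
    "random_seed": 7,
    "clash_overlap": 0.5,
    "ligand_number": 5,
    "receptor_site_file": "ece.sph",
    "HOME": "/home/user",  # hypothetical; expanded here only for illustration
}
dock_run = render_dock_input(DOCK_BASE, values)
print(dock_run)
```

Using `substitute` rather than `safe_substitute` is a deliberate choice for the sketch: a missing parameter fails loudly instead of leaving a stray `$name` in the generated input file.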
2. Create Docking Plan: Define Variables and their Values [Figure label: Molecules to be screened]
parameter database_name label "database_name" text select oneof "aldrich" "maybridge" "maybridge_300" "asinex_egc" "asinex_epc" "asinex_pre" "available_chemicals_directory" "inter_bioscreen_s" "inter_bioscreen_n" "inter_bioscreen_n_300" "inter_bioscreen_n_500" "biomolecular_research_institute" "molecular_science" "molecular_diversity_preservation" "national_cancer_institute" "IGF_HITS" "aldrich_300" "molecular_science_500" "APP" "ECE" default "aldrich_300";
parameter CDB_SERVER text default "bezek.dstc.monash.edu.au";
parameter CDB_PORT_NO text default "5001";
parameter score_ligand text default "yes";
parameter minimize_ligand text default "yes";
parameter multiple_ligands text default "no";
parameter random_seed integer default 7;
parameter anchor_search text default "no";
parameter torsion_drive text default "yes";
parameter clash_overlap float default 0.5;
parameter conformation_cutoff_factor integer default 5;
parameter torsion_minimize text default "yes";
parameter match_receptor_sites text default "no";
. . . . . . . . . . . .
parameter maximum_cycles integer default 1;
parameter receptor_site_file text default "ece.sph";
parameter score_grid_prefix text default "ece";
parameter ligand_number integer range from 1 to 2000 step 1;
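The plan above fixes most parameters at their defaults and sweeps `ligand_number` from 1 to 2000, which implies one docking job per molecule. A rough Python sketch of that expansion follows; the function `expand_jobs` and the split into "fixed" and "swept" dictionaries are assumptions for illustration, not how Nimrod represents a plan internally.

```python
from itertools import product


def expand_jobs(fixed, swept):
    """Yield one parameter dictionary per point in the swept parameter space."""
    names = list(swept)
    for combination in product(*(swept[name] for name in names)):
        job = dict(fixed)                  # start from the default values
        job.update(zip(names, combination))
        yield job


# A few of the defaults from the plan, plus the one ranged parameter.
fixed = {"database_name": "aldrich_300", "random_seed": 7, "clash_overlap": 0.5}
swept = {"ligand_number": range(1, 2001)}  # "range from 1 to 2000 step 1"

jobs = list(expand_jobs(fixed, swept))
print(len(jobs), jobs[0], jobs[-1])
```

If a second parameter were given a range as well, `product` would produce the full cross product of the two ranges, which is the sense in which one program is run many times over different data.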
Create Docking Plan File: 3. Define Tasks that jobs need to do
task nodestart
    copy ./parameter/vdw.defn node:.
    copy ./parameter/chem.defn node:.
    copy ./parameter/chem_score.tbl node:.
    copy ./parameter/flex.defn node:.
    copy ./parameter/flex_drive.tbl node:.
    copy ./dock_inputs/get_molecule node:.
    copy ./dock_inputs/dock_base node:.
endtask
task main
    node:substitute dock_base dock_run
    node:substitute get_molecule get_molecule_fetch
    node:execute sh ./get_molecule_fetch
    node:execute $HOME/bin/dock.$OS -i dock_run -o dock_out
    copy node:dock_out ./results/dock_out.$jobname
    copy node:dock_cnt.mol2 ./results/dock_cnt.mol2.$jobname
    copy node:dock_chm.mol2 ./results/dock_chm.mol2.$jobname
    copy node:dock_nrg.mol2 ./results/dock_nrg.mol2.$jobname
endtask
Gridbus Visual Tool for Parametric Application Creation (e.g., Docking)
Chemical DataBase (CDB) • Databases consist of small molecules from commercially available organic synthesis libraries and natural product databases. • There is also the ability to screen virtual combinatorial databases in their entirety. • This methodology allows only the required compounds to be subjected to physical screening and/or synthesis, reducing both time and expense.
Target Test Case • The target for the test case: endothelin converting enzyme (ECE). This is involved in “heart stroke” and other transient ischemia. • Is·che·mi·a: A decrease in the blood supply to a bodily organ, tissue, or part caused by constriction or obstruction of the blood vessels.
Scheduling Molecular Docking Application on Grid: Experiment • Workload – Docking 200 molecules with ECE • 200 jobs, each needing on the order of 3 minutes depending on molecule weight. • Deadline: 60 min. and budget: 50,000 G$/tokens • Strategy: minimise time / cost • Execution cost under each strategy: • Optimise Cost: 14,277 G$ (finished in 59.30 min.) • Optimise Time: 17,702 G$ (finished in 34 min.) • In this experiment, time-optimised scheduling costs about 3.4K G$ more than cost-optimised scheduling (17,702 - 14,277 = 3,425 G$). • Users can now trade off time against cost.
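The time-versus-cost trade-off in this experiment can be illustrated with a toy allocator in Python. This is a sketch under loudly stated assumptions: the three resources, their node counts, per-job runtimes, and G$ prices are invented, and the greedy cheapest-first policy is a stand-in for illustration, not the scheduling algorithm Nimrod-G actually uses. Only the workload size (200 jobs), the 60-minute deadline, and the 50,000 G$ budget come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    nodes: int             # hypothetical number of nodes available
    minutes_per_job: int   # hypothetical runtime of one docking job
    cost_per_job: int      # hypothetical price of one job in G$


def allocate(resources, n_jobs, time_limit):
    """Fill the cheapest resources first; return None if the jobs do not fit."""
    plan, remaining = {}, n_jobs
    for r in sorted(resources, key=lambda r: r.cost_per_job):
        capacity = r.nodes * (time_limit // r.minutes_per_job)
        take = min(capacity, remaining)
        if take:
            plan[r.name] = take
            remaining -= take
    return plan if remaining == 0 else None


def summarise(resources, plan):
    """Return (total cost in G$, completion time in minutes) for a plan."""
    by_name = {r.name: r for r in resources}
    cost = sum(by_name[n].cost_per_job * k for n, k in plan.items())
    # Each node runs its share of jobs one after another (ceiling division).
    makespan = max(-(-k // by_name[n].nodes) * by_name[n].minutes_per_job
                   for n, k in plan.items())
    return cost, makespan


def optimise_cost(resources, n_jobs, deadline):
    """Use the whole deadline so the cheapest resources take as much as possible."""
    return allocate(resources, n_jobs, deadline)


def optimise_time(resources, n_jobs, deadline, budget):
    """Find the earliest finish time whose cheapest plan stays within budget."""
    candidates = sorted({r.minutes_per_job * k for r in resources
                         for k in range(1, n_jobs + 1)
                         if r.minutes_per_job * k <= deadline})
    for t in candidates:
        plan = allocate(resources, n_jobs, t)
        if plan and summarise(resources, plan)[0] <= budget:
            return plan
    return None


grid = [Resource("A", 5, 3, 50), Resource("B", 10, 3, 80), Resource("C", 10, 2, 120)]
cheap = summarise(grid, optimise_cost(grid, 200, 60))
fast = summarise(grid, optimise_time(grid, 200, 60, 50_000))
print("cost-optimised (G$, min):", cheap)
print("time-optimised (G$, min):", fast)
```

With these invented numbers the cost-optimised plan uses the full deadline on the two cheapest resources, while the time-optimised plan also buys capacity on the expensive one and finishes sooner at a higher price, which mirrors the direction of the result reported above.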
WW Grid: WWG Setup • Australia: Melbourne + Monash U: VPAC, Physics • North America: ANL: SGI/Sun/SP2; NCSA: Cluster; Wisc: PC/cluster; NRC, Canada; many others • Europe: ZIB: T3E/Onyx; AEI: Onyx; CNR: Cluster; CUNI/CZ: Onyx; Poznan: SGI/SP2; Vrije U: Cluster; Cardiff: Sun E6500; Portsmouth: Linux PC; Manchester: O3K; Cambridge: SGI; many others • Asia: AIST, Japan: Solaris Cluster; Osaka University: Cluster; Doshisha: Linux cluster; Korea: Linux cluster • Tools and services shown, connected over the Internet: Gridbus + Nimrod-G, GMonitor, MEG Visualisation, Grid Market Directory, Solaris WS
Summary and Conclusion • Applications can be Grid enabled and deployed on the Grid with minimal effort, given the right set of Grid tools. • Distributed Docking demonstrates that the Nimrod-G and Gridbus tools: • Enable rapid software engineering of Grid applications. • Provide powerful runtime machinery for optimal deployment of applications on the Grid. • Easy-to-use tools for composing applications to run on the Grid are essential to attracting the application community and getting it on board. • Integration with our Data Grid Broker, to support dynamic selection of CDB nodes, is in progress.
Thanks http://www.gridbus.org/vlab
Parametric Processing Parameters Magic Engine for Manufacturing Humans! Multiple Runs Same Program Multiple Data Killer Application for the Grid! Courtesy: Anand Natrajan, University of Virginia