WW Grid Virtual Laboratory: Data Intensive Science during Holiday @ Robinson Village in Italy!
Rajkumar Buyya
Melbourne, Australia
www.buyya.com/ecogrid
Grid Warning!
• This is a science fiction story on the future of grid computing.
• All actors mentioned in this talk are
• The application under consideration is fictitious.
• Prof. Watson-II is researching drug design.
• The complete story is fictitious except the Grid technology!
Prof. Watson-II Spends all his time in Lab @ University of Lecce
Watson-II’s wife was Unhappy
• He was not spending any time with her and the kids.
• Every day he goes to the lab @ 8am and comes back home at 11pm.
• After a few days he and his wife had a big fight @ Home.
• She gives him a warning: if he does not come home tomorrow by 6pm, he will have to face lifelong consequences.
Prof. Watson-II works up to 5pm in Lab @ University of Lecce Goes to Work @ 9am Returns to home by 5.30PM!
Watson-II promises his wife that he will soon take her for a holiday @ Robinson Village
Prof. Watson-II hires an assistant and works smarter! Goes to Work @ 9am Returns home by 5.30PM!
Watson-II quickly reads a news clipping that he got from a Grid researcher
Goes to the Internet Room & does some surfing of the Grid researcher's page
Drug Design: Data Intensive Computing on Grid
• A Virtual Laboratory for “Molecular Modelling for Drug Design” on Peer-to-Peer Grid.
• It provides tools for examining millions of chemical compounds (molecules) in the Protein Data Bank (PDB) to identify those having potential use in drug design.
• In collaboration with:
  • Kim Branson, Structural Biology, Walter and Eliza Hall Institute (WEHI)
http://www.csse.monash.edu.au/~rajkumar/dd@home/
DesignDrug@Home Architecture: A Virtual Lab for “Molecular Modeling for Drug Design” on P2P Grid
[Figure: a Resource Broker (RB) connected to the Grid Info. Service, the Grid Market Directory, the Data Replica Catalogue, several Grid nodes each running a GTS (Grid Trade Server), and two Protein Data Banks, PDB1 and PDB2. The RB maps suitable Grid nodes and Protein DataBank. Messages shown in the figure:]
• “Screen 2K molecules in 30min. for $10”
• “Give me list PDBs sources Of type aldrich_300?”
• “service cost?”
• “service providers?”
• “get mol.10 from pdb1 & screen it.”
• “mol.10 please?”
• “mol.5 please?”
Software Tools
• Molecular Modelling Tools (DOCK)
• Parameter Modelling Tools (Nimrod/enFusion)
• Grid Resource Broker (Nimrod-G)
• Data Grid Broker
• Protein Data Bank (PDB) Management and Intelligent Access Tools
  • PDB database Lookup/Index Table Generation.
  • PDB and associated index-table Replication.
  • PDB Replica Catalogue (that helps in Resource Discovery).
  • PDB Servers (that serve PDB client requests).
  • PDB Brokering (Replica Selection).
  • PDB Clients for fetching Molecule Record (Data Movement).
• Grid Middleware (Globus and GrACE)
• Grid Fabric Management (Fork/LSF/Condor/Codine/…)
DOCK code* (Enhanced by WEHI, U of Melbourne)
• A program to evaluate the chemical and geometric complementarities between a small molecule and a macromolecular binding site.
• It explores ways in which two molecules, such as a drug and an enzyme or protein receptor, might fit together.
• Compounds which dock to each other well, like pieces of a three-dimensional jigsaw puzzle, have the potential to bind.
• So, why is it important to be able to identify small molecules which may bind to a target macromolecule?
• A compound which binds to a biological macromolecule may inhibit its function, and thus act as a drug.
• For example, it can disable the ability of a virus (such as HIV) to attach itself to a molecule/protein!
• With the system-specific code changed, we have been able to compile it for Sun-Solaris, PC Linux, SGI IRIX, Compaq Alpha/OSF1.
* Original Code: University of California, San Francisco: http://www.cmpharm.ucsf.edu/kuntz/
Dock input file (callout on slide: “Molecule to be screened”)
score_ligand yes
minimize_ligand yes
multiple_ligands no
random_seed 7
anchor_search no
torsion_drive yes
clash_overlap 0.5
conformation_cutoff_factor 3
torsion_minimize yes
match_receptor_sites no
random_search yes
. . . . . .
. . . . . .
maximum_cycles 1
ligand_atom_file S_1.mol2
receptor_site_file ece.sph
score_grid_prefix ece
vdw_definition_file parameter/vdw.defn
chemical_definition_file parameter/chem.defn
chemical_score_file parameter/chem_score.tbl
flex_definition_file parameter/flex.defn
flex_drive_file parameter/flex_drive.tbl
ligand_contact_file dock_cnt.mol2
ligand_chemical_file dock_chm.mol2
ligand_energy_file dock_nrg.mol2
Parameterized Dock input file (callout on slide: “Molecule to be screened”)
score_ligand $score_ligand
minimize_ligand $minimize_ligand
multiple_ligands $multiple_ligands
random_seed $random_seed
anchor_search $anchor_search
torsion_drive $torsion_drive
clash_overlap $clash_overlap
conformation_cutoff_factor $conformation_cutoff_factor
torsion_minimize $torsion_minimize
match_receptor_sites $match_receptor_sites
random_search $random_search
. . . . . .
. . . . . .
maximum_cycles $maximum_cycles
ligand_atom_file ${ligand_number}.mol2
receptor_site_file $HOME/dock_inputs/${receptor_site_file}
score_grid_prefix $HOME/dock_inputs/${score_grid_prefix}
vdw_definition_file vdw.defn
chemical_definition_file chem.defn
chemical_score_file chem_score.tbl
flex_definition_file flex.defn
flex_drive_file flex_drive.tbl
ligand_contact_file dock_cnt.mol2
ligand_chemical_file dock_chm.mol2
ligand_energy_file dock_nrg.mol2
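The placeholder substitution that turns the parameterized file into a concrete Dock input can be sketched in Python. This is only an illustration of the idea, not how Nimrod/G performs it: the shortened template, the `substitute` helper and the `HOME` value are assumptions made for the example.

```python
from string import Template

# A shortened stand-in for the parameterized Dock input file (assumption:
# only a few of the keys are reproduced here).
DOCK_BASE = """\
score_ligand $score_ligand
random_seed $random_seed
clash_overlap $clash_overlap
ligand_atom_file ${ligand_number}.mol2
receptor_site_file $HOME/dock_inputs/${receptor_site_file}
score_grid_prefix $HOME/dock_inputs/${score_grid_prefix}
"""


def substitute(template_text: str, params: dict) -> str:
    """Fill $name and ${name} placeholders; raises KeyError if one is missing."""
    return Template(template_text).substitute(params)


# Parameter values for one job; HOME is a hypothetical path.
job_params = {
    "score_ligand": "yes",
    "random_seed": 7,
    "clash_overlap": 0.5,
    "ligand_number": 5,
    "receptor_site_file": "ece.sph",
    "score_grid_prefix": "ece",
    "HOME": "/home/user",
}

dock_run = substitute(DOCK_BASE, job_params)
print(dock_run)
```

Using `substitute` rather than `safe_substitute` means a forgotten parameter fails loudly instead of leaving a stray `$name` in the input file.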
Dock PlanFile (contd.)
parameter database_name label "database_name" text select oneof "aldrich" "maybridge" "maybridge_300" "asinex_egc" "asinex_epc" "asinex_pre" "available_chemicals_directory" "inter_bioscreen_s" "inter_bioscreen_n" "inter_bioscreen_n_300" "inter_bioscreen_n_500" "biomolecular_research_institute" "molecular_science" "molecular_diversity_preservation" "national_cancer_institute" "IGF_HITS" "aldrich_300" "molecular_science_500" "APP" "ECE" default "aldrich_300";
parameter score_ligand text default "yes";
parameter minimize_ligand text default "yes";
parameter multiple_ligands text default "no";
parameter random_seed integer default 7;
parameter anchor_search text default "no";
parameter torsion_drive text default "yes";
parameter clash_overlap float default 0.5;
parameter conformation_cutoff_factor integer default 5;
parameter torsion_minimize text default "yes";
parameter match_receptor_sites text default "no";
parameter random_search text default "yes";
. . . . . .
. . . . . .
parameter maximum_cycles integer default 1;
parameter receptor_site_file text default "ece.sph";
parameter score_grid_prefix text default "ece";
parameter ligand_number integer range from 1 to 200 step 1;
(callout on slide: “Molecules to be screened”)
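To see how the parameter declarations translate into jobs, here is a rough Python sketch that expands fixed defaults and one ranged parameter into per-job parameter sets. It assumes the range is inclusive at both ends (1 to 200 gives 200 jobs); the `expand_jobs` helper is hypothetical and only mimics the effect of the plan file.

```python
from itertools import product

# Fixed parameters, with defaults copied from the plan file.
fixed = {
    "database_name": "aldrich_300",
    "random_seed": 7,
    "clash_overlap": 0.5,
    "maximum_cycles": 1,
    "receptor_site_file": "ece.sph",
    "score_grid_prefix": "ece",
}

# Ranged parameters: "integer range from 1 to 200 step 1",
# assumed here to include both end points.
ranged = {"ligand_number": range(1, 201, 1)}


def expand_jobs(fixed: dict, ranged: dict):
    """Yield one parameter dict per combination of ranged values."""
    names = list(ranged)
    for values in product(*(ranged[name] for name in names)):
        job = dict(fixed)
        job.update(zip(names, values))
        yield job


jobs = list(expand_jobs(fixed, ranged))
print(len(jobs))  # 200
```

With several ranged parameters the cross product grows multiplicatively, which is why a single swept parameter (the molecule number) already yields hundreds of independent jobs.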
Dock PlanFile
task nodestart
    copy ./parameter/vdw.defn node:.
    copy ./parameter/chem.defn node:.
    copy ./parameter/chem_score.tbl node:.
    copy ./parameter/flex.defn node:.
    copy ./parameter/flex_drive.tbl node:.
    copy ./dock_inputs/get_molecule node:.
    copy ./dock_inputs/dock_base node:.
endtask
task main
    node:substitute dock_base dock_run
    node:substitute get_molecule get_molecule_fetch
    node:execute sh ./get_molecule_fetch
    node:execute $HOME/bin/dock.$OS -i dock_run -o dock_out
    copy node:dock_out ./results/dock_out.$jobname
    copy node:dock_cnt.mol2 ./results/dock_cnt.mol2.$jobname
    copy node:dock_chm.mol2 ./results/dock_chm.mol2.$jobname
    copy node:dock_nrg.mol2 ./results/dock_nrg.mol2.$jobname
endtask
Docking Experiment Preparation
• Set up the PDB DataGrid:
  • Index PDB databases.
  • Pre-stage (all) Protein Data Bank (PDB) on replica sites.
  • Start PDB Server.
• Create Docking GridScore (receptor surface details) for a given receptor on the home node.
• Pre-stage large files required for docking:
  • Pre-stage Dock executables and the PDB access client on Grid nodes, if required (e.g., dock.Linux, dock.SunOS, dock.IRIX64, and dock.OSF1 on Linux, Sun, SGI, and Compaq machines respectively). Use globus-rcp.
  • Pre-stage/cache all data files (~3-13MB each) representing receptor details on Grid nodes.
  • This can be done on demand by Nimrod/G for each job, but a few input files are too large and they are required for all jobs. So, pre-staging/caching at http-cache or broker level is necessary to avoid the overhead of copying the same input files again and again!
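The saving from pre-staging/caching can be illustrated with a small Python sketch that copies each file to each node at most once. The `StagingCache` class, the node names, the job count and the second file name are all hypothetical; a real setup would perform an actual transfer where the counter is incremented.

```python
class StagingCache:
    """Remember which (node, file) pairs are already staged."""

    def __init__(self):
        self._staged = set()
        self.copies = 0  # number of transfers actually performed

    def ensure(self, node: str, filename: str) -> bool:
        """Stage the file on the node if needed; return True if a copy happened."""
        key = (node, filename)
        if key in self._staged:
            return False
        self._staged.add(key)
        self.copies += 1  # a real system would transfer the file here
        return True


nodes = ["node-a", "node-b"]              # hypothetical Grid nodes
receptor_files = ["ece.sph", "ece.grid"]  # second name is hypothetical
n_jobs = 200

cache = StagingCache()
for job in range(n_jobs):
    node = nodes[job % len(nodes)]  # simple round-robin placement
    for filename in receptor_files:
        cache.ensure(node, filename)

naive_copies = n_jobs * len(receptor_files)  # copying for every job
print(cache.copies, naive_copies)  # 4 400
```

The number of transfers drops from one per job per file to one per node per file, which matters when each file is several megabytes.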
Protein Data Bank
• Databases consist of small molecules from commercially available organic synthesis libraries, and natural product databases.
• There is also the ability to screen virtual combinatorial databases, in their entirety.
• This methodology allows only the required compounds to be subjected to physical screening and/or synthesis, reducing both time and expense.
Target Testcase
• The target for the test case: endothelin converting enzyme (ECE). This is involved in “heart stroke” and other transient ischemia.
• Is·che·mi·a: A decrease in the blood supply to a bodily organ, tissue, or part caused by constriction or obstruction of the blood vessels.
Resource Brokering Architecture for Molecular Screening on World Wide Grid
[Figure: the Nimrod/G Computational Grid Broker (Algorithm1 … AlgorithmN) works with the PDB Broker, the Data Replica Catalogue, the Grid Info. Service, PDB Services (including PDB2), and Grid Service Providers GSP1, GSP2, GSP3, GSP4, … GSPm, GSPn. The interactions are numbered 1 to 7 in the figure. Messages shown:]
• “Screen 2K molecules in 30min. for $10”
• “advise PDB source?”
• “PDB replicas please?”
• “Is GSP4 healthy?”
• “selection & advise: use GSP4!”
• “Screen mol.5 please?”
• “mol.5 please?”
• “process & send results”
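A request such as “Screen 2K molecules in 30min. for $10” combines a deadline with a budget. The Python sketch below shows one toy way to check such a request: take the cheapest providers first and let each work until the deadline. It is not the Nimrod/G scheduling algorithm, and the provider speeds and prices are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    molecules_per_min: float   # hypothetical screening speed
    cost_per_molecule: float   # hypothetical price in dollars


def plan(providers, n_molecules, deadline_min, budget):
    """Greedy allocation: fill the cheapest providers first.

    Returns (allocation, total_cost, feasible).
    """
    remaining = n_molecules
    allocation = {}
    total_cost = 0.0
    for p in sorted(providers, key=lambda p: p.cost_per_molecule):
        if remaining <= 0:
            break
        capacity = int(p.molecules_per_min * deadline_min)
        share = min(capacity, remaining)
        if share == 0:
            continue
        allocation[p.name] = share
        total_cost += share * p.cost_per_molecule
        remaining -= share
    feasible = remaining == 0 and total_cost <= budget
    return allocation, total_cost, feasible


providers = [
    Provider("GSP1", 20, 0.004),
    Provider("GSP2", 30, 0.005),
    Provider("GSP4", 40, 0.006),
]

allocation, total_cost, feasible = plan(providers, 2000, 30, 10.0)
print(allocation, round(total_cost, 2), feasible)
```

With these made-up numbers the two cheapest providers cannot finish 2,000 molecules in 30 minutes on their own, so the third one is added and the request still fits the $10 budget.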
Watson-II again sees the Grid researcher on the beach and asks him a favor: “Can I borrow your Grid identity for 2 days?” The nice Grid researcher trusts Watson and gives him “his Grid identity”, including access to his World Wide Grid testbed! Grid Trust on the Beach!
Connects to his U.Lecce lab machine and copies all the protein samples he prepared before going on holiday
Copies the Grid researcher's test experiment & modifies it to use his lab experiment data.
Starts Molecular Experimentation
[Figure: the same brokering architecture, with the Nimrod/G Computational Grid Broker (Algorithm1 … AlgorithmN), the PDB Broker, the Data Replica Catalogue, the Grid Info. Service, PDB Services (including PDB2), and Grid Service Providers GSP1, GSP2, GSP3, GSP4, … GSPm, GSPn. The interactions are numbered 1 to 6 in the figure. Messages shown:]
• “Screen 50K molecules in 120min. for $200”
• “advise PDB source?”
• “PDB replicas please?”
• “Is GSP4 healthy?”
• “use GSP4!”
• “Screen mol.5 please?”
• “mol.5 please?”
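The request “Screen 50K molecules in 120min. for $200” implies a minimum screening rate and a maximum price per molecule. The Python sketch below works these out; the per-node speed of 2 molecules per minute is a made-up figure used only to show how a node count would follow.

```python
import math


def request_rates(n_molecules: int, deadline_min: float, budget: float):
    """Return (required molecules per minute, budget per molecule)."""
    return n_molecules / deadline_min, budget / n_molecules


def nodes_needed(n_molecules: int, deadline_min: float, per_node_rate: float) -> int:
    """Smallest number of equal-speed nodes that meets the deadline."""
    return math.ceil(n_molecules / (deadline_min * per_node_rate))


rate, price = request_rates(50_000, 120, 200.0)
print(round(rate, 1), price)          # about 416.7 molecules/min, $0.004 each

# Hypothetical per-node speed: 2 molecules per minute.
print(nodes_needed(50_000, 120, 2.0))  # 209
```

So the experiment needs roughly 417 molecules screened per minute across the Grid, at no more than 0.4 cents per molecule.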
Comes back to the Internet room after 2 hours and asks his assistant to test the results
Watson-II's assistant conducts the tests in the afternoon and sends an email to Watson in the evening: “looks like our client is improving…”
Watson-II does some more exploration: this time with one million molecules. Asks Nimrod to email results to his assistant for testing...