Nimrod-G and Virtual Lab Tools for Data Intensive Computing on Grid: Drug Design Case Study
Rajkumar Buyya
Melbourne, Australia
http://www.buyya.com/ecogrid
Contents
• Introduction
• Resource Management Challenges
• Nimrod-G Toolkit
• SPMD/Parameter-Study Creation Tools
• Grid Enabling Drug Design Application
• Nimrod-G Grid Resource Broker
• Scheduling Experiments on World Wide Grid
• Conclusions
A Typical Grid Environment and Players
[Figure labels: Application, Resource Broker]
Grid Characteristics
• Heterogeneous
  • Resource types: PC, WS, clusters
  • Resource architecture: CPU architecture, OS
  • Applications: CPU/IO/message intensive
  • Users' and owners' requirements
  • Access price: differs across users, resources, and time
  • Availability: varies from time to time
• Distributed
  • Resources
  • Ownership
  • Users
• Each has its own (private) policies and objectives.
• This closely resembles the heterogeneity and decentralization present in “human economies” (democratic and capitalist world).
• Hence, we propose the use of “economics” as a metaphor for resource management and scheduling. It regulates supply and demand for resources and offers resource owners an incentive to contribute resources to the Grid.
Grid Tools for Handling
• Computational Economy
• Security
• Data Locality
• Resource Allocation & Scheduling
• Uniform Access
• System Management
• Resource Discovery
• Network Management
• Application Development
Nimrod-G: Grid Resource Broker
• A resource broker for managing, steering, and executing task-farming (parametric sweep/SPMD model) applications on the Grid, based on deadline and computational economy.
• Based on users’ QoS requirements, our broker dynamically leases services at runtime depending on their quality, cost, and availability.
• Key features
  • A single window to manage and control the experiment
  • Persistent and programmable task-farming engine
  • Resource discovery
  • Resource trading
  • Scheduling & predictions
  • Generic dispatcher & Grid agents
  • Transportation of data & results
  • Steering & data management
  • Accounting
Parametric Processing
[Figure labels: Parameters; Magic Engine for Manufacturing Humans!]
• Multiple runs
• Same program
• Multiple data
• Killer application for the Grid!
Courtesy: Anand Natrajan, University of Virginia
Sample P-Sweep Applications
• Bioinformatics: Drug Design / Protein Modelling
• Combinatorial Optimization: Meta-heuristic parameter estimation
• Ecological Modelling: Control Strategies for Cattle Tick
• Sensitivity experiments on smog formation
• Data Mining
• Electronic CAD: Field Programmable Gate Arrays
• High Energy Physics: Searching for Rare Events
• Computer Graphics: Ray Tracing
• Finance: Investment Risk Analysis
• VLSI Design: SPICE Simulations
• Civil Engineering: Building Design
• Network Simulation
• Automobile: Crash Simulation
• Aerospace: Wing Design
• Astrophysics
Virtual Drug Design: Data Intensive Computing on Grid
• A Virtual Laboratory for “Molecular Modelling for Drug Design” on Peer-to-Peer Grid.
• It provides tools for examining millions of chemical compounds (molecules) in the Protein Data Bank (PDB) to identify those having potential use in drug design.
• In collaboration with:
  • Kim Branson, Structural Biology, Walter and Eliza Hall Institute (WEHI)
http://www.csse.monash.edu.au/~rajkumar/vlab
Dock input file
    score_ligand                yes
    minimize_ligand             yes
    multiple_ligands            no
    random_seed                 7
    anchor_search               no
    torsion_drive               yes
    clash_overlap               0.5
    conformation_cutoff_factor  3
    torsion_minimize            yes
    match_receptor_sites        no
    random_search               yes
    . . . . . . . . . . . .
    maximum_cycles              1
    ligand_atom_file            S_1.mol2
    receptor_site_file          ece.sph
    score_grid_prefix           ece
    vdw_definition_file         parameter/vdw.defn
    chemical_definition_file    parameter/chem.defn
    chemical_score_file         parameter/chem_score.tbl
    flex_definition_file        parameter/flex.defn
    flex_drive_file             parameter/flex_drive.tbl
    ligand_contact_file         dock_cnt.mol2
    ligand_chemical_file        dock_chm.mol2
    ligand_energy_file          dock_nrg.mol2
Callout: Molecule to be screened (ligand_atom_file)
Parameterize Dock input file (use Nimrod Tools: GUI/language)
    score_ligand                $score_ligand
    minimize_ligand             $minimize_ligand
    multiple_ligands            $multiple_ligands
    random_seed                 $random_seed
    anchor_search               $anchor_search
    torsion_drive               $torsion_drive
    clash_overlap               $clash_overlap
    conformation_cutoff_factor  $conformation_cutoff_factor
    torsion_minimize            $torsion_minimize
    match_receptor_sites        $match_receptor_sites
    random_search               $random_search
    . . . . . . . . . . . .
    maximum_cycles              $maximum_cycles
    ligand_atom_file            ${ligand_number}.mol2
    receptor_site_file          $HOME/dock_inputs/${receptor_site_file}
    score_grid_prefix           $HOME/dock_inputs/${score_grid_prefix}
    vdw_definition_file         vdw.defn
    chemical_definition_file    chem.defn
    chemical_score_file         chem_score.tbl
    flex_definition_file        flex.defn
    flex_drive_file             flex_drive.tbl
    ligand_contact_file         dock_cnt.mol2
    ligand_chemical_file        dock_chm.mol2
    ligand_energy_file          dock_nrg.mol2
Callout: Molecule to be screened (ligand_atom_file)
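The substitution that turns the parameterized file into a concrete input for one job can be illustrated with a short Python sketch. It uses Python's standard `string.Template`, whose `$name` and `${name}` placeholders happen to match the notation on the slide. This is not Nimrod's own `substitute` implementation; the `render_dock_input` helper and the `HOME` path are hypothetical, and the other values are taken from the sample input file for illustration.

```python
from string import Template

# A few lines of the parameterized dock input file shown on the slide.
DOCK_BASE = Template(
    "random_seed                $random_seed\n"
    "clash_overlap              $clash_overlap\n"
    "ligand_atom_file           ${ligand_number}.mol2\n"
    "receptor_site_file         $HOME/dock_inputs/${receptor_site_file}\n"
    "score_grid_prefix          $HOME/dock_inputs/${score_grid_prefix}\n"
)


def render_dock_input(values):
    """Fill every placeholder; raises KeyError if a parameter is missing."""
    return DOCK_BASE.substitute(values)


# Values for one job. HOME is a hypothetical directory on the remote node.
job_values = {
    "random_seed": 7,
    "clash_overlap": 0.5,
    "ligand_number": 42,
    "receptor_site_file": "ece.sph",
    "score_grid_prefix": "ece",
    "HOME": "/home/griduser",
}

dock_run = render_dock_input(job_values)
print(dock_run)
```

Using `substitute` rather than `safe_substitute` means a missing parameter fails loudly instead of leaving a stray `$name` in the generated input file.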
Create Dock Plan File
1. Define variables and their values
    parameter database_name label "database_name" text select oneof
        "aldrich" "maybridge" "maybridge_300" "asinex_egc" "asinex_epc" "asinex_pre"
        "available_chemicals_directory" "inter_bioscreen_s" "inter_bioscreen_n"
        "inter_bioscreen_n_300" "inter_bioscreen_n_500" "biomolecular_research_institute"
        "molecular_science" "molecular_diversity_preservation" "national_cancer_institute"
        "IGF_HITS" "aldrich_300" "molecular_science_500" "APP" "ECE"
        default "aldrich_300";
    parameter score_ligand text default "yes";
    parameter minimize_ligand text default "yes";
    parameter multiple_ligands text default "no";
    parameter random_seed integer default 7;
    parameter anchor_search text default "no";
    parameter torsion_drive text default "yes";
    parameter clash_overlap float default 0.5;
    parameter conformation_cutoff_factor integer default 5;
    parameter torsion_minimize text default "yes";
    parameter match_receptor_sites text default "no";
    parameter random_search text default "yes";
    . . . . . . . . . . . .
    parameter maximum_cycles integer default 1;
    parameter receptor_site_file text default "ece.sph";
    parameter score_grid_prefix text default "ece";
    parameter ligand_number integer range from 1 to 2000 step 1;
Callout: Molecules to be screened (ligand_number)
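The plan file's parameters define the set of jobs: each parameter with a single default contributes one value, while `ligand_number` ranges from 1 to 2000, so the experiment expands to 2000 jobs, one per molecule. A minimal Python sketch of that expansion follows, assuming the job set is the cross product of all parameter domains. The `domains` layout and the `expand_jobs` helper are illustrative names, not part of Nimrod, and only a subset of the parameters is shown.

```python
from itertools import product

# Parameter domains, copied from a subset of the plan file.
# Single-valued parameters use their defaults; ligand_number is the swept range.
domains = {
    "database_name": ["aldrich_300"],
    "random_seed": [7],
    "clash_overlap": [0.5],
    "receptor_site_file": ["ece.sph"],
    "score_grid_prefix": ["ece"],
    "ligand_number": range(1, 2001),  # "range from 1 to 2000 step 1"
}


def expand_jobs(domains):
    """Yield one dict of parameter values per job (cross product of domains)."""
    names = list(domains)
    for combo in product(*(domains[name] for name in names)):
        yield dict(zip(names, combo))


jobs = list(expand_jobs(domains))
print(len(jobs))                  # number of jobs in the experiment
print(jobs[0]["ligand_number"])   # first molecule index
print(jobs[-1]["ligand_number"])  # last molecule index
```

Giving a second parameter more than one value multiplies the job count, which is why sweeps over several parameters grow quickly.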
Create Dock Plan File
2. Define the task that jobs need to do
    task nodestart
        copy ./parameter/vdw.defn node:.
        copy ./parameter/chem.defn node:.
        copy ./parameter/chem_score.tbl node:.
        copy ./parameter/flex.defn node:.
        copy ./parameter/flex_drive.tbl node:.
        copy ./dock_inputs/get_molecule node:.
        copy ./dock_inputs/dock_base node:.
    endtask
    task main
        node:substitute dock_base dock_run
        node:substitute get_molecule get_molecule_fetch
        node:execute sh ./get_molecule_fetch
        node:execute $HOME/bin/dock.$OS -i dock_run -o dock_out
        copy node:dock_out ./results/dock_out.$jobname
        copy node:dock_cnt.mol2 ./results/dock_cnt.mol2.$jobname
        copy node:dock_chm.mol2 ./results/dock_chm.mol2.$jobname
        copy node:dock_nrg.mol2 ./results/dock_nrg.mol2.$jobname
    endtask
Use Nimrod-G Submit & Play!
A Nimrod/G Monitor
[Screenshot labels: Cost, Deadline, Legion hosts, Globus hosts. Bezek is in both Globus and Legion domains.]
Adaptive Scheduling Algorithms
[Flowchart labels]
• Discover Resources
• Establish Rates
• Compose & Schedule
• Distribute Jobs
• Evaluate & Reschedule
• Meet requirements?
• Remaining Jobs, Deadline, & Budget?
• Discover More Resources
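The loop named in the flowchart can be sketched in Python as a toy cost-minimising scheduler. This is not the Nimrod-G algorithm: the `Resource` class, the per-job prices, the fixed `jobs_per_step` rates, and the greedy "cheapest first" rule are all assumptions made for illustration, and the "discover more resources" step is left out. At every step the sketch recomputes how many jobs must finish to stay on track for the deadline, then fills that quota from the cheapest resources within the remaining budget.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    price: float        # hypothetical cost units charged per job
    jobs_per_step: int  # jobs the resource completes per scheduling step


def schedule(resources, total_jobs, deadline_steps, budget):
    """Greedy cost-minimising loop under a deadline and a budget."""
    remaining, spent, step = total_jobs, 0.0, 0
    by_price = sorted(resources, key=lambda r: r.price)  # establish rates
    while remaining > 0 and step < deadline_steps:
        steps_left = deadline_steps - step
        needed = -(-remaining // steps_left)  # ceiling: jobs required this step
        done = 0
        for res in by_price:                  # compose & schedule
            if done >= needed:
                break
            affordable = int((budget - spent) // res.price)
            n = min(res.jobs_per_step, needed - done, affordable)
            done += n                         # distribute jobs
            spent += n * res.price
        if done == 0:
            break                             # out of budget or capacity
        remaining -= done                     # evaluate & reschedule
        step += 1
    return {"remaining": remaining, "spent": spent, "steps": step}


pool = [
    Resource("cheap", price=1, jobs_per_step=2),
    Resource("mid", price=2, jobs_per_step=3),
    Resource("fast", price=5, jobs_per_step=10),
]

relaxed = schedule(pool, total_jobs=20, deadline_steps=5, budget=100)
tight = schedule(pool, total_jobs=20, deadline_steps=2, budget=100)
print(relaxed)
print(tight)
```

With the relaxed deadline the cheap resources suffice; tightening the deadline forces the sketch onto the expensive resource and raises the total spend, which is the time-versus-cost trade-off the broker exposes to users.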
Scheduling Experiment on World Wide Grid (WWG) Testbed
• Cardiff/UK, Portsmouth/UK
• TI-Tech/Tokyo, ETL/Tsukuba, AIST/Tsukuba
• ANL/Chicago, USC-ISI/LA, UTK/Tennessee, UVa/Virginia, Dartmouth/NH, BU/Boston
• Europe: ZIB/Germany, PC2/Germany, AEI/Germany, Lecce/Italy, CNR/Italy, Calabria/Italy, Poznan/Poland, Lund/Sweden, CERN/Switzerland
• Kasetsart/Bangkok
• Monash/Melbourne, VPAC/Melbourne
• Santiago/Chile
Deadline and Budget Constrained Scheduling Experiment
• Workload:
  • 165 jobs, each needing 5 minutes of CPU time
• Deadline: 2 hrs.; budget: 396000 units
• Strategy: minimise time / cost
• Execution cost under each strategy:
  • Optimise cost: 115200 (G$) (finished in 2 hrs.)
  • Optimise time: 237000 (G$) (finished in 1.25 hrs.)
• In this experiment, the time-optimised run cost roughly double the cost-optimised run.
• Users can now trade off time against cost.
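The figures on this slide can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers reported above; the variable names are illustrative.

```python
# Numbers reported for the experiment.
jobs = 165
cpu_minutes_per_job = 5
budget = 396_000           # units (G$)
cost_optimised = 115_200   # G$, finished in 2 hrs
time_optimised = 237_000   # G$, finished in 1.25 hrs

# Total work in the experiment, in CPU hours.
total_cpu_hours = jobs * cpu_minutes_per_job / 60

# How much more the time-optimised run cost.
cost_ratio = time_optimised / cost_optimised
extra_cost = time_optimised - cost_optimised

# Wall-clock time saved by paying more, in hours.
hours_saved = 2.0 - 1.25

# Share of the budget each strategy consumed.
budget_share_cost = cost_optimised / budget
budget_share_time = time_optimised / budget

print(f"total CPU time: {total_cpu_hours} h")
print(f"cost ratio (time/cost optimised): {cost_ratio:.2f}")
print(f"extra cost: {extra_cost} G$ for {hours_saved} h saved")
print(f"budget used: {budget_share_cost:.0%} vs {budget_share_time:.0%}")
```

Both runs stay well inside the 396000-unit budget, so in this experiment the deadline, not the budget, was the binding constraint for the cost-optimised strategy.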
World Wide Grid (WWG)
[Figure: testbed sites by region, connected through the Internet]
• Australia: Monash Uni.: Nimrod/G, Linux cluster, Solaris WS
• North America: ANL: SGI/Sun/SP2; USC-ISI: SGI; UVa: Linux cluster; UD: Linux cluster; UTK: Linux cluster
• Asia/Japan: Tokyo I-Tech.; ETL, Tsukuba: Linux cluster
• Europe: ZIB/FUB: T3E/Mosix; Cardiff: Sun E6500; Paderborn: HPCLine; Lecce: Compaq SC; CNR: Cluster; Calabria: Cluster; CERN: Cluster; Poznan: SGI/SP2
• South America: Chile: Cluster
• Middleware labels: Globus + Legion with GRACE_TS (Australia, North America); Globus + GRACE_TS (Asia/Japan, Europe, South America)
Conclusions
• P2P and Grid computing are emerging as a next-generation computing platform for solving large-scale problems through sharing of geographically distributed resources.
• Resource management is a complex undertaking, as systems need to be adaptive, scalable, competitive,…, and driven by QoS.
• We proposed a framework based on “computational economies” and discussed several economic models for resource allocation and for regulating supply and demand for resources.
• Scheduling experiments on the World Wide Grid demonstrate the ability of our Nimrod-G broker to dynamically lease or rent services at runtime based on their quality, cost, and availability, depending on consumers’ QoS requirements.
• Easy-to-use tools for composing applications to run on the Grid are essential to attracting the application community and getting it on board.
• An economics paradigm for QoS-driven resource management is essential to push P2P/Grids into mainstream computing!
Download Software & Information
• Nimrod & Parametric Computing:
  • http://www.csse.monash.edu.au/~davida/nimrod/
• Economy Grid & Nimrod/G:
  • http://www.buyya.com/ecogrid/
• Virtual Laboratory/Virtual Drug Design:
  • http://www.buyya.com/vlab/
• Grid Simulation (GridSim) Toolkit (Java based):
  • http://www.buyya.com/gridsim/
• World Wide Grid (WWG) testbed:
  • http://www.buyya.com/ecogrid/wwg/
  • Looking for new volunteers to help it grow
  • Please contact me to barter your & our machines!
• Want to build on our work or collaborate?
  • Talk to me now or email: rajkumar@csse.monash.edu.au