Nimrod & NetSolve Sathish Vadhiyar
Nimrod Sources/Credits: Nimrod web site & papers
Background • For executing parametric experiments across distributed computers • The user writes a plan file that declares the parameters • Parametric studies: a range of different simulations calculated using the same program • Need for a Grid • 3 variables with 4 values each: 64 experiments • Each experiment: several hours
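The experiment count on this slide is the size of a full cross product of parameter values. A minimal Python sketch, in which the parameter names and values are made up for illustration (only the 3 x 4 shape comes from the slide):

```python
from itertools import product

# Hypothetical parameters: three variables, four values each.
parameters = {
    "a": [1, 2, 3, 4],
    "b": [10, 20, 30, 40],
    "c": [0.1, 0.2, 0.3, 0.4],
}

# One experiment per combination of values.
experiments = [
    dict(zip(parameters, combo)) for combo in product(*parameters.values())
]

print(len(experiments))  # 4 * 4 * 4 = 64
```

At several hours per experiment, running all 64 combinations one after another on a single machine takes days, which is the motivation for spreading them over a Grid.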
Sample plan file

    parameter iseed integer range from 100 to 4000 step 100;
    parameter thick label "BUC thickness" float range from 1.1 to 2.0 step 0.1;
    parameter jseed integer compute thick*1000;

    task nodestart
        copy ccal.$OS node:./ccal
        copy dummy node:.
        copy ccal.dat node:.
        copy skel.inp node:.
    endtask

    task main
        node:substitute skel.inp ccal.inp
        node:execute ./ccal
        copy node:ccal.op ccalout.$jobname
    endtask
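To see how many jobs the sample plan file describes, the parameter declarations can be expanded by hand. The Python sketch below is not Nimrod code; it assumes that both ranges are inclusive and that `jseed` is rounded to the nearest integer, neither of which the slide states.

```python
def int_range(start, stop, step):
    """Inclusive integer range, read as 'range from START to STOP step STEP'."""
    return list(range(start, stop + 1, step))

iseed_values = int_range(100, 4000, 100)        # 40 values
# Step in tenths to avoid floating-point drift: 1.1, 1.2, ..., 2.0.
thick_values = [n / 10 for n in range(11, 21)]  # 10 values

jobs = []
for iseed in iseed_values:
    for thick in thick_values:
        jobs.append({
            "iseed": iseed,
            "thick": thick,
            # 'compute thick*1000'; rounding to an integer is an assumption.
            "jseed": round(thick * 1000),
        })

print(len(jobs))  # 40 * 10 = 400
```

Under these assumptions the plan expands to 400 jobs, one per combination of `iseed` and `thick`; `jseed` is derived from `thick` and adds no further combinations.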
Phases of Computational Experiment • Experiment pre-processing, when data is set up for the experiment; • Execution pre-processing, when data is prepared for a particular execution; • Execution, when the program is executed for a given set of parameter values; • Execution post-processing, when data from a particular execution is reduced; • Experiment post-processing, when results are processed, for example by running data interpretation or visualization software.
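The five phases can be read as a simple driver loop: two phases run once per experiment, three run once per parameter set. A minimal Python sketch follows; the function and argument names are hypothetical, and in Nimrod the per-execution phases would run on remote nodes rather than in a local loop.

```python
def run_experiment(parameter_sets, setup, prepare, execute, reduce, summarize):
    """Run the five phases in order; every callable is supplied by the caller."""
    shared = setup()                          # 1. experiment pre-processing
    reduced = []
    for params in parameter_sets:
        inputs = prepare(shared, params)      # 2. execution pre-processing
        raw = execute(inputs)                 # 3. execution
        reduced.append(reduce(params, raw))   # 4. execution post-processing
    return summarize(reduced)                 # 5. experiment post-processing

# Toy usage: the "simulation" squares its input; the summary keeps the largest result.
best = run_experiment(
    parameter_sets=[{"x": x} for x in range(5)],
    setup=lambda: {"offset": 1},
    prepare=lambda shared, p: p["x"] + shared["offset"],
    execute=lambda value: value ** 2,
    reduce=lambda p, raw: (p["x"], raw),
    summarize=lambda rows: max(rows, key=lambda r: r[1]),
)
print(best)  # (4, 25)
```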
Architecture • Components • Client • Parametric engine • Scheduler • Dispatcher • Job wrapper
Components • Parametric engine: a persistent job service; interacts with the client, schedule advisor and dispatcher; takes a declarative plan from the user • Scheduler: objectives are to meet deadlines and minimize cost • Dispatcher: starts a remote component called the job wrapper; reports task status to the parametric engine • Job wrapper: responsible for staging in, execution and staging out
Cost Model • Cost / Priority matrix defined based on specification by resource providers • Nimrod/G scheduler performs discovery and allocation of resources based on specified execution times and cost constraints • Cost of experiment varies depending on the load
Scheduling Heuristic • 1. Discovery: initial filtering of resources based on cost specifications; identification of the lowest-cost set of resources able to meet the deadline • 2. Allocation: jobs are allocated from the queue to the resources identified in step 1 • 3. Monitoring: completion times of jobs are monitored and an execution rate is established • 4. Refinement: the execution rate is used to update the expected completion times of the remaining jobs; steps 1 and 2 are then revisited
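The discover, allocate, monitor and refine cycle can be sketched in a few lines of Python. This is an illustration of the idea, not the Nimrod/G scheduler: the resource records, the per-job price, the jobs-per-hour rate and all function names are assumptions made for the example.

```python
import math

def choose_resources(resources, jobs_left, time_left, max_price):
    """Discovery: cheapest set of affordable resources that can finish in time.

    Each resource is a dict with 'name', 'price' (cost per job) and
    'rate' (estimated jobs per hour). Returns None if the deadline is unreachable.
    """
    affordable = [r for r in resources if r["price"] <= max_price]
    chosen, capacity = [], 0
    for r in sorted(affordable, key=lambda r: r["price"]):
        if capacity >= jobs_left:
            break
        chosen.append(r)
        capacity += math.floor(r["rate"] * time_left)
    return chosen if capacity >= jobs_left else None

def allocate(chosen, jobs_left, time_left):
    """Allocation: fill the cheapest resources first, up to what each can finish."""
    plan = {}
    for r in chosen:
        share = min(jobs_left, math.floor(r["rate"] * time_left))
        plan[r["name"]] = share
        jobs_left -= share
    return plan

def refine(resource, jobs_done, hours_elapsed):
    """Monitoring and refinement: replace the estimated rate with the observed one."""
    resource["rate"] = jobs_done / hours_elapsed

pool = [
    {"name": "A", "price": 5, "rate": 10},
    {"name": "B", "price": 2, "rate": 4},
    {"name": "C", "price": 9, "rate": 20},
]
chosen = choose_resources(pool, jobs_left=50, time_left=4, max_price=8)
first_plan = allocate(chosen, 50, 4)
print(first_plan)  # {'B': 16, 'A': 34}

# After 2 hours B has finished only 4 jobs, so its rate is revised downwards.
refine(pool[1], jobs_done=4, hours_elapsed=2)
# Suppose 26 jobs remain with 2 hours left: the cheap set no longer suffices.
retry_cheap = choose_resources(pool, jobs_left=26, time_left=2, max_price=8)
retry_all = choose_resources(pool, jobs_left=26, time_left=2, max_price=10)
print(retry_cheap)                     # None
print([r["name"] for r in retry_all])  # ['B', 'A', 'C']
```

In the toy run, the slower-than-expected resource forces the second discovery pass to admit a more expensive machine. This is the trade-off between deadline and cost that the refinement step exists to manage.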
Ionization chamber calibration • Chamber response to front wall thickness • ion-pair = • 400 tasks • Each model run took about 40 to 140 minutes • 3 experiments: 10-hr, 15-hr and 20-hr deadlines
Experimental setup • 165 CPU jobs, each 5 minutes long • Deadline: 2 hours • Budget: 396000 • 2 strategies: optimize time, optimize cost
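A quick back-of-the-envelope check on these numbers, as a Python sketch. The slide does not give the unit of the budget; the last line assumes it is charged per CPU-second of job time.

```python
import math

jobs = 165
job_minutes = 5
deadline_minutes = 2 * 60
budget = 396000

total_cpu_minutes = jobs * job_minutes            # 825 CPU-minutes of work
jobs_per_cpu = deadline_minutes // job_minutes    # 24 back-to-back jobs per CPU
min_cpus = math.ceil(jobs / jobs_per_cpu)         # at least 7 CPUs to meet the deadline

# Assumption: the budget is spent per CPU-second of job time.
budget_per_cpu_second = budget / (total_cpu_minutes * 60)

print(total_cpu_minutes, jobs_per_cpu, min_cpus, budget_per_cpu_second)
# 825 24 7 8.0
```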
Scheduling • Adaptive scheduling algorithms • Time minimization with a limited budget (etime optimal) • Time minimization with an unlimited budget (etime highoptimal) • Cost minimization within a deadline (ecost optimal) • No minimization, within limits on both time and cost (etime + ecost optimal)
Nimrod / O • Optimizes parameters to minimize an objective function • Case study: find the airfoil shape and angle of attack that maximize the lift-to-drag ratio • Design optimization problem • The objective function can be non-linear, can contain noise, and can be continuous or discrete • No single optimization algorithm gives the best result in every case • Nimrod / O supports a range of algorithms
Nimrod / O (contd.) • Search algorithms: P-BFGS, Simplex, Divide-and-conquer, Simulated annealing
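As an illustration of one of the listed search methods, here is a generic textbook simulated annealing loop in Python. It is not Nimrod/O's implementation; the step size, cooling schedule and toy objective are arbitrary choices for the sketch. In a real design study each call to `objective` would be a full simulation run.

```python
import math
import random

def simulated_annealing(objective, start, step=0.5, t0=1.0, cooling=0.995,
                        iterations=2000, seed=0):
    """Minimize a one-dimensional objective; returns (best_x, best_value)."""
    rng = random.Random(seed)
    x, fx = start, objective(start)
    best_x, best_fx = x, fx
    temperature = t0
    for _ in range(iterations):
        candidate = x + rng.uniform(-step, step)
        f_candidate = objective(candidate)
        delta = f_candidate - fx
        # Always accept improvements; accept worse points with
        # probability exp(-delta / T), which shrinks as T cools.
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            x, fx = candidate, f_candidate
            if fx < best_fx:
                best_x, best_fx = x, fx
        temperature *= cooling
    return best_x, best_fx

# Toy objective with its minimum at x = 3.
best_x, best_fx = simulated_annealing(lambda x: (x - 3) ** 2, start=0.0)
print(best_x, best_fx)
```

Accepting occasional uphill moves is what lets this kind of search cope with noisy or multi-modal objective functions, at the cost of many more evaluations than a gradient-based method.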
References • Abramson, D., Lewis, A., Peachey, T. and Fletcher, C., "An Automatic Design Optimization Tool and its Application to Computational Fluid Dynamics", SuperComputing 2001, Denver, Nov 2001. • Abramson, D., Sosic, R., Giddy, J. and Cope, M., "The Laboratory Bench: Distributed Computing for Parametised Simulations", 1994 Parallel Computing and Transputers Conference, Wollongong, Nov 1994, pp. 17-27. • Abramson, D., Sosic, R., Giddy, J. and Hall, B., "Nimrod: A Tool for Performing Parametised Simulations using Distributed Workstations", The 4th IEEE Symposium on High Performance Distributed Computing, Virginia, August 1995.
References (contd.) • Abramson, D., Giddy, J. and Kotler, L., "High Performance Parametric Modeling with Nimrod/G: Killer Application for the Global Grid?", International Parallel and Distributed Processing Symposium (IPDPS), pp. 520-528, Cancun, Mexico, May 2000. • Buyya, R., Abramson, D. and Giddy, J., "Nimrod/G: An Architecture of a Resource Management and Scheduling System in a Global Computational Grid", HPC Asia 2000, May 14-17, 2000, pp. 283-289, Beijing, China. • Abramson, D., Buyya, R. and Giddy, J., "A Computational Economy for Grid Computing and its Implementation in the Nimrod-G Resource Broker", Future Generation Computer Systems, Volume 18, Issue 8, Oct 2002. • Buyya, R., Giddy, J. and Abramson, D., "An Evaluation of Economy-based Resource Trading and Scheduling on Computational Power Grids for Parameter Sweep Applications", Workshop on Active Middleware Services (AMS 2000) (in conjunction with the Ninth IEEE International Symposium on High Performance Distributed Computing), Kluwer Academic Press, August 1, 2000, Pittsburgh, USA.
Components • Generator: input is the plan file; processes the plan file and gives the user choices regarding parameters; output is a run file (description of a job) • Dispatcher: input is the run file; stages files to remote resources; runs jobs on remote resources
Nimrod-G Architecture • Origin: implements scheduling and monitoring; exists for the entire duration of the experiment; responsible for executing the experiment within the specified time and cost constraints • Client: the user interacts with the Origin process through the client; multiple clients can connect to a single Origin process and monitor the same experiment
Nimrod Components • Nimrod Resource Broker (NRB): the Origin process spawns an NRB on the remote site; the NRB interacts with GRAM and adds capabilities beyond GRAM, including file staging, creation of jobs and process control
Experiments • 90-second jobs over 10 simulated queues with different access costs (Q1 = 10, Q2 = 12, etc.) • 100 jobs: 9000 seconds of work in total • Spread evenly over 10 queues: 900 seconds at best • Deadlines: 990, 1980, 2970 • Costs: 252000, 171000, 126000
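The timing figures on this slide follow from simple arithmetic, which also shows how many queues each deadline requires. A small Python sketch; it assumes every queue runs its jobs one after another and ignores queueing delay.

```python
import math

jobs, job_seconds, queues = 100, 90, 10

total_work = jobs * job_seconds      # 9000 s of work in total
ideal_time = total_work // queues    # 900 s when spread evenly over 10 queues

queues_needed = {}
for deadline in (990, 1980, 2970):
    jobs_per_queue = deadline // job_seconds             # 11, 22, 33
    queues_needed[deadline] = math.ceil(jobs / jobs_per_queue)

print(total_work, ideal_time, queues_needed)
# 9000 900 {990: 10, 1980: 5, 2970: 4}
```

A looser deadline needs fewer queues, so a cost-minimizing scheduler can drop the more expensive ones. That is consistent with the listed costs falling as the deadline grows.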