
Running Batch Jobs in R: How to deal with coarsely parallel problems

Malcolm Haddon, May 2014. Wealth from Oceans National Research Flagship.


Presentation Transcript


  1. Running Batch Jobs in R: How to deal with coarsely parallel problems. Malcolm Haddon, May 2014. Wealth from Oceans National Research Flagship.

  2. Computer Intensive
     • Many, many, many iterations:
       • Management Strategy Evaluation
       • Monte Carlo Markov Chains
       • Lots of replicates of any analyses
     • Large scale simulations:
       • multi-species,
       • multi-populations,
       • multi-’etc’
     • Any computing job that takes a long time or uses a lot of computing resources
     | Batch Jobs in R | Haddon

  3. Why the Fuss?
     • Solving BIG computing problems has its own strategies.
     • If a job:
       • takes a very long time, or
       • uses very large amounts of RAM
     • Then how can it be split up most effectively?
     • Depends on the scale at which processes are independent.
     • May need trials to find the best compromise.

  4. Coarsely Parallel Processes
     • Not talking about finely parallel processes such as cellular models in oceanography or visualization.
     • GPUs containing thousands of small processors are ideally suited to such analyses.
     • Some emphasis on this with the CSIRO clusters (Bragg, etc.) and the Advanced Scientific Computing program.
     • Instead: focussed on serial and sequential problems where analysis order is important.
       • Population processes
       • Many biological processes
     • Cannot split up time-series trajectories, but can treat each trajectory as a different process (coarsely parallel).

  5. Alternative Approaches to Simulation. Task: apply 8 Harvest Strategies to an abalone fishery over 40 years with 1000 replicates (8 x 1000).
     • Single job: for (HS in 1:8) { for (iter in 1:1000) { } }, then store results, plot and tabulate results, next steps.
     • Batch jobs: split the job into 8 parts, each running for (iter in 1:1000) { } and storing its own results; then combine, plot and tabulate results, next steps.
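The split, store, and combine pattern on slide 5 can be sketched roughly as follows. This is only an illustration under stated assumptions: `run_replicate()` and `run_part()` are hypothetical stand-ins for the real simulation, the file naming is invented, and the replicate count is cut to 10 to keep it quick. Here the 8 parts run one after another in a single session; in a real batch run each part would be its own R session.

```r
# Hypothetical stand-in for one replicate of one harvest strategy.
run_replicate <- function(hs, iter) {
  set.seed(1000 * hs + iter)  # seed depends only on (hs, iter), not on run order
  data.frame(HS = hs, iter = iter, catch = rnorm(1, mean = 100 + hs))
}

# One "part" of the job: all replicates for one harvest strategy,
# stored in that part's own results file.
run_part <- function(hs, reps, outdir) {
  res <- do.call(rbind, lapply(seq_len(reps), function(i) run_replicate(hs, i)))
  write.csv(res, file.path(outdir, sprintf("results_HS%d.csv", hs)),
            row.names = FALSE)
  invisible(res)
}

outdir <- tempfile("simab_")
dir.create(outdir)

# In a real batch run each of these 8 calls would be a separate R session.
for (hs in 1:8) run_part(hs, reps = 10, outdir = outdir)

# Combine step: read every per-part file back and stack the rows.
files <- list.files(outdir, pattern = "^results_HS[0-9]+\\.csv$",
                    full.names = TRUE)
combined <- do.call(rbind, lapply(files, read.csv))

# Tabulate: mean catch per harvest strategy.
summary_table <- aggregate(catch ~ HS, data = combined, FUN = mean)
print(summary_table)
```

Because each part writes only to its own file, the parts never contend for a shared output, and the combine step can be rerun at any time on whatever files have been produced so far.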

  6. The R program

  7. batchsimab.R
     • Inputs: setwd, resultdir, read in data, source("Constants"), source("run_specification"), source("Lots of Functions")
     • Outputs: write to csv file(s), write to Rdata files, plots to tiff/pdf/etc
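A minimal sketch of how a batch script with that shape might be organised is given below. Everything named here is an assumption for illustration: `run_batch()`, the output file stems, and the placeholder "simulation" are not from the presentation. Only `reps` and `LML` are variable names that appear in the run specification on the later slides.

```r
# Hypothetical skeleton of a batch script: read a run specification,
# run a placeholder analysis, then write csv, RData, and pdf outputs.
run_batch <- function(spec_file, resultdir) {
  spec <- new.env()
  source(spec_file, local = spec)   # evaluate the specification in its own environment

  reps <- get("reps", envir = spec)
  LML  <- get("LML",  envir = spec)

  # Placeholder for the real simulation (assumption, not the author's model).
  set.seed(LML)
  results <- data.frame(rep = seq_len(reps), LML = LML, value = runif(reps))

  dir.create(resultdir, showWarnings = FALSE, recursive = TRUE)
  stem <- file.path(resultdir, paste0("run_LML", LML))

  write.csv(results, paste0(stem, ".csv"), row.names = FALSE)  # csv file
  save(results, file = paste0(stem, ".RData"))                 # RData file
  pdf(paste0(stem, ".pdf"))                                    # plot to pdf
  hist(results$value, main = paste("LML", LML), xlab = "value")
  dev.off()

  invisible(stem)
}

# Usage with a throwaway specification file.
spec_file <- tempfile(fileext = ".R")
writeLines(c("reps <- 100", "LML <- 138"), spec_file)
stem <- run_batch(spec_file, resultdir = tempfile("results_"))
```

Putting the run parameters in the output file names means several batch jobs can write into the same results directory without overwriting each other.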

  8. Top Level: runbatch.R contains:
     ## SET PARAMETERS AS DESIRED IN
     ## runspecification.R and constants.R
     wkdir <- "C:/A_CSIRO/Rcode/abalone/SimAb"
     setwd(wkdir)   ## points to directory containing batchsimab.R
     command <- "R.exe --vanilla < batchsimab.R"
     shell(command, wait=FALSE)   ## Windows only; R.exe must be on the path
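`shell()` exists only in R for Windows. As a hedged alternative (not the method used in the slides), the same launch can be written with `system2()` and the `Rscript` executable located through `R.home("bin")`, which also avoids depending on the PATH. The demo script and marker file below are hypothetical, and `wait = TRUE` is used only so the result can be checked straight away; `wait = FALSE` gives the fire-and-forget behaviour shown on the slide.

```r
# Hypothetical demo script: writes a marker file named by its first argument.
script <- tempfile(fileext = ".R")
marker <- tempfile(fileext = ".txt")
writeLines(c(
  "args <- commandArgs(trailingOnly = TRUE)",
  "writeLines('done', args[1])"
), script)

# Locate Rscript next to the running R rather than relying on the PATH.
# (Assumption: on Windows the bare name resolves to Rscript.exe.)
rscript <- file.path(R.home("bin"), "Rscript")

# Launch a fresh R session; shQuote() protects paths that contain spaces.
status <- system2(rscript,
                  c("--vanilla", shQuote(script), shQuote(marker)),
                  wait = TRUE)
```

With `wait = TRUE`, `system2()` returns the exit status of the child process, so a non-zero `status` signals a failed run.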

  9. Top Level: runbatch.R contains:
     ## SET PARAMETERS AS DESIRED IN
     ## RunSpecification.R and constants.R
     primaryloop <- c(val1, val2, val3,..)
     for (toplevel in 1:length(primaryloop)) {
       sink("RunSpecification.R")   ## can re-write values in RunSpecification.R
       … …
       sink()
       command <- "R.exe --vanilla < batchsimab.R"
       shell(command, wait=FALSE)
     }

  10. pickLML <- c(127,132,138,145)
      for (pick in seq_along(pickLML)) {
        filename <- "alt_runspecification.r"
        sink(filename)   ## redirect cat() output into the file
        cat("batch <- TRUE \n")
        cat("##Select the HCR \n")
        cat("StepH <- FALSE \n")
        cat("ConstH <- TRUE \n")
        cat("## Define the Scenarios \n")
        cat("initDepl_L <- c(0.7) \n")
        cat("inH_L <- c(0.1) \n")
        cat("origTAC <- 150.0 \n")
        cat(paste("LML <- ",pickLML[pick],sep="") ," \n")
        cat("reps <- 100 \n")
        sink()   ## stop redirecting
        command <- "R.exe --vanilla < batchsimab.R"
        shell(command, wait=FALSE)   ## Windows only; R.exe must be on the path
        Sys.sleep(5.0)   ## pause 5 seconds before the next pass rewrites the file
      }
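One possible variation, sketched here as an assumption rather than as the author's approach, is to write a separate specification file for each value of `LML` with `writeLines()`. The helper `write_runspec()` and the file naming are hypothetical; the variable names and values come from the slides. Separate files mean no launched job depends on a shared file that is about to be rewritten, which is presumably what the `Sys.sleep(5.0)` guards against.

```r
# Hypothetical helper: write one run-specification file for a given LML.
write_runspec <- function(filename, LML, reps = 100) {
  writeLines(c(
    "batch <- TRUE",
    "## Select the HCR",
    "StepH <- FALSE",
    "ConstH <- TRUE",
    "## Define the Scenarios",
    "initDepl_L <- c(0.7)",
    "inH_L <- c(0.1)",
    "origTAC <- 150.0",
    paste0("LML <- ", LML),
    paste0("reps <- ", reps)
  ), filename)
  invisible(filename)
}

pickLML <- c(127, 132, 138, 145)
specdir <- tempfile("specs_")
dir.create(specdir)

# One file per scenario, named after the LML value it contains.
spec_files <- vapply(pickLML, function(lml) {
  write_runspec(file.path(specdir, paste0("runspec_LML", lml, ".R")), lml)
}, character(1))

# Check one file by sourcing it into a scratch environment.
check <- new.env()
source(spec_files[3], local = check)
```

Each batch job would then need to be told which file to read, for example through a command-line argument picked up with `commandArgs(trailingOnly = TRUE)`.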

  11. alt_runspecification.r - contents
      batch <- TRUE
      ##Select the HCR
      StepH <- FALSE
      ConstH <- TRUE
      ## Define the Scenarios
      initDepl_L <- c(0.7)
      inH_L <- c(0.1)
      origTAC <- 150.0
      LML <- 138
      reps <- 100

  12. Alternative Approach: not that useful for coarsely parallel problems, but excellent for finely parallel processes.

  13. Alternative Approaches
      • Can use one’s own desktop or laptop.
      • Can use a secondary machine (remote login).
      • Can use a CSIRO cluster machine (bragg for Linux or bragg-w for Windows, plus others).
      • Clusters are very effective for finely parallel work but less so for coarsely parallel jobs.
      • Can use Condor, which automatically harvests CPU time on remote machines on the network.
      • wiki.csiro.au/display/ASC/Scientific+Computing+Homepage

  14. Conclusion
      • Batch jobs provide a solution for completing certain types of task.
      • If you are using computer intensive methods then you might gain greatly from using coarsely parallel methods.
      • The trade-off between the benefits and the set-up time and post-run processing determines when it becomes sensible to use coarsely parallel methods.
      • There is invariably more than one way to do the same thing:
      • https://wiki.csiro.au/display/ASC/Scientific+Computing+Homepage

  15. Thank you. CSIRO Marine and Atmospheric Research. Malcolm Haddon. tel. 61 3 6232 5097, email. Malcolm.Haddon@csiro.au, web. www.csiro.au. Wealth from Oceans National Research Flagship.

  16. Adding R.exe to the Path
      • Control Panel
      • System
      • Advanced System Settings
      • Environment Variables
      • PATH - edit
      • Paste “;C:\Program Files\R\R-3.1.0\bin\x64” onto the end of the present PATH and exit (adjust the folder to match the installed R version).
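Before editing the PATH, it can help to check from inside R whether the executable is already found and where the running installation keeps its binaries. The sketch below uses only base R functions; the messages are illustrative.

```r
# Full path of "R" as found through the PATH; "" if it is not found.
on_path <- Sys.which("R")

# Directory holding the executables of the R session running this code.
r_bin <- R.home("bin")

if (nzchar(on_path)) {
  message("R found on the PATH at: ", on_path)
} else {
  message("R is not on the PATH; the directory to add is: ", r_bin)
}

# The individual PATH entries, split on the platform's separator
# (";" on Windows, ":" elsewhere).
path_dirs <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
```

`R.home("bin")` reports the location for whichever R version is running, so it does not need to be updated when R is upgraded.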
