Running Batch Jobs in R: How to deal with coarsely parallel problems
Malcolm Haddon
May 2014
Wealth from Oceans National Research Flagship
Computer Intensive
• Many, many, many iterations:
  • Management Strategy Evaluation
  • Markov chain Monte Carlo (MCMC)
  • Lots of replicates of any analyses
• Large scale simulations:
  • multi-species,
  • multi-populations,
  • multi-'etc'
• Any computing job that takes a long time or uses a lot of computing resources
| Batch Jobs in R | Haddon
Why the Fuss?
• Solving BIG computing problems has its own strategies.
• If a job:
  • takes a very long time, or
  • uses very large amounts of RAM
• Then how can it be split up most effectively?
  • Depends on the scale at which processes are independent.
  • May need trials to find the best compromise.
Coarsely Parallel Processes
• Not talking about finely parallel processes such as cellular models in oceanography or visualization.
  • GPUs containing thousands of small processors are ideally suited to such analyses.
  • Some emphasis on this with the CSIRO clusters (Bragg, etc.) and the Advanced Scientific Computing program.
• Instead: focussed on serial and sequential problems where analysis order is important.
  • Population processes
  • Many biological processes
• Cannot split up time-series trajectories, but can treat each trajectory as a different process (coarsely parallel).
Alternative Approaches to Simulation
Task: apply 8 Harvest Strategies to an abalone fishery over 40 years with 1000 replicates (8 x 1000).
• As a single job: for (HS in 1:8) { for (iter in 1:1000) { } }, then store results, then plot and tabulate results.
• Split the job into 8 parts: each part runs for (iter in 1:1000) { } and stores its own results; then combine, then plot and tabulate results.
Final box in the diagram: Next Steps
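The split version of the diagram can be sketched in R roughly as follows. This is only an illustration under stated assumptions: `run_strategy()` is a hypothetical stand-in for the real simulation (it just draws random numbers), the result file names are invented, and the eight parts are run one after another here, whereas in a batch run each part would be its own R session.

```r
## Hypothetical stand-in for one harvest strategy's simulation:
## returns one row of made-up output per replicate.
run_strategy <- function(HS, reps = 1000) {
  set.seed(HS)  # reproducible toy numbers
  data.frame(HS = HS, iter = seq_len(reps),
             catch = rnorm(reps, mean = 100 + HS))
}

resultdir <- file.path(tempdir(), "split_demo")  # assumed results directory
dir.create(resultdir, showWarnings = FALSE)

## Each of the 8 parts stores its own results (in a real batch run each
## part would be a separate R process writing its own file).
for (HS in 1:8) {
  res <- run_strategy(HS, reps = 1000)
  saveRDS(res, file.path(resultdir, sprintf("results_HS%d.rds", HS)))
}

## Combine step: read every part back and stack the rows.
files <- list.files(resultdir, pattern = "^results_HS[0-9]+\\.rds$",
                    full.names = TRUE)
combined <- do.call(rbind, lapply(files, readRDS))

## Tabulate: mean of the toy output by harvest strategy.
summary_table <- aggregate(catch ~ HS, data = combined, FUN = mean)
```

Writing one file per part means no two jobs ever write to the same file, and the combine step only has to run once all parts have finished.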
The R program
• Set-up: setwd, resultdir, read in Data
• batchsimab.r:
  • source("Constants")
  • source("run_specification")
  • source("Lots of Functions")
• Outputs:
  • write to csv file(s)
  • write to Rdata files
  • plots to tiff/pdf/etc
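The three kinds of output named on this slide might look something like the sketch below. Everything specific in it is an assumption made for illustration: the `resultdir` location, the toy `results` data frame, and the file names are not taken from batchsimab.r.

```r
## Assumed names throughout: resultdir, 'results' and the file names
## are illustrative and not taken from the original program.
resultdir <- file.path(tempdir(), "batch_results")
dir.create(resultdir, showWarnings = FALSE, recursive = TRUE)

## Toy results standing in for the simulation output.
results <- data.frame(year = 1:40, catch = 150 * exp(-0.01 * (1:40)))

## write to csv file(s)
csvfile <- file.path(resultdir, "results.csv")
write.csv(results, csvfile, row.names = FALSE)

## write to Rdata files
rdatafile <- file.path(resultdir, "results.RData")
save(results, file = rdatafile)

## plots to pdf (pdf() is used here because it needs no display,
## which suits a job running unattended)
plotfile <- file.path(resultdir, "results.pdf")
pdf(plotfile)
plot(results$year, results$catch, type = "l", xlab = "Year", ylab = "Catch")
invisible(dev.off())
```

Because a batch job has nobody watching it, everything worth keeping has to end up in a file; nothing printed or plotted to the screen survives the run.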
Top Level: runbatch.R contains:
## SET PARAMETERS AS DESIRED IN
## runspecification.R and constants.R
wkdir <- "C:/A_CSIRO/Rcode/abalone/SimAb"
setwd(wkdir)  ## points to directory containing batchsimab.R
command <- "R.exe --vanilla < batchsimab.R"
shell(command, wait=FALSE)  ## (R.exe must be on the path)
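`shell()` exists only in R for Windows. A variation that should also work on other platforms, and that does not need R.exe on the path, is to call `system2()` on the Rscript program found through `R.home("bin")`. The sketch below is one possible way to do that, not the approach used in the slides; `launch_batch()` and the throwaway demonstration script are hypothetical names.

```r
## Hypothetical helper: start an R script in a separate R process.
## Rscript is located through R.home("bin"), so the PATH is not needed.
launch_batch <- function(script, args = character(), wait = FALSE) {
  rscript <- file.path(R.home("bin"), "Rscript")
  system2(rscript, c("--vanilla", shQuote(script), args), wait = wait)
}

## Small self-contained demonstration with a throwaway script
## that just writes a marker file.
demo_script <- file.path(tempdir(), "demo_batch.R")
demo_output <- file.path(tempdir(), "demo_batch_done.txt")
writeLines(sprintf("writeLines('finished', %s)", deparse(demo_output)),
           demo_script)

## wait = TRUE only so the demonstration can check the output;
## a batch launcher would normally use wait = FALSE.
status <- launch_batch(demo_script, wait = TRUE)
```

With `wait = FALSE` the call returns immediately, so several jobs can be started in a row and left to run side by side.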
Top Level: runbatch.R contains (looped version):
## SET PARAMETERS AS DESIRED IN
## RunSpecification.R and constants.R
primaryloop <- c(val1, val2, val3,..)
for (toplevel in 1:length(primaryloop)) {
  sink("RunSpecification.R")
  …
  …
  sink()
  command <- "R.exe --vanilla < batchsimab.R"
  shell(command, wait=FALSE)
}
## Can re-write values in RunSpecification.R
pickLML <- c(127,132,138,145)
for (pick in 1:length(pickLML)) {
  filename <- "alt_runspecification.r"
  sink(filename)   ## redirect cat() output into the specification file
  cat("##Select the HCR \n")
  cat("StepH <- FALSE \n")
  cat("ConstH <- TRUE \n")
  cat("## Define the Scenarios \n")
  cat("initDepl_L <- c(0.7) \n")
  cat("inH_L <- c(0.1) \n")
  cat("origTAC <- 150.0 \n")
  cat(paste("LML <- ",pickLML[pick],sep="") ," \n")
  cat("reps <- 100 \n")
  sink()           ## close the file
  command <- "R.exe --vanilla < batchsimab.R"
  shell(command, wait=FALSE)   ## start the job and carry on
  Sys.sleep(5.0)   ## pause before the file is rewritten on the next pass
}
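The loop above reuses a single file name, and the pause presumably gives each new job time to read the file before it is overwritten. One alternative, sketched here as an assumption rather than as the method in the slides, is to write a separate specification file per value so that timing no longer matters. `write_spec()` and the file naming pattern are hypothetical, and batchsimab.R would need some way to learn which file to source (not shown).

```r
## Hypothetical helper: write one run specification file for a given LML.
write_spec <- function(lml, dir = tempdir()) {
  filename <- file.path(dir, sprintf("alt_runspecification_LML%d.r", lml))
  writeLines(c(
    "##Select the HCR",
    "StepH <- FALSE",
    "ConstH <- TRUE",
    "## Define the Scenarios",
    "initDepl_L <- c(0.7)",
    "inH_L <- c(0.1)",
    "origTAC <- 150.0",
    paste0("LML <- ", lml),
    "reps <- 100"
  ), filename)
  filename
}

pickLML <- c(127, 132, 138, 145)
specfiles <- vapply(pickLML, write_spec, character(1))

## Check one file by sourcing it into its own environment.
spec <- new.env()
sys.source(specfiles[3], envir = spec)
```

`writeLines()` writes the whole file in one call, which avoids leaving `sink()` open if an error occurs partway through.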
alt_runspecification.r contents:
batch <- TRUE
##Select the HCR
StepH <- FALSE
ConstH <- TRUE
## Define the Scenarios
initDepl_L <- c(0.7)
inH_L <- c(0.1)
origTAC <- 150.0
LML <- 138
reps <- 100
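Since the specification file is ordinary R code, the batch script can read it with `source()`. A slightly more defensive variation, offered here only as a sketch, reads the file into its own environment and checks that the expected settings are present before a long run starts. `read_spec()` and its list of required names are assumptions based on the contents shown on the slide.

```r
## Hypothetical helper: read a run specification into a list and
## check that the expected settings are present.
read_spec <- function(filename,
                      required = c("StepH", "ConstH", "initDepl_L",
                                   "inH_L", "origTAC", "LML", "reps")) {
  env <- new.env()
  sys.source(filename, envir = env)
  absent <- setdiff(required, ls(env))
  if (length(absent) > 0) {
    stop("run specification is missing: ", paste(absent, collapse = ", "))
  }
  as.list(env)
}

## Demonstration using the contents listed on the slide.
specfile <- file.path(tempdir(), "alt_runspecification.r")
writeLines(c(
  "batch <- TRUE", "##Select the HCR", "StepH <- FALSE", "ConstH <- TRUE",
  "## Define the Scenarios", "initDepl_L <- c(0.7)", "inH_L <- c(0.1)",
  "origTAC <- 150.0", "LML <- 138", "reps <- 100"
), specfile)
spec <- read_spec(specfile)
```

Failing early on a missing setting is cheaper than discovering the problem after an unattended job has been running for hours.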
Alternative Approach
Not that useful for coarsely parallel problems, but excellent for finely parallel processes.
Alternative Approaches
• Can use one's own desktop or laptop.
• Can use a secondary machine (remote login).
• Can use a CSIRO cluster machine (bragg for Linux or bragg-w for Windows, plus others).
• Clusters are very effective for finely parallel work but less so for coarsely parallel jobs.
• Can use Condor, which automatically harvests CPU time on remote machines on the network.
• wiki.csiro.au/display/ASC/Scientific+Computing+Homepage
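For the desktop or laptop case, one further option that the slides do not cover is the `parallel` package that ships with R. The sketch below is a minimal illustration under stated assumptions: `run_strategy()` is a hypothetical stand-in for a real simulation, and two cores are assumed to be available. `mclapply()` works by forking, so on Windows it can only run on one core; `parLapply()` with a cluster from `makeCluster()` is the usual substitute there.

```r
library(parallel)

## Hypothetical stand-in for one harvest strategy's replicates.
run_strategy <- function(HS, reps = 100) {
  set.seed(HS)  # reproducible toy numbers
  mean(rnorm(reps, mean = 100 + HS))
}

## mclapply() forks the R session, which Windows does not support,
## so fall back to a single core there.
ncores <- if (.Platform$OS.type == "windows") 1L else 2L

## One list element per harvest strategy, each computed independently.
results <- mclapply(1:8, run_strategy, mc.cores = ncores)
means <- unlist(results)
```

This keeps everything inside one R session, at the cost of tying the work to a single machine.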
Conclusion
• The use of batch jobs provides a solution for completing certain types of task.
• If you are using computer intensive methods then you might gain greatly from using coarsely parallel methods.
• The trade-off between the benefits and the set-up time and post-run processing determines when it becomes sensible to use coarsely parallel methods.
• Invariably more than one way exists to do the same thing:
  • https://wiki.csiro.au/display/ASC/Scientific+Computing+Homepage
Thank you
CSIRO Marine and Atmospheric Research
Malcolm Haddon
tel. 61 3 6232 5097
email. Malcolm.Haddon@csiro.au
web. www.csiro.au
Wealth from Oceans National Research Flagship
Adding R.exe to the Path
• Control Panel
• System
• Advanced System Settings
• Environment Variables
• PATH: edit
• Paste ";C:/Program Files/R/R-3.1.0/bin/x64" (adjusted to the installed R version) onto the end of the present PATH and exit.
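Whether the change has taken effect can be checked from a newly started R session. This short sketch uses only base R functions; the variable names are illustrative.

```r
## Sys.which() returns "" when the program is not found on the PATH.
r_on_path <- unname(nzchar(Sys.which("R")))

## The running session's own bin directory, which can be used to build
## a full path to R or Rscript whether or not R is on the PATH.
r_bin <- R.home("bin")
```

If `r_on_path` is `FALSE`, the PATH entry is missing or mistyped, or R was started before the PATH was edited.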