
Harnessing the Power of High Performance Computing for R

Learn about shared memory vs. distributed memory in R, optimizing R packages, compiled code interfaces, large memory data, and parallelism. Explore commands, vectors, lists, sequences, and matrices in R with examples and tips.

Presentation Transcript


  1. Harnessing the Power of High Performance Computing for R
     Zhiyong Zhang, Stanford Research Computing Center
     zyzhang@stanford.edu | research-computing-support@stanford.edu

  2. Shared memory vs. distributed memory
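     In the shared-memory model, workers on one node address the same RAM (for example, forked R processes); in the distributed-memory model, each worker owns its RAM and data must be shipped over the network. A minimal sketch of the two models with the base parallel package (the fork cluster assumes a Unix-alike; the object and worker counts are illustrative):
     library(parallel)
     ## Shared memory: forked workers on one node inherit the parent's objects.
     cl_fork <- makeForkCluster(4)
     parSapply(cl_fork, 1:8, function(i) i^2)
     stopCluster(cl_fork)
     ## Distributed memory: socket workers start as fresh R processes,
     ## so objects must be exported to them explicitly.
     big_table <- data.frame(x = 1:10)        # illustrative object
     cl_sock <- makePSOCKcluster(4)
     clusterExport(cl_sock, "big_table")
     parSapply(cl_sock, 1:8, function(i) i^2 + nrow(big_table))
     stopCluster(cl_sock)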

  3. R: HPC matters. Most R packages use a single core. Topics:
     - Vectorization and other optimizations in R
     - Interfaces for compiled code
     - Profiling tools: memory and CPU
     - Large memory and out-of-memory data
     - Parallelism for R:
       - Threads or shared memory, limited to one node: implicit (minor changes required, if any) or explicit (some assembly required)
       - MPI/sockets or distributed memory, one or more nodes: explicit only
     - R with accelerators: Xeon Phi coprocessors, GPU accelerators

  4. R in parallel: R basics
     Startup files and library paths: .Rprofile/.Renviron, file.path(getwd(), ".Rprofile"), R_LIBS, $R_HOME, $R_HOME_DIR, $R_LIBS_USER. A sketch of the startup files follows below.
     [zyzhang@sherlock-ln02 ~]$ R RHOME
     /share/sw/free/R/3.2.0/lib64/R
     [zyzhang@sherlock-ln02 ~]$ Rscript script_name.R
     [zyzhang@sherlock-ln02 ~]$ R CMD BATCH script_name.R
     [zyzhang@sherlock-ln02 ~]$ R
     > source("script_name.R")
     > install.packages("pkg")
     > remove.packages("pkg")
     > help(solve)
     > example(solve)
     > ?<command-name> or ??<command-name>
     > ls(pos = "package:Rmpi")
     > .packages()
     > path.package("Rmpi")
     > installed.packages()
     > .libPaths()
     > .Library
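     As an illustration (the paths and values are hypothetical), a personal ~/.Renviron and ~/.Rprofile might contain:
     # ~/.Renviron: environment variables read at startup
     R_LIBS_USER=~/R/x86_64-unknown-linux-gnu-library/3.2
     # ~/.Rprofile: R code run at startup
     options(repos = c(CRAN = "https://cran.r-project.org"))
     .libPaths()   # confirm the personal library is on the search path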

  5. Useful R commands and variables
     help(fctn) displays help on any function fctn, as in Python.
     To invoke complex arithmetic, add 0i to a number. For example, sqrt(-1) returns NaN, but sqrt(-1 + 0i) returns 0 + 1i.
     sessionInfo() prints the R version, OS, packages loaded, etc.
     ls() shows which objects are defined; rm(list=ls()) clears all defined objects.
     dev.new() opens a new plotting window.
     The function sort() does not change its argument.
     Distribution function prefixes d, p, q, r stand for density (PDF), probability (CDF), quantile (inverse CDF), and random sample. dnorm is the density function of a normal random variable and rnorm generates a sample from a normal random variable; dunif and runif are the corresponding functions for a uniform random variable.
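     A quick demonstration of the four prefixes for the normal distribution (values shown are approximate):
     dnorm(0)        # density (PDF) at 0: 0.3989
     pnorm(1.96)     # probability (CDF): P(Z <= 1.96) = 0.975
     qnorm(0.975)    # quantile (inverse CDF): 1.96
     rnorm(3)        # three random draws from N(0, 1)
     runif(3)        # three random draws from Uniform(0, 1)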

  6. Useful R commands and variables
     Assignment: e <- m*c^2 or m*c^2 -> e. You must use = for default function arguments or named arguments.
     Not reserved, but better treated as if they were: c, q, s, t, C, D, F, I, and T.
     Comments begin with # and continue to the end of the line, as in Python or Perl.
     Missing values and NaN: NaN ("not a number") results from overflows and undefined operations such as 0/0. NA ("not applicable") marks missing data or a NULL value. NA is a legal value inside an R vector, and vectors received by R functions may contain missing components, so R functions must handle NA values.
     is.nan returns TRUE for NaN; is.na returns TRUE for NA or NaN.
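     A short sketch of how NA and NaN behave in practice:
     x <- c(1, NA, 0/0, 3)     # 0/0 produces NaN
     is.na(x)                  # FALSE TRUE TRUE FALSE: TRUE for NA and NaN
     is.nan(x)                 # FALSE FALSE TRUE FALSE: TRUE only for NaN
     sum(x)                    # NA: functions must handle missing values
     sum(x, na.rm = TRUE)      # 4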

  7. Useful R commands and variables
     Vectors
     A vector in R is a container vector, an ordered collection of numbers; its length is the number of elements in the container. Operations are applied componentwise.
     R allows adding two vectors of different lengths: elements of the shorter summand are recycled as often as necessary to create a vector the length of the longer summand. Scalars are vectors of length 1.
     Vectors are created using the c function: p <- c(2,3,5,7). Vectors in R are indexed starting with 1, and matrices are stored in column-major order.
     Elements of a vector are accessed using []. Vectors expand when assigning to indexes past the end of the vector.
     Negative indices: x[-n] returns a copy of x with the nth element removed. Boolean values can also be used as indices.
     There are five kinds of subscripts in R: positive integers, negative integers, logical values, character names, and the empty subscript.
     The type of a vector is the type of the elements it contains; all elements of a vector must have the same underlying type: logical, integer, double, complex, character, or raw.
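     For example:
     p <- c(2, 3, 5, 7)
     p + c(10, 20)        # recycling: 12 23 15 27
     p[2]                 # 3 (indexing starts at 1)
     p[-1]                # copy with the first element removed: 3 5 7
     p[c(TRUE, FALSE, TRUE, FALSE)]   # logical subscript: 2 5
     p[6] <- 13; p        # expands, filling the gap: 2 3 5 7 NA 13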

  8. Useful R commands and variables
     Lists
     Lists are like vectors, except elements need not all have the same type: one element may be an integer, another a string, another a vector of Boolean values.
     Lists are created using the list function. Elements can be accessed by position using [[]]; named elements may be accessed either by position or by name.
     Named elements of lists act like C structs, except a dollar sign rather than a dot is used to access elements: after a <- list(name="Joe", 4, foo=c(3,8,9)), both a[[1]] and a$name equal the string "Joe".
     Accessing a non-existent element of a list, say a[[4]] above, generates an error, but assignment to a non-existent element extends the list. If the index assigned to is more than one past the end of the list, intermediate elements are created and assigned NULL values. Assignment to a non-existent named field, such as a$baz <- TRUE, extends the list as well. See the sketch below.
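     A minimal sketch of list extension:
     a <- list(name = "Joe", 4, foo = c(3, 8, 9))
     # a[[4]]         # error: subscript out of bounds
     a$baz <- TRUE    # extends the list to 4 elements
     a[[7]] <- "end"  # elements 5 and 6 are created and set to NULL
     length(a)        # 7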

  9. Useful R commands and variables
     Sequences
     seq(a, b, n) creates a sequence from a to b in steps of size n; seq(a, b, length=n) creates a sequence with n points and step size (b-a)/(n-1). For example, seq(1, 10, 3) returns the vector containing 1, 4, 7, and 10. The step size argument defaults to 1 in both R and Python, and a:b is an abbreviation for seq(a, b, 1).
     Matrices
     Matrices in R are vectors with multiple dimensions: m <- array(c(1,2,3,4,5,6), dim=c(2,3)) creates a 2x3 matrix m. R fills matrices by column, so the first row of m has elements 1, 3, and 5. To fill m by row, use m <- matrix(c(1,2,3,4,5,6), nrow=2, byrow=TRUE).

  10. Useful R commands and variables
     Type conversion functions
     as.xxx converts its argument to type xxx: as.integer(3.2) returns the integer 3, and as.character(3.2) returns the string "3.2".
     Boolean operators
     T or TRUE for true values; F or FALSE for false values.
     & and | apply element-wise on vectors. && and || are often used in conditional statements and use lazy (short-circuit) evaluation as in C: the operators will not evaluate their second argument if the return value is determined by the first argument.
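     A quick illustration of the element-wise vs. short-circuit forms:
     c(TRUE, FALSE) & c(TRUE, TRUE)    # element-wise: TRUE FALSE
     x <- numeric(0)
     if (length(x) > 0 && x[1] > 0) "positive" else "empty"
     # && never evaluates x[1] here, so the empty vector causes no error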

  11. Useful R commands and variables
     Functions
     f <- function(a=10, b) { return (a+b) }
     The keyword function returns a function object, which here is assigned to f; used without an assignment, it creates an anonymous function (a lambda expression).
     return is a function and its argument must be contained in parentheses. The use of return is optional; otherwise the value of the last line executed is returned.
     A default value need not be a static value but can be a function of other arguments.
     Arguments are passed by value. To get the effect of passing by reference, either (1) assign to non-local variables in the calling routine's scope (<<-), or (2) pass in all needed values and return a list as output.
     Scope: R uses lexical scoping while S-PLUS uses static scope. Since variables cannot be declared (they pop into existence on first assignment), it is not always easy to determine the scope of a variable: you cannot tell just by looking at the source code of a function whether a variable is local to that function. See the closure sketch below.
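     A minimal closure sketch illustrating lexical scoping and non-local assignment:
     make_counter <- function() {
       count <- 0               # lives in the enclosing environment
       function() {
         count <<- count + 1    # <<- assigns in the enclosing scope
         count
       }
     }
     counter <- make_counter()
     counter()   # 1
     counter()   # 2: the closure retains 'count' between calls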

  12. R: memory profiling
     http://adv-r.had.co.nz/memory.html
     https://cran.r-project.org/doc/manuals/R-exts.html#Profiling-R-code-for-memory-use
     Memory management: predict how much memory is needed and make the most of the memory available; write faster code, because accidental copies are a major cause of slow code.
     Basics of memory management: objects, functions, larger blocks of code.
     Size of an object: the base size of each object is 40 bytes: metadata, two pointers to the previous and next objects, one pointer to attributes, the length, and padding. R allocates small vectors in sizes of 8, 16, 32, 48, 64, or 128 bytes; beyond 128 bytes, R asks for memory in multiples of 8 bytes.
     Shared objects are stored once; unique strings are stored in one place (the global string pool).

  13. R: memory profiling
     # install.packages("ggplot2"); install.packages("pryr")
     require(ggplot2); require(pryr)
     sizes <- sapply(0:50, function(n) object_size(seq_len(n)))
     plot(0:50, sizes, xlab = "Length", ylab = "Size (bytes)", type = "s")
     plot(0:50, sizes - 40, xlab = "Length", ylab = "Bytes excluding overhead", type = "n")
     abline(h = 0, col = "grey80")
     abline(h = c(8, 16, 32, 48, 64, 128), col = "grey80")
     abline(a = 0, b = 4, col = "grey90", lwd = 4)
     lines(sizes - 40, type = "s")
     x1 <- 1:1e6
     y1 <- list(1:1e6, 1:1e6, 1:1e6)
     object_size(x1); object_size(y1)   # shared objects are stored once
     object_size("banana"); object_size(rep("banana", 10))   # global string pool

  14. R: memory profiling
     mem_used(); mem_change()
     mem_change(x <- 1:1e6); mem_change(rm(x)); mem_change(NULL)
     gc()   # R does garbage collection automatically
     Memory leaks: formulas and closures capture the environment they were created in.
     f1 <- function() { x <- 1:1e6; 10 }
     mem_change(x <- f1()); object_size(x)
     f2 <- function() { x <- 1:1e6; a ~ b }
     mem_change(y <- f2()); object_size(y)
     f3 <- function() { x <- 1:1e6; function() 10 }
     mem_change(z <- f3()); object_size(z)

  15. R: memory profiling with lineprof
     require(devtools); require(lineprof)   # devtools::install_github("hadley/lineprof")
     read_delim <- function(file, header = TRUE, sep = ",") {
       first <- scan(file, what = character(1), nlines = 1, sep = sep, quiet = TRUE)
       p <- length(first)
       all <- scan(file, what = as.list(rep("character", p)), sep = sep,
                   skip = if (header) 1 else 0, quiet = TRUE)
       all[] <- lapply(all, type.convert, as.is = TRUE)
       if (header) names(all) <- first else names(all) <- paste0("V", seq_along(all))
       as.data.frame(all)
     }
     library(ggplot2)
     write.csv(diamonds, "diamonds.csv", row.names = FALSE)
     source("read-delim.r")   # the file containing read_delim above
     prof <- lineprof(read_delim("diamonds.csv"))
     # shine(prof)
     prof

  16. Speed up R code: Rprof
     system.time({
       n = 10^7; data1 = rnorm(n); data2 = rnorm(n)
       for (i in 1:(n-1)) {
         data1[i] = data1[i] + data1[i+1]
         data1[i] = exp(data1[i]^2)
       }
       data1 = data1 * data2
       matrix.data1 = matrix(data1, nrow = 100, byrow = TRUE)
       matrix.data2 = matrix(data2, nrow = 100, byrow = TRUE)
       almost.final.result = matrix.data1 %*% t(matrix.data2)
       final.result = solve(almost.final.result)
     })[]
     Rprof("profiling.out")
     # ... run the same code again under the profiler ...
     Rprof()

  17. Speed up R code: Rprof
     > summaryRprof("profiling.out")
     $by.self
                 self.time self.pct total.time total.pct
     "rnorm"          1.36    35.79       1.36     35.79
     "%*%"            0.62    16.32       0.62     16.32
     "exp"            0.56    14.74       0.56     14.74
     "^"              0.40    10.53       0.40     10.53
     "+"              0.38    10.00       0.38     10.00
     "matrix"         0.28     7.37       0.28      7.37
     ":"              0.06     1.58       0.06      1.58
     "t.default"      0.06     1.58       0.06      1.58
     "-"              0.04     1.05       0.04      1.05
     "*"              0.04     1.05       0.04      1.05

  18. Speed up R code: Rprof
     line_RProf_function = function() {
       data1 = rnorm(10^7)
       for (i in 1:(length(data1)-1)) {
         data1[i] = data1[i] + data1[i+1]
         data1[i] = exp(data1[i]^2)
       }
       data2 = rnorm(10^7)
       data1 = data2 * data1
       matrix.data1 = matrix(data1, nrow = 100, byrow = TRUE)
       matrix.data2 = matrix(data2, nrow = 100, byrow = TRUE)
       almost.final.result = matrix.data1 %*% t(matrix.data2)
       final.result = solve(almost.final.result)
     }
     Rprof("profiling.out", line.profiling = TRUE)
     line_RProf_function()
     Rprof()

  19. Speed up R code: Rprof
     > summaryRprof("profiling.out", lines = "show")
     $by.self
                    self.time self.pct total.time total.pct
     #8                  0.72    30.00       0.72     30.00
     #14                 0.68    28.33       0.68     28.33
     #2                  0.68    28.33       0.68     28.33
     #12                 0.14     5.83       0.14      5.83
     #11                 0.12     5.00       0.12      5.00
     #9                  0.04     1.67       0.04      1.67
     <no location>       0.02     0.83       0.02      0.83

  20. How to speed up R code
     HDD/SSD/RAM/CPU/vectorization: avoid or minimize reading from and writing to disk!
     To measure execution time:
     time.start = proc.time()[[3]]
     ### Your code ###
     time.end = proc.time()[[3]]
     time.end - time.start
     system.time({### Your code ###})[[3]]
     > system.time({data = rnorm(10^7)})[]
     user.self  sys.self   elapsed user.child sys.child
         0.702     0.030     0.733      0.000     0.000
     Writing to disk dominates the cost:
     > system.time({data = rnorm(10^7); write.table(data, "data.txt")})[]
     user.self  sys.self   elapsed user.child sys.child
        16.433     0.469    16.951      0.000     0.000

  21. Speed up R code: vectors
     Measuring time:
     > ptm <- proc.time()
     > for (i in 1:50) mad(stats::runif(500))
     > proc.time() - ptm
        user  system elapsed
       0.007   0.000  19.832   # elapsed includes idle time at the prompt
     > system.time(for (i in 1:50) mad(stats::runif(500)))
        user  system elapsed
       0.006   0.001   0.006
     > n = 100
     > a = NULL
     > system.time(for (i in 1:n) a = c(a, sqrt(i)))
        user  system elapsed
       0.013   0.002   0.015
     > system.time(for (i in 1:n) a = c(a, stats::runif(500)))
        user  system elapsed
       0.043   0.005   0.049

  22. Speed up R code: vectors
     n = 1000
     a = NULL   # vector grows one concatenation at a time
     system.time(for (i in 1:n) {a = c(a, stats::runif(500))})
        user  system elapsed
       0.845   0.000   0.845
     n = 1000
     a = vector(length = n)   # vector declared, but still concatenating
     system.time(for (i in 1:n) {print(i); a = c(a, stats::runif(500))})
        user  system elapsed
       0.820   0.078   0.901
     system.time(for (i in 1:n) {a[i] = stats::runif(500)})   # assignment by index
        user  system elapsed
         0.1     0.0     0.1

  23. Speed up R code: vectors
     Repeating the growing-vector test without resetting a shows each run getting slower as a keeps growing:
     n = 1000
     a = NULL
     system.time(for (i in 1:n) {a = c(a, stats::runif(500))})
        user  system elapsed
       1.402   0.016   1.418
     # the same line run five more times, without resetting a:
        user  system elapsed
       5.831   0.510   6.341
       8.528   1.437   9.962
       9.961   6.251  16.208
      11.122   8.038  19.156
      12.188   9.717  21.902

  24. Speed up R code: vectors
     n = 1000000
     a = vector(length = n)
     n = 10000000              # n now exceeds length(a), so a must keep growing
     system.time(for (i in 1:n) a[i] = sqrt(i)*2)
     ^C
     Timing stopped at: 1170.446 283.232 1454.167
     n = 10000000
     a = vector(length = n)    # pre-allocated at the right size
     system.time(for (i in 1:n) a[i] = sqrt(i)*2)
        user  system elapsed
      12.017   0.039  12.055
     system.time({for (i in 1:n) a[i] = sqrt(i); a = a*2})
        user  system elapsed
      11.826   0.078  11.913
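     The fully vectorized form skips the interpreted loop altogether; a sketch (timing omitted since it depends on the machine, but whole-vector arithmetic is typically far faster than the loops above):
     n = 10000000
     system.time({a = sqrt(1:n) * 2})   # whole-vector arithmetic, no loop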

  25. Speed up R code: vectors
     m = 100000
     n = 10000
     a = matrix(nrow = m, ncol = n)
     rm(a)
     system.time({a = matrix(nrow = m, ncol = n)})
        user  system elapsed
       9.322   1.397  10.716
     system.time({vrow = vector(length = m)})
     system.time({vcol = vector(length = n)})
     # elementwise multiplication, row-first iteration
     system.time(for (i in 1:m) {for (j in 1:n) {a[i,j] = vrow[i]*vcol[j]}})
        user  system elapsed
     2026.609   2.099 2028.409
     > system.time(for (i in 1:m) {for (j in 1:n) {a[i,j] = vrow[i]*vcol[j]}})
        user  system elapsed
     2274.735   0.038 2274.618
     # vector operation, row-wise assignments
     system.time(for (i in 1:m) {a[i,] = vrow[i]*vcol})
        user  system elapsed
      23.076   1.964  25.037
     # vector operation, column-wise assignments
     system.time(for (i in 1:n) {a[,i] = vcol[i]*vrow})
        user  system elapsed
      15.504   2.086  17.614
     system.time(for (i in 1:n) {a[,i] = vcol[i]*vrow})
        user  system elapsed
      14.690   1.876  16.562
     # elementwise multiplication, column-first iteration
     system.time(for (j in 1:n) {for (i in 1:m) {a[i,j] = vrow[i]*vcol[j]}})
        user  system elapsed
     1842.184   0.004 1841.948
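     This computation is an outer product, so the whole matrix can also be built in a single vectorized call; a sketch (timing omitted, machine-dependent):
     system.time({a = outer(vrow, vcol)})   # equivalently: vrow %o% vcol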

  26. Speed up R code: vectors, matrices, and tables vs. loops, data frames, and lists
     Loops are, in general, more expensive, and so are data frames and lists.
     Memory/object management: ls(), rm(data), gc()
     system.time({
       a = NULL
       for (i in 1:10000) a = c(a, i^2)})[]
     user.self  sys.self   elapsed user.child sys.child
         0.118     0.036     0.155      0.000     0.000
     system.time({
       a = NULL
       for (i in 1:1000000) a = c(a, i^2)})[]
     user.self  sys.self   elapsed user.child sys.child
      1079.399    87.885  1167.065      0.000     0.000
     system.time({
       a = vector(length = 1000000)
       for (i in 1:1000000) a[i] = i^2})[]
     user.self  sys.self   elapsed user.child sys.child
         0.942     0.000     0.941      0.000     0.000

  27. Speed up R code: vectors, matrices, and tables
     Loops are, in general, more expensive, and so are data frames and lists.
     Memory/object management: ls(), rm(data), gc()
     system.time({
       a = vector(length = 1000000)
       for (i in 1:n) a[i] = sin(i)*2*pi})[]
     user.self  sys.self   elapsed user.child sys.child
         1.192     0.000     1.193      0.000     0.000
     system.time({
       twopi = 2*pi
       a = vector(length = 1000000)
       for (i in 1:n) a[i] = sin(i)*twopi})[]
     user.self  sys.self   elapsed user.child sys.child
         1.107     0.001     1.108      0.000     0.000
     system.time({
       a = vector(length = 1000000)
       for (i in 1:1000000) a[i] = sin(i)
       a = 2*pi*a})[]
     user.self  sys.self   elapsed user.child sys.child
         0.002     0.002     0.004      0.000     0.000

  28. How to speed up R code
     Working with vectors: apply is not inherently faster than an explicit loop.
     system.time({
       n = 100000; y = rnorm(n); w = vector(length = n)
       for (i in 1:n) w[i] = sum(y[1:i] < y[i])})[]
     user.self  sys.self   elapsed user.child sys.child
        51.176     5.776    56.937      0.000     0.000
     system.time({
       n = 100000; y = rnorm(n); w = matrix(1:n, nrow = n, ncol = 1)
       w = apply(w, 1, function(x) sum(y[1:x] < y[x]))})[]
     user.self  sys.self   elapsed user.child sys.child
        57.531     0.824    58.342      0.000     0.000

  29. Speed up R: C functions with R
     ksmooth1 <- function(data, xpts, h) {
       dens <- double(length(xpts))
       n <- length(data)
       for (i in 1:length(xpts)) {
         ksum <- 0
         for (j in 1:length(data)) {
           d <- xpts[i] - data[j]
           ksum <- ksum + dnorm(d / h)
         }
         dens[i] <- ksum / (n * h)
       }
       return(dens)
     }
     data = rchisq(500, 3)
     xpts = seq(0, 10, length = 10000)
     h = 0.75
     system.time({fx_estimated = ksmooth1(data, xpts, h)})[[3]]
     [1] 8.579

  30. Speed up R: C functions with R
     #include <R.h>
     #include <Rmath.h>
     void kernel_smooth(double *data, int *n, double *xpts, int *nxpts,
                        double *h, double *result)
     {
       int i, j;
       double d, ksum;
       for (i = 0; i < *nxpts; i++) {
         ksum = 0;
         for (j = 0; j < *n; j++) {
           d = xpts[i] - data[j];
           ksum += dnorm(d / *h, 0, 1, 0);
         }
         result[i] = ksum / ((*n) * (*h));
       }
     }
     R CMD SHLIB ksmooth1.c
     icc -std=gnu99 -I/share/sw/free/R/3.2.2/lib64/R/include -DNDEBUG -I/usr/local/include -fpic -O2 -march=native -c ksmooth1.c -o ksmooth1.o
     icc -std=gnu99 -shared -L/share/sw/free/R/3.2.2/lib64/R/lib -L/usr/local/lib64 -o ksmooth1.so ksmooth1.o -L/share/sw/free/R/3.2.2/lib64/R/lib -lR
     ls ksmooth1.*
     ksmooth1.c ksmooth1.o ksmooth1.so

  31. Speed up R: C functions with R
     dyn.load("ksmooth1.so")
     ksmooth2 <- function(data, xpts, h) {
       n <- length(data)
       nxpts <- length(xpts)
       dens <- .C("kernel_smooth", as.double(data), as.integer(n),
                  as.double(xpts), as.integer(nxpts), as.double(h),
                  result = double(length(xpts)))
       return(dens[[6]])   # the sixth .C argument holds the result
     }
     data = rchisq(500, 3)
     xpts = seq(0, 10, length = 10000)
     h = 0.75
     system.time({result = ksmooth2(data, xpts, h)})[[3]]
     [1] 0.187

  32. Speed up R: OpenMP with R (pnmath)
     $ wget http://homepage.stat.uiowa.edu/~luke/R/experimental/pnmath_0.0-4.tar.gz
     > .libPaths()
     [1] "/home/zyzhang/R/x86_64-unknown-linux-gnu-library/3.2"
     [2] "/share/sw/free/R/3.2.2/lib64/R/library"
     > install.packages("/scratch/users/zyzhang/usertests/R/Tutorial/pnmath/pnmath_0.0-4.tar.gz", repos=NULL)
     $ R CMD INSTALL -l /home/zyzhang/R/x86_64-unknown-linux-gnu-library/3.2 pnmath_0.0-4.tar.gz
     > library(pnmath)
     > ls(pos="package:pnmath")
     [1] "calibratePnmath"       "getNumPnmathThreads"  "getNumProcs"
     [4] "getParallelThresholds" "setNumPnmathThreads"
     > getNumProcs()
     [1] 16
     > getNumPnmathThreads()
     [1] 16
     > system.time({A=matrix(1:10^9, nrow=1000); B=tan(sin(cos(tan(A))))})[]

  33. Large memory and out-of-memory data
     biglm: incremental computations for lm() and glm() on data sets stored outside of R's main memory; see the sketch below.
     ff: file-based access to data sets that are too large to be loaded into memory.
     biglars: larger-than-memory datasets for least-angle regression, lasso, and stepwise regression.
     ffbase: adds basic statistical functionality to the ff package.
     bigmemory: large objects in memory (as well as via files) referred to by external pointers, with transparent access from R without bumping against R's internal memory limits. Several R processes on the same computer can also share big memory objects.
     database and database-like packages.
     HadoopStreaming: writing map/reduce scripts for use in Hadoop Streaming; it also facilitates operating on data in a streaming fashion, which does not require Hadoop.
     speedglm: fit (generalized) linear models to large data.
     bigrf: a Random Forests implementation for parallel execution and large memory.
     MonetDB.R: allows R to access the MonetDB column-oriented, open-source database system as a backend.
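     A minimal sketch of biglm's chunked fitting (synthetic data; the chunk sizes and model are illustrative):
     library(biglm)
     chunk1 <- data.frame(x = rnorm(1e5)); chunk1$y <- 2*chunk1$x + rnorm(1e5)
     fit <- biglm(y ~ x, data = chunk1)   # fit on the first chunk
     chunk2 <- data.frame(x = rnorm(1e5)); chunk2$y <- 2*chunk2$x + rnorm(1e5)
     fit <- update(fit, chunk2)           # fold in the next chunk incrementally
     summary(fit)                         # uses all rows seen so far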

  34. R in parallel: implicit parallelism
     pnmath: OpenMP directives make internal R math functions use multiple cores. pnmath0 uses Pthreads instead of OpenMP where newer compilers are not available. Similar functionality is expected to become integrated into R 'eventually'.
     romp: OpenMP using Fortran; pre-alpha, and an R-Forge project romp was initiated, but there is no package yet.
     R/parallel package by Vera, Jansen and Suppi: C++-based with a master-slave dispatch mechanism.
     Rdsm package: threads-like programming over multicore and distributed shared memory.
     RhpcBLASctl: detects and explicitly selects the number of available BLAS cores; see the sketch below.
     Rhpc: *apply() style dispatch via MPI.
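     A small sketch of controlling a threaded BLAS and OpenMP from R with RhpcBLASctl (the core counts are illustrative):
     library(RhpcBLASctl)
     get_num_procs()            # processors the OS reports
     blas_get_num_procs()       # threads the loaded BLAS is using now
     blas_set_num_threads(4)    # cap the BLAS at 4 threads
     omp_set_num_threads(4)     # cap OpenMP regions at 4 threads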

  35. R in parallel: explicit parallelism
     Rmpi by Yu: access to numerous functions from the MPI API plus R-specific extensions; works with LAM/MPI, MPICH/MPICH2, Open MPI, and Deino MPI.
     pbdMPI: MPI interface to support SPMD.
     pbdSLAP: scalable linear algebra packages.
     pbdBASE: core classes and methods for distributed data types.
     pbdDMAT: distributed dense matrices for "Programming with Big Data".
     pbdNCDF4: multiple-process access to the same files; supports terabyte-sized files.
     pbdDEMO: examples and a detailed vignette.
     pbdPROF: profiles MPI SPMD code with fpmpi, mpiP, or TAU.
     nws (NetWorkSpaces) from REvolution Computing: implemented on top of the Twisted networking toolkit for Python.

  36. R in parallel: explicit parallelism
     snow (Simple Network of Workstations): a common abstraction over PVM, MPI, NWS, and direct networking sockets.
     snowFT: fault-tolerance extensions to snow.
     snowfall: a more recent alternative to snow; functions can be used in sequential or parallel mode.
     foreach: general iteration over elements in a collection with no explicit loop counter. To loop in parallel, pair it with a backend: doMC (parallel/multicore on single workstations), doSNOW, doMPI (using Rmpi), or doRedis.
     future: synchronous (sequential) and asynchronous (parallel) evaluations via an abstraction of futures, either via function calls or implicitly via promises. Global variables are automatically identified, and iteration over elements in a collection is supported. See the sketch below.
     bigrf: Random Forests with parallel execution and large memory.
     Rborist: OpenMP predictor-level parallelism in the Random Forest algorithm; efficient use of multicore hardware in restaging data and in determining splitting criteria, both of which are performance bottlenecks in the algorithm.
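     A minimal future sketch (API as in recent versions of the future package; the worker count is illustrative):
     library(future)
     plan(multisession, workers = 4)    # parallel backend; plan(sequential) reverts
     f <- future({ sum(rnorm(1e7)) })   # starts evaluating asynchronously
     value(f)                           # blocks until the result is ready
     x %<-% { sum(rnorm(1e7)) }         # implicit future via a promise
     x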

  37. R in parallel: GPUs
     gputools: common data-mining algorithms implemented using a mixture of NVIDIA's CUDA language and the CUBLAS library.
     rpud: optimized distance metric for NVIDIA-based GPUs.
     gcbd: benchmarking framework for BLAS and GPUs (using gputools).
     cudaBayesreg: rhierLinearModel from the bayesm package for high-performance statistical analysis of fMRI voxels.
     rgpu: bioinformatics analysis using the GPU.
     OpenCL: interface from R to OpenCL, with hardware- and vendor-neutral interfaces to GPU programming.
     WideLM package: uses CUDA (4.1 or greater) to fit many 'skinny' regression models simultaneously from a single data set.
     HiPLARM: high-performance linear algebra for R using multi-core and/or GPU support via the PLASMA/MAGMA libraries from UTK, CUDA, and accelerated BLAS.
     permGPU: permutation resampling inference for RNA microarray studies on the GPU.
     gmatrix: matrix and vector operations with intermediate computations kept on the coprocessor and reused, to minimize data movement.

  38. R in parallel: parallel backends
     Parallel computation depends upon a parallel backend that must be registered before performing the computation: doMC, doSNOW, doParallel, doMPI.
     Setting the parallel backend:
     require(doMC)
     Loading required package: doMC
     Loading required package: foreach
     Loading required package: iterators
     Loading required package: parallel
     # registerDoMC()    # default is 2 cores
     # registerDoMC(32)  # for example, register 32 cores
     registerDoMC(8)     # use 8 cores
     # getDoParWorkers() # check the number of registered cores
     # registerDoSEQ()   # for sequential execution; also necessary when
                         # changing the number of cores before registerDoMC()
     getDoParWorkers()
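     doMC relies on forking and is Unix-only; a sketch of the same registration with the cross-platform doParallel backend (worker count illustrative):
     library(doParallel)
     cl <- makeCluster(4)        # socket cluster, works on Windows as well
     registerDoParallel(cl)
     getDoParWorkers()           # 4
     foreach(i = 1:4, .combine = c) %dopar% sqrt(i)
     stopCluster(cl)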

  39. R in parallel: foreach
     R looping constructs: repeat, while, for (vector counter) {Statements}, and the apply family: apply, lapply, sapply, eapply, mapply, rapply.
     foreach: parallel execution on multiple cores or nodes; returns an object (a list by default).
     > ls(pos="package:foreach")
      [1] "%:%"                "%do%"               "%dopar%"
      [4] "accumulate"         "foreach"            "getDoParName"
      [7] "getDoParRegistered" "getDoParVersion"    "getDoParWorkers"
     [10] "getDoSeqName"       "getDoSeqRegistered" "getDoSeqVersion"
     [13] "getDoSeqWorkers"    "getErrorIndex"      "getErrorValue"
     [16] "getexports"         "getResult"          "makeAccum"
     [19] "registerDoSEQ"      "setDoPar"           "setDoSeq"
     [22] "times"              "when"

  40. R in parallel: foreach with the combine option
     library(foreach)
     x <- foreach(a=1:1000, b=rep(10, 2)) %do% {a + b}
     m <- matrix(rnorm(9), 3, 3)
     foreach(i=1:ncol(m), .combine=c) %do% mean(m[,i])
     d <- data.frame(x=1:10, y=rnorm(10))
     s <- foreach(d=iter(d, by='row'), .combine=rbind) %dopar% d
     identical(d, s)
     library(iterators)
     a <- matrix(1:16, 4, 4)
     b <- t(a)
     foreach(b=iter(b, by='col'), .combine=cbind) %dopar% (a %*% b)
     qsort <- function(x) {
       n <- length(x)
       if (n == 0) {
         x
       } else {
         p <- sample(n, 1)
         smaller <- foreach(y=x[-p], .combine=c) %:% when(y <= x[p]) %do% y
         larger  <- foreach(y=x[-p], .combine=c) %:% when(y > x[p]) %do% y
         c(qsort(smaller), x[p], qsort(larger))
       }
     }
     qsort(runif(12))

  41. R in parallel: foreach and combine
     Combine with c, cbind, rbind, +, *, or a custom function:
     c concatenates all the results;
     cbind() combines vectors, matrices, or data frames by columns;
     rbind() combines vectors, matrices, or data frames by rows.
     .multicombine=TRUE: the combine function accepts more than 2 arguments at a time; .maxcombine sets the maximum number of arguments (default 100).
     .inorder: whether results are combined in the order the tasks were submitted; default TRUE.
     cfun <- function(...) NULL
     x <- foreach(i=1:4, .combine='cfun', .multicombine=TRUE) %do% rnorm(4)
     foreach(i = 1:3) %do% sqrt(i)
     x <- foreach(i=1:3, .combine='c') %do% exp(i)
     x <- foreach(i=1:4, .combine='cbind') %do% rnorm(4)
     x

  42. R in parallel: foreach and iterators
     Iterators can run over a vector, list, matrix, or data frame, or over a file or a database query.
     From the iterators package: irnorm returns a specified number of random numbers per draw; icount counts up from one.
     > library(iterators)
     > x <- foreach(a=irnorm(4, count=4), .combine='cbind') %do% a
     > x
     > set.seed(123)
     > x <- foreach(a=irnorm(4, count=1000), .combine='+') %do% a
     > x
     [1] 9.097676 -13.106472 14.076261 19.252750

  43. R in parallel: foreach and parallel execution
     To run in parallel, %do% is changed to %dopar%; the same iterators (over a vector, list, matrix, data frame, file, or database query) work unchanged.
     library(iterators)
     x <- foreach(a=irnorm(4, count=4), .combine='cbind') %dopar% a
     x
     set.seed(123)
     x <- foreach(a=irnorm(4, count=1000), .combine='+') %dopar% a
     x
     [1] 9.097676 -13.106472 14.076261 19.252750

  44. R in parallel: combine option
     A bootstrap of a logistic regression, combining the coefficient vectors by column:
     x <- iris[which(iris[,5] != "setosa"), c(1,5)]
     trials <- 10000
     ptime <- system.time({
       r <- foreach(icount(trials), .combine=cbind) %dopar% {
         ind <- sample(100, 100, replace=TRUE)
         result1 <- glm(x[ind,2] ~ x[ind,1], family=binomial(logit))
         coefficients(result1)
       }})[3]
     ptime
     elapsed
       5.601
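     For comparison, the same loop can be timed sequentially by switching back to %do% (a sketch reusing trials and x from above; the timing depends on the machine):
     stime <- system.time({
       r <- foreach(icount(trials), .combine=cbind) %do% {
         ind <- sample(100, 100, replace=TRUE)
         result1 <- glm(x[ind,2] ~ x[ind,1], family=binomial(logit))
         coefficients(result1)
       }})[3]
     stime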

  45. R in parallel: parallel
     > ls(pos="package:parallel")
      [1] "clusterApply"        "clusterApplyLB"      "clusterCall"
      [4] "clusterEvalQ"        "clusterExport"       "clusterMap"
      [7] "clusterSetRNGStream" "clusterSplit"        "detectCores"
     [10] "makeCluster"         "makeForkCluster"     "makePSOCKcluster"
     [13] "mc.reset.stream"     "mcaffinity"          "mccollect"
     [16] "mclapply"            "mcMap"               "mcmapply"
     [19] "mcparallel"          "nextRNGStream"       "nextRNGSubStream"
     [22] "parApply"            "parCapply"           "parLapply"
     [25] "parLapplyLB"         "parRapply"           "parSapply"
     [28] "parSapplyLB"         "pvec"                "setDefaultCluster"
     [31] "splitIndices"        "stopCluster"

  46. R in parallel: parallel
     parallel, detectCores(), system.time(), lapply(), mclapply()
     require(parallel, quiet=TRUE)
     detectCores()
     [1] 32
     n.cores <- detectCores()
     pause <- function(i) {Sys.sleep(i)}
     > system.time(lapply(rep(0.25, 10), pause))
        user  system elapsed
       0.000   0.000   2.503
     > system.time(mclapply(rep(0.25, 10), pause, mc.cores = n.cores))
        user  system elapsed
       0.010   0.041   0.280

  47. R in parallel: mcparallel()
     mcparallel(): forks an R process that evaluates the expression.
     mccollect(): collects results from one or more parallel processes.
     require(parallel, quiet=TRUE)
     detectCores()
     [1] 32
     n.cores <- detectCores()
     system.time({
       a1 <- mcparallel(1:5)
       a2 <- mcparallel(1:10)
       a3 <- mcparallel(1:15)
       a4 <- mcparallel(1:20)
       res <- mccollect(list(a1, a2, a3, a4))
     })
        user  system elapsed
       0.010   0.019   0.008
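     The tasks above are trivial; with real work the overlap becomes visible. A sketch (the sleep times are illustrative):
     system.time({
       j1 <- mcparallel(Sys.sleep(1))   # both jobs run concurrently
       j2 <- mcparallel(Sys.sleep(1))
       mccollect(list(j1, j2))          # total elapsed is about 1s, not 2s
     })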

  48. R in parallel: parallel backend, doMC
     detectCores()
     [1] 32
     getDoParRegistered()
     [1] FALSE
     getDoParWorkers()
     [1] 1
     registerDoMC()
     getDoParRegistered()
     [1] TRUE
     getDoParWorkers()
     [1] 16
     registerDoMC(32)
     getDoParWorkers()
     [1] 32

  49. R in parallel: doMC
     library(doMC)   # loads foreach, iterators, parallel
     options(cores = 4)
     registerDoMC()
     system.time({
       data = vector(length = 1000)
       data = foreach(i = 1:1000) %dopar% {sqrt(1/(sin(i))^2) - sum(rnorm(10^6))}
       data = unlist(data)
     })[]
     user.self  sys.self   elapsed user.child sys.child
         0.158     0.399    18.132      0.000     0.000
     options(cores = 1)
     registerDoMC()
     system.time({
       data = vector(length = 1000)
       data = foreach(i = 1:1000) %dopar% {sqrt(1/(sin(i))^2) - sum(rnorm(10^6))}
       data = unlist(data)
     })[]
     user.self  sys.self   elapsed user.child sys.child
        67.008     2.885    69.877      0.000     0.000

  50. R in parallel: doMC for randomForest
     x <- matrix(runif(500), 100)
     y <- gl(2, 50)
     library(randomForest)
     # sequential: four runs of 250 trees each, merged into one forest
     rf <- foreach(ntree=rep(250, 4), .combine=combine) %do%
       randomForest(x, y, ntree=ntree)
     rf
     library(doMC, quiet=TRUE)
     detectCores()
     registerDoMC()
     getDoParRegistered()
     getDoParWorkers()
     # parallel: each worker needs the randomForest package loaded
     rf <- foreach(ntree=rep(250, 4), .combine=combine, .packages='randomForest') %dopar%
       randomForest(x, y, ntree=ntree)
     rf
