
Harnessing the Power of High Performance Computing for R

Learn about shared memory vs. distributed memory in R, optimizing R packages, compiled code interfaces, large memory data, and parallelism. Explore commands, vectors, lists, sequences, and matrices in R with examples and tips.

Presentation Transcript


  1. Harnessing the Power of High Performance Computing for R
     Zhiyong Zhang, Stanford Research Computing Center
     zyzhang@stanford.edu | research-computing-support@stanford.edu

  2. Shared memory vs. distributed memory
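     In the shared-memory model, workers on one node address the same RAM (for example, forked R processes); in the distributed-memory model, each worker owns its RAM and data must be shipped over the network. A minimal sketch of the two models with the base parallel package (the fork cluster assumes a Unix-alike; the object and worker counts are illustrative):
     library(parallel)
     ## Shared memory: forked workers on one node inherit the parent's objects.
     cl_fork <- makeForkCluster(4)
     parSapply(cl_fork, 1:8, function(i) i^2)
     stopCluster(cl_fork)
     ## Distributed memory: socket workers start as fresh R processes,
     ## so objects must be exported to them explicitly.
     big_table <- data.frame(x = 1:10)        # illustrative object
     cl_sock <- makePSOCKcluster(4)
     clusterExport(cl_sock, "big_table")
     parSapply(cl_sock, 1:8, function(i) i^2 + nrow(big_table))
     stopCluster(cl_sock)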

  3. R: HPC matters. Most R packages use a single core. Topics:
     - Vectorization and other optimizations in R
     - Interfaces for compiled code
     - Profiling tools: memory and CPU
     - Large memory and out-of-memory data
     - Parallelism for R:
       - Threads or shared memory, limited to one node: implicit (minor changes required, if any) or explicit (some assembly required)
       - MPI/sockets or distributed memory, one or more nodes: explicit only
     - R with accelerators: Xeon Phi coprocessors, GPU accelerators

  4. R in parallel: R basics
     Startup files and library paths: .Rprofile/.Renviron, file.path(getwd(), ".Rprofile"), R_LIBS, $R_HOME, $R_HOME_DIR, $R_LIBS_USER. A sketch of the startup files follows below.
     [zyzhang@sherlock-ln02 ~]$ R RHOME
     /share/sw/free/R/3.2.0/lib64/R
     [zyzhang@sherlock-ln02 ~]$ Rscript script_name.R
     [zyzhang@sherlock-ln02 ~]$ R CMD BATCH script_name.R
     [zyzhang@sherlock-ln02 ~]$ R
     > source("script_name.R")
     > install.packages("pkg")
     > remove.packages("pkg")
     > help(solve)
     > example(solve)
     > ?<command-name> or ??<command-name>
     > ls(pos = "package:Rmpi")
     > .packages()
     > path.package("Rmpi")
     > installed.packages()
     > .libPaths()
     > .Library
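     As an illustration (the paths and values are hypothetical), a personal ~/.Renviron and ~/.Rprofile might contain:
     # ~/.Renviron: environment variables read at startup
     R_LIBS_USER=~/R/x86_64-unknown-linux-gnu-library/3.2
     # ~/.Rprofile: R code run at startup
     options(repos = c(CRAN = "https://cran.r-project.org"))
     .libPaths()   # confirm the personal library is on the search path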

  5. Useful R commands and variables
     help(fctn) displays help on any function fctn, as in Python.
     To invoke complex arithmetic, add 0i to a number. For example, sqrt(-1) returns NaN, but sqrt(-1 + 0i) returns 0 + 1i.
     sessionInfo() prints the R version, OS, packages loaded, etc.
     ls() shows which objects are defined; rm(list=ls()) clears all defined objects.
     dev.new() opens a new plotting window.
     The function sort() does not change its argument.
     Distribution function prefixes d, p, q, r stand for density (PDF), probability (CDF), quantile (inverse CDF), and random sample. dnorm is the density function of a normal random variable and rnorm generates a sample from a normal random variable; dunif and runif are the corresponding functions for a uniform random variable.
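     A quick demonstration of the four prefixes for the normal distribution (values shown are approximate):
     dnorm(0)        # density (PDF) at 0: 0.3989
     pnorm(1.96)     # probability (CDF): P(Z <= 1.96) = 0.975
     qnorm(0.975)    # quantile (inverse CDF): 1.96
     rnorm(3)        # three random draws from N(0, 1)
     runif(3)        # three random draws from Uniform(0, 1)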

  6. Useful R commands and variables
     Assignment: e <- m*c^2 or m*c^2 -> e. You must use = for default function arguments or named arguments.
     Not reserved, but better treated as if they were: c, q, s, t, C, D, F, I, and T.
     Comments begin with # and continue to the end of the line, as in Python or Perl.
     Missing values and NaN: NaN ("not a number") results from overflows and undefined operations such as 0/0. NA ("not applicable") marks missing data or a NULL value. NA is a legal value inside an R vector, and vectors received by R functions may contain missing components, so R functions must handle NA values.
     is.nan returns TRUE for NaN; is.na returns TRUE for NA or NaN.
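     A short sketch of how NA and NaN behave in practice:
     x <- c(1, NA, 0/0, 3)     # 0/0 produces NaN
     is.na(x)                  # FALSE TRUE TRUE FALSE: TRUE for NA and NaN
     is.nan(x)                 # FALSE FALSE TRUE FALSE: TRUE only for NaN
     sum(x)                    # NA: functions must handle missing values
     sum(x, na.rm = TRUE)      # 4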

  7. Useful R commands and variables
     Vectors
     A vector in R is a container vector, an ordered collection of numbers; its length is the number of elements in the container. Operations are applied componentwise.
     R allows adding two vectors of different lengths: elements of the shorter summand are recycled as often as necessary to create a vector the length of the longer summand. Scalars are vectors of length 1.
     Vectors are created using the c function: p <- c(2,3,5,7). Vectors in R are indexed starting with 1, and matrices are stored in column-major order.
     Elements of a vector are accessed using []. Vectors expand when assigning to indexes past the end of the vector.
     Negative indices: x[-n] returns a copy of x with the nth element removed. Boolean values can also be used as indices.
     There are five kinds of subscripts in R: positive integers, negative integers, logical values, character names, and the empty subscript.
     The type of a vector is the type of the elements it contains; all elements of a vector must have the same underlying type: logical, integer, double, complex, character, or raw.
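     For example:
     p <- c(2, 3, 5, 7)
     p + c(10, 20)        # recycling: 12 23 15 27
     p[2]                 # 3 (indexing starts at 1)
     p[-1]                # copy with the first element removed: 3 5 7
     p[c(TRUE, FALSE, TRUE, FALSE)]   # logical subscript: 2 5
     p[6] <- 13; p        # expands, filling the gap: 2 3 5 7 NA 13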

  8. Useful R commands and variables
     Lists
     Lists are like vectors, except elements need not all have the same type: one element may be an integer, another a string, another a vector of Boolean values.
     Lists are created using the list function. Elements can be accessed by position using [[]]; named elements may be accessed either by position or by name.
     Named elements of lists act like C structs, except a dollar sign rather than a dot is used to access elements: after a <- list(name="Joe", 4, foo=c(3,8,9)), both a[[1]] and a$name equal the string "Joe".
     Accessing a non-existent element of a list, say a[[4]] above, generates an error, but assignment to a non-existent element extends the list. If the index assigned to is more than one past the end of the list, intermediate elements are created and assigned NULL values. Assignment to a non-existent named field, such as a$baz <- TRUE, extends the list as well. See the sketch below.
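     A minimal sketch of list extension:
     a <- list(name = "Joe", 4, foo = c(3, 8, 9))
     # a[[4]]         # error: subscript out of bounds
     a$baz <- TRUE    # extends the list to 4 elements
     a[[7]] <- "end"  # elements 5 and 6 are created and set to NULL
     length(a)        # 7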

  9. Useful R commands and variables
     Sequences
     seq(a, b, n) creates a sequence from a to b in steps of size n; seq(a, b, length=n) creates a sequence with n points and step size (b-a)/(n-1). For example, seq(1, 10, 3) returns the vector containing 1, 4, 7, and 10. The step size argument defaults to 1 in both R and Python, and a:b is an abbreviation for seq(a, b, 1).
     Matrices
     Matrices in R are vectors with multiple dimensions: m <- array(c(1,2,3,4,5,6), dim=c(2,3)) creates a 2x3 matrix m. R fills matrices by column, so the first row of m has elements 1, 3, and 5. To fill m by row, use m <- matrix(c(1,2,3,4,5,6), nrow=2, byrow=TRUE).

  10. Useful R commands and variables
     Type conversion functions
     as.xxx converts its argument to type xxx: as.integer(3.2) returns the integer 3, and as.character(3.2) returns the string "3.2".
     Boolean operators
     T or TRUE for true values; F or FALSE for false values.
     & and | apply element-wise on vectors. && and || are often used in conditional statements and use lazy (short-circuit) evaluation as in C: the operators will not evaluate their second argument if the return value is determined by the first argument.
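     A quick illustration of the element-wise vs. short-circuit forms:
     c(TRUE, FALSE) & c(TRUE, TRUE)    # element-wise: TRUE FALSE
     x <- numeric(0)
     if (length(x) > 0 && x[1] > 0) "positive" else "empty"
     # && never evaluates x[1] here, so the empty vector causes no error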

  11. Useful R commands and variables
     Functions
     f <- function(a=10, b) { return (a+b) }
     The keyword function returns a function object, which here is assigned to f; used without an assignment, it creates an anonymous function (a lambda expression).
     return is a function and its argument must be contained in parentheses. The use of return is optional; otherwise the value of the last line executed is returned.
     A default value need not be a static value but can be a function of other arguments.
     Arguments are passed by value. To get the effect of passing by reference, either (1) assign to non-local variables in the calling routine's scope (<<-), or (2) pass in all needed values and return a list as output.
     Scope: R uses lexical scoping while S-PLUS uses static scope. Since variables cannot be declared (they pop into existence on first assignment), it is not always easy to determine the scope of a variable: you cannot tell just by looking at the source code of a function whether a variable is local to that function. See the closure sketch below.
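     A minimal closure sketch illustrating lexical scoping and non-local assignment:
     make_counter <- function() {
       count <- 0               # lives in the enclosing environment
       function() {
         count <<- count + 1    # <<- assigns in the enclosing scope
         count
       }
     }
     counter <- make_counter()
     counter()   # 1
     counter()   # 2: the closure retains 'count' between calls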

  12. R: memory profiling
     http://adv-r.had.co.nz/memory.html
     https://cran.r-project.org/doc/manuals/R-exts.html#Profiling-R-code-for-memory-use
     Memory management: predict how much memory is needed and make the most of the memory available; write faster code, because accidental copies are a major cause of slow code.
     Basics of memory management: objects, functions, larger blocks of code.
     Size of an object: the base size of each object is 40 bytes: metadata, two pointers to the previous and next objects, one pointer to attributes, the length, and padding. R allocates small vectors in sizes of 8, 16, 32, 48, 64, or 128 bytes; beyond 128 bytes, R asks for memory in multiples of 8 bytes.
     Shared objects are stored once; unique strings are stored in one place (the global string pool).

  13. R: memory profiling
     # install.packages("ggplot2"); install.packages("pryr")
     require(ggplot2); require(pryr)
     sizes <- sapply(0:50, function(n) object_size(seq_len(n)))
     plot(0:50, sizes, xlab = "Length", ylab = "Size (bytes)", type = "s")
     plot(0:50, sizes - 40, xlab = "Length", ylab = "Bytes excluding overhead", type = "n")
     abline(h = 0, col = "grey80")
     abline(h = c(8, 16, 32, 48, 64, 128), col = "grey80")
     abline(a = 0, b = 4, col = "grey90", lwd = 4)
     lines(sizes - 40, type = "s")
     x1 <- 1:1e6
     y1 <- list(1:1e6, 1:1e6, 1:1e6)
     object_size(x1); object_size(y1)   # shared objects are stored once
     object_size("banana"); object_size(rep("banana", 10))   # global string pool

  14. R: memory profiling
     mem_used(); mem_change()
     mem_change(x <- 1:1e6); mem_change(rm(x)); mem_change(NULL)
     gc()   # R does garbage collection automatically
     Memory leaks: formulas and closures capture the environment they were created in.
     f1 <- function() { x <- 1:1e6; 10 }
     mem_change(x <- f1()); object_size(x)
     f2 <- function() { x <- 1:1e6; a ~ b }
     mem_change(y <- f2()); object_size(y)
     f3 <- function() { x <- 1:1e6; function() 10 }
     mem_change(z <- f3()); object_size(z)

  15. R: memory profiling with lineprof
     require(devtools); require(lineprof)   # devtools::install_github("hadley/lineprof")
     read_delim <- function(file, header = TRUE, sep = ",") {
       first <- scan(file, what = character(1), nlines = 1, sep = sep, quiet = TRUE)
       p <- length(first)
       all <- scan(file, what = as.list(rep("character", p)), sep = sep,
                   skip = if (header) 1 else 0, quiet = TRUE)
       all[] <- lapply(all, type.convert, as.is = TRUE)
       if (header) names(all) <- first else names(all) <- paste0("V", seq_along(all))
       as.data.frame(all)
     }
     library(ggplot2)
     write.csv(diamonds, "diamonds.csv", row.names = FALSE)
     source("read-delim.r")   # the file containing read_delim above
     prof <- lineprof(read_delim("diamonds.csv"))
     # shine(prof)
     prof

  16. Speed up R code: Rprof
     system.time({
       n = 10^7; data1 = rnorm(n); data2 = rnorm(n)
       for (i in 1:(n-1)) {
         data1[i] = data1[i] + data1[i+1]
         data1[i] = exp(data1[i]^2)
       }
       data1 = data1 * data2
       matrix.data1 = matrix(data1, nrow = 100, byrow = TRUE)
       matrix.data2 = matrix(data2, nrow = 100, byrow = TRUE)
       almost.final.result = matrix.data1 %*% t(matrix.data2)
       final.result = solve(almost.final.result)
     })[]
     Rprof("profiling.out")
     # ... run the same code again under the profiler ...
     Rprof()

  17. Speed up R code: Rprof
     > summaryRprof("profiling.out")
     $by.self
                 self.time self.pct total.time total.pct
     "rnorm"          1.36    35.79       1.36     35.79
     "%*%"            0.62    16.32       0.62     16.32
     "exp"            0.56    14.74       0.56     14.74
     "^"              0.40    10.53       0.40     10.53
     "+"              0.38    10.00       0.38     10.00
     "matrix"         0.28     7.37       0.28      7.37
     ":"              0.06     1.58       0.06      1.58
     "t.default"      0.06     1.58       0.06      1.58
     "-"              0.04     1.05       0.04      1.05
     "*"              0.04     1.05       0.04      1.05

  18. Speed up R code: Rprof
     line_RProf_function = function() {
       data1 = rnorm(10^7)
       for (i in 1:(length(data1)-1)) {
         data1[i] = data1[i] + data1[i+1]
         data1[i] = exp(data1[i]^2)
       }
       data2 = rnorm(10^7)
       data1 = data2 * data1
       matrix.data1 = matrix(data1, nrow = 100, byrow = TRUE)
       matrix.data2 = matrix(data2, nrow = 100, byrow = TRUE)
       almost.final.result = matrix.data1 %*% t(matrix.data2)
       final.result = solve(almost.final.result)
     }
     Rprof("profiling.out", line.profiling = TRUE)
     line_RProf_function()
     Rprof()

  19. Speed up R code: Rprof
     > summaryRprof("profiling.out", lines = "show")
     $by.self
                    self.time self.pct total.time total.pct
     #8                  0.72    30.00       0.72     30.00
     #14                 0.68    28.33       0.68     28.33
     #2                  0.68    28.33       0.68     28.33
     #12                 0.14     5.83       0.14      5.83
     #11                 0.12     5.00       0.12      5.00
     #9                  0.04     1.67       0.04      1.67
     <no location>       0.02     0.83       0.02      0.83

  20. How to speed up R code
     HDD/SSD/RAM/CPU/vectorization: avoid or minimize reading from and writing to disk!
     To measure execution time:
     time.start = proc.time()[[3]]
     ### Your code ###
     time.end = proc.time()[[3]]
     time.end - time.start
     system.time({### Your code ###})[[3]]
     > system.time({data = rnorm(10^7)})[]
     user.self  sys.self   elapsed user.child sys.child
         0.702     0.030     0.733      0.000     0.000
     Writing to disk dominates the cost:
     > system.time({data = rnorm(10^7); write.table(data, "data.txt")})[]
     user.self  sys.self   elapsed user.child sys.child
        16.433     0.469    16.951      0.000     0.000

  21. Speed up R code: vectors
     Measuring time:
     > ptm <- proc.time()
     > for (i in 1:50) mad(stats::runif(500))
     > proc.time() - ptm
        user  system elapsed
       0.007   0.000  19.832   # elapsed includes idle time at the prompt
     > system.time(for (i in 1:50) mad(stats::runif(500)))
        user  system elapsed
       0.006   0.001   0.006
     > n = 100
     > a = NULL
     > system.time(for (i in 1:n) a = c(a, sqrt(i)))
        user  system elapsed
       0.013   0.002   0.015
     > system.time(for (i in 1:n) a = c(a, stats::runif(500)))
        user  system elapsed
       0.043   0.005   0.049

  22. Speed up R code: vectors
     n = 1000
     a = NULL   # vector grows one concatenation at a time
     system.time(for (i in 1:n) {a = c(a, stats::runif(500))})
        user  system elapsed
       0.845   0.000   0.845
     n = 1000
     a = vector(length = n)   # vector declared, but still concatenating
     system.time(for (i in 1:n) {print(i); a = c(a, stats::runif(500))})
        user  system elapsed
       0.820   0.078   0.901
     system.time(for (i in 1:n) {a[i] = stats::runif(500)})   # assignment by index
        user  system elapsed
         0.1     0.0     0.1

  23. Speed up R code: vectors
     Repeating the growing-vector test without resetting a shows each run getting slower as a keeps growing:
     n = 1000
     a = NULL
     system.time(for (i in 1:n) {a = c(a, stats::runif(500))})
        user  system elapsed
       1.402   0.016   1.418
     # the same line run five more times, without resetting a:
        user  system elapsed
       5.831   0.510   6.341
       8.528   1.437   9.962
       9.961   6.251  16.208
      11.122   8.038  19.156
      12.188   9.717  21.902

  24. Speed up R code: vectors
     n = 1000000
     a = vector(length = n)
     n = 10000000              # n now exceeds length(a), so a must keep growing
     system.time(for (i in 1:n) a[i] = sqrt(i)*2)
     ^C
     Timing stopped at: 1170.446 283.232 1454.167
     n = 10000000
     a = vector(length = n)    # pre-allocated at the right size
     system.time(for (i in 1:n) a[i] = sqrt(i)*2)
        user  system elapsed
      12.017   0.039  12.055
     system.time({for (i in 1:n) a[i] = sqrt(i); a = a*2})
        user  system elapsed
      11.826   0.078  11.913
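     The fully vectorized form skips the interpreted loop altogether; a sketch (timing omitted since it depends on the machine, but whole-vector arithmetic is typically far faster than the loops above):
     n = 10000000
     system.time({a = sqrt(1:n) * 2})   # whole-vector arithmetic, no loop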

  25. Speed up R code: vectors
     m = 100000
     n = 10000
     a = matrix(nrow = m, ncol = n)
     rm(a)
     system.time({a = matrix(nrow = m, ncol = n)})
        user  system elapsed
       9.322   1.397  10.716
     system.time({vrow = vector(length = m)})
     system.time({vcol = vector(length = n)})
     # elementwise multiplication, row-first iteration
     system.time(for (i in 1:m) {for (j in 1:n) {a[i,j] = vrow[i]*vcol[j]}})
        user  system elapsed
     2026.609   2.099 2028.409
     > system.time(for (i in 1:m) {for (j in 1:n) {a[i,j] = vrow[i]*vcol[j]}})
        user  system elapsed
     2274.735   0.038 2274.618
     # vector operation, row-wise assignments
     system.time(for (i in 1:m) {a[i,] = vrow[i]*vcol})
        user  system elapsed
      23.076   1.964  25.037
     # vector operation, column-wise assignments
     system.time(for (i in 1:n) {a[,i] = vcol[i]*vrow})
        user  system elapsed
      15.504   2.086  17.614
     system.time(for (i in 1:n) {a[,i] = vcol[i]*vrow})
        user  system elapsed
      14.690   1.876  16.562
     # elementwise multiplication, column-first iteration
     system.time(for (j in 1:n) {for (i in 1:m) {a[i,j] = vrow[i]*vcol[j]}})
        user  system elapsed
     1842.184   0.004 1841.948
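     This computation is an outer product, so the whole matrix can also be built in a single vectorized call; a sketch (timing omitted, machine-dependent):
     system.time({a = outer(vrow, vcol)})   # equivalently: vrow %o% vcol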

  26. Speed up R code: vectors, matrices, and tables vs. loops, data frames, and lists
     Loops are, in general, more expensive, and so are data frames and lists.
     Memory/object management: ls(), rm(data), gc()
     system.time({
       a = NULL
       for (i in 1:10000) a = c(a, i^2)})[]
     user.self  sys.self   elapsed user.child sys.child
         0.118     0.036     0.155      0.000     0.000
     system.time({
       a = NULL
       for (i in 1:1000000) a = c(a, i^2)})[]
     user.self  sys.self   elapsed user.child sys.child
      1079.399    87.885  1167.065      0.000     0.000
     system.time({
       a = vector(length = 1000000)
       for (i in 1:1000000) a[i] = i^2})[]
     user.self  sys.self   elapsed user.child sys.child
         0.942     0.000     0.941      0.000     0.000

  27. Speed up R code: vectors, matrices, and tables
     Loops are, in general, more expensive, and so are data frames and lists.
     Memory/object management: ls(), rm(data), gc()
     system.time({
       a = vector(length = 1000000)
       for (i in 1:n) a[i] = sin(i)*2*pi})[]
     user.self  sys.self   elapsed user.child sys.child
         1.192     0.000     1.193      0.000     0.000
     system.time({
       twopi = 2*pi
       a = vector(length = 1000000)
       for (i in 1:n) a[i] = sin(i)*twopi})[]
     user.self  sys.self   elapsed user.child sys.child
         1.107     0.001     1.108      0.000     0.000
     system.time({
       a = vector(length = 1000000)
       for (i in 1:1000000) a[i] = sin(i)
       a = 2*pi*a})[]
     user.self  sys.self   elapsed user.child sys.child
         0.002     0.002     0.004      0.000     0.000

  28. How to speed up R code
     Working with vectors: apply is not inherently faster than an explicit loop.
     system.time({
       n = 100000; y = rnorm(n); w = vector(length = n)
       for (i in 1:n) w[i] = sum(y[1:i] < y[i])})[]
     user.self  sys.self   elapsed user.child sys.child
        51.176     5.776    56.937      0.000     0.000
     system.time({
       n = 100000; y = rnorm(n); w = matrix(1:n, nrow = n, ncol = 1)
       w = apply(w, 1, function(x) sum(y[1:x] < y[x]))})[]
     user.self  sys.self   elapsed user.child sys.child
        57.531     0.824    58.342      0.000     0.000

  29. Speed up R: C functions with R
     ksmooth1 <- function(data, xpts, h) {
       dens <- double(length(xpts))
       n <- length(data)
       for (i in 1:length(xpts)) {
         ksum <- 0
         for (j in 1:length(data)) {
           d <- xpts[i] - data[j]
           ksum <- ksum + dnorm(d / h)
         }
         dens[i] <- ksum / (n * h)
       }
       return(dens)
     }
     data = rchisq(500, 3)
     xpts = seq(0, 10, length = 10000)
     h = 0.75
     system.time({fx_estimated = ksmooth1(data, xpts, h)})[[3]]
     [1] 8.579

  30. Speed up R: C functions with R
     #include <R.h>
     #include <Rmath.h>
     void kernel_smooth(double *data, int *n, double *xpts, int *nxpts,
                        double *h, double *result)
     {
       int i, j;
       double d, ksum;
       for (i = 0; i < *nxpts; i++) {
         ksum = 0;
         for (j = 0; j < *n; j++) {
           d = xpts[i] - data[j];
           ksum += dnorm(d / *h, 0, 1, 0);
         }
         result[i] = ksum / ((*n) * (*h));
       }
     }
     R CMD SHLIB ksmooth1.c
     icc -std=gnu99 -I/share/sw/free/R/3.2.2/lib64/R/include -DNDEBUG -I/usr/local/include -fpic -O2 -march=native -c ksmooth1.c -o ksmooth1.o
     icc -std=gnu99 -shared -L/share/sw/free/R/3.2.2/lib64/R/lib -L/usr/local/lib64 -o ksmooth1.so ksmooth1.o -L/share/sw/free/R/3.2.2/lib64/R/lib -lR
     ls ksmooth1.*
     ksmooth1.c ksmooth1.o ksmooth1.so

  31. Speed up R: C functions with R
     dyn.load("ksmooth1.so")
     ksmooth2 <- function(data, xpts, h) {
       n <- length(data)
       nxpts <- length(xpts)
       dens <- .C("kernel_smooth", as.double(data), as.integer(n),
                  as.double(xpts), as.integer(nxpts), as.double(h),
                  result = double(length(xpts)))
       return(dens[[6]])   # the sixth .C argument holds the result
     }
     data = rchisq(500, 3)
     xpts = seq(0, 10, length = 10000)
     h = 0.75
     system.time({result = ksmooth2(data, xpts, h)})[[3]]
     [1] 0.187

  32. Speed up R: OpenMP with R (pnmath)
     $ wget http://homepage.stat.uiowa.edu/~luke/R/experimental/pnmath_0.0-4.tar.gz
     > .libPaths()
     [1] "/home/zyzhang/R/x86_64-unknown-linux-gnu-library/3.2"
     [2] "/share/sw/free/R/3.2.2/lib64/R/library"
     > install.packages("/scratch/users/zyzhang/usertests/R/Tutorial/pnmath/pnmath_0.0-4.tar.gz", repos=NULL)
     $ R CMD INSTALL -l /home/zyzhang/R/x86_64-unknown-linux-gnu-library/3.2 pnmath_0.0-4.tar.gz
     > library(pnmath)
     > ls(pos="package:pnmath")
     [1] "calibratePnmath"       "getNumPnmathThreads"  "getNumProcs"
     [4] "getParallelThresholds" "setNumPnmathThreads"
     > getNumProcs()
     [1] 16
     > getNumPnmathThreads()
     [1] 16
     > system.time({A=matrix(1:10^9, nrow=1000); B=tan(sin(cos(tan(A))))})[]

  33. Large memory and out-of-memory data
     biglm: incremental computations for lm() and glm() on data sets stored outside of R's main memory; see the sketch below.
     ff: file-based access to data sets that are too large to be loaded into memory.
     biglars: larger-than-memory datasets for least-angle regression, lasso, and stepwise regression.
     ffbase: adds basic statistical functionality to the ff package.
     bigmemory: large objects in memory (as well as via files) referred to by external pointers, with transparent access from R without bumping against R's internal memory limits. Several R processes on the same computer can also share big memory objects.
     database and database-like packages.
     HadoopStreaming: writing map/reduce scripts for use in Hadoop Streaming; it also facilitates operating on data in a streaming fashion, which does not require Hadoop.
     speedglm: fit (generalized) linear models to large data.
     bigrf: a Random Forests implementation for parallel execution and large memory.
     MonetDB.R: allows R to access the MonetDB column-oriented, open-source database system as a backend.
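     A minimal sketch of biglm's chunked fitting (synthetic data; the chunk sizes and model are illustrative):
     library(biglm)
     chunk1 <- data.frame(x = rnorm(1e5)); chunk1$y <- 2*chunk1$x + rnorm(1e5)
     fit <- biglm(y ~ x, data = chunk1)   # fit on the first chunk
     chunk2 <- data.frame(x = rnorm(1e5)); chunk2$y <- 2*chunk2$x + rnorm(1e5)
     fit <- update(fit, chunk2)           # fold in the next chunk incrementally
     summary(fit)                         # uses all rows seen so far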

  34. R in parallel: implicit parallelism
     pnmath: OpenMP directives make internal R math functions use multiple cores. pnmath0 uses Pthreads instead of OpenMP where newer compilers are not available. Similar functionality is expected to become integrated into R 'eventually'.
     romp: OpenMP using Fortran; pre-alpha, and an R-Forge project romp was initiated, but there is no package yet.
     R/parallel package by Vera, Jansen and Suppi: C++-based with a master-slave dispatch mechanism.
     Rdsm package: threads-like programming over multicore and distributed shared memory.
     RhpcBLASctl: detects and explicitly selects the number of available BLAS cores; see the sketch below.
     Rhpc: *apply() style dispatch via MPI.
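     A small sketch of controlling a threaded BLAS and OpenMP from R with RhpcBLASctl (the core counts are illustrative):
     library(RhpcBLASctl)
     get_num_procs()            # processors the OS reports
     blas_get_num_procs()       # threads the loaded BLAS is using now
     blas_set_num_threads(4)    # cap the BLAS at 4 threads
     omp_set_num_threads(4)     # cap OpenMP regions at 4 threads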

  35. R in parallel: explicit parallelism
     Rmpi by Yu: access to numerous functions from the MPI API plus R-specific extensions; works with LAM/MPI, MPICH/MPICH2, Open MPI, and Deino MPI.
     pbdMPI: MPI interface to support SPMD.
     pbdSLAP: scalable linear algebra packages.
     pbdBASE: core classes and methods for distributed data types.
     pbdDMAT: distributed dense matrices for "Programming with Big Data".
     pbdNCDF4: multiple-process access to the same files; supports terabyte-sized files.
     pbdDEMO: examples and a detailed vignette.
     pbdPROF: profiles MPI SPMD code with fpmpi, mpiP, or TAU.
     nws (NetWorkSpaces) from REvolution Computing: implemented on top of the Twisted networking toolkit for Python.

  36. R in parallel: explicit parallelism
     snow (Simple Network of Workstations): a common abstraction over PVM, MPI, NWS, and direct networking sockets.
     snowFT: fault-tolerance extensions to snow.
     snowfall: a more recent alternative to snow; functions can be used in sequential or parallel mode.
     foreach: general iteration over elements in a collection with no explicit loop counter. To loop in parallel, pair it with a backend: doMC (parallel/multicore on single workstations), doSNOW, doMPI (using Rmpi), or doRedis.
     future: synchronous (sequential) and asynchronous (parallel) evaluations via an abstraction of futures, either via function calls or implicitly via promises. Global variables are automatically identified, and iteration over elements in a collection is supported. See the sketch below.
     bigrf: Random Forests with parallel execution and large memory.
     Rborist: OpenMP predictor-level parallelism in the Random Forest algorithm; efficient use of multicore hardware in restaging data and in determining splitting criteria, both of which are performance bottlenecks in the algorithm.
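     A minimal future sketch (API as in recent versions of the future package; the worker count is illustrative):
     library(future)
     plan(multisession, workers = 4)    # parallel backend; plan(sequential) reverts
     f <- future({ sum(rnorm(1e7)) })   # starts evaluating asynchronously
     value(f)                           # blocks until the result is ready
     x %<-% { sum(rnorm(1e7)) }         # implicit future via a promise
     x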

  37. R in parallel: GPUs
     gputools: common data-mining algorithms implemented using a mixture of NVIDIA's CUDA language and the CUBLAS library.
     rpud: optimized distance metric for NVIDIA-based GPUs.
     gcbd: benchmarking framework for BLAS and GPUs (using gputools).
     cudaBayesreg: rhierLinearModel from the bayesm package for high-performance statistical analysis of fMRI voxels.
     rgpu: bioinformatics analysis using the GPU.
     OpenCL: interface from R to OpenCL, with hardware- and vendor-neutral interfaces to GPU programming.
     WideLM package: uses CUDA (4.1 or greater) to fit many 'skinny' regression models simultaneously from a single data set.
     HiPLARM: high-performance linear algebra for R using multi-core and/or GPU support via the PLASMA/MAGMA libraries from UTK, CUDA, and accelerated BLAS.
     permGPU: permutation resampling inference for RNA microarray studies on the GPU.
     gmatrix: matrix and vector operations with intermediate computations kept on the coprocessor and reused, to minimize data movement.

  38. R in parallel: parallel backends
     Parallel computation depends upon a parallel backend that must be registered before performing the computation: doMC, doSNOW, doParallel, doMPI.
     Setting the parallel backend:
     require(doMC)
     Loading required package: doMC
     Loading required package: foreach
     Loading required package: iterators
     Loading required package: parallel
     # registerDoMC()    # default is 2 cores
     # registerDoMC(32)  # for example, register 32 cores
     registerDoMC(8)     # use 8 cores
     # getDoParWorkers() # check the number of registered cores
     # registerDoSEQ()   # for sequential execution; also necessary when
                         # changing the number of cores before registerDoMC()
     getDoParWorkers()
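     doMC relies on forking and is Unix-only; a sketch of the same registration with the cross-platform doParallel backend (worker count illustrative):
     library(doParallel)
     cl <- makeCluster(4)        # socket cluster, works on Windows as well
     registerDoParallel(cl)
     getDoParWorkers()           # 4
     foreach(i = 1:4, .combine = c) %dopar% sqrt(i)
     stopCluster(cl)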

  39. R in parallel: foreach
     R looping constructs: repeat, while, for (vector counter) {Statements}, and the apply family: apply, lapply, sapply, eapply, mapply, rapply.
     foreach: parallel execution on multiple cores or nodes; returns an object (a list by default).
     > ls(pos="package:foreach")
      [1] "%:%"                "%do%"               "%dopar%"
      [4] "accumulate"         "foreach"            "getDoParName"
      [7] "getDoParRegistered" "getDoParVersion"    "getDoParWorkers"
     [10] "getDoSeqName"       "getDoSeqRegistered" "getDoSeqVersion"
     [13] "getDoSeqWorkers"    "getErrorIndex"      "getErrorValue"
     [16] "getexports"         "getResult"          "makeAccum"
     [19] "registerDoSEQ"      "setDoPar"           "setDoSeq"
     [22] "times"              "when"

  40. R in parallel: foreach with the combine option
     library(foreach)
     x <- foreach(a=1:1000, b=rep(10, 2)) %do% {a + b}
     m <- matrix(rnorm(9), 3, 3)
     foreach(i=1:ncol(m), .combine=c) %do% mean(m[,i])
     d <- data.frame(x=1:10, y=rnorm(10))
     s <- foreach(d=iter(d, by='row'), .combine=rbind) %dopar% d
     identical(d, s)
     library(iterators)
     a <- matrix(1:16, 4, 4)
     b <- t(a)
     foreach(b=iter(b, by='col'), .combine=cbind) %dopar% (a %*% b)
     qsort <- function(x) {
       n <- length(x)
       if (n == 0) {
         x
       } else {
         p <- sample(n, 1)
         smaller <- foreach(y=x[-p], .combine=c) %:% when(y <= x[p]) %do% y
         larger  <- foreach(y=x[-p], .combine=c) %:% when(y > x[p]) %do% y
         c(qsort(smaller), x[p], qsort(larger))
       }
     }
     qsort(runif(12))

  41. R in parallel: foreach and combine
     Combine with c, cbind, rbind, +, *, or a custom function:
     c concatenates all the results;
     cbind() combines vectors, matrices, or data frames by columns;
     rbind() combines vectors, matrices, or data frames by rows.
     .multicombine=TRUE: the combine function accepts more than 2 arguments at a time; .maxcombine sets the maximum number of arguments (default 100).
     .inorder: whether results are combined in the order the tasks were submitted; default TRUE.
     cfun <- function(...) NULL
     x <- foreach(i=1:4, .combine='cfun', .multicombine=TRUE) %do% rnorm(4)
     foreach(i = 1:3) %do% sqrt(i)
     x <- foreach(i=1:3, .combine='c') %do% exp(i)
     x <- foreach(i=1:4, .combine='cbind') %do% rnorm(4)
     x

  42. R in parallel: foreach and iterators
     Iterators can run over a vector, list, matrix, or data frame, or over a file or a database query.
     From the iterators package: irnorm returns a specified number of random numbers per draw; icount counts up from one.
     > library(iterators)
     > x <- foreach(a=irnorm(4, count=4), .combine='cbind') %do% a
     > x
     > set.seed(123)
     > x <- foreach(a=irnorm(4, count=1000), .combine='+') %do% a
     > x
     [1] 9.097676 -13.106472 14.076261 19.252750

  43. R in parallel: foreach and parallel execution
     To run in parallel, %do% is changed to %dopar%; the same iterators (over a vector, list, matrix, data frame, file, or database query) work unchanged.
     library(iterators)
     x <- foreach(a=irnorm(4, count=4), .combine='cbind') %dopar% a
     x
     set.seed(123)
     x <- foreach(a=irnorm(4, count=1000), .combine='+') %dopar% a
     x
     [1] 9.097676 -13.106472 14.076261 19.252750

  44. R in parallel: combine option
     A bootstrap of a logistic regression, combining the coefficient vectors by column:
     x <- iris[which(iris[,5] != "setosa"), c(1,5)]
     trials <- 10000
     ptime <- system.time({
       r <- foreach(icount(trials), .combine=cbind) %dopar% {
         ind <- sample(100, 100, replace=TRUE)
         result1 <- glm(x[ind,2] ~ x[ind,1], family=binomial(logit))
         coefficients(result1)
       }})[3]
     ptime
     elapsed
       5.601
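     For comparison, the same loop can be timed sequentially by switching back to %do% (a sketch reusing trials and x from above; the timing depends on the machine):
     stime <- system.time({
       r <- foreach(icount(trials), .combine=cbind) %do% {
         ind <- sample(100, 100, replace=TRUE)
         result1 <- glm(x[ind,2] ~ x[ind,1], family=binomial(logit))
         coefficients(result1)
       }})[3]
     stime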

  45. R in parallel: parallel
     > ls(pos="package:parallel")
      [1] "clusterApply"        "clusterApplyLB"      "clusterCall"
      [4] "clusterEvalQ"        "clusterExport"       "clusterMap"
      [7] "clusterSetRNGStream" "clusterSplit"        "detectCores"
     [10] "makeCluster"         "makeForkCluster"     "makePSOCKcluster"
     [13] "mc.reset.stream"     "mcaffinity"          "mccollect"
     [16] "mclapply"            "mcMap"               "mcmapply"
     [19] "mcparallel"          "nextRNGStream"       "nextRNGSubStream"
     [22] "parApply"            "parCapply"           "parLapply"
     [25] "parLapplyLB"         "parRapply"           "parSapply"
     [28] "parSapplyLB"         "pvec"                "setDefaultCluster"
     [31] "splitIndices"        "stopCluster"

  46. R in parallel: parallel
     parallel, detectCores(), system.time(), lapply(), mclapply()
     require(parallel, quiet=TRUE)
     detectCores()
     [1] 32
     n.cores <- detectCores()
     pause <- function(i) {Sys.sleep(i)}
     > system.time(lapply(rep(0.25, 10), pause))
        user  system elapsed
       0.000   0.000   2.503
     > system.time(mclapply(rep(0.25, 10), pause, mc.cores = n.cores))
        user  system elapsed
       0.010   0.041   0.280

  47. R in parallel: mcparallel()
     mcparallel(): forks an R process that evaluates the expression.
     mccollect(): collects results from one or more parallel processes.
     require(parallel, quiet=TRUE)
     detectCores()
     [1] 32
     n.cores <- detectCores()
     system.time({
       a1 <- mcparallel(1:5)
       a2 <- mcparallel(1:10)
       a3 <- mcparallel(1:15)
       a4 <- mcparallel(1:20)
       res <- mccollect(list(a1, a2, a3, a4))
     })
        user  system elapsed
       0.010   0.019   0.008
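     The tasks above are trivial; with real work the overlap becomes visible. A sketch (the sleep times are illustrative):
     system.time({
       j1 <- mcparallel(Sys.sleep(1))   # both jobs run concurrently
       j2 <- mcparallel(Sys.sleep(1))
       mccollect(list(j1, j2))          # total elapsed is about 1s, not 2s
     })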

  48. R in parallel: parallel backend, doMC
     detectCores()
     [1] 32
     getDoParRegistered()
     [1] FALSE
     getDoParWorkers()
     [1] 1
     registerDoMC()
     getDoParRegistered()
     [1] TRUE
     getDoParWorkers()
     [1] 16
     registerDoMC(32)
     getDoParWorkers()
     [1] 32

  49. R in parallel: doMC
     library(doMC)   # loads foreach, iterators, parallel
     options(cores = 4)
     registerDoMC()
     system.time({
       data = vector(length = 1000)
       data = foreach(i = 1:1000) %dopar% {sqrt(1/(sin(i))^2) - sum(rnorm(10^6))}
       data = unlist(data)
     })[]
     user.self  sys.self   elapsed user.child sys.child
         0.158     0.399    18.132      0.000     0.000
     options(cores = 1)
     registerDoMC()
     system.time({
       data = vector(length = 1000)
       data = foreach(i = 1:1000) %dopar% {sqrt(1/(sin(i))^2) - sum(rnorm(10^6))}
       data = unlist(data)
     })[]
     user.self  sys.self   elapsed user.child sys.child
        67.008     2.885    69.877      0.000     0.000

  50. R in parallel: doMC for randomForest
     x <- matrix(runif(500), 100)
     y <- gl(2, 50)
     library(randomForest)
     # sequential: four runs of 250 trees each, merged into one forest
     rf <- foreach(ntree=rep(250, 4), .combine=combine) %do%
       randomForest(x, y, ntree=ntree)
     rf
     library(doMC, quiet=TRUE)
     detectCores()
     registerDoMC()
     getDoParRegistered()
     getDoParWorkers()
     # parallel: each worker needs the randomForest package loaded
     rf <- foreach(ntree=rep(250, 4), .combine=combine, .packages='randomForest') %dopar%
       randomForest(x, y, ntree=ntree)
     rf
