Parallel Programming with R
Ender Ahmet Yurt 121402101
What is R? • R is a programming language and environment for statistical computing. • Open source. • Usable from the command line or through a GUI. • Supports linear and non-linear modeling, classical statistical tests, time-series analysis, clustering, and classification. • Extensible by developing R packages. • Try R: http://tryr.codeschool.com/
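As a quick illustration of the modeling and testing features listed above, here is a minimal sketch using the built-in `mtcars` data set. The data set and the chosen variables (`mpg`, `wt`, `am`) are assumptions for illustration, not taken from the slides.

```r
# Linear model: fuel efficiency (mpg) as a function of car weight (wt),
# using the mtcars data set that ships with base R (illustrative choice).
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)   # intercept and slope

# Classical test: do automatic (am = 0) and manual (am = 1) cars
# differ in mean mpg? t.test() runs a Welch two-sample t-test by default.
tt <- t.test(mpg ~ am, data = mtcars)
tt$p.value
```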
Some R
> 5+5
[1] 10
> x <- c(1,2,3,4,5,6)
> x*2
[1]  2  4  6  8 10 12
> y <- x
> mean(y)
[1] 3.5
R & Parallel Programming • Base R runs on a single thread. • It cannot use multiple CPU cores without extra packages. • There are plenty of them: • Rmpi (R interface to MPI, the Message-Passing Interface) • nws (NetWorkSpaces) • snow (Simple Network of Workstations) • SPRINT (Simple Parallel R INTerface) • foreach • multicore • parallel
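To see the single-thread point concretely, here is a minimal sketch using the `parallel` package that ships with R: it reports the number of cores, then times a plain `lapply()` over a deliberately slow function. `slow_square` and its 0.1 s sleep are hypothetical, chosen only to make the serial timing visible.

```r
library(parallel)  # ships with base R (>= 2.14.0)

# How many logical cores does this machine report? (May be NA on some platforms.)
n_cores <- detectCores()
n_cores

# Hypothetical toy function: sleeps 0.1 s to simulate work, then squares i.
slow_square <- function(i) {
  Sys.sleep(0.1)
  i^2
}

# A plain lapply() evaluates the calls one after another on a single core,
# however many cores detectCores() reports, so elapsed time is at least 0.4 s.
system.time(res <- lapply(1:4, slow_square))
```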
Rmpi • An R interface to MPI. • Native MPI is programmed from C or C++. • Rmpi wraps the low-level MPI functions. • It makes MPI available to R users who do not program in C or C++.
Some Rmpi
# Load the R MPI package if it is not already loaded.
if (!is.loaded("mpi_initialize")) {
  library("Rmpi")
}

# Spawn as many slaves as possible
mpi.spawn.Rslaves()

# In case R exits unexpectedly, have it automatically clean up
# resources taken up by Rmpi (slaves, memory, etc.)
.Last <- function() {
  if (is.loaded("mpi_initialize")) {
    if (mpi.comm.size(1) > 0) {
      print("Please use mpi.close.Rslaves() to close slaves.")
      mpi.close.Rslaves()
    }
    print("Please use mpi.quit() to quit R")
    mpi.finalize()  # shut down the MPI environment
  }
}

# Tell all slaves to return a message identifying themselves
mpi.remote.exec(paste("I am", mpi.comm.rank(), "of", mpi.comm.size()))

# Tell all slaves to close down, and exit the program
mpi.close.Rslaves()
mpi.quit()
snow • Simple Network of Workstations. • A high-level interface for parallel computing on a network of workstations. • Based on the master-slave model. • Can use three low-level transports to connect the master and the slaves: • Socket • PVM (Parallel Virtual Machine) • MPI (Message Passing Interface)
Some snow
> library(snow)
> library(rlecuyer)
# Now set up some sample data. Here I take 100 random draws, with replacement, from the integers in [0, 5].
> x <- sample(0:5, 100, replace = TRUE)
> mean(x)
[1] 2.64
Some snow (2)
# Define a simple function to calculate a single bootstrapped mean from a given vector:
> bs.mean <- function(v) {
+   s <- sample(v, length(v), replace = TRUE)
+   mean(s)
+ }
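Before parallelizing, it helps to have a sequential baseline to compare against. The sketch below calls `bs.mean()` repeatedly with base R's `replicate()`. It assumes 1000 bootstrap replicates and a fixed seed, and both of these are hypothetical choices that do not come from the slides.

```r
# Same sample data and bootstrap function as on the slides.
set.seed(1)  # hypothetical seed, only so the run is reproducible
x <- sample(0:5, 100, replace = TRUE)
bs.mean <- function(v) {
  s <- sample(v, length(v), replace = TRUE)
  mean(s)
}

# Sequential baseline: 1000 bootstrap replicates (assumed count) on one core.
boot_means <- replicate(1000, bs.mean(x))
mean(boot_means)                        # bootstrap estimate of the mean
sd(boot_means)                          # bootstrap standard error
quantile(boot_means, c(0.025, 0.975))   # simple percentile interval
```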
Some snow (3)
# Now it's time to set up the cluster. Here I set up a SOCK-type connection, which can be used to start multiple R instances on the local machine and/or R instances on remote machines through ssh connections. snow offers other connection options that may be more convenient or necessary depending on your environment (for instance, MPI was needed on the OSC cluster).
> cl <- makeCluster(c("localhost", "localhost"), type = "SOCK")
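The slides stop after creating the cluster. The sketch below shows one way to finish the bootstrap. It swaps in the `parallel` package, which bundles snow's cluster API (`makeCluster`, `clusterExport`, `parSapply`, `stopCluster`) and uses `clusterSetRNGStream()` where snow would use `clusterSetupRNG()` together with rlecuyer. The two local workers, the seed values, and the replicate count are assumptions. With snow itself, the analogous calls would be `clusterExport()`, `clusterSetupRNG(cl)`, and `parSapply()`.

```r
library(parallel)  # bundles snow's cluster API; no extra install needed

set.seed(1)  # hypothetical seed for the sample data
x <- sample(0:5, 100, replace = TRUE)
bs.mean <- function(v) {
  s <- sample(v, length(v), replace = TRUE)
  mean(s)
}

# Two local socket workers, like makeCluster(c("localhost", "localhost"), type = "SOCK").
cl <- makeCluster(2, type = "PSOCK")

# Workers are fresh R sessions: ship the data and the function to them.
clusterExport(cl, c("x", "bs.mean"))

# Give every worker its own independent L'Ecuyer random-number stream
# (parallel's counterpart to snow's clusterSetupRNG() with rlecuyer).
clusterSetRNGStream(cl, iseed = 123)  # hypothetical seed

# 1000 bootstrap replicates (assumed count) spread across the workers.
boot_means <- parSapply(cl, 1:1000, function(i) bs.mean(x))

stopCluster(cl)  # always release the workers
mean(boot_means)
```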
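multicore • Uses one machine with multiple cores. • Usually faster than the other approaches on a single machine, because it forks the running R process instead of starting separate R sessions and communicating with them. • Does not work on MS Windows (no fork()).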
multicore (2) • mclapply: a parallelized version of lapply. • lapply returns a list of the same length as its argument, holding the result of applying the function to each element. • A little example for lapply:
> x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE,TRUE))
# compute the mean of each list element
> lapply(x, mean)
Run it! • EXAMPLE TIME!!!
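The same `lapply()` example can be run through `mclapply()`. This is a minimal sketch that assumes the `parallel` package (which absorbed multicore's `mclapply()`) and two cores. On Windows it falls back to one core because forking is unavailable there.

```r
library(parallel)  # provides mclapply(), originally from the multicore package

x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE, FALSE, FALSE, TRUE))

serial <- lapply(x, mean)

# mclapply() forks the R process; forking is unavailable on Windows,
# so use a single core there (assumed two cores elsewhere).
n <- if (.Platform$OS.type == "windows") 1L else 2L
forked <- mclapply(x, mean, mc.cores = n)

identical(serial, forked)  # same named list, computed by forked children
serial                     # $a 5.5, $beta 4.535125, $logic 0.5
```

Because the children are forked copies of the parent session, `x` and `mean` are already visible to them. No export step is needed, unlike workers on a socket cluster, which start as fresh R sessions.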
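parallel • Built on the multicore and snow packages; included with R since version 2.14.0. • Solves Single Program, Multiple Data (SPMD) problems. • Single machine: multicore-style forking. • Several machines: snow-style clusters. • MPI clusters are supported through the Rmpi package. • EXAMPLE TIME!!!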
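The following is a minimal SPMD sketch with `parallel`. It applies one function to several inputs, using fork-based `mclapply()` on Unix-alikes and a socket cluster with `parLapply()` elsewhere. The helper `par_map()`, the two-core default, and the sum-of-squares workload are all hypothetical.

```r
library(parallel)

# Hypothetical helper: apply the same FUN to every element of X (SPMD),
# forking on Unix-alikes and using a socket cluster elsewhere.
par_map <- function(X, FUN, cores = 2L) {
  if (.Platform$OS.type == "unix") {
    mclapply(X, FUN, mc.cores = cores)          # multicore-style backend
  } else {
    cl <- makeCluster(cores, type = "PSOCK")    # snow-style backend
    on.exit(stopCluster(cl))                    # release workers on exit
    parLapply(cl, X, FUN)
  }
}

# Hypothetical workload: sum of squares 1^2 + ... + n^2 for several n.
res <- par_map(c(10, 100, 1000), function(n) sum((1:n)^2))
unlist(res)
```

To spread the work over several machines, `makeCluster()` can be given a vector of host names instead of a core count, much like the `c("localhost", "localhost")` call in the snow slides.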
THANK YOU