Programming in R: coding, debugging and optimizing
Katia Oleinik, koleinik@bu.edu
Scientific Computing and Visualization, Boston University
October 12, 2012
http://www.bu.edu/tech/research/training/tutorials/list/
if
if (condition) { command(s) } else { command(s) }
Comparison operators: == (equal), != (not equal), > (greater), < (less), >= (greater or equal), <= (less or equal)
Logical operators: & (and), | (or), ! (not)
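A small sketch of the operators above inside an if condition. The values are illustrative. Note that & and | work element by element on vectors, so a condition used in if should reduce to a single TRUE or FALSE.

```r
# illustrative value
x <- 7

# & (and), | (or) and ! (not) combine comparisons
in_range <- x >= 0 & x <= 10      # TRUE: 7 lies in [0, 10]
outside  <- !(x >= 0 & x <= 10)   # FALSE: negation of the test above
extreme  <- x < -100 | x > 100    # FALSE: neither comparison holds

if (in_range) {
  print("x is between 0 and 10")
} else {
  print("x is outside [0, 10]")
}

# on vectors, & is applied element by element
v <- c(1, 5, 12)
v > 2 & v < 10                    # FALSE TRUE FALSE
```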
if
> # define x
> x <- 7
> # simple if statement
> if (x < 0) print("Negative")
> # simple if-else statement
> if (x < 0) print("Negative") else print("Non-negative")
[1] "Non-negative"
> # an if statement may be used inside other constructions
> y <- if (x < 0) -1 else 0
> y
[1] 0
if
> # multiline if-else statement
> if (x < 0) {
+   x <- x + 1
+   print("Add one")
+ } else if (x == 0) {
+   print("Zero")
+ } else {
+   print("Positive value")
+ }
[1] "Positive value"
Note: in a multiline if-else statement, use braces even for single-statement bodies. At the top level (for example, in an interactive session), else must be on the same line as the closing brace of the preceding block, as in } else {.
ifelse
ifelse(test_condition, true_value, false_value)
> # ifelse statement
> y <- ifelse(x < 0, -1, 0)
> # nested ifelse statement
> y <- ifelse(x < 0, -1, ifelse(x > 0, 1, 0))
ifelse
Best of all, ifelse() operates on vectors!
> # ifelse statement on a vector
> digits <- 0:9
> ifelse(digits > 4, 1, 0)
[1] 0 0 0 0 0 1 1 1 1 1
ifelse
• Exercise:
• define a random integer vector with values between -10 and 10:
  x <- as.integer(runif(10, -10, 10))
• create a vector y whose elements equal the absolute values of x
• Note: normally, you would use the abs() function to achieve this result
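One possible solution to the exercise, sketched with ifelse() and checked against abs() at the end. The set.seed() call is an addition here, used only to make the random vector reproducible.

```r
set.seed(42)                          # optional: reproducible random numbers

# random integer vector between -10 and 10, as in the exercise
x <- as.integer(runif(10, -10, 10))

# negate the negative elements, keep the others unchanged
y <- ifelse(x < 0, -x, x)

# the built-in abs() gives the same result
all.equal(y, abs(x))                  # TRUE
```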
switch
switch(expression, list of alternatives)
If the expression evaluates to a number n, switch() returns the n-th alternative.
> # simple switch statement
> x <- 3
> switch(x, 2, 4, 6, 8)
[1] 6
> switch(x, 2, 4)   # returns NULL (invisibly) since there are only 2 alternatives
switch
switch(expression, name1 = value1, name2 = value2, …)
If the expression is a character string, switch() returns the value whose name matches it; an unnamed final value serves as the default.
> # switch statement with named list
> day <- "Tue"
> switch(day, Sun = 0, Mon = 1, Tue = 2, Wed = 3, …)
[1] 2
> # switch statement with a "default" value
> food <- "meat"
> switch(food, banana = "fruit", carrot = "veggie", "neither")
[1] "neither"
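A sketch of a character switch() wrapped in a function; food_type() is a hypothetical name. An argument with no value, such as banana = , falls through to the next supplied value, and the unnamed last argument acts as the default.

```r
# hypothetical helper: classify a food item
food_type <- function(food) {
  switch(food,
         banana = ,             # no value: falls through to "fruit"
         apple  = "fruit",
         carrot = "veggie",
         "neither")             # unnamed last argument is the default
}

food_type("banana")   # "fruit"
food_type("carrot")   # "veggie"
food_type("meat")     # "neither"
```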
loops
There are 3 statements that provide explicit looping:
- repeat
- for
- while
Built-in constructs to control the looping:
- next
- break
Note: use explicit loops only when necessary. R has functions for implicit looping, apply(), sapply(), tapply() and lapply(), which are usually more concise; vectorized operations typically run much faster than explicit loops.
repeat
repeat { } causes repeated evaluation of its body until break is called. Be careful: an infinite loop may occur!
> # find the greatest odd divisor of an integer
> x <- 84
> repeat {
+   print(x)
+   if (x %% 2 != 0) break
+   x <- x / 2
+ }
[1] 84
[1] 42
[1] 21
for
for (object in sequence) { command(s) }
> # calculate x! (factorial)
> x <- 7
> y <- 1
> for (j in 2:x) {
+   y <- y * j
+ }
> y
[1] 5040
for
for (object in sequence) {
  command(s)
  if (…) next    # skip the rest of the body and go to the next iteration
  if (…) break   # exit from the (innermost) loop
}
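A minimal sketch of next and break in a for loop: even numbers are skipped with next, and the loop stops at the first odd number above 7. The vector name kept is illustrative.

```r
kept <- integer(0)
for (i in 1:10) {
  if (i %% 2 == 0) next   # even number: skip to the next iteration
  if (i > 7) break        # first odd number above 7: leave the loop
  kept <- c(kept, i)
}
print(kept)               # 1 3 5 7
```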
while
while (test_statement) { command(s) }
> # find the largest odd divisor of a given number
> x <- 84
> while (x %% 2 == 0) {
+   x <- x / 2
+ }
> x
[1] 21
loops
• Exercise:
• Using any of the loop statements, print all the numbers from 0 to 30 that are divisible by 7.
• Use %%, the modulus operator, to check divisibility.
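One possible solution to the exercise, first with an explicit for loop and then, for comparison, with vectorized indexing that selects the same numbers without a loop.

```r
# explicit loop: test each number with the modulus operator %%
for (n in 0:30) {
  if (n %% 7 == 0) print(n)
}

# the same selection without an explicit loop
candidates <- 0:30
multiples <- candidates[candidates %% 7 == 0]
multiples   # 0 7 14 21 28
```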
function
myFun <- function(ARG, OPT_ARGs) { statement(s) }
ARG: vector, matrix, list or a data frame
OPT_ARGs: optional arguments
Functions are a powerful element of R: they let you build on existing functions by writing your own custom ones.
function
myFun <- function(ARG, OPT_ARGs) { statement(s) }
Naming: variable naming rules apply. Avoid using the names of existing (built-in) functions.
Arguments: the argument list can be empty. Some (or all) of the arguments can have a default value (arg1 = TRUE). The argument ... can be used to let one function pass argument settings on to another function.
Return value: the value returned by the function is the last value computed, but you can also use the return() statement.
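A sketch that combines the points above: a default argument, ... forwarded to another function, an early return(), and an implicit return of the last value. center() is a hypothetical helper, not a built-in.

```r
# hypothetical helper: subtract the mean, optionally add a shift
center <- function(x, shift = 0, ...) {
  if (length(x) == 0) {
    return(numeric(0))          # early exit with return()
  }
  m <- mean(x, ...)             # extra arguments (e.g. na.rm = TRUE) go to mean()
  x - m + shift                 # value of the last expression is returned
}

center(c(1, 2, 3))                  # -1 0 1
center(c(1, 2, 3), shift = 10)      # 9 10 11
center(c(1, NA, 3), na.rm = TRUE)   # -1 NA 1
center(numeric(0))                  # numeric(0)
```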
function
> # simple function: calculate (x+1)^2
> f1 <- function(x) {
+   x^2 + 2*x + 1
+ }
> f1(3)
[1] 16
function
> # function with a default argument: calculate (x+a)^2
> f2 <- function(x, a = 1) {
+   x^2 + 2*x*a + a^2
+ }
> f2(3)
[1] 16
> f2(3, 2)
[1] 25
> # arguments can be passed by name (and out of order!)
> f2(a = 2, x = 1)
[1] 9
function
> # optional arguments can be specified as ... and passed on to another function
> f3 <- function(x, ...) {
+   plot(x, ...)
+ }
> # print all the words together in one sentence
> f4 <- function(...) {
+   print(paste(...))
+ }
> f4("Hello", "R!")
[1] "Hello R!"
function
Local and global variables: variables assigned inside a function are local to it. If a function uses a variable that it has not defined itself, R looks it up in the enclosing (here, global) environment.
> # define a function
> f <- function(x) {
+   cat("u=", u, "\n")   # u is not defined locally, so the global value is used
+   u <- u + 1           # creates a local u; the global u is not affected
+   cat("u=", u, "\n")
+ }
> u <- 2   # define a variable
> f(u)     # execute the function
u= 2
u= 3
> cat("u=", u, "\n")   # print the value of the variable
u= 2
function
Local and global variables: to modify a global variable from inside a function, you can use the super-assignment operator <<-. Avoid doing this when possible: functions that modify global variables are harder to understand and debug.
> # define a function
> f <- function(x) {
+   cat("u=", u, "\n")   # u is read from the global environment
+   u <<- u + 1          # this WILL change the value of u outside f()
+   cat("u=", u, "\n")
+ }
> u <- 2   # define a variable
> f(u)     # execute the function
u= 2
u= 3
> cat("u=", u, "\n")   # print the value of the variable
u= 3
function
Passing arguments: functions do not change their arguments (arguments are passed by value).
> # define a function
> f <- function(x) {
+   x <- 2
+   print(x)
+ }
> x <- 3      # assign value to x
> y <- f(x)   # call the function
[1] 2
> print(x)    # print value of x
[1] 3
function
Passing arguments: if you want to change the value of a variable passed as an argument, assign the function's return value back to it. Here f() returns 2, the value of its last expression print(x).
> # define a function
> f <- function(x) {
+   x <- 2
+   print(x)
+ }
> x <- 3      # assign value to x
> x <- f(x)   # call the function and reassign its result to x
[1] 2
> print(x)    # print value of x
[1] 2
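The same behaviour holds for vectors: changing an element of the argument inside a function modifies a local copy only, and the caller sees the change only if it keeps the returned value. set_first() is a hypothetical name used for illustration.

```r
# hypothetical function: replace the first element of a vector
set_first <- function(v) {
  v[1] <- 100      # changes the local copy only
  v                # return the modified copy
}

a <- c(1, 2, 3)
b <- set_first(a)
a   # 1 2 3    (caller's vector unchanged)
b   # 100 2 3  (modified copy returned)
```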
function
Finding the source code: you can view the source code of most R functions by typing the function name without parentheses.
> # get the source code of the lm() function
> lm
function (formula, data, subset, weights, na.action, method = "qr",
    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
    contrasts = NULL, offset, ...)
{
    ret.x <- x
    ret.y <- y
    cl <- match.call()
    . . .
    z
}
<environment: namespace:stats>
function
Finding the source code: generic functions dispatch to different methods depending on the class of the argument.
> # get the source code of the mean() function
> mean
function (x, ...)
UseMethod("mean")
<environment: namespace:base>
function
Finding the source code: you can first list the available methods and then choose the one you need.
> # list the methods of the mean() function
> methods("mean")
[1] mean.Date       mean.POSIXct    mean.POSIXlt    mean.data.frame
[5] mean.default    mean.difftime
> # get the source code of the default method
> mean.default
function (x, trim = 0, na.rm = FALSE, ...)
{
    if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
    . . .
}
<environment: namespace:base>
apply
apply(OBJECT, MARGIN, FUNCTION, ARGs)
OBJECT: matrix, array or data frame
MARGIN: 1 for rows, 2 for columns, c(1,2) for each element (rows and columns)
FUNCTION: the function to apply
ARGs: optional arguments passed on to FUNCTION
Description: returns a vector, array or list of values obtained by applying a function to the margins of an array or matrix.
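A sketch of the ARGs part of the call: extra arguments after FUNCTION are passed on to it. Here na.rm = TRUE is forwarded to mean(); the matrix m is illustrative.

```r
# 2x3 matrix with one missing value (filled column by column)
m <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]   NA    4    6

apply(m, 1, mean)                # row means: 3 NA (NA propagates)
apply(m, 1, mean, na.rm = TRUE)  # na.rm = TRUE is passed to mean(): 3 5
```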
apply
Example: create a matrix and apply different functions to its rows and columns.
> # create a 3x4 matrix
> x <- matrix(1:12, nrow = 3, ncol = 4)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> # find the median of each row
> apply(x, 1, median)
[1] 5.5 6.5 7.5
apply
Example (continued, using the same 3x4 matrix x):
> # find the mean of each column
> apply(x, 2, mean)
[1]  2  5  8 11
apply
Example (continued, using the same 3x4 matrix x):
> # create a new matrix with 0 for even and 1 for odd elements of x
> apply(x, c(1,2), function(x) x %% 2)
     [,1] [,2] [,3] [,4]
[1,]    1    0    1    0
[2,]    0    1    0    1
[3,]    1    0    1    0
lapply
lapply() returns a list:
lapply(X, FUN, ...)
> # create a list
> x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE, FALSE, FALSE))
> # compute the mean of each list element
> lapply(x, mean)
$a
[1] 5.5

$beta
[1] 4.535125

$logic
[1] 0.3333333
sapply
sapply() simplifies the result to a vector or a matrix when possible:
sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
> # create a list
> x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE, FALSE, FALSE))
> # compute the mean of each list element
> sapply(x, mean)
        a      beta     logic
5.5000000 4.5351252 0.3333333
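sapply() returns a matrix when FUN returns more than one value per element, while simplify = FALSE keeps the list that lapply() would return. A short sketch, reusing the list from the slide:

```r
x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE, FALSE, FALSE))

# one value per element: a named vector
sapply(x, length)                     # a = 10, beta = 7, logic = 3

# two values per element (min and max): a 2 x 3 matrix
r <- sapply(x, range)
r

# simplify = FALSE: the result stays a list
sapply(x, length, simplify = FALSE)
```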
code sourcing
source("file", …)
file: file with the source code to load (usually with the extension .r or .R)
echo: if TRUE, each expression is printed after parsing, before evaluation.
code sourcing
Linux prompt:
katana:~ % emacs foo_source.r
Text editor (foo_source.r):
# dummy function
foo <- function(x) {
  x + 1
}
R session:
> # load the foo_source.r source file
> source("foo_source.r")
> # create a vector
> x <- c(3, 5, 7)
> # call the function
> foo(x)
[1] 4 6 8
code sourcing
> # load the foo_source.r source file, echoing each expression
> source("foo_source.r", echo = TRUE)
> # dummy function
> foo <- function(x) {
+   x + 1
+ }
> # create a vector
> x <- c(3, 5, 7)
> # call the function
> foo(x)
[1] 4 6 8
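A self-contained version of the sourcing example: instead of editing foo_source.r by hand, this sketch writes the same function to a temporary file with writeLines() and tempfile(), then loads it with source(). The temporary file stands in for foo_source.r.

```r
# write the function definition to a temporary .r file
src <- tempfile(fileext = ".r")
writeLines(c("# dummy function",
             "foo <- function(x) {",
             "  x + 1",
             "}"), src)

# load the definitions into the workspace
source(src)

foo(c(3, 5, 7))   # 4 6 8
```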
code sourcing
Exercise:
- write a function that computes the logarithm of the inverse of a number, log(1/x)
- save it in a file with the .r extension
- load it into your workspace
- execute it
- try executing it with the input vector (2, 1, 0, -1).
debugging
R includes debugging tools:
cat() & print(): print out the values
browser(): pause the code execution and "browse" the code
debug(FUN): execute a function line by line
undebug(FUN): stop debugging the function
debugging
inv_log.r:
# dummy function
inv_log <- function(x) {
  y <- 1/x
  browser()
  y <- log(y)
}
R session:
> # load the inv_log.r source file
> source("inv_log.r", echo = TRUE)
> # dummy function
> inv_log <- function(x) {
+   y <- 1/x
+   browser()
+   y <- log(y)
+ }
> x <- c(3, 2, 1, 0, -1)   # input vector
> inv_log(x)   # call the function
Called from: inv_log(x)
Browse[1]> y   # check the values of local variables
[1] 0.3333333 0.5000000 1.0000000 Inf -1.0000000
debugging
Commands available at the Browse prompt:
<RET>: go to the next statement if the function is being debugged; continue execution if the browser was invoked.
c or cont: continue execution without single stepping.
n: execute the next statement in the function. This works from the browser as well.
where: show the call stack.
Q: halt execution and jump to the top level immediately.
To view the value of a variable whose name matches one of these commands, use the print() function, e.g. print(n).
debugging
Continuing the same browser session:
> inv_log(x)   # call the function
Called from: inv_log(x)
Browse[1]> y
[1] 0.3333333 0.5000000 1.0000000 Inf -1.0000000
Browse[1]> n
debug: y <- log(y)
Browse[2]>
Warning message:
In log(y) : NaNs produced
debugging
inv_log.r:
# dummy function
inv_log <- function(x) {
  y <- 1/x
  y <- log(y)
}
R session:
> # load the inv_log.r source file
> source("inv_log.r", echo = TRUE)
> # dummy function
> inv_log <- function(x) {
+   y <- 1/x
+   y <- log(y)
+ }
> debug(inv_log)   # enter debug mode
> inv_log(x)       # call the function
debugging in: inv_log(x)
debug: {
    y <- 1/x
    y <- log(y)
}
Browse[2]> . . .
> undebug(inv_log)   # exit debugging mode
timing
Use the system.time() function to measure execution time.
> # make a function
> g <- function(x) {
+   y = vector(length = x)
+   for (i in 1:x) y[i] = i/(i+1)
+   y
+ }
> # execute the function, measuring the execution time
> system.time(g(100000))
   user  system elapsed
  0.107   0.002   0.109
optimization
• How to speed up the code?
• Use vectors!
> # using loops
> g1 <- function(x) {
+   y = vector(length = x)
+   for (i in 1:x) y[i] = i/(i+1)
+   y
+ }
> # execute the function
> system.time(g1(100000))
   user  system elapsed
  0.107   0.002   0.109
> # using vectors
> x <- 1:100000
> g2 <- function(x) {
+   x/(x+1)
+ }
> # execute the function
> system.time(g2(x))
   user  system elapsed
  0.002   0.000   0.003
optimization
• How to speed up the code?
• Avoid dynamically expanding arrays
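A minimal sketch of this advice, assuming a loop that cannot be vectorized: grow() extends its result with c() on every iteration, which copies the vector each time, while prealloc() allocates the full vector once with numeric(n) and fills it in. Both function names are hypothetical, and timings depend on the machine, so none are quoted here.

```r
# grow the result one element at a time (the vector is copied repeatedly)
grow <- function(n) {
  y <- c()
  for (i in seq_len(n)) y <- c(y, i / (i + 1))
  y
}

# preallocate the full vector once, then fill it in
prealloc <- function(n) {
  y <- numeric(n)
  for (i in seq_len(n)) y[i] <- i / (i + 1)
  y
}

# compare the execution times
system.time(grow(10000))
system.time(prealloc(10000))
```

Both functions return the same values; the preallocated version is typically faster, and the gap widens as n grows. When the loop can be replaced by a vectorized expression, as with g2() above, that is usually faster still.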