Programming in R: coding, debugging and optimizing. Katia Oleinik (koleinik@bu.edu), Scientific Computing and Visualization, Boston University. October 12, 2012. http://www.bu.edu/tech/research/training/tutorials/list/



  1. Programming in R: coding, debugging and optimizing. Katia Oleinik, koleinik@bu.edu, Scientific Computing and Visualization, Boston University, October 12, 2012. http://www.bu.edu/tech/research/training/tutorials/list/

  2. if
     if (condition) {
       command(s)
     } else {
       command(s)
     }
     Comparison operators: == equal; != not equal; > (<) greater (less); >= (<=) greater (less) or equal
     Logical operators: & and; | or; ! not

  3. if
     > # define x
     > x <- 7
     > # simple if statement
     > if (x < 0) print("Negative")
     > # simple if-else statement
     > if (x < 0) print("Negative") else print("Non-negative")
     [1] "Non-negative"
     > # an if statement may be used inside other constructions
     > y <- if (x < 0) -1 else 0
     > y
     [1] 0

  4. if
     > # multiline if-else statement
     > if (x < 0) {
     +   x <- x + 1
     +   print("Add one")
     + } else if (x == 0) {
     +   print("Zero")
     + } else {
     +   print("Positive value")
     + }
     [1] "Positive value"
     Note: For multiline if statements, braces are necessary even for single-statement bodies. In an interactive session, the else keyword must be on the same line as the closing brace of the preceding block and the opening brace of its own block, as in } else {.

  5. ifelse
     ifelse(test_condition, true_value, false_value)
     > # ifelse statement
     > y <- ifelse(x < 0, -1, 0)
     > # nested ifelse statement
     > y <- ifelse(x < 0, -1, ifelse(x > 0, 1, 0))

  6. ifelse
     Best of all: the ifelse statement operates on vectors!
     > # ifelse statement on a vector
     > digits <- 0:9
     > ifelse(digits > 4, 1, 0)
     [1] 0 0 0 0 0 1 1 1 1 1

  7. ifelse
     Exercise:
     - define a random vector with values ranging from -10 to 10:
       x <- as.integer(runif(10, -10, 10))
     - create a vector y whose elements are the absolute values of the elements of x
     Note: normally, you would use the abs() function to achieve this result.
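One possible solution to the exercise above is sketched below. The call to set.seed() is an addition that is not part of the original slides; it only makes the random vector reproducible.

```r
# One possible solution (a sketch): absolute values via ifelse()
set.seed(1)                            # assumption: fixed seed, only for reproducibility
x <- as.integer(runif(10, -10, 10))    # random integers between -10 and 10
y <- ifelse(x < 0, -x, x)              # negate the negative elements, keep the rest
identical(y, abs(x))                   # compare with the built-in abs()
```

Because ifelse() works element by element, no explicit loop is needed.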

  8. switch
     switch(statement, list)
     > # simple switch statement
     > x <- 3
     > switch(x, 2, 4, 6, 8)
     [1] 6
     > switch(x, 2, 4)   # returns NULL since there are only 2 elements in the list

  9. switch
     switch(statement, name1 = str1, name2 = str2, … )
     > # switch statement with named list
     > day <- "Tue"
     > switch(day, Sun = 0, Mon = 1, Tue = 2, Wed = 3, …)
     [1] 2
     > # switch statement with a "default" value
     > food <- "meat"
     > switch(food, banana = "fruit", carrot = "veggie", "neither")
     [1] "neither"

  10. loops
      There are 3 statements that provide explicit looping: repeat, for, while.
      Built-in constructs to control the looping: next, break.
      Note: Use explicit loops only when they are really necessary. Vectorized operations usually run much faster, and R has functions for implicit looping that are more concise and often faster: apply(), sapply(), tapply(), and lapply().
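To illustrate the note above, here is a minimal sketch that computes the same result with an explicit loop, with sapply(), and with vectorized arithmetic. The variable names and the value of n are chosen only for illustration.

```r
# Squares of 1..n computed three ways (illustrative values)
n <- 5

# explicit loop with a preallocated result vector
squares_loop <- numeric(n)
for (i in 1:n) squares_loop[i] <- i^2

# implicit loop with sapply()
squares_sapply <- sapply(1:n, function(i) i^2)

# vectorized arithmetic, no loop at all
squares_vec <- (1:n)^2
```

All three produce the same vector; the vectorized form is the shortest and typically the fastest.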

  11. repeat
      The repeat { } statement causes repeated evaluation of the body until break is requested. Be careful: an infinite loop may occur!
      > # find the greatest odd divisor of an integer
      > x <- 84
      > repeat {
      +   print(x)
      +   if (x %% 2 != 0) break
      +   x <- x/2
      + }
      [1] 84
      [1] 42
      [1] 21
      >

  12. for
      for (object in sequence) {
        command(s)
      }
      > # calculate N! - factorial
      > x <- 7
      > y <- 1
      > for (j in 2:x) {
      +   y <- y*j
      + }
      > y
      [1] 5040
      >

  13. for
      for (object in sequence) {
        command(s)
        if (…) next    # return to the start of the loop
        if (…) break   # exit from (innermost) loop
      }
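The template above can be made concrete with a small sketch. The conditions and the range 1:100 are illustrative choices, not taken from the slides.

```r
# Sum the odd numbers from 1 upward, stopping once a value exceeds 10
total <- 0
for (i in 1:100) {
  if (i %% 2 == 0) next   # skip even numbers and continue with the next i
  if (i > 10) break       # leave the loop at the first odd value above 10
  total <- total + i
}
total   # 1 + 3 + 5 + 7 + 9 = 25
```

Here next skips the rest of the body for the current iteration, while break ends the loop entirely.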

  14. while
      while (test_statement) {
        command(s)
      }
      > # find the largest odd divisor of a given number
      > x <- 84
      > while (x %% 2 == 0) {
      +   x <- x/2
      + }
      > x
      [1] 21
      >

  15. loops
      Exercise:
      - Using any of the loop statements, print all the numbers from 0 to 30 that are divisible by 7.
      - Use %%, the modulo operator, to check divisibility.
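A possible solution to this exercise, sketched with a for loop; a vectorized alternative is included for comparison. The variable name divisible is chosen here only for illustration.

```r
# loop version: test every number from 0 to 30
for (i in 0:30) {
  if (i %% 7 == 0) print(i)   # remainder 0 means i is divisible by 7
}

# vectorized version: keep the elements whose remainder is 0
divisible <- (0:30)[(0:30) %% 7 == 0]
divisible
```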

  16. function
      myFun <- function(ARG, OPT_ARGs) {
        statement(s)
      }
      ARG: vector, matrix, list or a data frame
      OPT_ARGs: optional arguments
      Functions are a powerful element of R. They allow you to expand on existing functions by writing your own custom functions.

  17. function
      myFun <- function(ARG, OPT_ARGs) {
        statement(s)
      }
      Naming: Variable naming rules apply. Avoid reusing the names of existing (built-in) functions.
      Arguments: The argument list can be empty. Some (or all) of the arguments can have a default value (arg1 = TRUE). The argument '...' can be used to allow one function to pass on argument settings to another function.
      Return value: The value returned by the function is the last value computed, but you can also use the return() statement.
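The rules above can be seen together in one small sketch. The function rescale() is a hypothetical helper invented for this illustration; it is not part of R or of the original slides.

```r
# Hypothetical helper: optionally subtract the mean from a numeric vector
rescale <- function(x, center = TRUE, ...) {
  if (!is.numeric(x)) {
    return(NA)              # early exit with an explicit return()
  }
  if (center) {
    x <- x - mean(x, ...)   # '...' is forwarded to mean(), e.g. na.rm = TRUE
  }
  x                         # the last value computed is returned
}

rescale(c(1, 2, 3))                   # uses the default center = TRUE
rescale(c(1, 2, NA), na.rm = TRUE)    # na.rm is passed on to mean()
rescale(c(1, 2, 3), center = FALSE)   # overrides the default
```

The first call returns -1 0 1, the second returns -0.5 0.5 NA, and the third returns the input unchanged.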

  18. function
      > # simple function: calculate (x+1)^2
      > f1 <- function (x) {
      +   x^2 + 2*x + 1
      + }
      > f1(3)
      [1] 16
      >

  19. function
      > # function with default arguments: calculate (x+a)^2
      > f2 <- function (x, a = 1) {
      +   x^2 + 2*x*a + a^2
      + }
      > f2(3)
      [1] 16
      > f2(3, 2)
      [1] 25
      >
      > # arguments can be passed by name (and out of order!)
      > f2(a = 2, x = 1)
      [1] 9

  20. function
      > # Some optional arguments can be specified as '...' to pass them to another function
      > f3 <- function (x, ...) {
      +   plot(x, ...)
      + }
      >
      > # print all the words together in one sentence
      > f3 <- function (...) {
      +   print(paste(...))
      + }
      > f3("Hello", " R! ")
      [1] "Hello  R! "

  21. function
      Local and global variables: Variables assigned inside a function are local. A variable that is read before it has been assigned locally is looked up outside the function, so its initial value is that of the global variable (if such a variable exists).
      > # define a function
      > f <- function (x) {
      +   cat("u=", u, "\n")   # no local u yet: the global value is read
      +   u <- u + 1           # creates a local u; this will not affect the variable outside f()
      +   cat("u=", u, "\n")   # prints the local u
      + }
      >
      > u <- 2                 # define a variable
      > f(u)                   # execute the function
      u= 2
      u= 3
      >
      > cat("u=", u, "\n")     # print the value of the variable
      u= 2

  22. function
      Local and global variables: If you want to modify a global variable from inside a function, you can use the super-assignment operator <<-. You should avoid doing this!
      > # define a function
      > f <- function (x) {
      +   cat("u=", u, "\n")   # u is read from the global environment
      +   u <<- u + 1          # this WILL affect the value of the variable outside f()
      +   cat("u=", u, "\n")
      + }
      >
      > u <- 2                 # define a variable
      > f(u)                   # execute the function
      u= 2
      u= 3
      >
      > cat("u=", u, "\n")     # print the value of the variable
      u= 3
      >

  23. function
      Call vector variables: Functions do not change their arguments.
      > # define a function
      > f <- function (x) {
      +   x <- 2
      +   print(x)
      + }
      >
      > x <- 3        # assign value to x
      > y <- f(x)     # call the function
      [1] 2
      >
      > print(x)      # print value of x
      [1] 3
      >

  24. function
      Call vector variables: If you want to change the value of the function's argument, reassign the return value to the argument.
      > # define a function
      > f <- function (x) {
      +   x <- 2
      +   print(x)
      + }
      >
      > x <- 3        # assign value to x
      > x <- f(x)     # call the function
      [1] 2
      >
      > print(x)      # print value of x
      [1] 2
      >

  25. function
      Finding the source code: You can view the source code of most R functions by printing the function name without parentheses (primitive functions show only a call into internal code).
      > # get the source code of lm() function
      > lm
      function (formula, data, subset, weights, na.action, method = "qr",
          model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
          contrasts = NULL, offset, ...)
      {
          ret.x <- x
          ret.y <- y
          cl <- match.call()
          . . .
          z
      }
      <environment: namespace:stats>
      >

  26. function
      Finding the source code: For generic functions there are many methods, depending on the type of the argument.
      > # get the source code of mean() function
      > mean
      function (x, ...)
      UseMethod("mean")
      <environment: namespace:base>
      >

  27. function
      Finding the source code: You can first explore the different methods and then choose the one you need.
      > # list the methods of the mean() function
      > methods("mean")
      [1] mean.Date       mean.POSIXct    mean.POSIXlt    mean.data.frame
      [5] mean.default    mean.difftime
      >
      > # get source code
      > mean.default
      function (x, trim = 0, na.rm = FALSE, ...)
      {
          if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
          . . .
      }
      <environment: namespace:base>

  28. apply
      apply(OBJECT, MARGIN, FUNCTION, ARGs)
      object: matrix, array or data frame
      margin: 1 for rows, 2 for columns, c(1,2) for both
      function: function to apply
      args: optional arguments passed to the function
      Description: Returns a vector, array or list of values obtained by applying a function to the margins of an array or matrix.
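As a small sketch of how the MARGIN and ARGs parameters interact, the following uses an illustrative 2 x 3 matrix; the variable names are chosen here and are not part of the slides.

```r
m <- matrix(1:6, nrow = 2)       # 2 x 3 matrix with columns (1,2), (3,4), (5,6)

row_sums <- apply(m, 1, sum)     # MARGIN = 1: one result per row
col_max  <- apply(m, 2, max)     # MARGIN = 2: one result per column

# extra arguments after the function are passed on to it
m_na <- m
m_na[1, 1] <- NA                 # introduce a missing value
col_mean <- apply(m_na, 2, mean, na.rm = TRUE)
```

Here row_sums is 9 12, col_max is 2 4 6, and col_mean is 2.0 3.5 5.5 because na.rm = TRUE drops the missing value before averaging.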

  29. apply
      Example: Create a matrix and apply different functions to its rows and columns.
      > # create 3x4 matrix
      > x <- matrix(1:12, nrow = 3, ncol = 4)
      > x
           [,1] [,2] [,3] [,4]
      [1,]    1    4    7   10
      [2,]    2    5    8   11
      [3,]    3    6    9   12
      >

  30. apply
      Example: Create a matrix and apply different functions to its rows and columns.
      > # create 3x4 matrix
      > x <- matrix(1:12, nrow = 3, ncol = 4)
      > # find median of each row
      > apply(x, 1, median)
      [1] 5.5 6.5 7.5
      >

  31. apply
      Example: Create a matrix and apply different functions to its rows and columns.
      > # create 3x4 matrix
      > x <- matrix(1:12, nrow = 3, ncol = 4)
      > # find mean of each column
      > apply(x, 2, mean)
      [1]  2  5  8 11
      >

  32. apply
      Example: Create a matrix and apply different functions to its rows and columns.
      > # create 3x4 matrix
      > x <- matrix(1:12, nrow = 3, ncol = 4)
      > # create a new matrix with values 0 or 1 for even and odd elements of x
      > apply(x, c(1,2), function (x) x %% 2)
           [,1] [,2] [,3] [,4]
      [1,]    1    0    1    0
      [2,]    0    1    0    1
      [3,]    1    0    1    0
      >

  33. lapply
      The lapply() function returns a list: lapply(X, FUN, ...)
      > # create a list
      > x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE, FALSE, FALSE))
      > # compute the mean of each list element
      > lapply(x, mean)
      $a
      [1] 5.5
      $beta
      [1] 4.535125
      $logic
      [1] 0.3333333
      >

  34. sapply
      The sapply() function returns a vector or a matrix: sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
      > # create a list
      > x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE, FALSE, FALSE))
      > # compute the mean of each list element
      > sapply(x, mean)
              a      beta     logic
      5.5000000 4.5351252 0.3333333
      >
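The slide mentions that sapply() can also return a matrix. A minimal sketch of that case, reusing the same list, is shown below; the result names r and l are chosen only for illustration.

```r
x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE, FALSE, FALSE))

# range() returns two values per element, so the results are bound into a 2 x 3 matrix
r <- sapply(x, range)
dim(r)

# simplify = FALSE switches the simplification off and keeps a list, as lapply() does
l <- sapply(x, mean, simplify = FALSE)
is.list(l)
```

Each column of r corresponds to one list element and holds its minimum and maximum.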

  35. code sourcing
      source("file", … )
      file: file with the source code to load (usually with extension .r)
      echo: if TRUE, each expression is printed after parsing, before evaluation.

  36. code sourcing
      Linux prompt:
      katana:~ % emacs foo_source.r
      Text editor:
      # dummy function
      foo <- function(x){
        x+1
      }
      R session:
      > # load foo_source.r source file
      > source("foo_source.r")
      > # create a vector
      > x <- c(3,5,7)
      > # call function
      > foo(x)
      [1] 4 6 8

  37. code sourcing
      > # load foo_source.r source file
      > source("foo_source.r", echo = TRUE)
      > # dummy function
      > foo <- function(x){
      +   x+1;
      + }
      > # create a vector
      > x <- c(3,5,7)
      > # call function
      > foo(x)
      [1] 4 6 8

  38. code sourcing
      Exercise:
      - write a function that computes the logarithm of the inverse of a number, log(1/x)
      - save it in a file with the .r extension
      - load it into your workspace
      - execute it
      - try executing it with the input vector (2, 1, 0, -1).
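One way to work through the exercise above in a single R session is sketched here. Writing the file with writeLines() to a temporary path is an assumption made to keep the example self-contained; in practice you would create the file in a text editor.

```r
# Write the function to a temporary .r file (stand-in for editing a file by hand)
script <- tempfile(fileext = ".r")
writeLines(c("# logarithm of the inverse of x",
             "inv_log <- function(x) {",
             "  log(1 / x)",
             "}"), script)

source(script)                   # load the function into the workspace
inv_log(2)                       # same value as -log(2)
res <- inv_log(c(2, 1, 0, -1))   # log(-1) yields NaN and a warning
res
unlink(script)                   # remove the temporary file
```

For the vector input, 1/0 gives Inf and log(Inf) is Inf, while log(-1) is not defined for real numbers, which is why R reports NaN with a warning.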

  39. debugging
      R includes debugging tools:
      cat() and print(): print out the values
      browser(): pause the code execution and "browse" the code
      debug(FUN): execute function line by line
      undebug(FUN): stop debugging the function

  40. debugging
      inv_log.r:
      # dummy function
      inv_log <- function(x){
        y <- 1/x
        browser()
        y <- log(y)
      }
      > # load inv_log.r source file
      > source("inv_log.r", echo = TRUE)
      > # dummy function
      > inv_log <- function(x){
      +   y <- 1/x;
      +   browser();
      +   y <- log(y);
      + }
      > inv_log(x)   # call function
      Called from: inv_log(x)
      Browse[1]> y   # check the values of local variables
      [1]  0.3333333  0.5000000  1.0000000        Inf -1.0000000

  41. debugging
      <RET>      Go to the next statement if the function is being debugged. Continue execution if the browser was invoked.
      c or cont  Continue execution without single stepping.
      n          Execute the next statement in the function. This works from the browser as well.
      where      Show the call stack.
      Q          Halt execution and jump to the top level immediately.
      To view the value of a variable whose name matches one of these commands, use the print() function, e.g. print(n).

  42. debugging
      inv_log.r:
      # dummy function
      inv_log <- function(x){
        y <- 1/x
        browser()
        y <- log(y)
      }
      > # load inv_log.r source file
      > source("inv_log.r", echo = TRUE)
      > # dummy function
      > inv_log <- function(x){
      +   y <- 1/x;
      +   browser();
      +   y <- log(y);
      + }
      > inv_log(x)   # call function
      Called from: inv_log(x)
      Browse[1]> y
      [1]  0.3333333  0.5000000  1.0000000        Inf -1.0000000
      Browse[1]> n
      debug: y <- log(y)
      Browse[2]>
      Warning message:
      In log(y) : NaNs produced
      >

  43. debugging
      inv_log.r:
      # dummy function
      inv_log <- function(x){
        y <- 1/x
        y <- log(y)
      }
      > # load inv_log.r source file
      > source("inv_log.r", echo = TRUE)
      > # dummy function
      > inv_log <- function(x){
      +   y <- 1/x;
      +   y <- log(y);
      + }
      > debug(inv_log)     # debug mode
      > inv_log(x)         # call function
      Called from: inv_log(x)
      debugging in: inv_log(x)
      debug: {
          y <- 1/x
          y <- log(y)
      }
      Browse[2]>
      . . .
      > undebug(inv_log)   # exit debugging mode

  44. timing
      Use the system.time() function to measure the time of execution.
      > # make a function
      > g <- function(x) {
      +   y = vector(length=x)
      +   for (i in 1:x) y[i]=i/(i+1)
      +   y
      + }

  45. timing
      Use the system.time() function to measure the time of execution.
      > # make a function
      > g <- function(x) {
      +   y = vector(length=x)
      +   for (i in 1:x) y[i]=i/(i+1)
      +   y
      + }
      > # execute the function, measuring the time of the execution
      > system.time( g(100000) )
         user  system elapsed
        0.107   0.002   0.109

  46. optimization How to speed up the code?

  47. optimization • How to speed up the code? • Use vectors !

  48. optimization
      How to speed up the code? Use vectors!
      > # using loops
      > g1 <- function(x) {
      +   y = vector(length=x)
      +   for (i in 1:x) y[i]=i/(i+1)
      +   y
      + }
      > # using vectors
      > x <- (1:100000)
      > g2 <- function(x) {
      +   x/(x+1)
      + }
      >

  49. optimization
      How to speed up the code? Use vectors!
      > # using loops
      > g1 <- function(x) {
      +   y = vector(length=x)
      +   for (i in 1:x) y[i]=i/(i+1)
      +   y
      + }
      > # execute the function
      > system.time( g1(100000) )
         user  system elapsed
        0.107   0.002   0.109
      > # using vectors
      > x <- (1:100000)
      > g2 <- function(x) {
      +   x/(x+1)
      + }
      > # execute the function
      > system.time( g2(x) )
         user  system elapsed
        0.002   0.000   0.003

  50. optimization • How to speed up the code? • Avoid dynamically expanding arrays
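The advice on this slide can be tried out with the following sketch. The function names grow() and prealloc() and the size n are assumptions made for illustration; timings depend on the machine and R version, so none are quoted here.

```r
n <- 10000   # illustrative size

# grows the result by one element per iteration, forcing repeated reallocation
grow <- function(n) {
  y <- c()
  for (i in 1:n) y <- c(y, i / (i + 1))
  y
}

# allocates the full vector once, then fills it in place
prealloc <- function(n) {
  y <- numeric(n)
  for (i in 1:n) y[i] <- i / (i + 1)
  y
}

system.time(grow(n))       # compare the elapsed times of the two versions
system.time(prealloc(n))
```

Both functions return the same values; only the way the result vector is built differs.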
