Writing functions in R
Some handy advice for creating your own functions
A quick review of R
R is a statistical software package and an object-oriented programming language.
Terms to remember:
Vectors, matrices, and data frames
Indices
Functions
Warm up
Download the data for lab 3.
In RStudio, go to Workspace → Import Dataset → From Text File.
Make sure to select the header option.
If you're not using RStudio, the code is:
    data_lab_3 <- read.csv("~/documents/classes/Psych 1950/mood.csv")
where ~ is shorthand for your home directory and the rest of the path should point to wherever you saved the file.
Warming up a little more
Use the help() function to read about the read.csv() function.
How could we use it to read in a file with no header?
    read.csv("filename", header=FALSE)
We can also use R to read in SPSS files, but for now we'll stick with read.csv().
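To see what the header option changes, here is a small sketch. It writes a made-up two-column file to a temporary location (the column names day1 and day2 are hypothetical, not the lab data) and reads it both ways.

```r
# Hypothetical two-column file written to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("day1,day2", "3,4", "5,NA"), tmp)

with_header <- read.csv(tmp)                  # header = TRUE is the default
no_header   <- read.csv(tmp, header = FALSE)  # first line is treated as data

names(with_header)  # "day1" "day2"
names(no_header)    # "V1" "V2"
nrow(with_header)   # 2
nrow(no_header)     # 3
```

With header=FALSE, the column names end up as the first row of data, so the columns are read as text rather than numbers.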
Last page of warm-up (I promise!)
Find the standard deviation (sd()) of the second column (puDay2call1) of your data frame.
Uh-oh! That output isn't helpful: sd() returns NA when any value in the column is missing.
Add the following argument to the standard deviation function: na.rm=TRUE
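A minimal sketch of the same problem on a made-up vector (the values in scores are hypothetical, standing in for the puDay2call1 column):

```r
scores <- c(2, 4, NA, 8)             # hypothetical data with one missing value

sd_raw   <- sd(scores)               # NA: the missing value propagates
sd_clean <- sd(scores, na.rm = TRUE) # missing values dropped first

sd_raw
sd_clean
```

On your own data frame, the column can be selected by position or by name, for example data_lab_3[[2]] or data_lab_3$puDay2call1.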
A slight modification
Suppose that we want to calculate the standard deviation using the population formula.
Check the help file for sd(). Is there a way to do that?
Nope! We'll need to make our own...
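For reference, the two formulas differ only in the denominator. The help page for sd() states that it uses denominator n - 1, which is the sample formula on the left. The symbols n (number of observations) and \bar{x} (the mean) are notation introduced here, not from the slides.

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}
```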
Making a function
Let's start with something easier: we'll make our own mean() function.
What should it do?
We'll pass* it a vector of numbers as an argument*.
It should return* the mean.
*programming jargon
The function syntax
    getMean <- function(arguments){
      # commands go here
    }
The name of the function is getMean() (this is usually a verb).
The arguments are the values and instructions we give to the function.
The body is where the work happens.
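As a sketch of those three parts in a complete function, here is a made-up example. The name rescale and its arguments are hypothetical and not part of the lab.

```r
# Name: rescale (a verb). Arguments: x and factor. Body: one multiplication.
rescale <- function(x, factor = 2){   # factor has a default value of 2
  return(x * factor)
}

rescale(c(1, 2, 3))               # 2 4 6   (uses the default factor)
rescale(c(1, 2, 3), factor = 10)  # 10 20 30 (argument given by name)
```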
Iteration 1
    getMean <- function(x){
      return(sum(x)/length(x))
    }
Try this on the second column.
How can we handle NAs in the function, assuming we ALWAYS want to remove them?
Iteration 2
    getMean <- function(x){
      return(sum(x, na.rm=TRUE)/length(x))
    }
Now try this one, and compare your results to R's built-in mean function.
Why aren't the values the same?
Hint: what's the length of a vector that contains NAs?
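Following the hint, a small sketch on a made-up vector (scores is hypothetical) shows where the two results part ways:

```r
getMean <- function(x){
  return(sum(x, na.rm=TRUE)/length(x))
}

scores <- c(1, 2, NA, 6)     # hypothetical data with one NA

length(scores)               # 4: the NA still counts toward the length
getMean(scores)              # 2.25, i.e. 9/4
mean(scores, na.rm=TRUE)     # 3, i.e. 9/3
```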
Iteration 3
    getMean <- function(x){
      return(sum(x, na.rm=TRUE)/length(na.omit(x)))
    }
Another R function saves the day! Thanks, R!
Compare your results to the built-in function.
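One way to do the comparison, sketched on a made-up vector (scores is hypothetical). all.equal() is used instead of == so that tiny floating-point differences don't cause a false alarm.

```r
getMean <- function(x){
  return(sum(x, na.rm=TRUE)/length(na.omit(x)))
}

scores <- c(1, 2, NA, 6)     # hypothetical data with one NA

getMean(scores)                                        # 3
all.equal(getMean(scores), mean(scores, na.rm=TRUE))   # TRUE

# An alternative way to count the non-missing values
sum(!is.na(scores))                                    # 3
```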
Another way to do it
We've been leaning heavily on the sum() function.
Sometimes, though, we need to tell R to do a certain operation a number of times.
To do that, we use a construct called a for loop.
There are other loops as well, but we'll stick with a for loop.
The anatomy of a for loop
    getFactorial <- function(number){
      j <- 1
      for (index in 1:number){
        j <- j*index
      }
      return(j)
    }
What will this function do?
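Once you've made your guess, a quick check against R's built-in factorial() is sketched below. The second function, getFactorialSafe, is a hypothetical variant added here to illustrate one edge case: 1:0 counts down through 1 and 0, so the original returns 0 for an input of 0, while seq_len(0) is empty and the loop is skipped.

```r
getFactorial <- function(number){
  j <- 1
  for (index in 1:number){
    j <- j*index
  }
  return(j)
}

getFactorial(5)   # 120
factorial(5)      # 120
getFactorial(0)   # 0, because 1:0 is the vector c(1, 0)

# Hypothetical variant: seq_len(0) is empty, so the loop body never runs
getFactorialSafe <- function(number){
  j <- 1
  for (index in seq_len(number)){
    j <- j*index
  }
  return(j)
}

getFactorialSafe(0)   # 1, matching factorial(0)
```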
One more concept
Sometimes, we need a function to make a decision.
Here, we use conditionals:
    if (condition){
      # if the condition is true, do this
    } else {
      # if it's false, do this instead
    }
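A minimal sketch of a function that makes a decision with if/else. The name describeNumber and its labels are made up for illustration.

```r
# Hypothetical example: choose a label depending on the sign of x
describeNumber <- function(x){
  if (x < 0){
    return("negative")            # runs when the condition is TRUE
  } else {
    return("zero or positive")    # runs when the condition is FALSE
  }
}

describeNumber(-3)   # "negative"
describeNumber(7)    # "zero or positive"
```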
For example
    if (!is.na(x)){   # if x isn't an NA, write x;
      print(x)        # if it is, nothing will happen
    }
    if (x <= 4){      # if x is less than or equal to 4
      print(x-1)
    }
    if (x == 5){      # if x is exactly 5
      print("Five")
    }
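These conditions expect x to be a single value; recent versions of R (4.2 and later) raise an error when if() is given a condition with more than one element. A sketch of applying the same checks to each element of a made-up vector (values and the label text are hypothetical):

```r
values <- c(3, NA, 5)        # hypothetical data
labels <- character(0)       # results are collected here

for (v in values){
  if (is.na(v)){             # check for NA first, since NA <= 4 is itself NA
    labels <- c(labels, "missing")
  } else if (v == 5){
    labels <- c(labels, "Five")
  } else if (v <= 4){
    labels <- c(labels, "four or less")
  } else {
    labels <- c(labels, "other")
  }
}

labels   # "four or less" "missing" "Five"
```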
Looping to get the mean
    getMean_3 <- function(x){
      total <- 0                    # named total and count so they don't
      count <- 0                    # mask R's sum() and length()
      for (i in seq_along(x)){
        if (!is.na(x[i])){          # exclude NAs
          total <- total + x[i]     # keep a running tally of the sum
          count <- count + 1        # and the length
        }
      }
      return(total/count)           # this is the mean
    }
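A quick check of the loop version against the built-in, sketched on a made-up vector (scores is hypothetical). The function is repeated here, with running totals named total and count, so the example runs on its own.

```r
getMean_3 <- function(x){
  total <- 0
  count <- 0
  for (i in seq_along(x)){
    if (!is.na(x[i])){          # skip missing values
      total <- total + x[i]
      count <- count + 1
    }
  }
  return(total/count)
}

scores <- c(1, 2, NA, 6)        # hypothetical data with one NA

getMean_3(scores)                                        # 3
all.equal(getMean_3(scores), mean(scores, na.rm=TRUE))   # TRUE
```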
Adding some complexity
It's your turn now:
Write two functions to compute the sum of squared deviations from the mean of a vector.
In one version, use the sum() function.
In the other, use a for loop.
Try to allow your function to work with a vector that includes some NAs.
Remember
The formula for the sum of squares of a set of numbers is $\sum_{i=1}^{n} (x_i - \bar{x})^2$, where $\bar{x}$ is mean(x).
Now make R do it for you!
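Try it yourself first. If you get stuck, here is one possible sketch of both versions; the names getSS_sum and getSS_loop and the test vector are made up, and other solutions are just as valid.

```r
# Version 1: vectorized, using sum()
getSS_sum <- function(x){
  x <- x[!is.na(x)]                 # drop missing values
  return(sum((x - mean(x))^2))
}

# Version 2: the same calculation with a for loop
getSS_loop <- function(x){
  x <- x[!is.na(x)]                 # drop missing values
  m <- mean(x)
  total <- 0
  for (value in x){
    total <- total + (value - m)^2  # running tally of squared deviations
  }
  return(total)
}

scores <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)   # hypothetical data, mean is 5

getSS_sum(scores)    # 32
getSS_loop(scores)   # 32
```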
Last of all
Make a new function that finds the (population) standard deviation of the vector.
Find the sum of squares, divide by the number of observations, and take the square root.
Test your function to make sure it's working.
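Again, try it on your own first. One possible sketch follows, under the assumption that NAs should always be removed; the name getPopSD and the test vector are hypothetical. A handy test is that the population value should equal sd() rescaled by sqrt((n - 1)/n), since sd() divides by n - 1.

```r
getPopSD <- function(x){
  x  <- x[!is.na(x)]              # drop missing values
  n  <- length(x)                 # number of observations
  ss <- sum((x - mean(x))^2)      # sum of squares
  return(sqrt(ss/n))              # divide by n, then take the square root
}

scores <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)   # hypothetical data

getPopSD(scores)   # 2

# Cross-check against the built-in sample formula
n <- sum(!is.na(scores))
all.equal(getPopSD(scores), sd(scores, na.rm=TRUE) * sqrt((n - 1)/n))   # TRUE
```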