Introduction to R - Lecture 4: Looping • Andrew Jaffe • 9/27/2010
Overview • Practice Review • The ‘for’ loop • Rationale • Syntax • Application • Getting creative…
Practice overview • Compute the average dog weight, dog length, and dog food consumption for each dog type at baseline
Practice Overview
    mean(dog_dat$dog_wt_mo1[dog_dat$dog_type == "lab"])
    mean(dog_dat$dog_wt_mo1[dog_dat$dog_type == "husky"])
    mean(dog_dat$dog_wt_mo1[dog_dat$dog_type == "poodle"])
    mean(dog_dat$dog_wt_mo1[dog_dat$dog_type == "retriever"])
    mean(dog_dat$dog_len_mo1[dog_dat$dog_type == "lab"])
    mean(dog_dat$dog_len_mo1[dog_dat$dog_type == "husky"])
    mean(dog_dat$dog_len_mo1[dog_dat$dog_type == "poodle"])
    mean(dog_dat$dog_len_mo1[dog_dat$dog_type == "retriever"])
    mean(dog_dat$dog_food_mo1[dog_dat$dog_type == "lab"])
    mean(dog_dat$dog_food_mo1[dog_dat$dog_type == "husky"])
    mean(dog_dat$dog_food_mo1[dog_dat$dog_type == "poodle"])
    mean(dog_dat$dog_food_mo1[dog_dat$dog_type == "retriever"])
Loop Rationale • Download “lec4_data.rda” from the website under Lecture 4 data • Load it into R • Remember – load(filename) • Check your workspace with ls()
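A minimal sketch of the load()/ls() workflow, for anyone who wants to try it without the course file. The object toy_dat and the temporary file path are stand-ins invented for this illustration; they are not part of lec4_data.rda.

```r
# Toy stand-in for the course data (hypothetical; the real file is lec4_data.rda)
toy_dat <- data.frame(id = 1:3, wt = c(40.5, 52.1, 47.3))

# Save the object to a temporary .rda file
rda_path <- tempfile(fileext = ".rda")
save(toy_dat, file = rda_path)

# Remove it from the workspace, then load it back from the file
rm(toy_dat)
loaded <- load(rda_path)   # load() restores the objects and returns their names

loaded                     # names of the objects that were restored
ls()                       # the workspace should list toy_dat again
dim(toy_dat)               # 3 rows, 2 columns
```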
Loop Rationale • What are the dimensions of the dataset?
    > dim(dog_dat)
    [1] 482  39
Loop Rationale • What are the variable names?
    > names(dog_dat)
     [1] "dog_id"        "owner_id"      "dog_type"
     [4] "dog_wt_mo1"    "dog_wt_mo2"    "dog_wt_mo3"
     [7] "dog_wt_mo4"    "dog_wt_mo5"    "dog_wt_mo6"
    [10] "dog_wt_mo7"    "dog_wt_mo8"    "dog_wt_mo9"
    [13] "dog_wt_mo10"   "dog_wt_mo11"   "dog_wt_mo12"
    [16] "dog_len_mo1"   "dog_len_mo2"   "dog_len_mo3"
    [19] "dog_len_mo4"   "dog_len_mo5"   "dog_len_mo6"
    [22] "dog_len_mo7"   "dog_len_mo8"   "dog_len_mo9"
    [25] "dog_len_mo10"  "dog_len_mo11"  "dog_len_mo12"
    [28] "dog_food_mo1"  "dog_food_mo2"  "dog_food_mo3"
    [31] "dog_food_mo4"  "dog_food_mo5"  "dog_food_mo6"
    [34] "dog_food_mo7"  "dog_food_mo8"  "dog_food_mo9"
    [37] "dog_food_mo10" "dog_food_mo11" "dog_food_mo12"
Loop Rationale • dog_wt_mo1-12: the dog’s weight at each of the 12 months • dog_len_mo1-12: the dog’s length at each of the 12 months • dog_food_mo1-12: the dog’s food consumption at each of the 12 months
Loop Rationale • Now, compute the average dog weight, dog length, and dog food consumption for each dog type at EVERY visit • That would be 36*4 = 144 lines of code (36 measurement columns times 4 dog types), all almost identical • Now let’s talk about the ‘for’ loop…
Syntax
    for(i in 1:10) {
      print(i)
    }
• i is the loop variable; 1:10 is the sequence it steps through
• Curly brackets enclose the body of a loop (or function). The body of the loop is everything between them.
Syntax
    > for(i in 1:10) {
    +   print(i)
    + }
    [1] 1
    [1] 2
    [1] 3
    [1] 4
    [1] 5
    [1] 6
    [1] 7
    [1] 8
    [1] 9
    [1] 10
• Sets i=1 and runs the loop body to the end
• Sets i=2 and runs the loop body to the end
• …
• Sets i=10 and runs the loop body to the end
Syntax • Another way to think about it: set i=1, and then just run the loop body
    > i=1
    > print(i)
    [1] 1
    > i=2
    > print(i)
    [1] 2
Syntax • Some notes/comments: • ‘i’ is a common variable for loops, but it can be anything: ‘x’, ‘names’, etc. • That variable gets set to each value of the loop sequence in turn, and is overwritten if it already exists • Run the ‘for’ loop above (with print), and then type i – it should equal 10
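A short sketch of the notes above: the loop variable overwrites an existing variable of the same name, keeps its last value after the loop ends, and can have any name. The dog_types vector and the variable name type are made up for illustration.

```r
i <- 100                      # an existing value of i
for (i in 1:10) {
  # empty body: we only care about what happens to i
}
i                             # the loop overwrote the old value with the last element

# The loop variable can have any name, and the sequence can be any vector
dog_types <- c("lab", "husky", "poodle", "retriever")
for (type in dog_types) {
  print(type)
}
type                          # holds the last element of dog_types
```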
Syntax
    > b = 0
    > for(i in 1:10) {
    +   b = b + i
    +   print(b)
    + }
    [1] 1
    [1] 3
    [1] 6
    [1] 10
    [1] 15
    [1] 21
    [1] 28
    [1] 36
    [1] 45
    [1] 55

    > b = 0
    > for(i in 1:10) {
    +   b = b + i
    + }
    > b
    [1] 55
• i=1: b = 0 + 1 = 1
• i=2: b = 1 + 2 = 3
• i=3: b = 3 + 3 = 6
• i=4: b = 6 + 4 = 10
• …
• i=10: b = 45 + 10 = 55
Syntax • We don’t just want to print stuff usually – we want to manipulate data and save it • Procedure: create a blank vector, then fill in that vector with a ‘for’ loop
Syntax • Guess what this is doing:
    b = 0
    b_vec = rep(0, 10)
    for(i in 1:10) {
      b = b + i
      b_vec[i] = b
    }

    > b_vec
     [1]  1  3  6 10 15 21 28 36 45 55
• We’re using the looping variable to index!
Syntax • That last loop, step by step: • Set b=0 and create a blank vector of length 10 • For i from 1 through 10, add i to the running sum b • At the end, b equals sum(1:10), i.e. 1 + 2 + … + 10 = 55 • At each iteration, store the current running sum in position i of b_vec
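As a check on the step-by-step description, the running-sum loop can be compared with two built-in functions, sum() and cumsum(). This is only a sanity check on the loop, not a replacement for it.

```r
b <- 0
b_vec <- rep(0, 10)           # blank vector to fill in
for (i in 1:10) {
  b <- b + i                  # running sum
  b_vec[i] <- b               # store it in position i
}

b == sum(1:10)                # the final running sum matches sum(1:10)
all(b_vec == cumsum(1:10))    # every stored value matches the cumulative sum
```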
Application • Let’s take a step forward, and calculate the average dog weight, dog length, and dog food consumption for all dogs at every visit • Instead of looping over a vector, we will loop over a matrix/data.frame
Application • Let’s try just dog weight first • We can loop over non-sequential variables (indices) in a dataset • Here, we want columns 4-15 of dog_dat, which correspond to the dogs’ weights at each month
Application • Looping over non-sequential elements is easy to do • However, you have to be careful when saving outputs of non-sequential elements
Application
    > Index = c(1,3,5,7,9)
    > out = rep(NA,5)
    > mat = matrix(rnorm(100), ncol = 10)
    > for(i in Index) {
    +   out[i] = mean(mat[,i])
    + }
    > out
    [1]  0.2230609         NA -0.2862340         NA
    [5]  0.3940720         NA -0.1284383         NA
    [9]  0.1291539
• Wrong – there’s missing data!
• Note that we want out[1:5] to correspond to mat[,c(1,3,5,7,9)]
Application
    Index = 4:15
    mean_wt <- rep(0, length(Index))
    for(i in 1:length(Index)) {
      ind = Index[i] # column index
      mean_wt[i] = mean(dog_dat[,ind])
    }
Application • Here, we are defining our column indices first, and creating a blank vector of that length • We then loop over each value of that column index, take the mean of the resulting vector, and store it in the blank vector • This allows us to store the mean of the fourth column in the first position of our output vector, the fifth column in the 2nd position, 6th column in the 3rd position, etc
Application • So, each time through the loop, we take the item in the i’th position of Index • The first time through the loop, i=1, and ind = Index[1] = 4 • The last time through, i=12, and ind = Index[12] = 15
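The same pattern fixes the earlier non-sequential example with c(1,3,5,7,9). A minimal sketch, assuming random data (the seed is arbitrary, so the actual means will differ from the ones shown on the earlier slide):

```r
set.seed(42)                          # arbitrary seed, only for reproducibility
Index <- c(1, 3, 5, 7, 9)             # non-sequential columns of interest
mat <- matrix(rnorm(100), ncol = 10)

out <- rep(NA, length(Index))         # blank vector, one slot per column of interest
for (i in 1:length(Index)) {
  ind <- Index[i]                     # which column of mat to use
  out[i] <- mean(mat[, ind])          # stored in position i, not position ind
}

length(out)                           # same length as Index
anyNA(out)                            # no gaps left in the output
```

Looping over positions 1 to length(Index) keeps the output the same length as Index, whatever the column numbers are.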
Application • Why not just loop using for(i in 4:15)? Aka for(i in Index)
    > Index = 4:15
    > mean_wt <- rep(0, length(Index))
    > for(i in Index) {
    +   mean_wt[i] = mean(dog_dat[,i])
    + }
    > mean_wt
     [1]  0.00000  0.00000  0.00000 49.69606 48.56680 48.91141
     [7] 50.13568 50.05124 49.54793 48.29378 46.41971 44.55975
    [13] 45.02490 44.18506 45.75394
• The result is too long: mean_wt now has 15 entries, but length(Index) = 12
• Positions 1-3 still hold their initial zeros, and the means sit in positions 4-15
Application • This does the same thing as the first way: i = 4 the first time through, so subtract 3 to save the result in position 1 of the output vector:
    Index = 4:15
    mean_wt <- rep(0, length(Index))
    for(i in Index) {
      mean_wt[(i-3)] = mean(dog_dat[,i])
    }
Application • I think it’s easier to define an index first, and then within the loop use each entry of that index (first way of doing it) • However, feel free to do it any way you want (however it makes the most sense to you)
Application • Note: R has several built-in commands that do what we just did: • rowSums() , colSums() • rowMeans(), colMeans() • We basically just did this using a loop: colMeans(dog_dat[,4:15])
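A sketch showing that the loop and colMeans() agree. Since dog_dat is not reproduced here, toy_dat below is a simulated stand-in with the same layout for its first 15 columns (3 id columns, then 12 monthly weights); the values and the number of dogs are invented.

```r
set.seed(2010)                               # arbitrary seed
n <- 20                                      # hypothetical number of dogs

# Simulated stand-in for dog_dat: 3 id columns, then 12 monthly weight columns
toy_dat <- data.frame(dog_id = 1:n,
                      owner_id = n:1,
                      dog_type = rep(c("lab", "husky"), each = n / 2))
wts <- matrix(rnorm(n * 12, mean = 50, sd = 5), ncol = 12)
colnames(wts) <- paste("dog_wt_mo", 1:12, sep = "")
toy_dat <- cbind(toy_dat, wts)

# The loop from the slides
Index <- 4:15
mean_wt <- rep(0, length(Index))
for (i in 1:length(Index)) {
  ind <- Index[i]
  mean_wt[i] <- mean(toy_dat[, ind])
}

# The built-in equivalent; unname() drops the column names before comparing
all.equal(mean_wt, unname(colMeans(toy_dat[, Index])))
```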
Creative • We still have two problems to solve: • Average of food, weight, and length at each visit • And then those averages for each dog type at each visit
Creative
    Index = 16:27
    mean_len <- rep(0, length(Index))
    for(i in 1:length(Index)) {
      ind = Index[i]
      mean_len[i] = mean(dog_dat[,ind])
    }

    Index = 28:39
    mean_food <- rep(0, length(Index))
    for(i in 1:length(Index)) {
      ind = Index[i]
      mean_food[i] = mean(dog_dat[,ind])
    }
Creative
    > dog_means = rbind(mean_wt, mean_len, mean_food)
    > colnames(dog_means) = paste("month",1:12,sep="_")
    > dog_means
               month_1  month_2  month_3  month_4  month_5
    mean_wt   49.69606 48.56680 48.91141 50.13568 50.05124
    mean_len  20.32427 20.57220 20.68838 20.89668 20.98050
    mean_food 30.01660 29.74834 28.75415 28.18942 29.50207
               month_6  month_7  month_8  month_9 month_10
    mean_wt   49.54793 48.29378 46.41971 44.55975 45.02490
    mean_len  21.26950 21.37178 21.50705 21.61141 21.80975
    mean_food 30.22573 30.88050 29.18942 30.01079 29.87033
              month_11 month_12
    mean_wt   44.18506 45.75394
    mean_len  21.97842 22.27822
    mean_food 29.51784 30.87614
Creative • paste: concatenates vectors after converting them to character – it’s great for creating names within ‘for’ loops, or for new matrices
    > paste("letter",c("a","b","c"), sep=":")
    [1] "letter:a" "letter:b" "letter:c"
    > x = c("a", "b", "c")
    > paste("letter",x, sep=":")
    [1] "letter:a" "letter:b" "letter:c"
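As a further illustration, paste can rebuild the column names seen in names(dog_dat). This is a small sketch; the variable names wt_names and month_labels are made up here, and paste0() is the base R shorthand for paste(..., sep = "").

```r
# Rebuild the twelve weight column names from a prefix and the month numbers
wt_names <- paste("dog_wt_mo", 1:12, sep = "")
head(wt_names, 3)

# paste0() is equivalent to paste() with sep = ""
identical(wt_names, paste0("dog_wt_mo", 1:12))

# paste inside a 'for' loop: build one label per iteration
month_labels <- rep("", 3)
for (m in 1:3) {
  month_labels[m] <- paste("month", m, sep = "_")
}
month_labels
```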
Creative
    Index = 4:15
    mean_wt <- rep(0, length(Index))
    lab = rep(0, length(Index))
    for(i in 1:length(Index)) {
      ind = Index[i]
      mean_wt[i] = mean(dog_dat[,ind])
      lab[i] = paste("the ", i, "th entry is ",
                     round(mean_wt[i],2),sep="")
    }

    > head(lab)
    [1] "the 1th entry is 49.7"  "the 2th entry is 48.57"
    [3] "the 3th entry is 48.91" "the 4th entry is 50.14"
    [5] "the 5th entry is 50.05" "the 6th entry is 49.55"
Creative • Now we get to solve #2 (using ‘for’ loops and colMeans) and store it in 3 matrices • First, make blank matrices • Then create for loops over our variables of interest
Creative
    dogs = unique(dog_dat$dog_type)
    wt = matrix(nrow = length(dogs), ncol = 12)
    for(i in 1:length(dogs)) { # 1:4
      # for each dog type...
      Index = which(dog_dat$dog_type == dogs[i])
      # specific weights for each dog type
      tmp = dog_dat[Index,4:15] # each row is for one dog
      wt[i,] = colMeans(tmp)
    }
Creative
    > rownames(wt) = dogs
    > colnames(wt) = paste("month",1:12,sep="_")
    > wt
               month_1  month_2  month_3  month_4  month_5  month_6
    lab       49.81840 48.69200 49.03360 50.26560 50.17600 49.67280
    poodle    49.40090 48.27297 48.61892 49.84414 49.76126 49.25856
    husky     49.26372 48.13097 48.48142 49.70088 49.61858 49.11327
    retriever 50.19474 49.06466 49.40602 50.62632 50.54361 50.04135
               month_7  month_8  month_9 month_10 month_11 month_12
    lab       48.41600 46.54640 44.68640 45.15040 44.30640 45.88240
    poodle    47.99820 46.12613 44.26577 44.73243 43.89009 45.46306
    husky     47.86195 45.98761 44.12832 44.59469 43.75221 45.31858
    retriever 48.79248 46.91278 45.05263 45.51654 44.68496 46.24586
Creative
    # same thing for length...
    len = matrix(nrow = length(dogs), ncol = 12)
    rownames(len) = dogs
    colnames(len) = paste("month",1:12,sep="_")
    for(i in 1:length(dogs)) {
      tmp = dog_dat[dog_dat$dog_type == dogs[i],16:27]
      len[i,] = colMeans(tmp)
    }

    # and for food.
    food = matrix(nrow = length(dogs), ncol = 12)
    rownames(food) = dogs
    colnames(food) = paste("month",1:12,sep="_")
    for(i in 1:length(dogs)) {
      tmp = dog_dat[dog_dat$dog_type == dogs[i],28:39]
      food[i,] = colMeans(tmp)
    }
Creative • Note that the code for each category (weight, length, and food) is still quite similar • Next week, double ‘for’ loops and lists
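One possible way to remove that repetition is sketched below, as a rough preview only and not necessarily how next week's lecture will do it. It wraps the per-type loop in an outer loop over the three measurement blocks and stores each matrix in a list. The data frame toy_dat is simulated, and the names starts and res are invented for this sketch; only the column layout (weights in 4-15, lengths in 16-27, food in 28-39) comes from the slides.

```r
set.seed(4)                                  # arbitrary seed
n <- 12                                      # hypothetical number of dogs

# Simulated stand-in for dog_dat: 3 id columns, then 36 measurement columns
toy_dat <- data.frame(dog_id = 1:n,
                      owner_id = 1:n,
                      dog_type = rep(c("lab", "husky", "poodle", "retriever"),
                                     times = n / 4))
toy_dat <- cbind(toy_dat, matrix(rnorm(n * 36, mean = 40, sd = 5), ncol = 36))

dogs <- unique(toy_dat$dog_type)
starts <- c(wt = 4, len = 16, food = 28)     # first column of each 12-month block
res <- list()                                # one matrix per measurement

for (v in names(starts)) {                   # outer loop: wt, len, food
  cols <- starts[[v]]:(starts[[v]] + 11)     # the 12 columns for this measurement
  m <- matrix(nrow = length(dogs), ncol = 12)
  rownames(m) <- dogs
  colnames(m) <- paste("month", 1:12, sep = "_")
  for (i in 1:length(dogs)) {                # inner loop: each dog type
    tmp <- toy_dat[toy_dat$dog_type == dogs[i], cols]
    m[i, ] <- colMeans(tmp)
  }
  res[[v]] <- m
}

names(res)                                   # one entry per measurement
dim(res$wt)                                  # dog types by months
```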