Module 3: R Script Basics
Learning: Do, See & Hear, Read
PowerPoint must be in View Show mode to see videos and hyperlinks.
Module 3 R Script Basics: Goals
• Systematically start you on your R learning curve
• Introduce essential functions
• Demonstrate working R scripts
• Have you run and edit R scripts through assignments
• Provide building blocks for your own scripts
Essential Tasks and Functions Covered in This Module
You can get complete R documentation on each function from the R Console with help("function") or ?function. R is case-sensitive, so the function is help(), not Help().
For example:
  > help("for")
  > ?read.table
Module 3 R Scripts Menu
• R's File Path & Name Conventions
• How to Read a Data File
• Working with R Scripts
• Working with Vectors & data.frames
• Working with Dates
• Subsetting & Factors
Video 3-1: R's File Path
R's File Path and Name Conventions
Issue
• Windows and R handle forward (/) and backward (\) slashes differently
• Windows path: "C:\Learn_R\Mod_3_R_Scripts"
• R treats \ as an escape character
• You need to adjust paths to R's conventions
R's 2 valid path formats:
  "C:/Learn_R/Mod_3_R_Scripts" or
  "C:\\Learn_R\\Mod_3_R_Scripts"
R lets you use / or \\, but not a single \
Get the file path interactively
• R's choose.files() function (available in R for Windows) brings up the Select File window
• Lets users interactively select a data file
• Copy/paste the correctly formatted file name into your script
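A minimal sketch of the two valid path formats, using the example folder from this slide. The variable names (path_fwd, path_back, path_conv) are made up for illustration, and the folder does not need to exist for the code to run.

```r
# Two equivalent ways to write the same Windows path in an R script
path_fwd  <- "C:/Learn_R/Mod_3_R_Scripts"     # forward slashes
path_back <- "C:\\Learn_R\\Mod_3_R_Scripts"   # doubled (escaped) backslashes

# Each "\\" typed in the script is stored as ONE backslash character,
# so both strings have the same length
nchar(path_fwd) == nchar(path_back)

# Convert a backslash path to the forward-slash form
# (the pattern "\\\\" is a regular expression matching one literal backslash)
path_conv <- gsub("\\\\", "/", path_back)
identical(path_conv, path_fwd)
```

Either form can be passed to read.table(); the forward-slash form is usually easier to type and read.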
Assignment 3-1: choose.files()
• Start an R session
• In the R Console, open script: "C:/Learn_R/Mod_3_R_Script/Ex_Scr_3_1_choose_file.R"
• Save script as: "C:/Learn_R/Mod_3_R_Script/Practice_3_1_choose.R"
• Edit the script to read data file: "C:\\Learn_R\\Mod_3_R_Script_Basics\\Data_3_1_GISS_1980_By_year.txt"
Video 3-2: How to Read a Data File
How to Read a Data File (Txt, CSV, Web Based)
read.table()
• May be the single most frequent function you will use
• Goal is to read data from a source file and assign it to a data.frame
• Web-based files: simply specify the URL rather than a path
  link <- "http://….."
Example R Script:
  ## Ex_Scr_3_2_read_file.R ############################
  ## Script to read data file, list contents
  ## STEP 1: SETUP - Source File
  rm(list=ls())
  link <- "??"
  my_data <- read.table(link,
      skip = ?, sep = "?", dec = ".", row.names = NULL, header = ?,
      colClasses = c("??", "??"),
      comment.char = "#",
      na.strings = c("", "*", "-", -99.9, -999.9),
      col.names = c("??", "??"))
  my_data
• Tip:
• Use Notepad to look at the data file
• Print out the first few lines of the file
• Use the printout to fill in each ?? in the script
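To see how the read.table() arguments fit together, here is a small self-contained sketch. It does not use the course data files: it writes a few made-up lines to a temporary file (the values and the tmp name are assumptions for illustration) and then reads them back.

```r
# Hypothetical two-column, comma-separated data written to a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("# year, anomaly (made-up values)",
             "1980,0.20",
             "1981,-99.9",
             "1982,0.05"), tmp)

my_data <- read.table(tmp,
    skip = 0, sep = ",", dec = ".", header = FALSE,
    colClasses = c("numeric", "numeric"),
    comment.char = "#",                               # skip the comment line
    na.strings = c("", "*", "-", "-99.9", "-999.9"),  # codes read as NA
    col.names = c("yr", "anom"))
my_data   # the -99.9 entry is listed as NA
```

Looking at the first few lines of a file before reading it is what tells you the right values for sep, header, and na.strings.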
Assignment 3-2: Read Data File
• Start an R session
• In the R Console, open script: "C:/Learn_R/Mod_3_R_Script/Ex_Scr_3_2_read_file.R"
• Save script as: "C:/Learn_R/Mod_3_R_Script/Practice_3_2_read.R"
• Edit the script to read data file: "C:\\Learn_R\\Mod_3_R_Script_Basics\\Data_3_1_GISS_1980_by_year.txt"
Expected Result
  ## Practice_3_2_read_file.R ###################
  ## Script to read data file, list contents
  ## STEP 1: SETUP - Source File
  rm(list=ls())
  link <- choose.files()
  my_data <- read.table(link,
      skip = 0, sep = ",", dec = ".", row.names = NULL, header = FALSE,
      colClasses = c("numeric", "numeric"),
      comment.char = "#",
      na.strings = c("", "*", "-", -99.9, -999.9),
      col.names = c("yr", "anom"))
  my_data
Video 3-3: Working with R Scripts
Working with R Scripts
Let's look at the simple, structured R script below. Many of our R scripts will handle similar sets of tasks:
• Define source data file
• Read data & assign to data.frame
• Manipulate data
• Produce charts/reports
  ## Ex_Scr_3_3_work_w_scripts.R ###################
  ## Script to read file, produce plot
  ## STEP 1: SETUP & SOURCE FILE
  rm(list=ls()); par(las=1)
  link <- choose.files()
  ## STEP 2: READ DATA
  my_data <- read.table(link,
      sep = ",", dec = ".", skip = 0,
      row.names = NULL, header = TRUE,
      colClasses = c("numeric", "numeric"),
      na.strings = c("", "*", "-", -99.99, 99.9, 999.9),
      col.names = c("Var1", "Var2"))
  ## STEP 3: MANIPULATE DATA
  Title <- "Ex_scr_3_3.R Example Output\n Description of Data Set"
  ## STEP 4: CREATE PLOT
  plot(Var2 ~ Var1, data = my_data, type = "l", col = "red", main = Title)
• Things to Notice
• Extensive comments (#s)
• Delineation of steps
• Uses several arguments in the read.table() and plot() functions
• Indentation of arguments
• This script can be edited and reused for similar tasks
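The four-step structure can be tried without any course files. The sketch below is an assumption-laden variant of the script above: it swaps choose.files() for a temporary file holding made-up values, and it draws to a null PDF device (pdf(NULL)) so that it runs without opening a plot window.

```r
## STEP 1: SETUP & SOURCE FILE (made-up data written to a temporary file)
link <- tempfile(fileext = ".csv")
writeLines(c("Var1,Var2", "1,2.0", "2,2.5", "3,1.8", "4,3.1"), link)

## STEP 2: READ DATA
my_data <- read.table(link, sep = ",", dec = ".", header = TRUE,
    colClasses = c("numeric", "numeric"),
    col.names = c("Var1", "Var2"))

## STEP 3: MANIPULATE DATA
Title <- "Sketch Output\nMade-up Data Set"   # \n starts the second title line

## STEP 4: CREATE PLOT
pdf(NULL)        # null device: nothing is written to disk
par(las = 1)     # horizontal axis labels, as in the example script
plot(Var2 ~ Var1, data = my_data, type = "l", col = "red", main = Title)
invisible(dev.off())
```

In an interactive session you would drop the pdf(NULL) and dev.off() lines so the plot appears in the graphics window.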
Assignment 3-3: Edit R Script
Source data file: "C:\\Learn_R\\Mod_3_R_Script_Basics\\Data_3_2_CO2_by_month.txt"
• Go to Desktop
• Press R shortcut
• In R GUI, File > Open, select C:\Learn_R\Mod_3_R_Script_Basics\Ex_Scr_3_3_work_w_scripts.R
• Save as: C:\Learn_R\Mod_3_R_Script_Basics\Practice_3_3_work_w_script.R
• Edit the Practice_3_3_work_w_script.R file
• Edit comment at top
• Edit col.names: c("Yr_frac", "CO2")
• Edit Title line 2: "Monthly CO2 (ppmv) Mauna Loa, Hawaii"
• Change col = "blue"
• Save changes to Practice_3_3_…
• Ctrl+A, then Ctrl+R to run
Script to edit:
  ## Ex_Scr_3_3_work_w_scripts.R ###################
  ## Script to read file, produce plot
  ## STEP 1: SETUP & SOURCE FILE
  par(las=1)
  link <- choose.files()
  ## STEP 2: READ DATA
  my_data <- read.table(link,
      sep = ",", dec = ".", skip = 0,
      row.names = NULL, header = TRUE,
      colClasses = c("numeric", "numeric"),
      na.strings = c("", "*", "-", -99.99, 99.9, 999.9),
      col.names = c("Var1", "Var2"))
  ## STEP 3: MANIPULATE DATA
  Title <- "Ex_scr_3_3.R Example Output\nDescription of Data Set"
  ## STEP 4: CREATE PLOT
  plot(Var2 ~ Var1, data = my_data, type = "l", col = "red", main = Title)
Assignment 3-3: Expected Result
Practice_3_3_work_w_scripts.R script
  ## Practice_3_3_work_w_scripts.R ###############
  ## Script to read file, produce plot
  ## STEP 1: SETUP - Source File
  par(las=1)
  link <- choose.files()
  ## STEP 2: READ DATA
  my_data <- read.table(link,
      sep = ",", dec = ".", skip = 0,
      row.names = NULL, header = TRUE,
      colClasses = c("numeric", "numeric"),
      na.strings = c("", "*", "-", -99.99, 99.9, 999.9),
      col.names = c("Yr_frac", "CO2"))
  ## STEP 3: MANIPULATE DATA
  Title <- "Practice_3_3_work_w_scripts.R Example Output\nMonthly CO2 (ppmv) Mauna Loa, Hawaii"
  ## STEP 4: CREATE PLOT
  plot(CO2 ~ Yr_frac, data = my_data, type = "l", col = "blue", main = Title)
Video 3-4: Vectors and data.frames
Vectors and data.frames: What You Need to Know
Vector data types
• Numeric (2.67)
• Character ("John Smith")
• Logical (TRUE)
• Factor ("Male")
• All items in a vector must be the same data type
• R will coerce all vector items to a single type
Vector names
• data.frame$col.name
• data.frame[[column number]] (single brackets, data.frame[column number], return a one-column data.frame rather than the vector itself)
data.frame & vector indexes [ ]
• df[c] - column number in data.frame
• df[r,c] - row & column in data.frame
• v[i] - item number in vector
Other points
• Calculated variables are vectors
• Vectors are dynamic
• Number of rows in a data.frame must be the same for each vector
• nrow() function counts the number of data rows in a data.frame
• length() function counts the number of items in a vector
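A short sketch of the points above, using a made-up data.frame (the names mixed and df and all values are assumptions for illustration).

```r
# Mixing types in one vector: R coerces every item to a single type
mixed <- c(2.67, "John Smith", TRUE)
class(mixed)          # "character"

# A small made-up data.frame with two numeric vectors (columns)
df <- data.frame(yr = c(1980, 1981, 1982), anom = c(0.2, 0.3, 0.1))

df$anom               # column as a vector, by name
df[[2]]               # same column as a vector, by number
df[2]                 # single brackets: a one-column data.frame
df[3, 2]              # row 3, column 2
df$anom[3]            # item 3 of the anom vector

nrow(df)              # number of data rows in the data.frame
length(df$anom)       # number of items in the vector
```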
Functions that Create Vectors: c(), seq(), rep()
In addition to the read.table() function
• c() – "combine" function
  my_animals <- c("dog", "cat", "rabbit")
  my_num <- c(1, 8, 11.2, 13, 6, 19.13)
• seq() – "sequence" function – uniformly spaced series
• my_numbers <- seq(a, b, inc)
• a – start value
• b – end value
• inc – increment; 1 is default
• num <- seq(3, 17, 2) # (3,5,7,9,11,13,15,17)
• uniform <- 5:9 # (5,6,7,8,9)
• rep() – "replicate" function
• my_repeat_num <- rep(q, n)
• q – number or character to be replicated
• n – number of replications
• my_rep <- rep("abc", 3) # ("abc","abc","abc")
3 ways to enter vector items:
Itemize
  var_type <- c("character", "numeric", "numeric", "logical", "numeric", "numeric")
or combine c() & rep()
  var_type <- c("character", rep("numeric", 2), "logical", rep("numeric", 2))
or combine c() & seq()
  x <- c(seq(1, 10, 2), 11, 14, 18, 19)
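The three entry styles can be checked against each other. This sketch reuses the slide's values; the names var_type_a, var_type_b, and same_types are introduced only for the comparison.

```r
# 1. Itemize every element with c()
var_type_a <- c("character", "numeric", "numeric", "logical", "numeric", "numeric")

# 2. Combine c() and rep() to avoid retyping repeated items
var_type_b <- c("character", rep("numeric", 2), "logical", rep("numeric", 2))
same_types <- identical(var_type_a, var_type_b)   # TRUE: same six items

# 3. Combine c() and seq(): a regular series followed by irregular values
x <- c(seq(1, 10, 2), 11, 14, 18, 19)   # 1 3 5 7 9 11 14 18 19

num     <- seq(3, 17, 2)    # 3 5 7 9 11 13 15 17
uniform <- 5:9              # 5 6 7 8 9
my_rep  <- rep("abc", 3)    # "abc" "abc" "abc"
```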
How to Make Basic Vector Calculations: sum(), max(), min(), mean(), median()
If your data may have missing values, use na.rm = TRUE. You must remove NAs to get a valid answer.
  max(vector, na.rm = TRUE)
  min(vector, na.rm = TRUE)
  mean(vector, na.rm = TRUE)
  median(vector, na.rm = TRUE)
  sum(vector, na.rm = TRUE)
• summary() prints quartiles, mean, min, max
• summary(data.frame) prints a summary for each column
• quantile(x, 0.9) finds the 90th percentile
• rnorm(n1, m, d) generates n1 random numbers with mean m and standard deviation d
Example
  > r <- rnorm(10, 100, 5) # creates vector with 10 random numbers, mean = 100, sd = 5
  > r_mean <- mean(r) # calculate mean of vector r
  > r_mean # output r_mean to console
  [1] 98.16317
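A small sketch of why na.rm matters, with a made-up vector v containing one missing value.

```r
v <- c(12, 7, NA, 20)          # made-up values with one missing item

mean(v)                        # NA: a single missing value spoils the result
v_mean   <- mean(v, na.rm = TRUE)     # 13
v_max    <- max(v, na.rm = TRUE)      # 20
v_min    <- min(v, na.rm = TRUE)      # 7
v_median <- median(v, na.rm = TRUE)   # 12
v_sum    <- sum(v, na.rm = TRUE)      # 39

# 90th percentile of the integers 1 to 10 (default quantile method)
q90 <- quantile(1:10, 0.9)     # 9.1
```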
which() Returns Rows with a Specific Value in a Vector
which(x == ??) returns the index of the item(s) of vector x that meet the criterion ??
  # Find index of max() value
  vals <- c(1, 3, 2, 68, 11, 13, 19, 8, 49, 4)
  my_max <- max(vals)
  which_val <- which(vals == my_max)
  cat(c("Max = ", my_max, "val #", which_val), fill = 30)
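When the criterion matches more than one item, which() returns every matching index. A brief sketch with made-up values; which.max() is a base R function shown here for comparison, not part of the slide.

```r
vals <- c(4, 9, 2, 9, 5)                # the maximum, 9, appears twice

all_max   <- which(vals == max(vals))   # 2 4: every position holding the max
first_max <- which.max(vals)            # 2: only the first position
above_4   <- which(vals > 4)            # 2 4 5: any logical test works
vals[above_4]                           # 9 9 5: use the indexes to pull the items
```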
attach() data.frame
• To use a vector in a data.frame, you must include the data.frame name:
• data.frame$col.name or
• data.frame[[column number]] or
• use attach(data.frame), which adds the data.frame to R's search path
• Vectors in an attached data.frame can be accessed by name
• Saves having to use data.frame$ before the vector name
• detach(data.frame): good idea to remove it from the search path when done
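A minimal sketch of attach() and detach(), assuming a made-up data.frame named temps. The search() function lists what is currently on R's search path.

```r
temps <- data.frame(yr = c(1980, 1981, 1982), anom = c(0.20, 0.26, 0.05))

mean(temps$anom)                     # without attach(): data.frame name needed

attach(temps)                        # add temps to the search path
on_path <- "temps" %in% search()     # TRUE while attached
attached_mean <- mean(anom)          # column found by name alone
detach(temps)                        # remove it again when done
off_path <- !("temps" %in% search()) # TRUE after detach()
```

Note that attach() works on a copy: changes made to temps after attaching are not seen through the bare column names.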
Assignment 3-4: Working with a Vector
• Start with a new script file
• Save as: C:\Learn_R\Mod_3_R_Script_Basics\Practice_3_4_vectors.R
• Create vals vector: c(1,3,5,7,21,4,12.2,19.12,21)
• Make these calculations:
  summary(vals)
  length(vals)
  mean(vals)
  which(vals==max(vals))
Video 3-5: Working with Dates
Working with Dates: What You Need to Know
R, like Excel, treats dates in a special way!
• R stores dates as the number of days from Jan. 1, 1970
• Before 1/1/70: negative
• After 1/1/70: positive
• Read dates as a "character" vector
• Use as.Date() to convert to a date vector
  my_date <- as.Date(char_v, "%m/%d/%y")
• Input dates must include day, month, and year, in any order
• as.Date(char_v, "%m/%d/%y") specifies how the dates are organized
• %d - day of month (1-31)
• %m - month number (1-12)
• %b - month abbreviation (Jan)
• %B - full month name (January)
• %y - 2 digit year (08)
• %Y - 4 digit year (2008)
Be sure to specify any delimiters in the dates: / , - *
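A brief sketch of as.Date() with numeric format codes; the date strings are made up. The month-name codes (%b, %B) are left out here because their matching depends on the computer's language settings.

```r
# Character dates: month/day/4-digit year
char_v  <- c("01/15/2008", "12/31/1969", "01/01/1970")
my_date <- as.Date(char_v, "%m/%d/%Y")
my_date                        # "2008-01-15" "1969-12-31" "1970-01-01"

# Underlying values: days counted from Jan. 1, 1970
day_count <- as.numeric(my_date)   # first is positive, then -1 and 0

# The same day written two other ways; the format string must match
# the order of the parts and the delimiter used
short  <- as.Date("01/15/08", "%m/%d/%y")     # 2 digit year
dashed <- as.Date("15-01-2008", "%d-%m-%Y")   # day first, "-" delimiter
```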
Reading Date Character Data, Converting to R Dates
Data file example: dates are character strings.
  ## Script to demonstrate character date input & conversion to R date
  ## STEP 1: SETUP - Source File
  link <- "C:\\Learn_R\\Mod_3_R_Script_Basics\\Data_3_3_GISS_by_month.txt"
  ## STEP 2: READ DATA
  my_data <- read.table(link,
      sep = ",", dec = ".", skip = 1,
      row.names = NULL, header = FALSE,
      colClasses = c("character", "numeric", "factor"),
      na.strings = c("", "*", "-", -99.99, 99.9, 999.9),
      col.names = c("char_date", "T_anom", "Enso_f"))
  ## STEP 3: Convert character dates to R dates, then get month values
  r_date <- as.Date(my_data$char_date, "%m/%d/%Y")
  r_mo <- months(r_date)
  ## STEP 4: New data.frame - add r_date & r_mo vectors
  my_data_1 <- data.frame(my_data, r_date, r_mo)
  attach(my_data_1)
  head(my_data_1)
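Steps 3 and 4 can be tried without the data file. In this sketch the data.frame is typed in directly with made-up values (it is not the GISS data), and the Enso_f column is omitted for brevity.

```r
# Made-up stand-in for the data read in STEP 2
my_data <- data.frame(char_date = c("01/15/1998", "02/15/1998", "03/15/1998"),
                      T_anom = c(0.58, 0.86, 0.61),
                      stringsAsFactors = FALSE)

## STEP 3: Convert character dates to R dates, then get month values
r_date <- as.Date(my_data$char_date, "%m/%d/%Y")
r_mo   <- months(r_date)    # full month names, in the computer's language

## STEP 4: New data.frame with r_date & r_mo added as columns
my_data_1 <- data.frame(my_data, r_date, r_mo)
head(my_data_1)
```

The original character column is kept alongside the converted dates, which makes it easy to check the conversion row by row.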
Assignment 3-5: Working with Dates
• Start an R session
• In the R Console, open script: "C:/Learn_R/Mod_3_R_Script/Ex_Scr_3_5_Date_conv.R"
• Run the script to read data file: "C:\\Learn_R\\Mod_3_R_Script_Basics\\Data_3_3_GISS_by_month.txt"
Things to Notice
• Creation of new data.frame
• Use of attach() function
• Use of as.Date()
• Use of head()
Print out and read the R documentation for as.Date() & months()
  ?as.Date
  ?months
Video 3-6: Functions to Subset & Summarize Data
subset(): Subset Data and Calculate Summary Values
R lets you quickly define subsets of data and calculate summary statistics for the subset.
Goal: calculate the average temperature anomaly for the 1930s decade
• subset() function
• dec_subset <- subset(df, vector == ??)
• Approach:
• Calculate decade for each row
• Subset rows with decade == 1930
• Calculate average for subset
  which_decade <- 1930
  decade <- as.integer(my_data$yr/10)*10
  my_data <- data.frame(my_data, decade)
  attach(my_data)
  decade_subset <- subset(my_data, decade == which_decade)
  decade_avg <- mean(decade_subset$anom)
  cbind(which_decade, decade_avg)
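A self-contained sketch of the approach, using a small made-up data.frame in place of the course data file (the yr and anom values are assumptions for illustration).

```r
# Made-up yearly anomalies spanning three decades
my_data <- data.frame(yr   = c(1929, 1930, 1935, 1939, 1940),
                      anom = c(-0.3, -0.1, -0.2, 0.0, 0.1))

which_decade <- 1930

# Calculate decade for each row: 1935 / 10 = 193.5 -> 193 -> 1930
my_data$decade <- as.integer(my_data$yr / 10) * 10

# Keep only the rows for the chosen decade, then average them
decade_subset <- subset(my_data, decade == which_decade)
decade_avg    <- mean(decade_subset$anom)    # (-0.1 - 0.2 + 0.0) / 3 = -0.1

cbind(which_decade, decade_avg)
```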
for(i in a:b) { }: How to Use a for Loop & subset()
What if we want the average for each decade?
• Combine for loop & subset()
• for (i in a:b) { subset(df, vector == ??) }
• Approach:
• Calculate decade for each row
• Subset rows by decade
• Calculate average for each decade subset
  ## STEP 3: CALC DECADE MEANS
  decade <- as.integer(my_data$yr/10)*10
  my_data <- data.frame(my_data, decade)
  attach(my_data)
  dec_list <- seq(1880, 2000, 10)
  num_dec <- length(dec_list)
  dec_avg <- numeric(num_dec)   # one slot per decade
  for(i in 1:num_dec){
      dec_subset <- subset(my_data, decade == dec_list[i])
      dec_avg[i] <- mean(dec_subset$anom, na.rm = TRUE)
  }
  cbind(dec_list, dec_avg)
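The loop can be run on a small made-up data.frame to see what each pass does. The values below are assumptions for illustration, not the course data.

```r
# Made-up yearly anomalies spanning three decades
my_data <- data.frame(yr   = c(1929, 1930, 1935, 1939, 1940, 1948),
                      anom = c(-0.3, -0.1, -0.2, 0.0, 0.1, 0.3))
my_data$decade <- as.integer(my_data$yr / 10) * 10

dec_list <- seq(1920, 1940, 10)   # decades to summarize
num_dec  <- length(dec_list)
dec_avg  <- numeric(num_dec)      # result vector, one slot per decade

for (i in 1:num_dec) {
  # Rows belonging to the i-th decade
  dec_subset <- subset(my_data, decade == dec_list[i])
  # Store that decade's average in the matching slot
  dec_avg[i] <- mean(dec_subset$anom, na.rm = TRUE)
}

cbind(dec_list, dec_avg)          # averages: -0.3, -0.1, 0.2
```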
tapply(): How to Summarize Data by Factor
Another way to get the average for all decades?
1. Create decade_f factor
   as.factor(decade)
2. Summarize by decade_f
   tapply(x, INDEX, FUN)
   Applies FUNction (mean, max, etc.) to the values of x within each level of factor INDEX
  ## STEP 3: CALC DECADE MEANS
  decade <- as.integer(my_data$yr/10)*10
  decade_f <- as.factor(decade)
  my_data <- data.frame(my_data, decade_f)
  attach(my_data)
  dec_avg <- tapply(anom, INDEX = decade_f, mean)
  cbind(dec_avg)
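A sketch of the tapply() approach on a small made-up data.frame (values are assumptions for illustration). It produces one mean per decade in a single call, with no loop.

```r
# Made-up yearly anomalies spanning three decades
my_data <- data.frame(yr   = c(1929, 1930, 1935, 1939, 1940, 1948),
                      anom = c(-0.3, -0.1, -0.2, 0.0, 0.1, 0.3))

# 1. Create the decade factor
decade   <- as.integer(my_data$yr / 10) * 10
decade_f <- as.factor(decade)
levels(decade_f)                  # "1920" "1930" "1940"

# 2. Apply mean() to anom within each level of decade_f
dec_avg <- tapply(my_data$anom, INDEX = decade_f, mean)

cbind(dec_avg)                    # rows are labeled with the decade
```

The result is named by factor level, so dec_avg["1930"] pulls out a single decade's average.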
Assignment 3-6: subset() & mean()
• Start an R session
• In the R Console, open script: "C:/Learn_R/Mod_3_R_Script/Ex_Scr_3_6_subset_data_mean.R"
• Run the script to read data file: "C:\\Learn_R\\Mod_3_R_Script_Basics\\Data_3_4_GISS_By_year.csv"
• Edit which_decade to 1940 & rerun the script
Print out and read the R documentation for subset() & mean()
  ?subset
  ?mean
Assignment 3-7: as.factor() & tapply()
• Start an R session
• In the R Console, open script: "C:/Learn_R/Mod_3_R_Script/Ex_Scr_3_8_factor_tapply.R"
• Run the script to read data file: "C:\\Learn_R\\Mod_3_R_Script_Basics\\Data_3_4_GISS_By_year.csv"
Things to Notice
• Creation of new data.frame
• Use of attach() function
• Use of as.factor()
• Use of tapply()
Print out and read the R documentation for as.factor() & tapply()
  ?factor
  ?tapply