Data & Graphing: vectors, data frames, importing data, contingency tables, barplots. 18 September 2014, Sherubtse Training.
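Data CLASSES in R
• Vector: a single string (sequence) of data values, all of the same mode
• Factor: categorical data, stored as integer codes for a set of category levels (table() counts how often each level occurs)
• Matrix: 2D table of data, all of the same mode
• Array: table of data with more than two dimensions
• Data Frame: 2D table whose columns can have different data modes
• List: general structure for organizing all project data
Memory used by an object: object.size()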
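A minimal sketch, with arbitrary toy values, that builds one object of each class listed above and asks R which class it is. It also shows how a factor stores codes plus levels, and how object.size() reports memory use.

```r
# One toy object per data class (values are arbitrary)
v  <- c(1, 0, -5, 10, 300)                         # vector
f  <- factor(c("male", "female", "male"))           # factor
m  <- matrix(1:6, nrow = 2)                         # matrix: 2 rows x 3 columns
a  <- array(1:24, dim = c(2, 3, 4))                 # array: 3 dimensions
df <- data.frame(id = 1:3, sex = c("M", "F", "M"))  # data frame: mixed modes
l  <- list(v, f, df)                                # list: can hold anything

# class() reports the class; [1] keeps only the first entry,
# because in R >= 4.0 a matrix reports c("matrix", "array")
classes <- sapply(list(v, f, m, a, df, l), function(x) class(x)[1])
print(classes)

# A factor keeps integer codes that point into its (sorted) levels;
# table() gives the frequency of each level
print(levels(f))
print(as.integer(f))
print(table(f))

# Memory used by an object
print(object.size(df))
```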
Data MODES in R
• Character/String: letters and text in quotation marks
• Numeric/Integer: numbers
• Logical: TRUE, FALSE, T, F (must be capital letters, no quotes; converted to 1 and 0 for arithmetic)
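A short sketch of the three modes: mode() reports the mode, typeof() separates integers (written with an L suffix) from ordinary decimal numbers, and logical values behave as 1 and 0 inside sum() and mean().

```r
# mode() for each of the three modes
modes <- c(mode("dog"), mode(3.5), mode(TRUE))
print(modes)

# typeof() distinguishes the two kinds of numbers
types <- c(typeof(3.5), typeof(3L))   # 3L is an integer
print(types)

# Logical values become 1 (TRUE) and 0 (FALSE) in arithmetic
answer <- c(TRUE, FALSE, TRUE, TRUE)
n.true <- sum(answer)       # number of TRUE values
prop.true <- mean(answer)   # proportion of TRUE values
print(c(n.true, prop.true))
```

T and F are ordinary variables that can be reassigned (for example, T <- 0), while TRUE and FALSE are reserved words, so spelling them out is the safer habit.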
Data Classes: Vectors
VECTOR: a single string of data of the same "mode"
Examples: numeric or integer mode (spaces are for easy reading)
x <- c(1, 0, -5, 10, 300)
x <- c(2+2, 9-6, 5)
x <- c(2.5, 3.9, 0.7, 4.0)
Examples: logical mode
answer <- c(TRUE, FALSE, TRUE, TRUE)
answer <- c(T, F, T, T)
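Data Classes: Vectors
VECTOR: a single string of data of the same "mode"
Examples: character mode (single quotes also okay)
animals <- c("dog", "cat", "bird")
string <- c("a", "c", "d", "z", "p")
answer <- c("T", "F", "T", "T")
values <- c("-9", "0.2", "1.4")
Anything inside quotes is text, even if it looks like a logical value or a number. Type straight quotes in R; curly quotes copied from slides or word processors cause an error.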
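A small sketch of why the quotes matter: quoted numbers are stored as character and need as.numeric() before arithmetic, and because a vector holds a single mode, mixing modes in c() converts every element to the most flexible one (here, character).

```r
# Numbers typed inside quotes are stored as text
values <- c("-9", "0.2", "1.4")
print(mode(values))
nums <- as.numeric(values)   # convert the text to numbers
print(sum(nums))

# Mixing modes in one vector coerces everything to character
mixed <- c(1, "dog", TRUE)
print(mixed)
print(mode(mixed))
```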
Working with Vectors
Use subscripts to refer to elements of a vector:
x <- c(1, 0, -5, 10, 300)
x[vector_position]
x[3]            # -5
x[c(1, 4, 5)]   # 1 10 300
x[1:4]          # 1 0 -5 10
x[-2]           # 1 -5 10 300 (a negative subscript drops that element)
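Besides positions, a subscript can be a logical condition. This sketch uses the same x to keep only the positive values, find their positions with which(), and drop several elements at once with negative subscripts.

```r
x <- c(1, 0, -5, 10, 300)

# A logical condition as the subscript keeps elements where it is TRUE
pos <- x[x > 0]
# which() returns the positions where the condition is TRUE
idx <- which(x > 0)
# Several negative subscripts drop several elements
trimmed <- x[-c(1, 5)]

print(pos)
print(idx)
print(trimmed)
print(length(x))   # number of elements
```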
Working with Vectors
Edit the vector:
x <- c(1, 0, -5, 10, 300)
Append (add) data to the end of the vector:
x <- c(x, 400, 500, 700)   # NOTE: also try append()
# 1 0 -5 10 300 400 500 700
Change a single value in the vector:
x[6] <- 90
# 1 0 -5 10 300 90 500 700
Replace values > 100 with NA:
x[x > 100] <- NA   # or: x[which(x > 100)] <- NA; also try replace()
# 1 0 -5 10 NA 90 NA NA
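The slide suggests trying append() and replace(). A sketch of the same sequence of edits using those two base R functions; append() also accepts an after argument for inserting in the middle of a vector.

```r
x <- c(1, 0, -5, 10, 300)

# append() adds values to the end by default...
x <- append(x, c(400, 500, 700))
# ...or after a given position
y <- append(c(1, 2, 3), 99, after = 1)

x[6] <- 90   # change a single value

# replace(vector, positions, new values) returns a modified copy;
# a logical condition can select the positions
x <- replace(x, x > 100, NA)
print(x)
print(y)
```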
Importing Data
OPTION 1: Type data directly into R.
OPTION 2: Use job <- scan(what = "character") and paste in data copied from an Excel column. Import the ‘job’ column data (excluding the column heading) from the ‘Work’ tab in Excel, and assign it to the variable name ‘job’.
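When used interactively, scan() waits for pasted values and stops at an empty line. So that the example runs on its own, this sketch passes the "pasted" values through scan()'s text argument instead; the job entries are hypothetical, not the workshop data.

```r
# Hypothetical pasted column, one value per line
pasted <- "farmer
teacher
farmer
laborer"

# Interactively: job <- scan(what = "character"), paste, then press Enter
# on an empty line. Here 'text' stands in for the paste.
job <- scan(text = pasted, what = "character", quiet = TRUE)
print(job)
print(length(job))
```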
How might we graph these data? Here's a hint... table(job)
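A minimal sketch of the hint, using a short hypothetical job vector in place of the imported Excel column: table() counts each category (sorted alphabetically), and barplot() draws one bar per count.

```r
# Hypothetical job data (not the workshop data)
job <- c("farmer", "teacher", "farmer", "laborer", "farmer", "government")

job.table <- table(job)   # counts per category
print(job.table)

bp <- barplot(job.table, ylab = "Count")   # one bar per category
```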
For example, you can also just create a vector of counts with labels (the names become the bar labels), then make a barplot of the vector, or put the vector directly inside barplot():
job.count <- c("farmer" = 12, "government" = 2, "laborer" = 4, "teacher" = 2)
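Both routes from the slide, as a short sketch: the named job.count vector is plotted directly, and the same counts can be typed straight into barplot(). The y-axis label is an illustrative choice.

```r
# Named vector of counts (values from the slide)
job.count <- c("farmer" = 12, "government" = 2, "laborer" = 4, "teacher" = 2)

# Route 1: plot the saved vector; names become bar labels
barplot(job.count, ylab = "Count")

# Route 2: put the vector directly inside barplot()
bp <- barplot(c("farmer" = 12, "government" = 2, "laborer" = 4, "teacher" = 2))
```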
Importing Data
OPTION 3: Export the data as a csv- or tab-delimited text file, then import the text file into R.
Import the ‘HtWt’ dataset (notice how the data are arranged in Excel).
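A sketch of OPTION 3. With a real export you would pass a file path to read.csv() (comma-separated) or read.delim() (tab-delimited). So that the example runs on its own, read.csv()'s text argument stands in for the file. The file names, column names, and values below are hypothetical.

```r
# With a real exported file (hypothetical names):
#   HtWt <- read.csv("HtWt.csv")       # comma-separated
#   HtWt <- read.delim("HtWt.txt")     # tab-delimited
#   HtWt <- read.csv(file.choose())    # pick the file in a dialog box

# Stand-in for the file contents (made-up values, for illustration only)
csv <- "institute,sex,id,cm,kg
UWICE,M,1,170,68
UWICE,F,2,158,52
SFS,M,3,175,72
SFS,F,4,160,55"

HtWt <- read.csv(text = csv)   # first row is used as column headings
str(HtWt)
```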
Data Classes: Data Frames
DATA FRAMES: A data frame is similar to the data format used in SPSS: different columns can have different types (numeric, character, factor, etc.).
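A sketch of a data frame whose columns each have a different type, checked column by column with sapply(). Column names and values are hypothetical. stringsAsFactors = FALSE keeps text as character in every R version.

```r
# Each column can have its own type (hypothetical values)
df <- data.frame(
  institute = c("UWICE", "SFS", "SFS"),   # character
  sex = factor(c("F", "M", "F")),         # factor
  cm = c(158, 175, 160),                  # numeric
  staff = c(TRUE, FALSE, TRUE),           # logical
  stringsAsFactors = FALSE
)

col.classes <- sapply(df, class)   # class of each column
print(col.classes)
```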
Working with Data Frames
There are many ways to refer to the elements in data frames, but we will focus on just a few.
To access the height column:
HtWt$cm
HtWt["cm"]
HtWt[4]
(HtWt$cm returns a vector; HtWt["cm"] and HtWt[4] return a one-column data frame.)
Working with Data Frames
To access a row:
HtWt[5, ]
To access an element:
HtWt[5, 4]
HtWt[5, "cm"]
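A runnable sketch of the column, row, and element access shown above. It uses a small hypothetical HtWt with the height column "cm" in position 4; the id column and all values are made up.

```r
# Hypothetical HtWt: "cm" is column 4
HtWt <- data.frame(
  institute = c("UWICE", "UWICE", "SFS", "SFS", "UWICE"),
  sex = c("M", "F", "M", "F", "F"),
  id = 1:5,
  cm = c(170, 158, 175, 160, 155),
  stringsAsFactors = FALSE
)

h1 <- HtWt$cm       # numeric vector
h2 <- HtWt["cm"]    # one-column data frame, by name
h3 <- HtWt[4]       # the same one-column data frame, by position

row5  <- HtWt[5, ]       # row 5, all columns
elem  <- HtWt[5, 4]      # row 5, column 4
elem2 <- HtWt[5, "cm"]   # the same element, by column name
print(row5)
print(c(elem, elem2))
```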
HtWt Data
What kinds of interesting questions can we ask? What graphs would we make to answer them?
• Is there a difference in height between UWICE & SFS personnel? Does it differ for males vs. females?
• Is there a difference in weight between UWICE & SFS personnel? Does it differ for males vs. females?
• Is there a relationship between height and weight for UWICE personnel? How about for SFS personnel?
• Is there a relationship between height and weight for males? How about for females?
Bar Plots
For comparing COUNTS, PROPORTIONS (%) or MEANS of data in different qualitative categories. Often we make bar plots of summary data.
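A minimal sketch of a bar plot of summary data. tapply() computes one mean height per institute, and barplot() draws those means. The data frame is a hypothetical stand-in for HtWt.

```r
# Hypothetical stand-in for HtWt
HtWt <- data.frame(
  institute = c("UWICE", "UWICE", "SFS", "SFS"),
  cm = c(170, 160, 175, 165),
  stringsAsFactors = FALSE
)

# Summarize first: mean height per institute (groups sorted alphabetically)
mean.cm <- tapply(HtWt$cm, HtWt$institute, mean)
print(mean.cm)

# Then plot the summary
barplot(mean.cm, ylab = "Mean height (cm)")
```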
Working with Data Frames
Use the table() function to create a contingency table of sample counts by INSTITUTE and SEX:
table(HtWt$institute, HtWt$sex)
Try it also using with():
with(HtWt, table(institute, sex))
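A sketch comparing the two ways of building the contingency table, on a hypothetical stand-in for HtWt. with() evaluates the expression inside the data frame, so the columns can be named directly, and the table's dimensions are then labelled institute and sex.

```r
# Hypothetical stand-in for HtWt
HtWt <- data.frame(
  institute = c("UWICE", "UWICE", "SFS", "SFS", "UWICE", "SFS"),
  sex = c("M", "F", "M", "F", "F", "M"),
  stringsAsFactors = FALSE
)

tab1 <- table(HtWt$institute, HtWt$sex)     # columns named with $
tab2 <- with(HtWt, table(institute, sex))   # columns named directly
print(tab2)
```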
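Move the legend to the top center. ADD AS AN ARGUMENT to barplot():
args.legend = list(horiz = TRUE, x = "top")
(This only has an effect when the bar plot draws a legend, for example with legend.text = TRUE.)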
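A sketch of a grouped bar plot of the sex-by-institute counts with the legend at the top center. beside = TRUE places the bars side by side, legend.text = TRUE builds the legend from the table's row names, and args.legend passes its list on to legend(). The data are a hypothetical stand-in for HtWt.

```r
# Hypothetical stand-in for HtWt
HtWt <- data.frame(
  institute = c("UWICE", "UWICE", "SFS", "SFS", "UWICE", "SFS"),
  sex = c("M", "F", "M", "F", "F", "M"),
  stringsAsFactors = FALSE
)

tab <- with(HtWt, table(sex, institute))   # rows = sex, columns = institute

bp <- barplot(tab, beside = TRUE, legend.text = TRUE,
              args.legend = list(x = "top", horiz = TRUE),
              ylab = "Count")
```

If the legend overlaps the tallest bars, raising the upper y-axis limit with ylim leaves room for it.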
Working with Data Frames
Use the function subset() to create a new data frame called 'UWICE' that includes only UWICE data:
UWICE <- subset(HtWt, institute == "UWICE")
Now subset the HtWt data to get a data frame with only 'SFS' data and only the 'INSTITUTE' and 'SEX' columns. Call this data frame 'SFS.sex':
SFS.sex <- subset(HtWt, institute == "SFS", select = 1:2)
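A runnable sketch of both subset() calls on a hypothetical stand-in for HtWt with institute and sex as columns 1 and 2. It also shows that select accepts column names, which still works if the column order changes.

```r
# Hypothetical stand-in for HtWt (institute and sex are columns 1 and 2)
HtWt <- data.frame(
  institute = c("UWICE", "UWICE", "SFS", "SFS", "UWICE"),
  sex = c("M", "F", "M", "F", "F"),
  id = 1:5,
  cm = c(170, 158, 175, 160, 155),
  stringsAsFactors = FALSE
)

UWICE <- subset(HtWt, institute == "UWICE")                  # rows only
SFS.sex <- subset(HtWt, institute == "SFS", select = 1:2)    # rows and columns
# The same columns selected by name instead of position
SFS.sex2 <- subset(HtWt, institute == "SFS", select = c(institute, sex))
print(SFS.sex)
```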
Reshaping Data
1) Install & load the package reshape2
2) Import the Livestock data and save it to a variable called farms
3) Use the function acast() to reformat the farms data into matrix form for stacked barplots:
m.farms <- acast(farms, town ~ livestock)
4) Make a stacked barplot from m.farms
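A sketch of the same long-to-wide reshaping. The reshape2 call is shown in a comment. The runnable part swaps in base R's xtabs(), which builds the same town-by-livestock layout without installing a package. It assumes farms has one row per town and livestock type, with columns named town, livestock, and count. These names and all values are hypothetical.

```r
# Hypothetical long-format livestock data
farms <- data.frame(
  town = c("TownA", "TownA", "TownB", "TownB"),
  livestock = c("cattle", "goat", "cattle", "goat"),
  count = c(10, 5, 8, 12),
  stringsAsFactors = FALSE
)

# With reshape2 installed:
#   library(reshape2)
#   m.farms <- acast(farms, town ~ livestock, value.var = "count")

# Base R alternative: rows = towns, columns = livestock types
m.farms <- xtabs(count ~ town + livestock, data = farms)
print(m.farms)

# Stacked barplot: one bar per livestock type, stacked by town
barplot(m.farms, legend.text = TRUE, ylab = "Number of animals")
```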
Make this graph. Note that the y-axis values should run from 0 to 60.
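The target graph is not reproduced here, so this is only a sketch of the y-axis requirement. barplot()'s ylim argument fixes the axis range at 0 to 60. The matrix is a hypothetical stand-in for m.farms.

```r
# Hypothetical stand-in for m.farms (rows = towns, columns = livestock)
m.farms <- matrix(c(10, 8, 5, 12), nrow = 2,
                  dimnames = list(town = c("TownA", "TownB"),
                                  livestock = c("cattle", "goat")))

# ylim sets the y-axis range for the stacked bars
barplot(m.farms, ylim = c(0, 60), legend.text = TRUE,
        ylab = "Number of animals")
```

The upper limit must be at least as large as the tallest stacked total (colSums(m.farms)), or the tops of the bars are cut off.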