R objects • All R entities exist as objects • They can all be operated on as data • We will cover: • Vectors • Factors • Lists • Data frames • Tables • Indexing • R packages and datasets
Vectors
• Think of vectors as being equivalent to a single column of numbers in a spreadsheet
• You can create a vector using the c( ) function (concatenate) as follows: x <- c( )
• For example: x <- c(1,2,4,8) creates a column of the numbers 1,2,4,8
Vectors
Other ways of creating columns of numbers (vectors):
• The seq function
seq(1,10,1) = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
seq(1,4,0.5) = 1, 1.5, 2, 2.5, 3, 3.5, 4
• x:y
1:10 = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
2 * 1:10 = 2, 4, 6, 8, 10, 12, 14, 16, 18, 20
• The rep function
rep(2,4) = 2, 2, 2, 2
For help on these functions, type ?seq or ?rep
Indexing
Referencing (indexing) specific 'cells' in a column:
Example: if x is the vector 1, 2, 5 then
x[1] = 1, x[2] = 2, x[3] = 5
and
x[1:2] = 1, 2    first two listed items in x
x[2:3] = 2, 5    2nd & 3rd listed items in x
x[x>2] = 5    use of '>' and '<' characters to select items by value
Performing simple operations on vectors
• In R, when you carry out simple operations (+ - * /) on vectors that have the same number of entries, R just performs the normal operations on the numbers in the vectors, entry by entry
• If the vectors don’t have the same number of entries, then R recycles the shorter vector, repeating its entries until every entry of the longer vector has been used
Performing simple operations on vectors Examples:
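A minimal sketch of both cases, element-by-element operations and recycling. The vectors a, b and short are made up for illustration and are not from the original slides.

```r
# Two vectors with the same number of entries: operations work entry by entry
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30, 40)
sum_ab  <- a + b    # 11 22 33 44
prod_ab <- a * b    # 10 40 90 160

# A shorter vector is recycled: short is used as 1, 2, 1, 2
short <- c(1, 2)
recycled <- a + short   # 2 4 4 6

print(sum_ab)
print(prod_ab)
print(recycled)
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.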
Performing simple operations on vectors Vectors (columns of numbers) can be assigned by putting together other vectors, for example:
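One way this might look, as a sketch. The names y and z are illustrative; x is the vector used earlier in the slides.

```r
# Build new vectors by concatenating existing vectors and extra values
x <- c(1, 2, 4, 8)
y <- c(x, 16, 32)                       # x followed by two more numbers
z <- c(y, seq(0, 1, 0.5), rep(9, 2))    # mix in seq() and rep() results

print(y)   # 1 2 4 8 16 32
print(z)   # 1 2 4 8 16 32 0 0.5 1 9 9
```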
Functions • R functions take arguments (information that you put into the function which goes between the brackets) and can perform a range of tasks • In the case of the ‘help’ function the task is to display information from the R documentation files • A comprehensive list of R functions can be obtained from the R reference manual under the help menu
Simple statistic functions
R comes with some useful functions:
sqrt( )    square root
mean( )    arithmetic mean
hist( )    calculating & plotting histograms
R also comes with pre-loaded datasets, which we’ll discuss later.
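A short sketch of the three functions in use. The vector values is made up for illustration. The histogram is computed with plot = FALSE so that nothing is drawn; drop that argument to see the plot.

```r
values <- c(4, 9, 16, 25)

roots <- sqrt(values)   # square root of each entry: 2 3 4 5
avg   <- mean(values)   # arithmetic mean: 13.5

# hist() returns an object holding the bin breaks and counts
h <- hist(values, plot = FALSE)

print(roots)
print(avg)
print(h$counts)
```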
Basic statistic functions on vectors
> X1 <- c(1.1, 4.3, 5, 2, 1, 4, 9.5)
> sum(X1)       # sum = 26.9
> mean(X1)      # mean = 3.842857
> median(X1)    # median = 4
> var(X1)       # variance = 8.762857
> sd(X1)        # standard deviation = 2.960212
> summary(X1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   1.550   4.000   3.843   4.650   9.500
> quantile(X1)
   0%   25%   50%   75%  100%
 1.00  1.55  4.00  4.65  9.50
Mixing vectors and scalars
• R has the very convenient feature of having operators that work with vectors
• It is even possible to mix vectors and scalars
• For example:
> X1 <- c(1.1, 4.3, 5, 2, 1, 4, 9.5)
> X1 + 1
[1] 2.1 5.3 6.0 3.0 2.0 5.0 10.5
> X1 * 2
[1] 2.2 8.6 10.0 4.0 2.0 8.0 19.0
Vectors to record data
> x = c(45,43,46,48,51,46,50,47,46,45)
> length(x)
[1] 10
> x = c(x,48,49,51,50,49) # append values to x
> length(x)
[1] 15
> x[16] = 41 # add to a specified index
> length(x)
[1] 16
> mean(x)
[1] 47.1875
> x[17:20] = c(40,38,35,40) # add to many specified indices
> length(x)
[1] 20
> mean(x)
[1] 45.4
Factors • A factor is a vector that encodes information about the group to which a particular observation belongs • Categorical data is often used to classify data into various levels or factors • To make a factor is easy, using the factor function
Factors – smoking survey example
A survey asks people if they smoke or not. The data is: Yes, No, No, Yes, Yes
> x=c("Yes","No","No","Yes","Yes")
> x # print out values in x
[1] "Yes" "No" "No" "Yes" "Yes"
> factor(x) # print out value in factor(x)
[1] Yes No No Yes Yes
Levels: No Yes # notice levels are printed
Notice the difference in how R treats factors with this example
Factors – student height example
Suppose the recorded heights of South African and British students are as follows:
heights <- c(1.7,1.95,1.63,1.54,1.29)
You make a new vector, fac_heights, to record the nationality that each observation pertains to:
fac_heights <- factor(c("GB", "SA", "GB", "GB", "SA"))
This is useful when testing for differences between groups
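As a sketch of how the factor helps compare groups, tapply() can apply a function to the heights within each level of fac_heights. The name group_means is illustrative.

```r
heights <- c(1.7, 1.95, 1.63, 1.54, 1.29)
fac_heights <- factor(c("GB", "SA", "GB", "GB", "SA"))

# Mean height per nationality: one value for each level of the factor
group_means <- tapply(heights, fac_heights, mean)
print(group_means)

# Number of observations in each group
print(table(fac_heights))
```

A formal test for a difference between the groups would build on the same grouping.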
Factors – gender survey example
Consider a survey that has data on 691 females and 692 males
> gender <- c(rep("female",691), rep("male",692)) # create vector
> gender <- factor(gender) # change vector to factor
• Once stored as a factor, the space required for storage is reduced
• Values "female" and "male" are the levels of the factor
> levels(gender) # assumes gender is a factor
[1] "female" "male"
• Internally, the factor 'gender' is stored as 691 1's, followed by 692 2's. It has stored with it a table that maps each number to its level: 1 = female, 2 = male
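The internal coding can be inspected with as.integer(), as in this sketch. The name codes is illustrative.

```r
gender <- c(rep("female", 691), rep("male", 692))   # create vector
gender <- factor(gender)                             # change vector to factor

# The integer codes stored inside the factor
codes <- as.integer(gender)
print(table(codes))      # how many 1's and 2's

# The lookup table from code to label
print(levels(gender))    # position 1 is "female", position 2 is "male"
```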
Lists
A set of objects (e.g. vectors) can be combined under a single name as a list (similar to a spreadsheet in Excel)
Example:
x <- c(1, 7, 8, 9, 10)
y <- c("red", "yellow", "blue", "green")
example_list <- list(size = x, colour = y)
Note: vectors can consist of characters (i.e. letters/words) instead of numbers, but never numbers AND characters. If you mix them, R converts the numbers to characters
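A brief sketch of how the elements of example_list can be retrieved, plus a check of what happens when numbers and characters are mixed in one vector. The name mixed is illustrative.

```r
x <- c(1, 7, 8, 9, 10)
y <- c("red", "yellow", "blue", "green")
example_list <- list(size = x, colour = y)

print(example_list$size)             # the whole 'size' vector
print(example_list[["colour"]][2])   # second colour: "yellow"
print(length(example_list))          # number of elements in the list: 2

# Mixing numbers and characters: the number is converted to text
mixed <- c(1, "red")
print(class(mixed))                  # "character"
```

Unlike a data frame, the elements of a list may have different lengths (here 5 and 4).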
Data frames The function data.frame( ): • This is a special kind of list, in which the entries in a specific position in the elements of the list correspond to one another • Each element of the list has the same length • It is a rectangular table, with rows and columns
Data frames
Example 1:
• Simple data frames can be created
• Enter the following information at the prompt line:
h <- c(150, 170, 168, 179, 130)
w <- c(65, 70, 72, 80, 51)
patient_data <- data.frame(weight=w, height=h)
• Type in patient_data to see what’s just been created
Access of elements in data frames
• Individual elements can be accessed using a pair of square brackets "[ ]" and by specifying their index, or name
• Here are some ways to access a cell, row or column:
patient_data$height    accesses a column by name
patient_data[ , i]    accesses the ith column
patient_data[i, ]    accesses the ith row
patient_data$height[i]    i is the cell position in the height column
patient_data[i, j]    accesses the cell in the ith row and jth column
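The access patterns can be tried on the patient_data frame from Example 1, as in this sketch. The index values 2 and 3 are chosen arbitrarily.

```r
h <- c(150, 170, 168, 179, 130)
w <- c(65, 70, 72, 80, 51)
patient_data <- data.frame(weight = w, height = h)

print(patient_data$height)      # the height column
print(patient_data[, 2])        # the 2nd column (also height)
print(patient_data[3, ])        # the 3rd row: weight 72, height 168
print(patient_data$height[3])   # 3rd cell of the height column: 168
print(patient_data[3, 2])       # row 3, column 2: 168
```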
Data frames • More complex tables can be created • Data within each column must have the same type (e.g., number, text), but different columns may have different types – like a spreadsheet, as in the example:
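A minimal sketch of a data frame whose columns have different types. The data frame patients and its contents are hypothetical and are not taken from the original slides.

```r
# Hypothetical example data: each column has a single type,
# but the types differ from column to column
patients <- data.frame(
  name   = c("A", "B", "C"),        # text
  age    = c(34, 51, 28),           # numbers
  smoker = c(TRUE, FALSE, FALSE),   # logical values
  stringsAsFactors = FALSE          # keep text as character, not factor
)

print(patients)

# The type of each column
col_types <- sapply(patients, class)
print(col_types)
```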
Data frames Accessing specific cells, or data: Note: "$" is a shortcut; minus "-" sign means not.
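A sketch of both notations, reusing the patient_data frame from Example 1. The names without_first_row, without_weight and tall are illustrative.

```r
h <- c(150, 170, 168, 179, 130)
w <- c(65, 70, 72, 80, 51)
patient_data <- data.frame(weight = w, height = h)

# "$" is a shortcut for selecting a column by name
print(patient_data$weight)

# A minus sign means "not": drop the first row, or drop the first column
without_first_row <- patient_data[-1, ]
without_weight    <- patient_data[, -1, drop = FALSE]  # drop = FALSE keeps a data frame

# A condition inside the brackets selects matching rows
tall <- patient_data[patient_data$height > 160, ]

print(without_first_row)
print(without_weight)
print(tall)
```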
Tables • We often view categorical data with tables • The table function allows us to look at tables • Its simplest usage is table(x) where x is a categorical variable
Tables
Example: smoking survey
A survey asks people if they smoke or not. The data is: Yes, No, No, Yes, Yes
> x=c("Yes","No","No","Yes","Yes")
> table(x)
x
 No Yes
  2   3
The table command simply adds up the frequency of each unique value of the data
R packages and datasets
• View a list of R packages: library()
• Access datasets with the data function
data( )    provides a list of all the datasets
data(Titanic)    loads the Titanic dataset
summary(Titanic)    provides summary information about the Titanic dataset
attributes(Titanic)    provides more information
Titanic    typing the dataset name displays the data
• List all datasets in a package, e.g., data(package='datasets')
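A sketch of a first look at the Titanic dataset. Titanic is stored as a four-way table of counts rather than a data frame, so as.data.frame() is used here to get one row per combination. The names titanic_df and total are illustrative.

```r
data(Titanic)                 # load the dataset

print(class(Titanic))         # "table"
print(dim(Titanic))           # 4 2 2 2
print(names(dimnames(Titanic)))   # Class, Sex, Age, Survived

# Flatten the table: one row per combination, with a Freq column
titanic_df <- as.data.frame(Titanic)
print(head(titanic_df))

# Total number of people counted in the table
total <- sum(Titanic)
print(total)                  # 2201
```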
Working through some examples
• List preloaded datasets in R: data( )
• Display the "women" dataset: women
Now let’s access specific data:
• Access data from each column:
women$height or women[ ,1]
women$weight or women[ ,2]
• Access data from individual rows:
women[1, ] or women[10, ] etc.
• Try it!
Working through some examples Now that you can access sample data, let’s work with it: • Get the mean weight and height of the women in our example….. • Remember the help function: help(mean) • Also, R can show an example: example(mean)
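One possible solution, as a sketch. The names mean_height, mean_weight and col_means are illustrative; colMeans() is an alternative that handles both columns at once.

```r
data(women)   # load the dataset (15 rows: height and weight)

mean_height <- mean(women$height)   # 65
mean_weight <- mean(women$weight)

print(mean_height)
print(mean_weight)

# Both column means in a single call
col_means <- colMeans(women)
print(col_means)
```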
Common useful functions
print() # prints a single R object
cat() # prints multiple objects, one after the other
length() # number of elements in a vector, or of a list
mean()
median()
range()
unique() # gives the vector of distinct values
sort() # sort elements into order
order() # x[order(x)] orders elements of x
rev() # reverse the order of vector elements
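A quick sketch that runs several of these functions on one small vector. The vector x is made up for illustration.

```r
x <- c(3, 1, 2, 3, 1)

print(length(x))   # 5
print(range(x))    # smallest and largest value: 1 3
print(unique(x))   # distinct values in order of first appearance: 3 1 2
print(sort(x))     # 1 1 2 3 3
print(order(x))    # positions that would sort x: 2 5 3 1 4
print(rev(x))      # 1 3 2 1 3

# cat() prints several objects one after the other
cat("x has", length(x), "entries\n")
```

Note that x[order(x)] gives the same result as sort(x).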