Statistical Software An introduction to Statistics Using R Instructed by Jinzhu Jia
Chap 1. R Basics • Installing R • R Data Structures • Vectors • Matrices and Arrays • Lists • Data Frames • Factors • Objects
Installing R • R can be downloaded freely from http://www.r-project.org. • Versions are available for Windows, Mac, and Linux.
An Example • Through this example, we will learn which data structures R uses. • Data frames • Vectors • Factors • Lists • Matrices
Using R as a calculator • Now you see a few functions: sin(), exp(), log() • Try sin(pi/2) and Sin(pi/2): you will find that R is case sensitive, so Sin() is not recognized • We will talk more about functions later
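A short sketch of such a calculator session. The particular inputs are illustrative choices, not taken from the slides, and tryCatch() is used only so the failing call does not stop the script.

```r
# Built-in functions used as a calculator
s <- sin(pi / 2)       # sine of a right angle
e <- exp(1)            # Euler's number
l <- log(exp(2))       # the natural log undoes exp()

# R is case sensitive: Sin() is not defined, so this call raises an error.
result <- tryCatch(Sin(pi / 2), error = function(err) "error: Sin not found")
print(result)
```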
Vectors • A vector is an ordered collection of elements of the same basic type. • Numeric vectors • Logical vectors • Character vectors
Numeric vectors • final_scores <- c(100,99,98) • ## create a vector • ## this is also an assignment statement • ## notice the differences between R and C • "final_scores" is the name of the created variable • "<-" is the assignment operator • 100, 99, 98 are the values of the elements of the created vector; they are concatenated with the function c() • Type the variable name in R and press Enter to print the variable on screen
Numeric vectors -- variables • A variable is used to store information • Its value can be altered • Variable names use A-Z, a-z, 0-9, period (.) and underscore (_) • Variable names cannot include spaces. • Variable names are case sensitive. • Variable names must start with a letter or a period (a leading period must not be followed by a digit). • Variable names cannot be one of the reserved keywords.
Vectors-- How long is a vector? • length() • A vector is an R object. • Each object has two intrinsic attributes: mode (or type) and length. • We can use mode() and length() to find these two attributes.
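A minimal sketch of mode() and length() on the three kinds of vectors. Apart from final_scores, the variable names and values are hypothetical.

```r
final_scores <- c(100, 99, 98)    # numeric vector from the earlier slide
passed <- final_scores > 98.5     # logical vector (illustrative threshold)
students <- c("A", "B", "C")      # character vector (hypothetical names)

mode(final_scores)    # "numeric"
mode(passed)          # "logical"
mode(students)        # "character"
length(final_scores)  # 3
```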
Vectors – Change length of a vector • Below is an equivalent way to create the above vector • final_scores2 <- numeric() ## the length is 0 • final_scores2[1] = 100 ## the length is 1 • final_scores2[2] = 99 ## the length is 2 • final_scores2[3] = 98 ## note: `=` is also an assignment operator • Try the following operations: • X = 1:10 • length(X) = 3 • What is X now? What is the difference between () and []?
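The exercise above can be worked through roughly as follows. The last step (extending X to length 5) is an added illustration, not part of the slide.

```r
final_scores2 <- numeric()   # empty numeric vector, length 0
final_scores2[1] <- 100      # assigning past the end grows the vector
final_scores2[2] <- 99
final_scores2[3] <- 98
n <- length(final_scores2)   # 3

X <- 1:10
length(X) <- 3               # shrinking keeps only the first three elements
X                            # 1 2 3

length(X) <- 5               # growing pads the new positions with NA
X                            # 1 2 3 NA NA
```

Here () calls a function, as in length(X), while [] selects elements of a vector, as in final_scores2[1].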
Vectors – Index vectors • An index vector is used to select subsets of a vector. • Below are four types of index vectors • A logical vector • A vector of positive integers • A vector of negative integers • A vector of character strings
Logical index vectors • A logical index vector must be the same length as the vector from which elements are to be selected • Elements corresponding to TRUE in the index vector are selected • For example, find the scores that are at least 85: • scores[scores >= 85] • Y <- X[!is.na(X)] • scores[gender == 'Male']
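A small sketch of logical indexing. The scores and gender values are hypothetical data made up for the illustration.

```r
scores <- c(90, 85, 93, 78, NA)   # hypothetical scores, one missing
gender <- c("Male", "Female", "Male", "Female", "Male")

keep  <- !is.na(scores)           # logical vector, same length as scores
clean <- scores[keep]             # 90 85 93 78

high    <- clean[clean >= 85]     # 90 85 93
with_na <- scores[scores >= 85]   # 90 85 93 NA: an NA index yields NA
male    <- scores[gender == "Male"]   # 90 93 NA
```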
Positive or negative index vectors • A positive index vector can be any length. • It specifies which elements should be included in the result, and elements may be repeated • X[c(1,5,6,1,2,1)] • A negative index vector specifies which elements should be excluded. • X[-c(2,3)]
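To make the two index expressions concrete, here is a sketch with a hypothetical vector X of six values.

```r
X <- c(10, 20, 30, 40, 50, 60)    # hypothetical data

picked  <- X[c(1, 5, 6, 1, 2, 1)] # 10 50 60 10 20 10 (repeats allowed)
dropped <- X[-c(2, 3)]            # 10 40 50 60 (elements 2 and 3 removed)
```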
Index vectors with character strings • This index vector is used when a vector has a names attribute. • scores = c(90,85,93,78) • names(scores) = c('LiBai','LiHei', 'Li Hong', 'LiXiaolan') • scores[c('LiBai','Li Hong')]
Vectors – A useful example • Plot a unit circle – a circle centered at 0 with radius 1. • X= seq(from = -1,to = 1, by = 0.001) • Y = sqrt(1 - X^2) • Z = c(Y,-Y) • plot(rep(X,2),Z,type = 'l') • n = length(X) • X1 = c(X,X[n:1]) • Z1 = c(Y, -Y[n:1]) • plot(X1,Z1,type = 'l')
Vectors -- Help • Try the following commands • ?plot • ?seq • ?rep • ?'(' • ?'[' • Google • Baidu
Logical vectors • The elements of a logical vector can have the value TRUE, FALSE, or NA (not available) • Comparison operators, which return logical vectors: • >, <, ==, >=, <=, != • Logical operators: • & (and) • | (or) • ! (not)
Character vectors • Each element is a sequence of characters delimited by double quotes or single quotes; the two are equivalent. • For example, c('Li Bai', 'Li Hong', 'Li Xiaolan') is a character vector with 3 elements. • A useful function: paste() • paste(c('a','b','c'),c('1','2','3')) • paste(c('X','Y'), 1:10, sep = "") • Do you see the differences?
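Running the two paste() calls shows the difference: the default separator is a space, and the shorter argument is recycled. The variable names a and b are only for this illustration.

```r
a <- paste(c("a", "b", "c"), c("1", "2", "3"))
a   # "a 1" "b 2" "c 3"

b <- paste(c("X", "Y"), 1:10, sep = "")
b   # "X1" "Y2" "X3" "Y4" "X5" "Y6" "X7" "Y8" "X9" "Y10"
```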
A simple text mining example • text1 = "China's Jade Rabbit moon rover has endured a long lunar night but is still malfunctioning, state media said on Thursday, after technical problems last month cast uncertainty over the country's first moon landing." • text2 = "Jade Rabbit, named after a lunar goddess in traditional Chinese mythology, landed to domestic fanfare in mid-December, on a mission to do geological surveys and hunt natural resources." • Questions: (1) How many characters are there in text1? (2) How many unique words appear in both text1 and text2? Hint: search online (Google).
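One possible way to approach the two questions, offered as a sketch rather than the intended solution. The helper words_of() is hypothetical, and its tokenization (lower-casing and splitting on anything other than letters, digits, or apostrophes) is an assumption; other definitions of "word" give different counts.

```r
text1 <- "China's Jade Rabbit moon rover has endured a long lunar night but is still malfunctioning, state media said on Thursday, after technical problems last month cast uncertainty over the country's first moon landing."
text2 <- "Jade Rabbit, named after a lunar goddess in traditional Chinese mythology, landed to domestic fanfare in mid-December, on a mission to do geological surveys and hunt natural resources."

# Hypothetical helper: unique lower-case words of a string
words_of <- function(txt) {
  w <- strsplit(tolower(txt), "[^a-z0-9']+")[[1]]
  unique(w[w != ""])            # drop empty pieces left by the split
}

n_chars <- nchar(text1)         # (1) number of characters in text1

w1 <- words_of(text1)
w2 <- words_of(text2)
common    <- intersect(w1, w2)  # (2) words that occur in both texts
all_words <- union(w1, w2)      # alternative reading: words in either text

length(common)
length(all_words)
```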
Factors • A factor is a vector … we will learn more later • Here is just one example: • tapply(final,gender,mean), where gender is a factor; this function returns an array • The function tapply() applies a function, here mean(), to each group of components of the first argument, here final, defined by the levels of the second argument, here gender, as if the groups were separate vectors.
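A minimal sketch of the tapply() call, assuming hypothetical values for final and gender (the slides do not show the data).

```r
final  <- c(90, 85, 93, 78)                        # hypothetical final scores
gender <- factor(c("Male", "Female", "Male", "Female"))

levels(gender)                        # "Female" "Male"
means <- tapply(final, gender, mean)  # mean of final within each level
means
# Female   Male
#   81.5   91.5
```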
A note on vector-recycling rule • Look at the following example: • X <- c(3,5,6) • Y <- 1 • Z <- c(1,2,3,4,5,6) • X+Y is computed as c(3,5,6) + c(1,1,1) • X+Z is computed as c(3,5,6,3,5,6) + c(1,2,3,4,5,6) • In words: shorter vectors in the expression are recycled as often as need be (perhaps fractionally) until they match the length of the longest vector.
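The recycling rule can be checked directly. The last line is an added illustration of fractional recycling, which R performs with a warning; suppressWarnings() is used here only to keep the output clean.

```r
X <- c(3, 5, 6)
Y <- 1
Z <- c(1, 2, 3, 4, 5, 6)

XY <- X + Y   # Y recycled three times:  4 6 7
XZ <- X + Z   # X recycled twice:        4 7 9 7 10 12

# Fractional recycling: c(1, 2) is stretched to c(1, 2, 1)
XW <- suppressWarnings(X + c(1, 2))   # 4 7 7
```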
Matrices and Arrays • Construction of a matrix • X = matrix(,nrow = 2,ncol=2) ## a 2 x 2 matrix filled with NA • X[1,1] = 2 • X[2,2] = 3 • X = matrix(1:9,ncol=3) ## filled column by column • X = matrix(1:9,ncol=3,byrow = TRUE) ## filled row by row • as.vector(X) ## turn a matrix into a vector • c(X) ## the same as as.vector(X)
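A sketch of the constructions above, using separate names (E, A, B are hypothetical) so that the column-wise and row-wise fills can be compared side by side.

```r
E <- matrix(NA, nrow = 2, ncol = 2)  # empty 2 x 2 matrix
E[1, 1] <- 2
E[2, 2] <- 3                         # off-diagonal entries stay NA

A <- matrix(1:9, ncol = 3)                # columns are 1:3, 4:6, 7:9
B <- matrix(1:9, ncol = 3, byrow = TRUE)  # rows are 1:3, 4:6, 7:9

A[1, 2]        # 4
B[1, 2]        # 2
as.vector(B)   # 1 4 7 2 5 8 3 6 9 (read column by column)
```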
Index matrices • Index matrices are used to extract information • Extract elements: X[1,3] • Extract a row: X[1,] • Extract a column: X[,2] • Extract a few rows and columns: X[c(1,2),c(3,3,2)]
Higher dimensional array • We take a 3-dimensional array as an example. • It can store several matrices. • For example: • Z = array(dim=c(3,3,2)) • Z[,,1] = X1 • Z[,,2] = X2 • ……
Operations on Matrices • Transpose: t(X) • Dimensions: dim(X), ncol(X), nrow(X) • Addition: X + Y • Subtraction: X - Y • Multiplication: use X %*% Y for the matrix product, not X*Y (which multiplies element by element) • Inversion: solve(X) • diag(): investigate diag(X), diag(c(1,2,3)), diag(3)
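The operations above on a small hypothetical 2 x 2 matrix, as a sketch; the matrix values were chosen only so that the inverse exists.

```r
X <- matrix(c(2, 1, 1, 3), nrow = 2)   # hypothetical symmetric matrix

elementwise <- X * X       # entries squared: 4 1 / 1 9
product     <- X %*% X     # matrix product:  5 5 / 5 10

Xinv  <- solve(X)          # inverse of X
check <- X %*% Xinv        # identity matrix, up to rounding error

diag(X)            # diagonal of a matrix: 2 3
diag(c(1, 2, 3))   # 3 x 3 diagonal matrix with 1, 2, 3 on the diagonal
diag(3)            # 3 x 3 identity matrix
```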
Eigenvalues and SVD • Obj = eigen(X) ## eigenvalue decomposition • Obj2 = svd(X) ## singular value decomposition • Each returns a list.
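A sketch showing what the two returned lists contain, using a hypothetical 2 x 2 matrix. Rebuilding X from the SVD pieces is an added sanity check, not part of the slide.

```r
X <- matrix(c(2, 1, 1, 3), nrow = 2)   # hypothetical symmetric matrix

Obj <- eigen(X)
names(Obj)       # "values" "vectors"
Obj$values       # eigenvalues, largest first

Obj2 <- svd(X)
names(Obj2)      # "d" "u" "v"

# X equals u %*% diag(d) %*% t(v), up to rounding error
X_rebuilt <- Obj2$u %*% diag(Obj2$d) %*% t(Obj2$v)
```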
cbind() and rbind() • cbind() forms matrices by binding together matrices column-wise • rbind() forms matrices by binding row-wise • Vectors are treated as single-column matrices by cbind() and as single-row matrices by rbind(). • The recycling rule is applied to shorter vectors. • For example: cbind(1,c(1,2),c(1,2,3))
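A sketch of the example call. R warns here because the length-2 vector does not divide the 3 rows evenly, so suppressWarnings() is used to keep the output clean; the rbind() call is an added illustration.

```r
# Shorter vectors are recycled to the length of the longest (3 rows)
M <- suppressWarnings(cbind(1, c(1, 2), c(1, 2, 3)))
M
# column 1: 1 1 1   column 2: 1 2 1   column 3: 1 2 3

R <- rbind(1:3, c(10, 20, 30))   # two vectors stacked as rows
dim(R)                           # 2 3
```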
More comments on factors • table() returns frequency tables • cut() turns a numeric vector into a factor by dividing its range into intervals • Examples: • tabl = tapply(gender,gender,length) • tabl2 = table(gender) • Best_scores = cut(final,breaks = c(min(final)-0.5,85,max(final)+0.5)) • Tab3 = table(Best_scores,gender)
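These calls can be tried on hypothetical data (final and gender below are made up for the sketch).

```r
final  <- c(90, 85, 93, 78)                        # hypothetical final scores
gender <- factor(c("Male", "Female", "Male", "Female"))

tabl2 <- table(gender)       # counts per level: Female 2, Male 2

# Two intervals: (77.5, 85] and (85, 93.5]
Best_scores <- cut(final, breaks = c(min(final) - 0.5, 85, max(final) + 0.5))
levels(Best_scores)          # "(77.5,85]" "(85,93.5]"

Tab3 <- table(Best_scores, gender)   # two-way frequency table
Tab3
```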
Lists • Recall that a vector consists of an ordered collection of elements of the same basic type. • A matrix also contains elements of a single type (for example, numeric). • A new type of object, called a list, consists of an ordered collection of objects of any kind, such as vectors, matrices, and other lists……
Construction of a list • list(name1 = obj1, name2 = obj2) • A list is very useful for returning several values from a function. • For example, obj = svd(X). This obj is a list; it contains singular values and singular vectors. • Lst <- list(name="Fred", wife="Mary", no.children=3, child.ages=c(4,7,9)) • What is the difference between Lst[1] and Lst[[1]]?
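A sketch answering the question above: single brackets return a sub-list, double brackets return the component itself.

```r
Lst <- list(name = "Fred", wife = "Mary", no.children = 3,
            child.ages = c(4, 7, 9))

sub  <- Lst[1]      # a list of length 1 that still carries the name "name"
comp <- Lst[[1]]    # the component itself: "Fred"

class(sub)          # "list"
class(comp)         # "character"
Lst[[4]][2]         # second element of child.ages: 7
```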
Modifying Lists • Lst$wife, Lst[['wife']] • # both retrieve the component of the list named 'wife' • You can also write Lst$w for Lst$wife if w identifies 'wife' uniquely, i.e. no other component name starts with 'w' • You can concatenate different lists with c(): • c(list1,list2,list3)
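A sketch of access by name, partial matching, modification, and concatenation. The updated values and the second list (extra) are hypothetical.

```r
Lst <- list(name = "Fred", wife = "Mary", no.children = 3,
            child.ages = c(4, 7, 9))

full    <- Lst$wife       # "Mary"
byname  <- Lst[["wife"]]  # "Mary"
partial <- Lst$w          # "Mary": only one name starts with "w"
ambig   <- Lst$n          # NULL: "name" and "no.children" both match

# Modify components by assignment (hypothetical new values)
Lst$no.children <- 4
Lst$child.ages  <- c(Lst$child.ages, 1)

extra    <- list(city = "Beijing")   # hypothetical second list
combined <- c(Lst, extra)            # one list with five components
```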
Data Frames • A data frame is a special list. • It is a list of vectors of the same length. • A data frame is a list with the components arranged like a matrix: each column is one component of the list. • Some Examples:
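A minimal sketch of a data frame behaving both as a list and as a matrix. The column names and values are hypothetical.

```r
DF <- data.frame(
  name   = c("LiBai", "LiHei", "LiHong"),
  gender = c("Male", "Female", "Female"),
  final  = c(90, 85, 93),
  stringsAsFactors = FALSE     # keep character columns as character
)

is.list(DF)    # TRUE: a data frame is a list
length(DF)     # 3 components, one per column
DF$final       # list-style access to a column
DF[2, ]        # matrix-style access to a row

top <- DF[DF$final > 88, "name"]   # "LiBai" "LiHong"
```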
attach() and detach() • After attach(DF), each column of DF can be used as a vector whose name is the column name • The original columns in DF are protected: assigning to one of these vectors changes a copy, not DF itself. • After detach(DF), the variables named after the columns of DF are no longer available.
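A sketch of this behavior with a hypothetical data frame, intended to be run at the top level of a fresh R session.

```r
DF <- data.frame(final = c(90, 85, 93))   # hypothetical data

attach(DF)
avg <- mean(final)     # the column is visible by name: 89.33...

final[1] <- 0          # modifies a copy in the workspace, not DF
DF$final[1]            # still 90: the original column is protected

rm(final)              # remove the workspace copy
detach(DF)
still_there <- exists("final")   # FALSE: the name is gone after detach()
```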
Objects • The following are all R objects: • Vectors • Matrices and Arrays • Lists • Data Frames • Factors