Crash Course in R · October 16, 2009 · Jack Chen
Presentation Flow
R Session:
• Function writing
• Plot customization
• Simulation tips
Major Topics:
• Background/Environment
• Read/Write Data
• Common Data Structures and Operations
• Object-oriented Concept
• Graphics Samples
• Control Blocks
Background: History
"S"
• Mid 1970s, Bell Laboratories: John Chambers, Rick Becker
• An interactive environment on top of an engine of statistical computing subroutines written in Fortran
• "Interactive Statistical Computing System", "Statistical Analysis System"

Background: History
"R"
• Early 1990s, University of Auckland: Ross Ihaka, Robert Gentleman
• An interactive environment on top of an engine of statistical computing subroutines
• Languages shown around the engine: Fortran, C, C++, Pascal, Java, Perl, R functions, …
Background: History
Major differences between S and R
• Syntax
• Memory management
• Variable scoping
• S has developed into S-plus, commercial software from TIBCO
• R is free, open-source software, with contributed packages from researchers worldwide
• Recently, XLSolutions has been developing R-plus, a commercial version of R
Environment: Windows
Starting R in Windows
• Mouse-click menus
• Mouse-click shortcuts
• Command line to interact with R
Environment: Windows
Some keyboard shortcuts for the Windows platform:
• Esc: cancels the current line of execution (useful when running into trouble)
• Ctrl-p or arrow up: previous command
• Ctrl-n or arrow down: next command
• Ctrl-u: erase line
• Ctrl-a or Home: beginning of line
• Ctrl-e or End: end of line
• Ctrl-c: copy highlighted text
• Ctrl-v: paste
• Ctrl-x: copy and paste highlighted text
• Ctrl-l: clear command line window
• Ctrl-z or q(): quit
Environment: Unix
Starting R in Unix
• Command line to interact with R

Environment: Unix
Some keyboard shortcuts for the Unix platform:
• Esc or Ctrl-c: cancels the current line of execution (useful when running into trouble)
• Ctrl-p or arrow up: previous command
• Ctrl-n or arrow down: next command
• Ctrl-u: erase line
• Ctrl-a: beginning of line
• Ctrl-e: end of line
• Ctrl-z: send R to the background (type fg to bring it back)
• Ctrl-l: clear command line window
• Ctrl-r: reverse search of the command history
• q(): quit session
Environment: R interpreter
R has an interpreted environment
• Everything you type on the command line, followed by Enter, is sent to R's internal engine. R performs the following steps:
  • Interprets what you have typed
  • Evaluates it
  • Returns a result (possibly an error message)
• The only exception is a comment: R does not interpret anything after the pound sign #
Object-oriented Concept: Intuition
Object-oriented programming is a natural way to classify and modularize "things" of interest in order to interact with them during program execution.
• For example, suppose our program has 3 shapes:
  • Circle
  • Square
  • Triangle
• Initialization
  • We want to be able to create different shapes of different sizes
• Interaction
  • We want each shape to be able to report its area to us
  • We want each shape to be able to display itself

Object-oriented Concept: Intuition
Internally in a program:
• Class: Shape, Type: Triangle (isosceles)
  • Functions: report area, draw
  • Attributes: name ID, base b, height h
• Class: Shape, Type: Square
  • Functions: report area, draw
  • Attributes: name ID, width w
• Class: Shape, Type: Circle
  • Functions: report area, draw
  • Attributes: name ID, radius r
Object-oriented Concept: Intuition
Typical programming steps: initialize an object, then interact with it (ask for its area, ask it to draw itself).
• Class: Shape, Type: Circle (functions: report area, draw; attributes: name ID, radius r)
  • Initialize: ID = circle1, r = 1
  • Interact: "Tell me the area" gives 1^2 × π = 3.14159…
  • Interact: draw
• Class: Shape, Type: Circle
  • Initialize: ID = circle2, r = 2
  • Interact: "Tell me the area" gives 2^2 × π = 12.566…
  • Interact: draw
• Class: Shape, Type: Square (functions: report area, draw; attributes: name ID, width w)
  • Initialize: ID = square1, w = 1
  • Interact: "Tell me the area" gives 1^2 = 1
  • Interact: draw
• Class: Shape, Type: Triangle, isosceles (functions: report area, draw; attributes: name ID, base b, height h)
  • Initialize: ID = tri1, b = 1, h = 0.866
  • Interact: "Tell me the area" gives 1 × 0.866 / 2 = 0.433
  • Interact: draw
Object-oriented Concept: Intuition
Translating to sensible commands (Class: Shape, Type: Circle, ID = circle1, r = 1):
• Initialize: circle1 = Circle(r=1)
• Interact, "Tell me the area": area(circle1) # 1^2 × π = 3.14159…
• Interact, draw: draw(circle1)
Object-oriented Concept: Intuition
Programming commands
• circle2 = Circle(r=2)
• area(circle2)
• draw(circle2)
• square1 = Square(w=1)
• area(square1)
• draw(square1)
• tri1 = Triangle(b=1, h=0.866)
• area(tri1)
• draw(tri1)
Object-oriented Concept: In relation to R
What does this have to do with R?
• R is inherently object-oriented.
• R has a set of pre-defined objects that we can interact with
• There are many more objects inside the packages in R's online repository, covering a wide range of tasks
• We can also write our own R objects that perform analyses tailored to our needs
• The way we interact with R is very similar to the way we interacted with the program with 3 shapes
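The shape commands can be sketched in R itself. This is a minimal illustration using R's S3 classes, which is only one of several object systems in R and is an assumption here, not necessarily what the slides had in mind. The field names (id, r, w, b, h) are hypothetical choices that mirror the attributes listed for each shape.

```r
# Constructors: each returns a list tagged with a class vector.
Circle <- function(r, id = "circle") {
  structure(list(id = id, r = r), class = c("Circle", "Shape"))
}
Square <- function(w, id = "square") {
  structure(list(id = id, w = w), class = c("Square", "Shape"))
}
Triangle <- function(b, h, id = "triangle") {
  structure(list(id = id, b = b, h = h), class = c("Triangle", "Shape"))
}

# Generic function: dispatches on the class of its argument.
area <- function(shape) UseMethod("area")
area.Circle   <- function(shape) pi * shape$r^2
area.Square   <- function(shape) shape$w^2
area.Triangle <- function(shape) shape$b * shape$h / 2

circle2 <- Circle(r = 2)
square1 <- Square(w = 1)
tri1    <- Triangle(b = 1, h = 0.866)

area(circle2)  # 12.56637
area(square1)  # 1
area(tri1)     # 0.433
```

A draw generic is left out to keep the sketch short. It would follow the same pattern: one generic plus one method per shape.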
Common Data Structures: Primitive data objects
Primitive data objects
• Come with all R installations
• Integers: -3L, -2L, 1L, 2L, 3L, … (a bare literal such as 3 or 1e+10 is stored as a double; the L suffix requests an integer)
• Doubles: 0.789, 3.14, 1.68, 2.9e-6, …
• Complex numbers: 3i+7, 2i+3, …
• Characters: "a", "zZ", "I hope you are still awake", …
• Constants: pi
• Logical values: TRUE, FALSE
• The empty object: NULL
• Missing value: NA
• Infinity: Inf
• Some others
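A quick way to see how R stores each of these values is typeof(), a base R function. The short sketch below only inspects literals of the kinds listed on the slide.

```r
typeof(3L)      # "integer"
typeof(3)       # "double": a bare number is a double
typeof(1e+10)   # "double"
typeof(3i + 7)  # "complex"
typeof("zZ")    # "character"
typeof(TRUE)    # "logical"
typeof(NA)      # "logical"
typeof(Inf)     # "double"
typeof(pi)      # "double"
is.null(NULL)   # TRUE
```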
Common Data Structures: Primitive operators
Primitive operators
• arithmetic: +, -, *, /
• modulo: %%
• matrix multiply: %*%
• power: ^
• logical and/or: &, |
• relation: <, <=, >, >=, ==, !=
• assignment: =, <-
Common Data Structures: Primitive functions
R function calls have the form:
• functionName(arg1, arg2, …)
Primitive functions
• square root: sqrt(arg)
• exponential: exp(arg)
• natural log: log(arg)
• length of object: length(arg)
• sum of elements in object: sum(arg)
• concatenate objects: c(arg1, arg2, …)
• round down to nearest integer: floor(arg)
• round up to nearest integer: ceiling(arg)
• many others
Common Data Structures: Simple valid expressions
Examples of valid expressions
• 1
• "a"
• 'a'
• 1 & TRUE
• TRUE == FALSE
• TRUE != FALSE
• 2 > 3
• 1 + 2 + 3 + 4
• 2^3
• a = 4; b = 2^a
• log(37)
Common Data Structures: Simple invalid expressions
Examples of invalid expressions
• lala # variable not assigned
• sqrt(25, 4) # too many arguments
• log(1 2) # syntax error: no comma between the arguments
• 1 = "a" # cannot assign a value to a numeric constant
• TRUE = 3 # cannot assign a value to a logical constant
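To see the error R reports for each invalid expression without stopping a script, the expression can be parsed from a string inside tryCatch(). This is a small sketch, and try_expr is a hypothetical helper name introduced here for illustration.

```r
# Parse and evaluate a string of R code.
# Returns the error message if it fails, or NULL if it runs cleanly.
try_expr <- function(code) {
  tryCatch({
    eval(parse(text = code))
    NULL
  }, error = function(e) conditionMessage(e))
}

try_expr("sqrt(25, 4)")  # an error message (too many arguments)
try_expr("log(1 2)")     # an error message (fails at the parsing stage)
try_expr("TRUE = 3")     # an error message (invalid assignment target)
try_expr("log(37)")      # NULL: the expression is valid
```

Passing the code as a string matters for log(1 2): it cannot even be parsed, so it has to be handed to parse() at run time rather than written directly in the script.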
Common Data Structures and Constructs: vectors
Vectors
• R vectors are treated as column vectors, even though they are displayed horizontally in R
• c(object1, object2, …, objectN)
• c stands for "concatenate": it joins object1, object2, …, objectN into one vector
Common Data Structures and Constructs: vectors
Examples of vectors:
• c(1, 2, 3, 4) # numeric vector, (1, 2, 3, 4)
• c(1:4) # same as above
• c(1, "a") # mixed types are coerced to one type: ("1", "a")
• c(c(1:3), c(7:10)) # (1, 2, 3, 7, 8, 9, 10)
• c(TRUE, FALSE) # logical vector
Common Data Structures and Constructs: vectors
Other ways to form vectors:
• seq(start, end, by = increment)
  • seq(1, 10, 1) # equivalent to c(1:10)
  • seq(10, 1, -1) # equivalent to c(10:1)
• rep(object, times)
  • rep(1, 10) # a vector of 10 1's
  • rep(c(1, 2), 10) # a vector of 1 2 1 2 … (length 20)
Common Data Structures and Constructs: vectors
Accessing vector elements
• vector[start index:end index]
• v = c(1, 2, 3, 4) # assigns v
• c(1, 2, 3, 4)[1] # returns 1
• c(1, 2, 3, 4)[2:4] # returns (2, 3, 4)
• c(1, 2, 3, 4)[-1] # removes 1st element, returns (2, 3, 4)
• c(1, 2, 3, 4)[c(1, 3)] # returns (1, 3)
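The vector constructors and indexing forms above can be tried together in one short session. The sketch below uses only base R, and the variable name v follows the slide.

```r
v <- c(1, 2, 3, 4)

v[1]          # 1
v[2:4]        # 2 3 4
v[-1]         # 2 3 4 (negative index drops the 1st element)
v[c(1, 3)]    # 1 3

seq(10, 1, -1)            # 10 9 8 7 6 5 4 3 2 1
rep(c(1, 2), 3)           # 1 2 1 2 1 2
length(rep(c(1, 2), 10))  # 20

c(1, "a")     # "1" "a": the number is coerced to character
```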
Common Data Structures and Constructs: matrices
Matrices
• R matrices are objects internally represented as vectors, with 2 additional attributes:
  • number of rows
  • number of columns
• matrix(c(object1, object2, …, objectN), nrow = I, ncol = J)
Common Data Structures and Constructs: matrices
Examples of matrices:
• matrix(c(1:12), nrow=4, ncol=3)
• matrix(c(1:12), 4, 3) # same as above
• matrix(c(1:12), nrow=4) # same as above
• matrix(c(1:12), ncol=3) # same as above
• matrix(c(1:12), 4, 2) # mismatch: 12 values do not fit a 4x2 matrix, only the first 8 are used (recent R versions warn)
Other ways to form matrices:
• diag(1, 10) # 10x10 identity matrix
• diag("a", 10) # 10x10 matrix with diagonal of "a"
• diag(c(1:10), 10) # 10x10 matrix with diagonal entries 1, 2, …, 10
Common Data Structures and Constructs: matrices
Accessing matrix elements
• matrix[row indices, column indices]
• A = matrix(c(1:9), 3, 3) # assign matrix to variable name A
• A[1, 1] # returns the element in row 1, column 1
• A[1, ] # returns row 1
• A[, 1] # returns column 1
• A[, 1:2] # returns columns 1 and 2
• A[1:5] # returns (1, 2, 3, 4, 5): a single index runs down the columns
Common Data Structures and Constructs: matrices
Matrix manipulation
• Adding a row
  • rbind(matrix object, vector object)
• Adding a column
  • cbind(matrix object, vector object)
• Examples:
  • A = matrix(c(1:9), 3, 3)
  • cbind(A, c(10:12)) # add (10, 11, 12) as last column
  • cbind(A[,1], c(10:12), A[,2:3]) # add (10, 11, 12) as 2nd column
Common Data Structures and Constructs: matrices
Matrix operations
• Matrix operations on matrices A, B of conforming dimensions
  • Addition: A + B
  • Subtraction: A - B
  • Multiplication: A %*% B
  • Inverse: solve(A)
  • Transpose: t(A)
  • Determinant: det(A)
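The matrix constructs above can be combined in a short worked example. The matrix A follows the slides; the 2x2 matrix M is a hypothetical addition, used because A (filled with 1 to 9) is singular and so solve(A) would fail.

```r
A <- matrix(1:9, nrow = 3, ncol = 3)   # filled column by column

A[1, ]    # row 1: 1 4 7
A[, 1]    # column 1: 1 2 3
A[1:5]    # 1 2 3 4 5 (single index runs down the columns)

# Insert (10, 11, 12) as the 2nd column.
B <- cbind(A[, 1], 10:12, A[, 2:3])
dim(B)    # 3 4

t(A)[1, ] # first row of the transpose: 1 2 3

# An invertible matrix for det() and solve().
M <- matrix(c(2, 1, 1, 3), 2, 2)
det(M)          # 5
M %*% solve(M)  # the 2x2 identity, up to rounding error
```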
Common Data Structures and Constructs: lists
Lists
• Traditionally, vectors and matrices contain simple data objects, mostly primitive data objects. More complex data structures are stored in lists.
• Lists contain objects and their assigned names:
  • list(name1=object1, name2=object2, …)
• Example of a list:
  • list(foo="hello", bar="world")
Common Data Structures and Constructs: lists
Accessing elements in a list:
• We can reference objects in lists by their names with the dollar "$" operator:
  • alist = list(Friday="happy", Monday="urrr")
  • alist$Friday # returns "happy"
  • alist$Monday # returns "urrr"
• If no object in the list has the name following $, then NULL is returned:
  • alist$Tuesday # returns NULL
• We can also access objects in lists by their index with double brackets [[index]]:
  • alist[[1]] # returns "happy"
  • alist[[2]] # returns "urrr"
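The list access forms can be checked in a short session. The sketch also shows single brackets, which the slide does not cover: alist[1] returns a sub-list rather than the stored object.

```r
alist <- list(Friday = "happy", Monday = "urrr")

alist$Friday            # "happy"
alist[["Monday"]]       # "urrr" (double brackets also accept a name)
alist[[1]]              # "happy"
is.null(alist$Tuesday)  # TRUE: no element with that name
names(alist)            # "Friday" "Monday"

class(alist[[1]])       # "character": the object itself
class(alist[1])         # "list": a list of length 1
```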
Operations: operating on R objects
Operating on R objects
• R operations are vector-based
• When the left hand side (LHS) and right hand side (RHS) of an operator conform, the operator is applied element by element; when one side is shorter, its elements are reused to match the longer side
• Examples
  • c(1, 2) + c(3, 4) # returns (4, 6)
  • c(1, 2) + c(3, 4, 5, 6) # returns (4, 6, 6, 8): (1, 2) is added to (3, 4) and to (5, 6)
  • 2^c(1, 2, 3, 4) # returns (2, 4, 8, 16)
  • c(1, 2)^c(1, 2, 3, 4) # returns (1, 4, 1, 16)
Operations: operating on R objects
Operating on R objects
• Most of the built-in R objects can report their dimensions.
• Examples:
  • length(c(1:4)) # returns 4
  • length(list(a=1, b=2)) # returns 2
  • length(matrix(c(1:12),4,3)) # returns 12
  • nrow(matrix(c(1:12),4,3)) # returns 4
  • ncol(matrix(c(1:12),4,3)) # returns 3
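The element-by-element behaviour, including what happens when the two sides have different lengths, can be verified directly. The last case, where the longer length is not a multiple of the shorter one, is an addition to the slide examples: R still returns a result but issues a warning.

```r
c(1, 2) + c(3, 4)          # 4 6
c(1, 2) + c(3, 4, 5, 6)    # 4 6 6 8: (1, 2) is reused for (5, 6)
2^c(1, 2, 3, 4)            # 2 4 8 16
c(1, 2)^c(1, 2, 3, 4)      # 1 4 1 16

# Lengths 3 and 2: the result is 2 4 4, with a warning that the
# longer length is not a multiple of the shorter length.
uneven <- suppressWarnings(c(1, 2, 3) + c(1, 2))
uneven                     # 2 4 4

m <- matrix(1:12, 4, 3)
length(m) == nrow(m) * ncol(m)  # TRUE
```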
Control Blocks: Logical expressions
Logical Expressions
• A logical expression is an expression that evaluates to TRUE or FALSE
• Logical expressions can be formed with the relation operators
  • equal: ==
  • not equal: !=
  • less than: <
  • greater than: >
  • less than or equal to: <=
  • greater than or equal to: >=
• Examples:
  • 0 < 1 # evaluates to TRUE
  • 0 > 1 # evaluates to FALSE
  • "A" == "a" # evaluates to FALSE
Control Blocks: if-else statement
if-else statement
• if (logical expression) { … } else { … }
• { … } can be a single expression, or a group of expressions and statements, including another if-else statement.
• The else part of the statement is optional.
• Examples:
  • if (0 < 1) "true"
  • if (0 > 1) "should not see anything"
  • if ("a" == "A") { "equal" } else { "not equal" }
  • if (FALSE) { "nothing" } else if (TRUE) { "something" }
Control Blocks: while loop
While loop
• while (logical expression) { … }
• { … } (the "body" of the statement) can be a single expression, or a group of expressions.
• The while statement repeats the body until the logical expression evaluates to FALSE.
• Examples:
  • while (TRUE) { "never ends!!" }
  • while (FALSE) { "never executed!!" }
  • x=1; while (x==1) { print(x); x=2 } # prints 1, then assigns 2 to x, which ends the loop
Control Blocks: for loop
For loop
• for (index in start:end) { … }
• { … } (the "body" of the statement) can be a single expression, or a group of expressions or statements.
• The for statement runs the body once for each value of index in start:end
• Example:
  • for (i in 1:10) { print(i) }
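The three control blocks can be combined in a small example. The tasks chosen here (summing even numbers, counting halvings) are illustrative only and do not come from the slides.

```r
# for + if: sum the even numbers from 1 to 10.
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) {
    total <- total + i
  }
}
total  # 30

# while: count how many halvings bring 100 to 1 or below.
x <- 100
steps <- 0
while (x > 1) {
  x <- x / 2
  steps <- steps + 1
}
steps  # 7

# if-else returns a value, so it can be assigned.
label <- if ("a" == "A") "equal" else "not equal"
label  # "not equal"
```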
Read/Write Data
• Importing and exporting data in R is relatively painless.
• We can easily import/export files where:
  • data points are separated by commas
  • data points are separated by tabs or spaces
  • data points are separated by some other delimiter
• Read SAS/SPSS/Stata data
  • The package "foreign" contains functions that allow you to read, among others, SAS/SPSS/Stata data.
  • type: install.packages("foreign"), select a location to download the package from; the rest is automatic
  • type: library(foreign) to load the package
  • type: help(package = foreign) to see a list of functions
Read/Write Data: Reading from a file
Example of reading a file
# reads a file whose data points are separated by spaces or tabs
# names the first column y, the second column x1, the third column x2
file = "http://www-personal.umich.edu/~jktc/R/samples/simple.dat"
read.table(file, col.names=c("y", "x1", "x2"))
# specify how missing data is coded in the file
read.table(file, na.strings=".")
# if the first row of the data file is a header (names for each column)
file2 = "http://www-personal.umich.edu/~jktc/R/samples/simple.header.dat"
read.table(file2, header=TRUE)
# to see more details of the read.table function
help(read.table)
Read/Write Data: Writing to a file
Example of writing to a file
data = matrix(c(1:9), 3, 3)
# write a space-separated file
# names the first column y, the second column x1, the third column x2
write.table(data, file="c:/temp/simple.dat", row.names=FALSE, col.names=c("y", "x1", "x2"), sep=" ")
# to see more details on the write.table function
help(write.table)
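Writing and reading can be checked together with a round trip. The sketch below writes the same matrix to a temporary file and reads it back. Using tempfile() instead of the c:/temp path is an assumption made so the example runs on any platform.

```r
data <- matrix(1:9, 3, 3)

# A temporary file stands in for c:/temp/simple.dat.
path <- tempfile(fileext = ".dat")

# Write a space-separated file with a header row.
write.table(data, file = path, row.names = FALSE,
            col.names = c("y", "x1", "x2"), sep = " ")

# Read it back; header = TRUE takes column names from the first row.
back <- read.table(path, header = TRUE)
names(back)  # "y" "x1" "x2"
back$y       # 1 2 3
back$x2      # 7 8 9

unlink(path) # remove the temporary file
```

Note that read.table() returns a data frame, not a matrix, so columns are accessed by name with $.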