Introduction to R Workshop By Stephen O. Opiyo. MCIC-Computational Biology Laboratory Day 1. Outline. R workshop outline (http://kh-tps.osu.edu/Routline.pdf) Introduction to R (Day 1) Graphics (data visualization) in R (Day 2) Basic Statistics using R (Day 2). Getting Started.
Introduction to R Workshop • By Stephen O. Opiyo • MCIC-Computational Biology Laboratory • Day 1
Outline • R workshop outline (http://kh-tps.osu.edu/Routline.pdf) • Introduction to R (Day 1) • Graphics (data visualization) in R (Day 2) • Basic Statistics using R (Day 2)
Getting Started • We assume that R for Windows or Mac is already installed on your laptop. (Installation instructions are given at the web pages below.) http://kh-tps.osu.edu/RWindows.pdf http://kh-tps.osu.edu/RMac.pdf • Open R on your Windows or Mac laptop
History of R • The idea for R came from S, developed at Bell Labs since 1976 • S was intended to support research and data analysis projects • S was later licensed to Insightful and sold as “S-Plus” • R: an open source platform similar to S, developed by Robert Gentleman and Ross Ihaka (U of Auckland, NZ) during the 1990s • Since 1997: international “R-core” development team • Updated versions available every two months • http://www.r-project.org/
What is R? • Data handling and storage: numeric, textual • Matrix algebra • Tables and regular expressions • Graphics • Data analysis
R is not • a database • a collection of “black boxes” • a spreadsheet software package • commercially supported
Useful reading materials • R for Beginners • http://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf • “An Introduction to R” by Longhow Lam • http://cran.r-project.org/doc/contrib/Lam-IntroductionToR_LHL.pdf • Practical Regression and Anova using R • http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf • An R companion to ‘Experimental Design’ • http://cran.r-project.org/doc/contrib/Vikneswaran-ED_companion.pdf • The R Guide • http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf • R for Biologists • http://cran.r-project.org/doc/contrib/Martinez-RforBiologistv1.1.pdf
Useful reading materials • Multilevel Modeling in R • http://cran.r-project.org/doc/contrib/Bliese_Multilevel.pdf • R reference cards • http://cran.r-project.org/doc/contrib/refcard.pdf • http://cran.r-project.org/doc/contrib/Short-refcard.pdf • http://cran.r-project.org/doc/contrib/Baggott-refcard-v2.pdf • R reference card data mining • http://cran.r-project.org/doc/contrib/Short-refcard.pdf • RStudio - Documentation • http://www.rstudio.com/ide/docs/
Useful books • Learning RStudio for R Statistical Computing by Mark van der Loo and Edwin de Jonge. Paperback, 126 pages. Published December 25th 2012 by Packt Publishing. ISBN 1782160604 (ISBN13: 9781782160601) • Getting Started with RStudio by John Verzani. Publisher: O'Reilly Media, Inc. Pub. date: September 22, 2011. Print ISBN-13: 978-1-4493-0903-9 • R Graphics Cookbook by Winston Chang (Jan 3, 2013) • R For Dummies by Joris Meys and Andrie de Vries • The R Book by Michael J. Crawley
RStudio • RStudio is a free, open source integrated development environment for R (http://www.rstudio.com/ide/) • We assume that RStudio for Windows or Mac is already installed on your laptop. (Installation instructions are given at the web pages below.) http://kh-tps.osu.edu/RSWindows.pdf http://kh-tps.osu.edu/RSMaC.pdf
Using RStudio • Script editor • View variables in workspace and history file • View help, plots & files; manage packages • R console
Datasets and R scripts • Save all the files to the R_workshop directory
- Day 1 R files (D1_Example_1.R to D1_Example_12.R) = 12 files
- Day 1 data files (D1_Data_1.csv or txt to D1_Data_2.csv or txt) = 4 files
- Day 2 R files (D2_Example_1.R to D2_Example_3.R) = 3 files
- Day 2 data files (D1_Data_1.csv to D1_Data_3.csv) = 3 files
R: session management • You can enter a command at the command prompt (>) in the console. • To quit R, type q(). • A result is stored in an object using the assignment operator (<-) or the equals sign (=): Test <- 2 and Test = 2 are equivalent • Test is now an object with a value of 2 • To print (show) an object, just enter its name: • Test
Naming objects in R • Object names cannot contain “strange” symbols like !, +, -, #. • A dot (.) and an underscore (_) are allowed; a name may also start with a dot. • Object names can contain numbers but cannot start with a number (e.g., Example_1, not 1Example_1). • R is case sensitive: X and x are two different objects, as are temp.1 and temP.1
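The naming rules above can be checked from the console. The sketch below is only an illustration: the candidate names are hypothetical, and it relies on the base function make.names(), which rewrites a string into a syntactically valid name, so a candidate counts as valid when make.names() leaves it unchanged.

```r
# Candidate names to check (hypothetical examples, some taken from the slide)
candidates <- c("Example_1", "1Example_1", "temp.1", ".hidden", "a+b", "_x")

# A candidate is a valid name when make.names() returns it unchanged
is_valid <- make.names(candidates) == candidates
print(data.frame(name = candidates, valid = is_valid))

# Case sensitivity: these are two distinct objects
temp.1 <- 10
temP.1 <- 20
print(temp.1 == temP.1)   # FALSE
```

Note that the check also flags "_x": an underscore is allowed inside a name but not as its first character.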
R_Example_1
Example1 <- c(1,2,3,4,5)
Example1
1Example <- c(1,2,3,4,5)              # error: a name cannot start with a number
Example2 <- c("Second", "Example")
Example2
2Example <- c("Second", "Example")    # error: a name cannot start with a number
Temp.1 <- c("Second", "Example")
Temp.1
temp.1                                # error: object not found (R is case sensitive)
R: session management • Your R objects are stored in a workspace. • The workspace is not saved on disk unless you tell R to do so. • Objects are lost if you close R without saving them. • A saved R session is stored in a file named .RData.
R: session management • To check the current working directory, type getwd() • To list the objects in your workspace: ls() • To remove objects you no longer need: rm() • To remove ALL objects in your workspace: rm(list=ls()) or use the “Remove all objects” menu item • To save your workspace to a file, type save.image() or use Save Workspace… in the File menu • The default workspace file is called .RData
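A minimal sketch of saving and restoring objects, under a few assumptions: the object names x_demo and y_demo are hypothetical, the file goes to a temporary location rather than .RData, and save() is used for two named objects where save.image() would save the whole workspace.

```r
# Hypothetical objects for the demonstration
x_demo <- 1:3
y_demo <- "hello"

print(getwd())                       # current working directory

# Save the two objects to a temporary file
ws_file <- tempfile(fileext = ".RData")
save(x_demo, y_demo, file = ws_file)

# Remove them and confirm they are gone
rm(x_demo, y_demo)
removed <- !exists("x_demo") && !exists("y_demo")

# Load the file to bring the objects back
load(ws_file)
restored <- exists("x_demo") && exists("y_demo")
print(c(removed = removed, restored = restored))
```

Be careful with rm(list=ls()): it removes every object in the workspace, including helper variables such as ws_file here.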
Basic data types • Basic data types: Logical, numeric, and character, etc.
Basic data types (Example 2) • Logical (TRUE or FALSE) • Numeric (numbers) • Character (letters, digits, common punctuation marks such as "." or "-", etc.)
Vectors and Matrices • A vector • Ordered collection of data of the same type • Example: last names of all students in this workshop • In R, a single number is a vector of length 1 • A matrix • Rectangular table of data of the same type • Example: Mean intensities of all genes measured during a microarray experiment
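The three basic types can be inspected with class(). This is a small sketch with hypothetical variable names. It also shows what happens when the types are combined with c(): R converts everything to the most general type, here character.

```r
flag  <- TRUE        # logical
count <- 3.5         # numeric
label <- "gene-1"    # character

types <- c(class(flag), class(count), class(label))
print(types)         # "logical" "numeric" "character"

# Mixing types in one vector coerces all elements to character
mixed <- c(flag, count, label)
print(mixed)         # "TRUE" "3.5" "gene-1"
print(class(mixed))  # "character"
```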
Vectors (Example 3) • Vector: ordered collection of data of the same data type
> x <- c(5.2, 1.7, 6.3)
> log(x)
[1] 1.6486586 0.5306283 1.8405496
> y <- 1:5
> z <- seq(1, 1.4, by = 0.1)
> yz <- y + z
> yz
[1] 2.0 3.1 4.2 5.3 6.4
> length(y)
[1] 5
> M <- mean(y + z)
> M
[1] 4.2
Operation on vector elements (Example 4)
> Mydata <- c(2,3.5,-0.2)       # vector (c = “concatenate”)
> Mydata
[1]  2.0  3.5 -0.2
> x4 <- Mydata > 0              # test on the elements
> x4
[1]  TRUE  TRUE FALSE
> x5 <- Mydata[Mydata>0]        # extract the positive elements
> x5
[1] 2.0 3.5
> x6 <- Mydata[-c(1,3)]         # remove elements 1 and 3
> x6
[1] 3.5
Operation on vector elements (Example 4)
> Colors <- c("Red","Green","Red")   # character vector
> Colors[2]
[1] "Green"
> x1 <- 25:30                        # number sequence
> x1
[1] 25 26 27 28 29 30
> x2 <- x1[3:5]                      # elements 3 to 5
> x2
[1] 27 28 29
> x3 <- x1[c(2,6)]                   # elements 2 and 6
> x3
[1] 26 30
Operation on vector elements (Example 5)
> x <- c(5,-2,3,-7)
> y <- c(1,2,3,4)*10            # operation on all the elements
> y
[1] 10 20 30 40
> sort(x)                       # sort a vector in ascending order
[1] -7 -2  3  5
> sort(x, decreasing=TRUE)      # sort in descending order
[1]  5  3 -2 -7
> order(x)                      # element order for sorting
[1] 4 2 3 1
> y[order(x)]                   # reorder y by the sort order of x
[1] 40 20 30 10
> rev(x)                        # reverse a vector
[1] -7  3 -2  5
Matrices (Example 6) • Matrix: rectangular table of data of the same type
> m <- matrix(1:12)             # create a matrix using the matrix() function
> m
      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
 [5,]    5
 [6,]    6
 [7,]    7
 [8,]    8
 [9,]    9
[10,]   10
[11,]   11
[12,]   12
> V <- 1:12
> V1 <- c(1,2,3,4,5,6,7,8,9,10,11,12)
Matrices (Example 6) • Matrix: rectangular table of data of the same type
> mr <- matrix(1:12, 4)         # four rows
> mr
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
Matrices (Example 6) • Matrix: rectangular table of data of the same type
> mm <- matrix(1:12, 4, byrow = TRUE); mm     # by row creation
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
> tmm <- t(mm)                  # t is transpose
> tmm
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> dim(mm)                       # dim is dimension
[1] 4 3                         # four rows and three columns
> dim(t(mm))
[1] 3 4                         # three rows and four columns
Matrices (Example 6) • Matrix: rectangular table of data of the same type
> x <- c(3,-1,2,0,-3,6)
> x.mat <- matrix(x,ncol=2)     # matrix with 2 cols
> x.mat
     [,1] [,2]
[1,]    3    0
[2,]   -1   -3
[3,]    2    6
> x.matr <- matrix(x,ncol=2, byrow=TRUE)      # by row creation
> x.matr
     [,1] [,2]
[1,]    3   -1
[2,]    2    0
[3,]   -3    6
Operations on matrices (Example 6)
> x.matr[,2]                    # 2nd column
[1] -1  0  6
> x.matr[c(1,3),]               # 1st and 3rd rows
     [,1] [,2]
[1,]    3   -1
[2,]   -3    6
> x.matr[-2,]                   # remove the second row
     [,1] [,2]
[1,]    3   -1
[2,]   -3    6
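The matrix slides can be tied together in one short sketch. It uses the same vector x as the examples; the other variable names are hypothetical. It shows that filling by row is the transpose of filling by column with the dimensions swapped, and that selecting a single row returns a plain vector unless drop = FALSE is given.

```r
x <- c(3, -1, 2, 0, -3, 6)

by_col <- matrix(x, ncol = 2)                 # filled column by column (default)
by_row <- matrix(x, ncol = 2, byrow = TRUE)   # filled row by row
print(by_col)
print(by_row)

# Filling by row equals transposing a 2-row matrix filled by column
same_as_transpose <- identical(by_row, t(matrix(x, nrow = 2)))
print(same_as_transpose)

second_col <- by_row[, 2]             # 2nd column, returned as a vector
kept       <- by_row[-2, ]            # all rows except the 2nd
one_row    <- by_row[2, ]             # a single row becomes a plain vector
row_matrix <- by_row[2, , drop = FALSE]   # drop = FALSE keeps it a 1 x 2 matrix
print(dim(row_matrix))
```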
R as a calculator (Example 7) • In R: plus (+), minus (-), division (/), and multiplication (*)
> 5 + (6 + 7) * pi^2
[1] 133.3049
> log(exp(1))
[1] 1
> sin(pi/3)^2 + cos(pi/3)^2
[1] 1
> Sin(pi/3)^2 + cos(pi/3)^2     # R is case sensitive
Error in Sin(pi/3) : could not find function "Sin"
• Take a few minutes to try some calculations of your own
Lists • A list is an ordered collection of data of arbitrary types. • The components of a list do not need to be of the same mode or type; they can be a numeric vector, a logical value, a function, and so on. • A component of a list can be referred to as aa[[i]] or aa$x, where aa is the name of the list, i is the position of the component, and x is the name of a component of aa.
Lists (Example 8) • aa[[1]] is the first component of aa, while aa[1] is the sublist consisting of the first component of aa only. • list: an ordered collection of data of arbitrary types.
> doe = list(name="john", age=28, married=FALSE)
> doe[[1]]
[1] "john"
> doe[1]
$name
[1] "john"

> doe$name
[1] "john"
> doe$age
[1] 28
• Typically, vector elements are accessed by their index (an integer), list elements by their name (a character string). But both types support both access methods.
Lists are very flexible (Example 8)
> my.list <- list(c(5,4,-1), c("X1","X2","X3"))
> my.list
[[1]]
[1]  5  4 -1

[[2]]
[1] "X1" "X2" "X3"

> my.list[[1]]
[1]  5  4 -1
> my.list1 <- list(c1=c(5,4,-1), c2=c("X1","X2","X3"))
> my.list1$c2[2:3]
[1] "X2" "X3"
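The difference between [[ ]] and [ ], and the remark that vectors and lists both support access by index and by name, can be verified with a short sketch. The list doe comes from the slide; the named vector scores is a hypothetical addition.

```r
doe <- list(name = "john", age = 28, married = FALSE)

first_value   <- doe[[1]]   # the component itself: a character vector
first_sublist <- doe[1]     # a list of length 1 that contains the component
print(class(first_value))   # "character"
print(class(first_sublist)) # "list"

# A list component can be reached by name or by position
age_by_name  <- doe[["age"]]
age_by_index <- doe[[2]]

# A named vector (hypothetical data) supports both access methods too
scores   <- c(math = 90, biology = 85)
by_index <- scores[2]
by_name  <- scores["biology"]
print(identical(by_index, by_name))   # TRUE
```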
Lists: Session (Example 8)
> Empl <- list(employee="Anna", spouse="Fred", children=3, child.ages=c(4,7,9))
> Empl[[4]]
[1] 4 7 9
> Empl$child.ages
[1] 4 7 9
> Empl[4]                       # a sublist consisting of the 4th component of Empl
$child.ages
[1] 4 7 9

# letters is a built-in vector of the lowercase letters a to z
# toupper(letters) and LETTERS both give the capital letters A to Z
> names(Empl) <- letters[1:4]
> names(Empl)
[1] "a" "b" "c" "d"
> Empl1 <- c(Empl, service=8)
> Empl1
$a
[1] "Anna"

$b
[1] "Fred"

$c
[1] 3

$d
[1] 4 7 9

$service
[1] 8

> unl <- unlist(Empl1)          # converts the list to a vector; mixed types are converted to character
> unl
      a       b       c      d1      d2      d3 service
 "Anna"  "Fred"     "3"     "4"     "7"     "9"     "8"
More lists (Example 8)
> x <- c(3,-1,2,0,-3,6)
> x.mat <- matrix(x, ncol=2, byrow=TRUE)      # matrix with 2 cols, filled by row
> x.mat
     [,1] [,2]
[1,]    3   -1
[2,]    2    0
[3,]   -3    6
> dimnames(x.mat) <- list(c("L1","L2","L3"), c("R1","R2"))
> x.mat
   R1 R2
L1  3 -1
L2  2  0
L3 -3  6
Data frame • Data frame: rectangular table with rows and columns; data within each column has the same type (e.g. number, text, logical), but different columns may have different types.
[Figure: example of a data frame with 10 rows and 3 columns]
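As a small illustration of columns with different types, the sketch below builds a data frame from hypothetical values (the column names id, name, and passed are not from the workshop data) and reports the class of each column. stringsAsFactors = FALSE is set explicitly so that text stays character in older versions of R as well.

```r
# Hypothetical data: one numeric, one text, and one logical column
df <- data.frame(
  id     = c(1, 2, 3),
  name   = c("Ann", "Bob", "Cy"),
  passed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

print(df)
col_types <- sapply(df, class)   # class of each column
print(col_types)
print(dim(df))                   # 3 rows, 3 columns
```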
Importing Data (Example 9) • The easiest way to enter data in R is to work with a text file in which the columns are separated by tabs, or with a comma-separated values (csv) file. Examples of importing data are provided below.
# csv file: read.table() needs sep="," because its default separator is white space
mydata <- read.table("D1_Data_1.csv", header=TRUE)
mydata_correct <- read.table("D1_Data_1.csv", sep=",", header=TRUE)
mydata <- read.csv("D1_Data_1.csv", header=TRUE)
# tab-delimited text file
mydatab <- read.table("D1_Data_1.txt", header=TRUE)
mydatab_correct <- read.table("D1_Data_1.txt", sep="\t", header=TRUE)
mydatab <- read.delim("D1_Data_1.txt", header=TRUE)
# import from a web URL
Data_URL <- read.table(file="http://kh-tps.osu.edu/D1_Data_1.csv", header=TRUE, sep=",")
• Importing data in RStudio: use Import Dataset in the Workspace pane • For importing data from Excel, SAS, or SPSS, see http://www.statmethods.net/input/importingdata.html
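Why the separator matters can be seen without the workshop files. The sketch below writes a tiny csv file with hypothetical contents to a temporary location, then reads it three ways: read.table() with the default separator, read.table() with sep=",", and read.csv().

```r
# Write a small csv file (hypothetical contents) to a temporary location
csv_file <- tempfile(fileext = ".csv")
writeLines(c("age,gender,weight",
             "25,male,160",
             "30,female,110"), csv_file)

# Default separator is white space, so each line is read as a single field
wrong <- read.table(csv_file, header = TRUE)

# Telling read.table() about the comma gives three columns
right <- read.table(csv_file, header = TRUE, sep = ",")

# read.csv() uses sep = "," by default
same <- read.csv(csv_file, header = TRUE)

print(ncol(wrong))    # 1
print(ncol(right))    # 3
print(names(same))
```

Checking ncol() or str() right after an import is a quick way to catch a wrong separator.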
Keyboard Input (Exercise 10) • # create a data frame from scratch • age <- c(25, 30, 56, 49, 12, 16, 60, 34, 45, 22) • gender <- c("male", "female", "male","male", "female", "male","male", "female", "male","male") • weight <- c(160, 110, 220, 100, 65, 120, 179, 134, 165, 153) • mydata<- data.frame(age,gender,weight)
Viewing Data (Example 10) • There are a number of functions for listing the contents of an object or dataset.
# list the variables in mydata
names(mydata)
# list the structure of mydata
str(mydata)
# dimensions of an object
dim(mydata)
Viewing Data (Example 10)
# class of an object (numeric, matrix, data frame, etc.)
class(mydata)
# print mydata
mydata
# print first 6 rows of mydata
head(mydata)
# print first 2 rows of mydata
head(mydata, n=2)
# print last 6 rows of mydata
tail(mydata)
# print last 2 rows of mydata
tail(mydata, n=2)
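Put together with the keyboard-input example, the viewing functions can be run as one self-contained sketch. The data values are the ones typed in on the earlier slide; the names first_two and last_two are hypothetical.

```r
# Rebuild the data frame from the keyboard-input example
age    <- c(25, 30, 56, 49, 12, 16, 60, 34, 45, 22)
gender <- c("male", "female", "male", "male", "female",
            "male", "male", "female", "male", "male")
weight <- c(160, 110, 220, 100, 65, 120, 179, 134, 165, 153)
mydata <- data.frame(age, gender, weight)

print(names(mydata))    # variable names
str(mydata)             # structure: one line per column
print(dim(mydata))      # 10 rows, 3 columns
print(class(mydata))    # "data.frame"

first_two <- head(mydata, n = 2)   # first 2 rows
last_two  <- tail(mydata, n = 2)   # last 2 rows
print(first_two)
print(last_two)
```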
Missing Data (Example 11) • In R, missing values are represented by the symbol NA (not available). Undefined results (e.g., 0/0) are represented by the symbol NaN (not a number); dividing a nonzero number by zero gives Inf or -Inf.
Testing for missing values
is.na(x)              # returns TRUE if x is missing
y <- c(1,2,3,NA)
is.na(y)              # returns a vector (FALSE FALSE FALSE TRUE)
Excluding missing values from analyses • Arithmetic functions on missing values yield missing values.
x <- c(1,2,NA,3)
mean(x)               # returns NA
mean(x, na.rm=TRUE)   # returns 2
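The sketch below runs the missing-value checks from this slide and adds two related cases as an illustration: counting the missing values and comparing 0/0 (NaN) with 1/0 (Inf). The variable names are hypothetical.

```r
y <- c(1, 2, 3, NA)
missing_flags <- is.na(y)        # FALSE FALSE FALSE TRUE
n_missing     <- sum(is.na(y))   # number of missing values
print(missing_flags)
print(n_missing)

x <- c(1, 2, NA, 3)
mean_with_na    <- mean(x)                 # NA: the missing value propagates
mean_without_na <- mean(x, na.rm = TRUE)   # 2: NA is dropped first
print(c(mean_with_na, mean_without_na))

undefined <- 0 / 0    # NaN
infinite  <- 1 / 0    # Inf
print(is.nan(undefined))   # TRUE
print(is.na(undefined))    # TRUE: is.na() is also TRUE for NaN
print(is.nan(NA))          # FALSE: NA is not NaN

complete <- x[!is.na(x)]   # keep only the non-missing elements
print(complete)
```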
Exporting Data (Example 12)
• To a csv file
write.table(mydata, "c:/mydata.csv", sep=",")
• To a tab-delimited text file
write.table(mydata, "c:/mydata.txt", sep="\t")
• Exporting R objects into other formats: for SPSS, SAS, and Stata, you will need to load the foreign package.
• For Excel, you will need the xlsReadWrite package.
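A round trip is a simple way to check an export. The sketch below is an illustration under a few assumptions: it uses a small hypothetical data frame, writes to temporary files instead of c:/, and sets row.names = FALSE (not used on the slide) so that row numbers are not written as an extra column.

```r
# Small hypothetical data frame to export
mydata <- data.frame(age    = c(25, 30),
                     gender = c("male", "female"),
                     weight = c(160, 110),
                     stringsAsFactors = FALSE)

# Comma-separated file
csv_out <- tempfile(fileext = ".csv")
write.table(mydata, csv_out, sep = ",", row.names = FALSE)

# Tab-delimited file
txt_out <- tempfile(fileext = ".txt")
write.table(mydata, txt_out, sep = "\t", row.names = FALSE)

# Read both files back to confirm the export worked
back_csv <- read.csv(csv_out)
back_txt <- read.delim(txt_out)
print(back_csv)
print(back_txt)
```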