Overall Aims
• Introduce programming concepts relevant to MX
• Demonstrate the strengths (and weaknesses) of R
Books
• Crawley M. (2007). The R Book.
• Introductions to statistics using R
  • Cohen Y. and Cohen J. Y. (2008). Statistics and Data with R.
  • Crawley M. (2005). Statistics: An Introduction using R.
  • Dalgaard P. (2002). Introductory Statistics with R.
  • Maindonald J. & Braun J. (2003). Data Analysis and Graphics Using R: An Example-based Approach.
• Books on biological topics
  • Paradis E. (2006). Analysis of Phylogenetics and Evolution with R.
  • Broman K. W. & Sen S. (2009). A Guide to QTL Mapping with R/qtl.
  • Bolker B. M. (2008). Ecological Models and Data in R.
• Books on statistical topics
  • Aitkin M. et al. (2009). Statistical Modelling in R.
  • Faraway J. (2009). Linear Models with R.
  • Albert J. (2009). Bayesian Computation with R.
  • Bivand R. S. et al. (2009). Applied Spatial Data Analysis with R.
  • Cowpertwait P. S. P. & Metcalfe A. V. (2009). Introductory Time Series with R.
• Books on R specifics and R programming
  • Spector P. (2008). Data Manipulation with R.
  • Murrell P. (2006). R Graphics.
  • Chambers J. M. (2008). Software for Data Analysis: Programming with R.
Websites
• Websites:
  • CRAN R: http://www.r-project.org/
  • R cookbook: http://www.r-cookbook.com/
  • R graphics: http://addictedtor.free.fr/graphiques/
  • R wiki: http://wiki.r-project.org/
  • Mailing lists: http://www.r-project.org/mail.html
  • R seek: http://www.rseek.org/
• Websites on statistical topics
  • R genetics: http://rgenetics.org/trac/rgalaxy
  • Bioconductor: http://www.bioconductor.org/
The console
• Load up R
• Console window appears, with a command prompt
• Everything in the R console can be partitioned into two fundamental operations:
  • Input variables
    > x <- 2
  • Output variables
    > x
    [1] 2
Objects
• Names
  • Case sensitive, no spaces
  • Must begin with a letter (or a dot not followed by a number), and can also contain numbers, . and _
  • Try to give your objects meaningful names
> My_f4vourite.langua6e_evR <- "R"
• x, y and My_f4v… are objects that we have created
> ls() # this will bring up a list of all our objects
> rm(y) # this deletes y (forever)
> rm(list=ls()) # this deletes everything (..forever)
Workspace 1
• Everything shown in this list of objects comprises our 'workspace'
> ls()
[1] "My_f4vourite.langua6e_evR" "x" "y"
> save.image(file="myworkspace.RData")
> rm(list=ls())
> ls()
character(0)
> load(file = "myworkspace.RData")
> ls()
[1] "My_f4vourite.langua6e_evR" "x" "y"
• Objects are internal to R
  • They do not behave like a file structure on the computer
  • Can't be read or interpreted outside R (?)
Workspace 2
• You can select which objects to save
> save(y, x, file = "two_objects.RData")
• Different computer folders can be accessed
> dir() # lists the files in the current working directory
> setwd("~/work_directory") # changes R's working directory to a different folder
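The save-and-restore cycle from the two Workspace slides can be tried safely with a sketch like the one below. It is illustrative only: the object values, the use of tempfile() and the private environment called restored are assumptions made so that the example does not touch your real workspace.

```r
# Illustrative objects (values are assumptions, not from the slides)
x <- 2
y <- c(1, 2, 3)

# Save only the selected objects to a temporary .RData file
rdata_file <- tempfile(fileext = ".RData")
save(x, y, file = rdata_file)

# Load them into a separate environment so nothing in the
# current workspace is overwritten
restored <- new.env()
loaded_names <- load(rdata_file, envir = restored)

print(loaded_names)   # names of the objects that were restored
print(restored$y)
```

load() returns the names of the objects it restored, which is a convenient way to check what a .RData file contained.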
Built-in functions
• Native functions make R succinct
• Diverse range available, from graphics to data manipulation to statistical algorithms etc.
• Highly optimised, so use them if they are available instead of writing your own
• Function structure:
> function_name(<argument 1>, <argument 2>, …)
Missing values
• NA is a "reserved" word in R
• It is a single element (length 1) that indicates a missing value
• A helpful alternative to coding missing values with a number (e.g. -99)
> my_array <- c(NA,100,120,120,120,130,NA)
> sum(my_array)
[1] NA
> sum(my_array, na.rm=TRUE) # most functions allow you to explicitly state how to handle NA
[1] 590
> table(my_array) # HOWEVER the default action varies from function to function
my_array
100 120 130
  1   3   1
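A short sketch of other common ways to inspect and handle NA, reusing my_array from the slide. The variable names n_missing, m and tab are introduced here for illustration only.

```r
my_array <- c(NA, 100, 120, 120, 120, 130, NA)

# is.na() flags the missing elements; summing the flags counts them
is.na(my_array)
n_missing <- sum(is.na(my_array))

# Missing values propagate unless the function is told to drop them
mean(my_array)
m <- mean(my_array, na.rm = TRUE)

# table() silently drops NA by default; useNA makes them visible
tab <- table(my_array, useNA = "ifany")
print(tab)
```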
R help pages
• Each function has its own unique syntax
  • Default arguments
  • Data structure requirements
  • Output options
> ?seq # brings up help page of seq() function
> ??"sequence" # searches for all related functions
• Note
> seq(from = 2, to = 100, by = 2)
is clearer than
> seq(2,100,2)
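To make the point about named arguments concrete, here is a minimal sketch. The object names a, b and d are assumptions for illustration; seq(), args() and round() are standard base R functions.

```r
# Positional and named arguments give the same result
a <- seq(2, 100, 2)
b <- seq(from = 2, to = 100, by = 2)
identical(a, b)

# Named arguments can be supplied in any order
d <- seq(by = 2, to = 100, from = 2)
identical(a, d)

# Arguments that are not supplied fall back to their defaults
args(round)                  # shows that digits has a default of 0
round(3.14159)               # uses the default
round(3.14159, digits = 2)   # overrides it
```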
Basic Scripting
• Note pad / text editor
• Within the R GUI
  • Open with: File > New Script or Ctrl+N
  • Layout as tile is useful: Windows > Tile
Basic Scripting
• Note pad / text editor
  • Useful for keeping all work together
  • Scripts can be saved
  • Can be used to save a "program"
  • Add # comments
• Check individual bits of code
  • Ctrl+R runs:
    • Whole line
    • Selected code
Basic Scripting
• Brackets
  • ( ) functions
  • [ ] subsets
  • { } processes (blocks of code, e.g. the body of a loop)
• Subsets
  • Take a subset of an object
  • Objects have either 1 x n, or m x n dimensions
> x
[1] 2 5 6 2 6 77 55
> x[5]
[1] 6
> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
[rows, columns]
> x[3,4]
[1] 12
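A few more subsetting patterns, sketched with the same vector and a matrix built to match the one on the slide. Naming the matrix m (rather than reusing x) is a choice made here for clarity.

```r
x <- c(2, 5, 6, 2, 6, 77, 55)
x[5]        # a single element
x[2:4]      # a range of elements
x[-1]       # everything except the first element
x[x > 5]    # elements that satisfy a condition

# matrix() fills column by column, giving the 3 x 4 layout from the slide
m <- matrix(1:12, nrow = 3)
m[3, 4]     # row 3, column 4
m[2, ]      # the whole of row 2
m[, 1]      # the whole of column 1
```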
Basic Scripting
• Data input
  • Direct input into the console
    • scan()
  • Reading in data
    • read.table / read.csv
    • "name.txt"
    • "c:\\temp\\name.txt"
    • file.choose()
    • list.files()
    • dir()
> y <- scan()
1: 3
2: 4
3: 12
4: 3
5: 5
6: 2
7: 14
8:
Read 7 items
> dir()
[1] "temp.csv" "temp2.csv" "name.txt"
> y <- read.table("name.txt", header=TRUE, sep="\t")
>
Basic Scripting
• Data output
  • Direct output from the console
    • sink()
  • Writing out data
    • write.table / write.csv
    • "name.txt"
    • "c:\\temp\\name.txt"
> sink("sink_tmp.txt")
> i <- 1:10
> outer(i, i, "*")
> sink()
> dir()
[1] "temp.csv" "temp2.csv" "name.txt"
> write.table(y, file="name.txt", col.names=TRUE, sep="\t") # first argument is the object to write
>
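The input and output slides can be combined into a small round trip: write a table out, then read it back in. This is a sketch under assumptions: the data frame lambs and its values are made up for illustration, and a temporary file is used in place of "name.txt".

```r
# A small made-up data frame (values are illustrative only)
lambs <- data.frame(field  = c("A", "A", "B"),
                    weight = c(22.9, 27.5, 25.5),
                    stringsAsFactors = FALSE)

# Write it as a tab-separated text file with a header row
out_file <- tempfile(fileext = ".txt")
write.table(lambs, file = out_file, sep = "\t",
            row.names = FALSE, quote = FALSE)

# Read it back; header = TRUE uses the first line as column names
back <- read.table(out_file, header = TRUE, sep = "\t",
                   stringsAsFactors = FALSE)
str(back)
```

Setting row.names = FALSE avoids writing an extra unnamed column of row numbers, which would otherwise shift the header when the file is read elsewhere.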
Basic Scripting
• Adding rows and columns
  • Allows objects to be joined, either to an existing object or to make a new object
  • cbind(): adds columns together
  • rbind(): adds rows together
> y1
     [,1] [,2] [,3]
[1,]    1    3 12.5
[2,]    1    2 13.8
[3,]    1    5 15.3
[4,]    1    4 16.8
> y2
      [,1]
[1,] 0.349
[2,] 0.745
[3,] 0.684
[4,] 0.964
> y3 <- cbind(y1, y2)
> y3
     [,1] [,2] [,3]  [,4]
[1,]    1    3 12.5 0.349
[2,]    1    2 13.8 0.745
[3,]    1    5 15.3 0.684
[4,]    1    4 16.8 0.964
> y3 <- rbind(y1, y2[1:3])
> y3
      [,1]  [,2]   [,3]
[1,] 1.000 3.000 12.500
[2,] 1.000 2.000 13.800
[3,] 1.000 5.000 15.300
[4,] 1.000 4.000 16.800
[5,] 0.349 0.745  0.684
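The slide's objects can be rebuilt and joined as in the sketch below, which also shows what happens when the dimensions do not match. The names wide, tall and res are assumptions introduced for this example.

```r
# Rebuild y1 (4 x 3) and y2 (4 x 1) with the values shown on the slide
y1 <- matrix(c(1, 1, 1, 1,
               3, 2, 5, 4,
               12.5, 13.8, 15.3, 16.8), nrow = 4)
y2 <- matrix(c(0.349, 0.745, 0.684, 0.964), ncol = 1)

wide <- cbind(y1, y2)        # same number of rows: columns are appended
dim(wide)

tall <- rbind(y1, y2[1:3])   # three values form one new row of three columns
dim(tall)

# Matrices with different numbers of rows cannot be column-bound
res <- try(cbind(y1, matrix(1:3, ncol = 1)), silent = TRUE)
inherits(res, "try-error")
```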
Basic Scripting
• for loops
  • loop through a set of commands a given number of times
  • very useful, but not the most efficient option in R (built-in vectorised functions are usually faster)
> dim(y)
[1] 10 10
> for(i in 1:nrow(y)) {
    y_mean <- mean(y[i, 1:10]) # overwritten on every pass
  }
> y_mean # only the mean of the last row is kept
[1] 0.1974492
> out <- array(0, c(nrow(y), 1)) # one slot per row to store each result
> for(i in 1:nrow(y)) {
    out[i] <- mean(y[i, ])
  }
> out
            [,1]
 [1,] -0.3110800
 [2,] -0.2000344
 [3,]  0.2019573
 [4,]  0.2859823
 [5,]  0.1932523
 [6,]  0.2759323
 [7,] -0.2571102
 [8,] -0.1037983
 [9,]  0.3522018
[10,]  0.1974492
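For comparison, the row means computed by the loop can also be obtained with built-in functions. This is a sketch on simulated data: the matrix y here is random (the slide's actual values are not available), and set.seed(1) is only there to make the example repeatable.

```r
set.seed(1)
y <- matrix(rnorm(100), nrow = 10)   # simulated 10 x 10 matrix

# Loop version: pre-allocate the result, then fill one element per row
out <- numeric(nrow(y))
for (i in seq_len(nrow(y))) {
  out[i] <- mean(y[i, ])
}

# Built-in alternatives that give the same answer
by_rowmeans <- rowMeans(y)
by_apply    <- apply(y, 1, mean)   # 1 means "apply over rows"

all.equal(out, by_rowmeans)
all.equal(out, by_apply)
```

seq_len(nrow(y)) is used instead of 1:nrow(y) because it behaves sensibly when the matrix has zero rows.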
Data Manipulation
• Check data
  • dim()
  • mydata[1:10, 1:10]
  • str()
  • summary()
  • head()
  • tail()
  • table()
  • etc…
> mydata <- read.table("mydata.txt", header=TRUE, sep="\t")
> dim(mydata)
[1]  642 1470
> mydata[1:10, 1:10]
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    2    2    1    2    1    2    0    1    0     1
 [2,]    0    0    2    2    0    0    1    2    1     2
 [3,]    0    2    2    2    1    1    0    0    2     1
 [4,]    2    0    2    2    2    0    1    2    0     1
 [5,]    2    0    0    2    0    1    1    0    2     0
 [6,]    2    1    2    1    1    0    2    2    1     1
 [7,]    1    1    2    2    1    2    2    2    0     1
 [8,]    0    1    0    0    0    1    1    1    1     1
 [9,]    0    0    1    2    1    2    2    0    0     1
[10,]    1    0    1    1    2    0    1    0    0     1
Data Manipulation
• Reordering
  • If you have a data.frame or matrix (numbers or letters)
  • Use: order()
  • index <- order(old[,1], decreasing=TRUE)
> dim(lamb)
[1] 1600    5
> head(lamb)
  Field   Weight sire dam sex
1     A 22.92368    1   1   F
2     A 27.52896    1   1   F
3     A 25.52592    1   1   M
4     A 25.56016    1   1   M
5     A 24.53296    1   2   F
6     A 22.03344    1   2   F
> lamb <- lamb[order(lamb$sex, decreasing=FALSE), ]
> head(lamb)
   Field   Weight sire dam sex
1      A 22.92368    1   1   F
2      A 27.52896    1   1   F
5      A 24.53296    1   2   F
6      A 22.03344    1   2   F
9      A 30.37944    2   1   F
10     A 25.93680    2   1   F
Data Manipulation
• Reordering
  • order()
> lamb <- lamb[order(lamb$sex, decreasing=FALSE), ]
> rows <- order(lamb$sex, decreasing=FALSE)
> lamb <- lamb[rows, ]
Expanded way
> index <- order(lamb$sex, decreasing=FALSE)
> head(index)
[1]  1  2  5  6  9 10
> lamb <- lamb[index, ]
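order() also accepts several keys, which the sketch below illustrates on a tiny made-up data frame. The values, and the names by_sex and by_sex_weight, are assumptions for illustration; they are not the lamb data from the slides.

```r
# Tiny stand-in for the lamb data (values are made up)
lamb <- data.frame(Field  = c("A", "A", "B", "B"),
                   Weight = c(22.9, 27.5, 25.5, 24.5),
                   sex    = c("M", "F", "M", "F"),
                   stringsAsFactors = FALSE)

# order() returns row positions; using them as a row index reorders the data
by_sex <- lamb[order(lamb$sex), ]

# Second key breaks ties: within each sex, heaviest first
# (negating a numeric column reverses its sort direction)
by_sex_weight <- lamb[order(lamb$sex, -lamb$Weight), ]

print(by_sex)
print(by_sex_weight)
```

The original row names are kept after reordering, which is why the row labels in the slide output read 1, 2, 5, 6, 9, 10.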
Data Manipulation
• Replacing
  • index
  • which()
> class(lamb)
[1] "matrix"
> head(lamb)
  Field   Weight sire dam sex
1     A 22.92368    1   1   F
2     A 27.52896    1   1   F
3     B 25.52592    1   1   M
> index <- lamb[,1]=="A" # logical index: TRUE where Field is "A"
> head(index)
[1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE
> lamb[index, 1] <- "C"
> head(lamb)
  Field   Weight sire dam sex
1     C 22.92368    1   1   F
2     C 27.52896    1   1   F
3     B 25.52592    1   1   M
The same replacement using which(), which returns the matching row positions:
> index <- which(lamb[,1]=="A")
> head(index)
1 2 4 6 7 10
> lamb[index, 1] <- "C"
Put it together
> lamb[which(lamb[,1]=="A"), 1] <- "C"
Data Manipulation
• Replacing
> class(lamb)
[1] "matrix"
> head(lamb)
  Field   Weight sire dam sex
1     A 22.92368    1   1   F
2     A 27.52896    1   1   F
3     B 25.52592    1   1   M
> index <- lamb[,2] <= 22.000
> table(index)
index
FALSE  TRUE
 1553    47
> lamb[index, 2] <- NA # NA without quotes; "NA" would be stored as a character string
> which(lamb[,2] >= 20.0 & lamb[,2] <= 21.0)
214 363 496 842 921 983 1103 1126
> which(lamb[,1]=="A" & lamb[,2] >= 20.0 & lamb[,2] <= 21.0)
214 363 496
> new_lamb <- lamb[which(lamb[,1]=="A" & lamb[,2] >= 20.0 & lamb[,2] <= 21.0), ]
> new_lamb
    Field Weight sire dam sex
214     A  20.46   27   2   F
363     A  20.08   46   1   M
496     A  20.41   62   2   M
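The filtering and replacing steps can be sketched on a small data frame as follows. The data are made up, and a data frame is used here instead of the slide's matrix so that Weight stays numeric; keep and light_A are names introduced for illustration.

```r
# Tiny stand-in for the lamb data (values are made up)
lamb <- data.frame(Field  = c("A", "A", "B", "A"),
                   Weight = c(22.9, 20.4, 25.5, 20.8),
                   stringsAsFactors = FALSE)

# Combine conditions with & and pull out the matching rows
keep <- lamb$Field == "A" & lamb$Weight >= 20 & lamb$Weight <= 21
light_A <- lamb[which(keep), ]

# Recode one value of a column using a logical index
lamb$Field[lamb$Field == "A"] <- "C"

# Mark low weights as missing: a bare NA keeps the column numeric,
# whereas the string "NA" would turn it into character data
lamb$Weight[lamb$Weight <= 22] <- NA

print(light_A)
print(lamb)
is.numeric(lamb$Weight)
```

Note that the subset is taken before the weights are set to NA; doing it afterwards would find no rows between 20 and 21.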
Graphics with R: Overview
• Why graphics?
• Why graphics in R?
• The R graphics systems (did you really expect just one?)
• Graphics basics and examples
• Customisation of a graphic
• Overview of different systems and packages
Introduction to R: Joseph Powell
plot(x, y, …)
> ?Formaldehyde
> head(Formaldehyde)
  carb optden
1  0.1  0.086
2  0.3  0.269
3  0.5  0.446
4  0.6  0.538
5  0.7  0.626
6  0.9  0.782
> plot(Formaldehyde)
> ?par
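As a sketch of the customisation that ?par and the plot() arguments allow, the example below draws the same built-in Formaldehyde data with labels, a different point symbol and a fitted line. Writing to a temporary PDF, the chosen colours and margins, and the names plot_file, old_par and fit are all illustrative assumptions; drop the pdf() and dev.off() lines to draw on screen instead.

```r
# Draw to a temporary PDF so the example also runs without a display
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# par() changes graphical settings; it returns the old values so they
# can be restored afterwards
old_par <- par(mar = c(5, 4, 2, 1))

plot(optden ~ carb, data = Formaldehyde,
     xlab = "Carbohydrate (ml)",
     ylab = "Optical density",
     main = "Formaldehyde",
     pch = 19, col = "blue")

# Add a straight-line fit on top of the points
fit <- lm(optden ~ carb, data = Formaldehyde)
abline(fit, lty = 2)

par(old_par)
invisible(dev.off())
```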