Overall Aims
• Introduce programming concepts relevant to MX
• Demonstrate the strengths (and weaknesses) of R

Books
• The R Book – Crawley (2007)
• Introductions to statistics using R
  • Cohen Y. and Cohen J. Y. (2008). Statistics and Data with R.
  • Crawley M. (2005). Statistics: An Introduction using R.
  • Dalgaard P. (2002). Introductory Statistics with R.
  • Maindonald J. & Braun J. (2003). Data Analysis and Graphics Using R: An Example-based Approach.
• Books on biological topics
  • Paradis E. (2006). Analysis of Phylogenetics and Evolution with R.
  • Broman K. W. & Sen S. (2009). A Guide to QTL Mapping with R/qtl.
  • Bolker B.M. (2008). Ecological Models and Data in R.
• Books on statistical topics
  • Aitkin M. et al. (2009). Statistical Modelling in R.
  • Faraway J. (2009). Linear Models with R.
  • Albert J. (2009). Bayesian Computation with R.
  • Bivand R.S. et al. (2009). Applied Spatial Data Analysis with R.
  • Cowpertwait P.S.P. & Metcalfe A.V. (2009). Introductory Time Series with R.
• Books on R specifics and R programming
  • Spector P. (2008). Data Manipulation with R.
  • Murrell P. (2006). R Graphics.
  • Chambers J. M. (2008). Software for Data Analysis: Programming with R.

Websites
• General
  • Cran R: http://www.r-project.org/
  • R cookbook: http://www.r-cookbook.com/
  • R graphics: http://addictedtor.free.fr/graphiques/
  • R wiki: http://wiki.r-project.org/
  • Mailing lists: http://www.r-project.org/mail.html
  • R seek: http://www.rseek.org/
• Websites on statistical topics
  • R genetics: http://rgenetics.org/trac/rgalaxy
  • Bioconductor: http://www.bioconductor.org/

The console
• Load up R
• Console window appears, with a command prompt
• Everything in the R console can be partitioned into two fundamental operations:
  • Input variables
    > x <- 2
  • Output variables
    > x
    [1] 2

Objects
• Names
  • Case sensitive, no spaces
  • Must begin with a letter (or a dot not followed by a digit) and can also contain numbers, . and _
  • Try to give your objects meaningful names
    > My_f4vourite.langua6e_evR <- "R"
• x, y and My_f4v… are objects that we have created
  > ls()            # lists all the objects we have created
  > rm(y)           # deletes y (forever)
  > rm(list=ls())   # deletes everything (..forever)

Workspace 1
• Everything shown in this list of objects comprises our 'workspace'
  > ls()
  [1] "My_f4vourite.langua6e_evR" "x" "y"
  > save.image(file="myworkspace.RData")
  > rm(list=ls())
  > ls()
  character(0)
  > load(file = "myworkspace.RData")
  > ls()
  [1] "My_f4vourite.langua6e_evR" "x" "y"
• Objects are internal to R
  • Does not behave like a file structure on the computer
  • Can't be read or interpreted outside R (?)

Workspace 2
• You can select which objects to save
  > save(y, x, file = "two_objects.RData")
• Different computer folders can be accessed
  > getwd()                     # shows the current working directory
  > dir()                       # lists the files in the current working directory
  > setwd("~/work_directory")   # changes the working directory to a different folder
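The save-and-restore cycle from the two workspace slides can be tried without touching your real workspace files. This is a minimal sketch: the objects `x` and `y` and the temporary file path are assumptions made up for the illustration, not part of the course material.

```r
# Hypothetical objects, created only for this illustration
x <- 2
y <- c(1, 2, 3)

# Save two selected objects to a temporary file (the path is an assumption)
rdata_path <- tempfile(fileext = ".RData")
save(x, y, file = rdata_path)

# Remove them from the workspace, then restore them from the file
rm(x, y)
restored <- load(rdata_path)   # load() returns the names of the restored objects
restored
x
```

Using `tempfile()` keeps the example from overwriting anything in the working directory; in practice you would give the file a meaningful name, as on the slide.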

Built-in functions
• Native functions make R succinct
• Diverse range available, from graphics to data manipulation to statistical algorithms etc.
• Highly optimised, so use them if they are available instead of writing your own
• Function structure:
  > function_name(<argument 1>, <argument 2>, …)

Missing values
• NA is a "reserved" word in R
• It is a single element (length 1) that indicates a missing value
• A helpful alternative to coding missing values (e.g. -99)
  > my_array <- c(NA,100,120,120,120,130,NA)
  > sum(my_array)
  [1] NA
  > sum(my_array, na.rm=TRUE)   # most functions allow you to explicitly state how to handle NA
  [1] 590
  > table(my_array)             # HOWEVER the default action varies from function to function
  my_array
  100 120 130
    1   3   1
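A short sketch of a few other ways to inspect and handle NA, reusing the `my_array` vector from the slide. The choice of functions shown here (`is.na()`, `mean()`, and the `useNA` argument of `table()`) is an addition for illustration.

```r
my_array <- c(NA, 100, 120, 120, 120, 130, NA)

# is.na() flags each missing element; summing the flags counts them
is.na(my_array)
n_missing <- sum(is.na(my_array))

# Like sum(), mean() returns NA unless told to drop missing values
mean(my_array)
mean_observed <- mean(my_array, na.rm = TRUE)

# table() silently drops NA by default; useNA = "ifany" reports it
table(my_array, useNA = "ifany")
```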

R help pages
• Each function has its own unique syntax
  • Default arguments
  • Data structure requirements
  • Output options
  > ?seq           # brings up help page of seq() function
  > ??"sequence"   # searches for all related functions
• Note
  > seq(from = 2, to = 100, by = 2)
  is clearer than
  > seq(2,100,2)
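The note about named arguments can be checked directly. The sketch below shows that the named and positional calls give the same result, and that named arguments may be supplied in any order; the `length.out` example is an extra illustration, not from the slides.

```r
# Named and positional forms produce the same sequence
named      <- seq(from = 2, to = 100, by = 2)
positional <- seq(2, 100, 2)
identical(named, positional)

# Named arguments can be given in any order
reordered <- seq(by = 2, to = 100, from = 2)

# Other arguments listed on the help page, e.g. length.out
quarters <- seq(from = 0, to = 1, length.out = 5)
quarters
```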

Basic Scripting
• Note pad / text editor
  • Within the R GUI
  • Open with: File > New Script or Ctrl+N
  • Layout as tile is useful: Windows > Tile

Basic Scripting
• Note pad / text editor
  • Useful for keeping all work together
  • Scripts can be saved
  • Can be used to save a "program"
  • Add # comments
  • Check individual bits of code with Ctrl+R
    • Whole line
    • Selected code

Basic Scripting
• Brackets
  • ( ) functions
  • [ ] subsets
  • { } blocks of code (e.g. the body of a loop or function)
• Subsets
  • Take a subset of an object
  • Objects have either 1 x n, or m x n dimensions
  > x
  [1]  2  5  6  2  6 77 55
  > x[5]
  [1] 6
  > x
       [,1] [,2] [,3] [,4]
  [1,]    1    4    7   10
  [2,]    2    5    8   11
  [3,]    3    6    9   12
  [rows, columns]
  > x[3,4]
  [1] 12
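A minimal sketch of the subsetting rules above, using the same values as the slide. The object names `x` and `m` are chosen for the example; the slide reuses `x` for both the vector and the matrix.

```r
# A vector (1 x n): one index inside [ ]
x <- c(2, 5, 6, 2, 6, 77, 55)
x[5]           # fifth element
x[c(1, 3)]     # several elements at once
x[-1]          # everything except the first element

# A matrix (m x n): [rows, columns]; matrix() fills column by column
m <- matrix(1:12, nrow = 3)
m[3, 4]        # row 3, column 4
m[2, ]         # all of row 2
m[, 1]         # all of column 1
```

Leaving the row or column position empty selects everything along that dimension.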

Basic Scripting
• Data input
  • Direct input into the console
    • scan()
  • Reading in data
    • read.table / read.csv
    • "name.txt"
    • "c:\\temp\\name.txt"
    • file.choose()
    • list.files()
    • dir()
  > y <- scan()
  1: 3
  2: 4
  3: 12
  4: 3
  5: 5
  6: 2
  7: 14
  8:
  Read 7 items
  > dir()
  [1] "temp.csv" "temp2.csv" "name.txt"
  > y <- read.table("name.txt", header=TRUE, sep="\t")
  >
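To try read.table() without an existing data file, the sketch below first writes a small tab-delimited file to a temporary location and then reads it back. The file contents and column names (`id`, `weight`) are invented for the illustration.

```r
# Create a small tab-delimited file (contents are hypothetical)
path <- tempfile(fileext = ".txt")
writeLines(c("id\tweight",
             "1\t22.9",
             "2\t27.5",
             "3\t25.5"), path)

# header = TRUE: first line holds column names; sep = "\t": tab-delimited
y <- read.table(path, header = TRUE, sep = "\t")
str(y)
y$weight
```

read.table() returns a data frame, so columns can be accessed by name with `$`.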

Basic Scripting
• Data output
  • Diverting console output to a file
    • sink()
  • Writing out data
    • write.table / write.csv
    • "name.txt"
    • "c:\\temp\\name.txt"
  > sink("sink_tmp.txt")
  > i <- 1:10
  > outer(i, i, "*")
  > sink()
  > dir()
  [1] "temp.csv" "temp2.csv" "name.txt"
  > write.table(y, file="name.txt", col.names=TRUE, sep="\t")
  >
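A sketch of both output routes, writing to temporary files so nothing in the working directory is changed. The data frame `df` and the file paths are assumptions for the example. Note that write.table() takes the object first and uses `col.names` (not `header`) to control the header line.

```r
# Hypothetical data frame to write out
df <- data.frame(id = 1:3, weight = c(22.9, 27.5, 25.5))

# write.table(): object first, then the file
out_path <- tempfile(fileext = ".txt")
write.table(df, file = out_path, sep = "\t",
            row.names = FALSE, col.names = TRUE, quote = FALSE)
written <- readLines(out_path)
written

# sink(): divert printed console output to a file, then switch it back
sink_path <- tempfile(fileext = ".txt")
sink(sink_path)
i <- 1:3
print(outer(i, i, "*"))
sink()
captured <- readLines(sink_path)
captured
```

Always pair `sink(file)` with a closing `sink()`, otherwise later output keeps going to the file instead of the console.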

Basic Scripting
• Adding rows and columns
  • Allows objects to be joined, either to an existing object or to make a new object
  • cbind() – adds columns together
  • rbind() – adds rows together
  > y1
       [,1] [,2] [,3]
  [1,]    1    3 12.5
  [2,]    1    2 13.8
  [3,]    1    5 15.3
  [4,]    1    4 16.8
  > y2
        [,1]
  [1,] 0.349
  [2,] 0.745
  [3,] 0.684
  [4,] 0.964
  > y3 <- cbind(y1, y2)
  > y3
       [,1] [,2] [,3]  [,4]
  [1,]    1    3 12.5 0.349
  [2,]    1    2 13.8 0.745
  [3,]    1    5 15.3 0.684
  [4,]    1    4 16.8 0.964
  > y3 <- rbind(y1, y2[1:3])
  > y3
        [,1]  [,2]   [,3]
  [1,] 1.000 3.000 12.500
  [2,] 1.000 2.000 13.800
  [3,] 1.000 5.000 15.300
  [4,] 1.000 4.000 16.800
  [5,] 0.349 0.745  0.684
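The slide's example can be reproduced by building `y1` and `y2` by hand. The `matrix()` calls below are an assumption about how the objects were created; only their values come from the slide. The result of rbind() is stored in `y4` here (the slide reuses `y3`) so both results stay available.

```r
# Rebuild the two objects from the slide (matrix() fills column by column)
y1 <- matrix(c(1, 1, 1, 1,
               3, 2, 5, 4,
               12.5, 13.8, 15.3, 16.8), nrow = 4)
y2 <- matrix(c(0.349, 0.745, 0.684, 0.964), ncol = 1)

# cbind(): both objects need the same number of rows (4 here)
y3 <- cbind(y1, y2)
dim(y3)

# rbind(): the new row needs the same number of columns (3 here),
# so only the first three values of y2 are used
y4 <- rbind(y1, y2[1:3])
dim(y4)
```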

Basic Scripting
• for loops
  • loop through a set of commands a given number of times
  • very useful, but often less efficient than built-in vectorised functions
  > dim(y)
  [1] 10 10
  > for(i in 1:nrow(y)) {
      y_mean <- mean(y[i, 1:10])   # y_mean is overwritten on every pass
    }
  > y_mean                         # so only the mean of the last row is kept
  [1] 0.1974492
  > out <- array(0, c(nrow(y), 1))
  > for(i in 1:nrow(y)) {
      out[i] <- mean(y[i, ])       # store the mean of row i
    }
  > out
              [,1]
   [1,] -0.3110800
   [2,] -0.2000344
   [3,]  0.2019573
   [4,]  0.2859823
   [5,]  0.1932523
   [6,]  0.2759323
   [7,] -0.2571102
   [8,] -0.1037983
   [9,]  0.3522018
  [10,]  0.1974492
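A sketch comparing the loop with two built-in alternatives. The matrix `y` is simulated here with `rnorm()` and a fixed seed, which is an assumption; the slide does not say how its `y` was generated, so the numbers will differ.

```r
# Simulated 10 x 10 matrix (hypothetical stand-in for the slide's y)
set.seed(1)
y <- matrix(rnorm(100), nrow = 10)

# Loop version: pre-allocate the result, then fill one row mean per pass
out <- numeric(nrow(y))
for (i in seq_len(nrow(y))) {
  out[i] <- mean(y[i, ])
}

# Built-in alternatives that give the same row means
via_rowmeans <- rowMeans(y)
via_apply    <- apply(y, 1, mean)   # 1 = apply over rows

all.equal(out, via_rowmeans)
all.equal(out, via_apply)
```

`seq_len(nrow(y))` is used instead of `1:nrow(y)` because it behaves sensibly when an object has zero rows.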

Data Manipulation
• Check data
  • dim()
  • mydata[1:10, 1:10]
  • str()
  • summary()
  • head()
  • tail()
  • table()
  • etc…
  > mydata <- read.table("mydata.txt", header=TRUE, sep="\t")
  > dim(mydata)
  [1]  642 1470
  > mydata[1:10, 1:10]
        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
   [1,]    2    2    1    2    1    2    0    1    0     1
   [2,]    0    0    2    2    0    0    1    2    1     2
   [3,]    0    2    2    2    1    1    0    0    2     1
   [4,]    2    0    2    2    2    0    1    2    0     1
   [5,]    2    0    0    2    0    1    1    0    2     0
   [6,]    2    1    2    1    1    0    2    2    1     1
   [7,]    1    1    2    2    1    2    2    2    0     1
   [8,]    0    1    0    0    0    1    1    1    1     1
   [9,]    0    0    1    2    1    2    2    0    0     1
  [10,]    1    0    1    1    2    0    1    0    0     1
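The checking functions listed above can be tried on a small simulated data set. The matrix below is a made-up stand-in for `mydata` (values 0, 1 or 2, as in the slide's output), not the course data.

```r
# Simulated stand-in for mydata: 20 rows, 6 columns of 0/1/2 codes
set.seed(42)
mydata <- matrix(sample(0:2, 20 * 6, replace = TRUE), nrow = 20)

dim(mydata)            # rows, columns
str(mydata)            # compact description of the structure
head(mydata, 3)        # first three rows
tail(mydata, 3)        # last three rows
summary(mydata[, 1])   # numeric summary of the first column
table(mydata[, 1])     # counts of each value in the first column
```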

Data Manipulation
• Reordering
  • If you have a data.frame or matrix (numbers or letters)
  • Use: order()
  • index <- order(old[,1], decreasing=TRUE)
  > dim(lamb)
  [1] 1600    5
  > head(lamb)
    Field   Weight sire dam sex
  1     A 22.92368    1   1   F
  2     A 27.52896    1   1   F
  3     A 25.52592    1   1   M
  4     A 25.56016    1   1   M
  5     A 24.53296    1   2   F
  6     A 22.03344    1   2   F
  > lamb <- lamb[order(lamb$sex, decreasing=FALSE), ]
  > head(lamb)
     Field   Weight sire dam sex
  1      A 22.92368    1   1   F
  2      A 27.52896    1   1   F
  5      A 24.53296    1   2   F
  6      A 22.03344    1   2   F
  9      A 30.37944    2   1   F
  10     A 25.93680    2   1   F

Data Manipulation
• Reordering
  • order()
  > lamb <- lamb[order(lamb$sex, decreasing=FALSE), ]
  is the same as
  > rows <- order(lamb$sex, decreasing=FALSE)
  > lamb <- lamb[rows, ]
  Expanded way
  > index <- order(lamb$sex, decreasing=FALSE)
  > head(index)
  [1]  1  2  5  6  9 10
  > lamb <- lamb[index, ]
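A minimal sketch of order() on a four-row data frame. The data are invented and only mimic the columns of `lamb`; the second call, which sorts on two keys, goes slightly beyond the slide.

```r
# Hypothetical miniature version of the lamb data
lamb <- data.frame(Field  = c("A", "A", "B", "B"),
                   Weight = c(22.9, 27.5, 25.5, 21.0),
                   sex    = c("F", "M", "F", "M"))

# order() returns row positions, not sorted values
index <- order(lamb$sex)
index

# Use those positions in the row slot of [rows, columns]
lamb_by_sex <- lamb[index, ]

# Several keys: sex ascending, then Weight descending within each sex
index2 <- order(lamb$sex, -lamb$Weight)
lamb[index2, ]
```

Because order() returns positions, the original row names travel with the rows, which is why the slide's sorted output shows row labels 1, 2, 5, 6, 9, 10.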

Data Manipulation
• Replacing
  • logical index
  • which()
  > class(lamb)
  [1] "matrix"
  > head(lamb)
    Field   Weight sire dam sex
  1     A 22.92368    1   1   F
  2     A 27.52896    1   1   F
  3     B 25.52592    1   1   M
  > index <- lamb[,1] == "A"
  > head(index)
  [1]  TRUE  TRUE FALSE  TRUE FALSE
  > lamb[index, 1] <- "C"
  > head(lamb)
    Field   Weight sire dam sex
  1     C 22.92368    1   1   F
  2     C 27.52896    1   1   F
  3     B 25.52592    1   1   M
  > index <- which(lamb[,1] == "A")
  > head(index)
  1 2 4 6 7 10
  > lamb[index, 1] <- "C"
  Put it together
  > lamb[which(lamb[,1] == "A"), 1] <- "C"
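The two indexing styles on this slide differ only in what the index holds: a logical index has one TRUE/FALSE per element, while which() returns the positions of the TRUE values. A small sketch on an invented vector `field`:

```r
# Hypothetical vector standing in for the Field column
field <- c("A", "A", "B", "A", "B")

# Logical index: one TRUE/FALSE per element
is_a <- field == "A"
is_a

# which(): positions of the TRUE values
pos_a <- which(is_a)
pos_a

# Either form can be used for replacement
field[is_a] <- "C"
field
```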

Data Manipulation
• Replacing
  > class(lamb)
  [1] "matrix"
  > head(lamb)
    Field   Weight sire dam sex
  1     A 22.92368    1   1   F
  2     A 27.52896    1   1   F
  3     B 25.52592    1   1   M
  > index <- lamb[,2] <= 22.000
  > table(index)
  index
  FALSE  TRUE
   1553    47
  > lamb[index, 2] <- NA   # NA without quotes: "NA" would be stored as text, not as a missing value
  > which(lamb[,2] >= 20.0 & lamb[,2] <= 21.0)
  214 363 496 842 921 983 1103 1126
  > which(lamb[,1] == "A" & lamb[,2] >= 20.0 & lamb[,2] <= 21.0)
  214 363 496
  > new_lamb <- lamb[which(lamb[,1] == "A" & lamb[,2] >= 20.0 & lamb[,2] <= 21.0), ]
  > new_lamb
      Field Weight sire dam sex
  214     A   2046   27   2   F
  363     A   2008   46   1   M
  496     A   2041   62   2   M
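A sketch of replacing values with NA and of combining conditions with `&`, on an invented four-row data frame. It also shows why the unquoted `NA` matters: assigning the string "NA" into a numeric vector converts the whole vector to text.

```r
# Hypothetical miniature version of the lamb data
lamb <- data.frame(Field  = c("A", "A", "B", "A"),
                   Weight = c(22.9, 20.5, 20.8, 19.0))

# Replace weights of 20 or less with a true missing value
lamb$Weight[lamb$Weight <= 20] <- NA
is.numeric(lamb$Weight)        # the column stays numeric

# Contrast: the string "NA" turns a numeric vector into character
w <- c(22.9, 19.0)
w[2] <- "NA"
class(w)

# Combine conditions with &; which() drops rows where the test is NA
keep <- which(lamb$Field == "A" & lamb$Weight >= 20 & lamb$Weight <= 21)
new_lamb <- lamb[keep, ]
new_lamb
```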

Graphics with R: Overview
• Why graphics?
• Why graphics in R?
• The R graphics systems (did you really expect just one?)
• Graphics basics and examples
• Customisation of a graphic
• Overview of different systems and packages
Introduction to R: Joseph Powell

plot(x, y, …)
  > ?Formaldehyde
  > head(Formaldehyde)
    carb optden
  1  0.1  0.086
  2  0.3  0.269
  3  0.5  0.446
  4  0.6  0.538
  5  0.7  0.626
  6  0.9  0.782
  > plot(Formaldehyde)
  > ?par
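A sketch of customising the Formaldehyde plot with arguments to plot() and a setting from par(). The specific choices (labels, point symbol, colour, margins) and the temporary PDF output are assumptions for illustration; in an interactive session you can drop the `pdf()` and `dev.off()` lines to draw on screen.

```r
# Draw to a temporary PDF so the example also runs non-interactively
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path)

# par() changes graphical settings and returns the previous values
old_par <- par(mar = c(4, 4, 2, 1))   # margins: bottom, left, top, right

plot(Formaldehyde$carb, Formaldehyde$optden,
     xlab = "Carbohydrate (ml)",
     ylab = "Optical density",
     main = "Formaldehyde calibration data",
     pch  = 19,        # filled circles
     col  = "blue")

par(old_par)            # restore the previous settings
invisible(dev.off())    # close the PDF device

file.exists(plot_path)
```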
