R Basics Xudong Zou, Prof. Yundong Wu, Dr. Zhiqiang Ye 18th Dec. 2013
R Basics • History of R language • How to use R • Data type and Data Structure • Data input • R programming • Summary • Case study
History of R language • R was created by Robert Gentleman and Ross Ihaka
2013-09-25: R version 3.0.2
What is R? • R is a programming language and also an environment for statistical analysis and graphics. Why use R? • R is open source and free. It currently has 5088 packages, which make R a powerful tool for financial analysis, bioinformatics, social network analysis, natural language processing and so on. • More and more people in science are learning and using R. • # Bioconductor: bioinformatics analysis (e.g. microarray) • # survival: survival analysis
How to use R • Console: type commands here
How to use R • Create or open an R script • Click here to add R packages • Use ? to get help
Data type and Data structure Data types in R: • numeric (integer, double) • character • complex • logical Data structures in R: • vector • array/matrix • list • data frame
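The data types listed above can be inspected with class(), typeof() and is.numeric(). A minimal sketch follows; the variable names and values are illustrative and not taken from the slides.

```r
# One value of each basic type (names and values are illustrative)
x_int <- 2L          # integer: the L suffix requests integer storage
x_dbl <- 2.5         # double
x_chr <- "warfarin"  # character
x_cpl <- 1 + 2i      # complex
x_lgl <- TRUE        # logical

# Collect the reported type of each value in a named character vector
types <- c(
  integer   = typeof(x_int),
  double    = typeof(x_dbl),
  character = typeof(x_chr),
  complex   = typeof(x_cpl),
  logical   = typeof(x_lgl)
)
print(types)

# Both integer and double count as numeric
both_numeric <- is.numeric(x_int) && is.numeric(x_dbl)
print(both_numeric)
```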
Vector and vector operation • A vector is the simplest data structure in R: a single entity containing an ordered collection of numbers, characters, complex values or logicals. • Note the left-pointing arrow (<-), which is the assignment operator. # Create two vectors: # Check the attributes: # basic operation on vector:
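The three steps named on this slide (create two vectors, check their attributes, basic operations) could look like the sketch below. The values in vec1 and vec2 are assumptions chosen for illustration; the slide's own values are not available.

```r
# Create two vectors with c(); the values here are hypothetical
vec1 <- c(2, 4, 6, 8, 10, 12)
vec2 <- c("geneA", "geneB", "geneC")

# Check the attributes
class(vec1)    # storage class of the numeric vector
class(vec2)    # storage class of the character vector
length(vec1)   # number of elements

# Basic operations are applied element by element
doubled <- vec1 * 2     # multiply every element by 2
shifted <- vec1 + 1:6   # add two vectors of the same length
print(doubled)
print(shifted)
```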
Vector and vector operation
# basic operation on vector:
> max(vec1)
> min(vec1)
> mean(vec1)
> median(vec1)
> sum(vec1)
> summary(vec1)
> vec1
> vec1[1]
> x <- vec1[-1]; x[1]
> vec1[7] <- 15; vec1
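Two of the indexing forms above are easy to misread: a negative index drops an element rather than counting from the end, and assigning beyond the current length pads the gap with NA. A small sketch, assuming a five-element vec1 with made-up values:

```r
vec1 <- c(5, 3, 9, 1, 7)    # hypothetical values

first <- vec1[1]            # indexing starts at 1, so this is 5
x <- vec1[-1]               # negative index removes element 1: 3 9 1 7

vec1[7] <- 15               # position 6 did not exist, so it becomes NA
print(vec1)

# Summary functions propagate NA unless told to skip it
m_with_na <- mean(vec1)
m_without_na <- mean(vec1, na.rm = TRUE)
print(m_with_na)
print(m_without_na)
```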
array and matrix
An array can be considered as a multiply subscripted collection of data entries.
> x <- 1:24
> dim(x) <- c(4, 6)     # create a 2D array with 4 rows and 6 columns
> dim(x) <- c(2, 3, 4)  # create a 3D array
array and matrix
array()
> x <- 1:24
> array(data=x, dim=c(4,6))
> array(x, dim=c(2,3,4))
array indexing
> x <- 1:24
> y <- array(data=x, dim=c(2,3,4))
> y[1,1,1]
> y[,,2]
> y[,,1:2]
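R fills arrays in column-major order (the first subscript varies fastest), which determines what each index above returns. The following sketch checks a few positions; the object names m and y are only for illustration.

```r
x <- 1:24

# 4 x 6 array: column 1 holds 1..4, column 2 holds 5..8, and so on
m <- array(x, dim = c(4, 6))
m23 <- m[2, 3]              # 2 + (3 - 1) * 4 = 10

# 2 x 3 x 4 array: each slice y[, , k] is a 2 x 3 matrix
y <- array(x, dim = c(2, 3, 4))
y111 <- y[1, 1, 1]          # first element
y122 <- y[1, 2, 2]          # 1 + (2 - 1) * 2 + (2 - 1) * 6 = 9
y234 <- y[2, 3, 4]          # last element
slice_dim <- dim(y[, , 2])  # one slice keeps the first two dimensions
pair_dim <- dim(y[, , 1:2]) # two slices keep all three dimensions

print(c(m23, y111, y122, y234))
print(slice_dim)
print(pair_dim)
```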
array and matrix
A matrix is a special case of an array with exactly two dimensions.
> class(potentials)     # "matrix"
> dim(potentials)       # 20 20
> rownames(potentials)  # GLY ALA SER …
> colnames(potentials)  # GLY ALA SER …
> min(potentials)       # -4.4
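The potentials matrix itself is not included in the slides, so the sketch below builds a small stand-in with the same shape of interface: a square numeric matrix whose rows and columns are named by residue. The object name toy and all values are invented for illustration.

```r
# Hypothetical 3 x 3 stand-in for the 20 x 20 potentials matrix
residues <- c("GLY", "ALA", "SER")
toy <- matrix(c(-0.5,  0.2,  0.1,
                 0.2, -1.3,  0.4,
                 0.1,  0.4, -0.8),
              nrow = 3, byrow = TRUE,
              dimnames = list(residues, residues))

is.matrix(toy)   # TRUE; class(toy) also reports "array" in R >= 4.0.0
dim(toy)
rownames(toy)

# Row and column names can be used as subscripts
ala_ser <- toy["ALA", "SER"]
lowest <- min(toy)

# Locate the minimum as a (row, column) position
where_min <- which(toy == lowest, arr.ind = TRUE)
print(ala_ser)
print(lowest)
print(where_min)
```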
list
A list is an object containing other objects as its components, which can be a numeric vector, a logical value, a character string, another list, and so on. The components of a list do not need to be of one type; they can be mixed.
> Lst <- list(drugName="warfarin", no.target=3, price=500,
+             symb.target=c("geneA","geneB","geneC"))
> length(Lst)  # 4
> attributes(Lst)
> names(Lst)
> Lst[[1]]
> Lst[["drugName"]]
> Lst$drugName
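A common stumbling block with lists is the difference between single and double brackets. The sketch below reuses the Lst example from the slide; the approved component added at the end is a hypothetical field for illustration.

```r
Lst <- list(drugName = "warfarin", no.target = 3, price = 500,
            symb.target = c("geneA", "geneB", "geneC"))

sub <- Lst[1]     # single bracket: a list of length 1
val <- Lst[[1]]   # double bracket: the component itself
print(class(sub))
print(class(val))

# Components can be indexed further once extracted
second_target <- Lst$symb.target[2]
print(second_target)

# Assigning to a new name appends a component (hypothetical field)
Lst$approved <- TRUE
print(length(Lst))
```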
Data Frame A data frame is a list with some restrictions: ① the components must be vectors, factors, numeric matrices, lists or other data frames ② numeric vectors, logicals and factors are included as is; in R versions before 4.0.0, character vectors are coerced to factors by default, with levels equal to the unique values appearing in the vector (since R 4.0.0 the default is stringsAsFactors = FALSE) ③ vector structures appearing as variables of the data frame must all have the same length, and matrix structures must all have the same number of rows Names of components
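The restrictions above can be seen by building a small data frame by hand. This is a sketch with invented column names and values; stringsAsFactors is set explicitly so the result does not depend on the R version.

```r
# Hypothetical table: three columns of equal length and different types
drugs <- data.frame(
  name  = c("drugA", "drugB", "drugC"),
  price = c(500, 120, 80),
  oral  = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE   # keep name as character in every R version
)
str(drugs)

# Asking for factors explicitly shows the coercion described in point 2
drugs_f <- data.frame(name = c("drugA", "drugB", "drugA"),
                      stringsAsFactors = TRUE)
print(levels(drugs_f$name))

# Point 3: columns of incompatible lengths are rejected
bad <- tryCatch(data.frame(a = 1:3, b = 1:2),
                error = function(e) "error")
print(bad)
```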
Data Frame
> names(cars)
[1] "speed" "dist"
> length(cars)         # 2
> cars[[1]]
> cars$speed           # recommended
> attach(cars)         # adds the data frame to the search path, so speed can be used directly
> detach(cars)
> summary(cars$speed)  # do what we can do for a vector
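As an alternative to attach() and detach(), with() evaluates one expression inside the data frame and leaves the search path unchanged. The sketch below uses the built-in cars dataset (columns speed and dist); the cutoff of 20 is an arbitrary choice for illustration.

```r
data(cars)   # built-in dataset with columns speed and dist

print(names(cars))
print(length(cars))   # number of columns, since a data frame is a list
print(nrow(cars))     # number of rows

# Two equivalent ways to reach a column
m1 <- mean(cars$speed)
m2 <- with(cars, mean(speed))

# Logical subsetting keeps the rows where the condition holds
fast <- cars[cars$speed > 20, ]
print(nrow(fast))
```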
Data Input
scan(file, what=double(), sep="", …)
# scan returns a vector whose data type matches the what argument
read.table(file, header=FALSE, sep="", row.names, col.names, …)
# read.table returns a data.frame object
# my_data.frame <- read.table("MULTIPOT_lu.txt", row.names=1, header=TRUE)
From other software
# from SPSS and SAS
library(Hmisc)  # load package
mydata <- spss.get("test.file", use.value.labels=TRUE)
mydata <- sasxport.get("test.file")
# from Stata and Systat
library(foreign)
mydata <- read.dta("test.file")
mydata <- read.systat("test.file")
# from Excel
library(RODBC)
channel <- odbcConnectExcel("D:/myexcel.xls")
mydata <- sqlFetch(channel, "mysheet")
odbcClose(channel)
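To try scan() and read.table() without a data file, both accept a text argument in place of file. The sketch below uses a tiny made-up table whose layout (a header row and row names in the first column) mirrors the read.table call above; the numbers are invented.

```r
# scan() returns a plain vector of the type given by what
v <- scan(text = "1.5 2.5 4", what = double(), quiet = TRUE)
print(v)

# A hypothetical two-residue table in the same layout as a header +
# row-name file; the values are illustrative only
txt <- "residue GLY ALA
GLY -0.5 0.2
ALA 0.2 -1.3"

tab <- read.table(text = txt, header = TRUE, row.names = 1)
print(class(tab))
print(tab)

# A data frame of numbers can be converted to a matrix when needed
mat <- as.matrix(tab)
print(mat["ALA", "GLY"])
```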
R Programming Control Statements # repeat {…} # switch( statement, list)
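The control statements named on this slide, together with if, for and while, can be sketched as follows. The function names classify and centre are hypothetical helpers written for this illustration.

```r
# if / else chooses between branches
classify <- function(x) {
  if (x < 0) "negative" else if (x == 0) "zero" else "positive"
}

# for iterates over the elements of a vector
total <- 0
for (i in 1:5) total <- total + i

# while repeats as long as its condition holds
n <- 0
while (2^n < 100) n <- n + 1   # stops at the first n with 2^n >= 100

# repeat loops until an explicit break
k <- 0
repeat {
  k <- k + 1
  if (k >= 3) break
}

# switch selects one expression by name
centre <- function(x, type) {
  switch(type, mean = mean(x), median = median(x))
}

print(classify(-2))
print(c(total, n, k))
print(centre(c(1, 2, 10), "median"))
```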
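R Programming Function Definition: Example:
matrix.axes <- function(data) {
  # x-axis: one tick per row, evenly spaced on [0, 1], labelled with the row names
  x <- (1:dim(data)[1] - 1) / (dim(data)[1] - 1)
  axis(side=1, at=x, labels=rownames(data), las=2)
  # y-axis: one tick per column, evenly spaced on [0, 1], labelled with the column names
  x <- (1:dim(data)[2] - 1) / (dim(data)[2] - 1)
  axis(side=2, at=x, labels=colnames(data), las=2)
}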
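The tick positions in matrix.axes rely on operator precedence: the colon binds tighter than the minus, so 1:n - 1 means (1:n) - 1. The sketch below isolates that calculation in a hypothetical helper, axis.positions, which is not part of the slides.

```r
# Hypothetical helper: n evenly spaced positions from 0 to 1
axis.positions <- function(n) {
  stopifnot(n > 1)        # n = 1 would divide by zero
  (1:n - 1) / (n - 1)     # (1:n) - 1 gives 0, 1, ..., n - 1
}

pos <- axis.positions(5)
print(pos)

# The same positions written with seq()
same <- seq(0, 1, length.out = 5)
print(isTRUE(all.equal(pos, same)))
```

A function's value is the value of its last evaluated expression, so no explicit return() is needed here.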
Summary • Data type and Data Structure: numeric, character, complex, logical; vector, array/matrix, list, data frame • Data Input: scan, read.table; load from other software: SPSS, SAS, Excel • Operators: <- • R Programming: control statements, function definition
Case study Residue-based protein-protein interaction potential analysis: Lu et al. (2003) Development of Unified Statistical Potentials Describing Protein-Protein Interactions, Biophysical Journal 84(3), 1895-1901
Reference • CRAN-Manual: http://cran.r-project.org/ • Quick-R: http://www.statmethods.net/index.html • R tutorial: http://www.r-tutor.com/ • MOAC: http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/r/matrix_contour/