An Introduction to R

An Introduction to R. Prof. Ke-Sheng Cheng, Dept. of Bioenvironmental Systems Eng., National Taiwan University. The R-project. R is free software (www.r-project.org). The S language. S-Plus (commercial software).

Presentation Transcript


  1. An Introduction to R Prof. Ke-Sheng Cheng Dept. of Bioenvironmental Systems Eng. National Taiwan University

  2. The R-project • R is free software (www.r-project.org). • The S language. • S-Plus (commercial software). • R is an integrated software environment for data manipulation, calculation and graphical display. • An efficient data handling and storage facility, • A suite of operators for calculations on arrays, in particular matrices, • A large, coherent, integrated collection of intermediate tools for data analysis, Laboratory for Remote Sensing Hydrology and Spatial Modeling, Dept of Bioenvironmental Systems Engineering, National Taiwan Univ.

  3. • Graphical facilities for data analysis and display either directly at the computer or on hardcopy, • A well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities. • R packages (CRAN) • Standard packages • Other packages available at the Comprehensive R Archive Network (CRAN)

  4. Downloading and Installing R • http://www.r-project.org/

  6. Starting an R session

  8. Working Environment of R • The working environment of R can be illustrated by the following graph: [Figure: Directory 1, Directory 2, Working Directory, Workspace, Temporary memory]

  9. Running R • When you first start running R, the default prompt is the “>” sign.

  10. Working directory • In using R, you need to know and specify the working directory. This is done by clicking the Change dir button. • One can specify different working directories for different projects.
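
The working directory can also be queried and changed from the prompt rather than through the Change dir button. The sketch below uses the base functions getwd() and setwd(); the temporary directory is only a stand-in for a real project folder.

```r
# Remember the current working directory so it can be restored later.
old.dir <- getwd()

# Switch to a scratch directory (tempdir() stands in for a project folder).
setwd(tempdir())
new.dir <- getwd()
print(new.dir)

# List the files found there, then switch back.
print(list.files())
setwd(old.dir)
```

Keeping one directory per project means each project gets its own data files and saved workspace.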

  11. Getting help • > help(…) and > help.search("…")

  13. Executing commands from an external file • R commands can be stored in an external file (for example, ksc.r) in the working directory. These commands can then be executed with the source command: > source("ksc.r") or by opening and running the file, as on the next slide.
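
As a minimal sketch of source(), the code below writes a two-line script to a temporary file and then runs it. The file name ksc_demo.r and its contents are made up for illustration.

```r
# Create a small script file (hypothetical name and contents).
script.file <- file.path(tempdir(), "ksc_demo.r")
writeLines(c("a <- 1:5", "b <- sum(a)"), script.file)

# Execute every command in the file, as if typed at the prompt.
source(script.file)
print(b)   # 15
```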

  14. Open and run an existing file

  15. Objects and Workspace • The entities that R creates and manipulates are known as objects. These may be variables, arrays of numbers, character strings, functions, or more general structures built from such components. • During an R session, objects are created and stored by name. The R command > objects() (alternatively, ls()) can be used to display the names of (most of) the objects which are currently stored within R. • The collection of objects currently stored is called the workspace.
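
A short illustration of objects being created by name and then listed; the object names are arbitrary.

```r
# Create three objects of different kinds.
x <- c(1.5, 2.5)               # numeric vector
greeting <- "hello"            # character string
square <- function(v) v^2      # function

# ls() (or objects()) returns the names of the stored objects.
print(ls())
```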

  16. Data permanency and removing objects • To remove objects the function rm is available: > rm(x, y, z, ink, junk, temp, foo, bar) • All objects created during an R session can be stored permanently in a file for use in future R sessions. At the end of each R session you are given the opportunity to save all the currently available objects. If you indicate that you want to do this, the objects are written to a file called ‘.RData’ in the current directory, and the command lines used in the session are saved to a file called ‘.Rhistory’.

  17. When R is started at a later time from the same directory, it reloads the workspace from this file (.RData). At the same time the associated command history is reloaded. • Remove all objects in the workspace • rm(list=ls()) • Clear the screen • Ctrl+L
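
Saving and reloading can also be done explicitly during a session. The sketch below uses save() and load() on a temporary file with a made-up name; save.image() would write the whole workspace to ‘.RData’ instead.

```r
# Hypothetical file name; a real session would normally use ".RData".
ws.file <- file.path(tempdir(), "demo.RData")

x <- 1:3
y <- "kept"
save(x, y, file = ws.file)     # store the two objects on disk

rm(x, y)                       # remove them from the workspace
exists.after.rm <- exists("x", inherits = FALSE)
print(exists.after.rm)         # FALSE

load(ws.file)                  # restore both objects by name
print(x)
# rm(list = ls()) would remove every object in the workspace at once.
```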

  18. Reading data from files • The read.table() function

  19. The scan() function • The read.csv() function
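
The three reading functions can be compared on a small made-up data set. The file names and the columns time and flow below are assumptions for illustration only.

```r
# Write two tiny example files (hypothetical names and values).
txt.file <- file.path(tempdir(), "flow_demo.txt")
csv.file <- file.path(tempdir(), "flow_demo.csv")
writeLines(c("time flow", "1 10.5", "2 12.0", "3 9.8"), txt.file)
writeLines(c("time,flow", "1,10.5", "2,12.0", "3,9.8"), csv.file)

# read.table(): whitespace-separated columns, returns a data frame.
d1 <- read.table(txt.file, header = TRUE)

# read.csv(): comma-separated columns, header assumed by default.
d2 <- read.csv(csv.file)

# scan(): reads the values row by row into one numeric vector.
v <- scan(txt.file, skip = 1, quiet = TRUE)

print(d1)
print(v)
```

read.table() and read.csv() keep the column structure, while scan() flattens everything into a single vector.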

  20. Output data to files • write, write.table, write.csv • write(x,"output.txt",ncolumns=10,append=TRUE,sep="\t") • write(round(x,digits=2),"output.txt",ncolumns=10,append=TRUE,sep="\t")
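
A small sketch of write() and write.csv(), using made-up values and temporary file names, shows what ends up in the files.

```r
x <- c(3.14159, 2.71828, 1.41421, 1.73205)

# write(): plain values, ncolumns per line, tab separated.
out.file <- file.path(tempdir(), "output_demo.txt")
if (file.exists(out.file)) file.remove(out.file)   # start from an empty file
write(round(x, digits = 2), out.file, ncolumns = 2, append = TRUE, sep = "\t")
lines.written <- readLines(out.file)
print(lines.written)   # "3.14\t2.72" "1.41\t1.73"

# write.csv(): a data frame with a header row.
csv.out <- file.path(tempdir(), "output_demo.csv")
write.csv(data.frame(id = 1:2, value = c(3.14, 2.72)), csv.out, row.names = FALSE)
back <- read.csv(csv.out)
print(back)
```

With append=TRUE, repeated runs keep adding lines to the same file, which is why the sketch deletes any old copy first.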

  21. Objects, their modes and attributes • Intrinsic attributes: mode and length • The entities R operates on are technically known as objects. Examples are vectors of numeric (real) or complex values, vectors of logical values and vectors of character strings. • These vectors are known as “atomic” structures since their components are all of the same type, or mode, namely numeric, complex, logical, character and raw. By the mode of an object we mean the basic type of its fundamental constituents. This is a special case of a “property” of an object. Another property of every object is its length.

  22. Atomic structures of R • Vectors must have their values all of the same mode. Thus any given vector must be unambiguously either logical, numeric, complex, character or raw. (The only apparent exception to this rule is the special “value” listed as NA for quantities not available, but in fact there are several types of NA). • Note that a vector can be empty and still have a mode. For example the empty character string vector is listed as character(0) and the empty numeric vector as numeric(0).
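
The points about modes, empty vectors and the several types of NA can be checked directly with mode() and length(). The variable names below are arbitrary.

```r
v.num <- c(1.5, 2, 3)
v.chr <- c("a", "b")
v.lgl <- c(TRUE, NA, FALSE)
v.cpx <- c(1+2i, 3-1i)
modes <- c(mode(v.num), mode(v.chr), mode(v.lgl), mode(v.cpx))
print(modes)   # "numeric" "character" "logical" "complex"

# Mixing modes in c() coerces everything to one common mode.
mixed <- c(1, "a", TRUE)
print(mixed)   # "1" "a" "TRUE"

# Empty vectors still have a mode, and length 0.
empty.mode <- mode(character(0))
empty.len <- length(numeric(0))

# NA comes in several types, one per mode.
na.modes <- c(mode(NA), mode(NA_character_), mode(NA_real_))
print(na.modes)   # "logical" "character" "numeric"
```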

  23. Recursive structures of R • R also operates on objects called lists, which are of mode list. These are ordered sequences of objects which individually can be of any mode. • Lists are known as “recursive” rather than atomic structures since their components can themselves be lists in their own right.

  24. The other recursive structures are those of mode function and expression. • Functions are the objects that form part of the R system along with similar user written functions. • Expressions are objects which form an advanced part of R.
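
A brief sketch of the three recursive modes; the list contents and names are invented for illustration.

```r
# A list can hold components of different modes, including another list.
event <- list(name = "event1", flow = c(10.5, 12.0), meta = list(gauge = "A"))
print(mode(event))         # "list"
print(event$meta$gauge)    # "A"

# A user-written function is an object of mode "function".
add.one <- function(x) x + 1
print(mode(add.one))       # "function"

# An expression object holds unevaluated R code.
ex <- expression(a + b)
print(mode(ex))            # "expression"
ex.value <- eval(ex[[1]], list(a = 1, b = 2))
print(ex.value)            # 3
```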

  25. An example of using a function

      # AR2_Bootstrap.R
      # Coded by KSC 08232011 at the University of Bristol -------------
      # AR modeling of the flow data series
      x=read.csv("Nine_flow_events.csv",sep=",")
      n.event=9
      n.bt=1000 # number of bootstrap samples
      alpha1=c();alpha2=c();alpha3=c();alpha0=c()
      predct=c()
      par.ar=matrix(rep(0,n.event*4),ncol=4,nrow=n.event)
      file.name=paste("event",1:n.event,".txt",sep="")
      bt.name=paste("bootstrap",1:n.event,".txt",sep="")
      #------------------------------------------------------------------
      # Function -- AR(2) Forecasting
      forecast=function(obs,par1,par2,par3,predct) {
        L=length(obs)
        u1=0;u2=0
        obs=c(u1,u2,obs)
        for (i in 1:L) predct[i]=par3+par1*obs[i+1]+par2*obs[i]
        err=obs[3:(L+2)]-predct
        out=c(predct,err)
        return(out)
      }
      #------------------------------------------------------------------

  26. (AR2_Bootstrap.R, continued)

      # AR(2) Modeling, forecasting and bootstrapping of individual series
      for (i in 1:n.event) {
        event=x[[i]][!is.na(x[i])]
        # AR(2) Modeling ------------
        windows()
        pacf(event)
        ar.event=arima(event,order=c(2,0,0))
        alpha1[i]=ar.event[[1]][1]
        alpha2[i]=ar.event[[1]][2]
        alpha3[i]=ar.event[[1]][3]
        alpha0[i]=(1-alpha1[i]-alpha2[i])*alpha3[i]
        par.ar[i,]=c(alpha0[i],alpha1[i],alpha2[i],alpha3[i])
        #
        # AR(2) Forecasting ---------
        out.4cast=forecast(event,alpha1[i],alpha2[i],alpha0[i],predct)
        err=out.4cast[(length(event)+1):(2*length(event))]
        err.star=err-mean(err)
        write(event,file.name[i],ncolumns=10,append=TRUE,sep="\t")
        write(out.4cast[1:length(event)],file.name[i],ncolumns=10,append=TRUE,sep="\t")
        write(err,file.name[i],ncolumns=10,append=TRUE,sep="\t")
        #

  27. (AR2_Bootstrap.R, continued)

        # Model-based Time Series Bootstrapping --------------
        btsample=matrix(rep(0,n.bt*(2+length(err))),nrow=n.bt,ncol=2+length(err))
        for (j in 1:n.bt) {
          epsilon=sample(err.star,size=length(err),replace=TRUE)
          for (k in 3:(2+length(err))) {
            btsample[j,k]=alpha0[i]+alpha1[i]*btsample[j,k-1]+alpha2[i]*btsample[j,k-2]+epsilon[k-2]
          }
          write(btsample[j,3:(2+length(err))],bt.name[i],ncolumns=10,append=TRUE,sep="\t")
        }
        #
        # Plot observed and bootstrap sample series
        windows()
        z=scan(bt.name[i],sep="\t")
        plot(0,0,type="n",xlim=c(0,length(event)),ylim=c(min(z),max(z)))
        dim(z)=c(length(event),n.bt)
        for (j in 1:n.bt) lines(1:length(event),z[,j],type="l")
        lines(1:length(event),event,type="l",col="red",lwd=3)
      }
      par.ar

  28. An example using function ecdf

      # ECDF_Plot.R
      # Coded by KSC 09242011 -----------------
      n.sample=9 # Number of samples
      in.file=paste("CECP",1:n.sample,".txt",sep="")
      windows()
      plot(0,0,type="n",xlim=c(-1,1),ylim=c(0,1))
      for (i in 1:n.sample) {
        x=scan(in.file[i],sep="\t")
        n.L=length(x)
        x1=x[1:(n.L/2)]
        x2=x[(1+(n.L/2)):n.L]
        x1.ecdf=ecdf(x1);x2.ecdf=ecdf(x2)
        u=seq(-1,1,by=0.005);v=x1.ecdf(u)
        lines(u,v,type="l",col=i,lwd=3)
        v1=round(mean(x1),digits=4)
        v2=round(sqrt(var(x1)),digits=4)
        v3=round(mean(x2),digits=4)
        v4=round(sqrt(var(x2)),digits=4)
        print(c(v1,v2,v3,v4))
      }

  29. The functions mode(object) and length(object) can be used to find out the mode and length of any defined structure.

  30. Changing the mode of an object • as.character(x) • as.integer(x)
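
A short sketch of mode conversion with as.character() and as.integer(), including two cases that often surprise: fractions are truncated, and text that is not a number becomes NA.

```r
z <- 0:3
digits <- as.character(z)     # "0" "1" "2" "3"
print(mode(digits))           # "character"

d <- as.integer(digits)       # back to the integers 0 1 2 3
print(d)

truncated <- as.integer(3.9)  # 3: the fractional part is dropped
from.logical <- as.integer(TRUE)                   # 1
not.a.number <- suppressWarnings(as.integer("abc"))  # NA (a warning is issued)
```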

  31. Changing the length of an object • An “empty” object may still have a mode. For example, e <- numeric() makes e an empty vector structure of mode numeric. • Once an object of any size has been created, new components may be added to it simply by giving it an index value outside its previous range.
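
The growth of a vector by indexing outside its range, and the reverse operation of shortening it, can be sketched as follows; the values are arbitrary.

```r
e <- numeric()       # empty vector of mode numeric
e[3] <- 17           # extends e to length 3; the gaps are filled with NA
print(e)             # NA NA 17

alpha <- 1:10
alpha <- alpha[2 * 1:5]   # keep only the even-indexed elements: 2 4 6 8 10
length(alpha) <- 3        # truncate to the first three: 2 4 6
print(alpha)
```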

  32. Other examples

  33. The class of an object • All objects in R have a class, reported by the function class. For simple vectors this is just the mode, for example "numeric", "logical", "character" or "list", but "matrix", "array", "factor" and "data.frame" are other possible values.
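
The difference between class and mode shows up most clearly for factors and data frames. The sketch below uses inherits() for the matrix case because the exact value returned by class() for a matrix differs between R versions.

```r
simple.classes <- c(class(1.5), class(TRUE), class("a"), class(list(1)))
print(simple.classes)   # "numeric" "logical" "character" "list"

m <- matrix(1:4, nrow = 2)
f <- factor(c("low", "high", "low"))
df <- data.frame(x = 1:2, y = c("a", "b"))

print(inherits(m, "matrix"))   # TRUE
print(class(f))                # "factor"
print(class(df))               # "data.frame"

# The underlying modes are different from the classes.
print(mode(f))                 # "numeric"
print(mode(df))                # "list"
```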

  34. What is an object? • Any entity R operates on is an object: vector, matrix, array, data frame, list, function, expression. • Mode or class of objects: numeric, complex, character, logical, factor (a class, stored with mode numeric), data.frame (a class, stored with mode list).

  35. Manipulating objects • Vector assignment concatenate

  36. If an expression is used as a complete command, the value is printed and lost.
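
A small sketch of vector assignment with the concatenate function c(), and of a value that is printed but not kept. The numbers are arbitrary.

```r
# Assign a vector built with c() to the name x.
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

# Used as a complete command, the result is printed and then lost:
# x itself is unchanged and nothing new is stored.
1 / x

# c() also accepts vectors, so existing vectors can be joined.
y <- c(x, 0, x)
print(length(y))   # 11
```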

  37. Vector arithmetic • Vectors can be used in arithmetic expressions, in which case the operations are performed element by element. • Vectors occurring in the same expression need not all be of the same length. If they are not, the value of the expression is a vector with the same length as the longest vector which occurs in the expression. Shorter vectors in the expression are recycled as often as need be (perhaps fractionally) until they match the length of the longest vector. In particular a constant is simply repeated.
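
The recycling rule can be seen with two vectors of different lengths; the values are chosen only to make the pattern visible.

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(10, 20)

# y is recycled three times to match the length of x.
s <- x + y
print(s)        # 11 22 13 24 15 26

# Constants are simply repeated for every element.
lin <- 2 * x + 1
print(lin)      # 3 5 7 9 11 13

# Fractional recycling still works, but R issues a warning.
frac <- suppressWarnings(c(1, 2, 3) + c(10, 20))
print(frac)     # 11 22 13
```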

  38. Example • Arithmetic operations +, -, * , / , ^ (power), round, floor, ceiling • Arithmetic functions • log, exp, sin, cos, tan, sqrt, abs

  39. Statistical functions • min, max, range, length, sum, mean, median • quantile, var, prod, summary • sort, order, rank

  40. • Sort y with respect to increasing order of x. • Same as sort(x).
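
The statistical functions and the relation between sort(), order() and rank() can be tried on a three-element example. The expressions y[order(x)] and x[order(x)] are one common way to write the two captions above, not necessarily the code shown on the original slide.

```r
x <- c(3, 1, 2)
y <- c("c", "a", "b")

print(range(x))    # 1 3
print(mean(x))     # 2
print(var(x))      # 1
print(prod(x))     # 6

ord <- order(x)            # positions of x in increasing order: 2 3 1
y.sorted <- y[ord]         # y rearranged by increasing x: "a" "b" "c"
x.sorted <- x[ord]         # same result as sort(x): 1 2 3
rk <- rank(x)              # rank of each element: 3 1 2
```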

  41. Logical vectors • As well as numerical vectors, R allows manipulation of logical quantities. The elements of a logical vector can have the values TRUE, FALSE, and NA (for “not available”). • Logical vectors are generated by conditions. • The logical operators are <, <=, >, >=, == for exact equality and != for inequality. In addition if c1 and c2 are logical expressions, then c1 & c2 is their intersection (“and”), c1 | c2 is their union (“or”), and !c1 is the negation of c1.
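
A few conditions on a small vector illustrate these operators. The last lines add a general caution about == with floating-point numbers; that note is an addition here and is not claimed to be the example on the following slide.

```r
x <- c(-2, 0, 3, 7)

pos <- x > 0                   # FALSE FALSE TRUE TRUE
in.range <- x >= 0 & x < 5     # FALSE TRUE TRUE FALSE
outside <- x < 0 | x > 5       # TRUE FALSE FALSE TRUE
not.pos <- !pos                # TRUE TRUE FALSE FALSE

# TRUE counts as 1 in arithmetic, so sum() counts the matches.
n.pos <- sum(pos)              # 2

# Exact equality can fail for floating-point results.
exact <- (0.1 + 0.2 == 0.3)                  # FALSE
close <- isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE
```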

  42. Example Why?

  43. Missing values • NA – not available • NaN – not a number • The function is.na(x) gives a logical vector of the same size as x with value TRUE if and only if the corresponding element in x is NA or NaN. • The function is.nan(x) returns TRUE if and only if the corresponding element is NaN.

  44. Removing the missing values
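
One common way to detect and remove missing values is logical indexing with is.na(). The vector below is made up for illustration.

```r
z <- c(1, NA, 0/0, 4)      # 0/0 produces NaN

na.flag <- is.na(z)        # FALSE TRUE TRUE FALSE (NA and NaN both count)
nan.flag <- is.nan(z)      # FALSE FALSE TRUE FALSE (only NaN)

z.clean <- z[!is.na(z)]    # keep only the available values: 1 4
print(z.clean)

# Many functions can also skip missing values themselves.
m <- mean(z, na.rm = TRUE) # 2.5
```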

  45. Character vectors • Character vectors are used frequently in R, for example as plot labels. Where needed they are denoted by a sequence of characters delimited by the double quote character, e.g., "x-values", "New iteration results". • The paste() function takes an arbitrary number of arguments and concatenates them one by one into character strings. Any numbers given among the arguments are coerced into character strings in the evident way, that is, in the same way they would be if they were printed.

  46. The arguments are by default separated in the result by a single blank character, but this can be changed by the named parameter, sep=string, which changes it to string, possibly empty.
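
The effect of sep, and the recycling of shorter arguments, can be seen in a few calls to paste(). The second call mirrors the way file names are built in the scripts earlier in this transcript.

```r
p1 <- paste("event", 1:3)                        # default separator: one blank
print(p1)   # "event 1" "event 2" "event 3"

p2 <- paste("event", 1:3, ".txt", sep = "")      # empty separator
print(p2)   # "event1.txt" "event2.txt" "event3.txt"

p3 <- paste(c("X", "Y"), 1:4, sep = "")          # shorter argument is recycled
print(p3)   # "X1" "Y2" "X3" "Y4"
```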

  47. Set operations • union(x, y) • intersect(x, y) • setdiff(x, y) • is.element(el, set)
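
The four set functions applied to two small example vectors:

```r
x <- c(1, 2, 3, 4)
y <- c(3, 4, 5)

u <- union(x, y)          # 1 2 3 4 5
i <- intersect(x, y)      # 3 4
d <- setdiff(x, y)        # 1 2 (in x but not in y)
has3 <- is.element(3, y)  # TRUE
has1 <- is.element(1, y)  # FALSE
```

Note that setdiff() is not symmetric: setdiff(y, x) would give 5.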

  49. Selecting and modifying subsets of an object using index vectors • Subsets of a vector may be selected by appending to the name of the vector an index vector in square brackets, v[i]. • Such index vectors can be any of four distinct types: • A logical vector

  50. • A vector of positive integer quantities • A vector of negative integer quantities. Such an index vector specifies the values to be excluded rather than included.
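
The index vector types can be sketched on one small vector. Logical, positive and negative indices are the three types listed above; indexing by character names is shown as the remaining type on the assumption that it follows the standard R documentation. The names and values are made up.

```r
x <- c(10, NA, 30, 40, 50)

by.logical <- x[!is.na(x)]     # keep elements where the condition is TRUE
by.positive <- x[c(1, 3)]      # elements 1 and 3: 10 30
by.negative <- x[-(1:2)]       # everything except elements 1 and 2: 30 40 50

# Assumed fourth type: a character vector of names (for named vectors).
fruit <- c(orange = 5, banana = 10, apple = 1)
by.name <- fruit[c("banana", "apple")]

# Index vectors on the left-hand side modify the selected subset.
x[is.na(x)] <- 0
print(x)                       # 10 0 30 40 50
```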
