Introduction to Contributed Packages in R
Department of Statistical Sciences and Operations Research
Computation Seminar Series
Speaker: Edward Boone
Email: elboone@vcu.edu
What is R?
• R is a free, open-source statistical programming language based on the S language developed at Bell Labs.
• The language is very powerful for writing programs.
• Many statistical functions are already built in.
• Contributed packages extend its functionality to cutting-edge research methods.
• Because it is a programming language, you must write code to complete tasks.
Getting Started
• Where to get R?
  • Go to www.r-project.org
  • Downloads: CRAN
  • Set your mirror: any mirror in the USA is fine.
  • Select Windows 95 or later.
  • Select base.
  • Select R-2.4.1-win32.exe
  • The other downloads are for developers who wish to change the source code.
Getting Started
• The R GUI?
Getting Started
• Opening a script.
• This gives you a script window.
Getting Started
• Submitting a program:
  • Use the Submit Selection button, or
  • Right-click and choose Run selection.
Getting Started
• Basic assignment and operations.
• Arithmetic operations:
  • +, -, *, /, ^ are the standard arithmetic operators.
• Matrix arithmetic:
  • * is element-wise multiplication.
  • %*% is matrix multiplication.
• Assignment:
  • To assign a value to a variable use "<-".
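A short sketch of the operators listed above, using made-up values. The names x, y, A, and B are illustrative and do not come from the talk.

```r
# Illustrative values only; x, y, A and B are hypothetical names.
x <- 3            # assignment with <-
y <- x^2 + 1      # arithmetic: 3^2 + 1 = 10

A <- matrix(c(1, 2, 3, 4), nrow = 2)   # filled column by column
B <- diag(2)                           # 2 x 2 identity matrix

elementwise <- A * B     # multiplies matching entries
matprod     <- A %*% B   # true matrix product; equals A here

print(y)
print(elementwise)
print(matprod)
```

Multiplying by the identity makes the difference easy to see: the element-wise product zeroes the off-diagonal entries, while the matrix product returns A unchanged.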
Getting Started
• How to use help in R?
• R has a very good built-in help system.
• If you know which function you want help with, type ? followed by the function name.
  • Ex: ?hist
• If you don't know which function to use, call help.search("_______") with a keyword in the blank.
  • Ex: help.search("histogram")
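When only part of a function name comes to mind, base R's apropos() is a handy companion to the two commands on this slide. A small sketch (the search pattern "hist" is just an example):

```r
# apropos() lists objects whose names match a pattern,
# which helps when you only remember part of a function name.
hits <- apropos("hist")
print(hits)

# The two calls from the slide open help pages, so they are shown
# here as comments rather than run:
#   ?hist
#   help.search("histogram")
```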
Importing Data
• How do we get data into R?
• Remember, we have no point and click…
• First make sure your data is in an easy-to-read format such as CSV (Comma Separated Values).
• Use code:
    D <- read.table("path", sep=",", header=TRUE)
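To try the import step without a data file at hand, the sketch below first writes a tiny CSV to a temporary file and then reads it back. The column names and values are invented for illustration.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
# The column names and values are made up for illustration.
path <- tempfile(fileext = ".csv")
writeLines(c("Gender,Age",
             "M,34",
             "F,29",
             "M,41"), path)

D <- read.table(path, sep = ",", header = TRUE)

str(D)      # structure: column names and types
head(D)     # first rows of the data
```

read.csv(path) is a shortcut that uses sep="," and header=TRUE by default.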
Working with data.
• Accessing columns.
• D has our data in it… but you can't see it directly.
• To select a column use D$column.
Working with data.
• Subsetting data.
• Use a comparison operator to build a logical condition.
• ==, !=, >, <, <=, >= are the comparison operators; each returns a logical (TRUE/FALSE) vector.
• Note that the "equals" operator is two = signs, and "not equal" is != (R has no <> operator).
• Example:
    D[D$Gender == "M",]
• This returns the rows of D where Gender is "M".
• Remember, R is case sensitive!
• This code does nothing to the original dataset.
• D.M <- D[D$Gender == "M",] stores the selected rows in a new dataset.
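The column access and subsetting rules above can be tried on a small stand-in data frame. The Score column and all values are hypothetical.

```r
# Hypothetical data frame standing in for the imported data D.
D <- data.frame(Gender = c("M", "F", "M", "F"),
                Score  = c(10, 12, 9, 15),
                stringsAsFactors = FALSE)

D$Score                          # one column, as a vector

D.M    <- D[D$Gender == "M", ]   # rows where Gender is "M"
D.notM <- D[D$Gender != "M", ]   # != means "not equal"
D.low  <- D[D$Gender == "m", ]   # zero rows: "m" is not "M"

print(D.M)
print(nrow(D.low))
print(nrow(D))                   # the original still has all its rows
```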
Source Files
• Source files allow you to store all of your own functions in a single file and make them all available at once.
• To load a self-created library use: source(Path)
• Don't forget that each \ in a Windows path needs to be written as \\ (or as /).
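A minimal sketch of the source() workflow. The temporary file, the function name square_it, and the Windows path in the comments are all hypothetical.

```r
# Create a throwaway source file holding one function.
# The file and the function name square_it are hypothetical.
src <- tempfile(fileext = ".R")
writeLines("square_it <- function(x) x^2", src)

source(src)          # runs the file, defining square_it
print(square_it(4))

# On Windows, these two spellings point at the same (made-up) file:
#   source("C:\\Users\\me\\myfunctions.R")
#   source("C:/Users/me/myfunctions.R")
```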
Libraries
• To keep R's memory footprint small, additional functionality is stored in libraries.
• These libraries can be loaded through the GUI or from scripts.
• Beware that some contributed packages may conflict with other libraries, for example by defining functions with the same name.
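One way to see what is attached and to sidestep name clashes is sketched below using only base R. The choice of stats::sd is arbitrary and serves only to show the pkg::name form.

```r
# search() lists the attached packages in lookup order.
print(search())

# pkg::name picks the function from one specific package, so the
# call is unambiguous even if another package defines the same name.
s1 <- stats::sd(c(2, 4, 6))
print(s1)

# conflicts() reports names defined in more than one attached place.
print(head(conflicts()))
```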
Contributed Packages
• Since R is open source and the developers are well organized, developing and finding contributed packages is easy.
• Currently there are 964 contributed packages.
• These range from wavelets and financial mathematics to spatial data analysis.
Contributed Packages
• One popular library is lattice.
Contributed Packages
• You can install contributed packages using the GUI.
Contributed Packages
• You can install the package by selecting it from the list.
• Note: Installing a package does not make it immediately available for use.
• You still need to call library() to make its functionality available:
    library(lattice)
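The same install-then-load sequence can be scripted instead of using the GUI. This is a sketch: the installation call is left commented out because it needs network access, and the have_lattice check is just one defensive way to write it.

```r
# Installing is a one-time step and needs network access, so it is
# left commented out here:
#   install.packages("lattice")

# requireNamespace() checks that a package is installed without
# attaching it.
have_lattice <- requireNamespace("lattice", quietly = TRUE)

if (have_lattice) {
  library(lattice)                         # attach for this session
  print("package:lattice" %in% search())   # TRUE once attached
}
```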
Help on contributed packages
• Once a contributed package is loaded you can access the help for the package and a list of the functions it provides with:
    library(help="lattice")
The CircStats Package
• Data often come in a circular format.
• For example, the direction of migration or of the flight of birds from their nest.
• Each observation is an angle, not a "linear" measurement.
• The data can only take values between 0 and 2π.
The CircStats Package
• Use the CircStats package.
    library(CircStats)
• Consider the following:
    data <- runif(50, 0, pi)
    mean.dir <- circ.mean(data)
    mean.dir
    [1] 1.446502
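Why angles need their own mean can be seen in base R alone. The sketch below averages unit vectors and takes the resulting angle, which is the usual definition of a mean direction. To my understanding circ.mean reports this same quantity, but treat that as an assumption. The two angles are made up.

```r
# Two directions just either side of 0 radians (made-up values).
theta <- c(0.1, 2 * pi - 0.1)

arith_mean <- mean(theta)   # about pi: points the opposite way

# Mean direction: average the unit vectors, then take the angle.
mean_dir <- atan2(sum(sin(theta)), sum(cos(theta)))

print(arith_mean)
print(mean_dir)   # very close to 0
```

The ordinary average lands at π, the opposite side of the circle from both observations, while the mean direction stays near 0 where the data actually point.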
The CircStats Package
• Randomly generate data from a von Mises distribution:
    data.vm <- rvm(100, 0, 3)
• Create a plot of it using circ.plot:
    circ.plot(data.vm, stack=TRUE, bins=150, shrink=1.5)
The CircStats Package
• Regression with circular data:
• Create some data:
    data1 <- runif(50, 0, 2*pi)
    data2 <- atan2(0.15*cos(data1) + 0.25*sin(data1), 0.35*sin(data1)) + rvm(50, 0, 5)
• Run the regression using circ.reg:
    circ.lm <- circ.reg(data1, data2, order=1)
    circ.lm
    (Intercept) -0.01365604 -0.02939188
    cos.alpha   -0.29872673  0.41344126
    sin.alpha    0.78894271  0.72908521
The CircStats Package
• Plot the data:
    plot(data1, data2)
• Plot the predicted line:
    circ.lm$fitted[circ.lm$fitted>pi] <- circ.lm$fitted[circ.lm$fitted>pi] - 2*pi
    points(data1[order(data1)], circ.lm$fitted[order(data1)], type='l')
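The adjustment before plotting moves fitted angles above π down by one full turn so that they share a range with the plotted data. A more general way to wrap any angle into (-π, π] is sketched here. wrap_angle is a hypothetical helper and not part of CircStats.

```r
# Map any angle to its equivalent in (-pi, pi].
# wrap_angle is a hypothetical helper, not part of CircStats.
wrap_angle <- function(theta) atan2(sin(theta), cos(theta))

angles <- c(0.5, 3 * pi / 2, 5 * pi / 2)
wrapped <- wrap_angle(angles)
print(wrapped)   # 0.5, -pi/2, pi/2 up to rounding
```

Unlike a single subtraction of 2π, this also handles angles that are more than one turn out of range.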
The norm Contributed Package
• While the norm package sounds as if it only concerns the normal distribution, it is in fact a package for handling missing values in multivariate normal data.
• It implements the data augmentation and multiple imputation scheme of Schafer (1997).
• Similar to SAS PROC MI.
The norm Contributed Package
• Load the library.
    library(norm)
The norm Contributed Package
• Generate some data.
    X1 <- rnorm(100,6,1)
    X2 <- rnorm(100,10,3)
    X3 <- rnorm(100,3,.2)
    X4 <- rnorm(100,31,2)
    Y <- 5 +.4*X1-.3*X2+rnorm(100,0,1)
The norm Contributed Package
• Generate some missing data (each value is set to NA with probability 0.1).
    X1a <- ifelse(runif(100,0,1)<.1,NA,X1)
    X2a <- ifelse(runif(100,0,1)<.1,NA,X2)
• Put the data together.
    YX <- cbind(Y,X1a,X2a,X3,X4)
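Before imputing, it is worth confirming how much data was actually knocked out. A base R sketch for one variable follows; the seed is an arbitrary choice added so the run is repeatable.

```r
set.seed(1)   # arbitrary seed so the sketch is repeatable

n <- 100
X1 <- rnorm(n, 6, 1)
X1a <- ifelse(runif(n, 0, 1) < 0.1, NA, X1)   # about 10% set to NA

n_missing <- sum(is.na(X1a))
print(n_missing)          # number of missing values
print(mean(is.na(X1a)))   # fraction missing

# Observed values are untouched by the deletion step.
kept <- !is.na(X1a)
print(all(X1a[kept] == X1[kept]))
```

The same check on the slide's matrix would be colSums(is.na(YX)), which counts the missing values in each column.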
The norm Contributed Package
• Prep the data and parameters for multiple imputation.
    # do preliminary manipulations
    s <- prelim.norm(YX)
    # find the MLE
    thetahat <- em.norm(s)
    # set random number generator seed
    rngseed(1234567)
The norm Contributed Package
• Create a list to store the individual results in.
    betaout <- vector("list",10)
    betasterrout <- vector("list",10)
The norm Contributed Package
• Run a multiple imputation loop
    for(i in 1:10){
      # draw one completed dataset
      ximp <- imp.norm(s,thetahat,YX)
      # fit the regression once; keep its coefficients and standard errors
      fit <- lm(ximp[,1]~ximp[,2]+ximp[,3]+ximp[,4]+ximp[,5])
      betaout[[i]] <- fit$coefficients
      betasterrout[[i]] <- summary(fit)$coefficients[,2]
    }
The norm Contributed Package
• Analyze the results
    mi.inference(betaout,betasterrout,confidence=0.95)
The norm Contributed Package
• Look at the output (the first block holds the pooled coefficient estimates)
      (Intercept)    ximp[, 2]    ximp[, 3]    ximp[, 4]    ximp[, 5]
       6.75624286   0.30502706  -0.32846960   0.05157696  -0.04154060
    $std.err
      (Intercept)    ximp[, 2]    ximp[, 3]    ximp[, 4]    ximp[, 5]
       2.70312542   0.13431178   0.04240159   0.65908509   0.05596610
    $df
      (Intercept)    ximp[, 2]    ximp[, 3]    ximp[, 4]    ximp[, 5]
        1318.8371     222.2528   13269.2373    1770.6680   27689.4900
    $signif
      (Intercept)    ximp[, 2]    ximp[, 3]    ximp[, 4]    ximp[, 5]
     1.256048e-02 2.410251e-02 1.021405e-14 9.376337e-01 4.579447e-01
    $r
      (Intercept)    ximp[, 2]    ximp[, 3]    ximp[, 4]    ximp[, 5]
       0.09004737   0.25192843   0.02673983   0.07676697   0.01835967
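The std.err, df, and r entries in this output are consistent with Rubin's rules for pooling results across imputed datasets, which, as far as I know, is what mi.inference applies. A base R sketch of those rules for a single coefficient follows. The est and se values are invented, and the variable names are my own.

```r
# Pool m estimates of one coefficient with Rubin's rules.
# est and se are made-up numbers, one pair per imputed dataset.
est <- c(1, 2, 3)
se  <- c(1, 1, 1)
m   <- length(est)

qbar <- mean(est)               # pooled estimate
W    <- mean(se^2)              # within-imputation variance
B    <- var(est)                # between-imputation variance
Tvar <- W + (1 + 1/m) * B       # total variance
r    <- (1 + 1/m) * B / W       # relative increase in variance
df   <- (m - 1) * (1 + 1/r)^2   # degrees of freedom

print(c(est = qbar, std.err = sqrt(Tvar), df = df, r = r))
```

A small r, as for most coefficients in the output above, means the missing data add little uncertainty, and the degrees of freedom grow large as a result.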
The lpSolve Package
• The lpSolve package solves linear and integer programs.
    library(lpSolve)
The lpSolve Package • Consider the following linear program:
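The slide's formula did not survive, but it can be read back from the vectors f.obj, f.con, f.dir, and f.rhs defined on the next slide. Assuming lpSolve's default of nonnegative variables, the program is presumably:

```latex
\begin{aligned}
\max_{x_1, x_2, x_3} \quad & x_1 + 9x_2 + 3x_3 \\
\text{subject to} \quad & x_1 + 2x_2 + 3x_3 \le 9 \\
& 3x_1 + 2x_2 + 2x_3 \le 15 \\
& x_1, x_2, x_3 \ge 0
\end{aligned}
```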
The lpSolve Package
• Set up the vectors and matrices
    f.obj <- c(1, 9, 3)
    f.con <- matrix(c(1, 2, 3, 3, 2, 2), nrow=2, byrow=TRUE)
    f.dir <- c("<=", "<=")
    f.rhs <- c(9, 15)
The lpSolve Package
• The lp() function will attempt to solve the linear program.
    lp("max", f.obj, f.con, f.dir, f.rhs)
    Success: the objective function is 40.5
The lpSolve Package
• To obtain the solution, extract the solution component from the returned object.
    lp("max", f.obj, f.con, f.dir, f.rhs)$solution
    [1] 0.0 4.5 0.0
The lpSolve Package
• Sensitivity analyses can be obtained from the object returned by lp().
• The following components are attached to an lp() object.
     [1] "direction"    "x.count"      "objective"    "const.count"
     [5] "constraints"  "int.count"    "int.vec"      "objval"
     [9] "solution"     "presolve"     "compute.sens" "sens.coef.from"
    [13] "sens.coef.to" "duals"        "duals.from"   "duals.to"
    [17] "status"
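A sketch of pulling the sensitivity components out of the returned object. It assumes lpSolve is installed and skips the solve otherwise. The compute.sens argument asks lp() to fill in the sensitivity fields, and the name res is my own.

```r
# Inputs copied from the slides.
f.obj <- c(1, 9, 3)
f.con <- matrix(c(1, 2, 3, 3, 2, 2), nrow = 2, byrow = TRUE)
f.dir <- c("<=", "<=")
f.rhs <- c(9, 15)

res <- NULL
if (requireNamespace("lpSolve", quietly = TRUE)) {
  # compute.sens = 1 asks lp() to fill in the sensitivity components
  res <- lpSolve::lp("max", f.obj, f.con, f.dir, f.rhs, compute.sens = 1)
  print(res$objval)          # optimal objective value
  print(res$sens.coef.from)  # lower ends of objective-coefficient ranges
  print(res$sens.coef.to)    # upper ends of those ranges
  print(res$duals)           # dual values
}
```

Storing the result once and reading several components from it avoids solving the same program repeatedly.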
The lpSolve Package
• To solve an integer program, specify which variables must be integers through the int.vec argument:
    lp("max", f.obj, f.con, f.dir, f.rhs, int.vec=1:3)
    Success: the objective function is 37
The lpSolve Package
• To obtain the solution to the integer program, use the solution component as before:
    lp("max", f.obj, f.con, f.dir, f.rhs, int.vec=1:3)$solution
    [1] 1 4 0
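Because this integer program is tiny, the reported answer can be cross-checked in base R by listing every candidate point. This brute-force sketch is only a sanity check under the assumption of nonnegative variables. It is not how lpSolve works internally.

```r
f.obj <- c(1, 9, 3)
f.con <- matrix(c(1, 2, 3, 3, 2, 2), nrow = 2, byrow = TRUE)
f.rhs <- c(9, 15)

# Every nonnegative integer point that could possibly be feasible:
# the constraints bound each variable (x1 <= 5, x2 <= 4, x3 <= 3).
grid <- as.matrix(expand.grid(x1 = 0:5, x2 = 0:4, x3 = 0:3))

# Keep the points that satisfy both "<=" constraints.
lhs <- grid %*% t(f.con)                  # one row per candidate
feasible <- lhs[, 1] <= f.rhs[1] & lhs[, 2] <= f.rhs[2]
cand <- grid[feasible, , drop = FALSE]

obj <- as.vector(cand %*% f.obj)
best <- cand[which.max(obj), ]

print(best)        # best integer point found
print(max(obj))    # its objective value
```

The integer optimum of 37 is lower than the 40.5 of the unrestricted linear program, as expected when variables are forced to whole numbers.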
Summary
• R is a programming environment with many standard programming structures already included.
• A large number of contributed packages are available.
• Many packages allow the use of modern statistical procedures without having to code them yourself.
• Implementing the packages requires familiarity with R.
• No support.
• Users can create new packages.
Summary • All of the R code and files can be found at: www.people.vcu.edu/~elboone2/CSS.htm