120 likes | 252 Views
R tutorial. Stat 221, Section 1 Xiaodan Fan (Dan) xfan@fas.harvard.edu. Introduction. S-Plus is an implementation of S language, which was originated in Bell Lab. R is ‘GNU S’, an open-source implementation. Matrix-based programming language with rich statistical features. Accessing R.
E N D
R tutorial Stat 221, Section 1 Xiaodan Fan (Dan) xfan@fas.harvard.edu
Introduction • S-Plus is an implementation of S language, which was originated in Bell Lab. R is ‘GNU S’, an open-source implementation. • Matrix-based programming language with rich statistical features.
Accessing R • “Windows”: free version of R can be downloaded from http://www.r-project.org; Also every machine in FAS lab and stat lab preinstalled R. Just open “R console”. • “Unix”: login into nice.harvard.edu or stat.harvard.edu, type “R” at the Unix prompt. • “Web-based”: copy and paste http://rweb.stat.umn.edu/Rweb/Rweb.general.html • Using q() to quit from R • Get help: help(mean) or help->Html help
Basic Stuff: • Assignment operators: x=c(1,2); y=c(3,4); a=matrix(c(1,2,3,4), nrow=2, ncol=2, byrow=T) • “matrix based”: everything is done as vectors. try x+y;x*y;x%*%y;x%*%t(y);solve(a) • Arithmetic Operators: +,-,*,/,^,%/%,%% • Comparison Operators: ==, !=, >, <, >=, <= • Logical Operators: &&, ||, ! • Comment: #
Input data • From command line: • x = c(1,2,3,4) • Y = 1:100; Y=seq(1,100,1) • new.vector = rep(0, n) • z = matrix(Y, nrow=20, ncol=5) • From data file: • data<-read.table(“dataset.txt”, header=T)
Control Structure • Condition: if(a==5){ b=1 } if(a<=3){ b=2 } else{ b=3 } • Loop: be aware, R is notorious for loops, try to avoid using loops as much as you can! • while (a<3) {a=a+1} • for (i in 1:10){x=x+i} break next
Write your own function SmallerValue <-function(x,y){ if(x<y){ return(x) }else{ return(y) } }
Basic Statistics: • Summmary: summary(vector),mean, var, sd • Tests: t.test(vector), var.test(vector1, vector2) • Sorting: max, min, sort • Graph: hist(vector), boxplot, qqplot, plot • Regression: model<- lm(z~x+y) • Sampling: sample(x, 1000, replace=T, prob=NULL) Ornext slide
Functions about Distributions • The names for various distributions are as follows: norm : normal; beta : beta chisq : chi squared; pois : Poisson gamma : gamma; binom : binomial nbinom: negative binomial; t : t; f : F • Then you can add one of four letters to each distribution to create a command: r: random draws d: density functions (pdf or pmf) q: quantiles p: cdf (cumulative probabilities) • Example: rnorm(10, mean=0, sd=1); dnorm(0, mean=0, sd=1)
Enhance your productivity • Save a series of command as a text file and run it as a program. • Within R: source(“infile.txt”) • Outside R: R CMD BATCH infile.txt outfile.txt • Writing your own functions: • my.function <- function(arguments) { content return(output) }
Other packages: • Functions are grouped as packages • Install packages: Windows menu: packages -> install.. Unix&Windows:install.packages("MASS") • Load packages before use: library(MASS) help(mvrnorm)
Real world examples: • example1: y = seq(-7,10,.02) dens = 0.5*dnorm(y,1,2) + 0.5*dnorm(y,2,2) plot (y, dens, type="l", xlab="y", ylab="") • example2: lambda0 = c(.9,.1) mu0 = c(0,15) tau0 = c(10,25) y = c(28.39, 7.94, -2.75, 6.82, -0.64, 0.63, 18.01, 12.16) sigma = c(14.9, 10.2, 16.3, 11.0, 9.4, 11.4, 10.4, 17.6) J = length(y) M = length(lambda0) lambda = array (NA, c(J,M)) mu = array (NA, c(J,M)) tau = array (NA, c(J,M)) for (m in 1:M){ lambda[,m] = lambda0[m]* dnorm(y,mu0[m],sqrt(sigma^2+tau0[m])) tau[,m] = sqrt(1/(1/tau0[m]^2 + 1/sigma^2)) mu[,m] = (mu0[m]/tau0[m]^2 + y/sigma^2)*tau[,m]^2} for (j in 1:J) lambda[j,] = lambda[j,]/sum(lambda[j,]) print (cbind(lambda[,1],mu[,1],tau[,1], lambda[,2],mu[,2],tau[,2])) possible = c(0,25,50,100) for (i in 1:length(possible)){ y[8] = possible[i] for (m in 1:M){ lambda[,m] = lambda0[m]* dnorm(y,mu0[m],sqrt(sigma^2+tau0[m])) tau[,m] = sqrt(1/(1/tau0[m]^2 + 1/sigma^2)) mu[,m] = (mu0[m]/tau0[m]^2 + y/sigma^2)*tau[,m]^2} for (j in 1:J) lambda[j,] = lambda[j,]/sum(lambda[j,]) theta = seq(-40,120,.1) dens = lambda[8,1]*dnorm(theta,mu[8,1],tau[8,1]) + lambda[8,2]*dnorm(theta,mu[8,2],tau[8,2]) plot (theta, dens, ylim=c(0,1.1*max(dens)), type="l", xlab="theta=8", ylab="", xaxs="i", yaxs="i", yaxt="n", bty="n", cex=2) mtext (paste("Post density of theta=8 when", "y=8 =", possible[i]), cex=2, 3)}