Introduction to R
A. Di Bucchianico

Types of statistical software
• command-line software
  • requires knowledge of the syntax of commands
  • reproducible results through scripts
  • detailed analyses possible
• GUI-based software
  • does not require knowledge of commands
  • actions are not reproducible
• hybrid types (both command-line and GUI)

Well-known statistical software
• SAS
• SPSS
• Minitab
• Statgraphics
• S-Plus
• R
• …

R
• free
• language almost the same as S
• maintained by top quality experts
• available on all platforms
• continuous improvement
Available through www.r-project.org

Contents
• Basic operations
• Data creation + I/O
• Component extraction
• Plots
• Basic statistics
• Libraries
• Regression analysis
• Survival analysis

Basic operations
• assignment operation: a <- 2+sqrt(5)
• help function:
  • help(pnorm)
  • help.search("normal distribution")
• probability functions:
  • d (density): dgamma(x, n, rate)
  • p (probability = cdf): pweibull(x,3,2)
  • q (quantile): qnorm(0.95)
  • r (random numbers): rexp(10, rate)
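
The d/p/q/r naming scheme can be tried out directly at the prompt. The lines below are a minimal sketch; the rate value 2 and the fixed seed are assumptions chosen only for illustration.

```r
# Assignment: store the value of an expression in a
a <- 2 + sqrt(5)

# The four prefixes, applied to the standard normal and the exponential
dnorm(0)       # d: density of N(0,1) at 0
pnorm(1.96)    # p: cdf, P(Z <= 1.96)
qnorm(0.95)    # q: quantile, the 95th percentile of N(0,1)

set.seed(1)                 # assumption: fixed seed so the draw is repeatable
x <- rexp(10, rate = 2)     # r: 10 random numbers, rate 2 is illustrative

# p and q are inverse to each other
all.equal(pnorm(qnorm(0.95)), 0.95)
```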

Data creation + I/O
• create
  • vectors: c(1,2,3)
  • matrices: matrix(c(1,2,3,4,5,6),2,3,byrow=TRUE) (2 = number of rows)
  • list
  • patterns:
    • ":" (1,2,3) = 1:3
    • seq (1,2,3) = seq(1,3,by=1)
• working directories and files:
  • setwd
  • getwd
  • attach
• read data
  • from file: read.table("file.txt",header=TRUE)
  • from web: read.table with a URL as the file argument
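
A short sketch of these commands in one session. The data frame and its column names (age, height) are made up for illustration, and a temporary file stands in for "file.txt".

```r
# Vectors and matrices
v <- c(1, 2, 3)
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)

# Two ways to generate the pattern 1, 2, 3
p1 <- 1:3
p2 <- seq(1, 3, by = 1)

# Write a small (hypothetical) data set to a temporary file, then read it back
df <- data.frame(age = c(18, 25, 40), height = c(170, 182, 165))
tmp <- tempfile(fileext = ".txt")
write.table(df, tmp, row.names = FALSE)
back <- read.table(tmp, header = TRUE)

getwd()   # current working directory; setwd("some/path") would change it
```

With byrow = TRUE the matrix is filled row by row, so m[2, 1] is 4 rather than 2.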

Component extraction
• d[r,]: rth row of object d
• d[,c]: cth column of object d
• d[r,c]: entry in row r and column c of object d
• length(d): length of d
• d[d<20]: extract all elements of d that are smaller than 20
• d["age"]: extract column "age" from object d
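
The indexing forms above, applied to a small made-up data frame d and a made-up vector x (both are assumptions for illustration only):

```r
d <- data.frame(age = c(18, 25, 40), height = c(170, 182, 165))

row2  <- d[2, ]      # second row (a one-row data frame)
col1  <- d[, 1]      # first column (a plain vector)
entry <- d[2, 1]     # entry in row 2, column 1
ages  <- d["age"]    # column "age", kept as a one-column data frame

x <- c(5, 30, 12, 21)
small <- x[x < 20]   # logical indexing: keep elements smaller than 20
n <- length(x)       # number of elements in x
```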

Plots
• plot: both 1D and 2D plots
• hist: histogram
• qqnorm: normal probability plot ("quantile-quantile" plot)
Save graphics by choosing File -> Save as
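
Graphics can also be saved from a script instead of the menu, which keeps the step reproducible. A minimal sketch, assuming simulated normal data and a temporary PDF file as the output:

```r
set.seed(1)            # assumption: fixed seed, simulated data
x <- rnorm(100)

out <- tempfile(fileext = ".pdf")
pdf(out)               # open a PDF graphics device
plot(x)                # 1D plot: values against their index
hist(x)                # histogram
qqnorm(x)              # normal probability plot
qqline(x)              # reference line through the quartiles
dev.off()              # close the device and write the file
```

Each plotting call starts a new page in the PDF; qqline adds to the current page.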

Basic statistics
• summary
• mean
• sd
• t.test
• boxplot
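
A sketch of these functions on a small made-up sample (the numbers and the null value 4 are illustrative assumptions):

```r
x <- c(4.1, 3.8, 4.4, 4.0, 3.9, 4.3)

summary(x)                  # minimum, quartiles, mean, maximum
m <- mean(x)                # sample mean
s <- sd(x)                  # sample standard deviation
res <- t.test(x, mu = 4)    # one-sample t-test of H0: mu = 4
res$p.value

# plot = FALSE returns the box plot statistics without drawing;
# leave the argument out to draw the plot itself
b <- boxplot(x, plot = FALSE)
b$stats
```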

Packages
• specialized functions available through packages and libraries
• in Windows interface choose Packages -> Load Packages
• examples of packages:
  • qcc (quality control)
  • survival
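
Packages can also be loaded from the command line, which works on every platform. A minimal sketch, assuming the survival package is already installed (it normally ships with R); the install step is commented out because it needs an internet connection.

```r
# install.packages("qcc")   # one-time installation from CRAN

library(survival)           # load an installed package
exists("coxph")             # its functions are now available
```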

Functions
Analyses that have to be performed often can be put in the form of functions.
Example:
simple <- function(data, mean = 0, alpha = 0.05) {
  hist(data)
  t.test(data, conf.level = 1 - alpha, mu = mean, alternative = "two.sided")
}
simple(data, 4) uses the default value alpha = 0.05 and tests the null hypothesis mu = 4.

Regression analysis
• general command: lm (linear model)
• requires data to be available in the form of a data frame
  • more general than a matrix because columns need not have the same type
  • use command data.frame for conversion
• other types of regression also possible (see also dedicated packages)
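
A minimal sketch of a simple linear regression with lm. The data are simulated (true intercept 3, true slope 2) and the variable names x and y are assumptions for illustration.

```r
set.seed(1)                           # assumption: simulated data
x <- 1:20
y <- 3 + 2 * x + rnorm(20, sd = 0.5)

d <- data.frame(x = x, y = y)         # lm works on a data frame
fit <- lm(y ~ x, data = d)            # model: y = intercept + slope * x

coef(fit)                             # estimated intercept and slope
summary(fit)                          # standard errors, t-tests, R-squared
```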

Survival analysis
• through the survival package (Surv creates the survival object)
• Cox proportional hazards: coxph
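
A sketch of a Cox model fit, assuming the survival package is installed. The data frame is made up for illustration: time is the follow-up time, status is 1 for an observed event and 0 for a censored one, and group is a hypothetical binary covariate.

```r
library(survival)

d <- data.frame(
  time   = c(5, 8, 12, 3, 9, 14, 7, 11),
  status = c(1, 1, 0, 1, 0, 1, 1, 0),   # 1 = event, 0 = censored
  group  = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# Surv() combines time and status into the response of the model
fit <- coxph(Surv(time, status) ~ group, data = d)
summary(fit)    # coefficient, hazard ratio, and tests
```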

Useful web sites
• www.r-project.org
• http://cran.r-project.org/doc/contrib/Short-refcard.pdf
• http://www.uni-muenster.de/ZIV/Mitarbeiter/BennoSueselbeck/s-html/shelp.html
• http://www.maths.lth.se/help/R/
• http://www.mas.ncl.ac.uk/~ndjw1/teaching/sim/R-intro.html
• http://stats.math.uni-augsburg.de/JGR/
• http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/index.html