1. R
R is a statistical package and a fourth-generation programming language, extensible through user-defined functions and add-on packages.
It is an environment for statistical computing and graphics that offers a wide range of statistical and graphical techniques.
We will learn to work with two tools: the line editor (the R console) and the graphical interface R commander.
Competitors: SPSS, Matlab.
2. Installing R commander
To install R commander: Packages > Install package(s)… > choose a CRAN mirror > Rcmdr, then wait for the installation of Rcmdr and of the additional packages it needs.
To load R commander: Packages > Load package… > Rcmdr. If a warning about missing packages appears, answer Yes to download them from CRAN.
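3. Running R commander
Whenever you want to run it: Packages > Load package… > Rcmdr.
Then set the working directory: R commander File > Change working directory…
R commander has problems navigating through your directory tree, so choose an easy-to-find directory, such as your Desktop or the folder where you keep your R exercises.
Commands:
getwd() shows the current working directory
setwd("path with double backslashes") changes it (on Windows each backslash in the path must be written twice; forward slashes also work)
help(command) or ?command opens the help page of a command
q() quits R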
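A minimal console sketch of the working-directory commands above. It uses tempdir() (R's per-session temporary folder) only because it is a directory that surely exists; in practice you would pass the path of your own exercises folder.

```r
# Remember the starting directory so we can return to it
start_dir <- getwd()
# Move to R's per-session temporary folder (chosen only because it always exists)
setwd(tempdir())
work_dir <- getwd()
print(work_dir)
# Go back to where we started
setwd(start_dir)
# ?setwd (or help(setwd)) would open the help page of setwd
```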
4. Files to save
R commander windows:
script window, containing the written instructions: R commander File > Save script as…
output window, containing the results: R commander File > Save output as…, or paste its contents into a text file
Workspace, containing the data structures:
to save it: File > Save Workspace…, or R commander File > Save R workspace as…, or save.image("path with double backslashes")
to load it: File > Load Workspace…, or load("path with double backslashes")
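A small sketch of saving and reloading the workspace from the console. The file name exercises.RData is hypothetical and is placed in the temporary folder only for this example; it assumes the code runs at the top level, where the workspace is the global environment.

```r
# Create a small object in the workspace (the global environment)
x <- c(1, 2, 3)
# Hypothetical file name, placed in the temporary folder for this example
ws_file <- file.path(tempdir(), "exercises.RData")
# Save all objects of the workspace to the file
save.image(ws_file)
# Delete x, then get it back by loading the saved workspace
rm(x)
load(ws_file)
print(x)
```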
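5. Objects
Variables and vectors: numeric, logical, character
Nominal variables and vectors: factor
Ordinal variables and vectors: ordered
Dataset: data.frame
Time series: ts
Useful commands:
ls() lists the objects in the workspace; ls.str() also shows their structure
typing object, or print(object), shows an object's content; str(object) shows its structure
rm(object) removes an object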
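A sketch that creates one object of each kind listed above and inspects them. All names and values are made up for illustration.

```r
height <- 1.75                                   # numeric
passed <- TRUE                                   # logical
city <- "Bolzano"                                # character
colour <- factor(c("red", "blue", "red"))        # nominal: factor
size <- ordered(c("m", "s", "l"), levels = c("s", "m", "l"))  # ordinal: ordered
people <- data.frame(height = c(1.60, 1.82), passed = c(TRUE, FALSE))  # dataset
sales <- ts(c(5, 7, 6, 9), frequency = 4, start = c(2000, 1))  # time series

print(class(size))   # "ordered" "factor"
str(people)          # structure of the data.frame
ls()                 # names of the objects in the workspace
rm(height)           # remove an object
```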
6. Variable and vector
Variable assignment: variable <- value or expression, or value or expression -> variable
Operators: + - * / ** (same as ^) == != < > <= >= & | !
Vector: vector <- c(list of values or other vectors)
use c() to concatenate values into a vector, or paste() to concatenate them into character strings
vector of consecutive numbers: vector <- start:end
vector[index] accesses a single element of the vector; the index can also be a sequence, a negative sequence (which excludes those elements) or a condition
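The four kinds of index described above, shown on a small made-up vector:

```r
v <- c(10, 20, 30, 40, 50)
w <- c(v, 60)                   # c() also joins vectors
codes <- paste("case", 1:3)     # "case 1" "case 2" "case 3"
first_five <- 1:5               # consecutive numbers 1 2 3 4 5

one <- v[2]             # single element: 20
some <- v[2:4]          # sequence: 20 30 40
dropped <- v[-(1:2)]    # negative sequence excludes elements: 30 40 50
big <- v[v > 25]        # condition: 30 40 50
power_ok <- (2 ** 3 == 2^3)  # ** is accepted as a synonym of ^
```

Note the parentheses in v[-(1:2)]: without them, -1:2 would mean the sequence -1, 0, 1, 2.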
7. Missing values
NA means "not available": the datum is missing
NaN means "not a number": the calculation cannot be done for this element of the vector or case of the dataset (for example 0/0)
Inf means "infinity": the result of dividing a nonzero number by 0; log(0) gives -Inf
is.na(variable) returns TRUE for both NA and NaN
is.na(vector) returns a logical vector, which can be used to remove missing values from a vector:
vecWithoutMissing <- vecWithMissing[!is.na(vecWithMissing)]
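A short check of the special values above and of the removal idiom, on a made-up vector:

```r
x <- c(4, NA, 0/0, 1/0, log(0))
print(x)              # 4 NA NaN Inf -Inf
missing <- is.na(x)   # FALSE TRUE TRUE FALSE FALSE: NaN counts as missing too
clean <- x[!is.na(x)]
print(clean)          # 4 Inf -Inf
```

Infinite values are not missing, so they survive the filter; is.finite() would drop them as well.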
8. for loop
for loop to repeat some statements: for (j in start:end) {statements separated by semicolons}
here j is an integer variable
for loop to scan a vector's elements:
for (variable in vector) {statements separated by semicolons that use variable}
for (j in 1:length(vector)) {statements separated by semicolons that use vector[j]}
the length of a vector is length(vector)
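The loop forms above, sketched on a small made-up vector: both ways of scanning it give the same sum.

```r
v <- c(4, 8, 15)
# Scan the elements directly
total <- 0
for (value in v) { total <- total + value }
# Scan by position, from 1 to length(v)
total2 <- 0
for (j in 1:length(v)) { total2 <- total2 + v[j] }
# Repeat a statement a fixed number of times
squares <- numeric(0)
for (j in 1:4) { squares <- c(squares, j * j) }
```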
9. function
name <- function(arguments) {statements separated by semicolons; return(object)}
square <- function(x) {y <- x*x; return(y)}
interest <- function(c, i=0.05, t=1) {y <- c*(1+i)**t; return(y)}
arguments with a default value (here i and t) can be omitted when calling the function
Usage: name(arguments)
square(5) returns 25
interest(100) returns 105
interest(100, 0.1, 2) returns 121
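The two functions above, ready to paste into the console. The reading of c, i and t as capital, interest rate and number of periods is an assumption based on the formula; the last call also shows a named argument.

```r
square <- function(x) {y <- x*x; return(y)}
# assumed meaning: c = capital, i = interest rate, t = number of periods
interest <- function(c, i = 0.05, t = 1) {y <- c*(1+i)^t; return(y)}
square(5)                 # 25
interest(100)             # 105
interest(100, 0.1, 2)     # 121
interest(100, t = 2)      # named argument, default i: 110.25
```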
10. if control
if (condition) {statements separated by semicolons} else {statements}
the curly braces are optional when there is only one statement; else is optional
if (a+3 <= 5) {b <- 7; c <- 9} else b <- 2
if (a == 2 & b < c) print(b)
when typed directly at the console, else must be on the same line as the end of the if part
it is typically used inside functions or for loops
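A sketch of the typical use described above: an if/else chain inside a function, called from a for loop. The function sign_label is a hypothetical example.

```r
# if/else inside a function, called from a for loop
sign_label <- function(x) {
  if (x > 0) {
    label <- "positive"
  } else if (x < 0) {
    label <- "negative"
  } else label <- "zero"   # braces omitted: only one statement
  return(label)
}
labels <- character(0)
for (v in c(-2, 0, 3)) labels <- c(labels, sign_label(v))
print(labels)   # "negative" "zero" "positive"
```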
11. factor and ordered factor
They build nominal (factor) and ordinal (ordered) variables.
R commander Data > Manage variables in active data set > Convert numeric variables to factor
newfact <- factor(vector)
newfact <- factor(vector, labels=vector of labels)
newfact <- ordered(vect)
newfact <- ordered(vect, labels=c('s','m','l','xl'))
the labels are assigned to the distinct values of the vector in increasing order
levels(factor) returns the levels (the categories) of a factor
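A sketch of ordered() with labels, using hypothetical size codes 1 to 4. Because the labels follow the sorted codes, the order s < m < l < xl is kept and comparisons between elements work.

```r
# Hypothetical size codes: 1 = s, 2 = m, 3 = l, 4 = xl
size_code <- c(2, 1, 4, 3, 2)
# labels are assigned to the sorted distinct values 1, 2, 3, 4
sizes <- ordered(size_code, labels = c("s", "m", "l", "xl"))
print(sizes)                    # m s xl l m, with levels s < m < l < xl
bigger <- sizes[3] > sizes[1]   # TRUE: xl > m
colour <- factor(c("red", "blue", "red"))
levels(colour)                  # "blue" "red" (alphabetical)
```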
12. data.frame or dataset
A database table suited for statistical analysis.
Unfortunately its vectors (the columns) are called variables.
Case names (row names) are optional.
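13. data.frame
Vectors are accessed via dataset$vector, or with attach(dataset), after which vector can be used directly
print(dataset) shows the dataset in the console
library(relimp), then showData(dataset), shows it in a separate window
fix(dataset) opens it in an editor; it does not work if you have dates in your dataset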
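A sketch of the two access styles above, on a hypothetical exams dataset. The detach() call is an addition: it undoes attach() so the attached names do not clash with other objects later.

```r
# Hypothetical dataset
exams <- data.frame(grade = c(27, 30, 24), attended = c(1, 0, 1))
g1 <- exams$grade          # access a vector with $
attach(exams)              # now the vectors can be used directly
g2 <- grade
detach(exams)              # undo attach when done
print(exams)
```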
14. Creating a data.frame from vectors
dataset <- data.frame(vecnew=vector, …, row.names=col)
vecnew is the name that vector will have in the dataset
col is the column number or the name of the vector containing the cases' names
before R 4.0.0, character vectors were automatically converted to factors; since R 4.0.0 they stay character unless you add stringsAsFactors=TRUE
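A sketch of data.frame() with row.names, using made-up vectors. Passing row.names=1 takes the case names from the first column, which then no longer appears among the variables; stringsAsFactors=TRUE requests the pre-4.0.0 factor conversion explicitly.

```r
# Hypothetical example data
student <- c("Anna", "Marco", "Luca")
score <- c(27, 30, 24)
city <- c("Bolzano", "Bonn", "Bolzano")
# row.names = 1: the first column (name) supplies the case names
exams <- data.frame(name = student, grade = score, from = city, row.names = 1)
print(exams)
# Ask explicitly for factors (the default before R 4.0.0)
exams_f <- data.frame(name = student, from = city, stringsAsFactors = TRUE)
str(exams_f)
```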
15. Importing data.frame from packages
R commander Data > Data in packages
data() lists the available datasets
help(dataset) describes a dataset
data(dataset, package="package") loads it
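For example, the CO2 dataset, which ships with R in the datasets package and is also used in later slides, can be loaded and inspected like this:

```r
# CO2 is one of the example datasets shipped with R in the datasets package
data(CO2, package = "datasets")
dims <- dim(CO2)      # 84 cases, 5 variables
head(CO2)             # first six cases
# help(CO2) would open its description; data() alone lists available datasets
```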
16. Importing data.frame from text files
R commander Data > Import data > from text file, clipboard or URL…
dataset <- read.table("file path or URL", header=TRUE or FALSE, sep="separator", col.names=vector of headers, na.strings="value for NA", dec="," or ".")
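A self-contained sketch of read.table(). It first writes a small hypothetical file (semicolon separator, decimal comma, "-" for missing data) to a temporary path, then reads it back with the matching options.

```r
# Hypothetical file: semicolon-separated, decimal comma, "-" marks missing data
txt_file <- tempfile(fileext = ".txt")
writeLines(c("name;grade;attended",
             "Anna;27,5;1",
             "Marco;-;0",
             "Luca;30;1"), txt_file)
exams <- read.table(txt_file, header = TRUE, sep = ";",
                    na.strings = "-", dec = ",")
str(exams)
```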
17. Importing data.frame from databases
Written here just in case you'll ever need it; converting to a text file is better and easier!
R commander Data > Import data > from SPSS data set…
variables with value labels can be turned into factors (use.value.labels=TRUE)
library(foreign)
dataset <- read.spss("file path or URL", use.value.labels=TRUE or FALSE, to.data.frame=TRUE)
Dates are imported wrongly (as seconds since 14 October 1582)! Fix them with library(chron):
var <- as.chron(ISOdate(1582, 10, 14) + var)
R commander Data > Import data > from Excel, Access or dBase data set…
library(gdata) (the gdata package probably needs to be installed first)
dataset <- read.xls("file path or URL", sheet=sheet number, na.strings="value for NA")
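The date fix above works because SPSS stores a date as the number of seconds since 14 October 1582, so adding it to ISOdate(1582, 10, 14) gives the right moment. The sketch below uses base R's as.Date() instead of chron's as.chron(), so it needs no extra package; spss_date is a hypothetical column as it would arrive from read.spss().

```r
# Hypothetical values read from an SPSS file: seconds since 14 October 1582
spss_date <- c(13166064000, NA)
# Same origin as the chron-based fix, but converted with base R's as.Date
fixed <- as.Date(ISOdate(1582, 10, 14) + spss_date)
print(fixed)   # "2000-01-01" NA
```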
18. Exporting data.frame to text file
R commander Data > Active data set > Export active data set…
write.table(dataset, "file path", sep="separator", col.names=TRUE or FALSE, row.names=TRUE or FALSE, quote=FALSE, na="value for NA")
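A sketch of write.table() with the options above, writing a hypothetical dataset to a temporary file and reading the raw lines back to show the result:

```r
exams <- data.frame(name = c("Anna", "Marco"), grade = c(27.5, NA))
out_file <- tempfile(fileext = ".txt")   # hypothetical output path
write.table(exams, out_file, sep = ";", col.names = TRUE, row.names = FALSE,
            quote = FALSE, na = "-")
lines <- readLines(out_file)
print(lines)   # "name;grade" "Anna;27.5" "Marco;-"
```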
19. ts (time series)
A database table with a time unit attached, suited for econometric analysis.
timeseries <- ts(d, start=s, end=e, frequency=f)
d is a data.frame, vector or matrix; a data.frame is converted to a numeric matrix, so non-numeric values are converted
s is the time of the first observation: a number, or a two-element vector indicating unit and subunit
e is the time of the last observation; same format as s
f is the number of observations per time unit
mytimeseries <- ts(c(0,3,1,1,8,0,3,2,2,2), frequency=4, start=c(1959, 2))
data from the 2nd quarter of 1959 to the 3rd quarter of 1961
mytimeseries <- ts(c(0,1,3,8,1,0,3,2,2,2), frequency=7, start=c(12, 3))
data from the 3rd day of week 12 to the 5th day of week 13
plot.ts(timeseries) plots the series
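The two examples above can be checked with end(), which returns the unit and subunit of the last observation: 10 quarterly values starting in 1959 Q2 end in 1961 Q3, and 10 daily values starting on day 3 of week 12 end on day 5 of week 13.

```r
mytimeseries <- ts(c(0,3,1,1,8,0,3,2,2,2), frequency = 4, start = c(1959, 2))
print(mytimeseries)             # quarterly table, Qtr2 1959 to Qtr3 1961
last_q <- end(mytimeseries)     # 1961 3
weekly <- ts(c(0,1,3,8,1,0,3,2,2,2), frequency = 7, start = c(12, 3))
last_day <- end(weekly)         # 13 5
n_obs <- length(weekly)
```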
20. Modify data set
R commander Data > Active data set > Subset active data set…
newdataset <- subset(dataset, condition)
used to restrict the dataset to some cases
R commander Data > Active data set > Remove cases with missing data…
newdataset <- na.omit(dataset)
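Both commands on a small hypothetical dataset with one missing grade. subset() also drops cases where the condition is NA.

```r
# Hypothetical data with one missing grade
exams <- data.frame(grade = c(27, NA, 30, 18), attended = c(1, 0, 1, 0))
good <- subset(exams, grade >= 25)   # cases satisfying the condition
complete <- na.omit(exams)           # cases without missing values
```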
21. Recode
Used to create or modify factor/ordered vectors: massive recoding
R commander Data > Manage variables in active data set > Recode variables…
newfactor <- Recode(vector or factor, 'changes separated by semicolons', as.factor.result=TRUE)
Recode comes from the car package, which R commander loads (in recent versions of car the argument is named as.factor)
Examples of changes:
"Bolzano"="here"
c("Munich","Hannover","Bonn")="Germany"
do not use "Munich","Hannover","Bonn"="Germany" as suggested by the help
else="Others"
For numerical vectors we may also use ranges such as 8:27="high", together with lo and hi (the lowest and the highest value)
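The Recode example above needs the car package. As a sketch, the same mapping can also be written in base R with ifelse() and %in%. This is a substitute technique, not the Recode syntax, and the city vector is made up.

```r
city <- c("Bolzano", "Munich", "Rome", "Bonn", "Hannover")
# Same mapping as the Recode example, written with base ifelse() and %in%
area <- ifelse(city == "Bolzano", "here",
               ifelse(city %in% c("Munich", "Hannover", "Bonn"), "Germany", "Others"))
area <- factor(area)
print(area)
```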
22. Compute
Used to create a new vector through math operations
R commander Data > Manage variables in active data set > Compute new variable…
newvector <- with(dataset, expression)
CO2$myname <- with(CO2, uptake*7 - sqrt(conc))
is identical to
CO2$myname <- CO2$uptake*7 - sqrt(CO2$conc)
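A quick check that the two forms above give the same result on the CO2 dataset shipped with R:

```r
data(CO2, package = "datasets")
a <- with(CO2, uptake*7 - sqrt(conc))
b <- CO2$uptake*7 - sqrt(CO2$conc)
same <- identical(a, b)   # TRUE: with() only saves typing CO2$
CO2$myname <- a           # store it as a new variable of the dataset
```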
23. Replace
Used to change a vector's values based on a condition; there is no R commander menu for it
newvector <- replace(vector, condition, value)
to set a fixed value: grade2 <- replace(grade, grade < 18, 18)
to set values taken from a vector: grade2 <- replace(grade, attended==1, grade[attended==1]+2)
Warning: if you use a vector as the value, you must repeat the condition inside it, so that there is one value for each selected element!
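Both uses of replace() on hypothetical grades, showing why the condition is repeated inside the value vector:

```r
# Hypothetical grades and attendance flags for five students
grade <- c(15, 22, 28, 17, 30)
attended <- c(1, 0, 1, 1, 0)
# Fixed value: every grade below 18 becomes 18
grade2 <- replace(grade, grade < 18, 18)
# Values from a vector: the condition is repeated inside the value,
# so there is exactly one new value per selected element
grade3 <- replace(grade, attended == 1, grade[attended == 1] + 2)
```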
24. Binning
Used to group scale vectors into ordered categories (but it produces a factor)
R commander Data > Manage variables in active data set > Bin numeric variable…
newfactor <- bin.var(vector, bins=number of bins, method=binning method, labels=see below)
method='intervals' means intervals of the same length
method='proportions' means intervals with the same count of cases
method='natural' means using K-means clustering
labels=FALSE means using consecutive numbers
labels=NULL means using ranges such as (27.2,35.8]
labels=vector uses the vector's elements as labels
varbinned <- bin.var(myvar, bins=6, method='proportions', labels=c('XS','S','M','L','XL','XXL'))
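A base R sketch of the first two binning methods using cut() rather than bin.var(). It only approximates bin.var, and the data are made up. Equal-count bins come from putting the breaks at the sample quantiles, and ordered_result=TRUE returns an ordered factor directly.

```r
# Hypothetical scale variable with 12 cases
myvar <- c(2, 5, 7, 11, 13, 18, 21, 25, 30, 34, 40, 47)
# like method='intervals': 3 bins of the same width
by_width <- cut(myvar, breaks = 3, labels = c("S", "M", "L"), ordered_result = TRUE)
# like method='proportions': breaks at the tertiles, so each bin gets the same count
tertiles <- quantile(myvar, probs = seq(0, 1, length.out = 4))
by_count <- cut(myvar, breaks = tertiles, include.lowest = TRUE,
                labels = c("S", "M", "L"), ordered_result = TRUE)
table(by_width)   # S 5, M 4, L 3
table(by_count)   # S 4, M 4, L 4
```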
25. Graphs for one nominal variable
R commander Graphs > Color palette…
Bar graph: R commander Graphs > Bar graph…
barplot(table(factor), xlab="x label", ylab="y label")
Pie chart: R commander Graphs > Pie chart…
pie(table(factor), labels=levels(factor), main="title", col=rainbow_hcl(length(levels(factor))))
rainbow_hcl() comes from the colorspace package; use the option col=c(vector of colors) to change the colors
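A sketch of the two graphs on a made-up factor. It draws into a temporary PDF file instead of a window, and it uses base R's rainbow() in place of rainbow_hcl(), so colorspace is not needed.

```r
# Hypothetical nominal variable
colour <- factor(c("red", "blue", "red", "green", "red", "blue"))
pdf_file <- tempfile(fileext = ".pdf")   # draw into a PDF file instead of a window
pdf(pdf_file)
barplot(table(colour), xlab = "colour", ylab = "count")
# base rainbow() used here in place of colorspace's rainbow_hcl()
pie(table(colour), labels = levels(colour), main = "Colours",
    col = rainbow(length(levels(colour))))
invisible(dev.off())
```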
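26. Graphs for one scale variable
Index plot (all values, case by case): R commander Graphs > Index plot…
plot(vector, type="h" or "p", col="color")
Histogram: R commander Graphs > Histogram…
Hist(vector, breaks=number of intervals, col="color")
Boxplot: R commander Graphs > Boxplot…
Boxplot(~ vector, id.method="y" or "none", col="color")
Hist() and Boxplot() come from packages loaded with R commander (RcmdrMisc and car); the base R equivalents are hist() and boxplot()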
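The same three graphs sketched with base R functions (hist() and boxplot() in place of Hist() and Boxplot()). The variable is simulated, and the output goes to a temporary PDF file.

```r
set.seed(1)
weight <- round(rnorm(50, mean = 70, sd = 10))   # simulated scale variable
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)
plot(weight, type = "h", col = "blue")           # index plot
hist(weight, breaks = 8, col = "grey")           # base hist() instead of Hist()
boxplot(weight, col = "orange")                  # base boxplot() instead of Boxplot()
invisible(dev.off())
```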
27. Graphs for two variables
Boxplot (scale variable versus nominal variable): R commander Graphs > Boxplot… > Plot by groups…
Boxplot(vector ~ factor, id.method="y" or "none")
Scatterplot (two scale variables): R commander Graphs > Scatterplot…
scatterplot(vector1 ~ vector2, reg.line=FALSE or lm, smooth=FALSE, spread=FALSE, boxplots=FALSE, log="" or "x" or "y" or "xy", grid=TRUE)
Mathematical graph (two scale variables, the first in increasing order): R commander Graphs > Line graph…
matplot(vector1, dataset[, c("vector2")], type="l", ylab="vertical label", pch=1)
matplot(vector1, vector2, type="l", ylab="vertical label", pch=1)
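A base R sketch of the three two-variable graphs using the CO2 dataset. boxplot() and plot() with abline() stand in for car's Boxplot() and scatterplot(), and the sine curve is just an example of a mathematical graph.

```r
data(CO2, package = "datasets")
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)
boxplot(uptake ~ Type, data = CO2)                 # scale vs nominal
plot(uptake ~ conc, data = CO2)                    # scatterplot
abline(lm(uptake ~ conc, data = CO2))              # optional regression line
x <- seq(0, 2 * pi, length.out = 100)              # first variable in increasing order
matplot(x, sin(x), type = "l", ylab = "sin(x)", pch = 1)   # line graph
invisible(dev.off())
```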
28. Graphs for three variables
Scatterplot (two scale variables and one nominal): R commander Graphs > Scatterplot…
scatterplot(vector1 ~ vector2 | factor, reg.line=FALSE or lm, smooth=FALSE, spread=FALSE, boxplots=FALSE, log="" or "x" or "y" or "xy", grid=TRUE)
How to export your graphics into Word: right-click on the graph > Copy as bitmap
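A base R sketch of a grouped scatterplot: the nominal variable Type of the CO2 dataset sets the point colour. Instead of copying as a bitmap, it writes the graph to a PNG file, which can then be inserted into Word. This assumes your R build supports the png() device.

```r
data(CO2, package = "datasets")
png_file <- tempfile(fileext = ".png")   # a PNG file can be inserted into Word
png(png_file, width = 800, height = 600)
plot(uptake ~ conc, data = CO2, col = as.integer(Type), pch = 19)
legend("bottomright", legend = levels(CO2$Type), col = 1:2, pch = 19)
invisible(dev.off())
```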