Introduction to R A workshop hosted by STAT CLUB November 19, 2014
Outline
• About R
• Getting started
• The very basics
• Importing data
• Common commands
An Introduction to R
I. About R
What is R?
• Free, open-source software with its own language
• Works on all major operating systems
• Extensive: LOTS of built-in functions and downloadable packages available
• Flexible: can define your own functions, modify existing commands, customize graphics, etc.
• Powerful: can do all sorts of analyses and can handle large data sets
• Integrates with other environments such as Excel, LaTeX, Hadoop, etc.
II. Getting started
Installing R
• Download from http://lib.stat.cmu.edu/R/CRAN/
• Select the correct platform
• Download the “base” package
• Run the set-up
• It’s quick, easy, and free!
Interacting with R
• The command prompt ‘>’ indicates that we can begin typing a command
• Hit the Esc key to cancel a line of code you are typing
• Basic rule: type a command and hit ‘enter’ to execute it
• Hit the “Stop” button at the top of the window to stop a command that is running
• For example: x = 1:100 creates a vector of values 1,2,…,100
R Script/Editor
• A file where you can write, edit, and save code
• Go to File > New script
• When you have typed in the code you want to run, highlight the chunk you want to run and either hit ‘ctrl+R’ or right-click and select “Run line or selection”
• You can save this script for later use by hitting ‘ctrl+S’ or going to File > Save while the Script window is active
• Will NOT save any results of running the commands; it saves the text of the script only
Workspaces
• An R workspace includes all the functions and variables (called “objects”) defined in a session
• Any result you assign to a name is stored in the workspace as an object
• Can be saved by going to File > Save Workspace
• Load a workspace by going to File > Load Workspace
• If you want to clear some part of the workspace, use rm()
• Use ls() to see which objects are currently in the workspace
Working directory
• Where everything will be saved to/loaded from by default
• It is, by default, usually in My Documents
• Can change this under File > Change dir
• If you want to save to/load from a different place, you can usually just include the full file path in the file name and R will find it
• It is easier to always work from your working directory
R packages
• Collections of R functions and datasets created by others
• Many standard packages are included; others have to be downloaded
• If you know the name of the package, can install it by going to Packages > Install package(s) or by using install.packages("packagename") in the command line
• Even if a package is installed on your computer, R will not automatically load it, so if you need to use a function from a package, first run library("packagename") in the command line
Use as a calculator
• Basic arithmetic can be done intuitively: 12+5, 3/8, 17+(6*5), 5^2, etc.
• Don’t use square brackets [ ] to group terms! They mean something else! Use parentheses ( )
R commands and language
• Mostly in the form of functions: mean(x), plot(x,y), etc.
• CaSe SeNsItIvE!
• Spaces don’t usually mean anything
• Can use periods ‘.’ and underscores ‘_’ in object names
Getting help in R
• Go to the Help menu
• If you know the exact command that you need help with, type ‘?’ before the command name in the console (e.g. ?mean); this brings up the documentation for that command
• If you do not know the exact command but have an idea of what it might look like or what words may be used in its description, type ‘??’ before a search term (e.g. ??regression)
• Google!
III. The very basics
Basics to know first
• Creating your own objects (variables, vectors, matrices, lists, functions, etc.)
• Assigning names to these objects
• Learning to access objects
• Performing simple calculations and transformations on these objects
Types of objects
The functions that you can perform on or with objects depend on their “class” or type:
• Numeric (double-precision numbers)
• Double (same as numeric)
• Integer (integer-valued; rarely used)
• Character (strings, non-numerical values)
• Matrix (matrix of numerical values)
• Logical (Boolean: true/false)
• Factor (“groups” or levels)
• List (list of other types of objects)
• Dataframe (table or other collection of data that is numerical or non-numerical)
• Functions (functions that take inputs)
To work with classes:
• To find out which class a variable belongs to, use class()
• To determine the dimensions of an object, use dim()
• Verify a class by using is.numeric(), is.character(), is.logical(), is.data.frame(), etc.
• Change a class by using as.numeric(), as.character(), as.logical(), as.data.frame(), etc.
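A minimal sketch of checking and changing classes with the functions named above. The object names (x, y, z, f, m) are made up for illustration.

```r
x <- 36
class(x)                     # "numeric"
is.numeric(x)                # TRUE

y <- as.character(x)         # convert the number to a string
class(y)                     # "character"

z <- as.numeric("3.5") + 1   # convert a string back to a number: 4.5

f <- factor(c("a", "b", "a"))
class(f)                     # "factor"

m <- matrix(1:6, nrow = 2)
dim(m)                       # 2 3 (rows, columns)
```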
Single value
• Use ‘=’ or ‘<-’ to assign a name to a value
• Use quotation marks if it is not a numerical value
• Example: x = 36
• Example: y <- "age"
Vectors
• Vector: c()
• Use ‘=’ or ‘<-’ to assign a name to a vector
• If the vector contains non-numerical values, use quotation marks
• Example: mileage = c(1200,200,6700,1000,1200)
• Example: type <- c("Compact", "Minivan", "SUV", "Roadster", "Truck")
Matrix
• Matrix: matrix(data=c(2,3,4,5), nrow=2, ncol=2)
• data = vector of values you want entered (filled in by COLUMN!)
• nrow = number of rows
• ncol = number of columns
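A short sketch of what "filled in by column" means for the matrix above. The byrow=TRUE variant is an addition for comparison, not part of the slide, and the names M and M_byrow are illustrative.

```r
# Default: values fill the first column, then the second
M <- matrix(data = c(2, 3, 4, 5), nrow = 2, ncol = 2)
M
#      [,1] [,2]
# [1,]    2    4
# [2,]    3    5

# byrow = TRUE fills across rows instead
M_byrow <- matrix(data = c(2, 3, 4, 5), nrow = 2, ncol = 2, byrow = TRUE)
M_byrow
#      [,1] [,2]
# [1,]    2    3
# [2,]    4    5
```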
Dataframe
• Like a table
• Can contain both numerical and string variables
• Use data.frame(vars)
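As a sketch, a data frame can be built from the two vectors in the Vectors example. The name car_data is made up here, and stringsAsFactors = FALSE is set explicitly because older versions of R convert strings to factors by default.

```r
type <- c("Compact", "Minivan", "SUV", "Roadster", "Truck")
mileage <- c(1200, 200, 6700, 1000, 1200)

# One column per vector; all vectors must have the same length
car_data <- data.frame(type = type, mileage = mileage,
                       stringsAsFactors = FALSE)

dim(car_data)            # 5 2 (rows, columns)
class(car_data$mileage)  # "numeric"
class(car_data$type)     # "character"
```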
Lists
• Each element in a list can be ANY object: vector, matrix, dataframe, even another list!
• Use list(vars)
Functions
• Creating functions is more complex
• Of the form: g <- function(var1,var2) {var1 + var2}
• g is the function name
• var1, var2 are the input variables
• The body of the function goes in the curly braces
• To use the function: g(input1, input2)
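The function g from this slide, written out and called. The input values are arbitrary illustrations.

```r
# The value of the last expression in the braces is what the function returns
g <- function(var1, var2) {
  var1 + var2
}

g(2, 3)                   # 5
g(c(1, 2), c(10, 20))     # 11 22 (works element-wise on vectors too)
```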
IV. Importing data
Importing data
• Can import from many formats (.txt, .csv, .xls, .xlsx, .sav, .dta, .ssd, …)
• Recommend .txt or .csv; others need packages
• If in working directory: data1 = read.table("mydata.txt", header=TRUE, sep=",")
• header=TRUE indicates that a row of column headings/titles is included in the file; set to FALSE if not
• sep="," indicates that a comma separates the values in each row, like in a .csv; can remove this argument if the values are separated by spaces or tabs (the default), or modify it if they are separated by something else
• If not in working directory, use the file path: data1 = read.table("C:/Users/xyz/Desktop/folder/mydata.txt", header=TRUE, sep=",")
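A self-contained sketch of the read.table() call above. Since mydata.txt is not available here, the example first writes a small made-up comma-separated file to a temporary location; the file contents and the name path are assumptions for illustration only.

```r
# Create a tiny comma-separated file to read back in
path <- tempfile(fileext = ".txt")
writeLines(c("type,mileage", "Compact,1200", "SUV,6700"), path)

# header = TRUE: first row holds column names; sep = ",": comma-separated values
data1 <- read.table(path, header = TRUE, sep = ",")
names(data1)   # "type" "mileage"
nrow(data1)    # 2

# read.csv() is a shortcut with header = TRUE and sep = "," as defaults
data2 <- read.csv(path)
```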
Working with data sets
• Attach datasets to the current space: attach(dataset)
• Use a variable from a dataset: dataset$varname
• Retrieve the names of the variables: names(dataset)
• Take a subset of your data according to some criterion: subset(dataset,criterion)
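A small sketch of these commands on a made-up data set. The column names age and group and the cutoff of 45 are illustrative assumptions.

```r
dataset <- data.frame(age = c(23, 51, 37, 64),
                      group = c("a", "b", "a", "b"))

names(dataset)       # "age" "group"
dataset$age          # 23 51 37 64
mean(dataset$age)    # 43.75

# Keep only the rows that satisfy the criterion
older <- subset(dataset, age > 45)
nrow(older)          # 2
```

attach() is left out of this sketch because dataset$varname keeps it explicit which data set a variable comes from.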
V. Common commands
Arithmetic/calculator
• Add: +
• Subtract: -
• Multiply: *
• Divide: /
• Raise to a power: ^
• Natural logarithm: log()
• Exponentiation: exp()
• Square root: sqrt()
Vector commands
• Create a vector of numbers: c(num1,num2)
• Combine vectors together to create one: c(vec1,vec2)
• Create a vector of numbers from a to b in increments of 1: a:b
• Create a vector of numbers from a to b in increments of d: seq(a,b,d)
• Create a vector of numbers from a to b in equal increments such that there are k total numbers: seq(a,b,length=k)
• Return the number of elements in a vector x: length(x)
• Sort entries in a vector x: sort(x, decreasing=FALSE)
• Element-wise arithmetic: 3*x, 4+x, log(x), sqrt(x), etc.
• Arithmetic of two vectors x and y will be element-wise: x*y, x+y, etc.
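A brief sketch of the vector commands above with arbitrary values. It spells out length.out, the full name of the argument that length=k abbreviates; the names s1 and s2 are illustrative.

```r
x <- c(5, 2, 9)
y <- c(x, 1:3)                    # combine: 5 2 9 1 2 3
length(y)                         # 6

s1 <- seq(0, 10, 2.5)             # 0.0 2.5 5.0 7.5 10.0
s2 <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

sort(x)                           # 2 5 9
sort(x, decreasing = TRUE)        # 9 5 2

3 * x                             # 15 6 27
x + c(1, 10, 100)                 # 6 12 109 (element-wise)
```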
Matrix commands
• Create a matrix: matrix(vals,nrow,ncol)
• Create a diagonal matrix: diag(vals)
• Multiply matrices M1 and M2: M1 %*% M2
• Note that M1*M2 will be element-wise
• Find the determinant of matrix M: det(M)
• Find the inverse of matrix M: solve(M)
• Find the transpose of matrix M: t(M)
• Combine matrices by column: cbind(M1,M2)
• Combine matrices by row: rbind(M1,M2)
• Find the dimensions of a matrix M: dim(M)
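A sketch contrasting matrix multiplication with element-wise multiplication, using two small made-up matrices.

```r
M1 <- diag(c(2, 4))                     # rows: (2, 0), (0, 4)
M2 <- matrix(c(1, 2, 3, 4), 2, 2)       # rows: (1, 3), (2, 4)

M1 %*% M2     # matrix product: rows (2, 6), (8, 16)
M1 * M2       # element-wise:   rows (2, 0), (0, 16)

det(M1)       # 8
solve(M1)     # inverse: diagonal matrix with 0.5 and 0.25
t(M2)         # transpose: rows (1, 2), (3, 4)

dim(cbind(M1, M2))   # 2 4
dim(rbind(M1, M2))   # 4 2
```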
Retrieving parts of objects
• Return the kth element of a vector x: x[k]
• Return the (i,j)th element of a matrix x: x[i,j]
• Return the kth object of a list x: x[[k]]
• Return the ith element of the kth object of a list x: x[[k]][i]
• Return the element or object called “name”: x$name
• Can retrieve more than one element at a time
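A sketch of each kind of indexing on made-up objects. The names x, M, and L and the list element names are illustrative.

```r
x <- c(10, 20, 30, 40)
x[2]            # 20
x[c(1, 3)]      # 10 30 (more than one element at a time)

M <- matrix(1:6, nrow = 2)   # rows: (1, 3, 5), (2, 4, 6)
M[2, 3]         # 6
M[1, ]          # 1 3 5 (the whole first row)

L <- list(name = "example", values = c(7, 8, 9))
L[[2]]          # 7 8 9
L[[2]][3]       # 9
L$name          # "example"
```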
Summaries and statistics
• Mean: mean(x)
• Standard deviation: sd(x)
• Median: median(x)
• Minimum: min(x)
• Maximum: max(x)
• Range (min and max): range(x)
• Sum: sum(x)
• Which index contains the minimum value: which.min(x)
• Which index contains the maximum value: which.max(x)
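The summary functions applied to one small made-up vector:

```r
x <- c(4, 8, 15, 16, 23, 42)

mean(x)        # 18
median(x)      # 15.5
min(x)         # 4
max(x)         # 42
range(x)       # 4 42
sum(x)         # 108
sd(x)          # sample standard deviation
which.min(x)   # 1 (position of the smallest value)
which.max(x)   # 6 (position of the largest value)
```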
Logical
• Operators: > greater than, >= greater than or equal to, < less than, <= less than or equal to, == equal to, != not equal to, & and, | or
• Entering an expression that uses these operators returns a Boolean (‘TRUE’ or ‘FALSE’) vector
• R often treats TRUE as 1 and FALSE as 0, so you can perform mathematical operations on Boolean values
• Return the indices of the elements of a vector that satisfy a criterion: which(x > 45)
• To get the actual values: x[x>45]
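A sketch of logical vectors in use, with made-up values and the cutoff of 45 from the slide.

```r
x <- c(12, 50, 47, 30, 88)

x > 45               # FALSE TRUE TRUE FALSE TRUE
which(x > 45)        # 2 3 5 (positions)
x[x > 45]            # 50 47 88 (values)

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches
sum(x > 45)          # 3

# Combine criteria with & (and) or | (or)
x[x > 45 & x < 60]   # 50 47
```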
Logical
• If-then statements: if (criterion) {command} else {command}
• For-loops: for (i in x) {commands}
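A sketch combining a for-loop with an if-then statement. The vector, the cutoff of 2, and the name total are arbitrary illustrations.

```r
x <- c(3, 8, 1)
total <- 0

for (i in x) {            # i takes each value of x in turn
  if (i > 2) {
    total <- total + i    # add values greater than 2
  } else {
    total <- total - i    # subtract the rest
  }
}

total   # 3 + 8 - 1 = 10
```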
Apply functions
• Apply a function to the rows or columns of a matrix:
• apply(M, 1, mean) takes the mean of each row
• apply(M, 2, sum) takes the sum of each column
• Apply a function to each element of a vector, list or data.frame: sapply(L, length)
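A sketch of the three calls above on a made-up matrix M and list L.

```r
M <- matrix(1:6, nrow = 2)   # rows: (1, 3, 5), (2, 4, 6)

apply(M, 1, mean)    # 3 4     (1 = work across each row)
apply(M, 2, sum)     # 3 7 11  (2 = work down each column)

L <- list(a = 1:3, b = letters[1:5])
sapply(L, length)    # a = 3, b = 5
```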
Plots
• Scatterplot of a vector x and vector y: plot(x,y)
• Add points to an already-existing scatterplot: points(xvals,yvals)
• Add a line to an already-existing scatterplot: lines(xvals,yvals)
• Histogram of a vector of values x: hist(x)
Tables
• Create a table of frequencies from a vector of values x: table(x)
• Create a two-way table between vectors x and y of the same length: table(x,y)
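A sketch of one-way and two-way tables from two made-up character vectors. The names tab1 and tab2 are illustrative.

```r
x <- c("a", "b", "a", "c", "a")
y <- c("u", "u", "v", "v", "u")

tab1 <- table(x)       # counts: a = 3, b = 1, c = 1
tab2 <- table(x, y)    # rows are values of x, columns are values of y

tab2["a", "u"]         # 2 (how often x is "a" while y is "u")
dim(tab2)              # 3 2
```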
Linear regression
• Linear regression of y on x: lm(y~x)
• Save the fit to get more info from it: model = lm(y~x), then summary(model)
Working with datasets: linear regression
• List all the components stored in a fitted model: names(model)
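A sketch of fitting and inspecting a simple regression. The x and y values are made up so that y is roughly 2x.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

model <- lm(y ~ x)      # regress y on x

coef(model)             # intercept about 0.05, slope about 1.99
summary(model)          # coefficient table, R-squared, etc.
names(model)            # components such as "coefficients", "residuals"

model$residuals         # access a component by name with $
```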
Hypothesis tests
• One-sample t-test for a vector of values x: t.test(x, alternative="two.sided", mu=0)
• Two-sample t-test between vectors x and y: t.test(x,y)
• Chi-square test of independence in a two-way table “tab”: chisq.test(tab)
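A sketch of the three tests on made-up data. Here tab is entered directly as a 2x2 matrix of counts, which chisq.test() also accepts; the names one_sample, two_sample, and chi_result are illustrative.

```r
x <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)
y <- c(4.2, 4.8, 4.5, 5.0, 4.4, 4.7)

one_sample <- t.test(x, alternative = "two.sided", mu = 0)
two_sample <- t.test(x, y)

one_sample$p.value      # p-value for H0: mean of x equals 0
two_sample$statistic    # t statistic for the difference in means

# Two-way table of counts: rows and columns are the two categories
tab <- matrix(c(20, 30, 30, 20), nrow = 2)
chi_result <- chisq.test(tab)
chi_result$p.value
```

Each test returns a list, so individual pieces (p.value, statistic, conf.int, etc.) can be pulled out with $.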
Final warnings!
• Floating point arithmetic is not exact!
• Missing values are not excluded by default; must use the na.rm = TRUE option
• Combining different classes will coerce all entries to the same class
• Some things, such as quotation marks, cannot be easily copied and pasted into R from other applications such as Word
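A sketch illustrating the first three warnings with small made-up values. all.equal() is one common way to compare numbers up to a small tolerance.

```r
# Floating point: exact comparison can fail
0.1 + 0.2 == 0.3                     # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE (equal up to a small tolerance)

# Missing values propagate unless removed
v <- c(1, NA, 3)
mean(v)                  # NA
mean(v, na.rm = TRUE)    # 2

# Mixing classes in one vector converts everything to one class
mixed <- c(1, "a", TRUE)
class(mixed)             # "character"
mixed                    # "1" "a" "TRUE"
```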