Welcome to the R intro Workshop Before we begin, please download the “SwissNotes.csv” and “cardiac.txt” files from the ISCC website, under the R workshop (more info). www.iub.edu/~iscc
Introduction to R. Workshop in Methods from the Indiana Statistical Consulting Center. Thomas A. Jackson. February 15, 2013
Overview The R Project for Statistical Computing http://cran.r-project.org “R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and Colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.” - Description from CRAN Website
Benefits R … • is free • is interactive: we can type something in and work with it, so an analysis can be broken into small steps • is interpreted: we give it commands and it translates them into mathematical procedures or data management steps • can be run in batch mode: useful because the script documents what was done • is a calculator, though unlike other calculators it lets you create variables and objects
Let’s Get R Started • How to open R → Start Menu → Programs → Departmentally Supported → Stat/Math → R
Graphical User Interface (GUI) Three Environments • Command Window (aka Console) • Script Window • Plot Window
Command Window Basics To quit: type q() Save workspace image? Answering yes writes the objects in memory to the hard drive. Storing a value in memory • Use <- , -> , or = • a <- 5 stores the number 5 in the object “a” • pi -> b stores the number π = 3.141593 in “b” • x = 1 + 2 stores the result of the calculation (3) in “x” • “=” only assigns to the name on its left Try not to overwrite built-in names such as t, c, and pi!
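A minimal sketch of the three assignment forms on this slide. The names a, b, and x come from the slide; my_total is a hypothetical name chosen so that no built-in object is masked.

```r
# Three ways to store a value in an object
a <- 5        # leftward arrow: store 5 in "a"
pi -> b       # rightward arrow: store the built-in constant pi in "b"
x = 1 + 2     # "=" assigns to the name on its left

print(a)
print(b)
print(x)

# Built-in names such as t, c, and pi can be masked by assignment,
# so a distinct (hypothetical) name is used for new results.
my_total <- a + x
print(my_total)
```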
Command Window Basics Printing to output • Calculations that are not stored print to output > 3 + 5 [1] 8 • Type name to view stored object > a [1] 5 • Use print() > print(a) [1] 5 View objects in workspace • objects() or ls()
Command Window Basics Clearing the console (command window) • Mac: Edit → Clear Console • Windows: Edit → Clear Console or • Mac: Alt + Command + L • Windows: Ctrl + L Removing variables from memory • rm() or remove() > x <- 4 > rm(x) • rm(list = ls()) remove all variables
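The workspace commands above can be tried on a few throwaway objects. This is only a sketch: the object names x, y, and z are hypothetical, and rm(list = ls()) is left as a comment because it would wipe everything in the current session.

```r
x <- 4
y <- "some text"
z <- TRUE

ls()                      # names of the objects currently in the workspace
rm(x)                     # remove a single object
"x" %in% ls()             # FALSE: x is gone

rm(list = c("y", "z"))    # remove several objects named in a character vector
"y" %in% ls()             # FALSE

# rm(list = ls())         # would remove every object in the workspace
```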
Script Window Basics Opening a new script to save syntax (code) • Mac: File → New • Windows: File → New Script Documenting code: # comments out everything after it on the same line Running code from the Script Window • Mac: Apple + Enter • Windows: F5 or Ctrl + R
Working Directory Obtaining working directory • getwd() • Mac: Misc → Get Working Directory • Windows: File → Change dir... Changing working directory • setwd() • Mac: Misc → Change Working Directory • Windows: File → Change dir...
Path Names Specify with forward slashes or double backslashes Enclose in single or double quotation marks Examples • setwd("C:/Program Files/R/R-2.6.1") • setwd('C:\\Program Files\\R\\R-2.6.1')
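A short sketch of getwd() and setwd() together with the two path spellings. It switches to the session's temporary directory so it runs on any machine; the Windows path is only an illustration and does not need to exist.

```r
old_dir <- getwd()        # remember the current working directory
setwd(tempdir())          # switch to the session's temporary directory
getwd()                   # now reports the temporary directory
setwd(old_dir)            # switch back

# Forward slashes and doubled backslashes spell the same path.
p1 <- "C:/Program Files/R"
p2 <- "C:\\Program Files\\R"      # each "\\" is stored as one backslash
same <- identical(gsub("\\", "/", p2, fixed = TRUE), p1)
print(same)
```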
R Help Helpful commands • If you know the function name: help() or ? > help(log) > ?exp • If you do not know the function name: help.search() or ?? > help.search("anova") > ??regression
Documentation Elements of a documentation file • Function{Package} • Description • Usage: What your code should look like, “=“ gives default • Arguments: Inputs to the function • Details • Value: What the function will return • See Also: Related functions • Examples
Online Resources • CRAN Website: http://cran.r-project.org/ • R Seek: http://www.rseek.org/ • Quick-R tutorial: http://www.statmethods.net/ • R Tutor: http://www.r-tutor.com/ • UCLA: http://www.ats.ucla.edu/stat/r/ • R listservs • Google Google tip: include “[R]” (instead of just “R”) with search topic to help filter out non-R websites
Additional Packages Over 2,500 listed on the CRAN website! • Use with caution • Initial download of R: base, graphics, stats, utils 1) Installing a package: • Mac: Packages & Data → Package Installer Use Package Search to locate and press ‘Install Selected’ • Windows: Packages → Install Packages Locate desired package and press ‘OK’ • install.packages("MASS") 2) Using an installed package: You MUST call it into active memory with library() > library(MASS)
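The install-then-load sequence can be sketched as follows. The install line is commented out because it needs internet access, and the requireNamespace() guard is an addition here (not on the slide) so the snippet does not fail if MASS is missing.

```r
# install.packages("MASS")   # one-time download; uncomment if MASS is missing

if (requireNamespace("MASS", quietly = TRUE)) {
  library(MASS)                          # attach the package for this session
  print("package:MASS" %in% search())    # checks that MASS is on the search path
}
```

Installing is done once per machine; library() has to be repeated in every new R session.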
Data Structures R has several basic types (or “classes”) of data: • Numeric - Numbers • Character – Strings (letters, words, etc.) • Logical – TRUE or FALSE • Vector • Matrix • Array • Data Frame • List NOTE: There are other classes, but these are most common. Understanding differences will save you some headache.
Data Structures • Find class of data • Unknown class: class() • Check particular class: is.classname() > a <- 5 > class(a) [1] "numeric" > is.character(a) [1] FALSE • Change class: as.classname() > as.character(a) [1] "5"
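A small sketch of checking and converting classes, extending the slide's example with a round trip back to numeric. The object names b and back are hypothetical.

```r
a <- 5
class(a)                 # "numeric"
is.numeric(a)            # TRUE
is.character(a)          # FALSE

b <- as.character(a)     # convert the number to a string
class(b)                 # "character"

back <- as.numeric(b)    # convert the string back to a number
back + 1                 # arithmetic works again

class(TRUE)              # "logical"
class(list(1, "a"))      # "list"
```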
Vectors Combine items into vector: c() > c(1,2,3,4,5,6) [1] 1 2 3 4 5 6 Repeat a number or a sequence of numbers: rep() > rep(1,5) [1] 1 1 1 1 1 > rep(c(2,5,7), times = 3) [1] 2 5 7 2 5 7 2 5 7
Vectors Sequence generation: seq() > seq(1,5) [1] 1 2 3 4 5 > seq(1,5, by = .5) [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 Try 1:10 or 10:1
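A few more variations on c(), rep(), and seq(), as a sketch. The each and length.out arguments are standard arguments of rep() and seq(); the vector v is a hypothetical example.

```r
v <- c(1, 2, 3)

r_times <- rep(v, times = 2)       # whole vector repeated: 1 2 3 1 2 3
r_each  <- rep(v, each = 2)        # each element repeated: 1 1 2 2 3 3

s_by  <- seq(0, 1, by = 0.25)      # fixed step size
s_len <- seq(0, 1, length.out = 3) # fixed number of points

down <- 10:1                       # the colon operator also counts down

print(r_times); print(r_each); print(s_by); print(s_len); print(down)
```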
Matrices Create matrix: matrix() • 6 x 1 matrix: matrix(1:6, ncol = 1) • 2 x 3 matrix: matrix(1:6, nrow =2, ncol =3) • 2 x 3 matrix filling across rows first: matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE) Create matrix of more than two dimensions (array): array()
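To see the effect of byrow, the sketch below builds the same 2 x 3 matrix both ways and then a three-dimensional array. The object names m_col, m_row, and arr are hypothetical.

```r
m_col <- matrix(1:6, nrow = 2, ncol = 3)                 # default: fill down columns
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)   # fill across rows

m_col[1, ]     # first row when filled by column: 1 3 5
m_row[1, ]     # first row when filled by row:    1 2 3
dim(m_col)     # 2 3

arr <- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 layers
dim(arr)       # 2 3 4
arr[2, 3, 4]   # last element: 24
```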
Lists Create a list: list() • Holds vectors, matrices, arrays, etc. of varying lengths • Objects in the list can be named or unnamed
> list(matrix(0, 2, 2), y = rep(c("A", "B"), each = 2))
[[1]]
     [,1] [,2]
[1,]    0    0
[2,]    0    0

$y
[1] "A" "A" "B" "B"

Data Frame: specialized list that holds variables of same length
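Once a list exists, its elements can be pulled out by position or by name. A minimal sketch using the list from this slide, stored under the hypothetical name lst:

```r
lst <- list(matrix(0, 2, 2), y = rep(c("A", "B"), each = 2))

lst[[1]]          # unnamed element, accessed by position
lst$y             # named element, accessed with $
lst[["y"]][3]     # third entry of the named element: "B"

length(lst)       # number of elements in the list: 2
names(lst)        # "" for the unnamed element, "y" for the named one
```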
Data Frames Create a data frame: data.frame() • Like a matrix, holds specified number of rows and columns
> x <- 1:4
> y <- rep(c("A", "B"), each = 2)
> data.frame(x, y)
  x y
1 1 A
2 2 A
3 3 B
4 4 B
• Unnamed variables get assigned names
> data.frame(1:2, c("A", "B"))
  X1.2 c..A....B..
1    1           A
2    2           B
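Supplying names inside data.frame() avoids the auto-generated column names shown above. In this sketch the column names id and group are hypothetical.

```r
df <- data.frame(id = 1:2, group = c("A", "B"))

names(df)     # "id" "group"
df$group      # one column, accessed by name
nrow(df)      # number of rows: 2
ncol(df)      # number of columns: 2
str(df)       # compact summary of the columns and their classes
```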
Basic Operations • Arithmetic: +, -, *, / • Order of operations: () • Exponentiation: ^, exp() • Other: log(), sqrt() • Evaluate standard Normal density curve, at x = 3 > x <- 3 > 1/sqrt(2*pi)*exp(-(x^2)/2) [1] 0.004431848
Vectorization R is great at vectorizing operations • Feed a matrix or vector into an expression • Receive an object of similar dimension as output For example, evaluate at x = 0,1,2,3 > x <- c(0,1,2,3) > 1/sqrt(2*pi)*exp(-(x^2)/2) [1] 0.398942280 0.241970725 0.053990967 0.004431848
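As a check on the hand-written formula, the sketch below compares it with dnorm(), the standard Normal density function in the stats package. The name by_hand is hypothetical.

```r
x <- c(0, 1, 2, 3)

# Standard Normal density written out, applied to the whole vector at once
by_hand <- 1 / sqrt(2 * pi) * exp(-(x^2) / 2)

# Built-in density function, also vectorized
built_in <- dnorm(x)

print(by_hand)
all.equal(by_hand, built_in)   # TRUE: the two agree up to rounding
```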
Logical Operations • Compare: ==, >, <, >=, <=, != > a <- c(1,1,2,4,3,1) > a == 2 [1] FALSE FALSE TRUE FALSE FALSE FALSE • And: & (elementwise) or && (single values) • Or: | (elementwise) or || (single values) • Find location of TRUEs: which() > which(a == 1) [1] 1 2 6
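A sketch of combining comparisons on the vector from this slide. The names in_range, idx, and n_ones are hypothetical.

```r
a <- c(1, 1, 2, 4, 3, 1)

in_range <- a > 1 & a < 4     # elementwise AND: TRUE where 1 < a < 4
idx <- which(in_range)        # positions of the TRUEs: 3 5
a[in_range]                   # the matching values: 2 3

a == 1 | a == 4               # elementwise OR
n_ones <- sum(a == 1)         # TRUE counts as 1, so this counts matches: 3

# && and || compare single values, which suits if() conditions
if (length(a) > 0 && a[1] == 1) print("first element is 1")
```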
Subsetting > a <- 1:5 > b <- matrix(1:12,nrow = 3) Use Square brackets [] • Pick range of elements: a[1:3] • Pick particular elements: a[c(1,3,5)] • Do not include elements: a[-c(1,4)]
Subsetting (cont.) Use commas in more than one dimension (matrices & data frames) • Pick particular elements: b[1:2,2:4] • Give all rows and specified columns: b[,1:2] • Give all columns and specified rows: b[1:2,] • Note: b[2] treats the matrix as a vector (column by column), then gives the specified element
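The subsetting rules can be tried on the same a and b defined on the previous slide. This is a sketch; the drop = FALSE argument and the logical subset at the end are additions not shown on the slides.

```r
a <- 1:5
b <- matrix(1:12, nrow = 3)    # 3 x 4 matrix, filled column by column

a[-c(1, 4)]           # drop elements 1 and 4: 2 3 5
a[a > 2]              # logical subsetting: 3 4 5

b[1:2, 2:4]           # rows 1-2, columns 2-4
b[2, 3]               # row 2, column 3: 8
b[2]                  # single index walks down the columns: 2

b[, 1]                # one column comes back as a plain vector
b[, 1, drop = FALSE]  # drop = FALSE keeps it as a 3 x 1 matrix
```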
Reading External Data Files SwissNotes.csv Data set • Compiled by Bernard Flury • Contains measurements on 200 Swiss Bank Notes • 100 genuine and 100 counterfeit notes
Reading External Data Files (cont.) Most general function: read.table() read.table(file, header = FALSE, sep = "", …) • Creates a data frame • File name must be in quotes, single or double • File name is case sensitive • Include the file extension, and the full path if the file is not in the working directory > read.table("C:/Users/jacksota/Desktop/SwissNotes.csv", TRUE, ",") Don’t know the file path or extension? Try: file.choose() > read.table(file.choose(), header = TRUE, sep = ",") • sep defines the separator, e.g. "," or "\t" or "" • header indicates variable names should be read from first row
Reading External Data Files For comma delimited files: read.csv() For tab delimited files: read.delim() For Minitab, SPSS, SAS, STATA, etc. data: foreign package • Contains functions to read variety of file formats • Functions operate like read.table() • Contains functions for writing data into these file formats
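To try the reading functions without the workshop files, the sketch below writes a tiny comma-delimited file to a temporary location and reads it back two ways. The file contents are made-up toy values, not the Swiss notes data, and the names tmp, d1, and d2 are hypothetical.

```r
# Write a small toy CSV file (made-up values, for illustration only)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Length,Type",
             "1.5,Genuine",
             "2.5,Counterfeit"), tmp)

# General reader: header and separator must be stated
d1 <- read.table(tmp, header = TRUE, sep = ",")

# Convenience reader: header = TRUE and sep = "," are the defaults
d2 <- read.csv(tmp)

print(d1)
all.equal(d1, d2)   # TRUE: both calls produce the same data frame

unlink(tmp)         # delete the temporary file
```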
Data Frame Hints • Identify variable names in data frame: names() > data1 <- read.table("SwissNotes.csv", sep = ",", header = TRUE) > names(data1) [1] "Length" "LeftHeight" "RightHeight" "LowerInner.Frame" [5] "UpperInner.Frame" "Diagonal" "Type" Assign name to data frame variables > names(data1) <- c("Length", "LeftHeight", "RightHeight", "LowerInner.Frame", "UpperInner.Frame", "Diagonal", "Type") Note: names are strings and MUST be contained in quotes
Data Frame Hints (cont.) Create objects out of each data frame variable: attach() In the Swiss Note data, to refer to Type as its own object > attach(data1) > Type [1] Genuine Genuine Genuine ….
Data Frame Hints (cont.) Remove attached objects from workspace: detach() > detach(data1) > Type Error: object 'Type' not found Note: Type is still part of original data frame, but is no longer a separate object.
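The attach() and detach() cycle can be sketched on a toy data frame. The data frame toy and its values are made up for illustration; the $ access at the end is the usual way to reach a column without attaching anything.

```r
# Toy data frame (made-up values, for illustration only)
toy <- data.frame(Type = c("Genuine", "Counterfeit"),
                  Length = c(1.5, 2.5))

attach(toy)                        # columns become visible by name
while_attached <- "toy" %in% search()   # TRUE: toy is on the search path
print(Type)                        # works while toy is attached

detach(toy)                        # remove toy from the search path
after_detach <- "toy" %in% search()     # FALSE

toy$Type                           # the column is still in the data frame
```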
plot() function plot() is the primary plotting function Calling plot will open a new plotting window Documentation: ?plot For complete list of graphical parameters to manipulate: ?par
plot() function Let’s visualize the SwissNotes.csv data. After loading the data into R, attach the data frame using attach(data1). Let’s try a scatter plot of LeftHeight by RightHeight. > plot(LeftHeight, RightHeight)
plot() function Change symbols: Option pch = . See ?par and ?points for details. > plot(LeftHeight, RightHeight, pch = 2)
plot() Function Change symbol color: Option col = Specify by number or by name: col = 2 or col = "red" Hint: Type palette() to see colors associated with number Type colors() to see all possible colors > plot(LeftHeight, RightHeight, col = "red")
plot() Function Change plot type: Option type = • "p" for points • "l" for lines • "b" for both • "c" for the lines part alone of "b" • "o" for both overplotted • "h" for histogram-like (or high-density) vertical lines • "s" for stair steps • "S" for other steps (see Details in ?plot) • "n" for no plotting
plot() Function Points with lines … works better on a sorted list of points > plot(LeftHeight, RightHeight, type = "o")
Scatterplots for Multiple Groups Use plot() with points() to plot different groups in same plot Genuine notes vs. Counterfeit notes > plot(LeftHeight[Type == "Genuine"], RightHeight[Type == "Genuine"], col = "red") > points(LeftHeight[Type == "Counterfeit"], RightHeight[Type == "Counterfeit"], col = "blue")
Axis Labels and Plot Titles The plot() command call has options to • Specify x-axis label: xlab = "X Label" • Specify y-axis label: ylab = "Y Label" • Specify plot title: main = "Main Title" • Specify subtitle: sub = "Subtitle"
Axis Labels and Plot Titles > plot(LeftHeight[Type == "Genuine"], RightHeight[Type == "Genuine"], col = "red", main = "Plot of Bank Note Heights", sub = "Measurements are in mm", xlab = "Height of Left Side", ylab = "Height of Right Side") > points(LeftHeight[Type == "Counterfeit"], RightHeight[Type == "Counterfeit"], col = "blue")
Legends • legend("topleft", c("Genuine Notes", "Counterfeit Notes"), pch = c(21,21), col = c("red", "blue"))
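Putting the plotting pieces together, here is a sketch of a two-group scatter plot with labels and a legend. The data frame toy holds made-up values standing in for the Swiss notes data, and the xlim and ylim arguments are an addition not shown on the slides.

```r
# Toy stand-in for the bank note data (made-up values, for illustration only)
toy <- data.frame(
  LeftHeight  = c(1.0, 1.2, 1.1, 1.3, 1.2, 1.6, 1.8, 1.7, 1.9, 1.8),
  RightHeight = c(1.1, 1.2, 1.0, 1.4, 1.3, 1.5, 1.9, 1.6, 2.0, 1.7),
  Type        = rep(c("Genuine", "Counterfeit"), each = 5)
)
grp <- toy$Type == "Genuine"    # TRUE for the first group

# First group sets up the plot; axis limits cover both groups
plot(toy$LeftHeight[grp], toy$RightHeight[grp], col = "red",
     xlim = range(toy$LeftHeight), ylim = range(toy$RightHeight),
     main = "Plot of Bank Note Heights",
     xlab = "Height of Left Side", ylab = "Height of Right Side")

# Second group is added to the existing plot
points(toy$LeftHeight[!grp], toy$RightHeight[!grp], col = "blue")

legend("topleft", c("Genuine Notes", "Counterfeit Notes"),
       pch = c(1, 1), col = c("red", "blue"))
```

Because plot() chooses its axis limits from the first group only, points from the second group can fall outside the plotting region. Passing the range of the full data through xlim and ylim is one way to keep every point visible.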
Adding Lines To add straight lines to plot: abline() abline() refers to standard equation for a line: y = bx + a • Horizontal line: abline(h= ) • Vertical Line: abline(v= ) • Otherwise: abline(a= , b= ) or abline(coef=c(a,b))
Adding Lines > abline(coef=c(21.7104,0.8319))
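The slide does not say where the two coefficients come from. One common source, assumed here, is a least-squares fit with lm(), whose coefficients are the intercept a and slope b that abline() expects. The sketch uses made-up x and y values rather than the bank note data.

```r
# Made-up data, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

fit <- lm(y ~ x)                 # least-squares line y = a + b * x
ab  <- coef(fit)                 # ab[1] = intercept a, ab[2] = slope b
print(ab)

plot(x, y)
abline(coef = ab)                # draw the fitted line
abline(h = mean(y), lty = 2)     # dashed horizontal line at the mean of y
abline(v = mean(x), lty = 2)     # dashed vertical line at the mean of x
```

A fitted line always passes through the point (mean of x, mean of y), so the two dashed lines cross on it.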
Histograms Histograms are another popular plotting option. > hist(Length)
pairs() Function Using the Swiss Notes data > pairs(data1)