Bare-Bones R. A Brief Introductory Guide Thomas P. Hogan University of Scranton 2010 All Rights Reserved. Citation and Usage. This set of PowerPoint slides is keyed to Bare-Bones R: A Brief Introductory Guide, by Thomas P. Hogan, SAGE Publications, 2010.
Bare-Bones R A Brief Introductory Guide Thomas P. Hogan University of Scranton 2010 All Rights Reserved
Citation and Usage This set of PowerPoint slides is keyed to Bare-Bones R: A Brief Introductory Guide, by Thomas P. Hogan, SAGE Publications, 2010. All are welcome to use and/or adapt the slides without seeking further permission but with the usual professional acknowledgment of source.
Part 1: Base R • 1-1 What is R • A computer language, with orientation toward statistical applications • Relatively new • Growing rapidly in use
1-2 R’s Ups and Downs • Plusses • Completely free, just download from Internet • Many add-on packages for specialized uses • Open source • Minuses • Obscure terms, intimidating manuals, odd symbols, inelegant output (except graphics)
1-3 Getting Started: Loading R • Have Internet connection • Go to http://cran.r-project.org/ • R for Windows screen, click “base” • Find, click on download R • Click Run, OK, or Next for all screens • End up with R icon on desktop
Downloading Base R [Figs 1.1 – 1.4] • Click on Windows • Then in next screen, click on “base” • Then screens for Run, OK, or Next • And finally “Finish” • will put R icon on desktop
What You Should Have when clicking on R icon: Rgui and R Console, ending with R prompt (>) [Fig 1.5]
The R prompt (>) • > This is the “R prompt.” It says R is ready to take your command.
1-4 Using R as Calculator • Enter these after the prompt, observe output >2+3 >2^3+(5) >6/2+(8+5) >2 ^ 3 + (5)
More as Calculator • You can copy and paste, but don’t include the > • Use # at end of command for notes, e.g. > (22+ 34+ 18+ 29+ 36)/5 #Calculating the average, aka mean • R as calculator: Not very useful
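The calculator commands from these two slides can also be collected in a short script. This is only a sketch; the name average is a hypothetical choice for illustration and does not appear in the slides.

```r
# Each line is one command; everything after # is a note that R ignores.
2 + 3            # 5
2^3 + (5)        # 13: the exponent is applied before the addition
6/2 + (8 + 5)    # 16: the division is applied before the addition

# Calculating the average, aka mean, of five scores
average <- (22 + 34 + 18 + 29 + 36) / 5
print(average)   # 27.8
```

Spaces inside a command make no difference, which is why 2^3+(5) and 2 ^ 3 + (5) give the same answer.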
1-5 Creating a Data Set • > Scores = c(22, 34, 18, 29, 36) c means “concatenate” in R – in plain English “treat as data set” • Now do: >Scores R will print the data set
Important Rules • We created a variable • Variable names are case sensitive • No blanks in name (can use _ or . to join words, but not -) • Start with a letter (cap or lc) • Can use <- instead of =
Another variable • Create SCORES, using <- > SCORES<-c(122, 134, 118, 129, 124) • NB: SCORES different than Scores Check with >SCORES >Scores
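A small sketch of the naming rules, using the two data sets from the slides. The names my.scores and my_scores are hypothetical and only illustrate joining words with . or _.

```r
Scores <- c(22, 34, 18, 29, 36)        # created with <-, same effect as =
SCORES <- c(122, 134, 118, 129, 124)   # a different variable: names are case sensitive

identical(Scores, SCORES)   # FALSE: two separate data sets

# Words in a name can be joined with . or _ (a hyphen would be read as minus)
my.scores <- Scores
my_scores <- Scores
identical(my.scores, my_scores)   # TRUE: both are copies of Scores
```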
Non-numeric Data • Enclose in quotes, single or double • Separate entries with comma • Example: > names = c("Mary", "Tom", "Ed", "Dan", "Meg")
Saving Stuff • To exit: either X or quit ( ) • Brings up this screen: • Do what you want: Yes or No • Do Yes, • then re-open R, get Scores & names
Special Note on Saving • Previous slide assumes you control computer • If not, use File, Save Workspace, name file, click Save • Works much like saving a file in Microsoft • To retrieve, do File, Load Workspace, find file, click Open
1-6 Using R Functions: Simple Stuff • Commands for mean, sd, summary (NB: function names case sensitive) • mean(Scores) • sd(Scores) • summary(Scores) • Command for correlation • cor(Scores,SCORES)
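The functions on this slide can be tried on the two data sets created earlier. In this sketch the results are stored under hypothetical names (m, s, r) so they can be reused; the slides simply print them.

```r
Scores <- c(22, 34, 18, 29, 36)
SCORES <- c(122, 134, 118, 129, 124)

m <- mean(Scores)          # arithmetic mean
s <- sd(Scores)            # sample standard deviation (divides by n - 1)
r <- cor(Scores, SCORES)   # Pearson correlation between the two data sets

m                  # 27.8
s
r
summary(Scores)    # minimum, quartiles, median, mean, maximum
```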
R functions • A zillion of ‘em • R’s big strength, most common use • For examples: • Help • R functions(text) • Enter name of a function (e.g., sd) • Yields lots (!) of information
1-7 Reading in Larger Data Sets • In Excel, enter (or download) the SATGPA20 file • Save as .xls • Then save as Text (tab delimited) file • Will have .txt extension
… Larger Data Sets: The read.table command • Now read into R like this: >SATGPA20R=read.table("E:/R/SATGPA20.txt", header =T) • Need exact path, in quotes • header = T • T or TRUE, F or FALSE • Depends on opening line of file
The file.choose ( ) command • At > enter file.choose ( ) • Accesses your system’s files, much like Open in Microsoft • Find the file, click on it • R prints the exact path in R Console • Can copy and paste into read.table
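To try read.table without the SATGPA20 file at hand, the sketch below writes a small tab-delimited file to a temporary location and reads it back. The three rows of values and the name demo are made up for illustration; they are not the book's data.

```r
# Hypothetical stand-in for a tab-delimited .txt file with a header line
tmp <- tempfile(fileext = ".txt")
writeLines(c("SATV\tSATM\tGPA",
             "520\t580\t3.10",
             "610\t640\t3.60",
             "480\t510\t2.70"), tmp)

# header = TRUE because the opening line of the file holds the variable names
demo <- read.table(tmp, header = TRUE, sep = "\t")
demo        # prints the data set
str(demo)   # one line per variable, with its type

# In an interactive session, file.choose() can supply the path instead:
# demo <- read.table(file.choose(), header = TRUE, sep = "\t")
```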
Checking what you’ve got: • Enter >SATGPA20R • Then >sapply(SATGPA20R, mean) (older versions of R accepted >mean (SATGPA20R); current R returns NA with a warning) • Try >mean (GPA)
The attach Command • To access individual variables, do this: >attach(SATGPA20R) • Now try: >mean(GPA)
The data.frame Command • Let’s create these 3 variables with c > IQ = c(110, 95, 140, 89, 102) > CS = c(59, 40, 62, 40, 55) > WQ = c(2, 4, 5, 1, 3) • Then put them together with: >All_Data = data.frame(IQ, CS, WQ) • Check with: >sapply(All_Data, mean) (older versions of R accepted >mean(All_Data); current R returns NA with a warning)
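The same steps in script form, as a sketch. The last two lines show ways to reach a single variable inside the data frame without attach; both are standard R, but they are an addition here and not part of the slides.

```r
IQ <- c(110, 95, 140, 89, 102)
CS <- c(59, 40, 62, 40, 55)
WQ <- c(2, 4, 5, 1, 3)

All_Data <- data.frame(IQ, CS, WQ)   # one column per variable
All_Data

sapply(All_Data, mean)     # mean of every column: 107.2, 51.2, 3.0

mean(All_Data$IQ)          # $ picks one column by name
with(All_Data, mean(CS))   # with() evaluates mean(CS) inside the data frame
```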
1-8 Getting Help • >help(sd) • >example(sd) • On R Console: Help R functions (text) Enter function name, click OK Reminder: function names case sensitive
R’s “function” terms R language: function(arguments) Plain English: Do this (to this) or Do this (to this, with these conditions)
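A brief illustration of the two plain-English readings, using mean and its trim argument. The data set x is made up for this sketch.

```r
x <- c(2, 4, 6, 8, 100)

# "Do this (to this)": one argument, all defaults
mean(x)               # 24

# "Do this (to this, with these conditions)": a second argument changes a default.
# trim = 0.2 drops the lowest 20% and highest 20% of values before averaging.
mean(x, trim = 0.2)   # 6
```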
1-9 Dealing with Missing Data • NB: It’s a pain in R! • Key items • In data, enter NA for a missing value • In (most) commands, use na.rm=T
Examples for missing data >Data=c(2,4,6,NA,10) >mean(Data, na.rm=T) • Add rows with missing values to the SATGPA20 file, e.g.
21 1 NA NA NA 3.14
23 2 1 NA NA 2.86
Etc. and create new file SATGPA25R • Then >sapply(SATGPA25R, mean, na.rm=T) (older versions of R accepted >mean(SATGPA25R, na.rm=T)) • Note exception for cor function (use='complete')
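A sketch of how NA behaves with and without na.rm, and of the cor exception. The vectors x and y are made up for illustration; "complete.obs" is the full spelling of the value that 'complete' abbreviates.

```r
Data <- c(2, 4, 6, NA, 10)

mean(Data)                 # NA: one missing value makes the result missing
mean(Data, na.rm = TRUE)   # 5.5: the NA is removed first

# cor has no na.rm argument; it uses the argument "use" instead
x <- c(1, 2, NA, 4, 5)
y <- c(2, 4, 6, NA, 10)
cor(x, y, use = "complete.obs")   # 1: only pairs with both values present are used
```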
1-10 Using R Functions: Hypothesis tests • Be sure you have an active data set (SATGPA25R), using attach if needed • Then, to test male vs. female on SATM: >t.test(SATM~SEX) # note tilde~ • Examples of changing defaults: >t.test(SATM~SEX, var.equal=TRUE, conf.level=0.99)
Hypothesis tests: Chi-square • Using SEX and State variables in SATGPA25R • chisq.test (SEX, State)
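A runnable sketch of both tests on made-up data, since the SATGPA25R file is not reproduced in these slides. The data frame demo, its values, and the 1/2 coding of SEX are assumptions for illustration. The data argument is used here in place of attach; either approach should work.

```r
# Hypothetical data: 12 cases per group
demo <- data.frame(
  SEX   = rep(c(1, 2), each = 12),
  State = rep(c("PA", "NY", "PA", "NY"), times = c(8, 4, 5, 7)),
  SATM  = c(seq(480, 700, by = 20), seq(500, 720, by = 20))
)

# Default t-test (unequal variances assumed, 95% confidence interval)
welch <- t.test(SATM ~ SEX, data = demo)
welch$p.value

# Changing the defaults: pooled variance and a 99% confidence interval
pooled <- t.test(SATM ~ SEX, data = demo, var.equal = TRUE, conf.level = 0.99)
pooled$conf.int

# Chi-square test of independence between two categorical variables
table(demo$SEX, demo$State)               # the 2 x 2 table being tested
chi <- chisq.test(demo$SEX, demo$State)
chi$p.value
```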
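1-11 R Functions for Commonly Used Statistics
function ( ): calculates this
• mean ( ): mean
• median ( ): median
• mode ( ): storage mode of an object (e.g. "numeric"), not the statistical mode; base R has no built-in function for the statistical mode
• sd ( ): standard deviation
• range ( ): range (returns the minimum and maximum)
• IQR ( ): interquartile range
• min ( ): minimum value
• max ( ): maximum value
• cor ( ): correlation
• quantile ( ): percentile
• t.test ( ): t-test
• chisq.test ( ): chi-square
NB1: See notes in text for details
NB2: R contains many more functions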
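One caution about this list: R's mode() reports how an object is stored, not the most frequent value. The sketch below defines a small helper, stat_mode, for the statistical mode. The helper name and approach are hypothetical and are not taken from the book.

```r
# Hypothetical helper: returns the most frequent value(s) as text
stat_mode <- function(x) {
  counts <- table(x)                          # how often each value occurs
  names(counts)[which(counts == max(counts))] # keep the value(s) with the top count
}

values <- c(2, 4, 4, 5, 1, 4)
stat_mode(values)   # "4"
mode(values)        # "numeric": the storage mode, not the statistical mode

range(values)       # 1 5: minimum and maximum
IQR(values)         # interquartile range
quantile(values)    # 0%, 25%, 50%, 75%, 100% points
```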
1-12 Two Commands for Managing Your Files > ls ( ) Will list the objects (data sets, variables) currently in your workspace > rm (name) Insert object name; this will remove that object from the workspace (it does not delete files on disk) NB: R has many such commands
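A short sketch of the two commands in use. The object temp_copy is a hypothetical name created only so there is something to remove.

```r
Scores <- c(22, 34, 18, 29, 36)
temp_copy <- Scores     # a throwaway object

ls()                    # lists what is in the workspace, including both names
rm(temp_copy)           # removes the named object
exists("temp_copy")     # FALSE: it is gone
exists("Scores")        # TRUE: everything else is untouched
```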
1-13 R Graphics • R graphs: good, simple • Let’s start with hist and boxplot with the SATGPA25R file >hist(SATM) >boxplot(SATM) >boxplot(SATV, SATM) • R Graphics window opens, need to minimize to get R Console
More Graphics: plot • Create these variables >RS=c(12,14,16,18,25) >MS=c(10,8,16,12,20) • Now do this: >plot(RS, MS)
Line of Best Fit • Do these for the RS and MS variables: > lm(MS~RS) # lm means linear model > res=lm(MS~RS) # res stores the fitted model (the name is your choice) > abline(res) # read as ‘a-b’ line; adds the fitted line to the open plot
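The full sequence for the RS and MS variables, as a sketch. The name fit is a hypothetical choice; any valid variable name works. Note that abline adds to an existing plot, so plot must come first.

```r
RS <- c(12, 14, 16, 18, 25)
MS <- c(10, 8, 16, 12, 20)

fit <- lm(MS ~ RS)   # linear model: MS predicted from RS
coef(fit)            # intercept -0.74, slope 0.82

plot(RS, MS)         # scatterplot first
abline(fit)          # then draw the line of best fit on top of it
```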
Controlling Your Graphics: A Brief Look • R has many (often obscure) ways for controlling graphics; we’ll look at a few • Basically, we’ll change “defaults” Examples (try each one): • Limits (ranges) for X and Y axes >plot(RS, MS, xlim = c(5,25), ylim = c(5,25))
Controlling Graphs: More Examples • Plot characters: >plot(RS, MS, pch=3) • Line widths >plot(RS, MS, pch=3, lwd=5) • Axis labels >plot(RS, MS, xlab = "Reading Score", ylab = "Math Score") • You can put them all together in one command
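A sketch of the options from these two slides combined in a single plot command. Note the straight quotes around the axis labels; curly quotes pasted from a slide or word processor will cause an error.

```r
RS <- c(12, 14, 16, 18, 25)
MS <- c(10, 8, 16, 12, 20)

plot(RS, MS,
     xlim = c(5, 25), ylim = c(5, 25),   # axis limits
     pch = 3,                            # plot character (a plus sign)
     lwd = 5,                            # line width of the plotted symbols
     xlab = "Reading Score",             # axis labels
     ylab = "Math Score")
```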
Part 2: R Commander • 2-1 What is R Commander? • Point and click version of R • Uses (and prints) base R commands • Loading: Easy – it’s just a package • See next slide
Loading Rcmdr • On R Gui, top menu bar click Packages, then Install package(s). Pick a CRAN mirror site (nearby), click OK. From the list of packages, scroll to Rcmdr, highlight it, click OK • After it loads, do these: • Check with: >library ( ) • Activate with: >library (Rcmdr)
Rcmdr’s extra packages • Scary message when first activating Rcmdr: • Just click Yes – and take a break
The R Commander Window • You get the R Commander window with: • Script window • Output window (incl Submit button) • Message window
2-2 R Commander Windows and Menus • File • Edit • Data ** • Statistics ** Most important for us • Graphs ** • Models • Distributions • Tools • Help
Our Lesser Used Menus • File [Table 2.1] • Much like in Microsoft • Manage files • Edit [Table 2.2] • Much like in Microsoft • Can do with right click of mouse
Our Lesser Used Menus (cont) • Models Mostly more advanced stats • Distributions • Tools • Load packages • Options – change output defaults • Help • Searchable index • R Commander manual
2-3 The Data Menu (very important) (Submenus for creating/getting data sets) • New data set – create new data set • Load data set – only for existing .rda data • Import data – import from various file types • Data in packages – not important for us
Data Menu (cont.) (Submenus for managing data sets) • Active data set • Do stuff with current data set • Manage variables in active data set • Do stuff with variables in current data set
New data set [Fig. 2.3] • Click on it, brings up spreadsheet • Name it SampleData
New data set (cont) • Enter these data:
var1 var2 var3
2 1 5
5 4 7
3 7 8
6 8 9
9 2 9
• Then kill window with X • Note: SampleData in Active Data Set
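Now Try These • View active data set • Edit active data set • In Script window, type* • sapply(SampleData, mean) • sapply(SampleData, sd) • mean(var1) [gives error message] • attach(SampleData) • mean(var1) * When typing do not include >, do hit Submit • Note: older versions of R accepted mean(SampleData) and sd(SampleData) directly; current R does not. attach must be typed in lower case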
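For readers without R Commander open, the sketch below builds the same SampleData in base R and runs the commands. It assumes the values on the data-entry slide are read row by row (three values per row). The last line uses $ as an alternative to attach.

```r
# Same data as entered in the R Commander spreadsheet, typed as columns
SampleData <- data.frame(
  var1 = c(2, 5, 3, 6, 9),
  var2 = c(1, 4, 7, 8, 2),
  var3 = c(5, 7, 8, 9, 9)
)

sapply(SampleData, mean)   # 5.0, 4.4, 7.6
sapply(SampleData, sd)     # one standard deviation per variable

mean(SampleData$var1)      # 5: reaches var1 without attach
```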