Introduction to R and RStudio

Jeff Witmer, 9 March 2016. R is a software package for statistical computing and graphics, a collection of 6,700 packages (as of June 2015, so more now), a (not ideal) programming language, and a work environment. It is widely used, powerful, and free.

Presentation Transcript


  1. Introduction to R and RStudio. Jeff Witmer, 9 March 2016.

  2. R is
     - A software package for statistical computing and graphics
     - A collection of 6,700 packages (as of June 2015, so more now)
     - A (not ideal) programming language
     - A work environment
     - Widely used
     - Powerful
     - Free

  3. Some history
     - S was developed at Bell Labs, starting in the 1970s.
     - R was created in the 1990s by Ross Ihaka and Robert Gentleman.
     - R was based on S, with code written in C.
     - S was used largely to make good graphs, which was not an easy thing in 1975. R, like S, is quite good for graphing.
     - For lots of examples, see http://rgraphgallery.blogspot.com/ or http://www.r-graph-gallery.com/
     - See ggplot2-cheatsheet-2.0.pdf (or, for more detail, see http://docs.ggplot2.org/current/).

  4. A few simple graphs using the ggplot2 package

  5. An example of graphing using the GGally package in R

  6. Who uses R?

  7. RStudio is
     - An Integrated Development Environment (IDE) for R
     - A gift, from J.J. Allaire (Macalester College, ’91) to the world
     - An easy (easier) way to use R
     - Available as a desktop product or, as used at OC, run off of a file server
     - Free, unless you want the newest version, with more bells and whistles, and you are not eligible for the educational discount (= free)
     - R supports RPubs; see http://rpubs.com/jawitmer

  8. RStudio screen shot

  9. R is object-oriented
     - E.g., fit a model, store it as an object, then plot its residuals:

           MyModel <- lm(wt ~ ht, data = mydata)
           hist(MyModel$residuals)

     - Note: lm(wt ~ ht*age + log(bp), data = mydata) regresses wt on ht, age, the ht-by-age interaction, and log(bp). There is no need to create the interaction or the log(bp) variable outside of the lm() command.
     - Comparing nested models:

           mod1 <- lm(wt ~ ht*age + log(bp), data = mydata)
           mod2 <- lm(wt ~ ht + log(bp), data = mydata)
           anova(mod2, mod1)   # gives a nested F-test
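The model-object workflow on slide 9 can be tried without the slide's data. The following is a minimal sketch that assumes R's built-in mtcars data frame as a stand-in for mydata; the variables mpg, wt, hp, and disp are substitutes chosen for illustration, not the variables from the talk.

```r
# Built-in mtcars stands in for the slide's `mydata` (illustrative assumption).
mod1 <- lm(mpg ~ wt * hp + log(disp), data = mtcars)  # wt, hp, wt:hp, log(disp)
mod2 <- lm(mpg ~ wt + log(disp), data = mtcars)       # drops hp and wt:hp

# lm() built the interaction and the log term itself; no new columns were needed.
print(names(coef(mod1)))

res <- residuals(mod1)     # same values as mod1$residuals
cmp <- anova(mod2, mod1)   # nested F-test: smaller model listed first
print(cmp)
```

The second row of the anova() table reports the F statistic for the terms that mod1 adds to mod2.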

  10. R as a programming language
      - If you want R to be (relatively) fast, take advantage of vector operations; e.g., use the replicate command (rather than a loop) or the tapply function.
      - E.g., the following calls the addingLines function (something I wrote) 25 times:

            replicate(n=25, addingLines(n=10))

      - E.g., mean testosterone by occupation:

            > with(Dabbs, tapply(testosterone, occupation, mean))
               Actor       MD Minister     Prof
                12.7     11.6      8.4     10.6
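To make the loop-versus-replicate comparison on slide 10 concrete, here is a small base R sketch. The function one_run() is a hypothetical stand-in for the author's addingLines(), which is not shown in the slides, and mtcars replaces the Dabbs data.

```r
set.seed(1)  # make the simulated values reproducible

# Hypothetical stand-in for addingLines(): returns one simulated summary.
one_run <- function(n) mean(rnorm(n))

# Loop version: fill a preallocated vector one element at a time.
loop_out <- numeric(25)
for (i in seq_len(25)) loop_out[i] <- one_run(n = 10)

# replicate() version: evaluates the expression 25 times and collects the results.
rep_out <- replicate(n = 25, one_run(n = 10))

# tapply(): mean of mpg within each level of cyl, with no explicit loop.
group_means <- with(mtcars, tapply(mpg, cyl, mean))
print(round(group_means, 1))
```

Note that replicate() names its repeat count n, so the inner call's n = 10 belongs to one_run() and does not clash with it.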

  11. If you want to know how to do something in R
      - See the “Minimal R.pdf” handout
      - Go to the Quick-R.com page (http://www.statmethods.net/)
      - Google “How do I do xxx in R?” A standing joke among R users is that the answer is always “There are many ways to do that in R.”
      - See http://swirlstats.com/
      - See https://www.datacamp.com/home

  12. Speaking of many ways to do something in R…

            (1)  mean(mydata$ht)
            (2)  with(mydata, mean(ht))
            (3)  mean(ht, data=mydata)

      However

            (1)  plot(mydata$ht,mydata$wt)     works
            (2)  with(mydata, plot(ht,wt))     works
            (3)  plot(ht, wt, data=mydata)     does not work
            (3a) plot(wt~ht, data=mydata)      works
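The plot() cases on slide 12 can be checked in plain base R. This sketch assumes a small made-up data frame named mydata and no mosaic package loaded; it also assumes that no objects called ht or wt exist in the workspace.

```r
pdf(NULL)  # null graphics device so the sketch also runs non-interactively

# Made-up data frame standing in for the slide's `mydata`.
mydata <- data.frame(ht = c(60, 64, 68, 72), wt = c(110, 130, 155, 180))

m1 <- mean(mydata$ht)         # (1) extract the column explicitly
m2 <- with(mydata, mean(ht))  # (2) evaluate the call inside the data frame

plot(wt ~ ht, data = mydata)  # (3a) the formula method accepts data =

# (3) Without a formula, plot() looks for `ht` in the workspace and fails.
res <- tryCatch(plot(ht, wt, data = mydata), error = function(e) "error")
print(res)

invisible(dev.off())
```

The difference is that only the formula method of plot() uses the data argument to look up variable names.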

  13. The mosaic package (Kaplan, Pruim, Horton) was created to make R easy to use for intro stats.
      - mosaic package syntax: goal(y ~ x|z, data=mydata)
      - E.g.: tally(~sex, data=HELPrct)
      - E.g.: t.test(age ~ sex, data=HELPrct)
      - E.g.: t.test(age ~ sex, data=HELPrct)$p.value
      - E.g.: favstats(age ~ substance|sex, data=HELPrct)
      - See MinimalR-2pages.pdf
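The formula style that mosaic generalizes already appears in several base R functions. The sketch below deliberately uses only base R and the built-in sleep data (an assumption: it is not the HELPrct data and it does not call mosaic), so table() and aggregate() serve as rough analogues of tally() and favstats().

```r
# Base R only; the built-in `sleep` data stand in for HELPrct (assumption).
counts <- table(sleep$group)                 # rough analogue of tally(~group)

# The formula interface y ~ x with data =, as in the slide's t.test() call.
p_val <- t.test(extra ~ group, data = sleep)$p.value

# One summary per group, again specified with a formula.
by_grp <- aggregate(extra ~ group, data = sleep, FUN = mean)
print(by_grp)
```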

  14. The mosaic package mPlot() command makes graphing easy: mPlot(SaratogaHouses)

  15. The openintro package edaPlot() command makes exploring data graphically easy to do: edaPlot(SaratogaHouses)

  16. The mosaic, tidyr, and dplyr packages handle SQL-type work: merging files, extracting subsets, etc.

            data(NCHS)   # loads in the NCHS data frame
            newNCHS <- NCHS %>% sample_n(size=5000) %>% filter(age > 18)
            # takes a sample of size 5000, extracts only the rows for which age > 18,
            # and saves the result in newNCHS

      See data-wrangling-cheatsheet.pdf
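For readers without dplyr installed, the same two steps can be sketched in base R. This is a swapped-in base R version, not the dplyr code from the slide, and the people data frame is made up for illustration in place of NCHS.

```r
set.seed(42)  # reproducible sampling

# Made-up data frame standing in for NCHS (assumption; only `age` is used).
people <- data.frame(id = 1:200, age = sample(1:80, 200, replace = TRUE))

# Step 1: take a random sample of rows, like sample_n(size = 50).
sampled <- people[sample(nrow(people), size = 50), ]

# Step 2: keep only the rows with age > 18, like filter(age > 18).
adults <- subset(sampled, age > 18)
print(nrow(adults))
```

Because the sample is drawn before the filter is applied, the result can have fewer rows than the sample size; filtering first and sampling second would give exactly the requested number of rows.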

  17. I use R, and the do() command in the mosaic package, for simulations.

            data(FirstYearGPA)                 # loads in the data frame
            FY <- FirstYearGPA                 # rename the data frame
            lm(GPA ~ SATM, data=FY)            # gives 0.0012 as slope
            lm(GPA ~ SATM, data=FY)$coeff[2]   # just look at the slope
            do(3)*lm(GPA ~ shuffle(SATM), data=FY)$coeff[2]   # break link b/w GPA and SATM
            null.dist <- do(1000)*lm(GPA ~ shuffle(SATM), data=FY)$coeff[2]   # 1000 random slopes
            histogram(null.dist$SATM, v=0.0012)   # look at the 1000 slopes
            with(null.dist, tally(abs(SATM.)>=0.0012))   # How many are far from zero?
            with(null.dist, tally(abs(SATM.)>=0.0012, format='prop'))   # What proportion are far from zero?
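The shuffling idea on slide 17 does not depend on mosaic. As a rough base R sketch, with replicate() and sample() swapped in for do() and shuffle() and the built-in mtcars data assumed in place of FirstYearGPA:

```r
set.seed(2016)  # reproducible shuffles

# Built-in mtcars stands in for FirstYearGPA (assumption): regress mpg on wt.
obs_slope <- coef(lm(mpg ~ wt, data = mtcars))[2]

# One shuffled slope: permuting wt breaks any link between mpg and wt.
one_slope <- function() coef(lm(mpg ~ sample(wt), data = mtcars))[2]

# 1000 slopes under the null; plays the role of do(1000) * lm(...).
null_slopes <- replicate(1000, one_slope())

# Proportion of shuffled slopes at least as far from zero as the observed one.
p_perm <- mean(abs(null_slopes) >= abs(obs_slope))
print(p_perm)
```

The final proportion is a two-sided permutation p-value for the slope.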

  18. Using Predict.Plot to show Pr(win) as SaveDiff varies, for a fixed set of values for six other predictors.

            plot(jitter(Win,amount=.05)~SaveDiff,data=LaXdata)
            Predict.Plot(modelDiff, pred.var="SaveDiff", DrawDiff=-11, ShotDiff=6, TODiff=-3,
                         ClearPctDiff=0.0952, ShotGoalDiff=1, GroundDiff=5,
                         add=TRUE, plot.args=list(col='blue'))   # OCWLaX game vs BW
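What the plot on slide 18 displays can be approximated by hand: fit a logistic model, then predict over a grid of one predictor while the others are held fixed. The sketch below is base R only and does not call Predict.Plot; mtcars is an assumed stand-in for the lacrosse data, with am playing the role of Win and wt the role of SaveDiff.

```r
# Built-in mtcars stands in for the lacrosse data (assumption).
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Vary wt over its observed range while holding hp fixed at one chosen value.
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50),
                   hp = 150)
grid$prob <- predict(fit, newdata = grid, type = "response")  # Pr(am = 1)

pdf(NULL)  # null graphics device so the sketch also runs non-interactively
plot(jitter(am, amount = 0.05) ~ wt, data = mtcars)  # jittered 0/1 outcomes
lines(prob ~ wt, data = grid, col = "blue")          # fitted probability curve
invisible(dev.off())
```

Changing the fixed value of hp shifts the curve, which is the same effect as changing the fixed predictor values passed to Predict.Plot.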
