Learn how to calculate statistics on a vector in R, using daily high temperatures for Philadelphia in August as the data. Explore functions such as mean, median, standard deviation, min, max, quantile, summary, sort, and length, and create visualizations such as histograms and boxplots.
Suppose we had a vector corresponding to the high temperatures in Philadelphia in August
• high_temps <- c(90, 87, 89, 90, 88, 86, 91, 89, 88, 85, 83, 88, 80, 83, 87, 89, 89, 91, 93, 92, 92, 92, 76, 79, 78, 75, 79, 85, 83, 88, 86)
• We will consider another time how to move data from Excel to R or vice versa, but for now just copy the above code into R
• For now we just want to focus on some of the simple statistical functions built into R that can act on a vector
Open RStudio. The Environment pane may still contain objects left over from earlier work.
Click on the List icon (upper right of the Environment pane) and switch to Grid. Grid view lets you pick and choose which objects to clear out, though this time we will clear everything out.
Check the “objects” and use the broom icon to clear them out.
Copy and paste the data into RStudio and run it
• high_temps <- c(90, 87, 89, 90, 88, 86, 91, 89, 88, 85, 83, 88, 80, 83, 87, 89, 89, 91, 93, 92, 92, 92, 76, 79, 78, 75, 79, 85, 83, 88, 86)
• The keyboard shortcut for copying is Ctrl-c (copies whatever is highlighted)
• The keyboard shortcut for pasting is Ctrl-v
• Then run: place the cursor on the line (or highlight it) and click on the Run icon (or type Ctrl-Enter)
Result of paste and run. Note that the tab name turns red, which means that the current version is different from the saved version. Click on the Save (floppy disk) icon if you want to save the script containing the data you just pasted in.
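Once high_temps exists, the simple built-in statistics functions can be called on it directly. The following is a minimal sketch using only base R; the variable names on the left (avg, mid, and so on) are chosen here for illustration and are not from the slides.

```r
# Daily high temperatures for Philadelphia in August (from the slides)
high_temps <- c(90, 87, 89, 90, 88, 86, 91, 89, 88, 85, 83, 88, 80, 83, 87,
                89, 89, 91, 93, 92, 92, 92, 76, 79, 78, 75, 79, 85, 83, 88, 86)

n      <- length(high_temps)  # number of values: 31, one per day
avg    <- mean(high_temps)    # arithmetic mean, about 86.16
mid    <- median(high_temps)  # middle value of the sorted data: 88
spread <- sd(high_temps)      # sample standard deviation
lo     <- min(high_temps)     # smallest value: 75
hi     <- max(high_temps)     # largest value: 93
sorted <- sort(high_temps)    # ascending copy; high_temps itself is unchanged

print(c(n = n, mean = avg, median = mid, sd = spread, min = lo, max = hi))
print(sorted)
```

Note that sort returns a new vector rather than reordering high_temps in place, so the original day-by-day order is still available afterwards.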
The quantile function (R-speak) can be used to determine the first quartile (Excel-speak)
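A minimal sketch of the first-quartile call, assuming the high_temps vector from the earlier slide. The name q1 is illustrative. quantile takes the data and a probability between 0 and 1, so the first quartile is the 0.25 quantile.

```r
# Daily high temperatures for Philadelphia in August (from the slides)
high_temps <- c(90, 87, 89, 90, 88, 86, 91, 89, 88, 85, 83, 88, 80, 83, 87,
                89, 89, 91, 93, 92, 92, 92, 76, 79, 78, 75, 79, 85, 83, 88, 86)

# First quartile = 0.25 quantile; the result is a named value labelled "25%"
q1 <- quantile(high_temps, 0.25)
print(q1)  # 83 for this data

# unname() drops the "25%" label if only the bare number is wanted
print(unname(q1))
```

R offers several interpolation rules through the type argument of quantile; the default (type 7) is used here, so other tools may give slightly different quartiles on other data sets if they use a different rule.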
The quantile function can be used to determine the third quartile
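The third quartile is the same call with probability 0.75. The sketch below, again assuming the high_temps vector from the earlier slide, also shows that several probabilities can be requested at once and that IQR gives the distance between the quartiles; the names q3 and both are illustrative.

```r
# Daily high temperatures for Philadelphia in August (from the slides)
high_temps <- c(90, 87, 89, 90, 88, 86, 91, 89, 88, 85, 83, 88, 80, 83, 87,
                89, 89, 91, 93, 92, 92, 92, 76, 79, 78, 75, 79, 85, 83, 88, 86)

# Third quartile = 0.75 quantile
q3 <- quantile(high_temps, 0.75)
print(q3)  # 89.5 for this data

# Both quartiles in one call by passing a vector of probabilities
both <- quantile(high_temps, c(0.25, 0.75))
print(both)

# Interquartile range: third quartile minus first quartile
print(IQR(high_temps))  # 6.5 for this data
```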
The summary function is a quick way to get a number of standard statistical measures on a set of data
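A minimal sketch of the summary call, assuming the high_temps vector from the earlier slide. The name stats is illustrative. For a numeric vector, summary reports the minimum, first quartile, median, mean, third quartile, and maximum in one step.

```r
# Daily high temperatures for Philadelphia in August (from the slides)
high_temps <- c(90, 87, 89, 90, 88, 86, 91, 89, 88, 85, 83, 88, 80, 83, 87,
                89, 89, 91, 93, 92, 92, 92, 76, 79, 78, 75, 79, 85, 83, 88, 86)

# Six-number summary: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
stats <- summary(high_temps)
print(stats)

# Individual entries can be pulled out by name
print(as.numeric(stats["Median"]))   # 88
print(as.numeric(stats["3rd Qu."]))  # 89.5
```

The quartiles reported by summary agree with the quantile calls at 0.25 and 0.75, since summary uses quantile internally for a numeric vector.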
Use boxplot to obtain a “box and whiskers” display of the data
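A minimal sketch of the boxplot call, assuming the high_temps vector from the earlier slide. The title and axis label are illustrative choices, not taken from the slides. boxplot also returns, invisibly, the numbers it drew, which is a convenient way to check the picture against summary.

```r
# Daily high temperatures for Philadelphia in August (from the slides)
high_temps <- c(90, 87, 89, 90, 88, 86, 91, 89, 88, 85, 83, 88, 80, 83, 87,
                89, 89, 91, 93, 92, 92, 92, 76, 79, 78, 75, 79, 85, 83, 88, 86)

# Draw the box-and-whiskers plot and keep the values used to draw it
bp <- boxplot(high_temps,
              main = "August high temperatures, Philadelphia",  # illustrative title
              ylab = "Temperature")                             # illustrative label

# Lower whisker, lower edge of box, median, upper edge of box, upper whisker
print(bp$stats)

# Points drawn individually as outliers (none for this data)
print(bp$out)
```

The box spans the first to third quartile, the heavy line inside it is the median, and the whiskers reach to the most extreme values that are not flagged as outliers.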
Here’s what happened when I changed the first temperature to 100 and re-ran all the code. Now the boxplot shows the 100 as an outlier, drawn as a separate point beyond the upper whisker.
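Why a single value of 100 is drawn separately can be checked numerically. By default, boxplot flags any point lying more than 1.5 times the height of the box beyond the box edges. The sketch below reproduces that rule with base R's fivenum and boxplot.stats; the names box, upper_fence, and outliers are illustrative.

```r
# Daily high temperatures for Philadelphia in August (from the slides)
high_temps <- c(90, 87, 89, 90, 88, 86, 91, 89, 88, 85, 83, 88, 80, 83, 87,
                89, 89, 91, 93, 92, 92, 92, 76, 79, 78, 75, 79, 85, 83, 88, 86)

# Change the first temperature to 100, as on the slide
high_temps[1] <- 100

# fivenum gives min, lower box edge, median, upper box edge, max
box <- fivenum(high_temps)

# Upper fence: upper box edge plus 1.5 times the box height
upper_fence <- box[4] + 1.5 * (box[4] - box[2])
print(upper_fence)  # 99.25 for this data

# boxplot.stats applies the same rule and lists the flagged points
outliers <- boxplot.stats(high_temps)$out
print(outliers)  # 100

# Redraw the plot; the 100 now appears as a separate point
boxplot(high_temps)
```

Since 100 is above the fence of 99.25 it is flagged, while the previous maximum of 93 becomes the end of the upper whisker.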