
Why R?




  1. Why R?
     • Free
     • Powerful (add-on packages)
     • Online help from the statistical community
     • Code-based (can build programs)
     • Publication-quality graphics

  2. Why not?
     • Time to learn code
     • Very simple statistics may be faster with “point-and-click” software (e.g. Statistica, JMP)

  3. Why generalized linear models (GLMs)?
     Most ecological data FAIL these two assumptions of parametric statistics:
     • Variance is independent of the mean (“homoscedasticity”)
     • Data are normally distributed

  4. Taylor's power law: most ecological data have 1 < b < 2
     Variance = a * Mean^b
     [Figure: variance plotted against mean]
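A minimal sketch of how the exponent b in Variance = a * Mean^b could be estimated: take logs of both sides and fit a straight line. The data here are simulated, not workshop data. The site means, the 30 samples per site and the use of the negative binomial to generate overdispersed counts are all assumptions made only for illustration.

```r
# Simulated counts (hypothetical): 20 sites with different mean abundance,
# each sampled 30 times. rnbinom() gives counts whose variance exceeds the mean.
set.seed(1)
site_means <- seq(2, 40, by = 2)
m <- numeric(length(site_means))   # observed mean per site
v <- numeric(length(site_means))   # observed variance per site
for (i in seq_along(site_means)) {
  counts <- rnbinom(30, mu = site_means[i], size = 2)
  m[i] <- mean(counts)
  v[i] <- var(counts)
}

# Variance = a * Mean^b  is the same as  log(v) = log(a) + b * log(m)
fit <- lm(log(v) ~ log(m))
a <- unname(exp(coef(fit)[1]))
b <- unname(coef(fit)[2])
b   # estimated exponent of the power law
```

The slope of the log-log regression is the estimate of b, and the back-transformed intercept is the estimate of a.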

  5. Many types of ecological data are expected to be non-normal
     • Count data are expected to be Poisson
       Examples: population size, species richness
     • Binary (0,1) data are expected to be binomial
       Examples: survivorship, species presence
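A small simulated illustration of why these data types break the "variance independent of mean" assumption. The sample size, lambda = 4 and prob = 0.3 are arbitrary values chosen for this sketch.

```r
set.seed(42)

# Count data (e.g. species richness per plot), simulated as Poisson
counts <- rpois(1000, lambda = 4)
mean(counts)
var(counts)        # for Poisson data the variance is close to the mean

# Binary data (e.g. survived = 1, died = 0), simulated as binomial
survived <- rbinom(1000, size = 1, prob = 0.3)
p <- mean(survived)
var(survived)
p * (1 - p)        # for 0/1 data the variance is close to p * (1 - p)
```

In both cases the variance is tied to the mean, so it cannot be constant across groups with different means.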

  6. Workshop in R & GLMs
     Session 1: Basic commands + linear models
     Session 2: Testing parametric assumptions
     Session 3: How generalized linear models work
     Session 4: Model simplification and overdispersion

  7. Exercise Ready!
     1. Open R. “>” is the command prompt.
     2. Write:
        x <- "hello"
        x
     3. What do the arrow keys do? And the “end” key?

  8. Exercise
     x <- 5
     y <- 1
     x+y; x*y; x/y; x^y
     sqrt(x); log(x); exp(x)
     “;” means a new command follows.
     Careful!
     • Capitalization matters: Y and y are different.
     • Spaces around operators do not matter: x<-5 is the same as x <- 5. Do not put a space inside the arrow, though: x < - 5 compares x with -5.
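A quick check of how R reads spaces around and inside the assignment arrow. The variable names are only for illustration.

```r
x<-5                    # assignment without spaces
x <- 5                  # the same assignment with spaces
is_less <- (x < - 5)    # a space INSIDE the arrow: "is x less than -5?"

x          # 5
is_less    # FALSE, and x has not been changed
```

Writing the arrow as the two characters `<-` with no space between them avoids this pitfall.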

  9. Vectors
     x <- c(8,2,5,9)    # x is 8 2 5 9
     “c” means combine

  10. Vectors
      x <- rep(0,4)          # 0,0,0,0
      x <- 1:4               # 1,2,3,4
      x <- seq(1,7, by=2)    # 1,3,5,7
      Create a vector called “test” containing 0,0,0,0,2,4,6,8,10 using all of the commands c, rep and seq:
      test <- c(rep(0,4), seq(2,10,by=2))

  11. Vectors
      Select an element of your vector (x = 1,3,5,7):
      x[2]          # 3
      x[c(1,3)]     # 1,5
      x[2:4]        # 3,5,7
      Change an element of your vector (x = 1,3,5,7):
      x[1] <- 9; x  # 9,3,5,7

  12. Matrices
      Dog <- c(1,4,6,8)             # vector
      Cat <- c(2,3,5,7)             # vector
      Animals <- cbind(Dog, Cat)    # matrix
      Dog  Cat
        1    2
        4    3
        6    5
        8    7
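Once the matrix exists, its cells can be selected by row and column in the same spirit as vector elements. A short sketch using the Dog and Cat vectors from the slide; `ByRow` is a name invented here to show `rbind` for contrast.

```r
Dog <- c(1, 4, 6, 8)
Cat <- c(2, 3, 5, 7)
Animals <- cbind(Dog, Cat)   # bind the vectors as columns

dim(Animals)                 # number of rows, number of columns
Animals[2, "Cat"]            # row 2 of the Cat column
Animals[, "Dog"]             # the whole Dog column
Animals[3, ]                 # the whole third row

ByRow <- rbind(Dog, Cat)     # bind the same vectors as rows instead
dim(ByRow)
```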

  13. Logical operators
      x <- 5; y <- 6
      x > y     # false
      x < y     # true
      x == y    # false
      x != y    # true
      True is the same as 1, false is the same as 0:
      2 + (x >= y)    # 2
      2 + (x <= y)    # 3

  14. Logical operators
      x <- c(1,2,3,4); y <- c(5,6,7,8)
      z <- x[y >= 7]; z    # 3,4
      Useful for quickly making subsets of your data!
      x <- c(1,0.01,3,0.02)
      In this vector, change all values <1 to 0:
      x[x<1] <- 0

  15. Conditional operators
      x <- 5; z <- 0
      if (x>4) {z <- 2}; z    # 2
      You could have a large program running inside { }
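A sketch of a slightly larger block inside the braces, with an `else` branch added. The `label` variable and the 4 threshold on a vector are illustrative assumptions. `ifelse()` is the vectorized counterpart when the condition has to be applied to every element of a vector.

```r
x <- 5
if (x > 4) {
  # several commands can run inside the braces
  z <- 2
  label <- "large"
} else {
  z <- 0
  label <- "small"
}
z
label

# if() tests a single value; ifelse() tests every element of a vector
sizes <- c(1, 7, 3, 9)
size_labels <- ifelse(sizes > 4, "large", "small")
size_labels
```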

  16. Loops
      y <- 0; x <- 0
      for (y in 1:20) {x <- x + 0.5; print(x)}
      Useful for programming randomization procedures.
      Bootstrap example:
      y <- 0; x <- 1:50
      output <- rep(0,1000)
      for (y in 1:1000) {output[y] <- var(sample(x, replace=TRUE))}
      mean(output)    # e.g. 207.3996; the value changes from run to run
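The bootstrap loop can be made reproducible and summarized a little further. This is a sketch, not part of the original exercise: the seed value, the `n_boot` name and the percentile interval are additions made here for illustration.

```r
set.seed(123)                  # fix the random numbers so a rerun gives the same result
x <- 1:50
n_boot <- 1000                 # number of bootstrap resamples
output <- numeric(n_boot)      # empty vector to hold the results

for (i in seq_len(n_boot)) {
  resample <- sample(x, replace = TRUE)   # resample x with replacement
  output[i] <- var(resample)
}

mean(output)                          # bootstrap mean of the variance
quantile(output, c(0.025, 0.975))     # simple 95% percentile interval
var(x)                                # variance of the original data, for comparison
```

Without `set.seed()` the numbers differ slightly on every run, which is expected for a randomization procedure.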

  17. Writing programs
      I encourage you to use the script editor!
      1. File > New script
      2. Write your code
      3. Select the code you want to run (CTRL-A selects all code)
      4. Run the code (CTRL-R)
      5. File > Save as
      R script files always end in *.R

  18. Entering data
      1. In Excel, give your data columns/rows and text data simple one-word labels (e.g. "treatment").
      2. Format cells so there are < 8 digits per cell.
      3. Save as a "csv" file.
      4. Use the following command to find and load your file (“diane” is a dataframe name that you invent):
         diane <- read.table(file.choose(), sep=",", header=TRUE)
      5. Check it is there!
         diane

  19. Dataframes
      • Dataframes are analogous to spreadsheets
      • Best if all columns in your dataframe have the same length
      • Missing values are coded as "NA" in R
      • If you coded your missing values with a different label in your spreadsheet (e.g. "none") then:
        read.table(….., na.strings="none")
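A self-contained sketch of the `na.strings` option. To avoid depending on a file, the csv content is passed through the `text` argument of `read.table`. The dataframe name `plants`, the column names and the values are made up for illustration.

```r
# Hypothetical csv content in which a missing value was typed as "none"
csv_text <- "treatment,height
control,12.1
herbicide,none
control,10.4"

plants <- read.table(text = csv_text, sep = ",", header = TRUE,
                     na.strings = "none")
plants                                # "none" now appears as NA
is.na(plants$height)                  # which values are missing?
mean(plants$height, na.rm = TRUE)     # na.rm = TRUE skips the missing values
```

With a real file, `text = csv_text` would be replaced by the file name or by `file.choose()`.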

  20. Dataframes
      Two ways to identify a column (called "treatment") in your dataframe (called "diane"):
      diane$treatment
      OR
      attach(diane); treatment
      At the end of the session, remember to:
      detach(diane)
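A small sketch of column access on a made-up dataframe. The values and the `height` column are hypothetical. `with()` is shown as a further base R option that is not on the slide: it looks up column names inside the dataframe for one command only, so there is nothing to detach afterwards.

```r
# Hypothetical stand-in for a dataframe loaded from a csv file
diane <- data.frame(treatment = c("control", "herbicide", "control"),
                    height = c(12.1, 9.8, 10.4))

diane$treatment                 # the column, selected with $
mean(diane$height)

# with() evaluates one expression using the columns of diane
mean_height <- with(diane, mean(height))
mean_height
```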

  21. Summary statistics
      length(x)
      mean(x)
      var(x)
      cor(x,y)
      sum(x)
      summary(x)    # minimum, maximum, mean, median, quartiles
      What is the correlation between two variables in your dataset?
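The same commands applied to two short made-up vectors, so the results can be checked by hand.

```r
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)

length(x)     # 5 values
sum(x)        # 30
mean(x)       # 6
var(x)        # 10
cor(x, y)     # 0.8
summary(x)    # minimum, quartiles, median, mean, maximum
```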

  22. Factors
      • A factor has several discrete levels (e.g. control, herbicide)
      • If a column of a dataframe contains text, R versions before 4.0.0 automatically assume it is a factor. From R 4.0.0 on, text stays as character unless you set stringsAsFactors=TRUE in read.table.
      • To manually convert a vector to a factor:
        x <- as.factor(x)
      • To check if your vector is a factor, and what the levels are:
        is.factor(x); levels(x)
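A short sketch of converting and checking factors, using made-up treatment and dose values. In current versions of R a text vector built with `c()` is of type character until it is converted.

```r
treatment <- c("control", "herbicide", "control", "herbicide")
is.factor(treatment)          # FALSE: a plain text vector is character

treatment <- as.factor(treatment)
is.factor(treatment)          # TRUE
levels(treatment)             # "control" "herbicide"
table(treatment)              # number of observations per level

# A numeric vector can also be turned into a factor
dose <- c(0, 10, 0, 20)
dose_factor <- as.factor(dose)
levels(dose_factor)           # "0" "10" "20"
```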

  23. Homework
      1. Download R on your computer. Either go to http://www.r-project.org/ and follow the CRAN download links, or go directly to http://mirror.cricyt.edu.ar/r/
      2. Instruction manuals for R are found at the main webpage: http://www.r-project.org/ (follow the links to Documentation > Manuals). I recommend "An Introduction to R".

  24. 3. Write a short program that:
      • Imports the data from Lakedata_06.csv (posted on www.zoology.ubc.ca/~srivast/zool502)
      • Makes lake area into a factor called AreaFactor:
        Area 0 to 5 ha: small
        Area 5.1 to 10 ha: medium
        Area > 10 ha: large

  25. Hints
      You will need to:
      1. Tell R how long AreaFactor will be.
      2. Assign cells in AreaFactor to each of the 3 levels.
      3. Make AreaFactor into a factor, then check that it is a factor.
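One possible way to follow the three hints, sketched on invented numbers rather than on Lakedata_06.csv. The `Area` vector and its values are hypothetical, and treating every area above 5 ha and up to 10 ha as "medium" is an assumption made here to cover values between 5 and 5.1.

```r
# Hypothetical lake areas in hectares (the real values come from the csv file)
Area <- c(0.8, 3.2, 5.1, 7.5, 10, 12.4, 25)

# 1. Tell R how long AreaFactor will be
AreaFactor <- rep("", length(Area))

# 2. Assign cells in AreaFactor to each of the 3 levels
AreaFactor[Area <= 5] <- "small"
AreaFactor[Area > 5 & Area <= 10] <- "medium"
AreaFactor[Area > 10] <- "large"

# 3. Make AreaFactor into a factor, then check that it is a factor
AreaFactor <- factor(AreaFactor, levels = c("small", "medium", "large"))
is.factor(AreaFactor)
levels(AreaFactor)
table(AreaFactor)
```

Listing the levels explicitly keeps them in size order instead of alphabetical order.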
