
R for Macroecology




Presentation Transcript


  1. R for Macroecology Tests and models

  2. Smileys

  3. Homework • Solutions to the color assignment problem?

  4. Statistical tests in R • This is why we are using R (and not C++)! • t.test() • aov() • lm() • glm() • And many more

  5. Read the documentation! • With statistical tests, it's particularly important to read and understand the documentation of each function you use • They may do some complicated things with options, and you want to make sure they do what you want • Default behaviors can change (with, e.g., sample size)
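As one illustration of why the help pages matter (this example is ours, not from the slides): the `t.test()` documentation notes that `var.equal = FALSE` is the default, so you get a Welch test unless you ask for the classic pooled-variance t-test.

```r
# t.test() defaults to var.equal = FALSE, i.e. the Welch test;
# set var.equal = TRUE explicitly for the classic Student t-test.
x <- 1:10
y <- 3:12
t.test(x, y)$method                    # "Welch Two Sample t-test"
t.test(x, y, var.equal = TRUE)$method  # pooled-variance two-sample t-test
```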

  6. Returns from statistical tests • Statistical tests are functions, so they return objects

     > x = 1:10
     > y = 3:12
     > t.test(x, y)

             Welch Two Sample t-test

     data:  x and y
     t = -1.4771, df = 18, p-value = 0.1569
     alternative hypothesis: true difference in means is not equal to 0
     95 percent confidence interval:
      -4.8446619  0.8446619
     sample estimates:
     mean of x mean of y 
           5.5       7.5

  7. Returns from statistical tests • Statistical tests are functions, so they return objects • t.test() returns a list

     > x = 1:10
     > y = 3:12
     > test = t.test(x, y)
     > str(test)
     List of 9
      $ statistic  : Named num -1.48
       ..- attr(*, "names")= chr "t"
      $ parameter  : Named num 18
       ..- attr(*, "names")= chr "df"
      $ p.value    : num 0.157
      $ conf.int   : atomic [1:2] -4.845 0.845
       ..- attr(*, "conf.level")= num 0.95
      $ estimate   : Named num [1:2] 5.5 7.5
       ..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
      $ null.value : Named num 0
       ..- attr(*, "names")= chr "difference in means"
      $ alternative: chr "two.sided"
      $ method     : chr "Welch Two Sample t-test"
      $ data.name  : chr "x and y"
      - attr(*, "class")= chr "htest"

  8. Returns from statistical tests • Getting the results out • This hopefully looks familiar after last week's debugging

     > x = 1:10
     > y = 3:12
     > test = t.test(x, y)
     > test$p.value
     [1] 0.1569322
     > test$conf.int[2]
     [1] 0.8446619
     > test[[3]]
     [1] 0.1569322

  9. Model specification • Models in R use a common syntax: • Y ~ X1 + X2 + X3 + . . . + Xn • Means Y is a linear function of X1 through Xn
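A formula written with `~` is itself an R object, which is worth seeing once (a small example of ours, not from the slides):

```r
# A formula is an object of class "formula"; functions like lm()
# take it as their first argument.
f <- y ~ x1 + x2 + x3
class(f)     # "formula"
all.vars(f)  # "y" "x1" "x2" "x3" -- the variables it mentions, in order
```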

  10. Linear models • Basic linear models are fit with lm() • Again, lm() returns a list

     > x = 1:10
     > y = 3:12
     > test = lm(y ~ x)
     > test

     Call:
     lm(formula = y ~ x)

     Coefficients:
     (Intercept)            x  
               2            1  

  11. Linear models • summary() is helpful for looking at a model

     > summary(test)

     Call:
     lm(formula = y ~ x)

     Residuals:
            Min         1Q     Median         3Q        Max 
     -1.293e-15 -1.986e-16  1.343e-16  3.689e-16  6.149e-16 

     Coefficients:
                  Estimate Std. Error   t value Pr(>|t|)    
     (Intercept) 2.000e+00  4.019e-16 4.976e+15   <2e-16 ***
     x           1.000e+00  6.477e-17 1.544e+16   <2e-16 ***
     ---
     Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     Residual standard error: 5.883e-16 on 8 degrees of freedom
     Multiple R-squared: 1,  Adjusted R-squared: 1 
     F-statistic: 2.384e+32 on 1 and 8 DF,  p-value: < 2.2e-16

  12. Extracting coefficients, P values • For the list (e.g. "test") returned by lm(), test$coefficients will give the coefficients, but not the std. error or P value. • Instead, use summary(test)$coefficients
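A short sketch of pulling those quantities out of the coefficient matrix (the noisy `y` here is made up so the standard errors are nonzero):

```r
set.seed(1)                        # made-up noisy data, not from the slides
x <- 1:10
y <- 3:12 + rnorm(10)
test <- lm(y ~ x)
cm <- summary(test)$coefficients   # matrix: Estimate, Std. Error, t value, Pr(>|t|)
cm["x", "Std. Error"]              # standard error of the slope
cm["x", "Pr(>|t|)"]                # P value of the slope
```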

  13. Model specification - interactions • Interactions are specified with a * or a : • X1 * X2 means X1 + X2 + X1:X2 • (X1 + X2 + X3)^2 means each term and all second-order interactions • - removes terms • constants are included by default, but can be removed with “-1” • more help available using ?formula
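You can check how R expands an interaction formula without fitting anything, by asking `terms()` for the term labels (an illustrative check of ours):

```r
# (x1 + x2 + x3)^2 expands to the three main effects plus
# every second-order interaction.
f <- y ~ (x1 + x2 + x3)^2
attr(terms(f), "term.labels")
# "x1" "x2" "x3" "x1:x2" "x1:x3" "x2:x3"
```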

  14. Quadratic terms • Because ^2 means something specific in the context of a model, if you want to square one of your predictors, you have to do something special:

     > x = 1:10
     > y = 3:12
     > test = lm(y ~ x + x^2)
     > test$coefficients
     (Intercept)           x 
               2           1 
     > test = lm(y ~ x + I(x^2))
     > test$coefficients
       (Intercept)             x        I(x^2) 
      2.000000e+00  1.000000e+00 -4.401401e-17 

  15. A break to try things out • t test • anova • linear models
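The slides show t tests and linear models in full but not an ANOVA, so here is a minimal one-way `aov()` sketch to try during the break, with made-up groups (our example, not from the slides):

```r
# One-way ANOVA: does the response differ among three groups?
set.seed(1)                       # made-up data for illustration
g <- gl(3, 10)                    # a factor: 3 groups of 10
v <- rnorm(30) + as.numeric(g)    # group means deliberately differ
fit <- aov(v ~ g)
summary(fit)                      # F test for a group effect
```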

  16. Plotting a test object • plot(t.test(x,y)) does nothing • plot(lm(y~x)) plots diagnostic graphs
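A small sketch for viewing the lm() diagnostic graphs all at once rather than one at a time (the data here are made up; noise keeps the residual plots from degenerating):

```r
set.seed(1)                # made-up noisy data for illustration
x <- 1:10
y <- 3:12 + rnorm(10)
fit <- lm(y ~ x)
par(mfrow = c(2, 2))       # 2 x 2 grid: one panel per diagnostic plot
plot(fit)                  # residuals vs fitted, Q-Q, scale-location, leverage
par(mfrow = c(1, 1))       # restore the single-panel layout
```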

  17. Forward and backward selection • Uses the step() function • object – starting model • scope – specifies the range of models to consider • direction – "backward", "forward" or "both"? • trace – print to the screen? • steps – set a maximum number of steps • k – penalization for adding variables (2 means AIC)

  18.
     > x1 = runif(100)
     > x2 = runif(100)
     > x3 = runif(100)
     > x4 = runif(100)
     > x5 = runif(100)
     > x6 = runif(100)
     > y = x1 + x2 + x3 + runif(100)
     > model = step(lm(y ~ 1), scope = y ~ x1+x2+x3+x4+x5+x6, direction = "both", trace = F)
     > summary(model)

     Call:
     lm(formula = y ~ x2 + x3 + x1)

     Residuals:
           Min        1Q    Median        3Q       Max 
     -0.517640 -0.264732 -0.003983  0.241431  0.525636 

     Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
     (Intercept)  0.47422    0.09381   5.055 2.06e-06 ***
     x2           0.90890    0.09887   9.192 8.08e-15 ***
     x3           1.11036    0.10555  10.520  < 2e-16 ***
     x1           1.00836    0.10865   9.281 5.23e-15 ***
     ---
     Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     Residual standard error: 0.3059 on 96 degrees of freedom
     Multiple R-squared: 0.7701,  Adjusted R-squared: 0.7629 
     F-statistic: 107.2 on 3 and 96 DF,  p-value: < 2.2e-16

  19. All subsets selection • Uses the leaps() function:

     leaps(x =, y =, wt = rep(1, NROW(x)), int = TRUE,
           method = c("Cp", "adjr2", "r2"), nbest = 10,
           names = NULL, df = NROW(x), strictly.compatible = TRUE)

  • x – a matrix of predictors • y – a vector of the response • method – how to compare models (Mallows' Cp, adjusted R², or R²) • nbest – number of models of each size to return

  20.
     > Xmat = cbind(x1,x2,x3,x4,x5,x6)
     > leaps(x = Xmat, y, method = "Cp", nbest = 2)
     $which
           1     2     3     4     5     6
     1 FALSE  TRUE FALSE FALSE FALSE FALSE
     1  TRUE FALSE FALSE FALSE FALSE FALSE
     2  TRUE FALSE  TRUE FALSE FALSE FALSE
     2 FALSE  TRUE  TRUE FALSE FALSE FALSE
     3  TRUE  TRUE  TRUE FALSE FALSE FALSE
     3 FALSE  TRUE  TRUE  TRUE FALSE FALSE
     4  TRUE  TRUE  TRUE  TRUE FALSE FALSE
     4  TRUE  TRUE  TRUE FALSE  TRUE FALSE
     5  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
     5  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
     6  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

     $label
     [1] "(Intercept)" "1" "2" "3" "4" "5" "6"

     $size
      [1] 2 2 3 3 4 4 5 5 6 6 7

     $Cp
     [1] 191.732341 197.672391  85.498456  87.117764   3.466464  81.286117   3.579295   5.078805
     [9]   5.138063   5.509598   7.000000

     > leapOut = leaps(x = Xmat, y, method = "Cp", nbest = 2)

  21.
     > Xmat = cbind(x1,x2,x3,x4,x5,x6)
     > leapOut = leaps(x = Xmat, y, method = "Cp", nbest = 2)
     > aicVals = NULL
     > for(i in 1:nrow(leapOut$which))
     + {
     +   model = as.formula(paste("y~", paste(c("x1","x2","x3","x4","x5","x6")[leapOut$which[i,]], collapse = "+")))
     +   test = lm(model)
     +   aicVals[i] = AIC(test)
     + }
     > aicVals
      [1] 159.13825 161.18166 113.95184 114.84992  52.81268 112.42959  52.81609
      [8]  54.40579  54.34347  54.74159  56.19513
     > i = 8
     > leapOut$which[i,]
         1     2     3     4     5     6 
      TRUE  TRUE  TRUE FALSE  TRUE FALSE 
     > c("x1","x2","x3","x4","x5","x6")[leapOut$which[i,]]
     [1] "x1" "x2" "x3" "x5"
     > paste("y~", paste(c("x1","x2","x3","x4","x5","x6")[leapOut$which[i,]], collapse = "+"))
     [1] "y~ x1+x2+x3+x5"

  22. Comparing AIC of the best models

     > data.frame(leapOut$which, aicVals)
           X1    X2    X3    X4    X5    X6   aicVals
     1  FALSE  TRUE FALSE FALSE FALSE FALSE 159.13825
     2   TRUE FALSE FALSE FALSE FALSE FALSE 161.18166
     3   TRUE FALSE  TRUE FALSE FALSE FALSE 113.95184
     4  FALSE  TRUE  TRUE FALSE FALSE FALSE 114.84992
     5   TRUE  TRUE  TRUE FALSE FALSE FALSE  52.81268
     6  FALSE  TRUE  TRUE  TRUE FALSE FALSE 112.42959
     7   TRUE  TRUE  TRUE  TRUE FALSE FALSE  52.81609
     8   TRUE  TRUE  TRUE FALSE  TRUE FALSE  54.40579
     9   TRUE  TRUE  TRUE  TRUE  TRUE FALSE  54.34347
     10  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  54.74159
     11  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  56.19513

  23. Practice with the mammal data • VIF, lm(), AIC(), leaps()
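The slides never show VIF in code, so here is a hand-rolled sketch of the variance inflation factor, using made-up collinear predictors as a stand-in for the mammal data. For each predictor, regress it on the others; VIF = 1 / (1 - R²). The car package's `vif()` computes the same quantity from a fitted model.

```r
set.seed(1)                            # made-up data, not the mammal data set
X <- data.frame(x1 = runif(100), x2 = runif(100))
X$x3 <- X$x1 + rnorm(100, sd = 0.1)    # x3 is nearly collinear with x1

# VIF by hand: regress each predictor on all the others
vif <- sapply(names(X), function(v) {
  r2 <- summary(lm(reformulate(setdiff(names(X), v), response = v),
                   data = X))$r.squared
  1 / (1 - r2)
})
round(vif, 2)   # x1 and x3 show inflated values; x2 stays near 1
```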
