Review of last week
Variables
• Response variable – y
• Explanatory variable – x
• [today one of each]
• Continuous variables
• Categorical variables (binary…)
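[Figure: choice of method by type of response variable and type of explanatory variable. Categorical response: logistic regression (continuous explanatory) or 2 × 2 tables (categorical explanatory). Continuous response: regression (continuous explanatory) or ANOVA (categorical explanatory). Example panels: seed size for red ants and black ants; probability of choosing Melica (rather than Luzula) against ant size.]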
One continuous response variable & one or more explanatory variables: general linear model
[Figure: the same grid of response variable against explanatory variable, with regression and ANOVA (continuous response) combined into the general linear model. Example panel: seed size for red ants and black ants.]
General linear models with:
• Many continuous explanatories are usually called multiple regression.
• Many categorical explanatories are usually called multiway ANOVA.
• One continuous explanatory and one (or sometimes many) categorical explanatories are usually called ANCOVA.
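In R, all three kinds of model can be fitted with the same function, lm(); only the types of explanatory variables in the formula differ. A minimal sketch on simulated data (the names y, x1, x2, g1 and g2 are hypothetical and not part of the course data):

```r
# Simulated data; every variable name here is made up for illustration.
set.seed(1)
n  <- 40
x1 <- rnorm(n)                                  # continuous explanatory
x2 <- rnorm(n)                                  # continuous explanatory
g1 <- factor(rep(c("a", "b"), each = n / 2))    # categorical explanatory
g2 <- factor(rep(c("p", "q"), times = n / 2))   # categorical explanatory
y  <- 1 + 0.5 * x1 + (g1 == "b") + rnorm(n)     # continuous response

multiple.regression <- lm(y ~ x1 + x2)   # many continuous explanatories
multiway.anova      <- lm(y ~ g1 * g2)   # many categorical explanatories
ancova              <- lm(y ~ x1 + g1)   # one continuous + one categorical

# Number of estimated coefficients in each model
n.coef <- c(length(coef(multiple.regression)),
            length(coef(multiway.anova)),
            length(coef(ancova)))
print(n.coef)
```

The formula g1 * g2 is shorthand for g1 + g2 + g1:g2, i.e. both main effects plus their interaction.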
Test tools
• t-test
• F-test (ANOVA)
• Chi-squared test
Anova table on pollination
Anova(lm(Seed.number~colour*poll.treat))
            Sum Sq Df F value    Pr(>F)
colour      3122.3  1 19.8440 2.846e-05 ***
poll.treat  1693.1  1 10.7604  0.001567 **
col:pol      829.8  1  5.2737  0.024406 *
Residuals  11958.0 76
Assumptions for parametric tests with a continuous response (i.e., also linear models!)
• About the same variation in all groups, or along a continuous variable, or along fitted values
• Pretty normal residuals (= noise)
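Both assumptions concern the residuals of the fitted model, so they can be checked once a model has been fitted. A rough sketch on simulated data (the habitat and seed.size values are invented for illustration, not the course data):

```r
# Simulated seed sizes in two habitats; all values are made up.
set.seed(2)
habitat   <- factor(rep(c("Forest", "Meadow"), each = 30))
seed.size <- c(rnorm(30, mean = 2.0, sd = 0.4),
               rnorm(30, mean = 1.2, sd = 0.4))

mod <- lm(seed.size ~ habitat)
res <- residuals(mod)   # distance of each observation from its group mean

# About the same variation? Compare the residual spread between groups.
sd.by.group <- tapply(res, habitat, sd)
print(sd.by.group)

# Pretty normal residuals? A numeric summary should look roughly symmetric around 0.
print(summary(res))
```

hist(res) and plot(fitted(mod), res) give the graphical versions of the same two checks.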
About the same variation?
[Figure: variation compared between Forest and Meadow.]
Pretty normal residuals
[Figure: two histograms with No. species on the y-axis. One shows the response variable, seed size in mm, for Meadow and Forest. The other shows the residuals, i.e. the distance in mm from the respective group mean.]
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    2    3    4    5    6    7
[2,]    3    4    5    6    7    8
[3,]    4    5    6    7    8    9
[4,]    5    6    7    8    9   10
[5,]    6    7    8    9   10   11
[6,]    7    8    9   10   11   12
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    3    4    5    6
[2,]    2    4    6    8   10   12
[3,]    3    6    9   12   15   18
[4,]    4    8   12   16   20   24
[5,]    5   10   15   20   25   30
[6,]    6   12   18   24   30   36
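The two tables are the sums and the products of the row and column numbers 1 to 6. A small sketch of how they can be generated, and of why a log transformation turns the multiplicative table into an additive one (the object names are hypothetical):

```r
add  <- outer(1:6, 1:6, "+")   # additive table: row + column
mult <- outer(1:6, 1:6, "*")   # multiplicative table: row * column

# Additive: moving one column to the right adds the same amount in every row.
col.step.add  <- add[, 2] - add[, 1]

# Multiplicative: the same move adds more in higher rows.
col.step.mult <- mult[, 2] - mult[, 1]

# On the log scale the multiplicative table becomes additive again,
# because log(i * j) = log(i) + log(j).
log.step <- log(mult[, 2]) - log(mult[, 1])

print(col.step.add)
print(col.step.mult)
print(log.step)
```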
Plus effect or percent effect
Plus effect: from week 0 to 5: 100 + 100 = 200; from week 5 to 10: 200 + 100 = 300
[Figure: no. of aphids (y) against weeks (x).]
Plus effect or percent effect
Plus effect: from week 0 to 5: 100 + 100 = 200; from week 5 to 10: 200 + 100 = 300
Percent effect: from week 0 to 5: 80 × 2.5 = 200; from week 5 to 10: 200 × 2.5 = 500
[Figure: no. of aphids (y) against weeks (x), with one line for the plus effect and one curve for the percent effect.]
Non transformed vs. log transformed
[Figure: no. of aphids against weeks, once with a linear y-axis and once with a logarithmic y-axis.]
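The percent effect from the aphid example (× 2.5 every five weeks, starting at 80) can be reproduced in a few lines; on the log scale it becomes a straight line. A minimal sketch, in which the object names are hypothetical:

```r
weeks  <- 0:10
aphids <- 80 * 2.5^(weeks / 5)   # multiply by 2.5 for every five weeks

# On the original scale the weekly increase keeps growing ...
weekly.increase <- diff(aphids)

# ... but on the log scale every week adds the same amount.
weekly.log.increase <- diff(log10(aphids))

# The slope of a regression on the logged values is that constant step.
fit   <- lm(log10(aphids) ~ weeks)
slope <- unname(coef(fit)[2])

print(aphids[c(1, 6, 11)])   # weeks 0, 5 and 10
print(slope)
```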
Plus effect or percent per percent
Plus effect: from 0 to 5: 100 + 100 = 200; from 5 to 10: 200 + 100 = 300
Percent per percent: from 2.5 to 5 = 200%: 100 × 200% = 200; from 5 to 10 = 200%: 200 × 200% = 400
[Figure: seed weight in μg (y) against leaf length in cm (x).]
Non transformed vs. log-log transformed
[Figure: seed weight in μg against leaf length in cm, once with linear axes and once with both axes logarithmic.]
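When both axes are logged, the regression slope tells how many percent the response changes per percent change in the explanatory variable. A small sketch using the three points from the percent-per-percent example (2.5 cm, 5 cm and 10 cm giving 100, 200 and 400 μg); the object names are hypothetical:

```r
leaf.length <- c(2.5, 5, 10)     # cm
seed.weight <- c(100, 200, 400)  # micrograms

# Regression on the log-log scale
fit   <- lm(log10(seed.weight) ~ log10(leaf.length))
slope <- unname(coef(fit)[2])

# A slope of 1 means: doubling the leaf length doubles the seed weight.
print(slope)
```

A slope of 2 would instead mean that doubling the leaf length multiplies the seed weight by 2^2 = 4.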
5 possible models
• Lichen size only depends on the total mean.
• Lichen size depends on the site where the lichen grows (city vs university).
• Lichen size depends on the tree size (≈ age?).
• Lichen size depends both on site AND tree size.
• Lichen size depends on tree size, but the relationship between tree size and lichen size differs between the sites (city / univ).
Check your data import
> names(d)
[1] "tree.circum" "lich.diam"   "tree.spec"   "site"
> is.numeric(lich.diam)
[1] TRUE
> is.numeric(tree.circum)
[1] TRUE
> levels(site)
[1] "city" "uni"
Check your data import
> names(d)
[1] "tree.circum" "lich.diam"   "tree.spec"   "site"
> is.numeric(lich.diam)
[1] TRUE
> is.numeric(tree.circum)
[1] FALSE
> levels(site)
[1] "city" "ciyt" "uni"
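The second check reveals two typical import problems: a column that should be numeric is not, and a factor has a misspelt level. One possible repair is sketched below on a small invented data frame; the cause assumed here for the non-numeric column (a decimal comma in one value) is only an assumption, since the slide does not say what went wrong.

```r
# Hypothetical data frame imitating the problems shown above; all values are made up.
d <- data.frame(
  tree.circum = c("120", "95,5", "210"),   # assumed cause: one decimal comma
  lich.diam   = c(12.1, 8.4, 20.3),
  site        = factor(c("city", "ciyt", "uni"))
)
print(is.numeric(d$tree.circum))   # FALSE
print(levels(d$site))              # "city" "ciyt" "uni"

# Repair 1: replace the decimal comma and convert the column to numeric.
d$tree.circum <- as.numeric(sub(",", ".", d$tree.circum, fixed = TRUE))

# Repair 2: rename the misspelt level; R merges it with the existing "city".
levels(d$site)[levels(d$site) == "ciyt"] <- "city"

print(is.numeric(d$tree.circum))
print(levels(d$site))
```

Correcting the source file and importing it again is often the cleaner fix, because the correction then also reaches anyone else who uses the file.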
Should you log your lichen sizes?
• Does it look so bad that your test may be incorrect?
• Does a log transformation improve the model assumptions?
  – Constant variation is most important.
• Does it make biological sense that the explanatory variables affect the percent increase in lichen size rather than the increase in mm?
Should you log your lichen sizes?
• Does it look so bad that your test may be incorrect? – Nah, probably not.
• Does a log transformation improve the model assumptions? – YES!
• Does it make biological sense with a percent increase? – Well, I guess so.
• OK, let's use the logged values!
[Figure: labels A, B, C, D, E, F, G and Mainland, with blanks to fill in for "Most:" and "Fewest:".]
5 possible models
• Log lichen size only depends on the total mean.
• Log lichen size depends on the site where the lichen grows (city vs university).
• Log lichen size depends on the tree size (≈ age?).
• Log lichen size depends both on site AND tree size.
• Log lichen size depends on tree size, but the relationship between tree size and log lichen size differs between the sites (city / univ).
Models
log.lich.diam <- log10(lich.diam)
log.mod.int <- lm(log.lich.diam ~ tree.circum + site + tree.circum:site)
log.mod.both <- lm(log.lich.diam ~ tree.circum + site)
log.mod.tree.circum <- lm(log.lich.diam ~ tree.circum)
log.mod.site <- lm(log.lich.diam ~ site)
log.mod.null <- lm(log.lich.diam ~ 1)
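To see how the five models relate to each other, they can be fitted to stand-in data and compared by their residual degrees of freedom and residual sums of squares. The sketch below uses simulated values (all numbers are invented; only the variable names follow the slides), so it illustrates the nesting of the models, not the lichen results.

```r
# Simulated stand-in data; the numbers are invented for illustration only.
set.seed(3)
n             <- 60
site          <- factor(rep(c("city", "uni"), each = n / 2))
tree.circum   <- runif(n, min = 50, max = 250)
log.lich.diam <- 1.5 - 0.002 * tree.circum + 0.2 * (site == "uni") +
                 rnorm(n, sd = 0.25)

models <- list(
  null        = lm(log.lich.diam ~ 1),
  site        = lm(log.lich.diam ~ site),
  tree.circum = lm(log.lich.diam ~ tree.circum),
  both        = lm(log.lich.diam ~ tree.circum + site),
  int         = lm(log.lich.diam ~ tree.circum + site + tree.circum:site)
)

# Each extra term costs one residual degree of freedom and can only lower the RSS.
res.df <- sapply(models, df.residual)
rss    <- sapply(models, deviance)   # for lm objects, deviance() is the RSS
print(cbind(res.df, rss))
```

Whether the drop in RSS from one model to the next is larger than expected by chance is what an F-test between two nested models assesses.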
Anova table on logged lichens
Anova(lm(log.lich.diam~tree.circum+site+tree.circum:site)) = Anova(log.mod.int)
Response: log.lich.diam
                 Sum Sq Df F value   Pr(>F)
tree.circum      0.5808  1  9.7584 0.002826 **
site             0.5431  1  9.1238 0.003797 **
tree.circum:site 0.0047  1  0.0784 0.780444
Residuals        3.3332 56
Test interaction!
anova(log.mod.int,log.mod.both)
Model 1: log.lich.diam ~ tree.circum + site + tree.circum:site
Model 2: log.lich.diam ~ tree.circum + site
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     56 3.3332
2     57 3.3378 -1   -0.0047 0.0784 0.7804
Test site!
anova(log.mod.both,log.mod.tree.circum)
Model 1: log.lich.diam ~ tree.circum + site
Model 2: log.lich.diam ~ tree.circum
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)
1     57 3.3378
2     58 3.8809 -1   -0.5431 9.2737 0.003516 **
Test tree circumference!
anova(log.mod.both,log.mod.site)
Model 1: log.lich.diam ~ tree.circum + site
Model 2: log.lich.diam ~ site
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)
1     57 3.3378
2     58 3.9187 -1   -0.5808 9.9188 0.002605 **
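The F values in these comparisons follow directly from the residual sums of squares (RSS) and residual degrees of freedom of the two models. As a check, the test of site can be recomputed by hand from the rounded numbers printed above (the object names are hypothetical; small deviations from the printed F come from rounding):

```r
# Values copied from the "Test site!" output above
rss.both <- 3.3378; df.both <- 57   # model with tree.circum + site
rss.tree <- 3.8809; df.tree <- 58   # model with tree.circum only

# Extra sum of squares explained by site, per degree of freedom used,
# divided by the residual variance of the larger model
F.site <- ((rss.tree - rss.both) / (df.tree - df.both)) / (rss.both / df.both)

# Upper tail of the F distribution gives the p-value
p.site <- pf(F.site, df1 = df.tree - df.both, df2 = df.both, lower.tail = FALSE)

print(F.site)   # about 9.27
print(p.site)   # about 0.0035
```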
Conclusion:
• Log lichen size depends both on site AND tree size.
• Lichens are larger at the University than in the city (p = 0.0035 given the effect of tree size).
• Lichen size decreases with increasing tree size (p = 0.0026 given the effect of site).