510 likes | 609 Views
Cold shitty weather. Sorry! Get warm with Glögg!. Power recap. Power recap. It is good to fake data BUT p-values of 1 fake data is crap!. Power recap. A 1000 simulation Power analyses is not crap! BUT Power depends on: Sample size Effect size Variation.
E N D
Cold shitty weather. Sorry! Get warm with Glögg!
Power recap • It is good to fake data • BUT • p-values of 1 fake data is crap!
Power recap • A 1000 simulation Power analyses is not crap! • BUT • Power depends on: • Sample size • Effect size • Variation
Project considerations I • Make graphs • Check for outliers • Check assumptions • Decide if you want to transform y and or x • Check VIF • Are your assumtions still f**k*d up. • Well, that’s for today.
Project considerations II • Interpret interactions first! • If they are significant: Are main effects still interpretable? • Distinguish between: y ~ x1 and y ~ x1 given x2 • – Simplify your models!
16 14 12 10 8 6 4 Red ants Black ants Logistic regression 2 2 tables Categoric 1.0 Melica 0.8 0.6 Prob. of choosing Melica 0.4 0.2 0.0 Response variable Luzula 4.5 5.5 6.5 7.5 Ant size Regression Anova Continuous - - Seed size Continuous Categoric Explanatory variable
16 14 12 10 8 6 4 Red ants Black ants Response variable Regression Anova Continuous - - Seed size Continuous Categoric Explanatory variable
Assumptions for parametric tests with continuous response i.e., also linear models!! About the same variation in all groups or along a continuous variable or along fitted values Pretty normal residuals (= noice)
The residuals… … are the noice that is not explained by the explanatory variable(s) In a regression the residuals are the distance from the data points to the regression line In an Anova the residual are the distance to the group mean In a linear model the residuals are the distance from the data points to the fitted values.
Solutions • Poisson for counts (generalized linear model) • Non-parametric tests • Resampling methods • Permutation • Bootstrap • Binarize your response
Poisson distribution • Response = Numbers (not true continuous) • Examples • Are there more maple seedlings close to a maple? • Response = number per square • m1<-glm(number~distance,family=Poisson)
Poisson distribution • Usually log(y) also works fine. • Poisson excells: • small means • many zeroes • Many zeroes Hurdle models
Non-parametric tests • Based on ranked values instead of actual data.
Non-parametric tests • Still often in use. • Questionable with modern computers. • In principle permutions of ranked values • But worse than ”real” permutations, because information about actual data values is discarded.
Non-parametric tests • Still often in use. • Questionable with modern computers. • In principle: permutions of ranked values • But worse (than ”real” permutations) because information about actual data values is discarded. BENEFIT: Calm dow outliers!
16 14 12 10 8 6 4 Red ants Black ants Response variable Regression Anova 2 groups: also t-test Continuous - - Seed size Continuous Categoric Explanatory variable
16 14 12 10 8 6 4 Red ants Black ants Response variable Kendall rank correlation also: Spearman rank Kruskal-Wallis also: Mann-Whitney U-test Paired: Sign test(=binomial) Continuous - - Seed size Continuous Categoric Explanatory variable
Permutations • Does not require normal distribution • BUT, does require distributions to be equal if your hypothesis is not true. • Example: • If the lichens are equally large in the city as they are at campus, they must have the same variation and e.g., skewness. >(cf. non-par!) • In principle a test of if the distributions differ.
bootstrap • to pull oneself up by one's bootstraps • to succeed only on one's own effort or abilities.
Rumex crispus Rumex longifolius 300 250 250 200 200 150 150 100 100 50 50 0 0 1.0 1.1 1.2 1.3 1.4 1.5 1.25 1.30 1.35 1.40
Confidence intervals • …shows how sure we are of a group mean. • The confidence interval will contain the ”true” mean in 95 % of the time. • The larger our sample size the more sure (= confident!) we are of our sample mean the confidence interval decreases • And (of course…), the more variation within groups, the less sure we get confidence interval increases
Bootstrap for tests 120 80 No. boot-samples 60 40 20 0 -5 0 5 10 15 20 25 boot.difference
Bootstrap • Does not require normal distribution of residuals. • Does not require the same variation. • Only requirement is that what you bootstrap (e.g., means) are the same if your hypothesis is not correct. • And, in practice, a large, representative sample
moss.shoot ~ forest type 2000 1500 1000 500 0 0 5 10 15 Bootstrapped difference in moss shoot length
Bootstrap • We use the functionsample(row.names(d),replace=T) • More advanced (and better):library(boot)?boot?boot.ci
Binarize your response • If all other efforts sucks: • Binarize your response • Nothing vs Something • Above the median vs Below the median • bin.y<-ifelse(y < median(y),0,1) • bin.y<-factor(bin.y) • Then do a logistic regression, 2×2, or a generalized linear model
Mail me and your "opponent"! • Your handout/abstract • Before 14.30. • One page. Uno. Odjin. • Mail me your Powerpoint before 17.00 or bring it on USB memory stick. • Compress images to reduce size!
Max 15 min presentation Before you make your powerpoint: Watch this film: http://www.davidairey.com/how-not-to-use-powerpoint/
Mail me your data! • excel file • Help option booking list
Computer exercise • Use yor own data (if cont resp!). • Or old data. • Use either a continuous or categorical explanatory. • Possible also for many explanatories? • Non-parametric Well, usually not • Permutation Yes, but hard • Bootstrap Yes, easy • Binarizing Yes, easy
Exam • Read Learning goals • Read book in relation to learning goals • E.g., no GAM, Survival, Bayesian • Check lecture powerpoints in relation to learning goals • Practice on understanding the excercises (they ARE in the learning goals)
Lunch? or