Lecture 13: Analysis of variance. Spiders on Mazurian lake islands: Wigry – Mikołajki, Nidzkie, Bełdany. Photo: Wigierski Park Narodowy. Photo: Ruciane.net. Salticidae. Araneus diadematus. Photo: Eurospiders.com. Spider species richness on Mazurian lake islands.
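Spider species richness on Mazurian lake islands. Does species richness differ with respect to the degree of disturbance? If we use the same test several times with the same data, we have to apply a Bonferroni correction. For a single test the probability of a false positive is $\alpha$; for $n$ independent tests the probability of at least one false positive rises to $1-(1-\alpha)^n \approx n\alpha$, so each individual test is run at $\alpha/n$.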
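A minimal sketch of the arithmetic behind the Bonferroni correction, assuming the tests are independent and every null hypothesis is true. The function names are hypothetical and chosen for illustration.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n_tests independent
    tests, each run at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level after a Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    for n in (1, 3, 10):
        corrected = bonferroni_alpha(0.05, n)
        print(f"n={n:2d}  uncorrected FWER={familywise_error(0.05, n):.3f}  "
              f"per-test alpha={corrected:.4f}  "
              f"corrected FWER={familywise_error(corrected, n):.4f}")
```

With ten tests at $\alpha = 0.05$ the chance of at least one spurious "significant" result is about 0.40; testing each at 0.005 brings it back to just under 0.05.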
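Spider species richness on Mazurian lake islands. Sir Ronald Aylmer Fisher (1890-1962). One-way analysis of variance: if there were no difference between the sites, the average within-group variance $s^2_{Within}$ should equal the variance between the sites $s^2_{Between}$. [Figure: group variances $s^2_H$, $s^2_M$, $s^2_L$, $s^2_P$ and the total variance $s^2_T$.] We test for significance using Fisher's F-test, $F = s^2_{Between} / s^2_{Within}$, with $k-1$ (between) and $n-k$ (within) degrees of freedom, where $k$ is the number of groups and $n$ the number of observations. The degrees of freedom add up: $df_{Total} = df_{Within} + df_{Between}$, i.e. $n-1 = (n-k) + (k-1)$.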
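The variance partition above can be sketched in a few lines of Python. This is an illustrative implementation written from the textbook formulas, not the software used in the lecture; `one_way_anova` is a hypothetical name and the numbers in the example are toy values, not the spider data.

```python
from statistics import mean


def one_way_anova(groups):
    """Return F, df_between and df_within for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k      # n - 1 = (k - 1) + (n - k)
    s2_between = ss_between / df_between      # variance between sites
    s2_within = ss_within / df_within         # average within-site variance
    return s2_between / s2_within, df_between, df_within


if __name__ == "__main__":
    # Toy numbers for illustration only.
    print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

For the toy groups the result is F = 27 with 2 and 6 degrees of freedom; the p-value would then be read from the F distribution with those degrees of freedom.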
The Tukey test compares the means of all pairs of groups simultaneously. It is a t-test corrected for multiple comparisons (similar to a Bonferroni correction). The Levene test compares the group variances using the F distribution. The variances should not differ too much (the data should not be heteroscedastic)! If they do, the Welch test, which does not assume equal variances, is an alternative.
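A sketch of the two variance checks, written from the standard formulas: the classical Levene statistic is a one-way ANOVA on absolute deviations from each group mean, and Welch's ANOVA reweights groups by $n_i / s_i^2$. Function names are hypothetical. For real analyses SciPy provides tested versions: `scipy.stats.levene` (pass `center='mean'` for the classical statistic; its default `center='median'` is the Brown–Forsythe variant) and `scipy.stats.tukey_hsd` for Tukey's test.

```python
from statistics import mean, variance


def _anova_f(groups):
    """Plain one-way ANOVA F ratio (used by Levene's test)."""
    k, n = len(groups), sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ss_b = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_w = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_b / (k - 1)) / (ss_w / (n - k))


def levene_w(groups):
    """Levene's statistic: ANOVA on absolute deviations from each group mean.
    Compare with an F distribution with k - 1 and n - k degrees of freedom."""
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    return _anova_f(deviations)


def welch_f(groups):
    """Welch's ANOVA, which does not assume equal group variances.
    Returns (F, df1, df2); df2 is usually not an integer."""
    k = len(groups)
    w = [len(g) / variance(g) for g in groups]          # weights n_i / s_i^2
    w_sum = sum(w)
    m_w = sum(wi * mean(g) for wi, g in zip(w, groups)) / w_sum
    a = sum(wi * (mean(g) - m_w) ** 2 for wi, g in zip(w, groups)) / (k - 1)
    tmp = sum((1 - wi / w_sum) ** 2 / (len(g) - 1) for wi, g in zip(w, groups))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * tmp
    return a / b, k - 1, (k ** 2 - 1) / (3 * tmp)
```

A Levene statistic near zero means the groups have similar spread; a large value suggests switching from the classical F-test to the Welch version.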
We include the effect of island complex (Wigry – Nidzkie, Bełdany, Mikołajki). There must be at least two data points for each combination of groups. We use a simple two-way ANOVA, whose output separates the main effects from the secondary (interaction) effects.
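A sketch of the two-way partition, assuming a balanced design (the same number of replicates in every cell); unbalanced island data would need a GLM-based ANOVA instead. It also shows why every combination needs at least two data points: the error term is the scatter of replicates around their own cell mean, which does not exist with a single value per cell. `two_way_anova` and the factor labels are hypothetical.

```python
from statistics import mean


def two_way_anova(cells):
    """Two-way ANOVA with replication for a balanced design.

    `cells` maps (level_a, level_b) -> list of observations. Every combination
    of levels must be present, each with the same number r >= 2 of replicates.
    Returns {effect: (SS, df, F)} for both main effects, the interaction and
    the error term (F is None for the error row).
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    r = len(next(iter(cells.values())))
    if len(cells) != len(a_levels) * len(b_levels):
        raise ValueError("every combination of groups needs data")
    if r < 2 or any(len(v) != r for v in cells.values()):
        raise ValueError("need a balanced design with >= 2 replicates per cell")

    n_a, n_b = len(a_levels), len(b_levels)
    grand = mean(x for v in cells.values() for x in v)
    mean_a = [mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels]
    mean_b = [mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels]
    mean_ab = {key: mean(v) for key, v in cells.items()}

    # Main effects: spread of the marginal means around the grand mean.
    ss_a = n_b * r * sum((m - grand) ** 2 for m in mean_a)
    ss_b = n_a * r * sum((m - grand) ** 2 for m in mean_b)
    # Interaction: cell-mean variation not explained by the two main effects.
    ss_cells = r * sum((m - grand) ** 2 for m in mean_ab.values())
    ss_ab = ss_cells - ss_a - ss_b
    # Error: replicates around their own cell mean.
    ss_err = sum((x - mean_ab[key]) ** 2 for key, v in cells.items() for x in v)
    df_err = n_a * n_b * (r - 1)
    ms_err = ss_err / df_err

    table = {}
    for name, ss, df in (("A", ss_a, n_a - 1), ("B", ss_b, n_b - 1),
                         ("AxB", ss_ab, (n_a - 1) * (n_b - 1))):
        table[name] = (ss, df, (ss / df) / ms_err)
    table["error"] = (ss_err, df_err, None)
    return table
```

Here A could stand for island complex and B for degree of disturbance; with three F-tests from one data set, their significance levels would again be Bonferroni-corrected.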
The significance levels have to be divided by the number of tests (Bonferroni correction). Spider species richness does not depend significantly on island complex or on degree of disturbance.
Correcting for covariates: analysis of covariance. Instead of using the raw data we use the residuals of the regression of species number on island area; these are the area-corrected species numbers. Comparing the within-group residuals with the between-group residuals gives our F-statistic.
Disturbance does not significantly influence area-corrected species richness. We need four regression equations: one from all data points (giving the total residuals) and three within the disturbance groups (giving the within-group residuals). The sums of squares then partition as $SS_{total} = SS_{between} + SS_{error}$.
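The four-regression recipe can be sketched as follows: one least-squares line through all points gives $SS_{total}$, separate lines within each group give $SS_{error}$, and the difference is $SS_{between}$. Assumptions, not taken from the lecture: the covariate (for example log island area) is passed in as `xs`, each fitted line is counted as costing two degrees of freedom (intercept and slope), and all function names are hypothetical. A classical ANCOVA with one common slope would use different degrees of freedom.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(xs, ys):
    """Residuals around the fitted line, e.g. area-corrected species numbers."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def residual_ss(xs, ys):
    return sum(e ** 2 for e in residuals(xs, ys))


def ancova_f(groups):
    """groups: list of (xs, ys) pairs, one per disturbance class.

    SS_total: residuals around one regression through all points.
    SS_error: residuals around a separate regression within each group.
    SS_between = SS_total - SS_error.
    """
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    k, n = len(groups), len(all_x)
    ss_total = residual_ss(all_x, all_y)
    ss_error = sum(residual_ss(xs, ys) for xs, ys in groups)
    ss_between = ss_total - ss_error
    df_error = n - 2 * k                 # assumption: 2 parameters per group line
    df_between = (n - 2) - df_error      # = 2(k - 1)
    f = (ss_between / df_between) / (ss_error / df_error)
    return {"SS_total": ss_total, "SS_between": ss_between,
            "SS_error": ss_error, "df": (df_between, df_error), "F": f}
```

With three disturbance groups this fits exactly the four regressions described above.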
Before/after: repeated-measures designs. In medical research we test patients before and after a medical treatment to infer the influence of the therapy. We divide the total variance into a part between patients and a part within patients, $SS_{total} = SS_{between} + SS_{within}$. The within-patient part can be divided further into a part that comes from the treatment and the error, $SS_{within} = SS_{treat} + SS_{error}$.
Before–after analysis in environmental protection: $df_{treat} = k-1$, $df_{error} = (n-1)(k-1)$. If variances differ between groups, it is safe to use the conservative ANOVA, with $n-1$ error degrees of freedom and only one effect degree of freedom in the final F-test.
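A sketch of the repeated-measures partition, assuming a complete table with one measurement per subject (patient or plot) and treatment (time point). `repeated_measures_anova` is a hypothetical name; it returns the same F together with both the usual and the conservative degrees of freedom, since the conservative test changes only the reference distribution, not the statistic.

```python
from statistics import mean


def repeated_measures_anova(data):
    """data[i][j]: measurement of subject i under treatment j.

    Returns F for the treatment effect with the usual and the conservative
    degrees of freedom, plus the sums of squares.
    """
    n, k = len(data), len(data[0])
    grand = mean(x for row in data for x in row)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_between = k * sum((mean(row) - grand) ** 2 for row in data)  # between subjects
    ss_within = ss_total - ss_between                               # within subjects
    treat_means = [mean(row[j] for row in data) for j in range(k)]
    ss_treat = n * sum((m - grand) ** 2 for m in treat_means)
    ss_error = ss_within - ss_treat

    df_treat, df_error = k - 1, (n - 1) * (k - 1)
    f = (ss_treat / df_treat) / (ss_error / df_error)
    return {"F": f, "df": (df_treat, df_error), "conservative_df": (1, n - 1),
            "SS": {"between": ss_between, "treat": ss_treat, "error": ss_error}}
```

With only two time points (before and after) F equals the square of the paired t statistic; for the toy table `[[1, 3], [2, 5], [4, 6]]` both give 49.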
Bivariate comparisons in environmental protection. Because island areas may differ between the two island complexes, we have to use the residuals; a direct t-test on the raw data would be erroneous. The outlier would also distort direct comparisons of species richness.
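A sketch of the residual-based comparison: regress species richness on island area over all islands, then compare the residuals of the two complexes with Student's pooled-variance t-test. Assumptions: the covariate is passed as log-transformed area (the lecture does not say which scale was used), and `area_corrected_t` with its boolean group flags is a hypothetical interface.

```python
import math
from statistics import mean, variance


def residuals(xs, ys):
    """Residuals of a least-squares regression of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def pooled_t(a, b):
    """Student's two-sample t with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def area_corrected_t(log_area, richness, in_first_complex):
    """t-test of area-corrected richness between two island complexes.

    in_first_complex: list of booleans, True for islands of the first complex.
    """
    res = residuals(log_area, richness)
    first = [r for r, flag in zip(res, in_first_complex) if flag]
    second = [r for r, flag in zip(res, in_first_complex) if not flag]
    return pooled_t(first, second)
```

Because the area effect is removed before the comparison, a complex that simply has larger islands is not mistaken for a richer one.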
Permutation testing. [Figure: null distribution of t with the upper 2.5% confidence limit, the observed t and its probability P(t).] 10,000 randomizations of the observed values give a null distribution of t-values and associated probability levels, against which we compare the observed t. This gives the probability level for our t-test.
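The randomization procedure can be sketched as below: the pooled observations are shuffled between the two groups, t is recomputed each time, and the p-value is the share of shuffled t-values at least as extreme as the observed one. This is a two-sided version (matching a 2.5% limit in each tail); the function names, the seed and the toy data are assumptions for illustration.

```python
import math
import random


def _mean(v):
    return sum(v) / len(v)


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    ma, mb = _mean(a), _mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    sp2 = ss / (len(a) + len(b) - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / len(a) + 1 / len(b)))


def permutation_t_test(a, b, n_perm=10_000, seed=1):
    """Two-sided permutation p-value for the observed t.

    The observed values are reshuffled between the two groups n_perm times;
    the resulting t-values form the null distribution.
    """
    observed = pooled_t(a, b)
    pooled = list(a) + list(b)
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        t = pooled_t(pooled[:len(a)], pooled[len(a):])
        # Small tolerance so rounding noise does not hide ties with the observed t.
        if abs(t) >= abs(observed) - 1e-12:
            as_extreme += 1
    # Count the observed arrangement itself so that p is never exactly zero.
    return observed, (as_extreme + 1) / (n_perm + 1)
```

For two groups of three values, only 20 distinct splits exist, so the smallest attainable two-sided p is about 0.1; permutation tests need reasonably sized groups to reach conventional significance levels.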
Bivariate comparisons using ANOVA: t and F tests can both be used for pairwise comparisons. With two groups the tests are equivalent, since $F = t^2$.
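A quick numerical check of the equivalence, assuming Student's pooled-variance t (not the Welch version) and the classical one-way ANOVA F; the helper names are hypothetical.

```python
import math


def _mean(v):
    return sum(v) / len(v)


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    ma, mb = _mean(a), _mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    sp2 = ss / (len(a) + len(b) - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / len(a) + 1 / len(b)))


def anova_f(groups):
    """Classical one-way ANOVA F ratio."""
    all_x = [x for g in groups for x in g]
    grand = _mean(all_x)
    ss_b = sum(len(g) * (_mean(g) - grand) ** 2 for g in groups)
    ss_w = sum((x - _mean(g)) ** 2 for g in groups for x in g)
    return (ss_b / (len(groups) - 1)) / (ss_w / (len(all_x) - len(groups)))


if __name__ == "__main__":
    a, b = [1, 2, 3], [4, 5, 6]
    t, f = pooled_t(a, b), anova_f([a, b])
    print(f"t = {t:.4f}, t^2 = {t * t:.4f}, F = {f:.4f}")
```

The F-test is always two-sided in this comparison: it cannot tell which group is larger, whereas the sign of t can.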
Repeated measures: species richness of ground-living Hymenoptera in a beech forest. Photos: Simon van Noort, Tim Murray.
Advice for using ANOVA:
• You need a specific hypothesis about your variables. In particular, designs with more than one predictor (multifactorial designs) have to be stated clearly.
• ANOVA is a hypothesis-testing method. Pattern seeking will in many cases lead to erroneous results.
• Predictor variables should really measure different things; they should not correlate too highly with each other.
• The general assumptions of the GLM should be fulfilled. In particular, predictors should be additive and the errors should be normally distributed.
• It is often better to use log-transformed values.
• In monofactorial designs, where only one predictor variable is tested, it is often preferable to use the non-parametric alternative to ANOVA, the Kruskal–Wallis test. This test does not rely on the GLM assumptions but is nearly as powerful as classical ANOVA.
• Another non-parametric alternative for multifactorial designs is to use ranked dependent variables. You lose information but become less dependent on the GLM assumptions.
• As the simplest multivariate technique, ANOVA is quite robust against violations of its assumptions.
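A sketch of the two rank-based options from the list: `average_ranks` replaces values by their ranks (ties get the average rank), which is also how ranked dependent variables can be fed into a multifactorial ANOVA, and `kruskal_wallis_h` computes the Kruskal–Wallis statistic from those ranks. Both names are hypothetical, and this sketch omits the tie correction that `scipy.stats.kruskal` applies.

```python
def average_ranks(values):
    """1-based ranks of values; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of positions i..j, 1-based
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H without tie correction; compare with chi-square, k - 1 df."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)
```

Because only ranks enter the statistic, a single extreme island cannot dominate the result the way it can in a raw-data ANOVA.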
Homework and literature
• Refresh:
  • ANOVA
  • Treatments
  • Degrees of freedom
  • Repeated-measures designs
  • Incomplete designs
  • Permutation testing
  • Welch test
  • Tukey test
• Prepare for the next lecture:
  • Binomial distribution
  • Combinations
Literature: Łomnicki: Statystyka dla biologów (Statistics for biologists); http://statsoft.com/textbook/