Analysis of variance. Spiders on Mazurian lake islands : Wigry –Mikołajki, Nidzkie, Bełdany). Photo: Wigierski Park Narodowe. Photo: Ruciane.net. Salticidae. Araneus diadematus. Photo: Eurospiders.com. Spider species richness on Mazurian lake islands.
Analysis of variance
Spiders on Mazurian lake islands (Wigry – Mikołajki, Nidzkie, Bełdany)
Photo: Wigierski Park Narodowe. Photo: Ruciane.net. Salticidae. Araneus diadematus. Photo: Eurospiders.com.
Spider species richness on Mazurian lake islands
Does species richness differ with respect to the degree of disturbance? If we use the same test several times with the same data, we have to apply a Bonferroni correction. Cases shown on the slide: single test; n independent tests.
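The reason for the correction can be sketched numerically. The snippet below is a minimal illustration, assuming independent tests and a nominal level of 0.05; the function names are hypothetical and not part of the slides. It shows how the chance of at least one false positive grows with the number of tests, and the per-test level a Bonferroni correction would use.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive in n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    for n in (1, 3, 10):
        # uncorrected familywise error rate and the corrected per-test level
        print(n, round(familywise_error(0.05, n), 3), bonferroni_alpha(0.05, n))
```

With the corrected per-test level the familywise error rate stays slightly below the nominal 0.05, which is why the correction is called conservative.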
Spider species richness on Mazurian lake islands
One-way analysis of variance (Sir Ronald Aylmer Fisher, 1890-1962)
If there were no difference between the sites, the average within-group variance $s^2_{Within}$ should equal the variance between the sites $s^2_{Between}$. We test for significance using Fisher's F-test with k-1 (between) and n-k (within) degrees of freedom.
Variances labelled in the figure: $s^2_H$, $s^2_M$, $s^2_L$, $s^2_P$, $s^2_{Between}$, $s^2_T$.
Degrees of freedom: $df_{Total} = df_{Within} + df_{Between}$, that is $n-1 = (n-k) + (k-1)$.
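The comparison of between-group and within-group variance can be sketched in a few lines. This is a minimal illustration with made-up species counts for three disturbance classes, not the island data from the slides; the function name is hypothetical. It returns only the F ratio and the degrees of freedom; a probability level would still have to be read from the F distribution.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand = mean(x for g in groups for x in g)
    # variation of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # variation of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


if __name__ == "__main__":
    low, medium, high = [12, 15, 11, 14], [10, 13, 9, 12], [8, 11, 7, 10]
    print(one_way_anova([low, medium, high]))
```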
The Tukey test simultaneously compares the means of all combinations of groups. It is a t-test corrected for multiple comparisons (similar to a Bonferroni correction). The Levene test compares the group variances using the F distribution. Variances should not differ too much (they should not be heteroskedastic). Also shown on the slide: Welch test.
We include the effect of island complex (Wigry – Nidzkie, Bełdany, Mikołajki). There must be at least two data points for each combination of groups. We use a simple two-way ANOVA. Table labels: main effects; secondary effects.
The significance levels have to be divided by the number of tests (Bonferroni correction). Spider species richness does not significantly depend on island complex or degree of disturbance.
Correcting for covariates: analysis of covariance
Instead of using the raw data we use the residuals. These are the area-corrected species numbers. The comparison of within-group residuals and between-group residuals gives our F-statistic.
Disturbance does not significantly influence area-corrected species richness. We need four regression equations: one from all data points and three within groups. Figure labels: within-group residuals; total residuals.
$SS_{total} = SS_{between} + SS_{error}$
Repetitive designs (before / after)
In medical research we test patients before and after medical treatment to infer the influence of the therapy. We have to divide the total variance ($SS_{total}$) into a part that contains the variance between patients ($SS_{between}$) and a part within patients ($SS_{within}$). The latter can be divided into a part that comes from the treatment ($SS_{treat}$) and the error ($SS_{error}$). Figure labels: medical treatment; $SS_{between}$; $SS_{within}$.
Before-after analysis in environmental protection
$df_{treat} = k-1$, $df_{error} = (n-1)(k-1)$
In the case of unequal variances between groups it is safe to use the conservative ANOVA with (n-1) $df_{error}$ and only one $df_{effect}$ in the final F-test.
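The partition of the sums of squares in a before-after design can be sketched as follows. The numbers are invented for illustration (four sites, each measured before and after), and the function name is hypothetical. Rows are the repeatedly measured subjects or sites, columns are the k treatments or time points.

```python
from statistics import mean


def repeated_measures_anova(table):
    """table[i][j] = value of subject (site) i under treatment (time) j."""
    n, k = len(table), len(table[0])
    grand = mean(x for row in table for x in row)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    # variation between subjects (row means around the grand mean)
    ss_between = k * sum((mean(row) - grand) ** 2 for row in table)
    ss_within = ss_total - ss_between
    # the within-subject part splits into treatment and error
    treat_means = [mean(row[j] for row in table) for j in range(k)]
    ss_treat = n * sum((m - grand) ** 2 for m in treat_means)
    ss_error = ss_within - ss_treat
    df_treat, df_error = k - 1, (n - 1) * (k - 1)
    f = (ss_treat / df_treat) / (ss_error / df_error)
    return {"SS_between": ss_between, "SS_within": ss_within,
            "SS_treat": ss_treat, "SS_error": ss_error,
            "df": (df_treat, df_error), "F": f}


if __name__ == "__main__":
    before_after = [[10, 12], [8, 11], [12, 13], [6, 8]]
    print(repeated_measures_anova(before_after))
```

For two time points the resulting F equals the square of the paired t statistic, which is a convenient check on the arithmetic.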
Bivariate comparisons in environmental protection
Due to possible differences in island areas between the two island complexes we have to use the residuals. A direct t-test on raw data would be erroneous. The outlier would disturb direct comparisons of species richness.
Permutation testing
10000 randomizations of the observed values give a null distribution of t-values and associated probability levels with which we compare the observed t. This gives the probability level for our t-test. Figure labels: upper 2.5% confidence limit; observed P(t).
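A randomization test of this kind can be sketched as below. The details are assumptions, not taken from the slides: the t statistic uses unpooled variances, the test is two-sided, the data are invented, and the function names are hypothetical. Group membership is reshuffled 10000 times and the observed t is compared with the resulting null distribution.

```python
import random
from math import copysign, sqrt
from statistics import mean, variance


def t_stat(a, b):
    """Two-sample t statistic with unpooled variances."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:  # guard: both groups constant
        return 0.0 if diff == 0 else copysign(float("inf"), diff)
    return diff / se


def permutation_p(a, b, n_perm=10000, seed=1):
    """Two-sided probability level from random reassignment of values."""
    rng = random.Random(seed)
    observed = abs(t_stat(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(t_stat(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    # add one so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    print(permutation_p([12, 15, 11, 14, 13], [8, 11, 7, 10, 9]))
```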
Bivariate comparisons using ANOVA
t and F tests can both be used for pairwise comparisons.
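For two groups the two tests are equivalent: the one-way ANOVA F equals the square of the pooled-variance t. The sketch below checks this on invented numbers; the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Classical two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))


def anova_f(a, b):
    """One-way ANOVA F statistic for exactly two groups (df_between = 1)."""
    grand = mean(list(a) + list(b))
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in (a, b))
    ss_within = sum((x - mean(g)) ** 2 for g in (a, b) for x in g)
    return ss_between / (ss_within / (len(a) + len(b) - 2))


if __name__ == "__main__":
    a, b = [12, 15, 11, 14, 13], [8, 11, 7, 10, 9]
    print(pooled_t(a, b) ** 2, anova_f(a, b))  # the two values agree
```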
Repeated measures
Species richness of ground-living Hymenoptera in a beech forest. Photo: Simon van Noort. Photo: Tim Murray.
Advice for using ANOVA:
• You need a specific hypothesis about your variables. In particular, designs with more than one predictor (multifactorial designs) have to be stated clearly.
• ANOVA is a hypothesis testing method. Pattern seeking will in many cases lead to erroneous results.
• Predictor variables should really measure different things; they should not correlate too highly with each other.
• The general assumptions of the GLM should be fulfilled. In particular, predictors should be additive. The distribution of errors should be normal.
• It is often better to use log-transformed values.
• In monofactorial designs, where only one predictor variable is tested, it is often preferable to use the non-parametric alternative to ANOVA, the Kruskal-Wallis test. The latter test does not rely on the GLM assumptions but is nearly as powerful as the classical ANOVA.
• Another non-parametric alternative for multifactorial designs is to use ranked dependent variables. You lose information but become less dependent on the GLM assumptions.
• ANOVA as the simplest multivariate technique is quite robust against violations of its assumptions.
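The rank-based alternative mentioned in the list can be sketched as follows. This is a minimal version of the Kruskal-Wallis H statistic, assuming no correction for ties (tied values simply share their average rank); the function names and the example numbers are hypothetical. H is compared with a chi-square distribution with k-1 degrees of freedom.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic without tie correction."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


if __name__ == "__main__":
    print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```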
Starting hypotheses
• The degree of disturbance (human impact) influences species richness.
• Species richness and abundance depend on island area and environmental factors.
• Island ensembles differ in species richness and abundance.
• Area, abundance, and species richness are non-linearly related.
• Latitude and longitude do not influence species richness.
Sorting
• Area, abundance, and species richness are non-linearly related.
• Latitude and longitude do not influence species richness.
• Species richness and abundance depend on island area and environmental factors.
• Island ensembles differ in species richness and abundance.
• The degree of disturbance (human impact) influences species richness.
The hypotheses are not independent. Each hypothesis influences how the next one has to be treated.
Area, abundance, and species richness are non-linearly related.
Species-area and individuals-area relationships
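One common way to handle such a non-linear relationship is to fit a power function S = c·A^z by linear regression on log-transformed values. The power-function form is an assumption here, since the slide does not state which model was fitted, and the function name and numbers are hypothetical.

```python
from math import exp, log
from statistics import mean


def fit_power_law(areas, richness):
    """Fit S = c * A**z by least squares on log S against log A."""
    x = [log(a) for a in areas]
    y = [log(s) for s in richness]
    mx, my = mean(x), mean(y)
    # slope of the log-log regression is the exponent z
    z = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    c = exp(my - z * mx)
    return c, z


if __name__ == "__main__":
    areas = [1, 10, 100, 1000]
    richness = [5 * a ** 0.25 for a in areas]  # synthetic data with c = 5, z = 0.25
    print(fit_power_law(areas, richness))
```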
Latitude and longitude do not influence species richness.
Does the distance between islands influence species richness? Are geographically near islands also similar in species richness irrespective of island area? Is species richness correlated with longitude and latitude? (R(S-Long) = 0.22, n.s.; R(S-Lat) = 0.28, n.s.)
Spatial autocorrelation: that there is no significant correlation does not mean that latitude and longitude have no influence on the regression model with environmental variables. In spatial autocorrelation the distance between study sites influences the response (dependent) variable. Spatially adjacent sites are then expected to be more similar with respect to the response variable. Figure: sites S1 to S6.
Moran’s I as a measure of spatial autocorrelation
Moran’s I is similar to a correlation coefficient applied to all pairs of cells of a spatial matrix. It differs by weighting the covariance to account for the spatial non-independence of cells with respect to distance. If cell values were randomly distributed (not spatially autocorrelated), the expected I is $E(I) = -1/(n-1)$, with n the number of cells. Statistical significance is calculated from a Monte Carlo simulation. Figure: sites S1 to S6; all combinations of sites.
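A minimal sketch of the calculation follows. Inverse-distance weights, distinct site coordinates, and a one-sided Monte Carlo test for positive autocorrelation are assumptions, since the slides do not state the weighting scheme; the function names and the six example sites are hypothetical.

```python
import random
from math import hypot


def morans_i(values, coords):
    """Moran's I with inverse-distance weights (sites must not coincide)."""
    n = len(values)
    xbar = sum(values) / n
    dev = [v - xbar for v in values]
    num, w_sum = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = 1.0 / hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            num += w * dev[i] * dev[j]   # weighted cross-product of deviations
            w_sum += w
    return (n / w_sum) * num / sum(d * d for d in dev)


def monte_carlo_p(values, coords, n_perm=999, seed=1):
    """Observed I and the share of random reassignments with I >= observed."""
    rng = random.Random(seed)
    observed = morans_i(values, coords)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, coords) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
    values = [1, 2, 3, 4, 5, 6]  # a smooth gradient along the transect
    print(monte_carlo_p(values, coords))
```

Values of I above -1/(n-1) indicate that neighbouring sites are more similar than expected by chance.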
Individuals/trap is slightly spatially autocorrelated. Latitude and longitude slightly influence species richness. Even this weak effect might influence the outcome of a regression analysis.
Too many variables! High multicollinearity. Solution: prior factor analysis to reduce the number of mutually dependent (collinear) variables.
Stepwise variable elimination
Highly correlated variables essentially contain the same information. Correlations of less than 0.7 can be tolerated. Hence first check the matrix of correlation coefficients and eliminate variables that do not add information. Standardized coefficients (b-values) are equivalents of correlation coefficients. They should not have absolute values above 1; such values point to too high a correlation between the predictor variables (collinearity). Collinearity disturbs any regression model and has to be eliminated prior to analysis.
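The first screening step, checking the correlation matrix against the 0.7 threshold, can be sketched as below. The variable names and values are invented for illustration, and the helper names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(variables, threshold=0.7):
    """List (name1, name2, r) for all predictor pairs with |r| >= threshold."""
    flagged = []
    for (n1, v1), (n2, v2) in combinations(variables.items(), 2):
        r = pearson(v1, v2)
        if abs(r) >= threshold:
            flagged.append((n1, n2, r))
    return flagged


if __name__ == "__main__":
    predictors = {
        "area": [1, 2, 3, 4, 5],
        "perimeter": [2, 4, 6, 8, 10.5],
        "humidity": [3, 1, 4, 1, 5],
    }
    for n1, n2, r in collinear_pairs(predictors):
        print(n1, n2, round(r, 3))  # candidates for elimination
```

Of each flagged pair, one variable would typically be dropped before the regression is fitted.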
The final model after stepwise variable elimination
The table gives simple test-wise probability levels; we still have to correct for multiple testing. The best model is not always the one with the lowest AIC or the highest $R^2$. Bonferroni correction: to get an experiment-wise error rate of 0.05, our test-wise error rates have to be less than 0.05/n. Species richness is positively correlated with island area and negatively with soil humidity.
Island ensembles differ in species richness and abundance.
A simple ANOVA does not detect any difference. Species richness depends on environmental factors that may differ between island ensembles. This calls for an analysis of covariance (ANCOVA).
Analysis of covariance (ANCOVA)
ANCOVA is the combination of multiple regression and analysis of variance. First we perform a regression analysis and use the residuals of the full model as entries in the ANOVA. ANCOVA is the ANOVA on regression residuals. The metrically scaled variables serve as covariates. Sites with very high positive residuals are particularly species rich even after controlling for environmental factors. These are ecological hot spots. Regression analysis serves to identify such hot spots. We use the regression residuals for further analysis.
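The two-step procedure described here (regression first, then ANOVA on the residuals) can be sketched as follows. A single covariate, invented data, and hypothetical function names are assumptions of this sketch; a full ANCOVA in a statistics package additionally adjusts the error degrees of freedom for the fitted covariates, which this simplified version does not.

```python
from statistics import mean


def residuals(x, y):
    """Residuals of the least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def f_ratio(groups):
    """One-way ANOVA F statistic for a list of groups."""
    k, n = len(groups), sum(len(g) for g in groups)
    grand = mean(v for g in groups for v in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def anova_on_residuals(x, y, labels):
    """Regress y on the covariate x, then compare residuals between groups."""
    grouped = {}
    for r, lab in zip(residuals(x, y), labels):
        grouped.setdefault(lab, []).append(r)
    return f_ratio(list(grouped.values()))


if __name__ == "__main__":
    log_area = [1, 2, 3, 4, 5, 6]                       # covariate
    log_richness = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]     # response
    ensemble = ["A", "B", "A", "B", "A", "B"]           # grouping factor
    print(anova_on_residuals(log_area, log_richness, ensemble))
```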
ANCOVA
Species richness does not differ between island ensembles.
The degree of disturbance (human impact) influences species richness.
Species richness of spiders on lake islands appears to be independent of the degree of disturbance.
How does abundance depend on environmental factors?
The full model and stepwise variable elimination. Standardized coefficients are above 1; this points to too high collinearity. Most coefficients are highly significant! We further eliminate uninformative variables. Abundance does not significantly depend on environmental variables.
How does abundance depend on the degree of disturbance?
Abundance of spiders on lake islands appears to be independent of the degree of disturbance.