STAT 401 EXPERIMENTAL DESIGN AND ANALYSIS. Assist. Prof. Dr. R. Serkan Albayrak, Department of Business Administration, Yaşar University.
INTRODUCTION TO ANOVA
• The easiest way to understand ANOVA is to generate a tiny data set using the GLM (general linear model). As a first step, set the mean, $\mu$, to 5 for a data set with 10 cases. At this point all 10 cases have a score of 5.
The next step is to add the effects of the IV. Suppose that the effect of the treatment at $a_1$ is to raise scores by 2 units and the effect of the treatment at $a_2$ is to lower scores by 2 units.
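The changes produced by treatment are the deviations of the scores from $\mu$. Over all of these cases the sum of the squared deviations is $\sum (Y - \mu)^2 = 10 \times 2^2 = 40$, because every case deviates from $\mu$ by exactly 2 units in one direction or the other. This is the sum of the (squared) effects of treatment if all cases are influenced identically by the various levels of A and there is no error.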
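The first two steps can be sketched in a few lines of Python. This is only an illustration: the assignment of the first five cases to $a_1$ and the last five to $a_2$ is an assumption, since the text does not say how the 10 cases are split.

```python
# Step 1: every case starts at the mean mu = 5.
mu = 5
n_cases = 10
scores = [mu] * n_cases

# Step 2: add the treatment effects.
# Assumption: the first five cases are at a1 (+2), the last five at a2 (-2).
effects = [+2] * 5 + [-2] * 5
scores = [y + a for y, a in zip(scores, effects)]

# With no error, the squared deviations from mu are pure treatment effect.
ss_effect = sum((y - mu) ** 2 for y in scores)

print(scores)     # [7, 7, 7, 7, 7, 3, 3, 3, 3, 3]
print(ss_effect)  # 40
```

Any other split of the 10 cases gives the same total of 40, because each case moves 2 units away from $\mu$ regardless of its group.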
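Once random error is added to each score, the scores within a group are no longer identical. Then the variance for the $a_1$ group is $s_{a_1}^2 = \sum_i (Y_{i1} - \bar{Y}_1)^2 / (n_1 - 1) = 1.5$, and the variance for the $a_2$ group is $s_{a_2}^2 = \sum_i (Y_{i2} - \bar{Y}_2)^2 / (n_2 - 1) = 1.5$. The average of these variances is also 1.5.

Check that these numbers represent error variance; that is, they represent random variability in scores within each group, where all cases are treated the same, and they are therefore uncontaminated by effects of the IV.

The variance for this group of 10 numbers, ignoring group membership, is $s^2 = \sum (Y - GM)^2 / (N - 1)$, where $GM$ is the grand mean and $N = 10$.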
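The error step can be illustrated as follows. The error values below are assumptions, picked only so that each within-group variance comes out at the 1.5 quoted in the text; the resulting total variance therefore belongs to this illustrative data set, not necessarily to the original table.

```python
from statistics import mean, variance

mu = 5
effect = {"a1": +2, "a2": -2}

# Assumed error terms (illustrative only): five cases per group,
# chosen so that each group's sample variance equals 1.5.
errors = {"a1": [2, 0, -1, 0, -1], "a2": [0, -2, 0, 1, 1]}

# GLM: score = mean + treatment effect + error
scores = {g: [mu + effect[g] + e for e in errs] for g, errs in errors.items()}

# Within-group (error) variances, each with n - 1 in the denominator.
group_var = {g: variance(ys) for g, ys in scores.items()}
avg_within = mean(list(group_var.values()))

# Variance of all 10 scores, ignoring group membership.
all_scores = scores["a1"] + scores["a2"]
total_var = variance(all_scores)

print(group_var)            # {'a1': 1.5, 'a2': 1.5}
print(avg_within)           # 1.5
print(round(total_var, 2))  # 5.78
```

The total variance is much larger than the within-group variance because it also contains the spread created by the treatment effects.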
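Standard Setup for ANOVA
The difference between each score and the grand mean ($GM$) is broken into two components:
• The difference between the score and its own group mean
• The difference between that group mean and the grand mean
$$Y_{ij} - GM = (Y_{ij} - \bar{Y}_j) + (\bar{Y}_j - GM)$$
Here $Y_{ij}$ is the score of case $i$ in group $j$ and $\bar{Y}_j$ is the mean of group $j$.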
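Each term is then squared and summed separately to produce the sum of squares for error and the sum of squares for treatment:
$$\sum_j \sum_i (Y_{ij} - GM)^2 = \sum_j n_j (\bar{Y}_j - GM)^2 + \sum_j \sum_i (Y_{ij} - \bar{Y}_j)^2$$
• Sum of squares for treatment (first term on the right): the effect of the IV!
• Sum of squares for error (second term on the right)
The basic partition holds because the cross-product terms vanish.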
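To see why the cross-product terms vanish, the squared decomposition can be expanded as sketched below. The notation is an assumption made for this derivation: $Y_{ij}$ is the score of case $i$ in group $j$, $\bar{Y}_j$ is the mean of group $j$, $n_j$ is its size, and $GM$ is the grand mean.

```latex
\begin{align*}
\sum_j \sum_i (Y_{ij} - GM)^2
  &= \sum_j \sum_i \bigl[(Y_{ij} - \bar{Y}_j) + (\bar{Y}_j - GM)\bigr]^2 \\
  &= \sum_j \sum_i (Y_{ij} - \bar{Y}_j)^2
   + \sum_j n_j (\bar{Y}_j - GM)^2
   + 2 \sum_j (\bar{Y}_j - GM) \sum_i (Y_{ij} - \bar{Y}_j) \\
  &= \sum_j \sum_i (Y_{ij} - \bar{Y}_j)^2
   + \sum_j n_j (\bar{Y}_j - GM)^2
\end{align*}
```

The last step uses $\sum_i (Y_{ij} - \bar{Y}_j) = 0$ for every group $j$: deviations from a group's own mean always sum to zero, so the cross-product term drops out.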
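This is the deviation form of basic ANOVA. Each of these terms is a sum of squares (SS).
• The total term is $\sum_j \sum_i (Y_{ij} - GM)^2$. The average of this sum is the total variance in the set of scores, ignoring group membership.
• The term $\sum_j \sum_i (Y_{ij} - \bar{Y}_j)^2$ is called the sum of squares within groups ($SS_{wg}$).
• The term $\sum_j n_j (\bar{Y}_j - GM)^2$ is called SS between groups ($SS_{bg}$).
This sum is frequently symbolized as
$$SS_{total} = SS_{bg} + SS_{wg}$$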
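The partition can be checked numerically. The scores below are assumed for illustration (two groups of five, built as mean 5, effects of +2 and -2, plus small errors); they are not taken from the original table.

```python
# Assumed illustrative scores for the two treatment levels.
groups = {"a1": [9, 7, 6, 7, 6], "a2": [3, 1, 3, 4, 4]}

all_scores = [y for ys in groups.values() for y in ys]
gm = sum(all_scores) / len(all_scores)                 # grand mean
means = {g: sum(ys) / len(ys) for g, ys in groups.items()}

# Total, between-groups, and within-groups sums of squares.
ss_total = sum((y - gm) ** 2 for y in all_scores)
ss_bg = sum(len(ys) * (means[g] - gm) ** 2 for g, ys in groups.items())
ss_wg = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)

# Cross-product term that should vanish.
cross = sum((y - means[g]) * (means[g] - gm)
            for g, ys in groups.items() for y in ys)

print(ss_total, ss_bg, ss_wg)   # 52.0 40.0 12.0
print(abs(cross) < 1e-12)       # True
```

For these scores the between-groups sum of squares reproduces the 40 obtained from the pure treatment effects, and the remaining 12 is error.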
At this point it is important to realize that the total variance in the set of scores is partitioned into two sources. One is the effect of the IV and the other is all remaining effects (which we call error). Because the effects of the IV are assessed by changes in the central tendencies of the groups, the inferences that come from ANOVA are about differences in central tendency.

Before going further, it may be a nice exercise to think about alternative measures of central tendency. Can we alter the formula for variance by replacing the mean with something else? Why is the mean preferred?

However, sums of squares are not yet variances. To become variances, they must be ‘averaged’. The denominators for averaging SS must be degrees of freedom so that the statistics will have a proper distribution (remember previous slides).
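So far we know that the degrees of freedom of $SS_{total}$ must be $N - 1$. Furthermore, with $k$ groups, $df_{bg} = k - 1$. Also, $df_{wg} = N - k$. Thus we have (as expected)
$$df_{total} = N - 1 = (k - 1) + (N - k) = df_{bg} + df_{wg}$$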
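Variance is an ‘averaged’ sum of squares (for empirical data, of course). Then, to obtain the mean sums of squares (MS),
$$MS_{bg} = \frac{SS_{bg}}{df_{bg}}, \qquad MS_{wg} = \frac{SS_{wg}}{df_{wg}}, \qquad F = \frac{MS_{bg}}{MS_{wg}}$$
The F distribution is the sampling distribution of the ratio of two independent $\chi^2$ variables, each divided by its degrees of freedom. This statistic is used to test the null hypothesis that $\mu_1 = \mu_2 = \dots = \mu_k$.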
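As a worked illustration of the degrees of freedom, mean squares, and F ratio, the sketch below uses assumed scores for two groups of five (mean 5, effects of +2 and -2, plus small errors); the numbers are illustrative rather than the original table.

```python
# Assumed illustrative scores for the two treatment levels.
groups = {"a1": [9, 7, 6, 7, 6], "a2": [3, 1, 3, 4, 4]}

k = len(groups)                                    # number of groups
n_total = sum(len(ys) for ys in groups.values())   # N
gm = sum(sum(ys) for ys in groups.values()) / n_total

# Sums of squares between and within groups.
ss_bg = sum(len(ys) * (sum(ys) / len(ys) - gm) ** 2 for ys in groups.values())
ss_wg = sum((y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys)

# Degrees of freedom, mean squares, and the F ratio.
df_bg, df_wg = k - 1, n_total - k
ms_bg, ms_wg = ss_bg / df_bg, ss_wg / df_wg
f_ratio = ms_bg / ms_wg

print(df_bg, df_wg)       # 1 8
print(ms_bg, ms_wg)       # 40.0 1.5
print(round(f_ratio, 2))  # 26.67
```

Note that $MS_{wg}$ equals 1.5, the average of the two within-group variances, which is what it should be when the groups are the same size.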
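Distribution of SSE, SST
For independent normal errors with common variance $\sigma^2$:
• $SSE = \sum_j \sum_i (Y_{ij} - \bar{Y}_j)^2$ (error, i.e. within groups), with $SSE / \sigma^2 \sim \chi^2_{N-k}$
• $SST = \sum_j n_j (\bar{Y}_j - GM)^2$ (treatment, i.e. between groups), with $SST / \sigma^2 \sim \chi^2_{k-1}$ under the null hypothesis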
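What is the expected value of F under the null?

Suppose $H_0$ is true. Then $\mu_1 = \mu_2 = \dots = \mu_k = \mu$, so under $H_0$ the $Y_{ij}$ are i.i.d. with mean $\mu$ and variance $\sigma^2$. Then each group variance $s_j^2$ is an unbiased estimator of $\sigma^2$. Observe that
$$MS_{wg} = \frac{\sum_j (n_j - 1) s_j^2}{N - k}$$
is a weighted average of the group variances, so $E(MS_{wg}) = \sigma^2$.

Remember that $\mathrm{Var}(\bar{Y}_j) = \sigma^2 / n_j$. Then under the null, $E(SS_{bg}) = (k - 1)\sigma^2$, so $E(MS_{bg}) = \sigma^2$ as well.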
Therefore, under $H_0$, F must be around 1. It is sensitive to deviations from the null and can measure evidence against $H_0$.

One more note before we proceed: why did we not try a t-test initially? What was wrong with it in this setup?
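The claim that F stays near 1 when the null hypothesis is true can be checked with a small simulation. Everything in this sketch is an assumption made for illustration: three groups of ten cases, normal scores with mean 5 and standard deviation 1, 2000 replications, an arbitrary seed, and the helper name `one_way_f`.

```python
import random

def one_way_f(groups):
    """Return F = MS_bg / MS_wg for a list of groups (lists of scores)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    gm = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_bg = sum(len(g) * (m - gm) ** 2 for g, m in zip(groups, means))
    ss_wg = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_bg / (k - 1)) / (ss_wg / (n_total - k))

random.seed(401)          # arbitrary seed, for reproducibility only
k, n, reps = 3, 10, 2000  # assumed design: 3 groups of 10 cases

# Under the null, every group is drawn from the same distribution.
f_values = []
for _ in range(reps):
    groups = [[random.gauss(5, 1) for _ in range(n)] for _ in range(k)]
    f_values.append(one_way_f(groups))

mean_f = sum(f_values) / reps
df_wg = n * k - k

print(round(mean_f, 2))       # simulated average F, close to 1
print(df_wg / (df_wg - 2))    # theoretical mean of F: 1.08
```

The theoretical mean of an F distribution is $df_{wg} / (df_{wg} - 2)$, which is slightly above 1 and approaches 1 as the within-groups degrees of freedom grow.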