One Way ANOVA using SAS STT 501 Spring 2007
Example • Let’s test to see if mercury level relates to surface area, as defined by the following 4 classes:
    low -< 75   = 'Less than 75 acres'
    75 -< 250   = '75-250 acres'
    250 -< 625  = '250-625 acres'
    625 - high  = 'More than 625 acres'
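The four classes above are written in the range syntax of a SAS user-defined format. A minimal sketch of how such a format might be declared follows; the format name areafmt is an assumption made for illustration, not a name taken from the slides.

```sas
/* Sketch only: "areafmt" is a hypothetical format name. */
proc format;
  value areafmt
    low -< 75   = 'Less than 75 acres'     /* below 75                   */
    75 -< 250   = '75-250 acres'           /* 75 up to, not including 250 */
    250 -< 625  = '250-625 acres'          /* 250 up to, not including 625 */
    625 - high  = 'More than 625 acres';   /* 625 and above              */
run;
```

The -< operator excludes the upper endpoint, so each surface area falls into exactly one class.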
Example • My ANOVA table should look something like:
    Source     df     Sum of Squares
    S. Area
    Error
    Total
• Let’s see which of these we can fill in.
Totals • I can get some info for the total line by requesting the corrected sum of squares for the response:
• One less than the number of observations will be our total degrees of freedom.
• The corrected sum of squares is our total sum of squares.
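A request of this kind might look like the sketch below. The data set name lakes and the variable name mercury are assumptions for illustration; n and css are the PROC MEANS keywords for the number of observations and the corrected sum of squares.

```sas
/* Sketch only: "lakes" and "mercury" are hypothetical names. */
proc means data=lakes n css;
  var mercury;   /* n - 1 is the total d.f.; css is the total sum of squares */
run;
```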
Example • My ANOVA table should look something like:
    Source     df     Sum of Squares
    S. Area
    Error
    Total      119    13.2169
• OK, we can get error as well.
Error • We can get corrected sums of squares in each group and pool them:
• Take one less than each of the group sizes and add them to get the error d.f.: 29 + 30 + 27 + 30 = 116.
• The sum of the group corrected sums of squares is my error sum of squares. We should get 12.4578.
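The per-group quantities could be requested with a class statement, roughly as sketched here. The names lakes, mercury, area and areafmt are assumptions for illustration, and the sketch assumes a format called areafmt has already been defined for the four surface area classes.

```sas
/* Sketch only: data set, variable and format names are hypothetical. */
proc means data=lakes n css;
  class area;             /* one row of statistics per surface area class */
  format area areafmt.;   /* group the raw acreage with the format        */
  var mercury;            /* sum (n - 1) for error d.f.; sum css for error SS */
run;
```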
Example • My ANOVA table should look something like:
    Source     df     Sum of Squares
    S. Area
    Error      116    12.4578
    Total      119    13.2169
• Now get the rest by subtraction...
Example • My ANOVA table should look something like:
    Source     df     Sum of Squares
    S. Area    3      0.7591
    Error      116    12.4578
    Total      119    13.2169
• And we should be able to construct the mean squares and the F-test.
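Worked out from the table entries, the subtraction, the mean squares and the F statistic are as follows (values rounded to four decimals; the F statistic has 3 and 116 degrees of freedom):

```latex
\begin{align*}
df_{\text{S. Area}} &= 119 - 116 = 3 \\
SS_{\text{S. Area}} &= 13.2169 - 12.4578 = 0.7591 \\
MS_{\text{S. Area}} &= \frac{0.7591}{3} \approx 0.2530 \\
MS_{\text{Error}}   &= \frac{12.4578}{116} \approx 0.1074 \\
F &= \frac{MS_{\text{S. Area}}}{MS_{\text{Error}}} \approx \frac{0.2530}{0.1074} \approx 2.36
\end{align*}
```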
ANOVA Procedures • There are several procedures in SAS that can conduct the ANOVA; we’ll use GLM (General Linear Model). • To get the analysis of mercury level vs. surface area category:
ANOVA Procedures • The class statement identifies the group variable. • The format statement is required, since we’ve constructed the groups with a format.
ANOVA Procedures The model is always of the form: response = group(s)
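Putting the pieces together, a one-way analysis in GLM might be sketched as below. The names lakes, mercury, area and areafmt are assumptions for illustration, and the sketch assumes the format areafmt has already been defined for the four classes.

```sas
/* Sketch only: data set, variable and format names are hypothetical. */
proc glm data=lakes;
  class area;             /* identifies the group variable          */
  format area areafmt.;   /* groups are built from the format       */
  model mercury = area;   /* response = group(s)                    */
run;
quit;                     /* GLM is interactive; quit ends the step */
```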
ANOVA Procedures • The output should be the same as our ANOVA table. • Note that SAS is a bit redundant here. In multi-factor cases, the model sum of squares will be the sum over all the factors, which are then separated at the bottom. Since surface area is the only factor here, its sum of squares is the same as the model’s.
Multiple Comparisons • From the results of our ANOVA, we see moderately significant evidence that mercury level is related to surface area category. • At this point, it would likely be useful to compare mercury levels across surface area categories.
Multiple Comparisons • We can get means for the response variable in each group using the means statement, and we can ask for comparisons as well. • Several multiple comparisons are available, including: Bonferroni adjusted t-tests, Tukey’s W, Ryan’s Q and Dunnett’s test.
Example • One option requests comparisons based on the Bonferroni adjustment; another sets the experiment-wise error rate.
• Starred (***) comparisons are significant.
• Each pair is listed in both directions, so these 2 are actually 1; the listing is a bit redundant.
• Check the note in the output: Bonferroni is not best for all pair-wise comparisons.
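A Bonferroni request of this kind might be sketched as follows. The names lakes, mercury, area and areafmt are assumptions for illustration, as is the error rate of 0.05; the slides do not say which value was used.

```sas
/* Sketch only: names and the alpha value are hypothetical. */
proc glm data=lakes;
  class area;
  format area areafmt.;
  model mercury = area;
  means area / bon alpha=0.05;   /* bon: Bonferroni t-tests; alpha=: error rate */
run;
quit;
```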
Example • I can request Tukey’s method; the output is of a similar form (with similar results for this case).
Example • If I include the lines option (I can do this with bon as well), I get a somewhat different form of output, where groups that are not significantly different are marked. • I also get an interesting note here.
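Both forms of the Tukey request could be sketched like this, again with hypothetical names (lakes, mercury, area, areafmt) and assuming the format has already been defined.

```sas
/* Sketch only: data set, variable and format names are hypothetical. */
proc glm data=lakes;
  class area;
  format area areafmt.;
  model mercury = area;
  means area / tukey;         /* pairwise differences, significant ones starred */
  means area / tukey lines;   /* grouping display of non-different means        */
run;
quit;
```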
Example • We can request Ryan’s Q.
• Ryan’s Q will always give output in a grouping form. This is because Ryan’s Q is designed for equal group sizes.
• For unequal group sizes, it will use an approximation based on the harmonic mean of the group sizes. In fact, this approximation is also used when the lines option is specified.
• This approximation should be avoided if group sizes are very different: largest/smallest > 1.5.
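In PROC GLM the Ryan-Einot-Gabriel-Welsch multiple range test is requested with the regwq option, which is presumably what the slide calls Ryan’s Q. A sketch with hypothetical names (lakes, mercury, area, areafmt):

```sas
/* Sketch only: data set, variable and format names are hypothetical. */
proc glm data=lakes;
  class area;
  format area areafmt.;
  model mercury = area;
  means area / regwq;   /* output is given in grouping form */
run;
quit;
```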
Comparison with a Control • Dunnett’s test for comparing each group to a control (or a predetermined group with every other group) is available. • We’ll use the birth weight data to check for differences in birth weight versus smoking status, taking non-smokers as a control.
Example • Requesting the Dunnett test here gets a comparison of 0 (non-smoker) to every other group. • How did it know that 0 was the control? Well, of course, it didn’t know that. By default the first value (in alphabetical or numeric order) is taken as the control.
Example • If I use this format… I get comparisons to current smokers. Not what I want, but “currently smoke” is first alphabetically.
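A default Dunnett request might be sketched as below. The data set name births and the variable names bwt and smoke are assumptions for illustration; smoke is taken to be coded numerically with 0 for non-smokers, as the slide describes.

```sas
/* Sketch only: "births", "bwt" and "smoke" are hypothetical names. */
proc glm data=births;
  class smoke;
  model bwt = smoke;
  means smoke / dunnett;   /* no control named: the first level (0) is used */
run;
quit;
```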
Example I can specify the control category. NOTE: it is case sensitive, so make sure you match it exactly.
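Naming the control could be sketched as follows. Everything here is hypothetical: the data set and variable names, the format name smokefmt, the coding of the smoking levels, and the label text. The point is only that the quoted value in dunnett( ) must match the formatted value exactly, including case.

```sas
/* Sketch only: names, codes and labels are hypothetical. */
proc format;
  value smokefmt
    0 = 'Non-smoker'
    1 = 'Currently smoke'
    2 = 'Formerly smoked';
run;

proc glm data=births;
  class smoke;
  format smoke smokefmt.;
  model bwt = smoke;
  means smoke / dunnett('Non-smoker');   /* formatted control value, case sensitive */
run;
quit;
```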