Learn to use ANOVA in SPSS to analyze data, compare group means, conduct post hoc tests, interpret output, and make statistical inferences. Understand assumptions and implications of ANOVA results.
ANOVA (using SPSS) Mathematics & Statistics Help, University of Sheffield
Learning outcomes • By the end of this session you should: • Understand when to use an analysis of variance • Be able to carry out a one way analysis of variance in SPSS and interpret the output • Be able to conduct a post hoc test to compare differences between groups and interpret the output • Be able to carry out a two way analysis of variance in SPSS and interpret the output • Be able to investigate whether there is an interaction between two categorical explanatory variables in a two way analysis of variance • Be aware of assumptions needed for the analysis of variance model to be valid
Download the data In your web browser, type in the following address and save the files to your computer: http://www.sheffield.ac.uk/mash/workshop_materials
ANOVA: Analysis of Variance • Independent samples t-test compares the means of two groups • ANOVA is the generalisation of this to more than two groups • Example: Which diet is best? • Dependent: Weight lost (Scale) • Independent: Diet 1, 2 or 3 (Nominal) • Null hypothesis: The mean weight lost on diets 1, 2 and 3 is the same: $\mu_1 = \mu_2 = \mu_3$
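How does ANOVA work? Let’s think of the overall variability in a set of data as being split into two sources: Between groups: variation due to the group means differing from the overall mean (calculated irrespective of group). Within groups: variation due to individual values differing from their own group mean. Between groups variation: $\sum_j n_j (\bar{x}_j - \bar{x})^2$ Within groups variation: $\sum_j \sum_i (x_{ij} - \bar{x}_j)^2$ (here $x_{ij}$ is observation $i$ in group $j$, $\bar{x}_j$ is the mean of group $j$, $n_j$ is its size and $\bar{x}$ is the overall mean)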
How does ANOVA work? If the variation between groups is greater than the variation within groups, this suggests that the groups are different. We assess this using the ratio of the two measures of variability, each divided by its degrees of freedom. This is called the F statistic and the test is the F test! F = (between groups variation) / (within groups variation). If F is well above 1, there is more variation between groups than within groups, suggesting a difference between groups
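The ratio described above can be sketched by hand in a few lines. This is only an illustration under stated assumptions: the values in `groups` are hypothetical (not the workshop's Diet dataset), and the variable names are invented for the example.

```python
from statistics import mean

# Hypothetical weight-loss values (kg) for three diets; NOT the workshop data.
groups = {
    "diet1": [3.0, 4.0, 5.0],
    "diet2": [2.0, 3.0, 4.0],
    "diet3": [5.0, 6.0, 7.0],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

# Between groups: each group mean compared with the overall mean.
ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())

# Within groups: each observation compared with its own group mean.
ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

# Degrees of freedom: (number of groups - 1) and (observations - number of groups).
df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F is the ratio of the two mean squares.
f_statistic = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_statistic:.2f}")
```

The p-value still has to come from the F distribution with these degrees of freedom, which SPSS reports in the ANOVA table.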
Reminder: Hypothesis testing steps • State the null and alternative hypotheses • Decide on the appropriate test • Collect data and undertake analysis • Calculate a test statistic • Calculate the p-value • Compare the p-value with 0.05 • If p < 0.05, reject the null • Conclude
Which diet is best? Open the dataset ‘Diet’ in SPSS. Variables shown: Diet (1, 2 or 3), Gender (Females = 0), Weight before, Weight after
Exercise 1 • Use Transform → Compute Variable to calculate weight lost by each person • Calculate the overall mean weight lost • Calculate the means and standard deviations by group and complete the table on the next slide (hint: can use Analyze → Descriptive Statistics → Explore to obtain stats) • Which diet resulted in the greatest weight loss and are the group standard deviations similar? • Use Graphs → Legacy Dialogs → Boxplot to produce a boxplot of weight lost by diet
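For readers who want to check the SPSS results outside the menus, the same summary statistics can be sketched as follows. The rows are hypothetical (diet, weight before, weight after) values, not the workshop data, and the names `rows`, `lost_by_diet` and `summary` are assumptions made for this example.

```python
from statistics import mean, stdev

# Hypothetical rows: (diet, weight before, weight after) in kg.
rows = [
    (1, 60.0, 57.0), (1, 72.0, 68.0), (1, 80.0, 75.0),
    (2, 65.0, 63.0), (2, 70.0, 67.0), (2, 90.0, 86.0),
    (3, 68.0, 63.0), (3, 75.0, 69.0), (3, 82.0, 75.0),
]

# Counterpart of Compute Variable: weight lost = before - after.
lost_by_diet = {}
for diet, before, after in rows:
    lost_by_diet.setdefault(diet, []).append(before - after)

# Overall mean, ignoring which diet each person followed.
overall_mean = mean(x for values in lost_by_diet.values() for x in values)

# Mean and sample standard deviation of weight lost for each diet.
summary = {diet: (mean(v), stdev(v)) for diet, v in lost_by_diet.items()}

print(f"Overall mean weight lost: {overall_mean:.2f} kg")
for diet, (m, sd) in sorted(summary.items()):
    print(f"Diet {diet}: mean = {m:.2f} kg, SD = {sd:.2f} kg")
```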
Exercise 1: Summary statistics • Fill in the table • Which diet was best? • Are the standard deviations similar?
Box-plot The box-plot shows the distribution of weight lost for each group and allows you to compare between groups
ANOVA in SPSS Analyze → General Linear Model → Univariate. Click on Options to obtain group means
One way ANOVA output [SPSS output table, annotated to show the test statistic and the p-value]
Exercise 2: Discuss the results Discuss the results and how you would interpret the table. Is there a difference between the groups?
Post hoc tests • If there is a significant ANOVA result, pairwise comparisons can be made • They are t-tests with adjustments to control the overall type 1 error rate (reject the null when true): • Tukey’s and Scheffe’s tests are the most commonly used post hoc tests • Hochberg’s GT2 is better where the sample sizes for the groups are very different • If one group is a control group that you are comparing all others to, use Dunnett’s
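A small calculation can show why the adjustment matters. The sketch below counts the pairwise comparisons for three diets and estimates the chance of at least one false positive if each comparison used 0.05 unadjusted. It assumes the comparisons are independent, which is a simplification, and it does not reproduce how Tukey's or any other named test makes its adjustment.

```python
from itertools import combinations

diets = ["Diet 1", "Diet 2", "Diet 3"]
alpha = 0.05  # significance level used for each individual test

# Every pair of diets is one comparison: 3 groups give 3 pairs.
pairs = list(combinations(diets, 2))
n_comparisons = len(pairs)

# Chance of at least one type 1 error across all comparisons,
# ASSUMING the tests are independent (a simplification).
familywise_error = 1 - (1 - alpha) ** n_comparisons

print(pairs)
print(f"{n_comparisons} comparisons, familywise error rate = {familywise_error:.3f}")
```

With three unadjusted comparisons the overall error rate is already close to 0.14 rather than 0.05, and it grows quickly as more groups are added.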
Post hoc tests in SPSS Select ‘Post hoc’ to choose the post hoc tests
Post hoc tests • Move ‘Diet’ across to the right hand box • Select Tukey from the ‘Equal Variances Assumed’ section
Exercise 3: What are the significant differences between diets? Write up the results and conclude with which diet is the best
Exercise 3: Pairwise comparisons Results: Report:
What are residuals? • Residuals are the differences between each subject’s value and the mean of their group
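As a minimal sketch of that definition, the residuals for a one-way design can be computed as below. The data are hypothetical, and these are raw residuals rather than the standardised ones SPSS can save.

```python
from statistics import mean

# Hypothetical weight-loss values (kg) per diet; NOT the workshop data.
groups = {
    "diet1": [3.0, 4.0, 5.0],
    "diet2": [2.0, 3.0, 4.0],
    "diet3": [5.0, 6.0, 7.0],
}

# Residual = subject's value minus the mean of that subject's group.
residuals = {
    name: [x - mean(values) for x in values]
    for name, values in groups.items()
}

for name, res in residuals.items():
    print(name, res)
```

Within each group the residuals sum to zero, so it is their spread and shape, not their average, that is examined when checking assumptions.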
Normally distributed data Data only need to be approximately normally distributed. The distribution should peak roughly in the middle and be approximately symmetrical. This is an example of data which are very skewed. If your residuals look like this you SHOULD NOT use ANOVA. Use Kruskal-Wallis instead
Statistical tests for normality • There are formal tests for normality such as the Shapiro-Wilk and Kolmogorov-Smirnov tests • If p > 0.05, normality can be assumed • Use them with caution: • For small sample sizes (n < 20), the tests are unlikely to detect non-normality • For larger sample sizes (n > 50), the tests can be too sensitive • Very sensitive to outliers • Advice: Use histograms, box-plots, comparison of means and medians for assessing normality
Homogeneity of variance • Variance = (standard deviation)² • As a rough guide, no group’s standard deviation should be more than twice another’s • If Levene’s p-value > 0.05, equal variances can be assumed
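The rough guide about standard deviations is easy to check directly. The sketch below uses hypothetical data in which one group is deliberately more spread out. It only applies the rule of thumb and is not a substitute for Levene's test.

```python
from statistics import stdev

# Hypothetical weight-loss values (kg); diet2 is deliberately more variable.
groups = {
    "diet1": [3.0, 4.0, 5.0],
    "diet2": [1.0, 4.0, 7.0],
    "diet3": [5.0, 6.0, 7.0],
}

sds = {name: stdev(values) for name, values in groups.items()}

# Rule of thumb: the largest SD should be no more than twice the smallest.
ratio = max(sds.values()) / min(sds.values())
rule_of_thumb_ok = ratio <= 2

print(sds)
print(f"Largest SD / smallest SD = {ratio:.2f}, within rough guide: {rule_of_thumb_ok}")
```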
Checking assumptions: normality Re-run the ANOVA with a couple of extra steps: • Click on the Save button: • Request ‘Standardized’ residuals. This will produce an extra column in the dataset (one residual for each subject):
Checking assumptions: normality Graphs → Legacy Dialogs → Histogram
Checking assumptions: homogeneity of variance Re-run the ANOVA with a couple of extra steps: • Click on the options button: • Tick the homogeneity tests button:
Exercise 4: Can normality be assumed? From your histogram of the standardised residuals, can normality be assumed? Should you: Use ANOVA / Use Kruskal-Wallis
Exercise 4: Use Levene’s test to examine whether equal variances can be assumed? Null: p= Decision: Reject / do not reject Conclusion:
Reporting ANOVA A one-way ANOVA was conducted to compare the effectiveness of three diets. Normality checks and Levene’s test were carried out and the assumptions were met. There was a significant difference in mean weight lost [F(2,75) = 6.197, p = 0.003] between the diets. Participants lost weight on all diets. The mean weight lost on diets 1 and 2 was similar (3.3 kg and 3 kg respectively) but weight loss was greater on diet 3 (5.15 kg) than on either diet 1 or diet 2. Post hoc comparisons using the Tukey HSD test were carried out. There was no significant difference between diets 1 and 2, but there were significant differences between diets 1 and 3 (p = 0.02) and between diets 2 and 3 (p = 0.005).
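The two numbers in brackets after F are the degrees of freedom, and they can be read back to the study design. The short sketch below assumes a standard one-way ANOVA, where the first value is (number of groups - 1) and the second is (number of participants - number of groups).

```python
# Degrees of freedom as reported in F(2,75).
df_between, df_within = 2, 75

# One-way ANOVA: df_between = groups - 1, df_within = participants - groups.
n_groups = df_between + 1
n_participants = df_within + n_groups

print(f"{n_groups} groups, {n_participants} participants")
```

This is a quick consistency check when reading a report: three diets give 2 degrees of freedom between groups, as stated.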
Two-way ANOVA • We have just completed a one-way ANOVA but can extend it to classifying by two factors. This is known as two-way analysis of variance • Dependent = Weight Lost • Independents: Diet and Gender • Tests 3 null hypotheses: • Mean weight loss does not differ by diet • Mean weight loss does not differ by gender • There is no interaction between diet and gender What’s an interaction?
Means plot: reaction times after different drinks, by gender Mean reaction times after consuming coffee, water and beer were taken and the results by drink or gender were compared
Means/ line/ interaction plot No interaction between gender and drink. Lines are approximately parallel. Mean reaction time for men after water = 15 Mean reaction time for women after drinking coffee = 6
Means/ line/ interaction plot Interaction between gender and drink Mean reaction time for men after coffee = 23 Mean reaction time for women after drinking coffee = 12
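What "parallel lines" means numerically can be sketched with a table of cell means. All values below are hypothetical and are not read from the plots. The helper `gender_gaps` is an invented name for this example, and the check is descriptive only: whether an interaction is significant is decided by the interaction term in the two-way ANOVA.

```python
DRINKS = ["coffee", "water", "beer"]

# Hypothetical mean reaction times, keyed by (gender, drink).
parallel = {
    ("men", "coffee"): 10, ("men", "water"): 14, ("men", "beer"): 24,
    ("women", "coffee"): 7, ("women", "water"): 11, ("women", "beer"): 21,
}
crossing = {
    ("men", "coffee"): 20, ("men", "water"): 14, ("men", "beer"): 24,
    ("women", "coffee"): 9, ("women", "water"): 11, ("women", "beer"): 28,
}

def gender_gaps(cell_means):
    """Difference between men's and women's means for each drink."""
    return [cell_means[("men", d)] - cell_means[("women", d)] for d in DRINKS]

for label, table in [("parallel", parallel), ("crossing", crossing)]:
    gaps = gender_gaps(table)
    # A constant gap means parallel lines; a varying gap hints at an interaction.
    spread = max(gaps) - min(gaps)
    print(f"{label}: gaps = {gaps}, spread = {spread}")
```

In the first table the gap between men and women is the same for every drink, so the lines would be parallel. In the second the gap changes size and direction, which is the pattern an interaction produces.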
Means plot in SPSS Graphs → Legacy Dialogs → Line • Select the ‘Multiple’ option • Under ‘Lines Represent’, select ‘Other statistic’ and move the dependent variable across • Move the two categorical independent variables to the ‘Category Axis’ and ‘Define Lines by’ boxes. The variable in the ‘Category Axis’ box will be on the x-axis
Exercise 5: Interaction Is there an interaction between gender and diet?
Two-way ANOVA in SPSS Analyze → General Linear Model → Univariate. Move Gender into the Fixed Factors box with Diet
Exercise 6: Two way ANOVA with interaction • Run a two way ANOVA for gender and diet. Don’t forget to click on the Options box and request the estimated marginal means for Diet and Gender • Check the assumptions • Levene’s test for homogeneity of variance / look at the SDs within each group • Save the residuals and plot them: do they look normally distributed? • Are the main effects of gender and diet significant? • Is the interaction between the two significant?
Main effect of diet The three group averages (red lines) are compared to the overall average for everyone (grey line)
What if there is a significant interaction? • The main effects need to be discussed by group e.g. for males/ females separately • The best way to describe what is happening is by using the means plot • Separate ANOVAs can be carried out by group e.g. testing diet by gender
Splitting the file by group • To produce output separately by group: Data → Split File • Once ‘split file’ is activated, it will produce all output by group until you tell it not to! • Only do this if you have a significant interaction
Exercise 7: ANOVA by gender • Split the file by group as described on the previous slide • Run the ANOVA again (removing gender from the Fixed Factor list) • Is there a diet effect for males and/ or females? • If there is, what is it and which diets are different?
Exercise 7: Post hoc tests and reporting results If the ANOVA is significant, produce suitable post hoc tests and summarise differences using summary statistics by diet/ gender
Learning outcomes You should now: • Understand when to use an analysis of variance • Be able to carry out a one way analysis of variance in SPSS and interpret the output • Be able to conduct a post hoc test to compare differences between groups and interpret the output • Be able to carry out a two way analysis of variance in SPSS and interpret the output • Be able to investigate whether there is an interaction between two categorical explanatory variables in a two way analysis of variance • Be aware of assumptions needed for the analysis of variance model to be valid
Maths And Statistics Help Statistics appointments: Mon-Fri (10am-1pm) Statistics drop-in: Mon-Fri (10am-1pm), Weds (4-7pm) http://www.sheffield.ac.uk/mash
Resources All resources are available in paper form at MASH
Contacts Follow MASH on twitter: @mash_uos