480 likes | 603 Views
Exploring Group Differences. Before Break:. 1) Descriptive Statistics: Measures of central tendency Measures of variability Z-scores 2) Understanding statistical significance Hypothesis testing Alpha and p-values 3) Testing for relationships/associations between variables:
E N D
Before Break: • 1) Descriptive Statistics: • Measures of central tendency • Measures of variability • Z-scores • 2) Understanding statistical significance • Hypothesis testing • Alpha and p-values • 3) Testing for relationships/associations between variables: • Correlation (Pearson’s r) • Simple regression • Multiple regression
After Break: • 1) Testing for Group differences • T-tests • ANOVA • 2) Understanding statistical significance • Effect Size • Power • 3) Nonparametric statistics and ‘other’ common tests • Chi-square • Logistic regression If you have a firm grip on pre-break material – the second half of the course becomes much easier (in my opinion)
Quick review • I have a dataset that contains information on fitness and academic performance in middle-school children • I want to know if fitness is related to academic success • I’m going to use PACER laps to quantify fitness and ISAT science scores to quantify academic success • I can answer this question in various ways – let’s start with measures of association (correlations) • What would be my null and alternative hypothesis using a correlation?
Results • What is the relationship between aerobic fitness and science ISAT results? • p = 0.009, what does this mean? • Low chance of random sampling error • We would only see a correlation this strong, or stronger, 9 times out of 1000 due to random sampling error (due to chance)
Association • Association (and prediction) statistics like correlation and regression are useful, but can be limited • The other “half” of statistical testing is centered around determining group differences • For example, we could ask our fitness/academics question a different way and use a different set of statistics • Also useful in experiments (treatment vs control), comparing genders (males vs females), etc…
Example • Imagine I use PACER laps to split kids into two different groups • High Fitness (high number of laps) • Low Fitness (low number of laps) • NOTE: I took a continuous variable and made it into a categorical variable (nominal/ordinal) • Now I can ask the question a different way • What are my null and alternative hypotheses? • Remember, I believe that fitness is related to academic success
Example cont • HO: There is no difference in science scores between the high fitness and low fitness group • Notice, no difference would mean fitness has no effect • HA: There is a difference in science scores between the high fitness and low fitness group • A difference would indicate that fitness has some effect • This is simple enough – we know how to calculate and compare means in SPSS…
High vs Low Fitness: Mean • Conclusion…? Should I reject the null hypothesis? • Wait – could this difference be due to random sampling error?
Need for new statistical test • Is this difference due to random sampling error? • Due to the effect of random sampling, the two groups will NEVER have the exact same science scores • I need a way to determine if this difference is REAL or due to RSE • I need to use a statistical test that can determine group differences and provide me with a p-value
T-test • A t-test is a family of statistical tests designed to determine if differences exist between two groups (and ONLY two groups) • Based on t-scores (which are very similar to z-scores), should tip you off they are based on mean and SD • They test for ‘equality of means’ • If the two group means are equal – then there is no difference • 3 major types of t-tests • One sample t-test, independent samples t-test, paired-samples t-test
T-tests • One-sample t-test = • Compares mean of a single sample to known population mean • i.e., group of 100 people took IQ test, are they different from the population average? Do they have above average IQ? • Independent samples t-test = • Compares the means scores of two different groups of subjects • i.e., are science scores different between high fitness and low fitness • Paired-samples t-test = • Compares the mean scores for the same group of subjects on two different occasions • i.e., is the group different before and after a treatment? • Also called a dependent t-test or a repeated measures t-test • In all cases TWO group means are being compared
Independent Samples T-Test • Let’s start here, since we need to use this test for our fitness/science question • Independent Samples T-tests: • Used with a two-level, categorical, independent variable (High/Low Fitness) – ONLY two groups… • and with one continuous dependent variable (science ISAT scores) • Statistical assumptions – 1) data are normally distributed, 2) samples represent the population, 3) the variance of the two groups are similar (homoscedasticity of variance) NOTE: Same as correlation/regression – except we no longer have to worry about a ‘linear’ relationship since one of our variables is categorical (high/low fitness)
SPSS: Data format • In SPSS, the science scores are my continuous, dependent variable • I created the ‘high’ and ‘low’ fitness groups based on how many PACER laps each child completed • When I created them, I coded ‘high fitness’ as 0 and ‘low fitness’ as 1 • You need to recognize how your data are coded for a t-test
SPSS – T-test • Move dependent variable to “Test Variable” • Move your independent variable to “Grouping Variable” • Notice, it now has 2 question marks • SPSS needs to know which groups to compare • “Define Groups”
SPSS – T-test • Recall, ‘high fitness’ was 0, ‘low fitness’ was 1 • Manually enter these values into the box • When done, hit “continue’, then ‘ok’
T-test results • The first box will contain what you’ve already seen – the mean of the two groups: • Notice, n, mean, standard deviation (ignore SE) for each group • The next box is too big for one screen, so I’ve split it into two pieces…
SPSS results - Output • Recall, both groups need to have equal variance (homogeneity of variance, or homoscedasticity) • SPSS tests for this using “Levene’s Test” • Null hypothesis = There is equal variance • This means you do NOT want a p-value < 0.05
SPSS results - Output • If this Levene’s Test p-value is > 0.05 • Equal variances exist, use the top line of the table • If this Leven’s Test p-value is < or = 0.05 • Equal variance does not exist, use the bottom line • Becomes harder to find a statistically significant result
df – Degrees of Freedom • The table also shows df, or ‘degrees of freedom’ • df is used to calculate the t-score and p-value for the t-test • df = n – 1 • For each group you have, subtract 1 from the sample • We have two groups: • High Fitness, n = 176, so 176 – 1 = 175 • Low Fitness, n = 98, so 98 – 1 = 97 • Total n = 176 + 98 = 274, we have 2 groups so… • Total df = 274 – 2 = 272
Degrees of Freedom • df is important to understand if you are calculating the p-values by hand – we are NOT • All you need to know now is that: • Larger sample size = ↑ df • More groups = ↓ df • You want large df because it reduces your chance of random sampling error (a large sample) and increases the chance you’ll find statistically significant results • This becomes more important beyond t-tests, since we can have several groups (not just 2)
df in our example • Notice, the df in our example is 272 (274 subjects minus our two groups (high and low fitness) • If you do not have equal variances, SPSS ‘downgrades’ your df, making it more difficult to find statistically significant results
Before we move on… • Questions about equality of variance test? df? • Remember, we’re trying to determine if the difference between the two groups is real – or due to RSE • What we know so far: • And, the two groups do have equal variance
More results • Here is the important stuff (remember, using top line): • Our two groups (high/low fitness) had a mean difference of 12.2 on the science ISAT • 239.1 – 226.9 = 12.2 • This difference is statistically significant, p = 0.001
Decisions Questions about t-test results? • HO: There is no difference in science scores between the high fitness and low fitness group • HA: There is a difference in science scores between the high fitness and low fitness group • Decision? • Results: The high fitness group scored higher than the low fitness group on their science ISAT test by 12.2 points. This difference was statistically significant, t (272) 3.262, p = 0.001. • Usually report the t value of the test and the degrees of freedom in the paper (from table)
One more thing… • Notice in the t-test table that we also were provided with a 95% confidence interval: • 95% confidence intervals are a statistic available from most tests, and are related to p-values. • Lower Bound = 4.8, upper bound = 19.5
95% confidence intervals • Confidence intervals are similar to p-values • Remember, p-values indicate probability of random sampling error • We want low p-values, which indicate a low probability of random sampling error • We most often use a p-value cutoff of 0.05, meaning we like to be 0.95 (or 95% confident) that this was NOT due to random sampling error • Confidence intervals give you a similar type of information, but in a more practical sense • Many people prefer confidence intervals over p-values
95% confidence intervals • Remember, in statistics we are using samples to try and figure out information about the population • When we calculate a mean for a sample, we are really trying to understand what the REAL population mean is • But, due to random sampling error, we always know that our sample mean is different from the real population mean • Example – mean IQ score for all 7 billion humans is 100 • Sample 1 of 100 humans = 102.1, Sample 2 = 105.3, Sample 3 = 98.2, etc… • Random sampling error
Pretend we keep on drawing more and more samples until we got 100 different samples and 100 different lines on this chart IQ 1 3 2 If we did that, would there be a pattern to where the lines were drawn? Sample 1 Mean = 102.1 Sample 2 Mean = 105.3 Sample 3 Mean = 98.2 Would ALL the lines be so close to the population mean of 100? X = 100 SD = 15 145 55 70 85 100 115 130 Distribution of IQ scores from the entire population
However, a 95% confidence interval would tell you where 95% of the 100 lines fell Not all samples will be close to 100, just due to random sampling error X = 100 SD = 15 95% Confidence Interval 145 55 70 85 100 115 130 Distribution of IQ scores from the entire population
But notice, the ‘more confident’ we want to be, the wider the gap gets Could also make a 99% confidence interval if we wanted to Usually, people stick with a 95% confidence interval (since we usually use a p-value of 0.05) 99% Confidence Interval 145 55 70 85 100 115 130 Distribution of IQ scores from the entire population
In this example, a 95% confidence interval indicates that we are 95% certain that the REAL population mean falls between these two values X = 100 SD = 15 We can use a 95% confidence interval for virtually any population parameter we want to – such as a correlation coefficient, a regression slope, or a mean difference between two groups (like with our t-test) 95% Confidence Interval 145 55 70 85 100 115 130 Distribution of IQ scores from the entire population
Back to our t-test • Our 95% confidence interval: • Notice is says, “Interval of the Difference” • We are 95% certain that the real difference is AT LEAST as big as 4.8 and might be up to 19.5 points between our High and Low Fitness Groups • Our confidence level can never be 100%, so there is always a chance the real population difference is outside of this range (just like p can never be 0)
Confidence Intervals and p-values • These two values are connected because: • Both related to RSE • Both calculated using n (and df) • A low p-value (low chance of random sampling error) will result in a smaller (more narrow) confidence interval – we can be more confident • A larger p-value will result in a wider confidence interval – we are less confident Questions on confidence intervals?
One more example t-test • Instead of using Aerobic fitness, let’s use flexibility • I split my sample into High Flexibility and Low Flexibility groups (based on sit and reach test) • Now, I’ll run a t-test to see if the High Flexibility kids score higher on their science tests than the Low Flexibility kids
What are my hypotheses? • HO: There is no difference in science scores between the high flexibility and low flexibility group • HA: There is a difference in science scores between the high flexibility and low flexibility group
T-test results: Flexibility • We can see that the high flexibility group has a higher Science ISAT score by about 5, but is this difference statistically significant???
T-test results: Flexibility • Levene’s Test p = 0.521 • What does this mean? • df = 285 • What is our sample size?
T-test results • Notice the mean difference (difference between High/Low groups) and the 95% confidence interval • I have intentionally removed the p-value for this t-test • Is there a statistically significant difference between the two groups? Is the p-value < or = 0.05?
T-test results • Remember, to reject the null hypothesis we have to be reasonably certain that the two groups are different • If this was the case, the difference between the two groups could NOT be 0 • If the mean difference is 0, the two groups are identical • When the 95% confidence interval INCLUDES 0, we can’t be 95% certain that there is a group difference – and therefore, p is > 0.05
T-test results • Our 95% confidence interval includes 0 (one number is negative and the other is positive) • Therefore, we can’t be 95% certain the real group difference is NOT 0 • p = 0.311, we can’t be sure this is not due to RSE
95% CI and p • If your 95% CI includes 0, your p-value will NOT be less than or equal to 0.05 • Because both statistics are evaluating the chance of RSE • If your 95% CI does not include 0 (both numbers are positive or both are negative), then we can be confident that the two groups are not the same • This means that p < or = 0.05 Questions…?
Upcoming… • In-class activity • Homework: • Cronk re-read 6.1, complete 6.3 (skip 6.2 for now) • Holcomb Exercises 35 and 37, 38, 39 • More t-tests next week • Single sample t-test • Paired samples t-test (repeated measures t-test)
Example In-Class, 10 minutes • Go to Blackboard and open the SPSS dataset • ‘Fitness and Academics Reduced’ (week 7) • Run two different independent samples t-tests • Determine if kids who are aerobically fit (using PACER) score higher in reading or math than kids who have low fitness • Write down your results in this format (x2): • T = XX, df = XX, p = XX • Mean difference = XX, 95%CI (XX to XX)
Results of two t-tests • Reading (equal variances NOT assumed): • t = 4.856, df = 411.2, p < 0.0005 • Mean difference = 9.8, 95%CI (5.9 to 13.8) • Math (equal variances assumed): • t = 4.021, df = 837, p < 0.0005 • Mean difference = 8.8, 95%CI (4.5 to 13.0)