Chapter 13 Comparing Two Populations: Independent Samples
Comparing more than 1 group • Often psychologists are interested in comparing treatments, procedures, or conditions • Which drug is better in treating depression, Prozac or Zoloft? • Is the whole-language approach to teaching reading more effective than traditional methods?
A Research Study • We are interested in the treatment of major depression • Compare two drug therapies, Prozac and Zoloft • Randomly select 16 people with major depression, • 8 receive Prozac, 8 receive Zoloft
Measuring Depression • Beck Depression Inventory (BDI) developed by Aaron Beck and his colleagues • An “inventory” is a series of questions that are answered by the patient and the patient’s doctor • Each answer contributes to an overall score • That score is a “measure” of depression
Scores on the BDI • Prozac Group 37 33 41 37 48 40 31 37 • Zoloft Group 36 39 44 49 41 48 44 35
Hypothesis test of Prozac vs. Zoloft • 1. State and Check Assumptions • Normally distributed? – don’t know • σ? – don’t know • Interval data? – probably • Independent random sample? – yes
Hypothesis test of Prozac vs. Zoloft • 2. Hypotheses • H0: μ1 = μ2 (the effectiveness of Prozac and Zoloft is the same), or equivalently μ1 – μ2 = 0 (the difference between the effectiveness of Prozac and Zoloft is 0) • HA: μ1 ≠ μ2 (the effectiveness of Prozac and Zoloft is not equal), or equivalently μ1 – μ2 ≠ 0 (there is a difference between the effectiveness of Prozac and Zoloft)
Hypothesis test of Prozac vs. Zoloft • 3. Choose test statistic • parameter of interest - μ • 2 groups • independent samples • Not sure about Normal Distribution • Don’t know Population Standard Deviation
Hmm… • What do we know about μ1 – μ2? • What do we know about M1 – M2? • Since we don’t know μ1 or μ2, we’ll concentrate on M1 – M2
Sampling Distribution • The sampling distribution of M1 – M2 would help us predict values from random samples • Three facts: • 1. The mean of the sampling distribution of M1 – M2 is equal to μ1 – μ2 • 2. When the 2 populations have the same variance σ², the standard deviation of the sampling distribution is $\sigma_{M_1 - M_2} = \sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$ • 3. By the Central Limit Theorem (CLT), the sampling distribution of M1 – M2 is approximately normal when the samples are large enough
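Facts 1 and 2 can be checked with a small simulation. The sketch below is only an illustration: the population values (μ1 = 38, μ2 = 42, σ = 5) and the function name `simulate_mean_differences` are assumptions made for the example, not values from the slides, and the populations are assumed to be normal.

```python
import math
import random
import statistics

def simulate_mean_differences(mu1, mu2, sigma, n1, n2, reps=20000, seed=13):
    """Draw `reps` pairs of independent samples and return each M1 - M2."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        m1 = statistics.fmean([rng.gauss(mu1, sigma) for _ in range(n1)])
        m2 = statistics.fmean([rng.gauss(mu2, sigma) for _ in range(n2)])
        diffs.append(m1 - m2)
    return diffs

# Hypothetical population values, chosen only for illustration.
MU1, MU2, SIGMA, N1, N2 = 38.0, 42.0, 5.0, 8, 8

diffs = simulate_mean_differences(MU1, MU2, SIGMA, N1, N2)
simulated_mean = statistics.fmean(diffs)
simulated_sd = statistics.stdev(diffs)
# Fact 2: sigma * sqrt(1/n1 + 1/n2) when both populations share sigma.
theoretical_sd = SIGMA * math.sqrt(1 / N1 + 1 / N2)

print(f"mean of M1 - M2: {simulated_mean:.3f} (theory: {MU1 - MU2:.3f})")
print(f"SD of M1 - M2:   {simulated_sd:.3f} (theory: {theoretical_sd:.3f})")
```

With many repetitions the simulated mean and standard deviation should land close to the theoretical values, though never exactly on them.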
So… • If we knew σ, we could transform the statistic M1 – M2 to a z score and use Table A, but • We don’t know σ • But we know s1 and s2, that is, the standard deviations of the two samples • Can we use them?
NO • Not with a z, • But we can use a t distribution • That is to say: the difference in sample means, divided by the estimated SEM, is distributed as a t
Hypothesis test of Prozac vs. Zoloft • 1. State and Check Assumptions • Normally distributed? – don’t know • σ? – don’t know • Interval data? – probably • Independent random sample? – yes • Homogeneity of Variance (HoV): are the variances of the two populations equal? – don’t know, but we’ll assume they are (can we check this out?)
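More on the estimated SEM • The estimated SEM is $s_{M_1 - M_2} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, where $s_p^2 = \frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}$ • $s_p^2$ is called “pooled variance” • it is the variance of the two samples, put together, or pooled • $s_1^2(n_1 - 1)$ looks familiar, doesn’t it? • (it’s variance times n – 1)
SS(X1), right? • $s_1^2(n_1 - 1) = SS(X_1)$ • Thus: $s_{M_1 - M_2} = \sqrt{\frac{SS(X_1) + SS(X_2)}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
df in a 2-sample t-test • Since each sample’s variance estimate has n – 1 degrees of freedom, then • The 2-sample t-test has (n1 – 1) + (n2 – 1) df, or • df = n1 + n2 – 2
estimated SEM, again • So, when we left the est SEM, we had: $s_{M_1 - M_2} = \sqrt{\frac{SS(X_1) + SS(X_2)}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$ • But, n1 + n2 – 2 = df, right? Thus: $s_{M_1 - M_2} = \sqrt{\frac{SS(X_1) + SS(X_2)}{df}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$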
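The identity between variance times (n – 1) and SS(X), and the resulting pooled variance, can be checked numerically on the BDI scores from the slides. This is a minimal sketch using only the Python standard library; the helper name `sum_of_squares` is introduced here for illustration.

```python
import math
import statistics

# BDI scores from the slides.
prozac = [37, 33, 41, 37, 48, 40, 31, 37]
zoloft = [36, 39, 44, 49, 41, 48, 44, 35]

def sum_of_squares(scores):
    """SS(X): sum of squared deviations from the sample mean."""
    mean = statistics.fmean(scores)
    return sum((x - mean) ** 2 for x in scores)

# s^2 * (n - 1) should match SS(X) for each sample (up to rounding error).
for name, scores in (("Prozac", prozac), ("Zoloft", zoloft)):
    variance_times_df = statistics.variance(scores) * (len(scores) - 1)
    print(name, sum_of_squares(scores), round(variance_times_df, 6))

# Pooled variance: the two SS values added together, divided by the total df.
df = len(prozac) + len(zoloft) - 2
pooled_variance = (sum_of_squares(prozac) + sum_of_squares(zoloft)) / df

# Estimated SEM of the difference between the two sample means.
estimated_sem = math.sqrt(pooled_variance * (1 / len(prozac) + 1 / len(zoloft)))

print("pooled variance:", pooled_variance)
print("estimated SEM:", round(estimated_sem, 3))
```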
Back to the hypothesis test • 4. Set Significance Level • α = .05 • Non-directional hypothesis with df = n1 + n2 – 2 = 8 + 8 – 2 = 14 • Critical value from Table C: tcrit = 2.145, so we reject H0 if t ≤ –2.145 or t ≥ 2.145
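The decision rule in step 4 is easy to express in code. In this sketch the critical value is simply copied from the slide (Table C) rather than computed, and the function names are hypothetical.

```python
ALPHA = 0.05
T_CRIT = 2.145  # two-tailed critical value for df = 14, as read from Table C on the slide

def degrees_of_freedom(n1, n2):
    """df for the 2-sample t-test: (n1 - 1) + (n2 - 1)."""
    return n1 + n2 - 2

def rejects_null(t, t_crit=T_CRIT):
    """Non-directional test: reject H0 when t <= -t_crit or t >= t_crit."""
    return abs(t) >= t_crit

df = degrees_of_freedom(8, 8)
print(df)
print(rejects_null(-2.5))  # beyond the lower critical value
print(rejects_null(1.0))   # between the two critical values
```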
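Hypothesis test of Prozac vs. Zoloft • 5. Compute Statistic • We need: M1, M2, SS(X1), SS(X2), n1, and n2 • From the BDI scores: M1 = 38, M2 = 42, SS(X1) = 190, SS(X2) = 188, n1 = n2 = 8 • Pooled variance: $s_p^2 = \frac{190 + 188}{14} = 27$ • Estimated SEM: $s_{M_1 - M_2} = \sqrt{27\left(\frac{1}{8} + \frac{1}{8}\right)} = \sqrt{6.75} \approx 2.60$ • $t = \frac{M_1 - M_2}{s_{M_1 - M_2}} = \frac{38 - 42}{2.60} \approx -1.54$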
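Step 5 can be sketched in Python using the BDI scores from the slides. The function name `independent_t` is an assumption made for this example; it implements the pooled-variance t statistic under the homogeneity of variance assumption.

```python
import math
import statistics

def independent_t(sample1, sample2):
    """Pooled-variance t statistic and df for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = statistics.fmean(sample1), statistics.fmean(sample2)
    # SS(X) recovered as variance times (n - 1).
    ss1 = statistics.variance(sample1) * (n1 - 1)
    ss2 = statistics.variance(sample2) * (n2 - 1)
    df = n1 + n2 - 2
    sem = math.sqrt((ss1 + ss2) / df * (1 / n1 + 1 / n2))
    return (m1 - m2) / sem, df

# BDI scores from the slides.
prozac = [37, 33, 41, 37, 48, 40, 31, 37]
zoloft = [36, 39, 44, 49, 41, 48, 44, 35]

t, df = independent_t(prozac, zoloft)
print(f"t({df}) = {t:.2f}")  # prints: t(14) = -1.54
```

Because group 1 is Prozac, a negative t means the Prozac group had the lower mean BDI score in this sample.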
Hypothesis test of Prozac vs. Zoloft • 6. Draw Conclusions • Because our t does not fall within the rejection region, we cannot reject H0 • We conclude that we did not find any evidence that Prozac and Zoloft differ in their effectiveness in treating depression
What if? • What if we have unequal sample sizes?
Unequal Sample Sizes • In the previous example, n1 = n2 = 8, but • What if n1 ≠ n2? • In this case the calculation of the SEM has to weight the two samples differently • But, since the pooled variance is already a weighted mean of the two sample variances, we’re OK
Just so we’re on the same page • If n1 is larger than n2, then n1 – 1 will be larger than n2 – 1
So… • If n1 is larger than n2, then $s_1^2(n_1 - 1)$ will be weighted more than $s_2^2(n_2 - 1)$ in the pooled variance
This makes sense • If we make the homogeneity of variance assumption (the sampled populations have the same variance), then • The best estimate of the population standard deviation will use information from both samples, • But when we have more observations in one sample than the other, then we have more information from that sample than the other • We should use that additional information, which is precisely what weighting accomplishes
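The weighting can be seen with a small numeric example. The summary values below (variances of 30 and 10, sample sizes of 21 and 6) are hypothetical and chosen only to make the arithmetic easy to follow; the function name is likewise an assumption.

```python
def pooled_variance_from_summaries(var1, n1, var2, n2):
    """Weighted mean of two sample variances, with weights n - 1."""
    w1, w2 = n1 - 1, n2 - 1
    return (var1 * w1 + var2 * w2) / (w1 + w2)

# Hypothetical summaries: a large sample with variance 30, a small one with variance 10.
var1, n1 = 30.0, 21
var2, n2 = 10.0, 6

pooled = pooled_variance_from_summaries(var1, n1, var2, n2)
unweighted = (var1 + var2) / 2

print("pooled (weighted) variance:", pooled)
print("unweighted mean of variances:", unweighted)
```

The pooled value sits closer to the variance of the larger sample than the simple average does, which is the extra information the weighting is meant to use.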
Effect size estimates • After conducting a t-test, you should report: • t • df • p • But, it is becoming a standard practice to report effect size as well (Cohen’s d is a good measure)
Effect Size review • Effect size – the strength of the relationship (between IV and DV) in the population, or, the degree of departure from the null hypothesis • Important points: • rejecting the null hypothesis doesn’t imply a large effect, and • failing to reject the null does not mean a small effect
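As an illustration, Cohen’s d can be computed for the BDI scores from the slides. The slides do not give a formula for d, so this sketch assumes one common definition: the difference in sample means divided by the pooled standard deviation.

```python
import math
import statistics

def cohens_d(sample1, sample2):
    """Cohen's d using the pooled standard deviation (an assumed definition)."""
    n1, n2 = len(sample1), len(sample2)
    ss1 = statistics.variance(sample1) * (n1 - 1)
    ss2 = statistics.variance(sample2) * (n2 - 1)
    pooled_sd = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    return (statistics.fmean(sample1) - statistics.fmean(sample2)) / pooled_sd

# BDI scores from the slides.
prozac = [37, 33, 41, 37, 48, 40, 31, 37]
zoloft = [36, 39, 44, 49, 41, 48, 44, 35]

d = cohens_d(prozac, zoloft)
print(f"d = {d:.2f}")
```

Note that d can be sizable even when the t-test fails to reject the null hypothesis, which is the point the slide makes about failing to reject not implying a small effect.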
Example (from Rosenthal and Rosnow, 1991 – a great book on research methodology) • Smith conducts an experiment with 40 learning disabled children • half undergo special training (“experimental group”) and • half receive no special training (“control group”) • She reports that the experimental group improved more than the control group (p < .05)
But • Jones is skeptical about Smith’s results and attempts to repeat (replicate) the experiment with 20 children, • half in the experimental and • half in the control group • He reports a p > .10, and claims that Smith’s results are not replicable
As you can see • Even though Jones did not reject the null hypothesis, he had the same effect size as Smith • Jones lacked power (but Smith had pretty low power as well)
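Statistic = effect size × size of study • $t = \frac{M_1 - M_2}{s_p} \times \frac{1}{\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$ • Effect size: $\frac{M_1 - M_2}{s_p}$ • Size of study: $\frac{1}{\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$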
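The Smith and Jones example can be sketched numerically. The group sizes follow the slides (20 per group for Smith, 10 per group for Jones), but the effect size d = 0.7 is a hypothetical value picked for illustration, since the slides do not report it. The sketch assumes d is defined with the pooled standard deviation, so that t equals d divided by sqrt(1/n1 + 1/n2).

```python
import math

def t_from_effect_size(d, n1, n2):
    """t = effect size (d) times size of study, 1 / sqrt(1/n1 + 1/n2)."""
    size_of_study = 1 / math.sqrt(1 / n1 + 1 / n2)
    return d * size_of_study

D = 0.7  # hypothetical effect size, identical in both studies

t_smith = t_from_effect_size(D, 20, 20)  # 40 children, half per group
t_jones = t_from_effect_size(D, 10, 10)  # 20 children, half per group

print(f"Smith: t = {t_smith:.2f}")
print(f"Jones: t = {t_jones:.2f}")
```

The same effect size produces a smaller t in the smaller study, so a non-significant replication does not by itself show that the effect differs.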
What if one or more of the assumptions are violated? • Gross, meaning large, violations may cause the real α to be different from the stated significance level • Gross violations of the normality and HoV assumptions will cause these problems with a t-test
Alternative Test • When gross violations of the normality or homogeneity of variance assumptions of a 2-independent-samples t-test become apparent, • Use a Rank Sum T test
Rank Sum T test • Rank all the scores (across both groups) • Sum the ranks of each group (T = the sum of the ranks of group 1) • Turns out that the T sampling distribution is approximately normal
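The ranking and summing steps can be sketched as follows, using the BDI scores from the slides. Two details here are assumptions not stated on the slide: tied scores receive the average of the ranks they span, and the normal approximation uses the standard rank-sum mean n1(n1 + n2 + 1)/2 and standard deviation sqrt(n1·n2·(n1 + n2 + 1)/12), without a correction for ties.

```python
import math

def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_T(group1, group2):
    """T = sum of the ranks of group 1 after ranking all scores together."""
    ranks = average_ranks(list(group1) + list(group2))
    return sum(ranks[:len(group1)])

def normal_approx_z(T, n1, n2):
    """z score for T under the normal approximation (no tie correction)."""
    mean_T = n1 * (n1 + n2 + 1) / 2
    sd_T = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (T - mean_T) / sd_T

# BDI scores from the slides.
prozac = [37, 33, 41, 37, 48, 40, 31, 37]
zoloft = [36, 39, 44, 49, 41, 48, 44, 35]

T = rank_sum_T(prozac, zoloft)
z = normal_approx_z(T, len(prozac), len(zoloft))
print(f"T = {T}, z = {z:.2f}")
```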
When to use Rank Sum T • Turns out, the t-test is fairly ROBUST to violations of HoV. • But not large violations… • What is a large violation of HoV? • Recommendation: if one sample variance is more than 10 times the other, use the Rank Sum T test
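A rough check of this rule of thumb can be written in a few lines. The sketch reads "10x" as the ratio of the larger sample variance to the smaller one, which is an interpretation of the slide rather than something it states outright; the function names and the `threshold` parameter are hypothetical.

```python
import statistics

def variance_ratio(sample1, sample2):
    """Larger sample variance divided by the smaller one (both must be nonzero)."""
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    return max(v1, v2) / min(v1, v2)

def suggest_test(sample1, sample2, threshold=10.0):
    """Apply the 10x rule of thumb, read here as a ratio of sample variances."""
    if variance_ratio(sample1, sample2) > threshold:
        return "rank sum T"
    return "t-test"

# BDI scores from the slides.
prozac = [37, 33, 41, 37, 48, 40, 31, 37]
zoloft = [36, 39, 44, 49, 41, 48, 44, 35]

ratio = variance_ratio(prozac, zoloft)
print(f"variance ratio = {ratio:.3f}")
print("suggested test:", suggest_test(prozac, zoloft))
```

For the BDI data the two sample variances are nearly equal, so under this reading the pooled-variance t-test remains a reasonable choice.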