Sociology 601 Class 9: September 29, 2009 • 7.2: Difference between two large sample proportions. • 7.3: Small sample comparisons for two independent groups. • Difference between two small sample means • Difference between two small sample proportions • Stata practice
7.3: Small sample comparison: treatments for depression A clinical psychologist wants to choose between two therapies for treating severe cases of mental depression. • Therapy A: existing therapy, 6 subjects • improvement scores: +10 +10 +20 +20 +30 +30 • Ybar = 20 • standard deviation = √(Σ(Yi – Ybar)² / (n – 1)) = √80 = 8.944 • Therapy B: new therapy, 3 subjects • improvement scores: +30 +30 +45 • Ybar = 35 • standard deviation = √(Σ(Yi – Ybar)² / (n – 1)) = √75 = 8.660
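The summary statistics on this slide can be reproduced directly from the raw scores. A minimal sketch in Python, using only the standard library; the variable names are illustrative and not part of the lecture:

```python
from statistics import mean, stdev

# Improvement scores as listed on the slide
therapy_a = [10, 10, 20, 20, 30, 30]
therapy_b = [30, 30, 45]

# stdev() divides by n - 1, matching the slide's formula
mean_a, sd_a = mean(therapy_a), stdev(therapy_a)
mean_b, sd_b = mean(therapy_b), stdev(therapy_b)

print(f"Therapy A: n={len(therapy_a)}, mean={mean_a}, sd={sd_a:.3f}")
print(f"Therapy B: n={len(therapy_b)}, mean={mean_b}, sd={sd_b:.3f}")
```

The two standard deviations should come out as √80 and √75, the values fed to ttesti on the next slide.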
7.3 Interactive STATA command for ttesti
* an immediate test for the depression exercise in lecture 7.3
* n1, mean1, s.d.1, n2, mean2, s.d.2
ttesti 6 20 8.944 3 35 8.660

Two-sample t test with equal variances
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |       6          20    3.651373       8.944    10.61385    29.38615
       y |       3          35    4.999853        8.66    13.48737    56.51263
---------+--------------------------------------------------------------------
combined |       9          25    3.726718    11.18015    16.40617    33.59383
---------+--------------------------------------------------------------------
    diff |                 -15    6.267643               -29.82062   -.1793794
------------------------------------------------------------------------------
Degrees of freedom: 7

                  Ho: mean(x) - mean(y) = diff = 0

     Ha: diff < 0               Ha: diff != 0              Ha: diff > 0
       t =  -2.3932                t =  -2.3932              t =  -2.3932
   P < t =   0.0240          P > |t| =   0.0479          P > t =   0.9760
7.3 STATA command for ttest, using a data set
. * next, perform the t-test
. * ttest for two means using a stored data set
. ttest impscore, by(treat)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |       6          20    3.651484    8.944272    10.61356    29.38644
       2 |       3          35           5    8.660254    13.48674    56.51326
---------+--------------------------------------------------------------------
combined |       9          25     3.72678    11.18034    16.40603    33.59397
---------+--------------------------------------------------------------------
    diff |                 -15    6.267832               -29.82107   -.1789331
------------------------------------------------------------------------------
Degrees of freedom: 7

                  Ho: mean(1) - mean(2) = diff = 0

     Ha: diff < 0               Ha: diff != 0              Ha: diff > 0
       t =  -2.3932                t =  -2.3932              t =  -2.3932
   P < t =   0.0240          P > |t| =   0.0479          P > t =   0.9760
Significance test for treatment of depression: 1. Assumptions • Samples are chosen at random from their respective populations. • Variables have an interval scale (the difference between 10 and 20 is the same as the difference between 20 and 30). • The underlying populations are normally distributed. • (How bad is it if these populations are not normally distributed?) • The standard deviations of the two populations are the same (!!)
Significance test for treatment of depression: 2. Hypothesis • The null hypothesis is that the two treatments have the same mean effect in the population, so any difference between the samples is due to random error. • H0: μ2 – μ1 = 0
Significance test for treatment of depression: 3. Test Statistic • Test statistic: t = ((Ybar2 – Ybar1) – 0) / se(Ybar2 – Ybar1) • To solve this equation, we need an equation for the standard error of the difference between two sample means. There are two alternatives: • Do not assume that the populations have equal variances. (Consistent with a strict interpretation of H0, but sometimes one of the small samples turns out to have a tiny or huge sample standard deviation.) • Assume that the populations have equal variances. (Not really part of H0, but produces more stable results.)
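For reference, the two standard-error choices described on this slide are usually written as follows. This is the standard textbook form, not a transcription of the slide; the symbols s_1, s_2 (sample standard deviations), n_1, n_2 (sample sizes) and s_p (pooled standard deviation) are notation assumed here.

```latex
% Test statistic for H0: mu_2 - mu_1 = 0
t = \frac{(\bar{Y}_2 - \bar{Y}_1) - 0}{\widehat{se}}

% Alternative 1: do not assume equal variances
\widehat{se} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

% Alternative 2: assume equal variances (pooled), with df = n_1 + n_2 - 2
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}},
\qquad
\widehat{se} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
```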
Back to the test statistic • This is how you would do a t-test for comparing small sample means, assuming equal variance.
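The equal-variance calculation can be sketched in Python from the summary statistics of the depression example. This is a minimal illustration using the usual pooled-variance formula; the variable names are assumptions made for this sketch.

```python
from math import sqrt

# Summary statistics from the depression example (group 1 = A, group 2 = B)
n1, mean1, var1 = 6, 20.0, 80.0   # variance = standard deviation squared
n2, mean2, var2 = 3, 35.0, 75.0

# Pooled variance: average of the sample variances, weighted by n - 1
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df

# Standard error of the difference between the two sample means
se_diff = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)

# Test statistic for H0: mu2 - mu1 = 0
t_stat = (mean2 - mean1 - 0) / se_diff

print(f"df = {df}, se = {se_diff:.4f}, t = {t_stat:.4f}")
```

The sign is positive here because the difference is taken as group 2 minus group 1; Stata reports mean(1) – mean(2), so its t of -2.3932 has the same magnitude with the opposite sign.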
Significance test for treatment of depression: 4. P value • t = 2.393, df = 7. • Table on p. 669, use the row for df = 7. • Move across columns to the t-scores just below and just above 2.393. These are in the third and fourth columns. • Find the one-sided p-value (in subscripts) for the bracketing t-scores. I determine .025 > p > .01 for a one-sided test. • In this case, we want a two-sided test, so we double the p-value. .05 > p > .02 for a two-sided test.
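The table only brackets the P-value; an exact figure needs the area under the t density. A rough sketch in Python that integrates the density of Student's t numerically with Simpson's rule, in place of a statistics library; the function names and the step count are choices made for this illustration.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided P-value via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total   # P(-|t| < T < |t|)
    return 1 - central_area

p_two = two_sided_p(2.3932, 7)
p_one = p_two / 2
print(f"one-sided p = {p_one:.4f}, two-sided p = {p_two:.4f}")
```

Both values should fall inside the brackets read from the table and agree with the P-values in the Stata output (0.0240 one-sided, 0.0479 two-sided).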
Significance test for treatment of depression: 5. Conclusion • p < .05 for a two-sided test. • Because a difference this great would occur less than 5% of the time purely by chance if the null hypothesis were true, we reject the null hypothesis and conclude that the populations do not have the same mean. • In practical terms, the improvement scores average 15 points higher under treatment B. We don’t know the metric of the improvement scores, but a 15 point difference is almost as large as the average improvement score for treatment A, and is larger than the standard deviation of the improvement scores.
Confidence interval for a small sample comparison • As expected, the 95% confidence interval does not include 0
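The interval can be sketched as (Ybar2 – Ybar1) ± t × se, using the pooled standard error from the t-test and the t-table value for df = 7. A minimal Python illustration; the rounded inputs are taken from the earlier slides, and 2.365 is the usual tabled value for 95% confidence with 7 degrees of freedom.

```python
# Values taken from the slides: difference in means and its standard error
diff = 35 - 20        # Ybar2 - Ybar1
se_diff = 6.2678      # pooled standard error from the t-test
t_crit = 2.365        # t-table value for df = 7, 95% confidence

margin = t_crit * se_diff
lower, upper = diff - margin, diff + margin
print(f"95% CI for mu2 - mu1: ({lower:.2f}, {upper:.2f})")
```

Stata reports the same interval with the signs reversed (-29.82 to -0.18) because it takes the difference as mean(1) – mean(2). Either way the interval excludes 0, consistent with rejecting H0 at the .05 level.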
Small-sample inference: comparison of population proportions With a small-sample comparison of population proportions, we are in the same fix that we were in for a test of a single population proportion. • With a categorical outcome, we cannot assume that the population has a normal distribution. • With a small sample size, the central limit theorem cannot assure us that the sampling distribution is normal. • Our only option with small sample proportions is to painstakingly calculate the probability of each outcome by hand (or have STATA do it).
Small sample inference: Comparing population proportions • A recent study compared adults who had been raised as children in lesbian families with adults who had been raised by heterosexual mothers. • In a sample from 20 heterosexual mothers, 4 adult children reported ever having a same-sex sexual attraction, and 16 did not. • In a sample from 25 lesbian mothers, 9 adult children reported ever having a same-sex sexual attraction, and 16 did not. • Are children of lesbian mothers more or less likely than children of other mothers to report a same-sex sexual attraction?
STATA command for Fisher’s Exact Test using TABI
. * immediate Fisher’s exact test for population proportions
. tabi 4 16 \ 9 16, exact

           |          col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |         4         16 |        20
         2 |         9         16 |        25
-----------+----------------------+----------
     Total |        13         32 |        45

           Fisher's exact =                 0.327
   1-sided Fisher's exact =                 0.200
STATA command for Fisher’s Exact Test using an existing data set
* Fisher's exact test using a data set
. tabulate lbimom attract, exact

           |        attract
    lbimom |         0          1 |     Total
-----------+----------------------+----------
         0 |        16          4 |        20
         1 |        16          9 |        25
-----------+----------------------+----------
     Total |        32         13 |        45

           Fisher's exact =                 0.327
   1-sided Fisher's exact =                 0.200
Hypothesis test using Fisher’s exact test: • Assumptions: We assume that the observations are taken from random samples, and that the mothers and their adult children fall into one category or the other, but not both. • Hypothesis: There is no difference in the proportions of adult children who report a same-sex attraction, based on the sexual orientation of the mother. H0: π2 – π1 = 0 • Test statistic: none • P-value: The null as stated produces a 2-tailed p-value of 0.327. (If we had stated a one-sided hypothesis: “Children of lesbian mothers are no more likely to have a same-sex attraction than children of other mothers”, then p = .200.) • Conclusion: do not reject the null hypothesis that the two populations have the same proportion reporting a same-sex attraction.
Formula for Fisher’s exact test:
• This formula is not for any homework or test, but it may help you understand what is happening.
• Given the following table:

Mother        attract    no attract   TOTAL
B/L mom           9          16       R1 = 25
No B/L mom        4          16       R2 = 20
TOTAL         C1 = 13     C2 = 32     n = 45

• The “null” expected count for the underlined cell (row 1, column 1) is R1C1 / n = 25 × 13 / 45 ≈ 7.2
• With plenty of algebra, the binomial expansion solves to…
• Pr(O11=9, O12=16, O21=4, O22=16) = (R1! R2! C1! C2!) / (n! 9! 16! 4! 16!) = .1356
• Calculate this equation for all possible 2X2 distributions based on the observed row and column totals. (In this case, there are 14 possible: O11 can be anything from 0 to 13.)
• For a two-tailed test, sum the probabilities of all tables that are no more likely than the observed table.
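The enumeration described above can be sketched in a few lines of Python. This is an illustration of the procedure on the slide, not Stata's implementation; the function name and argument order are assumptions made for this sketch. Working with integer counts of tables (the numerators of the probabilities) avoids rounding problems when comparing each table with the observed one.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Return (two-sided p, P(top-left cell >= a)) for the table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    lo, hi = max(0, c1 - r2), min(c1, r1)   # feasible values for the top-left cell

    # Number of ways to fill each table with these row and column totals
    ways = {k: comb(r1, k) * comb(r2, c1 - k) for k in range(lo, hi + 1)}
    total = comb(n, c1)

    observed = ways[a]
    # Two-sided: all tables no more likely than the observed one
    two_sided = sum(w for w in ways.values() if w <= observed) / total
    # One-sided: observed table plus all tables with a larger top-left cell
    upper_tail = sum(w for k, w in ways.items() if k >= a) / total
    return two_sided, upper_tail

# Rows: B/L mom, no B/L mom; columns: attract, no attract
p_two, p_one = fisher_exact_2x2(9, 16, 4, 16)
p_obs = comb(25, 9) * comb(20, 4) / comb(45, 13)
print(f"P(observed table) = {p_obs:.4f}")
print(f"two-sided p = {p_two:.3f}, one-sided p = {p_one:.3f}")
```

The probability of the observed table should match the .1356 computed on the slide, and the two P-values should match the Stata output (0.327 and 0.200).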