Chapter 6 Inferences Regarding Locations of Two Distributions
Comparing 2 Means - Independent Samples
• Goal: Compare responses between 2 groups (populations, treatments, conditions)
• Observed individuals from the 2 groups are samples from distinct populations (identified by $(\mu_1,\sigma_1)$ and $(\mu_2,\sigma_2)$)
• Measurements across groups are independent (different individuals in the 2 groups)
• Summary statistics obtained from the 2 groups: sample sizes $n_1, n_2$; sample means $\bar{y}_1, \bar{y}_2$; sample standard deviations $s_1, s_2$
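Sampling Distribution of $\bar{y}_1-\bar{y}_2$
• Underlying distributions normal $\Rightarrow$ sampling distribution is normal
• Underlying distributions nonnormal, but large sample sizes $\Rightarrow$ sampling distribution approximately normal
• Mean, variance, standard error (Std. Dev. of estimator): $E(\bar{y}_1-\bar{y}_2)=\mu_1-\mu_2$, $V(\bar{y}_1-\bar{y}_2)=\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}$, $\sigma_{\bar{y}_1-\bar{y}_2}=\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$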
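Small-Sample Test for $\mu_1-\mu_2$ - Normal Populations
• Case 1: Common Variances ($\sigma_1^2=\sigma_2^2=\sigma^2$)
• Null Hypothesis: $H_0: \mu_1-\mu_2=D_0$
• Alternative Hypotheses:
  • 1-Sided: $H_A: \mu_1-\mu_2 > D_0$
  • 2-Sided: $H_A: \mu_1-\mu_2\ne D_0$
• Test Statistic: $t_{obs}=\frac{(\bar{y}_1-\bar{y}_2)-D_0}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$, where $S_p^2=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}$ is a “pooled” estimate of $\sigma^2$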
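The Case 1 test statistic can be sketched in Python as follows. This is a minimal illustration that assumes the usual pooled-variance form of the statistic; the function name `pooled_t` and the two small samples are hypothetical and do not come from the slides.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(y1, y2, d0=0.0):
    """Return (t_obs, df, sp2) for the equal-variance two-sample t-test."""
    n1, n2 = len(y1), len(y2)
    # statistics.variance uses the n-1 divisor (sample variance)
    s1_sq, s2_sq = variance(y1), variance(y2)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance sigma^2
    sp2 = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    t_obs = (mean(y1) - mean(y2) - d0) / se
    return t_obs, df, sp2

# Hypothetical data, for illustration only
group1 = [12.0, 15.0, 11.0, 14.0, 13.0]
group2 = [10.0, 9.0, 12.0, 11.0]
t_obs, df, sp2 = pooled_t(group1, group2)
print(f"t_obs = {t_obs:.3f} on {df} df (Sp^2 = {sp2:.3f})")
```

The observed statistic is then compared with a t critical value on `df` degrees of freedom.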
Small-Sample Test for $\mu_1-\mu_2$ - Normal Populations
• Decision Rule: (Based on t-distribution with $\nu=n_1+n_2-2$ df)
• 1-sided alternative
  • If $t_{obs}\ge t_{\alpha,\nu}$ ==> Conclude $\mu_1-\mu_2 > D_0$
  • If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_1-\mu_2=D_0$
• 2-sided alternative
  • If $t_{obs}\ge t_{\alpha/2,\nu}$ ==> Conclude $\mu_1-\mu_2 > D_0$
  • If $t_{obs}\le -t_{\alpha/2,\nu}$ ==> Conclude $\mu_1-\mu_2 < D_0$
  • If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_1-\mu_2=D_0$
Small-Sample Test for $\mu_1-\mu_2$ - Normal Populations
• Observed Significance Level (P-Value)
• Exact values require special tables; statistical software packages print them
• 1-sided alternative
  • $P=P(t\ge t_{obs})$ (From the $t_\nu$ distribution)
• 2-sided alternative
  • $P=2P(t\ge |t_{obs}|)$ (From the $t_\nu$ distribution)
• If P-Value $\le\alpha$, then reject the null hypothesis
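Without a statistics package, the tail area can be approximated by numerically integrating the t density. The sketch below uses Simpson's rule with the standard library only; the function names are hypothetical, and for routine work a statistical package's t-distribution routine would normally be used instead.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # math.gamma overflows for very large df (a few hundred); fine for small samples
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t_obs, df, steps=2000):
    """P(t >= t_obs) via Simpson's rule on [0, |t_obs|] plus symmetry.

    steps must be even for Simpson's rule.
    """
    a = abs(t_obs)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3        # P(0 <= t <= |t_obs|)
    upper = 0.5 - area          # P(t >= |t_obs|)
    return upper if t_obs >= 0 else 1 - upper

def p_value(t_obs, df, two_sided=True):
    """1-sided P = P(t >= t_obs); 2-sided P = 2 P(t >= |t_obs|)."""
    if two_sided:
        return 2 * t_upper_tail(abs(t_obs), df)
    return t_upper_tail(t_obs, df)

# 2.074 is the tabled t(.025) value for 22 df, so this should be close to 0.05
print(f"2-sided P for t_obs = 2.074 on 22 df: {p_value(2.074, 22):.4f}")
```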
Small-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1-\mu_2$ - Normal Populations
• Confidence Coefficient $(1-\alpha)$ refers to the proportion of times this rule would provide an interval that contains the true parameter value $\mu_1-\mu_2$ if it were applied over all possible samples
• Rule: $(\bar{y}_1-\bar{y}_2)\pm t_{\alpha/2,\nu}\,S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$, with $\nu=n_1+n_2-2$ df
• Interpretation (at the $\alpha$ significance level):
  • If interval contains 0, do not reject $H_0: \mu_1=\mu_2$
  • If interval is strictly positive, conclude that $\mu_1 > \mu_2$
  • If interval is strictly negative, conclude that $\mu_1 < \mu_2$
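A short sketch of the interval and its interpretation, assuming the pooled-variance rule and a critical value looked up by the user in a t table. The function names and the summary statistics in the usage lines are hypothetical.

```python
from math import sqrt

def pooled_ci(ybar1, ybar2, s1, s2, n1, n2, t_crit):
    """(1-alpha)100% CI for mu1 - mu2; t_crit is t(alpha/2) on n1+n2-2 df."""
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    margin = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = ybar1 - ybar2
    return diff - margin, diff + margin

def interpret(lower, upper):
    """Apply the slide's interpretation rule to an interval."""
    if lower > 0:
        return "conclude mu1 > mu2"
    if upper < 0:
        return "conclude mu1 < mu2"
    return "do not reject H0: mu1 = mu2"

# Hypothetical summary statistics; 2.101 is the tabled t(.025) value for 18 df
lower, upper = pooled_ci(20.0, 18.0, 4.0, 4.0, 10, 10, t_crit=2.101)
print(f"95% CI: ({lower:.2f}, {upper:.2f}) -> {interpret(lower, upper)}")
```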
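t-test when Variances are Unequal
• Case 2: Population Variances not assumed to be equal ($\sigma_1^2\ne\sigma_2^2$)
• Approximate degrees of freedom (two options):
  • Satterthwaite’s approximation, calculated from the sample variances and sample sizes: $\nu^*=\frac{(s_1^2/n_1+s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1-1}+\frac{(s_2^2/n_2)^2}{n_2-1}}$
  • Smaller of $n_1-1$ and $n_2-1$
• Estimated standard error and test statistic for testing $H_0: \mu_1=\mu_2$: $S_{\bar{y}_1-\bar{y}_2}=\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$, $t_{obs}=\frac{\bar{y}_1-\bar{y}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}$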
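Both choices of degrees of freedom can be computed side by side. This sketch assumes the standard Satterthwaite formula; `welch_t` and the summary statistics in the usage lines are hypothetical (only the sample sizes 14 and 10 echo the maze-learning example).

```python
from math import sqrt

def welch_t(ybar1, ybar2, s1, s2, n1, n2):
    """Unequal-variance t statistic with both degrees-of-freedom options."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # per-group variance of the mean
    se = sqrt(v1 + v2)                        # estimated standard error
    t_obs = (ybar1 - ybar2) / se
    # Satterthwaite's approximation (generally not an integer)
    df_satt = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    # Conservative alternative: smaller of n1-1 and n2-1
    df_conservative = min(n1, n2) - 1
    return t_obs, df_satt, df_conservative

# Hypothetical means and standard deviations
t_obs, df_satt, df_cons = welch_t(17.0, 15.0, 3.0, 1.0, 14, 10)
print(f"t_obs = {t_obs:.3f}, Satterthwaite df = {df_satt:.1f}, conservative df = {df_cons}")
```

The Satterthwaite value always falls between the conservative choice and the pooled df, $n_1+n_2-2$.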
Example - Maze Learning (Adults/Children)
• Groups: Adults ($n_1=14$) / Children ($n_2=10$)
• Outcome: Average # of Errors in Maze Learning Task
• Raw Data on next slide
• Conduct a 2-sided test of whether true mean scores differ
• Construct a 95% Confidence Interval for true difference
Source: Gould and Perrin (1916)
Example - Maze Learning
Case 1 - Equal Variances
$H_0: \mu_1-\mu_2=0$, $H_A: \mu_1-\mu_2\ne 0$ ($\alpha=0.05$)
No significant difference between 2 age groups
Example - Maze Learning
Case 2 - Unequal Variances
$H_0: \mu_1-\mu_2=0$, $H_A: \mu_1-\mu_2\ne 0$ ($\alpha=0.05$)
No significant difference between 2 age groups
Note: Alternative would be to use 9 df (10-1)
Small Sample Test to Compare Two Medians - Nonnormal Populations
• Two Independent Samples (Parallel Groups)
• Procedure (Wilcoxon Rank-Sum Test):
  • Null hypothesis: Population Medians are equal, $H_0: M_1=M_2$
  • Rank measurements across samples from smallest (1) to largest ($n_1+n_2$). Ties take average ranks.
  • Obtain the rank sum for the group with the smaller sample size ($T$), taken here to be group 1
  • 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T > T_U$; conclude $H_A: M_1 < M_2$ if $T < T_L$
  • 2-sided tests: Conclude $H_A: M_1\ne M_2$ if $T > T_U$ or $T < T_L$
  • Values of $T_L$ and $T_U$ are given in Table 6, p. 683 for various sample sizes and significance levels.
• This test is mathematically equivalent to the Mann-Whitney U-test
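The ranking step, including average ranks for ties, can be sketched as below. The helper names and the toy samples are hypothetical; the critical values $T_L$ and $T_U$ still have to come from the table.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest; tied values share the average rank."""
    ranks = []
    for v in values:
        less = sum(1 for w in values if w < v)
        equal = sum(1 for w in values if w == v)
        # Tied values occupy positions less+1 .. less+equal; take their mean
        ranks.append(less + (equal + 1) / 2)
    return ranks

def rank_sum(sample1, sample2):
    """Return (T1, T2), the rank sums of each sample in the combined ranking."""
    combined = list(sample1) + list(sample2)
    ranks = average_ranks(combined)
    n1 = len(sample1)
    return sum(ranks[:n1]), sum(ranks[n1:])

# Toy data, for illustration only
t1, t2 = rank_sum([1, 3, 5], [2, 4, 6])
print(t1, t2)   # the two sums always add to N(N+1)/2 with N = n1 + n2
```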
Example - Levocabastine in Renal Patients
• 2 Groups: Non-Dialysis/Hemodialysis ($n_1=n_2=6$)
• Outcome: Levocabastine AUC (1 Outlier/Group)
• 2-sided Test ($\alpha=0.05$): $T_L=26$, $T_U=52$, $T=45$ (Group 1)
• Conclude Medians differ ($M_1 < M_2$) if $T < 26$
• Conclude Medians differ ($M_1 > M_2$) if $T > 52$
• Neither criterion is met ($26\le 45\le 52$), so do not conclude medians differ
Source: Zazgornik, et al (1993)
Computer Output - SPSS
Note that SPSS uses the rank sum for Group 2 as the test statistic.
Rank-Sum Test: Normal Approximation
• Under the null hypothesis of no difference in the two groups (let $T$ be rank sum for group 1): $\mu_T=\frac{n_1(n_1+n_2+1)}{2}$, $\sigma_T=\sqrt{\frac{n_1n_2(n_1+n_2+1)}{12}}$
• A z-statistic can be computed, $z_{obs}=\frac{T-\mu_T}{\sigma_T}$, and an approximate P-value can be obtained from the Z-distribution
Note: When there are many ties in ranks, a more complex formula for $\sigma_T$ is often used, see p. 254 of Longnecker and Ott.
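As a sketch, the approximation can be applied to the levocabastine numbers given earlier ($n_1=n_2=6$, $T=45$). It assumes the standard null mean and standard deviation of the rank sum with no tie correction; the function name is hypothetical, and with only 6 observations per group the normal approximation is rough.

```python
from math import sqrt
from statistics import NormalDist

def rank_sum_z(T, n1, n2):
    """Normal approximation for the rank sum T of group 1 (no tie correction)."""
    mu_T = n1 * (n1 + n2 + 1) / 2
    sigma_T = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (T - mu_T) / sigma_T
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return mu_T, sigma_T, z, p_two_sided

mu_T, sigma_T, z, p = rank_sum_z(T=45, n1=6, n2=6)
print(f"mu_T = {mu_T}, sigma_T = {sigma_T:.3f}, z = {z:.3f}, 2-sided P = {p:.3f}")
```

The approximate P-value is well above 0.05, in line with the table-based conclusion for that example.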
Example - Maze Learning
Adults = Group 1
Inference Based on Paired Samples (Crossover Designs)
• Setting: Each treatment is applied to each subject or pair (preferably in random order)
• Data: $d_i$ is the difference in scores (Trt1-Trt2) for subject (pair) $i$
• Parameter: $\mu_D$ - Population mean difference
• Sample Statistics: $\bar{d}=\frac{\sum_{i=1}^{n}d_i}{n}$, $s_d=\sqrt{\frac{\sum_{i=1}^{n}(d_i-\bar{d})^2}{n-1}}$
Test Concerning $\mu_D$
• Null Hypothesis: $H_0: \mu_D=D_0$ (almost always 0)
• Alternative Hypotheses:
  • 1-Sided: $H_A: \mu_D > D_0$
  • 2-Sided: $H_A: \mu_D\ne D_0$
• Test Statistic: $t_{obs}=\frac{\bar{d}-D_0}{s_d/\sqrt{n}}$
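Test Concerning $\mu_D$
• Decision Rule: (Based on t-distribution with $\nu=n-1$ df)
• 1-sided alternative ($H_A: \mu_D > D_0$)
  • If $t_{obs}\ge t_{\alpha,\nu}$ ==> Conclude $\mu_D > D_0$
  • If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_D=D_0$
• 2-sided alternative ($H_A: \mu_D\ne D_0$)
  • If $t_{obs}\ge t_{\alpha/2,\nu}$ ==> Conclude $\mu_D > D_0$
  • If $t_{obs}\le -t_{\alpha/2,\nu}$ ==> Conclude $\mu_D < D_0$
  • If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_D=D_0$
Confidence Interval for $\mu_D$: $\bar{d}\pm t_{\alpha/2,\nu}\frac{s_d}{\sqrt{n}}$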
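The paired test and interval reduce to a one-sample analysis of the differences, which the following sketch makes explicit. The function names and the five pairs of scores are hypothetical; the critical value is supplied by the user from a t table.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(trt1, trt2, d0=0.0):
    """Return (t_obs, df, d_bar, se) for the paired t-test on d_i = trt1_i - trt2_i."""
    d = [a - b for a, b in zip(trt1, trt2)]
    n = len(d)
    d_bar = mean(d)
    se = stdev(d) / sqrt(n)      # stdev uses the n-1 divisor
    return (d_bar - d0) / se, n - 1, d_bar, se

def paired_ci(d_bar, se, t_crit):
    """CI for mu_D; t_crit is t(alpha/2) on n-1 df."""
    return d_bar - t_crit * se, d_bar + t_crit * se

# Hypothetical scores for 5 subjects under each treatment
t_obs, df, d_bar, se = paired_t([5, 7, 6, 8, 9], [4, 5, 6, 5, 7])
lower, upper = paired_ci(d_bar, se, t_crit=2.776)   # tabled t(.025) for 4 df
print(f"t_obs = {t_obs:.3f} on {df} df; 95% CI: ({lower:.2f}, {upper:.2f})")
```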
Example - Antiperspirant Formulations
• Subjects - 20 Volunteers’ armpits (df=20-1=19)
• Treatments - Dry Powder vs Powder-in-Oil
• Measurements - Average Rating by Judges
  • Higher scores imply more disagreeable odor
• Summary Statistics (Raw Data on next slide):
Source: E. Jungermann (1974)
Example - Antiperspirant Formulations
Evidence that scores are higher (more unpleasant) for the dry powder (formulation 1)
Small-Sample Test For Nonnormal Data
• Paired Samples (Crossover Design)
• Procedure (Wilcoxon Signed-Rank Test)
  • Compute differences $d_i$ (as in the paired t-test) and obtain their absolute values (ignoring 0s). $n$ = number of non-zero differences
  • Rank the observations by $|d_i|$ (smallest=1), averaging ranks for ties
  • Compute $T^+$ and $T^-$, the rank sums for the positive and negative differences, respectively
  • 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T=T^-\le T_0$
  • 2-sided tests: Conclude $H_A: M_1\ne M_2$ if $T=\min(T^+,T^-)\le T_0$
  • Values of $T_0$ are given in Table 7, pp 684-685 for various sample sizes and significance levels. P-values are printed by statistical software packages.
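The steps of the signed-rank procedure can be sketched as follows. The function names and the small data sets are hypothetical; the critical value $T_0$ still comes from the table.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest; tied values share the average rank."""
    ranks = []
    for v in values:
        less = sum(1 for w in values if w < v)
        equal = sum(1 for w in values if w == v)
        ranks.append(less + (equal + 1) / 2)
    return ranks

def signed_rank(trt1, trt2):
    """Return (T_plus, T_minus, n) for the Wilcoxon signed-rank test."""
    # Differences, dropping zeros; n counts the non-zero differences
    d = [a - b for a, b in zip(trt1, trt2) if a != b]
    ranks = average_ranks([abs(di) for di in d])
    t_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    t_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    return t_plus, t_minus, len(d)

# Hypothetical paired scores; one pair is tied and is dropped
t_plus, t_minus, n = signed_rank([10, 12, 9, 15, 11], [8, 12, 10, 11, 14])
print(t_plus, t_minus, n)   # T+ and T- always add to n(n+1)/2
```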
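Signed-Rank Test: Normal Approximation
• Under the null hypothesis of no difference in the two groups: $\mu_T=\frac{n(n+1)}{4}$, $\sigma_T=\sqrt{\frac{n(n+1)(2n+1)}{24}}$
• A z-statistic can be computed, $z_{obs}=\frac{T-\mu_T}{\sigma_T}$, and an approximate P-value can be obtained from the Z-distribution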
Example - Caffeine and Endurance
• Subjects: 9 well-trained cyclists
• Treatments: 13mg Caffeine (Condition 1) vs 5mg (Condition 2)
• Measurements: Minutes Until Exhaustion
• This is a subset of a larger study (we’ll see later)
• Step 1: Take absolute values of differences (eliminating 0s)
• Step 2: Rank the absolute differences (averaging ranks for ties)
• Step 3: Sum ranks for positive and negative true differences
Source: Pasman, et al (1995)
Example - Caffeine and Endurance
Original Data
Example - Caffeine and Endurance
Absolute Differences / Ranked Absolute Differences
• $T^+=1+2+4+6+7+8=28$
• $T^-=3+5+9=17$
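Example - Caffeine and Endurance
Under null hypothesis of no difference in the two groups ($T=T^+$): $\mu_T=\frac{9(10)}{4}=22.5$, $\sigma_T=\sqrt{\frac{9(10)(19)}{24}}\approx 8.44$, $z_{obs}=\frac{28-22.5}{8.44}\approx 0.65$
There is no evidence that endurance times differ for the 2 doses (we will see later that both are higher than no dose)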
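The approximation for this example can be checked with a few lines of Python, using $n=9$ and $T=T^+=28$ from the slides. The sketch assumes the standard null mean and standard deviation of the signed-rank statistic, with no tie or continuity correction; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def signed_rank_z(T, n):
    """Normal approximation for a signed-rank sum T based on n non-zero differences."""
    mu_T = n * (n + 1) / 4
    sigma_T = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (T - mu_T) / sigma_T
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return mu_T, sigma_T, z, p_two_sided

mu_T, sigma_T, z, p = signed_rank_z(T=28, n=9)
print(f"mu_T = {mu_T}, sigma_T = {sigma_T:.2f}, z = {z:.2f}, 2-sided P = {p:.2f}")
```

The resulting 2-sided P-value is roughly one half, which matches the slide's conclusion of no evidence of a difference.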
SPSS Output
Note that SPSS is taking MG5-MG13, while we used MG13-MG5.
Sample Sizes for Given Margin of Error
• Goal: Achieve a particular margin of error ($E$) for estimating $\mu_1-\mu_2$ (Width of 95% CI will be $2E$)
• Case 1: Independent Samples (Assumes equal variances): $n_1=n_2=\frac{2z_{\alpha/2}^2\sigma^2}{E^2}$
• Case 2: Paired Samples: $n=\frac{z_{\alpha/2}^2\sigma_D^2}{E^2}$
In practice, the variance will need to be estimated in a pilot study or obtained from previously conducted work.
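A minimal sketch of both cases, assuming the usual large-sample formulas $n_1=n_2=2z_{\alpha/2}^2\sigma^2/E^2$ and $n=z_{\alpha/2}^2\sigma_D^2/E^2$ and rounding up to the next whole subject. The function names and the planning values ($\sigma=10$, $E=2$) are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_independent(sigma, E, conf=0.95):
    """Per-group n so the CI for mu1 - mu2 has margin of error E (equal variances)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(2 * z**2 * sigma**2 / E**2)

def n_pairs(sigma_d, E, conf=0.95):
    """Number of pairs so the CI for mu_D has margin of error E."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z**2 * sigma_d**2 / E**2)

# Hypothetical planning values
print(n_per_group_independent(sigma=10, E=2))   # independent samples, per group
print(n_pairs(sigma_d=10, E=2))                 # paired samples, number of pairs
```

Halving $E$ roughly quadruples the required sample size, since $E$ enters the formulas squared.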
Sample Size Calculations for Fixed Power
• Goal - Choose sample sizes to have a favorable chance of detecting a specified difference between $\mu_1$ and $\mu_2$
• Step 1 - Define an important difference in means: $\Delta=\mu_1-\mu_2$
• Step 2 - Choose the desired power to detect the clinically meaningful difference ($1-\beta$, typically at least .80). For 2-sided test: $n_1=n_2=\frac{2\sigma^2(z_{\alpha/2}+z_\beta)^2}{\Delta^2}$
For 1-sided tests, replace $z_{\alpha/2}$ with $z_\alpha$
In practice, variance must be estimated, or $\Delta$ given in units of $\sigma$
Example - Rosiglitazone for HIV-1 Lipoatrophy
• Trts - Rosiglitazone vs Placebo
• Response - Change in Limb fat mass
• Clinically Meaningful Difference - $\Delta=0.5\sigma$
• Desired Power - $1-\beta=0.80$
• Significance Level - $\alpha=0.05$
Source: Carr, et al (2004)
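The inputs listed for this example can be plugged into the standard normal-approximation formula $n_1=n_2=2(z_{\alpha/2}+z_\beta)^2/(\Delta/\sigma)^2$. The sketch below assumes that formula, a 2-sided test by default, and rounding up; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_power(delta_over_sigma, alpha=0.05, power=0.80, two_sided=True):
    """Per-group n to detect a difference of delta_over_sigma standard deviations."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2) if two_sided else std_normal.inv_cdf(1 - alpha)
    z_beta = std_normal.inv_cdf(power)           # z_beta with beta = 1 - power
    return ceil(2 * (z_alpha + z_beta) ** 2 / delta_over_sigma**2)

# Inputs from the slide: difference of 0.5 sigma, alpha = 0.05, power = 0.80
print(n_per_group_power(0.5))
```

Under these assumptions the calculation gives 63 subjects per group for the 2-sided test.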
Data Sources
• Zazgornik, J., M.L. Huang, A. Van Peer, et al. (1993). “Pharmacokinetics of Orally Administered Levocabastine in Patients with Renal Insufficiency,” Journal of Clinical Pharmacology, 33:1214-1218
• Gould, M.C. and F.A.C. Perrin (1916). “A Comparison of the Factors Involved in the Maze Learning of Human Adults and Children,” Journal of Experimental Psychology, 1:122-???
• Jungermann, E. (1974). “Antiperspirants: New Trends in Formulation and Testing Technology,” Journal of the Society of Cosmetic Chemists, 25:621-638
• Pasman, W.J., M.A. van Baak, A.E. Jeukendrup, and A. de Haan (1995). “The Effect of Different Dosages of Caffeine on Endurance Performance Time,” International Journal of Sports Medicine, 16:225-230
• Carr, A., C. Workman, D. Carey, et al. (2004). “No Effect of Rosiglitazone for Treatment of HIV-1 Lipoatrophy: Randomised, Double-Blind, Placebo-Controlled Trial,” Lancet, 363:429-438