Lecture 7 Outline – Thur, Sep 25

• Two-sample t-tests – Chapter 2.3 • Levene’s test for equality of two variances – Chapter 4.5.3 • Inferences in a two-treatment randomized experiment – Chapter 2.4 • Interpretation of p-values – Chapter 2.5.1 • Practical and Statistical Significance – Chapter 4.5.1 • Choosing a sample size for a study – the material is in Chapter 23.4, but the slides give the information you need

Two independent samples • Probability model: Independent simple random samples from two populations: one sample from population I and one sample from population II • Examples: • Perished and surviving sparrows • Men’s and women’s scores on a social insight test • Cholesterol in urban and rural Guatemalans

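Two-sample t-test • Population parameters: $\mu_1, \sigma_1, \mu_2, \sigma_2$ • H0: $\mu_2 - \mu_1 = 0$, H1: $\mu_2 - \mu_1 \neq 0$ • Equal spread model: $\sigma_1 = \sigma_2$ (call it $\sigma$) • Statistics from samples of size $n_1$ and $n_2$ from pops. 1 and 2: $\bar{Y}_1, s_1, \bar{Y}_2, s_2$ • For Bumpus’ data: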

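Sampling Distribution of $\bar{Y}_2 - \bar{Y}_1$ • $SD(\bar{Y}_2 - \bar{Y}_1) = \sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ (equal spread model) • Pooled estimate of $\sigma$: $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$ • $SE(\bar{Y}_2 - \bar{Y}_1) = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ • See Display 2.8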

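Two-sample t-test • H0: $\mu_2 - \mu_1 = 0$, H1: $\mu_2 - \mu_1 \neq 0$ • Test statistic: $T = \frac{\bar{Y}_2 - \bar{Y}_1}{SE(\bar{Y}_2 - \bar{Y}_1)}$, where $SE(\bar{Y}_2 - \bar{Y}_1) = s_p\sqrt{1/n_1 + 1/n_2}$ and $s_p$ is the pooled estimate of $\sigma$ • If population distributions are normal with equal $\sigma$, then if H0 is true, the test statistic T has a Student’s t distribution with $n_1 + n_2 - 2$ degrees of freedom. • The two-sided p-value equals the probability that |T| would be at least as large as the observed |t| under the random sampling model if H0 is true; it is calculated from the Student’s t distribution. • For Bumpus’ data, two-sided p-value = .0809, suggestive but inconclusive evidence of a difference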
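The pooled two-sample t-statistic described on this slide can be sketched in a few lines of Python. This is only an illustration under the equal spread model: the function name `pooled_t` and the two small data sets are hypothetical (they are not Bumpus’ measurements), and the p-value would still have to be looked up from a Student’s t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Return (t, df, se) for the equal-spread two-sample t-test of H0: mu2 - mu1 = 0."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance: average of the two sample variances, weighted by their df.
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    # Standard error of the difference in sample averages.
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean(sample2) - mean(sample1)) / se
    return t, df, se

# Hypothetical measurements for illustration only (not Bumpus' data).
group1 = [0.70, 0.72, 0.74, 0.71, 0.73]
group2 = [0.73, 0.75, 0.74, 0.76, 0.72]

t, df, se = pooled_t(group1, group2)
print(f"t = {t:.3f} on {df} degrees of freedom (SE = {se:.4f})")
```

In JMP the same quantities appear in the t-test report produced by Means/ANOVA/t-test.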

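One-sided p-values • If H1: $\mu_2 - \mu_1 > 0$, test statistic is $T = \frac{\bar{Y}_2 - \bar{Y}_1}{SE(\bar{Y}_2 - \bar{Y}_1)}$ • If H1: $\mu_2 - \mu_1 < 0$, test statistic is $T = \frac{\bar{Y}_1 - \bar{Y}_2}{SE(\bar{Y}_2 - \bar{Y}_1)}$ • In either case, the p-value is the probability that T would be >= the observed value T0 if H0 is true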

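Confidence Interval for $\mu_2 - \mu_1$ • 100(1-$\alpha$)% confidence interval for $\mu_2 - \mu_1$: $(\bar{Y}_2 - \bar{Y}_1) \pm t_{n_1+n_2-2}(1 - \alpha/2)\, SE(\bar{Y}_2 - \bar{Y}_1)$ • For 95% confidence interval, $\alpha = .05$ and the multiplier is $t_{n_1+n_2-2}(.975)$, which is close to 2 unless the degrees of freedom are small • Factors affecting width of confidence interval: • Sample size • Population standard deviation • Level of confidence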

Two sample tests and CIs in JMP • Click on Analyze, Fit Y by X, put Group variable in X and response variable in Y, and click OK • Click on red triangle next to Oneway Analysis and click Means/ANOVA/t-test • To see the means and standard deviations themselves, click on Means and Std Dev under red triangle

Bumpus’ Data Revisited • Bumpus concluded that sparrows were subjected to stabilizing selection – birds that were markedly different from the average were more likely to have died. • Bumpus (1898): “The process of selective elimination is most severe with extremely variable individuals, no matter in what direction the variations may occur. It is quite as dangerous to be conspicuously above a certain standard of organic excellence as it is to be conspicuously below the standard. It is the type that nature favors.” • Bumpus’ hypothesis is that the variance of physical characteristics in the survivor group should be smaller than the variance in the perished group

Testing Equal Variances • Two independent samples from populations with variances $\sigma_1^2$ and $\sigma_2^2$ • H0: $\sigma_1^2 = \sigma_2^2$ vs. H1: $\sigma_1^2 \neq \sigma_2^2$ • Levene’s Test – Section 4.5.3 • In JMP, Fit Y by X, under red triangle next to Oneway Analysis of humerus by group, click Unequal Variances. Use Levene’s test. • p-value = .4548, no evidence that variances are not equal, thus no evidence for Bumpus’ hypothesis.
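The idea behind Levene-type tests can be sketched as follows: replace each observation by its absolute deviation from its group center, then compare the average deviations of the two groups with an ordinary pooled t-statistic. The sketch below centers at the group median (the Brown-Forsythe form), which is an assumption; JMP’s Levene option may center differently, so its p-value need not match exactly. The function name `levene_two_sample` and the data are hypothetical.

```python
import math
from statistics import mean, median, variance

def levene_two_sample(sample1, sample2, center=median):
    """Return (t, df): pooled t-statistic on absolute deviations from each group's center."""
    c1, c2 = center(sample1), center(sample2)
    # Absolute deviations measure spread; equal variances imply similar average deviation.
    z1 = [abs(y - c1) for y in sample1]
    z2 = [abs(y - c2) for y in sample2]
    n1, n2 = len(z1), len(z2)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(z1) + (n2 - 1) * variance(z2)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(z2) - mean(z1)) / se, df

# Hypothetical data: the second group is visibly more variable than the first.
tight = [10.0, 10.5, 9.5, 10.2, 9.8, 10.1]
spread = [10.0, 13.0, 7.0, 11.5, 8.5, 10.4]

t, df = levene_two_sample(tight, spread)
print(f"Levene-type t = {t:.3f} on {df} degrees of freedom")
```

For two groups, the square of this t-statistic equals the one-way ANOVA F-statistic computed on the same absolute deviations.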

t-tests for randomized experiments • Section 2.4 • t-test (with its associated Student t distribution under H0) has been developed in Ch. 2 for making inferences to populations using the random sampling probability model. • In Ch. 1, we studied making causal inferences in the additive treatment effect model using the probability model of a randomized experiment. • The two-sample t-statistic is a reasonable test statistic for testing H0: additive treatment effect is $\delta = 0$

t-test for randomized experiments cont. • The t-test provides an approximately correct p-value and confidence interval for a randomized experiment, i.e., the distribution of the t-statistic under the null hypothesis of an additive treatment effect of $\delta = 0$ is well approximated by the Student’s t distribution with $n_1 + n_2 - 2$ degrees of freedom. • See Display 2.11 • Bottom line: t-test in JMP can be used to make approximately correct inferences (p-values and CIs) for randomized experiments, but inferences should be phrased in terms of additive treatment effects rather than differences in population means.
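To see what the Student’s t distribution is approximating here, one can build the randomization distribution directly: under the null hypothesis of no treatment effect, the responses would be the same under any assignment, so the group labels can be reshuffled and the t-statistic recomputed each time. The sketch below is a Monte Carlo version with hypothetical data and hypothetical names (`t_stat`, `randomization_p_value`); it is an illustration, not the procedure JMP runs.

```python
import math
import random
from statistics import mean, variance

def t_stat(a, b):
    """Pooled two-sample t-statistic for groups a and b."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(b) - mean(a)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

def randomization_p_value(control, treated, reps=2000, seed=0):
    """Two-sided p-value from random re-assignments of units to groups."""
    rng = random.Random(seed)
    observed = abs(t_stat(control, treated))
    pooled = list(control) + list(treated)
    n1 = len(control)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # one random re-assignment of the same responses
        if abs(t_stat(pooled[:n1], pooled[n1:])) >= observed - 1e-12:
            hits += 1
    return hits / reps

# Hypothetical responses from a two-treatment randomized experiment.
control = [12.1, 9.8, 11.4, 10.3, 8.9, 10.9]
treated = [13.5, 12.2, 14.1, 11.8, 12.9, 13.3]

p = randomization_p_value(control, treated)
print(f"approximate randomization p-value: {p:.4f}")
```

Comparing this value with the p-value from the t distribution on the same data shows how close the approximation is for a given experiment.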

Notes about tests, p-values • Interpretation of p-value: • Formally: the probability of random sampling (or random assignment) leading to a test statistic at least as extreme as the observed one if H0 is true. • Informally: the degree of credibility of H0. • Conclusions from p-values • (a) Small p-values mean either (i) H0 is wrong or (ii) we obtained an unusual sample • (b) Large p-values mean either (i) H0 is correct or (ii) the study isn’t large enough to conclude otherwise (i.e., the data are consistent with H0 being true but do not prove it).

Conceptual Question 2.8 • Suppose the following statement is made in a statistical summary: “A comparison of breathing capacities in individuals in households with low nitrogen dioxide levels and individuals in households with high nitrogen dioxide levels indicated that there is no difference in the means (two-sided p-value =.24).” What is wrong with this statement?

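Interpretation of p-values • So which p-values are small and which are large? • For reference, the chance of: • 3 heads in 3 coin tosses is .125 • 4 heads in 4 coin tosses is .063 • 5 heads in 5 coin tosses is .031 • 6 heads in 6 coin tosses is .016 • 7 heads in 7 coin tosses is .008 • 8 heads in 8 coin tosses is .004 • See Display 2.12 for a subjective guide.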
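The reference probabilities on this slide are just powers of one half, assuming a fair coin and independent tosses. A short sketch (the function name `all_heads_probability` is a hypothetical helper) reproduces them without rounding:

```python
def all_heads_probability(k):
    """Chance of k heads in k independent tosses of a fair coin."""
    return 0.5 ** k

for k in range(3, 9):
    # Printed unrounded; the slide rounds these values to three decimals.
    print(f"{k} heads in {k} tosses: {all_heads_probability(k)}")
```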

Practical and Statistical Significance • Section 4.5.1 • p-values indicate statistical significance, the extent to which a null hypothesis is contradicted by data • This must be distinguished from practical significance, the practical importance of the finding.

Example • Investigators compare WISC vocabulary scores for big city and rural children. • They take a simple random sample of 2500 big city children and an independent simple random sample of 2500 rural children. • The big city children average 26 on the test and their SD is 10 points; the rural children average only 25 and their SD is 10 points. • Two-sample t-test: SE of the difference is $10\sqrt{1/2500 + 1/2500} \approx .28$, so $t = 1/.28 \approx 3.54$, two-sided p-value about .0004 • The difference between big city children and rural children is highly statistically significant. The investigators conclude that rural children are lagging behind in development of language skills and launch a crusade to pour money into rural schools.

Example Continued • Approximate 95% confidence interval for the mean difference between big city and rural children: $1 \pm 2(.28)$, i.e., (.43, 1.57). • WISC test – 40 words child has to define. Two points given for correct definition, one for partially correct definition. • Likely value of mean difference between big city and rural children is about one partial understanding of a word out of forty. • Not a good basis for a crusade. Actually, the investigators have shown that there is almost no difference between big city and rural children on the WISC vocabulary scale.
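The numbers in the WISC example can be recomputed from the summary statistics alone. The sketch below is an approximation under stated assumptions: with 4998 degrees of freedom the Student’s t distribution is treated as standard normal, and the interval uses a multiplier of 2. The function name `two_sample_summary` is hypothetical.

```python
import math
from statistics import NormalDist

def two_sample_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (diff, se, z, two_sided_p, ci) from group means, SDs, and sizes."""
    diff = mean1 - mean2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = diff / se
    # Normal approximation to the t distribution (assumes large samples).
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    # Approximate 95% interval: estimate plus or minus 2 standard errors.
    ci = (diff - 2 * se, diff + 2 * se)
    return diff, se, z, p, ci

# Big city children: mean 26, SD 10, n 2500; rural children: mean 25, SD 10, n 2500.
diff, se, z, p, ci = two_sample_summary(26, 10, 2500, 25, 10, 2500)
print(f"difference = {diff}, SE = {se:.3f}, z = {z:.2f}")
print(f"two-sided p-value = {p:.5f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval, not the p-value, is what shows that the difference is about one point on an 80-point scale.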

Practical vs. Statistical Significance • The p-value of a test depends on the sample size. With a large sample, even a small difference can be “statistically significant,” that is, hard to explain by the luck of the draw. This doesn’t necessarily make it important. Conversely, an important difference may not be statistically significant if the sample is too small. • Always accompany p-values for tests of hypotheses with confidence intervals. Confidence intervals provide information about the likely magnitude of the difference and thus provide information about its practical importance.
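The dependence of the p-value on sample size can be made concrete by holding the observed difference and SD fixed and varying only the number of observations per group. This sketch assumes equal group sizes, a common SD, and a normal approximation to the t distribution; the function name `two_sided_p` and the chosen sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

def two_sided_p(diff, sd, n_per_group):
    """Approximate two-sided p-value for a difference in means with equal group sizes."""
    se = sd * math.sqrt(2 / n_per_group)
    return 2 * (1 - NormalDist().cdf(abs(diff) / se))

# Same 1-point difference and SD of 10 at three different sample sizes.
p_values = {n: two_sided_p(1, 10, n) for n in (25, 250, 2500)}
for n, p in p_values.items():
    print(f"n = {n:4d} per group: two-sided p-value = {p:.4f}")
```

The difference is equally unimportant in all three cases; only the strength of evidence that it is nonzero changes.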

Conclusions from a Study • A successful experiment has both statistical and practical significance. • Often the results of a study may be summarized by a confidence interval on a key parameter (e.g., treatment effect). • Display 23.1 – four possible outcomes to a confidence interval procedure. • First three outcomes – A, B and C – are successes in that it is possible to draw an inferential conclusion that distinguishes between the important alternatives in one way or another. But outcome D is a failure because both the null hypothesis and practically significant alternatives remain plausible.

Designing a Study • Role of research design is to avoid outcome D. This is accomplished by making the confidence interval short enough that it cannot simultaneously include both the null value and a practically significant alternative. • How to make the confidence interval short enough (Display 23.2)? • Make s small through blocking, covariates, improved measurement (more later in course) • Choose a large enough sample size.

Choosing the sample size • Suppose the null hypothesis is that $\mu = 0$ in a matched pairs study, where $\mu$ is the population mean of the paired differences. • Let PSD denote the practically significant alternative that is closest to zero. • An approximate 95% confidence interval for $\mu$ has margin of error $2s/\sqrt{n}$. • We want the CI to have margin of error less than |PSD|, i.e., $2s/\sqrt{n} < |PSD|$ • Thus, we want the sample size n to satisfy $\sqrt{n} > 2s/|PSD|$ • Solving for n gives that the sample size needs to be at least $4s^2/PSD^2$. • Sample size calculation requires an estimate of $\sigma$ (s) before conducting the study.

Example • Blood platelet aggregation before and after smoking cigarettes • The smallest medically significant difference is considered to be 1 platelet. The standard deviation of differences before and after in the population is estimated to be 8. How large a sample should be taken so that the confidence interval is not likely to contain both the null hypothesis that the difference is zero and a difference of 1 platelet?
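Applying the rule that the sample size should be at least 4s²/PSD² to this example is a one-line calculation. The sketch below uses the figures given on the slide (estimated SD of differences 8, smallest medically significant difference 1); the function name `min_sample_size` is a hypothetical helper, and the rule itself rests on the approximate margin of error of 2 standard errors.

```python
import math

def min_sample_size(s, psd):
    """Smallest n with 4 * s^2 / psd^2 <= n, i.e. margin of error 2*s/sqrt(n) at most |psd|."""
    return math.ceil(4 * s ** 2 / psd ** 2)

s = 8      # estimated SD of the before/after differences
psd = 1    # smallest medically significant difference

n = min_sample_size(s, psd)
margin = 2 * s / math.sqrt(n)
print(f"need at least n = {n} pairs; margin of error is then {margin:.2f}")
```

Halving the PSD to 0.5 would quadruple the required number of pairs, because n grows with the square of s/PSD.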

Choosing Sample Size • Similar principles can be used to find appropriate sample sizes for two independent sample studies and randomized experiments
