How do we use confidence intervals & significance tests to make inferences from a random sample about a population mean? • How do we use confidence intervals & significance tests to compare the means of two populations?

Standard error: when the standard deviation of a statistic is estimated from the data (i.e. from a sample), the result is called the standard error of the statistic. • Standard error of the sample mean: the estimated typical deviation of the sample mean from its expected value if the sampling were repeated over & over.

Inference that relies on the standard error uses the t-distribution, not the standard normal (z) distribution.

Because the standard deviation is estimated from sample data, the t-distribution carries extra uncertainty: it is less precise, & thus more variable, than the z-distribution.

Hence the t-distribution is flatter, or wider, than the z-distribution, noticeably so in small samples. • But the t-distribution closely & increasingly approximates the z-distribution once sample size reaches about N=120. • When N>1000, the t- and z-distributions are identical for practical purposes.

Put differently, the smaller the sample size (i.e. the fewer the degrees of freedom*), the wider (i.e. the less precise) the t-distribution is relative to the z-distribution. * Recall that ‘degrees of freedom’ are the amount of information available to estimate a statistic. The more df’s, the better.

This, then, is another reason to have larger samples: so that the t-distribution becomes more precise, & thus so hypothesis tests can be more accurate.

The z-distribution is used when we know the population’s standard deviation—which, however, we virtually never know. • Almost always, then, we use the t-distribution, because we are estimating a statistic from sample data.

Confirm that there's a different t-distribution for each value of df = n – 1: • Check the t-distribution critical values in Moore/McCabe/Craig (Table D, page T-11) for each df.

N>=120: the t-distribution closely & increasingly approximates the z-distribution. • N>1000: the t-distribution & z-distribution are identical for practical purposes. • See Table D (page T-11).
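The convergence of t toward z can be checked numerically. The following is a minimal Python sketch, not part of the original notes: it uses only the standard library, the helper names (t_pdf, t_cdf, t_critical) are hypothetical, and the critical values come from a simple Simpson's-rule integration plus bisection rather than from Table D.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_norm) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(df, confidence=0.95):
    """Two-sided critical value t*, found by bisection.

    Assumption: t* lies in [0, 20], which holds for every df at the 95% level.
    """
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_star = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value for z
for df in (4, 9, 29, 119, 999):
    print(f"df={df:4d}  t*={t_critical(df):.3f}  z*={z_star:.3f}")
```

The printed t* values should shrink toward z* as df grows without ever dropping below it, which is the sense in which the two distributions become "identical" in large samples.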

Standard error of the mean: when the standard deviation of the mean is estimated from sample data (& thus the t-distribution is used). • Formula for the standard error of the mean:
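In standard notation, with s the sample standard deviation and n the sample size (a reconstruction, consistent with the Stata output later in these notes, where 73/√400 = 3.65):

```latex
SE_{\bar{x}} = \frac{s}{\sqrt{n}}
```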

We’ve already been using this formula, but we’ve generally been using the z-distribution. • From now on, when we refer to the standard error of the mean, we’ll use the t-distribution.

A sample mean will deviate from the population mean due to sampling error (not to mention non-sampling error). • The standard error of the mean gives the estimated size of this deviation.

From now on, then, think standard error & t-distribution.

Here’s the t-confidence interval for the mean of a quantitative variable:
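In its standard textbook form (stated here as a reconstruction: x̄ is the sample mean, s the sample sd, n the sample size, and t* the critical value for the chosen confidence level with n – 1 degrees of freedom):

```latex
\bar{x} \pm t^{*} \, \frac{s}{\sqrt{n}}, \qquad t^{*} \text{ taken from the } t(n-1) \text{ distribution}
```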

How to use the t-distribution in hypothesis tests

We can use t-value confidence intervals to make inferences from a sample mean about a benchmark mean (i.e. some hypothesized parameter from the present or past).

The One-Sample t-Test • The one-sample t-test uses the t-confidence interval to compare the mean of a random sample to some benchmark parameter (from the present or past).

E.g., compare the mean SAT score of a random sample of FIU undergrads to some other, ‘ideal’ score (e.g., 500). • Is the difference large enough relative to the standard error of the difference to be statistically significant?

E.g., compare the mean SAT score of a random sample of FIU undergrads today to that of FIU undergrads a decade ago. • Is the difference large enough relative to the standard error of the difference to be statistically significant?

E.g., compare the mean SAT score of a random sample of FIU students to the national SAT mean. • Is the difference large enough relative to the standard error of the difference to be statistically significant?

The one-sample t-test compares the mean of a quantitative variable from a random sample to some benchmark parameter.

This benchmark parameter may be: • some measurement ideal • some independent, comparison group • a parameter from the past or present

The one-sample t-test requires: • a probability sample of independent observations • a quantitative variable • a graphic check for pronounced skewness & outliers • a benchmark comparison mean

t-tests of all sorts can be used safely:
• When the probability sample N<15, if the data distribution is close to normal (i.e. no more than minimal skewness & no pronounced outliers, because the mean & sd are not resistant).
• When 15<=N<40, if there is no pronounced skewness & there are no outliers.
• When N>=40 (more or less), if there are no outliers, even if there is pronounced skewness (although transforming the variable may be safer), due to the central limit theorem & the law of large numbers.
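These rules of thumb can be written as a small decision function. This is only an illustrative Python sketch: the function name and the three skewness labels are hypothetical, and it simplifies the guidelines by treating any flagged outlier as disqualifying at every sample size.

```python
def t_test_guideline(n, skewness, has_outliers):
    """Return True if the sample-size rules of thumb suggest a t-test is reasonable.

    skewness: "minimal", "moderate" or "pronounced" (hypothetical labels for
    what the graphic check shows). has_outliers: True or False.
    """
    if skewness not in ("minimal", "moderate", "pronounced"):
        raise ValueError("unknown skewness label")
    if has_outliers:
        return False                      # simplification: outliers rule out the test
    if n < 15:
        return skewness == "minimal"      # data must be close to normal
    if n < 40:
        return skewness != "pronounced"   # no pronounced skewness allowed
    return True                           # N >= 40: even pronounced skewness is tolerated


print(t_test_guideline(12, "moderate", False))
print(t_test_guideline(25, "moderate", False))
print(t_test_guideline(60, "pronounced", False))
```

A function like this is no substitute for looking at the graph; it only records which combination of sample size and distribution shape the guidelines accept.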

What if the sample is too small & its distribution non-normal, &/or it contains pronounced outliers? • One possible option: transform the variable &/or eliminate the outliers (in Stata see ‘help ladder’). • Alternatively, use a non-parametric (i.e. distribution-free) statistic: the Wilcoxon signed-rank test or the sign test, though these are less powerful than parametric procedures for testing hypotheses when the parametric assumptions hold. • Stata: see ‘help signrank’ or ‘help signtest.’ See Moore/McCabe chap. 7 & the CD-ROM chapter on non-parametric statistics.

If the distribution is acceptable or becomes so after you’ve intervened, then use the one-sample t-test: Ho: there’s no difference. Ha: there is a difference. • Or a one-sided alternative hypothesis.

Put differently: • Ho: difference = 0 • Ha: difference ~= 0 • One-sided hypothesis: difference > 0; or difference < 0
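The test statistic behind these hypotheses is the observed difference divided by its standard error. In standard notation (the symbol μ0 for the benchmark mean is introduced here for illustration and is not taken from the notes):

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad df = n - 1
```

The larger |t| is, the less plausible it is that a difference of this size arose from sampling error alone.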

E.g., compare the mean SAT score of a random sample of FIU undergrads to some other, ‘ideal’ score (e.g., 500): is the difference statistically significant?

First, check that the sample assumption is fulfilled. • Second, do a graphic check for pronounced skewness (if sample size <40) & for outliers, taking action to minimize the problems if necessary. • Third, state the hypotheses, e.g.: Ho: FIU mean SAT = 500. Ha: FIU mean SAT ~= 500. • Put differently: Ho: diff = 0. Ha: diff ~= 0.

Fourth, test the hypothesis. • These data aren’t in memory, so the Stata command is ttesti (the immediate form) rather than ttest.

. ttesti sample-n sample-mean sample-sd benchmark-mean

. ttesti 400 512 73 500

One-sample t test

            Obs     Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
       x    400      512       3.65          73    504.8244    519.1756

Degrees of freedom: 399

                      Ho: mean(x) = 500

   Ha: mean < 500        Ha: mean ~= 500         Ha: mean > 500
     t =   3.2877          t =   3.2877            t =   3.2877
 P < t =   0.9995      P > |t| =   0.0011      P > t =   0.0005

• Conclusion: Reject the null hypothesis (t=3.29, p=0.001 for a two-tailed test, df=399).
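The figures in this output can be reproduced by hand from the four summary numbers. A minimal Python sketch follows (standard library only; the helper names t_pdf and upper_tail are hypothetical, and the p-value comes from a simple numerical integration rather than from Stata's own routine):

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_norm) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# The four values passed to ttesti: sample n, sample mean, sample sd, benchmark mean.
n, mean, sd, benchmark = 400, 512, 73, 500

se = sd / math.sqrt(n)                    # standard error of the mean
t = (mean - benchmark) / se               # difference relative to its standard error
df = n - 1                                # degrees of freedom
p_two_sided = 2 * upper_tail(abs(t), df)  # Ha: mean ~= 500

print(f"se = {se:.2f}  t = {t:.4f}  df = {df}  two-sided p = {p_two_sided:.4f}")
```

The standard error, t statistic and two-sided p-value should agree with the Std. Err., t and P > |t| entries in the Stata output.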

Note: if the data are in memory, modify the Stata command as follows: . ttest FIU_SAT = 500

Before we move on to another variety of t-tests: • What’s the purpose of the one-sample t-test? • What kind of data does it require? • How do we conduct the test? • When does it test significant or insignificant?

Example: • There is evidence that 51% of a specific graduate program’s student admissions are women, but your program has admitted just 43% women. • Should you use a one-sample t-test to assess whether or not this difference is statistically significant?

Caution • One-sample t-test requires a probability sample. • All conclusions are uncertain. • Sampling & non-sampling sources of error.

The next variety of t-test, matched pairs, applies the one-sample t-test to an after vs. before ‘difference’ score, comparing means for a random sample of matched after vs. before observations.

E.g., the mean SAT score of a random sample of FIU students before they received SAT-training versus after they received such training. • Is the difference in scores large enough relative to the standard error of the difference to be statistically significant?

E.g., the mean cholesterol level of a random sample of adults before they went on a low-fat diet versus after they went on the diet. • Is the difference large enough relative to the standard error of the difference to be statistically significant?

E.g., the mean earnings of a random sample of inner-city women workers before they received skill-training versus after they received such training. • Is the difference large enough relative to the standard error of the difference to be statistically significant?

This is called the matched pair (or dependent sample) t-test: Ho: mean difference = 0 (i.e. there’s no after vs. before effect). Ha: mean difference > 0 (i.e. there is an after vs. before effect: the after-mean is greater than the before-mean).

The after vs. before matched pairs, of course, are not independent of each other.

Is the after vs. before difference in sample means large enough relative to the standard error of the difference to test statistically significant?
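Because the test works on the difference scores, the calculation is the one-sample t-test applied to after minus before, with a benchmark of zero. A minimal Python sketch follows; the scores are made up purely for illustration and come from no real study.

```python
import math
from statistics import mean, stdev

# Hypothetical before/after scores for the same eight students (illustrative only).
before = [480, 510, 530, 450, 600, 520, 470, 560]
after = [500, 515, 540, 470, 610, 515, 490, 575]

# One difference score per matched pair: after minus before.
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
mean_diff = mean(diffs)                # average after vs. before change
se_diff = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
t = mean_diff / se_diff                # Ho: mean difference = 0
df = n - 1

print(f"mean diff = {mean_diff:.3f}  se = {se_diff:.3f}  t = {t:.2f}  df = {df}")
```

With df = 7, the one-sided 5% critical value in a standard t table is about 1.895, so the t of 3.8 produced by these made-up scores would test significant.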

What kind of data does the matched pairs (or dependent sample) t-test require? • a random sample involving the same, matched observations (i.e. individuals or subjects) before & after the treatment • a quantitative variable • recall the previous discussion of sample size. • a graphic check for pronounced skewness & outliers

And something else: the sd of the ‘before’ group can’t be more than twice, or less than half, that of the ‘after’ group. • If it is, then use an adjusted version of the t-test (e.g., Stata’s ‘unequal’ option).
