Intro to Statistics for the Behavioral Sciences PSYC 1900

Lecture 10: Hypothesis Tests for Two Means: Related & Independent Samples. Clarification of Estimating Standard Errors.

Presentation Transcript


  1. Intro to Statistics for the Behavioral Sciences PSYC 1900 Lecture 10: Hypothesis Tests for Two Means: Related & Independent Samples

  2. Clarification of Estimating Standard Errors • The sample variance is an unbiased estimator of the population variance, but any single sample sd tends to underestimate the population sd. • Standard error calculations based on the sample sd therefore tend to be too small, producing z scores that are too high and probability values that are too low. • Consequently, we use the t distribution, rather than the normal, to adjust for this bias.
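
A quick simulation (a sketch, not part of the original slides) illustrates the point: across many small samples, the average sample sd falls below the true population sd.

```python
import numpy as np

# Illustrative sketch: the sample sd (even with the n-1 denominator) tends to
# underestimate the population sd, especially for small n.
rng = np.random.default_rng(0)
pop_sd = 10.0
n = 5
sds = [rng.normal(0, pop_sd, n).std(ddof=1) for _ in range(100_000)]
print(f"population sd = {pop_sd}, mean sample sd = {np.mean(sds):.2f}")
# The mean sample sd comes out around 9.4, i.e., below 10.
```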

  3. One More Example for When Population Mean is Known • One common case is testing whether participants’ responses are better than chance. • For example, can participants identify subliminally presented stimuli? • Responses on each trial are scored 0 (incorrect) or 1 (correct), so chance performance gives a comparison mean of .50.

  4. One More Example for When Population Mean is Known • We do not, however, know the population variance. • We must estimate it using the sample variance. • Because any single sample sd tends to underestimate the population sd, the resulting standard errors are too small and the z scores too large (i.e., inflated Type I error rates). • Therefore, we will use the t distribution.

  5. One More Example for When Population Mean is Known • Let’s assume the sample is 25 people, the mean accuracy = .56, and the sample sd = .09. • The one-sample t statistic compares this mean with the chance value of .50, using the estimated standard error.
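
A sketch of that calculation (the numbers come from the slide; the code itself is only illustrative):

```python
import numpy as np
from scipy import stats

n, mean_acc, sd = 25, 0.56, 0.09
mu0 = 0.50                      # chance-level accuracy

se = sd / np.sqrt(n)            # estimated standard error = .09 / 5 = .018
t = (mean_acc - mu0) / se       # t = .06 / .018 ≈ 3.33
p = 2 * stats.t.sf(abs(t), df=n - 1)
print(f"t({n - 1}) = {t:.2f}, p = {p:.4f}")
```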

  6. Confidence Limits • What is the 95% confidence interval for accuracy?
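
Continuing the same example, a sketch of the interval: CI = mean ± t_crit × SE.

```python
import numpy as np
from scipy import stats

n, mean_acc, sd = 25, 0.56, 0.09
se = sd / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)   # two-tailed critical value, ≈ 2.064
lower, upper = mean_acc - t_crit * se, mean_acc + t_crit * se
print(f"95% CI for accuracy: [{lower:.3f}, {upper:.3f}]")   # roughly [.523, .597]
```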

  7. Comparing Means from Related Samples • A more frequent case found in behavioral research is the comparison of two sets of scores that are related (i.e., not independent). • Pre-test / post-test designs • Dyads • Dependence implies that knowing a score in one distribution allows you better than chance prediction about the related score in the other distribution.

  8. Comparing Means from Related Samples • The null hypothesis in all cases is that the two population means are equal (H0: μ1 = μ2). • This can be recast using difference scores, for which the null becomes μD = 0. • Difference scores are calculated as the difference between each subject’s performance on the two occasions (or the difference between related data points).

  9. Comparing Means from Related Samples • Once we do this, we are again working with a “single” sample with a known prediction for the mean. • Thus, we can use a t test as we did previously, with minor modifications. • We simply calculate the sd of the distribution of difference scores and then use it to estimate the associated standard error. • Note that df again = N − 1, where N is the number of pairs (i.e., of difference scores).
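
A minimal sketch of the difference-score approach (the pre/post scores here are hypothetical):

```python
import numpy as np
from scipy import stats

pre  = np.array([12, 15, 11, 14, 13, 16, 12, 15])   # hypothetical pre-test scores
post = np.array([14, 16, 13, 15, 15, 18, 13, 17])   # hypothetical post-test scores

d = post - pre                       # difference scores
se_d = d.std(ddof=1) / np.sqrt(len(d))
t = d.mean() / se_d                  # one-sample t on the differences, H0: mu_D = 0
p = 2 * stats.t.sf(abs(t), df=len(d) - 1)
print(f"t({len(d) - 1}) = {t:.2f}, p = {p:.4f}")

# Same result via the built-in paired test:
print(stats.ttest_rel(post, pre))
```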

  10. Advantages and Disadvantages of Using Related Samples • Greatly reduces variability • Variability is only with respect to change in the dv • Provides perfect control for extraneous participant variables, since each participant serves as his or her own control • Requires fewer participants • Problems of order and carry-over effects • Experience at time 1 may alter scores at time 2 irrespective of any manipulation

  11. Effect Size • Can we use p-values to quantify the magnitude of an effect? • No, as any given difference between means will be more or less significant as a function of sample size (all else being equal). • We need a measure of the magnitude of the differences that is separate from sample size.

  12. Effect Size • Cohen’s d is a common effect size measure for comparing two means; it expresses the mean difference in standard deviation units: d = (M1 − M2) / s. • By convention: d = .2 is small, d = .5 medium, d = .8 large. • Can be interpreted as the degree of “non-overlap” of the two distributions.
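
Applied to the earlier accuracy example (a sketch; the slide does not show this computation):

```python
mean_acc, mu0, sd = 0.56, 0.50, 0.09
d = (mean_acc - mu0) / sd        # standardize the mean difference by the sample sd
print(f"Cohen's d = {d:.2f}")    # ≈ 0.67, a medium-to-large effect by convention
```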

  13. Comparing Means from Independent Samples • This represents one of the most frequent cases encountered in behavioral research. • No specific information about the population means or variances is known. • We randomly sample two groups and provide one with a relevant manipulation. • We then wish to determine whether any difference in group means is more likely attributable to the manipulation or to sampling error.

  14. Comparing Means from Independent Samples • In this case, we have two independent distributions, each with its own mean and variance. • We can easily determine what the difference is between the two means, but we will need a measure of sampling error with which to compare it. • Unlike previous examples, we will need a standard error for the difference between two means.

  15. Standard Errors for Mean Differences Between Independent Samples • The logic is similar to what we have done before. • Assume two distinct population distributions. Then, repeatedly sample a pair of means, one from each. • The distribution of the mean differences constitutes the appropriate sampling distribution. • Its sd is the standard error for the t test. • The variance sum law dictates that the variance of the sum (or difference) of two independent variables is equal to the sum of their variances.
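
A quick numerical check of the variance sum law and the resulting standard error of a mean difference (a sketch with hypothetical population values):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, sd1, sd2 = 30, 30, 4.0, 6.0

# Draw many pairs of sample means and look at the spread of their differences.
diffs = [rng.normal(0, sd1, n1).mean() - rng.normal(0, sd2, n2).mean()
         for _ in range(50_000)]

empirical_se = np.std(diffs)
theoretical_se = np.sqrt(sd1**2 / n1 + sd2**2 / n2)   # variance sum law
print(f"empirical SE = {empirical_se:.3f}, theoretical SE = {theoretical_se:.3f}")
```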

  16. The mean of the sampling distribution of mean differences is μ1 − μ2, and, by the variance sum law, its standard error is the square root of σ1²/n1 + σ2²/n2. We know from the central limit theorem that the resulting sampling distribution will be normal. But the problem of not knowing the true population sd’s arises again. To deal with this problem, we must again use the t, as opposed to the normal, distribution when estimating standard errors from the sample sd’s.

  17. t Tests for Independent Samples • The formula is a generalization of the previous one: the difference between the sample means is divided by the estimated standard error of that difference, t = (M1 − M2) / sqrt(s1²/n1 + s2²/n2). The null is that the mean difference between the populations is zero. • df = (n1 − 1) + (n2 − 1) = n1 + n2 − 2.
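
A sketch with hypothetical data, computing the test by hand and with SciPy’s built-in function:

```python
import numpy as np
from scipy import stats

group1 = np.array([23, 27, 21, 30, 25, 28, 24, 26])   # hypothetical scores
group2 = np.array([20, 22, 19, 25, 21, 23, 18, 24])

m1, m2 = group1.mean(), group2.mean()
v1, v2 = group1.var(ddof=1), group2.var(ddof=1)
n1, n2 = len(group1), len(group2)

se_diff = np.sqrt(v1 / n1 + v2 / n2)
t = (m1 - m2) / se_diff
print(f"by hand: t = {t:.2f}, df = {n1 + n2 - 2}")

# Built-in version; with equal n's the pooled and unpooled standard errors coincide.
print(stats.ttest_ind(group1, group2))
```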

  18. t Tests for Independent Samples with Unequal n’s • In the previous formula, we assumed equal condition n’s. Sometimes, however, the n of one sample exceeds the other’s, in which case its variance is a better approximation of the population variance. In such cases, we pool the variances using a weighted average, weighting each sample variance by its degrees of freedom.
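
A sketch of the pooled-variance calculation with hypothetical, unequal sample sizes:

```python
import numpy as np

def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Weighted average of two sample variances, weighted by their df."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# Hypothetical values: the larger sample gets more weight.
s_p_sq = pooled_variance(s1_sq=8.0, n1=40, s2_sq=12.0, n2=10)
se_diff = np.sqrt(s_p_sq / 40 + s_p_sq / 10)
print(f"pooled variance = {s_p_sq:.2f}, SE of difference = {se_diff:.2f}")
```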

  19. Assumptions for t Tests • Homogeneity of Variance • The population variances of the two distributions are equal • Implies that the variances of the two samples should be relatively equal • Heterogeneity is usually not a problem unless the variance of one sample is more than 3 times that of the other. • If this occurs, SPSS and other programs will provide both a standard and an adjusted t value. • The adjustment lowers the df’s, which reduces the chance of a Type I error.
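
Most packages implement the adjusted test as Welch’s correction; in SciPy it is requested with equal_var=False. A sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(0, 2, 20)    # smaller variance
b = rng.normal(0, 8, 20)    # variance far more than 3x larger

print(stats.ttest_ind(a, b))                    # assumes equal variances
print(stats.ttest_ind(a, b, equal_var=False))   # adjusted (Welch) version, lower df
```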

  20. Assumptions for t Tests • Normality of Distributions • We assume that the sampled data are normally distributed. • They need not be exactly normal, but should be unimodal and symmetric. • Really only a problem for small samples, as the CLT makes the sampling distribution of the mean approximately normal for large samples.

  21. Effect Size • Cohen’s d is also used for independent samples. • The only difference is that the mean difference is standardized by the pooled sd: d = (M1 − M2) / s_p.
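
Continuing with the hypothetical groups used above, a sketch of the pooled-sd version:

```python
import numpy as np

group1 = np.array([23, 27, 21, 30, 25, 28, 24, 26])   # same hypothetical data as above
group2 = np.array([20, 22, 19, 25, 21, 23, 18, 24])

n1, n2 = len(group1), len(group2)
pooled_var = ((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)
d = (group1.mean() - group2.mean()) / np.sqrt(pooled_var)
print(f"Cohen's d = {d:.2f}")    # ≈ 1.50 with these made-up scores
```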

  22. Confidence Limits • What is the 95% confidence interval for the difference between the two means?
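
A sketch of that interval for the same hypothetical groups: CI = (M1 − M2) ± t_crit × SE_diff.

```python
import numpy as np
from scipy import stats

group1 = np.array([23, 27, 21, 30, 25, 28, 24, 26])   # same hypothetical data as above
group2 = np.array([20, 22, 19, 25, 21, 23, 18, 24])

n1, n2 = len(group1), len(group2)
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1)) / df
se_diff = np.sqrt(pooled_var / n1 + pooled_var / n2)

diff = group1.mean() - group2.mean()
t_crit = stats.t.ppf(0.975, df)
print(f"95% CI for the mean difference: "
      f"[{diff - t_crit * se_diff:.2f}, {diff + t_crit * se_diff:.2f}]")
```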
