Today • Null and alternative hypotheses • 1- and 2-tailed tests • Regions of rejection • Sampling distributions • The Central Limit Theorem • Standard errors • z-tests for sample means • The 5 steps of hypothesis-testing • Type I and Type II error • (not necessarily in this order)
Hypothesis testing • Approach hypothesis testing from the standpoint of theory. • If our theory about some phenomenon is correct, then things should be a certain way. • If the commercial really works, then we should see an increase in sales (that cannot easily be attributed to chance). • Hypotheses are stated in terms of parameters (e.g., “the average difference between Groups A and B is zero in the population”).
Hypothesis testing • We will always observe some kind of effect, even if nothing interesting is going on. • It could be due to chance fluctuations, or sampling error... or there really could be an effect in the population. • Inferential statistics help us decide. • If we conclude, on the basis of statistics, that an effect should not be attributed to chance, the effect is termed statistically significant.
Say we know μ and σ, and that they are μ = 64.28” and σ = 3.1”, like in the female sample. • We want to know if the 74-inch-tall person is female. • Use logic to make a good guess.
If the person is female, then her distribution has μ = 64.28” and σ = 3.1” (assuming normality). • That implies that “her” z-score is z = (74 − 64.28)/3.1 ≈ 3.14. • Very unlikely that this person is female! • We could do this because we made the assumption of normality, and assumed μ = 64.28” and σ = 3.1”.
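Not part of the original slides, but here is a minimal Python sketch of the calculation just described (assuming scipy is available):

```python
from scipy.stats import norm

mu, sigma = 64.28, 3.1   # female height parameters given above (inches)
x = 74.0                 # the observed height

z = (x - mu) / sigma     # how many SDs above the female mean?
p = norm.sf(z)           # upper-tail probability under the normality assumption

print(f"z = {z:.2f}, P(height >= 74 | female) = {p:.5f}")
# z is about 3.14, p is about .0008 -- very unlikely for a randomly chosen female
```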
Hypothesis testing • A hypothesis is a theory-based prediction about population parameters. • Researchers begin with a theory. • Then they define the implications of the theory. • Then they test the implications using if-then logic (e.g., if the theory is true, then the population mean should be greater than 3.8).
Hypothesis testing • Null hypothesis – Represents the “status quo” situation. Usually, the hypothesis of no difference or no relationship. E.g. ... • Alternative hypothesis – what we are predicting will occur. Usually, the most scientifically interesting hypothesis. E.g. ...
Conventions • By convention, the null and alternative hypotheses are mutually exclusive and exhaustive. E.g. ... • Not everyone follows this convention.
Hypothesis testing • This is an example of a 2-tailed hypothesis test (e.g., H0: μ = μ0 versus H1: μ ≠ μ0): • Null distribution:
1-tailed tests • Say we had the following hypotheses: • We would reject the null hypothesis only if the observed mean is sufficiently positive. • “Sufficiently” because sample means will always differ. We care about the population, not samples. • If we conclude that chance variability isn’t driving the effect, then we say the effect is statistically significant.
An example... • Say we want to know if UNC students’ IQ differs from the national average. We know the national distribution has μ = 500 and σ = 100. • We pick a student at random (our “sample”), and give her an IQ test. She scores 700. • Was her score drawn from the U.S. population at large, or from another (more intelligent) distribution?
An example... • The null hypothesis is that she is part of the U.S. population distribution of IQ test-takers. Nothing special. • The alternative is that she is from some other (more intelligent) population distribution. • 1-tailed, because we are interested only in whether UNC students are more intelligent than average.
An example... • First, draw the null distribution. • Then define the region(s) of rejection (shaded area = α).
An example... • How did I find the “critical value” of IQ? By knowing alpha, knowing how to use Table E10, and a little algebra... • First, find z given p (for α = .05, 1-tailed, z = 1.645), then convert it to a raw score: 500 + 1.645 × 100 = 664.5.
An example... • Our student’s IQ score is 700. Does it fall in the region of rejection? • ...Yes! 700 > 664.5.
An example... • We could have done this by comparing z-scores instead of raw scores: z = (700 − 500)/100 = 2.0. • 2.0 > 1.645, so we reject H0.
An example... • We also could have done this by comparing a p-value to α instead of comparing raw scores or z-scores. • The p-value corresponding to a z-score of 2.0 is .0228. • .0228 < .05, so we reject H0. • A UNC student with an IQ of 700 would be very rare if drawn from the null population with μ = 500. In fact, even more rare than we are willing to tolerate (remember, α = .05).
3 Decision rules in this example • We need to know if we should reject H0. These three rules all yield the same conclusion. Reject H0 if... • the observed score exceeds the critical raw score (700 > 664.5), or • the observed z exceeds the critical z (2.0 > 1.645), or • the p-value is less than α (.0228 < .05).
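For concreteness, a small Python sketch (added here, not from the slides) that checks all three decision rules with the numbers used above:

```python
from scipy.stats import norm

mu0, sigma = 500, 100        # null population parameters from the slides
alpha = 0.05                 # 1-tailed test, upper tail
score = 700                  # the observed score

# Rule 1: raw-score critical value
crit_raw = mu0 + norm.ppf(1 - alpha) * sigma   # 500 + 1.645 * 100 = 664.5
# Rule 2: z critical value
z = (score - mu0) / sigma                      # (700 - 500) / 100 = 2.0
crit_z = norm.ppf(1 - alpha)                   # 1.645
# Rule 3: p-value
p = norm.sf(z)                                 # P(Z >= 2.0) = .0228

print(score > crit_raw, z > crit_z, p < alpha) # all three print True -> reject H0
```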
But... • Wait a minute – we did all that with only one student?? • The sample was far too small (N = 1) to make such bold claims about UNC. • We need a representative sample, N >> 1. • The logic of hypothesis testing is exactly the same with samples as it is with individuals. • But first, we need to know about sampling distributions...
Sampling distributions • Sampling distribution: The distribution of a statistic computed over all possible samples of a given size drawn from the population. • “Sampling distribution of _____” (mean / variance / z, t, etc.)
The Central Limit Theorem • Given a population with mean μ and variance σ², the sampling distribution of the mean (the distribution of sample means) will have a mean equal to μ and a variance equal to σ²/N... • ...and thus a standard deviation of σ/√N. • The distribution will approach normality as N increases. [from Howell, p. 267]
The Central Limit Theorem • σ/√N is called the standard error of the mean, or simply standard error.
The Central Limit Theorem • As sample size increases, the standard error decreases (panels for N = 1, N = 5, N = 20).
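A quick simulation (an illustration added here, not from the slides) shows the spread of sample means shrinking with N, matching σ/√N:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 500, 100          # any population will do; these match the IQ example
reps = 100_000                # number of simulated samples per sample size

for n in (1, 5, 20):          # the three panel sizes from the slide
    means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    print(f"N = {n:2d}: SD of sample means = {means.std():.2f}, "
          f"theory sigma/sqrt(N) = {sigma / np.sqrt(n):.2f}")
# Empirical SDs track sigma / sqrt(N): 100, about 44.7, about 22.4
```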
Back to the UNC IQ example... • Let’s say that we collect a sample of N = 4 UNC students. • Their IQs are 700, 710, 680, and 670. • Now the mean is (700 + 710 + 680 + 670)/4 = 690. • Is there enough evidence to claim that UNC students are brighter than average? • Now the question is, “if the population mean is 500, how extreme would a sample mean of 690 be (given that N = 4)?”
In terms of z-scores... • z = (690 − 500)/(100/√4) = 190/50 = 3.8. • The critical value for z is still +1.645 (because it’s a 1-tailed test and α = .05). • 3.8 > 1.645, so reject H0. • Conclusion: UNC students are likely brighter than average (we’ll never really know for sure).
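The same calculation, sketched in Python (not part of the original slides), using the four scores from the slide:

```python
import numpy as np
from scipy.stats import norm

scores = np.array([700, 710, 680, 670])   # the four UNC students from the slide
mu0, sigma, alpha = 500, 100, 0.05        # null mean, known population SD, 1-tailed alpha

se = sigma / np.sqrt(len(scores))         # standard error = 100 / sqrt(4) = 50
z = (scores.mean() - mu0) / se            # (690 - 500) / 50 = 3.8
p = norm.sf(z)                            # upper-tail p-value

print(f"mean = {scores.mean():.0f}, z = {z:.1f}, p = {p:.6f}")
# 3.8 > 1.645, so reject H0
```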
Another example • Your theory says that Benadryl should alter reaction time on some task, but you are not sure how. The null and alternative hypotheses might be H0: μ = .09 seconds and H1: μ ≠ .09 seconds. • We’re given that σ = .032 seconds • We’re given that N = 400 • We’re given that α = .01
Finding critical z’s for a 2-tailed test (α = .01): z = −2.575 and z = +2.575
Another example • We collect data from our 400 subjects and find the mean RT to be .097 seconds. • .097 is different from .09, but is it different enough? • z = (.097 − .09)/(.032/√400) = .007/.0016 = 4.375. • 4.375 > 2.575, so reject H0. Benadryl probably does have an effect on reaction time. Specifically, it slows people down.
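Again purely as an illustration (not from the slides), the 2-tailed test in Python:

```python
import numpy as np
from scipy.stats import norm

mu0, sigma, n, alpha = 0.09, 0.032, 400, 0.01   # values given on the slide
sample_mean = 0.097                              # observed mean RT in seconds

se = sigma / np.sqrt(n)                          # 0.032 / 20 = 0.0016
z = (sample_mean - mu0) / se                     # 0.007 / 0.0016 = 4.375
crit = norm.ppf(1 - alpha / 2)                   # 2.575 for a 2-tailed test at alpha = .01
p = 2 * norm.sf(abs(z))                          # two-tailed p-value

print(f"z = {z:.3f}, critical z = {crit:.3f}, p = {p:.2e}")
# |4.375| > 2.575 -> reject H0; the drug appears to slow reaction time
```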
N = 1: a special case? • When N = 1, the standard error σ/√N reduces to σ... • ...and the z-test for the sample mean reduces to the ordinary z-score for an individual, z = (X − μ)/σ.
The 5 steps of hypothesis testing • Specify null and alternative hypotheses. • Identify a test statistic. • Specify the sampling distribution and sample size. • Specify alpha and the region(s) of rejection. • Collect data, compute the test statistic, and make a decision regarding H0.
1. Null and alternative hypotheses • Specify H0 and H1 in terms of population parameters. • H0 is presumed to be true in the absence of evidence against it. • H1 is adopted if H0 is rejected.
2. Identify a test statistic • Identify a test statistic that is useful for discriminating between different hypotheses about the population parameter of interest, taking into account the hypothesis being tested and the information known. • E.g., z, t, F, and χ².
3. Sampling distribution and N • Specify the sampling distribution and sample size. • The sampling distribution here refers to the distribution of all possible values of the test statistic obtained under the assumption that H0 is true. • E.g., “N = 48. The sampling distribution is the standard normal distribution (distribution of z statistics), because we are testing a hypothesis about the population mean when σ is known.”
4. Specify α and the rejection regions • Alpha (α) is the probability of incorrectly rejecting H0 (rejecting the null hypothesis when it is really true). • Regions of rejection are those ranges of the test statistic’s sampling distribution which, if encountered, would lead to rejecting H0. • The regions of rejection are determined by α and by whether the test is 1-tailed or 2-tailed.
5. Collect data, compute the test statistic, make a decision • For example... • E.g., “2.77 > 1.96, so reject H0 and conclude that...” • Always couch the conclusion in terms of the original problem.
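To tie the steps together, here is a hypothetical helper (not from the slides; the name z_test and its parameters are made up for illustration) that carries out steps 2–5 for a one-sample z-test when σ is known:

```python
import numpy as np
from scipy.stats import norm

def z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample z-test for a mean with known population SD.

    Mirrors steps 2-5 above; step 1 (stating H0 and H1 in terms of
    population parameters) is up to the researcher.
    """
    se = sigma / np.sqrt(n)              # standard error of the mean
    z = (sample_mean - mu0) / se         # step 5: compute the test statistic
    if tail == "two":
        p = 2 * norm.sf(abs(z))          # two-tailed p-value
    elif tail == "upper":
        p = norm.sf(z)                   # 1-tailed, H1: mu > mu0
    else:
        p = norm.cdf(z)                  # 1-tailed, H1: mu < mu0
    return z, p, ("reject H0" if p < alpha else "retain H0")

# The Benadryl example from the earlier slides:
print(z_test(0.097, mu0=0.09, sigma=0.032, n=400, alpha=0.01, tail="two"))
```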
The 5 steps: Example • Let’s say you think a certain standardized achievement test is biased against Asian-Americans. You know that for the non-Asian-American population... • In the sample...
The 5 steps: Example • 1. Specify null and alternative hypotheses. • 2. Identify a test statistic. • We want to compare a sample mean to a hypothesized value, and we know σ, so we use a z-test.
The 5 steps: Example • 3. Specify the sampling distribution and sample size. • The sampling distribution of z is the standard normal distribution. • 4. Specify alpha and the region(s) of rejection. • The regions of rejection are harder...
The 5 steps: Example • 5. Collect data, compute the test statistic, make a decision. • We collect data. Say the mean is 97.1. Does 97.1 fall in the region of rejection?
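The slide does not show the population mean, σ, or N for this example, so the sketch below uses placeholder values (μ0 = 100, σ = 15, N = 25) purely for illustration; only the sample mean of 97.1 comes from the slide, and the actual decision depends on the real numbers:

```python
import numpy as np
from scipy.stats import norm

# Placeholder values -- the slide's actual population mean, SD, and N are not shown here.
mu0, sigma, n, alpha = 100, 15, 25, 0.05
sample_mean = 97.1                     # the only number the slide reports

se = sigma / np.sqrt(n)                # 15 / 5 = 3
z = (sample_mean - mu0) / se           # (97.1 - 100) / 3 is about -0.97
crit = norm.ppf(alpha)                 # lower-tailed critical z, about -1.645
print(f"z = {z:.2f}, critical z = {crit:.3f}, reject H0: {z < crit}")
# With these made-up numbers H0 would be retained; the slide's real values may differ.
```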
Type I and Type II errors • There are two ways to make an incorrect decision in hypothesis testing: Type I and Type II errors. • Type I error: Concluding that the null hypothesis is false when it is really true. • We control the probability of making a Type I error (alpha). • Alpha (α): The risk of incorrectly rejecting a true null hypothesis. • Why not make α really, really small? The smaller we make α, the more likely it becomes that we will make a Type II error.
Type I and Type II errors • Type II error: Concluding the null hypothesis is true when it is really false. • Beta (β): The probability of incorrectly retaining a false null hypothesis.
Next time... • Power • Effect size • Statistical significance vs. practical significance • Confidence intervals