RMTD 404 Lecture 3
Distributions and Probability • From this point on, we are going to work extensively with distributions and probabilities of events. • Just about every situation we deal with in statistics involves estimating the probability of an event given what we know about a distribution (e.g., the distribution of outcomes from tossing a coin). • Most of the time those distributions are not known, so we make assumptions about them. • Based on those assumptions, we estimate the probability or likelihood of observing a specific event or series of events. • The primary distribution we have discussed so far is the normal distribution.
Distributions and Probability (recap) • A normal distribution has the following characteristics: • unimodal: peaked in the middle. • symmetrical: the left and right sides of the distribution are mirror images. • bell-shaped: probabilities taper in the tails of the distribution. • unlimited: the tails of the distribution extend to infinity in both directions. • The normal distribution is useful for several reasons: • It is a shape that we see frequently in real-world data. • We know the probability density function for the normal curve. • Many of the statistics we deal with are approximately normally distributed under repeated sampling. • Many statistical procedures have been developed that rely on an assumption of normally distributed data.
Distributions and Probability (recap) • Recall that we can standardize a variable by linearly transforming it to have a mean of 0 and standard deviation (and variance) of 1. • When you standardize a normally distributed variable, the underlying distribution (with a mean of 0 and variance of 1) is called a standard normal curve, represented as N(0,1). The standard normal curve is useful because it simplifies the interpretation of probabilities of events—most tables of normal curve probabilities are based on the standard normal curve. • A “score” in a standard normal distribution is called a z score, and we can compute z scores via the following transformation: z = (X - μ) / σ
We interpret a z score as the number of standard deviation units that a particular element lies from the mean of the distribution. This is apparent from the form of the linear transformation that we apply to obtain z scores. Because of what we know about the standard normal curve, we can identify important probabilities associated with particular z scores. • P(z ≤ 0) = .50 • P(0 ≤ z ≤ 1) = .34 • P(-1 ≤ z ≤ 1) = .68 • P(z ≤ 1) = .50 + .34 = .84 • P(-2 ≤ z ≤ 2) = .9544 • P(-1.96 ≤ z ≤ 1.96) = .95 • P(z ≤ -1.96 or z ≥ 1.96) = 1 - P(-1.96 ≤ z ≤ 1.96) = .05
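As a quick, hedged check on these landmark values (not part of the original slides), the Python sketch below uses scipy to standardize a raw score and reproduce several of the probabilities listed above; the raw-score numbers are made up for illustration.

```python
from scipy.stats import norm

# Standardize a raw score: z = (X - mu) / sigma
X, mu, sigma = 115, 100, 15          # hypothetical raw score, mean, and SD
z = (X - mu) / sigma                 # 1.0 -> one standard deviation above the mean

# Landmark areas under the standard normal curve N(0, 1)
print(norm.cdf(0))                        # P(z <= 0)                  = 0.50
print(norm.cdf(1) - norm.cdf(0))          # P(0 <= z <= 1)             ~ 0.34
print(norm.cdf(1) - norm.cdf(-1))         # P(-1 <= z <= 1)            ~ 0.68
print(norm.cdf(1.96) - norm.cdf(-1.96))   # P(-1.96 <= z <= 1.96)      ~ 0.95
print(2 * norm.sf(1.96))                  # P(z <= -1.96 or z >= 1.96) ~ 0.05
```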
Z Table Although statistical software packages routinely compute the proportion of area under a normal curve for you, it is useful to learn how to read tables of those values. This table gives three proportions associated with a variety of z scores.
Z Table [Table figure: columns 2–4 list the three proportions of area associated with each z score; the same values apply to negative z scores by symmetry.]
Z Table As you can see, we can get the area between any two z scores by adding and subtracting areas that cumulatively make up the area we’re interested in. For example, what area is associated with each of the following statements? P(1 < z < 2) = P(-1 < z < 2) = P(z < -2 or z > 2.5) =
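One way to verify the add-and-subtract logic is with the cumulative normal function. This is a sketch using scipy (an assumption on my part, since the slides work from a printed table), applied to the three practice probabilities above.

```python
from scipy.stats import norm

# Area between two z scores = difference of cumulative areas
p1 = norm.cdf(2) - norm.cdf(1)       # P(1 < z < 2)
p2 = norm.cdf(2) - norm.cdf(-1)      # P(-1 < z < 2)
# Area in two separate tails = sum of the two tail areas
p3 = norm.cdf(-2) + norm.sf(2.5)     # P(z < -2 or z > 2.5)
print(round(p1, 4), round(p2, 4), round(p3, 4))   # ~0.1359, ~0.8186, ~0.0290
```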
Other distributions we’ll use • T Distribution • F Distribution • Chi-Square Distribution
Sampling Distributions and Hypothesis Testing • We are now going to begin discussing how to use what we know about the distributional properties of the statistics and parameters we are interested in. • We want to determine the likelihood that an observed statistic came from some hypothetical population and make judgments based on this likelihood – this is basic hypothesis testing.
Sampling Distributions and Hypothesis Testing • Two important terms • Sampling distribution: the distribution of the values of a particular statistic across a hypothetically infinite number of repeated samples of equal size taken from the same population • Standard error: the standard deviation of the sampling distribution • We are often interested in the mean – with this information we can describe what the sample mean would look like over an infinite number of experiments
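A small simulation makes these two definitions concrete. The sketch below (Python with numpy; the population mean, standard deviation, and sample size are arbitrary choices, not values from the slides) draws many samples, keeps the mean of each, and shows that the standard deviation of those means—the standard error—is close to σ/√n.

```python
import numpy as np

rng = np.random.default_rng(404)
mu, sigma, n, reps = 500, 100, 25, 10_000   # hypothetical population and sample size

# Sampling distribution of the mean: one mean per simulated sample
sample_means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(sample_means.std(ddof=1))   # empirical standard error, close to 20
print(sigma / np.sqrt(n))         # theoretical standard error: sigma / sqrt(n) = 20
```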
Sampling Distributions and Hypothesis Testing [Figure: a sampling distribution with values near the center labeled “Very likely,” values farther out labeled “Less likely,” and values in the tails labeled “Unlikely.”]
Sampling Distributions and Hypothesis Testing • Steps for testing a hypothesis: • We generate a research hypothesis (in words)—a theory-based prediction. When written symbolically, the research hypothesis is called the alternative hypothesis (a.k.a. H1 or Ha). • We pretend that the data were chosen from a population with known characteristics. That is, we create a null hypothesis (H0)—one that, based on our theory, we believe to be incorrect. • We gather data (e.g., randomly sample people, randomly assign them to treatments, expose them to the treatments, and measure their responses to the treatments). • We compute the characteristics of the sampling distribution of the statistic assuming that the null hypothesis is true (e.g., µ = 0, σ = 1).
Sampling Distributions and Hypothesis Testing • We calculate the probability of obtaining a statistic as extreme as or more extreme than the one observed, based on the sampling distribution. • We decide whether the observed probability of that value (or a more extreme one) is too remote to have plausibly occurred if the null hypothesis were true. • If the probability of obtaining the observed statistic is very small, then we reject our null hypothesis and retain our alternative hypothesis. That is, we retain our theory. • If the probability of obtaining the observed statistic is not small, then we retain our null hypothesis and fail to support our alternative hypothesis. That is, we fail to support our theory (this does not mean our theory is false!) • We make a substantive (word- and theory-driven) interpretation of the statistical test. • *Knowing the shape of a sampling distribution allows us to determine the probability of observing a particular test or sample statistic under the assumption that the null hypothesis is true.
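The last few steps amount to a simple decision rule. The function below is a minimal sketch (Python/scipy, with placeholder arguments, not from the slides) of an upper-tailed z test for a single observation, assuming the null sampling distribution is normal with known mean and standard deviation.

```python
from scipy.stats import norm

def upper_tailed_z_test(x, mu0, sigma, alpha=0.05):
    """Sketch of the decision steps for H0: mu <= mu0 vs. Ha: mu > mu0, for one observation x."""
    z = (x - mu0) / sigma      # locate x within the null sampling distribution
    p = norm.sf(z)             # P(Z >= z) assuming H0 is true
    reject = p < alpha         # reject H0 only if the result is improbable enough
    return z, p, reject
```

With the GRE numbers used later (x = 740, μ0 = 500, σ = 100), this would return z = 2.4 and p ≈ .008, leading to rejection at α = .05.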
Example • GRE quantitative scores are believed to be normally distributed with a mean of 500 and standard deviation of 100. • Suppose you have a student who participated in a new GRE preparation course. Advocates of the course claim that its success will demonstrate that quantitative GRE scores can be altered by targeted study. • The developers of the GRE claim that the course will not work because the quantitative GRE test measures skills that must be developed over a long period of study. • As you can see, there is controversy—one position suggests that the student who has experienced the preparation course will perform better than average and the other position suggests that the student’s performance will be “typical.”
Example • To go about determining whether this student is better than “typical,” we state our research hypothesis (in this example, from the perspective of the proponents of the preparation course): This student has a higher than average quantitative test score. • Symbolically, we write the research hypothesis as the alternative hypothesis: • Ha: μX > μ0, or μtest prep > μtypical, or μtest prep > 500 • (The population from which this observation came has a mean greater than the typical mean of 500.) • Then we state the converse of the alternative hypothesis as our null hypothesis: • H0: μX ≤ μ0, or μtest prep ≤ μtypical, or μtest prep ≤ 500 • (i.e., This student has “typical” quantitative skills.) • Note that the alternative and null hypotheses refer to parameters rather than statistics.
Example Here’s a picture of our decision-making framework. Note that we need to identify only one point on the GRE scale where we believe that the possibility is too remote to be reasonable—a value that is too high to be believable. What value would you choose? [Figure: the null distribution of GRE scores with the upper tail labeled “Too remote to be plausible.”]
Example • Suppose that we record the student’s GRE score, and it equals 740. This observation seems to be more consistent with the claims of the course advocates than the claims of the test developers. • The question now becomes: under the H0 assumption that this student is typical, how unusual is a score of 740 (or greater) on the quantitative section of the GRE? • That is, how unlikely is a score as or more extreme than 740? As specified in our decision-making framework, we want to talk about the absolute magnitude of the score, relative to the population mean, by considering only the upper tail of the null distribution.
Example • We can use z scores to estimate the probability P(X ≥ 740): z = (740 - 500)/100 = 2.4, so P(X ≥ 740) = P(z ≥ 2.4) ≈ .01
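Worked out in code (again assuming scipy rather than the printed table), the standardization and upper-tail probability are:

```python
from scipy.stats import norm

z = (740 - 500) / 100    # 2.4 standard deviations above the "typical" mean of 500
p = norm.sf(z)           # upper-tail probability P(Z >= 2.4)
print(z, round(p, 4))    # 2.4, ~0.0082 (roughly 1%)
```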
Sampling Distributions and Hypothesis Testing – Rejecting the Null Hypothesis • Hence, we would observe a score as or more remote than 740 less than 1% of the time in the population of “typical” GRE test takers. • We refer to this probability as a p-value -- the probability of obtaining a score as extreme as or more extreme than the one observed under the assumption of the null hypothesis. • If we believe that this score is too improbable to have occurred by chance, then we would reject our null hypothesis and retain our alternative and research hypotheses, concluding that this student is not typical. • If we do not believe that this is too improbable to have occurred by chance, then we would retain our null hypothesis. • *Typically, we don’t conclude that our null hypothesis is true, we simply conclude that we don’t have sufficient evidence to support our research hypothesis.
Sampling Distributions and Hypothesis Testing – Rejection Regions and Critical Values • Two points are important: • First, our decision-making criterion is somewhat arbitrary—different people might use different criteria to define “improbable.” • Second, because we are making retain/reject decisions based on probabilities, we might be making a mistake—we could reject the null hypothesis when it is indeed true. • Researchers have adopted the convention that observations that could occur less than 5% of the time under the null hypothesis are improbable enough to reject the null hypothesis. • Other less common levels are 1% (i.e., a stricter rule, because it requires a more unusual result to “reject”) and 10% (a more lenient rule, because it requires a less unusual result to “reject”).
Sampling Distributions and Hypothesis Testing – Rejection Regions and Critical Values • This rejection level (a.k.a. significance level) indicates how unlikely an event must be before we reject the null hypothesis. • So, by the conventional standard, the probability (p-value) must be .05 or less. • Two terms that are related to each other: • Rejection region: the area(s) under the sampling distribution where events are unlikely enough to warrant rejecting the null hypothesis; • Critical value: the raw score associated with the boundary of the rejection region. • In the GRE score example, the critical value equals 664: • CV = the value of X for which P(Z ≥ zX) = .05 • from the z table, that value is z = 1.64 • CV = 100(1.64) + 500 = 664
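The same critical value can be recovered from the inverse normal CDF. This scipy sketch uses the more precise 95th-percentile z of about 1.645, so it gives roughly 664.5 rather than the rounded 664 on the slide.

```python
from scipy.stats import norm

alpha = 0.05
z_crit = norm.ppf(1 - alpha)     # ~1.645, the z score that cuts off the upper 5%
cv = 500 + 100 * z_crit          # back-transform to the GRE score scale
print(round(z_crit, 3), round(cv, 1))   # 1.645, 664.5
```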
Sampling Distributions and Hypothesis Testing – Rejection Regions and Critical Values • In our case, H0 is μtest prep = 500. The critical value for the extreme area under the curve is 664. • Because our observed GRE score of 740 falls in the rejection region (beyond the critical value), we reject the null hypothesis and conclude that the data support the alternative hypothesis. [Figure: the null distribution with CV = 664; scores below the CV fall in the “Retain null” region, scores above it in the “Reject null” region, and the observed x = 740 lies in the rejection region.]
Recap • So far, we’ve introduced concepts that allow us to test the null hypothesis in two ways. • Compute critical values (by converting the relevant z score or scores associated with α to the raw score scale) and compare the observed statistic to the critical value(s). • If the observed statistic is more extreme than the critical value(s), then reject the null hypothesis. • Compute the p-value of the observed raw score (by converting the observed raw score to a z score and finding the probability of that z score in the normal curve table) and compare the p-value to the chosen α. • If the p-value is smaller than the chosen α, then reject the null hypothesis.
Errors • Since α equals the probability of incorrectly rejecting the null hypothesis, (1 - α) equals the probability of correctly retaining the null hypothesis. • In our example, when the null hypothesis is true we would correctly retain it 1 - .05 = .95, or 95%, of the time. • *The level α corresponds to the critical value (here 664) and represents the probability of rejecting a true null hypothesis. [Figure: the null distribution with CV = 664; the area 1 - α lies below the CV, the area α lies above it, and the observed x = 740 falls beyond the CV with tail probability p.]
Errors • Now consider this figure, which contains an arbitrarily chosen alternative distribution (shaded). This is one of many possible distributions that could have generated the observed score. • If this alternative is actually true, retaining the null hypothesis is another kind of mistake. • In this example, we would incorrectly retain the null distribution (and reject the alternative) with a very high probability. This type of error is called a Type II error and is represented as the beta level (β) of the hypothesis test: β is the probability of incorrectly retaining the null hypothesis when the alternative is true. [Figure: the area of the alternative distribution that falls below the critical value is β.]
Errors • Recall that (1 - α) represents the probability of correctly retaining the null hypothesis. On the other hand, (1 - β) represents the probability of correctly rejecting the null hypothesis. • This probability is given a special name: statistical power, or simply power. Power only applies when H0 is false. That is, when H0 is true, we cannot correctly reject it! [Figure: the area of the alternative distribution beyond the critical value is 1 - β, the power.]
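Power can only be calculated against a specific alternative. In the sketch below, the alternative mean of 600 is an arbitrary, made-up value (not from the slides), chosen just to show how β and 1 − β follow from the critical value.

```python
from scipy.stats import norm

cv = 664                      # critical value from the GRE example
mu_alt, sigma = 600, 100      # hypothetical alternative distribution (assumed, for illustration)

beta = norm.cdf(cv, loc=mu_alt, scale=sigma)   # P(retain H0 | alternative true) = Type II error
power = 1 - beta                               # P(reject H0 | alternative true)
print(round(beta, 3), round(power, 3))         # ~0.739, ~0.261
```

Moving the alternative mean farther from 500, or shrinking the standard error, would increase the power.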
Summary of Errors • The table below summarizes the nature of statistical errors and the corresponding symbols: • If H0 is true and we reject it: Type I error, probability α. • If H0 is true and we retain it: correct decision, probability 1 - α. • If H0 is false and we reject it: correct decision (power), probability 1 - β. • If H0 is false and we retain it: Type II error, probability β. • However, also realize we will typically NOT know what the “Truth” is—if we did, we would not need to use statistics in our decision-making. • Hence, estimating statistical power requires us to make a lot of assumptions.
One and two-tailed tests • Our GRE example considered only one tail of the null distribution as fair game for rejecting the null hypothesis. That is, only an observed score greater than the population mean could lead to rejection. • A one-tailed (a.k.a. directional) hypothesis test allows you to focus all of your attention on differences in one tail of the null distribution. Your null hypothesis would state that the parameter you are interested in is equal to or more extreme than some value (e.g., H0: μX ≤ 0 or H0: μX ≥ 0, depending on the expected direction), and your alternative hypothesis would state that the parameter is greater than or less than that value (e.g., H1: μX > 0 or H1: μX < 0, respectively). [Figure: one tail of the null distribution labeled “improbable” (the rejection region); the other tail is labeled “irrelevant.”]
One and two-tailed tests • If you cannot confidently predict the direction of the expected difference, you should focus your attention on both tails of the null distribution. In this case, you would perform a two-tailed (a.k.a. non-directional) test. • Your null hypothesis would state that the parameter you are interested in equals some value (e.g., H0: μX = 0), and your alternative hypothesis would state that the parameter is simply not equal to that value (e.g., H1: μX ≠ 0). • A two-tailed test would be appropriate either when • no theory exists for making a prediction about the direction of observed differences, or • two competing theories predict opposite outcomes. • *Many researchers use two-tailed tests even though they are seldom warranted.
One and two-tailed tests • When you choose a two-tailed test, you choose to divide your Type I error rate (α) into both tails of the null distribution. As a result, you choose critical values for rejecting the null hypothesis that define the most extreme α/2 proportion in each tail. [Figure: both tails of the null distribution shaded as the rejection region, with α/2 in each tail.]
One and two-tailed tests • By using a one-tailed hypothesis test, you require a less extreme critical value—all of α lies in a single tail of the distribution. • Hence, when α = .05 in a one-tailed (directional) test, the 5% of the null distribution that constitutes the rejection region lies in the single tail that is relevant to the hypothesis test. • On the other hand, in a two-tailed (non-directional) test, the 5% of the null distribution that constitutes the rejection region is divided between the two tails (2.5% in each). [Figure: a one-tailed rejection region of size α in one tail versus two-tailed rejection regions of size α/2 in each tail.]
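As a quick numerical check (scipy sketch): with α = .05, the one-tailed critical z is about 1.645, while a two-tailed test requires the more extreme ±1.96.

```python
from scipy.stats import norm

alpha = 0.05
z_one_tailed = norm.ppf(1 - alpha)        # ~1.645: all of alpha in one tail
z_two_tailed = norm.ppf(1 - alpha / 2)    # ~1.960: alpha/2 in each tail
print(round(z_one_tailed, 3), round(z_two_tailed, 3))
```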