Today in Class • Last time we discussed statistical reasoning and Type I and Type II errors • Today we’ll discuss Type I and Type II errors in more depth • We’ll also discuss the necessity of sampling distributions and how to find the sampling distribution for a sample proportion
Hypothesis Testing Example • I know I have 5 eggs, but I don’t know if they’re good or bad. • I’ll make a guess that 3 are good (so 2 are bad). • Then I can list all possible samples of 3 from that scenario. • I note that for this hypothetical pop, it is impossible to get 3 bad eggs out of 3. • It is also unlikely (but still possible) to get 3 good eggs out of 3. • I’ll take a real sample. If it comes out all bad or all good, I won’t believe the hypothesized pop.
Type I and Type II Errors • Recall that a Type I error is rejecting a true null hypothesis. • If the null hypothesis (3/5 good eggs) is true, my decision rule will reject this hypothesis for 1 of the 10 possible samples (the one with 3 good eggs). Therefore, the probability of a Type I error is 0.10. • Type II errors depend on what the true population is.
Type I and Type II Errors • If there are no bad eggs in the pop of 5, then every sample of 3 will have all good eggs. I’ll reject the null hypothesis, which is the correct decision. In this case, I can’t make a Type II error. • If there is 1 bad egg in the pop of 5, then of the 10 possible samples, 6 samples have at least one bad egg and at least one good egg. For those samples I’ll fail to reject the false null hypothesis and make a Type II error. Thus for this case, I have a 0.6 probability of a Type II error.
Type I and Type II Errors • If there are really 3 bad eggs in the pop of 5, then there is one sample (of 10 possible samples) for which I reject the null hypothesis: the one with 3 bad eggs. Thus, the probability of a Type II error is 0.90. • If there are really 4 bad eggs in the pop of 5, then there are 4 samples (of 10) for which I will reject the null hypothesis. Thus, the probability of a Type II error is 0.60.
Type I and Type II Errors • If there are 5 bad eggs out of 5 in the pop, then every sample has 3 bad eggs and I reject the null hypothesis. Thus, the probability of a Type II error is 0 for this case. • I’ll demonstrate this with the coin-flip challenge.
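The egg cases above can be checked by brute force. The sketch below is only an illustration, not part of the original slides: it lists every sample of 3 from a population of 5 and applies the decision rule "reject if the sample is all good or all bad". The function name `rejection_probability` and its parameters are hypothetical.

```python
from itertools import combinations

def rejection_probability(n_bad, pop_size=5, sample_size=3):
    """Fraction of all possible samples that are all good or all bad.

    Assumes every sample of `sample_size` eggs is equally likely.
    """
    population = ["bad"] * n_bad + ["good"] * (pop_size - n_bad)
    samples = list(combinations(range(pop_size), sample_size))
    rejected = 0
    for indices in samples:
        kinds = {population[i] for i in indices}
        if len(kinds) == 1:  # only one kind of egg in the sample: reject
            rejected += 1
    return rejected / len(samples)

# The null hypothesis is 3 good eggs, i.e. 2 bad eggs.
for n_bad in range(6):
    p_reject = rejection_probability(n_bad)
    if n_bad == 2:
        print(f"{n_bad} bad: null is true,  P(Type I error)  = {p_reject:.2f}")
    else:
        print(f"{n_bad} bad: null is false, P(Type II error) = {1 - p_reject:.2f}")
```

When the null is true, rejecting is the error, so the rejection fraction is the Type I probability. When the null is false, failing to reject is the error, so the Type II probability is one minus the rejection fraction.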
Coin Flip Challenge • I make the real flips my null hypothesis, because I can characterize all the possible sets of 200 flips and their probabilities for real flips • I’ll make a decision rule to decide whether a set of 200 flips is real or not.
Statistical Reasoning • Since we must rely on samples to make inferences about the population, we want to consider every possible sample from a hypothetical population. • The sampling distribution is the characterization of a sample statistic based on every possible sample from a hypothetical population. • Finding sampling distributions is central to statistics.
Finding Sampling Distributions • Mathematical: use of mathematics and systematic reasoning to derive the sampling distribution. Results in the normal, t, χ², and F distributions (which we will study later). • Simulation: uses a computer to mimic the sampling process. Take 1000’s of samples. Relies on a sample of samples. • The mathematical approach should be used whenever possible.
An Example of a Simulation • To determine the distribution of the longest run in 200 coin flips, I used a simulation • Program to simulate flipping a fair coin 200 times • Repeat the 200 flips 1000 times • Note how often each longest-run length occurs.
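The simulation steps above might look something like the following. This is a minimal sketch and not the program used in class: the function names, the fixed seed, and the use of Python's `random` module are all assumptions made for illustration.

```python
import random
from collections import Counter

def longest_run(flips):
    """Length of the longest streak of identical outcomes (flips must be non-empty)."""
    best = current = 1
    for prev, nxt in zip(flips, flips[1:]):
        current = current + 1 if prev == nxt else 1
        best = max(best, current)
    return best

def simulate_longest_runs(n_flips=200, n_reps=1000, seed=0):
    """Tally the longest run in each of n_reps simulated sets of fair-coin flips."""
    rng = random.Random(seed)  # fixed seed (an assumption) so reruns match
    counts = Counter()
    for _ in range(n_reps):
        flips = [rng.choice("HT") for _ in range(n_flips)]
        counts[longest_run(flips)] += 1
    return counts

counts = simulate_longest_runs()
for length in sorted(counts):
    print(f"longest run = {length:2d}: {counts[length] / 1000:.3f}")
```

The printed relative frequencies approximate the sampling distribution of the longest run. Because they come from 1000 simulated samples rather than every possible sample, they are themselves only estimates.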
Sampling Distribution of a Proportion • Suppose we’re drawing from a very large population and asking each person if they’re a Democrat • Suppose 50% are Democrats • If we ask just one person, then we’ll get either a “yes” or “no” • Ask 2 people: (Y,Y), (Y,N), (N,Y), (N,N)
Sampling Dist. for Proportion • Ask 3 people, you get (YYY), (YYN), (YNY), (YNN), (NYY), (NYN), (NNY), (NNN) • Ask 4 people, continue • Keep going and for a large enough sample the distribution of the sample proportion looks like a bell-shaped curve!
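The listing of Y/N outcomes above can be automated. The sketch below is an illustration only: it assumes 50% Democrats and a population large enough that every Y/N sequence is equally likely, and the function name `proportion_distribution` is hypothetical.

```python
from itertools import product
from collections import Counter
from fractions import Fraction

def proportion_distribution(n):
    """Exact sampling distribution of the proportion of 'Y' answers among n people.

    Assumes each person answers Y or N with probability 1/2, independently,
    so all 2**n answer sequences are equally likely.
    """
    outcomes = list(product("YN", repeat=n))
    counts = Counter(sample.count("Y") for sample in outcomes)
    return {
        Fraction(k, n): Fraction(c, len(outcomes))
        for k, c in sorted(counts.items())
    }

for n in (1, 2, 3, 4, 10):
    print(f"n = {n}")
    for proportion, prob in proportion_distribution(n).items():
        bar = "#" * round(float(prob) * 50)  # crude text histogram
        print(f"  {float(proportion):.2f}  {float(prob):.3f}  {bar}")
```

As n grows, the bars pile up around 0.50 and thin out symmetrically toward 0 and 1, which is the bell shape the slide describes.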
The Normal Distribution • Symmetric and Bell-Shaped • Total Area = 1 since it covers all possible samples • Characterized by two quantities: the mean μ and the standard deviation σ • Represents all possible samples for hypothetical population • The mean μ is the center • The sd σ is how spread out the curve is • Increasing σ makes the curve shorter and fatter • Increasing μ moves the curve to the right • Areas represent probabilities of certain samples for the hypothetical population
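These properties can be checked numerically. The sketch below uses the standard normal density formula, which is not stated on the slides, and a simple midpoint-rule approximation for areas. The function names `normal_pdf` and `area` are hypothetical.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Midpoint-rule approximation of the area under the curve between a and b."""
    width = (b - a) / steps
    heights = (normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width

# Total area is (approximately) 1.
print(f"total area: {area(-10, 10):.4f}")

# Increasing sigma makes the curve shorter at its center.
print(f"peak height, sigma=1: {normal_pdf(0, 0, 1):.4f}")
print(f"peak height, sigma=2: {normal_pdf(0, 0, 2):.4f}")

# Increasing mu moves the curve to the right without changing its shape.
print(f"height at the center, mu=0 vs mu=3: {normal_pdf(0, 0):.4f} vs {normal_pdf(3, 3):.4f}")

# An area is a probability: the chance of landing within one sd of the mean.
print(f"area within one sd of the mean: {area(-1, 1):.4f}")
```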