Error Types The Fine Art of Knowing How Wrong You Might Be
To Err Is Human • Humans make an infinitude of mistakes. • I figure safety in numbers makes it a little more okay. • When we do a hypothesis test, there are only so many options for what we decide to do. • More specifically, there are only four possible outcomes.
Four?!? • Some of you might be thinking that our only two possible options are to reject H0 or to not reject H0. • You are correct. • In each of these cases, though, we might be right or we might be wrong. • That is where the number four comes from.
Four • We might find sufficient evidence, and it turns out the evidence is telling us the truth. • We might find sufficient evidence, but it turns out the evidence was naughty, bad evidence that was misleading us. (A frame job.) • We might not find sufficient evidence, but it turns out that there really is the difference we were looking for. • We might not find sufficient evidence, and it turns out that is correct and that nothing is going on.
So Then What? • No matter what decision we make, we might be wrong. • When you’re right, you’re right. • When you’re wrong, you don’t know it, so act like you’re right…it projects an attractive level of confidence. • In statistics we have no real way to find the truth without taking a census. • Do not administer a census if you can help it. • A census is dumb.
Where Am I Going With This? • If we reject H0 (which means there was sufficient evidence), but we were wrong, this is called a Type I Error (Roman Numeral 1). • The probability of making this kind of error, when H0 is actually true, is whatever our threshold for crazy is. • So, normally, 5%. • We call this percentage the significance level, and use the letter α for it. • We can set it as high or low as we want. • Normally we go with 5%.
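A quick simulation can make the 5% concrete. The sketch below assumes a two-sided one-proportion z-test on a fair coin, so H0 is true by construction and every rejection is a Type I Error. The function name, sample size, number of trials, and seed are illustrative choices, not part of the lesson.

```python
import math
import random
from statistics import NormalDist


def type_one_error_rate(alpha=0.05, n=200, p0=0.5, trials=5000, seed=1):
    """Estimate how often a two-sided z-test rejects H0 when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # cutoff for a two-sided test
    se = math.sqrt(p0 * (1 - p0) / n)             # standard error under H0
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a world where H0 (p = p0) really is true.
        successes = sum(rng.random() < p0 for _ in range(n))
        z = (successes / n - p0) / se
        if abs(z) > z_crit:
            rejections += 1  # we rejected a true H0: a Type I Error
    return rejections / trials


rate = type_one_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimate should land near 5%, though not exactly on it, since the counts are discrete and the normal model is only an approximation.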
Why Not Make It 0%? • If we do not reject H0 (which means there was not sufficient evidence to draw a conclusion), but something really is going on, this is called a Type II Error (Roman Numeral 2). • The probability of this kind of error is called β and it is a real pain to calculate, so we are not going to. • What you need to know about it is that, for the same sample size, the lower α is, the higher β is.
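The four outcomes and the two error names can be collected in one small lookup. This is only a sketch: `classify_outcome` is a hypothetical helper, and in practice the truth about H0 is exactly the thing we do not know.

```python
def classify_outcome(reject_h0: bool, h0_is_true: bool) -> str:
    """Name the outcome of a test, given our decision and the (usually unknowable) truth."""
    if reject_h0:
        return "Type I Error" if h0_is_true else "correct rejection"
    return "correct non-rejection" if h0_is_true else "Type II Error"


# Walk through all four combinations of decision and truth.
for reject in (True, False):
    for truth in (True, False):
        print(f"reject H0={reject}, H0 true={truth}: {classify_outcome(reject, truth)}")
```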
5% Is A Good Balance • There are really only 2 things we can do to make β smaller. • First, we can make α higher. • This is usually counterproductive, as it just makes us more likely to be wrong in the other direction. • Second, we can make n (the sample size) larger. • Larger sample sizes make everything better. • Make the sample size too large, though, and you violate the less than 10% condition.
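For the curious, β can be approximated once we are willing to guess a specific true value, which is exactly the part we never know. The sketch below assumes a one-sided test of H0: p = 0.5 against Ha: p > 0.5, a hypothetical true proportion of 0.6, and the normal approximation. The function name `beta` and all of the numbers are made up for illustration.

```python
import math
from statistics import NormalDist


def beta(alpha, n, p0=0.5, p_true=0.6):
    """Approximate P(Type II Error) for a one-sided test of H0: p = p0 vs Ha: p > p0.

    Uses the normal approximation; p0 and p_true are hypothetical values.
    """
    std_normal = NormalDist()
    se_null = math.sqrt(p0 * (1 - p0) / n)          # spread assumed under H0
    se_true = math.sqrt(p_true * (1 - p_true) / n)  # spread under the guessed truth
    # We reject H0 when the sample proportion lands above this cutoff.
    cutoff = p0 + std_normal.inv_cdf(1 - alpha) * se_null
    # Beta is the chance the sample proportion falls below the cutoff
    # even though the true proportion is p_true.
    return std_normal.cdf((cutoff - p_true) / se_true)


for alpha, n in [(0.01, 100), (0.05, 100), (0.10, 100), (0.05, 400)]:
    print(f"alpha={alpha:.2f}, n={n}: beta={beta(alpha, n):.3f}")
```

Under these assumptions, raising α lowers β at a fixed n, and raising n lowers β at a fixed α, which are the two levers described above.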
A Pair of Complements • For a 2-sided test (one where the alternative is a ≠ kind), the complement of α is the confidence level. • And the complement of the confidence level is the significance level. • So, 95% confidence matches with 5% (for a ≠ test). • The complement of β is called statistical power. • You don’t need the formula for this, and only need to know the term in a general way.
Significance • Remember how we discussed the difference between statistical significance and practical significance? • I remember it. • Hopefully we can all remember it. • If the p-value is lower than α, we found sufficient evidence. • Another way to say this is that a “statistically significant difference was found”. • Statistically significant means the H0 got rejected.
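The decision rule itself is short enough to write down as code. This is a minimal sketch: `decide` is a hypothetical helper and the two p-values passed to it are made-up numbers. Note that it says nothing about practical significance, which is a separate judgment.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 when the p-value is below alpha."""
    if p_value < alpha:
        return "reject H0 (statistically significant)"
    return "fail to reject H0 (not statistically significant)"


print(decide(0.03))  # hypothetical p-value below the default alpha
print(decide(0.20))  # hypothetical p-value above the default alpha
```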
Nothing To Prove • Remember how I said earlier that Statistics is never used to prove anything ever? • I remember it. • Hopefully we can all remember it. • We never prove the null or the alternative hypothesis. • We only support the alternative strongly enough or do not support it strongly enough.
A Truckload of Apples • On a recent homework problem you were asked to find out how likely it was that a sample of 150 apples taken from a whole truckload would have less than 5% damaged. • We were able to do this math because we were told that the true percentage of damaged apples was 8%.
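As a reminder of what that math looks like, here is a sketch of the calculation. It assumes the normal model for the sample proportion, which may not be the exact method used on the homework; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p = 0.08        # true proportion of damaged apples (given in the problem)
n = 150         # sample size
cutoff = 0.05   # we want P(sample proportion < 5%)

# Success/failure check for the normal model: both counts should be at least 10.
assert n * p >= 10 and n * (1 - p) >= 10

se = math.sqrt(p * (1 - p) / n)   # standard deviation of the sample proportion
z = (cutoff - p) / se             # how many standard deviations 5% sits below 8%
prob = NormalDist().cdf(z)        # area to the left of z

print(f"z = {z:.2f}, P(p-hat < 0.05) = {prob:.3f}")
```

Under this model the probability comes out a bit under 9%. Notice that every step depends on knowing p = 0.08.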
A Truckload of Apples • In the real world we would not know that 8% of the whole truckload was damaged. • They just told us that so we could do the math. • So what should we do in the real world? • Since we don’t know what the true percent is, we will do the next best thing…we will make a wild assumption about the truth.
Wild Assumption • In practice we will make a sensible assumption, actually. • Our assumption will usually be based on something that is known. • This could be past data. • This could be the value from a similar group. • This could be .5 in the case of fair coins. • The purpose in making this assumption is, primarily, to allow us to do the math.
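To see how the assumption lets us do the math, here is a sketch of a one-proportion z-test on the apples. Everything specific is hypothetical: the null value of 8% is imagined to come from past data, the count of 6 damaged apples is invented, and the lower-tail alternative is just one possible choice.

```python
import math
from statistics import NormalDist

p0 = 0.08      # null value, assumed here to come from past data (hypothetical)
n = 150        # sample size
damaged = 6    # hypothetical count of damaged apples in the sample

p_hat = damaged / n
se = math.sqrt(p0 * (1 - p0) / n)   # standard error computed from the null value
z = (p_hat - p0) / se
p_value = NormalDist().cdf(z)       # lower-tail test, Ha: p < p0

print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

Without a value for p0 there would be no standard error and no z-score, which is why the null hypothesis has to commit to a number.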
Alternatives • The alternative hypothesis will be based on whatever we are trying to find evidence for. • We use the null hypothesis so we can do the math, but really the alternative hypothesis is what we are trying to focus on when we run a hypothesis test.
Foreshadowing FTW • Tomorrow we will discuss inferior ways to run a hypothesis test. • I say inferior, but they are perfectly acceptable. • Mr. J.P. Damron, in fact, almost always favored a confidence interval method for running hypothesis tests. • It is worth noting that he never once got a hypothesis test wrong with this method.
Assignments • You can read chapter 21, but I do not think it is necessary, especially if you have limited free time. • Chapter 20: 13, 15, 17 • Ignore the directions and simply perform a hypothesis test. • Due Tuesday. • The Chapter 20 Quiz will be Tuesday. • It will be a prompt for a hypothesis test. • You are expected to memorize the 10 steps. • The homework is the quiz practice.
Quiz Quasirubric • 3 points for correct condition checks. (2) • Remember, a solid justification is more important than a “correct” answer. • 3 points for correctly written hypotheses. (3) • 3 points for correct p-value + picture. (4-6) • 3 points for correct decision, based on p-value. (8,9) • 3 points for correct conclusion, based on decision. (10)