Hypothesis Testing • Chapter 8 of Howell • How do we know when we can generalize our research findings? • External validity must be good • the results must be statistically significant • We need a way to judge whether the effects we find actually exist in the population, and are not just a quirk of the sample • Hypothesis testing does this
The need for hypothesis testing • Imagine a study to check the relationship between smoking and cancer • We find that, in our sample, people who smoke more have a higher probability of getting lung cancer • Can we believe that result applies to the population? • No - it might have been a fluke (the smokers in our sample might all have got cancer for some other reason)
Random variation • Sometimes odd things happen to us • Your car breaks down in the middle of the road, your cellphone runs out of battery, and it starts to rain, all at once • Some things happen for a reason, some happen just by luck • We tend to think that there is a reason for everything, but random variation can “cause” things to happen
Determining if random variation played a part • The chance of my car breaking down is about 10% • could reasonably happen • The chance of my phone dying is about 5% • might reasonably happen • The odds of it raining are about 10% • could happen • BUT the odds of all three happening at the same time are 0.05% (i.e. 1 in 2000) - see the sketch below
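The joint figure comes from multiplying the three probabilities together. A minimal sketch of that arithmetic, assuming (as the slide implicitly does) that the three events are independent:

```python
# Joint probability of independent events = product of the individual
# probabilities. Figures taken from the slide above.
p_car = 0.10    # car breaks down
p_phone = 0.05  # phone battery dies
p_rain = 0.10   # it rains

p_all = p_car * p_phone * p_rain
print(round(p_all, 4))   # 0.0005, i.e. 0.05%
print(round(1 / p_all))  # 2000, i.e. a 1-in-2000 chance
```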
More random variation • If something has a 1 in 2000 chance of happening, and it does, it probably isn’t a fluke • The odds are so small, it is more reasonable to find another explanation • The same principle applies to testing scientific hypotheses • What are the odds that the event we witnessed in our sample could have happened due to random variation?
The hypothesis test • The general strategy: • 1. Work out the statistic (correlation, etc) • 2. Consider the conditions we collected our sample under (how many people, etc) • 3. Calculate the probability that, given those conditions, that statistic could have occurred by random variation • 4. If the odds are low, we reject the notion that random variation “caused” it
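To make steps 1-3 concrete, here is a minimal sketch using Python's scipy library and a correlation as the statistic; the data are invented purely for illustration:

```python
from scipy import stats

# Hypothetical paired observations, made up for illustration.
x = [2, 4, 5, 7, 9, 11, 13]
y = [1, 3, 6, 6, 8, 12, 14]

# Steps 1-3: the statistic, the conditions (n = 7 pairs), and the
# probability that random variation alone could produce this r.
r, p = stats.pearsonr(x, y)
print(f"r = {r:.2f}, p = {p:.4f}")
```

Step 4, the decision, is sketched after the “Working out p” slide below.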
How to work out p for a hypothesis test • In a hypothesis test, p is the probability that random variation “caused” the event • The principle is the same as working out p for a z score • Each statistic has a sampling distribution associated with it • We use that distribution to work out p, the same way we used the z distribution
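As an analogy with what we did for z: a short sketch of turning a (hypothetical) z score into a p value, using the standard normal distribution in scipy:

```python
from scipy.stats import norm

z = 1.96  # a hypothetical z score
# Two-tailed p: the chance of a score at least this extreme, in either
# direction, under the standard normal (z) distribution.
p = 2 * norm.sf(abs(z))  # sf is the survival function, 1 - cdf
print(round(p, 4))       # ~0.05
```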
The Null Hypothesis • Each hypothesis test tries to show that random variation did not cause the event • The shorthand for “random variation caused the event” is Ho - the Null hypothesis • it says, “nothing actually happened” • The aim of a hypothesis test is to decide if Ho is so unlikely that we should reject it as an explanation for our results • p is actually the probability of getting a result at least this extreme if Ho were true
Degrees of freedom • p depends on a few things • Sample size • Number of groups in the design • We express these conditions by calculating the degrees of freedom of the statistic • Often very simple, e.g. df = n-1
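A tiny sketch with a hypothetical sample size; the df = n - 2 rule for a Pearson correlation is a standard example that goes beyond the slide's df = n - 1:

```python
n = 20  # a hypothetical sample size

df_one_sample = n - 1    # e.g. a one-sample t test: df = n - 1
df_correlation = n - 2   # e.g. a Pearson correlation: df = n - 2
print(df_one_sample, df_correlation)  # 19 18
```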
How unlikely is unlikely enough? • We need to decide how low a p value we are going to accept • How low does the probability of random variation have to be for us to reject that idea? • That level is called alpha (α) • it is usually set at 0.05 (a 5% chance) • “if the chance that random variation caused this event is less than alpha (5%), then it must have been something else”
Working out p • Awesomely difficult maths • Get a computer to do it for you • Once you have the p value, compare it to your alpha value • if p is less than alpha, reject Ho (random variation is an unlikely explanation) • if p is more than or equal to alpha, do not reject Ho (random variation might have played a part - we can’t rule it out)
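A minimal sketch of the decision rule itself; the p value here is a hypothetical figure that the software would have produced:

```python
p = 0.04      # hypothetical output from the software
alpha = 0.05  # chosen before the analysis

if p < alpha:
    print("Reject Ho: random variation alone is an unlikely explanation")
else:
    print("Fail to reject Ho: random variation can't be ruled out")
```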
Hypothesis testing: step by step • Step 1: decide on alpha and set out Ho • alpha is normally 0.05 • Ho is different for each stat, but carries the same basic idea: “nothing happened” • Step 2: work out the stat and its p value • Step 3: compare p to alpha and decide if Ho should be rejected • if p is less than alpha, reject Ho
Worked example: results from a computer • You do a correlation, and get the following analysis: • r = -0.3, p = 0.04, alpha = 0.05 • Are these results statistically significant?
Worked example • The computer has done steps 1 and 2 for you (you have the stat and p) • You just do step 3 - the decision (is p < alpha?) • p (0.04) < alpha (0.05), so reject Ho - the result is statistically significant (i.e. the relationship is likely to hold in the population, not just the sample)
What have we discovered? • What does it mean to find statistical significance? • The relationship is likely to occur in the population - it was not a fluke • Significance alone tells us nothing about whether the relationship is positive or negative, or how strong it is • How strong the relationship is, is expressed by the effect size - a different concept.
Error in hypothesis testing • No hypothesis test gives a 100% sure result • Our alpha sets our level of “confidence” - 0.05 means we accept a 5% chance of making a mistake • 2 kinds of mistakes you can make: • Type I error: you say Ho is false, when it’s actually true • Type II error: you say Ho is true, when it’s actually false
Errors: Example • Imagine you are a judge, hearing a case. • You are presented with the evidence • If the guy did it, you must declare him “guilty” • If he didn’t, you must declare him “not guilty” • Declaring someone who did it “not guilty” is a mistake (the guy gets off) • Declaring someone who didn’t do it “guilty” is also a mistake (an innocent man goes to jail)
Example • The same idea applies in hypothesis testing • You are presented with the data • If Ho is false, you must reject it • If Ho is true, you must not reject it • Rejecting a true Ho is a mistake • Not rejecting a false Ho is also a mistake
Error types again • Type I error is “putting an innocent man in prison” • Type II error is “the crook getting away with it” • We are interested in the probability of making one of these errors • These errors cannot be avoided, only reduced • The probability of making a Type I error is alpha - the simulation below illustrates this
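The claim that the Type I error rate equals alpha can be checked by simulation. A rough sketch, not from the slides, using an independent-samples t test with arbitrary group sizes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_experiments = 10_000
rejections = 0

for _ in range(n_experiments):
    # Both groups come from the SAME population, so Ho ("no
    # difference") really is true in every experiment.
    a = rng.normal(0, 1, size=30)
    b = rng.normal(0, 1, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        rejections += 1  # rejecting a true Ho: a Type I error

print(rejections / n_experiments)  # comes out close to alpha (~0.05)
```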