Lecture 2: Null Hypothesis Significance Testing Continued Laura McAvinue School of Psychology Trinity College Dublin
Null Hypothesis Significance Testing • Previous lecture, Steps of NHST • Specify the alternative/research hypothesis • Set up the null hypothesis • Collect data • Run the appropriate statistical test • Obtain the test statistic and associated p value • Decide whether to reject or fail to reject the null hypothesis on the basis of p value
Null Hypothesis Significance Testing • Decision to reject or fail to reject Ho • P value • Probability of obtaining results at least as extreme as those observed if Ho is true • By convention, use the significance level of p < .05 • Conclude that it is highly unlikely that we would obtain these results by chance alone, so we reject Ho • Caveat! The fact that there is a significance level does not mean that there is a simple ‘yes’ or ‘no’ answer to your research question
Null Hypothesis Significance Testing • If you obtain results that are not statistically significant (p>.05), this does not necessarily mean that the relationship you are interested in does not exist • There are a number of factors that affect whether your results come out as statistically significant • One and two-tailed tests • Type I and Type II errors • Power
One and Two-tailed Tests • One-tailed / Directional Test • Run this when you have a prediction about the direction of the results • Two-tailed / Non-Directional Test • Run this when you don’t have a prediction about the direction of the results
Recall previous example… • Research Question • Do anxiety levels of students differ from anxiety levels of young people in general? • Prediction • Due to the pressure of exams and essays, students are more stressed than young people in general • Method • You know the mean score for the normal young population on the anxiety measure = 50 • You predict that your sample will have mean > 50 • Run a one-tailed one-sample t test at the p < .05 level
One-tailed Test • Compare the mean of your sample to the sampling distribution for the population mean • Decide to reject Ho if your sample mean falls into the highest 5% of the sampling distribution
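The decision rule on this slide can be sketched numerically. This is only an illustration under stated assumptions: the population mean of 50 comes from the example, but the population standard deviation (10), the sample size (25) and the observed sample mean (54) are made up, and a z test with a known standard deviation stands in for the one-sample t test so that only the Python standard library is needed.

```python
from math import sqrt
from statistics import NormalDist

MU_0 = 50.0    # population mean under Ho (from the anxiety example)
SIGMA = 10.0   # ASSUMED population standard deviation (hypothetical)
N = 25         # ASSUMED sample size (hypothetical)
ALPHA = 0.05   # significance level


def one_tailed_critical_mean(mu_0, sigma, n, alpha):
    """Sample mean that cuts off the highest `alpha` of the sampling distribution."""
    standard_error = sigma / sqrt(n)
    return mu_0 + NormalDist().inv_cdf(1 - alpha) * standard_error


def one_tailed_p_value(sample_mean, mu_0, sigma, n):
    """Probability of a sample mean at least this high if Ho is true."""
    sampling_distribution = NormalDist(mu=mu_0, sigma=sigma / sqrt(n))
    return 1 - sampling_distribution.cdf(sample_mean)


critical_mean = one_tailed_critical_mean(MU_0, SIGMA, N, ALPHA)
p_value = one_tailed_p_value(54.0, MU_0, SIGMA, N)  # 54.0 is a made-up sample mean
print(f"critical mean = {critical_mean:.2f}, p = {p_value:.4f}")
```

With these assumed numbers, any sample mean above roughly 53.3 falls in the highest 5% of the sampling distribution, so a sample mean of 54 would lead us to reject Ho.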
Dilemma • But! What if your prediction is wrong? • Perhaps students are less stressed than the general young population • Their own bosses, summers off, no mortgages • With previous one-tailed test, you could only reject Ho if you got an extremely high sample mean • What if you get an extremely low sample mean? • Run a two-tailed test • Hedge your bets • Reject Ho if you obtain scores at either extreme of the distribution, very high or very low sample mean
Two-tailed Test • You will reject Ho when a score appears in the highest 2.5% of the distribution or the lowest 2.5% • Note that it’s not the highest 5% and the lowest 5% as then you’d be operating at p = .1 level, rejecting Ho for 10% of the distribution • So, we gain ability to reject Ho for extreme values at either end but values must be more extreme
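A short sketch of why values must be more extreme in a two-tailed test. It assumes a standard normal (z) distribution in place of the t distribution, purely to keep the example within the Python standard library; the function name `critical_z` is hypothetical.

```python
from statistics import NormalDist


def critical_z(alpha, two_tailed):
    """Cut-off z score: alpha sits in one tail, or alpha / 2 in each of two tails."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


one_tailed_cutoff = critical_z(0.05, two_tailed=False)
two_tailed_cutoff = critical_z(0.05, two_tailed=True)
print(f"one-tailed: z > {one_tailed_cutoff:.3f}")
print(f"two-tailed: z > {two_tailed_cutoff:.3f} or z < -{two_tailed_cutoff:.3f}")
```

The one-tailed cut-off is about 1.645 and the two-tailed cut-off about 1.960, so the same result can be significant one-tailed but not two-tailed.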
Errors in NHST • Howell (2008) p. 157 • “Whenever we reach a decision with a statistical test, there is always a chance that our decision is the wrong one” • Misleading nature of NHST • Because there is a significance level (p = .05), people interpret NHST as a definitive exercise • Results are statistically significant or not • We reject Ho or we don’t • The Ho is wrong or right
Errors in NHST • Remember we are dealing with probabilities • We make our decision on the basis of the likelihood of obtaining the results if Ho is true • There is always the chance that we are making an error • Two kinds of Error • We reject Ho when it is true (Type I error) • We say there’s a significant difference when there’s not • We fail to reject Ho when it is false (Type II error) • We say there is no significant difference when there is
Type I Error • Our anxiety example • Predict that students will have greater anxiety score than young people in general • Test Ho that students’ anxiety levels do not differ from young people • One-tailed one sample t-test at p < .05 • Compare sample mean with sampling distribution of mean for the population (Ho)
Type I Error • Decide to reject Ho if your sample mean falls in the top 5% of the distribution • But! • This 5%, even though at the extreme end, still belongs to the distribution • If your sample mean falls within this top 5%, there is still a chance that your sample came from the Ho population
Type I Error • For example, if p = .04, a sample mean at least this extreme would occur only 4% of the time if your sample came from the Ho population • But this is still a chance: you could be rejecting Ho when it is in fact true • Researchers are willing to accept this small risk (5%) of making a Type I error, of rejecting Ho when it is in fact true • Probability of making a Type I error = alpha = the significance level that you chose • .05, .01
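The claim that alpha is the probability of a Type I error can be checked with a small simulation. This is a sketch under assumptions, not part of the lecture: every sample is drawn from the Ho population (mean 50), the standard deviation (10) and sample size (25) are hypothetical, and a one-tailed z test replaces the t test.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_one_error_rate(mu_0=50.0, sigma=10.0, n=25, alpha=0.05,
                        n_experiments=20_000, seed=1):
    """Proportion of experiments that reject Ho even though Ho is true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    critical_mean = mu_0 + NormalDist().inv_cdf(1 - alpha) * sigma / sqrt(n)
    rejections = 0
    for _ in range(n_experiments):
        # Ho is true here: the sample really does come from the Ho population
        sample = [rng.gauss(mu_0, sigma) for _ in range(n)]
        if fmean(sample) > critical_mean:
            rejections += 1  # a Type I error
    return rejections / n_experiments


rate = type_one_error_rate()
print(f"simulated Type I error rate = {rate:.3f}")
```

The simulated rate should land close to .05, the significance level that was chosen.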
Type II Error • So why not set a very low significance level to minimise your risk of making a Type I error? • Set p < .01 rather than p < .05 • As you decrease the probability of making a Type I error you increase the probability of making a Type II error • Type II Error • Fail to reject Ho when it is false • Fail to detect a significant relationship in your data when a true relationship exists
For argument’s sake, imagine that H1 is correct • Figure: Sampling Distribution under Ho and Sampling Distribution under H1, with the rejection region (red) to the right of the critical value • Reject Ho if the sample mean falls to the right of the critical value (red region): Correct Decision • Fail to reject Ho if the sample mean falls to the left of the critical value: Type II Error
Power • You should minimise both Type I and Type II errors • In reality, people are often very careful about Type I (i.e. strict about α) but ignore Type II altogether • If you ignore Type II error, your experiment could be doomed before it begins • Even if a true effect exists (i.e. H1 is correct), if β (the Type II error rate) is high, the results may not show a statistically significant effect • How do you reduce the probability of a Type II error? • Increase the power of the experiment
Power • Power • The probability of correctly rejecting a false Ho • A measure of the ability of your experiment to detect a significant effect when one truly exists • 1 − β, where β is the probability of a Type II error
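For a concrete case, power can be written out for a one-tailed, one-sample z test. This is a simplification assumed here for illustration (the lecture's example uses a t test): σ is a known population standard deviation, n the sample size, μo and μ1 the population means under Ho and H1, z with subscript 1 − α the critical z score, and Φ the standard normal cumulative distribution function.

```latex
\[
\text{Power} = 1 - \beta
= P\!\left(\bar{X} > \mu_0 + z_{1-\alpha}\,\frac{\sigma}{\sqrt{n}} \;\middle|\; \mu = \mu_1\right)
= 1 - \Phi\!\left(z_{1-\alpha} - \frac{\mu_1 - \mu_0}{\sigma/\sqrt{n}}\right)
\]
```

Each term maps onto a factor affecting power: the significance level, the true difference between the means, and the sample size.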
How do we increase the power of our experiment? • Factors affecting power • The significance level (α) • One-tailed v two-tailed test • The true difference between Ho and H1 (μo − μ1) • Sample Size (n)
The Influence of α on Power • Reduce the significance level (α)… • Reduce the probability of making a Type I error • Rejecting Ho when it is true • Increase the probability of making a Type II error • Failing to reject Ho when it is false • Reduce the power of the experiment to detect a true effect as statistically significant
Increase α and you increase power • But! You increase the probability of a Type I error!
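The trade-off between the significance level and power can be put into numbers. A sketch under assumptions: a one-tailed z test, a population mean of 50 under Ho (from the example), and a hypothetical true student mean of 54, standard deviation of 10 and sample size of 25.

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_power(alpha, mu_0=50.0, mu_1=54.0, sigma=10.0, n=25):
    """Probability of rejecting Ho when the true mean is mu_1 (all defaults assumed)."""
    standard_error = sigma / sqrt(n)
    critical_mean = mu_0 + NormalDist().inv_cdf(1 - alpha) * standard_error
    # Area of the H1 sampling distribution beyond the critical value
    return 1 - NormalDist(mu=mu_1, sigma=standard_error).cdf(critical_mean)


power_by_alpha = {alpha: one_tailed_power(alpha) for alpha in (0.01, 0.05, 0.10)}
for alpha, power in power_by_alpha.items():
    print(f"alpha = {alpha:.2f}  power = {power:.3f}  beta = {1 - power:.3f}")
```

Under these assumptions power rises from roughly .37 at the .01 level to roughly .76 at the .10 level, at the price of a larger Type I error risk.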
The Influence of One v Two-tailed Tests on Power • We lose power with a two-tailed test • α is divided across the two tails of the distribution (2.5% in each) • Values must be more extreme to be statistically significant
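The loss of power can be illustrated with the same kind of calculation. The numbers are assumptions for illustration only (z test, true mean 54, standard deviation 10, sample size 25), and the comparison assumes the true effect lies in the predicted direction.

```python
from math import sqrt
from statistics import NormalDist

# Ho mean from the example; the other values are ASSUMED for illustration
MU_0, MU_1, SIGMA, N, ALPHA = 50.0, 54.0, 10.0, 25, 0.05


def power(two_tailed):
    """Probability of rejecting Ho when the true mean is MU_1."""
    standard_error = SIGMA / sqrt(N)
    h1_distribution = NormalDist(mu=MU_1, sigma=standard_error)
    tail_area = ALPHA / 2 if two_tailed else ALPHA
    cutoff = NormalDist().inv_cdf(1 - tail_area) * standard_error
    upper_tail = 1 - h1_distribution.cdf(MU_0 + cutoff)
    # Only a two-tailed test can also reject for very low sample means
    lower_tail = h1_distribution.cdf(MU_0 - cutoff) if two_tailed else 0.0
    return upper_tail + lower_tail


one_tailed_power = power(two_tailed=False)
two_tailed_power = power(two_tailed=True)
print(f"one-tailed power = {one_tailed_power:.3f}")
print(f"two-tailed power = {two_tailed_power:.3f}")
```

If the true effect were in the opposite direction to the prediction, the one-tailed test would have almost no power at all, which is the dilemma raised earlier.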
The Influence of the True Difference between Ho and H1 • The bigger the difference between μo and μ1, the easier it is to detect it
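A small simulation can show this effect. It is a sketch with made-up values: samples are drawn from populations whose true means (52, 54, 56) are hypothetical, as are the standard deviation (10) and sample size (25), and each sample is tested with a one-tailed z test against the Ho mean of 50.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulated_power(true_mean, mu_0=50.0, sigma=10.0, n=25, alpha=0.05,
                    n_experiments=5_000, seed=2):
    """Proportion of simulated experiments that correctly reject a false Ho."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    critical_mean = mu_0 + NormalDist().inv_cdf(1 - alpha) * sigma / sqrt(n)
    rejections = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if fmean(sample) > critical_mean:
            rejections += 1
    return rejections / n_experiments


power_by_true_mean = {mean: simulated_power(mean) for mean in (52.0, 54.0, 56.0)}
for mean, power in power_by_true_mean.items():
    print(f"true mean = {mean:.0f}  difference = {mean - 50:.0f}  power = {power:.2f}")
```

The proportion of significant results climbs steeply as the true mean moves further from 50.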
The Influence of Sample Size on Power • The bigger the sample size, the more power you have • A big sample provides a better estimate of the population mean • With bigger sample sizes, the sampling distribution for the mean clusters more tightly around the population mean • The standard deviation of the sampling distribution, known as the standard error of the mean, is reduced • There is less overlap between the sampling distributions under Ho and H1 • The power to detect a significant difference increases
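The chain of reasoning on this slide (bigger n, smaller standard error, less overlap, more power) can be traced in a few lines. The values are assumptions for illustration: a one-tailed z test, a true mean of 54 against the Ho mean of 50, and a population standard deviation of 10.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean."""
    return sigma / sqrt(n)


def one_tailed_power(n, mu_0=50.0, mu_1=54.0, sigma=10.0, alpha=0.05):
    """Probability of rejecting Ho when the true mean is mu_1 (all defaults assumed)."""
    se = standard_error(sigma, n)
    critical_mean = mu_0 + NormalDist().inv_cdf(1 - alpha) * se
    return 1 - NormalDist(mu=mu_1, sigma=se).cdf(critical_mean)


sample_sizes = (10, 25, 50, 100)
powers = [one_tailed_power(n) for n in sample_sizes]
for n, power in zip(sample_sizes, powers):
    print(f"n = {n:3d}  standard error = {standard_error(10.0, n):.2f}  power = {power:.3f}")
```

Going from 10 to 100 participants cuts the standard error from about 3.16 to 1.00 under these assumptions, and power rises accordingly.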
Sample Size Exercise • Open the following dataset • Software / Kevin Thomas / Power dataset (revised) • Explores the effects of Therapy on Depression • Perform two Independent Samples t-tests • Analyse / Compare means / Independent Samples t test • Group represents Therapy v Control • Score represents post-treatment depression • 1. Group1 & Score1 • 2. Group 2 & Score 2
So, how do I increase the power of my study? • You can’t manipulate the true difference between Ho and H1 • You could increase your significance level (α) but then you would increase the risk of a Type I error • If you have a strong prediction about the direction of the results, you should run a one-tailed test • The factor that is most under your control is sample size • Increase it!
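Since sample size is the factor under your control, one practical question is how many participants are needed. The sketch below is not from the lecture: it assumes a one-tailed z test, a hypothetical true difference of 4 points, a standard deviation of 10, and a target power of .80 (a commonly used target, chosen here as an assumption).

```python
from math import ceil, sqrt
from statistics import NormalDist


def one_tailed_power(n, difference, sigma, alpha):
    """Power of a one-tailed z test for a given true difference between means."""
    shift = difference / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(NormalDist().inv_cdf(1 - alpha) - shift)


def required_sample_size(difference, sigma, alpha=0.05, target_power=0.80):
    """Smallest n that reaches the target power, from the normal-approximation formula."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_power = NormalDist().inv_cdf(target_power)
    return ceil(((z_alpha + z_power) * sigma / difference) ** 2)


n_needed = required_sample_size(difference=4.0, sigma=10.0)  # made-up values
print(f"participants needed = {n_needed}")
print(f"power at that n = {one_tailed_power(n_needed, 4.0, 10.0, 0.05):.3f}")
```

Halving the assumed difference roughly quadruples the required sample size, which is why small effects call for large studies.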