280 likes | 458 Views
Lecture 3: Null Hypothesis Significance Testing Continued. Laura McAvinue School of Psychology Trinity College Dublin. Previous Lectures. Inferential Statistics Sample Population Null Hypothesis Significance Testing Proceeds in series of steps
E N D
Lecture 3:Null Hypothesis Significance Testing Continued Laura McAvinue School of Psychology Trinity College Dublin
Previous Lectures • Inferential Statistics • Sample Population • Null Hypothesis Significance Testing • Proceeds in series of steps • Allows us to assess the statistical significance of our results • To reject or accept the Ho on the basis of the p value
Previous Lectures • Misleading nature of statistical significance • Results can be labelled as • ‘Statistically significant’ • ‘Not statistically significant’ • People interpret results in a cut and dried fashion • ‘Statistically significant result means there is a true effect in the population’ • ‘Non-significant result means there is no true effect’
Previous Lectures • NHST is not so straightforward • Statistical significance is affected by • One or two tailed test • Significance level / / probability of Type I error • Power / Probability of Type II error • Sample size • These factors must be considered • Research evaluation • Research planning
Research Evaluation • A result is statistically significant • Implies a true effect exists in the population • But is this effect clinically significant? • How big if the effect? • Real world relevance? • Recall that a large enough sample size will make a small effect statistically significant
Research Evaluation • A result is not statistically significant • Implies a true effect does not exist in the population • Power • Did the study have enough power to identify an effect as statistically significant even if a true effect existed?
Research Planning • Power • Require enough power to obtain statistically significant results if a true effect exists • Sample Size • Obtain an adequate sample size
Effect Size • NHST • Enables us to say whether or not a true effect exists in the population • Effect Size • Provides an estimate of the size of this true effect • A measure of the degree to which the Ho is false • A measure of the discrepancy between Ho and H1
0 0 1 1 Small ES 0 - 1 = small ES Large ES 0- 1= large ES
Effect Size • There is a different effect size measure for each statistical test • The difference between two independent group means • Cohen’s d • 1 - 0 σ • Standardised difference • Express the difference between the means in terms of the standard deviation
Effect Size • To calculate Cohen’s d for a study in which you compared two groups Meantreat – Meancontrol SDcontrol • For example, I compared the effects of an exercise regime and a control regime on physical fitness (rated /20) in two groups and obtained the following results…
Effect Size • Mean rating in exercise group was 17 (SD = 10) • Mean rating in control group was 11 (SD = 10) • Cohen’s d was 17 – 11 10 = .6 • The exercise group had a mean rating .6 SDs higher than the control group • You can use Cohen’s d to compare studies that have used different measures
Comparing Studies • Four studies examined the effect of cognitive behavioural therapy on self-esteem but each study used a different scale to assess self-esteem. • Calculate the effect size for each of the following studies • Which study found the greatest effect?
Comparing Studies • Four studies examined the effect of cognitive behavioural therapy on self-esteem but each study used a different scale to assess self-esteem. • Calculate the effect size for each of the following studies • Which study found the greatest effect?
What is a big Effect Size? • Cohen’s (1992) rules of thumb • For independent t-tests comparing two means… Cohen, J. (1992). A power primer. Psychological Bulletin, 112 (1), 155-159.
Research Evaluation • A statistically significant result • Is it clinically significant? • Real world relevance? • Effect Size • A non-significant result • No true effect? • Lack of power?
Calculating Power • Recall that power is determined by a number of factors • To calculate the power of an experiment you need to know • One or two-tailed test • Significance level • Sample size • Effect size • You calculate the power of an experiment to identify a certain effect size as statistically significant, using a one/two-tailed test with a certain level and a certain sample size
The difference in power for these two studies was due to sample size
Power • Computer programmes can calculate power • http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/ • Free download of gpower3 package • Research planning • Rather than computing power post hoc, best to plan to have adequate power to obtain statistically significant results if Ho is false and a true effect exists • Convention • Aim for power of .8 • 80% chance of obtaining significant results if Ho is false • .2 probability of Type II error • 1 : 4 ratio of Type I (.05) to Type II (.2) errors
Power & Sample Size • Main avenue for increasing power • Increase sample size • Common question • How big a sample do I need? • Answer depends • The power you want to have • Significance level you set • Effect size you expect to obtain • Statistical test you are running • One or two tailed prediction
Power & Sample Size • The Real Question • “What sample size do I need to have power of ____ to detect an ES of ____ as being statistically significant at ____ level, when doing a ____ statistical test and making a ____-tailed prediction?” • Most of the gaps are easy to complete • Power = .8 • = .05 • Test = depends on experimental design • Prediction = depends on theory • ES = ? • Need to estimate effect size
Estimate Effect Size • Pilot Study • Do analysis on small group to give idea of results • Previous Research • Calculate ES in previously published studies • Theory • Based on theory or understanding of research area, estimate the ES or the smallest ES that would be of interest • Cohen’s Standards • Would you like to detect a small, medium or large effect? • Difference between two groups • Small (.2), Medium (.5), Large (.8)
Power & Sample Size • Once you have decided on the following • Statistical test, prediction, Power, and ES • You can calculate necessary sample size in two ways • Computer package, such as gpower3 • Cohen’s tables • Let’s try an example • Turn to the handout showing Cohen’s table of required sample size • (note that this table refers to two-tailed predictions)
Calculating Required Sample Size • I would like to investigate the difference between clinically anxious and normal people in relation to performance on an attention task • “How many people do I need in each group to have power of .8 to detect a large ES as being statistically significant at .05 level, when doing an independent samples t-test and making a two-tailed prediction?”
Cohen’s Table N for Small, Medium, and large ES at power = 0.80 for = .01, .05 and .10 • We need 26 people in each group to have a power of 0.80 to detect a large ES as statistically significant at the 0.05 level
Some more practice! • For a two group independent t-test, how many people do I need in each group to detect… • Large ES as statistically significant at .10 level _________ • Large ES as statistically significant at .05 level _________ • Large ES as statistically significant at .01 level _________ • Medium ES as statistically significant at .01 level _________ • Small ES as statistically significant at .01 level _________ • The smaller the alpha level, the _______________ the sample size required to detect a given difference as being statistically significant • The smaller the ES, the _______________ the sample size required to detect a given difference as being statistically significant
Summary • Factors affecting Statistical Significance • Research Evaluation • Effect size • Power Calculations • Research Planning • Sample Size Calculations