Significance testing. Ioannis Karagiannis (based on previous EPIET material). 18th EPIET/EUPHEM Introductory course, 28.09.2012
The idea of statistical inference: hypotheses are stated about a population; a sample is drawn from that population; conclusions are based on the sample and then generalised to the population.
Inferential statistics • Uses patterns in the sample data to draw inferences about the population represented, accounting for randomness • Two basic approaches: • Hypothesis testing • Estimation • Common goal: draw conclusions about the effect of an independent variable on a dependent variable
The aim of a statistical test To reach a deterministic decision (“yes” or “no”) about observed data on a probabilistic basis.
Why significance testing? Norovirus outbreak on a Greek island: “The risk of illness was higher among people who ate raw seafood (RR=21.5).” Is the association due to chance?
The two hypotheses When you perform a test of statistical significance, you either reject or do not reject the null hypothesis (H0).
Norovirus on a Greek island • Null hypothesis (H0): “There is no association between consumption of raw seafood and illness.” • Alternative hypothesis (H1): “There is an association between consumption of raw seafood and illness.”
Hypothesis testing • Tests of statistical significance • Data not consistent with H0: • H0 can be rejected in favour of some alternative hypothesis H1 (the objective of our study). • Data consistent with H0: • H0 cannot be rejected. You cannot say that H0 is true. You can only decide to reject it or not to reject it.
p value The p value is the probability of observing our result (e.g. a difference between proportions or an RR), or a more extreme one, if the null hypothesis H0 were true. Whether H0 is rejected is decided using the reported p value.
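The definition can be made concrete with a small hypothetical example that is not part of the outbreak data: H0 says a coin is fair, and we observe 9 heads in 10 tosses. The one-sided p value is the probability, computed under H0, of the observed result or a more extreme one, i.e. of 9 or more heads. A minimal sketch; the function name `binomial_tail_p` is illustrative only.

```python
from math import comb


def binomial_tail_p(successes: int, n: int, p0: float) -> float:
    """One-sided p value P(X >= successes) when X ~ Binomial(n, p0) under H0."""
    # Add up the probability of the observed result and of every more extreme one.
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(successes, n + 1))


# Hypothetical example: 9 heads in 10 tosses of a coin assumed fair under H0.
p = binomial_tail_p(9, 10, 0.5)
print(f"p = {p:.4f}")  # p = 0.0107, i.e. (10 + 1) / 1024
```

A two-sided test would also count the equally extreme results at the other end (0 or 1 heads), doubling the p value here to about 0.021.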
p values – practicalities • Low p values = low degree of compatibility between H0 and the observed data: the association is unlikely to be due to chance alone; you reject H0 and the test is significant • High p values = high degree of compatibility between H0 and the observed data: chance alone is a plausible explanation for the association; you do not reject H0 and the test is not significant
Levels of significance – practicalities We need a cut-off (commonly 1%, 5% or 10%) • p value > 0.05 = H0 not rejected (non-significant) • p value ≤ 0.05 = H0 rejected (significant) BUT: always give the exact p value rather than just “significant” vs. “non-significant”.
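The cut-off rule and the advice to report the exact p value can be combined in one small helper. This is a sketch under stated assumptions: the function name `report` is hypothetical, α defaults to 0.05, and very small values are printed as “p < 0.001” as in the outbreak example; the two test values are taken from the literature examples and the Rosnow and Rosenthal quote on the next slide.

```python
def report(p_value: float, alpha: float = 0.05) -> str:
    """Return the exact p value together with the decision about H0 (hypothetical helper)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    # Give the exact p value; very small values are conventionally shown as p < 0.001.
    p_text = "p < 0.001" if p_value < 0.001 else f"p = {p_value:.4g}"
    # Decision rule from the slide: reject H0 when p <= alpha.
    if p_value <= alpha:
        decision = "H0 rejected (significant)"
    else:
        decision = "H0 not rejected (non-significant)"
    return f"{p_text}: {decision} at alpha = {alpha}"


print(report(0.0361))  # p = 0.0361: H0 rejected (significant) at alpha = 0.05
print(report(0.06))    # p = 0.06: H0 not rejected (non-significant) at alpha = 0.05
```

Reporting the exact value lets a reader see that p = 0.06 and p = 0.0361 are not as far apart as the two labels suggest.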
Examples from the literature • “The limit for statistical significance was set at p=0.05.” • “There was a strong relationship (p<0.001).” • “…, but it did not reach statistical significance (ns).” • “The relationship was statistically significant (p=0.0361).” p = 0.05 is an agreed convention, not an absolute truth: “Surely, God loves the 0.06 nearly as much as the 0.05” (Rosnow and Rosenthal, 1991)
p = 0.05 and its errors • Level of significance (α), usually 0.05 • The p value is used for decision making. But two errors are still possible: • H0 should not be rejected, but it was rejected: Type I or alpha error • H0 should be rejected, but it was not rejected: Type II or beta error
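Types of errors (truth vs. decision based on the p value) • Truth: no difference; decision: no difference → correct decision • Truth: difference; decision: difference → correct decision • Truth: no difference; decision: difference → H0 is “true” but rejected: Type I (α) error • Truth: difference; decision: no difference → H0 is “false” but not rejected: Type II (β) error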
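A Type I error happens when H0 is true but is rejected anyway; with α = 0.05 this should occur in roughly 5% of tests. The simulation below is a hypothetical illustration, not course material: it assumes two groups drawn from the same normal population with known standard deviation 1 and uses a two-sided z-test for the difference in means. The function name `type_i_error_rate` is illustrative.

```python
import random
from statistics import NormalDist, fmean


def type_i_error_rate(n_sims=10_000, n=20, alpha=0.05, seed=1):
    """Share of simulated studies, all generated under H0, in which H0 is (wrongly) rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = (2 / n) ** 0.5  # SE of a difference in means when sigma = 1 in both groups
    rejections = 0
    for _ in range(n_sims):
        # Both groups come from the same N(0, 1) population, so H0 (no difference) is true.
        group_a = [rng.gauss(0, 1) for _ in range(n)]
        group_b = [rng.gauss(0, 1) for _ in range(n)]
        z = (fmean(group_a) - fmean(group_b)) / se
        if abs(z) > z_crit:
            rejections += 1  # every rejection here is a Type I error
    return rejections / n_sims


print(type_i_error_rate())  # should be close to alpha = 0.05
```

Lowering `alpha` reduces the Type I error rate, but for a real effect of a given size it also makes Type II errors more likely.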
More on errors • Probability of Type I error: • The value of α is set in advance of the test • The significance level is the level of α error that we are willing to accept (usually 0.05) • Probability of Type II error: • The value of β depends on the size of the effect (e.g. RR, OR) and on the sample size • 1 − β: the statistical power of a study to detect an effect of a specified size (e.g. 0.80) • Fix β in advance, then choose an appropriate sample size
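Fixing α and β in advance translates into a required sample size. The sketch below uses the common normal-approximation formula for comparing two proportions with a two-sided test (no continuity correction); this formula, the function name `n_per_group` and the example attack rates of 20% vs 10% are assumptions for illustration, not figures from the course.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of subjects per group to detect proportions p1 vs p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls the Type I error (two-sided)
    z_beta = NormalDist().inv_cdf(power)  # power = 1 - beta
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Hypothetical attack rates of 20% vs 10%, alpha = 0.05, power = 0.80.
print(n_per_group(0.20, 0.10))  # 199
```

Asking for more power (a smaller β) or a smaller α, or looking for a smaller difference, all increase the required sample size.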
Quantifying the association • Test of association between exposure and outcome • E.g. chi2 test or Fisher’s exact test • Comparison of proportions • The chi2 value quantifies the association • The more the observed data deviate from the assumption of independence (no effect), the larger the chi2 value and the smaller the p value.
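Fisher’s exact test follows the p value definition directly: with the row and column totals of a 2x2 table held fixed, the count in one cell follows a hypergeometric distribution under H0, and the two-sided p value adds up the probabilities of all tables no more probable than the observed one. A minimal sketch; the name `fisher_exact_two_sided` is hypothetical, and libraries such as SciPy (`scipy.stats.fisher_exact`) provide tested implementations.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)  # possible values of the top-left cell
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Keep tables at most as probable as the observed one (small tolerance for rounding).
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))


# Classic "lady tasting tea" table: 3 of 4 cups correctly identified.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```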
Norovirus on a Greek island: 2x2 table Expected number of ill and not ill in each cell, using the overall proportions 19% ill (34/179) and 81% not ill (145/179): • Raw seafood (n = 38): 38 × 19% ≈ 7 ill, 38 × 81% ≈ 31 not ill • No raw seafood (n = 141): 141 × 19% ≈ 27 ill, 141 × 81% ≈ 114 not ill • Totals: 34 ill, 145 not ill, 179 in all
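The expected counts in the table above come from the margins alone: expected count = row total × column total / grand total, which is the same as multiplying each row total by the overall proportions. A small sketch using the outbreak margins (the function name `expected_counts` is illustrative):

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under H0 (independence): row total * column total / grand total."""
    grand = sum(row_totals)
    if grand != sum(col_totals):
        raise ValueError("row and column totals must add up to the same grand total")
    return [[r * c / grand for c in col_totals] for r in row_totals]


# Margins from the norovirus table: rows = raw seafood (38) / no raw seafood (141),
# columns = ill (34) / not ill (145).
for label, row in zip(["Raw seafood", "No raw seafood"], expected_counts([38, 141], [34, 145])):
    print(label, [round(x, 1) for x in row])
# Raw seafood [7.2, 30.8]
# No raw seafood [26.8, 114.2]
```

Applied to the margins of the glasses example (rows: 38 fellows, 14 facilitators; columns: 17 with glasses, 35 without), the same function gives about 12.4 and 25.6 for fellows and 4.6 and 9.4 for facilitators; the slide’s 13 and 25 come from rounding the proportions to 33% and 67% first.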
Chi-square calculation χ2 = Σ (observed − expected)² / expected, summed over the four cells: χ2 = 125, p < 0.001 (table margins: raw seafood 38, no raw seafood 141; ill 34, not ill 145; total 179)
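The calculation can be sketched in a few lines. Because the cell counts of the outbreak table are not reproduced on the slide, the checks below use toy tables rather than the outbreak data. The sketch computes the uncorrected Pearson statistic (the slides do not say whether a continuity correction was used) and converts it to a p value for 1 degree of freedom, the case for a 2x2 table; both function names are hypothetical.

```python
from math import erfc, sqrt


def pearson_chi2(observed):
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand  # expected count under H0
            chi2 += (o - e) ** 2 / e
    return chi2


def chi2_p_value_1df(chi2):
    """Upper-tail p value for chi-square with 1 df.

    A chi-square variable with 1 df is the square of a standard normal Z, so
    P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    """
    return erfc(sqrt(chi2 / 2))


print(pearson_chi2([[10, 0], [0, 10]]))  # 20.0 (toy table, strong association)
print(chi2_p_value_1df(125) < 0.001)     # True, in line with the slide's p < 0.001
```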
Norovirus on a Greek island “The attack rate of illness among consumers of raw seafood was 21.5 times higher than among non-consumers of these food items (p<0.001).” The p value is smaller than the chosen significance level of α = 5%. → The null hypothesis is rejected. If there were no true association between eating imported raw seafood and illness, an association at least as strong as the one observed would arise by chance with a probability of less than 0.001 (<1/1000).
C2012 vs facilitators The ultimate (eye) test. H0: the proportion of facilitators wearing glasses during the Tuesday morning sessions was equal to the proportion of fellows wearing glasses. H1: the above proportions were different.
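C2012 vs facilitators Expected number with and without glasses in each cell, using the overall proportions 33% with glasses (17/52) and 67% without (35/52): • Fellows (n = 38): 38 × 33% ≈ 13 with glasses, 38 × 67% ≈ 25 without • Facilitators (n = 14): 14 × 33% ≈ 4.6 with glasses, 14 × 67% ≈ 9.4 without • Totals: 17 with glasses, 35 without, 52 in all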
Chi-square calculation χ2 = 1.11, p = 0.343 (fellows vs. facilitators, glasses vs. no glasses) → p > 0.05, so H0 is not rejected.
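An equivalent way to read the two chi-square results is to compare each statistic with the critical value of the chi-square distribution with 1 degree of freedom at α = 0.05 (about 3.84, the square of 1.96). The sketch uses only the χ2 values reported on the slides; the function name `chi2_critical_1df` is illustrative.

```python
from statistics import NormalDist


def chi2_critical_1df(alpha: float = 0.05) -> float:
    """Critical value of chi-square with 1 df: the square of z at 1 - alpha/2."""
    return NormalDist().inv_cdf(1 - alpha / 2) ** 2


crit = chi2_critical_1df()  # about 3.84
for label, chi2 in [("Raw seafood vs illness", 125.0), ("Fellows vs facilitators, glasses", 1.11)]:
    verdict = "reject H0" if chi2 > crit else "do not reject H0"
    print(f"{label}: chi2 = {chi2} vs critical {crit:.2f} -> {verdict}")
```

This reaches the same decisions as comparing the p values with 0.05, but it gives no exact p value to report, which the earlier slide recommends.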
t-test • Used to compare the means of a continuous variable in two different groups • Assumes the variable is (approximately) normally distributed in each group
t-test • H0: fellows with glasses do not tend to sit further in the back of the room compared to fellows without glasses • H1: fellows with glasses tend to sit further in the back of the room compared to fellows without glasses
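The sketch below computes Student’s two-sample t statistic (pooled variance, equal variances assumed) with its degrees of freedom. The row numbers are hypothetical and were not collected during the course. The Python standard library has no t distribution, so the sketch stops at t and df; SciPy’s `scipy.stats.ttest_ind` (with its `alternative` and `equal_var` parameters) returns p values directly. The function name `pooled_t` is illustrative.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_t(x, y):
    """Student's two-sample t statistic (equal variances assumed) and its degrees of freedom."""
    nx, ny = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (fmean(x) - fmean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


# Hypothetical row numbers (1 = front row).
with_glasses = [4, 5, 6]
without_glasses = [1, 2, 3]
t, df = pooled_t(with_glasses, without_glasses)
print(f"t = {t:.2f}, df = {df}")  # t = 3.67, df = 4
```

Because H1 on the slide is directional (“further in the back”), a one-sided test fits. With 4 df, the one-sided 5% critical value is about 2.13, so these made-up data would lead to rejecting H0.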
Criticism of significance testing “Epidemiological applications need more than a decision as to whether chance alone could have produced an association.” (Rothman et al. 2008) → Estimation of an effect measure (e.g. RR, OR) rather than significance testing.
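Estimation reports the size of the effect and its precision instead of a yes/no decision. The sketch below computes a risk ratio with a 95% confidence interval using the common log (Katz) method. The method, the function name `risk_ratio_ci` and the counts are assumptions for illustration; the outbreak’s cell counts are not given on the slides.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def risk_ratio_ci(a, n1, c, n0, level=0.95):
    """RR of exposed (a ill of n1) vs unexposed (c ill of n0), with a log-method CI."""
    rr = (a / n1) / (c / n0)
    se_log = sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)  # standard error of ln(RR)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    return rr, exp(log(rr) - z * se_log), exp(log(rr) + z * se_log)


# Hypothetical counts: 20 of 40 exposed ill, 10 of 80 unexposed ill.
rr, lo, hi = risk_ratio_ci(20, 40, 10, 80)
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")  # RR = 4.00 (95% CI 2.07 to 7.72)
```

The interval shows both the direction and the plausible size of the effect. A CI that excludes 1 corresponds to rejecting H0 at the matching α, so the estimate carries strictly more information than the test alone.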
Suggested reading • Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. Philadelphia, PA: Lippincott Williams & Wilkins; 2008. • Goodman SN, Royall R. Evidence and scientific research. Am J Public Health 1988;78:1568. • Goodman SN. Toward evidence-based medical statistics. 1: The P value fallacy. Ann Intern Med 1999;130:995. • Poole C. Low P-values or narrow confidence intervals: which are more durable? Epidemiology 2001;12:291.
Previous lecturers • Alain Moren • Paolo D’Ancona • Lisa King • Ágnes Hajdu • Preben Aavitsland • Doris Radun • Manuel Dehnert