Type I and II errors

Presentation Transcript


  1. Type I and II errors Ana Jerončić

  2. What is a p value? P value is short for probability value. P = 0.07 = 7% • If no real effect exists, there is a 7% probability that we will encounter such or more extreme differences by chance. OR • If no real effect exists and we repeat the experiment 100 times, such a difference (or a more extreme one) would be found in about 7 experiments.

  3. What is a p value? P value is short for probability value. P = 0.99 = 99% • If no real effect exists, there is a 99% probability that we will encounter such or even more extreme differences by chance. OR • If no real effect exists and we repeat the experiment 100 times, such a difference (or a more extreme one) would be found in about 99 experiments.
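The two readings of a p value above can be illustrated with a minimal sketch. It assumes the observed difference has already been standardized into a z statistic and that a two-sided normal test applies; the slides do not specify a test, and the z values and the function name `two_sided_p_value` are hypothetical, chosen only so that the results land near P = 0.07 and P = 0.99.

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """Probability, when no real effect exists, of a standardized
    difference at least as extreme as z in either direction."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Hypothetical standardized differences (assumptions, not from the slides).
for z in (1.81, 0.0125):
    p = two_sided_p_value(z)
    print(f"z = {z}: P = {p:.2f}, "
          f"about {round(100 * p)} of 100 experiments with no real effect")
```

A large standardized difference is rarely matched by chance (small P), while a difference close to zero is matched or exceeded in almost every repetition (P close to 1).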

  4. What is a significance level α?

  5. Interpretation of P-value (α = 0.05 = 5%) • P < 0.05: significant difference between the treatments. Null hypothesis is rejected, alternative is accepted. • P ≥ 0.05: no significant difference between the treatments (the observed difference may have happened by chance). Null hypothesis is not rejected.

  6. What is a significance level α? The threshold of the P-value that determines when to reject a null hypothesis. It refers to the chance that you are willing to take of being wrong, i.e. of concluding that there is a substantial difference when there is none.

  7. What is a significance level α? The most common significance level: α = 0.05 = 5% • We want to risk that only 5% of our predictions are wrong.

  8. Alpha = 0.05: out of 40 decisions made when no real difference exists, we could expect that 2 are wrong.
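The arithmetic behind this slide can be sketched as follows. The sketch assumes that all 40 tests are performed when no real effect exists; the second quantity additionally assumes the tests are independent, which the slide does not state.

```python
alpha = 0.05        # significance level from the slide
n_decisions = 40    # number of tests, all with a true null hypothesis

# Expected number of wrong rejections (Type I errors).
expected_wrong = alpha * n_decisions

# Chance that at least one decision is a wrong rejection,
# assuming the 40 tests are independent (an assumption).
p_at_least_one = 1.0 - (1.0 - alpha) ** n_decisions

print(f"expected wrong decisions: {expected_wrong:.1f}")
print(f"P(at least one wrong):    {p_at_least_one:.2f}")
```

With these inputs the expected number of wrong decisions is 2, and the chance of at least one wrong decision is roughly 0.87.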

  9. What is α (Type I error)? • α is the probability of a Type I error • The probability of erroneously rejecting the null hypothesis. Consequence of a type I error • Putting a useless medicine onto the market!

  10. Watch out for…

  11. Example from the literature: “Effectiveness of a home-based early intervention on children’s BMI at age two years: randomised controlled trial.” BMJ 2012;344:e3732 • The sample size calculation was based on the primary outcome, BMI or BMI z-score, which was assumed to have a SD of 1.5, or 1.0 respectively. To have 80% power to detect a difference in mean BMI of 0.38, or mean BMI z-score of 0.25 units between the groups at age 2 at the two sided 5% significance level, we needed a sample size of 252 per group
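The quoted sample size can be roughly checked with a minimal sketch. It assumes the usual normal-approximation formula for comparing two means with equal group sizes and a common standard deviation; the excerpt does not state which method the authors used, and the function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta: float, sd: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per group for comparing two means
    (two-sided test, equal group sizes, common standard deviation)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)               # quantile for power = 1 - beta
    return ceil(2.0 * (sd * (z_alpha + z_beta) / delta) ** 2)

print(n_per_group(delta=0.25, sd=1.0))  # BMI z-score inputs from the excerpt
print(n_per_group(delta=0.38, sd=1.5))  # BMI inputs from the excerpt
```

Under these assumptions the BMI z-score inputs give 252 per group, in line with the quoted figure, while the BMI inputs give a slightly smaller 245.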

  13. Example from the literature: Quantitative Trait Locus Analysis of Longitudinal Quantitative Trait Data in Complex Pedigrees. Macgregor, S, Knott, S et al. Genetics 171, 1365-1376, 2005 • …. The higher-degree RR was deemed significantly better if the P-value for the higher-degree model was 0.01. …..

  15. Example: The Brain-Derived Neurotrophic Factor val66met Polymorphism and Variation in Human Cortical Morphology. Lukas Pezawas, Beth A. Verchinski, et al. • Hippocampal gray matter volume change was assessed statistically using a two-tailed t contrast with a significance level set to 0.05 (corrected for multiple comparisons within the ROI). Uncorrected exploratory full-brain statistics were also performed with two-tailed t contrasts at a significance level set to 0.001.

  17. What is β (Type II error)? • The probability of erroneously failing to reject the null hypothesis. • The most common β = 0.2. Consequence of a type II error • Keeping a good medicine away from patients!

  18. What is power? • Power quantifies the ability of the study to find true differences. • Power = 1 − β = P(accept H1 given H1 is true) • The probability of correctly identifying H1 (correctly identifying a better medicine). If β = 0.2, power = 0.8 = 80%

  19. Example: Studies with drug X have shown that its use induces very serious side effects. Therefore drug X was withdrawn from the market. A new alternative, drug Y, was examined, and a reduction in harmful effects compared to drug X was observed. What significance level will you use to evaluate the significance of the reduction in harmful effects of drug Y, compared to drug X?

  20. Example: The effect of alcohol on the driver’s reaction time was investigated on a simple random sample. Observed reaction times, before and after the alcohol intake, have shown an increase in average reaction time after the alcohol intake. What significance level will you use to evaluate the significance of the increase in reaction time?

  21. The choice of α and β depends on: • the medical and practical consequences of the two kinds of errors • the desired impact of the results

  22. The choice of α and β • α < β (the most common approach: α = 0.05 and β = 0.2), i.e. if the control treatment is already widely used and is known to be reasonably safe and effective, whereas the test treatment is new, costly, or produces serious side effects. • α > β, i.e. if there is no established control treatment and the test treatment is relatively inexpensive, easy to apply and is not known to have any serious side effects.

  23. The choice of α and β. Choices other than α = 0.05 and β = 0.2 • α = 0.10 and β = 0.2 for preliminary trials that are likely to be replicated. • α = 0.01 and β = 0.05 for trials that are unlikely to be replicated.

  24. Significance level at court: Nuvelo Pharmaceuticals, a company that was developing a clot-busting product for the indication of occluded central venous catheters, was sued by its investors for setting an extraordinarily small significance level, α = 0.00125. http://onbiostatistics.blogspot.com/2010/01/significant-level-of-000125.html

  25. Power calculation

  26. What is the power of the study? • Power quantifies the ability of the study to find true differences. • Power = 1 − β = P(accept H1 given H1 is true), the probability of correctly identifying H1 (correctly identifying a better medicine). If β = 0.2, power = 0.8 = 80%

  27. What is delta (Δ)? • Δ is the minimum difference between groups that is judged to be clinically important • Minimal effect which has clinical relevance in the management of patients, or • The anticipated effect of the new treatment

  28. Power calculation (assuming we compare two medicines). Power depends on 4 elements: • The real difference between the two medicines, Δ: big Δ → big power • The variation among individuals, σ: small σ → big power • The sample size, n: large n → big power • Type I error, α: large α → big power
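The four dependencies listed on this slide can be illustrated with a minimal sketch. It assumes a two-sided z-test comparing two group means with n subjects per group and a common standard deviation; the slides do not specify a test. The function name `power_two_means` and all numeric inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_means(delta, sd, n, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means
    (n subjects per group), ignoring the negligible opposite-tail term."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    standard_error = sd * sqrt(2.0 / n)
    return NormalDist().cdf(delta / standard_error - z_alpha)

# Hypothetical baseline scenario; each variant changes one element only.
base = dict(delta=0.5, sd=1.0, n=50, alpha=0.05)
variants = {
    "baseline": base,
    "bigger difference": {**base, "delta": 0.7},
    "smaller variation": {**base, "sd": 0.8},
    "larger sample": {**base, "n": 100},
    "larger alpha": {**base, "alpha": 0.10},
}
for label, params in variants.items():
    print(f"{label:18s} power = {power_two_means(**params):.2f}")
```

Each variant changes a single element relative to the baseline, and each change raises the power in the direction the slide describes.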

  29. Sample size

  30. Sample size and α, β, and Δ • Smaller α → larger N • Higher power 1 − β → larger N • Smaller Δ → larger N
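One common way to make these relationships explicit is the normal-approximation formula for the sample size per group when comparing two means. This is a sketch under stated assumptions (equal group sizes, a common standard deviation σ, a two-sided test), not necessarily the formula the presentation had in mind. Here z_q denotes the q-quantile of the standard normal distribution.

```latex
n \;=\; \frac{2\,\sigma^{2}\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{\Delta^{2}}
```

Lowering α or β increases the corresponding quantile and therefore n, while halving Δ quadruples n because Δ enters squared in the denominator.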

  31. Sample size: “How large a sample do I need?” • Very commonly asked • Important question • Answer not so simple. Statistical power calculations • Use statistical software or a graphical method • Depends on data type

  32. Interpret the results. Braga L, Byrne R, Lorenzo A et al. Methodological quality assessment of RCTs in hypospadias literature. 23rd Annual ESPU Congress - Zurich, Switzerland - 2012 • Analyses showed that publication after 2006 (p<0.01), RCT sample size >50 (p=0.03), significance level α=0.01 (p<0.01) and blinding of outcome assessor (p<0.01) were significantly associated with better quality of RCTs. (Hypospadias is a birth defect of the urethra in males.)

  33. Interpret the results. Weir R. Randomised controlled trial to meta-analysis ratio: a reply from a group producing systematic reviews. 2007. The New Zealand Medical Journal 120, 1-3 • Antman et al showed that recommendations for routine use of thrombolytic therapy first appeared in 1987, 14 years after a statistically significant reduction in mortality was apparent on a subsequent cumulative meta-analysis of all relevant RCTs. At the first time a significant reduction in mortality was apparent in the cumulative meta-analysis of IV streptokinase therapy (1973, p=0.01), 2432 patients had been randomised in eight small trials. The results of a further 25 studies (34,542 additional patients) published before routine recommendation of thrombolytic therapy, reduced the significance level to p=0.001 in 1979 and p=0.0001 in 1986.

  34. Based on the results presented in the abstract, write down the conclusion section

  35. Little P, Moore M, et al. Ibuprofen, paracetamol, and steam for patients with respiratory tract infections in primary care: pragmatic randomised factorial trial. BMJ 2013 Oct 25;347:f6041 CONCLUSION: • Overall advice to use steam inhalation, or ibuprofen rather than paracetamol, does not help control symptoms in patients with acute respiratory tract infections and must be balanced against the possible progression of symptoms during the next month for a minority of patients. Advice to use ibuprofen might help short term control of symptoms in those with chest infections and in children.

  36. Muraki I, Imamura F, et al. Fruit consumption and risk of type 2 diabetes: results from three prospective longitudinal cohort studies. BMJ. 2013 Aug 28;347:f5001 CONCLUSION: • Our findings suggest the presence of heterogeneity in the associations between individual fruit consumption and risk of type 2 diabetes. Greater consumption of specific whole fruits, particularly blueberries, grapes, and apples, is significantly associated with a lower risk of type 2 diabetes, whereas greater consumption of fruit juice is associated with a higher risk.

  37. Huseyin Naci, John P A Ioannidis et al. Comparative effectiveness of exercise and drug interventions on mortality outcomes: metaepidemiological study. BMJ 2013; 347 CONCLUSION: • Although limited in quantity, existing randomised trial evidence on exercise interventions suggests that exercise and many drug interventions are often potentially similar in terms of their mortality benefits in the secondary prevention of coronary heart disease, rehabilitation after stroke, treatment of heart failure, and prevention of diabetes.

  38. Sanjay Basu et al. Palm oil taxes and cardiovascular disease mortality in India: economic-epidemiologic model. BMJ. 2013 Oct 22;347 CONCLUSION: • Curtailing palm oil intake through taxation may modestly reduce hyperlipidemia and cardiovascular mortality, but with potential distributional consequences differentially benefiting male and urban populations, as well as affecting food security.
