This article discusses the need for changes in how we conduct research, including addressing threats to research integrity and shifting from Null Hypothesis Significance Testing (NHST). It explores new solutions such as estimation, effect sizes, and meta-analysis.
The New Statistics: Why & How
Corey Mackenzie, Ph.D., C. Psych
http://www.latrobe.edu.au/scitecheng/about/staff/profile?uname=GDCumming http://www.latrobe.edu.au/psy/research/cognitive-and-developmental-psychology/esci
Outline • Need for changes to how we conduct research • Three threats to research integrity • Shift from Null Hypothesis Significance Testing (NHST) • Three “new” solutions • Estimation • Effect sizes • Meta-analysis
1st change to how we do research: Enhance research integrity by addressing three threats
Threat to Integrity #1 • We must have complete reporting of findings • Small or large effects, important or not • Challenging because journals have limited space and are looking for novel, “significant” findings • Potential solutions • Online data repositories • New online journals • Open-access journals
Threat to Integrity #2 • We need to avoid selection and bias in data analysis (e.g., cherry picking) • How? • Prespecified research in which critical aspects of studies are registered beforehand • Distinguishing exploratory from prespecified studies
Threat to Integrity #3 • We need published replications (ideally with more precise estimates than original study) • Key for meta-analysis • Need greater opportunities to report them
2nd change to how we do research: Stop evaluating research outcomes by testing the null hypothesis
Problems with p-values In April 2009, people rushed to Boots pharmacies in Britain to buy No. 7 Protect & Perfect Intense Beauty Serum. They were prompted by media reports of an article in the British Journal of Dermatology stating that the anti-ageing cream “produced statistically significant improvement in facial wrinkles as compared to baseline assessment (p = .013), whereas [placebo-treated] skin was not significantly improved (p = .11)”. The article claimed a statistically significant effect of the cream because p < .05, but no significant effect of the control placebo cream because p > .05. In other words, it concluded that the cream had an effect but the control material didn’t — yet the difference between a significant and a non-significant result is not itself necessarily significant, and the cream was never directly compared with the placebo.
Problems with NHST • Kline (2004) What’s Wrong with Stats Tests • 8 fallacies about null hypothesis testing • Encourages dichotomous thinking, but effects come in shades of grey • e.g., p = .001, .04, .06, and .92 lie on a continuum, not on either side of a bright line • NHST is strongly affected by sample size: with enough participants, even a trivial effect becomes “significant”
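The sample-size point above can be sketched numerically. The snippet below (a minimal illustration, not from the slides; the effect size of d = 0.2 and the sample sizes are hypothetical choices) holds a small standardized effect fixed and shows the two-sided z-test p-value swinging from “non-significant” to “highly significant” purely as n grows:

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def z_test_p(effect_size, n):
    """p-value for a one-sample z test of a fixed standardized effect."""
    return two_sided_p(effect_size * math.sqrt(n))

# Same small effect (d = 0.2), very different verdicts as n grows
for n in (50, 200, 1000):
    print(n, round(z_test_p(0.2, n), 4))
```

Nothing about the effect changes between the three runs; only the number of participants does, which is exactly why a bare p-value says little about the size or importance of an effect.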
Solution #1: Estimation • Support for Bill 32 is 53% in a poll with an error margin of 2% • i.e., 53% (51–55% with 95% confidence) vs • “Support is statistically significantly greater than 50%, p < .01” • The first statement tells us both where the estimate lies and how precise it is
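The poll example can be reproduced with a standard Wald confidence interval for a proportion. This is a sketch, not from the slides; the sample size of 2,400 is an assumption chosen because it yields roughly the 2% margin of error the slide quotes:

```python
import math

def wald_ci(p_hat, n, z=1.96):
    """95% Wald confidence interval for a sample proportion."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 53% support; n = 2400 is assumed to match the stated 2% margin
lo, hi = wald_ci(0.53, 2400)
print(f"53% (95% CI: {lo:.1%} to {hi:.1%})")
```

The interval (about 51% to 55%) carries strictly more information than “significantly greater than 50%, p < .01”: it shows the plausible range of support, not just which side of an arbitrary line the estimate falls on.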
Solution #2: Effect sizes • http://en.wikipedia.org/wiki/Effect_size • http://lsr-wiki-01.mrc-cbu.cam.ac.uk/statswiki/FAQ/effectSize • G*Power
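One of the most common effect-size measures linked above is Cohen's d, the difference between two group means in pooled-standard-deviation units. The sketch below (my own illustration with made-up data, not from the slides) computes it from scratch:

```python
import math

def cohens_d(a, b):
    """Cohen's d for two independent groups, using a pooled SD."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sd_pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / sd_pooled

# Hypothetical scores for two small groups
treatment = [5.1, 4.8, 6.0, 5.5, 5.9]
control = [4.2, 4.6, 4.9, 4.4, 5.0]
print(round(cohens_d(treatment, control), 2))
```

Unlike a p-value, d is unaffected by sample size: it answers “how big is the difference?” rather than “could we detect it?”, which is also why tools like G*Power take d (not p) as input when planning a study.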
Solution #3 • Meta-analysis • P-values have no (or very little) role, except their negative influence via the file-drawer effect • Overcomes the wide confidence intervals often given by individual studies • Can make sense of messy and disputed research literatures
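The narrowing of confidence intervals through meta-analysis can be shown with the simplest pooling scheme, an inverse-variance-weighted (fixed-effect) average. The three studies below are hypothetical, invented purely for illustration:

```python
import math

def fixed_effect_meta(estimates, ses):
    """Inverse-variance-weighted pooled estimate with its 95% CI."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se

# Three hypothetical studies: each CI is wide on its own
effects = [0.30, 0.45, 0.25]
std_errs = [0.20, 0.25, 0.15]
pooled, lo, hi = fixed_effect_meta(effects, std_errs)
print(f"pooled = {pooled:.2f} (95% CI: {lo:.2f} to {hi:.2f})")
```

The pooled interval is narrower than even the most precise single study's, which is the sense in which meta-analysis “overcomes” wide individual confidence intervals; note this only works honestly if all studies are reported, which is why the file-drawer effect is so damaging.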
Why do we love P? • Suggests importance • We’re reluctant to change • Confidence intervals are sometimes embarrassingly wide • e.g., 9 ± 12 • But this accurately indicates the unreliability of the data
Why might we change? • 30 years of damning critiques of NHST • 6th edition of APA publication manual • Used by more than 1000 journals across disciplines • Researchers should “wherever possible, base discussion and interpretation of results on point and interval estimates” • http://www.sagepub.com/journals/Journal200808/manuscriptSubmission