Statistical Errors in Publications October 2010
OVERVIEW Greater emphasis on sections dealing with: Design; Sample size; Statistical methodology; Results (Presentation/Interpretation); Discussion/Conclusion.
SAMPLE PAPERS Sample 1 – Randomised controlled trial – management of ankle sprains comparing elastic support bandage v. aircast ankle brace (Br. J. Sports Med, 2005); Sample 2 – Study to assess variables which predict chronic neck pain disability (Arch Phys Med Rehabil, 2004).
PREVALENCE OF STATISTICAL ERRORS Concerns about the misuse of statistics date back over 70 years (Altman, 2004). Despite greater awareness of statistical issues (e.g. CONSORT), such concerns have not diminished.
Prevalence of Statistical Errors (cont’d) Serious statistical errors were found in 40% of 164 articles published in psychiatry (Altman, 2002); At least one serious statistical error occurred in 38% and 25% of papers in Nature and BMJ respectively (Garcia-Berthou and Alcaraz, 2004); Many surveys of statistical errors report error rates ranging from 30% to 90% (Altman, 1991; Gore et al., 1976; Pocock et al., 1987; MacArthur, 1984).
Why are there so many errors? (Altman, 2004) Many investigators are not professional researchers; they are primarily clinicians; Training is usually a single course in statistics; Training focuses on data analysis, but issues such as statistical reporting and interpretation are not addressed; The statistical content and complexity of medical research have increased steadily over recent decades.
(Altman, 2004) “........ When I tell friends outside medicine that many papers published in medical journals are misleading because of methodological weaknesses they are rightly shocked. Huge sums of money are spent annually on research that is seriously flawed through the use of inappropriate designs, unrepresentative samples, small samples, incorrect methods …”
[Figure: the research cycle. Stages: Observe (Natural Course of Disease); Hypothesize (Frame Research Question); Test (Conduct Experiment/Clinical Trial); Conclude (Validate or Modify Hypothesis). Other labels in the diagram: Personal & Scientific Experience; Concept Development; Research Planning, Grant Writing, Protocol Development; Experimental Design; Data Collection & Analysis; Statistical Inference; Journal Articles, Scientific Meetings. Legend: Process, Stage, Activity.]
DESIGN Population: A population is a group of individual persons, objects, or items from which samples are taken. Sample: A sample is a finite part of a statistical population whose properties are studied to gain information about the whole. Sampling: Sampling is the process of selecting a suitable sample, or a representative part of a population, for the purpose of determining parameters or characteristics of the whole population. Purpose of sampling: To draw conclusions about populations from samples, we must use inferential statistics, which enable us to determine a population's characteristics by directly observing only a portion (or sample) of the population.
Design (cont’d) Sampling error: What can make a sample unrepresentative of its population? One of the most frequent causes is sampling error. Two types of sampling error: Chance: the error that occurs just because of bad luck; Bias: a tendency to favour the selection of units that have particular characteristics (the result of a poor sampling plan). To limit sampling error: plan carefully, and select participants at random.
SAMPLE SIZE Sample size may be determined by various practical constraints, e.g. financial resources; Too small a sample is not representative of a population; Too large a sample results in wastefulness and is unethical; The larger the sample size, the more likely the results will reflect what will happen in the population.
Sample size (Power Calculation) (cont’d) Difference: the clinically important difference; Significance threshold: type I error rate, conventionally set at 0.01 or 0.05; Power: i.e. 1 − type II error rate, conventionally 80% or 90%; how confident you are that the sample will detect a difference, if one really exists in the population; Variability: the less variability among patients within each group, the more likely they reflect the overall populations.
Sample size (Power Calculation) (cont’d) The required sample size increases with: a smaller clinically relevant difference; an increase in power; greater variability; a reduction in the type I error rate. Allow for dropouts and/or withdrawals.
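How each input moves the required sample size can be checked with the usual normal-approximation formula for comparing two means, n per group = 2((z(1−α/2) + z(power))·SD/difference)². The sketch below is only an illustration under that assumption (two equal groups, continuous outcome, two-sided test); the function name `n_per_group` and all input values are hypothetical and are not taken from the sample papers.

```python
import math
from statistics import NormalDist


def n_per_group(sd, diff, alpha=0.05, power=0.80, dropout=0.0):
    """Approximate participants per group for a two-sided comparison of two means.

    Assumes equal group sizes and a normal approximation (illustrative only).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the type I error rate
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    n = math.ceil(2 * ((z_alpha + z_power) * sd / diff) ** 2)
    # Inflate so that the target n remains after the expected dropouts
    return math.ceil(n / (1 - dropout))


base = n_per_group(sd=10, diff=5)  # hypothetical inputs
print("baseline:", base)
print("smaller difference:", n_per_group(sd=10, diff=3))
print("higher power:", n_per_group(sd=10, diff=5, power=0.90))
print("more variability:", n_per_group(sd=15, diff=5))
print("stricter alpha:", n_per_group(sd=10, diff=5, alpha=0.01))
print("20% dropout:", n_per_group(sd=10, diff=5, dropout=0.20))
```

Other designs (binary outcomes, clustered or repeated measures) need different formulas, so a real calculation is best agreed with a statistician.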
Sample size (cont’d) Review the two articles in terms of : Design Sample size
Sample size (cont’d) “….A major concern in the design of studies is the almost universal lack of reporting of how the sample size was obtained…..” (Altman, 2000). “…Basis of the power calculation is inadequately described …” (Malachy, 2004; Vail et al., 2003). (all sample papers) “Quite often sample size calculations are computed without allowing for dropouts” (McGuigan, 1995). (all sample papers)
Sample size (cont’d) Small studies: Small trials have low power and a high type II error rate; If no sample size calculation is provided, the conclusions of the study have little value (as sample 2); If the study is underpowered, the results are inconclusive and the conclusions should be taken with caution (as sample 1).
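To see how quickly power falls away in a small study, the sketch below approximates the power of a two-group comparison of means for several group sizes while the significance threshold stays fixed. It assumes a normal approximation and equal group sizes; `approx_power` and the inputs (SD 10, true difference 5) are hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(n_per_group, sd, diff, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Normal approximation with equal group sizes; the negligible chance of
    rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    se = sd * math.sqrt(2 / n_per_group)  # standard error of the difference
    z_alpha = z.inv_cdf(1 - alpha / 2)    # fixed by the chosen threshold
    return z.cdf(abs(diff) / se - z_alpha)


for n in (10, 30, 63, 100):  # hypothetical group sizes
    print(n, round(approx_power(n, sd=10, diff=5), 2))
```

With these inputs a study of 10 per group would miss a real difference most of the time, which is why a non-significant result from such a study is inconclusive rather than negative.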
Sample size (cont’d) A description of the sample size in the literature should contain, for example: “The mean and SD for the RMQ on the active management are 5.91 and 4.27 respectively (Oxfordshire Low Back Pain trial, BMJ, 2005). The smallest difference between the two therapies which is clinically relevant is approximately 2.0. Using this information, the total number of participants required for this study will be 700, allowing for a 25% loss to follow-up and using 90% power with a 1% type I error rate (significance level).”
METHODS “................ All of the problems hinge on the understanding of what a statistical test is doing and what a p-value means ....” (Murphy, 2004)
METHODS A statistical test is a procedure used to compute a probability (the p-value) that measures how compatible the observed data are with the null hypothesis.
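Methods (cont’d) e.g. (comparing the means of two groups) H0: $\mu_1 = \mu_2$; H1: $\mu_1 \neq \mu_2$; Test statistic (t-test): $t = (\bar{x}_1 - \bar{x}_2) / \mathrm{SE}(\bar{x}_1 - \bar{x}_2)$. The test statistic is transformed into a p-value.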
Methods (cont’d) P-value: the strength of the evidence (quantified by a probability) regarding the null hypothesis, i.e. the probability, if the null hypothesis were true, of obtaining data at least as extreme as those observed. Neither the statistical test nor the p-value PROVES/DISPROVES the null hypothesis; they provide EVIDENCE about the null hypothesis.
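One way to make the meaning of a p-value concrete is an exact permutation test, sketched below. This is a different technique from the t-test and is used here only because it needs no distributional tables: if the null hypothesis is true the group labels are interchangeable, so the p-value is the share of all possible relabellings that give a difference at least as large as the one observed. The function name and the data are hypothetical.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(group_a))
    extreme = count = 0
    # Under the null hypothesis the group labels are interchangeable, so every
    # way of choosing which values form group A is equally likely.
    for idx in combinations(range(len(pooled)), n_a):
        count += 1
        if abs_mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            extreme += 1
    return extreme / count


treated = [11, 12, 13, 14, 15, 16]  # hypothetical scores
control = [1, 2, 3, 4, 5, 6]        # hypothetical scores
p = permutation_p_value(treated, control)
print(f"p = {p:.4f}")
```

Full enumeration is only practical for small samples; the point of the sketch is the definition, not a recommendation of method.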
Methods (cont’d) Review the two articles in terms of : Methods Results (including figures and tables)
Methods (cont’d) “.. A further issue is the copying of incorrect or inappropriate methods. Once incorrect procedures become common, it is hard to stop them from spreading through the medical literature like a genetic mutation..” (Altman, 2002). (as sample 1) “Schwartzer et al. (2000) found that most papers made important errors in the application of new technology such as models for longitudinal data.” (Altman, 2000). (e.g. Hierarchical models in sample 1; ROC curves in sample 2)
Methods (cont’d) Most common errors in the Methods section: Failure to check assumptions (Nature says that the most common error was not checking for a normal distribution and not stating how normality was tested); Using linear regression analysis without first establishing that the relationship is linear; Ignoring paired or ordered categories and therefore using an inappropriate test; Arbitrarily dividing continuous data into ordinal categories without explanation (“data dredging”); Multiple comparisons (these increase the likelihood of a significant result arising by chance) (sample 2); And many more: sub-group analyses, ignoring a repeated measures design, non-matched analysis for matched data, incorrect modelling (e.g. interactions not included) …….
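The multiple-comparisons problem in the list above is easy to quantify. Assuming the tests are independent (a simplifying assumption), the chance of at least one false-positive result across k tests at level α is 1 − (1 − α)^k. The sketch below also shows the Bonferroni threshold α/k, which is one common correction and is not named in the slides; both function names are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the overall rate <= alpha."""
    return alpha / n_tests


for k in (1, 5, 20):
    print(k, round(familywise_error_rate(0.05, k), 3), bonferroni_threshold(0.05, k))
```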
Methods (cont’d) Begin a statistical analysis with data exploration; Check assumptions; Type of data – continuous, binary, ordinal, repeated over time, etc. Missing values, outliers, no. of withdrawals; Be careful with computer output (often helps to do simple calculations by hand first).
RESULTS “ ..The results section must be written so that the average reader can understand the study findings” (Cummings, 2003). “… poorly written with excessive jargon …” (Byrne, 2000). (sample 1 and sample 2) “ .. A major bias is cherry-picking results…” (Malachy, 2004).
Results (cont’d) Common language pitfalls: Avoid non-technical uses of technical terms such as “normal”, “significant”, “sample”; “No difference” means “lack of evidence of a statistically significant difference” (sample 1); Report p-values to 2-digit precision (e.g. p = 0.82); Do not reduce p-values to ‘non-significant’ or ‘NS’; Report a quantity to a precision that is scientifically relevant (e.g. a mean blood pressure of 115.73 mmHg should be reported as 115.7 mmHg or even 116 mmHg).
Results (cont’d) P-values: Over-emphasis on the p-value; An arbitrary division of the results into “significant” and “non-significant” according to the p-value was not the intention of the founders of statistical inference; Smaller p-values indicate stronger evidence against the null hypothesis.
Results (cont’d) Confidence intervals: A confidence interval is a range of values that is expected, with a stated level of confidence (e.g. 95%), to enclose the population value; Confidence intervals are preferable to p-values, as they show the range of possible effect sizes compatible with the data; The larger the sample size, the narrower the confidence interval; A confidence interval for a difference (e.g. treatment difference) that contains 0, or for a ratio (e.g. odds ratio) that contains 1, implies a lack of evidence of a statistically significant difference.
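A rough sketch of these points, assuming a large-sample normal approximation for the difference between two means (small samples would normally use the t distribution instead). The function name and the summary statistics are hypothetical; the same observed difference is shown with two different group sizes.

```python
import math
from statistics import NormalDist


def ci_for_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b, level=0.95):
    """Large-sample confidence interval for a difference between two means."""
    diff = mean_a - mean_b
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return diff - z * se, diff + z * se


small = ci_for_difference(12.0, 4.0, 8, 10.0, 4.0, 8)
large = ci_for_difference(12.0, 4.0, 50, 10.0, 4.0, 50)
for label, (lo, hi) in (("n = 8 per group", small), ("n = 50 per group", large)):
    verdict = "includes 0" if lo <= 0 <= hi else "excludes 0"
    print(f"{label}: {lo:.2f} to {hi:.2f} ({verdict})")
```

The interval from the smaller study is wide and includes 0, so it is compatible with both no effect and a sizeable effect; the larger study narrows the range.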
Results (cont’d) and many more pitfalls ….. testing baseline values (sample 1) ; not reporting missing data; lack of statistical power not considered; misinterpreting and misunderstanding results from models e.g. no interactions included.
PRESENTATION In tables that compare groups, include counts (of patients or events) and column percentages; Use appropriate statistics (e.g. median instead of mean for non-normal data); In tables of column percentages, do not include a row of counts and percentages of missing data (doing this will distort the other percentages in the table); Statistical software packages provide a large amount of output, so be selective about what is presented; Use graphs as an alternative to tables with many entries; do not duplicate graphs and tables; Label graphs and tables correctly (sample 1 and sample 2).
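For skewed data the choice of summary statistic matters. In the small sketch below, with made-up values, a single extreme observation pulls the mean well above what is typical, while the median stays with the bulk of the data.

```python
from statistics import mean, median

# Hypothetical, right-skewed data (e.g. length of stay in days)
length_of_stay = [2, 3, 3, 4, 4, 5, 6, 45]

avg = mean(length_of_stay)
mid = median(length_of_stay)
print(f"mean = {avg}, median = {mid}")
```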
INTERPRETATION AND DISCUSSION Put the study sample in the context of the population; Do not interpret studies with non-significant results and low statistical power as “negative” when they are inconclusive: “The absence of proof is not proof of absence”; Errors in the design and analysis of a study can carry through to errors in interpretation (Rushton, 1999); State the weaknesses and strengths of the study design so that a clear and accurate impression of the reliability of the data can be formed.
And finally….. The misuse of statistics is a very important issue; Statisticians need to be involved in research at some stage, preferably as early as possible; Most errors are relatively unimportant; Some can have a major bearing on the validity of the study. So…….
“There are three kinds of lies: lies, damned lies and statistics”. Benjamin Disraeli.