Ecole Nationale Vétérinaire de Toulouse Statistics : the ten main mistakes Didier Concordet d.concordet@envt.fr July 2005
Statistical mistakes are frequent • Many surveys of statistical errors in the medical literature report error rates ranging from 30% to 90% (Altman, 1991; Gore et al., 1976; Pocock et al., 1987; MacArthur, 1984) • Reviews of the biomedical literature have consistently found that about half the articles use incorrect statistical methods (Glantz, 1980)
When do they occur ? • When designing the experiment • When collecting data • When analysing data • When interpreting results
Design • Lack of a proper randomisation • the inference space is not defined • poor balance of the groups to be compared • lack of control group (maybe less frequent now) • there exist confounding factors • Lack of power • the sample size is not large enough to answer the question • the statistical unit is not well defined
Inference space definition (M1) An experiment in 2-year-old beagles showed that the temperature of dogs treated with the antipyretic drug A decreased by 2 °C. Does this result still hold for: • all 2-year-old beagles? • 3-year-old beagles? • beagles in general? • dogs? • man?
Poor balance (M2) Clinical trial: comparison of 2 antipyretics, rectal temperature after treatment • REFERENCE: mean = 39, N = 100, SD = 1 • New TRT: mean = 37, N = 100, SD = 1 • Reference < New TRT (P<0.001)
Poor balance Clinical trial: comparison of 2 antipyretics, rectal temperature after treatment
Clinical trial 1 • REFERENCE: mean = 30, N = 10, SD = 1 • New TRT: mean = 32, N = 50, SD = 1 • New TRT < Ref (P<0.001)
Clinical trial 2 • REFERENCE: mean = 40, N = 90, SD = 1 • New TRT: mean = 42, N = 50, SD = 1 • New TRT < Ref (P<0.001)
Conclusion: Reference > New TRT
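The reversal between the pooled comparison and the per-trial comparisons can be checked by hand. The sketch below only recomputes sample-size-weighted means from the group means and sizes shown on the two poor-balance slides; the dictionary layout and the function name `pooled_mean` are illustrative choices, not part of the original material.

```python
# Group summaries taken from the two poor-balance slides: (mean, n) per arm.
trials = {
    "trial 1": {"reference": (30.0, 10), "new": (32.0, 50)},
    "trial 2": {"reference": (40.0, 90), "new": (42.0, 50)},
}


def pooled_mean(arm):
    """Sample-size-weighted mean of one arm across all trials."""
    total = sum(t[arm][0] * t[arm][1] for t in trials.values())
    count = sum(t[arm][1] for t in trials.values())
    return total / count


pooled_reference = pooled_mean("reference")
pooled_new = pooled_mean("new")

# Within each trial the reference arm has the lower temperature...
for name, arms in trials.items():
    print(name, "reference:", arms["reference"][0], "new:", arms["new"][0])

# ...but after pooling the ordering is reversed, because 90% of the
# reference animals come from the trial with the higher temperatures.
print("pooled reference:", pooled_reference, "pooled new:", pooled_new)
```

The pooled figures are driven by how the animals are split between the trials, not by the treatments themselves, which is why the unbalanced pooled comparison is misleading.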
Power (M3) A clinical study to compare the efficacy of two treatments (Ref. and Test). For the efficacy variable: expected difference between the treatments = 4, SD = 2. A parallel two-group design is planned with 5 dogs in each group. What to think about this study? 35% of power for a type I risk of 5%. Even if the expected difference exists, only 35% of the samples (of size 5) of dogs actually exhibit it!
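To get a feel for how power depends on group size, here is a minimal sketch based on a normal approximation for a two-sided comparison of two group means. The function name `approx_power` and the inputs in the loop are hypothetical and are not the figures of the study above; the approximation treats the SD as known, so it is somewhat optimistic for very small groups compared with an exact t-test calculation.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Normal approximation: the SD is treated as known, which overstates
    the power slightly when the groups are very small.
    """
    z = NormalDist()
    se = sd * sqrt(2.0 / n_per_group)       # standard error of the difference
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)   # about 1.96 for alpha = 0.05
    shift = diff / se                       # true difference in SE units
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical standardized difference (diff / sd = 1), for illustration only.
for n in (5, 10, 20, 40):
    print(n, "dogs per group -> power about",
          round(approx_power(diff=1.0, sd=1.0, n_per_group=n), 2))
```

Running such a calculation before the experiment shows whether the planned sample size has a reasonable chance of detecting the difference of interest.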
Power Efficacy variable on two groups of dogs • Ref: N = 5, Mean = 20.0, SD = 2.6 • Test: N = 5, Mean = 15.4, SD = 2.4 • Student t-test: P = 0.18 • Actually no conclusion
A real story A study was performed to assess the effect of diet on several biochemical compounds (about 20). To this end, a dog was fed a "normal" diet for 3 months and then the new diet for 3 months. Every two days, a blood sample was taken and the biochemical compounds were assayed. At the end of the experiment, 90 measurements were available for each biochemical compound. There was a significant difference between the effects of the two diets for 10 biochemical compounds (P<0.001). This result was obtained with a sample size of 90.
Statistical unit (M4) The statistical unit (an individual) is a statistical object that cannot be divided. We want to generalise results obtained on a finite collection of units (a sample) to a population of units. Despite the appearance of "wealth", the sample size was equal to 1, not 90. At the end of the experiment, the only dog of the experiment was well known, but what about the other dogs of the population?
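A small simulation can illustrate why many measurements on one dog do not replace many dogs. The sketch below assumes, purely for illustration, that the diet has no effect on average in the population but that each dog has its own random response to it; all parameter values and the function name `one_dog_test` are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance


def one_dog_test(rng, n_per_diet=45, between_dog_sd=1.0, within_dog_sd=1.0):
    """Simulate one dog; return True if its two diets differ 'significantly'.

    Assumption for illustration: the population-average diet effect is zero,
    and each dog only has its own random response to the new diet.
    """
    dog_effect = rng.gauss(0.0, between_dog_sd)
    normal_diet = [rng.gauss(0.0, within_dog_sd) for _ in range(n_per_diet)]
    new_diet = [rng.gauss(dog_effect, within_dog_sd) for _ in range(n_per_diet)]
    se = sqrt(variance(normal_diet) / n_per_diet
              + variance(new_diet) / n_per_diet)
    t = (mean(new_diet) - mean(normal_diet)) / se
    return abs(t) > 1.99  # approximate two-sided 5% critical value, ~88 df


rng = random.Random(0)
n_dogs = 2000
false_alarm_rate = sum(one_dog_test(rng) for _ in range(n_dogs)) / n_dogs
print("share of single-dog studies declaring a diet effect:", false_alarm_rate)
```

Under these assumptions, far more than 5% of single-dog studies declare a diet effect, although there is none in the population: the test only describes the dog that was studied.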
Experiment • Missing data not adequately reported • Extreme values excluded • Data ignored because they did not support the hypothesis ?
Analysis • Failure to check assumptions of the statistical methods (M5) • homoscedasticity (for a t-test, a linear regression,…) • using a linear regression without first establishing linearity… • correlation • Ignoring informative "missing" data • death and its consequences • data below LOQ • Choosing the question to get an answer • Multiple comparisons
Homoscedasticity (M5) What the t-test can see • [Figure: clearance by treatment (1 and 2)] • t-test: P-value = 0.56 • After log-transformation: P-value = 0.026
Linearity/Correlation (M5) • [Figure: two scatter plots, each with a fitted linear regression] • Correlation R = -0.93 • Correlation R = -0.002
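A correlation close to zero does not mean that the two variables are unrelated, only that there is no linear relationship. The sketch below uses made-up data (not the data behind the figures) and a hand-written helper `pearson_r` to compare a perfectly linear relationship with a perfectly quadratic one.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = list(range(-5, 6))            # illustrative x values
ys_line = [10 - 2 * x for x in xs]  # exact decreasing straight line
ys_curve = [x * x for x in xs]      # exact parabola: y is fully determined by x

r_line = pearson_r(xs, ys_line)
r_curve = pearson_r(xs, ys_curve)
print("straight line:", r_line)
print("parabola:", r_curve)
```

Plotting the data before fitting a straight line is the simplest way to catch this situation.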
Linearity/Correlation • [Figure: linear regression on pooled data versus a linear model with 3 groups] • Correlation R = 0.84 • Within-group correlation R = -0.92
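The sign of a correlation can flip when groups are ignored. The following sketch builds three artificial groups (the group centres and offsets are invented for illustration, not taken from the figure) in which the relationship is decreasing inside every group while the pooled correlation is strongly positive.

```python
from statistics import mean, pstdev


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))


# Three hypothetical groups centred at (c, c); inside a group y falls as x rises.
offsets = (-2.0, -1.0, 0.0, 1.0, 2.0)
groups = {c: [(c + d, c - d) for d in offsets] for c in (0.0, 10.0, 20.0)}

within = {
    c: correlation([x for x, _ in pts], [y for _, y in pts])
    for c, pts in groups.items()
}
all_points = [p for pts in groups.values() for p in pts]
overall = correlation([x for x, _ in all_points], [y for _, y in all_points])

print("within-group correlations:", within)
print("pooled correlation:", overall)
```

The pooled coefficient mostly reflects the differences between group centres, so a model that ignores the groups answers a different question from the one usually intended.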
Choosing the question to get an answer (M7) Occurs frequently in the presentation of clinical trial results. The question becomes random: it changes with the sample of animals. The question is chosen with its answer in hand… Think about a coin-flip game where you win 1€ whether heads or tails occurs: you choose the decision rule once you know the result of the flip! Such an approach increases the number of false discoveries.
Multiple comparisons (M8) One wants to compare the ADG (average daily gain) obtained with 5 different diets in pigs • Ten t-tests • A risk of 5% for each comparison: the global risk can be very large
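The size of the global risk can be approximated with a short calculation. The sketch below counts the pairwise comparisons among 5 diets and computes the chance of at least one false positive, assuming the tests are independent; pairwise tests that share groups are not independent, so the result is only an order of magnitude.

```python
from math import comb

n_diets = 5
alpha = 0.05  # type I risk used for each individual t-test

# Number of pairwise comparisons among the diets.
n_tests = comb(n_diets, 2)

# Probability of at least one false positive if no diet differs from another,
# ASSUMING independent tests (an approximation for pairwise comparisons).
global_risk = 1 - (1 - alpha) ** n_tests

print("number of t-tests:", n_tests)
print("approximate global risk:", round(global_risk, 2))
```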
Interpretation/presentation • Standard error and standard deviation • P values : non significant effects • False causality
Standard error / standard deviation (M9) The clearance of the drug was equal to 68 ± 5 mL/min. Two possible meanings, depending on the meaning of 5 • If 5 is the standard error of the mean (SE), there is about a 95% chance that the population mean clearance belongs to [68 - 2×5 ; 68 + 2×5] • If 5 is the standard deviation (SD), about 95% of animals have their clearance within [68 - 2×5 ; 68 + 2×5]
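The two readings of "68 ± 5" lead to very different statements about the mean. The sketch below spells them out, using the usual "mean ± 2" rule for an approximately normal distribution; the sample size `n = 25` is hypothetical, since none is given here.

```python
from math import sqrt

mean_clearance = 68.0  # mL/min
spread = 5.0           # the ambiguous "± 5"

# Reading 1: 5 is the SE -> approximate 95% interval for the population mean.
ci_if_se = (mean_clearance - 2 * spread, mean_clearance + 2 * spread)

# Reading 2: 5 is the SD -> interval containing about 95% of the animals.
range_if_sd = (mean_clearance - 2 * spread, mean_clearance + 2 * spread)

# Under reading 2, the precision of the mean depends on the sample size.
n = 25  # hypothetical sample size, for illustration only
se_if_sd = spread / sqrt(n)
ci_mean_if_sd = (mean_clearance - 2 * se_if_sd, mean_clearance + 2 * se_if_sd)

print("if 5 is the SE, interval for the mean:", ci_if_se)
print("if 5 is the SD, interval for individual animals:", range_if_sd)
print("if 5 is the SD and n = 25, interval for the mean:", ci_mean_if_sd)
```

The numerical interval is the same in the first two cases, but it describes the mean in one and individual animals in the other, which is why the meaning of "±" should always be stated.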
P values (M10) "The difference between the effects of drugs A and B is not significant (P = 0.56), therefore drug A can be substituted by drug B." NO • The only conclusion that can be drawn from such a P value is that you didn't see any difference between the effects of drugs A and B. That does not mean that such a difference does not exist. Absence of evidence is not evidence of absence
P values (M10) "Drug A has a higher efficacy than drug B (P = 0.001). Drug C has a higher efficacy than drug B (P = 0.04). Since 0.001 < 0.04, drug A has a higher efficacy than drug C." NO • The only conclusion that can be drawn from such P values is that you are sure that A > B and less sure that C > B. This does not presume anything about the amplitude of the differences. Significant does not mean important
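A P value mixes the size of an effect with the amount of data, so a smaller P value does not imply a larger effect. The sketch below uses a simple two-sided z-test with a known SD (an assumption made to keep the example short) and two invented scenarios; the function name `two_sided_p` and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(diff, sd, n_per_group):
    """Two-sided P value of a z-test comparing two group means (known SD)."""
    z = diff / (sd * sqrt(2.0 / n_per_group))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Small difference observed on many animals...
p_small_effect = two_sided_p(diff=0.2, sd=1.0, n_per_group=1000)
# ...versus a five times larger difference observed on few animals.
p_large_effect = two_sided_p(diff=1.0, sd=1.0, n_per_group=10)

print("difference 0.2, n = 1000 per group: P =", p_small_effect)
print("difference 1.0, n = 10 per group:   P =", p_large_effect)
```

In this illustration the smaller difference gets the smaller P value, simply because it was measured on many more animals; reporting the estimated difference with a confidence interval conveys its amplitude.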
False causality: lying with statistics There is a strong positive correlation between the number of firefighters present at a fire and the amount of fire damage. Thus, the firefighters present at a fire create higher fire damage! The correlation coefficient is nothing other than a measure of the strength of a linear relationship between 2 variables. Correlation cannot establish causality. A strong correlation between X and Y can occur when • "X" causes "Y" • "Y" causes "X" • "Z" causes both "X" and "Y" (Z = fire size in the previous example) • by chance, with small sample sizes, even when X and Y are independent
How to avoid these mistakes? • Consult your preferred statistician for help in the design of complicated experiments • Use basic descriptive statistics first (graphics, summary statistics,…) • Use common sense • Consider learning more statistics