
An editorial of Nature Medicine (2005): "Statistically significant"

Statistical Guideline of Nature. Ji-Qian Fang, School of Public Health, Sun Yat-Sen University, 2008.10.



Presentation Transcript


  1. Statistical Guideline of Nature. Ji-Qian Fang, School of Public Health, Sun Yat-Sen University, 2008.10

  2. Challenge to Nature Medicine • An editorial in Nature Medicine (2005), "Statistically significant": “Some of the articles published in Nature and Nature Medicine were criticized due to the deficiency in statistical issues”.

  3. What happened? • Emili García-Berthou and Carles Alcaraz (University of Girona, Spain) published an article in BMC Medical Research Methodology (May 2004). They reviewed 181 research papers published in Nature in 2001 and found that 38% of them contained at least one statistical error. • Since then, a series of critical articles has been published. One of them, written by Robert Matthews (The Financial Times), analyzed the statistical methodology of articles in Nature Medicine (2000). It found that 31% of the authors had misunderstood the meaning of the P value, and that some had reported P values with unnecessary precision (e.g., 0.002387).

  4. Independent statistical “audit” • Nature Medicine invited two experts from Columbia University to carry out a “statistical audit”, specifically to evaluate 21 articles published in 2003 against a consolidated checklist of statistical criteria. • They found that some papers contained almost no quantitative analysis, while others involved very complicated statistical and mathematical issues. Most papers used only a little statistical testing, but described it so incompletely that one could hardly assess whether the tests were appropriate.

  5. Checklist of statistical adequacy

  6. 1. Reported n at start of study and for each analysis 2. Provided sample size calculation or justification Examples • We believed that . . . the incidence of symptomatic deep venous thrombosis or pulmonary embolism or death would be 4% in the placebo group and 1.5% in the ardeparin sodium group. Based on 0.9 power to detect a significant difference (P=0.05, two-sided), 976 patients were required for each study group. To compensate for non-evaluable patients, we planned to enroll 1000 patients per group.
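
The figure of 976 patients per group in the example above can be reproduced, as a rough sketch, with the normal-approximation formula for comparing two proportions followed by a continuity correction. The slide does not say which formula the trial authors used, so the choice of formula and the function name `n_per_group_two_proportions` are assumptions made for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.90):
    """Per-group sample size for a two-sided comparison of two proportions.

    Assumed method (not stated in the source): pooled-variance normal
    approximation, then the continuity correction usually attributed
    to Fleiss.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    delta = abs(p1 - p2)

    # Uncorrected sample size per group.
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2

    # Continuity-corrected sample size per group.
    n_cc = n / 4 * (1 + sqrt(1 + 4 / (n * delta))) ** 2
    return ceil(n_cc)


n_required = n_per_group_two_proportions(0.04, 0.015)
print(n_required)  # 976
```

Without the continuity correction the same inputs give a noticeably smaller number, which is one reason a report should name the formula it used.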

  7. To have an 85% chance of detecting as significant (at the two-sided 5% level) a five-point difference between the two groups in the mean SF-36 general health perception scores, with an assumed standard deviation of 20 and a loss to follow-up of 20%, 360 women in each group (720 in total) were required.
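
The SF-36 example above is consistent with the usual normal-approximation formula for comparing two means, inflated for the expected loss to follow-up. The sketch below assumes that formula (the slide does not name one); `n_per_group_two_means` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_means(diff, sd, alpha=0.05, power=0.85, loss=0.20):
    """Per-group enrollment for a two-sided comparison of two means.

    Assumed method: n = 2 * ((z_alpha + z_beta) * sd / diff)^2 evaluable
    subjects per group, divided by (1 - loss) to allow for dropouts.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n_evaluable = 2 * ((z_alpha + z_beta) * sd / diff) ** 2
    return ceil(n_evaluable / (1 - loss))


n_enrolled = n_per_group_two_means(diff=5, sd=20)
print(n_enrolled, 2 * n_enrolled)  # 360 720
```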

  8. 3. Identified all statistical methods unambiguously 4. If statistical methods were described adequately, were any of them clearly inappropriate? Example • All data analysis was carried out according to a pre-established analysis plan. Proportions were compared by χ² tests with continuity correction or Fisher’s exact test when appropriate. Mean serum retinol concentrations were compared by t test. . . Two-sided significance tests were used throughout.

  9. Multivariate analyses were conducted with logistic regression. The durations of episodes and signs of disease were compared by using proportional hazards regression.

  10. Methods for additional analyses, such as subgroup analyses and adjusted analyses: Example • Proportions of patients responding were compared between treatment groups with the Mantel-Haenszel chi-squared test, adjusted for the stratification variable, methotrexate use. • . . . it was planned to assess the relative benefit of CHART in an exploratory manner in subgroups: age, sex, performance status, stage, site, and histology. To test for differences in the effect of CHART, a chi-squared test for interaction was performed, or when appropriate a chi-squared test for trend (131).

  11. 5. Provided alpha for all statistical tests 6. Specified whether tests were one-sided or two-sided 7. Stated whether the data met the assumptions of the test 8. Reported actual P values for primary analyses

  12. Example • The data of the two samples were adequately normally distributed (Shapiro-Wilk test: P1=0.466; P2=0.482) and the two population variances were equal at the 0.10 significance level (F=1.345; P=0.261), so a two-independent-samples t test was used (t=4.137; df=18; P=0.001). The results indicated a statistically significant difference between the effects of the two drugs at the two-tailed 0.05 significance level; the average increase in Hb concentration was higher in patients taking the new drug, which could also be seen from the 95% confidence interval for the difference between the two population means (3.829, 11.731).
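
One benefit of reporting the test statistic, degrees of freedom, and confidence interval together is that a reader can check them against each other. The sketch below recovers the t statistic from the reported interval. The critical value 2.101 is an assumption taken from a standard t table (two-sided 5%, 18 degrees of freedom); it is not given on the slide.

```python
# Values reported in the example.
ci_low, ci_high = 3.829, 11.731
t_reported = 4.137

# Assumption: tabulated 97.5th percentile of Student's t with 18 df.
T_CRIT_DF18 = 2.101

mean_diff = (ci_low + ci_high) / 2                 # centre of the interval
std_error = (ci_high - ci_low) / 2 / T_CRIT_DF18   # half-width / critical value
t_recovered = mean_diff / std_error

print(round(mean_diff, 2), round(t_recovered, 3))
```

The recovered statistic agrees with the reported t to rounding, so the interval and the test are mutually consistent.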

  13. 9. Were the statistical measures (mean, standard error, standard deviation, etc.) reported, and were they clearly labeled? Example • The results show that the mean ± SD of IL-2 was 16.00 IU/mL ± 7.50 IU/mL for the experimental group (n=31) and 20.00 IU/mL ± 8.00 IU/mL for the control group (n=30); the difference between the two group means was 4.00 IU/mL, and the 95% CI of the difference was (0.0304, 7.9696) IU/mL.
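
Because the example labels its summary statistics clearly, the confidence interval can be recomputed from them. The sketch below assumes a pooled-variance interval and a tabulated critical value of 2.000 (the t table entry for 60 degrees of freedom, the nearest common entry to df = 59). Neither assumption is stated on the slide, but together they reproduce the reported interval.

```python
from math import sqrt

# Summary statistics reported in the example (IU/mL).
n1, mean1, sd1 = 31, 16.00, 7.50   # experimental group
n2, mean2, sd2 = 30, 20.00, 8.00   # control group

# Assumption: table value of t for df = 60, used in place of df = 59.
T_CRIT = 2.000

pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
diff = mean2 - mean1

ci = (diff - T_CRIT * std_error, diff + T_CRIT * std_error)
print(round(ci[0], 4), round(ci[1], 4))  # 0.0304 7.9696
```

With the exact critical value for 59 degrees of freedom the interval would be marginally wider, which shows why four decimal places overstate the precision here.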

  14. 10. Was the unit of analysis clearly stated in all comparisons? 11. Are mean and standard deviation used to describe data sets that may be non-normally distributed or when the sample size is very small? Results of Blood Gas Analysis: What are the problems?

  15. 12. Explanation of unusual or complex statistical methods Example • In order to compare the effects of common feed, feed with plasma protein, and feed with bioprotein on the weight gain of weaned piglets, 30 weaned piglets were matched into 10 blocks by gender, age in days, and baseline weight. Then the 3 individuals in each block were randomly assigned to the 3 treatment groups, one to each. After 10 days, the changes in weight from baseline were measured. ---- Randomized block design

  16. The mean ± SD change in weight was 3.33 kg ± 0.48 kg for the common feed group, 3.83 kg ± 0.61 kg for the plasma protein group, and 4.10 kg ± 0.68 kg for the bioprotein group. Two-way ANOVA at the 0.05 significance level indicated statistically significant differences among the 3 treatment groups (F=6.8112, P=0.0063). Similar results were found among the 10 blocks (F=2.7407, P=0.0328). ---- Results of ANOVA
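
The treatment P value can be checked by hand, because the upper tail of an F distribution with 2 numerator degrees of freedom has a closed form. The sketch below assumes the standard randomized block layout, with error degrees of freedom (3 - 1) × (10 - 1) = 18; `f_upper_tail_df1_is_2` is a hypothetical helper and is valid only when the numerator degrees of freedom equal 2.

```python
def f_upper_tail_df1_is_2(f_value, df2):
    """P(F > f_value) for an F(2, df2) variable.

    Closed form that holds only for 2 numerator degrees of freedom:
    (1 + 2 * f / df2) ** (-df2 / 2).
    """
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


treatments, blocks = 3, 10
df_error = (treatments - 1) * (blocks - 1)   # assumed layout: 18 error df

p_treatment = f_upper_tail_df1_is_2(6.8112, df_error)
print(df_error, round(p_treatment, 4))  # 18 0.0063
```

The block effect has 9 numerator degrees of freedom, so its P value needs a general F distribution routine rather than this shortcut.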

  17. 13. Explanation of data exclusions, if any Example • The primary analysis was intention-to-treat and involved all patients who were randomly assigned. • One patient in the alendronate group was lost to follow-up; thus data from 31 patients were available for the intention-to-treat analysis. Five patients were considered protocol violators . . . Consequently, 26 patients remained for the per-protocol analyses.

  18. Protocol deviations • Authors should report all departures from the protocol, including unplanned changes to interventions, examinations, data collection, and methods of analysis. • The nature of the protocol deviation and the exact reason for excluding participants after randomization should always be reported.

  19. 14. Explained reasons for any discrepancy between initial n and n for each analysis Example • Initially, the 60 rats were randomly divided into 3 groups, 15 for each, to receive 3 dose levels respectively. However, at the end of the first week, 2 rats in the low-dose group escaped; on the 40th day, 1 rat in the high-dose group and 1 in the control group escaped …

  20. 15. Explained method of treatment assignment (randomization, if any) Example • Determination of whether a patient would be treated by streptomycin and bed-rest (S case) or by bed-rest alone (C case) was made by reference to a statistical series based on random sampling numbers drawn up for each sex at each centre by Prof. Bradford Hill; the details of the series were unknown to any of the investigators or to the coordinator and were contained in a set of sealed envelopes, each bearing on the outside only the name of the hospital and a number. After acceptance of a patient by the panel, the envelope was opened at the central office; the card inside told the medical officer of the centre if the patient was to be an S or a C case.

  21. 16. Explained any data transformation Example • 18 patients with acute encephalitis B in a clinic were randomly allocated to 3 groups. Each group received a different treatment (A, B, or C), and the number of days of fever was measured as the treatment effect.

  22. Consider the two assumptions of one-way ANOVA. The days of fever are positively skewed relative to the normal distribution, and the ratio of the variances is close to 10, so the assumption of homogeneity of variances is also violated. Therefore, a square root transformation is applied to the days of fever… • The transformed values were used in the one-way ANOVA, which showed no significant difference in the average days of fever (on the square root scale) among the three treatments.

  23. 17. Discussed adjustments for multiple testing Example • Multiple comparisons with Bonferroni adjustment (alpha level of 0.0167) revealed that the effects of the two treatments with protein were significantly greater than that of common feed, while the difference between the two treatments with protein was not statistically significant. ---- Multiple comparison
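
The adjusted alpha level of 0.0167 quoted above follows from dividing the overall significance level by the number of pairwise comparisons among the three feed groups. A minimal sketch, assuming the overall level is the 0.05 used for the ANOVA (the helper name `bonferroni_alpha` is hypothetical):

```python
from math import comb


def bonferroni_alpha(family_alpha, n_groups):
    """Per-comparison alpha when all pairs of groups are compared."""
    n_comparisons = comb(n_groups, 2)   # number of distinct pairs
    return family_alpha / n_comparisons, n_comparisons


alpha_per_test, n_comparisons = bonferroni_alpha(0.05, 3)
print(n_comparisons, round(alpha_per_test, 4))  # 3 0.0167
```

Each pairwise P value is then compared with 0.0167 instead of 0.05, which keeps the overall Type I error rate at or below 0.05.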

  24. For graphs 18. Were effect sizes distorted (by truncation of the y axis, etc.)? What are the problems? [Figure: two bar charts of the number of hospitals in Beijing, Tianjin, Hebei, Shanxi, and Inner Mongolia]

  25. 19. Were error bars unlabeled? 20. Were error bars absent? • What is the height for? • What are the bars for? • What are the stars for? [Figure: two bar charts of cholesterol (mg/dL) for normal subjects and patients]

  26. Summary: Three errors are particularly common • Multiple comparisons: When making multiple statistical comparisons on a single data set, authors should explain how they adjusted the alpha level to avoid an inflated Type I error rate, or they should select statistical tests appropriate for multiple groups (such as ANOVA rather than a series of t-tests).

  27. Normal distribution: Many statistical tests require that the data be approximately normally distributed; when using these tests, authors should explain how they tested their data for normality. If the data do not meet the assumptions of the test, then a non-parametric alternative should be used instead.

  28. Small sample size: When the sample size is small (less than about 10), authors should use tests appropriate to small samples or justify their use of large-sample tests.

  29. Thanks
