Randomized Trial of Preoperative Chemoradiation Versus Surgery Alone in Patients with Locoregional Esophageal Carcinoma, Urba et al.
• Statistical Methods:
  • Kaplan-Meier curves
  • Log-rank test
  • Confidence intervals
  • Cox proportional hazards model
• Focus on:
  • hypothesis testing and p-values
  • confidence intervals
Hypothesis Tests
• “Although the survival rate at 3 years was somewhat longer for those randomly assigned to the multimodality arm (30%) than to the surgery alone arm (16%), this difference was not significant (p = 0.15).”
• “Patients randomized to the multimodality arm have a 31% lower risk of death from any cause over the study (p = 0.09) after adjusting for the other prognostic factors.”
• Which (if any) should be concluded?
  • Surgery and multimodality treatment have the same response rate.
  • Multimodality treatment did not prove to be better and it should be abandoned.
  • There is a clinically interesting difference here, but the sample size is not sufficient to conclude strong evidence in favor of multimodality treatment.
What is a p-value?
• The p-value is the probability of observing a result at least as extreme as the one you observed, if the null hypothesis is true.
• Null hypothesis?!
  • “No association” (i.e., between survival and arm)
  • 3-year survival rate in Arm I is the same as in Arm II
• So, in our case, the p-value is the probability of observing a difference at least as large as 30% 3-year survival in Arm II versus 16% in Arm I if, in truth, there is no difference in 3-year survival between the two treatment modalities.
• Drawback: the conclusion of a hypothesis test depends heavily on sample size.
• When interpreting p-values, people tend to treat the result as yes/no and do not always consider the magnitude of the difference.
• What is the difference between p = 0.04 and p = 0.08?
• Not unusual to see:
  • a statistically significant, clinically insignificant result
  • a statistically insignificant, clinically significant result
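The dependence on sample size can be illustrated with a small sketch. It is not the trial's analysis (the paper used the log-rank test on survival curves). It applies a simple pooled two-proportion z-test to the observed 16% versus 30% 3-year survival, and the arm sizes tried are hypothetical values chosen only for illustration.

```python
import math


def two_proportion_p_value(p1, p2, n_per_arm):
    """Two-sided p-value from a pooled two-proportion z-test (equal arm sizes)."""
    pooled = (p1 + p2) / 2  # with equal arm sizes the pooled rate is the mean
    se = math.sqrt(2 * pooled * (1 - pooled) / n_per_arm)
    z = (p2 - p1) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal
    return math.erfc(abs(z) / math.sqrt(2))


# Same observed difference (16% vs 30%), hypothetical numbers of patients per arm.
for n in (25, 50, 100, 200):
    p = two_proportion_p_value(0.16, 0.30, n)
    print(f"n per arm = {n:3d}  p = {p:.3f}")
```

The observed difference is identical in every row; only the assumed sample size changes, yet the p-value moves from clearly "not significant" to clearly "significant".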
Confidence Intervals

                     Hazard Ratio   95% CI
Chemo v surgery      0.69           0.46-1.06

                     Arm I               Arm II
                     %     95% CI        %     95% CI
1 year survival      58    46-73         72    58-84
3 year survival      16    8-30          30    20-46

What about the confidence interval for the 1 year and 3 year difference?
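One rough way to answer that question from the published numbers alone is sketched below. It assumes each reported 95% CI can be treated as roughly symmetric and normal, so a standard error can be backed out of its width; the reported survival intervals are visibly asymmetric, so this is only an approximation and not the authors' method. The helper names are made up for this sketch.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def se_from_ci(lower, upper, z=Z95):
    """Approximate standard error backed out of a CI's width (assumes symmetry)."""
    return (upper - lower) / (2 * z)


def difference_ci(est1, ci1, est2, ci2, z=Z95):
    """Approximate 95% CI for est2 - est1, treating the two arms as independent."""
    se = math.hypot(se_from_ci(*ci1), se_from_ci(*ci2))
    diff = est2 - est1
    return diff - z * se, diff + z * se


def ratio_p_value(ratio, lower, upper, z=Z95):
    """Approximate two-sided p-value for a ratio, working on the log scale."""
    se_log = (math.log(upper) - math.log(lower)) / (2 * z)
    z_stat = math.log(ratio) / se_log
    return math.erfc(abs(z_stat) / math.sqrt(2))


# Survival percentages and CIs as reported on the slide (Arm I, then Arm II).
one_year = difference_ci(58, (46, 73), 72, (58, 84))
three_year = difference_ci(16, (8, 30), 30, (20, 46))
print("1 year difference: %.1f to %.1f" % one_year)
print("3 year difference: %.1f to %.1f" % three_year)
print("p for hazard ratio: %.3f" % ratio_p_value(0.69, 0.46, 1.06))
```

Under these assumptions the 3 year difference of 14 percentage points has an interval of about -3 to 31, and the p-value recovered from the hazard ratio interval is about 0.08, close to the reported 0.09.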
Convincing, but could be more so
• Why not provide confidence intervals for...
  • median survival
  • 1 year survival
  • 3 year survival
• Would give readers an intuitive “reasonable range” of values to consider for the treatment effect.
• What is remembered?
  • p = 0.09, which means “insignificant result”
  • But can anyone remember the treatment effect?
Confidence Intervals for Reporting Results of Clinical Trials, Simon
• “[Hypothesis tests] are sometimes overused and their results misinterpreted.”
• “Confidence intervals are of more than philosophical interest, because their broader use would help eliminate misinterpretations of published results.”
• “Frequently, a significance level or p-value is reduced to a ‘significance test’ by saying that if the level is greater than 0.05, then the difference is ‘not significant’ and the null hypothesis is ‘not rejected.’ … The distinction between statistical significance and clinical significance should not be confused.”
Caveats
• “They should not be interpreted as reflecting the absence of a clinically important difference in true response probabilities.”
Other Confidence Intervals of Interest
• Response rate
• X-year survival rate
• Median survival
• Hazard ratio
• Odds ratio
• Means of continuous variables
How “Power” fits in -- should you report power?
• “‘Power’ represents the probability of obtaining a statistically significant result if a difference in efficacy of a specified magnitude actually exists.”
• “Power takes no account of the actual results obtained.”
• Example:
  • Two treatment groups, 50 patients in each.
  • Observed response rates of 40% in group A and 50% in group B.
  • The power to detect a difference between response rates of 45% and 65% is only 0.57.
  • But the 95% confidence interval for the true difference (B minus A) is -10% to 30%.
  • CAN conclude: it is unlikely that the true response rate for A exceeds that for B by more than 10 percentage points.
  • A power calculation (if published with the results) would confuse matters.
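The interval quoted in the example can be reproduced with a short sketch. It assumes the usual normal-approximation (Wald) interval for a difference of two independent proportions; the slide does not say which method was used, so that choice is an assumption.

```python
import math
from statistics import NormalDist


def wald_ci_difference(p_a, n_a, p_b, n_b, level=0.95):
    """Wald confidence interval for p_b - p_a from two independent groups."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Observed response rates from the example: 40% in A, 50% in B, 50 patients each.
low, high = wald_ci_difference(0.40, 50, 0.50, 50)
print(f"95% CI for B minus A: {low:.1%} to {high:.1%}")
```

This gives roughly -9% to 29%, which rounds to the -10% to 30% quoted above. The interval uses the observed results, which the power figure does not.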
Survival Example
• Assume that in a randomized oncology clinical trial where the outcome is time to death, median survival in the two arms (new treatment and old treatment) is found to be the same: 6 months, with 20 deaths per treatment.
• Are the treatments essentially equivalent, or is the trial too small to conclude?
• A 90% confidence interval for the ratio of median survival in new versus old is 0.59 to 1.68.
• So, because median survival in the old treatment group is 6 months, the data are consistent with an improvement of up to about 4 months with the new treatment (6 x 1.68 = 10.1 months, versus 6).
• Conclusion: this small trial is rather conclusive that any improvement in survival associated with new treatment is likely to be small.
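The 0.59 to 1.68 interval can be recovered with a common approximation, sketched below. It assumes exponentially distributed survival times, under which the log of the ratio of medians has standard error of about sqrt(1/d1 + 1/d2), where d1 and d2 are the numbers of deaths in the two arms. The slide does not state the method, so this is an assumption that happens to match the quoted numbers.

```python
import math
from statistics import NormalDist


def median_ratio_ci(observed_ratio, deaths_new, deaths_old, level=0.90):
    """Approximate CI for a ratio of median survival times (exponential model)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se_log = math.sqrt(1 / deaths_new + 1 / deaths_old)
    return (observed_ratio * math.exp(-z * se_log),
            observed_ratio * math.exp(z * se_log))


# Equal observed medians (ratio 1.0) and 20 deaths per arm, as in the example.
low, high = median_ratio_ci(1.0, 20, 20)
print(f"90% CI for ratio of medians: {low:.2f} to {high:.2f}")

old_median_months = 6
print(f"Largest plausible gain: {old_median_months * (high - 1):.1f} months")
```

Because the width depends only on the number of deaths, the same sketch shows how many events a trial would need before the interval becomes narrow enough to rule out a given improvement.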
Conclusions
• A difference that is not statistically significant is too often taken to mean that no real difference exists.
• The best way to ensure that a nonsignificant difference will be obtained is to use an inadequate sample size (and vice versa).
• Opportunities for misinterpretation can be reduced if investigators use confidence intervals in reporting results and interpreting their findings.