Statistics in Medical Research RCTs and Cohort

Statistics in Medical Research RCTs and Cohort. Jemila Hamid, Clinical Epidemiology & Biostatistics, Pathology & Molecular Medicine, McMaster University, jhamid@mcmaster.ca. July 21, 2011

Outline I. Introduction II. Group comparison • Paper I – Edwards et al., submitted III. Survival Analysis • Paper II – Krag et al., Lancet Oncol, 2010 • Paper III – Weaver et al., NEJM, 2011 IV. Design Issues – sample size V. Summary

I. Introduction • Study types are classified into two broad categories • Experimental – the researcher investigates the effects of an intervention • These are prospective studies, usually comparative in nature; they may be longitudinal or cross-sectional, with parallel or crossover designs – e.g. clinical trials – Krag et al., Lancet Oncol • Observational – the researcher does not influence events • Case-control (retrospective and cross-sectional in general), cohort (prospective and mostly longitudinal) and surveys (cross-sectional) – e.g. epidemiology, diagnostic testing, public health – Weaver et al., NEJM

Statistical Question? • Estimation? Estimating prevalence of disease, treatment effect, risk, hazard, accuracy etc. • Comparative? Comparing treatment effect with a constant (single sample)? Comparing a new diagnostic test with a gold standard? Comparing two or more treatments? Comparing before and after an intervention? • Association and regression? Relationship between two variables? Effect of one variable on another? Effect of multiple variables on an outcome? • Prediction? Predict (classify into) disease subtype? predict an outcome based on risk factors?

Paper II – Krag et al. • Comparing survival and disease-free survival between two surgical procedures: sentinel-lymph-node resection and axillary-lymph-node dissection • Paper III – Weaver et al. • Comparison of disease recurrence and survival between two groups of patients: those with occult lymph-node metastases and those in whom no occult metastases were detected • In both papers, estimation and confidence intervals are also part of the statistical question – e.g. estimating overall survival, disease-free survival, hazard ratio etc.

Outcome Measures? • Continuous - mean, standard deviation, mean difference • Normal distribution is often assumed • Transformations, non-parametric approaches • Binary – risk, odds, relative risk, risk difference, odds ratio, sensitivity, specificity, classification accuracy, AUC • Logistic regression • Binomial distribution • Count - mean count, proportions, rates • Poisson regression • Negative binomial • Survival – hazard, hazard ratio, time to event • Cox regression • Weibull, lognormal and generalized Gamma distributions
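As a rough illustration of the binary outcome measures listed above, the sketch below computes risk, risk difference, relative risk and odds ratio from a 2x2 table. The function name `two_by_two_measures` and the counts are hypothetical and do not come from any of the papers.

```python
def two_by_two_measures(a, b, c, d):
    """Effect measures for a 2x2 table.

    a = events in group 1,  b = non-events in group 1
    c = events in group 2,  d = non-events in group 2
    """
    risk1 = a / (a + b)          # risk (proportion with the event) in group 1
    risk2 = c / (c + d)          # risk in group 2
    odds1 = a / b                # odds of the event in group 1
    odds2 = c / d                # odds of the event in group 2
    return {
        "risk1": risk1,
        "risk2": risk2,
        "risk_difference": risk1 - risk2,
        "relative_risk": risk1 / risk2,
        "odds_ratio": odds1 / odds2,
    }


# Hypothetical counts, for illustration only
measures = two_by_two_measures(a=20, b=80, c=10, d=90)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With a common outcome the odds ratio is further from 1 than the relative risk, which is one reason the two should not be read interchangeably.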

Statistical Analysis Descriptive – Table 1 of medical articles • Summarizing and evaluating data using graphical and tabular methods • This can be done using boxplots, histograms, normal probability plots • Gives a good feel for the data • Assess distributions – normal? need transformation? • Outliers? What are we going to do about them? • Missing values? Why are they missing? What are we going to do about them?

Paper II

Paper III

Estimation and Confidence Intervals • Estimator – parameter of interest could be mean, response rate, proportion etc • Confidence interval (CI) quantifies imprecision or uncertainty associated with an estimate • the reader can assess whether a result is estimated precisely or not, definitive or not • presenting CI has been widely promoted in the literature

Interpretation of 95% CI • If we repeated the experiment many times and computed an interval each time, about 95% of those intervals would contain the TRUE parameter value • Before performing the experiment, the probability that the interval will contain the true parameter value is 0.95

Set of 95% CIs from samples of size n = 12 drawn from a normal distribution with μ = 211 and σ² = 46.
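The repeated-sampling interpretation can be checked with a small simulation. This is a minimal sketch that assumes the variance is known (so a z-interval is used) and reuses the slide's μ = 211, σ² = 46 and n = 12; the number of repetitions and the seed are arbitrary choices.

```python
import random
from statistics import NormalDist, mean


def ci_coverage(mu=211.0, sigma2=46.0, n=12, n_experiments=2000, seed=1):
    """Fraction of simulated 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    z = NormalDist().inv_cdf(0.975)          # about 1.96
    half_width = z * sigma / n ** 0.5        # known-variance interval half-width
    covered = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            covered += 1
    return covered / n_experiments


coverage = ci_coverage()
print(f"Proportion of intervals containing mu: {coverage:.3f}")
```

The proportion should land close to 0.95, and any single interval either contains μ or does not.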

95% CI for continuous outcome
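The formula on this slide did not survive in the transcript. As an assumption about what was shown, the standard t-based interval for a mean, with sample mean $\bar{x}$, sample standard deviation $s$ and sample size $n$, is:

```latex
\bar{x} \;\pm\; t_{n-1,\,0.975}\,\frac{s}{\sqrt{n}}
```

For large $n$ the quantile $t_{n-1,\,0.975}$ approaches 1.96.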

95% CI for proportions
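The formula on this slide is also missing. A common large-sample choice is the Wald interval, p ± z·sqrt(p(1 − p)/n); the sketch below assumes that form, and the counts (159 events out of 1000) are hypothetical.

```python
from statistics import NormalDist


def wald_ci(events, n, level=0.95):
    """Large-sample (Wald) confidence interval for a proportion."""
    p = events / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = (p * (1 - p) / n) ** 0.5            # standard error of the proportion
    # Clip to [0, 1] because the Wald interval can overshoot near the boundaries
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


# Hypothetical counts, for illustration only
p_hat, lower, upper = wald_ci(events=159, n=1000)
print(f"{p_hat:.3f} (95% CI: {lower:.3f}-{upper:.3f})")
```

The Wald interval behaves poorly for small samples or proportions near 0 or 1, where alternatives such as the Wilson interval are usually preferred.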

Examples from Papers II and III • Paper II • 8-year overall survival was 91.8% (95% CI: 90.4-93.4) in group 1 and 90.3% (95% CI: 88.8-91.8) in group 2 • HR 1.05 (95% CI: 0.90-1.22) • Paper III • Occult metastases were detected in 15.9% of the patients (95% CI: 14.7-17.1) • Adjusted hazard ratio HR: 1.40 (95% CI: 1.05-1.86)
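A hazard ratio and its CI are usually computed on the log scale, so a reported interval can be turned back into an approximate standard error and Wald test. The sketch below is a back-of-the-envelope check under that assumption, applied to the adjusted HR quoted for Paper III. Published p-values may come from a different test (for example the log-rank test), so they need not match this approximation.

```python
import math
from statistics import NormalDist


def wald_from_hr_ci(hr, lower, upper, level=0.95):
    """Approximate SE of log(HR), Wald z and two-sided p from a reported CI."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # The CI is symmetric on the log scale: log(HR) +/- z_crit * SE
    se_log_hr = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se_log_hr
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return se_log_hr, z, p_two_sided


se, z, p = wald_from_hr_ci(hr=1.40, lower=1.05, upper=1.86)
print(f"SE(log HR) = {se:.3f}, z = {z:.2f}, approximate p = {p:.3f}")
```

An interval that excludes 1 corresponds to an approximate two-sided p-value below 0.05.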

Hypothesis Testing

Examples from Papers I-III • In Papers I–III, overall and disease-free survival were compared among the groups considered, e.g. Paper II: HR = 1.2, p-value = 0.12; Paper III: HR = 1.40, p-value = 0.03 • Paper I • Statistical question – comparison of the clinical characteristics of B-Cell lymphoma unclassifiable (BCLU) with those of Burkitt Lymphoma (BL) and Diffuse Large B-Cell Lymphoma (DLBL) • Several clinical variables were considered – some binary and some categorical, e.g. researchers compared DLBL and BCLU with respect to: gender (p-value = 0.51), CNS involvement (p-value = 0.01)

Analysis of variance (ANOVA)

ANOVA table

Example for ANOVA • In Paper I, if the researchers were to compare all three tumor types (DLBL, BCLU and BL) with respect to a continuous clinical characteristic, ANOVA would be appropriate • For the other “non-continuous” characteristics, one can apply an appropriate transformation before applying ANOVA • If they don’t use ANOVA and opt for pairwise comparisons instead, there will be an issue of multiple comparisons
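To make the multiple-comparison issue concrete, the sketch below counts the pairwise comparisons among the three tumor types and shows how the chance of at least one false positive grows. It assumes independent tests at α = 0.05 and uses the Bonferroni correction as one simple remedy; the slides do not name a specific method.

```python
from itertools import combinations

groups = ["DLBL", "BCLU", "BL"]
pairs = list(combinations(groups, 2))        # all pairwise comparisons
alpha = 0.05
m = len(pairs)

# Chance of at least one false positive if the m tests were independent
familywise_unadjusted = 1 - (1 - alpha) ** m

# Bonferroni: test each pair at alpha / m to keep the familywise rate <= alpha
bonferroni_alpha = alpha / m

print(f"{m} pairwise comparisons: {pairs}")
print(f"Familywise error rate without adjustment: {familywise_unadjusted:.3f}")
print(f"Per-comparison threshold with Bonferroni: {bonferroni_alpha:.4f}")
```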

Example ANOVA … • E.g. (Altman, 1991) Twenty-two patients undergoing cardiac bypass surgery were randomized to one of three ventilation groups • (1) 50% nitrous oxide and 50% oxygen mixture for 24 hours • (2) Same as 1, but received treatment during the operation • (3) No nitrous oxide but received 35-50% oxygen for 24 hours • Compare whether the three groups have the same red cell folate (RCL) levels

ANOVA table At the 5% level, there is evidence to suggest there is a significant difference in RCL levels among the three groups.
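The ANOVA table itself did not survive in the transcript. As a sketch, the F statistic can be recomputed by hand from the between-group and within-group sums of squares. The measurements below are the red cell folate values as commonly reproduced from Altman (1991); they are not given in these slides, so treat them as an assumption. The closed-form p-value used here is valid only when the numerator degrees of freedom equal 2, that is, for exactly three groups.

```python
from statistics import mean

# Red cell folate values by ventilation group (assumed, see note above)
groups = {
    "I": [243, 251, 275, 291, 347, 354, 380, 392],
    "II": [206, 210, 226, 249, 255, 273, 285, 295, 309],
    "III": [241, 258, 270, 293, 328],
}


def one_way_anova(samples):
    """Return the F statistic and its degrees of freedom."""
    all_values = [x for s in samples for x in s]
    grand_mean = mean(all_values)
    k, n = len(samples), len(all_values)
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df1, df2 = one_way_anova(list(groups.values()))

# Upper tail of the F distribution; this closed form holds only for df1 == 2
p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)

print(f"F({df1}, {df2}) = {f_stat:.2f}, p = {p_value:.3f}")
```

A p-value below 0.05 is consistent with the slide's conclusion at the 5% level, but it does not say which groups differ.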

Regression Methods • So far we have considered estimation, confidence intervals and comparisons. Some of these can be framed as a simple regression model • ANOVA can, for example, be framed as a regression model where the treatment groups are the independent variables • Estimation is a big part of regression methods • We will present regression models in general and then talk about special cases, e.g. Cox regression

Regression Methods … • A method for analyzing relationship between two or more variables • There is a causal direction – investigator seeks to ascertain the causal effect of one variable upon another • Otherwise, it will be association or correlation analysis – no causal relationship, here one needs to measure the strength of association between variables without assuming any causal relationship

Two kinds of variables – outcome (dependent variable) and predictor (independent variable) • Predictors are sometimes called risk factors, exposure variables or prognostic factors, depending on the nature of the data • Linear regression: Y = α + β1X1 + β2X2 + … + βpXp (simple linear regression when p = 1, multiple linear regression otherwise) • Depending on the distribution of the outcome variable, we have different types of regression: ANOVA (MD), logistic regression (RR and OR), Cox regression (HR)

Example • E.g. (Altman, 1991) Twenty-two patients undergoing cardiac bypass surgery were randomized to one of three ventilation groups • (1) 50% nitrous oxide and 50% oxygen mixture for 24 hours • (2) Same as 1, but received treatment during the operation • (3) No nitrous oxide but received 35-50% oxygen for 24 hours • Compare whether the three groups have the same red cell folate (RCL) levels
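This example can be written as a regression with indicator (dummy) variables for the groups. In the sketch below the intercept estimates the mean of group I and each coefficient estimates a difference in means from group I. The data are the red cell folate values as commonly reproduced from Altman (1991), which are not given in these slides and should be treated as an assumption; the helpers `solve` and `normal_equations` are illustrative, not from any library.

```python
groups = {
    "I": [243, 251, 275, 291, 347, 354, 380, 392],
    "II": [206, 210, 226, 249, 255, 273, 285, 295, 309],
    "III": [241, 258, 270, 293, 328],
}

# Design matrix columns: intercept, indicator for group II, indicator for group III
rows, y = [], []
for name, values in groups.items():
    for v in values:
        rows.append([1.0, float(name == "II"), float(name == "III")])
        y.append(float(v))


def normal_equations(rows, y):
    """Build X'X and X'y for ordinary least squares."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return xtx, xty


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


coef = solve(*normal_equations(rows, y))
print(f"Intercept (mean of group I): {coef[0]:.2f}")
print(f"Group II minus group I:      {coef[1]:.2f}")
print(f"Group III minus group I:     {coef[2]:.2f}")
```

Testing whether both group coefficients are zero is the same hypothesis as the one-way ANOVA F test.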

Examples … • Study of biomarkers – Kazu et al., work in progress • Evaluate the diagnostic ability of a panel of five immunohistochemical markers in distinguishing between Endometrioid Adenocarcinoma (EC) and Serous Carcinoma (SC) • Study the relative contribution of each of the five markers towards predicting the two histologic types • Clinical covariates such as age, body mass index (BMI), stage, and history of hormone replacement therapy • Multiple logistic regression and ROC analysis were performed to estimate odds ratios and construct a predictive model
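The area under the ROC curve (AUC) summarizes how well a score separates two classes. It equals the probability that a randomly chosen case from one class scores higher than a randomly chosen case from the other, with ties counted as one half. The sketch below computes it directly from that definition; the scores are hypothetical predicted probabilities, not data from the biomarker study.

```python
def auc(scores_pos, scores_neg):
    """AUC as the proportion of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model scores for SC (treated as positive) and EC (negative) cases
sc_scores = [0.9, 0.8, 0.7, 0.55]
ec_scores = [0.6, 0.4, 0.3, 0.2]

area = auc(sc_scores, ec_scores)
print(f"AUC = {area:.4f}")
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation.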

Examples … • Paper II: • Cox regression is used to estimate the hazard ratio and to model and compare survival between the two surgical procedures • Here the outcomes (dependent variables) are overall survival and disease-free survival, and the predictor variable is the surgical group; other covariates are also included in the model to estimate the adjusted HR • Paper III: • Again, Cox regression is used to model and compare survival between the two groups of patients • The outcome variables are overall survival, disease-free survival and distant-disease-free interval. The predictor variable is the patient group (two groups); other covariates are also included here

Other Methods • Diagnostic testing • Agreement studies • Multivariate methods: cluster analysis, discriminant analysis, factor analysis, PCA, CCA • Meta analysis – combining data from different studies • Methods for correlated data - longitudinal and repeated measures data • Methods for high-dimensional data – genomics and genetics

II. Group Comparison • We will talk about Paper I • We will focus only on the comparative aspect of the paper • Talk about group comparison, multiple comparison using same data

Paper I • Statistical Question - Comparison of clinical characteristics of B-Cell lymphoma unclassifiable (BCLU) and Diffuse Large B-Cell Lymphoma (DLBL) • Several clinical variables were considered – some binary and some categorical eg. Researchers compared DLBL and BCLU with respect to the clinical variables • Survival is also considered in this paper – but we will focus on the comparative aspect of the paper here

Materials and methods Paper I • A ten-year retrospective examination of the clinical characteristics, survival, treatment response and molecular profile of BCLU (n=34) compared to DLBL (n=97) • Variables considered include: gender, age at diagnosis, International Prognostic Index (IPI), Eastern Cooperative Oncology Group (ECOG) performance status, Ann Arbor stage, presence of B-symptoms, bone marrow (BM) and central nervous system (CNS) involvement, extranodal and bulky disease • Chi-square tests and one-way analysis of variance were used for categorical and continuous data, respectively, to compare the baseline characteristics between groups
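For a binary characteristic compared between two groups, the chi-square test reduces to a 2x2 table with one degree of freedom. The sketch below uses the uncorrected Pearson statistic; only the group sizes (34 and 97) come from the slide, and the split within each group is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Rows are groups, columns are the two levels of the characteristic:
        group 1: a, b
        group 2: c, d
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical split: BCLU 20 male / 14 female, DLBL 50 male / 47 female
chi2, p = chi_square_2x2(20, 14, 50, 47)
print(f"chi-square = {chi2:.3f}, p = {p:.3f}")
```

When expected cell counts are small, Fisher's exact test is the usual alternative.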

Results paper I Table 2. Clinical characteristics at diagnosis and treatment regimes.

III. Survival Analysis • Focus on statistical methods used in Papers II and III • We will discuss • Study type and design • Materials and Methods • Statistical Analysis • Results

Paper II

Paper III

Study type - Paper II • Randomized controlled phase 3 trial done at 80 centers across Canada and the USA • Women with invasive breast cancer were randomly assigned to two surgical procedures: • Sentinel-lymph-node resection (SLN) plus axillary-lymph-node dissection (ALND) • SLN alone with ALND only if the SLNs were positive • Randomization was stratified by age (≤ 49, ≥ 50), tumor size and surgical plan (lumpectomy, mastectomy) • Primary outcome was overall survival – but other secondary outcomes were considered

Study type - Paper III • Retrospective, observational study using data from a previously conducted RCT • Paraffin-embedded tissue blocks of sentinel lymph nodes obtained from patients with pathologically negative SLNs were centrally evaluated for occult metastases • Objective: estimate the proportion of patients with occult metastases and compare survival between patients with and without occult metastases

Methods, survival analysis • In both papers, the primary outcome is overall survival; secondary outcomes are disease-free survival and regional control • Both papers used the log-rank test and Cox proportional hazards models • Kaplan-Meier curves were used in both papers • In both papers, HRs and adjusted HRs with 95% CIs were provided
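The log-rank test compares observed and expected numbers of events in each group at every event time. The sketch below is a minimal two-group version with one degree of freedom; the follow-up times and event indicators are hypothetical, and the function name `logrank_test` is illustrative rather than taken from a library.

```python
import math


def logrank_test(times1, events1, times2, events2):
    """Two-group log-rank test; events are 1 = event observed, 0 = censored."""
    data = [(t, e, 0) for t, e in zip(times1, events1)]
    data += [(t, e, 1) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed1 = expected1 = variance = 0.0
    for t in event_times:
        n1 = sum(1 for u, _, g in data if u >= t and g == 0)   # at risk, group 1
        n2 = sum(1 for u, _, g in data if u >= t and g == 1)   # at risk, group 2
        d1 = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d2 = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = n1 + n2, d1 + d2
        observed1 += d1
        expected1 += d * n1 / n                  # expected events under H0
        if n > 1:                                # hypergeometric variance
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)

    chi2 = (observed1 - expected1) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))     # chi-square tail, 1 df
    return chi2, p_value


# Hypothetical follow-up times (months) and event indicators
times_a, events_a = [6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1]
times_b, events_b = [1, 3, 4, 5, 8, 12], [1, 1, 1, 0, 1, 1]

chi2, p = logrank_test(times_a, events_a, times_b, events_b)
print(f"log-rank chi-square = {chi2:.2f}, p = {p:.3f}")
```

The test does not depend on which group is labelled first, and it gives no estimate of effect size, which is why the papers report hazard ratios from Cox models alongside it.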

Survival analysis … • Survival analysis is used to analyze time-to-event data, which arise in both clinical trials and cohort studies • Event: death, disease occurrence, disease recurrence, recovery, or other experience of interest • Time: the time from the beginning of an observation period (e.g., surgery) to (a) an event, or (b) end of the study, or (c) loss of contact or withdrawal from the study • We almost never observe the event of interest in all subjects – for subjects in whom the event is not observed, we don’t know the exact survival time, only that it exceeds the observed follow-up time

Survival analysis … Censoring/censored observation • When a subject does not have an event during the observation time, they are described as censored, meaning that we cannot observe what has happened to them subsequently. • A censored subject may or may not have an event after the end of observation time • Such survival times are called censored.

Survival analysis …

Survival analysis … • Median survival – the time point at which 50% of the population is still surviving • Mean survival – the average survival time (not commonly used) – calculated as the number of years survived by all patients divided by the number of deaths • 5-year survival – proportion of patients who survive 5 years (it can be 1 year, 2 years, 5 years or 10 years depending on the nature of the event)
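Median survival and t-year survival are usually read off a Kaplan-Meier curve, which handles censored observations by shrinking the risk set over time. The sketch below is a minimal implementation; the follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events: 1 = event, 0 = censored)."""
    curve = []
    surv = 1.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1 - deaths / at_risk          # product-limit update
        curve.append((t, surv))
    return curve


def median_survival(curve):
    """Smallest event time at which estimated survival drops to 50% or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None                               # median not reached during follow-up


# Hypothetical follow-up times (years) and event indicators
times = [2, 3, 3, 5, 7, 8, 10, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:>2}: S(t) = {s:.3f}")
print("Median survival:", median_survival(curve))
```

If the curve never drops to 50%, as is common in studies with high survival, the median is not estimable and a fixed-time survival proportion is reported instead.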

Paper II

Paper III
