Seminar, Bordeaux School of Public Health 8 June 2011

Combining endpoints in clinical trials to increase power

John Whitehead
Medical and Pharmaceutical Statistics Research Unit
Department of Mathematics and Statistics
Fylde College, Lancaster University, Lancaster LA1 4YF, UK
Tel: +44 1524 592350
Fax: +44 1524 592681
E-mail: j.whitehead@lancaster.ac.uk

1. Ordinal endpoints in stroke studies
• Treatments for acute stroke are administered for a few days following diagnosis
• The primary endpoint is the functional status of the patient, 90 days after the stroke
• Several scoring systems exist, including the Barthel index, the modified Rankin score and the NIH stroke scale
• All are ordinal scales from full recovery to vegetative state, to which death before 90 days can be added

Analysis of an ordinal response
$R_1$ = best response (full recovery)
$R_k$ = worst response (death before day 90)

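Let $c_h$ be the number of controls with response $R_h$, $h = 1, \dots, k$
Let $\overline{C}_h = c_1 + \dots + c_h$ = the number of controls with response $R_h$ or better
Let $\underline{C}_h = c_h + \dots + c_k$ = the number of controls with response $R_h$ or worse
Similarly define $\overline{E}_h$ and $\underline{E}_h$ (experimental patients), and $\overline{T}_h$ and $\underline{T}_h$ (all patients)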
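The two cumulative counts can be illustrated with a small sketch. The function name `cumulative_counts` and the example counts are hypothetical and are not taken from the slides.

```python
from itertools import accumulate

def cumulative_counts(counts):
    """Return (R_h or better, R_h or worse) counts for categories ordered best to worst."""
    or_better = list(accumulate(counts))              # c_1 + ... + c_h
    or_worse = list(accumulate(counts[::-1]))[::-1]   # c_h + ... + c_k
    return or_better, or_worse

controls = [10, 20, 30, 40]   # hypothetical counts c_1, ..., c_4
better, worse = cumulative_counts(controls)
print(better)  # [10, 30, 60, 100]
print(worse)   # [100, 90, 70, 40]
```

Because both sums include category h itself, the two counts for any h add up to the arm total plus $c_h$.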

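Let
$Q_{Ch}$ = P(a control has response $R_h$ or better)
$Q_{Eh}$ = P(an experimental has response $R_h$ or better)
(then $Q_{Ck} = Q_{Ek} = 1$)
Put
$$\theta_h = \log\left\{\frac{Q_{Eh}\,(1 - Q_{Ch})}{Q_{Ch}\,(1 - Q_{Eh})}\right\}$$
$\theta_h$ is the log-odds ratio for response $R_h$ or better, E:C, $h = 1, \dots, k-1$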

The proportional odds assumption is
$$\theta_1 = \theta_2 = \dots = \theta_{k-1} = \theta$$
The common value, $\theta$, is a measure of the advantage of the experimental treatment
$\theta > 0$: experimental better
$\theta = 0$: no difference
$\theta < 0$: control better
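As a minimal sketch of what proportional odds means numerically, the code below shifts every cumulative logit of a control distribution by the same amount and confirms that each cumulative log-odds ratio equals that amount. The function names and the control probabilities are illustrative assumptions, not part of the slides.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def experimental_probs(control_probs, theta):
    """Category probabilities on the experimental arm implied by proportional odds."""
    cum, q_e = 0.0, []
    for p in control_probs[:-1]:
        cum += p                                  # Q_Ch
        q_e.append(expit(logit(cum) + theta))     # Q_Eh: same shift for every h
    q_e.append(1.0)                               # Q_Ek = 1
    return [q_e[0]] + [q_e[h] - q_e[h - 1] for h in range(1, len(q_e))]

def cumulative_log_odds_ratios(control_probs, exp_probs):
    """Log-odds ratio for 'R_h or better', E:C, for h = 1, ..., k - 1."""
    out, q_c, q_e = [], 0.0, 0.0
    for p_c, p_e in zip(control_probs[:-1], exp_probs[:-1]):
        q_c += p_c
        q_e += p_e
        out.append(logit(q_e) - logit(q_c))
    return out

control = [0.2, 0.3, 0.3, 0.2]                    # hypothetical, best to worst
experimental = experimental_probs(control, theta=0.5)
thetas = cumulative_log_odds_ratios(control, experimental)
print(thetas)   # each entry is 0.5 up to rounding error
```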

Under PO, the most efficient test of treatment advantage (greatest power for any given sample size) is based on two test statistics, $Z$ and $V$
For large samples and small $\theta$, approximately $Z \sim N(\theta V, V)$
$Z$ is the score statistic and $V$ is Fisher's information

To test for treatment difference, refer $Z^2/V$ to $\chi^2_1$
• This is the Mann-Whitney test
• Also known as the Wilcoxon test
Under the null hypothesis of no treatment effect, PO is true with $\theta = 0$
Thus the hypothesis test and the p-value are valid without further assumptions
Estimates of and confidence intervals for $\theta$ do rely on assumptions, as does adjustment for prognostic factors
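A minimal sketch of how Z and V might be turned into a test and an estimate, assuming only the approximation Z ~ N(θV, V). The estimate Z/V with standard error 1/√V follows from that approximation; the function name `score_test` and the input values are hypothetical.

```python
import math

def score_test(z, v):
    """Chi-squared test (1 df), estimate and 95% CI for theta from Z and V."""
    chi2 = z * z / v
    # Upper tail of chi-squared on 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    theta_hat = z / v                      # since E(Z) is approximately theta * V
    half_width = 1.96 / math.sqrt(v)       # since var(Z / V) is approximately 1 / V
    return chi2, p_value, theta_hat, (theta_hat - half_width, theta_hat + half_width)

chi2, p_value, theta_hat, ci = score_test(z=12.0, v=40.0)   # hypothetical values
print(chi2, p_value, theta_hat, ci)
```

The p-value needs no model beyond the null hypothesis, whereas the estimate and interval are only as good as the proportional odds assumption.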

How should investigators choose which scale to use?
An alternative to choosing is to combine more than one stroke scale in the analysis
Tilley et al. (1996) combined four scales in the trial of rTPA as a treatment in acute stroke conducted by the National Institute of Neurological Disorders and Stroke
- the trial was positive and the approach caught on
If the treatment has a beneficial effect on all scales, then combining them will increase the power to demonstrate the advantage of the treatment

2. Example: The ICTUS trial in stroke
• Currently ongoing in 60 centres in Europe
• Patients who have suffered acute stroke
• Randomised between citicoline and placebo
• Assessed at 90 days on Barthel index, modified Rankin score and NIH stroke scale
• Prognostic factors
- baseline NIHSS
- time from stroke to treatment (≤ or > 12 hours)
- age (≤ or > 70 years)
- site of stroke (right or left side)
- use of rTPA (yes or no)

The approach used by Tilley et al.
Combine the three analyses using GEE (based on an independence covariance structure: IEE)
That is, analyse as if the three scores were independent, but adjust the standard error of the treatment effect estimate using the sandwich estimator
• complicated to understand
• no associated sample size formula
• failed in a test dataset of 1000 patients with binary responses and adjustment for 60 centres

An alternative general approach
The log-odds ratio $\theta$ and the test statistics $Z$ and $V$ for the analysis of the $i$th response will be denoted by $\theta_i$, $Z_i$ and $V_i$
i = 1 is Barthel index
i = 2 is modified Rankin score
i = 3 is NIH stroke scale
We will test $H_0$: $\theta_1 = \theta_2 = \theta_3 = 0$ (no effect of treatment on any of the scales) using $Z = Z_1 + Z_2 + Z_3$

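For each scale, $Z_i \sim N(\theta_i V_i, V_i)$ if $V_i$ is large and $\theta_i$ is small
If $\theta_1 = \theta_2 = \theta_3 = \theta$, then approximately
$$Z \sim N(\theta V,\; V + C)$$
where $V = V_1 + V_2 + V_3$, $C = 2(C_{12} + C_{23} + C_{31})$ and $C_{ij} = \mathrm{cov}(Z_i, Z_j)$

It follows that, if
$$Z^* = \frac{ZV}{V + C} \quad\text{and}\quad V^* = \frac{V^2}{V + C}$$
then
$$Z^* \sim N(\theta V^*,\; V^*)$$
as required for a $\chi^2$ test and for sample size calculation
What we need to use this is an expression for $C_{ij} = \mathrm{cov}(Z_i, Z_j)$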
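The combination step can be sketched as follows, assuming the rescaling Z* = ZV/(V + C) and V* = V²/(V + C), which gives Z* mean θV* and variance V* whenever Z has mean θV and variance V + C. The function name `combine_scores` and the numbers are hypothetical.

```python
def combine_scores(z, v, cov):
    """Combine per-scale score statistics.

    z, v : lists of Z_i and V_i, one entry per scale
    cov  : square matrix with cov[i][j] = cov(Z_i, Z_j) for i != j
           (diagonal entries are ignored)
    Returns (Z*, V*) with Z* approximately N(theta * V*, V*).
    """
    m = len(z)
    z_sum = sum(z)
    v_sum = sum(v)
    # C = 2 * (sum of C_ij over pairs i < j) = sum over all ordered pairs i != j
    c = sum(cov[i][j] for i in range(m) for j in range(m) if i != j)
    z_star = z_sum * v_sum / (v_sum + c)
    v_star = v_sum ** 2 / (v_sum + c)
    return z_star, v_star

# Hypothetical statistics for three scales with equal pairwise covariance 4
z_star, v_star = combine_scores(
    z=[5.0, 6.0, 4.0],
    v=[10.0, 10.0, 10.0],
    cov=[[0.0, 4.0, 4.0], [4.0, 0.0, 4.0], [4.0, 4.0, 0.0]],
)
print(z_star, v_star, z_star ** 2 / v_star)   # last value is referred to chi-squared on 1 df
```

With zero covariances the result is simply the sums of the Z_i and V_i; with perfectly coincident scales it collapses to the statistics of a single scale.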

The binary case, no covariates - only one response

The binary case, no covariates - ith of several responses - assuming that each patient provides all responses

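Covariance between $Z_i$ and $Z_j$
For two such statistics, we have
$$\mathrm{cov}(Z_i, Z_j) = \frac{n_E\, n_C\,\{n\, t_{(ij),1} - t_{i1}\, t_{j1}\}}{n^3}$$
where $n_E$ and $n_C$ are the numbers of experimental and control patients, $n = n_E + n_C$, $t_{i1}$ is the number of patients succeeding on the $i$th scale, $t_{j1}$ the number succeeding on the $j$th scale and $t_{(ij),1}$ the number succeeding on both scales (Pocock, Geller and Tsiatis, 1987)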
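A minimal sketch of the binary case from patient-level data. It assumes the usual score statistics for a log-odds ratio, Z_i = S_Ei − n_E t_i1/n and V_i = n_E n_C t_i1 (n − t_i1)/n³, where S_Ei is the number of experimental successes on scale i, together with cov(Z_i, Z_j) = n_E n_C (n t_(ij),1 − t_i1 t_j1)/n³. These forms and the function name `binary_scores` are assumptions of this sketch, since the slide formulas are not reproduced here.

```python
def binary_scores(treat, y):
    """Score statistics for several binary responses measured on the same patients.

    treat : list with 1 for experimental and 0 for control
    y     : list of rows, y[l][i] = 1 if patient l succeeds on scale i
    Returns (z, v, cov) with cov[i][j] = cov(Z_i, Z_j) for i != j.
    """
    n = len(treat)
    n_e = sum(treat)
    n_c = n - n_e
    m = len(y[0])
    t = [sum(row[i] for row in y) for i in range(m)]             # successes per scale
    s_e = [sum(row[i] for a, row in zip(treat, y) if a == 1) for i in range(m)]
    z = [s_e[i] - n_e * t[i] / n for i in range(m)]
    v = [n_e * n_c * t[i] * (n - t[i]) / n ** 3 for i in range(m)]
    cov = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j:
                t_ij = sum(row[i] * row[j] for row in y)         # successes on both
                cov[i][j] = n_e * n_c * (n * t_ij - t[i] * t[j]) / n ** 3
    return z, v, cov

# Hypothetical toy data: four patients, two scales
z, v, cov = binary_scores([1, 1, 0, 0], [[1, 1], [1, 0], [0, 1], [0, 0]])
print(z, v, cov)
```

When two scales always agree, t_(ij),1 = t_i1 = t_j1 and the covariance equals V_i, as expected for identical statistics.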

The ordinal case, no covariates - ith of several responses
with $\overline{C}_{ih} = c_{i1} + \dots + c_{ih}$ and $\underline{C}_{ih} = c_{ih} + \dots + c_{ik}$

Covariance between $Z_i$ and $Z_j$
For two such statistics, the covariance is expressed in terms of the following quantities:
$\delta_{fv} = -1$, $0$ or $1$ if $f <$, $=$, $> v$ respectively,
$K_{fg} = t_{f,i}\, t_{g,j} / n^2$,
$H_{fg} = t_{(ij),(fg)} / n - K_{fg}$,
where $t_{(ij),(fg)}$ is the count of patients who have both response $R_{f,i}$ on the $i$th scale and response $R_{g,j}$ on the $j$th scale

Adjustment for covariates
The approach can be extended to allow for prognostic factors via stratification and/or linear modelling of covariates
• Stratification: sum Z and V statistics over strata, and assume that the treatment effect is constant over strata
• Covariate adjustment: use proportional odds regression, plus binary logistic regression to model the simultaneous occurrence of particular responses on different scales (such as complete recovery on Barthel index and partial recovery on the modified Rankin)

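3. Sample size calculation for the combined test
For power of 90% to detect a log-odds ratio of $\theta_R$ as significant at level 0.05 (two-sided), we need
$$V = \left(\frac{1.96 + 1.282}{\theta_R}\right)^2$$
for a test based on a single response, and
$$\frac{V^2}{V + C} = \left(\frac{1.96 + 1.282}{\theta_R}\right)^2$$
for a test based on the combined approach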

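For a single binary (success/fail) response, with an overall success probability of p and equal allocation to the two treatments,
$$V \approx \frac{n\,p(1-p)}{4}$$
For three binary responses, each having an overall success probability of p, and with the probability of success on any two responses being g,
$$V \approx \frac{3n\,p(1-p)}{4}, \qquad C \approx \frac{6n\,(g - p^2)}{4}, \qquad \frac{V^2}{V + C} \approx \frac{n}{4}\cdot\frac{3\,p^2(1-p)^2}{p(1-p) + 2(g - p^2)}$$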

Suppose that $g = p^2$ (independence), then
$$\frac{V^2}{V + C} \approx \frac{3n\,p(1-p)}{4}$$
- that is, one third of the sample size using only one response
For $g = p$ (responses coincide), then
$$\frac{V^2}{V + C} \approx \frac{n\,p(1-p)}{4}$$
- that is, the same as the sample size using only one response
Otherwise, combining the responses reduces the sample size to as little as one third of that for a single response, depending on the correlation between the responses

Now suppose that p = 0.2 and that g = 0.1 (correlation = 0.375), then
$$V \approx 0.04\,n$$
for one response and
$$\frac{V^2}{V + C} \approx 0.0686\,n$$
for three responses: 58% of the sample size using a single response

If the success rate on control is 18%, and the trial is to be powered to detect an improvement to 22%, then the log-odds ratio is
$$\theta_R = \log\left\{\frac{0.22 \times 0.82}{0.18 \times 0.78}\right\} \approx 0.251$$
so that, for one response n = 4200 and for three responses n = 2450
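The worked example can be checked with a short sketch. It assumes 1:1 allocation, so that V ≈ np(1−p)/4 per binary response and cov(Z_i, Z_j) ≈ n(g − p²)/4, and it takes the required information to be {(1.96 + 1.282)/θ_R}². The function names are hypothetical.

```python
import math

def required_information(theta_r, z_alpha=1.96, z_beta=1.282):
    """Information needed for 90% power at two-sided level 0.05."""
    return ((z_alpha + z_beta) / theta_r) ** 2

def n_single(p, theta_r):
    """Total sample size for one binary response: V = n p (1 - p) / 4."""
    return 4.0 * required_information(theta_r) / (p * (1.0 - p))

def n_three_combined(p, g, theta_r):
    """Total sample size for three binary responses with pairwise joint success probability g.

    Solves V^2 / (V + C) = required information, with
    V = 3 n p (1 - p) / 4 and C = 6 n (g - p^2) / 4.
    """
    per_patient = 0.25 * 3.0 * (p * (1.0 - p)) ** 2 / (p * (1.0 - p) + 2.0 * (g - p * p))
    return required_information(theta_r) / per_patient

theta_r = math.log((0.22 * 0.82) / (0.18 * 0.78))   # 18% on control versus 22%
n1 = n_single(0.2, theta_r)
n3 = n_three_combined(0.2, 0.1, theta_r)
print(round(theta_r, 3), round(n1), round(n3), round(n3 / n1, 2))
```

Under these assumptions the results come out at roughly 4,180 and 2,440 patients, in line with the rounded figures of 4200 and 2450, and the ratio is 58%.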

ICTUS trial
Fixed sample size using only
- Barthel index: 2590
- modified Rankin score: 3584
- NIH stroke scale: 5494
Combined test: 2421
This is for dichotomised responses, based on the previous data available
ICTUS is using a sequential design

Ordinal scales
For sample size calculation when combining several ordinal responses, the joint probabilities of every pair of response categories on every pair of scales must be anticipated
- Databases from previous trials can be used
- A mid-trial sample size review can be used

Evaluation of the combined approach
• The first of a series of interim analyses of the ICTUS trial takes place when data from 1000 patients are available
• A dataset from four previous studies comparing citicoline with placebo is available (Dávalos et al., 2002), comprising 1,372 patients
• First, one dataset of 1,000 was extracted and analysed using the combined test and the GEE approach
• Then 10,000 datasets of size 200, 500 or 1,000 were randomly selected, the treatment code was removed and randomly reassigned
- in some runs an artificial treatment effect of known magnitude was introduced
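The re-randomisation idea can be sketched on synthetic data as below. This is not the evaluation reported in the talk: the data are simulated rather than drawn from the pooled citicoline dataset, the responses are binary, and the statistics use the assumed forms Z_i = S_Ei − n_E t_i1/n, V_i = n_E n_C t_i1 (n − t_i1)/n³ and cov(Z_i, Z_j) = n_E n_C (n t_(ij),1 − t_i1 t_j1)/n³. All names are hypothetical.

```python
import random

def combined_chi2(treat, y):
    """Combined test statistic Z^2 / (V + C), which equals Z*^2 / V*."""
    n = len(treat)
    n_e = sum(treat)
    n_c = n - n_e
    m = len(y[0])
    t = [sum(row[i] for row in y) for i in range(m)]
    z = sum(sum(row[i] for a, row in zip(treat, y) if a) - n_e * t[i] / n
            for i in range(m))
    v = sum(n_e * n_c * t[i] * (n - t[i]) / n ** 3 for i in range(m))
    c = 0.0
    for i in range(m):
        for j in range(m):
            if i != j:
                t_ij = sum(row[i] * row[j] for row in y)
                c += n_e * n_c * (n * t_ij - t[i] * t[j]) / n ** 3
    return z * z / (v + c)

def rerandomisation_rejection_rate(y, reps, rng, critical=3.841):
    """Reassign treatment codes at random and count rejections at the 5% level."""
    n = len(y)
    treat = [1] * (n // 2) + [0] * (n - n // 2)
    rejections = 0
    for _ in range(reps):
        rng.shuffle(treat)                 # treatment code carries no information
        if combined_chi2(treat, y) > critical:
            rejections += 1
    return rejections / reps

rng = random.Random(2011)
# Synthetic correlated binary responses on three scales (shared patient-level factor)
y = []
for _ in range(200):
    frailty = rng.random()
    y.append([1 if rng.random() < 0.1 + 0.4 * frailty else 0 for _ in range(3)])

rate = rerandomisation_rejection_rate(y, reps=1000, rng=rng)
print(rate)   # should lie near the nominal 0.05 if the test is well calibrated
```

Adding a known shift to the success probabilities of patients labelled experimental would turn the same loop into a power check.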

Analyses of a synthetic stroke dataset, n = 1000

Results from 10,000-fold simulations of the combined score test and the GEE approach

Conclusions
• Use of the combined approach can reduce sample size, provided that the treatment effect is apparent on all responses being combined
• The score approach used here matches the GEE approach, and is more reliable in small samples
• The approach can combine quantitative responses and survival responses; it can also be used to combine different types of response

References
Bolland, K., Whitehead, J., Cobo, E. and Secades, J. J. (2009). Evaluation of a sequential global test of improved recovery following stroke as applied to the ICTUS trial of citicoline. Pharmaceutical Statistics 8, 136-149.
Dávalos, A., Castillo, J., Álvarez-Sabin, J., Secades, J. J., Mercadal, J., López, S., Cobo, E., Warach, S., Sherman, D., Clark, W. M. and Lozano, R. (2002). Oral citicoline in acute ischemic stroke. Stroke 33, 2850-2857.
Dávalos, A. (2007). Protocol 06PRT/3005: ICTUS study: International Citicoline Trial on acUte Stroke (NCT00331890) Oral citicoline in acute ischemic stroke. Lancet Protocol Reviews.
Pocock, S. J., Geller, N. L. and Tsiatis, A. A. (1987). The analysis of multiple endpoints in clinical trials. Biometrics 43, 487-498.
Tilley, B. C., Marler, J., Geller, N. L., Lu, M., Legler, J., Brott, T., Lyden, P. and Grotta, J. for the National Institute of Neurological Disorders and Stroke (NINDS) rt-PA Stroke Trial Study Group. (1996). Use of a global test for multiple outcomes in stroke trials with application to the National Institute of Neurological Disorders and Stroke t-PA Stroke Trial. Stroke 27, 2136-2142.
Whitehead, J., Branson, M. and Todd, S. (2010). A combined score test for binary and ordinal endpoints from clinical trials. Statistics in Medicine 29, 521-532.
