Generalized pairwise comparisons of prioritized outcomes in the two-sample problem

Marc Buyse, ScD
IDDI, Louvain-la-Neuve, and I-BioStat, Hasselt University, Belgium
marc.buyse@iddi.com

Outline
• Key problems in clinical development
• An example in cancer
• A bit of theory
• Back to the example
• Another example in ophthalmology
• Conclusions

KEY PROBLEMS IN CLINICAL DEVELOPMENT

Development costs are too high…

Development times are too long… Ref: Steven Hirschfeld, FDA (personal communication)

Too few new drugs are approved… Ref: Arthur D. Little’s views on key Pharma trends, March 31, 2010

AN EXAMPLE IN CANCER

Advanced colorectal cancer
420 subjects with previously untreated metastatic colorectal cancer, randomized (R) to:
• LV5FU2 + oxaliplatin (210 subjects): new combination of 5-fluorouracil, leucovorin and oxaliplatin
• LV5FU2 (210 subjects): standard regimen of 5-fluorouracil and leucovorin
Treatment was given until disease progression, intolerance to treatment, or death.

Progression-free survival HR = 0.66, P = 0.0003

Survival HR = 0.83, P = 0.12

Oxaliplatin approved for metastatic colorectal cancer
• In France (AFSSAPS) in 1996
• In Europe (EMEA) in 1999
• In the US (FDA) in 2002

Problems?
• The two endpoints (OS and PFS) are analyzed separately. One endpoint (PFS) suggests a statistically significant benefit, the other (OS) does not. On balance, do we claim the treatment to be better?
• Neither endpoint is perfect:
  • PFS is not confounded by other treatments, is less affected by unrelated causes of death, and has more events
  • OS is clinically most relevant and is measured without bias or error
• PFS ignores the time between progression and death. The time to first event ignores subsequent events. Thus, LV5FU2 + oxaliplatin might prolong the PFS of some patients, but shorten their remaining survival afterwards.
• Traditional methods of analysis cannot differentiate between a modest benefit in all patients and a large benefit in some patients.

A BIT OF THEORY

General Setup
Eligible subjects are randomized (R) to Treatment (T) or Control (C).
Let $Y_j$ be the outcome of the $j$-th subject in C ($j = 1, \dots, m$).
Let $X_i$ be the outcome of the $i$-th subject in T ($i = 1, \dots, n$).

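Recall the Wilcoxon test
$X_i$ and $Y_j$ are realizations of a continuous or an ordered discrete variable. Let $S_1, S_2, \dots, S_n$ be the ordered ranks, in the pooled sample, of the outcomes observed in T. Wilcoxon (1945) proposed the test statistic
$$W = \sum_{i=1}^{n} S_i$$
with expectation and variance, under the null hypothesis and in the absence of ties,
$$E(W) = \frac{n(n+m+1)}{2}, \qquad \operatorname{Var}(W) = \frac{nm(n+m+1)}{12}.$$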

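The Mann-Whitney form of the Wilcoxon test
The Wilcoxon test statistic can be derived from consideration of all possible pairs of subjects, one from each treatment group. Let
$$U_{ij} = \begin{cases} +1 & \text{if } X_i > Y_j \\ -1 & \text{if } X_i < Y_j \\ 0 & \text{if } X_i = Y_j \end{cases}$$
The Wilcoxon-Mann-Whitney test statistic $W$ can be written as
$$W = \frac{n(n+m+1)}{2} + \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{m} U_{ij}.$$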
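The equivalence between the rank form and the pairwise form can be checked numerically. The sketch below is only an illustration: the data are hypothetical, the function names are not from the slides, and it assumes the usual pairwise indicator (+1, -1, 0 for a tie) together with the identity W = n(n+m+1)/2 + (1/2) ΣΣ U_ij, with midranks used for tied values.

```python
from itertools import product

def pairwise_score(x, y):
    """Pairwise indicator: +1 if x > y, -1 if x < y, 0 if tied."""
    return (x > y) - (x < y)

def rank_sum(treatment, control):
    """Sum of the pooled-sample ranks of the treatment group (midranks for ties)."""
    pooled = treatment + control

    def midrank(value):
        below = sum(1 for p in pooled if p < value)
        equal = sum(1 for p in pooled if p == value)
        return below + (equal + 1) / 2

    return sum(midrank(x) for x in treatment)

def rank_sum_from_pairs(treatment, control):
    """Same statistic, obtained from all n * m treatment-control pairs."""
    n, m = len(treatment), len(control)
    total = sum(pairwise_score(x, y) for x, y in product(treatment, control))
    return n * (n + m + 1) / 2 + total / 2

# Hypothetical outcomes (larger is better), including one tie at 6.8.
treatment = [5.1, 7.3, 6.8, 9.0]
control = [4.2, 6.8, 3.9]

w_ranks = rank_sum(treatment, control)
w_pairs = rank_sum_from_pairs(treatment, control)
print(w_ranks, w_pairs)  # both forms give 20.5
```

The pairwise form is the one that generalizes: once the statistic is expressed through pair-level scores, the scoring rule can be changed without touching the rest of the computation.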

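Gehan generalized the Wilcoxon test
Gehan (1965) generalized the Wilcoxon test to the case of censored outcomes. Letting $X_i^*$ and $Y_j^*$ denote censored observations, the pairwise comparison indicator is now
$$U_{ij} = \begin{cases} +1 & \text{if } X_i > Y_j \text{ or } X_i^* \ge Y_j \\ -1 & \text{if } X_i < Y_j \text{ or } X_i \le Y_j^* \\ 0 & \text{otherwise} \end{cases}$$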
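As an illustration of pairwise scoring with censored outcomes, here is a minimal sketch of the standard Gehan rule: a pair is scored only when the ordering of the two true event times is known for certain. The function name and the `(time, censored)` argument layout are assumptions made for this example.

```python
def gehan_score(x_time, x_censored, y_time, y_censored):
    """Score a (treatment, control) pair of possibly censored times.

    Returns +1 if the treated subject is known to have the longer time,
    -1 if known to have the shorter time, and 0 if the ordering is unknown.
    """
    if not x_censored and not y_censored:
        # Both events observed: plain comparison.
        return (x_time > y_time) - (x_time < y_time)
    if x_censored and not y_censored:
        # Treated subject still event-free at x_time, control event at y_time.
        return 1 if x_time >= y_time else 0
    if y_censored and not x_censored:
        # Control subject still event-free at y_time, treated event at x_time.
        return -1 if x_time <= y_time else 0
    return 0  # both censored: ordering cannot be determined

# Hypothetical pairs (times in months).
print(gehan_score(10, False, 6, False))  # 1: both observed, treated longer
print(gehan_score(5, True, 5, False))    # 1: treated censored at or after control event
print(gehan_score(4, True, 6, False))    # 0: treated censored too early to tell
print(gehan_score(3, False, 8, True))    # -1: treated event before control censoring
```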

First, generalize the test further for a single outcome measure
Now let $X_i$ and $Y_j$ be observed outcomes for any outcome measure (continuous, time to event, binary, categorical, …).
All we require is that the pairwise comparison of observed outcomes $X_i$ and $Y_j$ be able to classify the pair as favoring T, favoring C, or neither (if outcomes $X_i$ and $Y_j$ are tied or if either outcome is missing).
Pairwise comparison of $X_i$ with $Y_j$:
• favors T (favorable)
• favors C (unfavorable)
• neutral
• uninformative

Continuous outcome measure

Time to event outcome measure

Binary outcome measure

Generalized pairwise comparisons
Let $X_i$ and $Y_j$ be vectors of observed outcomes for any number of occasions of a single outcome measure, or any number of outcome measures that can be prioritized.
All we require is that the pairwise comparison of prioritized outcomes $X_i$ and $Y_j$ be able to classify the pair as favorable, unfavorable, or neither.

Next, generalize the test to prioritized repeated observations of a single outcome measure…

Last, generalize the test to several prioritized outcome measures…
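One way to picture the comparison of prioritized outcomes is the sketch below. It assumes that a lower-priority outcome is consulted only when every higher-priority outcome leaves the pair undecided (tied or missing). The helper names, the use of `None` for a missing value, and the two-outcome example are hypothetical.

```python
def larger_is_better(x, y):
    """Compare one outcome; None marks a missing value (pair left undecided)."""
    if x is None or y is None:
        return 0
    return (x > y) - (x < y)

def compare_prioritized(x, y, comparators):
    """Classify one pair of outcome vectors listed in priority order.

    Returns +1 (favorable), -1 (unfavorable) or 0 (neither). The first
    outcome that decides the pair wins; later outcomes are then ignored.
    """
    for x_k, y_k, compare in zip(x, y, comparators):
        result = compare(x_k, y_k)
        if result != 0:
            return result
    return 0

# Hypothetical vectors: (first-priority outcome, second-priority outcome).
comparators = [larger_is_better, larger_is_better]
first = compare_prioritized((14.0, 6.0), (11.0, 9.0), comparators)
second = compare_prioritized((None, 6.0), (11.0, 9.0), comparators)
third = compare_prioritized((11.0, 9.0), (11.0, 9.0), comparators)
print(first, second, third)  # 1 -1 0
```

In the first pair the top-priority outcome decides in favor of treatment even though the second outcome points the other way. In the second pair the top-priority outcome is missing, so the second outcome decides.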

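A general measure of treatment effect
Extend the previous definition of $U_{ij}$:
$$U_{ij} = \begin{cases} +1 & \text{if the pair } (X_i, Y_j) \text{ is favorable} \\ -1 & \text{if the pair } (X_i, Y_j) \text{ is unfavorable} \\ 0 & \text{otherwise} \end{cases}$$
and let
$$U = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} U_{ij}.$$
$U$ is the difference between the proportion of favorable pairs and the proportion of unfavorable pairs. We call this general measure of treatment effect the “proportion in favor of treatment” ($\Delta$).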
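A minimal sketch of this measure, for any pair-scoring rule that returns a positive value for a favorable pair, a negative value for an unfavorable pair, and 0 otherwise. The function name and the toy data are illustrative assumptions.

```python
from itertools import product

def proportion_in_favor(treatment, control, score):
    """Proportion of favorable pairs minus proportion of unfavorable pairs."""
    scores = [score(x, y) for x, y in product(treatment, control)]
    favorable = sum(1 for s in scores if s > 0)
    unfavorable = sum(1 for s in scores if s < 0)
    return (favorable - unfavorable) / len(scores)

def simple_score(x, y):
    """Larger outcome wins; ties are neutral."""
    return (x > y) - (x < y)

# Hypothetical outcomes: 3 x 3 = 9 pairs, 5 favorable, 3 unfavorable, 1 tied.
treatment = [3, 5, 8]
control = [2, 5, 6]
delta = proportion_in_favor(treatment, control, simple_score)
print(delta)  # (5 - 3) / 9
```

Neutral and uninformative pairs stay in the denominator, so the result always lies between -1 and +1.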

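The proportion in favor of treatment ($\Delta$) is a linear transformation of the probabilistic index, $P(X > Y)$. For a continuous outcome without ties or uninformative pairs:
$$\Delta = P(X > Y) - P(X < Y) = 2\,P(X > Y) - 1.$$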

The proportion in favor of treatment ($\Delta$)
• For a binary variable, $\Delta$ is equal to the difference in proportions
• For a continuous variable, $\Delta$ is related to the effect size $d$
• For a time-to-event variable, $\Delta$ is related to the hazard ratio and the proportion of informative pairs $f$
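These relationships can be made concrete under simplifying assumptions that are not stated on the slide. The binary case is exact and is checked by enumerating pairs. The continuous case assumes normal outcomes with equal variances, which gives 2Φ(d/√2) − 1. The time-to-event case assumes proportional hazards with no censoring, so every pair is informative and the role of f is ignored. All function names are illustrative.

```python
import math
from itertools import product

def delta_binary(p_treatment, p_control):
    """Binary outcome: difference in success proportions."""
    return p_treatment - p_control

def delta_normal(d):
    """Normal outcomes, equal variances: 2 * Phi(d / sqrt(2)) - 1 = erf(d / 2)."""
    return math.erf(d / 2)

def delta_hazard_ratio(hazard_ratio):
    """Proportional hazards, no censoring; hazard_ratio = treatment / control."""
    return (1 - hazard_ratio) / (1 + hazard_ratio)

# Check the binary case by brute force on hypothetical data:
# 6 successes out of 10 on treatment, 4 out of 10 on control.
treatment = [1] * 6 + [0] * 4
control = [1] * 4 + [0] * 6
total = sum((x > y) - (x < y) for x, y in product(treatment, control))
delta_enumerated = total / (len(treatment) * len(control))

print(delta_enumerated, delta_binary(0.6, 0.4))
print(delta_normal(0.5))
print(delta_hazard_ratio(0.5))
```

With censoring, only a fraction f of the pairs is informative, so the last formula should be read as an upper bound on the size of the effect rather than as the value expected in a real trial.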

A re-randomization test for $\Delta$
The test statistic $U$ (or $\Delta$) no longer has known expectation and variance. An empirical distribution of $\Delta$ can be obtained through re-randomization. Tests of significance and confidence intervals follow suit.
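A minimal sketch of such a re-randomization test, assuming a single outcome where larger is better, a two-sided test, and random reshuffling of treatment labels rather than full enumeration. The function names, the number of shuffles, the seed, and the data are all hypothetical. The sketch covers the significance test only, not the confidence interval.

```python
import random
from itertools import product

def net_benefit(treatment, control):
    """Proportion of favorable minus proportion of unfavorable pairs."""
    total = sum((x > y) - (x < y) for x, y in product(treatment, control))
    return total / (len(treatment) * len(control))

def rerandomization_test(treatment, control, n_shuffles=2000, seed=0):
    """Two-sided p-value for the observed statistic under label reshuffling."""
    rng = random.Random(seed)
    observed = net_benefit(treatment, control)
    pooled = list(treatment) + list(control)
    n = len(treatment)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign subjects to arms at random
        shuffled = net_benefit(pooled[:n], pooled[n:])
        if abs(shuffled) >= abs(observed) - 1e-12:
            extreme += 1
    # Count the observed allocation itself so the p-value is never zero.
    p_value = (extreme + 1) / (n_shuffles + 1)
    return observed, p_value

# Hypothetical data in which every treated subject does better.
treatment = [12, 15, 17, 19, 21, 24, 26, 30]
control = [3, 4, 6, 7, 8, 9, 10, 11]
observed, p_value = rerandomization_test(treatment, control)
print(observed, p_value)
```

Because the reference distribution is built from the data, the same procedure applies unchanged when `net_benefit` is replaced by a statistic based on censored or prioritized comparisons.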

BACK TO THE EXAMPLE

Prioritized outcomes for patients with metastatic colorectal cancer

Prioritized outcomes for patients with early HER2/neu-overexpressing breast cancer

Progression-free survival GENERALIZED PAIRWISE COMPARISONS (44,100 pairs)

Overall survival GENERALIZED PAIRWISE COMPARISONS (44,100 pairs)

Magnitude of benefits

Prioritized outcomes GENERALIZED PAIRWISE COMPARISONS (44,100 pairs)

ANOTHER EXAMPLE

Age-related Macular Degeneration
592 subjects with neovascular age-related macular degeneration, randomized (R) to:
• Pegaptanib (296 subjects): intravitreous injections of 3 mg of pegaptanib (an anti–vascular endothelial growth factor)
• Sham (296 subjects): sham injections (with a syringe applied on the surface of the eye to simulate the pressure of an injection)
Injections were given every 6 weeks over a period of 54 weeks.

Endpoints
Measurement of visual acuity (number of letters of a standardized chart read correctly) every 6 weeks

Mean visual acuity over time

Endpoints
“Clinically relevant loss”: 15 letters (3 lines)
Primary endpoint: loss of < 15 letters of visual acuity at one year (prevention of major vision loss)

The whole data, and nothing but the data: measurements of visual acuity

Measurements of visual acuity with last observation carried forward

Measurements of visual acuity at week 0 and week 54

Measurements of visual acuity changes from week 0 to week 54

Loss < 15 letters in visual acuity between weeks 0 and 54
