
On Some Statistical Considerations in Testing for Multiple Endpoints in Clinical Trials

On Some Statistical Considerations in Testing for Multiple Endpoints in Clinical Trials. Mohammad Huque, Ph.D. Division of Biometrics III/Office of Biostatistics/OPaSS/CDER/FDA ASA Biopharm Section FDA/Industry Workshop, September 21-23, 2004, Washington, D.C. Disclaimer.


Presentation Transcript


  1. On Some Statistical Considerations in Testing for Multiple Endpoints in Clinical Trials Mohammad Huque, Ph.D. Division of Biometrics III/Office of Biostatistics/OPaSS/CDER/FDA ASA Biopharm Section FDA/Industry Workshop, September 21-23, 2004, Washington, D.C.

  2. Disclaimer • The views in this presentation do not necessarily reflect those of the Food and Drug Administration

  3. Outline • Concepts: nature of the relationships between endpoints • Issue #1: Multiple primary endpoints are often highly correlated. How to take advantage of this in adjusting for multiplicity? • Issue #2: Sequential analysis of endpoints is becoming increasingly popular. How to reconcile some of the difficulties it poses? • Issue #3: The problem of statistical testing when more than one primary endpoint must show statistical significance for effectiveness results to be clinically persuasive (to be presented at the PhRMA Meeting, October 2004, Washington, D.C.)

  4. Triaging of multiple endpoints into meaningful families by trial objectives • Hierarchically ordered families, 1) prospectively defined and 2) FWTE rate controlled: primary endpoints; secondary endpoints; exploratory endpoints (often not prospectively defined) • Primary endpoints are the primary focus of the trial. Their results determine the main benefits of the clinical trial’s intervention. • Secondary endpoints by themselves are generally not sufficient for characterizing treatment benefit. They are generally tested for statistical significance, for extended indication and labeling, after the primary objectives of the trial are met.

  5. Nature of relationships between endpoints • Statistical independence and dependence concepts (familiar to statisticians) • Causal dependence between endpoints (related to treatment effect): if endpoint X has an effect, then endpoint Y will also have an effect, and vice versa. Examples: diabetes trials (HbA1c and fasting glucose levels); CHF trials (CHF-related deaths and all-cause mortality); ITT versus PP endpoints • Correlation between endpoints does not necessarily imply this causal dependence (a surrogate endpoint and a clinical endpoint may be correlated without this property).

  6. Extent of multiplicity adjustments between endpoints [Figure: grid with correlation (low to high) on one axis and causal dependence, i.e., homogeneity of treatment effects across endpoints (low to high), on the other; regions labeled "Practically no adjustments", "Small adjustments", "Good case for combining endpoints", and "Large adjustments".]

  7. Issue #1: • Multiple primary endpoints are often highly correlated. How to take advantage of this in adjusting for multiplicity?

  8. Adjusting for multiplicity for moderately to highly correlated endpoints • For K = 2, 3: fairly easy to handle. Examples: • Sidak-type adjustments (K = 2, 3) • Hochberg’s method (K = 2) with correction for correlation • Closed testing using the Simes test (K = 2, 3) with correction for correlation • For K > 3: ad hoc procedures • Tukey-Ciminera-Heyse method (1985) • Modifications of Dubey’s method (1985) [Armitage-Parmar, 1985-86] • Other methods: bootstrap methods (Westfall, 1992); O’Brien’s OLS/GLS tests (1984)

  9. 2 Endpoint Case: Sidak-type adjustments
     Assumption: test statistics Z1 and Z2 follow a bivariate normal distribution. Overall α = 0.025, 1-sided tests.

     Corr.   (1) Adj α   2*(Adj α)   Adj α2 (with α1 = 0.02)
     0.0     0.01258     0.02516     0.00510
     0.3     0.01292     0.02584     0.00559
     0.5     0.01348     0.02696     0.00649
     0.7     0.01463     0.02926     0.00857
     0.8     0.01568     0.03136     0.01068
     0.9     0.01751     0.03502     0.01464

     (1) Equal adjustments for both endpoints
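
The equal-adjustment column of a table like this can be approximated numerically. The sketch below is only an illustration under stated assumptions: it takes (Z1, Z2) to be standard bivariate normal with correlation r, defines the adjusted level a as the per-endpoint one-sided level at which P(Z1 > c or Z2 > c) equals the overall α, and uses Simpson's rule plus bisection. The function names are hypothetical, and this is not necessarily how the presenter computed the values.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def joint_cdf(c: float, r: float, steps: int = 2000) -> float:
    """Approximate P(Z1 <= c, Z2 <= c) for a standard bivariate normal
    pair with correlation r (|r| < 1), by one-dimensional integration."""
    lower = -8.0  # probability mass below -8 is negligible
    h = (c - lower) / steps
    s = sqrt(1.0 - r * r)

    def integrand(z: float) -> float:
        # density of Z1 at z, times P(Z2 <= c | Z1 = z)
        return STD_NORMAL.pdf(z) * STD_NORMAL.cdf((c - r * z) / s)

    # Composite Simpson's rule; `steps` must be even.
    total = integrand(lower) + integrand(c)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lower + i * h)
    return total * h / 3.0


def equal_adjusted_alpha(r: float, overall_alpha: float = 0.025) -> float:
    """Per-endpoint one-sided level a such that rejecting when either
    Z exceeds its critical value has overall type I error overall_alpha."""
    lo, hi = overall_alpha / 2.0, overall_alpha  # Bonferroni level .. unadjusted level
    for _ in range(40):  # bisection on the per-endpoint level
        mid = (lo + hi) / 2.0
        c = STD_NORMAL.inv_cdf(1.0 - mid)
        familywise_error = 1.0 - joint_cdf(c, r)
        if familywise_error > overall_alpha:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0
```

Calling `equal_adjusted_alpha(0.0)` reproduces the independence case, 1 − sqrt(1 − 0.025), and the result grows toward the unadjusted 0.025 as r approaches 1.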

  10. 2 Endpoint Case: Adjustment in the Hochberg method
      Test statistics Z1 and Z2 follow a bivariate normal distribution. Overall α = 0.05, 2-sided tests.

      r      Type I error rate   Adjustment   Type I error rate   Test the smaller
             (unadjusted)        factor C     (adjusted)          p-value at level
      0.0    0.05                1            0.05                0.0250
      0.3    0.04934             1.014447     0.05                0.0254
      0.5    0.04802             1.047418     0.05                0.0262
      0.7    0.04560             1.122461     0.05                0.0281
      0.8    0.04382             1.197015     0.05                0.0299
      0.9    0.04168             1.335077     0.05                0.0334
      0.95   0.04096             1.470331     0.05                0.0368

      If max(p1, p2) < 0.05, then both endpoints are significant.
      If max(p1, p2) ≥ 0.05, then test the smaller p-value at level (C/2)(0.05).
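
The two-step decision rule on this slide can be written out as a short function. This is a minimal sketch, assuming the correction factor C is looked up from the table for the relevant correlation; the function name and argument names are hypothetical.

```python
def hochberg_two_endpoints(p1: float, p2: float,
                           alpha: float = 0.05, c: float = 1.0):
    """Return (reject_endpoint_1, reject_endpoint_2) for the two-endpoint
    Hochberg rule with a correlation correction factor c (c = 1 gives the
    usual Hochberg procedure)."""
    # Step 1: if the larger p-value is below alpha, both endpoints are significant.
    if max(p1, p2) < alpha:
        return True, True
    # Step 2: otherwise only the smaller p-value is tested, at level (c / 2) * alpha.
    threshold = (c / 2.0) * alpha
    if p1 <= p2:
        return p1 < threshold, False
    return False, p2 < threshold
```

For example, with p-values 0.028 and 0.20 the uncorrected rule (C = 1) tests 0.028 against 0.025 and rejects nothing, while C = 1.197015 (the table entry for r = 0.8) raises the threshold to about 0.0299 and the first endpoint is declared significant.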

  11. 3 Endpoint Case: Sidak-type adjustments
      Test statistics Z1, Z2 and Z3 follow a trivariate normal distribution. Overall α = 0.025, 1-sided tests.

      r12   r13   r23   (1) Adj α   2*(Adj α)   (2) Adj α2 (with α1 = 0.02)
      0     0     0     0.00840     0.01680     0.00255
      .3    .3    .3    0.00877     0.01754     0.00287
      .5    .3    .3    0.00898     0.01796     -
      .5    .5    .3    0.00920     0.01840     0.00343
      .5    .5    .5    0.00941     0.01882     0.00350
      .8    .3    .3    0.00984     0.01968     0.00416
      .8    .5    .5    0.01029     0.02058     0.00467
      .8    .8    .3    0.01120     0.02240     0.00647
      .8    .8    .5    0.01127     0.02254     0.00648
      .8    .8    .8    0.01209     0.02418     0.00689

      (1) Equal adjustments for all 3 endpoints
      (2) α1 = 0.02 for the 1st endpoint and adjusted α2 = adjusted α3

  12. 3 Endpoint Case: closed testing using the Simes test
      Step 1: Simes test at level 0.05 using all endpoints Y1, Y2 and Y3, with correction factor C (with C = 1, the test is conservative for high endpoint correlation).
      Step 2 (if Step 1 rejects): Simes test with C for each pair: (Y1, Y2), (Y1, Y3), and (Y2, Y3).
      Step 3 (if Step 2 rejects): test the individual endpoints. Example shown: Endpoint Y1, P < 0.05; Endpoint Y2, P > 0.05; Endpoint Y3, P > 0.05.

  13. Correction factor C for the Simes test, K = 3
      Test statistics Z1, Z2 and Z3 follow a trivariate normal distribution. α = 0.05, 2-sided tests.

      r      Type I error rate   Adjustment   Type I error rate
             (unadjusted)        factor C     (adjusted)
      0.0    0.05                1            0.05
      0.3    0.0489              1.02200      0.05
      0.5    0.0468              1.07202      0.05
      0.7    0.0430              1.17916      0.05
      0.8    0.0403              1.27227      0.05
      0.9    0.0374              1.40980      0.05

      With P(1) ≤ P(2) ≤ P(3) denoting the ordered p-values, effectiveness in at least one endpoint is concluded if P(3) < 0.05, or {P(3) ≥ 0.05, P(2) < 0.05*2/3*C}, or {P(3) ≥ 0.05, P(2) ≥ 0.05*2/3*C, P(1) < 0.05*1/3*C}.
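
The three-way rule for the global test can be sketched in a few lines. The code below assumes the correction factor C is applied only to the two smaller ordered p-values, as the rule on this slide is written, and that C is read from the table for the relevant correlation; the function name is hypothetical.

```python
def simes_k3_with_correction(p_values, c: float = 1.0, alpha: float = 0.05) -> bool:
    """Global Simes-type test for K = 3 endpoints with correction factor c.
    Returns True when effectiveness in at least one endpoint is concluded."""
    if len(p_values) != 3:
        raise ValueError("this sketch handles exactly three endpoints")
    p_1, p_2, p_3 = sorted(p_values)  # ordered p-values, smallest first
    if p_3 < alpha:                   # largest p-value below alpha
        return True
    if p_2 < alpha * (2.0 / 3.0) * c: # middle p-value below (2/3) * alpha * c
        return True
    return p_1 < alpha * (1.0 / 3.0) * c  # smallest below (1/3) * alpha * c
```

With p-values (0.018, 0.5, 0.6), the uncorrected test (C = 1) compares 0.018 with 0.0167 and does not reject, whereas C = 1.17916 (the table entry for r = 0.7) moves the last threshold to about 0.0197 and the global hypothesis is rejected.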

  14. Case of Dependent Event Rate Endpoints
      The dependence parameter can be estimated from the 2×2 table of joint event probabilities for X = mortality endpoint and Y = hospitalization endpoint:

                 y = 1   y = 0   Total
      x = 1      p11     p10     p
      x = 0      p01     p00     q
      Total      p’      q’

      Dependence parameter = p11 / (p q p’ q’)
      • Approximate test statistics for the proportions are bivariate normal in the limit with the above dependence parameter
      • Previous methods for the continuous endpoints apply

  15. TCH (Tukey-Ciminera-Heyse, 1985) and Dubey (1985) tests (K > 3) • TCH method (highly correlated endpoints, 1985): adjusted alpha $= 1 - (1 - \alpha)^{1/\sqrt{K}}$ • Dubey (1985) [Armitage-Parmar (1985-86)]: adjusted alpha $= 1 - (1 - \alpha)^{1/m_i}$, where $m_i = K^{1 - r_{\cdot i}}$ ($i = 1, \ldots, K$) and $r_{\cdot i}$ is the average of the $K - 1$ correlation coefficients between the $i$th endpoint and the other $K - 1$ endpoints • Recent modifications of the Dubey method for proper protection of the type I error rate
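
Both adjustments are simple enough to compute directly. The sketch below assumes the exponent forms 1 − (1 − α)^(1/√K) for TCH and 1 − (1 − α)^(1/m_i) with m_i = K^(1 − r.i) for Dubey and Armitage-Parmar, and takes the correlation matrix as a plain list of lists; the function names are hypothetical.

```python
from math import sqrt


def tch_adjusted_alpha(alpha: float, k: int) -> float:
    """Tukey-Ciminera-Heyse adjustment: 1 - (1 - alpha) ** (1 / sqrt(K))."""
    return 1.0 - (1.0 - alpha) ** (1.0 / sqrt(k))


def dubey_adjusted_alphas(alpha: float, corr) -> list:
    """Dubey / Armitage-Parmar adjustment, one level per endpoint.
    corr is a K x K correlation matrix given as a list of lists."""
    k = len(corr)
    levels = []
    for i in range(k):
        # average correlation of endpoint i with the other K - 1 endpoints
        mean_r = sum(corr[i][j] for j in range(k) if j != i) / (k - 1)
        m_i = k ** (1.0 - mean_r)  # effective number of independent tests
        levels.append(1.0 - (1.0 - alpha) ** (1.0 / m_i))
    return levels
```

Two limiting cases make the behavior easy to check: with all correlations 0 the Dubey level reduces to the Sidak level 1 − (1 − α)^(1/K), and with all correlations 1 it returns the unadjusted α.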

  16. Modifications of Dubey’s method. First step: correlation matrix conversion • Convert each correlation rij to corr(|Zi|, |Zj|), where Zi and Zj follow a standard bivariate normal distribution with correlation coefficient rij • r = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9) converts to (0.00609, 0.02264, 0.05641, 0.10282, 0.16608, 0.24980, 0.35936, 0.50400, 0.70109)

  17. Modifications of the Dubey procedure • Modification 1 (M1): Let the new correlation matrix be R. Scale R by R’ = Rf (f = 1.5 when K = 4). Next, follow the Dubey procedure with this new scaled R’. • Modification 2 (M2): Using R, obtain the R-square value between endpoint i (i = 1, …, K) and the remaining (K-1) endpoints. Multiply this R-square value by g (g = 0.75 when K = 4). Then use the scaled R-square value in place of the average correlation in the Dubey procedure.

  18. Performance of the ad hoc procedures for K = 4 for some correlation structures R = {r12, r13, r14, r23, r24, r34}
      R1 = {.9 (3), .8 (2), .3}          all v.high, one low (Avg 7.7)
      R2 = {.8 (2), .5 (2), .3 (2)}      2 v.high, 2 medium, 2 low (5.3)
      R3 = {.7 (3), .5 (2), .1}          3 high, 2 medium, 1 v.low (5.3)
      R4 = {.8, .7, .3 (2), .1 (2)}      1 v.high, 1 high, 2 low, 2 v.low (3.7)
      R5 = {.8, .5, .3, .1 (3)}          1 v.high, 1 medium, 1 low, 3 v.low (3.2)
      R6 = {.5 (2), .4, .3 (2), .1}      3 medium, 2 low, 1 v.low (3.5)
      R7 = {.5 (2), .4, .1 (3)}          3 medium, 3 v.low (2.8)
      R8 = {.2 (3), .1 (3)}              all v.low (1.5)

  19. Performance (1) of ad hoc procedures for K = 4 for selected correlation structures R1-R8. Nominal alpha = 0.05, 2-sided tests using normal Z-statistics

                         MH1     MH1       MH2     MH2
      R    TCH    Dubey  f = 1   f = 1.5   g = 1   g = .75  Simes  Sidak
      ====================================================================
      R1   .056   .084   .062    .053 (2)  .059    .049     .037   .028
      R2   .076   .079   .055    .049      .057    .052     .044   .041
      R3   .077   .083   .055    .047      .050    .047     .043   .040
      R4   .081   .070   .052    .048      .055    .051     .045   .043
      R5   .085   .067   .054    .050      .055    .052     .046   .044
      R6   .088   .073   .052    .048      .048    .047     .047   .046
      R7   .090   .069   .052    .049      .050    .049     .048   .048
      R8   .097   .060   .051    .051      .050    .050     .050   .050
      ====================================================================
      (1) Based on 100,000 clinical trial simulations
      (2) Entry = 0.050 with f = 1.7

  20. Some comments on the results of the previous table • Investigations limited to selected correlation structures for K = 4 • Tukey’s adjustment: for highly correlated endpoints • Dubey’s: fairly stable, but liberal in protecting the alpha level • Modification M2 (g = .75) performs well • The approach is sensitive to the choice of metric and scaling factor • Simes and Sidak methods are quite conservative for moderately to highly correlated endpoints

  21. Properties of the Modifications M1 and M2 Under Investigation: • Type I error rate control for K in the range 4 - 10 • Strong control of the familywise type I error rate using closed testing principle • Simultaneous confidence interval properties • Power properties

  22. O’Brien’s OLS/GLS t-tests, 1984 (K > 3) These tests are based on weighted sums of the K standardized endpoints, using weights $(w_1, w_2, \ldots, w_K) = J^T R^{-1}$ for the GLS test and $= J^T$ for the OLS test, where $J$ is the K-vector of ones and $R$ is the correlation matrix of the endpoints. In other words, the GLS method gives more weight to endpoints that are not highly correlated, and the OLS method gives equal weight to all endpoints. • Test is sensitive under homogeneity of treatment effects and low correlation across endpoints • Performs poorly under treatment-by-endpoint interaction • Closed testing for endpoint-specific results
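
To see how the GLS weights favor endpoints that are not highly correlated with the others, the weights J^T R^-1 can be computed for a small correlation matrix. This is a minimal sketch, assuming J is the vector of ones and R is symmetric and nonsingular; the helper names are hypothetical, and a linear solve is used instead of an explicit inverse.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def obrien_weights(corr, method: str = "GLS"):
    """Endpoint weights: J^T for OLS, J^T R^-1 for GLS."""
    ones = [1.0] * len(corr)
    if method == "OLS":
        return ones
    # R is symmetric, so J^T R^-1 equals the transpose of the solution of R w = J.
    return solve_linear(corr, ones)
```

For R = [[1, .8, 0], [.8, 1, 0], [0, 0, 1]] the GLS weights are roughly (0.556, 0.556, 1.0): the two endpoints correlated at 0.8 share their weight, and the uncorrelated third endpoint receives the most.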

  23. Issue #2 • Sequential analysis of endpoints is becoming increasingly popular. How to reconcile some of the difficulties it poses? Suppose that the sequence breaks, and the subsequent endpoint has an extremely low p-value. How to avoid this situation?

  24. An example of a sequence break when testing endpoints sequentially • Consider a heart failure trial with two endpoints, y1 = exercise tolerance and y2 = mortality rate. The trial had a predefined sequential test strategy. • Test y1 first at level 0.05 (2-sided). If this endpoint has a statistically significant result at this level, then and only then test y2 at the same level 0.05; otherwise declare the trial a failure. • Difficult case: p1 > 0.05, p2 = 0.001.

  25. A proposed test strategy • Predefine α1 and α2 so that α = α1 + α2, e.g., α1 = 0.04 and α2 = 0.01. • Test y1 first at level α1. • (a) If p1 ≤ α1, then reject H01 and then test y2 at level α (i.e., α = 0.05, and not at level α2) • (b) If p1 > α1, then do not reject H01, but test y2 at level α2. This test strategy controls the familywise type I error rate at level α (e.g., α = 0.05)
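
The strategy can be written as a small decision function. This is a sketch under the assumption that the lost symbols on the slide are α, α1 and α2 with α = α1 + α2, and that rejection uses p ≤ level; the function name is hypothetical.

```python
def sequential_with_alpha_split(p1: float, p2: float,
                                alpha1: float = 0.04, alpha2: float = 0.01):
    """Return (reject_H01, reject_H02) for the two-endpoint strategy in
    which y1 is tested at alpha1 and y2 inherits the full alpha only when
    y1 is significant."""
    alpha = alpha1 + alpha2
    reject_1 = p1 <= alpha1
    # (a) y1 significant: test y2 at the full level alpha.
    # (b) y1 not significant: test y2 at the reserved level alpha2.
    level_2 = alpha if reject_1 else alpha2
    reject_2 = p2 <= level_2
    return reject_1, reject_2
```

Applied to the difficult case from the previous slide, for instance p1 = 0.06 and p2 = 0.001, the function returns `(False, True)`: the sequence breaks at y1, but the mortality endpoint can still be declared significant at the reserved level 0.01.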

  26. Concluding Remarks • Understanding the relationships between endpoints helps in selecting an efficient test strategy for multiple endpoints • Methods that account for correlation between endpoints are fairly straightforward for K = 2, 3 • Ad hoc procedures such as the M1 and M2 modifications of Dubey’s procedure can be helpful in testing for K > 3. Bootstrap and O’Brien’s methods can also be applied • Sequential testing can be done slightly differently to accommodate sequence breaks with extreme subsequent p-values
