Choice of Endpoints for Salvage Studies
Choice of Endpoints
• Clinical Endpoints: AIDS-defining events; survival; QOL
• Marker-based Endpoints for Efficacy: HIV-1 RNA; CD4
Choice of Endpoints (Cont.)
• Endpoints for Toxicity: time to treatment discontinuation; targeted adverse events (e.g. lipodystrophy)
• Composite Endpoint: combines information across different endpoint categories, e.g. time to treatment discontinuation for virological failure or intolerance
HIV RNA Endpoints
• Quantitative (change from baseline to Week x)
• Time to Virological Failure
• Binary
• Cross-sectional, e.g. above/below threshold at Week x
• Failed by Week x
Cross-Sectional vs. Failure Over Time
Above/Below Threshold at Week x
• Snapshot; not affected by transient changes in HIV levels.
• Frequent monitoring not required (batch assaying).
• Missing data at the timepoint are especially problematic.
Failure Endpoints
• Assessment of response over time; may be affected by transient changes in HIV levels.
• Frequent monitoring required (real-time assaying).
• Missing data strategies need to be defined and evaluated.
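The contrast between the two binary endpoints can be sketched on a single subject's viral load series. This is only an illustration: the data, the 500 copies/mL threshold, and the failure rule (any value at or above threshold after first suppression, or never suppressing) are assumptions made for the example, not definitions taken from any of the studies discussed here.

```python
# Hypothetical HIV-1 RNA series for one subject: (week, copies/mL).
series = [(0, 50000), (4, 3000), (8, 400), (12, 900), (16, 300)]
THRESHOLD = 500  # assumed suppression threshold, copies/mL


def below_threshold_at(series, week, threshold=THRESHOLD):
    """Cross-sectional endpoint: looks only at the value measured at `week`.

    Returns None when that measurement is missing, which is why missing
    data at the chosen timepoint is so problematic for this endpoint.
    """
    values = dict(series)
    if week not in values:
        return None
    return values[week] < threshold


def failed_by(series, week, threshold=THRESHOLD):
    """Failure endpoint under an assumed, illustrative rule.

    Failure = any value at or above threshold after the first suppressed
    value, or never reaching suppression by `week`.
    """
    suppressed = False
    for w, rna in sorted(series):
        if w > week:
            break
        if rna < threshold:
            suppressed = True
        elif suppressed:
            return True  # rebound after suppression
    return not suppressed


snapshot_week16 = below_threshold_at(series, 16)
failed_week16 = failed_by(series, 16)
snapshot_week24 = below_threshold_at(series, 24)
```

With this series the subject counts as a success on the week 16 snapshot but as a failure on the over-time endpoint, because of the single rebound at week 12; the week 24 snapshot is simply undefined.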
Time to Failure vs. Cumulative Proportion
Time to Failure
• Patterns of failure depend on failure time (assumptions).
• Can be evaluated within an interim analysis (accommodates differential follow-up).
Cumulative Proportion
• Time to failure not considered in analysis.
• Evaluation with interim analysis may be complicated.
Power Advantages of Time to Event
• If the pooled failure rate is > 50%, a time-to-event endpoint has appreciable sample size advantages
• e.g., 6 months accrual, 1 year additional follow-up, 2 arm trial:
• 50% pooled failure rate, 5% sample size savings
• 70% pooled failure rate, 15% sample size savings
• e.g., 1 year accrual, 6 months additional follow-up, 2 arm trial:
• 50% pooled failure rate, 12% sample size savings
• 70% pooled failure rate, 25% sample size savings
Analysis Issues
• With moderate study withdrawal, the sample size savings of the time-to-event endpoint increase further.
• The sample size savings are larger at interim analyses than at final analyses, in proportion to the fraction of subjects who have less follow-up time than the specified interim analysis time.
• Time-to-event endpoints also have advantages for evaluating covariate effects and for flexibility in extending the study by prolonging the follow-up period.
Purely Virologic vs. Composite
Purely Virologic
• Focuses on virologic response only; tolerability and safety can be assessed separately.
• Follow-up for viral load is essential after treatment discontinuation.
Composite
• Combines virologic efficacy, tolerability and safety; gives the overall picture.
• May differ substantially from purely virologic if the toxicity rate is high.
• Purely virologic should be done as a secondary endpoint.
Issues in Definition of:
Virologic Failure
• Early failure (rise above nadir/baseline, insufficient decline)
• Amount of time allowed to go below suppression threshold
• Choice of threshold for suppression and for loss of suppression
• Fluctuations due to treatment holds, intercurrent illness, etc.
Regimen Completion
• Virologic failure definition (see above)
• Number of drugs added/changed before declaring treatment failure
• Subjectivity of treatment discontinuation reasons
Clinical Beliefs Underlying the Appropriate Use of Each Endpoint
• Purely Virologic Endpoint: The effect of the investigated therapies on plasma HIV-1 RNA levels captures the essential information needed to define the role of the therapies in clinical practice for the target population.
• Regimen Completion Endpoint: The necessity to change regimens more closely measures tangible benefit to a patient than does virological failure alone, and assessing the virologic effect of treatment is unnecessary.
Types of Study Endpoint in HIV Disease Studies
Time to Failure
• Regimen Completion (384, 372A, A5025)
• Virologic Failure Week x (388, A5076, A5095)
Binary
• Below Threshold at Week x (359, 364, 370, 373, A5086)
• Not Fail by Week x, where failure is defined as:
• Rise Above Threshold (A5073)
• Rise Above Threshold, Early Failure (347, 368, 398, A5080)
• Rise Above Threshold, Early Failure, Off Treatment (372B, 400, A5064)
Cumulative Virologic Failure (343)
Composite Endpoints
• Combine efficacy and toxicity information (e.g. time to Rx discontinuation).
• Events will be more numerous than with pure virologic endpoints, but the treatment effect may be diluted.
• Dilution is especially a concern if Rx discontinuation may be unrelated to Rx (pregnancy, imprisonment, moving).
Example
• Suppose Rx A (compared to B) reduces the percentage reaching the event from 35% to 17.5%. We need 100 patients per arm to have 80% power.
• Assume the Rx discontinuation rate is 10%/yr for both treatments and is included in the endpoint definition.
• We then have more endpoints but only 60% power to detect the treatment difference.
• We need 50 additional patients per arm for 80% power.
Example Continued
[Figure: two panels, “Pure” Failure (100 evaluable patients) and Failure including Rx Discontinuation (100 evaluable patients)]
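The dilution in the example above can be approximated with the standard normal-approximation formula for comparing two proportions. This is a rough sketch, not the calculation behind the slide: it assumes a two-sided 5% test, one year of follow-up, and discontinuation acting independently of failure, so that the composite event rate is 1 - (1 - p_fail)(1 - 0.10). The helper names are made up for the illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided test of two proportions."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * var / (p1 - p2) ** 2)


def power_at(n, p1, p2, alpha=0.05):
    """Approximate power with n patients per arm."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)
    return Z.cdf(abs(p1 - p2) / se - z_a)


def with_discontinuation(p_fail, p_disc):
    """Composite event rate, assuming independent discontinuation (assumption)."""
    return 1 - (1 - p_fail) * (1 - p_disc)


pure = (0.35, 0.175)  # event rates from the example
composite = tuple(with_discontinuation(p, 0.10) for p in pure)

n_pure = n_per_arm(*pure)
n_composite = n_per_arm(*composite)
power_pure = power_at(100, *pure)
power_composite = power_at(100, *composite)
```

Under these assumptions the pure endpoint needs a little under 100 patients per arm, while the composite endpoint needs roughly 40% more and has power of about two-thirds at 100 per arm. That is the same direction and order of magnitude as the slide's figures (60% power, 50 additional patients); the exact values depend on follow-up length and test choice, which the slide does not state.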
ACTG 359: Proportion vs. Change
ACTG 359 is a randomized, partially double-blinded, multicenter factorial study of six oral combination antiretroviral regimens, each containing SQV:
• SQV + RTV + DLV (RD)
• SQV + RTV + ADV (RA)
• SQV + RTV + DLV + ADV (RDA)
• SQV + NFV + DLV (ND)
• SQV + NFV + ADV (NA)
• SQV + NFV + DLV + ADV (NDA)
Subjects received randomized study treatment for 24 weeks.
Data Completeness
• More than 90% of subjects had week 16 virologic and immunologic data.
• Number of subjects with missing RNA data at week 16:
Treatment   RD   RA   RDA   ND   NA   NDA
n            5    3     6    5    1     3
• Data were assumed to be missing at random.
Primary Efficacy Comparison
Proportions of HIV-RNA below 500 at week 16
RTV            NFV
28% (35/125)   33% (42/127)
P = 0.513, Fisher’s exact test
DLV            ADV            DLV + ADV
40% (34/85)    18% (16/88)    33% (17/79)
P = 0.006, Chi-square test
Secondary Efficacy Analysis: RNA Change
HIV RNA week 16 median change from baseline (log10)
Treatment   RD      RA      RDA     ND      NA      NDA
Change      -0.41   -0.16   -0.21   -0.61   -0.08   -0.05
• RTV vs. NFV: p = 0.834 (Logrank), p = 0.586 (Prentice-Wilcoxon)
• DLV vs. ADV: p = 0.003, p = 0.011
• DLV vs. DLV + ADV: p = 0.262, p = 0.231
• ADV vs. DLV + ADV: p = 0.104, p = 0.258
Loss to Follow-Up
• Need a policy for handling loss to follow-up
• Treating drop-out as censored or as failure may each be biased
• Sensitivity analyses with various levels of association between drop-out and failure events
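The two simplest policies for drop-out can be compared with a small Kaplan-Meier calculation. The sketch below uses invented data for ten subjects and a hand-written estimator (the function name and data are illustrative only); it shows how far apart the two estimates can be, which is the range a sensitivity analysis would explore.

```python
def km_failure(times, events, t):
    """Kaplan-Meier estimate of the probability of failure by time t.

    events[i] is True for an observed failure and False for censoring.
    """
    surv = 1.0
    failure_times = sorted({x for x, e in zip(times, events) if e and x <= t})
    for u in failure_times:
        at_risk = sum(1 for x in times if x >= u)
        failed = sum(1 for x, e in zip(times, events) if e and x == u)
        surv *= 1 - failed / at_risk
    return 1 - surv


# Hypothetical subjects: (week of last observation, outcome).
subjects = [
    (4, "fail"), (8, "withdrew"), (8, "fail"), (12, "withdrew"), (16, "fail"),
    (24, "ok"), (24, "ok"), (24, "ok"), (24, "ok"), (24, "ok"),
]
times = [week for week, _ in subjects]

# Policy 1: withdrawal is censored (informative drop-out would bias this).
as_censored = km_failure(times, [o == "fail" for _, o in subjects], 24)

# Policy 2: withdrawal counts as failure (a conservative upper bound).
as_failure = km_failure(times, [o != "ok" for _, o in subjects], 24)
```

Here the estimated failure probability by week 24 is about one third when withdrawals are censored and one half when they are counted as failures.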
ACTG 398
Subjects were stratified by prior PI (protease inhibitor) exposure and selectively randomized to one of four treatment arms:
• SQV Arm: Amprenavir (APV) + Saquinavir (SQVsgc) + Abacavir (ABC) + Efavirenz (EFV) + Adefovir (ADV)
• IDV Arm: APV + Indinavir (IDV) + ABC + EFV + ADV
• NFV Arm: APV + Nelfinavir (NFV) + ABC + EFV + ADV
• Placebo Arm: APV + Placebo (matched to SQVsgc, IDV or NFV) + ABC + EFV + ADV
ACTG 398 Continued
Design and Ideal Enrollment
Prior PI Exposure        SQV   IDV   NFV   Placebo   Total
SQV only                 X     25    25    15        65
IDV/RTV only             25    X     25    15        65
NFV only                 25    25    X     15        65
NFV and IDV/RTV          33    X     X     22        55
NFV and SQV              X     33    X     22        55
SQV and IDV/RTV          X     X     33    22        55
NFV, SQV and IDV/RTV     17    17    17    17        68
Total                    100   100   100   128       428
ACTG 398 Continued
Estimated Virologic Failure at Week 24 for MAR and M=F (Kaplan-Meier)
Treatment Arm   NNRTI Experienced?   M=F Failure (95% CI)   MAR Failure (95% CI)
SQV             Yes                  0.85 (0.74, 0.95)      0.76 (0.62, 0.90)
SQV             No                   0.54 (0.43, 0.66)      0.41 (0.29, 0.53)
IDV             Yes                  0.87 (0.75, 0.99)      0.80 (0.66, 0.94)
IDV             No                   0.53 (0.37, 0.69)      0.42 (0.26, 0.59)
NFV             Yes                  0.73 (0.62, 0.83)      0.66 (0.54, 0.77)
NFV             No                   0.55 (0.43, 0.67)      0.48 (0.36, 0.60)
Placebo         Yes                  0.91 (0.83, 0.98)      0.82 (0.73, 0.92)
Placebo         No                   0.63 (0.54, 0.73)      0.52 (0.42, 0.63)
Notes: M=F = missing counted as failure; MAR = missing at random.
ACTG 398 Continued
Primary Comparison of Treatment Arms vs. Placebo
P-values for RNA < 200 copies/ml at Week 24
SQV vs Placebo   IDV vs Placebo   NFV vs Placebo   SQV/IDV/NFV vs Placebo
0.25             0.29             0.004            0.002
Results based on the exact test with stratification by prior PI and NNRTI experience.
ACTG 398 Continued
P-values for Confirmed Virologic Failure at/before Week 24
       SQV vs Placebo   IDV vs Placebo   NFV vs Placebo   SQV/IDV/NFV vs Placebo
M=F    0.62             1.00             0.005            0.026
MAR    0.74             0.83             0.005            0.038
Notes: Results based on the exact test with stratification by prior PI and NNRTI experience. MAR = Missing-at-random (missing RNA samples ignored)
P-values for Time to Confirmed Virologic Failure
       SQV vs Placebo   IDV vs Placebo   NFV vs Placebo   SQV/IDV/NFV vs Placebo
M=F    0.78             0.83             0.006            0.040
MAR    0.98             0.70             0.003            0.038
Notes: Results based on the stratified log-rank test with stratification by prior PI and NNRTI experience. MAR = Missing-at-random (missing RNA samples ignored)
Analysis of Quantitative Endpoints
• Censored data methods required (log-rank, Prentice-Wilcoxon)
• Bias results from excluding missing data
• Last observation carried forward can be very biased
• Consider last rank carried forward for rank-based analysis
Discussion Points
• Count study withdrawal as failure or as censored?
• Each analysis is likely biased
• Recommend carrying out both analyses as well as more sophisticated sensitivity analyses
Discussion Points
• What are the criteria for selecting a primary endpoint?
• Optimally addresses the primary objective, taking into account the patient population and the study drugs
• Within the pool of possible surrogate markers, it is maximally accurate as a replacement for true clinical endpoints
Analysis Points
• If the primary endpoint is binary, the Chi-squared test and Fisher’s exact test for a treatment difference are biased if there are censored data
• A Z-test based on the difference in Kaplan-Meier estimates of the proportion failed is unbiased and efficient
• Use this test routinely
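One common way to build the Z-test mentioned above is to take the difference in Kaplan-Meier failure estimates at the chosen week and divide by a standard error from Greenwood's formula. The slides do not specify the variance estimator, so Greenwood's formula is an assumption here, as are the function names and the two small invented arms.

```python
from math import sqrt
from statistics import NormalDist


def km_with_variance(times, events, t):
    """Return (KM failure probability by t, Greenwood variance estimate)."""
    surv, greenwood = 1.0, 0.0
    failure_times = sorted({x for x, e in zip(times, events) if e and x <= t})
    for u in failure_times:
        n = sum(1 for x in times if x >= u)  # at risk just before u
        d = sum(1 for x, e in zip(times, events) if e and x == u)  # failures at u
        surv *= 1 - d / n
        if n > d:  # skip the term when everyone at risk fails
            greenwood += d / (n * (n - d))
    return 1 - surv, surv ** 2 * greenwood


def km_z_test(arm_a, arm_b, t):
    """Two-sided Z-test for the difference in KM failure proportions at t."""
    fa, va = km_with_variance(*arm_a, t)
    fb, vb = km_with_variance(*arm_b, t)
    z = (fa - fb) / sqrt(va + vb)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical arms: (weeks of follow-up, True = failure / False = censored).
arm_a = ([4, 8, 8, 12, 16, 24, 24, 24, 24, 24],
         [True, False, True, False, True, False, False, False, False, False])
arm_b = ([4, 4, 8, 12, 12, 16, 20, 24, 24, 24],
         [True, True, True, True, False, True, True, False, False, False])

z_stat, p_value = km_z_test(arm_a, arm_b, 24)
```

Unlike a Chi-squared or Fisher's exact test on the raw counts, this comparison lets censored subjects contribute information up to the time they leave the risk set.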