U.S. Food and Drug Administration

Notice: Archived Document. The content in this document is provided on the FDA’s website for reference purposes only. It was current when produced, but is no longer maintained and may be outdated.

Some Comments
Brent A. Blumenstein, PhD
TriaArcConsulting.com

Q1: Single Versus Dual Endpoints
• Domains mentioned: pain, function.
• Discussion assumes two domains, but can be generalized to more.
• Two approaches:
  • Dual endpoints, each covering a domain.
  • One endpoint integrating the two domains.

Dual Endpoints
• Primary overall success is defined by two hypotheses (stated as alternative hypotheses):
  • Endpoint 1 shows an effect.
  • Endpoint 2 shows an effect.
• Two possible conjunctions for the two alternative hypotheses:
  • OR (success if either endpoint shows an effect):
    • Overall α must be shared. Example: if overall α = 0.05 (2-sided), then the criterion for each hypothesis must be α/2 (2-sided).
    • Likely not acceptable in this context.
  • AND (success only if both endpoints show an effect):
    • Overall α need not be shared. Example: if overall α = 0.05 (2-sided), then the criterion for each hypothesis can be α (2-sided).
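The two conjunctions can be written as decision rules. A minimal sketch, assuming each endpoint produces a two-sided p-value; the function `dual_endpoint_success` is hypothetical, and the OR branch uses the simple α/2 split described above rather than any more refined multiplicity procedure.

```python
def dual_endpoint_success(p1: float, p2: float, alpha: float = 0.05,
                          conjunction: str = "AND") -> bool:
    """Overall success for two endpoints, each with a two-sided p-value.

    OR:  success if either endpoint is significant; the overall alpha is
         shared, so each hypothesis is tested at alpha / 2.
    AND: success only if both endpoints are significant; each hypothesis
         can be tested at the full alpha.
    """
    if conjunction == "OR":
        return p1 <= alpha / 2 or p2 <= alpha / 2
    if conjunction == "AND":
        return p1 <= alpha and p2 <= alpha
    raise ValueError("conjunction must be 'OR' or 'AND'")


# Hypothetical p-values: both endpoints nominally significant at 0.05.
print(dual_endpoint_success(0.04, 0.03, conjunction="AND"))  # True
print(dual_endpoint_success(0.04, 0.03, conjunction="OR"))   # False: neither is <= 0.025
```

Requiring both endpoints to succeed is what allows each one to be tested at the full α.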

Integrated Domains Endpoint
• An integrated endpoint requires weighting of the separate domains:
  • Example: Suppose two assessments (PROs or objective), or two subscales of a PRO instrument, each of which putatively covers one of the domains of interest, with scores or outcomes $S_1$ and $S_2$. The integrated single score is $S = W_1 S_1 + W_2 S_2$, where the $W$s are weights. It would be naïve to choose $W_1 = W_2 = 1$, for example when $S_1$ and $S_2$ are measured on different scales.
  • Example: PRO instrument Z is claimed to cover pain and function. The scoring of instrument Z intrinsically weights the domains; this was built in by the developer of instrument Z.
• Advantage: one primary hypothesis.
PRO = Patient-Reported Outcome
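A minimal sketch of the weighted score $S = W_1 S_1 + W_2 S_2$, assuming both scores run in the same direction (higher = better) and that each is standardized to a z-score within the sample before weighting. The standardization step and the function names are illustrative assumptions, not part of any particular instrument's scoring.

```python
from statistics import mean, stdev


def standardize(scores):
    """Convert raw scores to z-scores using the sample mean and SD."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]


def integrated_score(s1, s2, w1, w2):
    """S = W1*S1 + W2*S2 computed on standardized scores.

    Standardizing first keeps the domain measured on the larger numeric
    scale from dominating S; the weights themselves still have to be chosen.
    """
    z1, z2 = standardize(s1), standardize(s2)
    return [w1 * a + w2 * b for a, b in zip(z1, z2)]


# Hypothetical data: domain 1 on a 0-100 scale, domain 2 on a 1-5 scale,
# both oriented so that higher is better.
s1 = [0, 10, 20]
s2 = [1, 2, 3]
print(integrated_score(s1, s2, 0.5, 0.5))  # [-1.0, 0.0, 1.0]
```

For the same data, the naive choice $W_1 = W_2 = 1$ on raw scores gives [1, 12, 23], which is driven almost entirely by $S_1$.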

More Than Two Endpoints
• Success on k of n endpoints.
• Rank methods for integration (O’Brien 1984).
• Hierarchical rules.
• Weighting is still an issue.
• Very complicated.
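Two of these options, sketched under stated assumptions: "success on k of n" as a simple count, and a rank-sum composite in the spirit of O’Brien (1984), where each endpoint is ranked across all patients, the ranks are summed per patient, and the summed scores are compared between groups with a pooled-variance two-sample t statistic. All endpoints are assumed to be oriented so that higher is better; the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def k_of_n_success(endpoint_successes, k):
    """Overall success if at least k of the n endpoint criteria are met."""
    return sum(endpoint_successes) >= k


def average_ranks(values):
    """Ranks 1..n, with tied values given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def obrien_rank_sum(rows, group):
    """rows: one list of endpoint values per patient; group: 0/1 labels.

    Returns the per-patient rank-sum scores and a t statistic comparing
    group 1 with group 0.
    """
    n_endpoints = len(rows[0])
    columns = [[row[e] for row in rows] for e in range(n_endpoints)]
    column_ranks = [average_ranks(col) for col in columns]
    scores = [sum(r[i] for r in column_ranks) for i in range(len(rows))]
    a = [s for s, g in zip(scores, group) if g == 1]
    b = [s for s, g in zip(scores, group) if g == 0]
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return scores, t


# Hypothetical data: four patients, two endpoints each.
scores, t = obrien_rank_sum([[1, 10], [2, 20], [3, 30], [4, 40]], [0, 0, 1, 1])
print(scores, round(t, 3))  # [2.0, 4.0, 6.0, 8.0] 2.828
```

The ranking step sidesteps differing scales, but it still gives every endpoint equal weight, so the weighting issue does not go away.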

Serial Assessments
• AKA longitudinal analysis, repeated measures.
• Longitudinal analysis:
  • Pre-specify the hypothesis test.
  • Missing data issues.
  • Random effects may increase sensitivity.
• Summary measure:
  • Pre-specify a score for each patient, such as maximum, slope, …
  • May have fewer missing data issues.
  • May need validation.
• Missing data may threaten ITT analysis.
ITT = Intent-To-Treat
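A summary-measure approach reduces each patient's serial scores to one pre-specified number. A minimal sketch, assuming assessments at known times with `None` marking a missed visit; the maximum and least-squares slope follow the examples above, while the handling of missed visits here (drop them, and return `None` if the slope is not estimable) is an assumption that would itself need pre-specification.

```python
def ols_slope(times, scores):
    """Least-squares slope of one patient's scores on assessment times.

    Missing assessments (None) are dropped; returns None when fewer than two
    distinct times remain, so a pre-specified missing-data rule is still needed.
    """
    pairs = [(t, y) for t, y in zip(times, scores) if y is not None]
    if len(pairs) < 2:
        return None
    t_bar = sum(t for t, _ in pairs) / len(pairs)
    y_bar = sum(y for _, y in pairs) / len(pairs)
    sxx = sum((t - t_bar) ** 2 for t, _ in pairs)
    if sxx == 0:
        return None
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in pairs)
    return sxy / sxx


def summarize_patient(times, scores):
    """Pre-specified per-patient summaries: maximum observed score and slope."""
    observed = [y for y in scores if y is not None]
    return {
        "max": max(observed) if observed else None,
        "slope": ols_slope(times, scores),
    }


# Hypothetical patient assessed at weeks 0, 4, 8, 12 with the week-8 visit missed.
print(summarize_patient([0, 4, 8, 12], [10, 12, None, 16]))
```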

Responder Analysis
• Convert serial measures of multiple domains (scores or assessments) into a dichotomous outcome, “success” or “failure to observe success”, at a pre-specified time.
• Multiple scores or assessments are combined using thresholds and conjunctions (and/or); the definition of “success” must unambiguously indicate clinical benefit.
• ITT is easier to implement (no missing data: a patient without an observed success is a “failure to observe success”).
• Generally less statistical sensitivity.
• Reference data for a responder outcome are often difficult to identify.
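A minimal sketch of a responder definition for two domains, assuming hypothetical change-from-baseline scores for pain (a decrease is improvement) and function (an increase is improvement), hypothetical thresholds, and the rule that any missing value counts as “failure to observe success”. Because every randomized patient gets an outcome, the rate is computed over all patients, which is what makes ITT straightforward here.

```python
from typing import Optional


def is_responder(pain_change: Optional[float],
                 function_change: Optional[float],
                 pain_threshold: float,
                 function_threshold: float,
                 conjunction: str = "and") -> bool:
    """Classify one patient at the pre-specified time point.

    Assumed (hypothetical) conventions: a pain decrease of at least
    pain_threshold and a function increase of at least function_threshold
    each count as improvement. Any missing value is classified as
    "failure to observe success", so every patient has an outcome.
    """
    if pain_change is None or function_change is None:
        return False
    pain_ok = pain_change <= -pain_threshold
    function_ok = function_change >= function_threshold
    if conjunction == "and":
        return pain_ok and function_ok
    if conjunction == "or":
        return pain_ok or function_ok
    raise ValueError("conjunction must be 'and' or 'or'")


def responder_rate(patients, **rule):
    """Proportion of all randomized patients classified as successes."""
    outcomes = [is_responder(p, f, **rule) for p, f in patients]
    return sum(outcomes) / len(outcomes)


# Hypothetical (pain change, function change) pairs; the third patient dropped out.
patients = [(-30.0, 8.0), (-25.0, 2.0), (None, None), (-5.0, 12.0)]
print(responder_rate(patients, pain_threshold=20.0, function_threshold=5.0))  # 0.25
```

Switching the conjunction to "or" makes success easier to reach (0.75 for the same patients), which is why the choice of conjunction belongs in the pre-specified definition.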

Time to First “Bad Thing”
• Requires careful specification of the set of “bad things”.
• Based on time-to-failure analysis methods.
• Can reduce the impact of missing data because of censoring.
• Can be more sensitive than responder analysis.
• Will require sensitivity analyses to assess the robustness of results.
• Requires careful adherence to assessment schedules.
• May need to look to “cure” models for planning.
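A minimal sketch of deriving the time to the first “bad thing” and estimating its distribution with the Kaplan-Meier estimator. It assumes each patient’s record lists the times of any pre-specified bad things plus the time of the last completed assessment, and that patients without an event are censored at that assessment; the record layout and function names are hypothetical.

```python
def time_to_first_bad_thing(bad_thing_times, last_assessment):
    """Return (time, event_observed) for one patient.

    bad_thing_times holds the times of every pre-specified "bad thing"
    observed for the patient; if none occurred, the patient is censored
    at the last completed assessment.
    """
    if bad_thing_times:
        return min(bad_thing_times), True
    return last_assessment, False


def kaplan_meier(times, events):
    """Kaplan-Meier estimates of P(no bad thing yet) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_censored = 0
        # Group all records tied at time t; censorings at t are still at risk.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_events + n_censored
    return curve


# Hypothetical records: (times of bad things, week of last completed assessment).
records = [([2], 12), ([3, 7], 12), ([], 3), ([5], 12), ([], 8)]
times, events = zip(*(time_to_first_bad_thing(b, last) for b, last in records))
print(kaplan_meier(times, events))
```

The censoring times come from when assessments actually happen, which is one reason adherence to the assessment schedule matters.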

Q2: Non-inferiority Design Issues
• Many issues.
• Inherently non-conservative.
• No internal validity.
• Extra care in trial conduct.
• Some statistical methods may not perform optimally under inferiority offsets.

Reference Data Issues
• Non-inferiority must be referenced to an outcome value R that unambiguously represents a benefit (else it is “non-inferiority to nothing”).
• Some sense of the variability of R must be established.
• Formal meta-analyses are often used to establish R and its variability.
• Drift from historical efficacy levels must be taken into account.
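One common way to combine historical studies into an estimate of R with a measure of its variability is inverse-variance fixed-effect pooling. A minimal sketch, assuming each study reports an estimate and its standard error on a common scale (for ratios, the log scale); it assumes no between-study heterogeneity and does nothing about drift, so a random-effects model or another adjustment may be more appropriate in practice. The function name and data are hypothetical.

```python
from math import sqrt


def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, sqrt(1.0 / total)


# Hypothetical historical studies: reference response rates and standard errors.
r_hat, r_se = fixed_effect_pool([0.40, 0.55, 0.48], [0.05, 0.04, 0.06])
print(f"R = {r_hat:.3f}, SE = {r_se:.3f}")
```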

Inferiority Margin
• The outcome that is regarded as implying unacceptable inferiority must be identified.
• Suppose a reference outcome R, and that outcomes equal to or less than $R - \delta$ would be regarded as inferiority; $\delta$ is the inferiority margin. Alternatively, inferiority can be identified on a ratio scale as $(1 - \delta)R$, etc.
• In general, $\delta$ should be smaller than the superiority margin generally used.
• Smaller values of $\delta$ mean larger trial sizes.
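A rough illustration of why a smaller margin means a larger trial, assuming a two-arm comparison of means on a normally distributed outcome with a common known standard deviation σ, one-sided α = 0.025, 90% power, and a true difference of zero. Under these assumptions the per-arm size is $n = 2\sigma^2 (z_{1-\alpha} + z_{1-\beta})^2 / \delta^2$ for margin δ, so halving δ roughly quadruples n. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm_noninferiority(delta, sigma, alpha=0.025, power=0.90):
    """Approximate patients per arm for a two-arm non-inferiority comparison
    of means with margin delta, common SD sigma, one-sided alpha, and a true
    difference of zero (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)


for delta in (0.5, 0.25):
    print(delta, n_per_arm_noninferiority(delta, sigma=1.0))
# 0.5 85
# 0.25 337
```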

Identifying Non-Inferiority Hypothesis
• Two variables: R and $\delta$ (the inferiority margin).
• Clinical expertise.
• Meta-analysis.
• Differences between proportions should not be used; use ratios or odds ratios instead.
• …
• Discuss with agency!
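A minimal sketch of ratio-scale effect measures for a binary outcome: the risk ratio and odds ratio of treatment versus control, each with a Wald confidence interval built on the log scale. The non-inferiority check, which compares the lower bound of the ratio with 1 − δ for a hypothetical margin δ on the ratio scale mentioned for the inferiority margin, is an illustrative assumption rather than a prescribed rule. Zero cells would require a correction that is not handled here.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def ratio_measures(x_t, n_t, x_c, n_c, level=0.95):
    """Risk ratio and odds ratio (treatment vs control), each returned as
    (estimate, lower, upper) with a Wald interval on the log scale.
    Assumes no zero cells."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_rr = log((x_t / n_t) / (x_c / n_c))
    se_rr = sqrt(1 / x_t - 1 / n_t + 1 / x_c - 1 / n_c)
    log_or = log((x_t / (n_t - x_t)) / (x_c / (n_c - x_c)))
    se_or = sqrt(1 / x_t + 1 / (n_t - x_t) + 1 / x_c + 1 / (n_c - x_c))
    return {
        "RR": (exp(log_rr), exp(log_rr - z * se_rr), exp(log_rr + z * se_rr)),
        "OR": (exp(log_or), exp(log_or - z * se_or), exp(log_or + z * se_or)),
    }


def noninferior_on_ratio_scale(lower_bound, delta):
    """Hypothetical rule: conclude non-inferiority if the lower confidence
    bound of the ratio exceeds 1 - delta."""
    return lower_bound > 1 - delta


# Hypothetical trial: 78/100 responders on treatment, 80/100 on control.
m = ratio_measures(78, 100, 80, 100)
rr, rr_low, rr_high = m["RR"]
print(round(rr, 3), noninferior_on_ratio_scale(rr_low, delta=0.20))
```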

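Non-Inferiority Analysis Sets
• An ITT analysis may be anti-conservative when missing data in the control group are handled by assuming “no response”, because this lowers the apparent control response and so makes non-inferiority easier to show.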

Single-Arm Trials
• Requires a well-characterized reference.
• Eligibility criteria and trial intake must match the sources of reference data.
• A larger superiority margin (or smaller inferiority margin) must be specified because of the lack of an internal control.
• Failure-time endpoints are likely not acceptable.
• Endpoint prognosis cannot be apparent at intake.
• Change scores might be a possibility, and they put a burden on the validity of the assessments.
• Likely too many issues to be acceptable to the agency and the scientific community.
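Without an internal control, a single-arm trial compares its observed result directly with the reference. A minimal sketch for a binary response, assuming a historical reference rate and a hypothetical pre-specified superiority margin, tested with a one-sided exact binomial test; the function names and the choice of an exact binomial test are assumptions.

```python
from math import comb


def binomial_upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x, n + 1))


def single_arm_test(successes, n, reference_rate, margin, alpha=0.025):
    """One-sided exact binomial test of H0: p <= reference_rate + margin.

    With no internal control, the observed rate must beat the historical
    reference by a pre-specified (hypothetical) superiority margin.
    """
    p0 = reference_rate + margin
    if not 0 < p0 < 1:
        raise ValueError("reference_rate + margin must lie strictly between 0 and 1")
    p_value = binomial_upper_tail(successes, n, p0)
    return p_value, p_value <= alpha


# Hypothetical: 32 responders among 40 patients, reference rate 0.50, margin 0.15.
print(single_arm_test(32, 40, reference_rate=0.50, margin=0.15))
```

A larger margin raises the bar the observed rate must clear, which is the price of having no internal control.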
