Emerging Issues and Considerations in Clinical Trial Design Strategy for Drug Development*
H.M. James Hung1, Sue-Jane Wang2, Robert O'Neill3
1DB1/OB/OPaSS/CDER/FDA 2DB2/OB/OPaSS/CDER/FDA 3OB/OPaSS/CDER/FDA
Presented at the 11th Annual BASS Symposium, Savannah, Georgia, November 3, 2004
* The views presented here are not necessarily those of the U.S. Food and Drug Administration
Outline
• Background
• Emerging Issues & Needed Considerations
- sample size planning
- dropout and missing value problems
- non-inferiority trials
- adaptive/flexible designs
- design of trials in trials
- pediatric trials
- modeling & simulation and Bayesian methods
• Remarks
Background
Recent statistics indicate that Phase III trials have a high failure rate. The Critical Path Initiative launched by FDA includes searching for ways to enhance the drug development process for obtaining knowledge of drug efficacy and safety. Where can statisticians contribute?
Traditional Clinical Trial Program
• Ultimate goals:
- Obtain two positive trials to demonstrate efficacy of the test drug at some dosing regimen (hopefully indicated for a wide population)
- Predict that it is safe for use.
• Pre-marketing safety data are usually insufficient to demonstrate that the drug at that dosing regimen is safe.
• Post-marketing data might help to suggest whether safety can be a problem.
Common Practice
• Key steps: Learn vs. Confirm
• Perform Phase I/II trials to explore and learn
- little restriction on study design
• Perform Phase III trials to confirm
- simple designs are preferred
- interim looks at the data are discouraged unless necessary
- pre-specification of design elements is highly recommended
- avoid data dredging (selection bias, multiplicity)
- need to control the familywise type I error rate (strong control is often recommended)
However ……
• Phase I/II trials (mostly small) provide limited information for planning Phase III trials
- efficacy variables can be hard to select
- effect sizes can be hard to postulate
- false positives are treated loosely
- little safety data on human use
- little chance to explore the response profiles of dropouts
Phase III trial
• The "educated" guess from Phase I/II data provides little information on many design elements, e.g.,
- what endpoint(s) to choose as primary?
- what dose(s) to study?
- what effect size to detect?
• Little chance to classify whether dropouts are informative, or whether missing values are 'non-ignorable', etc.
• Little room for adaptive inference, e.g., is there an objective rule by which p = 0.10 can be considered 'positive' in some cases?
Emerging Issues & Needed Considerations
Sample Size Planning
What is the (standardized) effect size likely to be? How should the effect size be postulated? Should sample size re-estimation be considered during the course of the trial? If yes, how? Based on a blinded or an unblinded assessment of the effect size?
$y_{1i} \sim N(\mu_1, \sigma^2)$, $i = 1, \ldots, n$
$y_{2i} \sim N(\mu_2, \sigma^2)$, $i = 1, \ldots, n$
$\delta = \mu_1 - \mu_2$, $\sigma^2 = 1$ (assumed for ease of presentation)
Test $H_0: \delta = 0$ vs. $H_1: \delta > 0$ with the two-sample Z statistic.
To detect $\delta = \Delta$ at significance level $\alpha$ and power $1-\beta$, $n = 2(z_\alpha + z_\beta)^2 / \Delta^2$ per group.
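A minimal sketch of this sample-size formula (the function name and defaults are illustrative, not from the slides):

```python
from scipy.stats import norm

def per_group_n(delta, alpha=0.025, power=0.90, sigma=1.0):
    """Per-group n for a 1-sided two-sample Z test:
    n = 2 * sigma^2 * (z_alpha + z_beta)^2 / delta^2."""
    z_alpha = norm.ppf(1 - alpha)
    z_beta = norm.ppf(power)  # z_beta = Phi^{-1}(1 - beta)
    return 2 * (sigma * (z_alpha + z_beta) / delta) ** 2

print(per_group_n(delta=0.3))  # ~233.5, round up to 234 per group
```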
Suppose $\Delta$ is the minimum clinically significant effect. Plan the sample size to detect $\delta = \Delta$ at 1-sided level $\alpha$ and power $1-\beta$. The final analysis gives
d: the effect size estimate
p: the 1-sided p-value associated with d
One can show that $d = [z_p / (z_\alpha + z_\beta)]\,\Delta$.
Ex: $\alpha = 0.025$, $p = \alpha$ $\Rightarrow$ $d = 0.60\,\Delta$ for the study with 90% power; $d = 0.70\,\Delta$ for the study with 80% power.
Hung and O'Neill (2003)
Equivalently, $d \ge \Delta$ requires $p \le \Phi(-z_\alpha - z_\beta)$:

$\alpha$    $1-\beta$    $\Phi(-z_\alpha - z_\beta)$
0.05        0.90         0.0017
0.05        0.80         0.0065
0.025       0.90         0.00059
0.025       0.80         0.0025
0.01        0.90         0.00015
0.01        0.80         0.00077

Hung and O'Neill (2003)
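Both the d/Δ ratio and the tabulated p-value thresholds are easy to reproduce; a minimal sketch (illustrative names):

```python
from scipy.stats import norm

def d_over_delta(p, alpha=0.025, power=0.90):
    """Observed-to-planned effect ratio d/Delta = z_p / (z_alpha + z_beta)."""
    return norm.ppf(1 - p) / (norm.ppf(1 - alpha) + norm.ppf(power))

def p_threshold(alpha=0.025, power=0.90):
    """1-sided p-value needed for d >= Delta: Phi(-(z_alpha + z_beta))."""
    return norm.cdf(-(norm.ppf(1 - alpha) + norm.ppf(power)))

print(d_over_delta(p=0.025))  # ~0.60 (the 90%-power example above)
print(p_threshold())          # ~0.00059 (the alpha = 0.025, 90%-power row)
```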
Q: Is an observed effect that is less than the minimum clinically significant effect acceptable?
Demanding "observed effect no less than the minimum clinically significant effect" means demanding "a 1-sided p-value no larger than 0.00059" in a trial planned to detect this minimum effect with 1-sided alpha of 2.5% and power 90%.
Q: Shouldn't we require that the 95% CI for $\delta$ rule out $\Delta$ when $\Delta$ is the minimum clinically significant effect?
Wang, Hung, O'Neill (2004) investigated the uncertainty in using effect size estimates from a Phase II trial to plan the sample size of a Phase III trial.
- If the point estimate of the effect size is used to size the Phase III trial, the power of the launched Phase III trial is very likely to be low.
- If the worst limit of the 95% CI of the estimated effect size is used, the average power of the Phase III trial can be close to or higher than the targeted power level, but the probability of not launching the Phase III trial can be very high, which is undesirable for an effective treatment.
• Q: What should be used to postulate the effect size?
• Model the effect size as a random variable? (e.g., Lee & Zelen, 2000; Westfall, Krishen, Young, 1998)
[Figure: percentage of trials not launched and power of launched Phase III trials, sized on either the Phase II point estimate or the lower 95% CI limit. Effect sizes are equal to 0.3 for the Phase II and III trials; a trial with estimated effect size < 0.2 is not launched.]
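A rough Monte Carlo sketch of this phenomenon (not the authors' code; the Phase II size of 50 per arm is an assumption, while the 0.3 effect size and the 0.2 launch threshold come from the figure caption):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
true_es, n2, alpha = 0.3, 50, 0.025          # n2 per arm is assumed; sigma = 1
zab = norm.ppf(1 - alpha) + norm.ppf(0.90)   # targeting 90% power
se2 = np.sqrt(2 / n2)                        # SE of the Phase II effect estimate

for label, shrink in [("point estimate", 0.0), ("lower 95% limit", 1.96)]:
    basis = rng.normal(true_es, se2, 50_000) - shrink * se2
    launched = basis >= 0.2                  # launch rule from the figure
    n3 = 2 * (zab / basis[launched]) ** 2    # per-arm Phase III sample size
    power = norm.cdf(true_es * np.sqrt(n3 / 2) - norm.ppf(1 - alpha))
    print(f"{label}: launched {launched.mean():.0%}, avg power {power.mean():.0%}")
```

Sizing on the point estimate launches almost always but underpowers the launched trials; sizing on the lower limit restores average power at the cost of rarely launching.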
Hung and O'Neill (2003)
Given a significant p-value with $\delta = d$ in one trial, the probability of replicating 'P < 0.025' (if $\delta = d$) in another identical study is $\Phi(z_p - z_{0.025})$.

p         prob. of replication
0.025     50%
0.005     73%
0.0005    86%
Given a significant p-value with $\delta = d$ in a study of size N, one can calculate the sample size M needed to have probability $1-\beta$ of replicating 'P < $\alpha$' (if $\delta = d$) in another study:
$M = N\,\{(z_\alpha + z_\beta)/z_p\}^2$
p        $1-\beta$    M/N
0.0005   0.7          0.57
0.0005   0.8          0.72
0.0005   0.9          0.97
0.005    0.7          0.93
0.005    0.8          1.18
0.005    0.9          1.58
0.025    0.7          1.61
0.025    0.8          2.04
0.025    0.9          2.74
($\alpha$ = 0.025)
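Both quantities follow directly from the normal model; a minimal sketch (illustrative names):

```python
from scipy.stats import norm

def prob_replication(p, alpha=0.025):
    """P(next identical study gives P < alpha), taking delta = d as true:
    Phi(z_p - z_alpha)."""
    return norm.cdf(norm.ppf(1 - p) - norm.ppf(1 - alpha))

def size_ratio(p, alpha=0.025, power=0.90):
    """M/N needed for probability 1-beta of replicating 'P < alpha' if
    delta = d: ((z_alpha + z_beta) / z_p)^2."""
    return ((norm.ppf(1 - alpha) + norm.ppf(power)) / norm.ppf(1 - p)) ** 2

print(prob_replication(0.005))        # ~0.73
print(size_ratio(0.005, power=0.90))  # ~1.58
```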
Sample size re-estimation
• Blinded re-estimation
- the EM procedure may have a problem [Friede & Kieser (2002)]
• Assessment based on interim data
- Q: At an interim time of a trial, if the observed treatment difference suggests testing a smaller (but still worthwhile) effect size, can we increase the sample size? Why or why not?
- If yes, what are the best approaches for valid statistical testing? How to handle the logistics?
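One simple blinded approach (a sketch, not the EM procedure the slide flags) is to re-estimate the nuisance variance from the pooled, treatment-blinded interim data and recompute n; the lumped variance overstates $\sigma^2$ by roughly $\delta^2/4$, so the result is slightly conservative:

```python
import numpy as np
from scipy.stats import norm

def blinded_reestimated_n(pooled_values, delta, alpha=0.025, power=0.90):
    """Recompute the per-group n using the blinded (lumped) variance of
    the pooled interim data; no unblinding of treatment codes is needed."""
    s2 = np.var(pooled_values, ddof=1)
    zab = norm.ppf(1 - alpha) + norm.ppf(power)
    return int(np.ceil(2 * s2 * zab ** 2 / delta ** 2))
```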
Dropout and Missing Value Problems
• Pre-specify the primary analysis and sensitivity analyses in anticipation of non-ignorable missing data [O'Neill (2003, 2004)]
- How? At least we need to know what types of missing data are non-ignorable.
• It is difficult to explore whether missing mechanisms are ignorable based on a single trial (each cohort of dropouts by reason or by dropout time is small). This type of exploration should be done with external historical trials for the same drug class and/or the same disease.
[Six slides of illustrative figures provided by Yeh-Fong Chen]
• Seek ways to minimize dropout.
• Seek alternative designs (e.g., enrichment designs*) to narrow the study population when the dropout rate is high and the results of the intent-to-treat analysis may not be interpretable (caveat: generalizability of interpretation).
*Temple (1994-2004)
• Seek a missing-mechanism model to help imputation.
- This requires knowledge of the disease process# (how? clinical trial modeling and simulation? practical experience is needed).
- The model needs to be flexible enough for sensitivity/robustness analysis; a sketch of one such analysis follows below.
- Caveat: such a model is not verifiable.
# Unnebrink and Windeler (2001)
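One common sensitivity-analysis device (a hypothetical sketch, not from the slides) is a delta-adjusted "tipping-point" analysis: impute missing test-arm outcomes under an MAR-style model, shift the imputations by progressively worse offsets, and report where the conclusion changes. Single imputation is used here for brevity; real analyses would use multiple imputation:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

def tipping_point(y_trt, y_ctl, miss_trt, deltas):
    """Impute missing test-arm values from the observed mean/SD, shift
    the imputations down by each delta, and return the p-value per shift."""
    obs = y_trt[~miss_trt]
    pvals = {}
    for d in deltas:
        imp = rng.normal(obs.mean() - d, obs.std(ddof=1), miss_trt.sum())
        pvals[d] = ttest_ind(np.r_[obs, imp], y_ctl).pvalue
    return pvals
```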
Non-inferiority Trial
• The difficulty arises from active controlled trials without a placebo arm.
• Assay sensitivity and the constancy assumption rely totally on subjective judgment to assess; no data can be used to verify these assumptions.
- Study objective of NI testing?
• Show 'not much worse than'?
- needs a clinical significance threshold (margin); subjective judgment is key
• Show 'retaining X% of the control effect'?
- needs X to be specified (also relies on subjective judgment)
• Show efficacy?
- show 'retaining X% of the control effect' to protect against the uncertainty of the constancy assumption
- Medical colleagues need to see a fixed margin.
- Statistical alpha error
The classical alpha error (i.e., from repeating only NI trials infinitely often) cannot quantify the alpha error associated with any assertion referencing placebo, e.g., what is the probability of mistakenly asserting that the test drug is efficacious (relative to placebo)? We don't know.
A cross-trial alpha error (i.e., from repeating the NI trial and the historical trials infinitely often) or another measure of statistical uncertainty for cross-trial statistical inference is required.
Asserting percent retention
• No fixed margin (depending only on historical data) can make the cross-trial alpha level of any statistical test attain exactly the desired level (say, 0.025).
• The fixed margin that makes the statistical test always valid (1-sided cross-trial alpha < 0.025) regardless of the number of deaths (or the sample size) is the one defined by the worst limit of the 95% CI.
• Q: What should the fixed margin be to test for retention of X% of the control's effect?
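A hedged sketch of the worst-limit fixed-margin construction for X% retention (names and example numbers are illustrative; positive effects denote benefit):

```python
from scipy.stats import norm

def fixed_margin(hist_effect, hist_se, retention=0.5, level=0.95):
    """Margin = (1 - X) times the worst (smallest-benefit) limit of the
    level-CI for the historical control-vs-placebo effect."""
    worst_limit = hist_effect - norm.ppf((1 + level) / 2) * hist_se
    return (1 - retention) * max(worst_limit, 0.0)

# e.g., historical control effect 0.20 (SE 0.05), retaining 50%:
print(fixed_margin(0.20, 0.05, retention=0.5))  # ~0.051
```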
• The synthesis test* and ratio test# are very sensitive to the constancy assumption (unverifiable).
- They can be very problematic (when the control's effect is smaller in the NI trial than in the historical trials).
- They cannot be translated into "CI vs. a fixed margin".
• Q: When can the synthesis test or ratio test be used?
- when the historical data about the control are rich and the control effects are 'consistent'? How to define 'consistent'?
*Holmgren (1999); Hasselblad and Kong (2001, retaining 0%); Wang, Hung, Tsong (2002); Wang, Hung (2003); Snapinn (2001, 2004); Hung, Wang, Tsong, Lawrence, O'Neill (2003); Rothmann, Li, Gang, Chi, Temple, Tsou (2003)
#Hasselblad and Kong (2001); Fisher (2004); Wang, Chen, Chi (2004)
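For reference, a minimal sketch of a synthesis-type statistic for X% retention (notation assumed; cf. Holmgren 1999, Hasselblad and Kong 2001). The historical estimate is treated as if it came from the same experiment, which is exactly where the constancy assumption enters:

```python
import numpy as np

def synthesis_z(d_tc, se_tc, d_cp_hist, se_cp_hist, retention=0.5):
    """Z for H0: (theta_T - theta_C) + (1 - X)(theta_C - theta_P) <= 0,
    combining the NI-trial estimate d_tc with the historical
    control-vs-placebo estimate d_cp_hist."""
    num = d_tc + (1 - retention) * d_cp_hist
    den = np.sqrt(se_tc ** 2 + (1 - retention) ** 2 * se_cp_hist ** 2)
    return num / den  # compare with z_alpha; no fixed margin appears
```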
- Level of statistical evidence
Q: For efficacy, shouldn't we ask for at least two NI trials? There is no ethical reason why a second trial cannot be done.
• Non-inferiority testing should be avoided when historical data are thin for estimating the effect of the selected active control, e.g.,
- only one historical trial (variability understated)
- no convincing evidence for the control's effect
• …or when the constancy assumption is very doubtful:
- the control's effect in the historical trial population is suspected to worsen in the non-inferiority trial population
- no data are available to estimate the extent of worsening
Adaptive/Flexible Design
Ex. Surrogate markers are used in early stages to predict the drug effect or to change design elements in later stages, e.g.,
- changing clinical endpoints
- dropping or adding treatment arms
- re-estimating sample size
- enriching the patient population
Clinical benefits are assessed at the end of the trial, whose design specifications may have been modified.
Q: Can clinical endpoint data from exploratory stages be combined with Phase III data in an analysis for confirmatory purposes?
• If yes, how should the analysis proceed?
• A weighted combination Z statistic and the corresponding point estimate and CI can be useful tools (see the sketch below).
- the weight needs to be prespecified
- the concept of an overall type I error can be unclear
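A minimal sketch of such a combination statistic (the weights are illustrative and must be fixed before the confirmatory stage starts):

```python
import numpy as np
from scipy.stats import norm

def combined_z(z1, z2, w1):
    """Prespecified-weight combination Z = w1*Z1 + w2*Z2 with
    w1^2 + w2^2 = 1; standard normal under H0 when the stage-wise
    statistics are independent."""
    w2 = np.sqrt(1 - w1 ** 2)
    return w1 * z1 + w2 * z2

z = combined_z(z1=1.2, z2=2.1, w1=np.sqrt(0.4))  # 40% of the weight^2 on stage 1
print(z, z > norm.ppf(1 - 0.025))                # 1-sided 0.025 test
```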
- Is there an issue with the number of hypotheses explored in Phase II (pertaining to statistical control of the total type I error)?
No (or it is ignored) if Phases II and III are not combined; yes (needing attention) if Phases II and III are combined. This makes little sense. [Hung, Wang, O'Neill (2004)]
How to proceed in logistics?
Design of Trials in Trials
• Ex. A trial is designed to study a mortality endpoint. It is then divided into two subtrials to study a non-mortality endpoint (e.g., quality of life).
- alpha allocation
- replication concept
- complications with interim analysis
- p-value adjustment for the non-mortality endpoint
Moyé and Deswal (2001)
Pediatric Trials
Often one needs to bridge adult trial information to a pediatric trial.
Q: How to design/analyze a bridging clinical trial?
A weighted combination Z test might offer a solution [Lan (2004)].
Without a 'practical' design methodology, people may resort to 'prediction' based on modeling and simulation. This may be too big a leap of faith.
Modeling & Simulation and Bayesian Method
• Modeling & Simulation
Use historical data (for a drug class) and a disease process model (if any) as prior information, with computer tools, to
- explore missing value patterns and mechanisms
- explore the response profiles of dropouts
- explore relationships between clinical endpoints
- explore the impact on statistical error rates of mid-course adaptation of the trial design
- and others…
• Bayesian Method
May potentially help enhance classical trial design by integrating external or historical data to
- postulate the effect size
- compare several competing designs (e.g., adaptive design vs. standard design)
Q: How?
Q: Can a posterior probability be used to describe the likelihood of a particular alternative parameter space (as a supplementary indicator to the p-value)? See the sketch below.
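One way such a posterior probability could be computed, under an assumed conjugate normal model (the prior is illustrative, not a recommendation):

```python
from scipy.stats import norm

def posterior_prob_benefit(d_hat, se, prior_mean=0.0, prior_sd=1.0):
    """P(delta > 0 | data) when delta ~ N(prior_mean, prior_sd^2) and
    d_hat | delta ~ N(delta, se^2) (normal-normal conjugate model)."""
    w = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    post_mean = w * d_hat + (1 - w) * prior_mean
    post_sd = (prior_sd ** -2 + se ** -2) ** -0.5
    return 1 - norm.cdf(0, loc=post_mean, scale=post_sd)

print(posterior_prob_benefit(d_hat=0.25, se=0.10))  # ~0.99
```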
Q: How to assess the frequentist operating characteristics of a Bayesian method?
• The effect parameter and nuisance parameters are random variables.
• Alpha depends on the distributions of the effect parameter and the nuisance parameters.
- Averaging alpha over the parameter space is not sufficient, nor is averaging power.
- We need to control the maximum alpha over the parameter space (see the simulation sketch below).
Q: What is a 'non-informative' prior for use in a confirmatory Phase III trial?
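A hedged simulation sketch of that idea: evaluate the type I error of a posterior-probability decision rule over a grid of nuisance-parameter values at the null boundary and track the maximum (the rule, prior, and grid are all assumptions):

```python
import numpy as np
from scipy.stats import norm

def declares_efficacy(d_hat, se, prior_sd=1.0, cutoff=0.975):
    """Decision rule: posterior P(delta > 0 | data) > cutoff under a
    N(0, prior_sd^2) prior (posterior is normal, so compare z-scores)."""
    w = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    post_mean, post_sd = w * d_hat, (prior_sd ** -2 + se ** -2) ** -0.5
    return post_mean / post_sd > norm.ppf(cutoff)

rng = np.random.default_rng(0)
n, nsim, max_alpha = 100, 200_000, 0.0
for sigma in (0.5, 1.0, 2.0):              # grid over the nuisance parameter
    se = sigma * np.sqrt(2 / n)
    d_hat = rng.normal(0.0, se, nsim)      # delta = 0: boundary of the null
    max_alpha = max(max_alpha, np.mean(declares_efficacy(d_hat, se)))
print(f"max type I error over the grid: {max_alpha:.4f}")
```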
Remarks
• Ethics, cost, and risk considerations impose more limitations on standard trial designs.
• The utility of alternative designs needs to be fully explored in terms of statistical inference and logistics:
- flexible/adaptive designs
- non-inferiority trial designs
- other complex designs (e.g., bridging study designs)
• The utility of external/historical information needs to be fully explored.
- Need to search for ways to better quantify and use statistical information (data mining? meta-analysis?)
- Explore the nature of missing values
- Seek ways to make a better "educated guess" of the effect size and/or other design specifications (e.g., which composite endpoint to choose as primary?)
- Explore the performance of alternative designs versus standard designs
• Logistical concerns: unquantifiable biases can make analysis results uninterpretable.
- They should not become a fear factor impeding the use of alternative designs.
- They need to be laid out, and ways to deal with them properly must be sought.
• Alternative designs demand more careful trial planning, not sloppy planning.
• The efficiency of any trial design should be evaluated based on the entire drug development program, not per trial.
- Traditional statistical efficiency per trial is insufficient.
- Other efficiency criteria exist [Liu (2002-2003), Mehta (2004)].
• Safety evaluation may need a large amount of data, and comparative trials are desired.
Selected References
Hung & O'Neill (2003), Biometrical Journal.
Wang, Hung & O'Neill (2004), ASA Proceedings.
O'Neill (2003, 2004), ASA and DIA talks.
Temple (1994-2004), talks, publications, and lecture notes.
Friede & Kieser (2002), Statistics in Medicine.
Unnebrink & Windeler (2001), Statistics in Medicine.
Lee & Zelen (2000), Statistical Science.
Westfall, Krishen & Young (1998), Statistics in Medicine.
Holmgren (1999), Journal of Biopharmaceutical Statistics.
Hasselblad & Kong (2001), Drug Information Journal.
Wang, Hung & Tsong (2002), Controlled Clinical Trials.
Wang & Hung (2003), Controlled Clinical Trials.
Snapinn (2004), Journal of Biopharmaceutical Statistics.
Hung, Wang, Tsong, Lawrence & O'Neill (2003), Statistics in Medicine.
Rothmann, Li, Gang, Chi, Temple & Tsou (2003), Statistics in Medicine.
Fisher (2004), EXANTA AC meeting.
Wang, Chen & Chi (2004), under review.
Lan (2004), under review.
Liu (2003-2004), ASA talk and other talks.
Mehta (2004), talk at the FDA-MIT Workshop on Adaptive Clinical Trial Design.
Hung, Wang & O'Neill (2004), ASA talk.
Moyé & Deswal (2001), Controlled Clinical Trials.