Non-randomized Medical Device Clinical Studies: A Regulatory Perspective. Sep. 16, 2005. Lilly Yue, Ph.D.*, CDRH, FDA, Rockville, MD 20850. *No official support or endorsement by the Food and Drug Administration of this presentation is intended or should be inferred.
Acknowledgements Thanks to my colleagues in the Cardiovascular and Ophthalmic Devices Branch, CDRH, FDA, for their help with this presentation.
Outline • Types of non-randomized medical device studies • Why non-randomized? • Major concerns with non-randomized studies • Conclusions
Types of Non-randomized Studies • Concurrent 2-arm non-randomized study • One-arm study: • With comparison to a historical control, where patient-level data for the historical control are available and used in the treatment comparison (a pseudo 2-arm comparative study) • With comparison to a fixed target value obtained from multiple historical trials (OPC: objective performance criterion)
Why Non-Randomized? • An RCT is sometimes not ethical or practical. • Sample size determined from a one-sample hypothesis test: a smaller sample size? • May save time and money • "Least Burdensome"? • To keep up with the rapid pace of new technology development.
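The "smaller sample size?" point can be made concrete with the usual normal-approximation formulas. The sketch below is illustrative only: it assumes the same event rate p in every group, a one-sided alpha of 0.05, 80% power, and hypothetical function names; real device studies use their own design assumptions. Under those assumptions, a one-arm comparison against a fixed target needs roughly one quarter of the total patients of a two-arm comparison, because the target is treated as known and only one arm is enrolled.

```python
import math
from statistics import NormalDist


def _z(q: float) -> float:
    """Standard normal quantile."""
    return NormalDist().inv_cdf(q)


def n_one_sample(p: float, delta: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Hypothetical helper: patients needed to detect a difference `delta`
    from a fixed, error-free target rate (one-sided test, normal approximation)."""
    k = (_z(1 - alpha) + _z(power)) ** 2
    return math.ceil(k * p * (1 - p) / delta ** 2)


def n_two_sample_total(p: float, delta: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Hypothetical helper: total patients in two equal arms to detect the same
    difference when both rates are estimated (assumes both arms have rate ~p)."""
    k = (_z(1 - alpha) + _z(power)) ** 2
    per_arm = math.ceil(k * 2 * p * (1 - p) / delta ** 2)
    return 2 * per_arm


print(n_one_sample(0.15, 0.05), n_two_sample_total(0.15, 0.05))
```

For example, with p = 0.15 and a difference of 0.05, `n_one_sample` gives a little over 300 patients, while `n_two_sample_total` gives about four times as many. The saving rests entirely on treating the target as a fixed number with no estimation error, which is exactly what the later OPC slides question.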
Example 1 --- Coronary artery bare metal stent • In 1994, two superiority RCTs compared the novel Palmaz-Schatz (P-S) stent with standard balloon angioplasty • Subsequently, randomized non-inferiority trials compared several new stents with the P-S stent • Over 30 different coronary stents were developed for use in the USA or Europe over the past 10 years • Design changes: minor modifications resulting in small or local effects on patient outcome • Approx. 1.5 million patients per year worldwide undergo catheter-based coronary treatment • Stent life cycle < 2 yrs, about the time required for a randomized trial • Some non-randomized studies were conducted to keep up with the rapid pace of new stent development.
Major Concerns with Non-randomized Studies • In a 2-arm comparative study: • Selection bias and comparability of treatment groups • In a pseudo 2-arm comparative study: • Is the historical active control good? • Bio-creep • Are the test and historical control groups comparable in patient population? • In a one-arm study with an OPC: • Shares all the problems associated with historical controls • Problems with the validity of its determination • Problems with the appropriateness of its use
Comparability of Treatment Groups • In an RCT, we expect all patient covariates, measured or unmeasured, to be balanced between the two treatment groups. • So the two treatment groups are comparable, and the observed treatment difference is an unbiased estimate of the true treatment difference. • None of the advantages provided by randomized trials is available in non-randomized studies. • A potential problem: the two treatment groups were not comparable before the start of treatment, i.e., covariates were imbalanced between the two treatment groups. • So direct treatment comparisons are invalid.
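A small simulation shows why the direct comparison fails. In this hypothetical sketch, a single covariate (called `severity` here, an assumed name) raises the outcome; under randomization a coin flip balances it, while in the non-randomized version sicker patients are steered to treatment A. The true treatment effect is 1.0 in both cases, and the function name is illustrative.

```python
import random


def naive_difference(n: int, randomized: bool, true_effect: float = 1.0, seed: int = 1) -> float:
    """Mean outcome in treatment A minus mean outcome in treatment B, with no adjustment."""
    rng = random.Random(seed)
    group_a, group_b = [], []
    for _ in range(n):
        severity = rng.gauss(0.0, 1.0)        # covariate that also drives the outcome
        if randomized:
            gets_a = rng.random() < 0.5       # coin flip, independent of severity
        else:
            gets_a = severity > 0.0           # sicker patients steered to treatment A
        outcome = 2.0 * severity + (true_effect if gets_a else 0.0) + rng.gauss(0.0, 1.0)
        (group_a if gets_a else group_b).append(outcome)
    return sum(group_a) / len(group_a) - sum(group_b) / len(group_b)


print(naive_difference(20000, randomized=True))   # close to the true effect, 1.0
print(naive_difference(20000, randomized=False))  # far above 1.0: mixed with the severity imbalance
```

With a large n, the randomized difference lands close to 1.0, while the non-randomized difference is several times larger because it mixes the treatment effect with the severity imbalance between the groups.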
Traditional Adjustments for Covariates • Three common methods of adjusting for confounding covariates: • Matching • Subclassification (stratification) • Regression (Covariate) adjustment
Propensity Score Methodology • Replace the collection of confounding covariates with one scalar function of these covariates: the propensity score. [Figure: covariates (Age, Gender, Duration, …) combined into 1 composite covariate, the propensity score, a balancing score]
Propensity Score Methodology (cont.) • Propensity score (PS): conditional probability of receiving treatment A rather than treatment B, given a collection of observed covariates. • Purpose: simultaneously balance many covariates in the two treatment groups and thus reduce bias. • PS construction: multiple logistic regression model based on patient data for all measured covariates and the actual treatment received.
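A minimal sketch of the PS construction step follows. It fits the multiple logistic regression by plain gradient ascent so that it runs with only the Python standard library; in practice a statistical package's logistic regression routine would be used instead. The function names, learning rate, and iteration count are assumptions for illustration, and the covariates are assumed to be roughly standardized so the fixed step size stays stable.

```python
import math


def _sigmoid(eta: float) -> float:
    return 1.0 / (1.0 + math.exp(-eta))


def fit_propensity(X: list[list[float]], treated: list[int],
                   lr: float = 0.2, iters: int = 5000) -> list[float]:
    """Fit logit P(treatment A | x) = b0 + b1*x1 + ... + bp*xp by gradient ascent
    on the average log-likelihood. Assumes roughly standardized covariates."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)                   # beta[0] is the intercept
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for xi, ti in zip(X, treated):
            eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            resid = ti - _sigmoid(eta)       # observed minus fitted probability
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X: list[list[float]], beta: list[float]) -> list[float]:
    """Estimated probability of receiving treatment A for each patient."""
    return [_sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi))) for xi in X]
```

Each patient's score is the fitted probability of receiving treatment A given his or her covariates; these scores are what the matching, stratification, and regression adjustments operate on.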
Properties of propensity scores • Patients with the same propensity score are equally likely to have been assigned to treatment A. • Within a group of patients with the same propensity score, e.g., 0.7, some patients actually received treatment A and some received treatment B, just as if they had been randomly allocated to whichever treatment they actually received: "randomized after the fact." [Figure: patients with PS = 0.7 split between Trt A and Trt B]
When the propensity scores are balanced across the two treatment groups, the distributions of all the observed covariates included in the PS model are balanced in expectation across the two groups. • Use the propensity scores as a diagnostic tool to measure treatment group comparability. • If the two treatment groups overlap well enough in terms of propensity scores, compare the two treatment groups adjusting for the PS. • Compare treatments adjusting for propensity score by: • Matching • Subclassification (stratification) • Regression (covariate) adjustment
Stratification • All patients are sorted by propensity score. • Divide them into equal-sized subclasses (e.g., five). • Compare the two treatments within each subclass, as in a randomized trial; then estimate the overall treatment effect as a weighted average. • This is intended to use all patients. • But if the trial size is small, some subclasses may contain patients from only one treatment group. [Figure: patients sorted by PS and divided into subclasses 1 to 5]
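The stratification steps above can be sketched as follows, assuming a continuous or 0/1 outcome and weighting each subclass by its total size (one common choice; other weights are possible). The function name is hypothetical. Rather than silently dropping a subclass that contains only one treatment group, the sketch raises an error, since that is exactly the small-trial problem noted in the last bullet.

```python
def stratified_effect(ps: list[float], treated: list[bool], outcome: list[float],
                      n_strata: int = 5) -> float:
    """Sort by PS, cut into equal-sized subclasses, take the treatment difference
    within each subclass, and return the size-weighted average difference."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    n = len(order)
    weighted_sum, total_size = 0.0, 0
    for k in range(n_strata):
        idx = order[k * n // n_strata:(k + 1) * n // n_strata]
        a = [outcome[i] for i in idx if treated[i]]
        b = [outcome[i] for i in idx if not treated[i]]
        if not a or not b:
            raise ValueError(f"subclass {k + 1} contains only one treatment group")
        # Within a subclass, compare the arms as in a randomized trial
        weighted_sum += len(idx) * (sum(a) / len(a) - sum(b) / len(b))
        total_size += len(idx)
    return weighted_sum / total_size
```

Weighting by subclass size targets the average effect over all patients; weighting by the number of treated patients per subclass would instead target the effect among those who received treatment A.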
Example 2 • New vs. control in a non-randomized study • Primary endpoint: MACE incidence rate at 6 months after treatment • Non-inferiority margin: 7% in this study • Sample size: new: 290, control: 560 • 14 covariates were considered.
Table. Distribution of patients across five strata
Subclass | Control | New | Total
1 | 142 | 28 | 170
2 | 127 | 43 | 170
3 | 122 | 48 | 170
4 | 119 | 51 | 170
5 | 50 | 120 | 170
Total | 560 | 290 | 850
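Example 2 states the non-inferiority margin (7%) but not the analysis. One possible analysis is sketched here under stated assumptions: a hypothetical function name, subclass weights proportional to subclass size, a normal approximation for each subclass's difference in MACE rates, and a one-sided 2.5% level. It combines the within-subclass differences across the five strata and declares non-inferiority only if the upper confidence bound for (new minus control) is below the margin.

```python
import math
from statistics import NormalDist


def stratified_ni(strata: list[tuple[int, int, int, int]], margin: float,
                  alpha: float = 0.025) -> tuple[float, float, bool]:
    """strata: (events_new, n_new, events_ctl, n_ctl) for each PS subclass.
    Returns (weighted difference new - control, upper bound, non-inferior?)."""
    total = sum(n1 + n0 for _, n1, _, n0 in strata)
    diff = var = 0.0
    for e1, n1, e0, n0 in strata:
        w = (n1 + n0) / total                     # weight by subclass size
        p1, p0 = e1 / n1, e0 / n0
        diff += w * (p1 - p0)
        var += w ** 2 * (p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    upper = diff + NormalDist().inv_cdf(1 - alpha) * math.sqrt(var)
    return diff, upper, upper < margin            # MACE: higher is worse
```

With the subclass sizes in the table, the small new-device counts in subclasses 1 to 4 and the small control count in subclass 5 would inflate the variance, widening the bound that has to clear the 7% margin.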
Baseline covariate balance check before and after PS stratification adjustment
Covariate | Mean (New) | Mean (Control) | p-value (Before) | p-value (After)
Mi | 0.25 | 0.40 | <.0001 | 0.4645
Diab | 0.28 | 0.21 | 0.0421 | 0.8608
CCS | 2.41 | 2.75 | 0.0003 | 0.3096
Lesleng | 11.02 | 12.16 | <.0001 | 0.5008
Preref | 3.00 | 3.08 | 0.0202 | 0.2556
Presten | 62.75 | 66.81 | <.0001 | 0.4053
[Figure: Diagnostic check for covariate balance: percentage of patients with prior MI]
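The balance table reports p-values, which also depend on sample size. A commonly used alternative diagnostic, swapped in here and not used in the presentation, is the standardized difference. The sketch assumes that the `Mi` row is the proportion of patients with prior MI (the mean of a 0/1 indicator); the continuous rows cannot be standardized from the table because their standard deviations are not shown. The function name is hypothetical.

```python
import math


def std_diff_binary(p_new: float, p_ctl: float) -> float:
    """Standardized difference for a binary covariate, in pooled-SD units."""
    pooled_sd = math.sqrt((p_new * (1 - p_new) + p_ctl * (1 - p_ctl)) / 2)
    return (p_new - p_ctl) / pooled_sd


print(round(std_diff_binary(0.25, 0.40), 2))  # prior MI, before adjustment
```

`std_diff_binary(0.25, 0.40)` is about -0.32 before adjustment. An absolute value above 0.1 is often taken as a sign of meaningful imbalance, though that threshold is a convention rather than a rule.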
Example 3 • Non-concurrent, two-arm, multi-center study • Control: medical treatment without device, N = 65, collected from hospital records • Treatment: Device A, N = 130 • Primary effectiveness endpoint: treatment success • Hypothesis testing: superiority in success rate • 20 imbalanced, clinically important baseline covariates, e.g., prior cardiac surgery • 22% of patients with missing baseline covariate values
Two treatment groups are not comparable • Imbalance in multiple baseline covariates • Imbalance in the time of enrollment • So any direct treatment comparisons on the effectiveness endpoint are inappropriate. • And p-values from direct treatment comparisons are uninterpretable. • What about treatment comparisons adjusting for the imbalanced covariates? • Traditional covariate analysis • Propensity score analysis
Performed propensity score (PS) analysis • Handled missing values: • MI (multiple imputation): generate multiple data sets for PS analysis • Generate one data set: generalized PS analysis • Others • Included all statistically significant and/or clinically important baseline covariates in the PS model. • Checked comparability of the two treatment groups through the estimated propensity score distributions. • Found that the two treatment groups did not overlap well.
Patients in Propensity Score Quintiles (w/ time)
Group | Q1 | Q2 | Q3 | Q4 | Q5 | Total
Ctl | 38 (58%) | 18 (28%) | 8 (12%) | 1 (2%) | 0 (0%) | 65
Trt | 1 (1%) | 21 (16%) | 31 (24%) | 38 (29%) | 39 (30%) | 130
Patients in Propensity Score Quintiles (w/o time)
Group | Q1 | Q2 | Q3 | Q4 | Q5 | Total
Ctl | 29 (45%) | 24 (37%) | 8 (12%) | 4 (6%) | 0 (0%) | 65
Trt | 10 (8%) | 14 (11%) | 32 (24%) | 35 (27%) | 39 (30%) | 130
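The overlap check in the two quintile tables can be automated. The sketch below takes the per-quintile counts for each group and flags quintiles where either group has fewer than a minimum number of patients; the threshold of 5 and the function name are hypothetical choices, not criteria from the presentation.

```python
def check_overlap(ctl_counts: list[int], trt_counts: list[int],
                  min_per_group: int = 5) -> list[int]:
    """Return the 1-based PS quintiles in which either group has fewer than
    `min_per_group` patients, i.e., too few for a within-quintile comparison."""
    flagged = []
    for q, (c, t) in enumerate(zip(ctl_counts, trt_counts), start=1):
        if min(c, t) < min_per_group:
            flagged.append(q)
    return flagged


print(check_overlap([38, 18, 8, 1, 0], [1, 21, 31, 38, 39]))    # model with time
print(check_overlap([29, 24, 8, 4, 0], [10, 14, 32, 35, 39]))   # model without time
```

Applied to the counts above, it flags quintiles 1, 4, and 5 in the model with time and quintiles 4 and 5 in the model without time; in both, quintile 5 contains no control patients at all.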
Conclusion: • The two treatment groups did not overlap enough to allow a sensible treatment comparison. • So, any treatment comparisons adjusting for imbalanced covariates are problematic. • Question: Given that the two treatment groups are not comparable, what can we do NOW?
Risks and Dangers of Non-randomized Studies • A study with a historical control may turn out to be much riskier and potentially more burdensome than an RCT. • It may be impossible to predict in advance whether the patient population receiving the new treatment is comparable to the population of the historical control. • The sponsor must have legal access to the historical data at the patient level, and all the relevant baseline covariates must have been measured in both groups.
One-arm study with OPC • OPC: Objective Performance Criterion • Introduced to the FDA approx. 10 yrs ago, in the evaluation of prosthetic heart valves • Compared a new heart valve against a fixed number, e.g., a complication rate, obtained by outside experts from multiple approved heart valve trials • Data and guidance for the OPC are in the public domain • Now used for some coronary artery stents, e.g., Ho: 6-mo. MACE rate ≥ point estimate + delta vs. Ha: 6-mo. MACE rate < point estimate + delta
Delta: often a clinical call; no FDA guidance • Point estimate: often the estimated mean of the outcome, but currently there is no universally accepted way to estimate it • OPC: defined by some as the point estimate and by others as the point estimate + delta • "One-sample OPC equivalence," as stated by some. Question: equivalent to what? To a fixed number!? • "One-sample OPC equivalence" is an inappropriate claim!
Problems with OPC: • Limited good historical data available for the development of the OPC • Disregards the variability associated with the estimate from historical studies • Bio-creep problem • Time sensitive • Patient population sensitive • Who is responsible for developing the OPC for a particular device? • Who is responsible for checking whether the OPC developed is appropriate? • Who is responsible for updating an existing OPC?
Example 4 • Primary effectiveness endpoint: acute procedure success • Evaluated for the entire SVT patient population: SVT = (AVNRT, AVRT, AF) • OPC = 85% • Hypotheses: • Study results: N = 200, # of successes = 164; observed success rate = 82% (< 85%); C.I.: (76%, 87%) • OPC was not met!
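The slide does not say which interval method was used. The Wilson score interval, sketched below as an assumption, reproduces the reported (76%, 87%) after rounding. The helper `opc_met` encodes an assumed decision rule for a success-type endpoint (the lower confidence limit must exceed the OPC); both function names are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_ci(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2))
    return center - half, center + half


def opc_met(successes: int, n: int, opc: float, level: float = 0.95) -> bool:
    """Assumed rule for a success endpoint: the lower limit must exceed the OPC."""
    lower, _ = wilson_ci(successes, n, level)
    return lower > opc


print(wilson_ci(164, 200), opc_met(164, 200, 0.85))
```

`wilson_ci(164, 200)` returns about (0.761, 0.867), so `opc_met(164, 200, 0.85)` is `False`; here even the point estimate of 82% is below the 85% OPC.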
Post-hoc Subgroup Analysis and Claim: Acute Success (OPC: 85%) [Table: acute success by subgroup]
One of the major problems with the post-hoc analysis and claim: • The OPC of 85% was developed for the entire SVT population, not for a particular patient subpopulation • In fact, an OPC for AVNRT, if one existed, should be much higher than 85%
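Example 5. Weighted OPC • Primary endpoint: one-year adverse event rate • Patient population: co-morbid group & anatomic group • Expected event rate: co-morbid: 14%, anatomic: 11% • A common delta: 3% • Individual OPC: co-morbid: 17%, anatomic: 14% • Weighted OPC: OPC_W = w1 × 17% + w2 × 14%, where w1 = n1/(n1 + n2) and w2 = n2/(n1 + n2) • n1 and n2 were the numbers of patients actually enrolled • Hypotheses: Ho: overall one-year adverse event rate ≥ OPC_W vs. Ha: overall one-year adverse event rate < OPC_W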
Problems: • N = n1 + n2 was fixed in the protocol, but n1 and n2, and hence w1 and w2, were not. • So the weighted OPC is a random variable, e.g., • If w1 = w2 = 50%, then OPC_W = 15.5% • If w1 = 70%, w2 = 30%, then OPC_W = 16.1% • The setting of the hypotheses is inappropriate. • The weighted OPC makes the study subject to questionable manipulation.
Study Result: • Enrolled patients: co-morbid: 264 (88%); anatomic: 36 (12%); total: 300 (100%) • Post-hoc determined weighted OPC = 88% × 17% + 12% × 14% = 16.6% • Observed event rate: co-morbid: 8% < anatomic: 16% (expected event rate: co-morbid: 14% > anatomic: 11%) • Overall observed event rate: 88% × 8% + 12% × 16% = 9% • Reject Ho
What if enrollment had been co-morbid: 12%, anatomic: 88% (with the same subgroup event rates)? • Then the post-hoc determined weighted OPC = 12% × 17% + 88% × 14% = 14.4% • Overall observed event rate: 12% × 8% + 88% × 16% = 15% (> 14.4%): can't reject Ho! • The weighted OPC makes the study subject to questionable manipulation.
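The arithmetic on the last two slides is easy to reproduce, and doing so shows how the enrollment mix alone moves the target. The sketch assumes the individual OPCs (17% and 14%) and the observed subgroup event rates (8% and 16%) from Example 5; the function names are hypothetical, and only point estimates are compared, not a formal test.

```python
def weighted_opc(n1: int, n2: int, opc1: float = 0.17, opc2: float = 0.14) -> float:
    """OPC_W = w1*OPC1 + w2*OPC2, with weights from the numbers actually enrolled."""
    w1 = n1 / (n1 + n2)
    return w1 * opc1 + (1 - w1) * opc2


def overall_rate(n1: int, n2: int, rate1: float, rate2: float) -> float:
    """Enrollment-weighted overall event rate from the two subgroup rates."""
    w1 = n1 / (n1 + n2)
    return w1 * rate1 + (1 - w1) * rate2


# Actual enrollment (264 co-morbid, 36 anatomic): OPC_W ≈ 0.166, overall rate ≈ 0.090
print(round(weighted_opc(264, 36), 3), round(overall_rate(264, 36, 0.08, 0.16), 3))
# Reversed mix (36 co-morbid, 264 anatomic), same subgroup rates: ≈ 0.144 vs. ≈ 0.150
print(round(weighted_opc(36, 264), 3), round(overall_rate(36, 264, 0.08, 0.16), 3))
```

The subgroup results are identical in both runs; only the enrollment mix changes, yet the observed rate moves from well below the weighted OPC to above it.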
Test statistic: • If n1 and n2 are treated as fixed, the calculated C.I. will be narrower than it should be. • What if w1 and w2 are pre-specified in the protocol? • The study should then comply with the protocol • If w1 and w2 are not achieved in the actual enrollment, a protocol deviation has occurred.
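The direction of the C.I. problem can be sketched for the overall observed event rate alone (not the full test statistic against a random OPC). If the subgroup sizes are treated as fixed, the variance is the weighted sum of the within-subgroup binomial variances; if each enrolled patient's subgroup is itself random, the law of total variance adds the spread of the subgroup rates around the overall rate, so the fixed-size standard error is never larger. The function names are hypothetical; the example values are the Example 5 rates and enrollment mix.

```python
import math


def se_fixed_mix(n: int, w1: float, p1: float, p2: float) -> float:
    """SE of the overall event rate treating n1 = w1*n and n2 = (1 - w1)*n as fixed."""
    return math.sqrt((w1 * p1 * (1 - p1) + (1 - w1) * p2 * (1 - p2)) / n)


def se_random_mix(n: int, w1: float, p1: float, p2: float) -> float:
    """SE when each patient falls in subgroup 1 with probability w1 (sizes random):
    each patient is then an event with probability p_bar = w1*p1 + (1 - w1)*p2."""
    p_bar = w1 * p1 + (1 - w1) * p2
    return math.sqrt(p_bar * (1 - p_bar) / n)


print(se_fixed_mix(300, 0.88, 0.08, 0.16), se_random_mix(300, 0.88, 0.08, 0.16))
```

With N = 300, w1 = 0.88, and subgroup rates of 8% and 16%, the two standard errors differ only slightly (about 0.0164 vs. 0.0165). The gap, w1 × w2 × (p1 - p2)² / N in variance terms, grows as the subgroup rates move further apart or as the weights approach 50/50.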
Conclusions • Select comparable control prospectively! • Bio-creep problem should be avoided. • OPC should be determined by sufficient solid scientific evidence. • Variability associated with the point estimate from historical studies should be incorporated in the determination of OPC. • OPC should be appropriately adjusted for different patient populations, and different indications for use. • OPC would need to be updated constantly. • RCT is still the gold standard for clinical studies. • RCT should be preserved for new technology!