Two-stage individual participant data meta-analysis and flexible forest plots
David Fisher
MRC Clinical Trials Unit Hub for Trials Methodology Research at UCL
df@ctu.mrc.ac.uk
2013 UK Stata Users Group Meeting, Cass Business School, London
Outline of presentation
• Introduction to individual patient data (IPD) meta-analysis (MA)
  • IPD vs aggregate-data (AD) MA
  • “One-stage” vs “two-stage” IPD MA
• The ipdmetan command
  • Basic use; comparison with metan
  • Covariate interactions
  • Combining AD with IPD
  • Advanced syntax
• The forestplot command
  • Interface with ipdmetan
  • Stand-alone use and “stacking”
• Summary and Conclusion
Introduction to IPD meta-analysis
• Meta-analysis (MA):
  • Use statistical methods to combine results of “similar” trials to give a single estimate of effect
  • Increase power & precision
  • Assess whether treatment effects are similar across trials (heterogeneity)
• Aggregate data (AD) vs IPD:
  • “Traditional” MAs gather results from publications
    • Aggregated across all patients in the trial; nothing is known of individual patients
  • IPD MAs gather raw data from trial investigators
    • Ensures all relevant patients are included
    • Ensures similar analysis across all trials
    • Allows more complex analysis, e.g. patient-level interactions
“One-stage” IPD MA
• Consider a linear regression (extension to GLMs or time-to-event regressions is straightforward)
• For a one-stage IPD MA (i = trial, j = patient):
  $y_{ij} = \alpha_i + (\beta + u_i)\,x_{ij} + \varepsilon_{ij}$
  where $\alpha_i$ = trial identifiers (trial-specific intercepts) and $\beta$ = overall treatment effect estimated across all trials i (with optional random effect $u_i$)
• Examples in Stata:
  • Fixed effects: regress y x i.trial
  • Random effects: xtmixed y x i.trial || trial: x, nocons
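“Two-stage” IPD MA
• For a two-stage IPD MA:
  $y_{1j} = \alpha_1 + \beta_1 x_{1j} + \varepsilon_{1j}$ for trial 1
  …
  $y_{ij} = \alpha_i + \beta_i x_{ij} + \varepsilon_{ij}$ for trial i
• Then: $\hat{\beta} = \sum_i w_i \hat{\beta}_i / \sum_i w_i$ and $\mathrm{SE}(\hat{\beta}) = 1/\sqrt{\sum_i w_i}$, where $w_i = 1/\mathrm{SE}(\hat{\beta}_i)^2$
• Weights $w_i$ may be altered to give random effects
  • e.g. DerSimonian & Laird, $w_i^* = 1/(\mathrm{SE}(\hat{\beta}_i)^2 + \hat{\tau}^2)$
• Straightforward, but currently messy in Stata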
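The two stages can be carried out by hand with official Stata commands, which illustrates why the process is workable but fiddly. What follows is a minimal sketch on simulated data, not the ipdmetan implementation: the variable names (trial, x, y), the seed, and the simulated effect sizes are all assumptions made for illustration.

```stata
* Simulate a small IPD dataset: 6 trials of 100 patients each (illustrative values)
clear
set seed 12345
set obs 600
generate trial = ceil(_n/100)
generate x = runiform() < 0.5
generate y = 0.5*trial + 0.3*x + rnormal()

* Stage 1: fit the same model within each trial; keep coefficient and SE of x
statsby b=_b[x] se=_se[x], by(trial) clear: regress y x

* Stage 2 (fixed effect): inverse-variance weighted average of the trial estimates
generate double w  = 1/se^2
generate double wb = w*b
quietly summarize w
scalar sum_w = r(sum)
quietly summarize wb
scalar b_fe  = r(sum)/scalar(sum_w)
scalar se_fe = 1/sqrt(scalar(sum_w))

* DerSimonian & Laird: moment estimate of tau^2 from Cochran's Q, then re-weight
generate double q_i = w*(b - scalar(b_fe))^2
quietly summarize q_i
scalar Q = r(sum)
generate double w2 = w^2
quietly summarize w2
scalar tau2 = max(0, (scalar(Q) - (_N - 1)) / (scalar(sum_w) - r(sum)/scalar(sum_w)))
generate double w_re  = 1/(se^2 + scalar(tau2))
generate double wb_re = w_re*b
quietly summarize w_re
scalar sum_w_re = r(sum)
quietly summarize wb_re
scalar b_re  = r(sum)/scalar(sum_w_re)
scalar se_re = 1/sqrt(scalar(sum_w_re))

display "Fixed effect:   b = " %6.3f scalar(b_fe) "  SE = " %6.3f scalar(se_fe)
display "Random effects: b = " %6.3f scalar(b_re) "  SE = " %6.3f scalar(se_re) "  tau2 = " %6.4f scalar(tau2)
```

Every step after statsby is bookkeeping on a six-row dataset of trial estimates; this is the part a dedicated two-stage command would be expected to automate.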
Treatment-covariate interactions
• Assessment of patient-level covariate interactions is a great advantage of IPD
• Arguably best done with “one-stage”
  • Main effects & interactions (& correlations) estimated simultaneously
• But basic analysis also possible with “two-stage”
  • Relative effect (interaction coefficient) only
  • Same approach (inverse-variance) as for main effects
  • Ensures no estimation bias from between-trial effects
  • Can be presented in a forest plot, with assessment of heterogeneity etc.
  • Discussed in a published paper (Fisher 2011)
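The two-stage treatment of an interaction can be written out as a short derivation. The notation below (z for the patient-level covariate, γ for the interaction coefficient, δ for the covariate main effect) is chosen here for illustration and is not taken from the slides.

```latex
% Stage 1: within each trial i, fit a model with a treatment-by-covariate interaction
y_{ij} = \alpha_i + \beta_i x_{ij} + \delta_i z_{ij} + \gamma_i x_{ij} z_{ij} + \varepsilon_{ij}

% Stage 2: pool only the interaction coefficients by inverse-variance weighting
\hat{\gamma} = \frac{\sum_i w_i \hat{\gamma}_i}{\sum_i w_i},
\qquad
\mathrm{SE}(\hat{\gamma}) = \frac{1}{\sqrt{\sum_i w_i}},
\qquad
w_i = \frac{1}{\mathrm{SE}(\hat{\gamma}_i)^2}
```

Because each estimate of γ_i comes entirely from within trial i, differences in average covariate values between trials do not enter the pooled estimate, which is the sense in which between-trial effects cannot bias it.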
Example data
• IPD MA of randomised trials of post-operative radiotherapy (PORT) in non-small cell lung cancer
  • Trial ID (k=11)
  • Patient ID (n=2343)
  • Treatment arm
• Outcome is censored time to overall survival (death from any cause)
  • Time to event (from randomisation)
  • Event type (death or censoring)
• Certain covariate measurements also available, not necessarily for all trials or patients
  • Disease stage (factor, but treated as continuous)
  • (+ others)
ipdmetan syntax
• Uses “prefix” command syntax:
  ipdmetan [exp_list], study(study_ID) [ipd_options ad(aggregate_data_options) forestplot(forest_plot_options)] : estimation_command ...
  • ipdmetan options go after the comma, before the colon
  • estimation_command and its options go after the colon
  • default is to pool the coefficient of the first covariate in the model (excluding baseline factor levels)
• Example:
  ipdmetan, study(trialid) eform : stcox arm, strata(sex)
Trials included: 11
Patients included: 2342

Meta-analysis pooling of main (treatment) effect estimate arm
using Fixed-effects

--------------------------------------------------------------------
trial reference       |
number                |    Effect    [95% Conf. Interval]   % Weight
----------------------+---------------------------------------------
belgium               |     1.456      1.072      1.979      11.09
EORTC 08861           |     1.643      0.913      2.956       3.02
LILLE                 |     1.568      1.060      2.319       6.81
...                   |       ...        ...        ...        ...
----------------------+---------------------------------------------
Overall effect        |     1.178      1.064      1.305     100.00
--------------------------------------------------------------------

Test of overall effect = 1:  z = 3.153  p = 0.002

Heterogeneity Measures
---------------------------------------------------
               |      value      df     p-value
---------------+-----------------------------------
Cochrane Q     |      15.88      10      0.103
I² (%)         |      37.0%
Modified H²    |      0.588
tau²           |     0.0180
---------------------------------------------------

I² = between-study variance (tau²) as a percentage of total variance
Modified H² = ratio of tau² to typical within-study variance

• “trial reference number” is the variable label of the study variable
• Output style similar to metan or metaan
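The heterogeneity figures in this output are mutually consistent with the usual Q-based definitions, as the arithmetic below shows. The formulas are the standard ones and are assumed, not confirmed from the ipdmetan source, to be what the command computes.

```latex
% Values taken from the output: Q = 15.88 on df = 10
I^2 = \frac{Q - df}{Q} = \frac{15.88 - 10}{15.88} \approx 0.370 \quad (37.0\%)

H^2_M = \frac{Q - df}{df} = \frac{15.88 - 10}{10} = 0.588

% The two are linked, in line with the footnote definitions in the output
I^2 = \frac{H^2_M}{1 + H^2_M} = \frac{0.588}{1.588} \approx 0.370
```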
Forest plot of covariate interactions
  ipdmetan, study(trialid) eform interaction keepall : stcox arm##c.stage

  Trials included: 8
  Patients included: 1962
  Meta-analysis pooling of interaction effect estimate 1.arm#c.stage2
  using Fixed-effects
• default is to pool coeffs from first interaction term
Inclusion of aggregate data
• I don’t have a separate aggregate dataset, so I will create one artificially from my IPD dataset
  . ** Generate artificial trial subgrouping
  . gen subgroup = inlist(trialid, 1, 8, 12, 15)
  . label define subgroup_ 0 "Trial group 1" 1 "Trial group 2"
  . label values subgroup subgroup_
  . ** Run ipdmetan within one of the subgroups; save the dataset
  . qui ipdmetan, study(trialid) by(subgroup) nooverall nograph saving(subgroup1.dta) : stcox arm if subgroup==1, strata(sex)
Inclusion of aggregate data: Syntax
  . ipdmetan, study(trialid) eform nooverall ad(subgroup1.dta, byad) : stcox arm if subgroup==0, strata(sex)
• nooverall: do not pool IPD and aggregate together
• ad(subgroup1.dta, byad): aggregate data syntax; “byad” = treat IPD & aggregate data as subgroups
• stcox arm if subgroup==0, strata(sex): estimation_command
Inclusion of aggregate data: Screen output

Trials included from IPD: 7
Patients included: 1333

Trials included from aggregate data: 4
Patients included: 1009

Pooling of main (treatment) effect estimate arm
using Fixed-effects

-------------------------------------------------------------------
trial reference      |
number               |    Effect    [95% Conf. Interval]   % Weight
---------------------+---------------------------------------------
IPD                  |
  LCSG 773           |     1.123      0.827      1.526      11.13
  CAMS               |     1.029      0.768      1.378      12.20
  ...                |       ...
  Subgroup effect    |     1.021      0.896      1.163      61.25
---------------------+---------------------------------------------
Aggregate            |
  belgium            |     1.456      1.072      1.979      11.09
  EORTC 08861        |     1.643      0.913      2.956       3.02
  ...                |       ...
  Subgroup effect    |     1.479      1.256      1.743      38.75
-------------------------------------------------------------------

Tests of effect size = 1:
IPD          z = 0.305   p = 0.760
Aggregate    z = 4.682   p = 0.000
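The subgroup z statistics and percentage weights can be approximately recovered from the displayed hazard ratios and confidence limits alone. The sketch below assumes the intervals are symmetric 95% Wald intervals on the log scale; the variable names (source, b, se, pct_weight) are illustrative and do not come from ipdmetan.

```stata
* Subgroup rows as printed in the screen output: hazard ratio and 95% CI limits
clear
input str9 source double hr double lci double uci
"IPD"       1.021 0.896 1.163
"Aggregate" 1.479 1.256 1.743
end

* Work on the log scale, where the CI is symmetric about the estimate
generate double b  = ln(hr)
generate double se = (ln(uci) - ln(lci)) / (2*invnormal(0.975))
generate double z  = b/se
generate double p  = 2*normal(-abs(z))

* Inverse-variance weights, as a share of the two subgroups combined
generate double w = 1/se^2
quietly summarize w
generate double pct_weight = 100*w/r(sum)

list source hr z p pct_weight, noobs
```

Small discrepancies from the printed z statistics and weights are to be expected, because the inputs here are the three-decimal figures shown on screen and not the unrounded estimates.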
Advanced syntax example: non “e-class” estimation command
  ipdmetan (u[1,1]/V[1,1]) (1/sqrt(V[1,1])), study(trialid) eform ///
      ad(subgroup1.dta, byad) ///
      lcols(evrate=_d %3.2f "Event rate") ///
      rcols(u[1,1] %5.2f "o-E(o)" V[1,1] %5.1f "V(o)") ///
      forest(nooverall nostats nowt) ///
      : sts test arm if subgroup==0, mat(u V)
• (u[1,1]/V[1,1]) (1/sqrt(V[1,1])): effect estimate & SE are not available from e(b), so must be specified manually
Advanced syntax example: columns of data in forestplot
  ipdmetan (u[1,1]/V[1,1]) (1/sqrt(V[1,1])), study(trialid) eform ///
      ad(subgroup1.dta, byad) ///
      lcols(evrate=_d %3.2f "Event rate") ///
      rcols(u[1,1] %5.2f "o-E(o)" V[1,1] %5.1f "V(o)") ///
      forest(nooverall nostats nowt) ///
      : sts test arm if subgroup==0, mat(u V)
• lcols(evrate=_d ...): mean of var currently in memory (note user-assigned name, to match with varname in aggregate dataset)
• rcols(u[1,1] ... V[1,1] ...): collect lists of returned stats
Advanced syntax example: Forest plot
[Figure: forest plot produced by the advanced syntax example]
• These vars do not appear in the aggregate dataset, so are not plotted
• Subtotal cannot be calculated for aggregate data
The forestplot command
• Does not perform any calculations/estimations; simply plots existing data as a forest plot
• Overall/subgroup estimates, spacings, labels, text columns etc. need to be created/arranged in advance
  • Ordering & spacing; marking of subgroup/overall estimates for plotting “diamonds”: _use
  • Principal left-hand data column (study IDs, heterogeneity etc.; string format): _labels
• This setup is done automatically by ipdmetan before passing to forestplot
  • (but can also be done manually by user)
• Multiple datasets can be passed to forestplot at once to create a single large “stacked” plot on a common x-axis
forestplot syntax
  forestplot [varlist] [if] [in] [, plot_options graph_options using_option]
• varlist = manually specify varnames to plot
• plot_options control the data plotting (within plot region)
• graph_options control the surroundings (outside plot region; graph region)
• using_option represents one or more options that allow suitable datasets (or parts of datasets) to be fed to forestplot, possibly with different plot_options, to form a single large forest plot on a single x-axis
using_option syntax
  using(filenamelist [if] [in] [, plot_options]) [using(filenamelist [if] [in] [, plot_options]) ...]
• filenamelist is a list of one or more Stata-format datasets
  • parts may be specified with [if] [in]
  • same filename can appear more than once
  • order of filenames determines placement in graph
• Different plot_options may be specified for each using option
  • To apply the same options to multiple files, list the files together in one filenamelist
  • To apply different options to each file, place each file in a different using option
plot_options syntax
• Based on metan syntax, options refer to different parts of the forest plot
• Most options appropriate to the underlying twoway plot type are acceptable, with some exceptions
Example forestplot dataset (“resultsset” from last ipdmetan example)
[Figure: listing of the saved dataset]
• Estimates; CIs; weights
• Extra data columns
“Stacking” of forest plots
• Imagine:
  • dataset on previous slide is saved as ipdtest.dta
  • we want IPD boxes to be red, and AD boxes to be green
• We proceed as follows:
  • Run forestplot with two using(...) options, one for each part of the plot, with the same filename
  • (Alternatively: run ipdmetan twice and save under different filenames)
  • Specify our desired plot_options as suboptions to using()
  forestplot, using(ipdtest.dta if _by==1, boxopt(mcolor(red))) ///
      using(ipdtest.dta if _by==2, boxopt(mcolor(green))) ///
      lcols(evrate) rcols(u_1_1_ V_1_1_) nooverall nostats nowt
Summary and conclusion
• IPD is increasingly used, and its advantages widely accepted
• Many meta-analysts use two-stage models for analysing IPD
• Currently only AD MA (e.g. metan) and one-stage IPD (e.g. xtmixed) commands exist in Stata
• ipdmetan is a universal command for two-stage IPD MA
• forestplot is a flexible forest plot command
  • does not carry out analysis itself, thus not restricted by it
  • may be useful outside the MA context (e.g. presenting trial subgroups)
Further information
• Other related programs (all call forestplot by default):
  • admetan: calls ipdmetan to analyse AD (direct alternative to metan)
  • ipdover: fit model within series of subgroups
  • petometan: perform meta-analysis of time-to-event data using the Peto (log-rank) method
• Release on SSC and a Stata Journal article planned for the near future
Thank you!
• Questions, requests, bug reports: df@ctu.mrc.ac.uk
• Thanks to:
  • Jayne Tierney, Patrick Royston
  • Ross Harris (author of metan) for advice & support
  • Assorted colleagues for testing
• Reference:
  • Fisher D. J. et al. 2011. Journal of Clinical Epidemiology 64: 949-67