SMART Designs for Developing Dynamic Treatment Regimes

S.A. Murphy
MD Anderson, December 2006

Collaborators
• A. John Rush (University of Texas Southwestern Medical Center)
• Bibhas Chakraborty, Lacey Gunter, Alena Scott (U. Michigan)
• Linda Collins (Penn State)
• Dave Oslin, Kevin Lynch, Tom Ten Have (UPenn)

Outline
• Why dynamic treatment regimes?
• Why SMART experimental designs?
• Experimental principles
• Constructing and addressing questions regarding an optimal dynamic treatment regime
• Why and when non-regular?
• A class of solutions
• A preliminary STAR*D analysis

Dynamic treatment regimes are individually tailored treatments, with treatment type and dosage changing according to patient outcomes. They operationalize clinical practice.
• Brooner et al. (2002) Treatment of Opioid Addiction
• Breslin et al. (1999) Treatment of Alcohol Addiction
• Prochaska et al. (2001) Treatment of Tobacco Addiction
• Rush et al. (2003) Treatment of Depression

Why Dynamic Treatment Regimes?
• High heterogeneity in response to any one treatment: what works for one person may not work for another.
• Improvement is often marred by relapse: what works now for a person may not work later.
• Side effects, co-occurring disorders, and/or adherence problems occur frequently.

k decisions on one individual:
• Observation available at jth decision: $O_j$
• Action at jth decision: $A_j$
• History available at jth decision: $H_j = \{O_1, A_1, \ldots, O_{j-1}, A_{j-1}, O_j\}$
• “Reward” following jth decision point: $R_j = r_j(H_j, A_j, O_{j+1})$ ($r_j$ is a known function)
• Primary outcome: $Y = \sum_{j=1}^{k} R_j$

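Goal: Construct decision rules that take as input the information in the history at each decision point and output a recommended decision; these decision rules should lead to a maximal mean of Y. The dynamic treatment regime is the sequence of decision rules $d = (d_1, \ldots, d_k)$, where $d_j$ maps the history available at the jth decision to a recommended action.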

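In the future, at the jth decision we offer the treatment recommended by $d_j$ given the patient's history. An example of a simple decision rule is: alter treatment at time j if the patient's status at time j indicates nonresponse; otherwise maintain on current treatment.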
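To make the idea of a regime as a sequence of decision rules concrete, here is a minimal sketch. It assumes a hypothetical encoding in which the history is a list of per-decision records, and a hypothetical "symptom_score" with an arbitrary threshold stands in for nonresponse; none of these names come from the talk.

```python
from typing import Any, Callable, Dict, List

# Hypothetical encoding: a history is the list of per-decision records seen so far.
History = List[Dict[str, Any]]
DecisionRule = Callable[[History], str]


def make_threshold_rule(threshold: float) -> DecisionRule:
    """Hypothetical simple rule: 'alter' treatment if the latest symptom score
    exceeds `threshold` (a stand-in for nonresponse), else 'maintain'."""
    def rule(history: History) -> str:
        return "alter" if history[-1]["symptom_score"] > threshold else "maintain"
    return rule


def apply_regime(regime: List[DecisionRule], observations: List[float]) -> History:
    """Step through the decision points; at decision j, append the new observation
    to the history and record the action chosen by d_j(history)."""
    history: History = []
    for rule, obs in zip(regime, observations):
        history.append({"symptom_score": obs})  # observation O_j joins the history
        history[-1]["action"] = rule(history)   # action A_j = d_j(history)
    return history


regime = [make_threshold_rule(10.0), make_threshold_rule(10.0)]  # k = 2 decisions
print([rec["action"] for rec in apply_regime(regime, [14.0, 6.0])])  # ['alter', 'maintain']
```

Each rule sees the whole history, so richer rules (e.g., depending on adherence or on earlier actions) fit the same interface.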

SMART experimental designs are sequential, multiple assignment, randomized trial designs. At each step/critical decision, subjects are randomized among alternative options.
• CATIE (2001) Treatment of Psychosis in Schizophrenia
• STAR*D (2003) Treatment of Depression
• Tummarello (1997) Treatment of Small Cell Lung Cancer (one of many such trials run in this field over many years)
• Oslin (ongoing) Treatment of Alcohol Dependence
• Pelham (ongoing) Treatment of ADHD
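The sequential randomization can be sketched as a small simulation. This assumes one common two-stage layout (everyone randomized at stage 1; only nonresponders re-randomized at stage 2), with hypothetical treatment labels loosely echoing this talk's alcohol-dependence examples and a made-up response probability; it is not the protocol of any trial listed above.

```python
import random
from typing import Any, Dict, List

# Hypothetical options loosely echoing this talk's alcohol-dependence examples.
INITIAL = ["Med A", "Med A + CBT"]
SECOND_FOR_NONRESPONDERS = ["change medication", "change medication + CBT + EM"]


def simulate_smart(n: int, p_response: float = 0.4, seed: int = 0) -> List[Dict[str, Any]]:
    """Two-stage SMART assignment sketch: randomize everyone at stage 1, then
    re-randomize only nonresponders at stage 2; responders continue treatment."""
    rng = random.Random(seed)
    subjects = []
    for i in range(n):
        a1 = rng.choice(INITIAL)                # first randomization (1:1)
        responder = rng.random() < p_response   # simulated intermediate outcome (hypothetical)
        # Second randomization happens only for nonresponders.
        a2 = "continue" if responder else rng.choice(SECOND_FOR_NONRESPONDERS)
        subjects.append({"id": i, "A1": a1, "responder": responder, "A2": a2})
    return subjects


trial = simulate_smart(200)
print(sum(s["A1"] == "Med A + CBT" for s in trial), "of 200 started on Med A + CBT")
```

Because both randomization probabilities are fixed by design, they are known exactly when the data are analyzed.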

Why SMART experimental designs?
• Why not use data from multiple randomized trials to construct the dynamic treatment regime?
• Use statistical methods that incorporate the potential for delayed effects and are suited for combining data from multiple randomized trials.
• Use methods from Medical Decision Making involving a variation of a Markovian assumption.
• Use (an approximation to) dynamic programming.

Why statistical methods for combining data over multiple trials are not always the answer: subjects who enroll in, remain in, or adhere to the trials of one-stage treatments may be quite different from the subjects in a SMART.

Design Principles for a SMART
• KEEP IT SIMPLE: At each stage, restrict the class of treatments only by ethical, feasibility, or strong scientific considerations. Use a summary (responder status) instead of all intermediate outcomes (time until nonresponse, adherence, burden, stress level, etc.) to restrict the class of next treatments.
• Collect intermediate outcomes that might be useful in ascertaining for whom each treatment works best, i.e., information that might enter into the dynamic treatment regime.

Design Principles
• Primary hypotheses concern “main effects” that are both scientifically important and aid in developing the dynamic treatment regime.
• Secondary hypotheses concern the choice of variables that can be used to tailor treatment and/or compare treatments within an “optimal dynamic treatment regime.”

Primary Hypotheses
• EXAMPLE 1 (sample size is highly constrained): Hypothesize that, given the secondary treatments provided, the initial treatment Med A + CBT leads to lower drinking than the initial treatment Med A alone.
• EXAMPLE 2 (sample size is less constrained): Hypothesize that nonresponders will make greater improvement on EM + Med B + CBT than on Med B alone.
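A main-effect hypothesis such as Example 1 compares the two initial treatments while averaging over the secondary treatments, so every subject contributes and a standard two-group sample size calculation can serve as a rough guide. The sketch below assumes a continuous outcome, 1:1 randomization at stage 1, equal variances, and a normal approximation; the effect size, significance level, and power are illustrative values, not figures from the talk.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Normal-approximation sample size per arm for a two-sided, two-sample
    comparison of means with standardized effect size `effect_size`."""
    z = NormalDist()                      # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    z_beta = z.inv_cdf(power)             # quantile giving the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


print(n_per_group(0.5))  # 63 per initial-treatment arm
```

A hypothesis like Example 2, restricted to nonresponders, would need this many nonresponders per arm, which is why it demands a larger trial.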

Secondary Hypotheses
• EXAMPLE 1: Hypothesize that non-adhering nonresponders will have lower drinking if provided a change in medication + CBT + EM as compared to a change in medication only.
• EXAMPLE 2: Hypothesize that the optimal sequence of treatments begins with Med A + CBT as opposed to Med A alone.

Constructing and Addressing Questions Regarding an Optimal Dynamic Treatment Regime

Four Categories of Methods
• Likelihood-based (Thall et al. 2000, 2002; POMDPs in medical decision making and in reinforcement learning; vast literature)
• Q-Learning (Watkins, 1989), a popular method from reinforcement learning: regression
• A-Learning (Murphy, 2003; Robins, 2004): regression on a mean-zero space
• Weighting (Murphy et al., 2002, related to policy search in reinforcement learning): weighted mean
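The "weighted mean" idea can be illustrated with a generic inverse-probability-weighted estimate of the mean outcome under one fixed regime. This is a minimal sketch assuming SMART data in which the randomization probabilities are known by design; the function name is hypothetical, and it is the textbook normalized estimator, not necessarily the exact estimator of Murphy et al. (2002).

```python
import numpy as np


def weighted_mean_value(y, follows_regime, prob_received) -> float:
    """Normalized inverse-probability-weighted estimate of the mean outcome
    under a fixed regime d, from SMART data.

    y               -- outcomes Y_i
    follows_regime  -- 1 if every action subject i received matches what d
                       recommends given that subject's history, else 0
    prob_received   -- product over stages of the (known, by design)
                       randomization probabilities of the actions received;
                       use probability 1 at stages with no randomization
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(follows_regime, dtype=float) / np.asarray(prob_received, dtype=float)
    if w.sum() == 0:
        raise ValueError("no subject's treatment sequence is consistent with the regime")
    return float(np.sum(w * y) / np.sum(w))


# Toy data: subjects 0 and 2 followed the regime; subject 0 was randomized twice
# (probability 1/2 each time), subject 2 once (a responder not re-randomized).
print(weighted_mean_value([1, 2, 3, 4], [1, 0, 1, 0], [0.25, 0.25, 0.5, 0.5]))  # 1.6666666666666667
```

Subjects whose treatment sequence is consistent with the regime are up-weighted by the inverse of their chance of having received it, so the weighted mean stands in for the mean outcome had everyone followed the regime.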

Q-learning (k=2)
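For two decision points, Q-learning is backward induction (dynamic programming) on conditional means. In the standard formulation, written here with $H_j$ the history available at decision j and $Y$ the primary outcome, the Q-functions and the resulting rules are:

```latex
\begin{aligned}
Q_2(H_2, A_2) &= E\left[\, Y \mid H_2, A_2 \,\right],\\
Q_1(H_1, A_1) &= E\left[\, \max_{a_2} Q_2(H_2, a_2) \;\middle|\; H_1, A_1 \right],\\
d_j(H_j) &= \arg\max_{a_j} Q_j(H_j, a_j), \qquad j = 1, 2.
\end{aligned}
```

In practice the conditional expectations are replaced by regression estimates, as in the simple version that follows.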

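A Simple Version of Q-Learning (binary actions)
Approximate each Q-function by a linear model, coding the binary action as $A_j \in \{-1, 1\}$: $Q_j(H_j, A_j) \approx \beta_j^T S_{j0} + (\alpha_j^T S_{j1}) A_j$, where $S_{j0}$ and $S_{j1}$ are vectors of summaries of the history available at the jth decision, $H_j$.
• Stage 2 regression: Use least squares with outcome $Y$ and covariates $(S_{20}, S_{21} A_2)$ to obtain $\hat\beta_2, \hat\alpha_2$.
• Set $\hat Y_1 = \hat\beta_2^T S_{20} + |\hat\alpha_2^T S_{21}|$, the maximum over $a_2 \in \{-1, 1\}$ of the fitted stage 2 model.
• Stage 1 regression: Use least squares with outcome $\hat Y_1$ and covariates $(S_{10}, S_{11} A_1)$ to obtain $\hat\beta_1, \hat\alpha_1$.

Decision Rules: $\hat d_j(H_j) = \operatorname{sign}(\hat\alpha_j^T S_{j1})$ for $j = 1, 2$; that is, choose $A_j = 1$ if $\hat\alpha_j^T S_{j1} > 0$ and $A_j = -1$ otherwise.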
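The two-stage least-squares recipe can be sketched with NumPy. This assumes linear working models $Q_j \approx \beta_j^T S_{j0} + (\alpha_j^T S_{j1}) A_j$ with actions coded -1/+1; the function names and the simulated generative model are hypothetical, chosen only so the stage 2 coefficients can be checked against known values.

```python
import numpy as np


def fit_ls(design: np.ndarray, outcome: np.ndarray) -> np.ndarray:
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef


def q_learning_two_stage(S10, S11, A1, S20, S21, A2, Y):
    """Two-stage Q-learning with linear working models
    Q_j(H_j, A_j) = beta_j' S_j0 + (alpha_j' S_j1) A_j, actions coded -1/+1."""
    # Stage 2: regress Y on (S20, S21 * A2).
    coef2 = fit_ls(np.hstack([S20, S21 * A2[:, None]]), Y)
    beta2, alpha2 = coef2[: S20.shape[1]], coef2[S20.shape[1]:]
    # Pseudo-outcome: maximum over a2 in {-1, +1} of the fitted stage-2 model.
    Y1_hat = S20 @ beta2 + np.abs(S21 @ alpha2)
    # Stage 1: regress the pseudo-outcome on (S10, S11 * A1).
    coef1 = fit_ls(np.hstack([S10, S11 * A1[:, None]]), Y1_hat)
    beta1, alpha1 = coef1[: S10.shape[1]], coef1[S10.shape[1]:]
    return beta1, alpha1, beta2, alpha2


def decide(S_j1: np.ndarray, alpha_j: np.ndarray) -> np.ndarray:
    """Estimated rule d_j(H_j) = sign(alpha_j' S_j1); returns 0 on an exact tie."""
    return np.sign(S_j1 @ alpha_j)


# Simulated two-stage SMART with a made-up generative model (illustration only).
rng = np.random.default_rng(0)
n = 2000
X1 = rng.normal(size=n)
A1 = rng.choice([-1.0, 1.0], size=n)
X2 = X1 + 0.3 * A1 + rng.normal(size=n)
A2 = rng.choice([-1.0, 1.0], size=n)
Y = X2 + (0.5 - X2) * A2 + rng.normal(size=n)  # true alpha2 = (0.5, -1.0)

ones = np.ones((n, 1))
S10 = S11 = np.hstack([ones, X1[:, None]])
S20 = S21 = np.hstack([ones, X2[:, None]])
beta1, alpha1, beta2, alpha2 = q_learning_two_stage(S10, S11, A1, S20, S21, A2, Y)
print(np.round(alpha2, 2))  # should be close to [0.5, -1.0]
```

Here the stage 2 model is correctly specified, while the stage 1 model is only a linear approximation to the nonlinear pseudo-outcome, which is typical in practice.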

Why non-regular?

When do we have non-regularity?

Non-regularity
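One way to see the problem: the stage 1 pseudo-outcome contains $|\hat\alpha_2^T S_{21}|$, and the absolute value is not differentiable at zero. When the true stage 2 effect is zero (or near zero) for a non-negligible group of patients, $|\hat\alpha_2^T S_{21}|$ is biased upward and its sampling distribution is not normal, and the stage 1 estimator inherits this. The small simulation below illustrates the effect in the simplest hypothetical case ($S_{21} = 1$, no other covariates); it is an illustration of the phenomenon, not the analysis from this talk.

```python
import numpy as np


def mean_abs_alpha2_hat(alpha2_true: float, n: int = 200, reps: int = 5000, seed: int = 1) -> float:
    """Monte Carlo average of |alpha2_hat|, where alpha2_hat is the least-squares
    coefficient of A2 (coded -1/+1) in the model Y = alpha2 * A2 + noise."""
    rng = np.random.default_rng(seed)
    A2 = rng.choice([-1.0, 1.0], size=(reps, n))
    Y = alpha2_true * A2 + rng.normal(size=(reps, n))
    alpha2_hat = np.sum(A2 * Y, axis=1) / np.sum(A2 * A2, axis=1)  # one estimate per replicate
    return float(np.mean(np.abs(alpha2_hat)))


for alpha2 in (0.0, 1.0):
    bias = mean_abs_alpha2_hat(alpha2) - abs(alpha2)
    print(f"true alpha2 = {alpha2}: bias of |alpha2_hat| is about {bias:.3f}")
```

With no stage 2 effect, the bias is roughly the standard error of $\hat\alpha_2$ times $\sqrt{2/\pi}$; with an effect well away from zero it is negligible. The trouble is therefore concentrated where the stage 2 treatment barely matters.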

A class of “solutions”

Test whether the coefficient of $A_1$ is nonzero

Summary: This is an open problem.
• Use a tuning parameter set around 0.25?
• Just use tests for main effects (averaging over future treatments)?
• Just use tests with the maximum (maximizing over future treatments)?
• Find a way to combine the main-effect test with the use of the maximum?

Regression (preliminary STAR*D analysis)
• “S30” = H3, Sw1, Sw1*Aug2, (1-Sw1)*Aug2, Aug2*QIDS
• “S31A3” = Sw1*Aug2*Li, (1-Sw1)*Aug2*Li, (1-Aug2)*MIRT, Aug2*Li*QIDS
• “S20” = H2, Sw1, (1-Sw1)*Anx
• “S21A2” = Sw1*SER, Sw1*VEN, (1-Sw1)*(CIT+BUP), (1-Sw1)*Anx*(CIT+BUP)
• (All covariates are binary except the continuous QIDS and the covariates in H2 and H3.)

Results are omitted from this web copy!

Outcome and Residual Plots

Discussion
• It is unclear how one might combine averaging over the future actions with maximizing over the future actions.
• Ideally, the effect a covariate has on the maximized mean outcome should be used to decide whether to include the covariate in the decision rules. We did not do this here.
• Constructing “evidence-based” regimes is of great interest in clinical research, and there is much to be done by statisticians.

This seminar can be found at: http://www.stat.lsa.umich.edu/~samurphy/seminars/MDAnderson12.06.ppt
Email me with questions or if you would like a copy! samurphy@umich.edu
