Treatment Effect Heterogeneity & Dynamic Treatment Regime Development

Treatment Effect Heterogeneity & Dynamic Treatment Regime Development S.A. Murphy

Dynamic treatment regimes (DTRs) are individually tailored treatments, with treatment type and dosage changing according to individual outcomes. DTRs utilize treatment effect heterogeneity to individualize treatment.

Example of a DTR
• Adaptive Drug Court Program for drug-abusing offenders.
• Goal is to minimize recidivism and drug use.
• Marlowe et al. (2008, 2009, 2011)

Adaptive Drug Court Program

Treatment Effect Heterogeneity
• Focus on Theory: used to deepen understanding of the underlying causal, mechanistic structure.
• Focus on Practice: used to improve decision making in practice. For whom, when, and in which context might a specific treatment be most useful? This is our focus today.

Treatment Effect Heterogeneity & DTR Development
• Take advantage of treatment effect heterogeneity in the design of the intervention trial
  • Embedded tailoring variables
  • Part of “treatment action”
• Take advantage of treatment effect heterogeneity in the design of the DTR
  • Data analyses

Pelham ADHD Study
• Initial random assignment: begin low-intensity BMOD, or begin low-dose Med.
• After 8 weeks, assess: adequate response?
  • Yes: continue, reassess monthly; randomize if deteriorate.
  • No: random assignment to intensify current treatment, or augment with other treatment.

Txt Effect Heterogeneity Embedded Tailoring Variable Embedded Tailoring Variables: (a) Teacher reported Impairment Scale, (b) Teacher reported individualized list of target behaviors Non-response is assessed at 8 weeks and every 4 weeks thereafter.

Treatment Effect Heterogeneity: Embedded DTRs
4 embedded DTRs:
• Start with BMOD; only if nonresponse criterion reached, augment with MED
• Start with BMOD; only if nonresponse criterion reached, intensify BMOD
• Start with MED; only if nonresponse criterion reached, augment with BMOD
• Start with MED; only if nonresponse criterion reached, intensify MED
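The four embedded DTRs are the 2 x 2 combinations of a first-line treatment and a tactic used only on nonresponse. A minimal sketch of that structure follows; the function names and string labels are hypothetical and are not taken from the study.

```python
from itertools import product

FIRST_LINE = ["BMOD", "MED"]
NONRESPONSE_TACTIC = ["augment", "intensify"]  # applied only if nonresponse criterion reached


def embedded_dtrs():
    """Return the embedded DTRs as (first-line treatment, nonresponse tactic) pairs."""
    return list(product(FIRST_LINE, NONRESPONSE_TACTIC))


def next_action(dtr, nonresponse_reached):
    """Illustrative decision rule for one embedded DTR (labels are hypothetical)."""
    first, tactic = dtr
    if not nonresponse_reached:
        return f"continue {first}"
    other = "MED" if first == "BMOD" else "BMOD"
    if tactic == "augment":
        return f"augment {first} with {other}"
    return f"intensify {first}"
```

Calling `next_action(("BMOD", "augment"), True)` gives the second-stage action of the first DTR in the list above.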

Oslin Alcoholism Trial
• Initial random assignment: early trigger for nonresponse, or late trigger for nonresponse.
• Under either trigger:
  • Response (8 wks): random assignment to NTX, or TDM + NTX.
  • Nonresponse: random assignment to CBI + MM, or CBI + NTX + MM.

Txt Effect Heterogeneity Embedded Tailoring Variable & Embedded DTR Embedded Tailoring Variable: heavy drinking days (HDD) First randomization is between treatment actions: move to stage 2 if 2 HDDs versus move to stage 2 if 5 HDDs 8 Embedded DTRs

A Data Analysis Method for Utilizing Treatment Effect Heterogeneity to Construct a “More Deeply Tailored” DTR: Q-Learning
• Subject data from sequential, multiple assignment, randomized trials. At each stage subjects are randomized among alternative options.
• $A_j$ is a randomized action with known randomization probability.
• Binary actions with $P[A_j = 1] = P[A_j = -1] = 0.5$.

Dynamic Treatment Regime (DTR)
• The DTR is given by a sequence of decision rules, one per stage of treatment (here 2 stages).
• DTR = $\{d_1, d_2\}$
• Goal: construct $d_1, d_2$ for which the expected outcome, $E_{d_1, d_2}[Y]$, is maximal.

Q-Learning
• Q-Learning (Watkins, 1989; Ernst et al., 2005; Murphy, 2005), a popular method from computer science, generalizes regression to multiple stages.
• Q-Learning uses dynamic programming arguments combined with linear regression estimation of conditional means.

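Simple Version of Q-Learning
There is a regression for each stage.
• Stage 2 regression: regress $Y$ on $(X_1, A_1, X_2, A_2)$ to obtain $\hat{Q}_2(X_1, A_1, X_2, A_2)$.
• Stage 1 regression: regress the stage 1 dependent variable $\tilde{Y}$ on $(X_1, A_1)$ to obtain $\hat{Q}_1(X_1, A_1)$.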

For subjects entering stage 2:
• $\tilde{Y} = \max_{a_2} \hat{Q}_2(X_1, A_1, X_2, a_2)$ is the predicted end of stage 2 response when the stage 2 treatment is equal to the “best” treatment.
• $\tilde{Y}$ is the dependent variable in the stage 1 regression for patients moving to stage 2.

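A Simple Version of Q-Learning
• Stage 2 regression (using $Y$ as dependent variable) yields $\hat{Q}_2(X_1, A_1, X_2, a_2)$.
• Arg-max over $a_2$ yields $\hat{d}_2(X_1, A_1, X_2) = \arg\max_{a_2} \hat{Q}_2(X_1, A_1, X_2, a_2)$.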
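With actions coded $\pm 1$, the arg-max has a closed form under a linear working model. The worked equations below assume one such model, which is an illustrative choice and not necessarily the one used in the talk: $h_2$ denotes a hypothetical feature vector built from $(X_1, A_1, X_2)$, and $\alpha_2$, $\beta_2$ are hypothetical coefficient vectors.

```latex
% Assumed linear working model for stage 2 (illustrative only)
Q_2(h_2, a_2) = \alpha_2^\top h_2 + a_2 \, \beta_2^\top h_2,
\qquad a_2 \in \{-1, 1\}

% Arg-max over a_2: choose the sign of the estimated treatment contrast
\hat{d}_2(h_2) = \operatorname{sign}\!\left( \hat{\beta}_2^\top h_2 \right)

% Predicted response under the best stage 2 treatment
\max_{a_2} \hat{Q}_2(h_2, a_2)
  = \hat{\alpha}_2^\top h_2 + \left| \hat{\beta}_2^\top h_2 \right|
```

Only the interaction terms with $a_2$ affect the decision rule; the main-effect terms $\hat{\alpha}_2^\top h_2$ matter for the stage 1 dependent variable but not for the choice of stage 2 treatment.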

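A Simple Version of Q-Learning
• Stage 1 regression (using $\tilde{Y}$, the predicted response under the best stage 2 treatment, as dependent variable) yields $\hat{Q}_1(X_1, a_1)$.
• Arg-max over $a_1$ yields $\hat{d}_1(X_1) = \arg\max_{a_1} \hat{Q}_1(X_1, a_1)$.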
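The two regressions can be sketched end to end on simulated data. This is a minimal illustration under stated assumptions, not the analysis from the talk: the data-generating model, the feature choices, and all function names are hypothetical; covariates are scalar; actions are coded ±1 with probability 0.5; and every subject is treated as entering stage 2.

```python
import numpy as np


def simulate(n, seed=0):
    """Simulate a two-stage randomized trial (hypothetical generative model)."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    a1 = rng.choice([-1, 1], size=n)          # P[A1 = 1] = P[A1 = -1] = 0.5
    x2 = 0.5 * x1 + rng.normal(size=n)
    a2 = rng.choice([-1, 1], size=n)          # P[A2 = 1] = P[A2 = -1] = 0.5
    y = x1 + 0.5 * a1 * x1 + a2 * (0.2 + x2) + rng.normal(size=n)
    return x1, a1, x2, a2, y


def fit_ols(features, y):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(features, y, rcond=None)
    return coef


def q_learning(x1, a1, x2, a2, y):
    """Two-stage Q-learning with linear working models; returns (c1, c2)."""
    ones = np.ones_like(y)
    # Stage 2: regress Y on main effects plus A2 and the A2-by-X2 interaction.
    f2 = np.column_stack([ones, x1, a1, a1 * x1, x2, a2, a2 * x2])
    c2 = fit_ols(f2, y)
    main2 = f2[:, :5] @ c2[:5]
    contrast2 = c2[5] + c2[6] * x2
    # Predicted response under the best stage 2 action (max over a2 in {-1, 1}).
    y_tilde = main2 + np.abs(contrast2)
    # Stage 1: regress the pseudo-outcome on X1, A1 and the A1-by-X1 interaction.
    f1 = np.column_stack([ones, x1, a1, a1 * x1])
    c1 = fit_ols(f1, y_tilde)
    return c1, c2


def d2(c2, x2):
    """Estimated stage 2 rule: sign of the estimated A2 contrast."""
    return np.where(c2[5] + c2[6] * x2 >= 0, 1, -1)


def d1(c1, x1):
    """Estimated stage 1 rule: sign of the estimated A1 contrast."""
    return np.where(c1[2] + c1[3] * x1 >= 0, 1, -1)


if __name__ == "__main__":
    c1, c2 = q_learning(*simulate(5000))
    print("stage 2 contrast coefficients:", c2[5:])
    print("stage 1 contrast coefficients:", c1[2:])
```

In a real SMART where only nonresponders are re-randomized, the stage 2 regression would be fit on those subjects only, and the stage 1 dependent variable for responders would be their observed outcome.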


ADHD Example
• Data: $(X_1, A_1, R_1, X_2, A_2, Y)$
• $Y$ = end-of-year school performance
• $R_1 = 1$ if early responder; $= 0$ if early non-responder
• $X_2$ includes the month of non-response, $M_2$, and a measure of adherence in stage 1, $S_2$
  • $S_2 = 1$ if adherent in stage 1; $= 0$ if non-adherent
• $X_1$ includes baseline school performance, $Y_0$; whether medicated in prior year, $S_1$; and ODD, $O_1$
  • $S_1 = 1$ if medicated in prior year; $= 0$ otherwise

ADHD Example
• Stage 2 regression for $Y$:
• Stage 1 outcome:

Dynamic Treatment Regime Proposal
• IF medication was not used in the prior year THEN begin with BMOD; ELSE select either BMOD or MED.
• IF the child is nonresponsive and was non-adherent THEN augment present treatment; ELSE IF the child is nonresponsive and was adherent THEN select intensification of current treatment.
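The proposed regime can be written as two small decision functions. This is a sketch only: the function names and return labels are hypothetical, and the "continue" branch for responders is an assumption taken from the study design, since the proposal itself only covers nonresponders.

```python
def stage1_options(medicated_prior_year: bool) -> list:
    """First-line treatments allowed by the proposed regime."""
    if not medicated_prior_year:
        return ["BMOD"]
    return ["BMOD", "MED"]  # either option is acceptable


def stage2_action(responsive: bool, adherent: bool) -> str:
    """Second-stage action under the proposed regime."""
    if responsive:
        return "continue"  # assumption: responders stay on current treatment
    if not adherent:
        return "augment"   # nonresponsive and non-adherent
    return "intensify"     # nonresponsive and adherent
```

For example, a child who was not medicated in the prior year starts on BMOD; if that child is later nonresponsive but was adherent, the rule selects intensification of BMOD.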

Future Challenges
• High-dimensional data; investigators want to collect real-time data
• Feature construction & feature selection
• Many stages or infinite horizon
• This seminar can be found at: http://www.stat.lsa.umich.edu/~samurphy/seminars/JSM_Txt_Heterogeneity2012.ppt
