G-Computation Formula for Point Treatment Data

G-Computation Formula for Point Treatment Data

Overview • G-computation formula is one approach to estimating counterfactual distributions (and expectations) • Given a causal DAG, G-computation allows you to estimate various counterfactuals of interest • Here we consider the application of the G-computation to simple point treatment data, later we will discuss it in more generality

Hypothetical Observational Point Treatment Study • A denotes a “treatment” or exposure of interest – assume categorical. • W is a vector (set) of confounders • Y is an outcome • Define X = (Y,W) • Ya are the counterfactual outcomes of interest • The “full data” is:

Hypothetical Causal Graph For a Point Treatment Study W1 confounders W2 W3 treatment A Y outcome

Key Assumptions for G-computational • Consistency Assumption: observed data, O is O=(A,XA) – i.e., the data for a subject is simply one of the counterfactual outcomes from the full data. • Randomization Assumption: • so no unmeasured confounders for treatment. In other words: within strata of W, A is randomized

When randomization doesn’t hold confounders W1 W2 U unmeasured confounder(s) W3 treatment A Y outcome

Key Assumptions, cont. • Experimental Treatment Assignment: all treatments are possible for all members of the target population, or: • for all W.

Likelihood of Data in simple Point Treatment • Given the assumptions, the likelihood of the data simplifies to: • Factorizes into the distribution of interest and the treatment assignment distribution. • The G-computation formula works specifically with parameters of P(Y|A,W) estimated with maximum likelihood and ignores the treatment assignment distribution.

Equality that makes it all happen • Given assumptions, note that P(Y|A=a,W) = P(Ya|W) or E(Y|A=a,W) = E(Ya|W). • Then, . • Which leads to our G-comp. estimate of the counterfactual mean in this simple context.

G-Computation Formula for point treatment studies for discrete W, this is

Example: Linear Model • Using the previous graph, we have the data on a subject is Y,A,W=(W1,W2, W3). • Model E[Y|A,W] as: • Then the estimate of E(Ya) is

Reporting Causal Effect • Will assume two values for a = (1,0) • One possible causal effect parameter is: • In the linear model case, the estimate is then:

Example for Discrete (binary) Y • This is the (possibly) logistic regression equivalent to the linear model example. • General G-comp formula is: • Possible estimate is

Causal Effect for Discrete (binary) Y • In this case, one could report the causal relative risk: • Estimate is:

Parameters of Interest derived from G-Computation • Note, for dichotomous Y =(0,1), then P(Y=1)=E[Y]. • E[Y1]- E[Y0] is the causal risk difference due to treatment or exposure • E[Y1]/ E[Y0] is the causal relative risk • E[Y1](1- E[Y0]) / E[Y0](1- E[Y1]) is the causal odds ratio due to treatment or exposure

G-Computation Formula for Point Treatment Data