350 likes | 639 Views
Causal Inference. Sebastian Galiani November 2006. Motivation. The research questions that motivate most studies in the health sciences are causal in nature. For example:
E N D
Causal Inference Sebastian Galiani November 2006
Motivation • The research questions that motivate most studies in the health sciences are causal in nature. For example: • What is the efficacy of a given drug in Impact: given population? What fraction of deaths from a given disease could have been avoided by a given treatment or policy?
Motivation • The most challenging empirical questions in economics also involve causal-effect relationships: • Does school decentralization improve schools quality?
Motivation • Interest in these questions is motivated by: • Policy concerns • Does privatization of water systems improve child health? • Theoretical considerations • Problems facing individual decision makers
Causal Analysis • The aim of standard statistical analysis, typified by likelihood and other estimation techniques, is to infer parameters of a distribution from samples drawn of that distribution. • With the help of such parameters, one can: • Infer association among variables, • Estimate the likelihood of past and future events, • As well as update the likelihood of events in light of new evidence or new measurement.
Causal Analysis • These tasks are managed well by standard statistical analysis as long as experimental conditions remain the same. • Causal analysis goes one step further: • Its aim is to infer aspects of the data generation process. • With the help of such aspects, one can deduce not only the likelihood of events under static conditions, but also the dynamics of events under changing conditions.
Causal Analysis • This capability includes: • Predicting the effects of interventions • Predicting the effects of spontaneous changes • Identifying causes of reported events • This distinction implies that causal and associational concepts do not mix.
Causal Analysis The word cause is not in the vocabulary of standard probability theory. • All Probability theory allows us to say is that two events are mutually correlated, or dependent –meaning that if we find one, we can expect to encounter the other. • Scientists seeking causal explanations for complex phenomena or rationales for policy decisions must therefore supplement the language of probability with a vocabulary for causality.
Causal Analysis • Two languages for causality have been proposed: • Structural equation modeling (ESM) (Haavelmo 1943). • The Neyman-Rubin potential outcome model (RCM) (Neyman, 1923; Rubin, 1974).
The Rubin Causal Model • Define the population by U. Each unit in U is denoted by u. • For each u U, there is associated a value Y(u) of the variable of interest Y, which we call: the response variable. • Let A be a second variable defined on U. We call A an attribute of the units in U.
The key notion is the potential for exposing or not exposing each unit to the action of a cause: • Each unit has to be potentially exposable to any one of the causes. • Thus, Rubin takes the position that causes are only those things that could be treatments in hypothetical experiments. • An attribute cannot be a cause in an experiment, because the notion of potential exposability does not apply to it.
For simplicity, we assume that there are just two causes or level of treatment. • Let D be a variable that indicates the cause to which each unit in U is exposed: In a controlled study, D is constructed by the experimenter. In an uncontrolled study, it is determined by factors beyond the experimenter’s control.
The values of Y are potentially affected by the particular cause, t or c, to which the unit is exposed. • Thus, we need two response variables: Yt(u), Yc(u) • Yt is the value of the response that would be observed if the unit were exposed to t and • Yc is the value that would be observed on the same unit if it were exposed to c.
Let D also be expressed as a binary variable: D = 1 if D = t and D = 0 if D = c • Then, the outcome of each individual can be written as: Y(U) = DY1 + (1 – D)Y0
Definition: For every unit u treatment {Du = 1 instead of Du = 0} causes the effect u = Y1(u) – Y0(u) • This definition of a causal effect assumes that the treatment status of one individual does not affect the potential outcomes of other individuals. • Fundamental Problem of Causal Inference: It is impossible to observe the value of Y1(u) and Y0(u) on the same unit and, therefore, it is impossible to observe the effect of t on u. • Another way to express this problem is to say that we cannot infer the effect of treatment because we do not have the counterfactual evidence i.e. what would have happened in the absence of treatment.
Given that the causal effect for a single unit u cannot be observed, we aim to identify the average causal effect for the entire population or for sub-populations. • The average treatment effect ATE of t (relative to c) over U (or any sub-population) is given by: ATE =E [Y1(u) – Y0(u)] = E [Y1(u)] – E [Y0(u)] (1)
The statistical solution replaces the impossible-to-observe causal effect of t on a specific unit with the possible-to-estimate average causal effect of t over a population of units. • Although E(Y1) and E(Y0) cannot both be calculated, they can be estimated. • Most econometricsmethods attempt to construct from observational data consistent estimates of
Consider the following simple estimator of ATE: • Note that equation (1) is defined for the whole population, whereas equation (2) represents an estimator to be evaluated on a sample drawn from that population
Let equal the proportion of the population that would be assigned to the treatment group. • Decomposing ATE, we have:
If we assume that • Which is consistently estimated by its sample analog estimator:
Thus, a sufficient condition for the standard estimator to consistently estimate the true ATE is that: • In this situation, the average outcome under the treatment and the average outcome under the control do not differ between the treatment and control groups. • In order to satisfy these conditions, it is sufficient that treatment assignment D be uncorrelated with the potential outcome distributions of Y1 and Y2. • The principal way to achieve this uncorrelatedness is through random assignment of treatment.
In most circumstances, there is simply no information available on how those in the control group would have reacted if they had received the treatment instead. • This is the basis for an important insight into the potential biases of the standard estimator (2). • After a bit of algebra, it can be shown that:
This equation specifies the two sources of biases that need to be eliminated from estimates of causal effects from observational studies. • Selection Bias: Baseline difference. • Treatment Heterogeneity. • Most of the methods available only deal with selection bias, simply assuming that the treatment effect is constant in the populationor by redefining the parameter of interest in the population.
Treatment on the Treated • ATE is not always the parameter of interest. • In a variety of policy contexts, it is the average treatment effect for the treated that is of substantive interest: TOT =E [Y1(u) – Y0(u)| D = 1] = E [Y1(u)| D = 1] – E [Y0(u)| D = 1]
Treatment on the Treated • The standard estimator (2) consistently estimates TOT if:
Structural Equation Modeling • Structural equation modeling was originally developed by geneticists (Wright 1921) and economists (Haavelmo 1943).
Structural Equations • Definition: An equation y = β x + ε (8) is said to be structural if it is to be interpreted as follows: • In an ideal experiment where we control X to x and any other set Z of variables (not containing X or Y) to z, the value y of Y is given by β x + ε, where ε is not a function of the settings x and z. • This definition is in the spirit of Haavelmo (1943), who explicitly interpreted each structural equation as a statement about a hypothetical controlled experiment.
Thus, to the often asked question, “Under what conditions can we give causal interpretation to structural coefficients?” • Haavelmo would have answered: Always! • According to the founding father of SEM, the conditions that make the equation y = β x + ε structural are precisely those that make the causal connection between X and Y have no other value but β, and ensuring that nothing about the statistical relationship between x and ε can ever change this interpretation of β.
The average causal effect: The average causal effect on Y of treatment level x is the difference in the conditional expectations: E(Y|X = x) – E(Y|X = 0) • In the context of dichotomous interventions (x = 1), this causal effect is called the averagetreatment effect (ATE).
Representing Interventions • Consider the structural model M: z = fz(w) x = fx(z, ) y = fy(x, u) • We represent an intervention in the model through a mathematical operator denoted d0(x). • d0(x) simulates physical interventions by deleting certain functions from the model, replacing them by a constant X = x, while keeping the rest of the model unchanged.
To emulate an intervention d0(x0) that holds X constant (at X = x0) in model M, replace the equation for x with x = x0, and obtain a new model, Mx0 z = fz(w) x = x0 y = fy(x, u) • The joint distribution associated with the modified model, denoted P(z, y| d0(x0)) describes the post-intervention (“experimental”) distribution. • From this distribution, one is able to assess treatment efficacy by comparing aspects of this distribution at different levels of x0.
Structural Parameters • Definition: The interpretation of a structural equation as a statement about the behavior of Y under a hypothetical intervention yields a simple definition for the structural parameters. The meaning of β in the equation y = β x + ε is simply
Counterfactual Analysis in Structural Models • Consider again model Mxo. Call the solution of Y the potential response of Y to x0. • We denote it as Yx0(u, , w). • This entity can be given a counterfactual interpretation, for it stands for the way an individual with characteristics (u, , w) would respond, had the treatment been x0, rather than the x = fx(z, ) actually received by the individual.
In our example, Yx0(u, , w) = Yx0(u) = y = fy(x0, u) • This interpretation of counterfactuals, cast as solutions to modified systems of equations, provides the conceptual and formal link between structural equation modeling and the Rubin potential-outcome framework. • It ensures us that the end results of the two approaches will be the same. • Thus, the choice of model is strictly a matter of convenience or insight.
References • Judea Pearl (2000): Causality: Models, Reasoning and Inference, CUP. Chapters 1, 5 and 7. • Trygve Haavelmo (1944): “The probability approach in econometrics”, Econometrica 12, pp. iii-vi+1-115. • Arthur Goldberger (1972): “Structural Equations Methods in the Social Sciences”, Econometrica40, pp. 979-1002. • Donald B. Rubin (1974): “Estimating causal effects of treatments in randomized and nonrandomized experiments”, Journal of Educational Psychology66, pp. 688-701. • Paul W. Holland (1986): “Statistics and Causal Inference”, Journal of the American Statistical Association 81, pp. 945-70, with discussion.