Pitfalls in Analysis of Survey Data Ec798, Lecture 4 Dilip Mookherjee Fall 2010
1. Introduction • Purpose is to alert you to the key practical issues in analysis of survey data • By `analysis’, I mean making inferences concerning effectiveness of particular policies or programs, or behavioral patterns • Effectiveness assessment requires comparison of observed outcomes with a counterfactual: what would have happened in the absence of the program
Intro, contd. • Assessing counterfactuals requires appropriate benchmarks of comparison, and/or a theory which predicts how people and institutions would have behaved if the program had not been instituted, and how this would have changed the observed outcomes • Requires considerable creativity, ingenuity in addition to understanding local context and institutions
Intro, contd. • Most people are prone to drawing inferences based on cross-sectional (comparing areas with and without the program) or time-series evidence (comparing outcomes before and after the program), without being careful about assessing counterfactuals • Important in how you react and learn from almost any data pertaining to effects of a given development program --- i.e., how you evaluate work done by others to make claims about effectiveness based on their analysis
Intro, contd. • Courses on statistics or econometrics will emphasize the assumptions needed to make valid inferences from data, how to assess validity of these assumptions, and what to do if there is substantial doubt about their validity • In this session I will try to give you a practitioner’s perspective on this, based on my own experience • Will eschew technicalities, and provide an intuitive common-sense account
Pitfalls and Qualifications to Statistical Inference • I will try to give you a laundry list of the most common pitfalls and qualifications to what can be learned from analysis of statistical data concerning program effectiveness • And the most common techniques available for overcoming these • Even if you are not going to do this kind of analysis, it's important for you to understand what others are doing, to review it critically, and raise appropriate questions
Laundry List • Pitfalls, concerning bias (of estimates): • Selection Bias and Endogeneity (reverse causality, omitted variables) • Measurement Error • Functional Form (non-linearity, censoring, truncations) • Qualifications, mainly concerning precision (calculating standard errors correctly): • Heteroscedasticity • Clustering • Serial Correlation
Selection Bias • Let's say you compare outcomes of a program between areas which had and didn't have it: e.g., decentralization of forest management to local user groups: how does forest degradation vary between areas with and without such decentralization? Or compare health of children in villages that received a sanitation program with villages lacking such a program
Selection Problem, contd. • Problem is that areas with more degraded forests may have been more likely to have forest user groups in the first place; sanitation program likely to have been targeted to villages with dirtier water and greater poverty • If so, your cross-sectional differences will underestimate the true effect of the program • However you cannot be sure of the direction of the bias: maybe the communities more concerned about deforestation or health may have lobbied harder to get these programs
Selection Problem, contd. • Maybe you can get around this by looking at outcomes before and after implementation of the program in the areas in which it was implemented • Let's say you have a panel data-set and see an improvement after • But what if the areas which didn't receive the program also witnessed an improvement? Maybe there was something else going on that explains the improvement in both sorts of areas?
Selection Problem, contd. • Then maybe you can compare the changes before and after in the treatment and control areas? (The diff-of-diff estimate) • Can we stop here? Can we trust/test the diff-of-diff estimate? What assumptions are needed? And so on… • Are there contexts where cross-sectional comparisons yield valid (unbiased) estimates? When might they be better than the panel data based diff-of-diff estimate?
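The arithmetic behind the diff-of-diff estimate can be sketched in a few lines of Python. This is only an illustration: the forest-quality scores are made up, and the function name `diff_in_diff` is a hypothetical helper, not part of any library. The numbers are chosen so that treated areas start out more degraded (selection) and both kinds of areas improve over time (a common trend).

```python
from statistics import mean


def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Change in treated areas minus change in control areas."""
    treated_change = mean(treat_after) - mean(treat_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical forest-quality scores, invented for illustration only.
treat_before = [40.0, 42.0, 38.0]
treat_after = [50.0, 53.0, 47.0]
control_before = [60.0, 58.0, 62.0]
control_after = [64.0, 63.0, 65.0]

# Cross-sectional comparison after the program: treated minus control.
naive_cross_section = mean(treat_after) - mean(control_after)   # -14.0
# Before-after comparison in treated areas only.
naive_before_after = mean(treat_after) - mean(treat_before)     # 10.0
# Diff-of-diff: nets out the improvement that control areas also saw.
did = diff_in_diff(treat_before, treat_after, control_before, control_after)  # 6.0

print(naive_cross_section, naive_before_after, did)
```

In this made-up example the cross-sectional comparison even has the wrong sign, and the before-after comparison overstates the effect by the improvement that happened everywhere. The diff-of-diff number is only trustworthy if the two kinds of areas would have followed the same trend without the program.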
Pitfall No. 1: Endogeneity • Selection problems are part of a wider concern about endogeneity of program placement • One form of endogeneity: reverse causality (is forest degradation affecting creation of forest user groups? Is health driving placement of the sanitation program or the other way around?) • The other form is omitted variable bias: maybe some third, unobserved variable such as underlying social capital of the community is driving both deforestation and user group formation?
Other Examples of Endogeneity Problems • Suppose you are interested in effectiveness of a price subsidy program for rice on rice consumption: does consumption cause price or the other way around? Do underlying tastes for rice affect both price and consumption? • Are small farms more productive than large farms? Or do more productive farms tend to be smaller (owing to greater subdivision)? Is unobserved soil quality driving both size and productivity?
Endogeneity Examples, contd. • What is the effectiveness of a fertilizer distribution program on farm productivity? Does fertilizer application drive productivity? Or is it the case that more hardworking, motivated farmers tend to respond to the program more actively and apply more fertilizer? • Does under-nutrition cause low productivity/earnings, or the other way around?
Pitfall No. 2: Measurement Error • Is the independent variable measured accurately? • Problems measuring income, consumption based on survey responses (recall, aggregation, purposive..) • May not have data concerning program implementation at a disaggregated enough scale (e.g., interested in village-level effects but only have program intensity at province level)
Measurement Error, contd. • `Iron Law of Econometrics’: measurement error in independent variable (only) causes under-estimate of program effect (attenuation bias) • Intuitively this is because estimate of the effect is based on how independent and dependent variable co-vary, relative to the variation of the independent variable
Example of Attenuation Bias • Suppose you over-estimated placement of an effective fertilizer distribution program: some villages that didn’t get the program are mistakenly believed to have got it • Then you would be assessing program effectiveness by comparing mean farm yields in villages that are thought to have got the program, with those that appear not to have • You would under-estimate the effectiveness as some low-yield villages are mistakenly believed to have got the program
But Note That: • Measurement error in dependent variable does not matter (for bias): if productivity is measured with error this pertains to both kinds of villages equally • Not all kinds of independent variable errors matter (e.g., when data is at a higher level of aggregation: the measurement error is orthogonal to the measured value of the independent variable so it washes out in the aggregate) • Measurement error cannot reverse sign of the effect, or raise its quantitative magnitude (unlike endogeneity problems)
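A small simulation can make the attenuation result concrete. The sketch below is illustrative only: the true effect of 2.0, the sample size and the noise variances are all assumptions, and `ols_slope` is a hypothetical helper written for this example. With classical (independent, mean-zero) error in the independent variable, the estimated slope shrinks toward zero by the factor var(x) / (var(x) + var(error)), which is one half here. The same amount of error added to the dependent variable leaves the slope roughly unchanged.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)   # fixed seed so the run is repeatable
n = 20000
true_effect = 2.0        # assumed value, for illustration only

x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [true_effect * x + rng.gauss(0, 1) for x in x_true]

# Classical measurement error in the independent variable (variance 1).
x_noisy = [x + rng.gauss(0, 1) for x in x_true]
# Measurement error in the dependent variable only (variance 1).
y_noisy = [v + rng.gauss(0, 1) for v in y]

slope_clean = ols_slope(x_true, y)         # close to the true effect
slope_noisy_x = ols_slope(x_noisy, y)      # attenuated toward zero
slope_noisy_y = ols_slope(x_true, y_noisy) # roughly unbiased, less precise

print(round(slope_clean, 2), round(slope_noisy_x, 2), round(slope_noisy_y, 2))
```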
How Can You Tell How Serious Endogeneity or Measurement Error is? • Have to rely on your understanding of the situation, and your prior expectations based on theory • There is no easy test or measure • What you can do is to analyze the data differently so as to correct for the problems, and see how much of a difference this makes
How to Correct for Endogeneity • Approach 1: Control for possible omitted variables: collect data on those and include them in the regression • What about unobserved omitted variables: here panel data can come in very useful: use of fixed effects to control for unobserved heterogeneity • E.g., in the analysis of user groups and deforestation, unobserved `social capital’ which potentially affects both formation of user groups and deforestation is effectively controlled for, by looking at effects of formation of user groups on changes in forest quality • No longer comparing levels across areas, but changes over time --- the diff-of-diff estimate
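The user-group example can be mimicked with simulated data to see why looking at changes removes a fixed unobserved factor. Everything in this sketch is assumed for illustration: the true effect of 1.0, the weight of 3.0 on unobserved social capital, the common trend of 0.5, and the rule by which communities form user groups. Because social capital enters the level of forest quality in both periods, it drops out of the change.

```python
import random
from statistics import mean

rng = random.Random(1)   # fixed seed so the run is repeatable
n = 5000
true_effect = 1.0        # assumed effect of a user group on forest quality

level_treated, level_control = [], []    # period-2 levels
change_treated, change_control = [], []  # period-2 minus period-1

for _ in range(n):
    social_capital = rng.gauss(0, 1)  # unobserved by the analyst
    # Communities with more social capital are more likely to form a group.
    has_group = social_capital + rng.gauss(0, 1) > 0
    y_before = 3.0 * social_capital + rng.gauss(0, 1)
    y_after = (3.0 * social_capital + 0.5       # 0.5 is a common time trend
               + true_effect * has_group + rng.gauss(0, 1))
    if has_group:
        level_treated.append(y_after)
        change_treated.append(y_after - y_before)
    else:
        level_control.append(y_after)
        change_control.append(y_after - y_before)

# Cross-section: compares levels, so it picks up social capital as well.
cross_section = mean(level_treated) - mean(level_control)
# Changes: social capital and the common trend both difference out.
fixed_effects = mean(change_treated) - mean(change_control)

print(round(cross_section, 2), round(fixed_effects, 2))
```

Here the cross-sectional comparison overstates the effect several times over, while the comparison of changes lands near the assumed value. That only works because the unobserved factor is constant over time and enters additively.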
Other Examples of Diff-of-Diff • Productivity variation by farm size or fertilizer application: control for soil fertility as best as you can, control for farmer ability/motivation with farmer fixed effects, for unobserved plot quality with plot fixed effects • Need data for same farmer over time as he changes scale of cultivation (for farmer fixed effects), for productivity of separate plots (for plot fixed effects) with differential fertilizer application
Assumptions Underlying Diff-of-Diff • Have to still assume that program placement or its timing was exogenous (i.e., uncorrelated with unobserved factors that drive changes in the outcome) • At the level of changes over time, placement was not purposive (e.g., can you rule out the possibility that the creation of user groups was one of many other changes taking place, one of which was really effective?) • Test by looking at pre-program trends, other policies etc.
Other Assumptions underlying D-o-D • Effects of unobserved omitted variables are linear and additive so they can be washed out by looking at changes over time • No significant increase in measurement error when looking at changes over time (if panel responses based on recall, lot of the reported changes may just be the result of recall errors) • If this is the case, the cross-sectional estimate may involve less bias • Often significant cross-country regression results disappear in panel data: don’t know whether to interpret this as evidence of significant OV bias in the cross-section, or significant attenuation bias in the panel
Instrumental Variables • Another qualification: DoD deals with unobserved heterogeneity, but not reverse causality (nutrition-earnings example) • IV estimator: the most commonly used method to deal with endogeneity problems and measurement error • Idea is to find an instrument for the independent variable: a source of variation in the independent variable which logically cannot have a direct impact on the dependent variable
Examples of IV/Natural Experiments • UK water quality-mortality study (1853 London cholera epidemic): which of two companies was supplying water to any given street • Cuban boatlift effect on labor supply in Miami • Middle East events that affect international oil prices, which shift the price of rice owing to higher transport costs, but not the consumption demand for rice • Regression discontinuity: class-size effects on learning; minimum wage laws across state borders
IV Assumptions • Two key assumptions for an instrument to be valid: • It has to predict significant variation in the independent variable in question (water quality/labor supply/rice price/class-size): first stage F • Exclusion restriction: conditional on the effect on the independent variable (and other controls) there is no direct effect on the dependent variable • No statistical tests for the exclusion restriction; based on theory and institutional knowledge
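With a single instrument and a single endogenous variable, the IV estimate reduces to a ratio of two covariances, which is easy to sketch. The simulation below is an illustration under assumed values: a true effect of 1.0, an unobserved confounder called `ability`, and an instrument that satisfies both assumptions by construction. The helper `cov` is hypothetical and written for this example.

```python
import random
from statistics import mean


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


rng = random.Random(2)   # fixed seed so the run is repeatable
n = 20000
true_effect = 1.0        # assumed value, for illustration only

z, x, y = [], [], []
for _ in range(n):
    instrument = rng.gauss(0, 1)
    ability = rng.gauss(0, 1)              # unobserved confounder
    xi = instrument + ability + rng.gauss(0, 1)
    # The instrument affects y only through x (exclusion restriction holds).
    yi = true_effect * xi + 2.0 * ability + rng.gauss(0, 1)
    z.append(instrument)
    x.append(xi)
    y.append(yi)

ols = cov(x, y) / cov(x, x)          # biased upward by the confounder
first_stage = cov(z, x) / cov(z, z)  # how strongly z moves x
iv = cov(z, y) / cov(z, x)           # reduced form divided by first stage

print(round(ols, 2), round(first_stage, 2), round(iv, 2))
```

The simulation can show the first-stage strength, but it cannot check the exclusion restriction: that holds here only because the data were generated that way.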
Pitfall No. 3: Functional Form • Regression estimates of continuous treatment effects are based on the hypothesis of a linear relationship between independent and dependent variable (e.g., the effect of an extra unit of a drug does not depend on the dosage) • In many cases, may expect this to be wrong (water on productivity, age on earnings, community heterogeneity on collective action, gender empowerment on ROSCA participation) • In other cases, may not know what pattern to expect • Additional problem: program effect may be heterogeneous (very serious practical problem)
Testing Linearity • Include higher order terms, take log transformations, interaction effects etc. • Non-parametric analysis • Both have practical problems which can be resolved only with sufficient data • Can do only with respect to one variable at a time
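One crude check, simpler than adding higher-order terms or a full non-parametric fit, is to estimate the slope separately on the lower and upper halves of the independent variable and see whether they agree. The sketch below uses this split-sample comparison as a stand-in for the methods listed above. The inverted-U relationship between water and output, the coefficients and the helper `ols_slope` are all assumptions made for illustration.

```python
import random
from statistics import mean, median


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(3)   # fixed seed so the run is repeatable
n = 2000
water = [rng.uniform(0, 10) for _ in range(n)]
# Assumed inverted-U: too little or too much water both hurt output.
output = [10.0 * w - 0.8 * w ** 2 + rng.gauss(0, 2) for w in water]

cut = median(water)
low = [(w, q) for w, q in zip(water, output) if w <= cut]
high = [(w, q) for w, q in zip(water, output) if w > cut]

slope_all = ols_slope(water, output)
slope_low = ols_slope([w for w, _ in low], [q for _, q in low])
slope_high = ols_slope([w for w, _ in high], [q for _, q in high])

# A single linear slope hides the sign change between the two halves.
print(round(slope_all, 2), round(slope_low, 2), round(slope_high, 2))
```

If the two half-sample slopes differ sharply, as they do here, a single linear coefficient is a poor summary. Halving the sample also shows why such checks need plenty of data.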
Censoring and Truncation Bias • Particular form of problem with functional form: limited dependent variable • Sometimes it is zero or one (e.g., member of a group or not, road built or not) • Sometimes it is endogenously truncated: you cannot work negative number of hours, or collect negative quantity of firewood • Ignoring the inherent nonlinearity of the data can give rise to significantly biased estimates
Censoring and Truncation Biases • What can you do? • Assume functional form of distribution of errors: e.g., probit or logit regressions for 0-1 variables, tobits for censored variables • Results could be sensitive to what you assume here • Some new methods that don't depend so much on error distributions (semi-parametric methods, such as LAD) • Warning: cannot easily extend to panel estimators such as diff-of-diff!
Qualifications Laundry List • Problems emphasized in many econometrics texts concern correct assessment of precision of estimates (how to calculate standard errors): • Heteroscedasticity • Serial correlation • Clustering (more important, less often discussed in textbooks) • Ignoring these may cause you to overlook more precise estimates, and more importantly overestimate your precision/statistical significance (thus biasing inferences)
Heteroscedasticity • Where precision varies with `size’ of the independent variable, OLS is not the most precise estimator (data needs to be re-weighted), and the standard errors are incorrectly calculated • STATA can make these corrections for you (`robust’ or White-corrected standard errors) • Case for quantile regressions
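For a regression with one independent variable, the White correction can be written out by hand, which shows what the `robust' option is doing in principle. The sketch below is illustrative: the data are simulated with an error spread that grows with x, all parameter values are assumed, and the formula is the basic White (HC0) version without the finite-sample adjustments that software may apply.

```python
import math
import random
from statistics import mean

rng = random.Random(4)   # fixed seed so the run is repeatable
n = 5000
x = [rng.uniform(0, 10) for _ in range(n)]
# Assumed model: the error spread grows with the size of x.
y = [1.0 + 2.0 * xi + (xi ** 2 / 10.0) * rng.gauss(0, 1) for xi in x]

mx, my = mean(x), mean(y)
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
intercept = my - slope * mx
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

# Conventional formula: assumes every observation has the same error variance.
conventional_se = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)

# White (HC0) formula: weights each squared residual by its own leverage on
# the slope, so noisy observations far from the mean of x count for more.
robust_se = math.sqrt(
    sum(((xi - mx) ** 2) * e * e for xi, e in zip(x, resid))
) / sxx

print(round(slope, 3), round(conventional_se, 4), round(robust_se, 4))
```

In this setup the conventional standard error is too small, because the noisiest observations are also the ones with the most influence on the slope.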
Serial Correlation • Problem when you have repeated observations for the same agent or unit over time: if they are not independent, treating them as such means you overestimate the precision of your estimates • Problem with macro time series data, also with panel data • Can test for severity (e.g., Durbin-Watson stat.) and correct standard error estimates
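The Durbin-Watson statistic is simple enough to compute directly from a series of regression residuals: it is the sum of squared successive differences divided by the sum of squared residuals. Values near 2 suggest little first-order serial correlation, and values well below 2 suggest positive correlation. The sketch below applies it to two simulated residual series. The persistence parameter of 0.8 and the function name `durbin_watson` are assumptions for this example.

```python
import random


def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


rng = random.Random(5)   # fixed seed so the run is repeatable
n = 2000

# Residuals that are independent over time.
independent = [rng.gauss(0, 1) for _ in range(n)]

# Residuals in which each period carries over 80% of the previous one.
rho = 0.8                # assumed persistence, for illustration only
persistent = []
prev = 0.0
for _ in range(n):
    prev = rho * prev + rng.gauss(0, 1)
    persistent.append(prev)

dw_independent = durbin_watson(independent)  # near 2
dw_persistent = durbin_watson(persistent)    # roughly 2 * (1 - rho)

print(round(dw_independent, 2), round(dw_persistent, 2))
```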
Clustering • Most serious problem is when the data is clustered (by village, industry, location etc.) and different observations in each cluster are not independent: again results in overestimate of precision (underestimation of s.e.’s) • STATA cluster command can correct your s.e. estimate (you have to specify the `level’ of clustering) • Can often blow them sky high, whence statistical significance of all your results can disappear
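To see how much clustering can matter, the sketch below simulates households within villages, with a program assigned at the village level and a shock shared by everyone in a village. It then compares the conventional standard error with a hand-written cluster-robust one. All values (200 villages, 20 households each, a true effect of 1.0) are assumptions, and the formula is the basic sandwich version without the finite-sample adjustments that software packages typically apply.

```python
import math
import random
from statistics import mean

rng = random.Random(6)   # fixed seed so the run is repeatable
n_villages, per_village = 200, 20
true_effect = 1.0        # assumed value, for illustration only

village, x, y = [], [], []
for g in range(n_villages):
    treated = float(g % 2 == 0)   # program assigned to whole villages
    shock = rng.gauss(0, 1)       # shared by every household in the village
    for _ in range(per_village):
        village.append(g)
        x.append(treated)
        y.append(true_effect * treated + shock + rng.gauss(0, 1))

n = len(x)
mx, my = mean(x), mean(y)
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
intercept = my - slope * mx
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

# Conventional formula: treats all 4000 households as independent draws.
conventional_se = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)

# Cluster-robust formula: add up (x - mean) * residual within each village
# first, then square, so within-village correlation is not ignored.
scores = {}
for g, xi, e in zip(village, x, resid):
    scores[g] = scores.get(g, 0.0) + (xi - mx) * e
clustered_se = math.sqrt(sum(s * s for s in scores.values())) / sxx

print(round(slope, 3), round(conventional_se, 4), round(clustered_se, 4))
```

The clustered standard error comes out around three times the conventional one here, since the effective number of independent observations is closer to the number of villages than the number of households.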
Concluding Comments • Many filters and pinches of salt involved here, but these are absolutely fundamental to separate garbage from real evidence • Pitfalls (concerning bias) and Qualifications (precision), but both can result in biased inference • Lot of techniques for detecting and correcting problems • Cannot rely on `technical’ fixes alone: no substitute for good and sufficient data, common-sense, intuition, theory and institutional knowledge • Ultimately to be useful and compelling, the analysis must be simple and clear