Learn about panel data analysis, which combines cross-sectional and time-series data to observe the same individuals over time. Discover how panel data can help address omitted variable bias and how differencing techniques can be applied to estimate fixed effects. Explore a two-period example and understand the impact on coefficient estimates.
Panel Data Analysis Introduction
And now for… Panel Data!
• Panel data has both a time-series and a cross-section component
  • Observe the same (e.g.) people over time
• You’ve already used it!
  • Difference-in-differences is a panel (or pooled cross-section) data technique
• Panel data can be used to address some kinds of omitted variable bias
  • E.g., use “yourself in a later period” as the comparison group for yourself today
  • If the omitted variable is fixed over time, this “fixed effect” approach removes the bias
Unobserved Fixed Effects
• Initially consider having two periods of data (t = 1, t = 2), and suppose the population model is:
  $y_{it} = \beta_0 + \delta_0 d2_t + \beta_1 x_{it1} + \dots + \beta_k x_{itk} + a_i + u_{it}$
• Subscripts: person i, in period t; the third subscript on x is the variable number
• $d2_t$ = dummy for t = 2 (intercept shift)
• $a_i$ = “person effect” (etc.); it has no “t” subscript
  • $a_i$ is the time-constant component of the composite error
• $u_{it}$ = “idiosyncratic error”
Unobserved Fixed Effects
• The population model is
  $y_{it} = \beta_0 + \delta_0 d2_t + \beta_1 x_{it1} + \dots + \beta_k x_{itk} + a_i + u_{it}$
• If $a_i$ is correlated with the x’s, OLS will be biased, since $a_i$ is part of the composite error term $v_{it} = a_i + u_{it}$
• Aside: this also suffers from autocorrelation
  • $\mathrm{Cov}(v_{i1}, v_{i2}) = \mathrm{Cov}(a_i, a_i) + 2\,\mathrm{Cov}(u_{it}, a_i) + \mathrm{Cov}(u_{i2}, u_{i1}) = \mathrm{Var}(a_i)$, when the $u_{it}$ are uncorrelated with $a_i$ and with each other over time
  • So OLS standard errors are biased (downward); more later.
• But supposing the $u_{it}$ are not correlated with the x’s (just the fixed part of the error is), we can “difference out” the unobserved fixed effect…
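The autocorrelation aside can be checked numerically. The following is a minimal sketch on simulated data, not the textbook data: the sample size, the standard deviation of the person effect (`sigma_a`), and the unit-variance idiosyncratic errors are all assumptions chosen for illustration.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)

rng = random.Random(1)
n = 20000          # number of simulated individuals (assumed)
sigma_a = 1.5      # std. dev. of the person effect a_i (assumed)

a = [rng.gauss(0, sigma_a) for _ in range(n)]
# Composite errors v_it = a_i + u_it, with u_it independent of a_i and over time
v1 = [ai + rng.gauss(0, 1) for ai in a]
v2 = [ai + rng.gauss(0, 1) for ai in a]

cov_v = cov(v1, v2)   # covariance of the composite error across periods
var_a = cov(a, a)     # sample variance of the person effect

print(f"Cov(v_i1, v_i2) = {cov_v:.3f}")
print(f"Var(a_i)        = {var_a:.3f}")
```

Under these assumptions the two printed numbers should be close to each other and to `sigma_a ** 2`, which is the sense in which the composite error is correlated across periods for the same person.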
First differences
Period 2: $y_{i2} = \beta_0 + \delta_0 \cdot 1 + \beta_1 x_{i21} + \dots + \beta_k x_{i2k} + a_i + u_{i2}$
Period 1: $y_{i1} = \beta_0 + \delta_0 \cdot 0 + \beta_1 x_{i11} + \dots + \beta_k x_{i1k} + a_i + u_{i1}$
Diff: $\Delta y_i = \delta_0 + \beta_1 \Delta x_{i1} + \dots + \beta_k \Delta x_{ik} + \Delta u_i$
• $\Delta y_i, \Delta x_{i1}, \dots, \Delta x_{ik}$: “differenced data”, i.e. changes in y, x1, x2, …, xk from period 1 to period 2
• Need to be careful about the organization of the data to be sure to compute the correct change
• Model has no correlation between the x’s and the new error term (*just by assumption*), so no bias
• (Also, autocorrelation taken out)
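To see the differencing argument at work, here is a minimal simulation sketch with one regressor. It is not the textbook data: the true values (`beta1 = 2`, `delta0 = 1`), the sample size, and the way x is made to depend on the person effect are all assumptions. The pooled regression with a period dummy is computed by demeaning y and x within each period, which gives the same slope as including the dummy (Frisch-Waugh).

```python
import random

def demean(values):
    m = sum(values) / len(values)
    return [v - m for v in values]

def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    xd, yd = demean(x), demean(y)
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

rng = random.Random(0)
n, beta1, delta0 = 5000, 2.0, 1.0   # assumed sample size and true parameters
x1, x2, y1, y2 = [], [], [], []
for _ in range(n):
    a_i = rng.gauss(0, 1)           # unobserved fixed effect
    xi1 = a_i + rng.gauss(0, 1)     # x is correlated with a_i in both periods
    xi2 = a_i + rng.gauss(0, 1)
    x1.append(xi1)
    x2.append(xi2)
    y1.append(0.5 + beta1 * xi1 + a_i + rng.gauss(0, 1))
    y2.append(0.5 + delta0 + beta1 * xi2 + a_i + rng.gauss(0, 1))

# Pooled OLS of y on x and a period-2 dummy (via within-period demeaning)
pooled_slope = slope(demean(x1) + demean(x2), demean(y1) + demean(y2))

# First differences: regress the change in y on the change in x
dx = [b - a for a, b in zip(x1, x2)]
dy = [b - a for a, b in zip(y1, y2)]
fd_slope = slope(dx, dy)
fd_const = sum(dy) / n - fd_slope * sum(dx) / n   # estimates delta0

print(f"pooled slope: {pooled_slope:.3f}")
print(f"FD slope:     {fd_slope:.3f}, FD constant: {fd_const:.3f}")
```

In this setup the pooled slope should settle near 2.5 (the true 2 plus Cov(x, a)/Var(x) = 0.5), while the first-difference slope should be near 2 and its constant near the intercept shift of 1. The result depends on the assumption that the idiosyncratic errors are unrelated to x.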
Differencing with Multiple Periods
• Can extend this method to more periods
• Simply difference all adjacent periods
  • So if there are 3 periods, subtract period 1 from period 2 and period 2 from period 3, giving 2 observations per individual; etc.
• Also: include dummies for each period, so-called “period dummies” or “period effects”
• Assuming the $u_{it}$ are uncorrelated over time (and with the x’s), can estimate by OLS
  • Otherwise, autocorrelation (and omitted variable bias) remain
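Differencing adjacent periods is mostly a data-organization task, so a small sketch may help. The helper `first_differences` and the toy rows are hypothetical; the sketch assumes periods are coded as consecutive integers and skips any pair with a gap rather than differencing across it.

```python
from collections import defaultdict

def first_differences(rows):
    """rows: iterable of (unit, period, y, x) in any order.
    Returns a list of (unit, period, dy, dx), one per adjacent pair of periods."""
    by_unit = defaultdict(list)
    for unit, period, y, x in rows:
        by_unit[unit].append((period, y, x))
    out = []
    for unit, obs in by_unit.items():
        obs.sort()  # order by period so each change is between adjacent periods
        for (t0, y0, x0), (t1, y1, x1) in zip(obs, obs[1:]):
            if t1 - t0 == 1:  # assumption: skip pairs separated by a missing period
                out.append((unit, t1, y1 - y0, x1 - x0))
    return out

# Toy panel: 2 units, 3 periods, deliberately out of order
rows = [
    ("A", 1, 10.0, 1.0), ("A", 3, 16.0, 4.0), ("A", 2, 12.0, 2.0),
    ("B", 1, 5.0, 0.0), ("B", 2, 5.5, 1.0), ("B", 3, 7.0, 1.5),
]
diffs = first_differences(rows)
for d in diffs:
    print(d)
```

With 3 periods each unit contributes 2 differenced observations, matching the count on the slide. Sorting within unit before differencing is the step that guards against computing the wrong change when the file is not already ordered.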
Two-period example from textbook
• Does a higher unemployment rate raise crime?
• Data from:
  • 46 U.S. cities (cross-sectional unit “i”)
  • in 1982, 1987 (the two years, “t”)
• Regress crmrte (crimes per 1,000 population) on unem (unemployment rate) and a dummy for 1987
• First, let’s see the data…
Pooled cross-section regression

. reg crmrte unem d87, robust

Linear regression                               Number of obs     =         92
                                                F( 2, 89)         =       0.63
                                                Prob > F          =     0.5336
                                                R-squared         =     0.0122
                                                Root MSE          =     29.992

------------------------------------------------------------------------------
             |               Robust
      crmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        unem |   .4265473   .9935541     0.43   0.669    -1.547623    2.400718
         d87 |   7.940416   7.106315     1.12   0.267     -6.17968    22.06051
       _cons |   93.42025   10.45796     8.93   0.000     72.64051       114.2
------------------------------------------------------------------------------

• 92 observations
• Nothing significant, magnitude of coefficients small
First difference regression (prefix c = “change” = Δ)

. reg ccrmrte cunem, robust

Linear regression                               Number of obs     =         46
                                                F( 1, 44)         =       7.40
                                                Prob > F          =     0.0093
                                                R-squared         =     0.1267
                                                Root MSE          =     20.051

------------------------------------------------------------------------------
             |               Robust
     ccrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cunem |   2.217999   .8155056     2.72   0.009     .5744559    3.861543
       _cons |    15.4022   5.178907     2.97   0.005     4.964803     25.8396
------------------------------------------------------------------------------

• Now only 46 observations (why?)
• Both the intercept shift ($\delta_0$, now the constant) and the unemployment rate are significant
• Also: magnitudes larger
Why did coefficient estimates get larger and more significant?
• Perhaps the cross-section regression suffered from omitted variables bias [$\mathrm{Cov}(x_{it}, a_i) \neq 0$]
  • Third factors, fixed across the two periods, which raise the unemployment rate and lower the crime rate
  • (??) More generous unemployment benefits? …
• To be clear: taking differences can make omitted variables bias worse in some cases
  • To oversimplify, it depends on which is larger: $\mathrm{Cov}(x_{it}, u_{it})$ or $\mathrm{Cov}(x_{it}, a_i)$
• Possible example: crime and police
More police cause more crime?! (lpolpc = log police per capita)

. reg crmrte lpolpc d87, robust

Linear regression                               Number of obs     =         92
                                                F( 2, 89)         =       9.72
                                                Prob > F          =     0.0002
                                                R-squared         =     0.1536
                                                Root MSE          =     27.762

------------------------------------------------------------------------------
             |               Robust
      crmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lpolpc |   41.09728   9.527411     4.31   0.000     22.16652    60.02805
         d87 |   5.066153    5.78541     0.88   0.384    -6.429332    16.56164
       _cons |   66.44041   7.324693     9.07   0.000      51.8864    80.99442
------------------------------------------------------------------------------

• A 10% increase in police officers per capita is associated with about 4 more crimes per 1,000 population (coefficient of 41 on log police)
• Seems unlikely to be causal! (What’s going on?!)
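Because lpolpc is in logs while crmrte is in levels, the implied change in crime depends on the size of the percentage change in police, and the usual "coefficient/100 per 1%" reading is a small-change approximation. The following minimal sketch uses the lpolpc coefficient printed above; the helper name `level_log_effect` is hypothetical.

```python
import math

def level_log_effect(coef, pct_increase):
    """Change in y implied by a level-log coefficient when x rises by pct_increase percent."""
    return coef * math.log(1 + pct_increase / 100)

coef_lpolpc = 41.09728  # coefficient on lpolpc from the pooled regression above

effect_1 = level_log_effect(coef_lpolpc, 1)      # 1% more police per capita
effect_10 = level_log_effect(coef_lpolpc, 10)    # 10% more police per capita
effect_100 = level_log_effect(coef_lpolpc, 100)  # doubling police per capita

print(f"1%:   {effect_1:.2f} crimes per 1,000")
print(f"10%:  {effect_10:.2f} crimes per 1,000")
print(f"100%: {effect_100:.2f} crimes per 1,000")
```

For small changes the exact figure is close to coefficient × (percent change / 100); for a doubling it is coefficient × ln 2, which is noticeably smaller than the coefficient itself.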
In first differences

. reg ccrmrte clpolpc, robust

Linear regression                               Number of obs     =         46
                                                F( 1, 44)         =       4.13
                                                Prob > F          =     0.0483
                                                R-squared         =     0.1240
                                                Root MSE          =     20.082

------------------------------------------------------------------------------
             |               Robust
     ccrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     clpolpc |   85.44922   42.05987     2.03   0.048     .6831235    170.2153
       _cons |    3.88163   2.830571     1.37   0.177    -1.823011    9.586271
------------------------------------------------------------------------------

• A 10% increase in police officers per capita is now associated with about 8 more crimes per 1,000 population (coefficient of 85 on the change in log police)!!
• Could it be that omitted variables bias is worse in changes in this case?
• On the other hand, the confidence interval is wide
Bottom line
• Estimating in “differences” is not a panacea
• Though we usually trust this variation more than cross-sectional variation, it does not always suffer from less bias
  • Another example: differencing also exacerbates bias from measurement error (soon!)
• Instead, as usual, a credible “natural experiment” is always what is really critical