1 / 133

Econometrics I

Econometrics I. Professor William Greene Stern School of Business Department of Economics. Econometrics I. Part 16 – Panel Data. Panel Data Sets. Longitudinal data British household panel survey (BHPS) Panel Study of Income Dynamics (PSID) … many others Cross section time series

aizza
Download Presentation

Econometrics I

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Econometrics I Professor William Greene Stern School of Business Department of Economics

  2. Econometrics I Part 16 – Panel Data

  3. Panel Data Sets • Longitudinal data • British household panel survey (BHPS) • Panel Study of Income Dynamics (PSID) • … many others • Cross section time series • Penn world tables • Financial data by firm, by year • rit – rft = i(rmt - rft) + εit, i = 1,…,many; t=1,…many • Exchange rate data, essentially infinite T, large N

  4. Benefits of Panel Data • Time and individual variation in behavior unobservable in cross sections or aggregate time series • Observable and unobservable individual heterogeneity • Rich hierarchical structures • More complicated models • Features that cannot be modeled with only cross section or aggregate time series data alone • Dynamics in economic behavior

  5. Cornwell and Rupert Data Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years(Extracted from NLSY.) Variables in the file are EXP = work experienceWKS = weeks workedOCC = occupation, 1 if blue collar, IND = 1 if manufacturing industrySOUTH = 1 if resides in southSMSA = 1 if resides in a city (SMSA)MS = 1 if marriedFEM = 1 if femaleUNION = 1 if wage set by union contractED = years of educationBLK = 1 if individual is blackLWAGE = log of wage = dependent variable in regressions These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155.  See Baltagi, page 122 for further analysis.  The data were downloaded from the website for Baltagi's text.

  6. Balanced and Unbalanced Panels • Distinction: Balanced vs. Unbalanced Panels • A notation to help with mechanics zi,t, i = 1,…,N; t = 1,…,Ti • The role of the assumption • Mathematical and notational convenience: • Balanced, n=NT • Unbalanced: • Is the fixed Ti assumption ever necessary? Almost never. • Is unbalancedness due to nonrandom attrition from an otherwise balanced panel? This would require special considerations.

  7. Application: Health Care Usage German Health Care Usage Data, 7,293 Individuals, Varying Numbers of PeriodsThis is an unbalanced panel with 7,293 individuals.  There are altogether 27,326 observations.  The number of observations ranges from 1 to 7.  (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987).  (Downloaded from the JAE Archive)Variables in the file are DOCTOR = 1(Number of doctor visits > 0) HOSPITAL = 1(Number of hospital visits > 0) HSAT =  health satisfaction, coded 0 (low) - 10 (high)   DOCVIS =  number of doctor visits in last three months HOSPVIS =  number of hospital visits in last calendar yearPUBLIC =  insured in public health insurance = 1; otherwise = 0 ADDON =  insured by add-on insurance = 1; otherswise = 0 HHNINC =  household nominal monthly net income in German marks / 10000. (4 observations with income=0 were dropped)HHKIDS = children under age 16 in the household = 1; otherwise = 0 EDUC =  years of schooling AGE = age in years MARRIED = marital status

  8. An Unbalanced Panel: RWM’s GSOEP Data on Health Care N = 7,293 Households

  9. A Basic Model for Panel Data • Unobserved individual effects in regression: E[yit | xit, ci] Notation: • Linear specification: Fixed Effects: E[ci | Xi ] = g(Xi). Cov[xit,ci] ≠0 effects are correlated with included variables. Random Effects: E[ci | Xi ] = 0. Cov[xit,ci] = 0

  10. Convenient Notation • Fixed Effects – the ‘dummy variable model’ • Random Effects – the ‘error components model’ Individual specific constant terms. Compound (“composed”) disturbance

  11. Estimating β • β is the partial effect of interest • Can it be estimated (consistently) in the presence of (unmeasured) ci? • Does pooled least squares “work?” • Strategies for “controlling for ci” using the sample data

  12. Assumptions for Asymptotics • Convergence of moments involving cross section Xi. • N increasing, T or Ti assumed fixed. • “Fixed T asymptotics” (see text, p. 348) • Time series characteristics are not relevant (may be nonstationary – relevant in Penn World Tables) • If T is also growing, need to treat as multivariate time series. • Ranks of matrices. X must have full column rank. (Xi may not, if Ti < K.) • Strict exogeneity and dynamics. If xit contains yi,t-1 then xit cannot be strictly exogenous. Xit will be correlated with the unobservables in period t-1. (To be revisited later.) • Empirical characteristics of microeconomic data

  13. The Pooled Regression • Presence of omitted effects • Potential bias/inconsistency of OLS – depends on ‘fixed’ or ‘random’

  14. OLS in the Presence of Individual Effects

  15. Estimating the Sampling Variance of b • s2(X́X)-1? Inappropriate because • Correlation across observations (certainly) • Heteroscedasticity (possibly) • A ‘robust’ covariance matrix • Robust estimation (in general) • The White estimator • A Robust estimator for OLS.

  16. Cluster Estimator

  17. Application: Cornwell and Rupert

  18. Bootstrap variance for a panel data estimator • Panel Bootstrap = Block Bootstrap • Data set is N groups of size Ti • Bootstrap sample is N groups of size Ti drawn with replacement.

  19. Using First Differences Eliminating the heterogeneity

  20. OLS with First Differences With strict exogeneity of (Xi,ci), OLS regression of Δyit on Δxit is unbiased and consistent but inefficient. GLS is unpleasantly complicated. Use OLS in first differences and use Newey-West with one lag.

  21. Application of a Two Period Model • “Hemoglobin and Quality of Life in Cancer Patients with Anemia,” • Finkelstein (MIT), Berndt (MIT), Greene (NYU), Cremieux (Univ. of Quebec) • 1998 • With Ortho Biotech – seeking to change labeling of already approved drug ‘erythropoetin.’r-HuEPO

  22. QOL Study • Quality of life study • i = 1,… 1200+ clinically anemic cancer patients undergoing chemotherapy, treated with transfusions and/or r-HuEPO • t = 0 at baseline, 1 at exit. (interperiod survey by some patients was not used) • yit = self administered quality of life survey, scale = 0,…,100 • xit = hemoglobin level, other covariates • Treatment effects model (hemoglobin level) • Background – r-HuEPO treatment to affect Hg level • Important statistical issues • Unobservable individual effects • The placebo effect • Attrition – sample selection • FDA mistrust of “community based” – not clinical trial based statistical evidence • Objective – when to administer treatment for maximum marginal benefit

  23. Regression-Treatment Effects Model

  24. Effects and Covariates • Individual effects that would impact a self reported QOL: Depression, comorbidity factors (smoking), recent financial setback, recent loss of spouse, etc. • Covariates • Change in tumor status • Measured progressivity of disease • Change in number of transfusions • Presence of pain and nausea • Change in number of chemotherapy cycles • Change in radiotherapy types • Elapsed days since chemotherapy treatment • Amount of time between baseline and exit

  25. First Differences Model

  26. Dealing with Attrition • The attrition issue: Appearance for the second interview was low for people with initial low QOL (death or depression) or with initial high QOL (don’t need the treatment). Thus, missing data at exit were clearly related to values of the dependent variable. • Solutions to the attrition problem • Heckman selection model (used in the study) • Prob[Present at exit|covariates] = Φ(z’θ) (Probit model) • Additional variable added to difference model i = Φ(zi’θ)/Φ(zi’θ) • The FDA solution: fill with zeros. (!)

  27. Difference in Differences With two periods, This is a linear regression model. If there are no regressors,

  28. Difference-in-Differences Model With two periods and strict exogeneity of D and T, This is a linear regression model. If there are no regressors,

  29. Difference in Differences

  30. A Tale of Two Cities • A sharp change in policy can constitute a natural experiment • The Mariel boatlift from Cuba to Miami (May-September, 1980) increased the Miami labor force by 7%. Did it reduce wages or employment of non-immigrants? • Compare Miami to Los Angeles, a comparable (assumed) city. • Card, David, “The Impact of the Mariel Boatlift on the Miami Labor Market,” Industrial and Labor Relations Review, 43, 1990, pp. 245-257.

  31. Difference in Differences

  32. Applying the Model • c = M for Miami, L for Los Angeles • Immigration occurs in Miami, not Los Angeles • T = 1979, 1981 (pre- and post-) • Sample moment equations: E[Yi|c,t,T] • E[Yi|M,79] = β79 + γM • E[Yi|M,81] = β81 + γM + δ • E[Yi|L,79] = β79 + γL • E[Yi|M,79] = β81 + γL • It is assumed that unemployment growth in the two cities would be the same if there were no immigration.

  33. Implications for Differences • If neither city exposed to migration • E[Yi,0|M,81] - E[Yi,0|M,79] = β81 –β79 (Miami) • E[Yi,0|L,81] - E[Yi,0|L,79] = β81 –β79 (LA) • If both cities exposed to migration • E[Yi,1|M,81] - E[Yi,1|M,79] = β81 –β79 + δ (Miami) • E[Yi,1|L,81] - E[Yi,1|L,79] = β81 –β79 + δ (LA) • One city (Miami) exposed to migration: The difference in differences is. • {E[Yi,1|M,81] - E[Yi,1|M,79]} – {E[Yi,0|L,81] - E[Yi,0|L,79]} = δ (Miami)

  34. yi = Xi + diαi + εi, for each individual The Fixed Effects Model E[ci | Xi ] = g(Xi); Effects are correlated with included variables. Cov[xit,ci] ≠0

  35. The Within Groups Transformation Removes the Effects

  36. Useful Analysis of Variance Notation Total variation = Within groups variation + Between groups variation

  37. WHO Data

  38. Baltagi and Griffin’s Gasoline Data World Gasoline Demand Data, 18 OECD Countries, 19 yearsVariables in the file are COUNTRY = name of country YEAR = year, 1960-1978LGASPCAR = log of consumption per carLINCOMEP = log of per capita incomeLRPMG = log of real price of gasoline LCARPCAP = log of per capita number of cars See Baltagi (2001, p. 24) for analysis of these data. The article on which the analysis is based is Baltagi, B. and Griffin, J., "Gasolne Demand in the OECD: An Application of Pooling and Testing Procedures," European Economic Review, 22, 1983, pp. 117-137.  The data were downloaded from the website for Baltagi's text.

  39. Analysis of Variance

More Related