Econometric Analysis of Panel Data

William Greene, Department of Economics, Stern School of Business

The Random Effects Model
• The random effects model
• $c_i$ is uncorrelated with $x_{it}$ for all $t$;
• $E[c_i \mid X_i] = 0$
• $E[\varepsilon_{it} \mid X_i, c_i] = 0$

Error Components Model Generalized Regression Model

Notation

Maximum Likelihood

MLE Panel Data Algebra (1)

MLE Panel Data Algebra (1, cont.)

MLE Panel Data Algebra (1, conc.)

Maximizing the Log Likelihood
• Difficult: “Brute force” plus some elegant theoretical results; see Baltagi, pp. 22-23. (Back and forth from GLS to $\sigma_\varepsilon^2$ and $\sigma_u^2$.)
• Somewhat less difficult and more practical: at any iteration, given estimates of $\sigma_\varepsilon^2$ and $\sigma_u^2$, the estimator of $\beta$ is GLS (of course), so we iterate back and forth between these. See Hsiao, pp. 39-40.
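A minimal sketch of the second, more practical approach for a balanced panel, assuming normal errors and regressors uncorrelated with $c_i$. Given $\beta$, the ML variance updates are $\hat\sigma_\varepsilon^2 = W/[N(T-1)]$ and $\hat\sigma_\varepsilon^2 + T\hat\sigma_u^2 = B/N$, where W and B are the within and between sums of squares of the residuals; given the variances, $\beta$ is GLS computed by partial demeaning with $\theta = 1 - \sigma_\varepsilon/\sqrt{\sigma_\varepsilon^2 + T\sigma_u^2}$. The function name `re_iterated_gls`, the simulated data, and the simple truncation of a negative $\sigma_u^2$ at zero are illustrative assumptions, not the method of any particular package.

```python
import numpy as np


def re_iterated_gls(y, X, max_iter=200, tol=1e-10):
    """Iterate between GLS for beta and variance-component updates.

    y: (N, T) outcomes; X: (N, T, K) regressors; balanced panel assumed.
    Returns (beta, sigma_e2, sigma_u2).
    """
    N, T, K = X.shape
    # Start from pooled OLS.
    beta = np.linalg.lstsq(X.reshape(N * T, K), y.reshape(N * T), rcond=None)[0]
    s_e2 = s_u2 = np.nan
    for _ in range(max_iter):
        # Step 1: given beta, update the variance components.
        e = y - X @ beta                         # (N, T) residuals
        ebar = e.mean(axis=1, keepdims=True)     # individual means of residuals
        W = np.sum((e - ebar) ** 2)              # within sum of squares
        B = T * np.sum(ebar ** 2)                # between sum of squares
        s_e2 = W / (N * (T - 1))
        # sigma_e^2 + T sigma_u^2; simple truncation keeps sigma_u^2 >= 0
        # (this is not the constrained ML solution at the boundary).
        s_1 = max(B / N, s_e2)
        s_u2 = (s_1 - s_e2) / T
        # Step 2: given the variances, beta is GLS via partial demeaning.
        theta = 1.0 - np.sqrt(s_e2 / s_1)
        ys = y - theta * y.mean(axis=1, keepdims=True)
        Xs = X - theta * X.mean(axis=1, keepdims=True)
        new_beta = np.linalg.lstsq(Xs.reshape(N * T, K), ys.reshape(N * T), rcond=None)[0]
        done = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if done:
            break
    return beta, s_e2, s_u2


# Simulated balanced panel: sigma_e^2 = 1, sigma_u^2 = 0.5 (hypothetical values).
rng = np.random.default_rng(0)
N, T = 500, 5
X = np.concatenate([np.ones((N, T, 1)), rng.normal(size=(N, T, 2))], axis=2)
beta_true = np.array([1.0, 0.5, -0.25])
u = rng.normal(scale=np.sqrt(0.5), size=(N, 1))
y = X @ beta_true + u + rng.normal(scale=1.0, size=(N, T))

b, se2, su2 = re_iterated_gls(y, X)
print(b, se2, su2)
```

At an interior fixed point both sets of first-order conditions hold at once, which is why alternating the two steps reaches the ML estimates.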

Direct Maximization of LogL

 

Maximum Simulated Likelihood

Likelihood Function for Individual i

Log Likelihood Function

Computing the Expected LogL
Example: Hermite quadrature nodes and weights, H = 5
Nodes: -2.02018, -0.95857, 0.00000, 0.95857, 2.02018
Weights: 0.0199532, 0.39362, 0.94531, 0.39362, 0.0199532
Applications usually use many more points, up to 96, and much more accurate (more digits) representations.
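The H = 5 nodes and weights can be reproduced with NumPy's `numpy.polynomial.hermite.hermgauss`, which uses the weight function $e^{-x^2}$. To approximate $E[g(c)]$ for $c \sim N(0, \sigma^2)$, as when a normally distributed effect is integrated out of individual i's likelihood, substitute $c = \sqrt{2}\,\sigma x$ and divide by $\sqrt{\pi}$. A minimal sketch; `expect_normal` is a hypothetical helper name.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

# H = 5 nodes and weights for the weight function exp(-x^2)
nodes, weights = hermgauss(5)
print(np.round(nodes, 5))
print(np.round(weights, 5))


def expect_normal(g, sigma, H=5):
    """Approximate E[g(c)], c ~ N(0, sigma^2), by H-point Gauss-Hermite quadrature."""
    x, w = hermgauss(H)
    # Change of variable c = sqrt(2)*sigma*x maps the normal integral onto
    # the exp(-x^2) weight; dividing by sqrt(pi) normalizes the weights.
    return np.sum(w * g(np.sqrt(2.0) * sigma * x)) / np.sqrt(np.pi)


print(expect_normal(lambda c: c ** 2, sigma=0.7))        # should equal 0.7**2
print(expect_normal(np.exp, sigma=0.5, H=20), np.exp(0.125))
```

An H-point rule is exact (up to rounding) when g is a polynomial of degree at most 2H - 1, which is why the $E[c^2]$ check needs only five points; other integrands generally need more points.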

Quadrature

Gauss-Hermite Quadrature

Simulation

Convergence Results
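A minimal sketch of the simulation idea for the linear random effects model, where the exact individual likelihood is available for comparison. Conditional on $c_i = \sigma_u v_i$, the residuals $e_{it} = y_{it} - x_{it}'\beta$ are independent $N(\sigma_u v_i, \sigma_\varepsilon^2)$; averaging that conditional likelihood over R standard normal draws of $v_i$ estimates $L_i$, and the exact value is the multivariate normal density with covariance $\sigma_\varepsilon^2 I + \sigma_u^2 ii'$. The residual vector, parameter values, and function names are made up for illustration.

```python
import numpy as np


def cond_lik(e, sigma_e, c):
    """prod_t N(e_t | c_r, sigma_e^2), evaluated for every draw c_r in c."""
    z = (e[None, :] - c[:, None]) / sigma_e
    dens = np.exp(-0.5 * z ** 2) / (np.sqrt(2.0 * np.pi) * sigma_e)
    return np.prod(dens, axis=1)


def simulated_lik(e, sigma_e, sigma_u, R, rng):
    """Simulated L_i: average the conditional likelihood over R draws of v_i."""
    v = rng.standard_normal(R)
    return cond_lik(e, sigma_e, sigma_u * v).mean()


def exact_lik(e, sigma_e, sigma_u):
    """Exact L_i: N(0, sigma_e^2 I + sigma_u^2 ii') density of the residuals."""
    T = e.size
    omega = sigma_e ** 2 * np.eye(T) + sigma_u ** 2 * np.ones((T, T))
    _, logdet = np.linalg.slogdet(omega)
    quad = e @ np.linalg.solve(omega, e)
    return np.exp(-0.5 * (T * np.log(2.0 * np.pi) + logdet + quad))


e = np.array([0.3, -0.1, 0.8, 0.4])   # hypothetical residuals for one individual
exact = exact_lik(e, 1.0, 0.6)
rng = np.random.default_rng(1)
for R in (10, 100, 1000, 100_000):
    approx = simulated_lik(e, 1.0, 0.6, R, rng)
    print(R, approx, approx / exact - 1.0)   # relative simulation error
```

The relative error shrinks roughly like $1/\sqrt{R}$. In MSL it is the log of each simulated $L_i$ that enters the log likelihood, and the log of an average is a biased estimate of the log of its expectation for finite R, so R has to grow with the sample for the simulated estimator to behave like ML.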

MSL vs. ML: .154272² = .023799

Two Level Panel Data
• Nested by construction
• Unbalanced panels
• No real obstacle to estimation
• Some inconvenient algebra
• In two-step FGLS of the RE model, we need “1/T” to solve for an estimate of $\sigma_u^2$. What to use when $T_i$ varies?

Balanced Nested Panel Data
$Z_{ijkt}$ = test score for student $t$, teacher $k$, school $j$, district $i$
$L = 2$ school districts, $i = 1,\dots,L$
$M_i = 3$ schools in each district, $j = 1,\dots,M_i$
$N_{ij} = 4$ teachers in each school, $k = 1,\dots,N_{ij}$
$T_{ijk} = 20$ students in each class, $t = 1,\dots,T_{ijk}$
Antweiler, W., “Nested Random Effects Estimation in Unbalanced Panel Data,” Journal of Econometrics, 101, 2001, pp. 295-313.
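To see what the nesting implies for GLS weights, here is a sketch of the error covariance for this balanced design (2 × 3 × 4 × 20 = 480 scores). It assumes the additive nested form $z_{ijkt} = x_{ijkt}'\beta + u_i + v_{ij} + w_{ijk} + \varepsilon_{ijkt}$ with mutually independent, homoscedastic components; that form and the variance values are assumptions of this sketch. Two scores share $\sigma_u^2$ if they are in the same district, add $\sigma_v^2$ if also in the same school, and $\sigma_w^2$ if also in the same class, so the matrix is block diagonal by district.

```python
import numpy as np

L, M, N, T = 2, 3, 4, 20   # districts, schools per district, teachers per school, students per class

# One row per student: (district i, school j, teacher k, student t), t varying fastest.
idx = np.array([(i, j, k, t)
                for i in range(L) for j in range(M)
                for k in range(N) for t in range(T)])
n_obs = idx.shape[0]


def nested_cov(idx, s_u2, s_v2, s_w2, s_e2):
    """Covariance of the composite error u_i + v_ij + w_ijk + e_ijkt."""
    same_i = idx[:, None, 0] == idx[None, :, 0]                 # same district
    same_ij = same_i & (idx[:, None, 1] == idx[None, :, 1])     # ... and same school
    same_ijk = same_ij & (idx[:, None, 2] == idx[None, :, 2])   # ... and same class
    same_obs = np.eye(idx.shape[0], dtype=bool)                 # same student
    return s_u2 * same_i + s_v2 * same_ij + s_w2 * same_ijk + s_e2 * same_obs


# Hypothetical variance components.
Omega = nested_cov(idx, s_u2=1.0, s_v2=0.5, s_w2=0.25, s_e2=2.0)
print(n_obs, Omega.shape)
print(Omega[0, [0, 1, 20, 80, 240]])   # self, same class, same school, same district, other district
```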

Nested Effects Model

GLS with Nested Effects

Unbalanced Nested Data
• With unbalanced panels, all the preceding results fall apart.
• GLS, FGLS, and even fixed effects become analytically intractable.
• The log likelihood is very tractable.
• Note a collision of practicality with nonrobustness. (Normality must be assumed.)

Log Likelihood (1)

Log Likelihood (2)

Maximizing Log L
• Antweiler provides analytic first derivatives for gradient methods of optimization. Ugly to program.
• Numerical derivatives:
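A minimal sketch of a central-difference gradient that can stand in for analytic first derivatives in a gradient-based optimizer. It is checked here against the analytic gradient of a simple normal log likelihood, not Antweiler's nested likelihood. The step rule $h_k = 10^{-5}\max(1, |\theta_k|)$ is a common choice rather than one taken from the source, and `numerical_gradient` is a hypothetical name.

```python
import numpy as np


def numerical_gradient(f, theta, rel_step=1e-5):
    """Central-difference gradient of the scalar function f at theta."""
    theta = np.asarray(theta, dtype=float)
    g = np.empty_like(theta)
    for k in range(theta.size):
        h = rel_step * max(1.0, abs(theta[k]))   # step scaled to the parameter
        step = np.zeros_like(theta)
        step[k] = h
        g[k] = (f(theta + step) - f(theta - step)) / (2.0 * h)
    return g


# Example: normal log likelihood in (mu, log sigma) for a small sample.
y = np.array([1.2, 0.4, 2.1, 1.7, 0.9])


def loglik(p):
    mu, log_s = p
    s2 = np.exp(2.0 * log_s)
    return -0.5 * np.sum(np.log(2.0 * np.pi * s2) + (y - mu) ** 2 / s2)


p0 = np.array([1.0, 0.1])
g_num = numerical_gradient(loglik, p0)

# Analytic gradient for comparison.
s2 = np.exp(2.0 * p0[1])
g_exact = np.array([np.sum(y - p0[0]) / s2,
                    -y.size + np.sum((y - p0[0]) ** 2) / s2])
print(g_num, g_exact)
```

Each gradient costs 2K evaluations of the log likelihood for K parameters, which is the price paid for not programming the analytic derivatives.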

Asymptotic Covariance Matrix

An Appropriate Asymptotic Covariance Matrix

Some Observations
• Assuming the wrong (e.g., nonnested) error structure:
• Still consistent: GLS with the wrong weights
• Standard errors (apparently) biased downward (Moulton bias)
• Adding “time” effects or other nonnested effects is “very challenging.” Perhaps do so with “fixed” effects (dummy variables).

An Application
• $Y_{ijkt}$ = log of atmospheric sulfur dioxide concentration at observation station k at time t, in country i.
• H = 2621, 293 stations, 44 countries, various numbers of observations, not equally spaced
• Three levels, not 4 as in the article.
• $X_{jkt}$ = 1, log(GDP/km²), log(K/L), log(Income), Suburban, Rural, Communist, log(Oil price), average temperature, time trend.

Estimates

Rotating Panel-1
The structure of the sample and the selection of individuals in a rotating sampling design are as follows. Let all individuals in the population be numbered consecutively. The sample in period 1 consists of $N_1$ individuals. In period 2, a fraction of the period 1 sample, $me_1$ individuals ($0 < me_1 < N_1$), are replaced by $mi_2$ new individuals from the population. In period 3, another fraction of the period 2 sample, $me_2$ individuals ($0 < me_2 < N_2$), are replaced by $mi_3$ new individuals, and so on. Thus the sample size in period $t$ is $N_t = N_{t-1} - me_{t-1} + mi_t$. The procedure of dropping $me_{t-1}$ individuals selected in period $t-1$ and replacing them with $mi_t$ individuals from the population in period $t$ is called rotating sampling. In this framework, the total numbers of observations and of individuals observed are $\sum_t N_t$ and $N_1 + \sum_{t=2}^{T} mi_t$, respectively.
Heshmati, A., “Efficiency measurement in rotating panel data,” Applied Economics, 30, 1998, pp. 919-930.

Rotating Panel-2
The outcome of the rotating sample for farms producing dairy products is given in Table 1. Each annual sample is composed of four parts, or subsamples. For example, in 1980 the sample contains 79, 62, 98, and 74 farms. The first three parts (79, 62, and 98) are those not replaced during the transition from 1979 to 1980. The last subsample contains 74 farms newly included from the population. At the same time, 85 farms in the 1979 sample are excluded. The difference between the excluded part (85) and the included part (74) corresponds to the change in the rotating sample size between these two periods, i.e., 313 - 324 = -11. This difference includes only the part of the sample in which each farm is observed consecutively for four years, $N_{rot}$. The difference in the non-rotating part, $N_{non}$, is due to farms that are not observed consecutively. The proportion of farms not observed consecutively ($N_{non}$) in the total annual sample varies from 11.2 to 22.6%, with an average of 18.7%.
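The accounting in a rotating design can be sketched directly from the recursion $N_t = N_{t-1} - me_{t-1} + mi_t$. As a check, take only the rotating part of the 1979 to 1980 transition described above: 324 farms in 1979 (79 + 62 + 98 kept plus 85 dropped), with 85 dropped and 74 added, giving 313 in 1980. `rotating_panel` is a hypothetical helper, and its count of individuals assumes that dropped individuals never re-enter the sample.

```python
def rotating_panel(N1, dropped, added):
    """Rotating-sample accounting.

    dropped[s] = me_{s+1}: individuals in period s+1 not kept in period s+2
    added[s]   = mi_{s+2}: new individuals entering in period s+2
    Returns (sizes N_1..N_T, total observations, total individuals).
    """
    if len(dropped) != len(added):
        raise ValueError("need one (dropped, added) pair per transition")
    sizes = [N1]
    for me, mi in zip(dropped, added):
        if not 0 < me < sizes[-1]:
            raise ValueError("must drop a fraction of the previous sample")
        sizes.append(sizes[-1] - me + mi)   # N_t = N_{t-1} - me_{t-1} + mi_t
    n_obs = sum(sizes)                      # sum_t N_t
    n_individuals = N1 + sum(added)         # N_1 + sum_{t=2}^T mi_t
    return sizes, n_obs, n_individuals


# Rotating part only, 1979 -> 1980.
sizes, n_obs, n_ind = rotating_panel(79 + 62 + 98 + 85, dropped=[85], added=[74])
print(sizes, n_obs, n_ind)
```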

Rotating Panels-3
• Simply an unbalanced panel
• Treat with the familiar techniques
• Accounting is complicated.
• Time effects may be complicated.
• Biorn and Jansen (Scand. J. E., 1983), households: cohort 1 has T = 1976, 1977 while cohort 2 has T = 1977, 1978.
• But,… “Time in sample bias…” may require special treatment. The Mexican labor survey has a 3-period rotation; some families appear in 1, 2, or 3 periods.

Pseudo Panels
