440 likes | 1.11k Views
2. Fixed Effects Models. 2.1 Basic fixed-effects model 2.2 Exploring panel data 2.3 Estimation and inference 2.4 Model specification and diagnostics 2.5 Model extensions Appendix 2A - Least squares estimation. 2.1 Basic fixed effects model. Basic Elements
E N D
2. Fixed Effects Models • 2.1 Basic fixed-effects model • 2.2 Exploring panel data • 2.3 Estimation and inference • 2.4 Model specification and diagnostics • 2.5 Model extensions • Appendix 2A - Least squares estimation
2.1 Basic fixed effects model • Basic Elements • Subject i is observed on Ti occasions; • i = 1, ..., n, • TiT, the maximal number of time periods. • The response of interest is yit. • The K explanatory variables are xit = {xit1, xit2, ..., xitK}´, a vector of dimension K 1. • The population parameters are = (1, ..., K)´, a vector of dimension K 1.
Observables Representation of the Linear Model • E yit = + 1xit1+2xit2+ ... + KxitK. • {xit,1, ... , xit,K} are nonstochastic variables. • Var yit = σ2. • { yit } are independent random variables. • { yit } are normally distributed. • The observable variables are {xit,1, ... , xit,K , yit}. • Think of {xit,1, ... , xit,K} as defining a strata. • We take a random draw, yit, from each strata. • Thus, we treat the x’s as nonstochastic • We are interested in the distribution of y, conditional on the x’s.
Error Representation of the Linear Model • yit = + 1 xit,1+ 2xit,2+ ... + Kxit,K + εit where E εit = 0. • {xit,1, ... , xit,K} are nonstochastic variables.. • Var εit = σ2. • { εit } are independent random variables. • This representation is based on the Gaussian theory of errors – it is centered on the unobservable variable εit. • Here, εit are i.i.d., mean zero random variables.
Heterogeneous model • We now introduce a subscript on the intercept term, to account for heterogeneity. • E yit = i+ 1 xit,1+ 2xit,2+ ... + Kxit,K. • For short-hand, we write this as E yit = i+ xit´
Analysis of covariance model • The intercept parameter, , varies by subject. • The population parameters do not but control for the common effect of the covariates x. • Because the errors are mean zero, the expected response is E yit = i + xit´.
Parameters of interest • The common effects of the explanatory variables are dictated by the sign and magnitude of the betas (´s) • These are the parameters of interest • The intercept parameters vary by subject and account for different behavior of subjects. • The intercept parameters control for the heterogeneity of subjects. • Because they are of secondary interest, the intercepts are called nuisance parameters.
Time-specific analysis of covariance • The basic model also is a traditional analysis of covariance model. • The basic fixed-effects model focuses on the mean response and assumes: • no serial correlation (correlation over time) • no cross-sectional (contemporaneous) correlation (correlation between subjects) • Hence, no special relationship between subjects and time is assumed. • By interchanging i and t, we may consider the model yit = t + xit´ + it . • The parameters t are time-specific variables that do not depend on subjects.
Subject and time heterogeneity • Typically, the number of subjects, n, substantially exceeds the maximal number of time periods, T. • Typically, the heterogeneity among subjects explains a greater proportion of variability than the heterogeneity among time periods. • Thus, we begin with the “basic” model yit = i + xit´ + it. • This model allows explicit parameterization of the subject-specific heterogeneity. • By using binary variables for the time dimension, we can easily incorporate time-specific parameters.
2.2 Exploring panel data • Why Explore? • Many important features of the data can be summarized numerically or graphically without reference to a model • Data exploration provides hints of the appropriate model • Many social science data sets are observational - they do not arise as the result of a designed experiment • The data collection mechanism does not dictate the model selection process. • To draw reliable inferences from the modeling procedure, it is important that the data be congruent with the model. • Exploring the data also alerts us to any unusual observations and/or subjects.
Data exploration techniques • Panel data is a special case of regression data. • Techniques applicable to regression are also useful for panel data. • Some commonly used techniques include: • Summarize the distribution of y and each x: • Graphically, through histograms and other density estimators • Numerically, through basic summary statistics (mean, median, standard deviation, minimum and maximum) . • Summarize the relation between between y and each x: • Graphically, through scatter plots • Numerically, through correlation statistics • Summary statistics by time period may be useful for detecting temporal patterns. • Three more specialized (for panel data) techniques are: • Multiple time series plots • Scatterplots with symbols • Added variable plots. • Section 2.2 discusses additional techniques; these are performed after the fit of a preliminary model.
Multiple time series plots • Plot of the response, yit, versus time t. • Serially connect observations among common subjects. • This graph helps detect: • Patterns over time • Unusual observations and/or subjects. • Visualize the heterogeneity.
Scatterplots with symbols • Plot of the response, yit, versus an explanatory variable, xitj • Use a plotting symbol to encode the subject number “i” • See the relationship between the response and explanatory variable yet account for the varying intercepts. • Variation: If there is a separation in the x’s, such as increasing over time, • then we can serially connect the observations. • We do not need a separate plotting symbol for each subject.
Basic added variable plot • This is a plot of versus . • Motivation: Typically, the subject-specific parameters account for a large portion of the variability. • This plot allows us to visualize the relationship between y and each x, without forcing our eye to adjust for the heterogeneity of the subject-specific intercepts.
2.3 Estimation and inference • Least squares estimates • By the Gauss-Markov theorem, the best linear unbiased estimates are the ordinary least square (ols) estimates. • These are given by: • and • Here, and are averages of {yit} and {xit} over time. • Time-constant x’s prevent one from getting unique estimates of b !!!
Estimation details • Although there are n+K unknown parameters, the calculation of the ols estimates requires inversion of only a K×K matrix. • The ols estimate of b can also be expressed as a weighted average of estimates of subject-specific parameters. • Suppose that all parameters are subject-specific so that the model is yit = ai + xit´bi + eit • The ols estimate of bi turns out to be • Define the weighting matrix • With this weight, we can express the ols estimate of b as • a weighted average of subject-specific parameter estimates.
Properties of estimates • Both aiand b have the usual properties of ols regression estimators • They are unbiased estimators. • By the Gauss-Markov theorem, they are minimum variance among the class of unbiased estimates. • To see this, consider an expression of the ols estimate of b, • That is, b is a linear combination of responses. • If the responses are normally distributed, then so is b. • The variance of b turns out to be
ANOVA and standard errors • This follows the usual regression set-up. • We define the residuals as eit = yit - (ai + xit´b) . • The error sum of squares is Error SS = Siteit2. • The mean square error is • the residual standard deviation is s. • The standard errors of the slope estimates are from the square root of the diagonal of the estimated variance matrix
Consistency of estimates • As the number of subjects (n) gets large, then b approaches b. • Specifically, weak consistency means approaching (convergence) in probability. • This is a direct result of the unbiasedness and an assumption that SiWigrows without bound. • As n gets large, the intercept estimates aido not approach ai. • They are inconsistent. • Intuitively, this is because we assume that the number of repeated measurements of aiis Ti, a bounded number.
Other large sample approximations • Typically, the number of subjects is large relative to the number of time periods observed. • Thus, in deriving large sample approximations of the sampling distributions of estimators, assume that n although T remains fixed. • With this assumption, we have a central limit theorem for the slope estimator. • That is, b is approximately normally distributed even though though responses are not. • The approximation improves as n becomes large. • Unlike the usual regression set-up, this is not true for the intercepts. If the responses are not normally distributed, then ai are not even approximately normal.
2.4 Model specification and diagnostics • Pooling Test • Added variable plots • Influence diagnostics • Cross-sectional correlations • Heteroscedasticity
Pooling test • Test whether the intercepts take on a common value, say a. • Using notation, we wish to test the null hypothesis H0: a1= a2= ... = an= a. • This can be done using the following partial F- (Chow) test: • Run the “full model” yit = i+ xit´ + itto get Error SS and s2 . • Run the “reduced model” yit = + xit´ + itto get (Error SS)reduced . • Compute the partial F-statistic, • Reject H0 if F exceeds a quantile from an F-distribution with numerator degrees of freedom df1 = n-1 and denominator degrees of freedom df2 = N-(n+K).
Added variable plot • An added variable plot (also called a partial regression plot) is a standard graphical device used in regression analysis • Purpose: To view the relationship between a response and an explanatory variable, after controlling for the linear effects of other explanatory variables. • Added variable plots allow us to visualize the relationship between y and each x, without forcing our eye to adjust for the differences induced by the other x’s. • The basic added variable plot is a special case.
Procedure for making an added variable plot • Select an explanatory variable, say xj. • Run a regression of y on the other explanatory variables (omitting xj) • calculate the residuals from this regression. Call these residuals e1. • Run a regression of xj on the other explanatory variables (omitting xj) • calculate the residuals from this regression. Call these residuals e2. • The plot of e1 versus e2 is an added variable plot.
Correlations and added variable plots • Let corr(e1, e2 ) be the correlation between the two sets of residuals. • It is related to the t-statistic of xj, t(bj) , from the full regression equation (including xj) through: • Here, K is the number of regression coefficients in the full regression equation and N is the number of observations. • Thus, the t-statistic can be used to determine the correlation coefficient of the added variable plot without running the three step procedure. • However, unlike correlation coefficients, the added variable plot allows us to visualize potential nonlinear relationships between y and xj .
Influence diagnostics • Influence diagnostics allow the analyst to understand the impact of individual observations and/or subjects on the estimated model • Traditional diagnostic statistics are observation-level • of less interest in panel data analysis • the effect of unusual observations is absorbed by subject-specific parameters. • Of greater interest is the impact that an entire subject has on the population parameters. • We use the statistic • Here, b(i) is the ols estimate b calculated with the ith subject omitted.
Calibration of influence diagnostic • The panel data influence diagnostic is similar to Cook’s distance for regression. • Cook’s distance is calculated at the observational level yet Bi(b) is at the subject level • The statistic Bi(b) has an approximate c2 (chi-square) with K degrees of freedom • Observations with a “large” value of Bi(b) may be influential on the parameter estimates. • Use quantiles of the c2 to quantify the adjective “large.” • Influential observations warrant further investigation • they may need correction, additional variable specification to accommodate differences or deletion from the data set.
Cross-sectional correlations • The basic model assumes independence between subjects. • Looking at a cross-section of subjects, we assume zero cross-sectional correlation, that is, rij = Corr (yit ,yjt) = 0 for ij. • Suppose that the “true” model is yit = lt + xit´b +eit ,where lt is a random temporal effect that is common to all subjects. • This yields Var yit= sl2 + s2 • The covariance between observations at the same time but from different subjects is Cov (yit ,yjt) = sl2 , ij. • Thus, the cross-sectional correlation is
Testing for cross-sectional correlations • To test H0: rij = 0 for all ij, assume that Ti = T . • Calculate model residuals {eit}. • For each subject i, calculate the ranks of each residual. • That is, define {ri,1 , ..., ri,T} to be the ranks of {ei,1 , ..., ei,T} . • Ranks will vary from 1 to T, so the average rank is (T+1)/2. • For the ith and jth subject, calculate the rank correlation coefficient (Spearman’s correlation) • Calculate the average Spearman’s correlation and the average squared Spearman’s correlation • Here, S{i<j}means sum over i=1, ..., j-1 and j=2, ..., n.
Calibration of cross-sectional correlation test • We compare R2ave to a distribution that is a weighted sum of chi-square random variables (Frees, 1995). • Specifically, define Q = a(T) (c12- (T-1)) + b(T) (c22- T(T-3)/2) . • Here, c12 andc22 are independent chi-square random variables with T-1 and T(T-3)/2 degrees of freedom, respectively. • The constants are a(T) = 4(T+2) / (5(T-1)2(T+1)) and b(T) = 2(5T+6) / (5T(T-1)(T+1)) .
Calculation short-cuts • Rule of thumb for cut-offs for the Q distributon . • To calculate R2ave • Define • For each t, u, calculate SiZi,t,uandSiZi,t,u2. . • We have • Here, S{t,u} means sum over t=1, ..., T and u=1, ..., T. • Although more complex in appearance, this is a much faster computation form for R2ave. • Main drawback - the asymptotic distribution is only available for balanced data.
Heteroscedasticity • Carroll and Ruppert (1988) provide a broad treatment • Here is a test due to Breusch and Pagan (1980). • Ha: Vareit = s2 + g¢wit, where wit is a known vector of weighting variables and gis a p-dimensional vector of parameters. • H0: Vareit = s2. This procedure is: • Fit a regression model and calculate the model residuals, {rit}. • Calculate squared standardized residuals, • Fit a regression model of on wit. • The test statistic is LM = (Regress SSw)/2, where Regress SSw is the regression sum of squares from the model fit in step 3. • Reject the null hypothesis if LM exceeds a percentile from a chi-square distribution with p degrees of freedom. The percentile is one minus the significance level of the test.
2.5 Model extensions • In panel data, subjects are measured repeatedly over time. Panel data analysis is useful for studying subject changes over time. • Repeated measurements of a subject tend to be intercorrelated. • Up to this point, we have used time-varying covariates to account for the presence of time in the mean response. • However, as in time series analysis, it is also useful to measure the tendencies in time patterns through a correlation structure.
Timing of observations • We now specify the time periods when the observations are made. • We assume that we have at most T observations on each subject. • These observations are made at time periods t1, t2, ..., tT. • Each subject has observations made at a subset of these T time periods, labeled t1, t2, ..., tTi. • The subset may vary by subject and thus could be denoted by t1i, t2i, ..., tTii. • For brevity, we use the simpler notation scheme and drop the second i subscript. • This framework, although notationally complex, allows for missing data and incomplete observations.
Temporal covariance matrix • For a full set of observations, let R denote the TT temporal (time) variance-covariance matrix. • This is defined by R = Var (i) • Let Rrs = Cov (ir, is) is the element in the rth row and sth column of R. • There are at most T(T+1)/2 unknown elements of R. • Denote this dependence of R on parameters using R(). Here, is the vector of unknown parameters of R. • For the ith observation, we have Var (i ) = Ri(), a TiTi matrix. • The matrix Ri() can be determined by removing certain rows and columns of the matrix R(). • We assume that Ri() is positive-definite and only depends on i through its dimension.
Special cases of R • R = 2I, where I is a TT identity matrix. This is the case of no serial correlation. • R = 2 ( (1-) I + J ), where J is a TT matrix of 1’s. This is the uniform correlation model (also called “compound symmetry”). • Consider the model yit = i + it,where i is a random cross-sectional effect. • This yields Rtt = Var it = + . • For rs, consider Rrs = Cov (yir ,yis) = . • To write this in terms of 2, note that Corr (it , is ) = / ( + ) = • Thus, Rrs= 2 .
More special cases of R: • Rrs = 2 exp( - | tr - ts| ) . • In the case of equally spaced in time observations, we may assume that tr+1 - tr = 1. Thus, Rrs = 2 |r-s|, where = exp (- ) . • This is the autoregressive of order one model, denoted by AR(1). • More generally, for equally spaced in time observations, assume Cov (ir , is) = Cov (ij , ik) for |r-s| = |j-k|. • This is a stationary assumption. • It implies homoscedasticity. • There are only T unknown elements of R, “Toeplitz” matrix. • Assume only homoscedasticity. • There are 1 + T(T-1)/2 unknown elements of R, corresponding to the variance and the correlation matrix. • Make no assumptions on R. • There are T(T+1)/2 unknown elements of R.
Subject-specific slopes • Let one or more slopes vary by subject. • The fixed effects linear panel data model is yit = zit´i + xit´ + it. • The q explanatory variables are zit = {zit1, zit2, ..., zitq}´, a vector of dimension q 1. • The subject-specific parameters are ai = (ai1, ..., aiq)´, a vector of dimension q 1. • This is short-hand notation for the model yit = i1 zit1 +... + iq zitq +1xit1+... + KxitK+ it. • The responses between subjects are independent • We allow for temporal correlation through the assumption that Var i = Ri().
Assumptions of the Fixed Effects Linear Longitudinal Data Model • E yi = Ziαi + Xiβ. • {xit,1, ... , xit,K} and {zit,1, ... , zit,q} are nonstochastic. • Var yi = Ri(τ) = Ri. • {yi} are independent random vectors. • {yit} are normally distributed.
Least Squares Estimates • The estimates are derived in Appendix 2A.2. • They are given by: • and • Here,
Robust estimation of standard errors • It is common practice to ignore serial correlation and heteroscedasticity, so that one assumes Ri = s 2Ii . • Thus, • where • Huber (1967), White (1980) and Liang and Zeger (1986) suggested replacing Ri by eiei¢. Here, ei is the vector of residuals. Thus, a robust standard error of bj is