3SLS

3SLS is the combination of 2SLS and SUR. It is used for a system of equations in which there are endogenous variables on both the left- and right-hand sides of each equation. THAT IS THE 2SLS PART. But the error terms in each equation are also correlated across equations, and efficient estimation requires that we take account of this. THAT IS THE SUR (SEEMINGLY UNRELATED REGRESSIONS) PART. Hence in the regression for the ith equation there are endogenous (Y) variables on the right-hand side AND the error term is correlated with the error terms in the other equations.
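As a rough sketch in notation (the symbols below are assumed for illustration and are not taken from the slides), the ith equation of an M-equation system could be written as:

```latex
y_i = Y_i \gamma_i + X_i \beta_i + u_i, \qquad i = 1, \dots, M,
\qquad
E[u_{it}\, u_{jt}] = \sigma_{ij}
```

Here Y_i collects the endogenous right-hand-side variables (the reason for the 2SLS part) and a non-zero σ_ij for i ≠ j is the cross-equation error correlation (the reason for the SUR part).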

    log using "g:summ1.log"

If you type the above then a log is created on drive g (on my computer this is the flash drive; on yours you may need to specify another drive). The name summ1 can be anything, but the suffix must be log. At the end you can close the log by typing:

    log close

So open a log now and you will have a record of this session.
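A minimal sketch of a slightly more defensive way to open the log. The file name and the use of the current working directory instead of drive g are assumptions made for illustration.

```stata
* Close any log left open by an earlier session (no error if none is open)
capture log close
* Write to the current working directory; replace overwrites an old file,
* text requests a plain-text log rather than Stata's SMCL format
log using "summ1.log", replace text
display "this line is recorded in the log"
log close
```

In the class itself you would leave out the final log close until the end of the session.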

Load Data

    clear
    use http://www.ats.ucla.edu/stat/stata/examples/greene/TBL16-2

THAT link no longer works. But the following does:

    webuse klein

In order to get the rest to work, rename the variables:

    rename consump c
    rename capital1 k1
    rename invest i
    rename profits p
    rename govt g
    rename wagegovt wg
    rename taxnetx t
    rename wagepriv wp
    generate x=totinc

    * generate variables
    generate w = wg+wp
    generate k = k1+i
    generate yr=year-1931
    generate p1 = p[_n-1]
    generate x1 = x[_n-1]
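The lags above rely on the data being sorted in time order. As a hedged alternative, the sketch below declares the time variable and uses the lag operator instead. It assumes the renames above have been done and that year identifies the observations without gaps; p1_ts and x1_ts are illustrative names, not part of the original session.

```stata
* Declare year as the time variable so that time-series operators work
tsset year
* L.p is p lagged one period; the first observation is missing
generate p1_ts = L.p
generate x1_ts = L.x
* With sorted, gap-free data these should match p1 and x1
list year p p1 p1_ts in 1/5
```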

OLS Regression

    regress c p p1 w

Regresses c on p, p1 and w (what this equation means is not so important).

Usual output

reg3

With the command reg3, Stata estimates a system of structural equations in which some equations contain endogenous variables among the explanatory variables. Estimation is via three-stage least squares (3SLS). Typically, the endogenous regressors are dependent variables from other equations in the system. In addition, reg3 can estimate systems of equations by seemingly unrelated regression (SURE), multivariate regression (MVREG), and equation-by-equation ordinary least squares (OLS) or two-stage least squares (2SLS).

2SLS Regression

    reg3 (c p p1 w), 2sls inst(t wg g yr p1 x1 k1)

Regresses c on p, p1 and w. The instruments (i.e. the predetermined or exogenous variables in this equation and the rest of the system) are t wg g yr p1 x1 k1. This means that p and w, which are not included in the instruments, are endogenous.

The output is as before, but it confirms what the exogenous and endogenous variables are.

2SLS Regression

    ivreg c p1 (p w = t wg g yr p1 x1 k1)

This is an alternative command to do the same thing. Note that the endogenous variables on the right-hand side of the equation are specified inside the parentheses, before the = sign (p w), and the instruments follow the = sign.
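In more recent versions of Stata the same kind of fit is usually written with ivregress. The line below is a sketch under that assumption, using the variable names created earlier in this session. p1 is left out of the instrument list because an included exogenous regressor automatically serves as its own instrument.

```stata
* 2SLS for the consumption equation: p and w endogenous, p1 exogenous,
* excluded instruments listed after the = sign
ivregress 2sls c p1 (p w = t wg g yr x1 k1)
```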

The results are identical.

3SLS Regression

    reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)

This format does two new things. First, it specifies all three equations in the system. It has to do this because it needs to calculate the covariances between the error terms, and for this it needs to know what the equations, and hence the errors, are. Secondly, it says 3sls, not 2sls.

All 3 equations are printed out. This tells us what these equations look like

Let's compare the three different sets of estimates. Look at the coefficient on w. In OLS it is very significant and in 2SLS it is not significant, but in 3SLS it is back to being similar to OLS and significant. That is odd. I would expect that if 2SLS differs from OLS because of bias, then so should 3SLS. As it stands, it suggests that OLS is closer to 3SLS than 2SLS is to 3SLS, which does not make an awful lot of sense. But we do not have many observations. Perhaps that is partly why.

3SLS Regression

    reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
    matrix sig=e(Sigma)

This command stores the variances and covariances between the error terms in a matrix I call sig. You have used generate to generate variables and scalar to generate scalars. Similarly, matrix produces a matrix. e(Sigma) holds this variance-covariance matrix from the previous regression.

3SLS Regression

    reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
    matrix sig=e(Sigma)
    display sig[1,1], sig[1,2], sig[1,3]
    display sig[2,1], sig[2,2], sig[2,3]
    display sig[3,1], sig[3,2], sig[3,3]

Output:

    . display sig[1,1], sig[1,2], sig[1,3]
    1.0440596 .43784767 -.3852272

    . display sig[2,1], sig[2,2], sig[2,3]
    .43784767 1.3831832 .19260612

    . display sig[3,1], sig[3,2], sig[3,3]
    -.3852272 .19260612 .47642626

sig[1,1] = 1.0440596 is the variance of the 1st error term. sig[2,3] = .19260612 is the covariance of the error terms from equations 2 and 3.

3SLS Regression

This relates to the variance-covariance matrix in the lecture. Hence 0.437848 relates to σ12 and, of course, σ21. This matrix is Σ.
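Written out with the values displayed from sig, the estimated matrix can be laid out as follows (the hat notation is an assumption here; the lecture may write it differently):

```latex
\hat{\Sigma} =
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} \\
\sigma_{21} & \sigma_{22} & \sigma_{23} \\
\sigma_{31} & \sigma_{32} & \sigma_{33}
\end{pmatrix}
=
\begin{pmatrix}
1.0440596 & 0.43784767 & -0.3852272 \\
0.43784767 & 1.3831832 & 0.19260612 \\
-0.3852272 & 0.19260612 & 0.47642626
\end{pmatrix}
```

The matrix is symmetric, which is why σ12 and σ21 are the same number.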

3SLS Regression

    display sig[1,2]/( sig[1,1]^0.5 * sig[2,2]^0.5)

This should give the correlation between the error terms from equations 1 and 2. It is this formula: Correlation(x, y) = σxy / (σx σy). When we do this we get:
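As a worked check by hand, plugging the displayed elements of sig into the correlation formula gives approximately the following (rounded arithmetic, so the last digit may differ from Stata's output):

```latex
\hat{\rho}_{12}
= \frac{\sigma_{12}}{\sqrt{\sigma_{11}\,\sigma_{22}}}
= \frac{0.43784767}{\sqrt{1.0440596 \times 1.3831832}}
\approx \frac{0.43784767}{1.20172}
\approx 0.364
```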

Let's check

    reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
    matrix sig=e(Sigma)
    matrix cy= e(b)
    generate rc=c-(cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4])
    generate ri=i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
    correlate ri rc

matrix cy= e(b) stores the coefficients from the regression in a row vector we call cy. cy[1,1] is the first coefficient (on p) in the first equation. cy[1,4] is the fourth coefficient in the first equation (the constant term). cy[1,5] is the first coefficient (on p) in the second equation. Note this is cy[1,5], NOT cy[2,1].

Let's check

    reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
    matrix sig=e(Sigma)
    matrix cy= e(b)
    generate rc=c-(cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4])
    generate ri=i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
    correlate ri rc

Thus cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4] is the predicted value from the first regression, and i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8]) is the actual minus the predicted value, i.e. the error term from the 2nd equation. correlate ri rc prints out the correlation between the two error terms.

Let's check

    reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
    matrix sig=e(Sigma)
    matrix cy= e(b)
    generate rc=c-(cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4])
    generate ri=i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
    correlate ri rc

The correlation is 0.30, close to what we had before, but not the same. The main purpose of this class is to illustrate commands, so it's not too important. I think it could be because Stata is not calculating the e(Sigma) matrix by dividing by n-k, but just by n? Note, though, that a divisor common to every element of Σ cancels out of a correlation, so another possibility is that e(Sigma) is built from different residuals (for example those from the 2SLS stage) than the 3SLS residuals computed here.
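One way to cross-check the hand-built residuals is to let Stata compute them. The sketch below assumes the reg3 fit above is still the active estimation and that sig has been created as shown; rc_chk, ri_chk and R are illustrative names.

```stata
* Residuals from equations 1 and 2 via predict, instead of building them from e(b)
predict rc_chk, equation(#1) residuals
predict ri_chk, equation(#2) residuals
correlate rc_chk ri_chk
* Convert the stored covariance matrix into a correlation matrix in one step
matrix R = corr(sig)
matrix list R
```

The off-diagonal elements of R are the correlations implied by e(Sigma), so R[1,2] can be compared directly with the output of correlate.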

Let's check

Click on help (on the toolbar at the top of the screen, to the right). Click on 'Stata command'. In the dialogue box type reg3. Move down towards the end of the file and you get the following:

Some important retrievables

    e(mss_#)    model sum of squares for equation #
    e(rss_#)    residual sum of squares for equation #
    e(r2_#)     R-squared for equation #
    e(F_#)      F statistic for equation # (small)
    e(rmse_#)   root mean squared error for equation #
    e(ll)       log likelihood

where # is a number, e.g. 2 means equation 2.

And matrices:

    e(b)        coefficient vector
    e(Sigma)    Sigma hat matrix
    e(V)        variance-covariance matrix of the estimators
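As a small sketch of how the # in these names can be filled in programmatically, the loop below prints two of the retrievables for each of the three equations. It assumes a three-equation reg3 fit is the most recent estimation.

```stata
* Loop over equations 1 to 3; the local macro j is substituted into the e() names
forvalues j = 1/3 {
    display "Equation `j': RSS = " e(rss_`j') "   R2 = " e(r2_`j')
}
```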

The Hausman Test Again

We looked at this with respect to panel data. But it is a general test that allows us to compare an equation which has been estimated by two different techniques. Here we apply it to comparing OLS with 3SLS.

    reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), ols
    est store EQNols
    reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
    est store EQN3sls
    hausman EQNols EQN3sls

The Hausman Test Again

Below we run the three regressions specifying ols and store the results as EQNols.

    reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), ols
    est store EQNols

Then we run the three regressions specifying 3sls and store the results as EQN3sls.

    reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
    est store EQN3sls

Then we do the Hausman test:

    hausman EQNols EQN3sls

The Results

The table prints out the two sets of coefficients and their difference. The Hausman test statistic is 0.06. The significance level is 0.9963. This is clearly very far from being significant at the 10% level.
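For reference, the general form of the Hausman statistic can be sketched as below. The subscripts are assumed notation: c marks the estimator that stays consistent under both hypotheses and e the one that is efficient under the null, with q the number of coefficients being compared.

```latex
H = (\hat{\beta}_c - \hat{\beta}_e)'
    \left[ \hat{V}_c - \hat{V}_e \right]^{-1}
    (\hat{\beta}_c - \hat{\beta}_e)
\;\sim\; \chi^2_q \quad \text{under } H_0
```

A small H, as here, means the two sets of coefficients are close relative to the difference in their estimated variances.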

The Hausman Test Again

Hence it would appear that the coefficients from the two regressions are not significantly different. If OLS were giving biased estimates that 3SLS corrects, they would be different. Hence we would conclude that there is no endogeneity that requires instrumental-variable techniques. But because the error terms do appear correlated, SUR is probably the appropriate technique, as it produces more efficient estimates.
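If SUR is the route taken, a sketch of the corresponding command is given below, using the same three equations and the variable names from this session. The corr option asks Stata to report the correlation matrix of the residuals across equations together with a Breusch-Pagan test of independence.

```stata
* Seemingly unrelated regression on the same three equations;
* no instruments are given, so all right-hand-side variables are treated as exogenous
sureg (c p p1 w) (i p p1 k1) (wp x x1 yr), corr
```

A significant Breusch-Pagan statistic would support the view that the error terms are correlated across equations.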

Tasks

1. Using the display command, e.g.

    display e(mss_2)

   print on the screen some of the retrievables from each regression (the above is the model sum of squares for the second equation).

2. Let's look at the display command. Type:

    display "The residual sum of squares =" e(rss_2)

Tasks

    display "The residual sum of squares =" e(rss_2), "and the R2 =" e(r2_2)
    display _column(20) "The residual sum of squares =" e(rss_2), _column(50) "and the R2 =" e(r2_2)
    display _column(20) "The residual sum of squares =" e(rss_2), _column(60) "and the R2 =" e(r2_2)
    display _column(20) "The residual sum of squares =" e(rss_2), _column(60) "and the R2 =" _skip(5) e(r2_2)
    display _column(20) "The residual sum of squares =" e(rss_2), _column(60) "and the R2 =" _skip(10) e(r2_2)
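Beyond _column() and _skip(), display also accepts a numeric format in front of each value. The line below is a sketch of that, assuming a reg3 fit is the most recent estimation; the particular widths and decimal places are arbitrary choices.

```stata
* %10.4f prints in a field 10 characters wide with 4 decimals; %6.4f likewise with width 6
display "RSS (eq. 2) = " %10.4f e(rss_2) "   R2 (eq. 2) = " %6.4f e(r2_2)
```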

Tasks

Close the log:

    log close

and have a look at it in Word.

    webuse klein
    rename consump c
    rename capital1 k1
    rename invest i
    rename profits p
    rename govt g
    rename wagegovt wg
    rename taxnetx t
    rename wagepriv wp
    generate x=totinc
    generate w = wg+wp
    generate k = k1+i
    generate yr=year-1931
    generate p1 = p[_n-1]
    generate x1 = x[_n-1]
    reg3 (c p p1 w), 2sls inst(t wg g yr p1 x1 k1)
    reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
