Regression results
In this section you will find an explanation of the Microsoft Excel regression results I provided as a separate file.
Perhaps the first thing you should explore is the column toward the bottom of the printout titled "Coefficients." In our example you see the values –1.75 and 1.75. This means that the regression line is (and remember income = y and years of schooling = x) y = –1.75 + 1.75x (to be consistent with earlier notes I should have a hat over the y term). So –1.75 is the intercept of the line and 1.75 is the slope (it is a coincidence that the two numbers happen to be –1.75 and 1.75). Note that to the left of 1.75 you see the word schooling. That is the variable we think has an influence on income. The t stat in the schooling row is similar to a Z value. The p-value is then the area in the tails above 1.75 and below –1.75 in the sampling distribution of the slope (the slope, not the intercept; remember we have a coincidence of numbers here, and that will not happen often).
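If you want to check these ideas outside of Excel, here is a minimal sketch in Python using scipy. The schooling and income numbers are made up for illustration, so they will not reproduce the exact coefficients in the printout.

```python
# A sketch of the same regression outside Excel, on made-up data.
from scipy import stats

schooling = [8, 10, 12, 12, 14, 16, 16, 18]   # x, years of schooling (hypothetical)
income    = [12, 15, 18, 22, 21, 27, 25, 30]  # y, income in $1000s (hypothetical)

result = stats.linregress(schooling, income)

# result.intercept and result.slope are the two "Coefficients" Excel reports,
# and result.pvalue is the p-value on the slope (testing slope = 0).
print("intercept:", result.intercept)
print("slope:    ", result.slope)
print("p-value:  ", result.pvalue)
```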
A standard hypothesis test is about the slope of the line being zero. This is a test of whether the x variable has an effect on the y variable. A digression: say I have one of two decks of cards and you do not know which I have. The options are that I have a standard deck (spades, hearts, diamonds, and clubs) or a rigged deck with 4 sets of hearts. Say I deal you a royal flush in hearts. The standard hypothesis is that I have a standard deck. But the sample of a royal flush has a low probability given a standard deck. The sample gives evidence that the standard hypothesis is not true, so you would conclude I had the rigged deck. If I get a value for the slope in a regression different from zero, I still want to test whether the true slope is zero. If the p-value is really low, where really low typically means .05 or less, then we reject the idea that the slope is zero and say the x variable really does influence the y variable.
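As a small sketch of that decision rule (this is just the usual two-sided test of slope = 0 at the .05 level written out; the p-value used here is hypothetical):

```python
# Decision rule for the slope test: reject "slope = 0" when the p-value
# is at or below the chosen cutoff (alpha = .05 here).
def x_influences_y(p_value, alpha=0.05):
    return p_value <= alpha

print(x_influences_y(0.03))  # True: conclude the x variable matters
print(x_influences_y(0.40))  # False: cannot conclude the x variable matters
```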
Now, in our example the p-value is .032085572 and this is in the really low range, so we conclude that years of schooling does have an impact on income. Now, the slope = 1.75 is a point estimate of the true population slope, but because of sampling variability we might look to a confidence interval. We have (and I do not know why Excel prints the interval out twice) .282963364 to 3.217036636. So we are 95% sure the true unknown population slope is in this interval. If the interval includes the value zero, we would say the x variable has no influence on the y variable. Clearly, here the x variable has an influence on the y variable.
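Here is a sketch of where a 95% interval like this comes from: the slope plus or minus a t critical value times the slope's standard error. The standard error and degrees of freedom below are placeholders I made up; in the real output you would read the standard error from the schooling row of the coefficient table and use df = n - 2.

```python
# Sketch: 95% confidence interval for the slope = slope +/- t* times (standard error).
from scipy import stats

slope    = 1.75   # from the printout
slope_se = 0.55   # hypothetical standard error, not from the notes
df       = 6      # hypothetical degrees of freedom (n - 2), not from the notes

t_crit = stats.t.ppf(0.975, df)   # two-sided 95% critical value
lower = slope - t_crit * slope_se
upper = slope + t_crit * slope_se
print(lower, upper)

# If the interval does not contain 0, conclude the x variable influences y.
```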
As long as you have seen that x does influence y, you can use the equation to forecast. We have y = -1.75 + 1.75x. So, for x = 20 we forecast y = -1.75 + 1.75(20) = 33.25.
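The same forecast as a tiny sketch in Python (the function name is just for illustration):

```python
# Forecast income from years of schooling using the fitted line.
def forecast_income(years_of_schooling, intercept=-1.75, slope=1.75):
    return intercept + slope * years_of_schooling

print(forecast_income(20))  # 33.25
```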
Toward the top of the output you see the words "Regression Statistics." R Square is of interest to us because it tells us the percentage of the variation in y that is explained by the x variable. In our example we have .827702703, so a little more than 82% of the variation in income is explained by schooling. The closer R Square is to 1, the better the fit of the regression. If you do not have a value of 1, then potentially there are other variables that may also help explain the variation in y, and you would want to include additional x variables in the regression. Just include them in your regression. You will then want to look at two additional pieces of information.
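Here is a sketch of what R Square measures, using the same made-up schooling and income numbers from the earlier sketch: it is 1 minus the unexplained variation divided by the total variation in y.

```python
# Sketch: R Square = 1 - (sum of squared residuals) / (total sum of squares in y).
import numpy as np
from scipy import stats

schooling = np.array([8, 10, 12, 12, 14, 16, 16, 18])   # hypothetical x
income    = np.array([12, 15, 18, 22, 21, 27, 25, 30])  # hypothetical y

result = stats.linregress(schooling, income)
predicted = result.intercept + result.slope * schooling

ss_res = np.sum((income - predicted) ** 2)      # variation the line fails to explain
ss_tot = np.sum((income - income.mean()) ** 2)  # total variation in y
r_square = 1 - ss_res / ss_tot

print(r_square, result.rvalue ** 2)  # the two agree: R Square is r squared
```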
Adjusted r square When you add x variables to a regression analysis it is likely r square will rise, but sometimes the rise does not reflect a real improvement; it is a long story and depends on the mathematics of the situation. Fortunately, adjusted r square is a measure we can look to as a truer reflection of the amount of variation in y that is explained by all the x variables included in the regression. If you have only 1 x variable, the printed adjusted r square is not really needed.
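For reference, here is a sketch of the usual adjusted r square formula, assuming n observations and k x variables (the actual n for the data behind these notes is not given here):

```python
# Sketch: adjusted R Square = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
# where n is the number of observations and k the number of x variables.
def adjusted_r_square(r_square, n, k):
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)

# Hypothetical example: R Square of .83 with 10 observations and 2 x variables.
print(adjusted_r_square(0.83, n=10, k=2))  # about 0.78
```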
F and Significance of F In the middle of the printout and toward the right you see F and Significance of F. You will note that F equals the squared value of the t stat for schooling. This is true when you have one x variable. Plus you will note that Significance of F is equal to the p-value for schooling. So when you have one x variable the Significance of F and the p-value both do the same thing: low values (.05 or less) suggest x does help explain variation in y. If you have more than one x variable in the regression, the Significance of F will not equal the p-value on a t stat (except by mere coincidence). But the Significance of F is used to test the hypothesis that all the x variables, as a package deal, help explain variation in y. Low values suggest the x variables do explain variation in y.
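Here is a sketch of that F and t relationship outside of Excel, using the statsmodels library (my choice of tool, not something the notes require; any regression program would show the same thing). With one x variable, F is the squared t stat on that variable and Significance of F equals its p-value.

```python
# Sketch: with one x variable, F = t^2 and Significance F = the slope's p-value.
import numpy as np
import statsmodels.api as sm

schooling = np.array([8, 10, 12, 12, 14, 16, 16, 18])   # hypothetical x
income    = np.array([12, 15, 18, 22, 21, 27, 25, 30])  # hypothetical y

X = sm.add_constant(schooling)        # add the intercept column
model = sm.OLS(income, X).fit()

print(model.fvalue, model.tvalues[1] ** 2)  # F and the squared t stat: equal
print(model.f_pvalue, model.pvalues[1])     # Significance F and the p-value: equal
```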