1 / 10

More Regression Information

More Regression Information. On the previous slide I have an Excel regression output. The example is the pizza sales we saw before. The first thing I look at is the coefficients. See cell b28 has the word coefficient. We take the information below and write the equation as

brita
Download Presentation

More Regression Information

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. More Regression Information

  2. On the previous slide I have an Excel regression output. The example is the pizza sales we saw before. The first thing I look at is the coefficients. See cell b28 has the word coefficient. We take the information below and write the equation as y hat = 60 + 5x. This is the estimated regression equation. The intercept is 60 and the slope is 5. Remember x = population of students and y = sales.

  3. Hypothesis test about the population slope B1. Remember we have taken a sample of data. In this context we have taken a sample and estimated the unknown population regression. Our real point in a study like this is to see if a relationship exists between the two variables in the population. If the slope is not zero in the population, then the x variable has an influence on the outcome of y. Now, in a sample, the estimated slope may or may not be zero. But the sample provides a basis for a test of the true unknown population slope being zero. For the test we will use the t distribution.

  4. The t-distribution At this stage of the game I am going to have you accept some of the following without much proof. The t-distribution is like the normal except for two notable features. 1) t-distributions tend to be wider (show more variability) than z distributions. 2) the t-distribution does not have one standard like the normal distribution. Each t-distribution is unique, based on its degrees of freedom. Admittedly, degrees of freedom is a term without much meaning to you, but in the context of simple regression equals the sample size minus 2.

  5. Many books have t-tables. Or you could do a Google search. Go to the upper tail area being .025. If you run down the column with your finger you will notice at the bottom the number 1.96. So, when the degrees of freedom is really large, the t is like the z. But, with other degrees of freedom on the t-distribution, you have to go out farther than 1.96 to get to .025 in the upper tail. This is what I mean be t-distributions being wider. The t-values in this table are critical values for tests of hypotheses. Back to our hypothesis test about the slope. The null hypothesis is that B1 = 0, and the alternative is that B1 is not equal to zero. Since the alternative is not equal to zero we have a two-tailed test. Our example has a sample size of 10, so the degrees of freedom is 8. A level of significance of .05 means we want .025 on each side for a two tail test. From t-table the critical t is 2.306.

  6. Back on the computer output we see the calculated t in cell d30. The t stat from the sample is the slope divided by the standard error. Notice the t is 8.6167. Since this is bigger than the critical t we reject the null and conclude the slope is not zero in the population. Thus in the population of all company stores, sales are influenced by populations of students in the college towns. Excel prints the p-value for the test. For the slope we have 2.55E-05. E notation of the form E-05 means move the decimal in the number 5 places to the left. So our p-value is 0.0000255. This is a two-tailed p-value. Since this is less than .05, it is an alternative way to reject the null hypothesis. This method can be used without looking at the t-table.

  7. In cells f30 and g30 you have the 95% confidence interval for the slope. The interval is (3.6619, 6.3381). So you can be 95% confident the true unknown population slope is in this interval. A few slides back I wrote ,” From the t-table the critical t is 2.306.” The margin of error in the confidence interval is the critical value times the standard error: (2.306 ).5803 = 1.3381 for an interval for the slope 5 – and + 1.3381. In cell b17 you see the R square value of 0.9027. Sometimes this is called r2, and its real name is the coefficient of determination. The coefficient of determination is a statistic used to see how well the data points “hug” the regression line. The value can be anywhere from 0 to 1. If all the data points actually touch the line then R square would be 1. If the value is 0 the points are not close to the line at all.

  8. The square root of the coefficient of determination is the correlation coefficient ( called r). Remember the correlation coefficient was an indicator of the direction and strength of the relationship between two variables. The correlation coefficient could be anywhere from minus 1 to 1. Negative values meant a negative relationship and positive values meant a positive relationship. There we said the closer to 1 or minus 1 the stronger the relationship. If R square = 1, r = 1 and the relationship is as strong as you can get. If R square = .9, r = .94 and you still have a pretty strong relationship. If R square = .5, r = .71 and you would still be in the strong relationship neighborhood.

  9. Well, in this section I have tried to go over some of the basic regression ideas. The point again is that we are studying two variables together and trying to establish if the two variables are related or not. Why should we care if two variables are related? As a person in business it might help the bottom line. As another example, say it can be established that the size of the advertising budget has an impact on sales. This could help us determine the right size budget. I have a claim that one day I will try to back up by using regression. I claim that recycling of paper makes states in the country have less trees. Each state probably recycles a different number of pounds of paper and has a certain amount of tree population growth or destruction. With tree population as the dependent variable, I would expect the slope coefficient on pounds of paper recycled to be negative. In other words, the more recycling, the less trees. (ITS an econ story, but anyway.) Regression can be used in social policy analysis. Anyway, that’s all for now.

More Related