
Regression



Presentation Transcript


  1. Regression Econ 240A

  2. Retrospective • Week One: Descriptive statistics; Exploratory Data Analysis • Week Two: Probability; Binomial Distribution • Week Three: Normal Distribution; Interval Estimation, Hypothesis Testing, Decision Theory

  3. Week Four • Bivariate Relationships • Correlation and Analysis of Variance

  4. Outline • A cognitive device to help understand the formulas for estimating the slope and the intercept, as well as the analysis of variance • Table of Analysis of Variance (ANOVA) for regression • F distribution for testing the significance of the regression, i.e. does the independent variable, x, significantly explain the dependent variable, y?

  5. Outline (Cont.) • The Coefficient of Determination, R^2, and the Coefficient of Correlation, r • Estimate of the error variance, s^2 • Hypothesis tests on the slope, b

  6. Part I: A Cognitive Device

  7. A Cognitive Device: The Conceptual Model • (1) y_i = a + b*x_i + e_i • Take expectations, E: • (2) E y_i = a + b*E x_i + E e_i, where we assume • (3) E e_i = 0 • Subtract (2) from (1) to obtain the model in deviations: • (4) [y_i - E y_i] = b*[x_i - E x_i] + e_i • Multiply (4) by [x_i - E x_i] and take expectations:

  8. A Cognitive Device (Cont.) • (5) E{[y_i - E y_i][x_i - E x_i]} = b*E[x_i - E x_i]^2 + E{e_i [x_i - E x_i]}, where we assume • E{e_i [x_i - E x_i]} = 0, i.e. e and x are independent • By definition, (6) cov yx = b*var x, i.e. • (7) b = cov yx/var x • The corresponding empirical estimate, by the method of moments, replaces the population moments with sample moments: estimated b = sum[(y_i - mean y)(x_i - mean x)]/sum[(x_i - mean x)^2]
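A minimal Python sketch of the method-of-moments estimate in (7), b = cov yx/var x. The data values and the function name `slope_intercept` are hypothetical and are not taken from the lecture; the intercept is recovered from the fact that the fitted line passes through the sample means.

```python
# Hypothetical toy data (not from the lecture): y is roughly 2 + 3*x.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.9, 8.2, 10.8, 14.1, 17.0]


def mean(values):
    return sum(values) / len(values)


def slope_intercept(x, y):
    """Method-of-moments estimates: b = cov(y, x) / var(x), a = ybar - b * xbar."""
    xbar, ybar = mean(x), mean(y)
    # The 1/n factors of the sample covariance and variance cancel in the
    # ratio, so plain sums of cross products and squares are enough.
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


a_hat, b_hat = slope_intercept(x, y)
print(f"intercept = {a_hat:.3f}, slope = {b_hat:.3f}")
```

Because only the ratio of the two sample moments matters, the same slope results whether the moments are divided by n or by n-1.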

  9. A Cognitive Device (Cont.) • The empirical counterpart to (2): the fitted line passes through the sample means, mean y = estimated a + estimated b * mean x • Square both sides of (4), and take expectations: • (10) E[y_i - E y_i]^2 = b^2*E[x_i - E x_i]^2 + 2*b*E{e_i*[x_i - E x_i]} + E[e_i]^2 • where E{e_i*[x_i - E x_i]} = 0, i.e. the explanatory variable x and the error e are assumed to be independent, cov ex = 0

  10. A Cognitive Device (Cont.) • From (10), by definition, • (11) var y = b^2*var x + var e. This is the partition of the total variance in y into the variance explained by x, b^2*var x, and the unexplained or error variance, var e. • The empirical counterpart to (11) is that the total sum of squares equals the explained sum of squares plus the unexplained sum of squares: TSS = ESS + USS

  11. A Cognitive Device (Cont.) • From Eq. 7, substitute for b in Eq. 11: • var y = [cov yx]^2/var x + var e • Divide by var y: 1 = [cov yx]^2/[var y*var x] + var e/var y • or 1 = r^2 + var e/var y, where r is the correlation coefficient
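The decomposition on slides 10 and 11 can be checked numerically. The sketch below uses hypothetical toy data (not from the lecture) and sample moments divided by n; it is an illustration of the identities var y = b^2*var x + var e and 1 = r^2 + var e/var y, not the course's own computation.

```python
# Hypothetical toy data (not from the lecture).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.9, 8.2, 10.8, 14.1, 17.0]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Sample moments (dividing by n throughout).
var_x = sum((xi - xbar) ** 2 for xi in x) / n
var_y = sum((yi - ybar) ** 2 for yi in y) / n
cov_yx = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n

# Slope from Eq. 7 and intercept from the line through the sample means.
b = cov_yx / var_x
a = ybar - b * xbar

# Residuals have mean zero, so their variance is the mean squared residual.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
var_e = sum(e ** 2 for e in residuals) / n

r_squared = cov_yx ** 2 / (var_x * var_y)
unexplained_share = var_e / var_y

print(f"var y           = {var_y:.4f}")
print(f"b^2*var x+var e = {b ** 2 * var_x + var_e:.4f}")
print(f"r^2 + var e/var y = {r_squared + unexplained_share:.4f}")
```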

  12. Population Model and Sample Model Side by Side

  13. Conceptual Vs. Fitted Model • Conceptual: (1) y_i = a + b*x_i + e_i • Take expectations, E: (2) E y = a + b*E x + E e_i • (3) where E e_i = 0 • Subtract (2) from (1): (4) [y_i - E y] = b*[x_i - E x] + e_i • Fitted: Minimize [equation image not preserved]

  14. Conceptual Vs. Fitted (Cont.) • Conceptual: Multiply (4) by [x_i - E x] and take expectations, E: • (5) E{[y_i - E y][x_i - E x]} = b*E[x_i - E x]^2 + E{e_i*[x_i - E x]}, where E{e_i*[x_i - E x]} = 0 • (6) cov yx = b*var x • (7) b = cov yx/var x • Fitted: First order condition [equation images not preserved] • Compare (3) & (vi) • From (v) the fitted line goes through the sample means

  15. Conceptual vs. Fitted (Cont.)

  16. Part II: ANOVA in Regression

  17. ANOVA • Testing the significance of the regression, i.e. does x significantly explain y? • F_{1, n-2} = EMS/UMS, the explained mean square divided by the unexplained mean square • Distributed as F with 1 degree of freedom in the numerator and n-2 degrees of freedom in the denominator

  18. Table of Analysis of Variance (ANOVA) • F_{1, n-2} = Explained Mean Square / Error Mean Square
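As an illustration of how the ANOVA quantities fit together, the following Python sketch builds the sums of squares, mean squares, and F = EMS/UMS for a bivariate regression. The data and the function name `anova_table` are hypothetical; the layout of the printed table is an assumption, since the lecture's own table is not in the transcript.

```python
# Hypothetical toy data (not from the lecture).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.9, 8.2, 10.8, 14.1, 17.0]


def anova_table(x, y):
    """Sums of squares, mean squares, and F for a regression of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    fitted = [a + b * xi for xi in x]

    ess = sum((fi - ybar) ** 2 for fi in fitted)           # explained
    uss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained
    tss = sum((yi - ybar) ** 2 for yi in y)                 # total

    ems = ess / 1        # 1 degree of freedom for the single regressor
    ums = uss / (n - 2)  # n - 2 degrees of freedom for the residuals
    return {"n": n, "ESS": ess, "USS": uss, "TSS": tss,
            "EMS": ems, "UMS": ums, "F": ems / ums}


table = anova_table(x, y)
n = table["n"]
print(f"{'Source':<12}{'SS':>12}{'df':>6}{'MS':>12}")
print(f"{'Explained':<12}{table['ESS']:>12.4f}{1:>6}{table['EMS']:>12.4f}")
print(f"{'Unexplained':<12}{table['USS']:>12.4f}{n - 2:>6}{table['UMS']:>12.4f}")
print(f"{'Total':<12}{table['TSS']:>12.4f}{n - 1:>6}")
print(f"F(1, {n - 2}) = {table['F']:.2f}")
```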

  19. Example from Lab Four • Linear Trend Model for UC Budget

  20. Time index, t = 0 for 1968-69, t=1 for 1969-70 etc

  21. Example from Lab Four • Exponential trend model for UC Budget • UCBud(t) =exp[a+b*t+e(t)] • taking the logarithms of both sides • ln UCBud(t) = a + b*t +e(t)
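A sketch of the log transformation in Python, assuming a hypothetical noise-free series that starts at 100 and grows 5% per period (these are not the UC Budget figures). Regressing the logarithm on t recovers the growth rate as the slope, and the exponential of the intercept gives the level at t = 0.

```python
import math

# Hypothetical series (not the UC Budget data): level 100 at t = 0,
# continuous growth of 5% per period, and no error term.
level0 = 100.0
growth = 0.05
t = list(range(10))
budget = [level0 * math.exp(growth * ti) for ti in t]

# Taking logarithms turns the exponential trend into a linear one:
# ln budget(t) = a + b*t
log_budget = [math.log(v) for v in budget]

n = len(t)
tbar = sum(t) / n
lbar = sum(log_budget) / n
b_hat = (sum((ti - tbar) * (li - lbar) for ti, li in zip(t, log_budget))
         / sum((ti - tbar) ** 2 for ti in t))
a_hat = lbar - b_hat * tbar

print(f"estimated growth rate b = {b_hat:.4f}")
print(f"estimated level at t=0, exp(a) = {math.exp(a_hat):.2f}")
```

The slope of the log-linear regression is read as a continuous growth rate per period, which is why the intercept has to be exponentiated before it can be compared with the original series.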

  22. Time index, t = 0 for 1968-69, t=1 for 1969-70 etc Exp(-0.950) = 0.387

  23. Part III: The F Distribution

  24. The F Distribution • The density function of the F distribution: n1 and n2 are the numerator and denominator degrees of freedom.

  25. The F Distribution • This density function generates a rich family of distributions, depending on the values of n1 and n2 • [Figure: F densities for n1 = 5, n2 = 10; n1 = 50, n2 = 10; n1 = 5, n2 = 10; n1 = 5, n2 = 1]
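The density itself is not preserved in this transcript. As a sketch, the Python code below evaluates the standard textbook form of the F density and integrates it numerically with Simpson's rule for n1 = 5 and n2 = 10, one of the cases in the figure. The function names, the integration limits, and the value 3.33 (the commonly tabulated 5% point for 5 and 10 degrees of freedom) are assumptions, not numbers from the slides.

```python
import math


def f_density(x, n1, n2):
    """Standard F density with n1 and n2 degrees of freedom (0 for x <= 0)."""
    if x <= 0:
        return 0.0
    log_const = (math.lgamma((n1 + n2) / 2) - math.lgamma(n1 / 2)
                 - math.lgamma(n2 / 2) + (n1 / 2) * math.log(n1 / n2))
    log_kernel = ((n1 / 2 - 1) * math.log(x)
                  - ((n1 + n2) / 2) * math.log(1 + n1 * x / n2))
    return math.exp(log_const + log_kernel)


def simpson(func, lo, hi, steps):
    """Composite Simpson's rule; steps must be even."""
    h = (hi - lo) / steps
    total = func(lo) + func(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3


def density_5_10(x):
    return f_density(x, 5, 10)


# The density should integrate to (approximately) one.
area = simpson(density_5_10, 0.0, 200.0, 40000)
# Right-hand tail probability beyond the assumed tabulated value 3.33.
tail = 1.0 - simpson(density_5_10, 0.0, 3.33, 2000)

print(f"area under the density = {area:.5f}")
print(f"P(F(5,10) > 3.33)      = {tail:.4f}")
```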

  26. Determining Values of F • The values of the F variable can be found in the F table, Table 6(a) in Appendix B, for a type I error of 5%, or in Excel. • The entries in the table are the values F_A of the F variable for which the right-hand tail probability is A, i.e. P(F_{n1,n2} > F_A) = A.

  27. Time index, t = 0 for 1968-69, t=1 for 1969-70 etc • F_{1, 35} = (n-2)*[R^2/(1 - R^2)] = 35*(0.933/0.067) = 487

  28. [Figure: F density with 1 dof in the numerator and 35 dof in the denominator, marking F_{1,35} = 4.12]

  29. Part IV: The Pearson Coefficient of Correlation, r • The Pearson coefficient of correlation, r, is (13) r = cov yx/{[var x]^(1/2) [var y]^(1/2)} • Estimated counterpart • Comparing (13) to (7) note that (15) r*{[var y]^(1/2)/[var x]^(1/2)} = b

  30. A Cognitive Device (Cont.) • (5) E{[y_i - E y_i][x_i - E x_i]} = b*E[x_i - E x_i]^2 + E{e_i [x_i - E x_i]}, where we assume • E{e_i [x_i - E x_i]} = 0, i.e. e and x are independent • By definition, (6) cov yx = b*var x, i.e. • (7) b = cov yx/var x • The corresponding empirical estimate: estimated b = sum[(y_i - mean y)(x_i - mean x)]/sum[(x_i - mean x)^2]

  31. Part IV (Cont.): The Coefficient of Determination, R^2 • For a bivariate regression of y on a single explanatory variable, x, R^2 = r^2, i.e. the coefficient of determination equals the square of the Pearson coefficient of correlation • Using (14) to square the estimate of r

  32. Part IV (Cont.) • Using (8), (16) can be expressed as • And so • In general, including multivariate regression, the estimate of the coefficient of determination, R^2, can be calculated from (21) R^2 = 1 - USS/TSS.

  33. Part IV (Cont.) • For the bivariate regression, the F-test can be calculated from F_{1, n-2} = [(n-2)/1][ESS/TSS]/[USS/TSS] = [(n-2)/1][ESS/USS] = (n-2)*[R^2/(1 - R^2)] • For a multivariate regression with k explanatory variables, the F-test can be calculated as F_{k, n-k-1} = [(n-k-1)/k][ESS/USS] = [(n-k-1)/k]*[R^2/(1 - R^2)]

  34. Time index, t = 0 for 1968-69, t=1 for 1969-70 etc • R^2 = 1 - 2,018,596/30,113,042 = 0.933
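The arithmetic on this slide and on slide 27 can be reproduced in a few lines of Python. The sketch reads 2,018,596 as USS and 30,113,042 as TSS, following formula (21), and takes n - 2 = 35 from slide 27; the variable names are illustrative.

```python
# Sums of squares as read from the slide, interpreted via R^2 = 1 - USS/TSS.
uss = 2_018_596
tss = 30_113_042
n_minus_2 = 35  # denominator degrees of freedom, from slide 27

r_squared = 1 - uss / tss

# Bivariate F statistic: F = (n-2) * R^2 / (1 - R^2), which is the same
# as (n-2) * ESS / USS because ESS = TSS - USS.
f_stat = n_minus_2 * r_squared / (1 - r_squared)
f_from_ss = n_minus_2 * (tss - uss) / uss

print(f"R^2 = {r_squared:.3f}")
print(f"F(1, 35) = {f_stat:.0f}")
```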

  35. Part V: Estimate of the Error Variance • Var e_i = s^2 • Estimate is the unexplained mean square, UMS • Standard error of the regression is the square root of UMS

  36. Time index, t = 0 for 1968-69, t=1 for 1969-70 etc

  37. Part VI: Hypothesis Tests on the Slope • Hypotheses, H0: b = 0; HA: b > 0 • Test statistic: t = [estimated b - 0]/[standard error of estimated b], with n-2 degrees of freedom • Set probability for the type I error, say 5% • Note: for bivariate regression, the square of the t-statistic for the null that the slope is zero is the F-statistic

  38. t = [81.6 - 0]/3.70 = 22.1 • t^2 = F, i.e. 22.1*22.1 = 488 • t^2 = F, i.e. 22.36*22.36 = 500
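The relation t^2 = F for a bivariate regression can be verified on any data set. The sketch below uses hypothetical toy data (not the lab data) and the standard formula for the standard error of the slope, sqrt(UMS/sum[(x_i - mean x)^2]), which is assumed here since the transcript does not preserve it. The last lines repeat the slide's own arithmetic.

```python
import math

# Hypothetical toy data (not from the lecture).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.9, 8.2, 10.8, 14.1, 17.0]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
a = ybar - b * xbar
fitted = [a + b * xi for xi in x]

ess = sum((fi - ybar) ** 2 for fi in fitted)
uss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ums = uss / (n - 2)

# t statistic for H0: b = 0, using the standard error of the slope.
se_b = math.sqrt(ums / sxx)
t_stat = (b - 0.0) / se_b

# F statistic: explained mean square (ESS / 1) over unexplained mean square.
f_stat = ess / ums

print(f"t = {t_stat:.3f}, t^2 = {t_stat ** 2:.3f}, F = {f_stat:.3f}")

# The slide's own numbers: slope 81.6 with standard error 3.70.
t_slide = (81.6 - 0.0) / 3.70
print(f"slide t = {t_slide:.2f}")
```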

  39. Part VII: Student’s t-Distribution

  40. The Student t Distribution • The Student t density function • n is the parameter of the Student t distribution • E(t) = 0 • V(t) = n/(n - 2) (for n > 2)

  41. The Student t Distribution • [Figure: Student t densities for n = 3 and n = 10]

  42. Determining Student t Values • The Student t distribution is used extensively in statistical inference. • Thus, it is important to determine values of t_A associated with a given number of degrees of freedom. • We can do this using • t tables, Table 4, Appendix B • Excel

  43. Using the t Table • The table provides the t values (t_A) for which P(t_n > t_A) = A • The t distribution is symmetrical around 0 • [Figure: t density with A = .05 in each tail, t_A = 1.812 and -t_A = -1.812] • Table columns: t.100, t.05, t.025, t.01, t.005
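A table value can be checked by integrating the Student t density numerically. The Python sketch below uses the standard form of the density and Simpson's rule. It assumes that 1.812 is the t.05 entry for 10 degrees of freedom (the slide does not state the degrees of freedom), and it also checks the F_{1,35} = 4.12 value from slide 28 through the relation t^2 = F. The function names are illustrative.

```python
import math


def t_density(t, n):
    """Standard Student t density with n degrees of freedom."""
    log_const = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
                 - 0.5 * math.log(n * math.pi))
    return math.exp(log_const - ((n + 1) / 2) * math.log(1 + t * t / n))


def upper_tail(t_value, n, steps=2000):
    """P(t_n > t_value) for t_value >= 0, using symmetry around 0.

    The area from 0 to t_value is computed with composite Simpson's rule
    (steps must be even) and subtracted from one half.
    """
    h = t_value / steps
    total = t_density(0.0, n) + t_density(t_value, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, n)
    return 0.5 - total * h / 3


# Assumption: 1.812 is the t.05 table entry for 10 degrees of freedom.
p_t = upper_tail(1.812, 10)

# t^2 = F with 1 numerator dof, so P(F(1,35) > 4.12) = P(|t_35| > sqrt(4.12)).
p_f = 2 * upper_tail(math.sqrt(4.12), 35)

print(f"P(t_10 > 1.812)  = {p_t:.4f}")
print(f"P(F(1,35) > 4.12) = {p_f:.4f}")
```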

  44. Problem 6.32 in Text • Table of Joint Probabilities

  45. Problem 6.32 • The method of instruction in college and university applied statistics courses is changing. Historically, most courses were taught with an emphasis on manual calculation. The alternative is to employ a computer and a software package to perform the calculations. An analysis of applied statistics courses investigated whether the instructor’s educational background is primarily mathematics (or statistics) or some other field.

  46. Problem 6.32 • A. What is the probability that a randomly selected applied statistics course instructor whose education was in statistics emphasizes manual calculations? • B. What proportion of applied statistics courses employ a computer and software? • C. Are the educational background of the instructor and the way his or her course is taught independent?

  47. Midterm 2000 • (15 points) The following table shows the results of regressing the natural logarithm of California General Fund expenditures, in billions of nominal dollars, against year, beginning in 1968 and ending in 2000. A plot of actual, estimated and residual values follows. • How much of the variance in the dependent variable is explained by trend? • What is the meaning of the F statistic in the table? Is it significant? • Interpret the estimated slope. • If General Fund expenditures were $68.819 billion in California for fiscal year 2000-2001, provide a point estimate for state expenditures for 2001-2002.
