
EC339: Lecture 6



  1. EC339: Lecture 6 Chapter 5: Interpreting OLS Regression

  2. Regression as Double Compression • DoubleCompression.xls • EastNorthCentralFTWorkers.xls • [DoubleCompression]SATData • Math and Verbal scores • Note the summary statistics and the correlation • Conditional Mean E[Y|X] • Start with a Scatterplot

  3. First Compression • Examine values of Y, over various small ranges of X • Recognize the ‘variance’ of Y in these strips • Create a conditional mean function • Average values of Y, given X (VerticalStrips) • Slide bar along X-axis • Move to Accordian • Examine 0, 2, 4, and Many intervals

  4. First Compression • Before Compressing: What is the best guess at Math SAT, unconditional on X? • After Compressing: What is the best guess at Math SAT, conditional on X? • Individual variation is hidden in the “Graph of Averages” • Graphical equivalent of a PivotTable (See AccordianPivot) • 500+ observations summarized with 35
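
A minimal sketch of this first compression in Python, assuming the [DoubleCompression]SATData sheet has been exported to a CSV file named sat_data.csv with columns "verbal" and "math" (hypothetical names):

```python
# "First compression": conditional means of Math SAT within vertical strips of
# Verbal SAT, i.e., an estimate of E[math | verbal] -- the graph of averages.
import pandas as pd

sat = pd.read_csv("sat_data.csv")

# Cut the Verbal axis into 35 strips and average Math SAT within each strip.
sat["strip"] = pd.cut(sat["verbal"], bins=35)
graph_of_averages = sat.groupby("strip", observed=True)["math"].agg(["mean", "count"])
print(graph_of_averages)
```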

  5. Second Compression • Linearizing the graph of averages • Regression line w/ observations & averages • Smooth linear version of the plot of averages • Predicted Math SAT = 318.9 + 0.539 Verbal SAT • This equation gives SIMILAR results to the previous compression • Now summarized with two numbers • Interpret as predicted Y given X
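
A minimal sketch of the second compression under the same assumptions as above (sat_data.csv with hypothetical columns "verbal" and "math"):

```python
# "Second compression": replace the graph of averages with a single line fit by OLS.
import numpy as np
import pandas as pd

sat = pd.read_csv("sat_data.csv")
slope, intercept = np.polyfit(sat["verbal"], sat["math"], deg=1)
print(f"Predicted Math SAT = {intercept:.1f} + {slope:.3f} * Verbal SAT")
# With the lecture's data this should come out close to 318.9 + 0.539 * Verbal SAT.
```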

  6. Only an interpretation not a method • “Double Compression” of the data is an interpretation of regression, not how it is done • Method is either analytical or numerical (or using an algorithm) • In [EastNorthCentralFTWorkers]Regression • Given education, estimate annual earnings • PivotTable: E[earn|edu=12] = 27,933 • Regression: E[earn|edu=12] = 28,399 What else might be going on here???

  7. Regression and SD Line • [DoubleCompression]SATData • Examine two lines • SD Line: y = 94.6 + 0.978x • Reg Line: y = 318.94 + 0.539x • Notice the slope of the regression line is shallower • Remember the equation for the slope in SIMPLE regression (see below) • Notice the poor prediction with the SD line • Calculate residuals
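
The standard textbook formulas behind these two lines: the SD line uses the ratio of standard deviations, while the simple-regression slope scales that ratio by the correlation, so it can never be steeper.

$$
\text{SD line slope} = \operatorname{sign}(r)\,\frac{s_y}{s_x},
\qquad
b_1 = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2} = r\,\frac{s_y}{s_x}.
$$

Since |r| ≤ 1, the regression line is always at least as shallow as the SD line.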

  8. Another Example (SD vs. OLS) • Go back to Reg.xls and calculate the SSR from both lines (SDx = 4.18, SDy = 50.56, xbar = 7.5, ybar = 145.6) • OLS Line: y = 88.956 + 7.558x • Calculate the SD Line (worked out below) • Compare the SSR from the SD and OLS lines • What does the point of averages mean here?
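
One way to carry out the SD-line calculation the slide asks for, using only the summary statistics given above (the SSR comparison still has to come from the data in Reg.xls):

$$
m_{SD} = \frac{s_y}{s_x} = \frac{50.56}{4.18} \approx 12.10,
\qquad
b_{SD} = \bar{y} - m_{SD}\,\bar{x} = 145.6 - 12.10(7.5) \approx 54.9,
$$

so the SD line is roughly y = 54.9 + 12.10x, and comparing slopes gives an implied correlation of r ≈ 7.558 / 12.10 ≈ 0.62.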

  9. OLS Regression vs. SD • Simple Linear Regression (SLR) • Slope is the SD line slope × the correlation • Its slope must therefore be smaller in absolute value than the SD line’s

  10. Two Regression Lines • Open TwoRegressionLines.xls • In the PivotTables, what do you notice? • What happens in the table TwoLines? • Do the equations change when you switch axes? • Compare with SDLine • How do you phrase the different regression lines? • Do these lines have different meanings? • Can you just solve one regression line to find the other? • “Someone who is 89 points above the mean in Verbal (1 sd) is predicted to be 0.55 x 87 (r x SDmath) or 48 points above the mean Math score” (hence the name: scores “regress” toward the mean!)

  11. Two Regression Lines • Given a verbal score, what is the best guess of a person’s math score? • Predicted Math SAT = 318 + 0.54 Verbal SAT • If Verbal = 600, Predicted Math SAT = 642 • Given a math score, what is the best guess of a person’s verbal score? • Can we just solve the equation above for verbal? • Predicted Verbal SAT = -589 + 1.85 Math SAT • NO!!! This is not correct! • Must regress verbal on math (verbal is the predicted variable) • Predicted Verbal SAT = 176.1 + 0.564 Math SAT • If Math SAT = 642, you would predict Verbal SAT = 538
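
A minimal sketch of the two-regression-lines point, under the same CSV assumption as above: the two fitted slopes are not reciprocals of each other (their product is r², not 1).

```python
# Regressing math on verbal and verbal on math give different lines, and
# neither is the algebraic inverse of the other.
import numpy as np
import pandas as pd

sat = pd.read_csv("sat_data.csv")
b_mv, a_mv = np.polyfit(sat["verbal"], sat["math"], deg=1)   # math on verbal
b_vm, a_vm = np.polyfit(sat["math"], sat["verbal"], deg=1)   # verbal on math

print(f"Predicted Math   = {a_mv:.1f} + {b_mv:.3f} * Verbal")
print(f"Predicted Verbal = {a_vm:.1f} + {b_vm:.3f} * Math")
# Inverting the first line would give slope 1/b_mv, which is NOT b_vm:
# each slope is r times the SD of the predicted variable over the SD of the
# predictor, so the product of the two slopes is r**2, not 1.
```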

  12. Properties of Sample Average and Regression Line • Examine OLSFormula.xls, SampleAveIsOLS • Sample average is a weighted sum • Here, the weights sum to 1 • Sample average is also the least squares estimate of central tendency • The SSR around the average is never greater than the SSR around the median • Examine Excel’s “Auditing Tool”

  13. Mean minimizes SSR (the mean is the OLS estimate) • Run Solver, minimizing SSR by changing “Solver Estimate”, starting at 100 • Note the sum of the residuals (F16) • Try “Draw another dataset” in the Live sheet • What happens to the sum of the residuals? • Try using the median
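
A minimal numerical sketch of what the Solver exercise demonstrates, using made-up data and scipy in place of Solver:

```python
# The sample mean is the value that minimizes the sum of squared residuals,
# and at the minimum the residuals sum to zero. Data values are illustrative.
import numpy as np
from scipy.optimize import minimize_scalar

y = np.array([3.0, 7.0, 8.0, 12.0, 20.0])

def ssr(estimate):
    return np.sum((y - estimate) ** 2)

result = minimize_scalar(ssr)
print(result.x, y.mean())                 # numerical minimizer matches the mean
print(np.sum(y - result.x))               # residuals sum to ~0
print(ssr(y.mean()), ssr(np.median(y)))   # SSR at the mean <= SSR at the median
```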

  14. [OLSFormula.xls]Example • Recall the “weighted sum” calculation of the slope coefficient • Regression goes through the point of averages • Slope is a weighted sum of the Y’s • Weights are bigger in absolute value the farther the x-value is from the average value of x • Weights sum to zero • A change in a Y value has a predictable effect on the OLS slope and intercept
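
A minimal sketch of the weighted-sum formula for the slope, with made-up data:

```python
# OLS slope as a weighted sum of the y's:
#   b1 = sum(w_i * y_i), where w_i = (x_i - xbar) / sum((x_j - xbar)**2).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 7.0, 8.0, 12.0])

w = (x - x.mean()) / np.sum((x - x.mean()) ** 2)
print(np.sum(w))                 # the weights sum to zero
b1 = np.sum(w * y)               # slope: weighted sum of the y's
b0 = y.mean() - b1 * x.mean()    # line passes through the point of averages
print(b1, b0, np.polyfit(x, y, deg=1))   # matches the standard OLS fit
```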

  15. Residuals and Root Mean Squared Error (RMSE) (ResidualPlot.xls) • Residual plots are “diagnostic tools” • There should be no discernible pattern in the plot • Residual plots can be done in Excel and SPSS • Try using the LINEST method here to calculate the residuals: first find the equation, then calculate the predicted values, and then calculate the residuals • Remember (Residual = Actual Y – Predicted Y) • Now square the residuals, find the average, and take the square root… • Root (of the) Mean (of the) Squared Errors • Measures your average mistake • Examine a scatterplot and histogram of the residuals
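
A minimal sketch of this recipe under the same CSV assumption as above (np.polyfit stands in for LINEST):

```python
# Fit the line, compute predicted values and residuals, then take the square
# root of the mean squared residual.
import numpy as np
import pandas as pd

sat = pd.read_csv("sat_data.csv")
slope, intercept = np.polyfit(sat["verbal"], sat["math"], deg=1)

predicted = intercept + slope * sat["verbal"]
residuals = sat["math"] - predicted          # Residual = Actual Y - Predicted Y
rmse = np.sqrt(np.mean(residuals ** 2))      # root of the mean of the squared errors
print(rmse)
# A residual plot (residuals vs. verbal) should show no discernible pattern.
```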

  16. RMSE is “like” the standard deviation of the residuals (but slightly different, see RMSE.xls for true difference) • General measure of dispersion • Also known as “Standard Error of the Regression”
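
One common convention for the “slight difference” (RMSE.xls spells out the exact version the workbook uses): RMSE divides the sum of squared residuals by n, while the sample standard deviation of the residuals divides by n - 1 (and the standard error of the regression reported by most software divides by n - 2 in simple regression). Since OLS residuals average to zero, the divisor is the only difference.

$$
\text{RMSE} = \sqrt{\frac{1}{n}\sum_i e_i^2},
\qquad
s_e = \sqrt{\frac{1}{n-1}\sum_i (e_i - \bar{e})^2},
\quad \bar{e} = 0.
$$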

  17. RMSE.xls • For many data sets, 68% of the observations fall within +/- 1 RMSE and 95% fall within +/- 2 RMSEs. When the RMSE is working as advertised, it should reflect these facts • Try changing spread in Computation • Examine Histograms in SATData and Accordian to understand that RMSE is the spread of the residuals • Pictures sheet shows residual plots and regressions

  18. RSquared.xls • Play the game to convey the idea of the improvement in prediction. • R2 measures the percentage improvement in prediction over just guessing average Y. • R2 ranges from 0 to 1. • R2 is a dangerous statistic because it is sometimes mistaken for a measure of the quality of a regression. Notice that Excel’s Trendline offers R2 (and only this statistic) as an option. There is no single statistic that can be used to decide if a particular regression equation is good or bad.

  19. R2 Calculation • Total Sum of Squares (TSS or SST) • Sum of Squared Residuals (SSR) • Sum of Squares Explained (SSE)

  20. Ordinary Least Squares Fit • Note: R2 is the RATIO of the variation in y that is EXPLAINED by the regression!
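
The standard sums of squares behind this ratio, using the slide deck’s labels (SSR is the sum of squared residuals, SSE the explained sum of squares):

$$
\text{TSS} = \sum_i (y_i-\bar{y})^2, \quad
\text{SSE} = \sum_i (\hat{y}_i-\bar{y})^2, \quad
\text{SSR} = \sum_i (y_i-\hat{y}_i)^2,
$$
$$
\text{TSS} = \text{SSE} + \text{SSR},
\qquad
R^2 = \frac{\text{SSE}}{\text{TSS}} = 1 - \frac{\text{SSR}}{\text{TSS}}.
$$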

  21. Infant Mortality IMRGDPReg.xls • Start by moving the data into SPSS • Plot the data: what is the relationship? • Save the residuals and plot them against the independent variable • What does this plot tell us?

  22. SameRegLineDifferentData.xls

  23. Real Data: HourlyEarnings.xls • Residuals: Why are data shown in “strips”? • Regress & Save residuals

  24. Regression Interpretation: Summary • Simplified Conditional Mean (more on this) • Intercept and Slope coefficient • Need theory to guide causation • BH call this “Two Regression Lines” • The OLS slope and intercept are weighted sums of the data • RMSE and R2 are helpful
