1 / 60

Scatterplots & Regression

Scatterplots & Regression. Week 3 Lecture MG461 Dr. Meredith Rolfe. Key Goals of the Week. What is regression? When is regression used? Formal statement of linear model equation Identify components of linear model Interpret regression results:

jeanne
Download Presentation

Scatterplots & Regression

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe

  2. Key Goals of the Week • What is regression? • When is regression used? • Formal statement of linear model equation • Identify components of linear model • Interpret regression results: • decomposition of variance and goodness of fit • estimated regression coefficients • significance tests for coefficients MG461, Week 3 Seminar

  3. Who has studied regression or linear models before? Taken before Not taken before

  4. Which group are you in? Group 1 Group 2 Group 3 Group 4 Group 5 Group 6 Group 7 Group 8 Which group are you in?

  5. Regression is a set of statistical tools to model the conditional expectation… of one variable on another variable. of one variable on one or more other variables.

  6. Linear Model Background MG461, Week 3 Seminar

  7. Recap: Theoretical System Y X MG461, Week 3 Seminar

  8. What is regression? Regression is the study of relationships between variables It provides a framework for testing models of relationships between variables Regression techniques are used to assess the extent to which the outcome variable of interest, Y, changes dependent on changes in the independent variable(s), X MG461, Week 3 Seminar

  9. Conditional Dependence: Correct Answer vs. Prior Exposure

  10. When to use Regression • We want to know whether the outcome, y, varies depending on x • We can use regression to study correlation (not causation) or make predictions • Continuous variables (but many exceptions) MG461, Week 3 Seminar

  11. Questions we might care about Does an change in X lead to an change in Y Do higher paid employees contribute more to organizational success? Do large companies earn more? Do they have lower tax rates? MG461, Week 3 Seminar

  12. How to answer the questions? Observational (Field) Study Experimental Study Random assignment to different contribution levels or levels of pay (?) ? Random assignment to country and/or tax rate, quasi-experiment? • Collect data on income and a measure of contributions to the organization • Collect data on corporate profits in various regions (which companies, which regions) MG461, Week 3 Seminar

  13. When to use Regression • We want to know whether the outcome, y, varies depending on x • Continuous variables (but many exceptions) • Observational data (mostly) MG461, Week 3 Seminar

  14. Example 1: Pay and Performance Pay Performance Runs Yearly Salary Y X MG461, Week 3 Seminar

  15. Scatterplot: Salaries vs. Runs Y X MG461, Week 3 Seminar

  16. Scatterplot: Salaries vs. Runs Δy Δx MG461, Week 3 Seminar

  17. What is the equation for a line? What is the equation for a line? y=ax2 y=ax y=ax+b y=x+b

  18. Equation of a (Regression) Line Population Parameters Intercept Slope But… x and y are random variables, we need an equation that accounts for noise and signal MG461, Week 3 Seminar

  19. SimpleLinear Model Observation or data point, i, goes from 1…n Error Intercept Coefficient (Slope) Dependent Variable Independent Variable MG461, Week 3 Seminar

  20. The relationship must be LINEAR • The linear model assumes a LINEAR relationship • You can get results even if the relationship is not linear • LOOK at the data! • Check for linearity MG461, Week 3 Seminar

  21. When to use Regression We want to know whether the outcome, y, varies depending on x Continuous variables (but many exceptions) Observational data (mostly) The relationship between x and y is linear MG461, Week 3 Seminar

  22. Understanding the key points What is regression? When is regression used? Formal statement of linear model equation MG461, Week 3 Seminar

  23. Understand what regression is… Strongly Agree Agree Disagree Strongly Disagree Mean = Median =

  24. Understand when to use regression Strongly Agree Agree Disagree Strongly Disagree Mean = Median =

  25. Know how to make formal statement of linear model Strongly Agree Agree Disagree Strongly Disagree Mean = Median =

  26. Discussion What is regression? When is regression used? Formal statement of linear model equation MG461, Week 3 Seminar 26

  27. Model Estimation MG461, Week 3 Seminar

  28. Which model parameter do we NOT need to estimate? β0 β1 xi σ2

  29. Goal: Estimate the Relationship between X and Y ("beta hat one") ("beta hat zero") • We can also estimate the error variance σ2 as Estimate the population parameters β0 and β1 MG461, Week 3 Seminar

  30. What would “good” estimates do? Minimize explained variance Minimize distance to outliers Minimize unexplained variance

  31. Finding the Best Line: Unexplained Variance ei MG461, Week 3 Seminar

  32. Ordinary Least Squares (OLS) Criteria ("beta hat one") ("beta hat zero") Minimize “noise” (unexplained variance) defined as residual sum of squares (RSS) MG461, Week 3 Seminar

  33. OLS Estimates of Beta-hat MG461, Week 3 Seminar

  34. Mean of x and y MG461, Week 3 Seminar

  35. Variance of x MG461, Week 3 Seminar

  36. Variance of y MG461, Week 3 Seminar

  37. Covariance of x and y MG461, Week 3 Seminar

  38. OLS Estimates of Beta-hat Note the similarity between ß1 and the slope of a line: change in y over change in x (rise over run) MG461, Week 3 Seminar

  39. Why squared residuals? Y X Geometric intuition: X and Y are vectors, find the shortest line between them: MG461, Week 3 Seminar

  40. Decomposing the Variance • As in Anova, we now have: • Explained Variation • Unexplained Variation • Total Variation • This decomposition of variance provides one way to think about how well the estimated model fits the data MG461, Week 3 Seminar

  41. Total Squared Residuals (SYY) MG461, Week 3 Seminar

  42. Explained vs. Unexplained Squared Residuals MG461, Week 3 Seminar

  43. R2 and goodness of fit MG461, Week 3 Seminar RSS (residual sum of squares) = unexplained variation TSS (total sum of squares) = SYY, total variation of dependent variable y ESS (explained sum of squares) = explained variation (TSS-RSS)

  44. Examples of high and low R2 Graph 1 Graph 2 MG461, Week 3 Seminar

  45. Which graph had a high R2? Graph 1 Graph 2

  46. Recall that R2= ESS/TSS or 1-(RSS/TSS). What values can R2 take on? Can be any number Any number between -1 and 1 Any number between 0 and 1 Any number between 1 and 100

  47. Examples of high and low R2 R2=0.29 R2=0.87 MG461, Week 3 Seminar

  48. Interpretation of ß-hats ß-hat0 : intercept, value of yi when xi is 0 ß-hat1: average or expected change in yi for every 1 unit change in xi MG461, Week 3 Seminar

  49. Visualization of Coefficients Δy=β1 Δx=1 β0 MG461, Week 3 Seminar

  50. OLS estimates: Pay for Runs Salaryi = Beta-hat0 + Beta-hat1 * Runsi + errori MG461, Week 3 Seminar

More Related