
Review Session

Presentation Transcript


  1. Review Session: Linear Regression

  2. Correlation • Pearson's r • Measures the strength and direction of the linear relationship between the x and y variables • Ranges from -1 to +1

  3. Correlation printout in Minitab • Top number is the correlation • Bottom number is the p-value

  4. Simple Linear Regression y = b0 + b1x1 + e

  5. Simple Linear Regression: Making a Point Prediction
     y = b0 + b1x1 + e
     GPA = 1.47 + 0.00323(GMAT)
     For a person with a GMAT score of 400, what is the expected 1st-year GPA?
     GPA = 1.47 + 0.00323(400) = 1.47 + 1.292 = 2.76
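
A quick way to check this arithmetic is to plug the slide's fitted coefficients into the equation directly. A minimal sketch in Python (the coefficients are the ones from the slide; the language is illustrative, not the course's Minitab):

```python
# Point prediction from the fitted equation (coefficients taken from the slide)
b0, b1 = 1.47, 0.00323   # intercept and GMAT slope
gmat = 400
gpa = b0 + b1 * gmat     # 1.47 + 1.292
print(round(gpa, 2))     # 2.76
```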

  6. Simple Linear Regression
     y = b0 + b1x1 + e
     GPA = 1.47 + 0.00323(GMAT)
     What's the 95% CI for the GPA of a person with a GMAT score of 400?
     GPA = 2.76, SE = 0.26
     2.76 +/- 2(0.26), so the 95% CI = (2.24, 3.28)
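
The same back-of-the-envelope interval can be computed directly, using the slide's "estimate +/- 2·SE" rule of thumb (a rough sketch, not exact t-based limits):

```python
# Rough 95% CI for the prediction: estimate +/- 2 * SE (values from the slide)
gpa_hat, se = 2.76, 0.26
lower, upper = gpa_hat - 2 * se, gpa_hat + 2 * se
print((round(lower, 2), round(upper, 2)))   # (2.24, 3.28)
```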

  7. Coefficient CIs and Testing
     y = b0 + b1x1 + e
     GPA = 1.47 + 0.00323(GMAT)
     Find the 95% CI for the coefficients:
     b0 = 1.47 +/- 2(0.22) = 1.47 +/- 0.44 = (1.03, 1.91)
     b1 = 0.0032 +/- 2(0.0004) = 0.0032 +/- 0.0008 = (0.0026, 0.0040)

  8. Coefficient Testing
     y = b0 + b1x1 + e
     GPA = 1.47 + 0.00323(GMAT)
     The p-value for each coefficient is the result of a hypothesis test: H0: b = 0 vs. H1: b ≠ 0.
     If the p-value <= 0.05, reject H0 and conclude the coefficient is statistically significant (keep the variable in the model).
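
In practice these CIs and p-values come straight off the regression output. A minimal sketch with statsmodels on synthetic GMAT-style data (the library and the data are illustrative; the course output is from Minitab):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
gmat = rng.uniform(300, 700, 50)                        # synthetic GMAT scores
gpa = 1.47 + 0.00323 * gmat + rng.normal(scale=0.3, size=50)

fit = sm.OLS(gpa, sm.add_constant(gmat)).fit()
print(fit.conf_int(alpha=0.05))   # 95% CIs for b0 and b1
print(fit.pvalues)                # reject H0: b = 0 wherever p <= 0.05
```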

  9. R² • r² and R² • Square of Pearson's r • Little r² is for simple regression • Big R² is used for multiple regression

  10. Sample R² values [Four scatter plots showing fits with R² = 0.80, 0.60, 0.30, and 0.20]

  11. Regression ANOVA • H0: b1 = b2 = … = bk = 0 • Ha: at least one b ≠ 0 • The F-statistic, with df1 and df2 degrees of freedom, gives the p-value • If p <= 0.05, at least one of the b's is not zero • If p > 0.05, it's possible that all of the b's are zero
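
A sketch of reading the overall F-test off a fitted model (synthetic data; any statistics package reports the same F-statistic and p-value):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                     # three candidate predictors
y = 1.0 + 0.5 * X[:, 0] + rng.normal(size=50)    # only the first one matters

fit = sm.OLS(y, sm.add_constant(X)).fit()
print(fit.fvalue, fit.f_pvalue)   # overall F-statistic and its p-value
# p <= 0.05: at least one coefficient is nonzero; p > 0.05: possibly all are zero
```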

  12. Diagnostics - Residuals • Residuals = errors • Residuals should be normally distributed • Residuals should have a constant variance • Heteroscedasticity: the residual variance changes with the fitted values or with an independent variable (error magnitude grows or shrinks) • Autocorrelation: successive residuals are correlated with one another (common in time-series data) • Heteroscedasticity and autocorrelation indicate problems with the model • Homoscedasticity: constant variance, no pattern in the residual distribution • Use the 4-in-one plot for these diagnostics (a sketch follows below)
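
Minitab's 4-in-one plot can be approximated by hand. A sketch of the four panels with matplotlib and scipy on synthetic data:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 60)
y = 2 + 0.5 * x + rng.normal(size=60)
fit = sm.OLS(y, sm.add_constant(x)).fit()
resid, fitted = fit.resid, fit.fittedvalues

fig, ax = plt.subplots(2, 2, figsize=(8, 6))
stats.probplot(resid, plot=ax[0, 0])   # normal probability plot: should be a straight line
ax[0, 1].scatter(fitted, resid)        # vs fits: a fan shape suggests heteroscedasticity
ax[1, 0].hist(resid, bins=10)          # histogram: roughly bell-shaped?
ax[1, 1].plot(resid, marker="o")       # vs observation order: runs suggest autocorrelation
plt.tight_layout()
plt.show()
```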

  13. Adding a Power Transformation • Each “bump” or “U” shape in a scatter plot indicates that an additional power may be involved. • 0 bumps: x • 1 bump: x² • 2 bumps: x³ • Standard equation: y = b0 + b1x + b2x² • Don’t forget: check that b1 and b2 are statistically significant, and that the model as a whole is also statistically significant (see the sketch below).
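
A sketch of fitting the quadratic form on synthetic single-bump data, checking the coefficient p-values as the slide advises:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 80)
y = 1 + 2 * x - 0.5 * x**2 + rng.normal(size=80)   # one "bump"

X = sm.add_constant(np.column_stack([x, x**2]))    # y = b0 + b1*x + b2*x^2
fit = sm.OLS(y, X).fit()
print(fit.params)    # estimates of b0, b1, b2
print(fit.pvalues)   # b1 and b2 should both be significant here
```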

  14. Categorical Variables • Occasionally it is necessary to add a categorical variable to a regression model. • Suppose we have a car dealership and want to model the sale price based on the time on the lot and the salesperson (Tom, Dick, or Harry). • Time on the lot is a continuous (numeric) variable. • Salesperson is a categorical variable.

  15. Categorical Variables • Categorical variables are modeled in regression using 0/1 indicator (dummy) variables • Example: y = b0 + b_time·x_time + b_Tom·x_Tom + b_Dick·x_Dick

  16. Categorical Variables Harry is the baseline category for the model. Tom's and Dick's performance will be gauged relative to Harry, but not to each other. Example: y = b0 + b_time·x_time + b_Tom·x_Tom + b_Dick·x_Dick

  17. Categorical Variables y = b0 + b_time·x_time + b_Tom·x_Tom + b_Dick·x_Dick • Interpretation • Tom's average sale price is b_Tom more than Harry's • Dick's average sale price is b_Dick more than Harry's
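
A sketch of the same dummy-variable setup in Python; the data, and the choice of Harry as the reference level, are illustrative:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "price":  [21.5, 19.8, 22.1, 20.4, 23.0, 18.9, 24.2, 20.1],
    "time":   [10, 25, 8, 30, 5, 40, 3, 22],   # days on the lot
    "seller": ["Tom", "Dick", "Harry", "Tom", "Harry", "Dick", "Harry", "Tom"],
})

# Treatment coding with Harry as the baseline: x_Tom and x_Dick become 0/1 indicators
fit = smf.ols("price ~ time + C(seller, Treatment(reference='Harry'))", data=df).fit()
print(fit.params)   # the Tom and Dick coefficients are differences from Harry's average
```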

  18. Multicollinearity • Multicollinearity: predictor variables are correlated with each other. • Multicollinearity results in instability in the estimation of the b's • P-values will be larger • Confidence in the b's decreases or disappears (magnitude and sign may differ from the expected values) • A small change in the data results in large variations in the coefficients • Read 11.11

  19. VIF - Variance Inflation Factor • Measures the degree to which confidence in the estimate of a coefficient is decreased by multicollinearity. • The larger the VIF, the greater the multicollinearity problem. • If VIF > 10, there may be a problem • If VIF >= 15, there may be a serious problem
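
A sketch of computing VIFs with statsmodels on synthetic data where two predictors are nearly collinear:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)   # nearly a copy of x1
x3 = rng.normal(size=100)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i in range(1, X.shape[1]):              # skip the constant column
    print(f"VIF x{i}: {variance_inflation_factor(X, i):.1f}")
# x1 and x2 will show very large VIFs; x3 should be near 1
```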

  20. Model Selection • Start with everything. • Delete variables with high VIF factors one at a time. • Delete variables one at a time, deleting the one with the largest p-value. • Stop when all p-values are less than 0.05.
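
A minimal sketch of the p-value-driven elimination loop, assuming a pandas DataFrame `df` holding the response and candidate predictors that have already been screened for high VIFs (the names here are illustrative):

```python
import statsmodels.api as sm

def backward_eliminate(df, response, predictors, alpha=0.05):
    """Repeatedly drop the predictor with the largest p-value until all are significant."""
    preds = list(predictors)
    while preds:
        X = sm.add_constant(df[preds])
        fit = sm.OLS(df[response], X).fit()
        pvals = fit.pvalues.drop("const")   # ignore the intercept's p-value
        worst = pvals.idxmax()
        if pvals[worst] <= alpha:           # everything significant: done
            return fit
        preds.remove(worst)                 # drop the weakest predictor and refit
    return None                             # no predictor survived
```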

  21. Demand Price Curve
      The demand-price function is nonlinear: D = b0·P^b1
      A log transformation makes it linear: ln(D) = ln(b0) + b1·ln(P)
      Run the regression on the transformed variables.
      Plug the coefficients into the equation below: D = e^b0·P^b1
      Make your projections with this last equation.

  22. Demand Price Curve • Create a variable for the natural log of demand and the natural log of each independent variable. • In Excel: =LN(demand), =LN(price), =LN(income), etc. • Run the regression on the transformed variables. • Place the coefficients in the equation: d = e^constant·p^b1·i^b2 • Simplify to d = k·p^b1·i^b2 (note that e^constant = k) • If income is not included, then the equation is just d = k·p^b1 • Recap: the demand-price function d = k·p^b1 is nonlinear; the log transformation ln(d) = b0 + b1·ln(p) makes it linear (a sketch of the full workflow follows)
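
A sketch of the whole log-log workflow on synthetic price/demand data, recovering k by back-transforming the fitted intercept:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
price = rng.uniform(1, 20, 60)
demand = 500 * price**-1.2 * np.exp(rng.normal(scale=0.1, size=60))  # true k = 500, b1 = -1.2

fit = sm.OLS(np.log(demand), sm.add_constant(np.log(price))).fit()
b0, b1 = fit.params
k = np.exp(b0)          # back-transform the intercept: k = e^b0
print(k, b1)            # estimates of k and the price exponent b1
pred = k * 10.0**b1     # projected demand at price = 10, on the original scale
print(pred)
```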
