
Basics of Regression Analysis


Presentation Transcript


  1. Basics of Regression Analysis

  2. Three Functions of Regression • Estimation of the effect of each factor • Explanation of the variability • Forecasting

  3. Two Predictor Variables • Population Regression Model: Y = β0 + β1X1 + β2X2 + ε, with ε following N(0, σ) • Unknown parameters: β0, β1, β2, σ

  4. From Data to Estimates of Coefficients • Principle: Least Squares • Normal Equation System • Estimates of Coefficients • Mathematics • Computing Algorithm

  5. Least Squares Method • Simple Regression • Multiple Regression

  6. Matrix Computation for b • Normal Equation System: (XᵀX)b = XᵀY • See Text Appendix D.3 • Solution for b: b = (XᵀX)⁻¹(XᵀY)
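
The normal-equation solution can be sketched in plain Python. The data values here are hypothetical, chosen so that Y = 1 + 2·X1 + 3·X2 holds exactly; the solver is a small Gaussian elimination standing in for the matrix inverse:

```python
def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

# Hypothetical data; the first column of X is the intercept term.
X = [[1, 0, 1], [1, 1, 0], [1, 2, 1], [1, 3, 2]]
Y = [4, 3, 8, 13]

# Normal equation system: (X^T X) b = X^T Y
XtX = [[sum(row[r] * row[c] for row in X) for c in range(3)] for r in range(3)]
XtY = [sum(row[r] * y for row, y in zip(X, Y)) for r in range(3)]
b = solve(XtX, XtY)   # least-squares estimates b0, b1, b2
```

In practice a library solver (or a QR decomposition) replaces the explicit inverse, but the normal equations are the same.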

  7. Standardized Regression Coefficients • Definition: βk* = bk(sXk / sY) for k = 1, 2 • The standardized intercept is 0 • Called the beta coefficients • Used to show relative weights of predictors
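
As a sketch, the beta coefficients can be computed from the raw slopes and the sample standard deviations. The slopes b1 = 2, b2 = 3 and the data values below are hypothetical:

```python
from statistics import stdev  # sample standard deviation (n - 1 denominator)

# Hypothetical data and raw (unstandardized) slope estimates
X1, X2 = [0, 1, 2, 3], [1, 0, 1, 2]
Y = [4, 3, 8, 13]
b1, b2 = 2.0, 3.0

# beta_k = b_k * s_Xk / s_Y for k = 1, 2; the standardized intercept is 0
beta1 = b1 * stdev(X1) / stdev(Y)
beta2 = b2 * stdev(X2) / stdev(Y)
# Here beta1 > beta2, so X1 carries the larger relative weight
# even though its raw slope is smaller.
```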

  8. Estimation of σ: se, the Standard Deviation of the Disturbance e • Forecasting Equation: Ŷ = b0 + b1X1 + b2X2 • SS of Residuals: SSE = Σ(Yi − Ŷi)², summed over i = 1, …, n • Mean SS: MSE = se² = SSE / (n − 3)
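
A minimal sketch of these two formulas, using hypothetical observed and fitted values with n = 5 and two predictors (so DF = n − 3 = 2):

```python
import math

y    = [3.0, 5.0, 4.0, 8.0, 10.0]   # observed Y (hypothetical)
yhat = [2.5, 5.5, 4.5, 7.5, 10.0]   # fitted values from the forecasting equation
n, df = len(y), len(y) - 3          # two predictors -> n - 3 degrees of freedom

sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))  # SS of residuals
mse = sse / df                                        # MSE = se^2
se  = math.sqrt(mse)                                  # estimate of sigma
```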

  9. Standard Error of Coefficients • The variance matrix of the (K+1)×1 vector b is Var(b) = se²(XᵀX)⁻¹ • The square roots of its diagonal entries are the standard errors of the coefficients

  10. The Variability Explained • First, determine the base variability for explanation by the regression • Unconditional mean model: Y = μY + ε, with ε following N(0, σY) • LS fit of the model: Pred_Y = Ȳ • SS of Residuals: SST = Σ(Yi − Ȳ)² • Mean SS (DF = n − 1): SST / (n − 1) = sY²

  11. The Variability Explained – cont. • Second, subtract the variability still left unexplained • In SS: SSR = SST − SSE • In variance (proportion explained): R² = SSR / SST = 1 − SSE / SST

  12. Creating ANOVA Table
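
The table can be sketched from observed and fitted values alone; the numbers below are hypothetical, with two predictors so the DF split is 2 and n − 3:

```python
y    = [3.0, 5.0, 4.0, 8.0, 10.0]   # observed Y (hypothetical)
yhat = [2.5, 5.5, 4.5, 7.5, 10.0]   # fitted values (hypothetical)
n = len(y)
ybar = sum(y) / n

sst = sum((yi - ybar) ** 2 for yi in y)               # total SS, DF = n - 1
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))  # residual SS, DF = n - 3
ssr = sst - sse                                       # regression SS, DF = 2

msr, mse = ssr / 2, sse / (n - 3)
F = msr / mse                                         # F-statistic for the table

print(f"{'Source':<12}{'SS':>8}{'DF':>4}{'MS':>8}")
print(f"{'Regression':<12}{ssr:>8.2f}{2:>4}{msr:>8.2f}   F = {F:.2f}")
print(f"{'Residual':<12}{sse:>8.2f}{n - 3:>4}{mse:>8.2f}")
print(f"{'Total':<12}{sst:>8.2f}{n - 1:>4}")
```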

  13. Tests of Significance • F-test of significance • t-test of significance • Two-sided alternative • One-sided alternative

  14. F-Test of Significance of the variability explained by the regression • H0: β1 = β2 = 0 • Ha: at least one coefficient is not 0 • P-value of F-stat = P{F(2, n−3) > F-stat}
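
The Python stdlib has no F-distribution CDF, so a sketch of the decision can compare the F-statistic with a tabulated critical value instead of computing the p-value directly. The F-statistic 33.0 is hypothetical; 19.00 is the upper 5% point of F(2, 2):

```python
F_stat = 33.0        # hypothetical F-statistic with (2, n-3) = (2, 2) DF
F_crit_05 = 19.00    # upper 5% point of F(2, 2), from an F table

# Reject H0: beta1 = beta2 = 0 when the statistic exceeds the critical value,
# i.e. when the p-value P{F(2, n-3) > F-stat} falls below 0.05.
reject = F_stat > F_crit_05
```

With a statistics library the p-value itself would be used instead of the table lookup.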

  15. t-Test of Significance of a variable, X1 – two-sided • H0: β1 = 0 • Ha: β1 ≠ 0 • P-value of t-stat = P{|t(n−3)| > |t-stat|}

  16. One-Sided Test of Significance of a variable, X1 • H0: β1 = 0 • Ha: β1 > 0 (using prior knowledge) • P-value of t-stat = P{t(n−3) > t-stat}

  17. Forecasting • Point forecasting • Sources of forecasting error • Interval forecasting

  18. Forecasting at xm • Data of X used for the regression • Value of X (xm) at which to predict

  19. Sources of Forecasting Error • Data: Y|xm = β0 + β1x1m + β2x2m + em • Forecast: Ŷm = b0 + b1x1m + b2x2m • Forecast Error: Y|xm − Ŷm, combining the disturbance em with the estimation error in the coefficients

  20. Computing Standard Errors

  21. Forecasting Performance Analysis • R²_pred = 1 − PRESS / SST, where PRESS = SS of {yi − ŷi(i)} (deleted residuals) • Sample splitting • Analysis sample (n1) • Validation sample (n2)
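
A leave-one-out sketch of PRESS and R²_pred for a simple regression: each observation is deleted in turn, the model is refit, and the held-out point is predicted. All data values below are hypothetical:

```python
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.0, 8.1, 9.9]   # roughly y = 2x (hypothetical)

def fit(xs, ys):
    """Least-squares intercept and slope for a simple regression."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys)) / \
        sum((xi - xbar) ** 2 for xi in xs)
    return ybar - b * xbar, b

ybar = sum(y) / len(y)
sst = sum((yi - ybar) ** 2 for yi in y)

press = 0.0
for i in range(len(x)):
    # Refit without observation i, then predict the held-out y_i
    xs = x[:i] + x[i + 1:]
    ys = y[:i] + y[i + 1:]
    a, b = fit(xs, ys)
    press += (y[i] - (a + b * x[i])) ** 2   # squared deleted residual

r2_pred = 1 - press / sst
```

Because deleted residuals are never smaller in magnitude than ordinary residuals, R²_pred is a more conservative measure than R².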

  22. Generalization to K Independent Variables • Use n − K − 1 in place of n − 3 as the DF for t • Use K as the numerator DF and n − K − 1 as the denominator DF for F

  23. Diagnostics • Assumptions for Disturbance • Multi-collinearity • Outliers and Influential Observations

  24. Problematic Data Conditions • Regression Coefficients Are Sensitive to: • Highly Collinear Independent Variables • Contamination By Outliers and Influential Observations

  25. Detecting Outliers and Influential Data • Outliers • Leverage (X-space): distance from the mean • Studentized residual (Y-space): forecasting error • Influential Data • Idea: with/without comparison • Cook's D • DFBETAS • DFFITS

  26. Modeling Techniques • Transformation of Variables • Log • Others • Using Dummy Variables • Symbolic representation • Dummy variables for qualitative variables • Using Scores for Ordinal Variables • Selection of Independent Variables • Forecasting • Computer intensive • Analysis of correlation structure of independent variables

  27. Dummy Variables • Dk = 1 if X = k, else 0 • Can be used for nominal and also ordinal variables • # of Dk = c − 1, where c is the number of categories
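
A sketch of the c − 1 encoding; the category names are hypothetical, and the dropped category serves as the reference level whose dummies are all 0:

```python
def make_dummies(values):
    """Encode a categorical variable as c - 1 dummy columns: D_k = 1 if X = k, else 0."""
    cats = sorted(set(values))   # the c categories
    keep = cats[:-1]             # drop one: the last acts as the reference level
    return [[1 if v == k else 0 for k in keep] for v in values]

regions = ["east", "west", "north", "east"]   # c = 3 categories (hypothetical)
D = make_dummies(regions)   # columns: D_east, D_north; "west" is the reference
```

Dropping one category avoids exact collinearity with the intercept column.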

  28. Using Scores for Ordinal Variables • Scoring Systems • 1, 2, 3, …, c • −2, −1, 0, 1, 2 (centered scores, c odd)

  29. Implications of Variable Selection

  30. Selection of Variables - 1 • Backward elimination: start from all X's and drop variables by t-test until the final regression remains • Stepwise (forward) inclusion: best simple regression → best two-variable regression → … , at each step adding the variable with the max increase in R²

  31. Selection of Variables - 2 • All Possible Regressions: K simple regressions, K(K−1)/2 two-variable regressions, …, 1 regression with all K independent variables • Choose the final regression among them

  32. Selection Criteria • R² • Adj. R² • R²_PRED • se • Cp

  33. Cp Criterion (p = # of coefficients) • Select a combination with Cp close to p
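
A sketch of the criterion, assuming the standard Mallows form Cp = SSEp / MSE_full − (n − 2p); n, the candidate SSE values, and the full-model MSE below are all hypothetical:

```python
n = 20
mse_full = 2.0   # MSE of the full model (hypothetical)

# Candidate subsets: (p = # of coefficients incl. intercept, SSE_p) -- hypothetical
candidates = {"X1": (2, 60.0), "X2": (2, 44.0), "X1,X2": (3, 34.0)}

def mallows_cp(p, sse_p):
    # Mallows' Cp: small bias shows up as Cp close to p
    return sse_p / mse_full - (n - 2 * p)

cp = {name: mallows_cp(p, sse) for name, (p, sse) in candidates.items()}
# Select the combination whose Cp is closest to its own p
best = min(candidates, key=lambda name: abs(cp[name] - candidates[name][0]))
```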

  34. What to Look for in a Good Regression? • Remember the three functions of regression • Estimation of the effect of each X • Explaining the variability of Y • Forecasting • Population regressions are assumptions • They need testing • Data might be contaminated

  35. Extensions for Other Variable Types of Y

  36. Types of Variable • Quantitative: Continuous; Discrete (counting) • Qualitative: Ordinal; Nominal

  37. Generalized Linear Models (GLM) • Regression model: Y = β0 + β1X1 + β2X2 + ε, with ε following N(0, σ) • GLM Formulation: • Model for Y: Y is N(μ, σ) • Model for predictors (Link Function): μ = β0 + β1X1 + β2X2

  38. Forecasting Counting Data • Model for Y: Poisson distribution with mean μ • Link Function (log link): ln(μ) = β0 + β1X1 + β2X2
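
For counting data the standard choice is the log link, so that ln μ — not μ itself — is linear in the predictors. A sketch with one predictor and hypothetical coefficients:

```python
import math

b0, b1 = 0.5, 0.3   # hypothetical coefficients

def mu(x):
    """Poisson mean under the log link: ln(mu) = b0 + b1 * x."""
    return math.exp(b0 + b1 * x)

# The log link maps any value of the linear predictor to a positive mean,
# so forecast counts mu(x) stay > 0 even when b0 + b1*x is negative.
```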
