Regression Analysis Project

Michael heidneR

Presentation Transcript


  1. Regression Analysis Project Michael heidneR

  2. Table of Contents
     I. Introduction
     II. Scatter Plots – Relationship Analysis
     III. Correlation Coefficients Analysis
     IV. List of all Sub-Models
     V. Analysis of Sub-Models
     VI. Analysis to determine Best Model
     VII. Summary

  3. Introduction
     - Determine the relationship between the income of the head of household and the five variables listed below:
       - Educational Level
       - Length of Current Employment
       - Age of the Head of Household
       - Household Size (family members)
       - Size of Residence
     - Dependent Variable: income of the head of household
     - Independent Variables: all other variables

  4. Income vs. Years of Education
     - Strong relationship
     - Higher income relates to more years of education

  5. Income vs. Years Employed
     - No relationship

  6. Income vs. Age
     - Two apparent lines: one upward sloping, one flat
     - No relationship

  7. Income vs. Family Size
     - Data difficult to analyze
     - No relationship

  8. Income vs. Residence Size
     - Strong relationship
     - Higher income relates to larger residence size

  9. Independent Variable Scatter Plots
     - One plot showed a strong relationship: Years of Education vs. Residence Size
     - This is consistent with what was found in the dependent variable scatter plots

  10. Correlation Coefficients Analysis: Analysis of Scatter Plots
     - Analyzing the correlation coefficients shows how well the regression equation represents the set of data
     - The measure used is the coefficient of determination (r²) for each set of data
     - An r² value greater than 0.8 indicates a strong relationship
     - An r² value less than 0.5 indicates a weak relationship
     - An r² value of 1 indicates a perfect relationship
     - Plotting the residuals (the differences between the actual data points and the values predicted by the regression equation) allows a visual interpretation of r²
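As a minimal sketch of the quantities on this slide, the snippet below fits a simple least-squares line and computes r² and the residuals in plain Python. The data values are hypothetical and are not the project's data set; the function names `fit_line` and `r_squared` are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def r_squared(xs, ys):
    """Return (r2, residuals) for the least-squares line through the data."""
    b0, b1 = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    # Residual = actual value minus the value predicted by the regression line.
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)        # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in ys)    # total variation
    return 1 - sse / sst, residuals


# Hypothetical sample: years of education and income of the head of household.
education_years = [10, 12, 12, 14, 16, 16, 18, 20]
income = [28000, 35000, 39000, 52000, 70000, 64000, 88000, 105000]

r2, residuals = r_squared(education_years, income)
print(f"r^2 = {r2:.4f}")
print("residuals:", [round(e) for e in residuals])
```

Plotting `residuals` against `education_years` would give the kind of residual plot the slide describes; an even scatter around zero suggests a linear model is appropriate.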

  11. Coefficient of Determination
     - To the right is an example of the relationship between income and years of education
     - The r² value indicates that 83% of the total variation in income can be explained by the regression line
     - This is considered a strong relationship

  12. Coefficient of Determination
     - To the right is an example of the scatter plot with an overlay of the linear regression line
     - The small number of outliers and the close proximity of the points to the line allow for a visual interpretation of the r² value
     - Note: the slope indicates that income increases by just over $11,356 per additional year of education

  13. Coefficient of Determination
     - To the right is an example of the residual plot
     - As the data points are somewhat evenly dispersed, the linear regression model is appropriate to use

  14. Coefficient of Determination
     - To the right is a summary of the r² values
     - The first three (enclosed in the box) all show strong relationships, consistent with the scatter plots on earlier slides

  15. Correlation Coefficients Analysis: Multiple Regression Analysis
     - First, the t test for the slope is used to determine whether there is a significant linear relationship between the X and Y variables
     - Each independent variable is evaluated for a relationship with the income of the head of household
     - H0: β1 = 0. If the null hypothesis is rejected, there is evidence of a linear relationship
     - Second, the confidence interval provides an estimate of the population slope for each independent variable

  16. t Test for Slope
     - To the right is a regression analysis table from PHStat2 that allows the t test for the slope to be evaluated
     - As the p-value is less than the 0.05 significance level for Education Years and Residence Size, the null hypothesis can be rejected for both

  17. Confidence Interval
     - An example of the confidence interval is to the right
     - It estimates that one additional year of education increases the income of the head of household by between $1,387 and $5,148
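The t test for the slope and the confidence interval on the last three slides can be sketched for the single-predictor case as follows. This is an illustration under stated assumptions, not the PHStat2 output: the data are hypothetical, the helper `slope_inference` is an invented name, and the critical value 2.447 is the two-tailed 5% t value for 6 degrees of freedom (n - 2 with n = 8). In the project itself the interval comes from a multiple regression, where the standard error is computed from the full set of predictors.

```python
import math


def slope_inference(xs, ys, t_crit):
    """Return (b1, t_stat, (ci_low, ci_high)) for the slope of y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Standard error of the slope: sqrt(MSE / Sxx), with MSE = SSE / (n - 2).
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    t_stat = b1 / se_b1                      # test statistic for H0: beta1 = 0
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return b1, t_stat, ci


# Hypothetical sample, not the project's data.
education_years = [10, 12, 12, 14, 16, 16, 18, 20]
income = [28000, 35000, 39000, 52000, 70000, 64000, 88000, 105000]

T_CRIT = 2.447  # assumed: two-tailed critical t, alpha = 0.05, df = 6
b1, t_stat, ci = slope_inference(education_years, income, T_CRIT)
reject_h0 = abs(t_stat) > T_CRIT
print(f"slope = {b1:.2f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
print(f"95% CI for slope: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Rejecting H0 at the 0.05 level is equivalent to the 95% confidence interval excluding zero, which is why the two checks on the slides agree.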

  18. List of all Sub-Models
     - 31 total sub-models
     - Each uses up to five variables in the regression equation
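The count of 31 follows from choosing every non-empty subset of five candidate terms (2^5 - 1 = 31). The sketch below enumerates them, assuming the five terms are those that appear in the later model equations (X1, X2, X1², X2², X1X2); the ordering produced here is not claimed to match the slides' model numbering.

```python
from itertools import combinations

# Assumed candidate terms, taken from the model equations on later slides.
terms = ["X1", "X2", "X1^2", "X2^2", "X1*X2"]

# Every non-empty subset of the candidate terms is one sub-model.
sub_models = [
    subset
    for size in range(1, len(terms) + 1)
    for subset in combinations(terms, size)
]

print(len(sub_models))  # 31
for number, subset in enumerate(sub_models, start=1):
    print(number, " + ".join(subset))
```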

  19. Analysis of Sub-Models 21, 22, 23, 24
     Model equations:
     - Model 21: Y = -36515.5583 + 3957.2436 X1 + 0.0116 X2² - 0.3690 X1X2
     - Model 22: Y = 9132.9370 - 20.9153 X2 - 140.8232 X1² + 0.0140 X2²
     - Model 23: Y = 38739.8128 - 44.7582 X2 - 277.0885 X1² + 5.6623 X1X2
     - Model 24: Y = -25759.8750 - 34.8093 X1² + 0.0112 X2² + 1.7767 X1X2
     Findings:
     - Model 21: r² of 0.9274. The Education Years × Residence Size term has a p-value of 0.5349 > 0.05
     - Model 22: r² of 0.9395. The intercept's p-value of 0.1691 is not significant
     - Model 23: r² of 0.9133 and one of the weaker adjusted r² values at 0.9106, meaning the model explains 91.06% of the variability in the data
     - Model 24: r² = 0.9357. The t test shows significance, but the model is not the best fit by the Cp statistic

  20. Analysis of Sub-Models 25, 26, 27, 28
     Model equations:
     - Model 25: Y = -13282.0474 + 270.4987 X1² + 0.0155 X2² - 1.7827 X1X2
     - Model 26: Y = 69749.5689 - 9034.0557 X1 - 19.6461 X2 + 465.1221 X1² + 0.0131 X2²
     - Model 27: Y = 73359.26 - 6084.49 X1 - 38.26 X2 - 17.55 X1² + 5.12 X1X2
     - Model 28: Y = 26390.86 - 57.80 X1 - 35 X2 + 0.011 X2² + 1.80 X1X2
     Findings:
     - Model 25: r² = 0.9372. The model parameters do not fit the scatter plots even though they test as significant
     - Model 26: Second-highest adjusted r² of all models at 0.9425. All model parameters are significant
     - Model 27: r² dipped to 0.9155. Lacks overall significance: 2 of the 4 p-values are greater than α = 0.05
     - Model 28: r² of 0.9377. Years of education has a p-value of 0.9721, far above the 0.05 significance level

  21. Analysis of Sub-Models 29, 30, 31
     Model equations:
     - Model 29: Y = 64973.70 - 11516.70 X1 + 716.08 X1² + 0.016 X2² - 2.17 X1X2
     - Model 30: Y = 8586.566 - 20.43 X2 + 145.32 X1² + 0.014 X2² - 0.06 X1X2
     - Model 31: Y = 67121.22 - 10829.91 X1 - 6.37 X2 + 650.499 X1² + 0.0155 X2² - 1.61 X1X2
     Findings:
     - Model 29: Highest adjusted r² at 0.9434. P-values are significant at the 99% confidence level
     - Model 30: r² = 0.9369. Three of the four variables (residential size, education years squared, and education years × residential size) have p-values > 0.05, indicating no significant relationship
     - Model 31: r² = 0.9495. Lower F than previous models at 328.82. P-values rise sharply for residential size and education years × residential size

  22. Analysis to Determine Best Model
     - Overall regression and model parameters point to Model 26 and Model 29 as the "best fits"
     - Model 29's adjusted r² is 0.9434 compared with Model 26's adjusted r² of 0.9425. Model 29 barely edges ahead here, explaining 94.34% of the variability in the data
     - For the overall regression, the F value is higher for Model 29 at 413.90 than for Model 26 at 406.56. Both are far above the critical F value of 2.467 (df = 4, 95), and Model 29's higher value indicates the more significant model
     - Significance F is 0.000 for both models
     - The p-values of both models are significant at the 99% confidence level (α = 0.01). Correspondingly, every variable in both models has a t-statistic that falls outside the ±1.98 critical values
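The F values and adjusted r² values on this slide can be cross-checked against each other. The sketch below recovers r² from an overall F statistic and then applies the standard adjusted r² formula. It assumes k = 4 terms and 95 residual degrees of freedom (from the "df = 4, 95" quoted above), which implies a sample size of n = 100; the sample size is inferred, not stated on the slides, and the function names are illustrative.

```python
def r2_from_f(f_stat, k, df_resid):
    """Invert F = (r2 / k) / ((1 - r2) / df_resid) to recover r2."""
    return f_stat * k / (f_stat * k + df_resid)


def adjusted_r2(r2, n, k):
    """Adjusted r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


K = 4                    # terms in each of the two candidate models
DF_RESID = 95            # residual degrees of freedom quoted on the slide
N = DF_RESID + K + 1     # inferred sample size: 100

for f_stat in (413.90, 406.56):   # the two F values quoted on the slide
    r2 = r2_from_f(f_stat, K, DF_RESID)
    print(f"F = {f_stat}: r^2 = {r2:.4f}, adjusted r^2 = {adjusted_r2(r2, N, K):.4f}")
# F = 413.9: r^2 = 0.9457, adjusted r^2 = 0.9434
# F = 406.56: r^2 = 0.9448, adjusted r^2 = 0.9425
```

Under these assumptions, F = 413.90 reproduces the adjusted r² of 0.9434 and F = 406.56 reproduces 0.9425, which matches the two adjusted r² values quoted for the candidate models.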

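  23. Analysis to Determine Best Model
     - The Cp statistic measures the difference between a fitted regression model and a true model, allowing for random error
     - The Cp statistic is used to further narrow down the best sub-model
     - Look for a model whose Cp is less than or equal to k + 1, where k is the number of independent variables in the model
     - Model 26 (X2X3X4): Cp = 5.9318
     - Model 29 (X3X4X5): Cp = 4.3220
     - Both models have four terms, so the threshold is k + 1 = 5; only Model 29 falls at or below it
     - Conclusion: Model 29 is the best fit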
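The slides do not state which formula produced the Cp values. As a hedged sketch, the code below uses a form common in introductory statistics texts, Cp = (1 - r²_k)(n - T) / (1 - r²_T) - (n - 2(k + 1)), where T is the number of parameters in the full model including the intercept. It assumes n = 100 (inferred from the residual degrees of freedom), takes Model 31 as the full model (T = 6), and recovers each r² from the F values quoted on earlier slides. All function names are illustrative.

```python
def r2_from_f(f_stat, k, df_resid):
    """Invert F = (r2 / k) / ((1 - r2) / df_resid) to recover r2."""
    return f_stat * k / (f_stat * k + df_resid)


def mallows_cp(r2_sub, r2_full, n, k, total_params):
    """Cp for a sub-model with k terms, relative to the full model."""
    return (1 - r2_sub) * (n - total_params) / (1 - r2_full) - (n - 2 * (k + 1))


N = 100             # assumed sample size (95 residual df + 4 terms + 1)
TOTAL_PARAMS = 6    # full model (Model 31): 5 terms plus the intercept

r2_full = r2_from_f(328.82, k=5, df_resid=94)   # Model 31, F from slide 21
r2_m29 = r2_from_f(413.90, k=4, df_resid=95)    # Model 29, F from slide 22
r2_m26 = r2_from_f(406.56, k=4, df_resid=95)    # Model 26, F from slide 22

cp_29 = mallows_cp(r2_m29, r2_full, N, k=4, total_params=TOTAL_PARAMS)
cp_26 = mallows_cp(r2_m26, r2_full, N, k=4, total_params=TOTAL_PARAMS)

threshold = 4 + 1   # k + 1
print(f"Model 29: Cp = {cp_29:.2f}, meets Cp <= k + 1: {cp_29 <= threshold}")
print(f"Model 26: Cp = {cp_26:.2f}, meets Cp <= k + 1: {cp_26 <= threshold}")
```

With these inputs the results land within rounding error of the Cp values quoted on the slide (4.3220 and 5.9318), which suggests, but does not confirm, that this is the formula behind them.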

  24. Summary
     - Model 29: Y = 64973.70 - 11516.70 X1 + 716.08 X1² + 0.016 X2² - 2.17 X1X2
     - Model 29's parameters are less volatile and remain significant at the 99.9% confidence level
     - It is in line with the Cp statistic criterion for determining the best subset
     - It has the most significant regression equation and should therefore give the most accurate predictions
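To show how the chosen equation would be used, here is a small sketch that evaluates Model 29 for one household. It assumes X1 is years of education and X2 is residence size, as the interaction term labels on earlier slides suggest. The input values are hypothetical, and the units of residence size are not stated on the slides.

```python
def predict_income_model_29(education_years, residence_size):
    """Evaluate the Model 29 equation from the summary slide."""
    x1, x2 = education_years, residence_size
    return (
        64973.70
        - 11516.70 * x1
        + 716.08 * x1 ** 2
        + 0.016 * x2 ** 2
        - 2.17 * x1 * x2
    )


# Hypothetical household: 16 years of education, residence size of 2000.
estimate = predict_income_model_29(16, 2000)
print(f"Predicted income: {estimate:,.2f}")  # Predicted income: 58,582.98
```

Because the equation contains squared and interaction terms, the effect of one more year of education is not constant; it depends on both the current education level and the residence size.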
