
Introduction to Regression Lecture 5.1


conor

Presentation Transcript


  1. Introduction to Regression, Lecture 5.1
     • Review
     • Transforming data, the log transform
        • liver fluke egg hatching rate
        • explaining CEO remuneration
        • brain weights and body weights
     • SLR with transformed data
     • Transforming X, quadratic fit
     • Other options
     Diploma in Statistics Introduction to Regression

  2. Using t values. Convention: n > 30 is big, n < 30 is small. $z_{0.05} = 1.96 \approx 2$; $t_{30,\,0.05} = 2.04 \approx 2$.
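The normal critical value on this slide can be checked with Python's standard library. This is a small sketch that assumes the 0.05 subscript denotes a two-sided 5% level, so the relevant quantile is 0.975. The t value of 2.04 is copied from the slide rather than recomputed, since the standard library has no t distribution.

```python
from statistics import NormalDist

# Two-sided 5% level: 2.5% in each tail, so use the 97.5% quantile
# (an assumed reading of the slide's 0.05 subscript).
z_crit = NormalDist().inv_cdf(0.975)

# Value quoted on the slide for t with 30 degrees of freedom (not recomputed).
t30_crit = 2.04

# Both critical values round to 2, which is the point of the convention.
print(round(z_crit, 2))
print(round(z_crit), round(t30_crit))
```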


  4. Homework 4.2.1. Quantify the extent of the recovery in Year 6, Q3.
     $\hat{P} = 1030\,Q1 + 1292\,Q2 + 1210\,Q3 + 1279\,Q4 + 33.7\,\text{Time}$
     Year 6 Q2: $P = 1657$, $\hat{P} = 1292 + 33.7 \times 22 = 2033$, $P - \hat{P} = 1657 - 2033 = -376$
     Year 6 Q3: $P = 2185$, $\hat{P} = 1210 + 33.7 \times 23 = 1985$, $P - \hat{P} = 2185 - 1985 = 200$
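The arithmetic in this homework can be reproduced with a short sketch. The coefficients, time indices and observed values are taken from the slide; the function and variable names are illustrative only.

```python
# Coefficients from the fitted equation on the slide.
QUARTER_COEF = {"Q1": 1030, "Q2": 1292, "Q3": 1210, "Q4": 1279}
TIME_COEF = 33.7

def fitted(quarter, time):
    """Fitted value: quarter level plus linear trend."""
    return QUARTER_COEF[quarter] + TIME_COEF * time

# (quarter, time index, observed P) for Year 6, as given on the slide.
cases = {"Year 6 Q2": ("Q2", 22, 1657), "Year 6 Q3": ("Q3", 23, 2185)}

residuals = {}
for label, (quarter, time, observed) in cases.items():
    p_hat = round(fitted(quarter, time))  # the slide rounds to whole units
    residuals[label] = observed - p_hat
    print(label, p_hat, residuals[label])
```

The Q2 residual is large and negative while the Q3 residual is positive, which is the swing the homework asks to quantify.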

  5. Homework 4.2.2. List correspondences between the output from the original regression and the output from the alternative regression. Confirm that the coefficients of Q1, Q2 and Q3 in the original are the corresponding coefficients in the alternative with the Q4 coefficient added.

  6. Predictor      Coef  SE Coef      T      P
     Noconstant
     Q1          1029.87    23.41  43.99  0.000
     Q2          1292.35    24.45  52.85  0.000
     Q3          1210.42    25.55  47.37  0.000
     Q4          1278.70    26.71  47.88  0.000
     Time         33.725    1.619  20.83  0.000

     S = 40.9654

     Predictor      Coef  SE Coef      T      P
     Constant    1278.70    26.71  47.88  0.000
     Q1          -248.82    26.36  -9.44  0.000
     Q2            13.65    26.11   0.52  0.609
     Q3           -68.27    25.96  -2.63  0.019
     Time         33.725    1.619  20.83  0.000

     S = 40.9654
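The correspondence asked for in Homework 4.2.2 can be checked numerically. The sketch below uses only the printed coefficients from the two outputs; agreement is expected only up to rounding in the last printed digit, and the tolerance of 0.02 is an assumption made for that reason.

```python
# Coefficients from the regression with a constant (Q4 acts as the baseline).
constant = 1278.70
alt = {"Q1": -248.82, "Q2": 13.65, "Q3": -68.27}

# Coefficients from the regression fitted with no constant.
orig = {"Q1": 1029.87, "Q2": 1292.35, "Q3": 1210.42, "Q4": 1278.70}

for quarter, coef in alt.items():
    rebuilt = constant + coef
    # Compare with a small tolerance because the printed values are rounded.
    matches = abs(rebuilt - orig[quarter]) < 0.02
    print(quarter, round(rebuilt, 2), orig[quarter], matches)
```

The Time coefficient and S are identical in both outputs, and the Constant in the second output equals the Q4 coefficient in the first.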

  7. Homework 4.2.3
     1. Calculate the simple linear regressions of Jobtime on each of T_Ops and Units. Confirm the corresponding t-values.
     2. Calculate the simple linear regression of Jobtime on Ops per Unit. Comment on the negative correlation of Jobtime with Ops per Unit in the light of the corresponding t-value.
     3. Confirm the calculation of the R² values.

  8. Solution 4.2.3, part 2. Calculate the simple linear regression of Jobtime on Ops per Unit. Comment on the negative correlation of Jobtime with Ops per Unit in the light of the corresponding t-value. Comment: The t-value is not statistically significant; the negative correlation is just chance variation, with no substantive meaning.

  9. Variance Inflation Factors. Convention: problem if $R_k^2 > 90\%$ or $\mathrm{VIF}_k > 10$, where $\mathrm{VIF}_k = 1/(1 - R_k^2)$ and $R_k^2$ is the R² from regressing $X_k$ on the other X variables.
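The two thresholds in this convention are the same rule stated twice, assuming the standard definition VIF_k = 1/(1 − R_k²), where R_k² comes from regressing X_k on the other X variables. A minimal sketch (the function name `vif` is illustrative):

```python
def vif(r_squared_k):
    """Variance inflation factor from the R^2 of X_k regressed on the other X variables."""
    if not 0 <= r_squared_k < 1:
        raise ValueError("R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared_k)

# An R^2 of 90% corresponds to a VIF of 10, the threshold in the convention.
for r2 in (0.0, 0.5, 0.9, 0.99):
    print(r2, round(vif(r2), 2))
```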

  10. What to do?
      • Get new X values, to break correlation pattern
         • impractical in observational studies
      • Choose a subset of the X variables
         • manually
         • automatically
            • stepwise regression
            • other methods

  11. Residential load survey data. Data collected by a US electricity supplier during an investigation of the factors that influence peak demand for electricity by residential customers.
      • Load is demand at system peak demand hour, in kW
      • Size (X1) is house size, in SqFt/1000
      • Income (X2) is annual family income, in $/1000
      • AirCon (X3) is air conditioning capacity, in tons
      • Index (X4) is the house appliance index, in kW
      • Residents (X5) is number in house on a typical day

  12. Matrix plot

  13. Results

      All variables in:

      Predictor         Coef     SE Coef      T      P
      Constant        0.1263      0.2289   0.55  0.585
      Size           -2.6689      0.9059  -2.95  0.006
      Income      0.00027912  0.00007892   3.54  0.001
      AirCon         0.42462     0.03472  12.23  0.000
      Index       0.00038137  0.00007884   4.84  0.000
      Residents      0.00197     0.02218   0.09  0.930

      Income deleted:

      Predictor      Coef  SE Coef      T      P
      Constant     -397.0    492.7  -0.81  0.426
      Size        10943.3    594.2  18.42  0.000
      AirCon        -1.86    75.45  -0.02  0.980
      Index        0.0721   0.1709   0.42  0.676
      Residents     38.65    47.75   0.81  0.424

  14. Exercise: Calculate the VIF for Size. Comment. Homework: Calculate variance inflation factors for all explanatory variables. Discuss.

  15. Multicollinearity: when there is perfect correlation within the X variables. Example: Indicators. Illustration: Minitab.

  16. Introduction to Regression, Lecture 5.1
      • Review
      • Transforming data, the log transform
         • liver fluke egg hatching rate
         • explaining CEO remuneration
         • brain weights and body weights
      • SLR with transformed data
      • Transforming X, quadratic fit
      • Other options

  17. (i) Hatching of liver fluke eggs. The life cycle of the liver fluke

  18. Hatching of liver fluke eggs: Duration and Success rate

  21. (ii) Explaining CEO Compensation and Company Sales (Forbes magazine, May 1994)

  22. Explaining CEO Remuneration, bivariate log transformation

  23. (iii) Mammals' Brainweight vs Bodyweight

  24. Scatterplot view

  25. Scatterplot view, log transform

  26. Scatterplot view, Dinosaurs deleted

  27. Histogram view

  28. Histogram view, log transform

  29. Changing spread with log (title repeated on slides 29 to 37)

  38. Why the log transform works. High spread at high X is transformed to low spread at high Y. Low spread at low X is transformed to high spread at low Y.

  39. Why the log transform works. 10 to 100 is transformed to $\log_{10}(10)$ to $\log_{10}(10^2)$, i.e., 1 to 2. 1/10 = 0.1 to 1/100 = 0.01 is transformed to $\log_{10}(10^{-1})$ to $\log_{10}(10^{-2})$, i.e., −1 to −2.
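The effect on spread can be seen by comparing interval widths before and after taking base-10 logs. The sketch below uses the two intervals from the slide; the helper names are illustrative.

```python
import math

# Intervals from the slide: a wide one at large values, a narrow one at small values.
wide = (10, 100)
narrow = (0.01, 0.1)

def width(interval):
    lo, hi = interval
    return hi - lo

def log_interval(interval):
    """Apply log10 to both endpoints."""
    return tuple(math.log10(v) for v in interval)

for name, interval in (("wide", wide), ("narrow", narrow)):
    logged = log_interval(interval)
    print(name, interval, width(interval), logged, width(logged))
```

On the raw scale the widths differ by a factor of 1000; on the log scale both intervals span one unit, because each covers a single factor of ten.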

  40. Introduction to Regression, Lecture 5.1
      • Review
      • Transforming data, the log transform
         • liver fluke egg hatching rate
         • explaining CEO remuneration
         • brain weights and body weights
      • SLR with transformed data
      • Transforming X, quadratic fit
      • Other options

  41. SLR with transformed data: LBrainW versus LBodyW

      The regression equation is
      LBrainW = 0.932 + 0.753 LBodyW

      Predictor     Coef  SE Coef      T      P
      Constant   0.93237  0.04170  22.36  0.000
      LBodyW     0.75309  0.02858  26.35  0.000

      S = 0.302949
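A straight line on the log-log scale corresponds to a power law on the original scale. The sketch below back-transforms the fitted equation, assuming the logs are base 10 (as in the earlier log10 examples); the units of BrainW and BodyW are not stated on the slide, and the function names are illustrative.

```python
import math

# Fitted coefficients from the slide (log-log regression).
INTERCEPT = 0.932
SLOPE = 0.753

def predict_log_brain(log_body):
    """Prediction on the log scale, as in the fitted equation."""
    return INTERCEPT + SLOPE * log_body

def predict_brain(body):
    """Back-transformed prediction, ASSUMING base-10 logs: BrainW = 10**a * BodyW**b."""
    return 10 ** INTERCEPT * body ** SLOPE

# Both routes give the same answer on the log scale.
body = 50.0  # arbitrary illustrative body weight
print(math.log10(predict_brain(body)), predict_log_brain(math.log10(body)))

# A tenfold increase in BodyW multiplies predicted BrainW by 10**SLOPE.
ratio = predict_brain(100.0) / predict_brain(10.0)
print(ratio, 10 ** SLOPE)
```

Since the slope is below 1, predicted brain weight grows less than proportionally with body weight.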

  42. Application: Do humans conform? (Plot label: Human)

  43. Application: Do humans conform?
      • Delete the Human data,
      • calculate regression,
      • predict human LBrainW and
      • compare to actual, relative to s

  44. Application: Do humans conform?

      Regression Analysis: LBrainW versus LBodyW

      The regression equation is
      LBrainW = 0.924 + 0.744 LBodyW

      Predictor     Coef  SE Coef      T      P
      Constant   0.92410  0.03933  23.50  0.000
      LBodyW     0.74383  0.02706  27.48  0.000

      S = 0.285036

  45. Application: Do humans conform?
      LBodyW(Human) = 1.79239
      LBrainW(Human) = 3.12057
      Predicted LBrainW = 0.924 + 0.744 × 1.79239 = 2.25754
      Residual = 3.12057 − 2.25754 = 0.86303
      Residual / s = 0.86303 / 0.285036 = 3.03
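The calculation on this slide can be reproduced directly. The sketch uses the rounded coefficients and s from the regression fitted with the Human case deleted; variable names are illustrative.

```python
# Values from the slides (regression fitted with the Human case deleted).
INTERCEPT, SLOPE, S = 0.924, 0.744, 0.285036
log_body_human, log_brain_human = 1.79239, 3.12057

predicted = INTERCEPT + SLOPE * log_body_human  # deleted fitted value
residual = log_brain_human - predicted          # deleted residual
standardised = residual / S                     # residual in units of s

print(round(predicted, 5), round(residual, 5), round(standardised, 2))
```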

  46. Deleted residuals. For each potentially exceptional case:
      • delete the case
      • calculate the regression from the rest
      • use the fitted equation to calculate a deleted fitted value
      • calculate deleted residual = observed value − deleted fitted value
      Minitab does this automatically for all cases!
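The steps for deleted residuals can be sketched in plain Python for a simple linear regression. This is an illustration on made-up data, not the Minitab procedure; it returns raw deleted residuals without any standardisation, and all names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def deleted_residuals(xs, ys):
    """For each case: refit without it, then subtract its deleted fitted value."""
    out = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]   # delete the case
        rest_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(rest_x, rest_y)  # regression from the rest
        deleted_fit = intercept + slope * xs[i]      # deleted fitted value
        out.append(ys[i] - deleted_fit)              # observed minus deleted fit
    return out

# Made-up illustrative data: y = 1 + 2x exactly, except for the last case.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 12.0]
res = deleted_residuals(xs, ys)
print([round(r, 3) for r in res])
```

Because the exceptional case is excluded from the fit used to predict it, its deleted residual is not shrunk by the case pulling the line toward itself.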

  47. Application: Do humans conform? With 63 cases, we do not expect to see any cases with residuals exceeding 3 standard deviations. On the other hand, recalling the scatter plot, the humans do not appear particularly exceptional. The dotplot view of deleted residuals emphasises this: Water opossums appear more exceptional. (Dotplot labels: Water Opossum, Human)

  48. Application: Do humans conform?

  49. Introduction to Regression, Lecture 5.1
      • Review
      • Transforming data, the log transform
         • liver fluke egg hatching rate
         • explaining CEO remuneration
         • brain weights and body weights
      • SLR with transformed data
      • Transforming X, quadratic fit
      • Other options

  50. Optimising a nicotine extraction process. In determining the quantity of nicotine in different samples of tobacco, temperature is a key variable in optimising the extraction process. A study of this phenomenon involving analysis of 18 samples produced these data.
