
Polynomial regression models

Polynomial regression models. Possible models for when the response function is “curved”. Uses of polynomial models. When the true response function really is a polynomial function.

stormy

Presentation Transcript


  1. Polynomial regression models Possible models for when the response function is “curved”

  2. Uses of polynomial models • When the true response function really is a polynomial function. • (Very common!) When the true response function is unknown or complex, but a polynomial function approximates the true function well.

  3. Example • What is the impact of exercise on the human immune system? • Is the amount of immunoglobulin in blood (y) related, in a curved manner, to maximal oxygen uptake (x)?

  4. Scatter plot

  5. A quadratic polynomial regression function: Yi = β0 + β1 Xi + β11 Xi² + εi • where: • Yi = amount of immunoglobulin in blood (mg) • Xi = maximal oxygen uptake (ml/kg) • the error terms εi satisfy the typical assumptions (“INE”: independent, normally distributed, equal variance)

  6. Estimated quadratic function

  7. Interpretation of the regression coefficients • If 0 is a possible x value, then b0 is the predicted response at x = 0. Otherwise, b0 has no meaningful interpretation. • b1 does not have a very helpful interpretation. It is the slope of the tangent line at x = 0. • b2 indicates the up/down direction of the curve • b2 < 0 means the curve is concave down • b2 > 0 means the curve is concave up

  8. The regression equation is
     igg = - 1464 + 88.3 oxygen - 0.536 oxygensq

     Predictor      Coef  SE Coef      T      P   VIF
     Constant    -1464.4    411.4  -3.56  0.001
     oxygen        88.31    16.47   5.36  0.000  99.9
     oxygensq    -0.5362   0.1582  -3.39  0.002  99.9

     S = 106.4   R-Sq = 93.8%   R-Sq(adj) = 93.3%

     Analysis of Variance
     Source          DF       SS       MS       F      P
     Regression       2  4602211  2301105  203.16  0.000
     Residual Error  27   305818    11327
     Total           29  4908029

     Source    DF   Seq SS
     oxygen     1  4472047
     oxygensq   1   130164

  9. A multicollinearity problem • Pearson correlation of oxygen and oxygensq = 0.995

  10. “Center” the predictors

      Mean of oxygen = 50.637

      oxygen   oxcent  oxcentsq
        34.6  -16.037   257.185
        45.0   -5.637    31.776
        62.3   11.663   136.026
        58.9    8.263    68.277
        42.5   -8.137    66.211
        44.3   -6.337    40.158
        67.9   17.263   298.011
        58.5    7.863    61.827
        35.6  -15.037   226.111
        49.6   -1.037     1.075
        33.0  -17.637   311.064

  11. Does it really work? • Pearson correlation of oxcent and oxcentsq = 0.219
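To see why centering helps, the correlation between a predictor and its square can be computed before and after centering. The sketch below is only an illustration: it uses the 11 oxygen values listed on the centering slide (not the full 30-observation data set) and centers them at their own mean rather than at 50.637, so its numbers will differ from the 0.995 and 0.219 reported on the slides. The function and variable names are illustrative.

```python
import numpy as np

# The 11 oxygen-uptake values shown on the "Center the predictors" slide.
# This is only a subset of the 30 observations, so the correlations below
# will not reproduce the slides' 0.995 and 0.219 exactly.
oxygen = np.array([34.6, 45.0, 62.3, 58.9, 42.5, 44.3,
                   67.9, 58.5, 35.6, 49.6, 33.0])

def corr_with_square(x):
    """Pearson correlation between x and x squared."""
    return np.corrcoef(x, x ** 2)[0, 1]

# Assumption: center at the mean of these 11 values, not at 50.637.
oxcent = oxygen - oxygen.mean()

raw_corr = corr_with_square(oxygen)
centered_corr = corr_with_square(oxcent)

print(f"corr(oxygen, oxygen^2) = {raw_corr:.3f}")
print(f"corr(oxcent, oxcent^2) = {centered_corr:.3f}")
```

The raw predictor takes only positive values far from zero, so it and its square rise together almost linearly; after centering, negative and positive deviations both map to positive squares, which breaks most of that linear association.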

  12. A better quadratic polynomial regression function: Yi = β*0 + β*1 xi + β*11 xi² + εi, where xi = Xi - X̄ denotes the centered predictor, and • β*0 = mean response at the predictor mean • β*1 = “linear effect coefficient” • β*11 = “quadratic effect coefficient”

  13. The regression equation is
      igg = 1632 + 34.0 oxcent - 0.536 oxcentsq

      Predictor      Coef  SE Coef      T      P  VIF
      Constant    1632.20    29.35  55.61  0.000
      oxcent       34.000    1.689  20.13  0.000  1.1
      oxcentsq    -0.5362   0.1582  -3.39  0.002  1.1

      S = 106.4   R-Sq = 93.8%   R-Sq(adj) = 93.3%

      Analysis of Variance
      Source          DF       SS       MS       F      P
      Regression       2  4602211  2301105  203.16  0.000
      Residual Error  27   305818    11327
      Total           29  4908029

      Source    DF   Seq SS
      oxcent     1  4472047
      oxcentsq   1   130164

  14. Interpretation of the regression coefficients • b0 is the predicted response at the predictor mean. • b1 is the estimated slope of the tangent line at the predictor mean and, typically, is also close to the estimated slope in the simple linear model. • b2 indicates the up/down direction of the curve • b2 < 0 means the curve is concave down • b2 > 0 means the curve is concave up

  15. Estimated regression function

  16. Similar estimates

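  17. The relationship between the two forms of the model • Original model: Yi = β0 + β1 Xi + β11 Xi² + εi • Centered model: Yi = β*0 + β*1 (Xi - X̄) + β*11 (Xi - X̄)² + εi • Where: β0 = β*0 - β*1 X̄ + β*11 X̄², β1 = β*1 - 2 β*11 X̄, β11 = β*11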

  18. Mean of oxygen = 50.637
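Expanding the centered model b0* + b1*(X - X̄) + b11*(X - X̄)² in powers of X gives the coefficients of the original model, and this can be checked against the two sets of estimates reported in the regression output. The following is a minimal sketch of that arithmetic; the function name `centered_to_original` is illustrative, and the inputs are the rounded values printed on the slides.

```python
# Estimates from the centered fit (slide 13) and the predictor mean (slide 18).
b0_star, b1_star, b11_star = 1632.20, 34.000, -0.5362
x_bar = 50.637

def centered_to_original(b0_star, b1_star, b11_star, x_bar):
    """Expand b0* + b1*(X - x_bar) + b11*(X - x_bar)^2 in powers of X."""
    b0 = b0_star - b1_star * x_bar + b11_star * x_bar ** 2
    b1 = b1_star - 2 * b11_star * x_bar
    b11 = b11_star
    return b0, b1, b11

b0, b1, b11 = centered_to_original(b0_star, b1_star, b11_star, x_bar)
print(f"b0  = {b0:.1f}")
print(f"b1  = {b1:.2f}")
print(f"b11 = {b11:.4f}")
```

Up to rounding in the printed output, the results should agree with the estimates from the uncentered fit (-1464.4, 88.31 and -0.5362).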

  19. What is predicted IgG if maximal oxygen uptake is 90?

      Predicted Values for New Observations
      New Obs     Fit  SE Fit          95.0% CI          95.0% PI
      1        2139.6   219.2  (1689.8, 2589.5)  (1639.6, 2639.7) XX

      X denotes a row with X values away from the center
      XX denotes a row with very extreme X values

      Values of Predictors for New Observations
      New Obs  oxcent  oxcentsq
      1          39.4      1549

      There is an even greater danger in extrapolation when modeling data with a polynomial function, because of changes in direction.
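The "changes in direction" warning can be made concrete with the fitted centered model. The sketch below, which uses only the rounded estimates printed on the slides, reproduces the fit at an oxygen uptake of 90 and locates the turning point of the fitted parabola. The helper name `predict_igg` is illustrative.

```python
# Centered-model estimates and predictor mean taken from the slides.
b0_star, b1_star, b11_star = 1632.20, 34.000, -0.5362
x_bar = 50.637

def predict_igg(oxygen):
    """Fitted IgG from the centered quadratic model."""
    oxcent = oxygen - x_bar
    return b0_star + b1_star * oxcent + b11_star * oxcent ** 2

# The fitted parabola peaks where its derivative b1* + 2*b11*oxcent is zero.
turning_point = x_bar - b1_star / (2 * b11_star)

fit_at_90 = predict_igg(90.0)
print(f"fit at oxygen = 90: {fit_at_90:.1f}")
print(f"turning point at oxygen = {turning_point:.1f}")
print(f"fit at turning point: {predict_igg(turning_point):.1f}")
```

The turning point lies above every oxygen value listed on the centering slide, so at 90 the fitted curve is already heading downward in a region where those rows offer nothing to confirm or contradict the change of direction.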

  20. It is possible to “overfit” the data with polynomial models.

  21. It is even theoretically possible to fit the data perfectly. If you have n data points with distinct x values, then a polynomial of order n-1 will fit the data perfectly; that is, it will pass through each data point. But good statistical software will keep an unsuspecting user from fitting such a model: ** Error ** Not enough non-missing observations to fit a polynomial of this order; execution aborted
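The perfect-fit claim is easy to check numerically. The sketch below uses hypothetical data (five made-up points, not the IgG data) and NumPy's `polyfit`, which, unlike the software quoted on the slide, does not refuse to fit a polynomial of order n-1.

```python
import numpy as np

# Hypothetical data: n = 5 points with distinct x values.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, -0.7, 3.5, 1.2, 4.8])

n = len(x)
coefs = np.polyfit(x, y, deg=n - 1)   # polynomial of order n - 1
fitted = np.polyval(coefs, x)
residuals = y - fitted

print("largest absolute residual:", np.max(np.abs(residuals)))
```

With n coefficients and n observations there are no degrees of freedom left for error, so the residuals vanish (up to floating-point error) no matter how noisy the data are.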

  22. The hierarchical approach to model fitting • A widely accepted approach is to fit a higher-order model and then explore whether a lower-order (simpler) model is adequate. • Is a first-order linear model (“line”) adequate?
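For the IgG example, one way to address this question is a partial F test of the quadratic term, built from the sequential sums of squares in the regression output. The sketch below only redoes that arithmetic with the values printed on the slides; the variable names are illustrative.

```python
# Values copied from the regression output for the quadratic fit.
seq_ss_quadratic = 130164.0   # Seq SS for oxygensq given oxygen (1 df)
mse_full = 11327.0            # MS for Residual Error (27 df)
coef_quadratic = -0.5362
se_quadratic = 0.1582

# Partial F statistic for H0: the quadratic coefficient is zero.
f_stat = (seq_ss_quadratic / 1) / mse_full

# With one numerator df, the partial F equals the square of the t statistic.
t_stat = coef_quadratic / se_quadratic

print(f"partial F = {f_stat:.2f}")
print(f"t = {t_stat:.2f}, t^2 = {t_stat ** 2:.2f}")
```

Because the quadratic term is entered last, its sequential sum of squares is also its partial sum of squares, and the test is equivalent to the t test reported for oxygensq (P = 0.002), which suggests that a line alone is not adequate for these data.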

  23. The hierarchical approach to model fitting • But then, if a polynomial term of a given order is retained, all related lower-order terms are also retained. • That is, if the quadratic term is significant, you would use this regression function: Yi = β0 + β1 Xi + β11 Xi² + εi • and not this one: Yi = β0 + β11 Xi² + εi

  24. Example • Quality of a product (y) – a score between 0 and 100 • Temperature (x1) – degrees Fahrenheit • Pressure (x2) – pounds per square inch

  25. A two-predictor, second-order polynomial regression function: Yi = β0 + β1 Xi1 + β2 Xi2 + β11 Xi1² + β22 Xi2² + β12 Xi1 Xi2 + εi • where: • Yi = quality • Xi1 = temperature • Xi2 = pressure • β12 = “interaction effect coefficient”

  26. The regression equation is
      quality = - 5128 + 31.1 temp + 140 pressure - 0.133 tempsq - 1.14 presssq - 0.145 tp

      Predictor       Coef   SE Coef       T      P     VIF
      Constant     -5127.9     110.3  -46.49  0.000
      temp          31.096     1.344   23.13  0.000  1154.5
      pressure     139.747     3.140   44.50  0.000  1574.5
      tempsq     -0.133389  0.006853  -19.46  0.000   973.0
      presssq     -1.14422   0.02741  -41.74  0.000  1453.0
      tp         -0.145500  0.009692  -15.01  0.000   304.0

      S = 1.679   R-Sq = 99.3%   R-Sq(adj) = 99.1%

  27. Again, some correlation

                quality    temp  pressure  tempsq  presssq
      temp       -0.423
      pressure    0.182   0.000
      tempsq     -0.434   0.999     0.000
      presssq     0.162   0.000     1.000  -0.000
      tp         -0.227   0.773     0.632   0.772    0.632

      Cell Contents: Pearson correlation

  28. A better two-predictor, second-order polynomial regression function: Yi = β*0 + β*1 xi1 + β*2 xi2 + β*12 xi1 xi2 + β*11 xi1² + β*22 xi2² + εi • where: • Yi = quality • xi1 = centered temperature • xi2 = centered pressure • β*12 = “interaction effect coefficient”
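The five predictor columns of the centered second-order model can be built directly from the raw temperature and pressure values. The sketch below is an illustration under stated assumptions: the 3 x 3 grid of temperatures and pressures is hypothetical (the slides do not list the data), and the column names simply mirror those in the regression output.

```python
import numpy as np

def centered_second_order_terms(temp, pressure):
    """Return the five predictor columns of the centered second-order model."""
    tcent = temp - temp.mean()
    pcent = pressure - pressure.mean()
    return {
        "tcent": tcent,
        "pcent": pcent,
        "tpcent": tcent * pcent,
        "tcentsq": tcent ** 2,
        "pcentsq": pcent ** 2,
    }

# Hypothetical balanced design (not the slides' data): 3 temperatures x 3 pressures.
temp, pressure = np.meshgrid([80.0, 90.0, 100.0], [50.0, 55.0, 60.0])
temp, pressure = temp.ravel(), pressure.ravel()

terms = centered_second_order_terms(temp, pressure)

raw_corr = np.corrcoef(temp, temp ** 2)[0, 1]
centered_corr = np.corrcoef(terms["tcent"], terms["tcentsq"])[0, 1]
print(f"corr(temp, tempsq)   = {raw_corr:.3f}")
print(f"corr(tcent, tcentsq) = {centered_corr:.3f}")
```

In a balanced, symmetric design like this one, centering removes the correlation between each linear term and its square entirely, which is the same pattern of zeros seen among the centered predictors in the slides' correlation matrix.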

  29. Reduced correlation

                quality   tcent  pcent  tpcent  tcentsq
      tcent      -0.423
      pcent       0.182   0.000
      tpcent     -0.274   0.000  0.000
      tcentsq    -0.355  -0.000  0.000   0.000
      pcentsq    -0.762   0.000  0.000   0.000   -0.000

      Cell Contents: Pearson correlation

  30. The regression equation is
      quality = 94.9 - 0.916 tcent + 0.788 pcent - 0.146 tpcent - 0.133 tcentsq - 1.14 pcentsq

      Predictor       Coef   SE Coef       T      P  VIF
      Constant     94.9259    0.7224  131.40  0.000
      tcent       -0.91611   0.03957  -23.15  0.000  1.0
      pcent        0.78778   0.07913    9.95  0.000  1.0
      tpcent     -0.145500  0.009692  -15.01  0.000  1.0
      tcentsq    -0.133389  0.006853  -19.46  0.000  1.0
      pcentsq     -1.14422   0.02741  -41.74  0.000  1.0

      S = 1.679   R-Sq = 99.3%   R-Sq(adj) = 99.1%

  31. Predicted Values for New Observations
      New Obs     Fit  SE Fit          95.0% CI          95.0% PI
      1        94.926   0.722  (93.424, 96.428)  (91.125, 98.726)

      Values of Predictors for New Observations
      New Obs   tcent   pcent  tpcent  tcentsq  pcentsq
      1        0.0000  0.0000  0.0000   0.0000   0.0000
