Polynomial regression models. Possible models for when the response function is “curved”. Uses of polynomial models. When the true response function really is a polynomial function.
Polynomial regression models Possible models for when the response function is “curved”
Uses of polynomial models • When the true response function really is a polynomial function. • (Very common!) When the true response function is unknown or complex, but a polynomial function approximates the true function well.
Example • What is the impact of exercise on the human immune system? • Is the amount of immunoglobulin in blood (y) related to maximal oxygen uptake (x) in a curved manner?
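A quadratic polynomial regression function
$$Y_i = \beta_0 + \beta_1 X_i + \beta_{11} X_i^2 + \varepsilon_i$$
where:
• Yi = amount of immunoglobulin in blood (mg)
• Xi = maximal oxygen uptake (ml/kg)
• typical assumptions about error terms (“INE”: independent, normally distributed, equal variance)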
Interpretation of the regression coefficients
• If 0 is a possible x value, then b0 is the predicted response at x = 0. Otherwise, b0 has no meaningful interpretation.
• b1 does not have a very helpful interpretation. It is the slope of the tangent line at x = 0.
• b2 indicates the up/down direction of the curve:
• b2 < 0 means the curve is concave down
• b2 > 0 means the curve is concave up
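The tangent-slope and concavity readings follow from differentiating the fitted quadratic. A short derivation, using the b0, b1, b2 notation of this slide (b2 is the coefficient on the squared term):

$$\hat{y} = b_0 + b_1 x + b_2 x^2, \qquad \frac{d\hat{y}}{dx} = b_1 + 2 b_2 x, \qquad \frac{d^2\hat{y}}{dx^2} = 2 b_2$$

Setting x = 0 in the first derivative leaves b1, the slope of the tangent line at x = 0. The second derivative has the sign of b2, which is why b2 < 0 gives a concave-down curve and b2 > 0 a concave-up curve.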
The regression equation is
igg = - 1464 + 88.3 oxygen - 0.536 oxygensq

Predictor      Coef  SE Coef      T      P   VIF
Constant    -1464.4    411.4  -3.56  0.001
oxygen        88.31    16.47   5.36  0.000  99.9
oxygensq    -0.5362   0.1582  -3.39  0.002  99.9

S = 106.4   R-Sq = 93.8%   R-Sq(adj) = 93.3%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       2  4602211  2301105  203.16  0.000
Residual Error  27   305818    11327
Total           29  4908029

Source    DF   Seq SS
oxygen     1  4472047
oxygensq   1   130164
A multicollinearity problem Pearson correlation of oxygen and oxygensq = 0.995
“Center” the predictors
Mean of oxygen = 50.637

oxygen   oxcent  oxcentsq
  34.6  -16.037   257.185
  45.0   -5.637    31.776
  62.3   11.663   136.026
  58.9    8.263    68.277
  42.5   -8.137    66.211
  44.3   -6.337    40.158
  67.9   17.263   298.011
  58.5    7.863    61.827
  35.6  -15.037   226.111
  49.6   -1.037     1.075
  33.0  -17.637   311.064
Does it really work? Pearson correlation of oxcent and oxcentsq = 0.219
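The effect of centering can be checked by hand. The sketch below uses only the 11 oxygen values listed on the centering slide, not the full 30 observations, so its correlations will not match the 0.995 and 0.219 reported for the complete data; it only illustrates the direction of the change. The helper `pearson` is a name introduced here for illustration.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / sqrt(s_uu * s_vv)

# The 11 oxygen values shown on the slide (a subset of the 30 observations).
oxygen = [34.6, 45.0, 62.3, 58.9, 42.5, 44.3, 67.9, 58.5, 35.6, 49.6, 33.0]
mean_oxygen = 50.637  # mean of all 30 observations, as reported on the slide

oxygensq = [x ** 2 for x in oxygen]          # uncentered squared term
oxcent = [x - mean_oxygen for x in oxygen]   # centered predictor
oxcentsq = [c ** 2 for c in oxcent]          # centered squared term

r_raw = pearson(oxygen, oxygensq)
r_centered = pearson(oxcent, oxcentsq)
print(f"raw: {r_raw:.3f}  centered: {r_centered:.3f}")
```

On this subset the uncentered correlation stays very close to 1, while the centered one is much smaller in magnitude.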
A better quadratic polynomial regression function
$$Y_i = \beta_0^* + \beta_1^* x_i + \beta_{11}^* x_i^2 + \varepsilon_i$$
where $x_i = X_i - \bar{X}$ denotes the centered predictor, and
β*0 = mean response at the predictor mean
β*1 = “linear effect coefficient”
β*11 = “quadratic effect coefficient”
The regression equation is
igg = 1632 + 34.0 oxcent - 0.536 oxcentsq

Predictor      Coef  SE Coef      T      P  VIF
Constant    1632.20    29.35  55.61  0.000
oxcent       34.000    1.689  20.13  0.000  1.1
oxcentsq    -0.5362   0.1582  -3.39  0.002  1.1

S = 106.4   R-Sq = 93.8%   R-Sq(adj) = 93.3%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       2  4602211  2301105  203.16  0.000
Residual Error  27   305818    11327
Total           29  4908029

Source    DF   Seq SS
oxcent     1  4472047
oxcentsq   1   130164
Interpretation of the regression coefficients • b0 is predicted response at the predictor mean. • b1 is the estimated slope of the tangent line at the predictor mean; and, typically, also the estimated slope in the simple model. • b2 indicates the up/down direction of curve • b2 < 0 means curve is concave down • b2 > 0 means curve is concave up
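The relationship between the two forms of the model
Original model: $Y_i = \beta_0 + \beta_1 X_i + \beta_{11} X_i^2 + \varepsilon_i$
Centered model: $Y_i = \beta_0^* + \beta_1^* (X_i - \bar{X}) + \beta_{11}^* (X_i - \bar{X})^2 + \varepsilon_i$
Where: $\beta_0 = \beta_0^* - \beta_1^* \bar{X} + \beta_{11}^* \bar{X}^2$, $\beta_1 = \beta_1^* - 2 \beta_{11}^* \bar{X}$, $\beta_{11} = \beta_{11}^*$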
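The link between the two parameterizations can be checked against the two regression outputs. The sketch below expands the centered quadratic and collects powers of X, then plugs in the rounded estimates from the centered fit; small differences from the uncentered output are expected because the inputs are rounded. Variable names are chosen here for illustration.

```python
# Estimates from the centered fit and the predictor mean, as reported in the output.
b0_star, b1_star, b11_star = 1632.20, 34.000, -0.5362
x_bar = 50.637

# Expand b0* + b1*(X - x_bar) + b11*(X - x_bar)^2 and collect powers of X.
b0 = b0_star - b1_star * x_bar + b11_star * x_bar ** 2
b1 = b1_star - 2 * b11_star * x_bar
b11 = b11_star

print(f"b0 = {b0:.1f}, b1 = {b1:.2f}, b11 = {b11:.4f}")
```

The recovered values agree, up to rounding, with the uncentered fit's coefficients of -1464.4, 88.31 and -0.5362.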
What is the predicted IgG if maximal oxygen uptake is 90?

Predicted Values for New Observations
New Obs     Fit  SE Fit          95.0% CI          95.0% PI
1        2139.6   219.2  (1689.8, 2589.5)  (1639.6, 2639.7) XX

X denotes a row with X values away from the center
XX denotes a row with very extreme X values

Values of Predictors for New Observations
New Obs  oxcent  oxcentsq
1          39.4      1549

There is an even greater danger in extrapolation when modeling data with a polynomial function, because of changes in direction.
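The "changes in direction" warning can be made concrete with the fitted centered equation. The sketch below reproduces the fit at an oxygen uptake of 90 from the rounded coefficients and locates the turning point of the fitted parabola, where the derivative b1* + 2·b11*·(x − mean) is zero. The function name `predict_igg` is introduced here for illustration.

```python
# Rounded estimates from the centered fit and the reported predictor mean.
b0_star, b1_star, b11_star = 1632.20, 34.000, -0.5362
x_bar = 50.637

def predict_igg(oxygen):
    """Fitted IgG from the centered quadratic equation."""
    c = oxygen - x_bar  # centered predictor
    return b0_star + b1_star * c + b11_star * c ** 2

fit_at_90 = predict_igg(90.0)

# The fitted parabola turns where b1* + 2 * b11* * (x - x_bar) = 0.
turning_point = x_bar - b1_star / (2 * b11_star)

print(f"fit at 90: {fit_at_90:.1f}")
print(f"turning point: {turning_point:.1f}")
```

With these coefficients the fitted curve peaks at an oxygen uptake of about 82.3, so at 90 the fitted curve is already heading downward, regardless of what the true response does there.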
It is possible to “overfit” the data with polynomial models.
It is even theoretically possible to fit the data perfectly. If you have n data points with distinct x values, then a polynomial of order n-1 will fit the data perfectly; that is, it will pass through each data point. But good statistical software will keep an unsuspecting user from fitting such a model.
** Error ** Not enough non-missing observations to fit a polynomial of this order; execution aborted
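The perfect-fit claim can be illustrated without regression software. The sketch below builds the unique polynomial of order at most n-1 through n points using the Lagrange interpolation formula (swapped in here for a least-squares fit, which would give the same polynomial when the x values are distinct). The five data points are made up for illustration and are not from the example data.

```python
def lagrange_polynomial(xs, ys):
    """Return a function evaluating the polynomial of order <= n-1 through the n points."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    # Basis factor: equals 1 at x = xi and 0 at x = xj.
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

# Made-up illustrative points (hypothetical, not from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, -1.0, 4.0, 0.5, 3.0]

poly = lagrange_polynomial(xs, ys)  # order 4 polynomial for n = 5 points
residuals = [y - poly(x) for x, y in zip(xs, ys)]
max_abs_residual = max(abs(r) for r in residuals)
print(max_abs_residual)
```

Every residual is zero, which leaves no degrees of freedom to estimate the error variance; that is the reason software refuses such a fit.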
The hierarchical approach to model fitting A widely accepted approach is to fit a higher-order model and then explore whether a lower-order (simpler) model is adequate. Is a first-order linear model (“line”) adequate?
The hierarchical approach to model fitting But then … if a polynomial term of a given order is retained, then all related lower-order terms are also retained. That is, if a quadratic term was significant, you would use this regression function:
$$Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \varepsilon_i$$
and not this one:
$$Y_i = \beta_0 + \beta_{11} x_i^2 + \varepsilon_i$$
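For the immunoglobulin example, the question of whether a line is adequate comes down to testing the quadratic term. A minimal sketch, using only numbers from the regression output: the partial F statistic for adding the squared term after the linear term is its sequential sum of squares divided by the mean squared error, and it should equal the square of the t statistic for that term, up to rounding.

```python
# Values taken from the regression output for the immunoglobulin example.
seq_ss_quadratic = 130164      # Seq SS for the squared term (1 df)
mse_full = 11327               # MS for Residual Error (27 df)
coef, se = -0.5362, 0.1582     # estimate and standard error of the squared term

f_partial = seq_ss_quadratic / mse_full   # partial F on 1 and 27 df
t_stat = coef / se                        # t statistic on 27 df

print(f"F = {f_partial:.2f}, t = {t_stat:.2f}, t^2 = {t_stat ** 2:.2f}")
```

Both routes give the same test, and the output reports P = 0.002 for it, so the quadratic term is retained and, by the hierarchy principle, so is the linear term.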
Example • Quality of a product (y) – a score between 0 and 100 • Temperature (x1) – degrees Fahrenheit • Pressure (x2) – pounds per square inch
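A two-predictor, second-order polynomial regression function
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_{11} X_{i1}^2 + \beta_{22} X_{i2}^2 + \beta_{12} X_{i1} X_{i2} + \varepsilon_i$$
where:
• Yi = quality
• Xi1 = temperature
• Xi2 = pressure
• β12 = “interaction effect coefficient”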
The regression equation is
quality = - 5128 + 31.1 temp + 140 pressure - 0.133 tempsq - 1.14 presssq - 0.145 tp

Predictor       Coef   SE Coef       T      P     VIF
Constant     -5127.9     110.3  -46.49  0.000
temp          31.096     1.344   23.13  0.000  1154.5
pressure     139.747     3.140   44.50  0.000  1574.5
tempsq     -0.133389  0.006853  -19.46  0.000   973.0
presssq     -1.14422   0.02741  -41.74  0.000  1453.0
tp         -0.145500  0.009692  -15.01  0.000   304.0

S = 1.679   R-Sq = 99.3%   R-Sq(adj) = 99.1%
Again, some correlation

          quality    temp  pressure  tempsq  presssq
temp       -0.423
pressure    0.182   0.000
tempsq     -0.434   0.999     0.000
presssq     0.162   0.000     1.000  -0.000
tp         -0.227   0.773     0.632   0.772    0.632

Cell Contents: Pearson correlation
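A better two-predictor, second-order polynomial regression function
$$Y_i = \beta_0^* + \beta_1^* x_{i1} + \beta_2^* x_{i2} + \beta_{11}^* x_{i1}^2 + \beta_{22}^* x_{i2}^2 + \beta_{12}^* x_{i1} x_{i2} + \varepsilon_i$$
where:
• Yi = quality
• xi1 = centered temperature
• xi2 = centered pressure
• β*12 = “interaction effect coefficient”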
Reduced correlation

         quality   tcent  pcent  tpcent  tcentsq
tcent     -0.423
pcent      0.182   0.000
tpcent    -0.274   0.000  0.000
tcentsq   -0.355  -0.000  0.000   0.000
pcentsq   -0.762   0.000  0.000   0.000   -0.000

Cell Contents: Pearson correlation
The regression equation is
quality = 94.9 - 0.916 tcent + 0.788 pcent - 0.146 tpcent - 0.133 tcentsq - 1.14 pcentsq

Predictor       Coef   SE Coef       T      P  VIF
Constant     94.9259    0.7224  131.40  0.000
tcent       -0.91611   0.03957  -23.15  0.000  1.0
pcent        0.78778   0.07913    9.95  0.000  1.0
tpcent     -0.145500  0.009692  -15.01  0.000  1.0
tcentsq    -0.133389  0.006853  -19.46  0.000  1.0
pcentsq     -1.14422   0.02741  -41.74  0.000  1.0

S = 1.679   R-Sq = 99.3%   R-Sq(adj) = 99.1%
Predicted Values for New Observations
New Obs     Fit  SE Fit          95.0% CI          95.0% PI
1        94.926   0.722  (93.424, 96.428)  (91.125, 98.726)

Values of Predictors for New Observations
New Obs   tcent   pcent  tpcent  tcentsq  pcentsq
1        0.0000  0.0000  0.0000   0.0000   0.0000
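The prediction at the center can be reproduced from the fitted centered equation. A minimal sketch: the function below takes already-centered temperature and pressure, since the predictor means are not given in the output, and forms the interaction and squared terms from them. The name `predict_quality` is introduced here for illustration.

```python
# Coefficients from the centered two-predictor, second-order fit.
COEF = {
    "const": 94.9259,
    "tcent": -0.91611,
    "pcent": 0.78778,
    "tpcent": -0.145500,
    "tcentsq": -0.133389,
    "pcentsq": -1.14422,
}

def predict_quality(tcent, pcent):
    """Fitted quality for centered temperature and centered pressure."""
    return (COEF["const"]
            + COEF["tcent"] * tcent
            + COEF["pcent"] * pcent
            + COEF["tpcent"] * tcent * pcent     # interaction term
            + COEF["tcentsq"] * tcent ** 2       # quadratic effect of temperature
            + COEF["pcentsq"] * pcent ** 2)      # quadratic effect of pressure

fit_at_center = predict_quality(0.0, 0.0)
print(round(fit_at_center, 3))
```

At the predictor means every centered term is zero, so the fit equals the intercept, which matches the Fit of 94.926 in the prediction output.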