Multiple and complex regression
Extensions of simple linear regression
• Multiple regression models: predictor variables are continuous
• Analysis of variance: predictor variables are categorical (grouping variables)
• But general linear models can include both continuous and categorical predictors
Relative abundance of C3 and C4 plants
• Paruelo & Lauenroth (1996)
• Examined the geographic distribution of a number of plant functional types (PFTs) and the effects of climate variables on their relative abundance: shrubs, forbs, succulents, C3 grasses and C4 grasses
Data
• 73 sites across temperate central North America
Response variable
• Relative abundance of PFTs (based on cover, biomass, and primary production) for each site
Predictor variables
• Longitude (LONG)
• Latitude (LAT)
• Mean annual temperature (MAT)
• Mean annual precipitation (MAP)
• Winter (%) precipitation (DJFMAP)
• Summer (%) precipitation (JJAMAP)
• Biomes (grassland, shrubland)
Relative abundance was transformed as ln(abundance + 1) because its distribution was positively skewed
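A minimal sketch of the effect of this transformation, using made-up proportions rather than the study data. The `skewness` helper and the `abundance` values are assumptions introduced only for illustration.

```python
import math

def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical relative abundances (proportions), NOT the Paruelo & Lauenroth data.
abundance = [0.0, 0.0, 0.01, 0.02, 0.03, 0.05, 0.08, 0.15, 0.40, 0.75]

# ln(a + 1); adding 1 keeps the transform defined for sites with zero abundance.
transformed = [math.log1p(a) for a in abundance]

print(f"skewness before: {skewness(abundance):.2f}")
print(f"skewness after:  {skewness(transformed):.2f}")
```

Because ln(a + 1) is concave, it pulls in the long right tail, so the skewness after the transform is smaller than before. For values confined to 0 to 1 the reduction is modest.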
Collinearity
• Causes computational problems: it pushes the determinant of X’X (the cross-product matrix of the predictor variables) close to zero, and matrix inversion effectively divides by that determinant, so the estimates become very sensitive to small differences in the numbers
• Standard errors of the estimated regression slopes are inflated
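Both points can be sketched for the simplest case of two standardized predictors, where the predictor correlation matrix is [[1, r], [r, 1]], its determinant is 1 − r², and the slope variances are inflated by 1 / (1 − r²). The data and the function names below are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def det_and_inflation(x1, x2):
    """Determinant of the 2x2 predictor correlation matrix and the
    resulting variance inflation factor for either slope."""
    r = pearson(x1, x2)
    det = 1.0 - r ** 2
    return det, 1.0 / det

# Hypothetical predictors, for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_weak = [2.0, 6.0, 1.0, 5.0, 3.0, 4.0]       # weakly related to x1
x2_collinear = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]  # almost a copy of x1

det_weak, infl_weak = det_and_inflation(x1, x2_weak)
det_coll, infl_coll = det_and_inflation(x1, x2_collinear)

print(f"weak:      det = {det_weak:.4f}, variance inflation = {infl_weak:.1f}")
print(f"collinear: det = {det_coll:.4f}, variance inflation = {infl_coll:.1f}")
```

As the second predictor approaches a copy of the first, the determinant shrinks toward zero and the inflation factor grows without bound.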
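Detecting collinearity
• Check tolerance values (1 − R² from regressing each predictor on the other predictors; values near zero indicate collinearity)
• Plot the variables
• Examine a matrix of correlation coefficients between predictor variables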
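The correlation matrix and tolerance checks might look like the sketch below for three predictors. The values are invented, and `tolerance` uses the closed-form R² for a regression on exactly two other predictors, so it is an illustration rather than a general routine.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def tolerance(target, other_a, other_b):
    """1 - R^2 from regressing `target` on the two other predictors,
    computed from the pairwise correlations."""
    r_ta = pearson(target, other_a)
    r_tb = pearson(target, other_b)
    r_ab = pearson(other_a, other_b)
    r_squared = (r_ta ** 2 + r_tb ** 2 - 2 * r_ta * r_tb * r_ab) / (1 - r_ab ** 2)
    return 1 - r_squared

# Hypothetical site values, NOT the study data: MAT falls as LAT rises.
predictors = {
    "LAT": [30.0, 33.0, 36.0, 39.0, 42.0, 45.0, 48.0, 51.0],
    "MAT": [20.0, 19.0, 14.0, 15.0, 10.0, 11.0, 6.0, 5.0],
    "MAP": [500.0, 600.0, 600.0, 500.0, 500.0, 600.0, 600.0, 500.0],
}
names = list(predictors)

print("Correlation matrix")
for a in names:
    row = [f"{pearson(predictors[a], predictors[b]):6.2f}" for b in names]
    print(f"{a:>4}", *row)

tolerances = {}
for name in names:
    others = [predictors[o] for o in names if o != name]
    tolerances[name] = tolerance(predictors[name], others[0], others[1])
    print(f"tolerance({name}) = {tolerances[name]:.3f}")
```

In this made-up example LAT and MAT track each other closely, so both get tolerances near zero, which is the pattern that would prompt dropping one of them.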
Dealing with collinearity
• Omit predictor variables if they are highly correlated with other predictor variables that remain in the model
After centering both lat and long:
ln(C3) = β₀ + β₁(lat) + β₂(long) + β₃(lat × long)
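One reason to center before forming the product term is that the raw product lat × long is usually strongly correlated with lat itself. The sketch below illustrates this with hypothetical coordinates; `center` and the data are assumptions, not part of the original analysis.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def center(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Hypothetical site coordinates in degrees, for illustration only.
lat = [30.0, 32.0, 35.0, 38.0, 41.0, 44.0, 47.0, 50.0]
lon = [95.0, 104.0, 99.0, 110.0, 92.0, 106.0, 101.0, 113.0]

# Interaction term built from raw and from centered predictors.
raw_product = [a * b for a, b in zip(lat, lon)]
lat_c, lon_c = center(lat), center(lon)
centered_product = [a * b for a, b in zip(lat_c, lon_c)]

r_raw = pearson(lat, raw_product)
r_centered = pearson(lat_c, centered_product)

print(f"corr(lat, lat x long), raw:      {r_raw:.2f}")
print(f"corr(lat, lat x long), centered: {r_centered:.2f}")
```

Centering leaves the interaction slope β₃ and the overall fit unchanged. It changes the meaning of β₁ and β₂ (each becomes the slope at the mean of the other predictor) and reduces their collinearity with the product term.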
Matrix algebra approach to OLS estimation of multiple regression models
• Y = Xβ + ε
• X’Xb = X’Y
• b = (X’X)⁻¹(X’Y)
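A minimal sketch of the normal equations in plain Python, assuming a small design matrix with a leading column of ones for the intercept. It solves X’Xb = X’Y by Gaussian elimination rather than forming (X’X)⁻¹ explicitly, which gives the same b. All helper names and the data are hypothetical.

```python
def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def matmul(a, b):
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def solve(a, rhs):
    """Solve a @ b = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    b = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[i][j] * b[j] for j in range(i + 1, n))
        b[i] = (aug[i][n] - tail) / aug[i][i]
    return b

def ols(x_rows, y):
    """Least-squares coefficients from the normal equations X'Xb = X'Y."""
    xt = transpose(x_rows)
    xtx = matmul(xt, x_rows)
    xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in xt]
    return solve(xtx, xty)

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
X = [[1.0, a, b] for a, b in zip(x1, x2)]  # leading 1 is the intercept column

b = ols(X, y)
print([round(coef, 6) for coef in b])
```

Because the example response has no noise, the recovered coefficients should match the generating values 1, 2 and 3 up to floating-point error.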
Criteria for “best” fitting in multiple regression with p predictors.
[Figure: Model Lat + Long. C3 plotted against Longitude and Latitude; R² = 0.48]
[Figure: Model Lat * Long. Lines labelled 35 Lat and 45 Lat]
The final forward selection model is:

Step:  AIC=-228.67
SQRT_C3 ~ LAT + MAP + JJAMAP + DJFMAP

         Df Sum of Sq    RSS     AIC
<none>                2.7759 -228.67
+ LONG    1 0.0209705 2.7549 -227.23
+ MAT     1 0.0001829 2.7757 -226.68

Call:
lm(formula = SQRT_C3 ~ LAT + MAP + JJAMAP + DJFMAP)

Coefficients:
(Intercept)          LAT          MAP       JJAMAP       DJFMAP
 -0.7892663    0.0391180    0.0001538   -0.8573419   -0.7503936
The final backward selection model is:

Step:  AIC=-229.32
SQRT_C3 ~ LAT + JJAMAP + DJFMAP

           Df Sum of Sq    RSS     AIC
<none>                  2.8279 -229.32
- DJFMAP    1   0.26190 3.0898 -224.85
- JJAMAP    1   0.31489 3.1428 -223.61
- LAT       1   2.82772 5.6556 -180.72

Call:
lm(formula = SQRT_C3 ~ LAT + JJAMAP + DJFMAP)

Coefficients:
(Intercept)          LAT       JJAMAP       DJFMAP
   -0.53148      0.03748     -1.02823     -1.05164
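The AIC values in the two `Step:` lines can be checked from the reported residual sums of squares. The sketch below assumes the form that R's `step()` reports for `lm` fits, n·ln(RSS/n) + 2k, where k counts the estimated coefficients including the intercept. It also assumes all 73 sites entered the fit. The function name `step_aic` is hypothetical.

```python
import math

def step_aic(rss, n_obs, n_coef):
    """AIC up to an additive constant: n * ln(RSS / n) + 2 * k."""
    return n_obs * math.log(rss / n_obs) + 2 * n_coef

N_SITES = 73  # number of sites given in the slides; assumed to be the fitted n

# Forward model: intercept + LAT + MAP + JJAMAP + DJFMAP -> 5 coefficients.
forward = step_aic(2.7759, N_SITES, 5)
# Backward model: intercept + LAT + JJAMAP + DJFMAP -> 4 coefficients.
backward = step_aic(2.8279, N_SITES, 4)

print(f"forward:  {forward:.2f}")
print(f"backward: {backward:.2f}")
```

Under these assumptions the two values agree with the reported -228.67 and -229.32. The backward model has a slightly larger RSS but one fewer coefficient, which is why its AIC is the lower of the two.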