Multiple Linear Regression. • Concept and uses. • Model and assumptions. • Intrinsically linear models. • Model development and validation. • Problem areas. • Non-normality. • Heterogeneous variance. • Correlated errors. • Influential points and outliers. • Model inadequacies. • Collinearity. • Errors in X variables. AGR206
Concept & Uses. Did you know? ANOVA REGRESSION • Description restricted to the data set: did biomass increase with pH in the sample? • Prediction of Y: how much biomass do we expect to find under certain soil conditions? • Extrapolation to new conditions: can we predict biomass in other estuaries? • Estimation and understanding: how much does biomass change per unit change in pH, controlling for other factors? • Control of a process (requires causality): can we create sites with a certain biomass by changing the pH?
Body fat example in JMP. • Three variables (X1, X2, X3) were measured to predict body fat % (Y) in people. • Random sample of people. • Y was measured by an expensive and very accurate method (assume it reveals the true % fat). • X1: thickness of triceps skinfold. • X2: thigh circumference. • X3: midarm circumference. • Bodyfat.jmp
H0’s or values “of interest” • Does thickness of triceps skinfold contribute significantly to predicting fat content? • What is the CI for fat content for a person whose X’s have been measured? • Do I have more or less fat than last summer? • Do I have more fat than recommended?
Model and Assumptions. • Linear, additive model to relate Y to p independent variables. • Note: here, p is the number of variables, but some authors use p for the number of parameters, which is one more than the number of variables because of the intercept. • $Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip} + \varepsilon_i$ • where the $\varepsilon_i$ are normal and independent random variables with common variance $\sigma^2$. • In matrix notation the model and solution are exactly the same as for SLR: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$. • All matrix equations from SLR apply without change.
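As a rough illustration of the normal-equations solution b = (X'X)^-1 X'Y, here is a minimal sketch in plain Python. The function names `solve_linear_system` and `ols_fit` and the data are made up for this example; they are not part of the course material, and real analyses would use JMP, SAS, or a numerical library.

```python
def solve_linear_system(a, rhs):
    """Solve a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | rhs], copied so the inputs are not modified.
    m = [list(map(float, a[i])) + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: X columns are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_fit(x_rows, y):
    """Return b = (X'X)^-1 X'Y, with an intercept column of ones prepended."""
    design = [[1.0] + list(row) for row in x_rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up, noise-free data generated from Y = 1 + 2*X1 + 3*X2.
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]
b = ols_fit(x_rows, y)
print([round(v, 6) for v in b])
```

Because the illustrative data contain no error term, the fit recovers the generating coefficients up to floating-point rounding.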
Linear models • Linear, and intrinsically linear models. • Linearity refers to the parameters. The model can involve any functions of the X’s as long as those functions do not contain parameters that have to be estimated. • A linear model does not always produce a hyperplane. • $Y_i = \beta_0 + \beta_1 f_1(X_{i1}) + \dots + \beta_p f_p(X_{ip}) + \varepsilon_i$ • Polynomial regression is a special case where the functions are powers of X.
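To make the distinction concrete, here are three textbook-style cases, given as illustrations rather than as models used in the course: a polynomial (linear in the parameters), a multiplicative model that becomes linear after taking logs (intrinsically linear), and a model that stays nonlinear because a parameter sits inside the function of X.

```latex
\begin{align*}
\text{Polynomial (linear):} \quad
  & Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i \\
\text{Intrinsically linear:} \quad
  & Y_i = \beta_0 X_i^{\beta_1} \varepsilon_i
    \;\Longrightarrow\;
    \ln Y_i = \ln \beta_0 + \beta_1 \ln X_i + \ln \varepsilon_i \\
\text{Not linear:} \quad
  & Y_i = \beta_0 + \beta_1 e^{\beta_2 X_i} + \varepsilon_i
\end{align*}
```

In the second case the usual assumptions then apply to $\ln \varepsilon_i$, not to $\varepsilon_i$ itself.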
Matrix Equations
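The body of this slide did not survive extraction. As a hedged stand-in, these are the standard least-squares matrix results, written with p as the number of X variables (so X is n by p+1, including the column of ones); the notation is an assumption and may differ from what the slide showed.

```latex
\begin{align*}
\mathbf{b} &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \\
\hat{\mathbf{Y}} &= \mathbf{X}\mathbf{b} = \mathbf{H}\mathbf{Y},
  \qquad \mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \\
\mathbf{e} &= \mathbf{Y} - \hat{\mathbf{Y}} = (\mathbf{I} - \mathbf{H})\mathbf{Y} \\
SSE &= \mathbf{e}'\mathbf{e},
  \qquad MSE = \frac{SSE}{n - p - 1} \\
\mathbf{s}^2\{\mathbf{b}\} &= MSE\,(\mathbf{X}'\mathbf{X})^{-1}
\end{align*}
```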
Extra Sum of Squares • Effects of order of entry on SS. • The 4 types of SS. • Partial correlation.
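The standard definitions behind these bullets can be written out for two or three predictors. This is a sketch in common textbook notation, which is an assumption about the notation used in the course.

```latex
\begin{align*}
SSR(X_2 \mid X_1) &= SSR(X_1, X_2) - SSR(X_1) = SSE(X_1) - SSE(X_1, X_2) \\
SSR(X_1, X_2, X_3) &= SSR(X_1) + SSR(X_2 \mid X_1) + SSR(X_3 \mid X_1, X_2) \\
r^2_{Y2 \cdot 1} &= \frac{SSR(X_2 \mid X_1)}{SSE(X_1)}
\end{align*}
```

The second line is a sequential decomposition: each term depends on which variables entered before it, which is why the order of entry matters unless the X's are orthogonal. The third line is the coefficient of partial determination, whose signed square root is the partial correlation between Y and X2 given X1.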
Response plane and error. [Figure: response plane over X1 and X2, with an observation Yi and its expected value E{Yi} marked on the Y axis.] The response surface in more than 3D is a hyperplane.
Model development • What variables to include. • Depends on the objective: • Description: no need to reduce the number of variables. • Prediction and estimation of $\hat{Y}$: OK to reduce the number of variables to make the model cheaper to use. • Estimation of $\beta$ and understanding: sensitive to deletions; deleting variables may bias MSE and the estimates of $\beta$. No real solution other than getting more data from a better experiment. (Sorry!)
Variable Selection • Effects of elimination of variables: • MSE is positively biased unless the true $\beta$ for the variables eliminated is 0. • $\hat{\beta}$ and $\hat{Y}$ are biased unless the previous condition holds or the variables eliminated are orthogonal to those retained. • Variance of estimated parameters and predictions is usually lower. • There are conditions under which the mean squared error of the estimates from the reduced model (variance plus bias$^2$) is smaller than for the full model.
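The bias statement can be made explicit with the standard omitted-variable result. The partition of X into retained columns $\mathbf{X}_1$ and eliminated columns $\mathbf{X}_2$, and the symbol $\mathbf{b}_1$ for the reduced-model estimates, are notation introduced here for illustration.

```latex
\begin{align*}
\text{True model:} \quad
  & \mathbf{Y} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon} \\
\text{Reduced fit:} \quad
  & \mathbf{b}_1 = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{Y} \\
\text{Expectation:} \quad
  & E\{\mathbf{b}_1\} = \boldsymbol{\beta}_1
    + (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2\,\boldsymbol{\beta}_2
\end{align*}
```

The bias term vanishes when $\boldsymbol{\beta}_2 = \mathbf{0}$ or when $\mathbf{X}_1'\mathbf{X}_2 = \mathbf{0}$, which are the two conditions listed above.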
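Criteria for variable selection • R2 - Coefficient of determination: $R^2 = SSReg/SSTotal$. • MSE or MSRes - Mean squared residuals: estimates $\sigma^2$ if all relevant X’s are in the model. • R2adj - Adjusted R2: $R^2_{adj} = 1 - MSRes/MSTotal = 1 - \frac{n-1}{n-p}\,\frac{SSRes}{SSTotal}$. • Mallows’ Cp: $C_p = SSRes_p/MSRes_{Full} + 2p - n$. • In the last two formulas p is the number of parameters (intercept included), and $SSRes_p$ is the residual SS of the candidate model with p parameters.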
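The criteria above are simple arithmetic on the ANOVA sums of squares. The sketch below applies them to the Spartina full-model numbers from the ANOVA output slide later in this deck (n = 45 from the total df of 44, p = 16 parameters from the model df of 15 plus the intercept). The function names are hypothetical helpers written for this example.

```python
def r_squared(ss_reg, ss_total):
    """R2 = SSReg / SSTotal."""
    return ss_reg / ss_total


def adjusted_r_squared(ss_res, ss_total, n, p):
    """Adjusted R2; p counts parameters, intercept included."""
    return 1.0 - ((n - 1) / (n - p)) * (ss_res / ss_total)


def mallows_cp(ss_res_subset, ms_res_full, n, p):
    """Mallows' Cp for a candidate model with p parameters."""
    return ss_res_subset / ms_res_full + 2 * p - n


# Values copied from the Spartina ANOVA output slide (full model).
n, p_full = 45, 16
ss_reg, ss_res, ss_total = 16369583.2, 2801379.9, 19170963.2
ms_res_full = ss_res / (n - p_full)

r2 = r_squared(ss_reg, ss_total)
r2_adj = adjusted_r_squared(ss_res, ss_total, n, p_full)
cp_full = mallows_cp(ss_res, ms_res_full, n, p_full)
print(round(r2, 4), round(r2_adj, 4), round(cp_full, 4))
```

The first two values should agree with the R-square and Adj R-sq lines of the output. For the full model Cp equals p by construction, so Cp is only informative for reduced models, where values near p suggest little bias.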
Example
Checking assumptions. • Note that although we have many X’s, errors are still in a single dimension. • Residual analysis is performed as for SLR, sometimes repeated over different X’s. • Normality. Use PROC UNIVARIATE with the NORMAL option. Transform. • Homogeneity of variance. Plot residuals vs. each X. Transform. Weighted least squares. • Independence of errors. • Adequacy of model. Plot residuals. Lack-of-fit (LOF) test. • Influence and outliers. Use the INFLUENCE option in PROC REG. • Collinearity. Use the COLLINOINT option of PROC REG.
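Code for PROC REG
data s00.spart2;
  set s00.spartina;
  /* colin: a near-linear combination of ph, acid and sal plus standard normal noise */
  colin = 2*ph + 0.5*acid + sal + rannor(23);
run;
proc reg data=s00.spart2;
  /* full model with residual, influence and collinearity diagnostics */
  model bmss = colin h2s sal eh7 ph acid p k ca mg na mn zn cu nh4
        / r influence vif collinoint stb partial;
run;
  /* regress colin on the variables it was built from */
  model colin = ph sal acid;
run;
quit;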
Spartina ANOVA output
Model: MODEL1
Dependent Variable: BMSS

Analysis of Variance
Source     DF   Sum of Squares   Mean Square    F Value   Prob>F
Model      15   16369583.2       1091305.552    11.297    0.0001
Error      29    2801379.9         96599.307
C Total    44   19170963.2

Root MSE    310.80429    R-square   0.8539
Dep Mean   1000.80000    Adj R-sq   0.7783
C.V.         31.05558
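The remaining summary lines of this output follow directly from the sums of squares and degrees of freedom. A small sketch that recomputes them, using only numbers copied from the table above (variable names are illustrative):

```python
import math

# Values copied from the ANOVA table above.
df_model, df_error = 15, 29
ss_model, ss_error = 16369583.2, 2801379.9
dep_mean = 1000.8

ms_model = ss_model / df_model          # mean square for the model
ms_error = ss_error / df_error          # MSE, the estimate of the error variance
f_value = ms_model / ms_error           # overall F test of the regression
root_mse = math.sqrt(ms_error)          # residual standard deviation
cv_percent = 100 * root_mse / dep_mean  # C.V. = Root MSE as % of the mean of Y

print(round(f_value, 3), round(root_mse, 3), round(cv_percent, 3))
```

Small differences in the last digits are expected because the table entries are themselves rounded.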
Parameters and VIF
Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob > |T|   Standardized Estimate   Variance Inflation
INTERCEP   1    3809.233562          3038.081          1.254                  0.2199        0.00000000              0.00000000
COLIN      1    -178.317065            58.718         -3.037                  0.0050       -1.06227792             24.28364757
H2S        1       0.336242             2.656          0.127                  0.9001        0.01563626              3.02785626
SAL        1     150.513276            61.960          2.429                  0.0216        0.84818417             24.19556405
EH7        1       2.288694             1.785          1.282                  0.2099        0.12813770              1.98216733
PH         1     486.417077           306.756          1.586                  0.1237        0.91891994             66.64921013
ACID       1     -24.816449           109.856         -0.226                  0.8229       -0.09422943             34.53131689
P          1       0.153015             2.417          0.063                  0.9500        0.00639498              2.02507775
K          1      -0.733250             0.439         -1.668                  0.1061       -0.33059243              7.79660017
CA         1      -0.137163             0.111         -1.230                  0.2286       -0.35706572             16.72702792
MG         1      -0.318586             0.243         -1.308                  0.2010       -0.45340287             23.82835726
NA         1      -0.005294             0.022         -0.239                  0.8127       -0.05520175             10.57323219
MN         1      -4.279887             4.836         -0.885                  0.3835       -0.15872971              6.38589662
ZN         1     -26.270852            19.452         -1.351                  0.1873       -0.32953283             11.81574077
CU         1     346.606818            99.295          3.491                  0.0016        0.54452366              4.82931410
NH4        1       0.539373             3.061          0.176                  0.8614        0.03862822              9.53842459
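One way to read the Variance Inflation column is through the identity VIF_k = 1 / (1 - R_k^2), where R_k^2 comes from regressing X_k on the other predictors. The sketch below inverts that identity for the values in the table. The cutoff of 10 is a common rule of thumb, used here as an assumption and not as something stated in the deck.

```python
# Variance inflation factors copied from the table above (intercept omitted).
vif = {
    "COLIN": 24.28364757, "H2S": 3.02785626, "SAL": 24.19556405,
    "EH7": 1.98216733, "PH": 66.64921013, "ACID": 34.53131689,
    "P": 2.02507775, "K": 7.79660017, "CA": 16.72702792,
    "MG": 23.82835726, "NA": 10.57323219, "MN": 6.38589662,
    "ZN": 11.81574077, "CU": 4.82931410, "NH4": 9.53842459,
}

# R_k^2 implied by each VIF: share of X_k explained by the other predictors.
implied_r2 = {name: 1.0 - 1.0 / value for name, value in vif.items()}

# Rule-of-thumb cutoff (an assumption): VIF above 10 signals strong collinearity.
VIF_CUTOFF = 10.0
flagged = sorted(
    (name for name, value in vif.items() if value > VIF_CUTOFF),
    key=vif.get,
    reverse=True,
)

for name in flagged:
    print(f"{name:6s} VIF={vif[name]:8.2f}  implied R2={implied_r2[name]:.3f}")
```

Under that cutoff PH has the largest VIF, and the artificially constructed COLIN is flagged together with SAL and ACID, two of the variables it was built from.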