Experimental Statistics - week 12
Chapter 11: Linear Regression and Correlation
Chapter 12: Multiple Regression
April 5 -- Lab
Analysis of Variance Approach
Mathematical Fact (p. 649):
  SS(Total) = SS(Regression) + SS(Residuals)
where
  SS(Total) = Syy
  SS(Regression) = SS “explained” by the model
  SS(Residuals) = SS “unexplained” by the model
$R^2 = \text{SS(Regression)} / \text{SS(Total)}$ measures the proportion of the variability in Y that is explained by the regression on X.
Same Y, two different X variables (n = 8):
   Y    X (first)    X (second)
  10        7           12
  15        8            8
  12        8            8
  20       12            7
   8        4           12
  17       12            4
  14       11           15
  24       15           11
The GLM Procedure (first X)
Dependent Variable: y
                            Sum of
Source              DF     Squares
Model                1     170.492    = SS(Reg)
Error                6      23.508    = SS(Res)
Corrected Total      7     194.000    = SS(Total)

The GLM Procedure (second X)
Dependent Variable: y
                            Sum of
Source              DF     Squares
Model                1      19.575
Error                6     174.425
Corrected Total      7     194.000
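The two sums-of-squares tables above can be checked by hand. What follows is a minimal Python sketch, not part of the original SAS session. It assumes the data columns are Y, the first X, and the second X, each with 8 values; the function name `ss_decomposition` is made up for illustration.

```python
def ss_decomposition(x, y):
    """Return (SS_total, SS_regression, SS_residual, R2) for the least-squares line of y on x."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)            # SS(Total)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_reg = s_xy ** 2 / s_xx                            # SS(Regression)
    ss_res = s_yy - ss_reg                               # SS(Residuals)
    return s_yy, ss_reg, ss_res, ss_reg / s_yy


y = [10, 15, 12, 20, 8, 17, 14, 24]
x_first = [7, 8, 8, 12, 4, 12, 11, 15]
x_second = [12, 8, 8, 7, 12, 4, 15, 11]

for label, x in (("first X", x_first), ("second X", x_second)):
    total, reg, res, r2 = ss_decomposition(x, y)
    print(f"{label}: SS(Total)={total:.3f}  SS(Reg)={reg:.3f}  "
          f"SS(Res)={res:.3f}  R^2={r2:.4f}")
```

SS(Total) is the same for both fits because it depends only on Y. Only the split between "explained" and "unexplained" changes, which is why the first X gives a much larger R-square than the second.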
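RECALL
Theoretical Model: $y = \beta_0 + \beta_1 x + \varepsilon$
Regression line: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
Residuals: $e_i = y_i - \hat{y}_i$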
Residual Analysis
Examination of residuals to help determine if:
- assumptions are met
- regression model is appropriate
Residual Plot:
- plot of x vs residuals
Study Time Data
  PROC GLM;
    MODEL score = time;
    OUTPUT OUT=new R=resid;     /* save the residuals in data set new */
  RUN;
  PROC GPLOT;
    TITLE 'Plot of Residuals';
    PLOT resid*time;            /* residuals (vertical) vs time (horizontal) */
  RUN;
Residual Analysis
Examination of residuals to help determine if:
- assumptions are met
- regression model is appropriate
Residual Plot:
- plot of x vs residuals
Normality of Residuals:
- probability plot
- histogram
Data – Page 572
Weight loss in a chemical compound as a function of how long it is exposed to air
Y = weight loss (wtloss)
X = exposure time (exptime)
    Y     X
   4.3    4
   5.5    5
   6.8    6
   8.0    7
   4.0    4
   5.2    5
   6.6    6
   7.5    7
   2.0    4
   4.0    5
   5.7    6
   6.5    7
  PROC REG;
    MODEL wtloss = exptime / R CLI CLM;   /* residuals, prediction limits, mean limits */
    OUTPUT OUT=new R=resid;               /* save the residuals in data set new */
  RUN;

The REG Procedure
Dependent Variable: wtloss

Analysis of Variance
                            Sum of        Mean
Source              DF     Squares      Square    F Value    Pr > F
Model                1    26.00417    26.00417      40.22    <.0001
Error               10     6.46500     0.64650
Corrected Total     11    32.46917

Root MSE           0.80405    R-Square    0.8009
Dependent Mean     5.50833    Adj R-Sq    0.7810
Coeff Var         14.59701

Parameter Estimates
                    Parameter    Standard
Variable     DF      Estimate       Error    t Value    Pr > |t|
Intercept     1      -1.73333     1.16518      -1.49      0.1677
exptime       1       1.31667     0.20761       6.34      <.0001
Plot of Residuals - MLR Model

The REG Procedure
Dependent Variable: wtloss

Output Statistics
       Dependent   Predicted   Std Error
Obs     Variable       Value   Mean Predict      95% CL Mean        95% CL Predict     Residual
  1       4.3000      3.5333      0.3884      2.6679   4.3987      1.5437   5.5229       0.7667
  2       5.5000      4.8500      0.2543      4.2835   5.4165      2.9710   6.7290       0.6500
  3       6.8000      6.1667      0.2543      5.6001   6.7332      4.2877   8.0456       0.6333
  4       8.0000      7.4833      0.3884      6.6179   8.3487      5.4937   9.4729       0.5167
  5       4.0000      3.5333      0.3884      2.6679   4.3987      1.5437   5.5229       0.4667
  6       5.2000      4.8500      0.2543      4.2835   5.4165      2.9710   6.7290       0.3500
  7       6.6000      6.1667      0.2543      5.6001   6.7332      4.2877   8.0456       0.4333
  8       7.5000      7.4833      0.3884      6.6179   8.3487      5.4937   9.4729       0.0167
  9       2.0000      3.5333      0.3884      2.6679   4.3987      1.5437   5.5229      -1.5333
 10       4.0000      4.8500      0.2543      4.2835   5.4165      2.9710   6.7290      -0.8500
 11       5.7000      6.1667      0.2543      5.6001   6.7332      4.2877   8.0456      -0.4667
 12       6.5000      7.4833      0.3884      6.6179   8.3487      5.4937   9.4729      -0.9833
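The columns of the Output Statistics listing can be rebuilt from the 12 data points. This is a rough Python sketch rather than a replacement for PROC REG. It uses the usual simple-regression formulas for the standard error of the mean prediction and the two interval types, and it hard-codes `T_CRIT = 2.228` as an assumed table value for t(.025) with 10 degrees of freedom.

```python
import math

exptime = [4, 5, 6, 7] * 3
wtloss = [4.3, 5.5, 6.8, 8.0, 4.0, 5.2, 6.6, 7.5, 2.0, 4.0, 5.7, 6.5]

n = len(wtloss)
x_bar = sum(exptime) / n
y_bar = sum(wtloss) / n
s_xx = sum((x - x_bar) ** 2 for x in exptime)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(exptime, wtloss))

b1 = s_xy / s_xx                 # slope estimate
b0 = y_bar - b1 * x_bar          # intercept estimate

predicted = [b0 + b1 * x for x in exptime]
residuals = [y - p for y, p in zip(wtloss, predicted)]
root_mse = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

T_CRIT = 2.228  # assumed table value: t(.025) with n - 2 = 10 df

rows = []
for x, p, e in zip(exptime, predicted, residuals):
    se_mean = root_mse * math.sqrt(1 / n + (x - x_bar) ** 2 / s_xx)
    se_pred = root_mse * math.sqrt(1 + 1 / n + (x - x_bar) ** 2 / s_xx)
    rows.append((p, se_mean,
                 p - T_CRIT * se_mean, p + T_CRIT * se_mean,   # 95% CL Mean
                 p - T_CRIT * se_pred, p + T_CRIT * se_pred,   # 95% CL Predict
                 e))

for obs, row in enumerate(rows, start=1):
    print(obs, " ".join(f"{value:8.4f}" for value in row))
```

The residuals sum to zero (up to rounding), as they must for a least-squares line with an intercept. The three largest negative residuals all come from observations 9 to 12, which is the kind of pattern a residual plot is meant to reveal.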
The REG Procedure
Dependent Variable: wtloss

Analysis of Variance (as above): F Value 40.22, Pr > F <.0001     (???)

Parameter Estimates
                    Parameter    Standard
Variable     DF      Estimate       Error    t Value    Pr > |t|
Intercept     1      -1.73333     1.16518      -1.49      0.1677     (for testing H0: $\beta_0 = 0$)
exptime       1       1.31667     0.20761       6.34      <.0001     (for testing H0: $\beta_1 = 0$)
Analysis of Variance
                            Sum of        Mean
Source              DF     Squares      Square    F Value    Pr > F
Model                1    26.00417    26.00417      40.22    <.0001
Error               10     6.46500     0.64650
Corrected Total     11    32.46917

Recall:
  SS(Regression) = “Model SS”
  SS(Residual) = “Error SS”
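H0: there is no linear relationship between X and Y
H1: there is a linear relationship between X and Y
Reject H0 if $F > F_\alpha(1, n - 2)$, where
$F = \dfrac{\text{MS(Regression)}}{\text{MS(Residual)}} = \dfrac{\text{SS(Regression)}/1}{\text{SS(Residual)}/(n - 2)}$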
H0: there is no linear relationship between weight loss and exposure time
H1: there is a linear relationship between weight loss and exposure time
Note: In simple linear regression, the hypotheses
H0: there is no linear relationship between X and Y
H1: there is a linear relationship between X and Y
and
H0: $\beta_1 = 0$
H1: $\beta_1 \neq 0$
are equivalent, and $F = t^2$.
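The relationship between the F test and the t test for the slope can be checked numerically on the weight loss data. The sketch below is Python rather than SAS and is only an illustration. The constant `F_CRIT = 4.96` is an assumed table value for F(.05) with 1 and 10 degrees of freedom.

```python
import math

exptime = [4, 5, 6, 7] * 3
wtloss = [4.3, 5.5, 6.8, 8.0, 4.0, 5.2, 6.6, 7.5, 2.0, 4.0, 5.7, 6.5]

n = len(wtloss)
x_bar = sum(exptime) / n
y_bar = sum(wtloss) / n
s_xx = sum((x - x_bar) ** 2 for x in exptime)
s_yy = sum((y - y_bar) ** 2 for y in wtloss)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(exptime, wtloss))

b1 = s_xy / s_xx                       # slope estimate
ss_regression = s_xy ** 2 / s_xx       # "Model SS"
ss_residual = s_yy - ss_regression     # "Error SS"
mse = ss_residual / (n - 2)

f_stat = (ss_regression / 1) / mse     # ANOVA F for H0: no linear relationship
t_stat = b1 / math.sqrt(mse / s_xx)    # t for H0: beta_1 = 0

F_CRIT = 4.96  # assumed table value: F(.05) with (1, n - 2) = (1, 10) df

print(f"F = {f_stat:.2f}, t = {t_stat:.2f}, t^2 = {t_stat ** 2:.2f}")
print("reject H0" if f_stat > F_CRIT else "do not reject H0")
```

Both statistics are built from the same three quantities (Sxy, Sxx, and MSE), so squaring t reproduces F exactly, and the two tests always reach the same decision in simple linear regression.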
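Multiple Regression
Use of more than one independent variable to predict Y:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$
Assumptions: the errors $\varepsilon$ are independent and normally distributed with mean 0 and constant variance $\sigma^2$.

Data
$x_{ij}$ = ith observation on the jth independent variable, and so we have
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i, \quad i = 1, \ldots, n$

Goal: Find the “best” prediction equation of the form
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k$
As before: “best” means least squares, i.e. the estimates minimize $\text{SS(Residuals)} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.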
Again: the solution involves calculus -- solving the Normal Equations on page 627
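The normal equations themselves are not reproduced on the slide. A common way to write them, offered here as a sketch in matrix notation (the text's page 627 may instead list them as k + 1 separate equations), is to set each partial derivative of the residual sum of squares to zero:

```latex
\[
\frac{\partial}{\partial \hat{\beta}_j}
\sum_{i=1}^{n} \bigl( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \cdots - \hat{\beta}_k x_{ik} \bigr)^2 = 0,
\qquad j = 0, 1, \ldots, k
\]
\[
\Longrightarrow \quad X^{\mathsf{T}} X \, \hat{\boldsymbol{\beta}} = X^{\mathsf{T}} \mathbf{y}
\]
```

Here $X$ is assumed to be the $n \times (k+1)$ matrix whose first column is all ones and whose remaining columns hold the $x_{ij}$, and $\mathbf{y}$ is the vector of responses. With $k = 1$ this reduces to the two equations that give the simple linear regression slope and intercept.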
Analysis of Variance
                          Sum of     Mean
Source          DF        Squares    Square                     F Value
Model           k         SS(Reg.)   MS(Reg.) = SS(Reg.)/k      MS(Reg.)/MSE
Error           n-k-1     SSE        MSE = SSE/(n-k-1)
Corr. Total     n-1       SS(Total)
Multiple Regression Setting
H0: there is no linear relationship between Y and the independent variables
H1: there is a linear relationship between Y and the independent variables
Reject H0 if $F > F_\alpha(k, n - k - 1)$, where
$F = \dfrac{\text{MS(Reg.)}}{\text{MSE}} = \dfrac{\text{SS(Reg.)}/k}{\text{SSE}/(n - k - 1)}$
$R^2$ in the MLR setting has the same interpretation as before:
- it measures the proportion of the variability in Y that is explained by the regression
Data – Page 628
Weight loss in a chemical compound as a function of exposure time and humidity
Y = weight loss (wtloss)
X1 = exposure time (exptime)
X2 = relative humidity (humidity)
    Y     X1    X2
   4.3    4     .2
   5.5    5     .2
   6.8    6     .2
   8.0    7     .2
   4.0    4     .3
   5.2    5     .3
   6.6    6     .3
   7.5    7     .3
   2.0    4     .4
   4.0    5     .4
   5.7    6     .4
   6.5    7     .4
Chemical Weight Loss – MLR Output

The REG Procedure
Dependent Variable: wtloss

Number of Observations Read    12
Number of Observations Used    12

Analysis of Variance
                            Sum of        Mean
Source              DF     Squares      Square    F Value    Pr > F
Model                2    31.12417    15.56208     104.13    <.0001
Error                9     1.34500     0.14944
Corrected Total     11    32.46917

Root MSE           0.38658    R-Square    0.9586
Dependent Mean     5.50833    Adj R-Sq    0.9494
Coeff Var          7.01810

Parameter Estimates
                    Parameter    Standard
Variable     DF      Estimate       Error    t Value    Pr > |t|
Intercept     1       0.66667     0.69423       0.96      0.3620
exptime       1       1.31667     0.09981      13.19      <.0001
humidity      1      -8.00000     1.36677      -5.85      0.0002
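The parameter estimates in this output can be reproduced by setting up and solving the least-squares normal equations directly. The sketch below uses plain Python with a small Gauss-Jordan solver. The helper name `solve` is made up for illustration, and this is not how PROC REG computes its results internally.

```python
def solve(a, b):
    """Solve the square linear system a * beta = b by Gauss-Jordan elimination."""
    m = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]      # augmented matrix
    for col in range(m):
        # partial pivoting: move the largest remaining entry into the pivot row
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


exptime = [4, 5, 6, 7] * 3
humidity = [0.2] * 4 + [0.3] * 4 + [0.4] * 4
wtloss = [4.3, 5.5, 6.8, 8.0, 4.0, 5.2, 6.6, 7.5, 2.0, 4.0, 5.7, 6.5]

# Design matrix: a column of ones, then exptime, then humidity
X = [[1.0, t, h] for t, h in zip(exptime, humidity)]
p = len(X[0])

xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * y for row, y in zip(X, wtloss)) for i in range(p)]
beta = solve(xtx, xty)                                    # [intercept, exptime, humidity]

fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
sse = sum((y - f) ** 2 for y, f in zip(wtloss, fitted))
y_bar = sum(wtloss) / len(wtloss)
ss_total = sum((y - y_bar) ** 2 for y in wtloss)
r_square = 1 - sse / ss_total

print("estimates:", [round(b, 5) for b in beta])
print(f"SSE = {sse:.5f}, R-Square = {r_square:.4f}")
```

The exptime coefficient is the same as in the one-variable fit (1.31667) because every exposure time appears once at each humidity level, so the two predictors are uncorrelated in this data set. That will not hold for data where the predictors are correlated.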
H0: there is no linear relationship between weight loss and the variables exposure time and humidity
H1: there is a linear relationship between weight loss and the variables exposure time and humidity
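The overall F test for these hypotheses follows from the sums of squares in the MLR output. The Python sketch below copies the Model and Error sums of squares from that output. The function name `anova_summary` is invented for illustration, and `F_CRIT = 4.26` is an assumed table value for F(.05) with 2 and 9 degrees of freedom.

```python
def anova_summary(ss_reg, ss_err, n, k):
    """Fill in the regression ANOVA table for n observations and k predictors."""
    ss_total = ss_reg + ss_err
    ms_reg = ss_reg / k                    # MS(Reg.) = SS(Reg.)/k
    mse = ss_err / (n - k - 1)             # MSE = SSE/(n-k-1)
    return {
        "MS(Reg.)": ms_reg,
        "MSE": mse,
        "F": ms_reg / mse,
        "R-Square": ss_reg / ss_total,
        "Adj R-Sq": 1 - mse / (ss_total / (n - 1)),
    }


# Model and Error sums of squares copied from the MLR output
summary = anova_summary(ss_reg=31.12417, ss_err=1.34500, n=12, k=2)

F_CRIT = 4.26  # assumed table value: F(.05) with (k, n-k-1) = (2, 9) df

for name, value in summary.items():
    print(f"{name:9s} {value:10.4f}")
print("reject H0" if summary["F"] > F_CRIT else "do not reject H0")
```

With F far above the critical value, H0 is rejected: weight loss is linearly related to at least one of exposure time and humidity. The F test alone does not say which one; that is the job of the individual t-tests.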
Examining Contributions of Individual X Variables
Use the t-test for the X variable in question.
- this tests the effect of that particular independent variable while all other independent variables stay constant.

Parameter Estimates
                    Parameter    Standard
Variable     DF      Estimate       Error    t Value    Pr > |t|
Intercept     1       0.66667     0.69423       0.96      0.3620
exptime       1       1.31667     0.09981      13.19      <.0001
humidity      1      -8.00000     1.36677      -5.85      0.0002
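Each t value in the table is the estimate divided by its standard error. The short Python sketch below recomputes the t values and adds 95% confidence intervals from the printed estimates and standard errors. `T_CRIT = 2.262` is an assumed table value for t(.025) with 9 degrees of freedom, and the intervals are an addition for illustration, not part of the SAS output shown.

```python
# (estimate, standard error) copied from the Parameter Estimates table
estimates = {
    "Intercept": (0.66667, 0.69423),
    "exptime": (1.31667, 0.09981),
    "humidity": (-8.00000, 1.36677),
}

T_CRIT = 2.262  # assumed table value: t(.025) with n - k - 1 = 9 df

t_values = {}
for name, (est, se) in estimates.items():
    t = est / se                                   # test statistic for H0: coefficient = 0
    t_values[name] = t
    lower, upper = est - T_CRIT * se, est + T_CRIT * se
    verdict = "reject H0" if abs(t) > T_CRIT else "do not reject H0"
    print(f"{name:9s} t = {t:6.2f}   95% CI = ({lower:8.4f}, {upper:8.4f})   {verdict}")
```

Both exptime and humidity contribute significantly once the other variable is in the model, while the intercept is not significantly different from zero. The verdicts agree with the Pr > |t| column at the .05 level.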