
Experimental Statistics - week 12


Presentation Transcript


  1. Experimental Statistics - week 12 Chapter 11: Linear Regression and Correlation Chapter 12: Multiple Regression

  2. April 5 -- Lab

  3. Analysis of Variance Approach
Mathematical Fact:  SS(Total) = SS(Regression) + SS(Residuals)
   SS(Total) = Syy
   SS(Regression) = the SS “explained” by the model
   SS(Residuals) = the SS “unexplained” by the model
(p. 649)
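In standard notation (a sketch of the usual definitions; here ŷ_i denotes the fitted value for observation i and ȳ the sample mean of Y), these sums of squares are

$$\mathrm{SS(Total)} = S_{yy} = \sum_i (y_i - \bar{y})^2, \qquad
\mathrm{SS(Regression)} = \sum_i (\hat{y}_i - \bar{y})^2, \qquad
\mathrm{SS(Residuals)} = \sum_i (y_i - \hat{y}_i)^2.$$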

  4. Plot of Production vs Cost

  5. SS(???)

  6. SS(???)

  7. SS(???)

  8. R² = SS(Regression)/SS(Total) measures the proportion of the variability in Y that is explained by the regression on X

  9.  Y    X    X
     10    7   12
     15    8    8
     12    8    8
     20   12    7
      8    4   12
     17   12    4
     14   11   15
     24   15   11

  10. The GLM Procedure
Dependent Variable: y
                            Sum of
Source             DF      Squares
Model               1      170.492   = SS(reg)
Error               6       23.508   = SS(Res)
Corrected Total     7      194.000   = SS(Total)

The GLM Procedure
Dependent Variable: y
                            Sum of
Source             DF      Squares
Model               1       19.575
Error               6      174.425
Corrected Total     7      194.000
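A quick arithmetic check using the two outputs above: for the first predictor, R² = 170.492/194.000 ≈ 0.879, so about 88% of the variability in Y is explained by that regression; for the second predictor, R² = 19.575/194.000 ≈ 0.101, only about 10%.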

  11. RECALL
Theoretical model:  y = β0 + β1x + ε
Regression line (fitted from the data):  ŷ = β̂0 + β̂1x
Residuals:  e_i = y_i - ŷ_i

  12. Residual Analysis
Examination of residuals to help determine if:
- assumptions are met
- the regression model is appropriate
Residual Plot: plot of x vs. residuals

  13. Study Time Data
PROC GLM;
  MODEL score=time;
  OUTPUT out=new r=resid;
RUN;
PROC GPLOT;
  TITLE 'Plot of Residuals';
  PLOT resid*time;
RUN;
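For reference, a minimal sketch of the same residual plot done with PROC SGPLOT instead of PROC GPLOT (this substitution is not part of the slide; it assumes the data set new created by the OUTPUT statement above, containing the variables time and resid):

PROC SGPLOT DATA=new;
  SCATTER X=time Y=resid;   /* residuals on the vertical axis */
  REFLINE 0 / AXIS=y;       /* reference line at residual = 0 */
RUN;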

  14. Average Height of Girls by Age

  15. Average Height of Girls by Age

  16. Residual Plot

  17. Residual Analysis
Examination of residuals to help determine if:
- assumptions are met
- the regression model is appropriate
Residual Plot:
- plot of x vs. residuals
Normality of Residuals:
- probability plot
- histogram

  18. Residuals from Car Dataset fit using √hp

  19. Residuals from Car Dataset fit using log(hp)

  20. Data – Page 572
Weight loss in a chemical compound as a function of how long it is exposed to air
Y = weight loss (wtloss)
X = exposure time (exptime)

  Y     X
 4.3    4
 5.5    5
 6.8    6
 8.0    7
 4.0    4
 5.2    5
 6.6    6
 7.5    7
 2.0    4
 4.0    5
 5.7    6
 6.5    7
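A minimal sketch of how these data could be entered in SAS ahead of the analysis on the next slide (the data set name wtloss_slr is an assumption; the variable names match the output that follows):

DATA wtloss_slr;          /* hypothetical data set name */
  INPUT wtloss exptime;   /* weight loss, exposure time */
  DATALINES;
4.3 4
5.5 5
6.8 6
8.0 7
4.0 4
5.2 5
6.6 6
7.5 7
2.0 4
4.0 5
5.7 6
6.5 7
;
RUN;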

  21. PROC REG;
  MODEL wtloss=exptime / r cli clm;
  OUTPUT out=new r=resid;
RUN;

The REG Procedure
Dependent Variable: wtloss

Analysis of Variance
                               Sum of         Mean
Source             DF         Squares       Square    F Value    Pr > F
Model               1        26.00417     26.00417      40.22    <.0001
Error              10         6.46500      0.64650
Corrected Total    11        32.46917

Root MSE           0.80405    R-Square    0.8009
Dependent Mean     5.50833    Adj R-Sq    0.7810
Coeff Var         14.59701

Parameter Estimates
                       Parameter      Standard
Variable     DF         Estimate         Error    t Value    Pr > |t|
Intercept     1         -1.73333       1.16518      -1.49      0.1677
exptime       1          1.31667       0.20761       6.34      <.0001
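A quick arithmetic check of how these summary numbers fit together (nothing new, just the standard relationships): F = MS(Model)/MS(Error) = 26.00417/0.64650 ≈ 40.22; R-Square = SS(Model)/SS(Total) = 26.00417/32.46917 ≈ 0.8009; Root MSE = √0.64650 ≈ 0.80405; and for exptime, t = estimate/standard error = 1.31667/0.20761 ≈ 6.34.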

  22. Plot of Residuals - MLR Model
The REG Procedure
Dependent Variable: wtloss

Output Statistics
       Dependent   Predicted   Std Error
Obs     Variable       Value   Mean Predict      95% CL Mean          95% CL Predict      Residual
  1       4.3000      3.5333       0.3884     2.6679   4.3987       1.5437   5.5229        0.7667
  2       5.5000      4.8500       0.2543     4.2835   5.4165       2.9710   6.7290        0.6500
  3       6.8000      6.1667       0.2543     5.6001   6.7332       4.2877   8.0456        0.6333
  4       8.0000      7.4833       0.3884     6.6179   8.3487       5.4937   9.4729        0.5167
  5       4.0000      3.5333       0.3884     2.6679   4.3987       1.5437   5.5229        0.4667
  6       5.2000      4.8500       0.2543     4.2835   5.4165       2.9710   6.7290        0.3500
  7       6.6000      6.1667       0.2543     5.6001   6.7332       4.2877   8.0456        0.4333
  8       7.5000      7.4833       0.3884     6.6179   8.3487       5.4937   9.4729        0.0167
  9       2.0000      3.5333       0.3884     2.6679   4.3987       1.5437   5.5229       -1.5333
 10       4.0000      4.8500       0.2543     4.2835   5.4165       2.9710   6.7290       -0.8500
 11       5.7000      6.1667       0.2543     5.6001   6.7332       4.2877   8.0456       -0.4667
 12       6.5000      7.4833       0.3884     6.6179   8.3487       5.4937   9.4729       -0.9833

  23. The REG Procedure
Dependent Variable: wtloss

Analysis of Variance
                               Sum of         Mean
Source             DF         Squares       Square    F Value    Pr > F
Model               1        26.00417     26.00417      40.22    <.0001
Error              10         6.46500      0.64650
Corrected Total    11        32.46917

Root MSE           0.80405    R-Square    0.8009
Dependent Mean     5.50833    Adj R-Sq    0.7810
Coeff Var         14.59701

Parameter Estimates
                       Parameter      Standard
Variable     DF         Estimate         Error    t Value    Pr > |t|
Intercept     1         -1.73333       1.16518      -1.49      0.1677
exptime       1          1.31667       0.20761       6.34      <.0001

???
For testing H0: β0 = 0   (the Intercept row t-test)
For testing H0: β1 = 0   (the exptime row t-test)

  24. Analysis of Variance
                               Sum of         Mean
Source             DF         Squares       Square    F Value    Pr > F
Model               1        26.00417     26.00417      40.22    <.0001
Error              10         6.46500      0.64650
Corrected Total    11        32.46917

Recall:  SS(Regression) = “Model SS”
         SS(Residual)   = “Error SS”

  25. H0: there is no linear relationship between X and Y
H1: there is a linear relationship between X and Y
Reject H0 if F > Fα(1, n - 2), where F = MS(Regression)/MS(Residual).

  26. H0: there is no linear relationship between weight loss and exposure time
H1: there is a linear relationship between weight loss and exposure time
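Carrying this test out with the output on slide 21 (assuming α = 0.05; the critical value is approximate): F = 40.22 > F0.05(1, 10) ≈ 4.96, so H0 is rejected and we conclude there is a linear relationship between weight loss and exposure time, in agreement with the reported p-value of <.0001.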

  27. Note: In simple linear regression, the pair of hypotheses
H0: there is no linear relationship between X and Y
H1: there is a linear relationship between X and Y
and the pair
H0: β1 = 0
H1: β1 ≠ 0
are equivalent, and F = t².
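Checking this with the weight-loss output on slide 21: t = 6.34 for exptime, and t² = 6.34² ≈ 40.2, which matches the F value of 40.22 up to rounding.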

  28. Multiple Regression
Use of more than one independent variable to predict Y:
y = β0 + β1x1 + β2x2 + ... + βkxk + ε
Assumptions: the errors ε are independent and normally distributed with mean 0 and constant variance σ².

  29. Data: (y_i, x_i1, x_i2, ..., x_ik) for i = 1, ..., n, where x_ij is the value of the jth independent variable on the ith observation, and so we have
y_i = β0 + β1x_i1 + β2x_i2 + ... + βkx_ik + ε_i

  30. Goal: Find the “best” prediction equation of the form
ŷ = β̂0 + β̂1x1 + β̂2x2 + ... + β̂kxk
As before, “best” means least squares: choose the estimates to minimize SS(Residuals) = Σ(y_i - ŷ_i)².

  31. Again: the solution involves calculus -- solving the Normal Equations on page 627
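For reference, a compact way of writing that least-squares solution (standard matrix notation; solving this system is equivalent to solving the normal equations referred to above):

$$X'X\,\hat{\boldsymbol\beta} = X'\mathbf{y} \quad\Longrightarrow\quad \hat{\boldsymbol\beta} = (X'X)^{-1}X'\mathbf{y},$$

where X is the n × (k+1) matrix whose first column is all 1s and whose remaining columns contain the values of the k independent variables.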

  32. Analysis of Variance
Source         DF           Sum of Squares    Mean Square                    F Value
Model          k            SS(Reg.)          MS(Reg.) = SS(Reg.)/k          MS(Reg.)/MSE
Error          n - k - 1    SSE               MSE = SSE/(n - k - 1)
Corr. Total    n - 1        SS(Total)

  33. Multiple Regression Setting
H0: there is no linear relationship between Y and the independent variables
H1: there is a linear relationship between Y and the independent variables
Reject H0 if F > Fα(k, n - k - 1), where F = MS(Regression)/MSE.

  34. R² in the MLR setting has the same interpretation as before: it measures the proportion of the variability in Y that is explained by the regression.

  35. Data – Page 628
Weight loss in a chemical compound as a function of exposure time and humidity
Y  = weight loss (wtloss)
X1 = exposure time (exptime)
X2 = relative humidity (humidity)

  Y     X1    X2
 4.3     4    .2
 5.5     5    .2
 6.8     6    .2
 8.0     7    .2
 4.0     4    .3
 5.2     5    .3
 6.6     6    .3
 7.5     7    .3
 2.0     4    .4
 4.0     5    .4
 5.7     6    .4
 6.5     7    .4
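Following the same pattern as the sketch after slide 20, SAS code along these lines could produce the output on the next slide (the data set name wtloss_mlr is an assumption; the variable names match the output):

DATA wtloss_mlr;                    /* hypothetical data set name */
  INPUT wtloss exptime humidity;    /* weight loss, exposure time, relative humidity */
  DATALINES;
4.3 4 .2
5.5 5 .2
6.8 6 .2
8.0 7 .2
4.0 4 .3
5.2 5 .3
6.6 6 .3
7.5 7 .3
2.0 4 .4
4.0 5 .4
5.7 6 .4
6.5 7 .4
;
RUN;

PROC REG DATA=wtloss_mlr;
  MODEL wtloss = exptime humidity;
RUN;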

  36. Chemical Weight Loss – MLR Output
The REG Procedure
Dependent Variable: wtloss
Number of Observations Read    12
Number of Observations Used    12

Analysis of Variance
                               Sum of         Mean
Source             DF         Squares       Square    F Value    Pr > F
Model               2        31.12417     15.56208     104.13    <.0001
Error               9         1.34500      0.14944
Corrected Total    11        32.46917

Root MSE           0.38658    R-Square    0.9586
Dependent Mean     5.50833    Adj R-Sq    0.9494
Coeff Var          7.01810

Parameter Estimates
                       Parameter      Standard
Variable     DF         Estimate         Error    t Value    Pr > |t|
Intercept     1          0.66667       0.69423       0.96      0.3620
exptime       1          1.31667       0.09981      13.19      <.0001
humidity      1         -8.00000       1.36677      -5.85      0.0002

  37. H0: there is no linear relationship between weight loss and the variables exposure time and humidity H1: there is a linear relationship between weight loss and the variables exposure time and humidity
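Carrying this test out with the output on slide 36 (assuming α = 0.05; the critical value is approximate): F = MS(Model)/MS(Error) = 15.56208/0.14944 ≈ 104.13 > F0.05(2, 9) ≈ 4.26, so H0 is rejected; the reported p-value is <.0001.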

  38. Examining Contributions of Individual X Variables
Use the t-test for the X variable in question. This tests the effect of that particular independent variable while all other independent variables are held constant.

Parameter Estimates
                       Parameter      Standard
Variable     DF         Estimate         Error    t Value    Pr > |t|
Intercept     1          0.66667       0.69423       0.96      0.3620
exptime       1          1.31667       0.09981      13.19      <.0001
humidity      1         -8.00000       1.36677      -5.85      0.0002
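As a worked example of these t-tests (each t is the estimate divided by its standard error; the critical value assumes α = 0.05 and is approximate): for exptime, t = 1.31667/0.09981 ≈ 13.19, and for humidity, t = -8.00000/1.36677 ≈ -5.85. Both exceed t0.025(9) ≈ 2.262 in absolute value, so each variable contributes significantly to the model given that the other is included.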
