
ENGR 610 Applied Statistics Fall 2007 - Week 11



  1. ENGR 610 Applied Statistics, Fall 2007 - Week 11 Marshall University CITE Jack Smith

  2. Overview for Today • Review Simple Linear Regression, Ch 12 • Go over problem 12.56 • Multiple Linear Regression, Ch 13 (1-5) • Multiple explanatory variables • Coefficient of multiple determination • Adjusted R² • Residual Analysis • F-test • t test and confidence interval for slope • Partial F-tests for individual contributions • Coefficients of partial determination • Homework assignment

  3. Regression Modeling • Analysis of variance to “fit” a predictive model for a response (dependent) variable to a set of one or more explanatory (independent) variables • Minimize residual error w.r.t. linear coefficients • Interpolative over the relevant range - do not extrapolate • Typically linear, but may be curvilinear or more complex (w.r.t. independent variables) • Related to Correlation Analysis - measuring the strength of association between variables • Regression is about variance in the response variable • Correlation is about co-variance - symmetric

  4. Types of Regression Models • Based on Scatter Plots • Y vs X • Dependent vs independent • Linear Models • Positive, negative or no slope • Zero or non-zero intercept • Curvilinear Models • Positive, negative or no “slope” • Positive, negative or varied curvature • May be U shaped, with extrema • May be asymptotically or piece-wise linear • May be polynomial, exponential, inverse,…

  5. Least-Square Linear Regression • Simple Linear Model (for population) • Yi = β0 + β1Xi + εi • Xi = value of independent variable • Yi = observed value of dependent variable • β0 = Y-intercept (Y at X=0) • β1 = slope (ΔY/ΔX) • εi = random error for observation i • Yi’ = b0 + b1Xi (predicted value) • b0 and b1 are called regression coefficients • ei = Yi - Yi’ (residual) • Minimize Σei² for sample with respect to b0 and b1
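The least-squares fit described on this slide can be sketched in pure Python. The function name and the data are hypothetical, chosen only to illustrate the b0/b1 formulas:

```python
# Minimal least-squares fit for the simple linear model Yi' = b0 + b1*Xi.
# b1 = SSxy / SSxx, b0 = ybar - b1*xbar; data are invented for illustration.

def fit_simple_ols(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx        # slope
    b0 = ybar - b1 * xbar     # Y-intercept
    return b0, b1

b0, b1 = fit_simple_ols([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
```

For exactly linear data the fit recovers the generating line, which makes a convenient sanity check.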

  6. Partitioning of Variation • Total variation (SST) • Regression variation (SSR) • Random variation (SSE), about the mean response SST = SSR + SSE Coefficient of Determination r² = SSR/SST Standard Error of the Estimate
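The SST = SSR + SSE partition can be computed directly from a fitted line. A pure-Python sketch (function name is hypothetical; pass in the least-squares b0 and b1):

```python
# Partition total variation around ybar into regression and residual parts.
# r^2 = SSR/SST; s_yx = sqrt(SSE/(n-2)) is the standard error of the estimate.

def partition_variation(x, y, b0, b1):
    n = len(y)
    ybar = sum(y) / n
    yhat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - ybar) ** 2 for yi in y)                # total variation
    ssr = sum((yh - ybar) ** 2 for yh in yhat)             # explained by regression
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))   # random (residual) variation
    r2 = ssr / sst                                         # coefficient of determination
    s_yx = (sse / (n - 2)) ** 0.5                          # standard error of the estimate
    return sst, ssr, sse, r2, s_yx
```

With least-squares coefficients, SSR + SSE reproduces SST up to rounding.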

  7. Partitioning of Variation - Graphically

  8. Assumptions of Regression (and Correlation) • Normality of error about regression line • Homoscedasticity (equal variance) along X • Independence of errors with respect to X • No autocorrelation in time • Analysis of residuals to test assumptions • Histogram, Box-and-Whisker plots • Normal probability plot • Ordered plots (by X, by time,…) See figures on pp 584-5

  9. t Test for Slope H0: β1 = 0 Critical t value based on chosen level of significance, α, and n-2 degrees of freedom
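The slide's test statistic is t = b1 / s_b1, with s_b1 = s_YX / √SSxx. A pure-Python sketch (the data and helper name are invented; b0 and b1 are assumed to be the least-squares estimates):

```python
# t statistic for H0: beta1 = 0 in simple linear regression.
# s_b1 = s_yx / sqrt(SSxx); compare |t| to the critical t with n-2 df.

def t_stat_slope(x, y, b0, b1):
    n = len(x)
    xbar = sum(x) / n
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = (sse / (n - 2)) ** 0.5     # standard error of the estimate
    s_b1 = s_yx / ss_xx ** 0.5        # standard error of the slope
    return b1 / s_b1
```

Reject H0 when |t| exceeds the tabled critical value for the chosen α and n-2 degrees of freedom.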

  10. F Test for Single Regression • F = MSR / MSE • Reject H0 if F > FU(α,1,n-2) [or p < α] • Note: t²(α,n-2) = FU(α,1,n-2) • One-Way ANOVA Summary

  11. Confidence and Prediction Intervals • Confidence Interval Estimate for the Slope • Confidence Interval Estimate for the Mean • Confidence Interval Estimate for Individual Response See Fig 12.16, p 592

  12. Pitfalls • Not testing assumptions of least-square regression by analyzing residuals, looking for • Patterns • Outliers • Non-uniform distribution about mean • See Figs 12.18-19, p 597-8 • Not being aware of alternatives to least-square regression when assumptions are violated • Not knowing subject matter being modeled

  13. Computing by Hand • Slope • Y-Intercept

  14. Computing by Hand • Measures of Variation

  15. Coefficient of Correlation • For a regression • For a correlation Also called… Pearson’s product-moment correlation coefficient Covariance
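Pearson's product-moment coefficient named on this slide is the covariance scaled by both standard deviations. A minimal sketch (function name is hypothetical):

```python
# Pearson's product-moment correlation coefficient:
# r = sum((x-xbar)(y-ybar)) / sqrt(sum((x-xbar)^2) * sum((y-ybar)^2))

def pearson_r(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sx = sum((xi - xbar) ** 2 for xi in x) ** 0.5
    sy = sum((yi - ybar) ** 2 for yi in y) ** 0.5
    return sxy / (sx * sy)
```

Perfectly linear data give r = ±1, the sign following the slope.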

  16. t Test for Correlation H0: ρ = 0 Critical t value based on chosen level of significance, α, and n-2 degrees of freedom Or compared to FU(α,1,n-2) = t²(α,n-2)

  17. Multiple Regression • Linear model - multiple independent (explanatory) variables • Yi = β0 + β1X1i + … + βkXki + εi • Xji = value of independent variable j for observation i • Yi = observed value of dependent variable • β0 = Y-intercept (Y when all X=0) • βj = slope (ΔY/ΔXj) • εi = random error for observation i • Yi’ = b0 + b1X1i + … + bkXki (predicted value) • The bj’s are called the regression coefficients • ei = Yi - Yi’ (residual) • Minimize Σei² for sample with respect to all bj
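Minimizing Σei² over all bj leads to the normal equations (X'X)b = X'y. A pure-Python sketch that solves them by Gaussian elimination (data in the test are invented; assumes X'X is non-singular, i.e. no perfect collinearity):

```python
# Least-squares fit of Yi' = b0 + b1*X1i + ... + bk*Xki via the normal
# equations (X'X)b = X'y, solved by Gaussian elimination with partial pivoting.

def fit_mlr(X, y):
    rows = [[1.0] + list(r) for r in X]      # design matrix with intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    a = [row[:] + [v] for row, v in zip(xtx, xty)]   # augmented matrix [X'X | X'y]
    for col in range(p):                     # forward elimination
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, p):
            f = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= f * a[col][c]
    b = [0.0] * p                            # back substitution
    for i in range(p - 1, -1, -1):
        b[i] = (a[i][p] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    return b                                 # [b0, b1, ..., bk]
```

In practice one would use a statistics package (the course uses PHStat); this sketch only makes the minimization explicit.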

  18. Partitioning of Variation • Total variation • Regression variation • Random variation (about the mean response) SST = SSR + SSE Coefficient of Multiple Determination R²Y.12…k = SSR/SST Standard Error of the Estimate

  19. Adjusted R² • To account for sample size (n) and number of independent variables (k) for comparison purposes
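The standard adjustment replaces each sum of squares by its mean square, giving 1 - (1 - R²)(n - 1)/(n - k - 1). A one-line sketch:

```python
# Adjusted R^2: penalizes R^2 for the number of explanatory variables k
# relative to the sample size n, so models of different k can be compared.

def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)
```

Adding a variable that explains nothing lowers adjusted R² even though plain R² can only stay the same or rise.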

  20. Residual Analysis • Plot residuals vs • Yi’ (predicted values) • X1, X2,…,Xk • Time (for autocorrelation) • Check for • Patterns • Outliers • Non-uniform distribution about mean • See Figs 12.18-19, p 597-8

  21. F Test for Multiple Regression • F = MSR / MSE • Reject H0 if F > FU(α,k,n-k-1) [or p < α] • k = number of independent variables • One-Way ANOVA Summary
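The overall F statistic is just the two mean squares from the ANOVA table divided, MSR = SSR/k and MSE = SSE/(n-k-1). A minimal sketch (numbers in the test are invented):

```python
# Overall F statistic for multiple regression:
# F = MSR / MSE = (SSR / k) / (SSE / (n - k - 1)).

def f_stat(ssr, sse, n, k):
    msr = ssr / k             # regression mean square
    mse = sse / (n - k - 1)   # error mean square
    return msr / mse
```

Compare the result with FU(α, k, n-k-1) from an F table or software.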

  22. Alternate F-Test Compared to FU(α,k,n-k-1)

  23. t Test for Slope H0: βj = 0 See output from PHStat Critical t value based on chosen level of significance, α, and n-k-1 degrees of freedom

  24. Confidence and Prediction Intervals • Confidence Interval Estimate for the Slope • Confidence Interval Estimate for the Mean and Prediction Interval Estimate for Individual Response • Beyond the scope of this text

  25. Partial F Tests • Significance test for contribution from an individual independent variable • Measure of incremental improvement • All others already taken into account • Fj = SSR(Xj|{Xi≠j}) / MSE, where SSR(Xj|{Xi≠j}) = SSR - SSR({Xi≠j}) • Reject H0 if Fj > FU(α,1,n-k-1) [or p < α] • Note: t²(α,n-k-1) = FU(α,1,n-k-1)
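The incremental sum of squares above is the full-model SSR minus the SSR of the model refit without Xj, scaled by the full-model MSE. A minimal sketch (the numbers in the test are invented):

```python
# Partial F statistic for Xj given all other variables:
# Fj = [SSR(full) - SSR(model without Xj)] / MSE(full), 1 and n-k-1 df.

def partial_f(ssr_full, ssr_without_j, sse_full, n, k):
    mse_full = sse_full / (n - k - 1)   # MSE of the full k-variable model
    return (ssr_full - ssr_without_j) / mse_full
```

A large Fj means Xj still explains variation after the other variables are accounted for.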

  26. Coefficients of Partial Determination See PHStat output in Fig 13.10, p 637

  27. Homework • Review “Multiple Regression”, 13.1-5 • Work through Appendix 13.1 • Work and hand in Problem 13.62 • Read “Multiple Regression”, 13.6-11 • Quadratic model • Dummy-variable model • Using transformations • Collinearity (VIF) • Model building • Cp statistic and stepwise regression • Preview problems 13.63-13.67
