ENGR 610 Applied Statistics, Fall 2007 - Week 11 Marshall University CITE Jack Smith
Overview for Today • Review Simple Linear Regression, Ch 12 • Go over problem 12.56 • Multiple Linear Regression, Ch 13 (1-5) • Multiple explanatory variables • Coefficient of multiple determination • Adjusted R² • Residual analysis • F test • t test and confidence interval for slope • Partial F tests for each variable's individual contribution • Coefficients of partial determination • Homework assignment
Regression Modeling • Analysis of variance to “fit” a predictive model for a response (dependent) variable to a set of one or more explanatory (independent) variables • Minimize residual error w.r.t. the linear coefficients • Interpolate over the relevant range - do not extrapolate beyond it • Typically linear, but may be curvilinear or more complex (w.r.t. the independent variables) • Related to Correlation Analysis - measuring the strength of association between variables • Regression is about variance in the response variable • Correlation is about co-variance - symmetric between the variables
Types of Regression Models • Based on Scatter Plots • Y vs X • Dependent vs independent • Linear Models • Positive, negative or no slope • Zero or non-zero intercept • Curvilinear Models • Positive, negative or no “slope” • Positive, negative or varied curvature • May be U shaped, with extrema • May be asymptotically or piece-wise linear • May be polynomial, exponential, inverse,…
Least-Square Linear Regression • Simple Linear Model (for population) • Yi = β0 + β1Xi + εi • Xi = value of independent variable • Yi = observed value of dependent variable • β0 = Y-intercept (Y at X = 0) • β1 = slope (ΔY/ΔX) • εi = random error for observation i • Yi' = b0 + b1Xi (predicted value) • b0 and b1 are called regression coefficients • ei = Yi - Yi' (residual) • Minimize Σei² for the sample with respect to b0 and b1
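The least-squares fit above can be sketched in a few lines of Python, computing the slope and intercept directly from the definitional sums. The sample data here are made up for illustration, not taken from the text.

```python
# Least-squares fit of the simple linear model Yi' = b0 + b1*Xi,
# computed from sums of deviations about the means (toy data).

def fit_simple_ols(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # SSXY and SSX from deviations about the means
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_x          # slope
    b0 = y_bar - b1 * x_bar    # Y-intercept
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_simple_ols(x, y)  # b1 = 0.6, b0 = 2.2 for this sample
```

Minimizing Σei² with calculus yields exactly these closed-form expressions, which is why no iterative search is needed.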
Partitioning of Variation (about the mean response Ȳ) • Total variation: SST = Σ(Yi - Ȳ)² • Regression variation: SSR = Σ(Yi' - Ȳ)² • Random variation: SSE = Σ(Yi - Yi')² • SST = SSR + SSE • Coefficient of Determination: r² = SSR/SST • Standard Error of the Estimate: SYX = √(SSE/(n-2))
Assumptions of Regression (and Correlation) • Normality of error about the regression line • Homoscedasticity (equal variance) along X • Independence of errors with respect to X • No autocorrelation in time • Analysis of residuals to test assumptions • Histogram, Box-and-Whisker plots • Normality plot • Ordered plots (by X, by time,…) See figures on pp 584-5
t Test for Slope H0: β1 = 0 • t = b1 / Sb1, where Sb1 = SYX/√SSX • Critical t value based on chosen level of significance, α, and n-2 degrees of freedom
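Computing the t statistic for the slope is a direct application of the quantities already defined; a sketch on the same kind of toy data (not the text's example):

```python
# t statistic for H0: beta1 = 0, t = b1 / Sb1 with Sb1 = SYX / sqrt(SSX)
# (illustrative sample only).
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_x = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

s_yx = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b1 = s_yx / math.sqrt(ss_x)     # standard error of the slope
t = b1 / s_b1                     # compare to critical t(alpha, n-2)
```

Note that t² here equals the regression F statistic (MSR/MSE), illustrating the equivalence of the two tests in simple regression.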
F Test for Single Regression • F = MSR / MSE • Reject H0 if F > FU(α, 1, n-2) [or p < α] • Note: t² = F, so the t and F tests are equivalent • One-Way ANOVA Summary
Confidence and Prediction Intervals • Confidence Interval Estimate for the Slope • Confidence Interval Estimate for the Mean • Confidence Interval Estimate for Individual Response See Fig 12.16, p 592
Pitfalls • Not testing assumptions of least-square regression by analyzing residuals, looking for • Patterns • Outliers • Non-uniform distribution about mean • See Figs 12.18-19, p 597-8 • Not being aware of alternatives to least-square regression when assumptions are violated • Not knowing subject matter being modeled
Computing by Hand • Slope: b1 = SSXY / SSX, where SSXY = Σ(Xi - X̄)(Yi - Ȳ) and SSX = Σ(Xi - X̄)² • Y-Intercept: b0 = Ȳ - b1X̄
Computing by Hand • Measures of Variation: SST = Σ(Yi - Ȳ)², SSR = b1·SSXY, SSE = SST - SSR
Coefficient of Correlation • For a regression: r = ±√r² (taking the sign of b1) • For a correlation: r = SSXY / √(SSX·SSY) = cov(X,Y) / (SX·SY) Also called… Pearson’s product-moment correlation coefficient
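The correlation form of r can be computed directly from the deviation sums; a sketch on an illustrative sample (not the text's data):

```python
# Pearson's product-moment correlation: r = SSXY / sqrt(SSX * SSY)
# (toy sample for illustration).
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)

r = ss_xy / math.sqrt(ss_x * ss_y)   # Pearson's r
```

Squaring r recovers the coefficient of determination from the regression of Y on X, which is the link between the two views.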
t Test for Correlation H0: ρ = 0 • t = r / √((1 - r²)/(n - 2)) • Critical t value based on chosen level of significance, α, and n-2 degrees of freedom • Or t² compared to FU(α, 1, n-2)
Multiple Regression • Linear model - multiple independent (explanatory) variables • Yi = β0 + β1X1i + … + βkXki + εi • Xji = value of independent variable j for observation i • Yi = observed value of dependent variable • β0 = Y-intercept (Y when all Xj = 0) • βj = slope of Y with respect to Xj, holding the other variables constant • εi = random error for observation i • Yi' = b0 + b1X1i + … + bkXki (predicted value) • The bj's are called the regression coefficients • ei = Yi - Yi' (residual) • Minimize Σei² for the sample with respect to all bj
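Minimizing Σei² over all bj simultaneously is a linear least-squares problem, which numpy solves directly. A sketch with two explanatory variables and made-up data (assumed for illustration only):

```python
# Least-squares fit of Yi' = b0 + b1*X1i + b2*X2i via numpy's
# linear least-squares solver (toy data, not from the text).
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([4.1, 3.9, 8.2, 7.8, 12.1, 11.9])

# Design matrix: a column of ones for the intercept, then X1, X2
X = np.column_stack([np.ones_like(x1), x1, x2])
b, _, _, _ = np.linalg.lstsq(X, y, rcond=None)   # b = [b0, b1, b2]
y_hat = X @ b
residuals = y - y_hat                            # ei = Yi - Yi'
```

Because the intercept column is included, the residuals sum to zero (up to rounding), just as in the simple-regression case.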
Partitioning of Variation (about the mean response Ȳ) • Total variation: SST = Σ(Yi - Ȳ)² • Regression variation: SSR = Σ(Yi' - Ȳ)² • Random variation: SSE = Σ(Yi - Yi')² • SST = SSR + SSE • Coefficient of Multiple Determination: R²Y.12…k = SSR/SST • Standard Error of the Estimate: SYX = √(SSE/(n-k-1))
Adjusted R² • To account for sample size (n) and number of independent variables (k) for comparison purposes • R²adj = 1 - (1 - R²)(n - 1)/(n - k - 1)
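The adjustment is a one-line computation; the example values below are illustrative, not from the text:

```python
# Adjusted R^2 penalizes each added explanatory variable:
# R2_adj = 1 - (1 - R2) * (n - 1) / (n - k - 1)

def adjusted_r2(r2, n, k):
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Example: R2 = 0.6 from n = 5 observations with k = 1 variable
r2_adj = adjusted_r2(0.6, n=5, k=1)
```

Unlike R², the adjusted value can fall when a variable that explains little is added, which is why it is preferred for comparing models with different k.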
Residual Analysis • Plot residuals vs • Yi’ (predicted values) • X1, X2,…,Xk • Time (for autocorrelation) • Check for • Patterns • Outliers • Non-uniform distribution about mean • See Figs 12.18-19, p 597-8
F Test for Multiple Regression • F = MSR / MSE • Reject H0 if F > FU(α, k, n-k-1) [or p < α] • k = number of independent variables • One-Way ANOVA Summary
Alternate F Test • F = (R²/k) / ((1 - R²)/(n - k - 1)) • Compared to FU(α, k, n-k-1)
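This form computes F from R² alone, without the ANOVA table; a sketch with illustrative values:

```python
# Overall F statistic from R^2: F = (R^2/k) / ((1 - R^2)/(n - k - 1))

def f_from_r2(r2, n, k):
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# Example: R^2 = 0.6, n = 5, k = 1 (reduces to the simple-regression case)
f = f_from_r2(0.6, n=5, k=1)   # equals MSR/MSE for the same fit
```

With k = 1 this reproduces F = MSR/MSE from simple regression, confirming the two forms are algebraically identical.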
t Test for Slope H0: βj = 0 • t = bj / Sbj See output from PHStat Critical t value based on chosen level of significance, α, and n-k-1 degrees of freedom
Confidence and Prediction Intervals • Confidence Interval Estimate for the Slope • Confidence Interval Estimate for the Mean and Prediction Interval Estimate for Individual Response • Beyond the scope of this text
Partial F Tests • Significance test for the contribution from an individual independent variable • Measure of incremental improvement • All others already taken into account • Fj = SSR(Xj|{Xi≠j}) / MSE SSR(Xj|{Xi≠j}) = SSR - SSR({Xi≠j}) • Reject H0 if Fj > FU(α, 1, n-k-1) [or p < α] • Note: t² = Fj, so this matches the t test for bj
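The incremental SSR can be obtained by fitting the full model and the model with the variable dropped, then differencing. A sketch with two explanatory variables and toy data (assumed for illustration):

```python
# Partial F test for X1's contribution given X2:
# SSR(X1|X2) = SSR(full) - SSR(X2) = SSE(X2) - SSE(full),
# F1 = SSR(X1|X2) / MSE(full)   (toy data, not from the text)
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([4.1, 3.9, 8.2, 7.8, 12.1, 11.9])
n, k = len(y), 2

def sse_of(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    b, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return float(r @ r)

sse_full = sse_of(np.column_stack([np.ones(n), x1, x2]), y)
sse_reduced = sse_of(np.column_stack([np.ones(n), x2]), y)  # X1 dropped

ssr_x1_given_x2 = sse_reduced - sse_full   # incremental contribution of X1
mse_full = sse_full / (n - k - 1)
f1 = ssr_x1_given_x2 / mse_full            # compare to FU(alpha, 1, n-k-1)
```

Dropping a variable can only increase SSE, so the incremental SSR is always non-negative; a large F1 means X1 still explains substantial variation after X2 is accounted for.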
Coefficients of Partial Determination • r²Yj.(all others) = SSR(Xj|{Xi≠j}) / (SST - SSR(all) + SSR(Xj|{Xi≠j})) See PHStat output in Fig 13.10, p 637
Homework • Review “Multiple Regression”, 13.1-5 • Work through Appendix 13.1 • Work and hand in Problem 13.62 • Read “Multiple Regression”, 13.6-11 • Quadratic model • Dummy-variable model • Using transformations • Collinearity (VIF) • Model building • Cp statistic and stepwise regression • Preview problems 13.63-13.67