ENGR 610 Applied Statistics Fall 2007 - Week 12. Marshall University CITE Jack Smith. Overview for Today. Review Multiple Linear Regression , Ch 13 (1-5) Go over problem 13.62 Multiple Linear Regression , Ch 13 (6-11) Quadratic model Dummy-variable model Using transformations
ENGR 610 Applied Statistics Fall 2007 - Week 12 Marshall University CITE Jack Smith
Overview for Today • Review Multiple Linear Regression, Ch 13 (1-5) • Go over problem 13.62 • Multiple Linear Regression, Ch 13 (6-11) • Quadratic model • Dummy-variable model • Using transformations • Collinearity (VIF) • Model building • Stepwise regression • Best-subset regression with $C_p$ statistic • Homework assignment
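Multiple Regression • Linear model - multiple independent variables • $Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_j X_{ji} + \dots + \beta_k X_{ki} + \varepsilon_i$ • $X_{ji}$ = value of independent variable $j$ for observation $i$ • $Y_i$ = observed value of dependent variable • $\beta_0$ = Y-intercept (Y when every $X_j = 0$) • $\beta_j$ = slope ($\partial Y/\partial X_j$, all other X held constant) • $\varepsilon_i$ = random error for observation $i$ • $Y_i' = b_0 + b_1 X_{1i} + \dots + b_k X_{ki}$ (predicted value) • The $b_j$'s are called the regression coefficients • $e_i = Y_i - Y_i'$ (residual) • Minimize $\sum e_i^2$ for sample with respect to all $b_j$, $j = 0, \dots, k$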
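The least-squares step on this slide can be sketched in plain Python by solving the normal equations $(X^T X)\,b = X^T Y$. This is only an illustration, not the PHStat procedure used in the course; the helper names `solve` and `fit_ols` and the data are made up for the example.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copied so the inputs are not modified
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(xs, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals.

    xs is a list of observations, each a list [X1i, ..., Xki].
    """
    design = [[1.0] + list(row) for row in xs]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2 (no noise)
xs = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]

b = fit_ols(xs, y)
predicted = [b[0] + b[1] * x1 + b[2] * x2 for x1, x2 in xs]
residuals = [yi - pi for yi, pi in zip(y, predicted)]
```

Because the example data contain no random error, the fitted coefficients should recover 1, 2 and 3 and the residuals should be essentially zero.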
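Partitioning of Variation • Total variation: $SST = \sum (Y_i - \bar{Y})^2$ • Regression variation: $SSR = \sum (Y_i' - \bar{Y})^2$ • Random variation: $SSE = \sum (Y_i - Y_i')^2$ • $\bar{Y}$ = mean response • $SST = SSR + SSE$ • Coefficient of Multiple Determination: $R^2_{Y.12 \dots k} = SSR/SST$ • Standard Error of the Estimate: $S_{YX} = \sqrt{SSE/(n-k-1)}$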
Adjusted $R^2$ • To account for sample size ($n$) and number of independent variables ($k$) for comparison purposes • $R^2_{adj} = 1 - (1 - R^2_{Y.12 \dots k})\,(n-1)/(n-k-1)$
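A small sketch of how the sums of squares, $R^2$, adjusted $R^2$ and the standard error of the estimate fit together. The function name `variation_summary` and the four data points are hypothetical; the predictions come from a one-variable least-squares line ($k = 1$) only to keep the example short.

```python
import math


def variation_summary(y, predicted, k):
    """Partition variation for a least-squares fit with k independent variables."""
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)                    # total
    ssr = sum((pi - y_bar) ** 2 for pi in predicted)            # regression
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))   # random (error)
    r2 = ssr / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    s_yx = math.sqrt(sse / (n - k - 1))
    return {"SST": sst, "SSR": ssr, "SSE": sse,
            "R2": r2, "adjR2": adj_r2, "S_YX": s_yx}


# Hypothetical data; the least-squares line for these points is Y' = 0 + 1.9 X
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
predicted = [1.9 * xi for xi in x]

summary = variation_summary(y, predicted, k=1)
```

The identity SST = SSR + SSE holds only when the predictions come from a least-squares fit that includes an intercept, which is why the example uses the fitted line rather than arbitrary predictions.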
Residual Analysis • Plot residuals vs • Yi’ (predicted values) • X1, X2,…,Xk • Time (for autocorrelation) • Check for • Patterns • Outliers • Non-uniform distribution about mean • See Figs 12.18-19, p 597-8
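F Test for Multiple Regression • $H_0$: $\beta_1 = \beta_2 = \dots = \beta_k = 0$ • $F = MSR / MSE$, with $MSR = SSR/k$ and $MSE = SSE/(n-k-1)$ • Reject $H_0$ if $F > F_U(\alpha, k, n-k-1)$ [or $p < \alpha$] • $k$ = number of independent variables • One-Way ANOVA Summary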
Alternate F-Test • $F = \dfrac{R^2/k}{(1-R^2)/(n-k-1)}$ • Compared to $F_U(\alpha, k, n-k-1)$
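The two forms of the overall F statistic, MSR/MSE and the version written in terms of $R^2$, should agree. The sketch below checks this on made-up sums of squares; the function names are hypothetical, and the critical value $F_U$ would still come from an F table or statistical software.

```python
def f_from_sums_of_squares(ssr, sse, n, k):
    """Overall F statistic as MSR / MSE."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse


def f_from_r2(r2, n, k):
    """Same statistic written in terms of the coefficient of determination."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Hypothetical values: n = 20 observations, k = 3 independent variables
ssr, sse, n, k = 120.0, 30.0, 20, 3
f1 = f_from_sums_of_squares(ssr, sse, n, k)
f2 = f_from_r2(ssr / (ssr + sse), n, k)
```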
t Test for Slope • $H_0$: $\beta_j = 0$ • $t = b_j / S_{b_j}$ • See output from PHStat • Critical t value based on chosen level of significance, $\alpha$, and $n-k-1$ degrees of freedom
Confidence and Prediction Intervals • Confidence Interval Estimate for the Slope: $b_j \pm t_{n-k-1}\, S_{b_j}$ • Confidence Interval Estimate for the Mean and Prediction Interval Estimate for Individual Response • Beyond the scope of this text
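As a rough illustration of the slope test and interval, the sketch below computes $t = b_j / S_{b_j}$ and $b_j \pm t\,S_{b_j}$. The coefficient and its standard error are made-up numbers standing in for regression output; the critical value 2.1199 is the two-tailed t value for $\alpha = 0.05$ with 16 degrees of freedom, taken from a t table.

```python
def slope_t_statistic(b_j, s_bj):
    """t statistic for H0: beta_j = 0."""
    return b_j / s_bj


def slope_confidence_interval(b_j, s_bj, t_crit):
    """Confidence interval b_j +/- t_crit * S_bj."""
    half_width = t_crit * s_bj
    return (b_j - half_width, b_j + half_width)


# Hypothetical regression output (n = 20, k = 3, so n - k - 1 = 16 df)
b_j, s_bj = 0.75, 0.25
t_crit = 2.1199  # two-tailed, alpha = 0.05, 16 df, from a t table

t_stat = slope_t_statistic(b_j, s_bj)
ci = slope_confidence_interval(b_j, s_bj, t_crit)
reject_h0 = abs(t_stat) > t_crit
```

Here the interval excludes zero exactly when the t test rejects $H_0$, since both use the same critical value.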
Partial F Tests • Significance test for contribution from individual independent variable • Measure of incremental improvement • All others already taken into account • $F_j = SSR(X_j \mid \{X_{i \ne j}\}) / MSE$ • $SSR(X_j \mid \{X_{i \ne j}\}) = SSR - SSR(\{X_{i \ne j}\})$ • Reject $H_0$ if $F_j > F_U(\alpha, 1, n-k-1)$ [or $p < \alpha$] • Note: $t^2(\alpha, n-k-1) = F_U(\alpha, 1, n-k-1)$ (two-tailed $t$)
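Coefficients of Partial Determination • $r^2_{Yj.(\text{all others})} = \dfrac{SSR(X_j \mid \{X_{i \ne j}\})}{SST - SSR + SSR(X_j \mid \{X_{i \ne j}\})}$ • See PHStat output in Fig 13.10, p 637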
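The partial F statistic and the coefficient of partial determination both start from the same quantity: the regression sum of squares gained by adding $X_j$ to a model that already contains all other variables. A minimal sketch with made-up sums of squares (the function names are hypothetical):

```python
def partial_f(ssr_full, ssr_without_j, mse_full):
    """F_j = SSR(X_j | all others) / MSE of the full model."""
    return (ssr_full - ssr_without_j) / mse_full


def partial_r2(sst, ssr_full, ssr_without_j):
    """Coefficient of partial determination for X_j given all other variables."""
    contribution = ssr_full - ssr_without_j  # SSR(X_j | all others)
    return contribution / (sst - ssr_full + contribution)


# Hypothetical values: n = 20 observations, k = 3 variables in the full model
sst, ssr_full, ssr_without_j = 150.0, 120.0, 105.0
n, k = 20, 3
mse_full = (sst - ssr_full) / (n - k - 1)

f_j = partial_f(ssr_full, ssr_without_j, mse_full)
r2_j = partial_r2(sst, ssr_full, ssr_without_j)
```

With these numbers $X_j$ contributes 15 units of regression variation, giving $F_j = 8$, to be compared with $F_U(\alpha, 1, 16)$ from a table.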
Quadratic Curvilinear Regression Model • $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$ • Treat the $X^2$ term just like any other independent variable • Same $R^2$, F tests, t tests, etc. • Generally need linear term as well
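Treating the squared term as just another independent variable amounts to adding an $X^2$ column to the design matrix and fitting as usual. The sketch below does this with a small normal-equations solver; `solve`, `fit_quadratic` and the data are assumptions made for the example.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_quadratic(x, y):
    """Least-squares fit of Y' = b0 + b1*X + b2*X^2."""
    # The X^2 column is handled like any other independent variable
    design = [[1.0, xi, xi ** 2] for xi in x]
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(3)]
    return solve(xtx, xty)


# Hypothetical noise-free data from Y = 2 + 3X - 0.5X^2
x = [0, 1, 2, 3, 4]
y = [2 + 3 * xi - 0.5 * xi ** 2 for xi in x]
b0, b1, b2 = fit_quadratic(x, y)
```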
Dummy-Variable Models • Treatment of categorical variables • Each category represented by a dummy variable with value of 0 or 1 (a variable with c categories needs c - 1 dummy variables; the omitted category is the baseline) • Treat added terms like any other terms • Effect often depends on other variables, so model may need interaction terms • Add interaction term and perform partial F test and t test for added term
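A sketch of how a categorical variable might be turned into 0/1 dummy columns, with interaction columns formed as the product of each dummy and a numeric variable. The category labels, the baseline choice and the helper names are all hypothetical.

```python
def dummy_columns(values, baseline):
    """Return (levels, rows): one 0/1 column per non-baseline category."""
    levels = sorted(set(values) - {baseline})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


def design_rows(x, dummies, with_interaction=True):
    """Build rows [X, D1, ..., X*D1, ...] for a regression routine."""
    rows = []
    for xi, d in zip(x, dummies):
        row = [xi] + d
        if with_interaction:
            row += [xi * dj for dj in d]  # interaction terms X * D
        rows.append(row)
    return rows


# Hypothetical data: one numeric variable and one three-level category
x = [10, 20, 30, 40]
category = ["A", "B", "C", "B"]

levels, dummies = dummy_columns(category, baseline="A")
rows = design_rows(x, dummies)
```

The resulting rows can be passed to any multiple-regression routine; the interaction columns are then tested like any other added term.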
Using Transformations • Square-root • Multiplicative - logY-logX model • Exponential - logY model • Others • Higher polynomials • Trigonometric functions • Inverse
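The multiplicative and exponential models become ordinary linear regressions once logs are taken. The sketch below assumes natural logs (the slide does not say which base) and uses made-up, noise-free data; `simple_fit` is a hypothetical one-variable least-squares helper.

```python
import math


def simple_fit(x, y):
    """One-variable least squares: returns (b0, b1) for Y' = b0 + b1*X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


x = [1, 2, 4, 8, 16]

# Multiplicative model Y = a * X^b  ->  log Y = log a + b * log X
y_mult = [2 * xi ** 1.5 for xi in x]
log_a, b = simple_fit([math.log(v) for v in x], [math.log(v) for v in y_mult])
a = math.exp(log_a)  # back-transform the intercept

# Exponential model Y = c * exp(d * X)  ->  log Y = log c + d * X
y_exp = [3 * math.exp(0.2 * xi) for xi in x]
log_c, d = simple_fit(x, [math.log(v) for v in y_exp])
c = math.exp(log_c)
```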
Collinearity (VIF) • Test for linearly dependent variables • VIF - Variance Inflationary Factor • $VIF_j = 1/(1 - R_j^2)$ • $R_j^2$ = coefficient of multiple determination of variable $X_j$ with all other X variables • VIF > 5 suggests linear dependence ($R_j^2 > 0.8$) • Full treatment involves analysis of correlation (covariance) matrix, such as • Principal Component Analysis (PCA) • To determine dimensionality and orthogonal factors • Factor Analysis (FA) • To determine rotated factors
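One way to compute $VIF_j = 1/(1 - R_j^2)$ is to regress each $X_j$ on all the other X variables and take the resulting $R_j^2$. The sketch below does that with a small normal-equations solver; the helper names and the data (where X2 is roughly twice X1) are made up for illustration.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(rows, y):
    """R^2 from a least-squares regression of y on the columns in rows."""
    design = [[1.0] + list(row) for row in rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    coef = solve(xtx, xty)
    y_bar = sum(y) / len(y)
    sse = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, yi in zip(design, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


def vif(columns):
    """columns: one list of values per X variable. Returns [VIF_1, ..., VIF_k]."""
    result = []
    for j, target in enumerate(columns):
        others = [col for i, col in enumerate(columns) if i != j]
        rows = [list(r) for r in zip(*others)]
        result.append(1 / (1 - r_squared(rows, target)))
    return result


# Hypothetical data: X2 is close to 2 * X1, so both should be flagged
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
x3 = [5, 1, 4, 2, 6, 3]

vifs = vif([x1, x2, x3])
flagged = [j + 1 for j, v in enumerate(vifs) if v > 5]  # 1-based variable numbers
```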
Model Building • See flow chart in text, Fig 13.25 (p 663) • Stepwise regression • Add or delete one variable at a time • Use partial F and/or t tests (drop terms with p > 0.05) • Best-subset regression • Start with model including all variables (fewer than n/10) • Eliminate variables with the highest VIF > 5 • Generate all models with remaining variables (T parameters, including the intercept) • Select best models using $R^2$ and $C_p$ statistic • $C_p = \dfrac{(1 - R_k^2)(n - T)}{1 - R_T^2} - [n - 2(k+1)]$ • $C_p \le k+1$ • Evaluate each term using t test • Add interaction term, transformed variables, and higher order terms based on residual analysis
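The $C_p$ screening step can be sketched as plain arithmetic. The code assumes $T$ counts the parameters of the full model including the intercept, which is the reading under which the full model always gives $C_p = T$; the candidate subsets and their $R^2$ values are invented for the example.

```python
def cp_statistic(r2_k, r2_full, n, k, t):
    """C_p for a candidate model with k variables.

    t is assumed to be the number of parameters (including the intercept)
    in the full model, and r2_full is the full model's R^2.
    """
    return (1 - r2_k) * (n - t) / (1 - r2_full) - (n - 2 * (k + 1))


# Hypothetical best-subset results: 4 candidate variables, n = 30
n, t, r2_full = 30, 5, 0.90
candidates = {
    "X1,X2": (2, 0.88),
    "X1,X2,X3": (3, 0.899),
    "X1,X2,X3,X4": (4, 0.90),
}

results = {}
for name, (k, r2_k) in candidates.items():
    cp = cp_statistic(r2_k, r2_full, n, k, t)
    # Small tolerance so floating-point rounding does not reject Cp == k + 1
    results[name] = (cp, cp <= k + 1 + 1e-9)
```

In this made-up case the two-variable model fails the $C_p \le k+1$ screen while the three-variable model passes, so the latter would go on to term-by-term t tests.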
Homework • Work and hand in Problem 13.63 • Fall break (Thanksgiving) – 11/22 • Review session – 11/29 (“dead” week) • “Linear Regression”, Ch 12-13 • Exam #3 • Linear regression (Ch 12-13) • Take-home • Due by 12/6 • Final grades due by 12/13