Chapter 11 Validation of Regression Models
Linear Regression Analysis 5E, Montgomery, Peck & Vining
11.1 Introduction
• What a regression equation was created for may not always be what it is used for.
• Model adequacy checking: residual analysis, lack-of-fit testing, and identification of influential observations. These check the fit of the model to the available data.
• Model validation: determining whether the model will behave or function as intended in its operating environment.
11.2 Validation Techniques
1. Analysis of model coefficients and predicted values
   • Check for “inappropriate” signs on the coefficients.
   • Check for unusual magnitudes of the coefficients.
   • Check for stability in the coefficient estimates.
   • Check the predicted values (do they make sense for the nature of the data?).
2. Collection of new data
   • Usually 15–20 new observations are adequate.
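As a rough illustration of the first technique, the sketch below fits ordinary least squares with NumPy, flags slope coefficients whose signs disagree with the signs expected from subject-matter knowledge, and refits on random disjoint halves of the data to gauge coefficient stability. The data, the expected signs, and the helper names (`fit_ols`, `sign_mismatches`, `coefficient_spread`) are hypothetical, and refitting on random subsets is only one possible way to check stability.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y on X, with an intercept column prepended."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def sign_mismatches(beta, expected_signs):
    """Indices of slope coefficients whose sign differs from the expected sign (+1 or -1)."""
    slopes = beta[1:]
    return [j for j, (b, s) in enumerate(zip(slopes, expected_signs)) if np.sign(b) != s]

def coefficient_spread(X, y, n_splits=2, seed=0):
    """Refit on random disjoint subsets; return max - min of each coefficient across fits."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    fits = np.array([fit_ols(X[part], y[part]) for part in np.array_split(idx, n_splits)])
    return fits.max(axis=0) - fits.min(axis=0)

# Hypothetical data: y rises with x1 and falls with x2.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(40, 2))
y = 5 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 1, 40)

beta = fit_ols(X, y)
print("coefficients:", beta.round(2))
print("sign mismatches:", sign_mismatches(beta, expected_signs=[+1, -1]))  # empty list if signs agree
print("spread across half-sample fits:", coefficient_spread(X, y).round(2))
```

A large spread relative to a coefficient's magnitude would suggest the estimate is unstable; what counts as "large" is a judgment call for the application.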
Example 11.1 The Hald Cement Data
The coefficients of x1 are very similar in the two models; the coefficients of x2 and the intercept are moderately different. How different are the predicted values?
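Whether coefficient differences matter can be examined by comparing the predicted values directly. The sketch below uses simulated data (not the Hald data) in which a hypothetical third regressor `x3` is nearly collinear with `x2`; it fits a model with and without `x3` and reports how far apart the fitted values are. All names and data are illustrative assumptions.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y on X, with an intercept column prepended."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def predict(beta, X):
    """Predicted values from intercept beta[0] and slopes beta[1:]."""
    return beta[0] + X @ beta[1:]

rng = np.random.default_rng(2)
n = 30
x1 = rng.uniform(0, 20, n)
x2 = rng.uniform(20, 70, n)
x3 = 80 - x2 + rng.normal(0, 2, n)   # hypothetical regressor, nearly collinear with x2
y = 50 + 1.5 * x1 + 0.6 * x2 + rng.normal(0, 2, n)

XA = np.column_stack([x1, x2])
XB = np.column_stack([x1, x2, x3])
beta_a = fit_ols(XA, y)
beta_b = fit_ols(XB, y)
print("model A (x1, x2):    ", beta_a.round(3))
print("model B (x1, x2, x3):", beta_b.round(3))

pred_a = predict(beta_a, XA)
pred_b = predict(beta_b, XB)
print("max |difference| in fitted values:", np.abs(pred_a - pred_b).max().round(3))
```

Because the two models are nested, the squared distance between their fitted-value vectors equals the drop in residual sum of squares, so in-sample predictions can agree closely even when individual coefficients shift. Predictions at points outside the region of the original data, where the collinearity no longer holds, may differ much more.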
Which model would you prefer?
Example 11.2 The Delivery Time Data
Compare the residual mean square to the average squared prediction error.
New data: Average squared prediction error
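For new data, the average squared prediction error is (1/m) Σ (y_i − ŷ_i)² over the m new observations, where each ŷ_i comes from the model fitted to the original data. A minimal sketch of the comparison with MS_Res, assuming simulated data (25 original and 15 new observations from the same process) and hypothetical helper names:

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y on X, with an intercept column prepended."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def residual_mean_square(X, y, beta):
    """MS_Res = SS_Res / (n - p), where p counts the intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    resid = y - X1 @ beta
    n, p = X1.shape
    return resid @ resid / (n - p)

def avg_squared_prediction_error(X_new, y_new, beta):
    """Mean of (y_i - yhat_i)^2 over new observations, using the original fit."""
    yhat = beta[0] + X_new @ beta[1:]
    return np.mean((y_new - yhat) ** 2)

rng = np.random.default_rng(3)

def simulate(n):
    """Hypothetical process generating n observations with two regressors."""
    X = rng.uniform(0, 30, size=(n, 2))
    y = 2.0 + 1.6 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 3, n)
    return X, y

X_old, y_old = simulate(25)
X_new, y_new = simulate(15)
beta = fit_ols(X_old, y_old)
ms_res = residual_mean_square(X_old, y_old, beta)
aspe = avg_squared_prediction_error(X_new, y_new, beta)
print(f"MS_Res = {ms_res:.2f}, average squared prediction error = {aspe:.2f}")
```

If the model predicts new data about as well as it fits the original data, the two quantities should be of similar magnitude; an average squared prediction error much larger than MS_Res would be a warning sign.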
How does this compare to the R² for prediction based on PRESS?
11.2 Validation Techniques
3. Data splitting (aka cross-validation)
   • Divide the data into two parts: estimation data and prediction data.
   • The PRESS statistic is an estimate of prediction performance based on data splitting: each observation in turn is withheld as the prediction data, and the model is fit to the remaining n − 1 observations.
   • We can also use PRESS to compute an R²-type statistic for prediction: $R^2_{\text{prediction}} = 1 - \text{PRESS}/SS_T$.
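PRESS does not require refitting the model n times: the ith leave-one-out prediction residual equals e_i/(1 − h_ii), where e_i is the ordinary residual and h_ii is the ith diagonal element of the hat matrix. A sketch that uses this identity to compute PRESS and R²_prediction = 1 − PRESS/SS_T, with simulated data and hypothetical function names:

```python
import numpy as np

def press_statistic(X, y):
    """PRESS = sum_i (e_i / (1 - h_ii))^2, with h_ii from H = X (X'X)^{-1} X'."""
    X1 = np.column_stack([np.ones(len(y)), X])
    H = X1 @ np.linalg.solve(X1.T @ X1, X1.T)   # hat matrix
    e = y - H @ y                               # ordinary residuals
    h = np.diag(H)
    return float(np.sum((e / (1 - h)) ** 2))

def r2_prediction(X, y):
    """R^2_prediction = 1 - PRESS / SS_T."""
    ss_t = float(np.sum((y - y.mean()) ** 2))
    return 1 - press_statistic(X, y) / ss_t

# Hypothetical data for illustration.
rng = np.random.default_rng(4)
X = rng.uniform(0, 10, size=(25, 2))
y = 3 + 2 * X[:, 0] + X[:, 1] + rng.normal(0, 2, 25)
print(f"PRESS = {press_statistic(X, y):.2f}, R2_prediction = {r2_prediction(X, y):.3f}")
```

Because each |e_i/(1 − h_ii)| is at least |e_i|, PRESS is never smaller than SS_Res, so R²_prediction never exceeds the ordinary R².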
11.2 Validation Techniques
3. Data splitting (aka cross-validation)
   • If the time sequence is known, data splitting can be done by time order (common in time series or forecasting).
   • The split can also be based on other characteristics of the data (e.g., are the data grouped by operator, machine, location?).
   • Double cross-validation
   • Drawbacks?
   • A more formal approach?
   • The DUPLEX algorithm
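The DUPLEX idea, as we understand its usual description, is to orthonormalize the regressors, seed the estimation and prediction sets with far-apart pairs of points, and then alternately add to each set the unassigned point that is farthest from it. The sketch below is one hypothetical reading of that procedure; the function name `duplex_split`, the centering and unit-length scaling, and "distance to a set" defined as distance to the nearest member are our assumptions, not a definitive implementation.

```python
import numpy as np

def duplex_split(X):
    """Hypothetical DUPLEX-style split of the rows of X (n x k regressors, no intercept).

    Returns (estimation_indices, prediction_indices).
    """
    n = X.shape[0]
    if n < 4:
        raise ValueError("need at least 4 observations")
    # Center, scale each column to unit length, then orthonormalize so W'W = I.
    Z = X - X.mean(axis=0)
    Z = Z / np.sqrt((Z ** 2).sum(axis=0))
    T = np.linalg.cholesky(Z.T @ Z)          # Z'Z = T T', T lower-triangular
    W = Z @ np.linalg.inv(T).T               # W = Z (T')^{-1}
    # Euclidean distance between every pair of points.
    D = np.linalg.norm(W[:, None, :] - W[None, :, :], axis=2)

    remaining = list(range(n))
    est, pred = [], []
    # Seed each set with the farthest-apart pair of still-unassigned points.
    for target in (est, pred):
        sub = D[np.ix_(remaining, remaining)]
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        pair = [remaining[i], remaining[j]]
        target.extend(pair)
        remaining = [r for r in remaining if r not in pair]
    # Alternate: add the unassigned point whose nearest set member is farthest away.
    turn = 0
    while remaining:
        target = est if turn % 2 == 0 else pred
        dist_to_set = D[np.ix_(remaining, target)].min(axis=1)
        pick = remaining[int(np.argmax(dist_to_set))]
        target.append(pick)
        remaining.remove(pick)
        turn += 1
    return est, pred

# Hypothetical regressor data.
rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(20, 2))
est, pred = duplex_split(X)
print(len(est), len(pred))   # 10 10
```

This sketch alternates until every point is assigned, so the two sets end up about equal in size; a smaller prediction set may be preferred in practice. The Cholesky step assumes the regressors are not exactly collinear.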
Example 11.3 The Delivery Time Data
A portion of Table 11.3, showing the prediction and estimation data determined with DUPLEX.
A portion of Table 11.4 is reproduced here.
Example 11.3 The Delivery Time Data