Anareg week 12: Remedial Measures II • Review regression diagnostics • Remedial measures • Validation
Regression Diagnostics: Recommendations • Check normality of the residuals with a normal quantile plot • Plot the residuals versus Ŷ, versus each of the X's, and (where appropriate) versus time • Examine the partial regression plots • If there appears to be a pattern, generate the graphics version of the plot with a smooth
Regression Diagnostics: Recommendations (2) • Examine: the studentized deleted residuals (RSTUDENT in the output); the hat matrix diagonals; DFFITS, Cook's D, and the DFBETAS • Check observations that are extreme on these measures relative to the other observations
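The measures listed on this slide can be computed by hand for a one-predictor model. The sketch below is illustrative only: the function name `influence_measures` and the sample data are assumptions, not part of the course material, and it uses the standard closed-form expressions for simple linear regression (p = 2 parameters).

```python
import math

def influence_measures(x, y):
    """Return (leverage, studentized deleted residual, Cook's D) per case
    for a simple linear regression of y on x (hypothetical helper)."""
    n = len(x)
    p = 2  # intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    mse = sse / (n - p)
    rows = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx  # hat matrix diagonal
        # studentized deleted residual (what SAS labels RSTUDENT)
        t = e * math.sqrt((n - p - 1) / (sse * (1 - h) - e * e))
        d = e * e * h / (p * mse * (1 - h) ** 2)  # Cook's distance
        rows.append((h, t, d))
    return rows

# Made-up data: the last case is far from the other X values.
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 30.0]
for h, t, d in influence_measures(x, y):
    print(round(h, 3), round(t, 3), round(d, 3))
```

The hat matrix diagonals always sum to p, so a case whose leverage is several times p/n stands out relative to the others.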
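Regression Diagnostics: Recommendations (3) • $\mathrm{TOL}_k = 1 - R_k^2 = 1/\mathrm{VIF}_k$, where $R_k^2$ is the coefficient of determination from regressing $X_k$ on the other X's • Examine the tolerance for each X • If there are variables with low tolerance, you need to do some model building: recode variables, or do variable selection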
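As a small worked case of the tolerance formula, assume a model with exactly two predictors. Then $R_k^2$ is just the squared correlation between them, so both predictors share the same tolerance. The function names here are hypothetical; with more than two predictors each $X_k$ would have to be regressed on all the others.

```python
import math

def pearson_r(u, v):
    """Sample correlation between two equal-length sequences."""
    n = len(u)
    ubar = sum(u) / n
    vbar = sum(v) / n
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    svv = sum((b - vbar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def tolerance_and_vif(x1, x2):
    """Tolerance and VIF when the model has only the two predictors x1, x2."""
    r = pearson_r(x1, x2)
    tol = 1.0 - r * r  # TOL = 1 - R_k^2
    return tol, 1.0 / tol  # VIF = 1 / TOL

tol, vif = tolerance_and_vif([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(tol, vif)
```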
Remedial measures • Weighted least squares • Ridge regression • Robust regression • Nonparametric regression • Bootstrapping
Weighted least squares • The weighted least squares problem is to minimize $\sum_i w_i e_i^2$, the sum of $w_i$ times the squared residual for case i • Computations are easy: use the weight statement in proc reg • The problem is to determine the weights
Weighted least squares (2) • $b_w = (X'WX)^{-1}(X'WY)$ • where W is a diagonal matrix with the weights • Usually we require that the sum of the weights is n, but some software will automatically take care of this
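For a single predictor, $b_w = (X'WX)^{-1}(X'WY)$ reduces to ordinary formulas with weighted means in place of plain means. The sketch below assumes that one-predictor case; `wls_simple` and `normalize_weights` are illustrative names, not functions from any package.

```python
def wls_simple(x, y, w):
    """Weighted least squares for one predictor; returns (b0, b1)."""
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = yw - b1 * xw
    return b0, b1

def normalize_weights(w):
    """Rescale the weights so that they sum to n."""
    n = len(w)
    total = sum(w)
    return [wi * n / total for wi in w]

x = [1, 2, 3, 4, 5]
y = [2.2, 3.9, 6.5, 7.6, 10.9]
w = [4, 3, 2, 1, 0.5]
print(wls_simple(x, y, w))
print(wls_simple(x, y, normalize_weights(w)))  # same coefficients
```

Multiplying every weight by the same constant leaves the coefficients unchanged, which is why rescaling the weights to sum to n does not affect the estimates themselves.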
Determination of weights • Find a relationship between the absolute residual and another variable and use this as a model for the standard deviation • Similarly for the squared residual and the variance • Or use grouped data, or approximately grouped data, to estimate the variance
Determination of weights (2) • With a model for the standard deviation or the variance, we can approximate the optimal weights • Optimal weights are proportional to the inverse of the variance
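One way to turn this recipe into code, assuming the standard deviation is modeled as a straight-line function of the predictor: fit ordinary least squares, regress the absolute residuals on X, and take the inverse square of the fitted values as weights. The function names and the toy data are assumptions made for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (b0, b1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def estimate_weights(x, y):
    """Weights 1 / s_hat^2, where s_hat is the fitted value from
    regressing |residual| on x (assumed linear model for the sd)."""
    b0, b1 = fit_line(x, y)
    abs_resid = [abs(yi - (b0 + b1 * xi)) for xi, yi in zip(x, y)]
    a0, a1 = fit_line(x, abs_resid)
    weights = []
    for xi in x:
        s_hat = a0 + a1 * xi  # predicted standard deviation
        if s_hat <= 0:
            raise ValueError("predicted standard deviation is not positive")
        weights.append(1.0 / s_hat ** 2)
    return weights

# Toy data whose spread grows with x.
x = list(range(1, 11))
y = [2 * xi + (0.5 * xi if i % 2 == 0 else -0.5 * xi) for i, xi in enumerate(x)]
print(estimate_weights(x, y))
```

Cases where the predicted standard deviation is large receive small weights, so the noisier part of the data counts for less in the weighted fit.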
NKNW Example • NKNW p 421 • Y is diastolic blood pressure • X is age • n = 54 healthy adult women aged 20 to 60 years
Procedures • Get the data and check it • Plot the relationship • Run the regression • Diagnose the residuals
Regression output
Running the regression gives the following output:
Source   DF   F Value   Pr > F
Model     1     35.79   <.0001
Error    52
Total    53
Regression output (2)
Root MSE   8.14    R-Square   0.40
Dep Mean   79.1    Adj R-Sq   0.39
Coef Var   10.2
Regression output (3)
Var   DF   Par Est   St Err   t       P
Int    1   56.1      3.9      14.06   <.0001
age    1   0.58      .09       5.98   <.0001
Diagnosing the residuals • Use the output data set to get the absolute and squared residuals • Plot them with a smooth • Predict the standard deviation (from the absolute value of the residual)
Predict the standard deviation (from the absolute value of the residual) and compute the weight (the inverse of the squared predicted standard deviation)
Output
Regressing diastolic blood pressure on age with the weights gives:
Source   DF   F       P
Model     1   56.64   <.0001
Error    52
Total    53
Output (2)
Root MSE         1.21302    R-Square   0.5214
Dependent Mean   73.55134   Adj R-Sq   0.5122
Coeff Var        1.64921
Output (3)
Var   Par Est   St Err   t       P
Int   55.5      2.5      22.04   <.0001
age   0.59      0.07      7.53   <.0001
Ridge regression • If $(X'X)$ is difficult to invert (near singular), then approximate by inverting $(X'X + cI)$ plus additional terms in a series; c is a small positive constant • Interesting, but has not turned out to be a useful method in practice • ridge = c is an option for the model statement
Ridge regression (2) • Ridge estimator • For OLS, the normal equations are $(X'X)b = X'Y$ • With each variable standardized, we work with the transformed regression model, whose least squares normal equations are $r_{XX} b = r_{YX}$, where $r_{XX}$ is the correlation matrix of the X's and $r_{YX}$ is the vector of correlations between Y and each X • The ridge standardized regression estimators are obtained by introducing a biasing constant $c > 0$ into the least squares normal equations: $(r_{XX} + cI) b^R = r_{YX}$
Ridge regression (3) • Solving the normal equations yields the ridge standardized regression coefficients $b^R = (r_{XX} + cI)^{-1} r_{YX}$ • where the constant c reflects the amount of bias in the estimators
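To see how the biasing constant acts, here is a minimal sketch of $b^R = (r_{XX} + cI)^{-1} r_{YX}$ for the special case of two predictors, where the 2 × 2 inverse can be written out directly. The function name and the correlation values are made up for illustration.

```python
def ridge_two_predictors(r12, r1y, r2y, c):
    """Ridge standardized coefficients for two predictors.

    r12: correlation between X1 and X2
    r1y, r2y: correlations of Y with X1 and X2
    c: biasing constant (c = 0 gives ordinary least squares)
    """
    d = 1.0 + c  # diagonal entry of r_XX + cI
    det = d * d - r12 * r12
    b1 = (d * r1y - r12 * r2y) / det
    b2 = (d * r2y - r12 * r1y) / det
    return b1, b2

# Hypothetical correlations with strongly related predictors.
for c in (0.0, 0.05, 0.1, 0.5):
    print(c, ridge_two_predictors(0.8, 0.9, 0.8, c))
```

Printing the coefficients over a grid of c values gives the raw numbers for a ridge trace; c = 0 reproduces the ordinary least squares solution of the standardized model.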
Robust regression • Basic idea is to have a procedure that is not sensitive to outliers • Alternatives to least squares: minimize the sum of absolute values of the residuals, or the median of the squares of the residuals • Do weighted regression with weights based on residuals, and iterate
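The last bullet (weighted regression with residual-based weights, iterated) can be sketched as follows. The slide does not name a weight function, so the Huber weights, the tuning constant 1.345, and the MAD-based scale estimate used here are assumptions, as are the function names and the toy data.

```python
import statistics

def wls_line(x, y, w):
    """Weighted least squares for one predictor; returns (b0, b1)."""
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return yw - b1 * xw, b1

def huber_irls(x, y, k=1.345, n_iter=20):
    """Iteratively reweighted least squares with Huber weights (assumed choice)."""
    b0, b1 = wls_line(x, y, [1.0] * len(x))  # start from ordinary least squares
    for _ in range(n_iter):
        resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        med = statistics.median(resid)
        # robust scale estimate: median absolute deviation / 0.6745
        scale = statistics.median([abs(e - med) for e in resid]) / 0.6745
        if scale == 0:
            break  # the bulk of the data is fitted exactly
        w = [1.0 if abs(e) <= k * scale else k * scale / abs(e) for e in resid]
        b0, b1 = wls_line(x, y, w)
    return b0, b1

# Toy data on the line y = 1 + 2x with one gross outlier.
x = list(range(10))
y = [1 + 2 * xi for xi in x]
y[9] += 30
print(wls_line(x, y, [1.0] * 10))  # ordinary least squares, pulled by the outlier
print(huber_irls(x, y))
```

Cases with large residuals get weights below 1 at each pass, so the outlier's pull on the fitted line shrinks as the iterations proceed.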
Bootstrap • Very important theoretical development that will have a major impact on applied statistics • Based on simulation • Sample with replacement from the data or residuals and get the distribution of the quantity of interest • CI based on quantiles of the sampling distribution
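A minimal sketch of the procedure, assuming the quantity of interest is the slope of a one-predictor regression and that whole cases (rather than residuals) are resampled. The function names, the number of resamples, and the data are illustrative choices.

```python
import random

def slope(x, y):
    """Least squares slope, or None if all x values coincide."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    if sxx == 0:
        return None
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx

def bootstrap_slope_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval for the slope from resampled cases."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        b1 = slope([x[i] for i in idx], [y[i] for i in idx])
        if b1 is not None:  # skip degenerate resamples
            slopes.append(b1)
    slopes.sort()
    lo = slopes[int((alpha / 2) * n_boot)]
    hi = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

x = list(range(1, 13))
noise = [0.3, -0.4, 0.1, 0.5, -0.2, -0.5, 0.4, 0.0, -0.3, 0.2, -0.1, 0.3]
y = [2 * xi + e for xi, e in zip(x, noise)]
print(bootstrap_slope_ci(x, y))
```

Resampling residuals instead of cases would keep the X values fixed; which version is appropriate depends on whether the X's are treated as fixed or random.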
Model validation • Three approaches to checking the validity of the model • Collect new data: does it fit the model? • Compare with theory, other data, simulation • Use some of the data for the basic analysis and some for the validity check
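The third approach (data splitting) might look like the sketch below: fit on one part of the data, then measure how well the fitted line predicts the held-out part. The mean squared prediction error used here is one common criterion and is an assumption, as are the function names and the data.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (b0, b1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def mean_squared_prediction_error(b0, b1, x_new, y_new):
    """Average squared error when predicting the held-out cases."""
    errors = [yi - (b0 + b1 * xi) for xi, yi in zip(x_new, y_new)]
    return sum(e * e for e in errors) / len(errors)

# Hypothetical split: first cases for model building, the rest for validation.
x_train, y_train = [1, 2, 3, 4, 5, 6], [3.1, 4.8, 7.2, 9.1, 10.8, 13.2]
x_valid, y_valid = [7, 8, 9], [15.1, 16.7, 19.2]

b0, b1 = fit_line(x_train, y_train)
print(mean_squared_prediction_error(b0, b1, x_valid, y_valid))
```

If the prediction error on the held-out cases is close to the error mean square of the training fit, that supports the model; a much larger value suggests the model does not generalize well.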
Last slide • Read NKNW Chapter 11