Topic 23: Diagnostics and Remedies


Outline • Diagnostics • residual checks • ANOVA remedial measures

Diagnostics Overview • We will take the diagnostics and remedial measures that we learned for regression and adapt them to the ANOVA setting • Many things are essentially the same • Some things require modification

Residuals • Predicted values are cell means, $\hat{Y}_{ij} = \bar{Y}_{i\cdot}$ • Residuals are the differences between the observed values and the cell means, $e_{ij} = Y_{ij} - \bar{Y}_{i\cdot}$
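A minimal Python sketch of these two definitions, for illustration only (the slides themselves use SAS). The function name `cell_residuals` and the toy data are hypothetical, not from the KNNL examples.

```python
from statistics import mean

def cell_residuals(groups):
    """Return (fitted, residuals) for a one-way ANOVA.

    groups: dict mapping factor level -> list of observations Y_ij.
    Every observation in level i gets the cell mean as its fitted value.
    """
    fitted, resid = {}, {}
    for level, ys in groups.items():
        ybar = mean(ys)                          # cell mean, Ybar_i.
        fitted[level] = [ybar] * len(ys)         # Yhat_ij = Ybar_i.
        resid[level] = [y - ybar for y in ys]    # e_ij = Y_ij - Ybar_i.
    return fitted, resid

# Hypothetical toy data (two levels, three observations each)
toy = {"A": [4.0, 6.0, 5.0], "B": [10.0, 12.0, 8.0]}
fitted, resid = cell_residuals(toy)
print(resid)  # {'A': [-1.0, 1.0, 0.0], 'B': [0.0, 2.0, -2.0]}
```

Within each level the residuals sum to zero, which is why plotting them against the factor shows spread but no shift in location.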

Basic plots • Plot the data vs the factor levels (the values of the explanatory variables) • Plot the residuals vs the factor levels • Construct a normal quantile plot and/or histogram of the residuals

KNNL Example • KNNL p 777 • Compare 4 brands of rust inhibitor (X has r=4 levels) • Response variable is a measure of the effectiveness of the inhibitor • There are 10 units per brand (n=10)

Plots • Data versus the factor • Residuals versus the factor • Normal quantile plot of the residuals

Plots vs the factor
symbol1 v=circle i=none;
proc gplot data=a2;
  plot (eff resid)*abrand;
run;

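Data vs the factor: means look different, but the spread in the Y's looks common across levels

Residuals vs the factor: odd distribution of points

QQ-plot: the odd pattern is due to the unusual spread of the residuals (both a lack of spread and a large spread) • Can try a nonparametric analysis (last slides)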

General Summary • Look for • Outliers • Variance that depends on level • Non-normal errors • Plot residuals vs time and other variables if available

Homogeneity tests • Homogeneity of variance (homoscedasticity) • $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_r^2$ • $H_1$: not all $\sigma_i^2$ are equal • Several significance tests are available

Homogeneity tests • The text discusses Hartley's test and the modified Levene test • SAS has several, including Bartlett's (essentially the likelihood ratio test) and several versions of Levene's

Homogeneity tests • The tests themselves rely on assumptions • ANOVA is robust with respect to moderate deviations from Normality • ANOVA results can be sensitive to the homogeneity of variance assumption • Some homogeneity tests are sensitive to the Normality assumption

Levene’s Test • Do ANOVA on the squared residuals from the original ANOVA • Modified Levene’s test uses absolute values of the residuals • Modified Levene’s test is recommended • Another quick and dirty rule of thumb
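The idea of Levene's test is simply "run a one-way ANOVA on a function of the residuals." A minimal Python sketch of the absolute-residual version, assuming residuals are taken from the level means; the function names are hypothetical. SAS's `hovtest=levene(type=abs)` is presumably computed the same way, but check the PROC GLM documentation for the exact definition.

```python
from statistics import mean

def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups (lists of numbers)."""
    all_obs = [y for g in groups for y in g]
    grand = mean(all_obs)
    r, n_total = len(groups), len(all_obs)
    ssr = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)  # between levels
    sse = sum((y - mean(g)) ** 2 for g in groups for y in g)      # within levels
    return (ssr / (r - 1)) / (sse / (n_total - r))

def levene_abs(groups):
    """Levene's test statistic: ANOVA F on |Y_ij - Ybar_i.|.

    Squaring the deviations instead of taking absolute values
    gives the original (squared-residual) Levene test.
    """
    abs_resid = [[abs(y - mean(g)) for y in g] for g in groups]
    return one_way_f(abs_resid)
```

For example, `levene_abs([[1, 2, 3], [11, 12, 13]])` is 0 (same spread, different means), while `levene_abs([[1, 2, 3], [10, 20, 30]])` is about 3.21. The statistic is compared with an F distribution on $r-1$ and $n_T-r$ degrees of freedom.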

KNNL Example • KNNL p 785 • Compare the strengths of 5 types of solder flux (X has r=5 levels) • Response variable is the pull strength, force in pounds required to break the joint • There are 8 solder joints per flux (n=8)

Scatterplot

Levene's Test
proc glm data=a1;
  class type;
  model strength=type;
  means type / hovtest=levene(type=abs);
run;

ANOVA Table: common variance (MSE) estimated to be 2.11

Output
Levene's Test: ANOVA of Absolute Deviations
Source   DF   F Value   Pr > F
type      4      3.07   0.0288
Error    35
We reject the null hypothesis and conclude that the variance is not constant.

Means and SDs of strength
type   N   Mean    Std Dev
1      8   15.42   1.23
2      8   18.52   1.25
3      8   15.00   2.48
4      8    9.74   0.81
5      8   12.34   0.76
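Two quick checks can be computed from the summary table alone: the pooled variance (with equal $n$ it is just the average of the $s_i^2$) and Hartley's ratio of largest to smallest sample variance. A minimal Python sketch using the SDs shown above; the variable names are hypothetical.

```python
# Summary statistics copied from the Means and SDs table (n = 8 per flux type)
sds = {1: 1.23, 2: 1.25, 3: 2.48, 4: 0.81, 5: 0.76}
n = 8

variances = {t: s ** 2 for t, s in sds.items()}

# Pooled variance (MSE): sum of (n_i - 1) s_i^2 over sum of (n_i - 1)
mse = sum((n - 1) * v for v in variances.values()) / (len(sds) * (n - 1))

# Hartley's statistic: largest sample variance over smallest
hartley_h = max(variances.values()) / min(variances.values())

print(round(mse, 2), round(hartley_h, 1))  # 2.09 10.6
```

The pooled value of about 2.09 is close to the 2.11 in the ANOVA table; the small gap is presumably due to the SDs being rounded to two decimals. Hartley's ratio would be compared with the tabled critical value for $r=5$ and $n-1=7$.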

Remedies • Delete outliers • Is their removal important? • Use weights (weighted regression) • Transformations • Nonparametric procedures

What to do here? • No obvious outliers • No pattern of increasing or decreasing variance, and no skewed distributions • Will consider • Weighted ANOVA • Mixed model ANOVA

Weighted least squares • We used this with regression • Obtain model for how the sd depends on the explanatory variable (plotted absolute value of residual vs x) • Then used weights inversely proportional to the estimated variance

Weighted Least Squares • Here we can compute the variance for each level • Use these as weights in PROC GLM • We will illustrate with the soldering example from KNNL

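Obtain the variances and weights (the BY statement assumes a1 is sorted by type)
proc means data=a1;
  var strength;
  by type;
  output out=a2 var=s2;  * s2 = sample variance within each type;
run;
data a2; set a2;
  wt = 1/s2;
run;
Note: data set a2 has 5 cases (one per type).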

Proc Means Output

Merge and then use the weights in PROC GLM
data a3;
  merge a1 a2;
  by type;
proc glm data=a3;
  class type;
  model strength=type;
  weight wt;
  lsmeans type / cl;
run;

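Output: because each observation is weighted by $1/s_i^2$, the weighted data have been standardized to have a variance of 1 (the weighted MSE equals 1)

LSMEANS Output: because of the weights, the standard errors are simply based on the sample variance of each level, $s_i/\sqrt{n_i}$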
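Why the weighted MSE is exactly 1: the weighted SSE is $\sum_i (1/s_i^2)(n_i-1)s_i^2 = n_T - r$. A minimal Python sketch that checks this algebra on raw data; the function name and toy data are hypothetical, and this is not how PROC GLM is implemented internally.

```python
from math import sqrt
from statistics import mean, variance

def weighted_anova_summary(groups):
    """Weighted one-way ANOVA with w_i = 1 / s_i^2 for every obs in level i.

    Returns the weighted MSE and the standard error of each level mean.
    Since the weight is constant within a level, the weighted level mean
    equals the ordinary sample mean.
    """
    n_total = sum(len(g) for g in groups)
    r = len(groups)
    sse_w = 0.0
    for g in groups:
        w = 1.0 / variance(g)                        # weight for this level
        ybar = mean(g)
        sse_w += sum(w * (y - ybar) ** 2 for y in g)
    mse_w = sse_w / (n_total - r)                    # equals 1 up to rounding
    # SE of level mean = sqrt(MSE / sum of weights in the level)
    ses = [sqrt(mse_w / (len(g) / variance(g))) for g in groups]
    return mse_w, ses

# Hypothetical toy data with very different spreads
toy = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0, 40.0]]
mse_w, ses = weighted_anova_summary(toy)
```

Here `mse_w` is 1 and each entry of `ses` equals $s_i/\sqrt{n_i}$ for its level, matching the LSMEANS behavior described above.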

Mixed Model ANOVA • Relax the assumption of constant variance rather than including a “known” weight • This involves moving to a mixed model procedure • Topic will not be on exam but wanted you to be aware of these model capabilities

SAS Code
proc glimmix data=a1;
  class type;
  model strength=type / ddfm=kr;
  random residual / group=type;
run;
This allows the residual variance to differ for each level of type; ddfm=kr requests the Kenward-Roger degrees of freedom adjustment to account for this.

GLIMMIX Output: there appear to be really only 3 groups of variances

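SAS Code
data a1; set a1;  * hypothetical coding of the 3 variance groups;
  if type in (1,2) then type1=1;
  else if type=3 then type1=2;
  else type1=3;
proc glimmix data=a1;
  class type;
  model strength=type / ddfm=kr;
  random residual / group=type1;
run;
Type1 was created to identify Types 1 and 2, Type 3, and Types 4 and 5 as 3 groups.

GLIMMIX Output: better (smaller) BIC, but the same general conclusion about type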

Transformation Guides • When $\sigma_i^2$ is proportional to $\mu_i$, use $\sqrt{Y}$ • When $\sigma_i$ is proportional to $\mu_i$, use $\log(Y)$ • When $\sigma_i$ is proportional to $\mu_i^2$, use $1/Y$ • For proportions, use $\arcsin(\sqrt{Y})$ • Use arsin(sqrt(y)) in a SAS data step • Box-Cox transformation
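To see why $\log(Y)$ works when $\sigma_i \propto \mu_i$: if level $i$ is a rescaled copy $c_i X$ of a common pattern, then $\log(c_i X) = \log c_i + \log X$, so only the location shifts. A minimal Python sketch with hypothetical, constructed data (not from KNNL); the helper name `sd_by_level` is also hypothetical.

```python
from math import log
from statistics import stdev

# Hypothetical data: each level is the same pattern scaled by c,
# so the SD is exactly proportional to the mean.
base = [0.8, 0.9, 1.0, 1.1, 1.2]
levels = {c: [c * x for x in base] for c in (1, 10, 100)}

def sd_by_level(levels, transform=lambda y: y):
    """Sample SD of the (optionally transformed) data in each level."""
    return {k: stdev([transform(y) for y in ys]) for k, ys in levels.items()}

raw_sd = sd_by_level(levels)        # grows in the ratio 1 : 10 : 100
log_sd = sd_by_level(levels, log)   # the same for every level
```

The same check with `math.sqrt` as the transform would leave the spread unequal here, which is the point of matching the transformation to how $\sigma_i$ depends on $\mu_i$.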

Example • Consider study on KNNL pg 790 • Y: time between computer failures • X: three locations
data a3;
  infile 'u:\.www\datasets512\CH18TA05.txt';
  input time location interval;
symbol1 v=circle;
proc gplot;
  plot time*location;
run;

Scatterplot: an outlier, or a skewed distribution? Can consider a transformation first

Box-Cox Transformation • Can regress $\log s_i$ on $\log \bar{Y}_{i\cdot}$; if $b_1$ is the slope, $1 - b_1$ is the power to raise Y by • Can try various "convenient" powers • Can use SAS to calculate the power directly

$\hat{E}(\log \sigma_i) = 0.90 + 0.79 \log \mu_i$, so the power should be $1 - 0.79 = 0.21 \approx 0.20$
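The regression step can be sketched in a few lines of Python: compute each level's mean and SD, regress $\log s_i$ on $\log \bar{Y}_{i\cdot}$, and report $1 - b_1$. The function name and the groups below are hypothetical, constructed so that $s_i \propto \sqrt{\bar{Y}_{i\cdot}}$; the base of the logarithm does not affect the slope.

```python
from math import log
from statistics import mean, stdev

def power_from_sd_mean(groups):
    """Regress log(s_i) on log(Ybar_i) across levels; return (b0, b1, 1 - b1).

    Assumes every level has a positive mean and at least two observations.
    """
    x = [log(mean(g)) for g in groups]
    y = [log(stdev(g)) for g in groups]
    xbar, ybar = mean(x), mean(y)
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    b0 = ybar - b1 * xbar
    return b0, b1, 1 - b1

# Hypothetical groups: means 4, 16, 64 with SDs 2, 4, 8 (SD = sqrt(mean))
groups = [[2, 4, 6], [12, 16, 20], [56, 64, 72]]
b0, b1, power = power_from_sd_mean(groups)
```

Here the slope is 0.5, so the suggested power is 0.5, the square root transformation from the guide (variance proportional to the mean).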

Using SAS
proc transreg data=a3;
  model boxcox(time / lambda=-2 to 2 by .2) = class(location);
run;

Output

Transforming data in SAS
data a3; set a3;
  transtime = time**0.20;
symbol1 v=circle i=none;
proc gplot;
  plot transtime*location;
run;

Much more constant spread in data!

Nonparametric approach • Based on ranks • See KNNL section 18.7, p 795 • See the SAS procedure NPAR1WAY
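One common rank-based statistic for comparing $r$ levels is Kruskal-Wallis: rank all observations together (ties get the average rank) and compare the rank sums across levels. A minimal Python sketch without the tie correction; the function names are hypothetical, and this is not presented as the exact computation in KNNL 18.7 or NPAR1WAY.

```python
def midranks(values):
    """Ranks 1..N, with tied values given the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j -> average 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H (no tie correction) from pooled midranks."""
    pooled = [y for g in groups for y in g]
    ranks = midranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        r_sum = sum(ranks[start:start + len(g)])  # rank sum for this level
        total += r_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
```

For instance, `kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` gives 7.2. For moderate samples, H is compared with a chi-square distribution on $r-1$ degrees of freedom.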

Rust Inhibitor Analysis Highly significant F test. Even if there is a violation of Normality, the evidence is overwhelming

Nonparametric Analysis

Last slide • We’ve finished most of Chapters 17 and 18. • We used program topic23.sas to generate the output.
