Topic 19: Remedies

Outline
• Review regression diagnostics
• Remedial measures
• Weighted regression
• Ridge regression
• Robust regression
• Bootstrapping

Regression Diagnostics Summary
• Check normality of the residuals with a normal quantile plot
• Plot the residuals versus predicted values, versus each of the X’s and (when appropriate) versus time
• Examine the partial regression plots
• Use the graphics smoother to see if there appears to be a curvilinear pattern

Regression Diagnostics Summary
• Examine
  • The studentized deleted residuals (RSTUDENT in the output)
  • The hat matrix diagonals
  • DFFITS, Cook’s D, and the DFBETAS
• Check observations that are extreme on these measures relative to the other observations

Regression Diagnostics Summary
• Examine the tolerance for each X
• If there are variables with low tolerance, you need to do some model building
  • Recode variables
  • Variable selection

Remedial measures
• Weighted least squares
• Ridge regression
• Robust regression
• Nonparametric regression
• Bootstrapping

Maximum Likelihood

Weighted regression
• Maximization of L with respect to the β’s is equivalent to minimization of the weighted sum of squared residuals, Σ w_i (Y_i − X_i΄β)², where X_i΄ is row i of X
• Weight of each observation: w_i = 1/σ_i²
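The equivalence on this slide can be written out as follows. This is a sketch under the usual assumption (not spelled out in the surviving slide text) that the errors are independent and normal with known, case-specific variances; the row-vector notation X_i' is introduced here only for compactness.

```latex
\begin{align*}
Y_i &= X_i'\beta + \varepsilon_i, \qquad
  \varepsilon_i \sim N(0, \sigma_i^2) \ \text{independent} \\
L(\beta) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma_i^2}}
  \exp\!\left( -\frac{(Y_i - X_i'\beta)^2}{2\sigma_i^2} \right) \\
\log L(\beta) &= -\frac{1}{2}\sum_{i=1}^{n} \log(2\pi\sigma_i^2)
  \;-\; \frac{1}{2}\sum_{i=1}^{n} w_i \,(Y_i - X_i'\beta)^2,
  \qquad w_i = \frac{1}{\sigma_i^2} \\
\text{so}\quad \max_{\beta} L(\beta)
  &\iff \min_{\beta} Q_w(\beta), \qquad
  Q_w(\beta) = \sum_{i=1}^{n} w_i \,(Y_i - X_i'\beta)^2 \\
\frac{\partial Q_w}{\partial \beta} = 0
  &\iff X'WX\, b_w = X'WY
  \iff b_w = (X'WX)^{-1} X'WY, \qquad W = \operatorname{diag}(w_1,\dots,w_n)
\end{align*}
```

Only the second term of log L depends on β, which is why the weights drop out as 1/σ_i² and why multiplying all weights by a constant leaves b_w unchanged.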

Weighted least squares
• Least squares problem is to minimize the sum of w_i times the squared residual for case i
• Computations are easy: use the weight statement in proc reg
• b_w = (X΄WX)⁻¹(X΄WY), where W is a diagonal matrix of the weights
• The problem now becomes determining the weights

Determination of weights
• Find a relationship between the absolute residual and another variable and use this as a model for the standard deviation
• Similarly, find a relationship between the squared residual and another variable and use this as a model for the variance
• Use grouped data or approximately grouped data to estimate the variance

Determination of weights
• With a model for the standard deviation or the variance, we can approximate the optimal weights
• Optimal weights are proportional to the inverse of the variance

KNNL Example
• KNNL p. 427
• Y is diastolic blood pressure
• X is age
• n = 54 healthy adult women aged 20 to 60 years old

Get the data and check it
data a1;
  infile '../data/ch11ta01.txt';
  input age diast;
proc print data=a1;
run;

Plot the relationship
symbol1 v=circle i=sm70;
proc gplot data=a1;
  plot diast*age / frame;
run;

Diastolic bp vs age
• Strong linear relationship but non-constant variance

Run the regression
proc reg data=a1;
  model diast=age;
  output out=a2 r=resid;
run;

Regression output

Regression output
• Estimators are still unbiased but no longer have minimum variance
• Prediction interval coverage is often lower or higher than the nominal 95%

Use the output data set to get the absolute and squared residuals
data a2;
  set a2;
  absr=abs(resid);
  sqrr=resid*resid;
run;

Do the plots with a smooth
proc gplot data=a2;
  plot (resid absr sqrr)*age;
run;

Absolute value of the residuals vs age

Squared residuals vs age

Model the std dev vs age (absolute value of the residual)
proc reg data=a2;
  model absr=age;
  output out=a3 p=shat;
run;
Note that a3 has the predicted standard deviations (shat)

Compute the weights
data a3;
  set a3;
  wt=1/(shat*shat);
run;
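The earlier "Determination of weights" slide also mentions modeling the squared residual to get a variance function directly. A minimal sketch of that variant, reusing a2 and sqrr from the steps above; the names a3v, vhat and wtv are hypothetical and not from the course program:

```sas
/* Variance-function alternative: regress the squared residuals on age */
proc reg data=a2;
  model sqrr=age;
  output out=a3v p=vhat;   /* vhat = predicted variance at each age */
run;

data a3v;
  set a3v;
  /* A fitted variance from a straight line can be zero or negative;
     in that case the weight is left missing and the case is dropped
     from any later weighted fit */
  if vhat > 0 then wtv=1/vhat;
run;
```

The weights from the two approaches will generally differ somewhat; which variance model is more appropriate depends on which residual plot looks closer to linear in age.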

Regression with weights
proc reg data=a3;
  model diast=age / clb;
  weight wt;
run;
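As a cross-check, the matrix formula b_w = (X΄WX)⁻¹(X΄WY) from the weighted least squares slide can be evaluated directly. This is a sketch that assumes SAS/IML is available and that a3 contains age, diast and wt as created above; the matrix names X, W and bw are chosen here for illustration.

```sas
proc iml;
  use a3;
  read all var {age diast wt};          /* column vectors age, diast, wt */
  close a3;
  X  = j(nrow(age), 1, 1) || age;       /* design matrix: intercept column and age */
  W  = diag(wt);                        /* diagonal matrix of the weights */
  bw = solve(t(X)*W*X, t(X)*W*diast);   /* solve the weighted normal equations */
  print bw;
quit;
```

The two entries of bw should agree with the intercept and age estimates reported by proc reg with the weight statement.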

Output

Output
• Reduction in the standard error of the age coefficient
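The third option from the "Determination of weights" slides, estimating the variance from approximately grouped data, could look like the sketch below. The 10-year age bands, and all of the names a2g, agegrp, grpvar, s2, a3g and wtg, are illustrative assumptions and not part of the course program.

```sas
/* Approximate grouping: 10-year age bands (20s, 30s, 40s, 50 and over) */
data a2g;
  set a2;
  agegrp = min(10*floor(age/10), 50);
run;

/* Sample variance of the OLS residuals within each band */
proc means data=a2g noprint nway;
  class agegrp;
  var resid;
  output out=grpvar var=s2;
run;

proc sort data=a2g;
  by agegrp;
run;

/* Attach the band variance to every case and invert it to get the weight */
data a3g;
  merge a2g grpvar(keep=agegrp s2);
  by agegrp;
  if s2 > 0 then wtg = 1/s2;
run;

proc reg data=a3g;
  model diast=age / clb;
  weight wtg;
run;
```

Wider bands give more stable variance estimates but a cruder picture of how the variance changes with age, so the band width is a judgment call.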

Ridge regression
• Similar to a very old idea in numerical analysis
• If (X΄X) is difficult to invert (near singular), then approximate by inverting (X΄X + kI)
• Estimators of the coefficients are biased but more stable
• For some value of k, the ridge regression estimator has a smaller mean square error than the ordinary least squares estimator
• Can be used to reduce the number of predictors
• ridge=k is an option for the model statement
• Cross-validation can be used to determine k
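A minimal sketch of the ridge option in proc reg. The blood pressure example has a single predictor, so this uses a hypothetical data set mydata with response y and correlated predictors x1, x2 and x3; the grid of k values is also just an illustrative choice.

```sas
/* OUTEST= is needed to store the ridge estimates;
   OUTVIF adds the variance inflation factors for each k */
proc reg data=mydata outest=ridge_est outvif;
  model y = x1 x2 x3 / ridge=0 to 0.10 by 0.01;
run;
quit;

/* One row per value of k (_RIDGE_) for the estimates and for the VIFs */
proc print data=ridge_est;
  var _MODEL_ _TYPE_ _RIDGE_ Intercept x1 x2 x3;
run;
```

Reading down the printed rows shows how the coefficients and VIFs change as k increases; a common informal rule is to pick the smallest k at which they have stabilized.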

Robust regression
• Basic idea is to have a procedure that is not sensitive to outliers
• Alternatives to least squares: minimize
  • The sum of the absolute values of the residuals
  • The median of the squares of the residuals
• Do weighted regression with weights based on the residuals, and iterate
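Two of these ideas are available as standard SAS/STAT procedures. The sketch below applies them to the blood pressure data a1; the choice of Huber weights is an assumption made for illustration, and the least-median-of-squares criterion is not shown.

```sas
/* M estimation: weighted regression with weights based on the residuals,
   iterated until convergence (iteratively reweighted least squares) */
proc robustreg data=a1 method=m(wf=huber);
  model diast=age;
run;

/* Median regression: minimizes the sum of absolute residuals */
proc quantreg data=a1;
  model diast=age / quantile=0.5;
run;
```

Comparing these slopes with the ordinary least squares slope gives a quick indication of how much a few extreme cases are driving the fit.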

Nonparametric regression
• Several versions
• We have used i=sm70
• Interesting theory
• All versions have some smoothing or penalty parameter similar to the 70 in i=sm70
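One of the other versions is local regression (loess). A minimal sketch on the blood pressure data follows; note that the smooth= values here are the fraction of the data used in each local fit, which is a different scale from the 70 in i=sm70 (a spline smoother), and the three values are arbitrary illustrative choices.

```sas
/* Fit loess curves of diast on age for three smoothing parameters */
proc loess data=a1;
  model diast=age / smooth=0.3 0.5 0.7;
run;
```

Smaller values follow the data more closely and larger values give a smoother curve, which is the same trade-off controlled by the number in i=smNN.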

Bootstrap
• Very important theoretical development that has had a major impact on applied statistics
• Based on simulation
• Sample with replacement from the data or residuals and repeatedly refit the model to get the distribution of the quantity of interest
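A sketch of the "sample with replacement from the data" version for the slope in the blood pressure example. The number of replicates, the seed, the percentile interval, and the data set names boot, bootest and bootci are all illustrative assumptions.

```sas
/* Draw 1000 bootstrap samples of the cases, each the same size as a1 */
proc surveyselect data=a1 out=boot seed=19
     method=urs samprate=1 outhits reps=1000;
run;

/* Refit the regression in every bootstrap sample; keep only the estimates */
proc reg data=boot outest=bootest noprint;
  by replicate;
  model diast=age;
run;
quit;

/* Percentile interval for the slope: in bootest the variable age
   holds the estimated age coefficient for each replicate */
proc univariate data=bootest noprint;
  var age;
  output out=bootci pctlpts=2.5 97.5 pctlpre=age_p;
run;

proc print data=bootci;
run;
```

Resampling cases rather than residuals is the natural choice here because the error variance is not constant, so the residuals are not exchangeable across ages.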

Background Reading
• We used the program topic19.sas
• This completes Chapter 11
• This completes the material for the midterm
