Linear Regression Analysis with a focus on Influence Diagnostics using proc reg
Prepared by Voytek Grus for the SAS user group, Halifax, February 23, 2007
Introduction: What is Regression Analysis?
• A broad collection of statistical techniques used to explore relationships between measurable variables.
• Its primary purpose is to describe the relationship between variables (the model) and to predict the response or study the model's components (the coefficients).
• A central idea of regression analysis (RA) is that it models a statistical (stochastic) process, not a deterministic equation.
• A subgroup of Generalized Linear Models and/or Multivariate Analysis.
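The "stochastic, not deterministic" point can be written out for the linear case. The notation below (k predictors, n observations, error term ε) is an assumption added for illustration and does not appear on the slides:

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i,
\qquad i = 1, \dots, n
```

Here the β's are the unknown coefficients to be estimated and ε_i is a random error, usually assumed to be independent with mean zero and constant variance σ². Without ε_i the equation would be deterministic; with it, the response is a random variable and the fitted coefficients are estimates with sampling error.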
Introduction: Types of Regression Analysis
• Data types and statistical techniques
  • Analysis of observational versus experimental data (proc rsreg)
  • Discrete response variable: logistic regression (proc logistic, transreg)
  • Time series versus cross-sectional data (procs autoreg, pdlreg, arima)
  • Survival analysis: lifetime or failure time (proc lifereg)
  • Regression on random predictors
  • Simultaneous econometric equations (procs model, syslin)
  • Structural equation modeling (proc calis)
• Estimation techniques
  • Linear vs non-linear (proc nlin, nlinmix)
  • Least squares vs non-least squares, such as MLE (proc robustreg)
  • Least squares vs partial least squares (proc pls)
  • Multivariate regression (multiple response regression)
SAS offers many diverse tools to do regression analysis
• A good way to start is to read about RA in SAS Help.
• Chapter 2, “Introduction to Regression Procedures”, gives a good overview of RA and the SAS procedures available for various analyses.
• Tools include SAS procedures, SAS Enterprise Guide, and the matrix programming language (SAS/IML).
Regression Analysis: Process
• State the purpose of the analysis: prediction, variable screening, model specification, parameter estimation (signs and significance), influence diagnostics.
• Identify the type of regression analysis to be conducted and find appropriate tools.
• Assess the quality of your data.
• Fit the regression model.
• Examine compliance with statistical assumptions, remedy violations where necessary, and assess quality of fit.
• Draw conclusions.
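The "fit the model" and "assess quality of fit" steps can be sketched for the simplest case, one predictor fitted by ordinary least squares. This is a plain-Python illustration rather than SAS code; the function name `fit_simple_ols` and the data are made up for the example.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x.

    Returns (b0, b1, residuals, r_squared).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope estimate
    b0 = y_bar - b1 * x_bar             # intercept estimate
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)        # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)    # total variation
    return b0, b1, residuals, 1.0 - sse / sst

# Made-up illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, residuals, r2 = fit_simple_ols(x, y)
print(f"intercept={b0:.3f} slope={b1:.3f} R^2={r2:.4f}")
```

The residuals returned here are the raw material for most of the diagnostic checks that come after fitting; R² alone says little about whether the assumptions hold.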
Diagnostics: testing for violation of assumptions
• Analysis of residuals
• Normality assumption (QQ- and PP-plots, added variable plots, partial residual plots, histograms, F tests for lack of fit, Durbin-Watson)
• Heteroscedasticity (ACOV and SPEC options)
• Outlier detection (how large is too large?)
• Influence diagnostics (Cook’s distance, PRESS)
• Model specification (leverage plots, Mallows’ Cp)
• Non-linearity (scatter plots, partial residual plots)
• Over- and under-specification
• Multicollinearity tests (tol, vif, collin)
• Autocorrelation (Durbin-Watson)
• Random predictors (X’s measured with errors)
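To make the influence diagnostics concrete, here is a minimal plain-Python sketch (not SAS) of leverage, Cook's distance and PRESS residuals for a one-predictor fit. The function name, the data and the flagging cutoffs (leverage above 2p/n, Cook's distance above 1) are assumptions for illustration; the cutoffs are common rules of thumb, not fixed tests.

```python
from statistics import mean

def influence_diagnostics(x, y):
    """Leverage, Cook's distance and PRESS residuals for the fit y = b0 + b1*x."""
    n, p = len(x), 2                    # p counts the intercept and the slope
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - p)   # residual mean square
    # Leverage: diagonal of the hat matrix, closed form for one predictor
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Cook's distance: combines residual size with leverage
    cooks_d = [e ** 2 / (p * s2) * h / (1.0 - h) ** 2
               for e, h in zip(resid, leverage)]
    # PRESS residual: prediction error for a point when it is left out of the fit
    press_resid = [e / (1.0 - h) for e, h in zip(resid, leverage)]
    return leverage, cooks_d, press_resid

# Made-up data: the last point sits far out in x and off the trend of the others
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [2.0, 4.1, 5.9, 8.2, 9.9, 12.0]
leverage, cooks_d, press_resid = influence_diagnostics(x, y)

n, p = len(x), 2
for i, (h, d, e) in enumerate(zip(leverage, cooks_d, press_resid), start=1):
    flag = "  <- check" if h > 2 * p / n or d > 1 else ""
    print(f"obs {i}: leverage={h:.3f} CookD={d:.3f} PRESS resid={e:.3f}{flag}")

press = sum(e ** 2 for e in press_resid)  # PRESS statistic: sum of squared PRESS residuals
print(f"PRESS = {press:.2f}")
```

In this example the ordinary residual of the last observation is not the largest, because the point pulls the fitted line toward itself; its leverage, Cook's distance and PRESS residual are what expose it. That is the practical reason to look at influence measures and not only at raw residuals.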
Remedies to violation of assumptions
• Variable selection process (stepwise, maxr, etc. in proc reg)
• Variable transformation
• Dummy variables
• Box-Tidwell procedure
• Not all functions are linearizable; in that case non-linear regression must be used.
• Polynomial regression (proc rsreg)
• Weighted least squares (weight statement in proc reg)
• Non-least squares regression
  • Failure of normality: Huber M-estimator (proc robustreg)
• Principal components regression (proc pls, princomp)
• Ridge regression (proc reg)
• Partial least squares: random predictors
  • proc pls
• Non-linear regression
  • proc nlmixed, proc nlin, proc model
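As one example of a remedy, weighted least squares minimizes the weighted sum of squared residuals, which is what a weight variable does in a regression fit. The plain-Python sketch below (not SAS) handles the one-predictor case; the function name and data are hypothetical, and setting a weight to zero is only a simple way to show the effect. In practice weights are usually chosen inversely proportional to the error variance.

```python
def fit_weighted_line(x, y, w):
    """Weighted least-squares fit of y = b0 + b1*x.

    Minimizes sum(w_i * (y_i - b0 - b1*x_i)**2); returns (b0, b1).
    """
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up data: the first four points lie exactly on y = 1 + 2x, the fifth does not
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 30.0]

equal = fit_weighted_line(x, y, [1.0] * 5)                      # ordinary least squares
downweighted = fit_weighted_line(x, y, [1.0, 1.0, 1.0, 1.0, 0.0])  # fifth point ignored

print("equal weights:       b0=%.3f b1=%.3f" % equal)
print("fifth point at zero: b0=%.3f b1=%.3f" % downweighted)
```

With equal weights the single discordant point drags the slope well away from 2; with its weight set to zero the fit returns to the line through the other four points.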
Functionality of Proc Reg in Linear Regression Analysis
• Data modeling: by-group processing, where statement, multiple model statements
• Interactive analysis: reweight, paint, plot statements, etc.
• Diagnostic tools: plots, tests (outliers, normality, etc.)
• Hypothesis testing: F and t tests, partitioning of variability
• Automated variable selection procedures: stepwise regression, forward selection, backward elimination, maxr
• Model validation: Mallows’ Cp graphs
• Prediction: prediction intervals, PRESS residuals, etc.
Literature
• Raymond H. Myers, Classical and Modern Regression with Applications (1986)
• Sanford Weisberg, Applied Linear Regression (1985)
• SAS Help examples