
Multiple Regression


Presentation Transcript


  1. Multiple Regression

  2. Multiple regression • Previously we discussed the one-predictor scenario • Multiple regression is the case of having two or more independent variables predicting some outcome variable • The basic idea is the same as in simple regression; however, more will need to be considered in its interpretation

  3. The best fitting plane • Previously we attempted to find the best fitting line to our 2d scatterplot of values • With the addition of another predictor our cloud of values becomes 3d • Now we are looking for what amounts to the best fitting plane • With 3 or more predictors we move into hyperspace and deal with a regression surface • Regression equation: Ŷ = b0 + b1X1 + b2X2 + … + bkXk

  4. Linear combination • The notion of a linear combination is important for you to understand for MR and multivariate techniques in general • Again, what MR analysis does is create a linear combination (weighted sum) of the predictors • The weights are important to help us assess the nature of the predictor-DV relationships with consideration of the other variables in the model • We then look to see how the linear combination in a sense matches up with the DV • One way to think about it is we extract relevant information from predictors to help us understand the DV
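
A minimal R sketch of this idea, using simulated data and made-up variable names: the fitted values produced by lm() are nothing more than a weighted sum of the predictors plus an intercept.

  # Simulated data; the weights below are arbitrary "true" values
  set.seed(1)
  n  <- 200
  x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
  y  <- 2 + 0.5*x1 - 0.3*x2 + 0.8*x3 + rnorm(n)

  fit <- lm(y ~ x1 + x2 + x3)
  b   <- coef(fit)

  # Rebuild the linear combination by hand and compare with fitted(fit)
  yhat <- b[1] + b[2]*x1 + b[3]*x2 + b[4]*x3
  all.equal(unname(yhat), unname(fitted(fit)))   # TRUE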

  5. MR Example • [Path diagram] Four predictors, (X1) Pros of Condom Use, (X2) Cons of Condom Use, (X3) Self-Efficacy of Condom Use, and (X4) Psychosexual Functioning, are weighted and summed into a new linear combination (X′) that is used to predict Stage of Condom Use

  6. Considerations in multiple regression • Assumptions • Overall fit • Parameter estimates and variable importance • Variable entry • IV relationships • Prediction

  7. Assumptions: Normality • The assumptions for simple regression will continue to hold • Normality, homoscedasticity, independence • Multivariate normality can be at least partially checked through examination of individual variables for normality, linearity, and heteroscedasticity • Tests for multivariate normality seem to be easily obtained in every package except SPSS
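
A small sketch of these checks in R (simulated data; SPSS users would produce the analogous residual plots there):

  set.seed(2)
  n  <- 200
  x1 <- rnorm(n); x2 <- rnorm(n)
  y  <- 1 + 0.6*x1 + 0.4*x2 + rnorm(n)
  fit <- lm(y ~ x1 + x2)

  # Normality of residuals: Q-Q plot plus a formal test
  qqnorm(resid(fit)); qqline(resid(fit))
  shapiro.test(resid(fit))

  # Homoscedasticity: residuals vs. fitted values should show no fan shape
  plot(fitted(fit), resid(fit)); abline(h = 0, lty = 2)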

  8. Assumptions: Model Misspecification • In addition, we must worry about model misspecification • Omitting relevant variables, including irrelevant ones, incorrect paths • There is not much one can do about an omitted relevant variable, but it can produce biased and less valid results • However, we also can't just throw in every variable we can think of • Overfitting • Violation of Ockham's razor • Including irrelevant variables contributes to the standard error of estimate (and thus to the SEs of our coefficients), which will affect the statistical tests on individual variables

  9. Example data • Current salary predicted by educational level, time since hire, and previous experience (N = 474) • As with any analysis, initial data analysis should be extensive prior to examination of the inferential analysis

  10. Initial examination of data • We can use the descriptives to give us a general feel for what’s going on with the variables in question • Here we can also see that months since hire and previous experience are not too well correlated with our dependent variable of current salary • Ack! • We’d also want to look at the scatterplots to further aid our assessment of the predictor-DV relationships

  11. Starting point: Statistical significance of the model • The ANOVA summary table tells us whether our model is statistically significant • R2 different from zero • Equation is a better predictor than the mean • As with simple regression, the analysis involves the ratio of predicted variance to residual variance • As we can see, it is reflective of the relationship of the predictors to the DV (R2), the number of predictors in the model, and the sample size
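
To make the last point concrete, here is a small R sketch (simulated data) that computes the overall F statistic directly from R2, the number of predictors k, and the sample size n, and compares it with the value lm() reports:

  set.seed(3)
  n  <- 150
  x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
  y  <- 0.4*x1 + 0.2*x2 + rnorm(n)
  fit <- lm(y ~ x1 + x2 + x3)

  r2 <- summary(fit)$r.squared
  k  <- 3
  F_by_hand <- (r2 / k) / ((1 - r2) / (n - k - 1))

  F_by_hand
  summary(fit)$fstatistic["value"]   # should match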

  12. Multiple correlation coefficient • The multiple correlation coefficient is the correlation between the DV and the linear combination of predictors which minimizes the sum of the squared residuals • More simply, it is the correlation between the observed values and the values that would be predicted by our model • Its squared value (R2) is the amount of variance in the dependent variable accounted for by the independent variables
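
A quick R illustration of this definition (simulated data): R is literally the correlation between the observed and fitted values.

  set.seed(4)
  n  <- 100
  x1 <- rnorm(n); x2 <- rnorm(n)
  y  <- 0.7*x1 + 0.3*x2 + rnorm(n)
  fit <- lm(y ~ x1 + x2)

  R <- cor(y, fitted(fit))
  c(R = R, R_squared = R^2, from_summary = summary(fit)$r.squared)   # the two R^2 values agree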

  13. R2 • Here it appears we have an OK model for predicting current salary

  14. Variable importance: Statistical significance • After noting that our model is viable, we can begin interpreting the predictors' relative contributions • To begin with we can examine the output to determine which variables statistically significantly contribute to the model • Standard error • a measure of the variability that would be found among the different slopes estimated from other samples drawn from the same population

  15. Variable importance: Statistical significance • We can see from the output that only previous experience and education level are statistically significant predictors

  16. Variable importance: Weights • Statistical significance, as usual, is only a starting point for our assessment of results • What we’d really want is a measure of the unique contribution of an IV to the model • Unfortunately the regression coefficient, though useful in understanding that particular variable’s relationship to the DV, is not useful for comparing to other IVs that are of a different scale

  17. Variable importance: standardized coefficients • Standardized regression coefficients get around that problem • Now we can see how much the DV will change in standard deviation units with a one standard deviation change in the IV (all others held constant) • Here we can see that education level seems to have much more influence on the DV • Another 3 years of education → a bump of more than $11,000 in salary
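
One way to obtain standardized coefficients in R is simply to z-score every variable before fitting. The sketch below uses simulated data and hypothetical variable names loosely modeled on the salary example.

  set.seed(5)
  n       <- 200
  educ    <- rnorm(n); months <- rnorm(n); prevexp <- rnorm(n)
  salary  <- 10*educ + 2*months - 3*prevexp + rnorm(n, sd = 5)

  raw_fit <- lm(salary ~ educ + months + prevexp)
  std_fit <- lm(scale(salary) ~ scale(educ) + scale(months) + scale(prevexp))

  coef(raw_fit)   # b weights, in the original units of each predictor
  coef(std_fit)   # beta weights, in SD units and thus comparable across predictors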

  18. Variable importance • However we still have other output to help us understand variable contribution • Partial correlation is the contribution of an IV after the contributions of the other IVs have been taken out of both the IV and DV • Semi-partial correlation is the unique contribution of an IV after the contribution of other IVs have been taken only out of the predictor in question

  19. Variable importance: Partial correlation • A+B+C+D represents all the variability in the DV to be explained (A = variance unique to IV1, B = variance unique to IV2, C = variance shared by the two IVs, D = unexplained variance) • A+B+C = R2 • The squared partial correlation is the amount a variable explains relative to the amount in the DV that is left to explain after the contributions of the other IVs have been removed from both the predictor and criterion • For IV1 it is A/(A+D) • For IV2 it would be B/(B+D)

  20. Variable importance: Semipartial correlation • The semipartial correlation (squared) is perhaps the more useful measure of contribution • It refers to the unique contribution of the IV to the model, i.e. the relationship between the DV and IV after the contributions of the other IVs have been removed from the predictor • For IV1 it is A/(A+B+C+D) • For IV2 • B/(A+B+C+D) • Interpretation (of the squared value): • Out of all the variance to be accounted for, how much does this variable explain that no other IV does • or • How much would R2 drop if the variable were removed?
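
The Venn-diagram arithmetic on these two slides can be checked directly in R: the squared semipartial for a predictor is the drop in R2 when it is removed, and the squared partial rescales that drop by the variance still left unexplained. A small sketch with simulated data:

  set.seed(6)
  n  <- 300
  x1 <- rnorm(n); x2 <- rnorm(n)
  y  <- 0.5*x1 + 0.5*x2 + rnorm(n)

  full    <- lm(y ~ x1 + x2)
  reduced <- lm(y ~ x2)                 # model without x1

  r2_full    <- summary(full)$r.squared
  r2_reduced <- summary(reduced)$r.squared

  sr2_x1 <- r2_full - r2_reduced        # squared semipartial: A / (A+B+C+D)
  pr2_x1 <- sr2_x1 / (1 - r2_reduced)   # squared partial:     A / (A+D)
  c(semipartial_sq = sr2_x1, partial_sq = pr2_x1)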

  21. Variable importance • Note that exactly how the partial and semi-partial correlations are calculated will depend on the type of multiple regression employed • The previous examples concerned a standard multiple regression situation • For sequential (i.e. hierarchical) regression, the partial correlation would be • IV1 = (A+C)/(A+C+D) • IV2 = B/(B+D)

  22. Variable importance • For semi-partial correlation • IV1 = (A+C)/(A+B+C+D) • IV2 same as before • The result for the addition of the second variable is the same as it would be in standard MR • Thus if the goal is to see the unique contribution of a single variable after all others have been controlled for, there is no real reason to perform a sequential over standard MR • In general terms, it is the unique contribution of the variable at the point it enters the equation (sequential or stepwise)

  23. Variable importance: Example data • The semipartial correlation is labeled as ‘part’ correlation in SPSS • Here we can see that education level is really doing all the work in this model • Obviously from some alternate universe

  24. Another example • Mental health symptoms predicted by number of doctor visits, physical health symptoms, number of stressful life events

  25. Here we see that physical health symptoms and stressful life events both significantly contribute to the model • Physical health symptoms more ‘important’

  26. Variable Importance: Comparison • Comparison of standardized coefficients, partial, and semi-partial correlation coefficients • All of them are ‘partial’ correlations

  27. Another Approach to Variable Importance • The methods just described give us a glimpse of variable importance, but interestingly none of them is a true decomposition of R-squared, i.e. a unique contribution statistic for each predictor that sums to the overall R-squared • One approach that does this provides the average R2 increase a variable contributes, over the possible orders in which it could enter the model • 3-predictor example: A B C; B A C; C A B; etc. • One way to think about it, using what you've just learned, is as the average squared semi-partial correlation over a variable entering first, second, third, etc. • Note that the average is taken over all possible permutations • E.g. the R-square contribution for B entering first includes the orderings B A C and B C A, both of which are of course the same value • The following example comes from the survey data

  28. R2 contributions for WAR at each entry position • As Predictor 1: R2 = .629 (there are 2 orderings in which war enters first, both giving .629) • As Predictor 2: R2 change = .639 or .087, depending on which other predictor entered first • As Predictor 3: R2 change = .098 (again there are 2 orderings in which war enters last, both giving .098)

  29. Interpretation • The average of these values is the average contribution to R square for a particular variable over all possible orderings • In this case, for war it is ~.36, i.e. on average it increases R square by .36 (36% of the variance accounted for) • Furthermore, if we add up the average R-squared contribution for all three… • .36+.28+.01 = .65 • .65 is the R2 for the model

  30. R program example • library(relaimpo) • RegModel.1 <- lm(SOCIAL~BUSH+MTHABLTY+WAR, data=Dataset) • calc.relimp(RegModel.1, type = c("lmg", "last", "first", "betasq", "pratt")) • Output: LMG is what we were just talking about; it stands for Lindemann, Merenda and Gold, the authors who introduced it • Last is simply the squared semi-partial correlation • First is just the square of the simple bivariate correlation between predictor and DV • Beta square is the square of the beta coefficient with 'all in' • Pratt is the product of the standardized coefficient and the simple bivariate correlation • It too will add up to the model R2 but is not recommended, one reason being that it can actually be negative

          lmg   last  first  betasq  pratt
  BUSH  0.278  0.005  0.551   0.024  0.116
  MATH  0.012  0.016  0.009   0.016  0.012
  WAR   0.363  0.098  0.629   0.439  0.526

  *Note the relaimpo package is equipped to provide bootstrapped estimates (see the sketch below)
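
A sketch of the bootstrapping note above, reusing the slide's model object (RegModel.1); the number of bootstrap samples and the confidence level here are arbitrary choices:

  library(relaimpo)

  boot_result <- boot.relimp(RegModel.1, b = 1000, type = c("lmg", "last", "first"))
  booteval.relimp(boot_result, level = 0.95)         # importance estimates with bootstrap CIs
  plot(booteval.relimp(boot_result, level = 0.95))   # graphical summary of the intervals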

  31. Different Methods • Note that one's assessment of relative importance may depend on the method • Much of the time those methods will largely agree, but they may not, so use multiple estimates to help you decide • One might go with the LMG typically as it is both intuitive and a decomposition of R2

          lmg   last  first  betasq  pratt
  BUSH  0.278  0.005  0.551   0.024  0.116
  MATH  0.012  0.016  0.009   0.016  0.012
  WAR   0.363  0.098  0.629   0.439  0.526

  32. Relative Importance Summary • There are multiple ways to estimate a variable’s contribution to the model, and some may be better than others • A general approach: • Check simple bivariate relationships. • If you don’t see worthwhile correlations with the DV there you shouldn’t expect much from your results regarding the model • Check for outliers and compare with robust measures also • You may detect that some variables are so highly correlated that one is redundant • Statistical significance is not a useful means of assessing relative importance, nor is the raw coefficient • Standardized coefficients and partial correlations are a first step • Compare standardized to simple correlations as a check on possible suppression • Of typical output the semi-partial correlation is probably the more intuitive assessment • The LMG is also intuitive, and is a natural decomposition of R2, unlike the others

  33. Relative Importance Summary • One thing to keep in mind is that determining variable importance, while possible for a single sample, should not be overgeneralized • Variable orderings likely will change upon repeated sampling • E.g. while one might think that war and bush are better than math (it certainly makes theoretical sense), saying that either would be better than the other would be quite a stretch with just one sample • What you see in your sample is specific to it, and it would be wise to not make any bold claims without validation

  34. Regression Diagnostics • Of course all of the previous information would be relatively useless if we are not meeting our assumptions and/or have overly influential data points • In fact, you shouldn't really be looking at the results until you have tested the assumptions and looked for outliers, even though this requires running the analysis to begin with • Various tools are available for the detection of outliers • Classical methods • Standardized Residuals (ZRESID) • Studentized Residuals (SRESID) • Studentized Deleted Residuals (SDRESID) • Ways to think about outliers • Leverage • Discrepancy • Influence • Thinking 'robustly'

  35. Regression Diagnostics • Standardized Residuals (ZRESID) • Standardized errors in prediction • Mean 0, SD = std. error of estimate • To standardize, divide each residual by the s.e.e. • At best an initial indicator (e.g. the ±2 rule of thumb), but because the case itself determines what the mean residual would be, almost useless • Studentized Residuals (SRESID) • Same thing, but the studentized residual recognizes that the error associated with predicting values far from the mean of X is larger than the error associated with predicting values closer to the mean of X • the standard error is multiplied by a value that allows the result to take this into account • Studentized Deleted Residuals (SDRESID) • Studentized residual in which the standard error is calculated with the case in question removed from the others
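
These three flavors of residual map onto base R functions (or a one-line computation) as in this sketch with simulated data:

  set.seed(7)
  n <- 100
  x <- rnorm(n)
  y <- 0.5*x + rnorm(n)
  fit <- lm(y ~ x)

  z_resid  <- resid(fit) / summary(fit)$sigma   # ZRESID: residual / std. error of estimate
  s_resid  <- rstandard(fit)                    # SRESID: studentized, leverage-adjusted
  sd_resid <- rstudent(fit)                     # SDRESID: studentized deleted

  which(abs(sd_resid) > 2)   # rough flag using the +/- 2 rule of thumb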

  36. Regression Diagnostics • Mahalanobis’ Distance • Mahalanobis distance is the distance of a case from the centroid of the remaining points (point where the means meet in n-dimensional space) • Cook’s Distance • Identifies an influential data point whether in terms of predictor or DV • A measure of how much the residuals of all cases would change if a particular case were excluded from the calculation of the regression coefficients. • With larger (relative) values, excluding a case would change the coefficients substantially. • DfBeta • Change in the regression coefficient that results from the exclusion of a particular case • Note that you get DfBetas for each coefficient associated with the predictors
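
The same measures are available in base R; a sketch with simulated data (note that mahalanobis() here uses all cases, including the one being evaluated, which is a slight simplification of the 'remaining points' definition above):

  set.seed(8)
  n  <- 100
  x1 <- rnorm(n); x2 <- rnorm(n)
  y  <- 0.5*x1 + 0.3*x2 + rnorm(n)
  fit <- lm(y ~ x1 + x2)

  X     <- cbind(x1, x2)
  mahal <- mahalanobis(X, center = colMeans(X), cov = cov(X))  # distance from the IV centroid
  cooks <- cooks.distance(fit)                                 # overall influence of each case
  dfb   <- dfbetas(fit)                                        # per-coefficient change if a case is dropped

  head(sort(cooks, decreasing = TRUE))   # most influential cases first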

  37. Regression Diagnostics • Leverage assesses outliers among the IVs • Mahalanobis distance • Relatively high Mahalanobis suggests an outlier on one or more variables • Discrepancy • Measures the extent to which a case is in line with others • Influence • A product of leverage and discrepancy • How much would the coefficients change if the case were deleted? • Cook’s distance, dfBetas

  38. Outliers • Influence plots • With a couple measures of ‘outlierness’ we can construct a scatterplot to note especially problematic cases • After fitting a regression model in R-commander, i.e. running the analysis, this graph is available via point and click • Here we have what is actually a 3-d plot, with 2 outlier measures on the x and y axes (studentized residuals and ‘hat’ values, a measure of leverage) and a third in terms of the size of the circle (Cook’s distance) • For this example, case 35 appears to be a problem
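
Outside of R Commander's menus, essentially the same graph can be produced with the car package; a minimal sketch on simulated data:

  library(car)

  set.seed(9)
  n <- 100
  x <- rnorm(n)
  y <- 0.5*x + rnorm(n)
  fit <- lm(y ~ x)

  # x-axis: hat values (leverage); y-axis: studentized residuals;
  # circle area: Cook's distance. Noteworthy cases are labeled by row name.
  influencePlot(fit)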

  39. Outliers • It should be made clear to interested readers what, if anything, has been done to deal with outliers • Applications such as S-Plus, R, and even SAS and Stata (pretty much everything but SPSS) provide methods for robust regression analysis, which would be preferred

  40. Summary: Outliers • No matter the analysis, some cases will be the 'most extreme'. However, none may really qualify as being overly influential. • Whatever you do, always run some diagnostic analysis and do not ignore influential cases • It should be made clear to interested readers what has been done to deal with outliers • As noted before, the best approach to dealing with outliers when they do occur is to run a robust regression with capable software
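
As one concrete (non-SPSS) option, here is a minimal robust-regression sketch using MASS::rlm on simulated data with one artificial outlier; rlm down-weights extreme cases rather than deleting them.

  library(MASS)

  set.seed(10)
  n <- 50
  x <- rnorm(n)
  y <- 0.5*x + rnorm(n)
  y[1] <- 20          # one deliberately extreme case

  ols_fit    <- lm(y ~ x)
  robust_fit <- rlm(y ~ x)

  coef(ols_fit)       # estimates pulled around by the extreme case
  coef(robust_fit)    # estimates much less affected
  robust_fit$w[1]     # the outlying case receives a small weight in the robust fit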
