
Multiple Regression


Presentation Transcript


  1. Multiple Regression

  2. Multiple regression • Previously we discussed the one-predictor scenario • Multiple regression is the case of having two or more independent variables predicting some outcome variable • The basic idea is the same as in simple regression; however, more will need to be considered in its interpretation

  3. The best fitting plane • Previously we attempted to find the best fitting line to our 2d scatterplot of values • With the addition of another predictor our cloud of values becomes 3d • Now we are looking for what amounts to the best fitting plane • With 3 or more predictors we move into hyperspace and deal with a regression surface • Regression equation: Ŷ = b0 + b1X1 + b2X2 + … + bkXk

  4. Linear combination • The notion of a linear combination is important for you to understand for MR and multivariate techniques in general • Again, what MR analysis does is create a linear combination (weighted sum) of the predictors • The weights are important to help us assess the nature of the predictor-DV relationships with consideration of the other variables in the model • We then look to see how the linear combination in a sense matches up with the DV • One way to think about it is we extract relevant information from predictors to help us understand the DV
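
A minimal R sketch of this idea, using simulated data and made-up variable names: the fitted values produced by lm() are nothing more than a weighted sum of the predictors plus an intercept.

  # Simulated data; the weights below are arbitrary "true" values
  set.seed(1)
  n  <- 200
  x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
  y  <- 2 + 0.5*x1 - 0.3*x2 + 0.8*x3 + rnorm(n)

  fit <- lm(y ~ x1 + x2 + x3)
  b   <- coef(fit)

  # Rebuild the linear combination by hand and compare with fitted(fit)
  yhat <- b[1] + b[2]*x1 + b[3]*x2 + b[4]*x3
  all.equal(unname(yhat), unname(fitted(fit)))   # TRUE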

  5. MR Example • [Path diagram] Four predictors, (X1) Pros of Condom Use, (X2) Cons of Condom Use, (X3) Self-Efficacy of Condom Use, and (X4) Psychosexual Functioning, are weighted and summed into a new linear combination (X′) that is used to predict Stage of Condom Use

  6. Considerations in multiple regression • Assumptions • Overall fit • Parameter estimates and variable importance • Variable entry • IV relationships • Prediction

  7. Assumptions: Normality • The assumptions for simple regression will continue to hold • Normality, homoscedasticity, independence • Multivariate normality can be at least partially checked through examination of individual variables for normality, linearity, and heteroscedasticity • Tests for multivariate normality seem to be easily obtained in every package except SPSS
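
A small sketch of these checks in R (simulated data; SPSS users would produce the analogous residual plots there):

  set.seed(2)
  n  <- 200
  x1 <- rnorm(n); x2 <- rnorm(n)
  y  <- 1 + 0.6*x1 + 0.4*x2 + rnorm(n)
  fit <- lm(y ~ x1 + x2)

  # Normality of residuals: Q-Q plot plus a formal test
  qqnorm(resid(fit)); qqline(resid(fit))
  shapiro.test(resid(fit))

  # Homoscedasticity: residuals vs. fitted values should show no fan shape
  plot(fitted(fit), resid(fit)); abline(h = 0, lty = 2)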

  8. Assumptions: Model Misspecification • In addition, we must worry about model misspecification • Omitting relevant variables, including irrelevant ones, incorrect paths • There is not much one can do about an omitted relevant variable, but it can produce biased and less valid results • However, we also can't just throw in every variable we can think of • Overfitting • Violation of Ockham's razor • Including irrelevant variables contributes to the standard error of estimate (and thus to the SEs of our coefficients), which will affect the statistical tests on individual variables

  9. Example data • Current salary predicted by educational level, time since hire, and previous experience (N = 474) • As with any analysis, initial data analysis should be extensive prior to examination of the inferential analysis

  10. Initial examination of data • We can use the descriptives to give us a general feel for what’s going on with the variables in question • Here we can also see that months since hire and previous experience are not too well correlated with our dependent variable of current salary • Ack! • We’d also want to look at the scatterplots to further aid our assessment of the predictor-DV relationships

  11. Starting point: Statistical significance of the model • The ANOVA summary table tells us whether our model is statistically significant • R2 different from zero • Equation is a better predictor than the mean • As with simple regression, the analysis involves the ratio of predicted variance to residual variance • As we can see, it is reflective of the relationship of the predictors to the DV (R2), the number of predictors in the model, and the sample size
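
To make the last point concrete, here is a small R sketch (simulated data) that computes the overall F statistic directly from R2, the number of predictors k, and the sample size n, and compares it with the value lm() reports:

  set.seed(3)
  n  <- 150
  x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
  y  <- 0.4*x1 + 0.2*x2 + rnorm(n)
  fit <- lm(y ~ x1 + x2 + x3)

  r2 <- summary(fit)$r.squared
  k  <- 3
  F_by_hand <- (r2 / k) / ((1 - r2) / (n - k - 1))

  F_by_hand
  summary(fit)$fstatistic["value"]   # should match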

  12. Multiple correlation coefficient • The multiple correlation coefficient is the correlation between the DV and the linear combination of predictors which minimizes the sum of the squared residuals • More simply, it is the correlation between the observed values and the values that would be predicted by our model • Its squared value (R2) is the amount of variance in the dependent variable accounted for by the independent variables
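
A quick R illustration of this definition (simulated data): R is literally the correlation between the observed and fitted values.

  set.seed(4)
  n  <- 100
  x1 <- rnorm(n); x2 <- rnorm(n)
  y  <- 0.7*x1 + 0.3*x2 + rnorm(n)
  fit <- lm(y ~ x1 + x2)

  R <- cor(y, fitted(fit))
  c(R = R, R_squared = R^2, from_summary = summary(fit)$r.squared)   # the two R^2 values agree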

  13. R2 • Here it appears we have an OK model for predicting current salary

  14. Variable importance: Statistical significance • After noting that our model is viable, we can begin interpreting the predictors' relative contributions • To begin with we can examine the output to determine which variables statistically significantly contribute to the model • Standard error • a measure of the variability that would be found among the different slopes estimated from other samples drawn from the same population

  15. Variable importance: Statistical significance • We can see from the output that only previous experience and education level are statistically significant predictors

  16. Variable importance: Weights • Statistical significance, as usual, is only a starting point for our assessment of results • What we’d really want is a measure of the unique contribution of an IV to the model • Unfortunately the regression coefficient, though useful in understanding that particular variable’s relationship to the DV, is not useful for comparing to other IVs that are of a different scale

  17. Variable importance: standardized coefficients • Standardized regression coefficients get around that problem • Now we can see how much the DV will change in standard deviation units with a one standard deviation change in the IV (all others held constant) • Here we can see that education level seems to have much more influence on the DV • Another 3 years of education → a bump of more than $11,000 in salary
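
One way to obtain standardized coefficients in R is simply to z-score every variable before fitting. The sketch below uses simulated data and hypothetical variable names loosely modeled on the salary example.

  set.seed(5)
  n       <- 200
  educ    <- rnorm(n); months <- rnorm(n); prevexp <- rnorm(n)
  salary  <- 10*educ + 2*months - 3*prevexp + rnorm(n, sd = 5)

  raw_fit <- lm(salary ~ educ + months + prevexp)
  std_fit <- lm(scale(salary) ~ scale(educ) + scale(months) + scale(prevexp))

  coef(raw_fit)   # b weights, in the original units of each predictor
  coef(std_fit)   # beta weights, in SD units and thus comparable across predictors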

  18. Variable importance • However we still have other output to help us understand variable contribution • Partial correlation is the contribution of an IV after the contributions of the other IVs have been taken out of both the IV and DV • Semi-partial correlation is the unique contribution of an IV after the contribution of other IVs have been taken only out of the predictor in question

  19. Variable importance: Partial correlation • A+B+C+D represents all the variability in the DV to be explained (A = variance unique to IV1, B = variance unique to IV2, C = variance shared by the two IVs, D = unexplained variance) • A+B+C = R2 • The squared partial correlation is the amount a variable explains relative to the amount in the DV that is left to explain after the contributions of the other IVs have been removed from both the predictor and criterion • For IV1 it is A/(A+D) • For IV2 it would be B/(B+D)

  20. Variable importance: Semipartial correlation • The semipartial correlation (squared) is perhaps the more useful measure of contribution • It refers to the unique contribution of the IV to the model, i.e. the relationship between the DV and IV after the contributions of the other IVs have been removed from the predictor • For IV1 it is A/(A+B+C+D) • For IV2 • B/(A+B+C+D) • Interpretation (of the squared value): • Out of all the variance to be accounted for, how much does this variable explain that no other IV does • or • How much would R2 drop if the variable were removed?
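
The Venn-diagram arithmetic on these two slides can be checked directly in R: the squared semipartial for a predictor is the drop in R2 when it is removed, and the squared partial rescales that drop by the variance still left unexplained. A small sketch with simulated data:

  set.seed(6)
  n  <- 300
  x1 <- rnorm(n); x2 <- rnorm(n)
  y  <- 0.5*x1 + 0.5*x2 + rnorm(n)

  full    <- lm(y ~ x1 + x2)
  reduced <- lm(y ~ x2)                 # model without x1

  r2_full    <- summary(full)$r.squared
  r2_reduced <- summary(reduced)$r.squared

  sr2_x1 <- r2_full - r2_reduced        # squared semipartial: A / (A+B+C+D)
  pr2_x1 <- sr2_x1 / (1 - r2_reduced)   # squared partial:     A / (A+D)
  c(semipartial_sq = sr2_x1, partial_sq = pr2_x1)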

  21. Variable importance • Note that exactly how the partial and semi-partial correlations are calculated will depend on the type of multiple regression employed • The previous examples concerned a standard multiple regression situation • For sequential (i.e. hierarchical) regression, the partial correlation would be • IV1 = (A+C)/(A+C+D) • IV2 = B/(B+D)

  22. Variable importance • For semi-partial correlation • IV1 = (A+C)/(A+B+C+D) • IV2 same as before • The result for the addition of the second variable is the same as it would be in standard MR • Thus if the goal is to see the unique contribution of a single variable after all others have been controlled for, there is no real reason to perform a sequential over standard MR • In general terms, it is the unique contribution of the variable at the point it enters the equation (sequential or stepwise)

  23. Variable importance: Example data • The semipartial correlation is labeled as ‘part’ correlation in SPSS • Here we can see that education level is really doing all the work in this model • Obviously from some alternate universe

  24. Another example • Mental health symptoms predicted by number of doctor visits, physical health symptoms, number of stressful life events

  25. Here we see that physical health symptoms and stressful life events both significantly contribute to the model • Physical health symptoms more ‘important’

  26. Variable Importance: Comparison • Comparison of standardized coefficients, partial, and semi-partial correlation coefficients • All of them are ‘partial’ correlations

  27. Another Approach to Variable Importance • The methods just described give us a glimpse of variable importance, but interestingly none of them is a true decomposition of R-squared, i.e. a unique contribution statistic for each predictor that sums to the overall R-squared • One approach that does this provides the average R2 increase a variable contributes, over the possible orders in which it could enter the model • 3-predictor example: A B C; B A C; C A B; etc. • One way to think about it, using what you've just learned, is as the average squared semi-partial correlation over a variable entering first, second, third, etc. • Note that the average is taken over all possible permutations • E.g. the R-square contribution for B entering first includes the orderings B A C and B C A, both of which are of course the same value • The following example comes from the survey data

  28. R2 contributions for WAR at each entry position • As Predictor 1: R2 = .629 (there are 2 orderings in which war enters first, both giving .629) • As Predictor 2: R2 change = .639 or .087, depending on which other predictor entered first • As Predictor 3: R2 change = .098 (again there are 2 orderings in which war enters last, both giving .098)

  29. Interpretation • The average of these values is the average contribution to R square for a particular variable over all possible orderings • In this case, for war it is ~.36, i.e. on average it increases R square by .36 (36% of the variance accounted for) • Furthermore, if we add up the average R-squared contribution for all three… • .36+.28+.01 = .65 • .65 is the R2 for the model

  30. R program example • library(relaimpo) • RegModel.1 <- lm(SOCIAL~BUSH+MTHABLTY+WAR, data=Dataset) • calc.relimp(RegModel.1, type = c("lmg", "last", "first", "betasq", "pratt")) • Output: LMG is what we were just talking about; it stands for Lindemann, Merenda and Gold, the authors who introduced it • Last is simply the squared semi-partial correlation • First is just the square of the simple bivariate correlation between predictor and DV • Beta square is the square of the beta coefficient with 'all in' • Pratt is the product of the standardized coefficient and the simple bivariate correlation • It too will add up to the model R2 but is not recommended, one reason being that it can actually be negative

          lmg   last  first  betasq  pratt
  BUSH  0.278  0.005  0.551   0.024  0.116
  MATH  0.012  0.016  0.009   0.016  0.012
  WAR   0.363  0.098  0.629   0.439  0.526

  *Note the relaimpo package is equipped to provide bootstrapped estimates (see the sketch below)
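
A sketch of the bootstrapping note above, reusing the slide's model object (RegModel.1); the number of bootstrap samples and the confidence level here are arbitrary choices:

  library(relaimpo)

  boot_result <- boot.relimp(RegModel.1, b = 1000, type = c("lmg", "last", "first"))
  booteval.relimp(boot_result, level = 0.95)         # importance estimates with bootstrap CIs
  plot(booteval.relimp(boot_result, level = 0.95))   # graphical summary of the intervals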

  31. Different Methods • Note that one's assessment of relative importance may depend on the method • Much of the time those methods will largely agree, but they may not, so use multiple estimates to help you decide • One might go with the LMG typically as it is both intuitive and a decomposition of R2

          lmg   last  first  betasq  pratt
  BUSH  0.278  0.005  0.551   0.024  0.116
  MATH  0.012  0.016  0.009   0.016  0.012
  WAR   0.363  0.098  0.629   0.439  0.526

  32. Relative Importance Summary • There are multiple ways to estimate a variable’s contribution to the model, and some may be better than others • A general approach: • Check simple bivariate relationships. • If you don’t see worthwhile correlations with the DV there you shouldn’t expect much from your results regarding the model • Check for outliers and compare with robust measures also • You may detect that some variables are so highly correlated that one is redundant • Statistical significance is not a useful means of assessing relative importance, nor is the raw coefficient • Standardized coefficients and partial correlations are a first step • Compare standardized to simple correlations as a check on possible suppression • Of typical output the semi-partial correlation is probably the more intuitive assessment • The LMG is also intuitive, and is a natural decomposition of R2, unlike the others

  33. Relative Importance Summary • One thing to keep in mind is that determining variable importance, while possible for a single sample, should not be overgeneralized • Variable orderings likely will change upon repeated sampling • E.g. while one might think that war and bush are better than math (it certainly makes theoretical sense), saying that either would be better than the other would be quite a stretch with just one sample • What you see in your sample is specific to it, and it would be wise to not make any bold claims without validation

  34. Regression Diagnostics • Of course all of the previous information would be relatively useless if we are not meeting our assumptions and/or have overly influential data points • In fact, you shouldn't really be looking at the results until you have tested the assumptions and looked for outliers, even though this requires running the analysis to begin with • Various tools are available for the detection of outliers • Classical methods • Standardized Residuals (ZRESID) • Studentized Residuals (SRESID) • Studentized Deleted Residuals (SDRESID) • Ways to think about outliers • Leverage • Discrepancy • Influence • Thinking 'robustly'

  35. Regression Diagnostics • Standardized Residuals (ZRESID) • Standardized errors in prediction • Mean 0, SD = std. error of estimate • To standardize, divide each residual by the s.e.e. • At best an initial indicator (e.g. the ±2 rule of thumb), but because the case itself determines what the mean residual would be, almost useless • Studentized Residuals (SRESID) • Same thing, but the studentized residual recognizes that the error associated with predicting values far from the mean of X is larger than the error associated with predicting values closer to the mean of X • the standard error is multiplied by a value that allows the result to take this into account • Studentized Deleted Residuals (SDRESID) • Studentized residual in which the standard error is calculated with the case in question removed from the others
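
These three flavors of residual map onto base R functions (or a one-line computation) as in this sketch with simulated data:

  set.seed(7)
  n <- 100
  x <- rnorm(n)
  y <- 0.5*x + rnorm(n)
  fit <- lm(y ~ x)

  z_resid  <- resid(fit) / summary(fit)$sigma   # ZRESID: residual / std. error of estimate
  s_resid  <- rstandard(fit)                    # SRESID: studentized, leverage-adjusted
  sd_resid <- rstudent(fit)                     # SDRESID: studentized deleted

  which(abs(sd_resid) > 2)   # rough flag using the +/- 2 rule of thumb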

  36. Regression Diagnostics • Mahalanobis’ Distance • Mahalanobis distance is the distance of a case from the centroid of the remaining points (point where the means meet in n-dimensional space) • Cook’s Distance • Identifies an influential data point whether in terms of predictor or DV • A measure of how much the residuals of all cases would change if a particular case were excluded from the calculation of the regression coefficients. • With larger (relative) values, excluding a case would change the coefficients substantially. • DfBeta • Change in the regression coefficient that results from the exclusion of a particular case • Note that you get DfBetas for each coefficient associated with the predictors
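
The same measures are available in base R; a sketch with simulated data (note that mahalanobis() here uses all cases, including the one being evaluated, which is a slight simplification of the 'remaining points' definition above):

  set.seed(8)
  n  <- 100
  x1 <- rnorm(n); x2 <- rnorm(n)
  y  <- 0.5*x1 + 0.3*x2 + rnorm(n)
  fit <- lm(y ~ x1 + x2)

  X     <- cbind(x1, x2)
  mahal <- mahalanobis(X, center = colMeans(X), cov = cov(X))  # distance from the IV centroid
  cooks <- cooks.distance(fit)                                 # overall influence of each case
  dfb   <- dfbetas(fit)                                        # per-coefficient change if a case is dropped

  head(sort(cooks, decreasing = TRUE))   # most influential cases first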

  37. Regression Diagnostics • Leverage assesses outliers among the IVs • Mahalanobis distance • Relatively high Mahalanobis suggests an outlier on one or more variables • Discrepancy • Measures the extent to which a case is in line with others • Influence • A product of leverage and discrepancy • How much would the coefficients change if the case were deleted? • Cook’s distance, dfBetas

  38. Outliers • Influence plots • With a couple measures of ‘outlierness’ we can construct a scatterplot to note especially problematic cases • After fitting a regression model in R-commander, i.e. running the analysis, this graph is available via point and click • Here we have what is actually a 3-d plot, with 2 outlier measures on the x and y axes (studentized residuals and ‘hat’ values, a measure of leverage) and a third in terms of the size of the circle (Cook’s distance) • For this example, case 35 appears to be a problem
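
Outside of R Commander's menus, essentially the same graph can be produced with the car package; a minimal sketch on simulated data:

  library(car)

  set.seed(9)
  n <- 100
  x <- rnorm(n)
  y <- 0.5*x + rnorm(n)
  fit <- lm(y ~ x)

  # x-axis: hat values (leverage); y-axis: studentized residuals;
  # circle area: Cook's distance. Noteworthy cases are labeled by row name.
  influencePlot(fit)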

  39. Outliers • It should be made clear to interested readers what, if anything, has been done to deal with outliers • Applications such as S-Plus, R, and even SAS and Stata (pretty much everything but SPSS) provide methods for robust regression analysis, which would be preferred

  40. Summary: Outliers • No matter the analysis, some cases will be the 'most extreme'. However, none may really qualify as being overly influential. • Whatever you do, always run some diagnostic analysis and do not ignore influential cases • It should be made clear to interested readers what has been done to deal with outliers • As noted before, the best approach to dealing with outliers when they do occur is to run a robust regression with capable software
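
As one concrete (non-SPSS) option, here is a minimal robust-regression sketch using MASS::rlm on simulated data with one artificial outlier; rlm down-weights extreme cases rather than deleting them.

  library(MASS)

  set.seed(10)
  n <- 50
  x <- rnorm(n)
  y <- 0.5*x + rnorm(n)
  y[1] <- 20          # one deliberately extreme case

  ols_fit    <- lm(y ~ x)
  robust_fit <- rlm(y ~ x)

  coef(ols_fit)       # estimates pulled around by the extreme case
  coef(robust_fit)    # estimates much less affected
  robust_fit$w[1]     # the outlying case receives a small weight in the robust fit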
