Multiple Regression: Last Part – Model Reduction
Model Reduction • More of an art form than a science • As stated earlier, we're trying to create a model predicting a DV that explains as much of the variance in that DV as possible, while the model at the same time: • Meets the assumptions of MLR • Best manages the other aforementioned issues – sample size, outliers, multicollinearity • Is parsimonious
Model Reduction • The more variables in the model, the higher the R²; conversely, our R² will decrease (or at best stay the same) every time we remove a variable from the model • So, if we're reducing our R², we want to make sure that we're making progress relative to the assumptions, sample size, multicollinearity, & parsimony
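A useful companion here (not on the original slides, but standard): adjusted R² penalizes model size, so it can actually rise when a weak predictor is dropped. With n cases and k predictors, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1); comparing adjusted R² across candidate models rewards exactly the parsimony this section is after.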
MLR Model Reduction Example • Let's use the March Madness homework data • RQ: How effectively can we create a model to predict teams' success in the NCAA Division I Men's Basketball Tournament? • Alpha = .05 a priori for all tests
MLR Model Reduction Example • Use SPSS to find the Cook's distance for the data: [SPSS screenshot omitted]
MLR Model Reduction Example • Output from Cook's distance request: The largest Cook's distance is smaller than 1, so no problem (a Cook's distance > 1 signals an influential data point, so you should consider eliminating it)
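To reproduce the Cook's distance check outside SPSS, here is a minimal sketch using Python's statsmodels; the file name and column names (tourney_wins, wins, losses, rpi) are hypothetical stand-ins for the March Madness variables, not names from the actual data set.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical file/column names standing in for the March Madness data
df = pd.read_csv("march_madness.csv")
X = sm.add_constant(df[["wins", "losses", "rpi"]])  # IVs
y = df["tourney_wins"]                              # DV: tournament success

model = sm.OLS(y, X).fit()
cooks_d = model.get_influence().cooks_distance[0]   # one distance per case

# Rule of thumb from the slides: Cook's D > 1 flags an influential case
print("Max Cook's distance:", cooks_d.max())
print("Cases with D > 1:", int((cooks_d > 1).sum()))
```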
MLR Model Reduction Example • Output from Cook's distance request (data file screenshot omitted)
MLR Model Reduction Example • Examine the correlation matrix to see which variables are correlating with the DV and to check for multicollinearity among IV's • Matrix on next slide • Correlations above .5 are somewhat concerning… those above .7, and particularly .8, are larger concerns • I count eight pairwise correlations (not involving the DV) that are .7+
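A quick way to generate the same matrix outside SPSS, and to list the IV pairs at or above the .7 threshold the slide flags (continuing the hypothetical file and column names from the earlier sketch):

```python
import pandas as pd

df = pd.read_csv("march_madness.csv")  # hypothetical file name

# Pairwise Pearson correlations among all numeric variables (DV and IVs)
corr = df.select_dtypes("number").corr()
print(corr.round(2))

# List IV pairs with |r| >= .7 (drop the DV row/column first)
iv = corr.drop(index="tourney_wins", columns="tourney_wins").abs()
flagged = [(a, b, round(iv.loc[a, b], 2))
           for i, a in enumerate(iv.index)
           for b in iv.columns[i + 1:]
           if iv.loc[a, b] >= 0.7]
print(flagged)
```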
MLR Model Reduction Example • What does this tell us?
Sample size concerns • Recall Tabachnick & Fidell (1996): n > 50 + 8k • Here k (# predictors) = 13 and n = 192 • 50 + (8 × 13) = 50 + 104 = 154 • So the inequality is satisfied here • Could still be improved by losing some predictors
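The rule-of-thumb check is trivial to script; a sketch (the threshold formula is from the slide, the function name is ours):

```python
def meets_tf_rule(n: int, k: int) -> bool:
    """Tabachnick & Fidell (1996) rule of thumb: n > 50 + 8k."""
    return n > 50 + 8 * k

print(meets_tf_rule(n=192, k=13))  # True: 192 > 154
```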
MLR Model Reduction Example • Am I satisfied with this model, or should I examine another model by reducing via IV elimination? • Because of some serious multicollinearity problems, it seems we can create a "better" model via reduction
MLR Model Reduction Example • So, what variables do we drop? • In examining variables to drop, look at: • Pairwise correlation with the DV (high is good) • Multicollinearity with other IV's (low is good) • Prediction strength in model – ideal to have no non-significant IV's in the model (strong is good) • Common sense – make your decisions based on BOTH statistical and practical grounds • This is an important slide, folks!
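For the multicollinearity criterion above, variance inflation factors (VIFs) give a numeric check; VIFs are not mentioned on the slides, so this is a supplementary sketch, with the same hypothetical column names as before:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("march_madness.csv")
X = sm.add_constant(df[["wins", "losses", "win_pct", "rpi"]])

# A common flag: VIF > 10 (some authors use > 5) suggests problematic
# multicollinearity for that predictor
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, round(variance_inflation_factor(X.values, i), 2))
```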
MLR Model Reduction Example • Wins, losses, and winning % are all obviously highly correlated with one another – of the three, wins has the highest pairwise correlation w/ the DV and the highest t-score in the model, so let's keep it and drop the other two
Example – Model #2 • So, let's re-run the analysis without those 2 variables & see what we get…
Example – Model #2 • Compare from one model to the next: • R² • F-statistic • IV's in the model • So, how did we do? • Happy with this model?
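A partial F-test formalizes this comparison for nested models, and statsmodels' anova_lm runs it directly. A sketch, again with hypothetical file and variable names:

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.read_csv("march_madness.csv")

# Model #1 (fuller) vs. Model #2 (losses and winning % dropped)
full = smf.ols("tourney_wins ~ wins + losses + win_pct + rpi", data=df).fit()
reduced = smf.ols("tourney_wins ~ wins + rpi", data=df).fit()

print(reduced.rsquared, full.rsquared)  # how much R² did the reduction cost?
print(anova_lm(reduced, full))          # F-test: do the dropped IVs matter?
```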
Example – Model #3 • Let's try to clear up a couple of other multicollinearity problems: • Top 50 wins vs. Top 50 win % • Strength of schedule vs. RPI vs. conference membership • Let's drop Top 50 win % & SOS • Also, let's get rid of # of wins in last ten games and Top 50 losses, as they haven't been significant anywhere
Example – Model #3 • Model #3…
Example – Model #3/4 • How did we do this time? • A fourth model should perhaps get rid of automatic bid & conference affiliation
MLR Model Reduction • As you can see, this trial-and-error process can continue at some length • The goal is to create a highly predictive, parsimonious model with as few problems with assumptions & multicollinearity as possible • Finis…