
Statistics for the Social Sciences

Explore the principles of simple bi-variate and multiple regression analysis using the least-squares fit line and general linear model. Learn to interpret residual plots and use SPSS for regression analysis. Understand how to identify patterns in errors and evaluate model suitability with practical examples.

chargrove




Presentation Transcript


  1. Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction cont.

  2. Outline (for week) • Simple bi-variate regression, least-squares fit line • The general linear model • Residual plots • Using SPSS • Multiple regression • Comparing models (change in r², Δr²) • Using SPSS

  3. From last time • Review of last time: Y = intercept + slope(X) + error • [Figure: scatterplot of Y against X, both axes running from 1 to 6]
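As a reminder of how the intercept and slope in that equation are obtained, here is a minimal sketch of the least-squares formulas in plain Python. The function name `least_squares_fit` and the six data points are illustrative assumptions, not values from the slides.

```python
def least_squares_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares of X around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data (not taken from the slides)
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 3, 2, 5, 4, 6]

intercept, slope = least_squares_fit(xs, ys)

# error = Y - (intercept + slope * X), one value per observation
errors = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
```

Each observed Y is then exactly the fitted value plus its error term, which is the decomposition the equation describes.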

  4. From last time • The sum of the residuals should always equal 0. • In that sense the least-squares regression line balances the data: the positive and negative residuals cancel, although the points need not divide evenly above and below the line. • Additionally, the residuals should be randomly distributed. • There should be no pattern to the residuals. • If there is a pattern, it may suggest that there is more than a simple linear relationship between the two variables. • [Figure: scatterplot of Y against X, both axes running from 1 to 6]
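The zero-sum property of the residuals can be checked numerically. The sketch below uses hypothetical data and a helper named `fit_line` (an assumed name); the total is zero up to floating-point rounding.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data (not taken from the slides)
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Positive and negative residuals cancel out
total = sum(residuals)
```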

  5. Seeing patterns in the error • Residual plots • Useful tools for examining the relationship even further. • These are basically scatterplots of the residuals (often transformed into z-scores) against the explanatory (X) variable (or sometimes against the response variable).
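The points that go into such a residual plot can be computed as follows. This is a sketch under two assumptions: the data are hypothetical, and the residuals are z-scored with the ordinary sample standard deviation, which may differ slightly from the standardization a statistics package applies.

```python
import statistics


def fit_line(xs, ys):
    """Least-squares (intercept, slope) for paired data."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data (not taken from the slides)
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Convert residuals to z-scores: subtract their mean, divide by their SD
mean_res = statistics.mean(residuals)
sd_res = statistics.stdev(residuals)
z_residuals = [(e - mean_res) / sd_res for e in residuals]

# Each (X, z-residual) pair is one point on the residual plot
plot_points = list(zip(xs, z_residuals))
```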

  6. Seeing patterns in the error • [Figures: scatterplot and residual plot] • The scatterplot shows a nice linear relationship. • The residual plot shows that the residuals fall randomly above and below the line. Critically, there doesn't seem to be a discernible pattern to the residuals.

  7. Seeing patterns in the error • [Figures: scatterplot and residual plot] • The scatterplot also shows a nice linear relationship. • The residual plot shows that the residuals get larger as X increases. • This suggests that the variability around the line is not constant across values of X. • This is referred to as a violation of homogeneity of variance.
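One rough way to see this fan-shaped pattern without a plot is to compare the typical residual size for low and high values of X. The sketch below uses constructed data whose scatter grows with X; the split into halves is an informal illustration, not a formal test of homogeneity of variance.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Constructed data: Y follows X, but the scatter is proportional to X
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x + (-1) ** i * 0.2 * x for i, x in enumerate(xs)]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Mean absolute residual in the lower and upper halves of X
half = len(xs) // 2
lower_spread = sum(abs(e) for e in residuals[:half]) / half
upper_spread = sum(abs(e) for e in residuals[half:]) / (len(xs) - half)
```

Here `upper_spread` comes out larger than `lower_spread`, which is the numeric counterpart of residuals that widen as X increases.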

  8. Seeing patterns in the error • [Figures: scatterplot and residual plot] • The scatterplot shows what may be a linear relationship. • The residual plot suggests that a non-linear relationship may be more appropriate (see how a curved pattern appears in the residual plot).
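A curved residual pattern can be reproduced with a small constructed example: fit a straight line to data that actually follow a curve (here Y = X squared, chosen purely for illustration) and look at the signs of the residuals.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Constructed data: the true relationship is curved, not linear
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Sign of each residual, in order of increasing X
signs = ["+" if e > 0 else "-" for e in residuals]
```

The residuals are positive at both ends of the X range and negative in the middle, a U shape that a straight line cannot absorb.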

  9. Using SPSS: Regression in SPSS • Variables (explanatory and response) are entered into columns • Each row is a unit of analysis (e.g., a person)

  10. Regression in SPSS • Analyze: Regression, Linear

  11. Regression in SPSS • Enter: • Predicted (criterion) variable into the Dependent Variable field • Predictor variable into the Independent Variable field

  12. Regression in SPSS • The variables in the model • r • r² • We'll get back to these numbers in a few weeks • Unstandardized coefficients • Slope (indep var name) • Intercept (constant)

  13. Regression in SPSS • Standardized coefficient • β (indep var name) • Recall that r = standardized β in bi-variate regression
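The identity between r and the standardized coefficient in the two-variable case can be verified directly. The sketch below assumes hypothetical data and computes the standardized slope as the raw slope times the ratio of the standard deviations of X and Y.

```python
import math

# Hypothetical data (not taken from the slides)
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 3, 2, 5, 4, 6]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Unstandardized slope and the sample standard deviations
slope = sxy / sxx
sd_x = math.sqrt(sxx / (n - 1))
sd_y = math.sqrt(syy / (n - 1))

# Standardized slope: the slope expressed in standard-deviation units
beta = slope * sd_x / sd_y

# Pearson correlation between X and Y
r = sxy / math.sqrt(sxx * syy)
```

With a single predictor, `beta` and `r` agree up to rounding. With several predictors this equality no longer holds in general.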

  14. Multiple Regression • Typically researchers are interested in predicting with more than one explanatory variable • In multiple regression, an additional predictor variable (or set of variables) is used to predict the residuals left over from the first predictor.

  15. Multiple Regression • Bi-variate regression prediction models Y = intercept + slope (X) + error

  16. Multiple Regression • Bi-variate regression prediction models: Y = intercept + slope (X) + error, where intercept + slope (X) is the “fit” and error is the “residual” • Multiple regression prediction models

  17. Multiple Regression • Multiple regression prediction models • First Explanatory Variable • Second Explanatory Variable • Third Explanatory Variable • Fourth Explanatory Variable • Whatever variability is left over

  18. Multiple Regression • Predict test performance based on: • Study time (first explanatory variable) • Test time (second explanatory variable) • What you eat for breakfast (third explanatory variable) • Hours of sleep (fourth explanatory variable) • Whatever variability is left over

  19. Multiple Regression • Predict test performance based on: • Study time • Test time • What you eat for breakfast • Hours of sleep • Typically your analysis consists of testing multiple regression models to see which “fits” best (comparing the r² values of the models, one model versus another) • For example:

  20. Multiple Regression • Model #1: Some co-variance between the two variables (total study time, r = .6) • If we know the total study time, we can predict 36% of the variance in test performance • R² for Model = .36 • 64% variance unexplained • [Figure: total variability in test performance (response variable), partly overlapped by total study time]
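The 36% and 64% figures follow from squaring the correlation. A short arithmetic sketch, using the r = .6 quoted on the slide:

```python
# Correlation between total study time and test performance (from the slide)
r = 0.6

# Proportion of variance explained is r squared; the rest is unexplained
explained = r ** 2
unexplained = 1 - explained

print(f"explained: {explained:.0%}, unexplained: {unexplained:.0%}")
```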

  21. Multiple Regression • Model #2: Add test time to the model (test time, r = .1; total study time, r = .6) • Little co-variance between test performance and test time • We can explain more of the variance in test performance • R² for Model = .49 • 51% variance unexplained • [Figure: total variability in test performance (response variable), overlapped by total study time and test time]

  22. Multiple Regression • Model #3: No co-variance between test performance and breakfast food (breakfast, r = .0; total study time, r = .6; test time, r = .1) • Not related, so we can NOT explain more of the variance in test performance • R² for Model = .49 • 51% variance unexplained • [Figure: total variability in test performance (response variable), overlapped by total study time and test time but not by breakfast]

  23. Multiple Regression • Model #4: Some co-variance between test performance and hours of sleep (hrs of sleep, r = .45; total study time, r = .6; test time, r = .1; breakfast, r = .0) • We can explain more of the variance • But notice what happens with the overlap (covariation between explanatory variables): we can't just add r's or r²'s • R² for Model = .60 • 40% variance unexplained • [Figure: total variability in test performance (response variable), overlapped by total study time, test time, and hours of sleep, with the predictors overlapping each other]
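For the special case of exactly two predictors there is a standard formula for the model R² in terms of the three pairwise correlations, and it shows why overlapping predictors cannot simply be summed. In the sketch below, the .6 and .45 come from the slide, while the correlation between the two predictors (`r_12`) is a hypothetical value, since the slides do not report it. The function name is likewise an assumption. This does not reproduce the slide's four-predictor R² of .60.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R-squared for a model with two predictors, from pairwise correlations.

    r_y1, r_y2: correlation of each predictor with the response
    r_12: correlation between the two predictors (must not be +1 or -1)
    """
    numerator = r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12
    return numerator / (1 - r_12 ** 2)


r_study = 0.6   # study time with test performance (from the slide)
r_sleep = 0.45  # hours of sleep with test performance (from the slide)

# If the predictors were uncorrelated, the r-squared values would simply add
no_overlap = r_squared_two_predictors(r_study, r_sleep, 0.0)

# Hypothetical overlap: study time and sleep correlate .5 with each other
with_overlap = r_squared_two_predictors(r_study, r_sleep, 0.5)
```

With no overlap the result is .36 + .2025 = .5625. With the assumed overlap of .5 it drops to .39, because part of what sleep explains was already explained by study time.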

  24. Multiple Regression in SPSS • Setup as before: variables (explanatory and response) are entered into columns • There are a couple of different ways to use SPSS to compare different models

  25. Regression in SPSS • Analyze: Regression, Linear

  26. Multiple Regression in SPSS • Method 1: enter all the explanatory variables together • Enter: • Predicted (criterion) variable into the Dependent Variable field • All of the predictor variables into the Independent Variable field

  27. Multiple Regression in SPSS • The variables in the model • r for the entire model • r² for the entire model • Unstandardized coefficients • Coefficient for var1 (var name) • Coefficient for var2 (var name)

  28. Multiple Regression in SPSS • The variables in the model • r for the entire model • r² for the entire model • Standardized coefficients • Coefficient for var1 (var name) • Coefficient for var2 (var name)

  29. Multiple Regression • Which β to use, standardized or unstandardized? • Unstandardized β's are easier to use if you want to predict a raw score based on raw scores (no z-scores needed). • Standardized β's are nice for directly comparing which variable is most “important” in the equation.
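The two kinds of coefficient are related by a rescaling: a standardized coefficient is the unstandardized one multiplied by the predictor's standard deviation and divided by the response's standard deviation. The sketch below illustrates this with entirely hypothetical coefficients and standard deviations. None of these numbers come from the slides or from SPSS output.

```python
def standardize(b, sd_x, sd_y):
    """Convert an unstandardized coefficient to a standardized one."""
    return b * sd_x / sd_y


# Hypothetical values for illustration only
sd_test = 10.0                     # SD of test performance (points)
b_study, sd_study = 2.0, 3.0       # points per hour of study; SD in hours
b_sleep, sd_sleep = 3.0, 1.5       # points per hour of sleep; SD in hours
intercept = 40.0

# Unstandardized coefficients predict a raw score from raw scores
predicted = intercept + b_study * 6 + b_sleep * 8  # 6 h study, 8 h sleep

# Standardized coefficients put both predictors on a common scale
beta_study = standardize(b_study, sd_study, sd_test)
beta_sleep = standardize(b_sleep, sd_sleep, sd_test)
```

In this made-up case sleep has the larger raw coefficient (3.0 versus 2.0), yet study time has the larger standardized coefficient (.60 versus .45), which is why the standardized values are the ones to compare.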

  30. Multiple Regression in SPSS • Method 2: enter first model, then add another variable for second model, etc. • Enter: • Predicted (criterion) variable into the Dependent Variable field • First predictor variable into the Independent Variable field • Click the Next button

  31. Multiple Regression in SPSS • Method 2 cont.: • Enter: • Second predictor variable into the Independent Variable field • Click Statistics

  32. Multiple Regression in SPSS • Click the ‘R squared change’ box

  33. Multiple Regression in SPSS • Shows the results of two models • The variables in the first model (math SAT) • The variables in the second model (math and verbal SAT)

  34. Multiple Regression in SPSS • Shows the results of two models • The variables in the first model (math SAT) • The variables in the second model (math and verbal SAT) • Model 1 • r² for the first model • Coefficients for var1 (var name)

  35. Multiple Regression in SPSS • Shows the results of two models • The variables in the first model (math SAT) • The variables in the second model (math and verbal SAT) • Model 2 • r² for the second model • Coefficients for var1 (var name) • Coefficients for var2 (var name)

  36. Multiple Regression in SPSS • Shows the results of two models • The variables in the first model (math SAT) • The variables in the second model (math and verbal SAT) • Change statistics: is the change in r² from Model 1 to Model 2 statistically significant?
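The significance of a change in r² between two nested models is conventionally assessed with an F ratio. The sketch below computes that ratio by hand. The function name and the sample size of 50 are assumptions for illustration, and the two r² values are borrowed from the earlier test-performance example rather than from the SAT output.

```python
def f_change(r2_reduced, r2_full, n, k_full, k_added):
    """F statistic for the increase in R-squared between nested models.

    r2_reduced: R-squared of the smaller model
    r2_full:    R-squared of the larger model
    n:          number of participants
    k_full:     number of predictors in the larger model
    k_added:    number of predictors added to get from smaller to larger
    Returns (F, df1, df2).
    """
    df1 = k_added
    df2 = n - k_full - 1
    f = ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
    return f, df1, df2


# Illustrative values: R-squared rises from .36 to .49 when one predictor
# is added; n = 50 participants is a hypothetical sample size
f, df1, df2 = f_change(0.36, 0.49, n=50, k_full=2, k_added=1)
```

The resulting F is compared against the F distribution with `df1` and `df2` degrees of freedom. The Python standard library has no F distribution, so the p-value itself is left to SPSS or a statistics package.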

  37. Cautions in Multiple Regression • We can use as many predictors as we wish, but we should be careful not to use more predictors than are warranted. • Simpler models are more likely to generalize to other samples. • If you use as many predictors as you have participants in your study, you can predict 100% of the variance. Although this may seem like a good thing, such a fit reflects the number of predictors rather than real relationships, so it is unlikely that your results would generalize to any other sample, and thus they are not valid. • You probably should have at least 10 participants per predictor variable (and probably should aim for about 30).
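The smallest case of this overfitting problem is easy to demonstrate: with only two participants, a line with an intercept and one predictor passes through both points, so r² is 1 no matter what the scores are. The sketch below uses arbitrary made-up scores and an assumed helper name, `r_squared`.

```python
def r_squared(xs, ys):
    """R-squared of the least-squares line fitted to paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared errors around the fitted line
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / syy


# Two participants, arbitrary made-up scores: the fit is always perfect
two_people = r_squared([2, 9], [55, 71])

# Six participants, hypothetical scores: the fit is no longer automatic
six_people = r_squared([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5])
```

The perfect fit for two participants says nothing about the relationship between the variables, which is the point of the caution: the same thing happens whenever the number of estimated coefficients reaches the number of participants.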
