
Addendum

Addendum. Testing assumptions of simple linear regression. Now, how does one go about it? The approach taken in this course will be to teach you to control α. In other words, teach cautious ways to go about your business, so that if you get a result you can interpret it appropriately.



  1. Addendum: Testing assumptions of simple linear regression

  2. Now, how does one go about it? • The approach taken in this course will be to teach you to control α • In other words, teach cautious ways to go about your business, so that if you get a result you can interpret it appropriately • This requires that you know what to do to protect α... • ...and that means testing the assumptions of the procedure...and knowing what happens to α if they are violated

  3. Now, how does one go about it? • And just as a “by the way”... • There are lots of slides in here that we’ll “flash” by...but they provide a real step-by-step guide to completing some basic tests in the mid-term, so please be aware that the information is here!

  4. Testing assumptions of regression - 1 • Measurement level • Independent must be interval or dichotomous • Dependent must be interval • How to test? • You already know • If condition violated? • Don’t use regression!

  5. Testing the assumptions for regression - 2 • Normality (interval level variables) • Skewness & kurtosis must lie within acceptable limits (-1 to +1) • How to test? • You can examine a histogram, but SPSS also provides procedures, and these have convenient rules that can be applied (see following slides) • If condition violated? • The regression procedure can overestimate significance (increases type I error rate), so a note of caution should be added to the interpretation of results

  6. Testing the assumptions - normality To compute skewness and kurtosis for the included cases, select Descriptive Statistics | Descriptives… from the Analyze menu.

  7. Testing the assumptions - normality First, move the variables to the Variable(s) list box. In this case there are two interval variables (the IV and the DV). Second, click on the Options… button to specify the statistics to compute.

  8. Testing the assumptions - normality First, mark the checkboxes for Kurtosis and Skewness. Second, click on the Continue button to complete the options.

  9. Testing the assumptions - normality Click on the OK button to indicate the request for statistics is complete.

  10. SPSS output to evaluate normality The simple linear regression requires that the interval level variables in the analysis be normally distributed. The skewness of NUMBER OF HOURS WORKED LAST WEEK for the sample (-0.333) is within the acceptable range for normality (-1.0 to +1.0), but the kurtosis (1.007) is outside the range. The assumption of normality is not satisfied for NUMBER OF HOURS WORKED LAST WEEK. The skewness of RS OCCUPATIONAL PRESTIGE SCORE (1980) for the sample (0.359) is within the acceptable range for normality (-1.0 to +1.0) and the kurtosis (-0.692) is within the range. The assumption of normality is satisfied for RS OCCUPATIONAL PRESTIGE SCORE (1980). Because one of the two variables fails the check, the assumption of normality required by the simple linear regression is not satisfied. A note of caution should be added to any findings based on this analysis.
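
The -1.0 to +1.0 rule can also be checked outside SPSS. The sketch below is only an illustration: it assumes the adjusted sample formulas for skewness and excess kurtosis that statistics packages commonly report, and the function names are hypothetical rather than part of any SPSS procedure.

```python
import math


def sample_skewness(values):
    """Adjusted sample skewness (assumed to match what stats packages report)."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (divides by n - 1).
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    cubed = sum(((v - mean) / s) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * cubed


def sample_excess_kurtosis(values):
    """Adjusted sample excess kurtosis (a normal distribution scores about 0)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    fourth = sum(((v - mean) / s) ** 4 for v in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


def in_acceptable_range(statistic, low=-1.0, high=1.0):
    """Apply the slide's rule of thumb to a single skewness or kurtosis value."""
    return low <= statistic <= high


# Values reported on the slide for NUMBER OF HOURS WORKED LAST WEEK.
hours_normal = in_acceptable_range(-0.333) and in_acceptable_range(1.007)
# Values reported on the slide for RS OCCUPATIONAL PRESTIGE SCORE (1980).
prestige_normal = in_acceptable_range(0.359) and in_acceptable_range(-0.692)
```

With the reported statistics, `hours_normal` is false because the kurtosis of 1.007 falls just outside the range, while `prestige_normal` is true.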

  11. Testing the assumptions – 3 • Linearity & homoscedasticity for interval level variables • How to test? • Scatterplot (see following slides) • If condition violated? • Can underestimate significance – loses power, increases possibility of type II error

  12. Testing the assumptions – linearity and homoscedasticity First, select the Chart Builder. Second, choose Scatter/Dot from the chart gallery.

  13. Testing the assumptions – linearity and homoscedasticity

  14. The scatterplot for evaluating linearity The simple linear regression assumes that the relationship between the independent variable "RS OCCUPATIONAL PRESTIGE SCORE (1980)" and the dependent variable "NUMBER OF HOURS WORKED LAST WEEK" is linear. The assumption is usually evaluated by visual inspection of the scatterplot. Violation of the linearity assumption may result in an understatement of the strength of the relationship between the variables.

  15. The scatterplot for evaluating linearity • Linear – all is well with α • Non-linear – will underestimate significance

  16. The scatterplot for evaluating homoscedasticity The simple linear regression assumes that the variance of the dependent variable is uniform across all values of the independent variable. For an interval level independent variable, the assumption is evaluated by visual inspection of the scatterplot of the two variables. Violation of the homogeneity assumption may result in an understatement of the strength of the relationship between the variables.

  17. The scatterplot for evaluating homoscedasticity • Homoscedastic – all is well with α • Heteroscedastic – will underestimate significance

  18. Testing the assumptions for simple regression – 4 • Linearity & homoscedasticity for a dichotomous independent variable • How to test? • Linearity – only 2 levels, so not relevant here (see next slide) • Homoscedasticity – via Levene’s test of homogeneity of variance in ANOVA (see following slides) • If condition violated? • Can underestimate significance – loses power, increases possibility of type II error

  19. Testing the assumptions – linearity for a dichotomous IV When the independent variable is dichotomous, we do not have a meaningful scatterplot that we can interpret for linearity. The assumption of a linear relationship between the independent and dependent variable is only tested when the independent variable is interval level.

  20. Testing the assumptions - homoscedasticity for a dichotomous IV To conduct the test of homoscedasticity, we will use the One-Way ANOVA procedure. Select the command Compare Means | One-Way ANOVA… from the Analyze menu.

  21. Testing the assumptions - homoscedasticity for a dichotomous IV First, move the variable “prestg80” to the Dependent list box. Second, move the variable “compuse” to the Factor text box. Third, click on the Options… button to specify the statistics to compute.

  22. Testing the assumptions - homoscedasticity for a dichotomous IV First, mark the Homogeneity-of-variance check box to request the Levene test. Second, click on the Continue button to complete the request.

  23. Testing the assumptions - homoscedasticity for a dichotomous IV Click on the OK button to indicate the request for statistics is complete.

  24. Result of test of homoscedasticity for a dichotomous independent variable The simple linear regression assumes that the variance for the dependent variable is uniform for all groups. This assumption is evaluated with Levene's test for equality of variances. The null hypothesis for this test states that the variances of all groups are equal. The desired outcome for this test is to fail to reject the null hypothesis. Since the probability associated with the Levene test (0.141) is greater than the level of significance (0.05), the null hypothesis is not rejected. The requirement for equal variances is satisfied.
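
To show what the Levene statistic measures, here is a minimal sketch that computes the W statistic from absolute deviations around each group's mean and then applies the slide's decision rule to a p-value. The function names are hypothetical, the mean-centred variant is an assumption about what the SPSS output reports, and the p-value itself is taken as given because the Python standard library has no F distribution.

```python
def levene_w(*groups):
    """Levene's W statistic using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of every observation from its own group mean.
    deviations = []
    for g in groups:
        group_mean = sum(g) / len(g)
        deviations.append([abs(v - group_mean) for v in g])

    deviation_means = [sum(d) / len(d) for d in deviations]
    grand_mean = sum(sum(d) for d in deviations) / n_total

    # Between-group and within-group sums of squares of the deviations.
    between = sum(
        len(d) * (m - grand_mean) ** 2 for d, m in zip(deviations, deviation_means)
    )
    within = sum(
        (z - m) ** 2 for d, m in zip(deviations, deviation_means) for z in d
    )
    return (n_total - k) / (k - 1) * between / within


def equal_variances_satisfied(p_value, alpha=0.05):
    """Fail to reject the null of equal variances when p exceeds alpha."""
    return p_value > alpha


# The probability reported on the slide for the Levene test.
requirement_met = equal_variances_satisfied(0.141)
```

A W near zero means the groups have similar spread. The reported probability of 0.141 is above 0.05, so `requirement_met` is true, matching the slide's conclusion.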

  25. Assumptions tested – run the analysis • Now that you’ve tested the assumptions, here’s a quick run-through of how to run the test and how to interpret the results • First – an example with two interval level variables

  26. Running the analysis – interval IVs To conduct a simple linear regression, select Regression | Linear… from the Analyze menu.

  27. Running the analysis – interval IVs First, move the dependent variable “hrs1” to the text box for the Dependent variable. Second, move the independent variable “prestg80” to the list of Independent variables. Third, click on the OK button to complete the request.

  28. The existence of a relationship The determination of whether or not there is a relationship between the independent variable and the dependent variable is based on the significance of the regression in the ANOVA table. The probability of the F statistic for the regression relationship is 0.041, less than or equal to the level of significance of 0.05. We reject the null hypothesis that there is no relationship between the independent and the dependent variable.

  29. The strength of the relationship • The strength of the relationship is based on the R-square statistic, which is the square of R, the correlation coefficient. • We evaluate the strength of the relationship using the rule of thumb for interpreting R: • Between 0 and ±0.20 - Very weak • Between ±0.20 and ±0.40 - Weak • Between ±0.40 and ±0.60 - Moderate • Between ±0.60 and ±0.80 - Strong • Between ±0.80 and ±1.00 - Very strong
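
The rule of thumb can be written as a small lookup. This is a sketch with a hypothetical function name; because the slide's bands share their end points, it assumes that a value exactly on a boundary belongs to the higher band.

```python
import math


def strength_of_relationship(r):
    """Label the size of a correlation coefficient R using the slide's bands."""
    size = abs(r)
    if size > 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"


def strength_from_r_square(r_square):
    """Recover the size of R from R-square before applying the bands."""
    return strength_of_relationship(math.sqrt(r_square))
```

Only the size of R matters for the label, so `strength_of_relationship(-0.5)` and `strength_of_relationship(0.5)` both fall in the moderate band.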

  30. The direction of the relationship The direction of the relationship, direct or inverse, is based on the sign of the B coefficient for the independent variable. Since 0.138 is positive, there is a positive relationship between occupational prestige and hours worked.

  31. Interpret the intercept The intercept (Constant) is the position on the vertical y-axis where the regression line crosses the axis. It is interpreted as the value of the dependent variable when the value of the independent variable is zero. It is seldom a useful piece of information.

  32. Interpret the slope The B coefficient of the independent variable is called the slope. It represents the amount of change in the dependent variable for a one-unit change in the independent variable. Each time that occupational prestige increases or decreases by one point, we would expect the subject to work 0.138 more or 0.138 fewer hours.
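
For readers who want to see where the intercept and slope come from, the following sketch fits a least-squares line by hand. The data are made up for illustration and are not the survey data behind the 0.138 coefficient; `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of squares of x and sum of cross-products of x and y.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def predict(intercept, slope, x_value):
    """Predicted value of the dependent variable for a given x."""
    return intercept + slope * x_value


# Made-up illustrative data, not taken from the presentation.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope = fit_line(x, y)

# The intercept is the prediction at x = 0; the slope is the change in the
# prediction when x increases by one unit.
change_per_unit = predict(intercept, slope, 4) - predict(intercept, slope, 3)
```

Here `change_per_unit` equals the slope, which is the sense in which the B coefficient describes the change in the dependent variable for a one-unit change in the independent variable.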

  33. Significance test of the slope If there is no relationship between the variables, the slope would be zero. The hypothesis test of the slope tests the null hypothesis that the b coefficient, or slope, is zero. In simple linear regression, the significance of this test matches that of the overall test of relationship between dependent and independent variables. In multiple regression, the test of overall relationship will differ from the test of each individual independent variable.
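
The claim that the slope test matches the overall test in simple linear regression can be checked numerically: with one independent variable, the F statistic of the regression equals the square of the t statistic of the slope. The sketch below uses made-up data and a hypothetical function name, and it stops at the test statistics because the standard library has no F or t distribution for the p-values.

```python
import math


def regression_f_and_t(x, y):
    """Return (F for the regression, t for the slope) in simple linear regression."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Split the variation in y into regression and residual sums of squares.
    regression_ss = sum((intercept + slope * xi - y_mean) ** 2 for xi in x)
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

    mean_square_error = residual_ss / (n - 2)
    f_statistic = regression_ss / mean_square_error  # regression has 1 df
    t_statistic = slope / math.sqrt(mean_square_error / sxx)
    return f_statistic, t_statistic


# Made-up illustrative data, not taken from the presentation.
f_statistic, t_statistic = regression_f_and_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

Because `t_statistic ** 2` equals `f_statistic`, the two tests give the same significance here. With several independent variables the identity no longer holds.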

  34. Conclusion of the analysis For the population represented by this sample, there is a very weak relationship between "RS OCCUPATIONAL PRESTIGE SCORE (1980)" and "NUMBER OF HOURS WORKED LAST WEEK." Specifically, we would expect a one unit increase in occupational prestige score to produce a 0.138 increase in number of hours worked in the past week. Because of the earlier problems stated with normality, the statistical conclusion must be expressed with caution.

  35. Running the analysis – mixed IVs • Now an example with an interval dependent variable and a dichotomous independent variable...

  36. SPSS output to evaluate normality The simple linear regression requires that the interval level variables in the analysis be normally distributed. The skewness of RS OCCUPATIONAL PRESTIGE SCORE (1980) for the sample (0.324) is within the acceptable range for normality (-1.0 to +1.0) and the kurtosis (-0.817) is within the range. The assumption of normality is satisfied for RS OCCUPATIONAL PRESTIGE SCORE (1980).

  37. The strength of the relationship • The strength of the relationship is based on the R-square statistic in the Model Summary table of the regression output. R-square is the square of R, the correlation coefficient. • We evaluate the strength of the relationship using the rule of thumb for interpreting R: • Between 0 and ±0.20 - Very weak • Between ±0.20 and ±0.40 - Weak • Between ±0.40 and ±0.60 - Moderate • Between ±0.60 and ±0.80 - Strong • Between ±0.80 and ±1.00 - Very strong

  38. The direction of the relationship The direction of the relationship, direct or inverse, is based on the sign of the B coefficient for the independent variable. Since -7.406 is negative, there is an inverse relationship between using a computer and occupational prestige. What this means exactly will depend on the way the computer use variable is coded.

  39. Interpret the intercept The intercept (Constant) is the position on the vertical y-axis where the regression line crosses the axis. It is interpreted as the value of the dependent variable when the value of the independent variable is zero. It is seldom a useful piece of information.

  40. Interpret the slope The b coefficient for the independent variable "R USE COMPUTER" is -7.406. The b coefficient is the amount of change in the dependent variable "RS OCCUPATIONAL PRESTIGE SCORE (1980)" associated with a one unit change in the independent variable. Since the independent variable is dichotomous, a one unit increase implies a change from the category YES (code value = 1) to the category NO (code value = 2).
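
With a dichotomous independent variable coded 1 and 2, the least-squares slope is simply the mean of the group coded 2 minus the mean of the group coded 1. The sketch below demonstrates this with made-up prestige scores, not the survey data behind the -7.406 coefficient; the function name is hypothetical.

```python
def slope_for_dichotomous_iv(codes, scores):
    """Least-squares slope of scores regressed on a two-valued code variable."""
    n = len(codes)
    code_mean = sum(codes) / n
    score_mean = sum(scores) / n
    sxx = sum((c - code_mean) ** 2 for c in codes)
    sxy = sum((c - code_mean) * (s - score_mean) for c, s in zip(codes, scores))
    return sxy / sxx


# Made-up illustrative data: 1 = YES (uses a computer), 2 = NO.
codes = [1, 1, 1, 2, 2, 2]
scores = [50, 54, 52, 44, 46, 45]

slope = slope_for_dichotomous_iv(codes, scores)

# Group means computed directly, for comparison with the slope.
yes_scores = [s for c, s in zip(codes, scores) if c == 1]
no_scores = [s for c, s in zip(codes, scores) if c == 2]
mean_difference = sum(no_scores) / len(no_scores) - sum(yes_scores) / len(yes_scores)
```

In this example `slope` and `mean_difference` agree, and the negative sign says the NO group scores lower than the YES group, which is why the interpretation depends on how the variable is coded.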

  41. Significance test of the slope If there is no relationship between the variables, the slope would be zero. The hypothesis test of the slope tests the null hypothesis that the b coefficient, or slope, is zero. In simple linear regression, the significance of this test matches that of the overall test of relationship between dependent and independent variables. In multiple regression, the test of overall relationship will differ from the test of each individual independent variable.

  42. Conclusion... For the population represented by this sample, there is a weak relationship between "R USE COMPUTER" and "RS OCCUPATIONAL PRESTIGE SCORE (1980)." Specifically, given the coding YES = 1 and NO = 2, we would expect survey respondents who did not use a computer to average 7.406 points lower on occupational prestige score than survey respondents who used a computer. No problems with assumptions, so no need to express caution in this case.

  43. Simple linear regression chart - 1 The following is a guide to the decision process for answering simple linear regression questions. • Is the level of measurement okay? (Independent: interval or dichotomous; Dependent: interval) • No: incorrect application of a statistic • Yes: go on to the next check • Is the assumption of normality satisfied? (Skewness, kurtosis of dependent variable: –1.0 to +1.0) • No: add caution if the question turns out to be true • Yes: go on to the next check

  44. Simple linear regression chart - 2 • Is the assumption of linearity satisfied? (Examine scatterplot) • No: add caution if the question turns out to be true • Yes: go on to the next check • Is the assumption of homoscedasticity satisfied? (Levene test for dichotomous independent variable; examine scatterplot for interval independent variable) • No: add caution if the question turns out to be true • Yes: go on to the next check

  45. Simple linear regression chart - 3 • Is the probability of the F for the regression relationship less than or equal to the level of significance? • No: fail to reject null hypothesis • Yes: go on to the next check • Does the size and direction of the intercept and the slope agree with the problem statement? • No: fail to reject null hypothesis • Yes: reject null hypothesis
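
The three charts can be read as one sequence of checks. The sketch below is one possible reading of that sequence, not a procedure defined in the slides: the function name, argument names, and return format are all hypothetical.

```python
def regression_decision(measurement_ok, normality_ok, linearity_ok,
                        homoscedasticity_ok, p_value, agrees_with_statement,
                        alpha=0.05):
    """Walk through the chart's checks and return (decision, cautions)."""
    # Chart 1: the wrong level of measurement stops the analysis outright.
    if not measurement_ok:
        return "incorrect application of a statistic", []

    # Charts 1 and 2: a violated assumption adds a caution but does not stop.
    cautions = []
    if not normality_ok:
        cautions.append("normality")
    if not linearity_ok:
        cautions.append("linearity")
    if not homoscedasticity_ok:
        cautions.append("homoscedasticity")

    # Chart 3: reject only if the F test is significant and the intercept and
    # slope agree with the problem statement.
    if p_value <= alpha and agrees_with_statement:
        return "reject null hypothesis", cautions
    return "fail to reject null hypothesis", cautions


# First worked example: normality failed and the F probability was 0.041.
# Linearity and homoscedasticity are assumed satisfied for illustration.
decision, cautions = regression_decision(True, False, True, True, 0.041, True)
```

For these inputs the null hypothesis is rejected with a caution about normality, in line with the cautious conclusion drawn for the first example.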
