General Linear Models • The theory of general linear models posits that many statistical tests, including t-tests and ANOVAs, can be solved as a regression analysis. • General linear models become even more useful when our analysis includes both numeric (interval level) and categorical (nominal level) variables, since both can be entered directly into the analysis, and SPSS will do any needed dummy coding. • In this example, we will demonstrate the equivalence of regression and ANOVA. We will use the SPSS General Linear Models procedure for a variety of tests in the future.
Homework problems: One-way Analysis of Variance – Specific Relationship Tested This problem uses the data set GSS2000R.Sav to compare the average score on the variable "highest year of school completed" [educ] for groups of survey respondents defined by the variable "subjective class identification" [class]. Using a one-way analysis of variance and a post hoc test with an alpha of .05, is the following statement true, true with caution, false, or an incorrect application of a statistic? Survey respondents who said they belonged in the working class completed fewer years of school (M = 12.58, SD = 2.50) than survey respondents who said they belonged in the middle class (M = 13.83, SD = 3.14). • True • True with caution • False • Incorrect application of a statistic In the PowerPoint for One-Way ANOVA, we solved this problem, using SPSS’ One-Way ANOVA command. Applying the theory of general linear models, we will solve this problem with linear regression.
Converting the One-Way ANOVA problem to a Regression problem To solve this problem with regression, we need to dummy code the independent variable, "subjective class identification" [class]. Since the problem includes a specific comparison (working class versus middle class), we need to select a reference group that makes this comparison possible. Specifically, we will use the working class category as the reference group, so that we can compare the difference between the middle class and the working class. We could just as easily have chosen the middle class as the reference category.
Coding scheme for new variables The coding scheme for the new variables is as follows. The class variable contains four categories (1 = lower class, 2 = working class, 3 = middle class, 4 = upper class). We will create three new dichotomous variables: lowerClass, middleClass, and upperClass. Each new variable will have a 1 for the matching category of the original variable and a 0 for all of the other categories. The working class category, our reference group, gets no variable of its own and is coded 0 on all three.
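The coding scheme can also be written out as a small lookup, which makes it easy to see that the reference group is the row of all zeros. This is only an illustrative Python sketch, not part of the SPSS procedure; the names CLASS_LABELS, DUMMY_FOR_CODE, and coding_scheme are made up for this example.

```python
# Category codes of the original class variable, as described in the slides.
CLASS_LABELS = {1: "lower class", 2: "working class", 3: "middle class", 4: "upper class"}

# Each dummy variable flags exactly one non-reference category.
# Working class (code 2) gets no dummy variable: it is the reference group.
DUMMY_FOR_CODE = {1: "lowerClass", 3: "middleClass", 4: "upperClass"}


def coding_scheme():
    """Return one row per class category with its 0/1 dummy values."""
    rows = []
    for code, label in CLASS_LABELS.items():
        row = {"class": code, "label": label}
        for dummy_code, dummy_name in DUMMY_FOR_CODE.items():
            row[dummy_name] = 1 if code == dummy_code else 0
        rows.append(row)
    return rows


scheme = coding_scheme()
for row in scheme:
    print(row)
```

Each category other than working class has exactly one 1 in its row, so three dummy variables are enough to identify all four groups.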
Using Recoding in SPSS to Create New Variables Select the Recode > Into Different Variables command from the Transform menu.
Creating the lowerClass variable First, select the variable to be dummy-coded, class, from the list of variables and move it to the Numeric Variable -> Output Variable list box. Second, type in the name for the new variable. Third, click on the Change button to replace the ? with this new variable name.
Assigning values to new variable Next, click on the Old and New Values button to assign values to the new variable.
Preserving missing values First, mark the System- or user-missing option button on the Old Value panel. Second, mark the System-missing option button on the New Value panel. Third, click on the Add button to include this recoding for the variable. If we forget to explicitly assign missing values, cases with missing data will be recoded with a 0 and become part of the reference group.
Coding the lowerClass category First, to recode the 1 = lower class category to the dummy variable, mark the Value option button and type a 1 in the text box on the Old Value panel. Second, mark the Value option button and type a 1 in the text box on the New Value panel. This coding says: if they were originally in the lower class category, they are assigned a value of 1 for the lowerClass dummy variable. Third, click on the Add button to include this recoding for the variable.
Coding the other categories First, to identify subjects in the categories other than lower class, mark the All other values option button on the Old Value panel. Second, mark the Value option button and type a 0 in the text box on the New Value panel. This coding says: if they were originally NOT in the lower class category, they are assigned a value of 0 for the lowerClass dummy variable. Third, click on the Add button to include this recoding for the variable.
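The three recoding rules entered so far (missing stays missing, 1 becomes 1, everything else becomes 0) can be sketched as a single function. This is a hypothetical Python illustration of the logic, not SPSS syntax; the function name recode_dummy and the sample values are invented, and None is assumed to stand in for a missing value.

```python
def recode_dummy(value, target_code):
    """Mimic the three Old -> New rules entered in the Recode dialog.

    None stands in for a system- or user-missing value (an assumption of
    this sketch; SPSS stores missing values differently).
    """
    if value is None:          # rule 1: missing stays missing
        return None
    if value == target_code:   # rule 2: the target category becomes 1
        return 1
    return 0                   # rule 3: all other values become 0


# Made-up values of class, including one missing response.
class_values = [1, 2, 3, None, 4, 1]
lower_class = [recode_dummy(v, target_code=1) for v in class_values]
print(lower_class)  # [1, 0, 0, None, 0, 1]
```

If the first rule were dropped, the missing response would fall through to the last rule and be counted as a 0, which is exactly the mistake the slides warn about.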
Completing the recoding When we have completed the coding for the new variable, click on the Continue button.
Completing the lowerClass variable Click on the OK button to create the new variable in the data editor.
Dummy variable coding for middleClass variable Following the same steps, we create the dummy variable for subjects who were 3 = middle class on the original class variable. The coding is similar to that for the lowerClass variable, except that the category originally coded 3 = middle class is translated into a 1 on the new variable.
Dummy variable coding for upperClass variable Following the same steps, we create the dummy variable for subjects who were 4 = upper class on the original class variable. The coding is similar to that for the lowerClass variable, except that the category originally coded 4 = upper class is translated into a 1 on the new variable.
Dummy-coded variables for class - 1 Subjects with a code value of 3 on the original class variable now have a 1 for middleClass and a 0 for the other new variables. Subjects with a code value of 2 on the original class variable now have a 0 for all the new variables.
Dummy-coded variables for class - 2 Subjects with a code value of 1 on the original class variable now have a 1 for lowerClass and a 0 for the other new variables. Subjects with a code value of 4 on the original class variable now have a 1 for upperClass and a 0 for the other new variables. Since it is very easy to make a mistake in recoding, it is imperative that we check the results of our recoding.
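One way to check the recoding systematically, rather than by scanning the data editor, is to count every combination of the original code and the three dummy values. The sketch below does this in Python on a handful of invented cases; the variable names and the data are assumptions for illustration only.

```python
from collections import Counter

# Made-up cases: (class, lowerClass, middleClass, upperClass).
cases = [
    (1, 1, 0, 0),
    (2, 0, 0, 0),
    (3, 0, 1, 0),
    (4, 0, 0, 1),
    (3, 0, 1, 0),
]

# Count each distinct combination of original code and dummy values.
crosstab = Counter(cases)

# The only combinations a correct recode can produce.
expected = {(1, 1, 0, 0), (2, 0, 0, 0), (3, 0, 1, 0), (4, 0, 0, 1)}
unexpected = set(crosstab) - expected

for combo, count in sorted(crosstab.items()):
    print(combo, count)
print("recode looks correct:", not unexpected)
```

Any combination outside the four expected patterns would signal a recoding mistake.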
Regression of education on class variables - 1 Select the Regression > Linear command from the Analyze menu.
Regression of education on class variables - 2 First, we move the dependent variable to the Dependent Variable text box. Second, we move the three dummy coded variables to the list of Independents. Third, click on the OK button to produce the output.
Results of regression of education on class variables – overall relationship The overall relationship is statistically significant (F(3, 264) = 4.97, p < .01).
Comparison to One-way ANOVA of education by class – overall relationship The overall relationship is statistically significant (F(3, 264) = 4.97, p < .01). Moreover, all of the statistical values in the ANOVA table are identical to the results from regression.
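To see why the two F-tests must agree, the following Python sketch computes the one-way ANOVA F from group means and the regression F from an ordinary least squares fit on three dummy variables. The education values are invented for illustration and are not the GSS2000R data; the helper solve and all variable names are assumptions of this sketch.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Made-up years of education keyed by class code (not the GSS2000R data).
educ_by_class = {1: [10, 11, 12], 2: [12, 12, 13, 14], 3: [13, 14, 15, 16], 4: [16, 17, 18]}
REFERENCE = 2  # working class is the reference group

y = [v for vals in educ_by_class.values() for v in vals]
n, k = len(y), len(educ_by_class)
grand_mean = sum(y) / n

# --- One-way ANOVA from group means ---
means = {g: sum(v) / len(v) for g, v in educ_by_class.items()}
ss_between = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in educ_by_class.items())
ss_within = sum((x - means[g]) ** 2 for g, v in educ_by_class.items() for x in v)
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))

# --- Regression on an intercept plus one dummy per non-reference group ---
dummy_codes = [g for g in educ_by_class if g != REFERENCE]
X = [[1.0] + [1.0 if g == d else 0.0 for d in dummy_codes]
     for g, vals in educ_by_class.items() for _ in vals]
p = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
coef = solve(xtx, xty)  # least squares via the normal equations
fitted = [sum(c * x for c, x in zip(coef, row)) for row in X]
ss_regression = sum((f - grand_mean) ** 2 for f in fitted)
ss_residual = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
f_regression = (ss_regression / (p - 1)) / (ss_residual / (n - p))

print("ANOVA F:     ", f_anova)
print("Regression F:", f_regression)
```

The fitted value for every case is simply its group mean, so the regression and residual sums of squares coincide with the between-groups and within-groups sums of squares.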
Results of regression of education on class variables – individual relationships The tests of individual relationships compare each group to the reference group. The difference between the middle class group and the working class group is statistically significant.
Results of regression of education on class variables – individual relationships B coefficients are interpreted as the increase or decrease in the estimate of the dependent variable associated with the change from the reference group to the dummy-coded group. Subjects in the middle class had, on average, 1.249 more years of education than the working class.
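As a quick arithmetic check on this interpretation, the sketch below adds the reported B coefficient to the reported working class mean. It assumes the regression constant equals the working class mean of 12.58 quoted in the problem statement, which holds for dummy coding only when the means and the regression are based on the same cases.

```python
# Values reported in the slides.
working_mean = 12.58   # mean years of school, working class (reference group)
b_middle = 1.249       # B coefficient for the middleClass dummy variable

# With dummy coding, the predicted value for a group is the reference-group
# estimate plus that group's B coefficient.
predicted_middle_mean = working_mean + b_middle
print(round(predicted_middle_mean, 2))  # 13.83
```

The result agrees with the middle class mean of 13.83 given in the problem statement.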
Comparison to One-way ANOVA of education by class – individual relationship In the post hoc test, the difference between the middle class and the working class was also 1.249 years of education, and was a statistically significant relationship.
Comparison to One-way ANOVA of education by class – individual relationship However, the calculations for the post hoc test are completely different from the test of the b coefficient in the regression, which is reasonable since they are very different tests. The test of the b coefficient tests the null hypothesis that b is equal to 0; rejecting it supports the conclusion that b is not equal to 0. Post hoc tests are not hypothesis tests. The only hypothesis tested in the One-Way ANOVA was that at least one of the group means was different from the others. The post hoc test provided additional information about the differences, but it is not a hypothesis test because no hypothesis was specified in advance of the statistical calculations. The significance of the test of the b coefficient was .001, while the significance of the post hoc test was .005. In this example both lead to the same interpretation, but that is not always the case.
Using linear contrasts to test specific group hypotheses - 1 • It is possible to include a hypothesis test of differences between specific groups within the one-way ANOVA, using linear contrasts. • Using the notation from the text, we would specify the linear contrast as the difference between the working class and the middle class. Since the problem indicated that middle class respondents had more education than working class respondents, we would write the contrast as: $l = \mu_{\text{middle class}} - \mu_{\text{working class}}$, where $l$ is a linear contrast and the $\mu$'s are group means.
Using linear contrasts to test specific group hypotheses - 2 • If we explicitly include coefficients for the population means in the contrast, the equation $l = \mu_{\text{middle class}} - \mu_{\text{working class}}$ becomes $l = (+1)\,\mu_{\text{middle class}} + (-1)\,\mu_{\text{working class}}$, and if we add in the means for the other groups, $l = (+1)\,\mu_{\text{middle class}} + (-1)\,\mu_{\text{working class}} + (0)\,\mu_{\text{lower class}} + (0)\,\mu_{\text{upper class}}$, which is the contrast we will enter into SPSS.
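The contrast can be evaluated by hand once the group means and the within-groups mean square are known. The Python sketch below uses the standard pooled-variance formula, t = l / sqrt(MSE × Σ c²/n), on invented education values (not the GSS2000R data); all variable names are assumptions of this sketch.

```python
import math

# Made-up years of education keyed by class code (not the GSS2000R data).
educ_by_class = {1: [10, 11, 12], 2: [12, 12, 13, 14], 3: [13, 14, 15, 16], 4: [16, 17, 18]}

# Contrast coefficients keyed by class code: lower, working, middle, upper.
coefficients = {1: 0, 2: -1, 3: 1, 4: 0}

means = {g: sum(v) / len(v) for g, v in educ_by_class.items()}
n_total = sum(len(v) for v in educ_by_class.values())
df_error = n_total - len(educ_by_class)

# Within-groups mean square (MSE), pooled over all four groups.
ss_within = sum((x - means[g]) ** 2 for g, v in educ_by_class.items() for x in v)
mse = ss_within / df_error

# Value of the contrast: a weighted sum of the group means.
contrast = sum(c * means[g] for g, c in coefficients.items())

# Standard error of the contrast, assuming equal variances across groups.
std_error = math.sqrt(mse * sum(c ** 2 / len(educ_by_class[g]) for g, c in coefficients.items()))
t_stat = contrast / std_error

print("contrast:", contrast)
print("t:", t_stat, "on", df_error, "degrees of freedom")
```

Groups with a coefficient of 0 drop out of the contrast value, but they still contribute to the pooled MSE and to the error degrees of freedom.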
Testing a hypothesis comparing groups within One-Way ANOVA - 1 Select the Compare Means > One-Way ANOVA command from the Analyze menu.
Testing a hypothesis comparing groups within One-Way ANOVA - 2 First, move the dependent variable educ and the independent variable class into the list boxes. Second, click on the Contrasts button to add the linear contrast.
Testing a hypothesis comparing groups within One-Way ANOVA - 3 • The contrast coefficients were: • 0 for lower class • -1 for working class • +1 for middle class • 0 for upper class The contrasts must be entered in the same order that the variable is coded, i.e. from low to high codes for categories. First, type the contrast coefficient for the lower class group, 0, into the Coefficients text box. Second, click on the Add button to add the coefficient to the list box.
Testing a hypothesis comparing groups within One-Way ANOVA - 4 Add the contrast coefficients for the working class (-1), the middle class (+1), and the upper class (0) to the list box. Click on the Continue button to close the dialog box.
Testing a hypothesis comparing groups within One-Way ANOVA - 5 Click on the OK button to request the output.
Testing a hypothesis comparing groups within One-Way ANOVA - 6 The value and significance of the F-test are identical to the results obtained in the regression and in the one-way ANOVA with the post hoc tests. Moreover, the results for the contrast test match the test of the b coefficient in the regression analysis (t(264) = 3.372, p < .01).
SPSS’ general linear models procedure • SPSS has a command for directly computing general linear models that is much more versatile than the regression command that we just used. The procedure contains options and diagnostic statistics that are not available in its linear regression command. • The default for group comparisons with this command is to compute contrasts against the group with the highest numeric code. Since we want the comparison to be with the working class group, we will first change the numeric code for that group from 2 to 5 so that it is the highest numeric value.
Recoding the class variable - 1 To change the numeric coding for the working class category so it is the highest numeric value, we again select the Recode > Into Different Variables command from the Transform menu.
Recoding the class variable - 2 First, select the variable to be recoded, class, from the list of variables and move it to the Numeric Variable -> Output Variable list box. Second, type in the name for the new variable. Third, click on the Change button to replace the ? with this new variable name.
Recoding the class variable - 3 Next, click on the Old and New Values button to assign values to the new variable.
Recoding the class variable - 4 First, mark the System- or user-missing option button on the Old Value panel. Second, mark the System-missing option button on the New Value panel. Third, click on the Add button to include this recoding for the variable.
Recoding the class variable - 5 First, to recode the 2 = working class category to the new variable, mark the Value option button and type a 2 in the text box on the Old Value panel. Second, mark the Value option button and type a 5 in the text box on the New Value panel. This coding says: if they were originally in the working class category, they are assigned a value of 5 for the new variable. Third, click on the Add button to include this recoding for the variable.
Recoding the class variable - 5 First, since we want all of the other codes to remain the same, we click on the All other values option button. Second, mark the Copy old values option button to retain the codes for the remaining groups. Third, click on the Add button to include this recoding for the variable.
Recoding the class variable - 6 When we have completed the coding for the new variable, click on the Continue button.
Recoding the class variable - 7 Click on the OK button to create the new variable in the data editor.
Recoding the class variable - 8 We check the values in the data editor to make sure the recode worked as anticipated. In this example, we see that the 2’s for class are correctly recoded as 5’s.
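The logic of this second recode (missing stays missing, 2 becomes 5, every other code is copied) can be sketched in a few lines of Python. This is an illustration on invented values rather than SPSS syntax; the name recode_class and the use of None for missing values are assumptions.

```python
def recode_class(value):
    """Move working class (2) to code 5; copy every other code unchanged."""
    if value is None:   # missing stays missing (None is a stand-in here)
        return None
    if value == 2:      # working class becomes the highest code
        return 5
    return value        # "Copy old values" for all remaining codes


# Made-up values of class, including one missing response.
class_values = [1, 2, 3, None, 4, 2]
class_recoded = [recode_class(v) for v in class_values]
print(class_recoded)  # [1, 5, 3, None, 4, 5]
```

After the recode, working class carries the highest numeric code, so a procedure that compares each group to the last category will compare against working class.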
Using SPSS’ general linear models - 1 To solve the problem using SPSS’ General Linear Model command, select General Linear Model > Univariate from the Analyze menu. The univariate command indicates that we have a single dependent variable.
Using SPSS’ general linear models - 2 First, we move the dependent variable to the Dependent Variable text box. Second, we move the newly created independent variable to the Fixed Factors list box. Fixed factors are those for which all possible codes are represented in the data set. Random Factors are categorical variables which can take on values different from those in our data set. Covariates are interval level variables or variables we wish to treat as interval level. Third, click on the Options button to specify additional output. While the univariate GLM command has numerous specifications, we only need one request for this problem.
Using SPSS’ general linear models - 3 First, mark the check box for Parameter estimates. This will compute and test the coefficients. Second, click on the Continue button to close the dialog box.
Using SPSS’ general linear models - 4 Click on the OK button to produce the output.
SPSS’ general linear models output The value and significance of the F-test are identical to the results obtained in the regression and the one-way ANOVA with the post hoc tests. Subjects in the middle class (code 3) had, on average, 1.249 more years of education than the working class. The difference is statistically significant and identical to the findings from the other comparisons (t(264) = 3.372, p < .01).