Extension

palmer
Presentation Transcript


  1. Extension: The General Linear Model with Categorical Predictors

  2. Extension • Regression can handle different types of predictors, and in the social sciences we are often interested in differences between groups • For now we will concern ourselves with the case of two independent groups • E.g. gender, Republican vs. Democrat, etc.

  3. Dummy coding • There are different ways to code categorical data for regression; in general, representing a categorical variable requires k-1 coded variables • k = number of categories/groups • Dummy coding uses zeros and ones to identify group membership; since we have only two groups, one group is coded 0 (the reference group) and the other is coded 1
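
As a rough illustration of the k-1 rule, here is a minimal Python sketch. The helper `dummy_code` and the example labels are hypothetical, not part of the original slides.

```python
def dummy_code(labels, reference):
    """Return k-1 columns of 0/1 indicators, one per non-reference category."""
    categories = sorted(set(labels))
    if reference not in categories:
        raise ValueError("reference must be one of the observed categories")
    coded_levels = [c for c in categories if c != reference]
    # Each row gets a 1 in the column for its own category and 0 elsewhere;
    # members of the reference group are coded 0 in every column.
    rows = [[1 if label == level else 0 for level in coded_levels]
            for label in labels]
    return coded_levels, rows


# Hypothetical two-group example (k = 2, so one coded variable).
party = ["democrat", "republican", "democrat", "republican"]
levels, coded = dummy_code(party, reference="democrat")
print(levels)  # ['republican']
print(coded)   # [[0], [1], [0], [1]]

# Hypothetical three-group example (k = 3, so two coded variables).
levels3, coded3 = dummy_code(["a", "b", "c", "a"], reference="a")
print(levels3)  # ['b', 'c']
print(coded3)   # [[0, 0], [1, 0], [0, 1], [0, 0]]
```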

  4. Dummy coding • Example data:

       Group   Outcome
       0       3
       0       5
       0       7
       0       2
       0       3
       1       6
       1       7
       1       7
       1       8
       1       9

     • The thing to note at this point is that we have a simple bivariate correlation/simple regression setting • The correlation between group and the DV is .76 • This is sometimes referred to as the point-biserial correlation (rpb) because one of the variables is dichotomous • However, don’t be fooled: it is calculated exactly the same way as the Pearson correlation, i.e. you treat the 0/1 grouping variable like any other variable when calculating the coefficient • The sign is arbitrary, since either group could have been coded one or zero, so the coding needs to be noted when interpreting it
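
The claim that the point-biserial correlation is just Pearson's r on the 0/1 variable can be checked with a short Python sketch on the slide's data. The function name `pearson_r` is an illustrative choice.

```python
from math import sqrt

# Data from the slide: dummy-coded group and the outcome (DV).
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
outcome = [3, 5, 7, 2, 3, 6, 7, 7, 8, 9]


def pearson_r(x, y):
    """Ordinary Pearson correlation; nothing special is done for 0/1 data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson_r(group, outcome)
print(round(r, 2))  # 0.76

# Swapping which group is coded 1 flips the sign but not the magnitude.
flipped = [1 - g for g in group]
r_flipped = pearson_r(flipped, outcome)
print(round(r_flipped, 2))  # -0.76
```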

  5. Example • Graphical display • The R-square is .76² = .577 • The regression equation is Ypred = 4 + 3.4(X)
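
A minimal Python sketch of the least squares fit on the example data, using the textbook formulas for a single predictor. It is a plain hand computation rather than the output of any particular statistics package.

```python
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
outcome = [3, 5, 7, 2, 3, 6, 7, 7, 8, 9]

n = len(group)
mean_x = sum(group) / n
mean_y = sum(outcome) / n

# Least squares slope and intercept for one predictor.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(group, outcome))
sxx = sum((x - mean_x) ** 2 for x in group)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# R-square as 1 - SS_residual / SS_total.
ss_total = sum((y - mean_y) ** 2 for y in outcome)
ss_resid = sum((y - (intercept + slope * x)) ** 2
               for x, y in zip(group, outcome))
r_squared = 1 - ss_resid / ss_total

print(round(intercept, 2))  # 4.0
print(round(slope, 2))      # 3.4
print(round(r_squared, 3))  # 0.577
```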

  6. Example Look closely at the descriptive output compared to the coefficients. What do you see?

  7. The constant • Note again our regression equation • Recall the definitions of the slope and the constant • First the constant: what does “when X = 0” mean in this setting? • It means we are in the 0 group • What is the predicted value? • Ypred = 4 + 3.4(0) = 4 • That is the 0 group’s mean • The constant here is thus the reference group’s mean

  8. The coefficient • Now think about the slope • What does a ‘1 unit change in X’ mean in this setting? • It means we go from one group to the other • Based on that coefficient, what does the slope represent in this case (i.e. can you derive that coefficient from the descriptive stats in some way)? • The coefficient is the difference between the group means: 7.4 - 4 = 3.4
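
To compare the descriptive statistics with the coefficients, the following Python sketch computes the two group means from the example data and checks them against the equation Ypred = 4 + 3.4(X). The variable names are illustrative.

```python
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
outcome = [3, 5, 7, 2, 3, 6, 7, 7, 8, 9]

# Descriptive statistics: the mean of the outcome within each group.
ys0 = [y for g, y in zip(group, outcome) if g == 0]
ys1 = [y for g, y in zip(group, outcome) if g == 1]
mean0 = sum(ys0) / len(ys0)
mean1 = sum(ys1) / len(ys1)
difference = mean1 - mean0

print(mean0)                 # 4.0
print(mean1)                 # 7.4
print(round(difference, 2))  # 3.4

# Coefficients from the slides: constant 4, slope 3.4.
constant, coefficient = 4, 3.4
pred0 = constant + coefficient * 0  # predicted value for the reference group
pred1 = constant + coefficient * 1  # predicted value for the other group
print(pred0, pred1)          # 4.0 7.4
```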

  9. The regression line • The regression line covers the values represented • i.e. 0 and 1, for the two groups • It passes through each of their means • With least squares regression, the regression line always passes through the point (mean of X, mean of Y), though the mean of X here does not correspond to either group • The constant (if we are using dummy coding) is the mean for the zero (reference) group • The coefficient is the difference between means
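
The point about the line passing through the mean of X and the mean of Y can be checked directly. This is a small Python sketch using the example data and the slide's equation.

```python
from math import isclose

group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
outcome = [3, 5, 7, 2, 3, 6, 7, 7, 8, 9]

mean_x = sum(group) / len(group)      # 0.5: the proportion of cases coded 1
mean_y = sum(outcome) / len(outcome)  # 5.7: the grand mean of the outcome

# Prediction at the mean of X from Ypred = 4 + 3.4(X).
pred_at_mean_x = 4 + 3.4 * mean_x

print(mean_x, mean_y)
print(isclose(pred_at_mean_x, mean_y))  # True
```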

  10. Comparison to the t-test • Furthermore, the previous analysis gives the same results we would have gotten via a t-test, to which we are about to turn • However, you can now see that the t-test is not a procedure distinct from regression: it is a linear model of some outcome predicted by a grouping variable

       Two Sample t-test
       data: Outcome by Group
       t = 3.3024, df = 8, p-value = 0.01082
       95 percent confidence interval: 1.025823 5.774177
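
A minimal Python sketch of the pooled-variance two-sample t-test on the example data, to show where the t statistic and interval come from. The critical value `t_crit` is hard-coded from a t table (0.975 quantile, 8 df) rather than computed, and the p-value is not computed because the standard library has no t distribution.

```python
from math import sqrt

g0 = [3, 5, 7, 2, 3]  # outcomes in the group coded 0
g1 = [6, 7, 7, 8, 9]  # outcomes in the group coded 1

n0, n1 = len(g0), len(g1)
m0 = sum(g0) / n0
m1 = sum(g1) / n1

# Pooled variance from the within-group sums of squares.
ss0 = sum((y - m0) ** 2 for y in g0)
ss1 = sum((y - m1) ** 2 for y in g1)
df = n0 + n1 - 2
pooled_var = (ss0 + ss1) / df

# Standard error of the difference between means, then t.
se_diff = sqrt(pooled_var * (1 / n0 + 1 / n1))
diff = m1 - m0  # same value as the regression coefficient
t_stat = diff / se_diff

# Assumption: 0.975 quantile of the t distribution with 8 df, from a table.
t_crit = 2.306004
ci_low = diff - t_crit * se_diff
ci_high = diff + t_crit * se_diff

print(round(t_stat, 4), df)                # 3.3024 8
print(round(ci_low, 4), round(ci_high, 4))  # 1.0258 5.7742
```

In this sketch the difference is taken as group 1 minus group 0, so the sign of t and of the interval would flip if the groups were subtracted in the other order.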

  11. Summary • Understanding the basics of the general linear model goes a long way toward understanding any analysis • It not only holds here but is also used in more complex univariate and multivariate analyses; even in some nonlinear situations (e.g. logistic regression), we use ‘generalized’ linear models • Y = Xb + e • For properly specified models, linear models provide reasonable fits and an intuitive understanding relative to more complex approaches
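
To connect Y = Xb + e to the example, here is a Python sketch that builds the design matrix X (a column of ones plus the dummy-coded group) and solves the normal equations X'X b = X'y by hand for the two-column case. This is one illustrative way to obtain b, not necessarily how statistical software computes it.

```python
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
outcome = [3, 5, 7, 2, 3, 6, 7, 7, 8, 9]

# Design matrix: intercept column and the dummy-coded predictor.
X = [[1, g] for g in group]

# X'X (2 x 2) and X'y (length 2).
xtx = [[sum(row[i] * row[j] for row in X) for j in range(2)]
       for i in range(2)]
xty = [sum(row[i] * y for row, y in zip(X, outcome)) for i in range(2)]

# Solve the 2 x 2 system with the explicit inverse of X'X.
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
b0 = (xtx[1][1] * xty[0] - xtx[0][1] * xty[1]) / det
b1 = (-xtx[1][0] * xty[0] + xtx[0][0] * xty[1]) / det

# e = Y - Xb: the residuals, which sum to zero when an intercept is included.
residuals = [y - (b0 + b1 * g) for g, y in zip(group, outcome)]

print(b0, b1)  # 4.0 3.4
print(abs(sum(residuals)) < 1e-9)  # True
```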
