Dummy Variables • The term "dummy variables" refers to the technique of using a dichotomous variable (coded 0 or 1) to represent the separate categories of a nominal-level measure. • The term "dummy" appears to reflect the fact that the trait indicated by a code of 1 stands in for a factor, or a collection of factors, that cannot be measured by any better means within the context of the analysis.
Coding of Dummy Variables • Take, for instance, the race of the respondent in a study of voter preferences • Race coded White (0) or Black (1) • There is a whole set of factors that may differ, or are even likely to differ, between voters of different races • Income, socialization, experience of racial discrimination, attitudes toward a variety of social issues, feelings of political efficacy, etc. • Since we cannot measure all of those differences within the confines of the study, we use a dummy variable to capture their combined effect.
Multiple Categories • Now picture race coded White (0), Black (1), Hispanic (2), Asian (3), and Native American (4) • If we put this race variable into a regression equation, the results will be nonsense. Regression treats the codes as numbers, which assumes the categories are ordered and roughly equally spaced (at least interval-like data), whereas these codes are arbitrary labels. • Regression on a nominal variable with three or more categories therefore yields uninterpretable, meaningless results.
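The following Python sketch uses made-up scores to illustrate the problem. Regressing an outcome on a single numeric race code forces one straight-line slope across the categories. Because the numbering is arbitrary, two equally valid numberings of the same five groups give different slopes. The data, the group labels, and the helper names `simple_slope` and `slope_for_coding` are hypothetical.

```python
# Hypothetical outcome scores for five groups (made up for illustration only).
scores = {
    "White": [40, 44, 42],
    "Black": [70, 74, 72],
    "Hispanic": [60, 64, 62],
    "Asian": [50, 54, 52],
    "AmInd": [66, 70, 68],
}

def simple_slope(x, y):
    """OLS slope of y on one numeric predictor x: sum of cross-products / sum of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def slope_for_coding(coding):
    """Treat an arbitrary numeric code for each group as if it were a real number."""
    x, y = [], []
    for group, values in scores.items():
        x.extend([coding[group]] * len(values))
        y.extend(values)
    return simple_slope(x, y)

# Two equally arbitrary ways of numbering the same five categories.
coding_a = {"White": 0, "Black": 1, "Hispanic": 2, "Asian": 3, "AmInd": 4}
coding_b = {"Asian": 0, "White": 1, "AmInd": 2, "Hispanic": 3, "Black": 4}

slope_a = slope_for_coding(coding_a)
slope_b = slope_for_coding(coding_b)
print(f"slope with coding A: {slope_a:.2f}")  # 3.20
print(f"slope with coding B: {slope_b:.2f}")  # 6.00
```

The data are identical in both fits; only the arbitrary labels changed, yet the "effect of race" changes. This is the sense in which the results are meaningless.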
Creating Dummy Variables • The simple case of race is already coded correctly • Black: coded 0 for Whites and 1 for Blacks • Note that the coding can be reversed. This flips the sign of the dummy's coefficient and makes the intercept refer to the other group, but it leaves the fitted values unchanged. • The complex nominal version turns into 5 variables: • White: coded 1 for Whites and 0 for non-Whites • Black: coded 1 for Blacks and 0 for non-Blacks • Hispanic: coded 1 for Hispanics and 0 for non-Hispanics • Asian: coded 1 for Asians and 0 for non-Asians • AmInd: coded 1 for Native Americans and 0 for non-Native Americans
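The coding above can be sketched in a few lines of Python. The function name `make_dummies` and the sample `race` list are hypothetical. Each category gets its own 0/1 list, and because every case falls in exactly one category, the five indicators sum to 1 for every row.

```python
def make_dummies(values, categories):
    """Return one 0/1 indicator list per category (1 if the case is in that category)."""
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

categories = ["White", "Black", "Hispanic", "Asian", "AmInd"]
race = ["White", "Black", "Asian", "White", "Hispanic", "AmInd", "Black"]  # hypothetical cases

dummies = make_dummies(race, categories)
for cat in categories:
    print(f"{cat:9s}", dummies[cat])

# Each respondent belongs to exactly one category, so each row sums to 1.
row_sums = [sum(dummies[cat][i] for cat in categories) for i in range(len(race))]
print("row sums:", row_sums)
```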
Regression with Dummy Variables • The dummy variable is then added to the regression model • Interpretation of the dummy variable is usually quite straightforward • The intercept term represents the intercept for the omitted category (Whites) • The coefficient for the dummy variable represents the change in the intercept for the category coded 1 (Blacks), relative to the omitted category
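As a worked illustration, suppose the model also contains one continuous predictor X. X is a hypothetical variable, for example income, and is not specified on the slide. Substituting Black = 0 and Black = 1 shows that the dummy shifts only the intercept:

```latex
\begin{aligned}
Y &= a + B_1\,\text{Black} + B_2 X + e \\
\text{Whites } (\text{Black} = 0):\quad \hat{Y} &= a + B_2 X \\
\text{Blacks } (\text{Black} = 1):\quad \hat{Y} &= (a + B_1) + B_2 X
\end{aligned}
```

Both groups share the slope B2, so the two fitted lines are parallel, and B1 is the vertical distance between them.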
Regression with Only a Dummy • When we regress a variable on the dummy variable alone, $Y = a + B_1\,\text{Black} + e$, the estimates reproduce the group means of the dependent variable. • a is the mean of Y for Whites, and a + B1 is the mean of Y for Blacks.
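A quick numerical check of this claim in Python, using hypothetical data: a closed-form simple regression on a 0/1 dummy returns an intercept equal to the Whites' mean and a slope equal to the difference between the two means. The helper `fit_simple` is illustrative, not a library function.

```python
def fit_simple(x, y):
    """OLS intercept and slope for y = a + b*x (closed form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

# Hypothetical data: Black = 0 for Whites, 1 for Blacks; y is any outcome.
black = [0, 0, 0, 0, 1, 1, 1]
y = [10, 12, 14, 16, 20, 22, 27]

a, b1 = fit_simple(black, y)
mean_white = sum(v for d, v in zip(black, y) if d == 0) / black.count(0)
mean_black = sum(v for d, v in zip(black, y) if d == 1) / black.count(1)

print(f"a      = {a:.3f}   mean for Whites = {mean_white:.3f}")       # both 13.000
print(f"a + B1 = {a + b1:.3f}   mean for Blacks = {mean_black:.3f}")  # both 23.000
```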
Omitting a Category • When we have a single dummy variable, the model already contains information for both categories • Note that White = 1 − Black • Thus including dummies for both White and Black is redundant. Together with the intercept they are perfectly collinear (White + Black = 1 for every case), so the coefficients cannot be estimated. • As a result, when the model has an intercept we always omit one category, and the model's intercept becomes the intercept for that category. • This omitted category is called the reference category • In the dichotomous case, the reference category is simply the category coded 0 • With a series of dummies, the reference category is the one whose dummy is omitted from the model.
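The redundancy can be made concrete with a small sketch. It uses hypothetical cases and pure-Python helpers written for illustration. With an intercept plus both dummies, the cross-product matrix X'X that OLS must invert has determinant 0, so there is no unique solution. Dropping White, which makes it the reference category, gives a nonsingular matrix.

```python
def cross_products(columns):
    """X'X: sums of products for every pair of design-matrix columns."""
    return [[sum(u * v for u, v in zip(c1, c2)) for c2 in columns] for c1 in columns]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

black = [0, 1, 1, 0, 0, 1]          # hypothetical cases
white = [1 - b for b in black]      # White = 1 - Black
intercept = [1] * len(black)

# Intercept plus BOTH dummies: the intercept column equals White + Black in every row.
xtx_both = cross_products([intercept, white, black])
# Intercept plus the Black dummy only (White is the reference category).
xtx_one = cross_products([intercept, black])
det_one = xtx_one[0][0] * xtx_one[1][1] - xtx_one[0][1] * xtx_one[1][0]

print("det(X'X), both dummies:", det3(xtx_both))  # 0, so X'X cannot be inverted
print("det(X'X), Black only:  ", det_one)         # 9
```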
Suggestions for Selecting the Reference Category • Make it a well-defined group: "other" or an obscure group (low n) is usually a poor choice. • If there is some underlying ordinality in the categories, select the highest or lowest category as the reference (e.g., blue-collar, white-collar, professional). • It should have an ample number of cases; the modal category is often a good choice.
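One way to apply the last two suggestions is to count cases per category, take the modal category as the reference, and flag sparse categories. The sketch below is in Python with hypothetical data. The threshold `MIN_N` is an arbitrary assumption; what counts as "ample" depends on the study.

```python
from collections import Counter

def modal_category(values):
    """Most frequent category; ties go to the category encountered first."""
    return Counter(values).most_common(1)[0][0]

race = ["White", "Black", "White", "Hispanic", "White", "Asian", "Black"]  # hypothetical
counts = Counter(race)
reference = modal_category(race)

MIN_N = 2  # hypothetical cutoff for an "ample" number of cases
small = [cat for cat, n in counts.items() if n < MIN_N]

print("counts:", dict(counts))
print("suggested reference category:", reference)
print("categories with few cases:", small)
```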
Multiple Dummy Variables • The model for the full dummy-variable scheme for race is: $Y = a + B_1\,\text{Black} + B_2\,\text{Hispanic} + B_3\,\text{Asian} + B_4\,\text{AmInd} + e$ • Note that the dummy for White has been omitted, and the intercept a is the intercept for Whites.
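The following sketch fits the full scheme with White omitted. It solves the OLS normal equations exactly with Python's `fractions.Fraction`. The `ols` helper and the data are hypothetical illustrations, not a library routine. With only dummies in the model, a equals the Whites' mean, and each a + B equals that group's mean.

```python
from fractions import Fraction

def ols(columns, y):
    """Exact OLS: solve (X'X) b = X'y by Gauss-Jordan elimination on Fractions."""
    k = len(columns)
    m = [[Fraction(sum(p * q for p, q in zip(columns[i], columns[j]))) for j in range(k)]
         + [Fraction(sum(p * q for p, q in zip(columns[i], y)))] for i in range(k)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if m[r][col] != 0)  # fails if X'X is singular
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [v / piv for v in m[col]]
        for r in range(k):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[k] for row in m]

race = ["White", "White", "Black", "Black", "Hispanic", "Hispanic",
        "Asian", "Asian", "AmInd", "AmInd"]
y = [50, 54, 30, 34, 40, 42, 61, 63, 45, 49]  # hypothetical outcome values

# White is the reference category, so it gets no dummy column.
dummy_names = ["Black", "Hispanic", "Asian", "AmInd"]
columns = [[1] * len(y)] + [[1 if r == name else 0 for r in race] for name in dummy_names]
coef = ols(columns, y)

print("a (White)", coef[0])
for name, b in zip(dummy_names, coef[1:]):
    group_mean = Fraction(sum(v for r, v in zip(race, y) if r == name), race.count(name))
    print(f"{name:9s} B = {b}   a + B = {coef[0] + b}   group mean = {group_mean}")
```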
Tests of Significance • With dummy variables, the t test for a dummy's coefficient tests whether that category differs from the reference category, not whether the category's own intercept (or mean) differs from 0. • Thus if a = 50 and B1 = −45, B1 may well be significant (Blacks differ from Whites). Meanwhile the intercept for Blacks, a + B1 = 5, might not be significantly different from 0, even though the intercept for Whites (a = 50) is.
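The following Python sketch reproduces the a = 50, B1 = −45 example with hypothetical data in a dummy-only model. It computes three t statistics:

* a t for a;
* a t for B1;
* a t for the Blacks' own intercept a + B1, using Var(a + B1) = Var(a) + Var(B1) + 2 Cov(a, B1).

The critical value 2.306 (two-sided 5%, 8 degrees of freedom) is taken from a standard t table. The data and variable names are illustrative only.

```python
import math

# Hypothetical data chosen so that a = 50 and B1 = -45.
white = [46, 50, 54, 48, 52]
black = [-5, 15, 0, 10, 5]

n0, n1 = len(white), len(black)
mean0, mean1 = sum(white) / n0, sum(black) / n1
a, b1 = mean0, mean1 - mean0          # OLS estimates for y = a + B1*Black

# Residual variance with n - 2 degrees of freedom (two estimated parameters).
sse = sum((v - mean0) ** 2 for v in white) + sum((v - mean1) ** 2 for v in black)
s2 = sse / (n0 + n1 - 2)

# Sampling variances and covariance of the estimates in this two-group model.
var_a = s2 / n0
var_b1 = s2 * (1 / n0 + 1 / n1)
cov_a_b1 = -s2 / n0
var_sum = var_a + var_b1 + 2 * cov_a_b1   # variance of a + B1 (equals s2 / n1)

t_a = a / math.sqrt(var_a)
t_b1 = b1 / math.sqrt(var_b1)
t_black = (a + b1) / math.sqrt(var_sum)

T_CRIT = 2.306  # two-sided 5% critical value of t with 8 df (from a t table)

print(f"a      = {a:6.1f}  t = {t_a:6.2f}")      # significant: Whites differ from 0
print(f"B1     = {b1:6.1f}  t = {t_b1:6.2f}")    # significant: Blacks differ from Whites
print(f"a + B1 = {a + b1:6.1f}  t = {t_black:6.2f}")  # not significant: Blacks vs 0
```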
Interaction Terms • When the research hypotheses state that different categories may respond differently to other independent variables, we need to use interaction terms. • For example, race and income may interact, so that the relationship between income and ideology is different (stronger or weaker) for Whites than for Blacks.
Creating Interaction Terms • Creating an interaction term is easy • Multiply the dummy variable by the independent variable (here, Black × Income) • The full model is thus: $Y = a + B_1\,\text{Black} + B_2\,\text{Income} + B_3\,(\text{Black} \times \text{Income}) + e$ • a is the intercept for Whites; • (a + B1) is the intercept for Blacks; • B2 is the slope for Whites; and • (B2 + B3) is the slope for Blacks • The t tests for B1 and B3 test whether Blacks' intercept and slope differ from those of Whites (a and B2).
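To check the interpretation of a, a + B1, B2, and B2 + B3, the sketch below fits the interaction model exactly. It then compares the estimates with separate simple regressions run within each group. The `ols` helper, the income values, and the outcomes are hypothetical. With one dummy fully interacted with one predictor, the point estimates reproduce the two separate group regressions.

```python
from fractions import Fraction

def ols(columns, y):
    """Exact OLS: solve (X'X) b = X'y by Gauss-Jordan elimination on Fractions."""
    k = len(columns)
    m = [[Fraction(sum(p * q for p, q in zip(columns[i], columns[j]))) for j in range(k)]
         + [Fraction(sum(p * q for p, q in zip(columns[i], y)))] for i in range(k)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if m[r][col] != 0)  # fails if X'X is singular
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [v / piv for v in m[col]]
        for r in range(k):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[k] for row in m]

# Hypothetical data: Black dummy, income (any units), and an outcome y.
black = [0, 0, 0, 0, 1, 1, 1, 1]
income = [1, 2, 3, 4, 1, 2, 3, 4]
y = [12, 15, 16, 19, 20, 21, 24, 23]

ones = [1] * len(y)
black_x_income = [d * x for d, x in zip(black, income)]  # the interaction term
a, b1, b2, b3 = ols([ones, black, income, black_x_income], y)

def fit_group(flag):
    """Simple regression of y on income within one group: returns [intercept, slope]."""
    xs = [x for d, x in zip(black, income) if d == flag]
    ys = [v for d, v in zip(black, y) if d == flag]
    return ols([[1] * len(xs), xs], ys)

a_white, slope_white = fit_group(0)
a_black, slope_black = fit_group(1)

print("a =", a, " a + B1 =", a + b1, " separate fits:", a_white, a_black)
print("B2 =", b2, " B2 + B3 =", b2 + b3, " separate fits:", slope_white, slope_black)
```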
Separating Effects • The literature is unclear on how to fully interpret interaction effects • There is multicollinearity between a dummy and its interaction term, and also between the interaction term and the original independent variable • It is suggested that you do not use a model with interaction terms and no intercept!
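The collinearity between a dummy and its interaction term can be seen by computing their Pearson correlation directly. The following Python sketch uses hypothetical incomes (in thousands). The `corr` helper is written out by hand rather than taken from a statistics package. Because Black × Income is 0 whenever Black is 0, the two columns move together closely.

```python
import math

def corr(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)

black = [0, 0, 0, 0, 1, 1, 1, 1]
income = [30, 40, 50, 60, 35, 45, 55, 65]   # hypothetical incomes in $1000s
black_x_income = [d * x for d, x in zip(black, income)]

r = corr(black, black_x_income)
print(f"corr(Black, Black x Income) = {r:.3f}")
```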