Incorporating Nonmetric Data with Dummy Variables. The logic of dummy-coding Dummy-coding in SPSS. Dummy-coding variables.
Incorporating Nonmetric Data with Dummy Variables The logic of dummy-coding Dummy-coding in SPSS
Dummy-coding variables • For many of the multivariate techniques we will study, it is assumed that the independent or dependent variables in the analysis are metric variables. If we have a nonmetric, or categorical, variable, we can incorporate it into our analysis by converting the categorical variable to a set of dichotomous, dummy-coded variables. • A dichotomous variable arguably satisfies the interval level of measurement. On some construct, one of the categories represents more or less of the construct, so the definition of ordinal data is satisfied. Moreover, since there are only two categories, there is only one interval between them, so the requirement of equal intervals is satisfied trivially, meeting the definition of interval data.
Selecting a reference group • To dummy-code a variable, we first identify one category or subgroup of the nonmetric variable as the reference or comparison group. • The effects which we identify in our analysis will be differences from the reference or comparison group. • For example, suppose that I were contrasting salaries of men and women in some group of employees and I was interested in how women's salaries differed from men's salaries. Assuming there is a "gender" variable in my data set coded either "male" or "female," I would select "male" as the reference category on the nonmetric "gender" variable.
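The statement that effects are differences from the reference group can be checked with a small sketch. The salaries below are made-up illustrative numbers, not data from the source, and the name female_dummy is hypothetical. With a single 0/1 predictor, a least-squares fit gives an intercept equal to the reference group's mean and a slope equal to the other group's difference from it.

```python
# Made-up salaries for illustration only (not from the source).
salaries = [52000, 48000, 61000, 45000, 50000, 58000]
gender = ["male", "female", "male", "female", "female", "male"]

# Hypothetical dummy variable: 1 = female, 0 = male (the reference group).
female_dummy = [1 if g == "female" else 0 for g in gender]


def mean(values):
    return sum(values) / len(values)


# Least squares with one predictor: slope = cov(x, y) / var(x).
x_bar, y_bar = mean(female_dummy), mean(salaries)
slope = sum(
    (x - x_bar) * (y - y_bar) for x, y in zip(female_dummy, salaries)
) / sum((x - x_bar) ** 2 for x in female_dummy)
intercept = y_bar - slope * x_bar

male_mean = mean([s for s, d in zip(salaries, female_dummy) if d == 0])
female_mean = mean([s for s, d in zip(salaries, female_dummy) if d == 1])

print("intercept (reference group mean):", round(intercept, 2))
print("slope (female minus male):", round(slope, 2))
```

Choosing "female" as the reference instead would flip the sign of the slope and move the intercept to the female mean; the fitted group means would not change.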
A two-category example - 1 • After we have identified the reference category, we create a new variable for each of the remaining categories or subgroups of the nonmetric variable. • Thus, a nonmetric variable will be represented in the analysis by a number of new dichotomous variables equal to one less than the number of categories in the original nonmetric variable. • In the salary example given above, with "male" selected as my reference group, the remaining group on the nonmetric gender variable is "female," so I create a new variable called "women." (I usually would name it "female," but I don't want to have two entities with the same name in this example.)
A two-category example - 2 • Finally, I code the new variable with one of two dichotomous values, usually 1 and 0. • The new variable is assigned a 1 if the original variable indicated membership in the category represented by the new variable. • If the subject was not a member of the category designated by the new variable, the new variable is coded 0 for that subject. • In the example above, if a subject was in the "female" group of the "gender" variable, her code for the new "women" variable is 1. If a subject was not in the "female" group of the "gender" variable, his code for the new "women" variable is 0.
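The two-category coding rule can be sketched in a few lines. This is a minimal illustration in Python rather than SPSS; the helper name dummy_code and the four sample cases are assumptions made for the example.

```python
def dummy_code(value, category):
    """Return 1 if the case belongs to the category, otherwise 0."""
    return 1 if value == category else 0


# Four hypothetical cases on the original "gender" variable.
gender = ["male", "female", "female", "male"]

# "male" is the reference group, so only "female" gets a new variable.
women = [dummy_code(g, "female") for g in gender]
print(women)  # [0, 1, 1, 0]
```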
A three-category example - 1 • If the original nonmetric variable had three or more categories, we would create two or more new variables and code them with the same scheme. • Suppose, for example, that we have a variable for political identification, named "partyid," which contains three values: "Republican," "Democrat," and "Independent." I select "Independent" as my reference category because I am interested in the effect of being a Republican or a Democrat. • Dummy-coding requires that I create and code two new variables, one for "Republican," which I will name "Repub," and one for "Democrat," which I will name "Demo."
A three-category example - 2 • Each subject in the data set will be assigned a value for both the new variables, "Repub" and "Demo," using the following scheme: • If a subject is a "Republican" on the original "partyid" variable, they are assigned a value of 1 for the new "Repub" variable and a value of 0 for the new "Demo" variable. • If a subject is a "Democrat" on the original "partyid" variable, they are assigned a value of 0 for the new "Repub" variable and a value of 1 for the new "Demo" variable. • If a subject is an "Independent" on the original "partyid" variable, they are assigned a value of 0 for the new "Repub" variable and a value of 0 for the new "Demo" variable, because they are not Republican and they are not Democrat.
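The same scheme generalizes to any number of categories: every category except the reference gets its own 0/1 variable. The sketch below assumes a hypothetical helper, dummy_variables, and four made-up cases; it is one way to express the rule, not the only one.

```python
def dummy_variables(values, categories, reference):
    """Build one 0/1 variable per category, skipping the reference category."""
    coded = [c for c in categories if c != reference]
    return {c: [1 if v == c else 0 for v in values] for c in coded}


# Four hypothetical cases on the original "partyid" variable.
partyid = ["Republican", "Democrat", "Independent", "Democrat"]

dummies = dummy_variables(
    partyid,
    categories=["Republican", "Democrat", "Independent"],
    reference="Independent",
)
repub, demo = dummies["Republican"], dummies["Democrat"]

print(repub)  # [1, 0, 0, 0]
print(demo)   # [0, 1, 0, 1]
```

The Independent case is 0 on both new variables, which is what identifies it as a member of the reference group.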
Example in SPSS • In GSS2000R, the variable marital contains five categories: married, widowed, divorced, separated, and never married. • Assuming my research question deals with marital experiences, I select the never married category as the reference category. • We will create four new variables to represent the other marital experiences, with each variable representing one experience. The variables will be married, widowed, divorced, and separatd (using the 8 allowable characters for SPSS variable names).
Coding scheme for new variables The coding scheme for the new variables is shown in the table below.
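The coding scheme can be regenerated from the category codes used in the recoding steps (1 = married, 2 = widowed, 3 = divorced, 4 = separated, 5 = never married). The sketch below prints one row per original category and one column per new variable; the function name scheme_rows is hypothetical.

```python
# Category codes on the original marital variable.
marital_labels = {
    1: "married",
    2: "widowed",
    3: "divorced",
    4: "separated",
    5: "never married",  # reference category
}

# New dummy variable -> the original code it flags.
new_variables = {"married": 1, "widowed": 2, "divorced": 3, "separatd": 4}


def scheme_rows():
    """Return (code, label, [dummy values]) for each original category."""
    rows = []
    for code, label in marital_labels.items():
        flags = [1 if code == target else 0 for target in new_variables.values()]
        rows.append((code, label, flags))
    return rows


header = f"{'marital':<18}" + "".join(f"{name:>10}" for name in new_variables)
print(header)
for code, label, flags in scheme_rows():
    row_label = f"{code} = {label}"
    print(f"{row_label:<18}" + "".join(f"{flag:>10}" for flag in flags))
```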
Using Recoding in SPSS to Create New Variables Select the Recode | Into Different Variables command from the Transform menu.
Creating the married variable First, select the variable to be dummy-coded, marital, from the list of variables and move it to the Numeric Variable -> Output Variable list box. Second, type in the name for the new variable and click on the Change button to replace the ? with this new variable name.
Assigning values to new variable Next, click on the Old and New Values button to assign values to the new variable.
Preserving missing values First, mark the System- or user-missing option button on the Old Value panel. Second, mark the System-missing option button on the New Value panel. Third, click on the Add button to include this recoding for the variable. If we forget to explicitly assign missing values, cases with missing data will be recoded as 0 and become part of the reference group.
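The consequence of forgetting the missing-value rule can be seen by comparing two recodes side by side. This sketch uses Python's None as a stand-in for SPSS system-missing, which is an assumption of the example; the function names are hypothetical.

```python
MISSING = None  # stand-in for SPSS system-missing (an assumption of this sketch)


def recode_without_missing_rule(value, target):
    """Only 'target -> 1' and 'all other values -> 0': missing becomes 0."""
    return 1 if value == target else 0


def recode_preserving_missing(value, target):
    """Handle missing first, so it never falls into the 0 group."""
    if value is MISSING:
        return MISSING
    return 1 if value == target else 0


# Hypothetical cases: married, never married, missing, widowed.
marital = [1, 5, MISSING, 2]

print([recode_without_missing_rule(v, 1) for v in marital])  # [1, 0, 0, 0]
print([recode_preserving_missing(v, 1) for v in marital])    # [1, 0, None, 0]
```

In the first result the missing case is indistinguishable from a never-married case, which is how missing data ends up counted in the reference group.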
Coding the married category First, to recode the 1 = married category to the dummy variable, mark the Value option button and type a 1 in the text box on the Old Value panel. Second, mark the Value option button and type a 1 in the text box on the New Value panel. This coding says: if they were originally in the married category for marital, they are assigned a value of 1 for the married dummy variable. Third, click on the Add button to include this recoding for the variable.
Coding the other categories First, to identify subjects in the categories other than married, mark the All other values option button on the Old Value panel. Second, mark the Value option button and type a 0 in the text box on the New Value panel. This coding says: if they were originally NOT in the married category for marital, they are assigned a value of 0 for the married dummy variable. Third, click on the Add button to include this recoding for the variable.
Completing the re-coding When we have completed the coding for the new variable, click on the Continue button.
Completing the married variable Click on the OK button to create the new variable in the data editor.
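The three Old -> New rules entered for the married variable can be modeled as an ordered list. The sketch assumes the first matching rule is the one applied, with "All other values" acting as a catch-all for anything not matched earlier; None stands in for a missing value, and the names married_rules and recode are hypothetical.

```python
MISSING = None  # stand-in for a missing value (an assumption of this sketch)

# Rules in the order they were added: (test on the old value, new value).
married_rules = [
    (lambda old: old is MISSING, MISSING),  # system- or user-missing -> system-missing
    (lambda old: old == 1, 1),              # 1 = married -> 1
    (lambda old: True, 0),                  # all other values -> 0
]


def recode(old, rules):
    """Return the new value from the first rule whose test matches."""
    for matches, new in rules:
        if matches(old):
            return new
    raise ValueError(f"no rule matched {old!r}")


# One hypothetical case per category, plus one missing case.
marital = [1, 2, 3, 4, 5, MISSING]
married = [recode(v, married_rules) for v in marital]
print(married)  # [1, 0, 0, 0, 0, None]
```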
Variable and coding for widowed variable Following the same steps, we create the dummy variable for subjects who were 2 = widowed on the original marital variable. The coding is similar to that for married subjects, except the category that was originally coded 2 = widowed is translated into a 1 on the new variable.
Variable and coding for divorced variable Following the same steps, we create the dummy variable for subjects who were 3 = divorced on the original marital variable. The coding is similar to that for married subjects, except the category that was originally coded 3 = divorced is translated into a 1 on the new variable.
Variable and coding for separated variable Following the same steps, we create the dummy variable, named separatd, for subjects who were 4 = separated on the original marital variable. The coding is similar to that for married subjects, except the category that was originally coded 4 = separated is translated into a 1 on the new variable.
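Because the four recodes differ only in which original code becomes a 1, they can be produced in one pass. The sketch below is a Python illustration, not SPSS syntax; None stands in for a missing value and the helper name make_dummies is hypothetical.

```python
MISSING = None  # stand-in for a missing value (an assumption of this sketch)

# New variable name -> original marital code that is translated into a 1.
targets = {"married": 1, "widowed": 2, "divorced": 3, "separatd": 4}


def make_dummies(marital):
    """Create all four dummy variables, keeping missing cases missing."""
    return {
        name: [MISSING if v is MISSING else int(v == code) for v in marital]
        for name, code in targets.items()
    }


# Hypothetical cases: separated, widowed, never married, missing.
dummies = make_dummies([4, 2, 5, MISSING])
for name, values in dummies.items():
    print(name, values)
```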
Dummy-coded variables for married subjects Subjects with a code value of 1 = married on the original marital variable now have a 1 for married and a 0 for the other new variables.
Dummy-coded variables for widowed subjects Subjects with a code value of 2 = widowed on the original marital variable now have a 1 for widowed and a 0 for the other new variables.
Dummy-coded variables for divorced subjects Subjects with a code value of 3 = divorced on the original marital variable now have a 1 for divorced and a 0 for the other new variables.
Dummy-coded variables for never married subjects Subjects with a code value of 5 = never married on the original marital variable now have a 0 for all new variables. This was the reference category.
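The patterns described for each group suggest a simple consistency check on the finished data: a case should have a 1 only on the variable matching its marital code, and a never-married case should have 0 on all four. The sketch below assumes the dummy values are listed in the order married, widowed, divorced, separatd, with None for a missing value; check_case is a hypothetical helper.

```python
def check_case(marital, flags):
    """Return True if one case's dummy values agree with its marital code.

    flags holds the values of married, widowed, divorced, separatd, in that order.
    """
    if marital is None:
        # A missing marital code should stay missing on every dummy variable.
        return all(flag is None for flag in flags)
    # Codes 1-4 expect a single 1; code 5 (never married) expects all zeros.
    expected = [1 if marital == code else 0 for code in (1, 2, 3, 4)]
    return flags == expected


print(check_case(1, [1, 0, 0, 0]))     # True: married
print(check_case(5, [0, 0, 0, 0]))     # True: reference category
print(check_case(None, [0, 0, 0, 0]))  # False: missing was recoded as 0
```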