Creating variables and specifying models to test for interactions between two categorical independent variables. Jane E. Miller, PhD. Overview. Creating variables for an interaction between two categorical variables Review: dummy variables Review: reference categories Aside on missing values
Creating variables and specifying models to test for interactions between two categorical independent variables Jane E. Miller, PhD
Overview • Creating variables for an interaction between two categorical variables • Review: dummy variables • Review: reference categories • Aside on missing values • Specifying a model with an interaction between two categorical variables
List of variables used in examples • Dependent variable = birth weight in grams (BW). • Independent variables: • Main effects terms: • Race • Two nominal categories (non-Hispanic black; non-Hispanic white is the reference category) • One main effect dummy variable: NHB • Coded 1 = non-Hispanic black, 0 = non-Hispanic white • Mother’s education • Three ordinal categories (<HS; =HS; >HS is the reference category) • Two main effects dummies: <HS, =HS • Each coded 1 = named category, 0 = all other values
List of variables, continued • Interaction between race and mother’s education • Two interaction term dummies: NHB_<HS; NHB_=HS • Each named using the “_” convention to link the names of the component variables. • Each coded 1 = named category, 0 = all other values • E.g., NHB_<HS = 1 for those who are both NHB and <HS, = 0 for all other combinations of race and education
Interaction between two categorical independent variables • Example: Race and education • Race is a 2-category independent variable, classified as: • Non-Hispanic black (NHB) • Non-Hispanic white (NHW) = reference category • Mother’s educational attainment is a 3-category independent variable, classified as: • Less than complete high school (<HS) • High school diploma, no higher (=HS) • More than high school (>HS) = reference category
Coding of variables • Each of the dummy (also known as “binary”) variables will be coded • 1 for each case that has the trait after which the variable is named. • 0 for all other cases. • E.g., the dummy variable “NHB” will be coded • 1 for all non-Hispanic black infants. • 0 for all others (in this example, all non-Hispanic white infants).
Reference category for an interaction • Need a set of independent variables to uniquely identify each possible combination of race and mother’s educational attainment. • With one 2-category variable and one 3-category variable, there are six such combinations. • Choose one category to be the basis of comparison. • The reference category. • Define dummy variables to differentiate among the other five categories.
Possible combinations of race and mother’s educational attainment
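The six combinations can be enumerated mechanically. The following is a minimal Python sketch, not part of the original slides; the names `RACE_CATEGORIES`, `EDUCATION_CATEGORIES`, and `REFERENCE` are chosen here for illustration. The joint reference category follows from the two reference categories named earlier (non-Hispanic white and >HS).

```python
from itertools import product

# Category labels taken from the slides. The variable names are illustrative.
RACE_CATEGORIES = ["NHW", "NHB"]
EDUCATION_CATEGORIES = ["<HS", "=HS", ">HS"]

# Joint reference category: the reference category of each component variable.
REFERENCE = ("NHW", ">HS")

# Every pairing of one race category with one education category.
combinations = list(product(RACE_CATEGORIES, EDUCATION_CATEGORIES))

for race, education in combinations:
    marker = " (reference category)" if (race, education) == REFERENCE else ""
    print(f"{race}, {education}{marker}")

# Dummy variables are needed to distinguish the non-reference combinations.
non_reference = [combo for combo in combinations if combo != REFERENCE]
print(len(combinations), "combinations;", len(non_reference), "besides the reference")
```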
Source variables used to create main effects and interaction terms • Three source variables: • A two-category race variable RACE coded 1 = non-Hispanic white; 2 = non-Hispanic black • A three-category education variable MOMED coded 1 = <HS; 2 = “=HS”; 3 = >HS • A continuous income variable IPR, annual family income (in $) divided by the Federal Poverty Level for a family of that size and age composition • On the next few slides, • PINK = original (“source”) variable • YELLOW = main effect term • GREEN = interaction term
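As a rough illustration of how the main effects dummies could be derived from the source variables, here is a short Python sketch. It is not from the slides: the function name `make_main_effect_dummies` is hypothetical, and the sketch assumes RACE and MOMED have no missing values. IPR is continuous, so it needs no dummy coding.

```python
def make_main_effect_dummies(race, momed):
    """Create the three main effects dummies for one case.

    race:  1 = non-Hispanic white, 2 = non-Hispanic black (RACE on the slide)
    momed: 1 = <HS, 2 = =HS, 3 = >HS (MOMED on the slide)
    Assumes neither source value is missing.
    """
    return {
        "NHB": 1 if race == 2 else 0,   # reference category: non-Hispanic white
        "<HS": 1 if momed == 1 else 0,  # reference category: >HS
        "=HS": 1 if momed == 2 else 0,
    }

# A non-Hispanic black infant whose mother has less than high school:
print(make_main_effect_dummies(race=2, momed=1))

# A case in the reference category on both variables gets 0 on every dummy:
print(make_main_effect_dummies(race=1, momed=3))
```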
Coding of main effects and interaction terms: race/ethnicity and education For a two-category race variable (non-Hispanic white = reference category). And a three-category educational attainment variable (>HS = reference category).
Coding of main effects and interaction variables: non-Hispanic white infants For a two-category race variable (non-Hispanic white = reference category). And a three-category educational attainment variable (>HS = reference category).
Calculating an interaction term from two dummy main effects terms • Using the convention of naming the interaction term with an “_” to connect the names of the two component variables. • The interaction term between NHB and <HS is calculated as NHB × <HS. • Since both component main effects terms are coded 1 for the named group and 0 for all others, NHB_<HS = 1 only when both NHB = 1 and <HS = 1. • A value of 1 for that interaction term identifies infants with BOTH of those traits. • E.g., for an infant who is NHW and <HS, NHB_<HS = 0 × 1 = 0.
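The multiplication rule can be checked for every pairing of the two dummies. This is a small Python sketch for illustration only; the function name `interaction_term` is an assumption, not something from the slides.

```python
def interaction_term(nhb, lt_hs):
    """Interaction dummy NHB_<HS as the product of two 0/1 main effects dummies."""
    return nhb * lt_hs

# Walk through all four pairings of the two dummies.
for nhb in (0, 1):
    for lt_hs in (0, 1):
        print(f"NHB={nhb}, <HS={lt_hs} -> NHB_<HS={interaction_term(nhb, lt_hs)}")
```

Only the pairing with NHB = 1 and <HS = 1 produces a 1. The same function applied to NHB and =HS would give NHB_=HS.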
Coding of main effects and interaction variables: non-Hispanic black infants For a two-category race variable (non-Hispanic white = reference category). And a three-category educational attainment variable (>HS = reference category).
Coding of main effects and interaction variables: race and educational attainment For a two-category race variable (non-Hispanic white = reference category). And a three-category educational attainment variable (>HS = reference category).
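The complete coding scheme for all six groups can be generated from the definitions given earlier. The Python sketch below is an illustration rather than the author's own table; `code_terms` and `rows` are names made up here. It also checks that the five variables together give each group a distinct pattern of 0s and 1s.

```python
from itertools import product

def code_terms(race, education):
    """Return (NHB, <HS, =HS, NHB_<HS, NHB_=HS) for one race/education group."""
    nhb = 1 if race == "NHB" else 0
    lt_hs = 1 if education == "<HS" else 0
    eq_hs = 1 if education == "=HS" else 0
    # Interaction terms are products of the component main effects dummies.
    return (nhb, lt_hs, eq_hs, nhb * lt_hs, nhb * eq_hs)

header = ("NHB", "<HS", "=HS", "NHB_<HS", "NHB_=HS")
rows = {
    (race, education): code_terms(race, education)
    for race, education in product(["NHW", "NHB"], ["<HS", "=HS", ">HS"])
}

print(f"{'Group':<10}" + "".join(f"{name:>9}" for name in header))
for (race, education), values in rows.items():
    label = f"{race}, {education}"
    print(f"{label:<10}" + "".join(f"{value:>9}" for value in values))

# Every group has its own pattern; the reference group is all zeros.
print("Distinct patterns:", len(set(rows.values())))
```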
Aside: Missing values • For each new variable created, the new variable should take on a missing value if the original source variable was missing for a given case. • Need to specify this as an extra step for IF/THEN logic such as that used in creating the dummies. • E.g., IF RACE = . THEN NHB =.; • In the statistical package SAS, “.” is the code for missing. • For variables created using arithmetic, if any component source variable is missing, the result of the calculation will also be missing. • E.g., if IPR =., then IPR_NHB will also be missing.
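The same precaution applies outside SAS. The following Python sketch is an illustration under stated assumptions: `None` stands in for the SAS missing code, and `make_nhb` and `product_or_missing` are hypothetical helper names. Unlike SAS arithmetic, plain Python multiplication does not propagate a missing value on its own, so the sketch checks for it explicitly in both steps.

```python
MISSING = None  # stand-in for the SAS missing code "."

def make_nhb(race):
    """NHB dummy from RACE (1 = NHW, 2 = NHB), keeping missing as missing.

    Without the explicit check, a missing RACE would fail the test
    race == 2 and be coded 0, i.e. wrongly counted as non-Hispanic white.
    """
    if race is MISSING:
        return MISSING
    return 1 if race == 2 else 0

def product_or_missing(a, b):
    """Product that is missing whenever either component is missing."""
    if a is MISSING or b is MISSING:
        return MISSING
    return a * b

print(make_nhb(MISSING))                          # missing RACE -> missing NHB
print(product_or_missing(MISSING, make_nhb(2)))   # missing IPR -> missing IPR_NHB
```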
Be parsimonious in deciding which interactions to test • As shown here, the number of variables in the regression model proliferates rapidly with each additional interaction. • Specify interactions only between key independent variables. • Communicating results becomes unwieldy: • Considerable behind-the-scenes calculations. • Extra tables or charts to convey the shape of the interaction.
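To get a feel for how quickly the number of terms grows, the sketch below counts the dummies needed for an interaction between two categorical variables. The counting rule (one fewer dummy than categories for each variable, and one interaction dummy per pair of main effects dummies) is standard for dummy coding but is not spelled out on the slide; the function name `count_terms` is made up for this example.

```python
def count_terms(k1, k2):
    """Count dummies for an interaction between variables with k1 and k2 categories.

    Returns (main effects dummies, interaction dummies, total).
    """
    main_effects = (k1 - 1) + (k2 - 1)
    interactions = (k1 - 1) * (k2 - 1)
    return main_effects, interactions, main_effects + interactions

# Race (2 categories) by mother's education (3 categories), as in this lecture:
print(count_terms(2, 3))

# Larger classifications need many more terms:
print(count_terms(4, 5))
```

For the race-by-education example this gives three main effects dummies and two interaction dummies, matching the five variables listed earlier.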
Criteria for identifying pertinent interactions to test • Theoretical reasons why the association between X1 and Y might differ by X2 for the particular variables you are studying. • Empirical evidence that the association between X1 and Y varies by X2 in your data. • Three-way association among X1, X2, and Y. • See Babbie’s elaboration paradigm.
Model specification with interactions: race and education • BW = f (race, education, race_education) • Birth weight is a function of race, education, and the race-by-education interaction. • To specify the model, need ALL of the main effects and interaction term variables related to race and mother’s education • BW = f (NHB, <HS, =HS, NHB_<HS, NHB_=HS)
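Written out as a linear regression equation, the specification might look like the following. The coefficient symbols β0 through β5 and the error term ε are notation assumed here for illustration; they do not appear on the slides.

```latex
\mathrm{BW} = \beta_0
  + \beta_1\,\text{NHB}
  + \beta_2\,(\text{<HS})
  + \beta_3\,(\text{=HS})
  + \beta_4\,(\text{NHB\_<HS})
  + \beta_5\,(\text{NHB\_=HS})
  + \varepsilon
```

Under this notation, every dummy equals 0 for the reference category (non-Hispanic white, >HS), so its predicted birth weight is β0. For an infant who is non-Hispanic black with a mother who has less than high school, NHB, <HS, and NHB_<HS all equal 1, so the prediction is β0 + β1 + β2 + β4.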
Parsimonious specification • Most interaction specifications should initially include • Main effects terms for all variables involved in the interaction • Interaction terms • Might be able to omit some main effect or interaction terms based on • Theoretical criteria • Empirical statistical significance tests for combining groups
Summary • A model specification to test for interactions includes both main effects and interaction terms. • Combination of those terms in the model uniquely identifies each possible combination of values of the component variables. • Number and type of interaction terms needed depends on • Type(s) of variables in the interaction. • Number of categories, for categorical variables in the interaction. • For most situations, test interactions among key variables only. For criteria to help you decide which interactions to test for your topic and data, see the podcast on visualizing shapes of interaction patterns.
Suggested resources • Miller, J. E., 2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. • Chapter 16, on interactions • Chapter 9, on defining dummy variables • Chapter 8, on choice of reference category • Chapters 8 and 9 of Cohen et al. 2003. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 3rd Edition. Florence, KY: Routledge.
Suggested online resources • Podcasts on • Introduction to interactions • Visualizing shapes of interaction patterns • Choosing a reference category
Suggested practice exercises • Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. • Suggested course extensions for Chapter 16 • “Reviewing” exercises #2, 3 and 4. • “Applying statistics and writing” exercises #1, 2, and 3. • “Revising” exercises #1 and 3.
Contact information Jane E. Miller, PhD jmiller@ifh.rutgers.edu Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html