Criteria for choosing a reference category Jane E. Miller, PhD
Overview • What is a reference category? • For independent variables (IVs) • For the dependent variable (DV) • Choosing reference categories based on: • Theoretical criteria • Previous literature on the topic • Writing patterns • Sample size • Joint distribution of variables
What is a reference category? • For each nominal or ordinal variable, the reference category is the one against which all other categories of that variable will be compared. • A multivariate model specification will not include a dummy variable for that category. • Sometimes called the “omitted” category. • Choice of a reference category for each categorical variable in your model should NOT be arbitrary.
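Illustration (not part of the original slides): a minimal Python sketch of dummy coding, assuming a hypothetical three-arm treatment variable. The function name `dummy_code` and the data are invented for this example; the only point is that the reference category gets no indicator column of its own.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator list per category, omitting the reference."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference {reference!r} not found in data")
    modeled = [c for c in categories if c != reference]
    return {c: [1 if v == c else 0 for v in values] for c in modeled}


# Hypothetical data: treatment arm for six participants.
arm = ["placebo", "drug_a", "drug_b", "placebo", "drug_a", "placebo"]
dummies = dummy_code(arm, reference="placebo")

print(sorted(dummies))     # ['drug_a', 'drug_b']
print(dummies["drug_a"])   # [0, 1, 0, 0, 1, 0]
```

Cases in the reference category are the rows where every indicator equals 0, which is why it is sometimes called the "omitted" category.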
Multivariate coefficients and the reference category • OLS coefficients estimate the difference in the predicted value of the DV between each of the other categories and the reference category. • Logit models estimate log-odds coefficients which, when exponentiated, give odds ratios of the outcome for each of the other categories of the IV compared to the reference category.
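Illustration (not part of the original slides): a short Python sketch of what those coefficients represent in the simplest special case, a model with one categorical IV and no other covariates. In that case the OLS coefficient for a category equals the difference between its mean and the reference group's mean, and the exponentiated logit coefficient equals the crude odds ratio. The function names and all numbers are hypothetical.

```python
from statistics import mean


def ols_contrasts(outcome_by_group, reference):
    """Difference in mean outcome between each group and the reference group."""
    ref_mean = mean(outcome_by_group[reference])
    return {
        group: mean(values) - ref_mean
        for group, values in outcome_by_group.items()
        if group != reference
    }


def odds_ratio(events, totals, group, reference):
    """Odds of the outcome in `group` divided by the odds in `reference`."""
    def odds(g):
        return events[g] / (totals[g] - events[g])
    return odds(group) / odds(reference)


# Hypothetical continuous outcome by treatment arm.
outcome = {"placebo": [10, 12, 14], "drug_a": [15, 17, 19], "drug_b": [9, 11, 13]}
contrasts = ols_contrasts(outcome, reference="placebo")
print(contrasts)  # drug_a is 5 above placebo, drug_b is 1 below

# Hypothetical binary outcome: number of events out of 100 cases per arm.
events = {"placebo": 20, "drug_a": 40}
totals = {"placebo": 100, "drug_a": 100}
crude_or = odds_ratio(events, totals, group="drug_a", reference="placebo")
print(round(crude_or, 2))  # 2.67
```

With additional covariates in the model, the coefficients are adjusted contrasts rather than raw differences, but they are still read against the same reference category.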
Choosing a reference category based on theoretical criteria • Your specific research question will often determine choice of reference category. E.g., • If you are analyzing effects of a drug compared to placebo, the placebo condition is the logical reference category. • If you are comparing other states to your home state, your home state should be the reference category.
Choosing a reference category based on prior literature • If previous studies of your topic follow a standard convention for the reference category, you will often use the same one. • Doing so facilitates comparison of results. • BUT, it is important to think through whether their choice fits your study. • Identify the reasons why others have chosen that reference category. • Check those reasons against your own.
Choosing a different reference category than the prior literature • If you have strong reasons to use a different reference category than a major study of your topic: • In your methods section, explain the theoretical or empirical basis for your choice. • In the discussion section, translate your results so they can be compared against the same reference category used in other leading studies.
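Illustration (not part of the original slides): one way to translate point estimates to a different reference category, sketched in Python. It assumes additive coefficients (OLS coefficients or logit log odds): the old reference category has an implicit coefficient of 0, and the re-expressed coefficient for each category is its old coefficient minus that of the new reference category. The function name `rebase` and the numbers are hypothetical.

```python
def rebase(coefs, old_reference, new_reference):
    """Re-express additive coefficients relative to a different reference category."""
    full = dict(coefs)
    full[old_reference] = 0.0           # the omitted category has an implicit 0
    shift = full[new_reference]
    return {g: b - shift for g, b in full.items() if g != new_reference}


# Hypothetical coefficients estimated with placebo as the reference category.
vs_placebo = {"drug_a": 5.0, "drug_b": -1.0}
vs_drug_a = rebase(vs_placebo, old_reference="placebo", new_reference="drug_a")
print(vs_drug_a)  # {'drug_b': -6.0, 'placebo': -5.0}
```

For odds ratios, the same arithmetic applies on the log scale, which is equivalent to dividing each odds ratio by that of the new reference category. This sketch converts point estimates only; standard errors for the new contrasts depend on the covariances among the coefficients, so they require re-estimating the model or using the coefficient covariance matrix.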
Choosing a reference category based on writing patterns • If your sentences tend to read “compared to group X,” then group X should be your reference category. • Doing so will ensure that your statistical calculations are consistent with how you will write about the results. • But also weigh this criterion against: • Empirical criteria for sample size • Precedent in the literature
Choosing a reference category based on sample size • Lacking some other basis for selecting a reference category, choose the largest (modal) group. • Doing so generally improves statistical power for estimating coefficients, because every coefficient is a contrast with the reference group. • Sometimes this will mesh with theoretical criteria, as when the majority racial/ethnic group is chosen as the reference category. • Sometimes, your “natural” reference category includes very few cases. • In that case, you might need to pick a different group to provide stable statistical estimates.
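Illustration (not part of the original slides): a small Python sketch for checking category sizes before settling on a reference category. The function name `rank_categories`, the data, and the `min_cases` threshold are all hypothetical; the slides do not specify a minimum number of cases.

```python
from collections import Counter


def rank_categories(values, min_cases):
    """Return the modal category and any categories with fewer than min_cases cases."""
    sizes = Counter(values).most_common()   # largest category first
    modal = sizes[0][0]
    too_small = [category for category, n in sizes if n < min_cases]
    return modal, too_small


# Hypothetical educational attainment for ten respondents.
education = ["hs"] * 5 + ["college"] * 3 + ["lt_hs"] * 2
modal, too_small = rank_categories(education, min_cases=3)

print(modal)      # hs
print(too_small)  # ['lt_hs']
```

A category flagged as too small is a weak candidate for the reference category even if it is the theoretically "natural" choice.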
Choosing reference categories based on joint distribution of variables • The overall reference category for a multivariate regression model is the combination of reference categories for each of your categorical variables. • Be sure that this combination isn’t too rare. • E.g., teenagers with at least a college degree will be pretty unusual (if not definitionally impossible!), so don’t pick teenagers as the reference category for age and college+ as the reference category for education.
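Illustration (not part of the original slides): a Python sketch that counts how many cases fall in the reference category of every categorical variable at once. The function name `joint_reference_count`, the variable names, and the six-row data set are hypothetical.

```python
def joint_reference_count(rows, references):
    """Count cases that are in the reference category of every listed variable."""
    return sum(
        all(row[variable] == ref for variable, ref in references.items())
        for row in rows
    )


# Hypothetical respondents with age group and educational attainment.
rows = [
    {"age": "teen", "educ": "lt_hs"},
    {"age": "teen", "educ": "lt_hs"},
    {"age": "teen", "educ": "hs"},
    {"age": "adult", "educ": "lt_hs"},
    {"age": "adult", "educ": "college+"},
    {"age": "adult", "educ": "college+"},
]

teen_college = joint_reference_count(rows, {"age": "teen", "educ": "college+"})
adult_college = joint_reference_count(rows, {"age": "adult", "educ": "college+"})
print(teen_college, adult_college)  # 0 2
```

A count at or near zero signals that the chosen combination of reference categories describes almost no one in the data.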
Reference category for dependent variables • If you are analyzing a categorical dependent variable, you also need to decide which category to model, and which category is omitted. • If the DV is dichotomous (2-category), • You will model one category. • The other will be the omitted category of the DV. • E.g., if you model having health insurance, then being uninsured is the reference category.
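Illustration (not part of the original slides): a Python sketch of coding a dichotomous DV so that the modeled category is 1 and the omitted category is 0. The function name `code_binary_dv` and the data are hypothetical.

```python
def code_binary_dv(values, modeled):
    """Code the modeled category as 1 and the omitted (reference) category as 0."""
    categories = set(values)
    if len(categories) != 2 or modeled not in categories:
        raise ValueError("expected exactly two categories, one of them the modeled one")
    return [1 if v == modeled else 0 for v in values]


# Hypothetical insurance status for five respondents.
status = ["insured", "uninsured", "insured", "insured", "uninsured"]
y = code_binary_dv(status, modeled="insured")
print(y)  # [1, 0, 1, 1, 0]
```

Modeling the other category instead (here, being uninsured) reverses the sign of every logit coefficient, so each odds ratio becomes its reciprocal.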
Reference category for a multichotomous dependent variable • If the DV is multichotomous (N-category), • You will separately model (N – 1) categories. • The other category will be the omitted category, for which no model is estimated. • E.g., if type of health insurance is a 4-category variable, • You will estimate separate models for 3 (= 4 – 1) of those categories. • For instance, you might model having public insurance, self-pay, and uninsured. • The other category (in this case private insurance) is the reference category.
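Illustration (not part of the original slides): a Python sketch showing that an N-category DV yields N – 1 contrasts, each against the omitted category. It computes the observed log odds of each modeled category relative to the reference category, which is what the intercepts equal in a multinomial logit model with no predictors. The function name `baseline_log_odds` and the counts are hypothetical.

```python
import math
from collections import Counter


def baseline_log_odds(values, reference):
    """Log of (cases in category / cases in reference) for each non-reference category."""
    counts = Counter(values)
    return {
        category: math.log(n / counts[reference])
        for category, n in counts.items()
        if category != reference
    }


# Hypothetical type of health insurance for sixteen respondents.
insurance = ["private"] * 8 + ["public"] * 4 + ["self_pay"] * 2 + ["uninsured"] * 2
log_odds = baseline_log_odds(insurance, reference="private")

print(sorted(log_odds))  # ['public', 'self_pay', 'uninsured']
print(len(log_odds))     # 3, i.e. 4 - 1
```

The reference category (private insurance) does not appear among the results because no contrast is estimated for it.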
Summary • Choice of a reference category for each categorical variable in your model should NOT be arbitrary. • Consider the following criteria when selecting a reference category for each of your variables: • Theoretical • Previous literature • Writing patterns • Sample size • Joint distribution of variables in your data • Use the same criteria for choosing a reference category for the DV as for IVs.
Suggested resources • Miller, J. E. 2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. • Chapter 8, section on choosing a reference category • Chapter 9, section on interpreting coefficients on categorical variables
Suggested practice exercises • Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. • Questions #3e and 8e in the problem set for chapter 9 • Suggested course extensions for • Chapter 8 • “Applying statistics” exercise #2 • Chapter 9 • “Reviewing” exercise #1
Contact information Jane E. Miller, PhD jmiller@ifh.rutgers.edu Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html