Interpreting multivariate OLS and logit coefficients

Interpreting multivariate OLS and logit coefficients Jane E. Miller, PhD

Overview • What elements to report for coefficients • Coefficients on • Continuous independent variables (IVs; predictors) • Categorical independent variables • Ordinary least squares (OLS) and logit coefficients • Topic sentences for paragraphs reporting multivariate coefficients

Report and interpret results • Report detailed multivariate results in tables. • Coefficients. • Inferential statistical results: • standard error or test statistic, • p-value or symbol. • Model goodness of fit statistics. • Interpret coefficients in the text. • Refer to associated table.

What to report for coefficients • Topic • Independent variable (IV) • Dependent variable (DV) • Direction (AKA “sign”) • Magnitude (AKA “size”) • Units or categories • Statistical significance • Most authors remember to report statistical significance, so I have listed that element last!

Interpreting coefficients • Poor: “The effect of public insurance was –7.2 (p < 0.05).” • Reports the coefficient without interpreting it. Without units or reference group, the meaning of “– 7.2” cannot be interpreted. • Better: “Children with private insurancestayed on average 7.2 dayslonger than those with public insurance(p < 0.05).” • Interprets the β in intuitive terms, mentioning the topic, units, categories, direction, magnitude and statistical significance.

More examples of interpreting βs • Poor: “Insurance and length of stay were associated (p < 0.05).” • Topic and statistical significance, but not direction or size. • Better: “Privately-insured children stayedlonger than publicly insured children (p < 0.05).” • Statistical significance and direction, but not size. • Best: “Children with private insurancestayed on average 7.2 dayslonger than those with public insurance(p < 0.05).” • Topic, direction, magnitude, units, categories, and statistical significance.

Interpretation of βs depends on types of variables in your models • The type of dependent variable: • Continuous dependent variable • Ordinary least squares (OLS) • Categorical dependent variable • Logistic (logit) regression model • Type of independent variable: • Continuous • Categorical

Interpreting coefficients from ordinary least squares (OLS) models

Coefficients for OLS models • For ordinary least squares (OLS) models, the coefficient (β) is a measure of difference in the DV for a 1-unit increase in the IV. • For unstandardized coefficients, difference in the same units as the dependent variable. • Can be explained using wording for results of subtraction. • For standardized coefficients, β measures difference in standardized units (multiples of standard deviations). • See podcast about resolving the Goldilocks problem using model specification.

Interpreting βs: continuous predictors • The unstandardized coefficient on a continuous predictor in an OLS model measures • The difference in the dependent variable for a one-unit increase in the independent variable. • Effect size is in original units of the DV. • Example topic: Mother’s age as a predictor of birth weight: • Dependent variable = birth weight in grams. • Independent variable = mother’s age in years. • Both are continuous variables.

Example: Mother’s age as a predictor of birth weight • Poor: “Mother’s age andchild’s birth weight are correlated (p<0.01).” • Names the dependent and independent variables and conveys statistical significance, but not direction or magnitude of the association. • Better: “As mother’s age increases, her child’s birth weight also increases(p<0.01).” • Concepts, direction, and statistical significance, but not size. • Best: “For each additional year of mother’s age at the time of her child’s birth, the child’s birth weight increases by 10.7grams(p<0.01).” • Concepts, units, direction, size, and statistical significance.

Interpreting βs: categorical predictors • The β on a categorical IV in an OLS model measures the difference in the DV for the category of interest compared to the reference category. • A “1-unit increase” does NOT make sense. • Example: gender • Dummy variable (AKA “binary variable”) coded • 1 = boy • 0 = girl = omitted (reference) category

Example: Gender as a predictor of birth weight • Poor: “The β for ‘BBBOY’ is 116.1 with an s.e. of 12.3 (table 15.3).” • Uses a cryptic acronym rather than naming the independent variable or conveying that it is categorical. • Doesn’t convey the dependent variable. • Reports the same information as the table (size of coefficient and standard error), but does not interpret them. • The direction of the effect cannot be determined because categories and units are not specified. • To assess statistical significance, readers must calculate test statistic and compare it against critical value.

Gender as a predictor of birth weight, cont. • Slightly better: “Gender is associated with a difference of 116.1grams in birth weight (p < 0.01).” • Concepts, magnitude, units, and statistical significance but not direction: Was birth weight higher for boys or for girls? • Best: “At birth, boys weigh on average 116gramsmorethan girls (p < 0.01).” • Concepts, reference category and units, direction, magnitude, and statistical significance.

Identifying the reference category • For categorical variables, mention identity of reference category. • E.g., effect size is relative to whom? • Example for 2-category comparison: • “Boys weighed 116 grams more than girls.” • Example for multicategory comparison: • “Compared to white infants, black and Hispanic infants weighed 62 and 16 grams less on average.” • OR “Mean birth weight was 62 and 16 grams less, for black and Hispanic infants, respectively, when each is compared to white infants.”

Interpreting coefficients from logistic regression models

Logit models for categorical dependent variables • Logit = log[p/(1 – p)] = log(odds of the category you are modeling) • p is the proportion of the sample in the modeled category • βmeasures the log relative-odds of the outcome for different values of the independent variable • Exponentiate the logit coefficient eβ= relative odds, or “odds ratio” • Compares the odds of the outcome for different values of the independent variable

Example: Logit model of LBW • Low birth weight (LBW) = birth weight <2,500 grams • Log-odds = log[pLBW/(1 – pLBW )] • Where pLBW is the proportion of the sample that is LBW. • Log relative odds of LBW = comparison of log-odds of LBW for different values of the independent variable. • eβ= relative odds of LBW for different values of the independent variable.

Wording for odds ratios • βs for logit models are in the form of ratios. • For suggestions on how to phrase descriptions of ratios with minimal jargon, see • Table 5.3 in The Chicago Guide to Writing about Numbers OR • Table 8.3 in The Chicago Guide to Writing about Multivariate Analysis

Phrases for ratios See tables in WA#s or WAMA

Odds ratios for categorical independent variables • Odds ratio of the outcome for the category of interest compared to the reference category. • “Infants born to smokers had 1.4times the odds of low birth weight (LBW) as those born to nonsmokers (p < 0.01).” • Concepts,reference category,direction,magnitude, and statistical significance.

Odds ratios for continuous independent variables • Odds ratio of the outcome for a one-unit increase in the independent variable. • “Odds of LBWdecreased by about 0.8%for each 1 year increase in mother’s age (NS).” • Concepts,units,direction,magnitude, and statistical significance.

Topic sentences for paragraphs reporting multivariate results • Start each paragraph of the results section with a restatement of topic addressed by analysis to be reported in that paragraph. • Can paraphrase title of table or chart that reports the detailed statistical results. • Topic sentence should mention: • Dependent variable. • Independent variable(s). • Use summary phrase rather than long list of variables. • Type of analysis.

Example topic sentences • “Multivariate logistic regressionresults show thatinsuranceis a powerful predictor oflength of stay (table X).” [Next sentence goes into detail about direction, size, and statistical significance.] • Mentionstype of analysis, dependent variable, andindependent variable. • “As shown in figure Y,race and income levelinteract in their effect on risk of asthma.”

Summary • Report detailed multivariate results in tables. • Interpret coefficients in prose. • Specify direction, magnitude, and statistical significance of associations. • Units for continuous variables • Categories for nominal or ordinal variables • Write about concepts, not acronyms. • Introduce concepts under study in topic sentences.

Suggested resources • Miller, J. E. 2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. • Chapter 5, on creating effective multivariate tables • Chapter 8, on wording for results of • subtraction (OLS βs) • ratios (logit βs) • Chapter 9, on writing about βs from OLS and logit models • Chapter 10, on the Goldilocks problem for choosing a fitting contrast size for interpreting coefficients

Suggested online resources • Podcasts on • Comparing two numbers or series • Choosing a reference category • Defining the Goldilocks problem • Resolving the Goldilocks problem: Presenting results • Differentiating between statistical significance and substantive importance

Suggested practice exercises • Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. • Problem sets for chapters 9 and 15 • Suggested course extensions for chapters 9 and 15 • “Reviewing,” “writing” and “revising” exercises.

Contact information Jane E. Miller, PhD jmiller@ifh.rutgers.edu Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html

Interpreting multivariate OLS and logit coefficients