Resolving the Goldilocks problem: Presenting results Jane E. Miller, PhD
Overview • Labeling of variables • Identifying model specifications • Contents of descriptive statistics tables • Prose interpretation of multivariate coefficients • In the results section • In the discussion section
Guidelines for effective labeling: Conveying levels of measurement, units and categories
Basic attributes of variables • For every variable in your analysis, convey • Units of measurement • System of measurement (e.g., British or metric?) • Scale (e.g., millimeters or meters?) • Level of aggregation (e.g., weekly or annual income?) • Names of categories, if nominal or ordinal • Label them accordingly in • Text • Tables of univariate, bivariate, and multivariate statistics • Charts
Common errors in labeling units • Calling a proportion a percentage (or vice versa) • Check axis scales that are automatically generated by Excel or other graphing programs. • If your data are stored as proportions, that is how they will be graphed. • When you add labels, make them match the actual units. • Forgetting to specify the level of aggregation • E.g., deaths per 1,000 persons is not the same as deaths per 100,000.
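The two unit errors above can be checked with a few lines of Python. This is only a minimal sketch: the function names and the counts (45 deaths among 150,000 persons) are hypothetical and are not taken from the slides.

```python
def proportion_to_percentage(p):
    """Convert a proportion (0 to 1) into a percentage (0 to 100)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("expected a proportion between 0 and 1")
    return p * 100.0


def rate_per(events, population, per=1_000):
    """Return the number of events per `per` persons."""
    return events / population * per


# Hypothetical counts, used only to show that the label must match the scale.
deaths, persons = 45, 150_000
print(f"{rate_per(deaths, persons, 1_000):.2f} deaths per 1,000 persons")
print(f"{rate_per(deaths, persons, 100_000):.1f} deaths per 100,000 persons")
print(f"{proportion_to_percentage(0.12):.0f}% (stored as the proportion 0.12)")
```

The same underlying data yield values that differ by a factor of 100 depending on the level of aggregation, which is why the label has to state it.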
Tables of multivariate statistics • To provide the information needed to correctly interpret βs for each variable in the multivariate model, carefully label • level of aggregation • units • categories (ranges) • Labels and titles should reflect whatever version of the variable is included in the specification. • Transformed or original
Units for transformed variables • Label the units in multivariate tables, charts, and prose as they appear in the model. • This matters whenever you have transformed one or more continuous independent or dependent variables before specifying the model. E.g., • Changed the scale • Taken logs • Changed the level of aggregation • Created a categorical version of a continuous variable
Conveying model specifications in multivariate tables and charts • Clearly name types of model specification in tables and charts. • Mention type of model in the title. • E.g., “Standardized coefficients from a model of . . .” • Or label columns or axes to identify: • Standardized coefficients • Logged dependent variables • Identify logged independent variables in their respective row labels.
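To see why the type of coefficient needs a label, here is a minimal sketch of the usual conversion from an unstandardized OLS coefficient to a standardized one (multiply by the standard deviation of the independent variable, divide by that of the dependent variable). All numeric values are hypothetical, and the function name is illustrative.

```python
def standardized_beta(b, sd_x, sd_y):
    """Convert an unstandardized OLS coefficient to a standardized one.

    b    : change in y (original units) per 1-unit increase in x
    sd_x : standard deviation of the independent variable
    sd_y : standard deviation of the dependent variable
    """
    if sd_x <= 0 or sd_y <= 0:
        raise ValueError("standard deviations must be positive")
    return b * sd_x / sd_y


# Hypothetical values: grams of birth weight per year of mother's age.
b_unstd = 10.6           # grams per year (assumed)
sd_age, sd_bw = 6.0, 600.0  # years and grams (assumed)

b_std = standardized_beta(b_unstd, sd_age, sd_bw)
print(f"Unstandardized: {b_unstd} grams per year of mother's age")
print(f"Standardized:   {b_std:.3f} SD of birth weight per SD of mother's age")
```

The two numbers describe the same association in different units, so a table title or column label has to say which one is shown.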
Background information for contrasts • Anticipate the metrics and contrasts you will use as you write about your βs. E.g., • Interquartile range • Multiples of standard deviations • Percentiles in a reference distribution • Report the corresponding statistics in tables about your own data. • Cite a source for complex reference data. • E.g., Federal Poverty Levels for different sizes and age compositions of household.
Tables of descriptive statistics • To provide readers with the basis of the numeric contrasts in the associated prose: • If your models include logged versions of variable(s), report descriptive statistics in both • The original, untransformed units • The logged units • If you use empirically based contrasts to interpret βs, include those values in the table of descriptive statistics. E.g., • Interquartile range • Standard deviation and mean
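A descriptive table along these lines can be sketched with Python's standard library. The income values below are invented for illustration, and `describe` is a hypothetical helper, not something from the slides.

```python
import math
import statistics


def describe(values):
    """Return the mean, SD, quartiles, and interquartile range of a sample."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


# Invented annual incomes in dollars.
incomes = [18_000, 25_000, 31_000, 40_000, 52_000, 67_000, 90_000, 150_000]

rows = {
    "Annual income ($)": describe(incomes),
    "ln(annual income)": describe([math.log(x) for x in incomes]),
}
for label, stats in rows.items():
    print(label, {name: round(value, 2) for name, value in stats.items()})
```

Reporting both rows lets readers trace a contrast such as the interquartile range back to the original dollars as well as to the logged units used in the model.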
Presenting results to minimize Goldilocks problems • Explicitly identify the nature and size of the contrast for each independent variable as you interpret its estimated coefficient. • Continuous or categorical? • Units or categories being compared? • Size of numeric contrast applied to each β? • Units of the effect measure?
Common pitfalls in reporting of multivariate coefficients • Simply reporting βs increases the chances of Goldilocks mistakes by failing to remind readers to consider • variable types • units • range and scale • categories • The βs should be reported in your multivariate table. • Don’t simply repeat (report) those values without substantive interpretation.
Prose to avert Goldilocks errors in interpretation of coefficients • Apply carefully selected “right-sized” contrasts to each β. • Where needed, explain the criteria used to identify fitting numeric contrasts for each of your key variables. E.g., • Cite sources of substantively relevant contrasts. • Identify empirical contrasts by name. E.g., • Interquartile range • Standard deviation difference • Convey the units or categories of both your IV and DV as you interpret the direction and magnitude of the βs.
Reporting coefficients on nominal variables • Name the categories being compared. • Such wording helps avoid implying that • More than a 1-unit change in a dummy variable is possible; • Directionality of movement across categories pertains to nominal variables. • This distinction is especially important when interpreting βs for both continuous and categorical independent variables.
Interpreting β for a nominal variable • Poor: “The β for boy was 116.0.” • If reported in a series of coefficients, this version invites readers to compare them directly, without factoring in that only a “1-unit” contrast is possible for gender. • Poor, version 2: “Gender is positively associated with birth weight.” • Cannot specify direction of association for a nominal variable. • Best: “At birth, boys weigh on average 116 grams more than girls (p < 0.01).” • Concepts, categories, units, direction, size, and statistical significance.
Reporting coefficients on ordinal variables • For ordinal variables, including composite scales or indexes that are composed from ordinal variables such as Likert scales. • Name the categories and specify the reference category. • Can write about directionality of the association because the categories are ordered. • However, distance between ordinal categories cannot be treated as equal. • Numeric codes don’t have mathematical meaning.
Interpreting β for an ordinal variable • Poor: “Self-rated health and mortality are correlated.” • Fails to convey direction or magnitude. • Better: “Among middle-aged men, self-rated health was inversely related to mortality.” • Conveys directionality. • Doesn’t name categories being compared or size of mortality difference between them.
Interpreting β for an ordinal variable • Best: “Among middle-aged men, self-rated health was inversely related to mortality. Relative risks of mortality were 2.8, 2.2, and 1.9 for those who rated their health ‘poor/fair’, ‘good’, and ‘very good’, when each was compared with ‘excellent’ health (all p < 0.05).” • Concepts, direction, magnitude, and statistical significance. • By naming the categories, conveys what a move up the self-rated scale from one category to the next represents conceptually.
Interpreting coefficients on continuous variables: Units • Mention the units of your independent and dependent variables as used in the regression. • Units • Original units or logged? • Unstandardized βs or standardized βs? • Level of aggregation, e.g., • Weekly or monthly income? • Income in $1s or $1,000s? • Individual or family or household income? • Might differ from their original form in the data.
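The effect of scale on the size of a coefficient can be sketched as follows. This assumes a model that is linear in income; the coefficient value and the function name are hypothetical.

```python
def rescale_beta(beta, old_unit, new_unit):
    """Re-express a linear coefficient for a different unit of the IV.

    beta     : change in the DV per `old_unit` of the independent variable
    old_unit : size of the unit used in the data (e.g., 1 for $1)
    new_unit : size of the unit used in the model or prose (e.g., 1_000)
    """
    return beta * new_unit / old_unit


# Hypothetical coefficient estimated with income measured in single dollars.
beta_per_dollar = 0.000002

for unit in (1, 1_000, 10_000):
    beta = rescale_beta(beta_per_dollar, 1, unit)
    print(f"beta per ${unit:,} of income: {beta:.6f}")
```

The association is identical in all three rows; only the unit changes, which is why the unit has to travel with the number.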
Interpreting β for a continuous variable • Poor: “The β of income on mortality was 0.02.” • By not mentioning the units of either income or mortality: • Leaves the size of the β open to misinterpretation. • Makes it difficult to compare against βs on the same topic by other authors. • Better: “Each additional $10,000 in annual family income was associated with a 2% decrease in the age-standardized mortality rate.”
Interpreting coefficients using contrasts other than a 1-unit increase • If you apply a contrast other than a 1-unit increase, report the size of that contrast as you interpret the pertinent β. E.g., • “Each five-year increase in mother’s age is associated with a 53 gram increase in birth weight.” • “Students whose SAT scores were one standard deviation above the mean had 22% higher chances of graduating from college within six years than those with SAT scores at the population mean.” • “Adults at the 25th percentile of BMI were only one-third as likely to die as those at the 75th percentile.”
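For a linear (OLS) coefficient on an untransformed dependent variable, applying a contrast amounts to multiplying β by the size of the contrast. The sketch below reproduces the mother's age example, assuming a per-year coefficient of 10.6 grams. That value is hypothetical, chosen only so that the five-year contrast matches the 53 grams quoted above.

```python
def effect_for_contrast(beta_per_unit, contrast_size):
    """Predicted change in the DV for a contrast of `contrast_size` units.

    Valid only for a coefficient from a model that is linear in this variable.
    """
    return beta_per_unit * contrast_size


beta_age = 10.6      # grams per 1-year increase in mother's age (assumed)
contrast_years = 5   # size of the contrast named in the sentence

effect = effect_for_contrast(beta_age, contrast_years)
print(
    f"Each {contrast_years}-year increase in mother's age is associated "
    f"with a {effect:.0f} gram increase in birth weight."
)
```

Keeping the contrast size as a named value makes it harder to report the effect without also reporting the contrast that produced it. Ratio measures such as the SAT and BMI examples come from other model types and do not scale by simple multiplication.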
Goldilocks guidelines for the discussion section • Put your results in context in terms of both • Topic • Data • Context • Specific measures of the concepts under study • Reiterate theoretical criteria that identify meaningful contrasts for your topic. E.g., • Clinical cutoffs • Program eligibility thresholds
Statistical significance versus substantive importance • Summarize what those contrasts show about your key findings, differentiating between • statistical significance • substantive importance • This could mean that once the metrics of the variables were considered, one or more independent variables did not have a substantively meaningful association with the dependent variable, • even if that association was • statistically significant • in the expected direction.
Differentiating between statistical and substantive significance • Use a modifier such as “statistically” or “substantively” before the term “significant” so readers know which type of “significant” you mean. • Doing so might also remind you to discuss BOTH of those aspects of your findings. • Rather than vague reference to “substantive significance,” explain that aspect with reference to topic-specific criteria. • See chapter and podcast on substantive significance.
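One way to keep the two kinds of significance separate is to evaluate them as two independent checks. The sketch below is illustrative only: the 100-gram threshold for a meaningful difference, the p-values, and the second finding are all hypothetical, and a real threshold should come from topic-specific criteria.

```python
def classify_finding(effect, p_value, meaningful_size, alpha=0.05):
    """Return (statistically_significant, substantively_significant)."""
    statistically = p_value < alpha
    substantively = abs(effect) >= meaningful_size
    return statistically, substantively


def describe_finding(name, effect, units, p_value, meaningful_size):
    """Build a sentence that names both types of significance explicitly."""
    statistically, substantively = classify_finding(effect, p_value, meaningful_size)
    stat_text = "statistically significant" if statistically else "not statistically significant"
    subst_text = "substantively significant" if substantively else "not substantively significant"
    return f"{name}: {effect:g} {units}, {stat_text} and {subst_text}."


# Hypothetical threshold of 100 grams for a meaningful birth weight difference.
print(describe_finding("Boys vs. girls", 116.0, "grams", 0.005, 100.0))
print(describe_finding("Hypothetical small effect", 8.0, "grams", 0.001, 100.0))
```

The second finding would pass a conventional significance test yet fall short of the assumed threshold, which is the situation the slides warn about.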
Summary • To present βs from multivariate models effectively, you need to convey • Units, categories, and descriptive statistics on all variables as they are specified in the model • Type of model specification • The size of the contrast applied to each β as it is • Interpreted • Compared with other coefficients in the model or in other papers. • Close the narrative by putting the size of major findings back in substantive context.
Suggested resources • Miller, J. E., 2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. (“WAMA II”) • Chapter 3, on statistical significance, substantive significance, and causality • Chapter 9, on quantitative comparisons for multivariate models • Chapter 10, on the Goldilocks problem • Miller, J. E. and Y. V. Rodgers, 2008. “Economic Importance and Statistical Significance: Guidelines for Communicating Empirical Research.” Feminist Economics 14 (2): 117–49.
Suggested online resources • Podcasts on • Statistical significance, substantive significance, and causality • Interpreting coefficients from multivariate models • Defining the Goldilocks problem • Resolving the Goldilocks problem • Measurement and variables • Model specification
Suggested practice exercises • Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. • Problem sets for • chapter 3, question #4. • chapter 10, questions #5 through 8.
Suggested extensions • Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. • Suggested course extensions for • chapter 3 • “Reviewing” exercise 3.a.v and 3.b. • “Writing and revising” exercises #2 and 3. • chapter 10 • “Reviewing” exercises #1 and 2. • “Applying statistics and writing” question #2. • “Revising” questions #1 through 5, and 9.
Contact information Jane E. Miller, PhD jmiller@ifh.rutgers.edu Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html