Resolving the Goldilocks problem: Model specification Jane E. Miller, PhD
Overview • Model specification approaches to resolving the Goldilocks problem include • Standardized coefficients • Logarithmic transformation • Other specification issues
Unstandardized coefficients • Unstandardized βs estimate the effect of a 1-unit increase in Xi on Y, where the effect size is measured in the original units of Y. • A “one-size-fits-all” approach to interpreting βs can be misleading because variables • Represent different levels of measurement, • Have different units of measurement, • Have varying distributions of values, • Occur in different real-world circumstances.
Standardized coefficients • A standardized coefficient estimates the effect of a one-standard-deviation increase in Xi on Y • Measured in standard deviation units of Y • e.g., an effect size of 0.3 would mean 30% of a standard deviation in the dependent variable • Similar to standardized scores or z-scores • Standardized βs provide a consistent metric in which to compare the relative sizes of the βs on continuous independent variables with different ranges and scales. • Contrast for each IV is its standard deviation
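The relationship between an unstandardized and a standardized coefficient can be sketched in a few lines. This is only an illustration on simulated data: the variable names, sample size, and coefficients below are hypothetical and do not come from the slides. It assumes a simple one-predictor OLS model and uses NumPy's `polyfit` for the fit.

```python
import numpy as np

def standardized_beta(b, x, y):
    """Rescale an unstandardized slope b into standard-deviation units:
    b * SD(x) / SD(y)."""
    return b * np.std(x, ddof=1) / np.std(y, ddof=1)

# Hypothetical data (not from the slides)
rng = np.random.default_rng(0)
x = rng.normal(50, 10, size=500)
y = 3.0 + 0.2 * x + rng.normal(0, 5, size=500)

# Unstandardized slope: change in y (original units) per 1-unit increase in x
b, intercept = np.polyfit(x, y, 1)

# Same model re-estimated on z-scores of x and y
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
b_z, _ = np.polyfit(zx, zy, 1)

print("unstandardized slope:", b)
print("standardized (rescaled):", standardized_beta(b, x, y))
print("standardized (z-score fit):", b_z)
```

The last two printed values should agree up to floating-point error, which is why the variables can be entered in their original units and rescaled afterwards.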
Using standardized coefficients • Commonly used for psychological or attitudinal scales for which the units have no inherent meaning. • Should not be used for variables for which a one-standard-deviation increase lacks an intuitive interpretation. E.g., • dummy variables • interaction terms
Specifying a model with standardized coefficients • Easily specified as an option to an OLS model in most statistical packages. • Identify the dependent and independent variables as usual. • Enter them in the model specification in their original, untransformed versions. • Do not create versions in the metric of standard deviations. The software will do that for you! • Request “standardized betas”
Descriptive statistics to report if you use standardized coefficients • In table of descriptive statistics, report the mean, minimum and maximum values and standard deviation in the original units for • each independent variable (IV) • the dependent variable (DV)
Describing standardized coefficients in prose • In the results section, interpret the effect sizes for different IVs in terms of multiples or percentages of the standard deviation in the DV • E.g., “A one-standard-deviation increase in the income-to-poverty ratio (IPR) is associated with an increase of 19.6% of a standard deviation in birth weight (about 38 grams), roughly twice the size of the corresponding standardized coefficient on mother’s age (9.7%).”
Reporting the effect size in original units • “A one-standard-deviation increase in the income-to-poverty ratio (IPR) is associated with an increase of 19.6% of a standard deviation in birth weight (about 38 grams), roughly twice the size of the corresponding standardized coefficient on mother’s age (9.7%).” • Note that the effect size is also reported back in the original units of the DV (grams in this case), to facilitate intuitive understanding in the context of the specific research question and variables.
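Converting a standardized coefficient back to the original units of the DV is a single multiplication by the DV's standard deviation. The sketch below uses hypothetical values for the coefficient and the standard deviations; they are not the figures behind the birth weight example.

```python
def effect_in_original_units(std_beta, sd_y):
    """Change in Y (original units) for a one-SD increase in X."""
    return std_beta * sd_y

def unstandardized_slope(std_beta, sd_y, sd_x):
    """Change in Y (original units) for a 1-unit increase in X."""
    return std_beta * sd_y / sd_x

# Hypothetical values, for illustration only
std_beta = 0.25     # standardized coefficient
sd_y_grams = 400.0  # SD of the dependent variable, in grams
sd_x = 2.0          # SD of the independent variable, in its own units

print(effect_in_original_units(std_beta, sd_y_grams))    # 100.0 grams per SD of X
print(unstandardized_slope(std_beta, sd_y_grams, sd_x))  # 50.0 grams per unit of X
```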
Logarithmic specifications • Another approach to comparing βs across variables with different ranges and scales is to take logarithms of the • dependent variable (Y), • independent variable(s) (Xis), • or both. • The βs on the transformed variable(s) lend themselves to straightforward interpretations such as percentage change.
Types of logarithmic specifications • Lin-lin • Lin-log • Log-lin • Log-log • Also known as “double log”
Lin-lin specifications • Review: For OLS models in which neither the IV nor the DV is logged, β1 measures the change in Y for a 1-unit increase in X1, • where the changes are measured in the respective units of the IV and DV. • In the lingo of logarithmic specifications, these models are termed “lin-lin” models because they are linear in both the IV and DV: Y = β0 + β1X1
Lin-log specifications • Lin-log models are of the form Y = β0 + β1 ln X1, where ln X1 is the natural log (base e) of X1. • For such models, β1 ÷ 100 gives the approximate change in the original units of the DV for a 1 percent increase in the IV. • E.g., in a model of earnings, β on ln(hours worked) = 5,905.3: • “Each 1 percent increase in monthly hours worked is associated with a NT$ 59 increase in monthly earnings.”
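The arithmetic behind the lin-log example can be checked directly. The coefficient is the one quoted on the slide; the comparison with the exact value β1 × ln(1.01) is added here to show how close the ÷ 100 shortcut is for a 1 percent contrast.

```python
import math

beta_ln_hours = 5905.3  # coefficient on ln(monthly hours worked), from the slide

# Shortcut: beta / 100 approximates the change in Y for a 1% increase in X
approx_change = beta_ln_hours / 100

# Exact: Y changes by beta * ln(1.01) when X is multiplied by 1.01
exact_change = beta_ln_hours * math.log(1.01)

print(round(approx_change, 2))  # 59.05
print(round(exact_change, 2))   # 58.76
```

Both round to about NT$ 59, matching the sentence on the slide.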
Log-lin specifications • Log-lin models are of the form lnY = β0 + β1X1. • For such models, 100 × (e^β1 − 1) gives the percentage change in Y for a 1-unit increase in X1, • Where the increase in X1 is in its original units. • E.g., “For each additional child a woman has, her monthly earnings are reduced by 3.6 percent.”
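A small helper makes the log-lin conversion concrete. The slide reports only the resulting 3.6 percent, not the underlying coefficient, so the β value below is a hypothetical one chosen to reproduce roughly that figure.

```python
import math

def pct_change_log_lin(beta, delta_x=1.0):
    """Percentage change in Y for a delta_x-unit increase in X
    when the model is ln(Y) = b0 + beta * X."""
    return 100 * (math.exp(beta * delta_x) - 1)

# Hypothetical coefficient (assumed, not reported on the slide)
beta_children = -0.0367

print(round(pct_change_log_lin(beta_children), 1))  # -3.6
# For small coefficients, 100 * beta is a close approximation...
print(round(100 * beta_children, 1))                # -3.7
# ...but the gap widens as the coefficient grows
print(round(pct_change_log_lin(0.5), 1))            # 64.9, versus 50 from 100 * beta
```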
Log-log specifications • Log-log models are of the form lnY = β0 + β1lnX1. • For such models, β1 estimates the percentage change in Y for a 1 percent increase in X1. • This measure is known in economics as the elasticity (Gujarati 2002). • E.g., “A 1 percent increase in monthly hours worked is associated with a 0.6% increase in monthly earnings.”
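The elasticity reading can be verified with the exact formula for a log-log model. The elasticity of 0.6 is taken from the slide's example; the 10 percent contrast is an added illustration of how the shortcut drifts for larger changes.

```python
def pct_change_log_log(beta, pct_increase_x):
    """Exact percentage change in Y for a given percentage increase in X
    when the model is ln(Y) = b0 + beta * ln(X)."""
    return 100 * ((1 + pct_increase_x / 100) ** beta - 1)

elasticity = 0.6  # from the slide's earnings example

# A 1% increase in X: very close to the elasticity itself
print(round(pct_change_log_log(elasticity, 1), 2))   # 0.6

# A 10% increase in X: slightly less than 10 * 0.6 = 6
print(round(pct_change_log_log(elasticity, 10), 2))
```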
Choice of contrast size for logarithmic models • Caveat: The scale of the logged variable must be taken into account when choosing an appropriate-sized contrast. • E.g., a 1-unit increase in ln(monthly hours worked) from 5.3 to 6.3 is equivalent to an increase from about 200 to about 545 hours per month. • That contrast is roughly a 2.7-fold increase in hours. • Implies working three-quarters of all day and night-time hours, 7 days a week.
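Back-transforming the logged values shows why a 1-unit contrast is too large here. The logged values are those on the slide; the 30-day month used for the share of total hours is an assumption made for this sketch.

```python
import math

ln_low, ln_high = 5.3, 6.3  # ln(monthly hours worked), from the slide

hours_low = math.exp(ln_low)    # about 200 hours per month
hours_high = math.exp(ln_high)  # about 545 hours per month

# A 1-unit increase in a natural log always multiplies the original value by e
fold_increase = hours_high / hours_low

# Share of all hours in a month, assuming a 30-day month (24 * 30 = 720 hours)
share_of_month = hours_high / (24 * 30)

print(round(hours_low), round(hours_high))  # 200 545
print(round(fold_increase, 2))              # 2.72
print(round(share_of_month, 2))             # 0.76
```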
Review: Assess whether a 1-unit increase in the variable is the right-sized contrast • Always consider whether a 1-unit increase in the variable as specified in the model makes sense in its real-world context, given the • Topic • Distribution in the data • If not, use theoretical and empirical criteria for choosing a fitting-sized contrast. • See podcast on measurement and variables approaches to resolving the Goldilocks problem
Descriptive statistics to report if you use a logarithmic specification • In a table of descriptive statistics, report the mean and range both • In the original, untransformed units, such as income in dollars, which are • more intuitively understandable • easier than the logged version to compare with values from other samples. • In the logged units, so readers know the range and scale of values to apply to the estimated coefficients.
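One way to assemble both sets of descriptive statistics is sketched below using only the Python standard library. The income values are hypothetical, and the `describe` helper is a name introduced for this illustration.

```python
import math
import statistics

def describe(values):
    """Mean, minimum, maximum, and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "sd": statistics.stdev(values),
    }

# Hypothetical incomes in dollars, for illustration only
incomes = [12000, 25000, 40000, 58000, 150000]

original_units = describe(incomes)
logged_units = describe([math.log(v) for v in incomes])

print("original:", original_units)
print("logged:  ", logged_units)
```

Reporting the two rows side by side lets readers see, for example, how many dollars a 1-unit change in the logged variable spans.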
Interpreting coefficients from logarithmic specifications • Taking logs of the IV(s) and/or DV affects interpretation of the estimated coefficients. • If your models include any logged variables, report the pertinent units as you write about the βs, especially if • your specifications include a mixture of logged and non-logged variables; • you are testing the sensitivity of your findings to different logarithmic specifications.
Polynomial: Quadratic specification of the IPR/birth weight pattern
Goldilocks issues for polynomials • In models involving polynomials such as Xi and Xi², the effect of a 1-unit increase in Xi on Y varies for different values of Xi. • I.e., cannot generalize the size of the effect of Xi on Y for all values of Xi. • To convey the shape of the association between Xi and Y: • In the text, present change in Y for each of several contrasts in values of Xi. • Create a graph. • See podcast on polynomials for more information.
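The point that a 1-unit increase has a different effect at different starting values can be seen by computing several contrasts from a quadratic. The coefficients below are hypothetical and are not the estimates from the IPR/birth weight model.

```python
def predicted_change(b1, b2, x_from, x_to):
    """Change in predicted Y between two values of X
    under Y = b0 + b1*X + b2*X**2 (b0 cancels out)."""
    return b1 * (x_to - x_from) + b2 * (x_to**2 - x_from**2)

# Hypothetical coefficients on X and X squared
b1, b2 = 80.0, -10.0

# The same 1-unit increase, starting from different values of X
for x_from in (0, 1, 2, 3):
    change = predicted_change(b1, b2, x_from, x_from + 1)
    print(f"X from {x_from} to {x_from + 1}: change in Y = {change}")
# Changes are 70.0, 50.0, 30.0, 10.0: the effect shrinks as X rises
```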
Goldilocks issues for interactions • In models involving interactions, βs on main effect and interaction terms for two or more IVs must be combined to calculate the overall effect on the DV. • Cannot examine the effect of a 1-unit change in only one of those variables based on its β alone. • See chapter and podcasts on interactions.
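For a two-variable interaction, combining the main-effect and interaction coefficients looks like the sketch below. It assumes a model of the form Y = β0 + β1X1 + β2X2 + β12X1X2, and the coefficient values are hypothetical.

```python
def effect_of_x1(b1, b12, x2, delta_x1=1.0):
    """Change in predicted Y for a delta_x1 increase in X1, holding X2 fixed,
    under Y = b0 + b1*X1 + b2*X2 + b12*X1*X2."""
    return (b1 + b12 * x2) * delta_x1

# Hypothetical coefficients: main effect of X1 and the X1-by-X2 interaction
b1, b12 = 2.0, -0.5

for x2 in (0, 1, 4):
    print(f"X2 = {x2}: effect of a 1-unit increase in X1 = {effect_of_x1(b1, b12, x2)}")
# 2.0 when X2 = 0, 1.5 when X2 = 1, 0.0 when X2 = 4
```

The coefficient b1 alone describes the effect of X1 only in the special case where X2 equals zero.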
Summary • Certain model specifications can help reduce Goldilocks problems by imposing a consistent metric to facilitate comparison of βs across independent variables with different levels and ranges. E.g., • A 1-standard deviation increase, from standardized coefficients • A 1% increase from log-log coefficients. • Models involving non-linear functions or interactions complicate the Goldilocks issue because the effect of each variable involves several terms.
Suggested resources • Miller, J. E., 2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. • Chapter 10 on Goldilocks problem, standardized coefficients, and polynomials • Chapter 8, on standardized scores and z-scores • Chapter 16, on interactions
More suggested resources • Miller, J. E. and Y. V. Rodgers, 2008. “Economic Importance and Statistical Significance: Guidelines for Communicating Empirical Research.” Feminist Economics 14 (2): 117–49. • Kachigan, Sam Kash. 1991. Multivariate Statistical Analysis: A Conceptual Introduction. 2nd ed. New York: Radius Press (on standardized coefficients). • Gujarati, Damodar N. 2002. Basic Econometrics. 4th ed. New York: McGraw-Hill/Irwin (on logarithmic specifications).
Supplemental online resources • Podcasts on • Defining the Goldilocks problem • Resolving the Goldilocks problem • Measurement and variables • Presenting results • Calculating the shape of a polynomial • Calculating the shape of an interaction pattern • Online appendix on interpreting coefficients from logarithmic specifications.
Suggested practice exercises • Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. • Suggested course extensions for chapter 10 • “Applying statistics and writing” question #5. • “Revising” questions #1, 2, 3, and 9.
Contact information Jane E. Miller, PhD jmiller@ifh.rutgers.edu Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html