
Resolving the Goldilocks problem: Variables and measurement

Resolving the Goldilocks problem: Variables and measurement. Jane E. Miller, PhD. The Chicago Guide to Writing about Multivariate Analysis, 2nd edition. Overview. Identifying criteria for choosing fitting contrasts for each variable



Presentation Transcript


  1. Resolving the Goldilocks problem: Variables and measurement Jane E. Miller, PhD The Chicago Guide to Writing about Multivariate Analysis, 2nd edition.

  2. Overview • Identifying criteria for choosing fitting contrasts for each variable • Understanding conceptual and contextual aspects of your variables • Becoming familiar with the distributions of your variables • Transforming variables • Describing your variables in the methods section

  3. Criteria for choosing pertinent-sized contrasts for each of your variables • Theoretical criteria • Empirical criteria • Measurement issues

  4. Theoretical criteria for choosing fitting contrasts • Theoretical criteria relate to how that concept is measured and compared in the literature or real-world context. • Examples: • Multiples of the poverty level that correspond with program eligibility criteria for that place and time. • Multiples of standard deviations of weight-for-height, based on international child growth standards.

  5. Identifying theoretical criteria for your topic • Start by reading the literature to identify which ones pertain to each of your • Independent variables (IVs) • Dependent variables (DV) • Also identify real-world factors pertaining to your variables. E.g., • Physical properties (e.g. freezing point of water) • Clinically meaningful contrasts • Socially relevant contrasts

  6. Empirical criteria for choosing fitting contrasts • Based on the observed distribution of values in your data. • Examples: • Multiples of standard deviations • Comparing values at the mean, and ±1 standard deviation in the IV • Interquartile range • Comparing values at the 25th and 75th percentiles of the IV.
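
The two empirical contrasts named on this slide can be sketched as follows. This is a minimal illustration that assumes a small, made-up list of ages as the independent variable; the data and variable names are hypothetical and do not come from the presentation.

```python
import statistics

# Hypothetical values of an independent variable (age in years); illustrative only.
ages = [22, 25, 31, 34, 38, 41, 45, 52, 58, 63]

mean = statistics.mean(ages)
sd = statistics.stdev(ages)  # sample standard deviation

# Contrast 1: values at the mean and at +/- 1 standard deviation.
sd_contrast = (mean - sd, mean, mean + sd)

# Contrast 2: interquartile range (25th versus 75th percentile).
q1, median, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1

print(f"mean = {mean:.1f}, SD = {sd:.1f}")
print(f"mean - 1 SD = {sd_contrast[0]:.1f}, mean + 1 SD = {sd_contrast[2]:.1f}")
print(f"25th percentile = {q1:.1f}, 75th percentile = {q3:.1f}, IQR = {iqr:.1f}")
```

Either contrast can then stand in for the one-unit increase when interpreting a coefficient, for example by multiplying the coefficient by the SD or by the IQR.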

  7. When to use empirical criteria • Best used if theoretical criteria are not available for your topic. • Or possibly to compare with other studies that have used same criteria.

  8. Measurement issues and choice of contrast size • For some variables, a one-unit contrast is too small to be measured accurately. • Examples: • Difficult for most individuals to accurately recall their annual income to the nearest dollar. • Difficult to measure blood pressure to the nearest 1 mm Hg (millimeter of mercury) • In such situations, use a larger contrast.

  9. Getting to know your variables

  10. Understanding the context • Become familiar with the range of values that make sense for each of your variables: • When, where, and to whom the data pertain. • E.g., pertinent values for family income will be different: • Now versus 200 years ago. • In the US versus in a developing country today. • For a low-income sample of the US versus the entire US population.

  11. Understanding conceptual attributes of your measures • Become familiar with the ranges of values that make sense for each of your variables • A birth weight of 9,999 grams is too high • ≈22 lb., which is the size of an average 12-month-old! • In this case, problems arose due to ignoring • System of measurement (metric, not British) • Units • Real-world meaning of the number.
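
The birth weight arithmetic on this slide can be checked with a short conversion. This is only a sketch; the function and variable names are hypothetical.

```python
GRAMS_PER_POUND = 453.59237  # exact definition of the avoirdupois pound

def grams_to_pounds(grams):
    """Convert a weight in grams to pounds."""
    return grams / GRAMS_PER_POUND

suspect_birth_weight_g = 9_999
pounds = grams_to_pounds(suspect_birth_weight_g)
print(f"{suspect_birth_weight_g} g = {pounds:.1f} lb")
```

Converting a suspicious value into a more familiar unit is a quick way to see whether it makes sense in the real world.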

  12. Identifying the valid theoretical range of values • Different types of measures have different valid ranges: • Proportions must fall between 0.0 and 1.0. • Temperature in degrees Fahrenheit can be either positive or negative, but temperature in kelvins cannot be negative. • Number of children in a family has a narrower theoretical range than does annual family income. • Identify the pertinent limits for each of your variables.
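
One way to apply these limits is to record the valid range of each variable and flag observations that fall outside it. The sketch below assumes a tiny set of made-up records; the variable names (`prop_insured`, `temp_kelvin`, `num_children`) and the helper `out_of_range` are hypothetical.

```python
# Hypothetical records; variable names and values are illustrative only.
records = [
    {"id": 1, "prop_insured": 0.82, "temp_kelvin": 293.2, "num_children": 2},
    {"id": 2, "prop_insured": 1.40, "temp_kelvin": 288.0, "num_children": 1},
    {"id": 3, "prop_insured": 0.55, "temp_kelvin": -5.0, "num_children": 3},
]

# Valid theoretical limits as (lower, upper); None means no upper bound.
valid_ranges = {
    "prop_insured": (0.0, 1.0),   # proportions fall between 0.0 and 1.0
    "temp_kelvin": (0.0, None),   # kelvin temperatures cannot be negative
    "num_children": (0, None),    # counts cannot be negative
}

def out_of_range(records, valid_ranges):
    """Return (id, variable, value) for each value outside its valid range."""
    problems = []
    for rec in records:
        for var, (low, high) in valid_ranges.items():
            value = rec[var]
            if value < low or (high is not None and value > high):
                problems.append((rec["id"], var, value))
    return problems

problems = out_of_range(records, valid_ranges)
for rec_id, var, value in problems:
    print(f"record {rec_id}: {var} = {value} is outside its valid range")
```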

  13. Examining the range of observed values • Examine the distributions of the variables in your data set to become familiar with the • Units • Range • Distribution of values • Categories • Of nominal variables • Ordinal versions of continuous variables

  14. Identifying variables for which a 1-unit contrast is not suitable • Based on your theoretical, contextual, and empirical investigations of each variable in your model, identify those for which • A one-unit contrast is too big • E.g., those with low values or a very narrow range • A one-unit contrast is too small • E.g., those with very high values or a wide range • A one-unit contrast is just right • See podcast on defining the Goldilocks problem

  15. Defining variables to address the Goldilocks problem • Many Goldilocks issues can be addressed by modifying one or more variables before specifying the multivariate model: • Rescaling • Using a different level of aggregation • Creating a categorical version of a continuous variable.

  16. Transforming your variables • These transformations can: • Make a one-unit increase in Xi align better with the research question. • Shift the scale of the βs to be more consistent across the set of variables in the model. • For any of these approaches, retain the original variable and create a new variable with the transformed version. • Never overwrite the original data!

  17. Rescaling your variables • For some research questions, a simple change of scale can help make a one-unit contrast in the independent variable align better with the research question. • For example, working with • annual income in $10,000s instead of $1s. • ozone concentration in parts per thousand instead of parts per million.

  18. Rescaling and the decimal system • Rescaling variables involves dividing or multiplying the original variable by some value • Often a multiple of ten, e.g., • Multiply by 1,000 • Divide by 100 • Although changing the scale of a variable by an order of magnitude or two is mathematically convenient, it is also arbitrary and in many cases unrelated to the topic or data under study. • E.g., increments of 10 or 100 days don’t correspond to common usage as well as increments of 7 or 30 or 365 days.
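
The effect of rescaling on a coefficient can be illustrated with a bivariate least-squares slope. This is a minimal sketch on made-up data: the income values, the outcome `health_score`, and the helper `ols_slope` are all hypothetical. The rescaled income is stored in a new list so the original values are retained.

```python
def ols_slope(x, y):
    """Slope of a simple (one-predictor) least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data: annual income in $1s and an arbitrary outcome score.
income_dollars = [18_000, 32_000, 45_000, 61_000, 87_000]
health_score = [61.0, 66.5, 70.0, 74.5, 83.0]

# New variable: income in $10,000s. The original list is left unchanged.
income_10k = [value / 10_000 for value in income_dollars]

slope_per_dollar = ols_slope(income_dollars, health_score)
slope_per_10k = ols_slope(income_10k, health_score)

print(f"slope per $1 of income:      {slope_per_dollar:.6f}")
print(f"slope per $10,000 of income: {slope_per_10k:.2f}")
```

Dividing the predictor by 10,000 multiplies its slope by 10,000, so the fit is unchanged and only the size of the one-unit contrast differs.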

  19. Changing the level of aggregation • An alternative way to make the scale of variables fit better with a one-unit increase is to change the level of aggregation. • If a one-unit change in the original variable is too small, shift to a lower level of aggregation, e.g., • weekly income instead of annual income; • population at the county instead of state level. • If a one-unit change is too large, shift to a higher level of aggregation, e.g., • cost per dozen instead of per piece.
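
The two directions of re-aggregation described above can be sketched as follows, again on made-up values and with hypothetical variable names. The new variables are created alongside the originals rather than replacing them.

```python
WEEKS_PER_YEAR = 52
PIECES_PER_DOZEN = 12

# One-unit change too small: move annual income down to weekly income.
annual_income = [26_000, 52_000, 78_000]  # hypothetical, in dollars
weekly_income = [amount / WEEKS_PER_YEAR for amount in annual_income]

# One-unit change too large: move cost per piece up to cost per dozen.
cost_per_piece = [0.25, 0.50]  # hypothetical, in dollars
cost_per_dozen = [cost * PIECES_PER_DOZEN for cost in cost_per_piece]

# What a one-unit ($1) increase in each new variable means in the original units.
one_dollar_weekly_in_annual_terms = 1 * WEEKS_PER_YEAR
one_dollar_per_dozen_in_piece_terms = 1 / PIECES_PER_DOZEN

print(weekly_income, cost_per_dozen)
print(f"$1 more per week = ${one_dollar_weekly_in_annual_terms} more per year")
print(f"$1 more per dozen = ${one_dollar_per_dozen_in_piece_terms:.3f} more per piece")
```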

  20. Creating a categorical version of a continuous variable • For topics for which standard ranges or cutoffs are commonly used, consider creating a categorical version of a continuous variable. E.g., • Age ranges that relate to developmental, economic, social, or health phenomena • 0–17 years (children), 18–64 years, 65+ years • Clinically meaningful ranges of blood pressure • <120 mm Hg; 120–139 mm Hg; 140+ mm Hg
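
The cutoffs listed on this slide can be turned into categorical variables with a small lookup. This is one possible sketch; the helper `categorize` and the sample values are hypothetical, and the cutoffs are taken from the slide as given.

```python
from bisect import bisect_right

def categorize(value, cutoffs, labels):
    """Assign a label, where cutoffs are the lower bounds of every category after the first."""
    return labels[bisect_right(cutoffs, value)]

AGE_CUTOFFS = [18, 65]
AGE_LABELS = ["0-17 years", "18-64 years", "65+ years"]

BP_CUTOFFS = [120, 140]
BP_LABELS = ["<120 mm Hg", "120-139 mm Hg", "140+ mm Hg"]

# Hypothetical observations; the continuous values are kept next to the new categories.
ages = [4, 17, 18, 64, 65, 80]
age_groups = [categorize(age, AGE_CUTOFFS, AGE_LABELS) for age in ages]

blood_pressures = [112, 120, 139, 140]
bp_groups = [categorize(bp, BP_CUTOFFS, BP_LABELS) for bp in blood_pressures]

for age, group in zip(ages, age_groups):
    print(age, group)
for bp, group in zip(blood_pressures, bp_groups):
    print(bp, group)
```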

  21. Describing exploratory work in your methods section • In the methods section, describe the behind-the-scenes work you did to address Goldilocks issues. • Explain the reasons for those transformations given your research question and data. • Exploratory analysis of distributions of your variables in your data set. • Background reading on commonly used cutoffs or calculations for the variables you are using.

  22. Defining newly created variables in your methods section • If you transformed variables or created categorical versions of continuous variables, • Report units and levels of aggregation for all transformed variables. E.g., • Income in $10,000s. • Logged(income in $1s). • Specify cutoffs used to define categories. E.g., • Ranges of BMI used to define overweight or obesity. • Poverty thresholds (multiples of the Federal Poverty Level) for different years or household compositions.

  23. Summary • Transforming one or more of your variables before specifying your multivariate model can • Make a one-unit increase in each independent variable align better with the research question. • Shift the scale of the βs to be more consistent across independent variables in the model. • In your methods section, describe • Exploratory data analysis to become familiar with observed values and distributions of each variable in your model. • The calculations and criteria used to create new variables. • Citations for those criteria and calculations.

  24. Suggested resources • Miller, J. E. 2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. • Chapter 10, on the Goldilocks problem • Chapter 4, on types of variables, units and distribution • Chapter 7, on choosing effective examples • Chapter 13, on the data and methods section

  25. Suggested online resources • Podcasts on • Defining the Goldilocks problem • Resolving the Goldilocks problem using • Model specification • Effective ways of presenting results

  26. Suggested practice problems • Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. • Problem sets for • chapter 7, question #6 • chapter 10, questions #1 through 5.

  27. Suggested extensions • Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. • Suggested course extensions for • chapter 4 • “Reviewing” questions #1 and 3. • chapter 10 • “Reviewing” exercises #1 and 2. • “Applying statistics and writing” questions #1, 2, 3, and 5. • “Revising” questions #1, 2, 3, and 9. • chapter 13, “writing” exercises #3 and 4. • “Getting to know your variables” assignment

  28. Contact information Jane E. Miller, PhD jmiller@ifh.rutgers.edu Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html
