Resolving the Goldilocks problem: Variables and measurement Jane E. Miller, PhD The Chicago Guide to Writing about Multivariate Analysis, 2nd edition.
Overview • Identifying criteria for choosing fitting contrasts for each variable • Understanding conceptual and contextual aspects of your variables • Becoming familiar with the distributions of your variables • Transforming variables • Describing your variables in the methods section
Criteria for choosing fitting contrasts for each of your variables • Theoretical criteria • Empirical criteria • Measurement issues
Theoretical criteria for choosing fitting contrasts • Theoretical criteria relate to how that concept is measured and compared in the literature or real-world context. • Examples: • Multiples of the poverty level that correspond with program eligibility criteria for that place and time. • Multiples of standard deviations of weight-for-height, based on international child growth standards.
Identifying theoretical criteria for your topic • Start by reading the literature to identify which ones pertain to each of your • Independent variables (IVs) • Dependent variables (DV) • Also identify real-world factors pertaining to your variables. E.g., • Physical properties (e.g. freezing point of water) • Clinically meaningful contrasts • Socially relevant contrasts
Empirical criteria for choosing fitting contrasts • Based on the observed distribution of values in your data. • Examples: • Multiples of standard deviations • Comparing values at the mean, and ±1 standard deviation in the IV • Interquartile range • Comparing values at the 25th and 75th percentiles of the IV.
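The empirical criteria above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `empirical_contrasts` and the sample values are hypothetical and do not come from the book. The percentiles use the "inclusive" interpolation method, which is one of several conventions, so other software may give slightly different quartiles.

```python
import statistics

def empirical_contrasts(values):
    """Return candidate contrast sizes based on the observed distribution."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    # Quartile cut points; "inclusive" treats the data as spanning min to max
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean,
        "sd": sd,
        "mean_minus_1sd": mean - sd,
        "mean_plus_1sd": mean + sd,
        "p25": q1,
        "p75": q3,
        "iqr": q3 - q1,
    }

# Hypothetical observed values of an independent variable
iv_values = [2, 4, 4, 4, 5, 5, 7, 9]
contrasts = empirical_contrasts(iv_values)
for name, value in contrasts.items():
    print(f"{name}: {value:.2f}")
```

Either the interquartile range or a one-standard-deviation step can then serve as the contrast size when interpreting the coefficient on that variable.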
When to use empirical criteria • Best used if theoretical criteria are not available for your topic. • Or possibly to compare with other studies that have used the same criteria.
Measurement issues and choice of contrast size • For some variables, a one-unit contrast is too small to be measured accurately. • Examples: • Difficult for most individuals to accurately recall their annual income to the nearest dollar. • Difficult to measure blood pressure to the nearest 1 mm Hg (millimeter of mercury) • In such situations, use a larger contrast.
Understanding the context • Become familiar with the range of values that make sense for each of your variables: • When, where, and to whom the data pertain. • E.g., pertinent values for family income will be different: • Now versus 200 years ago. • In the US versus in a developing country today. • For a low-income US sample versus the entire US population.
Understanding conceptual attributes of your measures • Become familiar with the ranges of values that make sense for each of your variables • A birth weight of 9,999 grams is too high • ≈22 lb, which is the size of an average 12-month-old! • In this case, problems arose due to ignoring • System of measurement (metric, not British) • Units • Real-world meaning of the number.
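The birth weight example rests on a unit conversion, which can be checked directly. The sketch below is only an arithmetic illustration; the function and variable names are hypothetical.

```python
GRAMS_PER_POUND = 453.59237  # exact definition of the avoirdupois pound

def grams_to_pounds(grams):
    """Convert a weight in grams to pounds."""
    return grams / GRAMS_PER_POUND

suspect_birth_weight_g = 9_999
# Roughly 22 lb, far above any plausible weight for a newborn
print(round(grams_to_pounds(suspect_birth_weight_g), 1))
```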
Identifying the valid theoretical range of values • Different types of measures have different valid ranges: • Proportions must fall between 0.0 and 1.0. • Temperature in °Fahrenheit can be either positive or negative, but in kelvins cannot be negative. • Number of children in a family has a narrower theoretical range than does annual family income. • Identify the pertinent limits for each of your variables.
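One way to put these limits to work is to record them once and flag any observed value that falls outside them. The following is a minimal sketch; the variable names, the `VALID_RANGES` table, and the `out_of_range` helper are assumptions made for illustration, and the limits for your own variables should come from your reading of the topic.

```python
# Hypothetical valid theoretical ranges as (lower, upper); None means unbounded
VALID_RANGES = {
    "proportion": (0.0, 1.0),
    "temp_kelvin": (0.0, None),
    "num_children": (0, None),
}

def out_of_range(name, values):
    """Return the values that fall outside the valid range for `name`."""
    lower, upper = VALID_RANGES[name]
    flagged = []
    for v in values:
        too_low = lower is not None and v < lower
        too_high = upper is not None and v > upper
        if too_low or too_high:
            flagged.append(v)
    return flagged

# Hypothetical data containing two impossible proportions
print(out_of_range("proportion", [0.2, 1.3, -0.1, 1.0]))
```

Flagged values are candidates for data-entry errors, missing-value codes, or unit mix-ups, and should be investigated rather than silently dropped.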
Examining the range of observed values • Examine the distributions of the variables in your data set to become familiar with the • Units • Range • Distribution of values • Categories • Of nominal variables • Ordinal versions of continuous variables
Identifying variables for which a 1-unit contrast is not suitable • Based on your theoretical, contextual, and empirical investigations of each variable in your model, identify those for which • A one-unit contrast is too big • E.g., those with low values or a very narrow range • A one-unit contrast is too small • E.g., those with very high values or a wide range • A one-unit contrast is just right • See podcast on defining the Goldilocks problem
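As a rough screening aid, you can express one unit as a share of each variable's observed range and standard deviation. This is an illustrative heuristic, not a rule from the book: the function name `one_unit_share` and the sample data are hypothetical, no cutoff is implied, and the sketch assumes the variable is not constant (otherwise the divisions fail).

```python
import statistics

def one_unit_share(values):
    """Express a one-unit contrast relative to the observed range and SD."""
    value_range = max(values) - min(values)
    sd = statistics.stdev(values)
    return {"share_of_range": 1 / value_range, "share_of_sd": 1 / sd}

# Hypothetical variables: a proportion and an annual income in dollars
proportion_poor = [0.1, 0.3, 0.5]
annual_income = [20_000, 50_000, 80_000]

print(one_unit_share(proportion_poor))  # one unit exceeds the whole observed range
print(one_unit_share(annual_income))    # one unit is a tiny fraction of the range
```

A share well above 1 suggests a one-unit contrast is too big for that variable; a share near zero suggests it is too small.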
Defining variables to address the Goldilocks problem • Many Goldilocks issues can be addressed by modifying one or more variables before specifying the multivariate model: • Rescaling • Using a different level of aggregation • Creating a categorical version of a continuous variable.
Transforming your variables • These transformations can: • Make a one-unit increase in Xi align better with the research question. • Shift the scale of the βs to be more consistent across the set of variables in the model. • For any of these approaches, retain the original variable and create a new variable with the transformed version. • Never overwrite the original data!
Rescaling your variables • For some research questions, a simple change of scale can help make a one-unit contrast in the independent variable align better with the research question. • For example, working with • annual income in $10,000s instead of $1s. • ozone concentration in parts per thousand instead of parts per million.
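A minimal sketch of rescaling while keeping the original variable intact is shown below. The dictionary layout, the `rescale` helper, and the variable names are hypothetical; in practice you would do the same thing in your statistical package.

```python
def rescale(values, divisor):
    """Return a new list with each value divided by `divisor`."""
    return [v / divisor for v in values]

# Hypothetical data set with annual income measured in $1s
data = {"income_dollars": [25_000, 48_500, 120_000]}

# Create a new variable in $10,000s; the original variable is left untouched
data["income_10k"] = rescale(data["income_dollars"], 10_000)

print(data["income_dollars"])
print(data["income_10k"])
```

In a linear model, dividing a predictor by 10,000 multiplies its coefficient by 10,000 and leaves the fit of the model unchanged, so the rescaled β describes the difference associated with a $10,000 contrast in income.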
Rescaling and the decimal system • Rescaling variables involves dividing or multiplying the original variable by some value • Often a multiple of ten, e.g., • Multiply by 1,000 • Divide by 100 • Although changing the scale of a variable by an order of magnitude or two is mathematically convenient, it is also arbitrary and in many cases unrelated to the topic or data under study. • E.g., increments of 10 or 100 days don’t correspond to common usage as well as increments of 7 or 30 or 365 days.
Changing the level of aggregation • An alternative way to make the scale of variables fit better with a one-unit increase is to change the level of aggregation. • If a one-unit change in the original variable is too small, shift to a lower level of aggregation, e.g., • weekly income instead of annual income; • population at the county instead of state level. • If a one-unit change is too large, shift to a higher level of aggregation, e.g., • cost per dozen instead of per piece.
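The two directions of re-aggregation can be sketched as simple conversions. The function names are hypothetical, and the 52-week year is a simplifying assumption.

```python
WEEKS_PER_YEAR = 52  # assumption: treat a year as exactly 52 weeks

def annual_to_weekly(annual_income):
    """Lower level of aggregation: one unit becomes a larger share of the value."""
    return annual_income / WEEKS_PER_YEAR

def per_piece_to_per_dozen(cost_per_piece):
    """Higher level of aggregation: one unit becomes a smaller share of the value."""
    return cost_per_piece * 12

print(annual_to_weekly(52_000))      # weekly income for a $52,000 annual income
print(per_piece_to_per_dozen(0.25))  # cost per dozen at $0.25 per piece
```

Under these assumptions, a $1 contrast in weekly income corresponds to a $52 contrast in annual income, and a $1 contrast in cost per dozen corresponds to about 8 cents per piece.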
Creating a categorical version of a continuous variable • For topics for which standard ranges or cutoffs are commonly used, consider creating a categorical version of a continuous variable. E.g., • Age ranges that relate to developmental, economic, social, or health phenomena • 0–17 years (children), 18–64 years, 65+ years • Clinically meaningful ranges of blood pressure • <120 mm Hg; 120–139 mm Hg; 140+ mm Hg
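The age ranges above can be turned into a categorical variable with a short lookup. This is a sketch under the assumption that age is recorded in completed years; the names `AGE_CUTOFFS`, `AGE_LABELS`, and `age_group` are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of the second and third age groups, in completed years
AGE_CUTOFFS = [18, 65]
AGE_LABELS = ["0-17 years (children)", "18-64 years", "65+ years"]

def age_group(age_years):
    """Map a continuous age to one of the three standard age ranges."""
    # bisect_right counts how many cutoffs are <= age_years
    return AGE_LABELS[bisect_right(AGE_CUTOFFS, age_years)]

# Keep the continuous ages and store the categories in a new variable
ages = [3, 17, 18, 64, 65, 90]
age_groups = [age_group(a) for a in ages]
print(list(zip(ages, age_groups)))
```

The same pattern applies to other cutoffs, such as the blood pressure ranges listed above, by swapping in the relevant cutoffs and labels.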
Describing exploratory work in your methods section • In the methods section, describe the behind-the-scenes work you did to address Goldilocks issues. • Explain the reasons for those transformations given your research question and data. • Exploratory analysis of distributions of your variables in your data set. • Background reading on commonly used cutoffs or calculations for the variables you are using.
Defining newly created variables in your methods section • If you transformed variables or created categorical versions of continuous variables, • Report units and levels of aggregation for all transformed variables. E.g., • Income in $10,000s. • Logged(income in $1s). • Specify cutoffs used to define categories. E.g., • Ranges of BMI used to define overweight or obesity. • Poverty thresholds (multiples of the Federal Poverty Level) for different years or household compositions.
Summary • Transforming one or more of your variables before specifying your multivariate model can • Make a one-unit increase in each independent variable align better with the research question. • Shift the scale of the βs to be more consistent across independent variables in the model. • In your methods section, describe • Exploratory data analysis to become familiar with observed values and distributions of each variable in your model. • The calculations and criteria used to create new variables. • Citations for those criteria and calculations.
Suggested resources • Miller, J. E. 2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. • Chapter 10, on the Goldilocks problem • Chapter 4, on types of variables, units and distribution • Chapter 7, on choosing effective examples • Chapter 13, on the data and methods section
Suggested online resources • Podcasts on • Defining the Goldilocks problem • Resolving the Goldilocks problem using • Model specification • Effective ways of presenting results
Suggested practice problems • Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. • Problem sets for • chapter 7, question #6 • chapter 10, questions #1 through 5.
Suggested extensions • Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. • Suggested course extensions for • chapter 4 • “Reviewing” questions #1 and 3. • chapter 10 • “Reviewing” exercises #1 and 2. • “Applying statistics and writing” questions #1, 2, 3, and 5. • “Revising” questions #1, 2, 3, and 9. • chapter 13, “writing” exercises #3 and 4. • “Getting to know your variables” assignment
Contact information Jane E. Miller, PhD jmiller@ifh.rutgers.edu Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html