Anita L. Stewart Institute for Health & Aging University of California, San Francisco

Class 7 Measurement Issues in Diverse Populations Including Health Disparities Research November 2, 2006 Anita L. Stewart Institute for Health & Aging University of California, San Francisco

Background • U.S. population becoming more diverse • More minority groups are being included in research due to: • NIH mandate • Recent health disparities initiatives

Types of Diverse Groups • Health disparities research focuses on differences in health between the following groups: • Minority vs. non-minority • Low income vs. others • Low education vs. others • Limited English skills vs. others • …. and others

Health Disparities Research • Increasing research to: • Describe health disparities • Differences in health across various diverse groups • Identify determinants of health disparities • Individual level • Environmental level • Intervene to reduce health disparities

Measurement Implications of Research in Diverse Groups • Most self-reported measures were developed and tested in mainstream, well-educated groups • Subgroup analysis of measures has been rare • Thus, little information is available on appropriateness, reliability, validity, and responsiveness in minority and other diverse groups

The Measurement Goal • Identify measures that can be used across all groups, and • are sensitive to diversity • have minimal bias between groups

Issues Concerning Group Comparisons • Observed mean differences in a measure can be due to • culturally- or group-mediated differences in true score (true differences) -- OR -- • bias - systematic differences between group observed scores not attributable to true scores
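
One way to write the distinction above is a small sketch in classical test theory notation. The symbols are assumptions introduced here for illustration and do not appear in the slides: X is the observed score of respondent i in group g, T the true score, b_g a systematic group-specific bias, and e random error.

```latex
\begin{align*}
X_{ig} &= T_{ig} + b_g + e_{ig}, \qquad E[e_{ig}] = 0 \\
E[\bar{X}_1 - \bar{X}_2] &= (\mu_{T1} - \mu_{T2}) + (b_1 - b_2)
\end{align*}
```

The first term on the right is the true difference; the second is the part of the observed difference that is due to bias. Group comparisons are interpretable only when the second term is close to zero.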

Bias - A Special Concern • Measurement bias in any one group may make group comparisons invalid • Bias can be due to group differences in: • the meaning of concepts or items • the extent to which measures represent a concept • cognitive processes of responding • use of response scales • appropriateness of data collection methods

Effects of Bias on Depression: Chinese and White Respondents • In Chinese respondents - 3 sources of bias that lower observed score: • tendency to not express negative feelings • exacerbated by face-to-face interview • meaning of word “depression” is more severe than for Whites – less likely to endorse it • Comparing groups – assume true level of depression is the same in both groups – • Observed scores would be lower in Chinese group • But lower level is due to these biases
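
The arithmetic behind this example can be sketched as follows. All numbers are hypothetical (an arbitrary depression scale and made-up bias sizes); only the direction of the effect comes from the slide.

```python
def observed_mean(true_mean, biases):
    """Observed group mean = true mean plus the sum of systematic biases."""
    return true_mean + sum(biases)

# Hypothetical values on an arbitrary depression scale (not real data).
true_mean = 15.0  # assumed identical in both groups

white_biases = []  # assumption: no bias in the comparison group
chinese_biases = [
    -2.0,  # tendency not to express negative feelings
    -1.0,  # exacerbated by face-to-face interview
    -1.5,  # "depression" carries a more severe meaning, so it is endorsed less
]

white_obs = observed_mean(true_mean, white_biases)
chinese_obs = observed_mean(true_mean, chinese_biases)
gap = chinese_obs - white_obs

print(white_obs, chinese_obs, gap)
```

The observed gap here is produced entirely by the bias terms, since the true means were set equal.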

Typical Sequence of Developing New Self-Report Measures • Develop concept • Create item pool • Pretest/revise • Field survey • Psychometric analyses • Final measures

Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups • Obtain perspectives of diverse groups • Develop concept • Create item pool .. to reflect these perspectives .. in all diverse groups • Pretest/revise • Field survey .. in all diverse groups • Measurement studies across groups • Psychometric analyses • Final measures

Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups • Obtain perspectives of diverse groups • Develop concept • Create item pool .. to reflect these perspectives .. in all diverse groups • Pretest/revise • Field survey .. in all diverse groups • If results are non-equivalent • Psychometric analyses • Final measures

Measurement Adequacy vs. Measurement Equivalence • Making group comparisons requires conceptual and psychometric adequacy and equivalence • Adequacy - within a group • concepts are appropriate • psychometric properties meet minimal criteria • Equivalence - between groups • conceptual and psychometric properties are comparable

Why Not Use Culture-Specific Measures? • Measurement goal is to identify measures that can be used across all groups, yet maintain sensitivity to diversity and have minimal bias • Most health disparities studies require comparing mean scores across diverse groups • need comparable measures

Conceptual and Psychometric Adequacy and Equivalence • Conceptual • Adequacy in 1 group: concept meaningful within one group • Equivalence across groups: concept equivalent across groups • Psychometric • Adequacy in 1 group: psychometric properties meet minimal standards within one group • Equivalence across groups: psychometric properties invariant (equivalent) across groups

Left Side of Matrix: Issues in a Single Group • Conceptual adequacy: concept meaningful within one group • Psychometric adequacy: psychometric properties meet minimal standards within one group

Right Side of Matrix: Issues in More Than One Group • Conceptual equivalence: concept equivalent across groups • Psychometric equivalence: psychometric properties invariant (equivalent) across groups

Conceptual Adequacy in One Group • Concept meaningful within one group

Conceptual Adequacy in One Group • Is concept relevant, meaningful, and acceptable in that group? • Traditional research • Conceptual adequacy = simply defining a concept • Mainstream population “assumed” • Minority and cross cultural research • Mainstream concepts may be inadequate • Concept should correspond to how a particular group thinks about it

Example of Inadequate Concept • Patient satisfaction typically conceptualized in mainstream populations in terms of, e.g., • access, technical care, communication, continuity, interpersonal style • In minority and low income groups, additional relevant domains include, e.g., • discrimination by health professionals • sensitivity to language barriers

Psychometric Adequacy in One Group • Psychometric properties meet minimal standards within one group

Psychometric Adequacy in any Group • Minimal standards: • Sufficient variability • Minimal missing data • Adequate reliability/reproducibility • Evidence of construct validity • Evidence of responsiveness to change • Basic classical test theory approach
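
Several of these minimal standards can be checked with a few lines of code. The sketch below computes the missing-data rate, the variance of scale scores, and internal consistency (Cronbach's alpha) for one group. The function names, the toy data, and the 10% missing-data threshold are assumptions made for illustration; the .70 reliability criterion is the one used in the SF-36 example in these slides.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for complete cases; rows = respondents, columns = items."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def adequacy_report(rows, alpha_min=0.70, max_missing=0.10):
    """Summarize basic psychometric adequacy within a single group.

    max_missing is a hypothetical threshold chosen for this sketch.
    """
    n_values = sum(len(row) for row in rows)
    n_missing = sum(1 for row in rows for value in row if value is None)
    complete = [row for row in rows if None not in row]
    alpha = cronbach_alpha(complete)
    missing_rate = n_missing / n_values
    return {
        "missing_rate": missing_rate,
        "score_variance": variance([sum(row) for row in complete]),
        "alpha": alpha,
        "missing_ok": missing_rate <= max_missing,
        "reliability_ok": alpha >= alpha_min,
    }

# Toy data: 5 respondents, 3 items, one missing response (None).
group = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, None, 4]]
report = adequacy_report(group)
print(report)
```

Running the same report separately in each group is the point: adequacy is a within-group question, so each group has to meet the criteria on its own data.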

Evidence of Psychometric Inadequacy of SF-36 Scale in Three Diverse Groups • SF-36 social functioning scale - internal consistency reliability < .70 in three different samples: • Chinese language, adults aged 55-96 years • Japanese language, Japanese elders • English, Pima Indians Stewart AL & Nápoles-Springer A, 2000 (see readings)

Conceptual Equivalence Across Groups • Concept equivalent across groups

Conceptual Equivalence • Is the concept relevant, familiar, acceptable to all diverse groups being studied? • Is the concept defined the same way in all groups? • all relevant “domains” included (none missing) • interpreted similarly • Is the concept appropriate for all diverse groups?

Example: Subjective Test of Conceptual Equivalence of Spanish FACT-G • Bilingual/bicultural expert panel reviewed all 28 items • One item had low cultural relevance to quality of life • One concept was missing – spirituality • Developed new spirituality scale (FACIT-Sp) with input from cancer patients, psychotherapists, and religious experts • Sample item “I worry about dying” Cella D et al. Med Care 1998;36:1407

Generic/Universal vs Group-Specific (Etic versus Emic) • Concepts unlikely to be defined exactly the same way across diverse ethnic groups • Generic/universal (etic) • features of a concept that are appropriate across groups • Group-Specific (emic) • idiosyncratic portions of a concept

Etic versus Emic (cont.) • Goal in health disparities research • identify generic/universal portion of a concept (could be entire concept) that can be applied across all groups • For within-group analyses or studies • the culture-specific portion is also relevant

Qualitative Approaches to Explore Conceptual Equivalence in Diverse Groups • Literature reviews • ethnographic and anthropological • In-depth interviews and focus groups • discuss concepts, obtain their views • Expert consultation from diverse groups • review concept definitions • rate relevance of items

Psychometric Equivalence • Psychometric properties invariant (equivalent) across groups

Equivalence of Reliability? No! • Reliability is difficult to compare across groups because it depends on the distribution of the construct in each sample • Thus lower reliability in one group may simply reflect less variability in that group • More important is the adequacy of the reliability in both groups • Reliability meets minimal criteria within each group
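
The dependence of reliability on variability follows from the classical test theory definition, sketched below. The symbols and the numbers are assumptions chosen for illustration: true-score variance differs between two hypothetical groups while error variance is held at 4 in both.

```latex
\begin{align*}
\text{reliability} &= \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2} \\
\text{Group A: } \frac{16}{16 + 4} &= 0.80 \qquad
\text{Group B: } \frac{6}{6 + 4} = 0.60
\end{align*}
```

The measure is equally precise in both groups (same error variance), yet the coefficient is lower in Group B simply because true scores vary less there.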

Equivalence of Criterion Validity • Determine if hypothesized patterns of associations with specified criteria are confirmed in both groups, e.g. • a measure predicts utilization in both groups • a cutpoint on a screening measure has the same specificity and sensitivity in both groups
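
The cutpoint check in the last bullet can be sketched as below: compute sensitivity and specificity of the same cutpoint separately in each group and compare. The function name, the cutpoint of 16, and the data are hypothetical.

```python
def sens_spec(scores, has_condition, cutpoint):
    """Return (sensitivity, specificity) for 'score >= cutpoint' as a positive screen."""
    tp = fn = tn = fp = 0
    for score, condition in zip(scores, has_condition):
        positive = score >= cutpoint
        if condition and positive:
            tp += 1
        elif condition:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical screening scores and criterion diagnoses in two groups.
cutpoint = 16
truth = [True, True, True, False, False, False]
group_a_scores = [18, 20, 12, 8, 16, 5]
group_b_scores = [14, 17, 11, 9, 7, 6]

sens_a, spec_a = sens_spec(group_a_scores, truth, cutpoint)
sens_b, spec_b = sens_spec(group_b_scores, truth, cutpoint)
print(sens_a, spec_a, sens_b, spec_b)
```

In this toy data the same cutpoint misses more true cases in group B, which is the kind of result that would argue against criterion equivalence.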

Equivalence of Construct Validity • Are hypothesized patterns of associations confirmed in both groups? • Example: Scores on the Spanish version of the FACT had similar relationships with other health measures as scores on the English version • Primarily tested through subjectively examining pattern of correlations • Can test differences using confirmatory factor analysis (e.g., through Structural Equation Modeling)
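
The "pattern of correlations" comparison can be laid out side by side with a short sketch like the one below. The validator names and all data are hypothetical; this only illustrates the descriptive comparison, not the confirmatory factor analysis test mentioned in the last bullet.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_pattern(scale_scores, validators):
    """Correlate one scale with each validating measure within a single group."""
    return {name: pearson_r(scale_scores, values) for name, values in validators.items()}

# Hypothetical scores for two language versions of the same scale.
english = correlation_pattern(
    [1, 2, 3, 4, 5],
    {"physical_function": [2, 4, 5, 4, 5], "pain": [5, 4, 4, 2, 1]},
)
spanish = correlation_pattern(
    [1, 2, 3, 4, 5],
    {"physical_function": [1, 3, 2, 5, 4], "pain": [4, 5, 3, 2, 1]},
)

# Hypothesized pattern holds in both groups if every correlation has the same sign.
same_direction = all(english[name] * spanish[name] > 0 for name in english)
for name in english:
    print(name, round(english[name], 2), round(spanish[name], 2))
```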

Item Equivalence • Differential Item Functioning (DIF) • Items are non-equivalent if they are differentially related to the underlying trait • Equivalence indicated by no DIF • Meaning of response categories is similar across groups • Distance between response categories is similar across groups

Methods for Identifying Differential Item Functioning (DIF) • Item Response Theory (IRT) • Examines each item in relation to underlying latent trait • Tests if responses to one item predict the underlying latent “score” similarly in two groups • if not, items have “differential item functioning”
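
Fitting an IRT model is beyond a short example, so the sketch below swaps in a simpler, non-IRT screen for DIF: the Mantel-Haenszel common odds ratio. It assumes the item has been dichotomized (endorsed or not) and that respondents are matched on total scale score, used here as a stand-in for the latent trait. The counts are hypothetical.

```python
def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across score strata.

    Each stratum is (a, b, c, d):
      a = reference group, endorsed     b = reference group, not endorsed
      c = focal group, endorsed         d = focal group, not endorsed
    A value near 1.0 suggests no DIF at matched trait levels.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

# Hypothetical counts in two total-score strata (low scorers, high scorers).
no_dif_item = [(10, 10, 10, 10), (15, 5, 15, 5)]
dif_item = [(10, 10, 5, 15), (15, 5, 10, 10)]

or_no_dif = mantel_haenszel_odds_ratio(no_dif_item)
or_dif = mantel_haenszel_odds_ratio(dif_item)
print(or_no_dif, or_dif)
```

For the second item, respondents with the same total score are more likely to endorse the item in the reference group than in the focal group, which is what "differentially related to the underlying trait" means in practice.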

Equivalence of Factor Structure • Factor structure is similar in new group to structure in original groups in which measure was tested • In other words, the measurement model is the same across groups • Methods • Specify the number of factors you are looking for • Determine if the hypothesized model fits the data

Exploratory Factor Analysis (EFA) • Factor analysis methods that do not constrain the number of factors or the magnitude of the loadings • Identifies an underlying structure of a set of items with no particular hypotheses • Goal - identify as few explanatory variables (i.e., factors) as possible that account for covariation among the items

Confirmatory Factor Analysis (CFA) • Methods that specify a hypothesized structure a priori (before looking at the results) • Can test mean and covariance structures • to estimate bias

Equivalence of Factor Structure: Assuring Psychometric Invariance • Psychometric invariance (technical term for psychometric equivalence) • Invariance means that important properties of a theoretically-based factor structure (measurement model) do not differ or vary across groups (are invariant) • In other words, the measurement model is the same across groups • Empirical comparison of factor structure

Criteria for Psychometric Invariance: Non-technical Language Across two or more groups, determine whether each criterion is true – a sequential process: • Same number of factors (dimensions) • Same items load on (correlate with) same factors • Each item has same factor loadings • No bias on any item or scale across groups • Same residuals on items • No item or scale bias AND same residuals

Criteria for Evaluating Invariance Across Groups: Technical Terms • Dimensional Invariance: Same number of dimensions • Configural Invariance: Same items load on same dimensions • Metric Invariance, Factor Pattern Invariance: Items have same loadings on same dimensions • Strong Factorial Invariance, Scalar Invariance: Observed scores are unbiased • Residual Invariance: Observed item and factor variances can be compared across groups • Strict Factorial Invariance: Both scalar invariance and residual invariance criteria are met
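
Because the criteria are evaluated in sequence, the logic can be sketched as a walk through an ordered list that stops at the first failed criterion. The function name and the input format (a dictionary of pass/fail results that the analyst has already obtained, for example from multi-group factor analyses) are assumptions for this sketch; it does not run the statistical tests itself.

```python
SEQUENCE = ["dimensional", "configural", "metric", "scalar", "residual"]

def highest_invariance(results):
    """Return the highest invariance level reached, testing criteria in order.

    results maps each level name to True (criterion met) or False.
    Returns None if even dimensional invariance fails, and "strict"
    when scalar and residual invariance are both met.
    """
    reached = None
    for level in SEQUENCE:
        if not results.get(level, False):
            break  # later criteria assume the earlier ones hold
        reached = level
    return "strict" if reached == "residual" else reached

# Hypothetical outcomes for three measures.
fails_first = highest_invariance({"dimensional": False})
stops_at_metric = highest_invariance(
    {"dimensional": True, "configural": True, "metric": True,
     "scalar": False, "residual": True}
)
all_met = highest_invariance({level: True for level in SEQUENCE})
print(fails_first, stops_at_metric, all_met)
```

Note that in the second case the residual result is ignored: once scalar invariance fails, the sequence stops, so means should not be compared across groups for that measure.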

Dimensional Invariance • Definition: Factor structure is the same, i.e., the same number of factors is observed in both groups • CES-D Example: • Four factors found in men and three factors in women (n=1000), 18-92 years of age • Failed the dimensional invariance criterion • the number of factors differed between the two groups JM Golding et al., J Clin Psychol 1991;47:61-75

Example: Dimensional Invariance of CES-D in Hispanic EPESE • Original 4 factors • Somatic symptoms • Depressive affect • Interpersonal behavior • Positive affect • Hispanic EPESE - only 2 factors • Depression (included somatic symptoms, depressive affect, and interpersonal behavior) • Well-being Miller TQ et al., The factor structure of the CES-D in two surveys of elderly Mexican Americans, J Gerontol: Soc Sci, 1997;520:S259-69.

Configural Invariance • Assumes: dimensional invariance is found • that there were the same number of factors • Definition: Item-factor patterns are the same, i.e., the same items load on the same factors in both groups • CES-D Example • 4 factors found in Anglos, Blacks, and Chicanos • Same items loaded on each factor in all groups RE Roberts et al., Psychiatry Research, 1980;2:125-134

Metric Invariance or Factor Pattern Invariance • Assumes: dimensional and configural invariance are found • Definition: Item loadings are the same across groups, i.e., the correlation of each item with its factor is the same in both groups

Strong Factorial Invariance or Scalar Invariance • Assumes: dimensional, configural, and metric (factor pattern) invariance are found • Definition: Observed scores are unbiased, i.e., means can be compared across groups • Requires test of equivalence of mean scores across groups using confirmatory factor analysis
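
Why scalar invariance licenses mean comparisons can be seen from the standard multi-group factor model, sketched below. The notation is an assumption introduced here, not taken from the slides: x is item j in group g, tau the item intercept, lambda the loading, xi the latent factor with mean kappa, and delta the residual.

```latex
\begin{align*}
x_{jg} &= \tau_{jg} + \lambda_{jg}\,\xi_g + \delta_{jg},
  \qquad E[\xi_g] = \kappa_g,\; E[\delta_{jg}] = 0 \\
\text{Metric invariance: } & \lambda_{j1} = \lambda_{j2} = \lambda_j \\
\text{Scalar invariance: } & \tau_{j1} = \tau_{j2} = \tau_j \\
\Rightarrow\; E[x_{j1}] - E[x_{j2}] &= \lambda_j\,(\kappa_1 - \kappa_2)
\end{align*}
```

With equal loadings and equal intercepts, any difference in observed item means is a function of the latent mean difference alone. If the intercepts differed, the term tau_{j1} - tau_{j2} would remain in the difference and act as item bias.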
