Class 7 Measurement Issues in Diverse Populations Including Health Disparities Research November 2, 2006 Anita L. Stewart Institute for Health & Aging University of California, San Francisco
Background • U.S. population becoming more diverse • More minority groups are being included in research due to: • NIH mandate • Recent health disparities initiatives
Types of Diverse Groups • Health disparities research focuses on differences in health between the following groups: • Minority vs. non-minority • Low income vs. others • Low education vs. others • Limited English skills vs. others • …. and others
Health Disparities Research • Increasing research to: • Describe health disparities • Differences in health across various diverse groups • Identify determinants of health disparities • Individual level • Environmental level • Intervene to reduce health disparities
Measurement Implications of Research in Diverse Groups • Most self-reported measures were developed and tested in mainstream, well-educated groups • Subgroup analysis of measures has been rare • Thus, little information is available on appropriateness, reliability, validity, and responsiveness in minority and other diverse groups
The Measurement Goal • Identify measures that can be used across all groups, and • are sensitive to diversity • have minimal bias between groups
Issues Concerning Group Comparisons • Observed mean differences in a measure can be due to • culturally- or group-mediated differences in true score (true differences) -- OR -- • bias - systematic differences between group observed scores not attributable to true scores
Bias - A Special Concern • Measurement bias in any one group may make group comparisons invalid • Bias can be due to group differences in: • the meaning of concepts or items • the extent to which measures represent a concept • cognitive processes of responding • use of response scales • appropriateness of data collection methods
Effects of Bias on Depression: Chinese and White Respondents • In Chinese respondents - 3 sources of bias that lower observed score: • tendency to not express negative feelings • exacerbated by face-to-face interview • meaning of word “depression” is more severe than for Whites – less likely to endorse it • Comparing groups – assume true level of depression is the same in both groups – • Observed scores would be lower in Chinese group • But lower level is due to these biases
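The effect described above can be illustrated with a small simulation. This is only a sketch: it assumes observed score = true score + systematic bias + random error, and the group labels, sample sizes, means, and the 2-point bias are hypothetical values, not figures from the lecture.

```python
import random
import statistics


def simulate_observed_scores(true_scores, bias, noise_sd, rng):
    """Return observed scores as true score + systematic bias + random error."""
    return [t + bias + rng.gauss(0.0, noise_sd) for t in true_scores]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Both groups are drawn from the same true-score distribution (hypothetical values).
true_group_a = [rng.gauss(10.0, 3.0) for _ in range(5000)]
true_group_b = [rng.gauss(10.0, 3.0) for _ in range(5000)]

# Group B's responses carry a downward bias of 2 points (hypothetical value).
observed_a = simulate_observed_scores(true_group_a, bias=0.0, noise_sd=1.0, rng=rng)
observed_b = simulate_observed_scores(true_group_b, bias=-2.0, noise_sd=1.0, rng=rng)

true_gap = statistics.mean(true_group_a) - statistics.mean(true_group_b)
observed_gap = statistics.mean(observed_a) - statistics.mean(observed_b)

print(f"true mean difference:     {true_gap:.2f}")
print(f"observed mean difference: {observed_gap:.2f}")
```

The true means are nearly identical, yet the observed means differ by roughly the size of the bias, which is the pattern the depression example describes.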
Typical Sequence of Developing New Self-Report Measures • Develop concept • Create item pool • Pretest/revise • Field survey • Psychometric analyses • Final measures
Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups Obtain perspectives of diverse groups Develop concept Create item pool .. to reflect these perspectives .. in all diverse groups Pretest/revise Field survey .. in all diverse groups Measurement studies across groups Psychometric analyses Final measures
Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups Obtain perspectives of diverse groups Develop concept Create item pool .. to reflect these perspectives .. in all diverse groups Pretest/revise Field survey .. in all diverse groups If results are non-equivalent Psychometric analyses Final measures
Measurement Adequacy vs. Measurement Equivalence • Making group comparisons requires conceptual and psychometric adequacy and equivalence • Adequacy - within a group • concepts are appropriate • psychometric properties meet minimal criteria • Equivalence - between groups • conceptual and psychometric properties are comparable
Why Not Use Culture-Specific Measures? • Measurement goal is to identify measures that can be used across all groups, yet maintain sensitivity to diversity and have minimal bias • Most health disparities studies require comparing mean scores across diverse groups • need comparable measures
Conceptual and Psychometric Adequacy and Equivalence • Conceptual, adequacy in 1 group: Concept meaningful within one group • Conceptual, equivalence across groups: Concept equivalent across groups • Psychometric, adequacy in 1 group: Psychometric properties meet minimal standards within one group • Psychometric, equivalence across groups: Psychometric properties invariant (equivalent) across groups
Left Side of Matrix: Issues in a Single Group • Conceptual adequacy: Concept meaningful within one group • Psychometric adequacy: Psychometric properties meet minimal standards within one group
Right Side of Matrix: Issues in More Than One Group • Conceptual equivalence: Concept equivalent across groups • Psychometric equivalence: Psychometric properties invariant (equivalent) across groups
Conceptual Adequacy in One Group • Is concept relevant, meaningful, and acceptable in that group? • Traditional research • Conceptual adequacy = simply defining a concept • Mainstream population “assumed” • Minority and cross cultural research • Mainstream concepts may be inadequate • Concept should correspond to how a particular group thinks about it
Example of Inadequate Concept • Patient satisfaction typically conceptualized in mainstream populations in terms of, e.g., • access, technical care, communication, continuity, interpersonal style • In minority and low income groups, additional relevant domains include, e.g., • discrimination by health professionals • sensitivity to language barriers
Psychometric Adequacy in One Group • Psychometric properties meet minimal standards within one group
Psychometric Adequacy in any Group • Minimal standards: • Sufficient variability • Minimal missing data • Adequate reliability/reproducibility • Evidence of construct validity • Evidence of responsiveness to change • Basic classical test theory approach
Evidence of Psychometric Inadequacy of SF-36 Scale in Three Diverse Groups • SF-36 social functioning scale - internal consistency reliability < .70 in three different samples: • Chinese language, adults aged 55-96 years • Japanese language, Japanese elders • English, Pima Indians Stewart AL & Nápoles-Springer A, 2000 (see readings)
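Internal consistency reliability is usually reported as Cronbach's alpha. The sketch below shows how alpha could be computed within each group and compared with the .70 minimum mentioned above; the function names and any data passed to them are hypothetical, and the code is not taken from the cited studies.

```python
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_variances = [
        statistics.variance([resp[i] for resp in item_scores]) for i in range(k)
    ]
    # Variance of the summed scale score.
    total_variance = statistics.variance([sum(resp) for resp in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def meets_minimum(alpha, threshold=0.70):
    """Check alpha against a minimal standard (.70 is the value used in the lecture)."""
    return alpha >= threshold
```

Running `cronbach_alpha` separately on each group's item responses, then applying `meets_minimum`, gives a within-group adequacy check of the kind reported for the SF-36 social functioning scale.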
Conceptual Equivalence Across Groups • Concept equivalent across groups
Conceptual Equivalence • Is the concept relevant, familiar, acceptable to all diverse groups being studied? • Is the concept defined the same way in all groups? • all relevant “domains” included (none missing) • interpreted similarly • Is the concept appropriate for all diverse groups?
Example: Subjective Test of Conceptual Equivalence of Spanish FACT-G • Bilingual/bicultural expert panel reviewed all 28 items • One item had low cultural relevance to quality of life • One concept was missing – spirituality • Developed new spirituality scale (FACIT-Sp) with input from cancer patients, psychotherapists, and religious experts • Sample item “I worry about dying” Cella D et al. Med Care 1998: 36;1407
Generic/Universal vs Group-Specific (Etic versus Emic) • Concepts unlikely to be defined exactly the same way across diverse ethnic groups • Generic/universal (etic) • features of a concept that are appropriate across groups • Group-Specific (emic) • idiosyncratic portions of a concept
Etic versus Emic (cont.) • Goal in health disparities research • identify generic/universal portion of a concept (could be entire concept) that can be applied across all groups • For within-group analyses or studies • the culture-specific portion is also relevant
Qualitative Approaches to Explore Conceptual Equivalence in Diverse Groups • Literature reviews • ethnographic and anthropological • In-depth interviews and focus groups • discuss concepts, obtain their views • Expert consultation from diverse groups • review concept definitions • rate relevance of items
Psychometric Equivalence • Psychometric properties invariant (equivalent) across groups
Equivalence of Reliability?? No! • Difficult to compare reliability because it depends on the distribution of the construct in a sample • Thus lower reliability in one group may simply reflect less variability in that sample • More important is the adequacy of the reliability in both groups • Reliability meets minimal criteria within each group
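Why reliability depends on the sample can be seen from the classical test theory definition, reliability = true-score variance / (true-score variance + error variance). The sketch below holds error variance fixed and changes only the spread of true scores; all variance values are hypothetical.

```python
def reliability(true_variance, error_variance):
    """Classical test theory reliability: true variance / observed variance."""
    observed_variance = true_variance + error_variance
    if observed_variance <= 0:
        raise ValueError("observed variance must be positive")
    return true_variance / observed_variance


ERROR_VARIANCE = 4.0  # same measurement error in both samples (hypothetical)

# A sample with a wide spread of true scores versus a more homogeneous sample.
heterogeneous = reliability(true_variance=12.0, error_variance=ERROR_VARIANCE)
homogeneous = reliability(true_variance=4.0, error_variance=ERROR_VARIANCE)

print(f"heterogeneous sample: {heterogeneous:.2f}")
print(f"homogeneous sample:   {homogeneous:.2f}")
```

The measure is equally precise in both samples, but the reliability coefficient is lower where people differ less on the construct.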
Equivalence of Criterion Validity • Determine if hypothesized patterns of associations with specified criteria are confirmed in both groups, e.g. • a measure predicts utilization in both groups • a cutpoint on a screening measure has the same specificity and sensitivity in both groups
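One way to check the cutpoint criterion is to compute sensitivity and specificity separately in each group at the same cutpoint. The sketch below assumes a score at or above the cutpoint counts as a positive screen; the scores, criterion diagnoses, and the cutpoint of 16 are hypothetical illustration values.

```python
def sensitivity_specificity(scores, has_condition, cutpoint):
    """Sensitivity and specificity when score >= cutpoint is a positive screen."""
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, has_condition):
        positive = score >= cutpoint
        if truth and positive:
            tp += 1
        elif truth:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical screening scores and criterion diagnoses for two groups.
truth = [True, True, False, False, True, False]
group_a_scores = [20, 18, 12, 8, 17, 5]
group_b_scores = [14, 18, 12, 8, 13, 17]

CUTPOINT = 16  # hypothetical cutpoint applied to both groups
result_a = sensitivity_specificity(group_a_scores, truth, CUTPOINT)
result_b = sensitivity_specificity(group_b_scores, truth, CUTPOINT)

print("group A (sensitivity, specificity):", result_a)
print("group B (sensitivity, specificity):", result_b)
```

In this made-up data the same cutpoint misses most true cases in group B, which would count as evidence against equivalent criterion validity.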
Equivalence of Construct Validity • Are hypothesized patterns of associations confirmed in both groups? • Example: Scores on the Spanish version of the FACT had similar relationships with other health measures as scores on the English version • Primarily tested through subjectively examining pattern of correlations • Can test differences using confirmatory factor analysis (e.g., through Structural Equation Modeling)
Item Equivalence • Differential Item Functioning (DIF) • Items are non-equivalent if they are differentially related to the underlying trait • Equivalence indicated by no DIF • Meaning of response categories is similar across groups • Distance between response categories is similar across groups
Methods for Identifying Differential Item Functioning (DIF) • Item Response Theory (IRT) • Examines each item in relation to underlying latent trait • Tests if responses to one item predict the underlying latent “score” similarly in two groups • if not, items have “differential item functioning”
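As an illustration of what DIF looks like, the sketch below uses a two-parameter logistic (2PL) item response function. The lecture does not name a specific IRT model, so the 2PL form and every parameter value here are assumptions chosen only for illustration.

```python
import math


def prob_endorse(theta, discrimination, difficulty):
    """2PL item response function: probability of endorsing an item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# Hypothetical item parameters estimated separately in two groups.
# Same discrimination, but the item is "harder" to endorse in group B.
GROUP_A = {"discrimination": 1.2, "difficulty": 0.0}
GROUP_B = {"discrimination": 1.2, "difficulty": 0.8}

trait_levels = [-2.0, -1.0, 0.0, 1.0, 2.0]
gaps = []
for theta in trait_levels:
    p_a = prob_endorse(theta, **GROUP_A)
    p_b = prob_endorse(theta, **GROUP_B)
    gaps.append(p_a - p_b)
    print(f"theta={theta:+.1f}  P(A)={p_a:.2f}  P(B)={p_b:.2f}  gap={p_a - p_b:.2f}")
```

At every trait level, respondents in group B are less likely to endorse the item than respondents in group A with the same underlying trait, so the item is differentially related to the trait across groups.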
Equivalence of Factor Structure • Factor structure is similar in new group to structure in original groups in which measure was tested • In other words, the measurement model is the same across groups • Methods • Specify the number of factors you are looking for • Determine if the hypothesized model fits the data
Exploratory Factor Analysis (EFA) • Factor analysis methods that do not constrain the number of factors or the magnitude of the loadings • Identifies an underlying structure of a set of items with no particular hypotheses • Goal - identify as few explanatory variables (i.e., factors) as possible that account for covariation among the items
Confirmatory Factor Analysis (CFA) • Methods that specify a hypothesized structure a priori (before looking at the results) • Can test mean and covariance structures • to estimate bias
Equivalence of Factor Structure: Assuring Psychometric Invariance • Psychometric invariance (technical term for psychometric equivalence) • Invariance means that important properties of a theoretically-based factor structure (measurement model) do not differ or vary across groups (are invariant) • In other words, the measurement model is the same across groups • Empirical comparison of factor structure
Criteria for Psychometric Invariance: Non-technical Language Across two or more groups, determine whether each criterion is true – a sequential process: • Same number of factors (dimensions) • Same items load on (correlate with) same factors • Each item has same factor loadings • No bias on any item or scale across groups • Same residuals on items • No item or scale bias AND same residuals
Criteria for Evaluating Invariance Across Groups: Technical Terms • Dimensional Invariance: Same number of dimensions • Configural Invariance: Same items load on same dimensions • Metric Invariance, Factor Pattern Invariance: Items have same loadings on same dimensions • Strong Factorial Invariance, Scalar Invariance: Observed scores are unbiased • Residual Invariance: Observed item and factor variances can be compared across groups • Strict Factorial Invariance: Both scalar invariance and residual invariance criteria are met
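Because the criteria are evaluated in sequence, testing stops at the first one that fails. A minimal sketch of that logic follows; the function name and the dictionary of pass/fail results are hypothetical, and the actual tests behind each result would come from factor-analytic software.

```python
# Ordered criteria, following the sequence listed above.
INVARIANCE_LEVELS = ["dimensional", "configural", "metric", "scalar", "residual"]


def highest_invariance_level(passed):
    """Return the highest criterion met in sequence, or None if the first one fails.

    `passed` maps each criterion name to True/False. Later criteria are ignored
    once an earlier one fails, because each assumes the previous ones hold.
    """
    highest = None
    for level in INVARIANCE_LEVELS:
        if not passed.get(level, False):
            break
        highest = level
    # Strict factorial invariance = scalar and residual invariance both met.
    if highest == "residual":
        return "strict"
    return highest


example = {"dimensional": True, "configural": True, "metric": False, "scalar": True}
print(highest_invariance_level(example))
```

In the example, metric invariance fails, so the scalar result is not considered even though it is marked as passed.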
Dimensional Invariance • Definition: Factor structure is the same, i.e., the same number of factors is observed in both groups • CES-D Example: • Four factors found in men and three factors in women (n=1000), 18-92 years of age • Failed the dimensional invariance criterion • a different number of factors was found in each group JM Golding et al., J Clin Psychol 1991:47;61-75
Example: Dimensional Invariance of CES-D in Hispanic EPESE • Original 4 factors • Somatic symptoms • Depressive affect • Interpersonal behavior • Positive affect • Hispanic EPESE - only 2 factors • Depression (included somatic symptoms, depressive affect, and interpersonal behavior) • Well-being Miller TQ et al., The factor structure of the CES-D in two surveys of elderly Mexican Americans, J Gerontol: Soc Sci, 1997;52B:S259-69.
Configural Invariance • Assumes: dimensional invariance is found • that there were the same number of factors • Definition: Item-factor patterns are the same, i.e., the same items load on the same factors in both groups • CES-D Example • 4 factors found in Anglos, Blacks, and Chicanos • Same items loaded on each factor in all groups RE Roberts et al., Psychiatry Research, 1980;2:125-134
Metric Invariance or Factor Pattern Invariance • Assumes: dimensional and configural invariance are found • Definition: Item loadings are the same across groups, i.e., the correlation of each item with its factor is the same in both groups
Strong Factorial Invariance or Scalar Invariance • Assumes: dimensional, configural, and metric (factor pattern) invariance are found • Definition: Observed scores are unbiased, i.e., means can be compared across groups • Requires test of equivalence of mean scores across groups using confirmatory factor analysis
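Why scalar invariance is needed before comparing means can be sketched with the usual multi-group factor model. The notation below is an assumption (standard CFA symbols, not taken from the slides): for item $i$ in group $g$, $\tau$ is the item intercept, $\lambda$ the loading, $\xi$ the latent factor with mean $\kappa_g$, and $\delta$ a residual with mean zero.

```latex
\begin{aligned}
x_{ig} &= \tau_{ig} + \lambda_{ig}\,\xi_g + \delta_{ig}, \qquad
E[x_{ig}] = \tau_{ig} + \lambda_{ig}\,\kappa_g \\[4pt]
\text{Metric invariance:}\quad & \lambda_{i1} = \lambda_{i2} = \lambda_i \\
\text{Scalar invariance:}\quad & \lambda_{i1} = \lambda_{i2} = \lambda_i
  \ \text{ and } \ \tau_{i1} = \tau_{i2} = \tau_i \\[4pt]
\text{Under scalar invariance:}\quad &
E[x_{i1}] - E[x_{i2}] = \lambda_i\,(\kappa_1 - \kappa_2)
\end{aligned}
```

When loadings and intercepts are both equal across groups, a difference in observed item means reflects only a difference in latent means. If the intercepts differ, part of the observed difference is $\tau_{i1} - \tau_{i2}$, which is bias rather than a true difference.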