General Latent Variable Modeling Approaches to Measurement Issues using Mplus

Rich Jones • jones@mail.hrca.harvard.edu • Psychometrics Workshop, Friday Harbor, San Juan Island, WA • August 24, 2005

Overview • Part 1 • IRT overview • DIF overview • Part 2 • IRT via Factor Analysis • Factor analysis and general latent variable models for measurement issues using Mplus • Limitations of Mplus approach • Part 3 • Applied Example • Part 4 (time permitting) • Bells and Whistles • Discussion

Part 1a IRT overview

Semantics • Multiple Fields, Conflicting Language • Educational Testing, Psychological Measurement, Epidemiology & Biostatistics, Psychometrics & Structural Equation Modeling • Characteristics of People • ability, trait, state, construct, factor level, item response • Characteristics of Items • difficulty, severity, threshold, location • discrimination, sensitivity, factor loading, measurement slope

Key Ideas of IRT • Persons have a certain ability or trait • Items have characteristics • difficulty (how hard the item is) • discrimination (how well the item measures the ability) • (I won’t talk about guessing) • Person ability and item characteristics are estimated simultaneously and expressed on a unified metric • Interval-level measure of ability or trait • Used to be hard to do

Some Things You Can Do with IRT • Refine measures • Identify ‘biased’ test items • Adaptive testing • Handle missing data at the item level • Equate measures

Latent Ability / Trait • Symbolized with $\theta_i$ or $\eta_i$ • Assumed to be continuously, and often normally, distributed in the population • The more of the trait a person has, the more likely they are to ...whatever... (endorse the symptom, get the answer right, etc.) • The latent trait is that unobservable, hypothetical construct presumed to be measured by the test (assumed to “cause” item responses)

Item Characteristic Curve • The fundamental conceptual unit of IRT • Relates item responses to ability presumed to cause them • Represented with cumulative logistic or cumulative normal forms

Item Response Function $P(y_{ij} = 1 \mid \theta_i) = F[a_j(\theta_i - b_j)]$
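The item response function above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names are hypothetical, F is taken to be either the cumulative logistic or the cumulative normal form named on the previous slide, and the parameter values are made up.

```python
import math


def logistic_cdf(z):
    """Cumulative logistic form, one of the two choices for F."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Cumulative normal form, the other choice for F."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def item_response_probability(ability, discrimination, difficulty, link=logistic_cdf):
    """P(y = 1 | ability) = F[a * (ability - b)] for a single item."""
    return link(discrimination * (ability - difficulty))


# Illustrative (made-up) item: a = 1.2, b = 0.5
p_at_difficulty = item_response_probability(0.5, 1.2, 0.5)
p_high_ability = item_response_probability(2.0, 1.2, 0.5)
p_low_ability = item_response_probability(-1.0, 1.2, 0.5)
p_probit = item_response_probability(0.5, 1.2, 0.5, link=normal_cdf)
```

With either form of F, a person whose ability equals the item difficulty has a probability of one half of a positive response; higher ability raises that probability and lower ability reduces it.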

Example of an Item Characteristic Curve: High Ability

Example of an Item Characteristic Curve: Low Ability

Example of an Item Characteristic Curve: Item Difficulty

Example of two ICCs that Differ in Difficulty

Example of an Item Characteristic Curve: Item Discrimination

Example of two ICCs that Differ in Discrimination
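The two comparisons in these slides (curves that differ in difficulty, curves that differ in discrimination) can be checked numerically. The sketch below assumes a logistic ICC and uses made-up parameter values; the helper names are hypothetical.

```python
import math


def icc(ability, discrimination, difficulty):
    """Logistic item characteristic curve (an assumed form for F)."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


def slope_at(ability, discrimination, difficulty, step=1e-5):
    """Central-difference approximation to the slope of the ICC."""
    upper = icc(ability + step, discrimination, difficulty)
    lower = icc(ability - step, discrimination, difficulty)
    return (upper - lower) / (2.0 * step)


# Same discrimination, different difficulty: the curve shifts left or right.
p_easy = icc(0.0, discrimination=1.0, difficulty=-1.0)
p_hard = icc(0.0, discrimination=1.0, difficulty=1.0)

# Same difficulty, different discrimination: the curve is flatter or steeper.
slope_flat = slope_at(0.0, discrimination=0.5, difficulty=0.0)
slope_steep = slope_at(0.0, discrimination=2.0, difficulty=0.0)
```

At the same ability level, the easier item has the higher probability of a positive response. For the logistic form, the slope of the curve at the difficulty point is a/4, so the more discriminating item separates people near that point more sharply.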

Item Response Function

Extra Credit: one way to get estimates of underlying ability • Remember Bayes’ Theorem

Extra Credit: one way to get estimates of underlying ability • Bayes modal estimates of latent ability ($\eta$) (modal a posteriori [MAP] estimates)
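A Bayes modal (MAP) estimate can be sketched as the ability value that maximizes prior times likelihood. The example below is an illustration under stated assumptions, not the routine used by any particular program: it assumes a standard normal prior, a logistic two-parameter model with known item parameters, and a simple grid search; all names and values are hypothetical.

```python
import math


def icc(ability, discrimination, difficulty):
    """Logistic item characteristic curve (assumed form)."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


def log_posterior(ability, responses, items):
    """Log prior (standard normal, up to a constant) plus log likelihood."""
    total = -0.5 * ability ** 2
    for response, (discrimination, difficulty) in zip(responses, items):
        p = icc(ability, discrimination, difficulty)
        total += math.log(p) if response == 1 else math.log(1.0 - p)
    return total


def map_estimate(responses, items, lower=-4.0, upper=4.0, points=801):
    """Grid search for the posterior mode of ability."""
    width = (upper - lower) / (points - 1)
    grid = [lower + i * width for i in range(points)]
    return max(grid, key=lambda ability: log_posterior(ability, responses, items))


# Made-up (discrimination, difficulty) pairs for four items
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0), (1.5, 0.5)]
map_all_correct = map_estimate([1, 1, 1, 1], items)
map_mixed = map_estimate([1, 1, 0, 0], items)
map_all_wrong = map_estimate([0, 0, 0, 0], items)
```

Because the prior pulls estimates toward zero, response patterns that are all correct or all incorrect still receive finite estimates, which is one practical attraction of the Bayes modal approach.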

Part 1b DIF Overview

Identify Biased Test Items: Differential Item Functioning (DIF) • Differences in the likelihood of error on a given item may be due to • group differences in ability • item bias • both • IRT can parse this out • Item Bias = Differential Item Functioning + Rationale • Most workers in IRT identify DIF when two groups do not have the same ICC
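The idea that DIF means two groups do not share the same ICC can be made concrete with a small sketch. The parameter values are made up, the logistic form is an assumption, and the labels "uniform" (difficulty differs) and "non-uniform" (discrimination differs) are the conventional names for the two patterns.

```python
import math


def icc(ability, discrimination, difficulty):
    """Logistic item characteristic curve (assumed form)."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# Hypothetical (discrimination, difficulty) pairs for one item in two groups
reference = (1.2, 0.0)
focal_uniform = (1.2, 0.5)      # same slope, item is harder in the focal group
focal_nonuniform = (0.6, 0.0)   # same location, item discriminates less

abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Gap in response probability between groups at the SAME ability level
uniform_gaps = [icc(t, *reference) - icc(t, *focal_uniform) for t in abilities]
nonuniform_gaps = [icc(t, *reference) - icc(t, *focal_nonuniform) for t in abilities]
```

With uniform DIF the gap has the same sign at every ability level; with non-uniform DIF the curves cross, so the gap changes sign. In both cases the comparison is made at equal ability, which is what separates DIF from a simple group difference in ability.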

Part 2 IRT and Factor Analysis

IRT and Factor Analysis • IRT describes a class of statistical models • IRT models can be estimated using factor analysis • Appropriate routines for ordinal dependent variables (tetrachoric/polychoric correlation coefficients) • Factor analysis models can be extended in very general ways using structural equation modeling techniques / software

Mplus: www.statmodel.com • Used to be LISCOMP, owes lineage to LISREL • Does just about everything that other continuous latent variable / structural equation software (LISREL, EQS, AMOS, CALIS) does • Plus, very general latent variable modeling • Continuous latent variables (latent traits) • Categorical latent variables (latent classes, mixtures) • Missing data • Estimation with data from complex designs • Expensive, demo version available

Mplus approach to IRT Model • One or Two-parameter IRT models (not explicit) • Discrimination ≈ Factor loadings/slopes • Difficulty ≈ Item thresholds • Two estimation methods • Weighted Least Squares • Limited information • Multivariate probit (theta or delta parameterization) • Latent response variable formulation (Assume underlying continuous variables) • Maximum Likelihood • Full information • Multivariate logistic • Conditional probability formulation • More experience, fit statistics with WLS • Some model types require ML, others WLS
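The correspondence between factor-analytic and IRT parameters can be sketched as a conversion. The formulas below are the standard normal-ogive relationship and rest on assumptions not stated on the slide: a single factor with variance 1, standardized latent response variables (so the residual variance is 1 minus the squared loading, as in the delta parameterization), and no covariates. The function name and numeric values are hypothetical.

```python
import math


def factor_to_irt(loading, threshold):
    """Convert a standardized loading and threshold to normal-ogive a and b.

    Assumes y* has unit variance and the factor has unit variance, so the
    residual standard deviation is sqrt(1 - loading**2).
    """
    residual_sd = math.sqrt(1.0 - loading ** 2)
    discrimination = loading / residual_sd
    difficulty = threshold / loading
    return discrimination, difficulty


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Made-up estimates for one item
loading, threshold = 0.6, 0.3
a, b = factor_to_irt(loading, threshold)

# The two formulations give the same response probability at any factor level
eta = 1.0
p_factor = normal_cdf((loading * eta - threshold) / math.sqrt(1.0 - loading ** 2))
p_irt = normal_cdf(a * (eta - b))
```

These values are on the probit metric; multiplying a by roughly 1.7 gives an approximately comparable logistic discrimination. Other parameterizations or models with covariates need different conversions.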

Latent Response Variable Formulation (picture)

Latent Response Variable Formulation (words) • Assume observed ordinal (dichotomous) y has a corresponding underlying continuous, normal, but unobservable (latent) form (y*) • When a person’s value for y* exceeds some threshold ($\tau$), y=1 is observed; otherwise, y=0 is observed • Analysis is focused on the relationships among the y* and on estimating the thresholds ($\tau$)
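The latent response formulation can be illustrated with a short simulation. This is a sketch under assumptions: y* is built from a standard normal factor plus a normal residual scaled so that y* has unit variance, and the loading, threshold, sample size, and seed are all made up.

```python
import math
import random


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_item(threshold, loading, n_people, seed=2005):
    """Generate dichotomous responses by thresholding a latent y*."""
    rng = random.Random(seed)
    residual_sd = math.sqrt(1.0 - loading ** 2)
    responses = []
    for _ in range(n_people):
        factor = rng.gauss(0.0, 1.0)
        y_star = loading * factor + rng.gauss(0.0, residual_sd)  # unit variance
        responses.append(1 if y_star > threshold else 0)  # y = 1 when y* exceeds the threshold
    return responses


responses = simulate_item(threshold=0.5, loading=0.7, n_people=20000)
observed_proportion = sum(responses) / len(responses)
expected_proportion = 1.0 - normal_cdf(0.5)  # P(y* > threshold) for standard normal y*
```

The observed proportion of positive responses should be close to the area of the standard normal distribution above the threshold, which is why thresholds can be recovered from observed proportions.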

Latent Response Variable Formulation (equation)

Conditional Probability Formulation

Factor Analysis Model

Factor Analysis with Covariates

Multiple Group CFA

Multiple Group (MG) MIMIC

MIMIC and MG-MIMIC Model • Disadvantages • Not so good for factor score generation • Not exactly the IRT model • different conceptualization of NU-DIF • Some work needed to get a’s, b’s, and standard errors • Relatively little experience / literature in field • Confusing / overlapping measurement noninvariance literature from SEM field

MIMIC and MG-MIMIC Model • Advantages • Can be easy to estimate, good for modeling • No need to equate parameters • No data re-arrangements required, missing data tricks • Simultaneous analysis/evaluation of all items and possible sources of model mis-fit (including potential DIF or bias) • Multiple independent variables (with DIF) • Y’s and X’s can be categorical or continuous • Anchor items not necessary, but... • Embed in more complex models • Complementary measurement noninvariance literature from SEM field

MIMIC Model: how to do it

From within Stata using runmplus.ado:

    runmplus y1-y4 x1, categorical(y1-y4) type(meanstructure) model(eta by y1-y4*; eta@1; eta on x1*; y1 on x1*;)

Mplus syntax file:

    Title: MIMIC model
    Data: File is __000001.dat ;
    Variable: Names are y1 y2 y3 y4 x1;
      categorical= y1-y4 ;
    Analysis: type= meanstructure ;
    MODEL:
      eta by y1-y4* ;
      eta@1 ;
      eta on x1* ;
      y1 on x1* ;
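What the MODEL statements encode can be sketched numerically. In the Python illustration below, `eta by y1-y4` corresponds to the loadings, `eta on x1` to a regression of the factor on the covariate, and `y1 on x1` to a direct effect of the covariate on item y1, which is how DIF is represented in a MIMIC model. The sketch assumes a probit link with unit residual variance for y* (so the exact scaling may differ from what a given Mplus parameterization reports), and every numeric value and function name is hypothetical.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mimic_item_probability(eta, x, loading, threshold, direct_effect=0.0):
    """P(y = 1 | eta, x) when y* = loading * eta + direct_effect * x + residual."""
    return normal_cdf(loading * eta + direct_effect * x - threshold)


# Made-up parameters: y1 has a direct effect of x1 (DIF), y2 does not
y1 = dict(loading=1.0, threshold=0.2, direct_effect=0.4)
y2 = dict(loading=0.8, threshold=0.2, direct_effect=0.0)

eta = 0.0  # compare two people with the SAME level of the latent trait
p_y1_x0 = mimic_item_probability(eta, 0, **y1)
p_y1_x1 = mimic_item_probability(eta, 1, **y1)
p_y2_x0 = mimic_item_probability(eta, 0, **y2)
p_y2_x1 = mimic_item_probability(eta, 1, **y2)
```

At equal levels of the latent trait, the covariate changes the response probability only for the item with a direct effect. Any covariate effect on the other items operates entirely through the factor, which is the group difference in ability rather than DIF.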

Some Applied Examples and Technical Articles
• Muthén, B. O. (1989). Latent variable modeling in heterogeneous populations. Meetings of Psychometric Society (1989, Los Angeles, California and Leuven, Belgium). Psychometrika, 54(4), 557-585.
• McArdle, J., & Prescott, C. (1992). Age-based construct validation using structural equation modeling. Experimental Aging Research, 18(3), 87-116.
• Gallo, J. J., Anthony, J. C., & Muthén, B. O. (1994). Age differences in the symptoms of depression: a latent trait analysis. Journals of Gerontology, 49(6), 251-264.
• Salthouse, T., Hancock, H., Meinz, E., & Hambrick, D. (1996). Interrelations of age, visual acuity, and cognitive functioning. Journal of Gerontology: Psychological Sciences, 51B(6), P317-P330.
• Grayson, D. A., Mackinnon, A., Jorm, A. F., Creasey, H., & Broe, G. A. (2000). Item bias in the Center for Epidemiologic Studies Depression Scale: effects of physical disorders and disability in an elderly community sample. The Journals of Gerontology. Series B, Psychological Sciences and Social Sciences, 55(5), 273-282.
• Jones, R. N., & Gallo, J. J. (2002). Education and sex differences in the Mini Mental State Examination: Effects of differential item functioning. The Journals of Gerontology. Series B, Psychological Sciences and Social Sciences, 57B(6), P548-558.
• Macintosh, R., & Hashim, S. (2003). Variance Estimation for Converting MIMIC Model Parameters to IRT Parameters in DIF Analysis. Applied Psychological Measurement, 27(5), 372-379.
• Rubio, D.-M., Berg-Weger, M., Tebb, S.-S., & Rauch, S.-M. (2003). Validating a measure across groups: The use of MIMIC models in scale development. Journal of Social Service Research, 29(3), 53-68.
• Fleishman, J. A., & Lawrence, W. F. (2003). Demographic variation in SF-12 scores: true differences or differential item functioning? Medical Care, 41(7 Suppl), III75-III86.
• Jones, R. N. (2003). Racial bias in the assessment of cognitive functioning of older adults. Aging & Mental Health, 7(2), 83-102.

Part 3 An Applied Example Jones, R. N. (2003). Racial bias in the assessment of cognitive functioning of older adults. Aging & Mental Health, 7(2), 83-102. Acknowledgement: R03 AG017680

Example: Racial bias in TICS (HRS/AHEAD) • Nationally representative, very large sample (N=15,257) • Over-sample of Black or African-American participants (N=2,090) • Assessment of cognition • Very adequate assessment of SES (education, income, occupation)

Objective • Evaluate the extent to which item-level performance reflects test-irrelevant variance due to race (White, non-Hispanic vs. Black or African-American participants) • Control for main and potentially differential effects of background variables • Sex, Age • Educational attainment • Household income, occupation groups • Health Conditions and Health Behaviors

TICS/AHEAD Measure of Cognitive Function (Herzog 1997)
Item: Points
• Orientation to time (weekday, day, month, year): 4
• Name President, Vice-President: 2
• Name two objects (cactus, scissors): 2
• Count backwards from 20: 1
• Serial Sevens: 5
• Immediate recall (10 nouns): 10
• Delayed free-recall (10 nouns, 5 min delay): 10

Background Variables • Sex • Age (9 groups) • Education (6 groups) • Household Income (5 groups) • ‘Highest’ household occupation (8 groups) • Health Conditions (HBP, DM, heart, stroke, arthritis, pulmonary, cancer) • Health Behaviors (current smoking, drinking [three groups])

Results • All items show DIF by race, some by sex, age, education • Effects of covariates (age, occupation, income, smoking status) differ significantly across racial groups • Greater variance in latent cognitive function for Black or African-American participants • No significant race difference in mean latent cognition after adjusting for measurement differences Jones. Aging Ment Health, 2003; 7:83-102.

Differences in Underlying Ability between Whites and African Americans • 60% is due to measurement differences (DIF, item bias) • 12% is due to main effect of background variables • 7% is due to structural differences (i.e., interactions of group and background variables) • What remains (about .2 SD) is not significantly different from no difference Jones. Aging Ment Health, 2003; 7:83-102.
