An overview of detecting differential item functioning (DIF) with logistic regression, drawing on Swaminathan and Rogers (1990), Camilli and Shepard (1994), and later extensions. Topics include compensatory DIF, DIF adjustments to PARSCALE code, and open questions about criteria for assessing DIF.
DIF detection using OLR Paul K. Crane, MD MPH Internal Medicine University of Washington
Outline • Statistical background • DIFdetect package • What do we do when we find DIF? • DIF adjustments to PARSCALE code • How good are adjusted scores? • Discussion
Statistical background • Recall the definition of DIF: demographic characteristic(s) interfere with the relationship expected between ability level and responses to an item • A conditional definition; we have to control for ability level, or else we cannot differentiate between DIF and differential test impact
Logistic regression applied to DIF detection • Swaminathan and Rogers (1990) • Tested two models: • P(Y=1|X, group)=f(β1X+β2*group+β3*X*group) • P(Y=1|X)=f(β1X) • Compared the difference in –2 log likelihoods of these two models to a chi-squared distribution with 2 df • Uniform and non-uniform DIF tested at the same time
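A minimal Stata sketch of this simultaneous test, assuming a 0/1 item response item, an observed ability score x, and a 0/1 group indicator grp (all variable names are illustrative):

logit item c.x i.grp c.x#i.grp     // ability, group, and ability-by-group interaction
estimates store full
logit item c.x                     // ability only
estimates store reduced
lrtest full reduced                // 2-df chi-squared test of the two DIF terms together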
Camilli and Shepard (1994) • Recommended a two-step procedure: first test for non-uniform DIF and then for uniform DIF • P(Y=1|X, group)=f(β1X+β2*group+β3*X*group) • P(Y=1|X, group)= f(β1X+β2*group) • P(Y=1|X)=f(β1X) • The difference in –2 log likelihoods of each pair of nested models is compared (1 df each) to test for non-uniform DIF and uniform DIF in two separate steps
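The two-step version can reuse the full and reduced models from the sketch above and add the intermediate model with a group main effect only (again a sketch with illustrative names):

logit item c.x i.grp               // ability and group, no interaction
estimates store uniform
lrtest full uniform                // 1 df: interaction term (non-uniform DIF)
lrtest uniform reduced             // 1 df: group main effect (uniform DIF)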
Millsap and Everson (1994) • Dismissive of “observed score” techniques such as logistic regression • X contains several items that have DIF, so adjusting for X is theoretically problematic • Advocated latent approaches such as IRT for DIF detection • Very influential publication
Zumbo (1999) • Extended Swaminathan and Rogers framework to ordinal logistic regression case to handle polytomous items • Did not address latent trait; also used a single step rather than two steps
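For a polytomous item, the same single-step comparison can be sketched with Stata's ologit (names illustrative; this mirrors the approach, not DIFdetect's actual code):

ologit item c.x i.grp c.x#i.grp
estimates store olr_full
ologit item c.x
estimates store olr_reduced
lrtest olr_full olr_reduced        // simultaneous test of uniform and non-uniform DIF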
Crane, van Belle, Larson (2004) • Pointed out that the logistic regression model is a re-parameterization of the IRT model as long as IRT-derived θ estimates are used as the ability scores • Addressed multiple hypothesis testing for non-uniform DIF; found no difference among four different adjustment techniques
Crane et al. (2004) – 2 • Biggest change in terms of specific criteria for uniform DIF • Recognized that non-uniform and uniform DIF were analogous to effect modification and confounding • Employed epidemiological thinking about how to detect confounding relationships from the data
Crane et al. (2004) – 3 • Same models used (though now θ rather than X) • P(Y=1|θ, group)= f(β1θ+β2*group) • P(Y=1|θ)=f(β1’θ) • Determine the impact of including the group term on the magnitude of the relationship between θ and item responses • Determine the size of |(β1-β1’)/β1|; if this is large, uniform DIF (confounding) is present • Criterion motivated by Maldonado and Greenland’s simulation study of confounder selection strategies
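A sketch of the uniform-DIF (confounding) check, assuming an IRT-derived score theta and a 0/1 group indicator grp (illustrative names; how large a change should count is an open question, as noted on the next slide):

ologit item theta i.grp            // model with the group term (slope is β1)
local b1 = _b[theta]
ologit item theta                  // model without the group term (slope is β1’)
local b1prime = _b[theta]
display abs((`b1' - `b1prime') / `b1')    // proportional change in the ability slope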
Work still pending • “Optimal” criteria for uniform and non-uniform DIF are unknown • Adjust α for multiple hypotheses? How many hypotheses? • Effect size for non-uniform DIF? In huge data sets, a significant interaction term is likely • What proportional change in β1 constitutes meaningful uniform DIF?
DIFdetect package • Can be downloaded from the web • www.alz.washington.edu/DIFDETECT/welcome.html • User-friendly, STATA-based package
Outline revisited • Statistical background • DIFdetect package • What do we do when we find DIF? • DIF adjustments to PARSCALE code • How good are adjusted scores? • Discussion
What to do when we find DIF? • In educational settings, items with DIF are often discarded • An unattractive option for us • Tests are too short as it is; we lose variation • We lose precision • DIF doesn’t mean that the item doesn’t measure the underlying construct at all, just that it does so differently in different groups
What do we do – 2 • Need a technique that incorporates items found to have DIF differently from DIF-free items • Precedent for this approach in Reise, Widaman, and Pugh (1993) • Constrain parameters for DIF-free items to be identical across groups • Estimate parameters for items found to have DIF separately in the appropriate groups
Compensatory DIF • Compensatory DIF occurs when DIF in some items leads to erroneous findings in other items • Both false-positive and false-negative DIF findings • Iterative process for each covariate until stable solution is reached (i.e., same items identified with DIF on separate runs of DIFdetect)
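A conceptual Stata sketch of that iterative loop, reusing the uniform-DIF criterion from the earlier sketch; the item names, the 10% cutoff, and the stopping rule are illustrative, and the non-uniform DIF check and the re-estimation of θ between passes are indicated only by comments:

* repeat the item-by-item screen until the same items are flagged twice in a row
local previous ""
local flagged "none yet"
while "`flagged'" != "`previous'" {
    local previous "`flagged'"
    local flagged ""
    * illustrative item names: item1-item5
    foreach v of varlist item1-item5 {
        quietly ologit `v' theta i.grp
        local b1 = _b[theta]
        quietly ologit `v' theta
        * illustrative 10% criterion for a "large" change in the ability slope
        if abs((`b1' - _b[theta]) / `b1') > 0.10 {
            local flagged "`flagged' `v'"
        }
    }
    * ... re-estimate theta treating the flagged items as group-specific,
    * then run the screen again ...
}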
Adjustments to PARSCALE • Create a new dataset that treats items according to their DIF status
Modified data set
0001 12XX2
0002 12XX4
0003 01XX3
…
0132 1X2X2
0133 0X1X3
0134 1X2X4
…
0932 0XX22
0933 1XX23
0934 0XX14
…
PARSCALE code • Need new lines (new blocks) for all new items that we create • We are automating this step as an extension to DIFdetect • Current best advice is to use a huge table in Word • Creation of new items is easy; we have STATA code for creation of virtual items
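A sketch of the virtual-item step for one item flagged with education DIF (item and group names illustrative; this shows the idea behind the records above, not the actual DIFdetect extension):

gen item2_loed = item2 if educ == 0    // low-education copy; missing for everyone else
gen item2_hied = item2 if educ == 1    // high-education copy; missing for everyone else
drop item2                             // the original item is replaced by the two copies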
Reminder of PARSCALE tips • When outfiling from STATA, use wide format • Use commas • Change missing values to .x • Open the file in Word and replace “.x” with X • Remember to change 2-digit numbers to their appropriate letters
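A sketch of that export step, with an illustrative file name and item list:

foreach v of varlist item* {
    replace `v' = .x if missing(`v')   // code missing responses as the extended missing value .x
}
outfile id item* using casi_parscale.raw, comma wide
* then open the file in Word, replace ".x" with "X", and recode any
* two-digit category codes to letters as PARSCALE expects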
It gets complicated… • This is the CASI, first run of education DIF, after looking at gender and age:
Adjusted scores related to dementia and CIND • In the ACT study, controlling for CASI score (continuous): odds ratio of 2.9 (1.8-4.9) for low DIF-adjusted IRT score (among those with low CASI scores) • Adjusted for gender, education, and age • Strict 2-stage sample design to address verification bias • In the CSHA, controlling for 3MS score (continuous): weighted odds ratio of 1.6 (1.1-2.3) for dementia for low DIF-adjusted IRT score, and 1.4 (1.2-1.8) for CIND • Adjusted for education and language • Sampling and weighting to deal with verification bias
Incorporation of adjusted scores into analyses • Here we are in novel territory • Is there a reason not to adjust scores for DIF? • Questions and comments
Comparison of OLR with other techniques • OLR is more flexible (can look at continuous constructs, e.g., education, without dichotomizing or grouping) • DIFdetect is very fast • When using IRT-derived θ scores, OLR is a re-parameterization of IRT analyses • DIFdetect OLR incorporates the epidemiological concepts of confounding and effect modification • A special issue of Medical Care edited by Teresi is forthcoming