DIF detection using (Ordinal) Logistic Regression. Laura Gibbons, PhD Paul K. Crane, MD MPH Internal Medicine University of Washington. Outline. Brief statistical background DIFdetect package What do we do when we find DIF? New, simpler, faster solutions! Discussion.
DIF detection using (Ordinal) Logistic Regression Laura Gibbons, PhD Paul K. Crane, MD MPH Internal Medicine University of Washington
Outline • Brief statistical background • DIFdetect package • What do we do when we find DIF? • New, simpler, faster solutions! • Discussion
Statistical background • Recall the definition of DIF: a demographic characteristic interferes with the expected relationship between ability level and responses to an item • This is a conditional definition: we have to control for ability level, or else we cannot differentiate between DIF and differential test impact
The 2 Parameter Logistic model • Logit P(Y=1|a,b,θ)=Da(θ-b) • Produces an item characteristic curve • Models probability that a person correctly responds to an item given the item parameters (a,b) and their person level θ • D is a constant • a, b notation reversed from biomedical conventions
The 2 PL model • Logit P(Y=1|a,b,θ)=Da(θ-b) • b is the item difficulty • When θ=b, 50% probability of getting the item correct • a is item discrimination • a determines slope around the point where θ=b
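The two slides above can be checked numerically with a minimal sketch of the 2PL item characteristic curve. The function names are illustrative, and the default D = 1.7 is a commonly used scaling constant assumed here; the slides only say that D is a constant.

```python
import math

def icc_2pl(theta, a, b, D=1.7):
    """Probability of a correct response under the 2PL model.

    logit P(Y=1 | a, b, theta) = D * a * (theta - b)
    D = 1.7 is an assumed default; the slides only call D "a constant".
    """
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def icc_slope_at_b(a, D=1.7):
    """Slope of the curve at theta = b, where P = 0.5: D * a * P * (1 - P)."""
    return D * a / 4.0
```

At θ = b the function returns 0.5 for any a, and a larger a gives a steeper curve around that point, matching the roles of difficulty and discrimination described above.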
Modest Uniform DIF • [Figure: item characteristic curves for "Close your eyes" in Spanish and English speakers; horizontal axis from -3 to 3, vertical axis from 0 to 1]
Non-Uniform DIF • [Figure: item category characteristic curves for the item “ability to walk 1 block”, plotted separately for African-Americans (yellow lines) and whites; horizontal axis: physical functioning, -3.0 to 3.0; vertical axis: probability of endorsing, 0 to 1]
Uniform and Non-uniform DIF • [Figure: item characteristic curves for "Repeating Phrase" in English and Spanish speakers; horizontal axis from -3 to 3, vertical axis from 0 to 1]
Logistic regression applied to DIF detection • Swaminathan and Rogers (1990) • Tested two models: • P(Y=1|X, group)=f(β1X+β2*group+β3*X*group) • P(Y=1|X)=f(β1X) • Compared the difference in –2 log likelihoods between these two models to a chi-squared distribution with 2 df • Uniform and non-uniform DIF tested at the same time
Camilli and Shepard (1994) • Recommended a two step procedure, to first test for non-uniform DIF and then for uniform DIF • P(Y=1|X, group)=f(β1X+β2*group+β3*X*group) • P(Y=1|X, group)= f(β1X+β2*group) • P(Y=1|X)=f(β1X) • -2 log likelihoods of each pair of models compared to determine non-uniform DIF and uniform DIF in two separate steps
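The nested-model comparisons on this slide and the previous one can be sketched as likelihood ratio tests. This is a minimal standard-library illustration, not the DIFdetect implementation: the function names are hypothetical, an intercept is added to every model (the slides omit it), and the fit uses plain Newton-Raphson on a binary item with a 0/1 group indicator.

```python
import math

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson; return (beta, log likelihood).

    An intercept column is prepended (an assumption; the slides omit it).
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for r, yi in zip(rows, y):
            eta = sum(bj * xj for bj, xj in zip(beta, r))
            p = 1.0 / (1.0 + math.exp(-eta))
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * r[i] * r[j]
        step = solve(hess, grad)
        beta = [bj + s for bj, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for r, yi in zip(rows, y):
        eta = sum(bj * xj for bj, xj in zip(beta, r))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return beta, loglik

def chi2_sf(x, df):
    """Upper-tail chi-squared probability; closed forms for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 are needed here")

def dif_tests(ability, group, y):
    """Likelihood ratio tests for non-uniform, uniform, and joint DIF."""
    _, ll_base = fit_logistic([[t] for t in ability], y)
    _, ll_unif = fit_logistic([[t, g] for t, g in zip(ability, group)], y)
    _, ll_full = fit_logistic([[t, g, t * g] for t, g in zip(ability, group)], y)
    lr_nonuniform = max(0.0, 2.0 * (ll_full - ll_unif))  # interaction term, 1 df
    lr_uniform = max(0.0, 2.0 * (ll_unif - ll_base))     # group term, 1 df
    lr_joint = max(0.0, 2.0 * (ll_full - ll_base))       # both terms, 2 df
    return {
        "lr_nonuniform": lr_nonuniform, "p_nonuniform": chi2_sf(lr_nonuniform, 1),
        "lr_uniform": lr_uniform, "p_uniform": chi2_sf(lr_uniform, 1),
        "lr_joint": lr_joint, "p_joint": chi2_sf(lr_joint, 2),
    }
```

The two 1-df results correspond to the two-step procedure, and the 2-df joint result corresponds to the single-step test of Swaminathan and Rogers. The ability score can be an observed score X or, as in the later slides, an IRT-derived θ.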
Millsap and Everson (1994) • Dismissive of “observed score” techniques such as logistic regression • X contains several items that have DIF, so adjusting for X is theoretically problematic • Advocated latent approaches such as IRT for DIF detection
Zumbo (1999) • Extended Swaminathan and Rogers framework to ordinal logistic regression case to handle polytomous items • Did not address latent trait; also used a single step rather than two steps
Crane, van Belle, Larson (2004) • Logistic regression model is a re-parameterization of the IRT model, as long as IRT-derived θ estimates are used as ability scores • Raised the issue of multiple hypothesis testing of non-uniform DIF
Crane et al. (2004) – 2 • Biggest change in terms of specific criteria for uniform DIF • Recognized that non-uniform and uniform DIF were analogous to effect modification and confounding • Employed epidemiological thinking about how to detect confounding relationships from the data; size of effect.
Crane et al. (2004) – 3 • Same models used (though now θ not X) • P(Y=1|θ, group)= f(β1θ+β2*group) • P(Y=1|θ)=f(β1'θ) • Determine the impact of including the group term on the magnitude of the relationship between θ and item responses • Determine size of |(β1-β1')/β1|. If this is large, uniform DIF (confounding) is present
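The uniform DIF criterion on this slide reduces to a one-line calculation once both models have been fitted. The sketch below uses hypothetical function names and takes the cut-off as a required argument, because the slides do not fix a value.

```python
def proportional_change(beta1_with_group, beta1_without_group):
    """Return |(β1 - β1') / β1|, with β1 from the model that includes group."""
    return abs((beta1_with_group - beta1_without_group) / beta1_with_group)

def flags_uniform_dif(beta1_with_group, beta1_without_group, threshold):
    """True when the proportional change exceeds a caller-chosen threshold.

    The threshold is an assumption supplied by the analyst; the slides
    do not specify one.
    """
    return proportional_change(beta1_with_group, beta1_without_group) > threshold
```

For example, coefficients of 1.0 (with group) and 0.9 (without group) give a proportional change of 0.1.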
Work still pending • “Optimal” criteria for uniform and non-uniform DIF are unknown • Adjust α for multiple hypotheses? • Effect size for non-uniform DIF? In huge data sets, the interaction term is likely to be significant. • What proportional change in β1 indicates meaningful uniform DIF?
Also under investigation • What is the role of model fit statistics? For example, if NU DIF is present, the model with group and ability only should not fit. • How important is the proportional odds/Graded response assumption? Should stereotype or other models be used in some instances?
DIFdetect package • Can be downloaded from the web • www.alz.washington.edu/DIFDETECT/welcome.html • Stata-based, user-friendly package
For those who tire of clicking • Difd varlist, ABility(str) GRoups(str) [with lots of optional specifications]
Outline revisited • Brief statistical background • DIFdetect package • What do we do when we find DIF? • New, simpler, faster solutions! • Discussion
What to do when we find DIF? • In educational settings, items with DIF are often discarded • Unattractive option for us • Tests are too short as it is; lose variation • Lose precision • DIF doesn’t mean that the item doesn’t measure the underlying construct at all, just that it does so differently in different groups
What do we do – 2 • Need a technique to incorporate items found to have DIF differently than DIF-free items • Precedent for this approach in Reise, Widaman, and Pugh (1993) • Constrain parameters for DIF-free items to be identical across groups • Estimate parameters for items found with DIF separately in appropriate groups
Compensatory DIF • Compensatory DIF occurs when DIF in some items leads to erroneous findings in other items • Both false-positive and false-negative DIF findings
Adjust ability for DIF • Rearrange the data to estimate a DIF-adjusted theta score in PARSCALE • Use that new theta estimate to evaluate for compensatory DIF • Repeat steps 1 and 2 until the same items are identified each time = no compensatory DIF
Modified data set • 0001 12XX2 • 0002 12XX4 • 0003 01XX3 • … • 0132 1X2X2 • 0133 0X1X3 • 0134 1X2X4 • … • 0932 0XX22 • 0933 1XX23 • 0934 0XX14 • …
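One reading of the modified data set is that the second of three items showed DIF and was split into three group-specific columns, with X marking the columns that do not apply to a person's group. The sketch below illustrates that rearrangement under this assumption. The function name, the group labels, and the original three-item response strings are hypothetical; the package's own data preparation may differ.

```python
def split_dif_item(responses, group, dif_index, groups, missing="X"):
    """Replace one item with one column per group.

    responses: string with one character per item, e.g. "122"
    group:     this person's group label
    dif_index: position of the item found to have DIF
    groups:    ordered list of all group labels
    missing:   code for the columns belonging to other groups
    """
    columns = [missing] * len(groups)
    columns[groups.index(group)] = responses[dif_index]
    return responses[:dif_index] + "".join(columns) + responses[dif_index + 1:]
```

With groups ["A", "B", "C"], the response string "122" becomes "12XX2" for a person in group A, "1X2X2" in group B, and "0XX22" results from "022" in group C, matching the pattern of the rows in the modified data set.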
New tools! • Difforpar itemlist, ID(id) RUnname(test0) ABility(ability0) GRoups(group) [with lots of optional specifications] • Look at the log for lack of convergence, dropped variables, nonsense output, and other warnings.
New tools, continued • run PARSCALE with code_test0.psl • run thetain: thetain origdata origid test0 [merges thetatest0 and sethetatest0 into original data set]
The process continues • Repeat steps 1-3 with the new thetas until the same items come up with DIF • For short lists, you can read the log file • For long lists, examine vars_testN.txt • When finished, you can check Difd.dta for model fit and assumptions
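The repeat-until-stable logic can be written as a small loop. In this sketch, `detect_dif` and `rescore` are hypothetical callables standing in for the DIF detection run and the PARSCALE re-estimation; neither is part of the package, and `max_rounds` is an assumed safety limit.

```python
def iterate_until_stable(detect_dif, rescore, theta, max_rounds=20):
    """Alternate DIF detection and re-scoring until the flagged items repeat.

    detect_dif(theta) -> iterable of item names flagged with DIF
    rescore(flagged)  -> new ability estimates that account for those items
    Returns (flagged items, final ability estimates, rounds used).
    """
    previous = None
    for round_no in range(1, max_rounds + 1):
        flagged = frozenset(detect_dif(theta))
        if flagged == previous:
            # Same items as last round: no remaining compensatory DIF.
            return flagged, theta, round_no
        theta = rescore(flagged)
        previous = flagged
    raise RuntimeError("flagged item set did not stabilize")
```

The loop stops as soon as two consecutive rounds flag the same items, which is the stopping rule described on the slide.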
Adjusting for additional groups • mergevirtual origdata originalid [merges itemdata (containing final virtual items) into original data set] • run DIFforPar with the next group, with the new list of some original and virtual items (can copy from vars_testN.txt) and do it all again!
Other tools for Stata • PrePar: writes code and data for Parscale • Syntax: prepar namelist, ID(str) ru() • DIFforSRZ: do-file for DIFdetect using SRZ 1-step criteria • Syntax: run difforsrz abil ru • Set the variable list, group, and criteria in the do-file.
Coming soon • DIFforPar extended for grouped variables with more than 2 categories; continuous in Stata, grouped in Parscale. • Samemetric.ado (for now use: prepardata itemlist, ID(str) RUnname(str)).
Have we adjusted for DIF/controlled for confounding? • Can only adjust for measured covariates • Confounders such as education level may mean different things for different groups • Unmeasured confounders • May lack power or data may be too sparse
Adjusted cognitive ability scores • So far our adjusted scores correlate highly with non-adjusted scores. • May contain additional information. • Language DIF