
DIF detection using (Ordinal) Logistic Regression

DIF detection using (Ordinal) Logistic Regression. Laura Gibbons, PhD Paul K. Crane, MD MPH Internal Medicine University of Washington. Outline. Brief statistical background DIFdetect package What do we do when we find DIF? New, simpler, faster solutions! Discussion.


Presentation Transcript


  1. DIF detection using (Ordinal) Logistic Regression Laura Gibbons, PhD Paul K. Crane, MD MPH Internal Medicine University of Washington

  2. Outline • Brief statistical background • DIFdetect package • What do we do when we find DIF? • New, simpler, faster solutions! • Discussion

  3. Statistical background • Recall the definition of DIF: a demographic characteristic interferes with the expected relationship between ability level and responses to an item • This is a conditional definition; we have to control for ability level, or else we can’t differentiate between DIF and differential test impact

  4. The 2 Parameter Logistic model • Logit P(Y=1|a,b,θ)=Da(θ-b) • Produces an item characteristic curve • Models probability that a person correctly responds to an item given the item parameters (a,b) and their person level θ • D is a constant • a, b notation reversed from biomedical conventions

  5. The 2 PL model • Logit P(Y=1|a,b,θ)=Da(θ-b) • b is the item difficulty • When θ=b, 50% probability of getting the item correct • a is item discrimination • a determines slope around the point where θ=b
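
The two properties on this slide can be checked numerically. The sketch below is only an illustration: the value D = 1.7 is an assumed, conventional choice (the slides say only that D is a constant), and the function name and item parameters are made up.

```python
import math

D = 1.7  # assumed scaling constant; the slides only say "D is a constant"


def p_correct(theta, a, b, d=D):
    """2PL probability of a correct response: logit P = D * a * (theta - b)."""
    return 1.0 / (1.0 + math.exp(-d * a * (theta - b)))


# When theta equals b the logit is zero, so the probability is one half.
print(p_correct(theta=0.5, a=1.2, b=0.5))

# Just above theta = b, a larger discrimination a gives a higher probability,
# that is, a steeper curve around the point where theta = b.
for a in (0.5, 1.0, 2.0):
    print(a, p_correct(theta=0.6, a=a, b=0.5))
```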

  6. Modest Uniform DIF • [Figure: item characteristic curves for "Close your eyes" in Spanish and English speakers; y-axis from 0 to 1, x-axis from -3 to 3]

  7. Non-Uniform DIF • [Figure: item category characteristic curves for the item “ability to walk 1 block”, shown separately for African-Americans (yellow lines) and whites; y-axis "Probability of endorsing" from 0 to 1, x-axis "Physical functioning" from -3.0 to 3.0]

  8. Uniform and Non-uniform DIF • [Figure: item characteristic curves for "Repeating Phrase" in English and Spanish speakers; y-axis from 0 to 1, x-axis from -3 to 3]

  9. Logistic regression applied to DIF detection • Swaminathan and Rogers (1990) • Tested two models: • P(Y=1|X, group)=f(β1X+β2*group+β3*X*group) • P(Y=1|X)=f(β1X) • Compared the difference in –2 log likelihoods between these two models to a chi-squared distribution with 2 df • Uniform and non-uniform DIF tested at the same time
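
As a rough sketch of the comparison on this slide, the function below takes the log likelihoods of the two fitted models and returns the likelihood-ratio statistic with its p-value. The log likelihood values are invented for illustration, and fitting the models themselves is assumed to happen elsewhere.

```python
import math


def lr_test_2df(ll_reduced, ll_full):
    """Likelihood-ratio test of P(Y=1|X) against the model with group terms.

    ll_reduced: log likelihood of the ability-only model.
    ll_full: log likelihood of the model with group and X*group terms.
    """
    stat = (-2.0 * ll_reduced) - (-2.0 * ll_full)
    # For a chi-squared distribution with 2 df, P(X > x) = exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Made-up log likelihoods, for illustration only.
stat, p_value = lr_test_2df(ll_reduced=-520.4, ll_full=-515.1)
print(stat, p_value)
```

A small p-value suggests that group membership adds information beyond ability, without saying whether the DIF is uniform or non-uniform.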

  10. Camilli and Shepard (1994) • Recommended a two step procedure, to first test for non-uniform DIF and then for uniform DIF • P(Y=1|X, group)=f(β1X+β2*group+β3*X*group) • P(Y=1|X, group)= f(β1X+β2*group) • P(Y=1|X)=f(β1X) • -2 log likelihoods of each pair of models compared to determine non-uniform DIF and uniform DIF in two separate steps
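
The two-step procedure can be sketched as two 1-df likelihood-ratio tests on the three nested models. This is an illustration under stated assumptions: the significance level of 0.05 and the log likelihood values are made up, and the function names are hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def two_step_dif(ll_ability, ll_group, ll_interaction, alpha=0.05):
    """Return (non_uniform, uniform) flags from three nested log likelihoods.

    ll_ability: model with X only.
    ll_group: model with X and group.
    ll_interaction: model with X, group and X*group.
    """
    # Step 1: does the X*group interaction improve on the group model?
    p_non_uniform = chi2_sf_1df(2.0 * (ll_interaction - ll_group))
    # Step 2: does the group term improve on the ability-only model?
    p_uniform = chi2_sf_1df(2.0 * (ll_group - ll_ability))
    return p_non_uniform < alpha, p_uniform < alpha


# Made-up log likelihoods, for illustration only.
print(two_step_dif(ll_ability=-520.4, ll_group=-516.0, ll_interaction=-515.1))
```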

  11. Millsap and Everson (1994) • Dismissive of “observed score” techniques such as logistic regression • X contains several items that have DIF, so adjusting for X is theoretically problematic • Advocated latent approaches such as IRT for DIF detection

  12. Zumbo (1999) • Extended Swaminathan and Rogers framework to ordinal logistic regression case to handle polytomous items • Did not address latent trait; also used a single step rather than two steps

  13. Crane, van Belle, Larson (2004) • Logistic regression model is a re-parameterization of the IRT model, as long as IRT-derived θ estimates are used as ability scores • Raised the issue of multiple hypothesis testing of non-uniform DIF
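
The re-parameterization can be seen by expanding Da(θ-b) = (Da)θ + (-Dab). The sketch below assumes the logistic regression includes an intercept (the slides write the models without one) and uses D = 1.7 as an assumed constant; the function names are illustrative.

```python
import math


def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def irt_to_logistic(a, b, d=1.7):
    """Map 2PL parameters to (intercept, slope) of a logistic regression on theta."""
    return -d * a * b, d * a


def logistic_to_irt(intercept, slope, d=1.7):
    """Recover (a, b) from the regression coefficients."""
    return slope / d, -intercept / slope


intercept, slope = irt_to_logistic(a=1.2, b=0.5)
for theta in (-2.0, 0.0, 0.5, 2.0):
    # Both parameterizations give the same probability at every theta.
    print(theta, logistic(intercept + slope * theta), logistic(1.7 * 1.2 * (theta - 0.5)))
```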

  14. Crane et al. (2004) – 2 • Biggest change in terms of specific criteria for uniform DIF • Recognized that non-uniform and uniform DIF were analogous to effect modification and confounding • Employed epidemiological thinking about how to detect confounding relationships from the data; size of effect.

  15. Crane et al. (2004) – 3 • Same models used (though now θ not X) • P(Y=1|θ, group)= f(β1θ+β2*group) • P(Y=1|θ)=f(β1'θ) • Determine the impact of including the group term on the magnitude of the relationship between θ and item responses • Determine size of |(β1-β1')/β1|. If this is large, uniform DIF (confounding) is present
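
The uniform DIF criterion on this slide amounts to a one-line calculation. In the sketch below the 10% threshold is purely an illustrative assumption (the slides treat the meaningful size of this change as an open question), and the coefficients are made up.

```python
def proportional_change(beta1, beta1_prime):
    """|(β1 - β1') / β1|, where β1 comes from the model with the group term
    and β1' from the model with theta only."""
    return abs((beta1 - beta1_prime) / beta1)


def has_uniform_dif(beta1, beta1_prime, threshold=0.10):
    """Flag uniform DIF when the proportional change exceeds the threshold.

    The default threshold is a placeholder, not a recommended value.
    """
    return proportional_change(beta1, beta1_prime) > threshold


# Made-up coefficients, for illustration only.
print(proportional_change(beta1=1.50, beta1_prime=1.80))
print(has_uniform_dif(beta1=1.50, beta1_prime=1.80))
```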

  16. Work still pending • “Optimal” criteria for uniform and non-uniform DIF are unknown • Adjust α for multiple hypotheses? • Effect size for non-uniform DIF? In huge data sets, likely to have a significant interaction term. • What proportional change in β1 is meaningful UDIF?
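
One possible answer to the α question, shown here only as an example and not as the authors' choice, is a Bonferroni correction: divide α by the number of items tested for non-uniform DIF. The item names and p-values below are made up.

```python
def bonferroni_flags(p_values, alpha=0.05):
    """Flag items whose interaction p-value is below alpha / number of tests."""
    threshold = alpha / len(p_values)
    return {item: p < threshold for item, p in p_values.items()}


# Hypothetical interaction-term p-values for four items.
p_values = {"item1": 0.030, "item2": 0.004, "item3": 0.200, "item4": 0.012}
print(bonferroni_flags(p_values))
```

With four tests the per-item threshold is 0.0125, so an item with p = 0.030 that would be flagged at an unadjusted 0.05 is no longer flagged.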

  17. Also under investigation • What is the role of model fit statistics? For example, if NU DIF is present, the model with group and ability only should not fit. • How important is the proportional odds/Graded response assumption? Should stereotype or other models be used in some instances?

  18. DIFdetect package • Can download from the web • www.alz.washington.edu/DIFDETECT/welcome.html • STATA-based user friendly package

  19. For those who tire of clicking • Difd varlist, ABility(str) GRoups(str) [with lots of optional specifications]

  20. Outline revisited • Brief statistical background • DIFdetect package • What do we do when we find DIF? • New, simpler, faster solutions! • Discussion

  21. What to do when we find DIF? • In educational settings, items with DIF are often discarded • Unattractive option for us • Tests are too short as it is; lose variation • Lose precision • DIF doesn’t mean that the item doesn’t measure the underlying construct at all, just that it does so differently in different groups

  22. What do we do – 2 • Need a technique to incorporate items found to have DIF differently than DIF-free items • Precedent for this approach in Reise, Widaman, and Pugh (1993) • Constrain parameters for DIF-free items to be identical across groups • Estimate parameters for items found with DIF separately in appropriate groups

  23. Compensatory DIF • Compensatory DIF occurs when DIF in some items leads to erroneous findings in other items • Both false-positive and false-negative DIF findings

  24. Adjust ability for DIF • Rearrange the data to estimate a DIF-adjusted theta score in PARSCALE • Use that new theta estimate to evaluate for compensatory DIF • Repeat steps 1 and 2 until the same items are identified each time = no compensatory DIF
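
The repeat-until-stable logic can be sketched as a loop around two steps. Both callables below are hypothetical stand-ins: in the workflow described here, theta estimation would be a PARSCALE run and DIF detection a logistic regression pass, neither of which is reproduced.

```python
def iterate_until_stable(detect_dif, estimate_theta, items, max_rounds=20):
    """Alternate theta estimation and DIF detection until the flagged set repeats."""
    flagged = frozenset()
    theta = estimate_theta(items, flagged)  # initial, unadjusted theta
    for round_number in range(1, max_rounds + 1):
        new_flagged = frozenset(detect_dif(items, theta))
        if new_flagged == flagged:
            # Same items identified as last time: no compensatory DIF remains.
            return flagged, round_number
        flagged = new_flagged
        theta = estimate_theta(items, flagged)  # DIF-adjusted theta
    raise RuntimeError("flagged items did not stabilise")


# Toy stand-ins so the loop can be run; they carry no statistical meaning.
def toy_estimate_theta(items, flagged):
    return len(flagged)


def toy_detect_dif(items, theta):
    return {"A"} if theta == 0 else {"A", "B"}


print(iterate_until_stable(toy_detect_dif, toy_estimate_theta, ["A", "B", "C"]))
```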

  25. Rearrange data for PARSCALE

  26. 0001 12XX2 0002 12XX4 0003 01XX3 … 0132 1X2X2 0133 0X1X3 0134 1X2X4 … 0932 0XX22 0933 1XX23 0934 0XX14 … Modified data set
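
One reading of this data set, offered as an assumption, is that the middle item was found to have DIF across three groups and has been split into three group-specific "virtual" items, with X marking the columns that do not apply to a person. Under that reading, the rearrangement can be sketched as follows; the group labels, function names, and raw responses are made up.

```python
MISSING = "X"  # placeholder for virtual items that do not apply to this person


def split_item(response, group, groups):
    """Spread one response across group-specific virtual items."""
    return [str(response) if g == group else MISSING for g in groups]


def build_record(person_id, responses, group, dif_items, groups):
    """Build one line: a 4-digit ID, then responses with DIF items split by group."""
    fields = []
    for index, response in enumerate(responses):
        if index in dif_items:
            fields.extend(split_item(response, group, groups))
        else:
            fields.append(str(response))
    return f"{person_id:04d} {''.join(fields)}"


groups = ["A", "B", "C"]   # hypothetical group labels
dif_items = {1}            # assume the second of three items has DIF
print(build_record(1, [1, 2, 2], "A", dif_items, groups))
print(build_record(132, [1, 2, 2], "B", dif_items, groups))
print(build_record(932, [0, 2, 2], "C", dif_items, groups))
```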

  27. New tools! • Difforpar itemlist, ID(id) RUnname(test0) ABility(ability0) GRoups(group) [with lots of optional specifications] Look at log for lack of convergence, dropped variables, nonsense output, and other warnings.

  28. New tools, continued • run PARSCALE with code_test0.psl • run thetain: thetain origdata origid test0 [merges thetatest0 and sethetatest0 into original data set]

  29. The process continues • Repeat steps 1-3 with the new thetas until the same items come up with DIF • For short lists, you can read the log file • For long lists, examine vars_testN.txt • When finished, you can check Difd.dta for model fit and assumptions

  30. Adjusting for additional groups • mergevirtual origdata originalid [merges itemdata (containing final virtual items) into original data set] • run DIFforPar with the next group, with the new list of some original and virtual items (can copy from vars_testN.txt) and do it all again!

  31. Other tools for Stata • PrePar Writes code and data for Parscale • Syntax: prepar namelist, ID(str) ru() • DIFforSRZ Do file for DIFdetect using SRZ 1-step criteria • Syntax: run difforsrz abil ru • Set variable list, group, criteria, in the do file.

  32. Coming soon • DIFforPar extended for grouped variables with more than 2 categories; continuous in Stata, grouped in Parscale. • Samemetric.ado (for now use: prepardata itemlist, ID(str) RUnname(str)).

  33. Have we adjusted for DIF/controlled for confounding? • Can only adjust for measured covariates • Confounders such as education level may mean different things for different groups • Unmeasured confounders • May lack power or data may be too sparse

  34. Adjusted cognitive ability scores • So far our adjusted scores correlate highly with non-adjusted scores. • May contain additional information. • Language DIF

  35. Questions and comments
