300 likes | 1.71k Views
Differential Item Functioning. Differential item functioning (DIF) occurs when people from different groups (e.g gender or ethnicity) with the same underlying latent trait score have a different probability of responding to an item in a particular way. Group differences in item responses (or on latent variables) do not reflect DIF per se (e.g females score higher than males on a particular item or scale). DIF is only present if people from different groups with the same underlying ab33437
E N D
1. 1 Differential Item Functioning in Mplus Summer School
Week 2
2. Differential Item Functioning Differential item functioning (DIF) occurs when people from different groups (e.g gender or ethnicity) with the same underlying latent trait score have a different probability of responding to an item in a particular way.
Group differences in item responses (or on latent variables) do not reflect DIF per se (e.g females score higher than males on a particular item or scale).
DIF is only present if people from different groups with the same underlying ability (or trait level) have a different probability of response.
Reise, Widaman, Pugh 1993; Psych Bulletin, Vol 114, 3 552-566
Embretson,S.E., Reise,S.P. (2000). Item Response Theory for Psychologists.
3. DIF – Measurement Non-Invariance
If the probablity of item response is the same (among different sub-groups with the same underlying ability) measurement invariance is assumed
If the probablity of response is different (among different sub-groups with the same underlying ability) than measurement non-invariance is assumed.
4. Types of DIF Uniform: DIF occurs uniformly at all levels along the latent trait
Non-Uniform : DIF does not occur equally at all points on the latent trait (e.g. gender differences in response) may only be evident at high or low levels of the construct
Crane et al (2004) describe uniform DIF to be analogous to ‘confounding’ in epidemiology and non-uniform DIF with ‘effect modification’ – i.e. interaction between trait level, group assignment and item responses
5. Example of Item with Uniform DIF
6. Example of Item with Non-Uniform DIF
7. 7 Definition of DIF (Mellenbergh, 1989)
8. 8 Definition of DIF (Mellenbergh, 1989)
9. Definition of DIF (Mellenbergh, 1989)
10. Differential Item Functioning Important first step in the evaluation of test bias
For construct validity items of a scale ideally should have little or no DIF
Items should function in the same way across subgroups of respondents who have the same underlying ability (or level on the latent trait)
Presence of DIF may compromise comparison across subgroups – give misleading results
Confound interpretation of observed variables
11. Methods to identify DIF Parametric
Mantel-Haenszel (MH) (Holland & Thayer, 1988)
Non-parametric methods
Logistic regression (Zumbo, 1999)
Ordinal logistic regression (Crane et al, 2004)
MIMIC models (Muthen, 2004)
Multiple group models
IRT based methods (Thissen, 1991)
Good review by Teresi (2006) Medical Care Vol 44
12. What to do if DIF present Remove items?
1) Ok if you have a large item pool and the item can be replaced with a item measuring similar threshold / discrimination parameters
2) But dropping items might adversely affect the content validity of the instrument.
3) May end up with an instrument that is not comparable to other research using that instrument
Look for causes of DIF
What do all the DIF items have in common e.g.
Are they all negatively or positively worded
Are they all at end of study
Readability etc
How do they differ from the invariant items?
13. How to adjust for DIF Adjust for DIF in the model – in Mplus can do this by adding direct effect between the covariate and the item
Crane et al (2004, 2006)
a) items without DIF have item parameters estimated from whole sample – (anchors)
b) items with DIF have parameters estimated separately in different subgroups
)
)
14. Two Examples of Identifying DIF Mplus :
MIMIC Model (Multiple Indicators, Multiple Causes) Uniform DIF
Stata:
DIFd program (Crane et al, 2004) Non-Uniform and Uniform DIF
15. Mplus Example - MIMIC Model:BCS70 Externalising (Conduct) Scale 03 Teenager often destroys belongings
04 Teenager frequently fights with others
10 Teenager sometimes takes others' things
14 Teenager is often disobedient
18 Teenager often tells lies
19 Teenager bullies others
Mother’s rating of teenager on Rutter Scale age 16
Ordinal 3 category scale (0=does not apply, 1=applies somewhat, 2=certainly applies)
16. CFA Model for BCS70 Externalising
17. Mimic Model Stages of identifying potential DIF 1. Run CFA model without covariates
18. Stage 1-3: Mplus CFA USEVARIABLES are rut03 rut04 rut10 rut14 rut18 sex;
CATEGORICAL are rut03 rut04 rut10 rut14 rut18;
Missing are all ( 88 999 );
ANALYSIS:
ESTIMATOR IS wlsmv;
ITERATIONS = 1000;
CONVERGENCE = 0.00005;
MODEL:
CONDUCT by rut03 rut04 rut10 rut14 rut18; ! (define latent variable)
OUTPUT: SAMPSTAT STANDARDIZED RES MOD(10) ;
19. CFA Mimic Model
20. Check MOD indices
M.I. E.P.C. Std E.P.C. StdYX E.P.C.
ON Statements
RUT03 ON SEX 82.578 -0.354 -0.354 -0.176
RUT04 ON SEX 23.839 0.143 0.143 0.071
21. Stage 4: Mplus MIMIC DIF USEVARIABLES are rut03 rut04 rut10 rut14 rut18 sex;
CATEGORICAL are rut03 rut04 rut10 rut14 rut18;
Missing are all ( 88 999 );
ANALYSIS:
ESTIMATOR IS wlsmv;
ITERATIONS = 1000;
CONVERGENCE = 0.00005;
MODEL:
CONDUCT by rut03 rut04 rut10 rut14 rut18; ! (define latent variable)
RUT04 ON sex ; ! MI in run 4b 15.221 so may not be required ??
OUTPUT: SAMPSTAT STANDARDIZED RES MOD(10) ;
22. CFA Mimic Model (DIF)
23. CFA Mimic model fit
24. Mplus results Model (2) Initial Mimic Model (no direct effects)
Estimate S.E. Est./S.E. P-Value Std
CONDUCT ON SEX -0.126 0.022 -5.789 0.000 -0.169
Model 4(b) Add 2 direct effects
Estimate S.E. Est./S.E. P-Value Std
CONDUCT ON SEX -0.113 0.022 -5.203 0.000 -0.152
RUT03 ON SEX -0.336 0.044 -7.597 0.000 -0.336
RUT04 ON SEX 0.112 0.032 3.481 0.000 0.112
25. Mplus results Model (2) Initial Mimic Model (no direct effects)
Estimate S.E. Est./S.E. P-Value Std
CONDUCT ON SEX -0.126 0.022 -5.789 0.000 -0.169
Model 4(b) Add 2 direct effects
Estimate S.E. Est./S.E. P-Value Std
CONDUCT ON SEX -0.113 0.022 -5.203 0.000 -0.152
RUT03 ON SEX -0.336 0.044 -7.597 0.000 -0.336
RUT04 ON SEX 0.112 0.032 3.481 0.000 0.112
26. In a Graded Response Model...
27. In a Graded Response Model...
28. In a Graded Response Model...
29. Exercise Work through MIMIC modelling stages...
...using multivariate probit regression model implemented by WLSMV (equivalent to normal ogive IRT model for polyomous items)
...using the Graded Response Model implemented by full-information maximum likelihood (MLR)