
Using Confirmatory Factor Analysis to Study Measurement Invariance. Roger E. Millsap

Using Confirmatory Factor Analysis to Study Measurement Invariance. Roger E. Millsap Arizona State University Talk given at Hispanic Health Disparities Research Center, UTEP, May 23, 2011.


Presentation Transcript


  1. Using Confirmatory Factor Analysis to Study Measurement Invariance. Roger E. Millsap Arizona State University Talk given at Hispanic Health Disparities Research Center, UTEP, May 23, 2011

  2. Suppose that you have a multiple-item scale, to be given to respondents who can be classified in at least two groups (usually demographic). You want to know if the scale and item scores can be interpreted in the same way for members of different groups. For example, for two people from different groups who are identical on the latent variable that is the target of the scale, do we expect these people to achieve the same scale or item scores, apart from random error? This is the question of measurement invariance.

  3. Four Questions and an Example
     1) What is measurement invariance?
     2) How is it defined in the context of factor analysis?
     3) What should I think about before investigating invariance in my data using factor analysis?
     4) How do I use factor analysis to evaluate invariance?
     Example using confirmatory factor analysis (CFA).

  4. What is measurement invariance?
     X = test score, questionnaire score, or item score
     W = latent variable(s) that X is designed to measure
     V = group membership indicator (e.g., gender)
     Does the relationship of the score X to W vary depending on V? That is, is P(X | W, V) equal to P(X | W)?

  5. Example: Gender difference on a depression item. Suppose X is the score on an item in a scale measuring depression, W is the latent variable that represents a person’s actual (unknown) depression level, and V is a gender indicator. The invariance question is: For a male and a female who are identical on W, do we expect both persons to have the same probability of achieving any particular score on X? If the answer is “yes,” we say that X shows measurement invariance in relation to W and gender. “Same probability” means that we expect the two persons to have the same item score, although their observed scores may differ in practice.

  6. Suppose that X is NOT invariant in measurement. Then for a male and female who are identical on the latent depression variable, we expect to see systematic differences in scores on the depression item. In this case, trying to make statements about gender differences in depression will be difficult. Any gender differences found on the depression item will be confounded by the differences due to the lack of invariance. Here we will say that the item lacking invariance is biased in relation to depression and gender.

  7. If the depression item is invariant in measurement, it can still be true that there are gender differences in the distribution of item scores. These differences reflect gender differences on the latent depression variable W, and we expect to find such differences. What is precluded under invariance are gender differences among individuals who are matched on latent depression. How can we match individuals on a latent variable that cannot be directly measured? That is a good question…

  8. Two Strategies for Evaluating Measurement Invariance 1) Model the latent variable/measured variable relationship. Check to see that relevant model parameters are the same across groups. Examples: factor analysis, item response theory, latent class analysis. 2) Use a measured variable as a “proxy” for W, the latent variable. Match individuals on the proxy and proceed to examine group differences on the test or test item. We won’t pursue this option here, but it is widely used.

  9. How is measurement invariance defined in the factor analysis model? We are most interested in confirmatory factor analysis (CFA), where it is possible to formulate and test specific hypotheses about factor structure. X is now a px1 vector of item or subtest scores. W is an rx1 vector of scores on r common factors, with r < p. V is still an indicator variable that defines the groups. We will use software that can perform multiple-group CFA.

  10. The Factor Model for Multiple Groups

      $X_k = \tau_k + \Lambda_k W_k + u_k$

      $X_k$: scores on p measures in group k
      $\tau_k$: latent measurement intercepts for group k (p x 1 vector)
      $\Lambda_k$: factor loadings for group k (p x r matrix)
      $W_k$: common factor scores in group k
      $u_k$: unique factor scores in group k
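
To make the model on this slide concrete, here is a minimal simulation sketch. It assumes the usual linear form, scores = intercepts + loadings x common factors + unique factors, and it simplifies by drawing the common factors independently (the talk allows them to correlate). The function name `simulate_group` and all parameter values are hypothetical, chosen only for illustration.

```python
import random


def simulate_group(n, tau, lam, kappa, phi_sd, theta_sd, seed=0):
    """Draw n score vectors x = tau + lam * w + u for one group.

    Simplifying assumptions (not from the talk): the common factors are
    independent normals with means kappa and standard deviations phi_sd;
    the unique factors are independent normals with mean 0 and standard
    deviations theta_sd.
    """
    rng = random.Random(seed)
    p, r = len(lam), len(kappa)
    rows = []
    for _ in range(n):
        w = [rng.gauss(kappa[j], phi_sd[j]) for j in range(r)]
        u = [rng.gauss(0.0, theta_sd[i]) for i in range(p)]
        x = [tau[i] + sum(lam[i][j] * w[j] for j in range(r)) + u[i]
             for i in range(p)]
        rows.append(x)
    return rows


# Hypothetical parameters: p = 4 measures, r = 2 factors.
tau = [1.0, 2.0, 3.0, 4.0]
lam = [[1.0, 0.0], [0.8, 0.2], [0.1, 0.9], [0.0, 1.0]]
kappa = [0.5, -0.5]

rows = simulate_group(5000, tau, lam, kappa,
                      phi_sd=[1.0, 1.0], theta_sd=[0.5] * 4, seed=1)

# Sample means should sit near the model-implied means tau + lam * kappa.
sample_means = [sum(col) / len(col) for col in zip(*rows)]
implied_means = [tau[i] + sum(lam[i][j] * kappa[j] for j in range(2))
                 for i in range(4)]
```

Running the function twice with the same `tau`, `lam`, and `theta_sd` but different `kappa` or `phi_sd` gives two groups that differ only in their factor distributions.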

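  11. Under the factor model with normality assumptions, we can write
      $P(X \mid W, V = k) = \mathrm{MVN}(\tau_k + \Lambda_k W, \Theta_k)$,
      where $\tau_k$, $\Lambda_k$, and $\Theta_k$ are the intercepts, factor loadings, and unique factor covariance matrix in group k. If full measurement invariance is to hold, we must have
      $\tau_k = \tau$, $\Lambda_k = \Lambda$, $\Theta_k = \Theta$ for all k.
      No group differences in intercepts, loadings, or unique variances. This condition is known as strict factorial invariance. Note that invariance in factor means and factor covariances is not required.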

  12. What does strict factorial invariance imply? 1) Any systematic group differences in the means on the measured variables are due to group differences on the factors. 2) Any systematic group differences in covariances or correlations among the measured variables are due to group differences on the factors. 3) Any systematic group differences in the regressions of one measured variable on any of the other measured variables are due to group differences on the factors.
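
The first two implications can be read off the mean and covariance structure of the factor model. The sketch below uses symbols that are assumptions here rather than notation taken from the slides: $\kappa_k$ and $\Phi_k$ for the factor mean vector and factor covariance matrix in group k, and $\tau$, $\Lambda$, $\Theta$ for the invariant intercepts, loadings, and unique factor covariance matrix. It also assumes unique factors with mean zero that are uncorrelated with the common factors.

```latex
\begin{aligned}
\mu_k &= \tau + \Lambda\kappa_k, &
\Sigma_k &= \Lambda\Phi_k\Lambda^{\top} + \Theta, \\
\mu_1 - \mu_2 &= \Lambda(\kappa_1 - \kappa_2), &
\Sigma_1 - \Sigma_2 &= \Lambda(\Phi_1 - \Phi_2)\Lambda^{\top}.
\end{aligned}
```

Under strict invariance, $\tau$ and $\Theta$ cancel in the group differences, so only the factor means and factor covariances remain as sources of systematic group differences.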

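  13. Strict factorial invariance is an ideal that may not be achieved. We also have weaker forms of factorial invariance, where $\Lambda_k$ is the factor loading matrix and $\tau_k$ the intercept vector in group k: 1) Metric or Pattern Invariance: The factor loadings are invariant. For all k, $\Lambda_k = \Lambda$. 2) Scalar or Strong Invariance: Both loadings and intercepts are invariant. For all k, $\Lambda_k = \Lambda$ and $\tau_k = \tau$.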

  14. Metric Invariance (with no scalar invariance) [Figure: plot of X against W.]
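
A small numeric sketch may help in place of the slide's plot. Assuming a single factor and a linear model, metric invariance without scalar invariance means the two groups share a loading (slope) but have different intercepts, so the expected score lines are parallel. All values below are hypothetical.

```python
def expected_score(w, intercept, loading):
    """Expected score E(X | W = w) under a one-factor linear model."""
    return intercept + loading * w


loading = 0.8                                   # same in both groups
intercepts = {"group_1": 2.0, "group_2": 2.5}   # differ across groups

gaps = []
for w in (-1.0, 0.0, 1.0):
    e1 = expected_score(w, intercepts["group_1"], loading)
    e2 = expected_score(w, intercepts["group_2"], loading)
    gaps.append(round(e2 - e1, 10))
    print(f"W={w:+.1f}  group_1={e1:.2f}  group_2={e2:.2f}  gap={e2 - e1:.2f}")
```

The gap between groups is the same at every value of W, which is why two people matched on W are still expected to score differently when only the loadings are invariant.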

  15. We may have difficulty even achieving metric invariance. It is sometimes best to start with an even weaker hypothesis of configural invariance that says: 1) There are the same number of factors in each group. 2) The variables that load on each factor are the same in each group (although the loading values may differ). The second point assumes that your model has more than one factor, and that at least some variables are restricted to load on only one factor. Those variables must load on the same factor across groups.

  16. Example: Configural Invariance with r=2 factors Zero elements are in the same locations across groups. Nonzero elements are in the same locations as well, but may differ in value across groups.
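
As an illustration of what "same locations" means, the sketch below compares the zero pattern of two loading matrices. The matrices are hypothetical (4 variables, r = 2 factors) and are not the ones shown in the talk. In a real analysis the pattern is imposed through the fixed zeros in the model specification rather than checked on estimates.

```python
def same_pattern(loadings_a, loadings_b):
    """Return True if both loading matrices have zeros in the same cells."""
    if len(loadings_a) != len(loadings_b):
        return False
    for row_a, row_b in zip(loadings_a, loadings_b):
        if len(row_a) != len(row_b):
            return False
        for a, b in zip(row_a, row_b):
            if (a == 0) != (b == 0):
                return False
    return True


# Hypothetical loadings: same zero locations, different nonzero values.
group_1 = [[0.9, 0.0], [0.7, 0.0], [0.0, 0.8], [0.0, 0.6]]
group_2 = [[0.8, 0.0], [0.5, 0.0], [0.0, 0.9], [0.0, 0.7]]

# Hypothetical violation: the second variable loads on the other factor.
group_3 = [[0.8, 0.0], [0.0, 0.5], [0.0, 0.9], [0.0, 0.7]]

print(same_pattern(group_1, group_2))  # True
print(same_pattern(group_1, group_3))  # False
```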

  17. Testing for factorial invariance We start with weaker forms of invariance, usually configural invariance. This model is tested for fit to the data. Invariance constraints are added gradually until either strict invariance is achieved, or any further constraints would produce lack of fit. Evaluation of model fit is done using global and local fit indices as in any application of CFA. We will illustrate this process through an example….

  18. What should I think about before investigating invariance in my data using factor analysis? 1) Do you already know the factor structure of the scale in at least one group? Or have you no firm idea? If you have no clear idea about the factor structure, it may be because: --it is a new scale that has not been studied, --it is not a new scale, but there is no clear understanding of the structure in the literature If you have no firm idea, it is probably premature to study invariance at this time.

  19. 2) Are you looking at invariance in items, subtests, or whole tests? The problem with item-level analyses is one of scale, as item response scales are usually discrete with few values. For item-level analyses with items having 4 or fewer response scale points, use a method of CFA appropriate for discrete measures. Another option is to combine item scores into parcels and do the analyses at that level, but this option has weaknesses. The parceling may obscure violations of invariance.

  20. 3) What sample sizes are available? If you have fewer than 100 cases per group, an invariance analysis is probably not worth pursuing, regardless of the number of variables. Many CFAs for invariance are done with 200-300 per group, and that is usually adequate unless the number of variables is large (e.g., p > 20). CFAs for discrete measures require larger sample sizes. The ideal there is to have sample sizes of 500 or more per group. Sample sizes do not need to be equal across groups. Sometimes there are large inequalities in sample sizes…

  21. 4) What software will I need? You need software that can perform multiple-group CFA, with constraints placed across groups. Some examples include: LISREL, Mplus, EQS, Amos. If you are going to analyze discrete item data, LISREL and Mplus are the two programs that can do it and are widely available. Mx is another program that can do all of these analyses, but it requires some programming skill. It has recently been integrated into R.

  22. Example: WAIS-R subtests for males and females The Wechsler Adult Intelligence Scale-Revised (WAIS-R) is the most widely used adult intelligence scale. It contains 11 subtests. Our analyses are at the subtest level. Data are from the 1980 standardization sample, with 940 males and 940 females providing data. The LISREL files to be used here can be downloaded from the following website: www.public.asu.edu/~millsap/SAMI.html

  23. Information: general knowledge questions
      Vocabulary: vocabulary test
      Comprehension: explain what to do in specific situations or analyze the situation
      Similarities: describe how pairs of nouns are alike
      Picture Completion: what is missing from this picture…
      Picture Arrangement: arrange pictures to tell a story
      Block Design: arrange colored blocks to match designs

  24. Mean Subtest Scores

              Males    Females
      Info    9.822    8.978
      Voca    9.412    9.252
      Comp    9.606    9.329
      Simi    9.007    8.973
      Pcmp    9.151    8.659
      Parr    8.967    8.537
      Bdes    9.159    8.421

  25. Model one: Two factors with no invariance constraints and minimal identification constraints. Vocabulary subtest forced to load only on factor one. Block Design subtest forced to load only on factor two. All other subtests can load on multiple factors. Two factors permitted to correlate without restrictions. This model tests whether a two-factor model can fit in both males and females.

  26. Bias bk WAISR data: males 2 factor initial model
      da ng=2 ni=12 no=940 ma=cm
      la
      sex info digs voca arit comp simi pcmp parr bdes obja dsym
      ra fi=jwaisrm.dat
      se
      2 4 6 7 8 9 10 /
      mo nx=7 nk=2 lx=fr,fu td=fi,sy ka=fr tx=fr
      fi lx 2 1 lx 2 2
      fi lx 7 1 lx 7 2
      va 1.0 lx 2 1 lx 7 2
      fi tx 2 tx 7
      fr td 1 1 td 2 2 td 3 3 td 4 4 td 5 5 td 6 6 td 7 7
      ou sc rs mi nd=3 it=500 ad=off

      Bias bk WAISR data: females initial model
      da ng=2 ni=12 no=940 ma=cm
      la
      sex info digs voca arit comp simi pcmp parr bdes obja dsym
      ra fi=jwaisrf.dat
      se
      2 4 6 7 8 9 10 /
      mo nx=7 nk=2 lx=fr,fu td=fi,sy ka=fr tx=fr
      fi lx 2 1 lx 2 2
      fi lx 7 1 lx 7 2
      va 1.0 lx 2 1 lx 7 2
      fi tx 2 tx 7
      fr td 1 1 td 2 2 td 3 3 td 4 4 td 5 5 td 6 6 td 7 7
      ou sc rs mi nd=3 it=500 ad=off

  27. Model one fit evaluation The fit is not perfect but is fairly good. We conclude that the two-factor model is adequate for both groups. Details: Chi-sq (16) = 32.818 p<.05 RMSEA = .0333 Standardized Root Mean Square Residual (SRMR) males = .0088 females = .0078
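
The 16 degrees of freedom reported here can be checked by counting observed moments and free parameters. The sketch below is one reading of the Model one LISREL input, not output from the program. In particular, it assumes that the factor covariance matrix (PH), which the input does not mention, takes LISREL's default of symmetric and free.

```python
def observed_moments(p, groups):
    """Distinct means, variances, and covariances summed over groups."""
    return groups * (p * (p + 1) // 2 + p)


p, groups = 7, 2

# Free parameters per group in Model one, as read from the LISREL input.
free_per_group = {
    "loadings": p * 2 - 4,    # 14 cells; lx 2 1, lx 2 2, lx 7 1, lx 7 2 fixed
    "unique_variances": p,    # td 1 1 through td 7 7
    "intercepts": p - 2,      # tx 2 and tx 7 fixed
    "factor_means": 2,        # ka=fr
    "factor_cov": 3,          # assumed default: 2 variances + 1 covariance
}

free_total = groups * sum(free_per_group.values())
df = observed_moments(p, groups) - free_total
print(observed_moments(p, groups), free_total, df)  # 70 54 16
```

The same bookkeeping applied to the later models (shared loadings, shared intercepts, shared unique variances) reproduces their reported degrees of freedom as constraints are added.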

  28. Model two: Metric invariance (factor loadings invariant) The factor loadings are constrained to invariance across groups. The loading pattern is the same as in Model one. All subtests except Vocabulary and Block Design are allowed to load on two factors. A different model would be to retain two factors, but force each subtest to load only on one of the two factors. No invariance constraints would be introduced.

  29. Bias bk WAISR data: males 2 factor invar pattern matrix
      da ng=2 ni=12 no=940 ma=cm
      la
      sex info digs voca arit comp simi pcmp parr bdes obja dsym
      ra fi=jwaisrm.dat
      se
      2 4 6 7 8 9 10 /
      mo nx=7 nk=2 lx=fr,fu td=fi,sy ka=fr tx=fr
      fi lx 2 2
      fi lx 7 1
      fi ph 1 1 ph 2 2
      va 1.0 ph 1 1 ph 2 2
      fi tx 2 tx 7
      fr td 1 1 td 2 2 td 3 3 td 4 4 td 5 5 td 6 6 td 7 7
      ou sc rs mi nd=3 it=500 ad=off so

      Bias bk WAISR data: females 2 factor invar pattern matrix
      da ng=2 ni=12 no=940 ma=cm
      la
      sex info digs voca arit comp simi pcmp parr bdes obja dsym
      ra fi=jwaisrf.dat
      se
      2 4 6 7 8 9 10 /
      mo nx=7 nk=2 lx=in td=fi,sy ka=fr tx=fr ph=fr
      fi tx 2 tx 7
      fr td 1 1 td 2 2 td 3 3 td 4 4 td 5 5 td 6 6 td 7 7
      ou sc rs mi nd=3 it=500 ad=off so

  30. Model two fit evaluation Some loss in fit compared to Model One, but the fit is still quite good. We conclude that the factor loadings in the two-factor model are invariant. Details: Chi-sq (26) = 66.744 p<.05 RMSEA = .0409 Standardized Root Mean Square Residual (SRMR) males = .0313 females = .0306

  31. Model three: Changing invariant factor pattern to make each subtest load on only one factor. We are forcing an “independent cluster” structure on the loadings to make: Factor 1—Information, Vocabulary, Comprehension, Similarities Factor 2—Picture Completion, Picture Arrangement, Block Design Factor 1 consists of verbal subtests, and factor 2 contains the performance subtests.

  32. Bias bk WAISR data: males 2 factor invar pattern cluster structure
      da ng=2 ni=12 no=940 ma=cm
      la
      sex info digs voca arit comp simi pcmp parr bdes obja dsym
      ra fi=jwaisrm.dat
      se
      2 4 6 7 8 9 10 /
      mo nx=7 nk=2 lx=fr,fu td=fi,sy ka=fr tx=fr
      fi lx 1 2 lx 2 2 lx 3 2 lx 4 2
      fi lx 7 1 lx 5 1 lx 6 1
      fi ph 1 1 ph 2 2
      va 1.0 ph 1 1 ph 2 2
      fi tx 2 tx 7
      fr td 1 1 td 2 2 td 3 3 td 4 4 td 5 5 td 6 6 td 7 7
      ou sc rs mi nd=3 it=500 ad=off so

      Bias bk WAISR data: females 2 factor invar pattern cluster structure
      da ng=2 ni=12 no=940 ma=cm
      la
      sex info digs voca arit comp simi pcmp parr bdes obja dsym
      ra fi=jwaisrf.dat
      se
      2 4 6 7 8 9 10 /
      mo nx=7 nk=2 lx=in td=fi,sy ka=fr tx=fr ph=fr
      fi tx 2 tx 7
      fr td 1 1 td 2 2 td 3 3 td 4 4 td 5 5 td 6 6 td 7 7
      ou sc rs mi nd=3 it=500 ad=off so

  33. Model three fit evaluation Loss of fit is substantial. The results suggest that a pure independent cluster structure is not appropriate here. The Similarities subtest seems to need to load on both factors. Details: Chi-sq (31) = 277.635 p<.05 RMSEA = .0943 Standardized Root Mean Square Residual (SRMR) males = .0511 females = .0478

  34. Model four: Invariant (full) loadings AND invariant intercepts (strong factorial invariance) We return to the original factor structure, allowing subtests to load on both factors except for identification. In addition to invariant loadings, we add invariance constraints on the measurement intercepts. Under this specification, systematic gender differences in subtest means are explained solely by gender differences in the two factors. We need strong invariance to compare means on the subtests across gender.

  35. Bias bk WAISR data: males 2 factor invar pattern, intercepts
      da ng=2 ni=12 no=940 ma=cm
      la
      sex info digs voca arit comp simi pcmp parr bdes obja dsym
      ra fi=jwaisrm.dat
      se
      2 4 6 7 8 9 10 /
      mo nx=7 nk=2 lx=fr,fu td=fi,sy ka=fr tx=fr
      fi lx 2 2
      fi lx 7 1
      fi ph 1 1 ph 2 2
      fi ka 1 ka 2
      va 1.0 ph 1 1 ph 2 2
      fr td 1 1 td 2 2 td 3 3 td 4 4 td 5 5 td 6 6 td 7 7
      ou sc rs mi nd=3 it=500 ad=off so

      Bias bk WAISR data: females 2 factor invar pattern, intercepts
      da ng=2 ni=12 no=940 ma=cm
      la
      sex info digs voca arit comp simi pcmp parr bdes obja dsym
      ra fi=jwaisrf.dat
      se
      2 4 6 7 8 9 10 /
      mo nx=7 nk=2 lx=in td=fi,sy ka=fr tx=in ph=fr
      fr td 1 1 td 2 2 td 3 3 td 4 4 td 5 5 td 6 6 td 7 7
      ou sc rs mi nd=3 it=500 ad=off so

  36. Model four fit evaluation Loss of fit here relative to model two. The results suggest that full invariance constraints on the intercepts are too stringent. The local fit indices suggest that the Information subtest intercepts may not be invariant. Details: Chi-sq (31) = 169.379 p<.05 RMSEA = .0683 Standardized Root Mean Square Residual (SRMR) males = .0292 females = .0336

  37. Model five releases invariance constraint on Information intercept, while retaining all other constraints from model four. This model is an example of partial invariance: Some invariance constraints are removed, while retaining others. In this case, we have partial strong invariance. It is usually difficult to determine which constraints should be removed, especially when there are many variables. Local modification indices can be used to help guide the process of deciding which parameters to free.

  38. Bias bk WAISR data: males invar pattern, intercepts except info
      da ng=2 ni=12 no=940 ma=cm
      la
      sex info digs voca arit comp simi pcmp parr bdes obja dsym
      ra fi=jwaisrm.dat
      se
      2 4 6 7 8 9 10 /
      mo nx=7 nk=2 lx=fr,fu td=fi,sy ka=fr tx=fr
      fi lx 2 2
      fi lx 7 1
      fi ph 1 1 ph 2 2
      fi ka 1 ka 2
      va 1.0 ph 1 1 ph 2 2
      fr td 1 1 td 2 2 td 3 3 td 4 4 td 5 5 td 6 6 td 7 7
      ou sc rs mi nd=3 it=500 ad=off so

      Bias bk WAISR data: females invar pattern, intercepts except info
      da ng=2 ni=12 no=940 ma=cm
      la
      sex info digs voca arit comp simi pcmp parr bdes obja dsym
      ra fi=jwaisrf.dat
      se
      2 4 6 7 8 9 10 /
      mo nx=7 nk=2 lx=in td=fi,sy ka=fr tx=fr ph=fr
      eq tx 2 tx 1 2
      eq tx 3 tx 1 3
      eq tx 4 tx 1 4
      eq tx 5 tx 1 5
      eq tx 6 tx 1 6
      eq tx 7 tx 1 7
      fr td 1 1 td 2 2 td 3 3 td 4 4 td 5 5 td 6 6 td 7 7
      ou sc rs mi nd=3 it=500 ad=off so

  39. Model five fit evaluation Fit is much improved, and in fact is as good as or better than that of model two. We conclude that all loadings are invariant, and all intercepts are invariant except for Information. Details: Chi-sq (30) = 88.225 p<.05 RMSEA = .0450 Standardized Root Mean Square Residual (SRMR) males = .0307 females = .0315

  40. Model six adds invariance constraints on the unique factor variances (strict factorial invariance) We retain the constraints from model five, but now add invariance constraints on the unique factor variances. If we had achieved strong invariance, this final step would mean that all systematic group differences in means and covariance structure for the WAIS-R subtests are due to the common factors. As it happens, the best we can achieve is partial strict invariance.

  41. Bias bk WAISR data: males strict invariance except info intercept
      da ng=2 ni=12 no=940 ma=cm
      la
      sex info digs voca arit comp simi pcmp parr bdes obja dsym
      ra fi=jwaisrm.dat
      se
      2 4 6 7 8 9 10 /
      mo nx=7 nk=2 lx=fr,fu td=fi,sy ka=fr tx=fr
      fi lx 2 2
      fi lx 7 1
      fi ph 1 1 ph 2 2
      fi ka 1 ka 2
      va 1.0 ph 1 1 ph 2 2
      fr td 1 1 td 2 2 td 3 3 td 4 4 td 5 5 td 6 6 td 7 7
      ou sc rs mi nd=3 it=500 ad=off so

      Bias bk WAISR data: females strict invariance except info intercept
      da ng=2 ni=12 no=940 ma=cm
      la
      sex info digs voca arit comp simi pcmp parr bdes obja dsym
      ra fi=jwaisrf.dat
      se
      2 4 6 7 8 9 10 /
      mo nx=7 nk=2 lx=in td=in ka=fr tx=fr ph=fr
      eq tx 2 tx 1 2
      eq tx 3 tx 1 3
      eq tx 4 tx 1 4
      eq tx 5 tx 1 5
      eq tx 6 tx 1 6
      eq tx 7 tx 1 7
      ou sc rs mi nd=3 it=500 ad=off so

  42. Model six fit evaluation Some loss of fit, but not much in relation to model five. It appears that the invariance constraints on the unique factor variances do not create any major problems. We conclude that a model with partial strict invariance holds, with the only problem being the Information intercept. Details: Chi-sq (37) = 121.428 p<.05 RMSEA = .0488 Standardized Root Mean Square Residual (SRMR) males = .0341 females = .0373
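
The talk judges each step by RMSEA, SRMR, and local fit indices. As a supplementary sketch, the reported chi-square values can also be compared through chi-square differences between nested models. This comparison is not reported in the talk. The .05 critical values are standard table values, and the choice of which pairs to compare is an assumption based on how each model adds constraints to an earlier one.

```python
# Reported fit for each model: (chi-square, degrees of freedom).
fits = {
    "one": (32.818, 16),
    "two": (66.744, 26),
    "three": (277.635, 31),
    "four": (169.379, 31),
    "five": (88.225, 30),
    "six": (121.428, 37),
}

# Chi-square critical values at alpha = .05 for the df differences used here.
CRITICAL_05 = {4: 9.488, 5: 11.070, 7: 14.067, 10: 18.307}


def chi_square_difference(restricted, baseline):
    """Return (delta chi-square, delta df, exceeds .05 critical value)."""
    d_chi = fits[restricted][0] - fits[baseline][0]
    d_df = fits[restricted][1] - fits[baseline][1]
    return round(d_chi, 3), d_df, d_chi > CRITICAL_05[d_df]


comparisons = [("two", "one"), ("three", "two"), ("four", "two"),
               ("five", "two"), ("six", "five")]
results = {}
for restricted, baseline in comparisons:
    results[(restricted, baseline)] = chi_square_difference(restricted, baseline)
    d_chi, d_df, significant = results[(restricted, baseline)]
    print(f"model {restricted} vs model {baseline}: "
          f"diff = {d_chi:.3f}, df = {d_df}, exceeds .05 critical value: {significant}")
```

With 940 cases per group every difference exceeds its critical value, including the steps the talk accepts. Difference tests are sensitive to small misfit in samples this large, so the relative sizes are more informative: the jump for model four (102.635 on 5 df) is far larger than the one for model five (21.481 on 4 df).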

  43. Invariant factor loading estimates (standardized):

                             F1      F2
      Information            .839    .050
      Vocabulary             .939    0*
      Comprehension          .704    .150
      Similarities           .506    .177
      Picture Completion     .046    .762
      Picture Arrangement    .050    .699
      Block Design           0*      .755

      *Fixed for identification

  44. A look at the effect size for the Information subtest The gender means for males and females on Information are: Males: 9.822 Females: 8.978 Difference is .844 Intercept estimates on Information are 9.822 and 9.121 for males and females respectively. Difference is .701. The intercept difference explains about 83% of the gender difference in means on Information.
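
The arithmetic on this slide can be reproduced directly from the reported values. The variable names are illustrative. The male intercept equals the male mean because the male factor means are fixed (the `fi ka 1 ka 2` line in the LISREL input), taken here to be fixed at zero.

```python
# Reported Information subtest means and intercept estimates.
mean_males, mean_females = 9.822, 8.978
intercept_males, intercept_females = 9.822, 9.121

mean_diff = mean_males - mean_females
intercept_diff = intercept_males - intercept_females
share = intercept_diff / mean_diff

print(f"mean difference      = {mean_diff:.3f}")       # 0.844
print(f"intercept difference = {intercept_diff:.3f}")  # 0.701
print(f"share of mean difference = {share:.0%}")       # 83%
```

The remaining 17% or so of the mean difference is what the model attributes to gender differences on the factors.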

  45. Final Points to Consider:
      1) How large is the parameter difference between groups?
         --effect size measures are important
         --item level vs scale level effects
      2) What should you do if some items are found to lack invariance?
         --For tests being developed, you could drop those items
         --For existing tests in wide use, dropping items may not be an option
         --Could look at impact of bias on intended use of the test. See Millsap & Kwok (2004), Psychological Methods.

  46. Two papers on the use of confirmatory factor analysis in studying measurement invariance: Gregorich, S.E. (2006). Do self-report instruments allow meaningful comparisons across diverse population groups? Testing measurement invariance using the confirmatory factor analysis framework. Medical Care, 44, S78-S94. Wicherts, J.M. & Dolan, C.V. (2010). Measurement invariance in confirmatory factor analysis: An illustration using IQ test performance of minorities. Educational Measurement: Issues and Practice, 29, 39-47.

  47. For a general text on all statistical aspects of the measurement invariance problem see: Millsap, R.E. (2011). Statistical Approaches to Measurement Invariance. New York: Routledge. Email: millsap@asu.edu Phone: 480-965-2584
