Goodness of fit in structural equation models Jose M. Cortina Tiffany M. Bludau George Mason University
SEM and Fit • SEM is an analysis technique • We need to know whether data and model are consistent with one another • The assessment of “Model Fit” is the assessment of this consistency
Outline • General discussion of fit • Observed vs. reproduced matrices • Identification • The role of chi-squared • Alternative fit indices • Which to use • Pitfalls
The Two Faces of Fit • The term “Model Fit” is often used to denote overall fit • But assessment of fit comes more directly from consideration of the individual path coefficients and endogenous errors • If the coeffs linking constructs to one another are small, then the data and model are inconsistent!
Overall Fit • As a whole, are the linkages in the model consistent with the relationships among the observed variables? • In one way or another, this is the question addressed by model fit indices • Specifically, fit indices compare observed and reproduced correlation matrices
Reproduced matrices • Same form as the observed matrix • Contains r’s that are implied by the model • Consider a simple mediation model: X → M → Y, with standardized paths of .20 from X to M and .20 from M to Y
Observed vs. Reproduced

Observed:      X     M     Y
        X     1
        M    .20    1
        Y    .10   .20    1

Reproduced:    X     M     Y
        X     1
        M    .20    1
        Y    .04   .20    1

Lack of fit in this model stems from the discrepancy between the X–Y correlations: observed .10 vs. reproduced .04 (= .20 × .20)
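To make the comparison concrete, here is a minimal Python sketch (not from the slides) that builds the reproduced matrix from the two path coefficients via the tracing rules and takes the residual:

```python
import numpy as np

# Standardized paths in the mediation model X -> M -> Y
a = 0.20  # X -> M
b = 0.20  # M -> Y

# Tracing rules for this standardized, recursive model:
# r(X,M) = a, r(M,Y) = b, and the model-implied r(X,Y) = a*b
reproduced = np.array([[1.0,   a,    a * b],
                       [a,     1.0,  b],
                       [a * b, b,    1.0]])  # order: X, M, Y

observed = np.array([[1.00, 0.20, 0.10],
                     [0.20, 1.00, 0.20],
                     [0.10, 0.20, 1.00]])

print(observed - reproduced)  # only the X-Y entries differ: .10 - .04 = .06
```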
Overidentification • The discrepancy is only possible because the model is “overidentified” • There are more knowns (i.e., observed r’s) than unknowns (i.e., coeffs to be estimated) • In this example, there are three knowns and two unknowns • What if we add a third path?
Just identified model • Paths: X → M = .20, M → Y = .19, X → Y = .062 • Here there are the same number of knowns and unknowns • Observed and reproduced matrices will be identical • “Fit” is perfect
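The three coefficients on the slide can be recovered from the observed correlations alone; a small sketch, assuming standardized variables:

```python
import numpy as np

# Observed correlations among X, M, Y
r_xm, r_my, r_xy = 0.20, 0.20, 0.10

# Path X -> M is the simple standardized regression of M on X
p_xm = r_xm  # .20

# Paths into Y come from regressing Y on X and M (standardized betas)
R = np.array([[1.0,  r_xm],
              [r_xm, 1.0]])     # correlations among the predictors
r = np.array([r_xy, r_my])      # predictor-criterion correlations
betas = np.linalg.solve(R, r)
print(p_xm, betas)              # .20, then [.0625, .1875]: the .062 and .19
```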
Underidentified model In this model, there is one known and there are two unknowns. There are an infinite number of solutions that would reproduce the observed r perfectly
To summarize • In order for a unique solution to exist, a model must be at least just identified • In order for “fit” to be relevant, a model must be overidentified • The closer a model is to being just identified, the better (and less relevant) fit will be
Fit Basics: Chi-squared • Begins with the model R² • R²model = 1 − (1 − R²1)(1 − R²2)…(1 − R²E), with one R² for each of the E endogenous variables • This value is computed for both the hypothesized (overidentified) model AND the just identified model • We then use these R²s to compute Q = (1 − R²saturated)/(1 − R²hypothesized) • χ² = −(N − d) ln Q, where d is the number of overidentifying restrictions
Worksheet • For the hypothesized or overidentified model: R² for M = .04, R² for Y = .04, R²model = .078 • For the saturated or just identified model: R² for M = .04, R² for Y = .044, R²model = .082 • Q = (1 − .082)/(1 − .078) = .9956 • Assuming N = 101, χ² = −(101 − 1) ln(.9956) = .44 with 1 df
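The worksheet arithmetic, as a few lines of Python (values rounded as on the slide, so the result matches up to rounding):

```python
import math

# R-squareds as rounded on the worksheet
r2_hypo = 0.078  # hypothesized (overidentified) model
r2_sat = 0.082   # saturated (just identified) model

Q = (1 - r2_sat) / (1 - r2_hypo)  # ~.9957
N, d = 101, 1                     # d = number of overidentifying restrictions
chi2 = -(N - d) * math.log(Q)     # ~.44 on 1 df, give or take rounding
print(round(Q, 4), round(chi2, 2))
```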
Why isn’t χ² used? • It is a test statistic for badness of fit, and as such, rewards poor designs (i.e., small N) • It is sample size dependent, which means that it doesn’t give effect size information • It does serve as a basis for other indices
What are the alternatives? • There are dozens, among them (subscript n = null model, t = target model) • NFI = (χ²n − χ²t)/χ²n, or equivalently (Fn − Ft)/Fn • PNFI = (dft/dfn)NFIt • GFI = 1 − .5 tr[(S − Σ)²] • AGFI = 1 − (1 − GFIt){[(p + q)(p + q + 1)]/(2dft)} • PGFI = (dft/dfn)GFIt • RMR = √{Σ(S − Σ)²/[(p + q)(p + q + 1)]} • IFI = (Fn − Ft)/[Fn − dft/(N − 1)]
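As a sketch of how several of these reduce to simple arithmetic on χ², F, and df, here are three of the formulas above as Python functions; the printed numbers are illustrative only, not from any study in the deck:

```python
# chi2_n / F_n / df_n: null model; chi2_t / F_t / df_t: target model
def nfi(chi2_n, chi2_t):
    return (chi2_n - chi2_t) / chi2_n

def pnfi(chi2_n, chi2_t, df_n, df_t):
    return (df_t / df_n) * nfi(chi2_n, chi2_t)

def ifi(F_n, F_t, df_t, N):
    return (F_n - F_t) / (F_n - df_t / (N - 1))

# Illustrative values
print(nfi(250.0, 30.0))           # 0.88
print(pnfi(250.0, 30.0, 36, 24))  # ~0.59: NFI discounted for parsimony
```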
A few things about that slide • Note that chi-squared appears often • Note the simplicity of RMR • Note the F’s • Note that df are used to adjust AGFI, PGFI, and PNFI
RMR • RMR stands for Root Mean Squared Residual • It is the square root of the average of the squared differences between the values in the observed and reproduced correlation matrices • The smaller, the better • RMSEA (Root Mean Square Error of Approximation) is computed differently, but is very similar and more commonly used • RMSEA ranges from 0 to 1, with .08 being the conventional cutoff
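A minimal sketch of the RMR computation, reusing the observed and reproduced matrices from the mediation example (averaging over the unique lower-triangle elements is one common convention):

```python
import numpy as np

def rmr(observed, reproduced):
    # Residuals over the unique (lower-triangle) elements of the matrices
    i, j = np.tril_indices_from(observed)
    resid = observed[i, j] - reproduced[i, j]
    return np.sqrt(np.mean(resid ** 2))

observed = np.array([[1.00, 0.20, 0.10],
                     [0.20, 1.00, 0.20],
                     [0.10, 0.20, 1.00]])
reproduced = np.array([[1.00, 0.20, 0.04],
                       [0.20, 1.00, 0.20],
                       [0.04, 0.20, 1.00]])
print(rmr(observed, reproduced))  # ~.024; only the X-Y residual is nonzero
```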
Other indices • For other common indices, the larger, the better • Note the F’s, which stand for Fit Function • For GLS-family estimators (e.g., ULS, ML, WLS), the fit function is F(Θ) = (S − Σ)′W⁻¹(S − Σ)
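A sketch of this fit function under the convention that S and Σ are vectorized over their unique elements; with W = I it reduces to the ULS discrepancy (the exact weight matrix and vectorization details vary by estimator):

```python
import numpy as np

def fit_function(S, Sigma, W):
    # F(theta) = (s - sigma)' W^-1 (s - sigma), where s and sigma stack
    # the unique elements of the observed and implied matrices
    i, j = np.tril_indices_from(S)
    d = S[i, j] - Sigma[i, j]
    return d @ np.linalg.solve(W, d)

S = np.array([[1.00, 0.20, 0.10],
              [0.20, 1.00, 0.20],
              [0.10, 0.20, 1.00]])
Sigma = np.array([[1.00, 0.20, 0.04],
                  [0.20, 1.00, 0.20],
                  [0.04, 0.20, 1.00]])
W = np.eye(6)  # W = I gives unweighted least squares (ULS)
print(fit_function(S, Sigma, W))  # .06**2 = .0036
```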
Adjusted fit indices • One problem with most indices is that they reward lack of parsimony • This is true of GFI, for example • The AGFI includes a penalty for lack of parsimony • The PGFI includes a large penalty for lack of parsimony
Other ways to distinguish • Degree of penalty for lack of parsimony is one dimension on which indices differ • There are others • As a set, these dimensions can be used to choose a set of indices that are maximally diagnostic
Tanaka’s (1993) dimensions • Population-based vs. sample-based • Parsimony • Normed vs. non-normed • Absolute vs. relative • Reliance on estimation method • Sample size dependence
Theoretical work • A number of theoretical papers, e.g. • Mulaik, James, Van Alstine, & Bennett, 1989 • Medsker, Williams, & Holahan, 1994 • Hu & Bentler, 1999 • Lack of empirical work • What has been done often uses simulated datasets (e.g. Marsh, Balla, & McDonald, 1988)
Little guidance • The literature offers little in the way of guidance with regard to which indices should be reported • Reviewers and editors do no better • So, authors tend to report the indices that are most flattering to their models • We sought to combine Tanaka’s work with empirical work to generate the best set of indices
Specifically • We conducted a meta-analysis of correlations among fit indices • We compiled studies that reported at least two indices, then computed the correlation between each pair of indices • The indices that are least redundant with the others offer the most unique information
Studies used • Multiple disciplines • Keywords: Structural equation modeling, SEM, covariance structures model, and causal model • Currently have 400+ articles collected • Eliminated articles that: • Were theoretical in nature • Did not report results
Coding of the studies • Two co-authors coded all articles • Coded for: • Discipline, software used, estimation method • Sample size, degrees of freedom • Various fit indices • Coded only the final model
Correlations among indices • [Table of correlations among the fit indices appeared here, with significance flagged at p < .001 and p < .05] • RMSEA = Root Mean Square Error of Approximation, NFI = Normed Fit Index, TLI = Tucker-Lewis Index, CFI = Comparative Fit Index, SRMR = Standardized Root Mean Square Residual, GFI = Goodness of Fit Index, AGFI = Adjusted Goodness of Fit Index
Results from factor analysis • Ran the analysis on 5 indices (dropped the NFI and TLI) • 2-factor structure accounting for 83% of variance • Correlation between the two factors: r = −.46
Regressions • Regressed each index onto the remaining indices • GFI: R² = .91 • AGFI: R² = .94 • SRMR: R² = .65 • RMSEA: R² = .61 • CFI: R² = .46
Recommendations • Select one index from each factor • CFI rather than the GFI or AGFI (Bentler & Bonett, 1980; Marsh, Balla, & McDonald, 1988) • RMSEA or SRMR • So: report the CFI and RMSEA, a choice supported by • Tanaka’s dimensions • Formulas • Other information (each taken up on the following slides)
Recommendations cont’d • Our study could only focus on indices that are commonly reported, and parsimony indices are not among them • We would suggest that PGFI or PNFI also be reported
Tanaka’s dimensions
RMSEA: • Population based • Accounts for parsimony • Normed • Absolute • Not estimation method specific • Sample size dependent
CFI: • Population based • Does not account for parsimony • Normed • Relative • Not estimation method specific • Sample size dependent
Formulas of the RMSEA and CFI • RMSEA = √[(χ² − df)/(df(N − 1))], set to 0 when χ² ≤ df • CFI = 1 − (χ²t − dft)/(χ²n − dfn), with both noncentralities floored at 0 • Note the comparative nature of CFI • Note how RMSEA does not account for the null model
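The two formulas as small helper functions, with illustrative (made-up) χ², df, and N values:

```python
import math

def rmsea(chi2, df, N):
    # Per-df, per-subject noncentrality; no null-model comparison
    return math.sqrt(max(chi2 - df, 0.0) / (df * (N - 1)))

def cfi(chi2_t, df_t, chi2_n, df_n):
    # Noncentrality of the target model relative to the null model
    num = max(chi2_t - df_t, 0.0)
    den = max(chi2_n - df_n, chi2_t - df_t, 0.0)
    return 1.0 - num / den if den > 0 else 1.0

# Illustrative values
print(rmsea(30.0, 24, 200))      # ~.035: below the .08 cutoff
print(cfi(30.0, 24, 250.0, 36))  # ~.97
```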
Other information • CFI • Works well with ML estimation • Works well with small sample sizes • Tends to worsen as the # of variables in a model increases • RMSEA • Does not include comparison with a null model • Tends to improve as the # of variables increases • Known distribution, so confidence intervals can be computed • Stable across estimation methods and sample sizes
Still in progress • Not able to account for all indices in each study coded • Plan to replicate the correlation matrices and generate missing values for coded studies • Reporting tendencies of researchers • Our plan is to show how patterns ACROSS this set of indices are diagnostic of particular plusses and minuses in a model
Pitfalls • Reward for lack of parsimony • Overemphasis on overall fit • Overemphasis on absolute fit • Fit driven by measurement model • Specification searches
Lack of parsimony • Models with few df generate very good values for almost all fit indices regardless of the quality of the model • In such cases, it is better to focus on the individual path coefficients
Overall fit • Regardless of the magnitude of fit indices, individual path coefficients are very important • It is entirely possible to generate good indices for a bad model • For any given data set, there are many very different models that “fit”
Absolute Fit • Knowledge of fit in an absolute sense is helpful but insufficient • Also helps to know how a model compares to alternatives • Relative fit indices help, but generally involve comparisons against a straw man (e.g., the null model) • Better to evaluate hypothesized model against plausible alternatives (e.g., additive model in MSEM)
Decoupling measurement and structural models • Consider the following model [path diagram appeared here: latent variables with many indicators, linked by a single structural path] • Excluding latent variances and correlated errors, there are 21 path coefficients to be estimated • Only 1 of these is part of the structural model
What happens when the ratio of meas. to struct. linkages is large? • Fit is driven largely by the measurement model • Thus, good fit can be achieved even if the latent vars. are unrelated to one another • Good fit can be impossible even if the latent vars. are strongly related to one another
Anderson & Gerbing • These authors suggested a two step approach • Evaluate the measurement model in the first step (i.e., CFA). • Once a measurement model is settled upon, its values are fixed. • Only then is the structural model evaluated • Fit indices will then give a better picture of the degree to which hypotheses are supported
Specification searches • If the fit of the hypothesized model is inadequate, one can conduct a specification search • This is an attempt to identify the sources of lack of fit • Modification indices are used most often
Modification indices • MIs give the reduction in chi-squared that would be achieved with the addition of a given path • In many models, inferior fit is due to omission of a small number of paths • So, perhaps we should simply add these paths and move on
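Since an MI approximates the χ² drop from freeing one path (1 df), its significance can be checked against the χ² distribution; a sketch with an illustrative value:

```python
from scipy.stats import chi2 as chi2_dist

# Suppose a modification index predicts that freeing one path (1 df)
# would drop the model chi-squared by 8.2 (illustrative number)
delta_chi2, delta_df = 8.2, 1
p = chi2_dist.sf(delta_chi2, delta_df)
print(p)  # ~.004: the added path would significantly improve fit
```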
Not so fast! • Often, the largest MI values are attached to paths for which there is no theoretical basis • A path should only be added if a theoretical case can be made for it, albeit post hoc
What about correlated errors? • Often, the largest MIs are attached to paths AMONG errors (i.e., off-diagonal elements of the theta or psi matrices) • There is seldom (but not never) any theoretical basis for these, so they should not be added • Exceptions include errors attached to isomorphic variables separated by time and errors attached to variables that share components
Cross-validation • Regardless of justification, spec. searches are post hoc • If N is adequate, plan for cross-validation • Separate sample into two parts at the outset • Test hypotheses on the larger part • Conduct spec search • Test modified model on the holdout sample • This reduces capitalization on chance
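A sketch of the splitting step only; the SEM fitting itself would be done in whatever package is in use, and the function name and calibration fraction here are illustrative:

```python
import numpy as np
import pandas as pd

def split_sample(data: pd.DataFrame, calibration_frac=0.67, seed=42):
    # Split ONCE, before any model is fit to either part
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    cut = int(calibration_frac * len(data))
    return data.iloc[idx[:cut]], data.iloc[idx[cut:]]

# calibration, holdout = split_sample(df)
# 1. Test hypotheses and run any spec search on `calibration`
# 2. Fit the modified model once on `holdout` to check it cross-validates
```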
Overall recommendations • Base conclusions on path coefficients as well • Ignore fit for models with few df • Choose fit indices wisely, for yourself and for others! • Beware the pitfalls • Preempt objections to spec search with cross validation • But most important….