Structural Equation Models Asma Alfadhel Sarah Asio Jimmy(Yuanshan) Cheng 10/10/2013
Outline: • Part I: • CFA • Part II: • SEM • SEM Plots • Part III: • Goodness of Fit
Part I (CFA) - Outline: • CFA • Confirmatory Factor Analysis • Available R Packages. • The Lavaan Package. • Model Description. • Apply CFA and Interpret Results
Confirmatory Factor Analysis vs. EFA • EFA: exploratory • All loadings are free to vary (L has no zeros) • Assumption: Cov(F) = I • CFA: driven by theory, which specifies • The number of factors • The correlations between factors, Cov(F) = Φ • Which items load onto which factors • CFA allows certain loadings to be constrained to zero
Confirmatory Factor Analysis vs. SEM • SEM: specifies the causal relations between factors • Directed arrows between latent variables • This part is called the structural model • CFA: no directed arrows between latent factors • This part is called the measurement model • CFA is frequently used as a first step to assess the proposed measurement model in a structural equation model (Wikipedia).
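Objective of CFA: • $\mathrm{Cov}(Y) = L\,\mathrm{Cov}(F)\,L^{T} + \Psi$ • Assumes that factors are uncorrelated with error terms and that error terms are uncorrelated with each other • Cov(Y): the covariance of the observed variables • $\mathrm{Cov}(F) = \Phi$, the covariance of the factors • $\Psi$: the covariance of the error terms • So $\mathrm{Cov}(Y) = L\,\Phi\,L^{T} + \Psi$ • $\Sigma = \Sigma(\theta)$: observed covariance on the left, model-implied covariance on the right • Goal: choose the parameters θ so that the implied covariance matches the observed covariance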
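The covariance identity on this slide can be checked numerically. The following is a minimal base-R sketch; the loading matrix, factor covariance, and error variances are made-up illustrative values (assumptions), not estimates from the study data.

```r
# Hypothetical two-factor model with four indicators (illustrative values only).
# Zeros in L encode the CFA constraint that an item does not load on a factor.
L <- matrix(c(0.8, 0.0,
              0.7, 0.0,
              0.0, 0.9,
              0.0, 0.6),
            nrow = 4, byrow = TRUE)

# Factor covariance matrix Phi: unit variances, factor correlation 0.5
Phi <- matrix(c(1.0, 0.5,
                0.5, 1.0), nrow = 2)

# Diagonal error covariance Psi (errors uncorrelated with each other)
Psi <- diag(c(0.36, 0.51, 0.19, 0.64))

# Model-implied covariance: Sigma(theta) = L Phi L^T + Psi
Sigma <- L %*% Phi %*% t(L) + Psi
print(round(Sigma, 3))
```

With these values every indicator has unit variance, and the covariance between an indicator of the first factor and an indicator of the second arises only through Phi (for example 0.8 × 0.5 × 0.9 = 0.36 for indicators 1 and 3).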
R Packages for SEM • “sem” package: developed by John Fox; for a long time it was the only option in R • “OpenMx” package: developed by Steven Boker. • “lavaan” package: developed by Yves Rosseel of Ghent University in Belgium.
The “lavaan” Package: • lavaan is an R package for latent variable analysis:* • confirmatory factor analysis: function cfa() • structural equation modeling: function sem() • latent curve analysis / growth modeling: function growth() • general mean/covariance structure modeling: function lavaan() • (item response theory (IRT) models) • (latent class + mixture models) • (multilevel models) • More information: • lavaan website • “lavaan: an R package for structural equation modeling”, Journal of Statistical Software • * http://users.ugent.be/~yrosseel/lavaan/lavaan1.pdf
“cfa” function:
• Description: Fit a Confirmatory Factor Analysis (CFA) model.
• Usage:
cfa(model = NULL, data = NULL, meanstructure = "default", fixed.x = "default",
    orthogonal = FALSE, std.lv = FALSE, std.ov = FALSE, missing = "default",
    ordered = NULL, sample.cov = NULL, sample.cov.rescale = "default",
    sample.mean = NULL, sample.nobs = NULL, ridge = 1e-05, group = NULL,
    group.label = NULL, group.equal = "", group.partial = "", cluster = NULL,
    constraints = "", estimator = "default", likelihood = "default",
    information = "default", se = "default", test = "default",
    bootstrap = 1000L, mimic = "default", representation = "default",
    do.fit = TRUE, control = list(), WLS.V = NULL, NACOV = NULL,
    start = "default", verbose = FALSE, warn = TRUE, debug = FALSE)
• Arguments
• model: A description of the user-specified model.
• data: An optional data frame containing the observed variables used in the model.
• std.lv: If TRUE, the metric of each latent variable is determined by fixing its variance to 1.0. If FALSE, the metric of each latent variable is determined by fixing the factor loading of its first indicator to 1.0.
• std.ov: If TRUE, all observed variables are standardized before entering the analysis.
• missing: If the data contain missing values, the default behavior is listwise deletion. If the missing mechanism is MCAR (missing completely at random) or MAR (missing at random), the lavaan package provides case-wise (or 'full information') maximum likelihood estimation (set missing = "ML").
Model Description • The dataset was collected by Sarah Asio. • The original model consists of 12 factors and 42 observed indicators. • For simplicity a sub-model was used; it consists of 4 factors (Team Communication, Team Effort, Team Learning, Team Innovation) and 23 observed variables. • The dataset contains a sample of 381 responses from students. • The items range in value from 1 to 6.
Specifying the model: (Symbols) • =~ : “latent variable definition” • latent variable =~ indicator1 + indicator2 + indicator3 • It defines how the latent variables are 'manifested by' a set of observed variables. • The model syntax can be this short because the function takes care of several things: • First, by default, the factor loading of the first indicator of a latent variable is fixed to 1, thereby fixing the scale of the latent variable. • Second, residual variances are added automatically. • Third, all exogenous latent variables are correlated by default. • http://lavaan.ugent.be/tutorial/cfa.html
Specifying the model: (Symbols) • ~~ : “correlation”, read as “is correlated with” • Used for residual variances • Used for the variances and covariances of the latent variables • ~ : “regression”, read as “is regressed on” • Used to specify the structural part of the SEM model.
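The operators and defaults described on the two syntax slides can be inspected in the parameter table of a fitted model. This is a minimal sketch that assumes lavaan is installed and uses its built-in HolzingerSwineford1939 data as a stand-in (it is not the authors' data set); the factor names visual and textual come from that example data.

```r
library(lavaan)

# Stand-in model on lavaan's built-in HolzingerSwineford1939 data
# (an assumption for illustration; not the authors' data).
model <- '
  visual  =~ x1 + x2 + x3     # =~ : latent variable measured by indicators
  textual =~ x4 + x5 + x6
  visual ~~ textual           # ~~ : factor covariance (also added by default)
'
fit <- cfa(model, data = HolzingerSwineford1939)

# The parameter table shows what cfa() fixed or added automatically:
# free == 0 marks a fixed parameter, ustart holds its fixed value.
pt <- parTable(fit)
print(pt[, c("lhs", "op", "rhs", "free", "ustart")])
```

In the printed table, the first loading of each factor appears as fixed, and one residual variance row (op `~~` with identical left and right sides) appears for every indicator even though none was written in the model string.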
Specifying the model:
# Load lavaan and specify the model (one latent variable per line)
library(lavaan)
Our.model <- '
  CMM =~ CM9 + CM10 + CM11 + CM12 + CM13
  EFF =~ EF14 + EF15 + EF16 + EF17
  LN  =~ LN18 + LN19 + LN20 + LN21 + LN22 + LN23 + LN24
  INN =~ IN36 + IN37 + IN38 + IN39 + IN40 + IN41 + IN42
'
# Fit the CFA and print the summary together with the fit measures
fit <- cfa(Our.model, data = MyData)
summary(fit, fit.measures = TRUE)
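Missing values, Standardization, & R²
# Option 1: standardize latent and observed variables; use ML for missing data
fit <- cfa(Our.model, data = MyData, std.lv = TRUE, std.ov = TRUE, missing = "ML")
summary(fit, fit.measures = TRUE, rsquare = TRUE)
# or, to get the R-squared values without rounding:
inspect(fit, "rsquare")
# Option 2: fit in the raw metric and request standardized estimates in the summary
fit <- cfa(Our.model, data = MyData, missing = "ML")
summary(fit, standardized = TRUE, rsquare = TRUE)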
2nd Output
fit <- cfa(Our.model, data = MyData, std.lv = TRUE, std.ov = TRUE, missing = "ML")
inspect(fit, "rsquare")
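The std.lv option changes how the latent variables are scaled, not how well the model fits. The sketch below illustrates this on lavaan's built-in HolzingerSwineford1939 data, used here as a stand-in for the authors' data (an assumption); the object names fit_marker and fit_stdlv are hypothetical.

```r
library(lavaan)

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'

# Default scaling: the first loading of each factor is fixed to 1
fit_marker <- cfa(model, data = HolzingerSwineford1939)

# Alternative scaling: each factor variance is fixed to 1
fit_stdlv <- cfa(model, data = HolzingerSwineford1939, std.lv = TRUE)

# Both parameterizations describe the same model, so the test statistic agrees
print(fitMeasures(fit_marker, c("chisq", "df")))
print(fitMeasures(fit_stdlv,  c("chisq", "df")))

# R-squared for each indicator, without the rounding applied by summary()
r2 <- lavInspect(fit_stdlv, "rsquare")
print(r2)
```

Because the two fits are equivalent, the choice between them is mainly about which estimates are easier to read: std.lv = TRUE reports a loading for every indicator, including the first.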
CFA Syntax in “lavaan” vs “sem”
install.packages("semPlot")
library(semPlot)
# Export the fitted model as lavaan or sem model syntax
Lavaan.model <- semSyntax(fit, "lavaan")
Sem.model <- semSyntax(fit, "sem")
CFA vs. EFA [comparison diagram not reproduced]
Part II Outline • SEM process - Overview • SEM Measurement models • SEM Path diagram - Overview • R-Code for: • SEM model specification • SEM model fitting • SEM Path Diagram • Outputs for SEM model and path diagram
Structural Equation Modeling (SEM) process [process diagram not reproduced] • Source: “Factor Analysis, Path Analysis, and Structural Equations Modeling”, book extract, Jones and Bartlett Publishers. http://www.jblearning.com/samples/0763755486/55485_CH14_Walker.pdf
SEM Measurement models • Endogenous measurement model: • $Y = B_y Z + e_y$ • Here: • $Y$ is an $(n_y \times 1)$ matrix of endogenous indicators, • $B_y$ is an $(n_y \times q)$ matrix of coefficients from the endogenous latent variables to the endogenous indicators, • $Z$ is a $(q \times 1)$ matrix of endogenous latent variable(s), • $e_y$ is an $(n_y \times 1)$ matrix of errors associated with the endogenous indicators. • Exogenous measurement model: • $X = B_x U + e_x$ • Here: • $X$ is an $(n_x \times 1)$ matrix of exogenous indicators, • $B_x$ is an $(n_x \times p)$ matrix of coefficients from the exogenous latent variables to the exogenous indicators, • $U$ is a $(p \times 1)$ matrix of exogenous latent variable(s), • $e_x$ is an $(n_x \times 1)$ matrix of errors associated with the exogenous indicators. • Source: “Factor Analysis, Path Analysis, and Structural Equations Modeling”, book extract, Jones and Bartlett Publishers. http://www.jblearning.com/samples/0763755486/55485_CH14_Walker.pdf
Overall SEM Measurement & Structural models • SEM model for the case study: • $Z = B_z U + e_z$ • Here: • $Z$ is the endogenous variable, • $U$ is a $(3 \times 1)$ matrix of exogenous latent variable(s), • $B_z$ is a $(1 \times 3)$ matrix of coefficients of the exogenous variables, • $e_z$ is the error associated with the endogenous variable. • Source: “Factor Analysis, Path Analysis, and Structural Equations Modeling”, book extract, Jones and Bartlett Publishers. http://www.jblearning.com/samples/0763755486/55485_CH14_Walker.pdf
Matrix representation for SEM measurement models • $X = B_x U + e_x$ • $Y = B_y Z + e_y$ • $Z = B_z U + e_z$
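As a worked instance, the dimensions of the three equations can be filled in for the four-factor sub-model used in this deck. The counts below are read off the indicator lists in the model specification (CM9 to CM13, EF14 to EF17 and LN18 to LN24 give 16 exogenous indicators; IN36 to IN42 give 7 endogenous indicators); the structural coefficient matrix is written $B_z$, and the underbrace annotations are added for illustration.

```latex
\begin{aligned}
\underbrace{X}_{16\times 1} &= \underbrace{B_x}_{16\times 3}\,\underbrace{U}_{3\times 1} + \underbrace{e_x}_{16\times 1},\\
\underbrace{Y}_{7\times 1}  &= \underbrace{B_y}_{7\times 1}\,\underbrace{Z}_{1\times 1} + \underbrace{e_y}_{7\times 1},\\
\underbrace{Z}_{1\times 1}  &= \underbrace{B_z}_{1\times 3}\,\underbrace{U}_{3\times 1} + \underbrace{e_z}_{1\times 1}.
\end{aligned}
```

Here $p = 3$ (communication, effort, learning) and $q = 1$ (innovation), so $n_x = 5 + 4 + 7 = 16$ and $n_y = 7$, which together account for the 23 observed variables.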
SEM Path diagram - Overview • A path diagram is a graphical representation of the hypothesized relationships between the variables. • Exogenous: arrows emanate from the variable (analogous to independent variables). • Here: communication, effort and learning • Endogenous: the variable receives arrows (analogous to dependent variables). • Here: innovation and the observed measures • The remaining variables are error terms, which account for random or measurement error in the endogenous variables. http://en.wikipedia.org/wiki/Structural_equation_modeling
Path Diagram Node representations http://people.ucsc.edu/~zurbrigg/psy214b/09SEM3a.pdf
R-Code for SEM model specification
# Install and load the lavaan package
install.packages("lavaan")
library(lavaan)
# Specify the model
Our.model <- '
  # measurement model
  CMM =~ CM9 + CM10 + CM11 + CM12 + CM13
  EFF =~ EF14 + EF15 + EF16 + EF17
  LN  =~ LN18 + LN19 + LN20 + LN21 + LN22 + LN23 + LN24
  INN =~ IN36 + IN37 + IN38 + IN39 + IN40 + IN41 + IN42
  # structural model
  INN ~ CMM + EFF + LN
'
R-Code for SEM model fitting
# Fit the SEM model using standardized data
fit <- lavaan::sem(Our.model, data = SEMdata, std.lv = TRUE, std.ov = TRUE, missing = "ML")
summary(fit, standardized = TRUE, fit.measures = TRUE, rsquare = TRUE)
• Syntax definitions:
• std.lv: If TRUE, the metric of each latent variable is determined by fixing its variance to 1.0. If FALSE, the metric of each latent variable is determined by fixing the factor loading of its first indicator to 1.0.
• std.ov: If TRUE, all observed variables are standardized before entering the analysis.
• missing: If "listwise", cases with missing values are removed listwise from the data frame before analysis. If "direct", "ml" or "fiml" and the estimator is maximum likelihood, Full Information Maximum Likelihood (FIML) estimation is used, using all available data in the data frame.
• http://cran.r-project.org/web/packages/lavaan/lavaan.pdf
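Since the study data are not included with the slides, the same workflow can be tried on a data set that ships with lavaan. The sketch below uses the built-in PoliticalDemocracy data and a reduced two-factor model as a stand-in (both are assumptions for illustration, not the authors' model); it fits the model with sem() and extracts only the structural regression path.

```r
library(lavaan)

# Stand-in example on lavaan's built-in PoliticalDemocracy data
# (not the authors' data set or model).
model <- '
  # measurement model
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  # structural model: dem60 is regressed on ind60
  dem60 ~ ind60
'
fit <- sem(model, data = PoliticalDemocracy, std.lv = TRUE)

# Keep only the structural (regression) rows, with standardized estimates
est <- parameterEstimates(fit, standardized = TRUE)
paths <- est[est$op == "~", c("lhs", "op", "rhs", "est", "se", "pvalue", "std.all")]
print(paths)
```

Filtering on op == "~" separates the structural coefficients from the loadings (op "=~") and the variances (op "~~"), which is often the first thing to read off after confirming that the measurement model is acceptable.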
R-Code for SEM Path Diagram
# Install and load the semPlot package
install.packages("semPlot")
library(semPlot)
# Plot the input path diagram
semPaths(fit, title = FALSE, curvePivot = TRUE, exoVar = FALSE, exoCov = FALSE)
# Plot the output path diagram with standardized parameters
semPaths(fit, "std", curvePivot = TRUE, exoVar = FALSE, exoCov = FALSE)
• For more options and syntax definitions, refer to:
• http://cran.r-project.org/web/packages/semPlot/semPlot.pdf
Part III (Goodness of fit) - Outline • Introduction to fit indices • Using R to show these indices • Modification indices
Goodness of fit • Model fit: “how the model that best represents the data reflects underlying theory” • In a well-fitting model the population covariance matrix (Σ) matches the implied covariance matrix (Σ(θ)) • There is not yet agreement on • Which indices to use • Cut-offs for the various indices • Hooper et al. (2008)
Overview of Indices • Hooper et al. (2008)
Benchmarks Summary • Hooper et al. (2008)
Reporting Strategy • It is not necessary to report every index • Do not report only the indices that look good • CFI, GFI, NFI, and NNFI are the most commonly reported (McDonald and Ho, 2002) • Hooper et al. (2008)
Reporting Strategy • Hooper et al. (2008) recommend: • Chi-square, df, p-value • RMSEA, SRMR, CFI and one parsimony fit index • Two-index presentation strategy (Hu and Bentler, 1999), any one of: • TLI and SRMR • RMSEA and SRMR • CFI and SRMR
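In lavaan, the indices named in this reporting strategy can be requested by name from a fitted model. The sketch below again uses the built-in HolzingerSwineford1939 data as a stand-in (an assumption); the choice of PNFI as the parsimony fit index is only one possible option.

```r
library(lavaan)

# Stand-in three-factor CFA on lavaan's built-in example data
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)

# Chi-square, df, p-value, RMSEA, SRMR, CFI, TLI,
# plus PNFI as one parsimony fit index
idx <- fitMeasures(fit, c("chisq", "df", "pvalue",
                          "rmsea", "srmr", "cfi", "tli", "pnfi"))
print(round(idx, 3))
```

Calling fitMeasures(fit) without a second argument returns every index lavaan computes, which is useful for checking that the reported subset was not chosen because it looks favourable.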
Modification indices • Used to improve model fit by freeing fixed parameters • CFA is structured by theory • Each factor is measured by only some, not all, of the observed variables • The remaining loadings are assumed to be zero • Error correlations are assumed to be zero • These constraints are just a practical standard (Westfall et al., 2012) • Wikipedia
Freeing fixed parameters [path diagram not reproduced: factors F1 and F2, indicators X1 to X4, error terms e1 to e4]
Modification Indices • Don’t allow modification indices to drive the process • Any modification should make theoretical sense • It is good practice to reassess the fit after each modification • Hooper et al. (2008)
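lavaan reports modification indices through modindices(). The following sketch, on the built-in HolzingerSwineford1939 stand-in data (an assumption, not the authors' data), lists the fixed parameters whose release would be expected to lower the chi-square the most; the cut-off of 10 is an arbitrary illustrative threshold.

```r
library(lavaan)

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)

# Modification indices, largest first; keep only those of at least 10
mi <- modindices(fit, sort. = TRUE, minimum.value = 10)

# mi  = expected drop in chi-square if the parameter is freed
# epc = expected value of the parameter once freed
print(mi[, c("lhs", "op", "rhs", "mi", "epc")])
```

Each row is a candidate cross-loading (op "=~") or residual covariance (op "~~"). In line with the slide, a candidate should only be added to the model when it can be justified theoretically, and the fit should be reassessed afterwards.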