Multivariate Statistics Confirmatory Factor Analysis I W. M. van der Veld University of Amsterdam
Overview • Digression: The expectation • Formal specification • Exercise 2 • Estimation • ULS • WLS • ML • The χ²-test • General confirmatory factor analysis approach
Digression: the expectation • If the variables are expressed in deviations from their mean, so that E(x)=E(y)=0, then: cov(x, y) = E(xy) • If the variables are expressed as standard scores, so that additionally E(x²)=E(y²)=1, then: cor(x, y) = E(xy)
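As a quick numerical check (a sketch using NumPy; the simulated variables are purely illustrative), mean-centering makes E(xy) equal the covariance, and standardizing makes E(xy) equal the correlation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)

# Center the variables: E(x) = E(y) = 0, so cov(x, y) = E(xy).
xc, yc = x - x.mean(), y - y.mean()
cov_via_expectation = (xc * yc).mean()

# Standardize: additionally E(x^2) = E(y^2) = 1, so cor(x, y) = E(xy).
xs, ys = xc / xc.std(), yc / yc.std()
cor_via_expectation = (xs * ys).mean()

print(np.isclose(cov_via_expectation, np.cov(x, y, bias=True)[0, 1]))
print(np.isclose(cor_via_expectation, np.corrcoef(x, y)[0, 1]))
```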
Formal specification • The full model in matrix notation: x = Λξ + δ • The variables are expressed in deviations from their means, so: E(x)=E(ξ)=E(δ)=0. • The latent ξ-variables are uncorrelated with the unique components (δ), so: E(ξδ’)=E(δξ’)=0 • On the left side we need the covariance (or correlation) matrix. Hence: • E(xx’) = Σ = E[(Λξ + δ)(Λξ + δ)’] • Σ is the covariance matrix of the x variables. • Σ = E[(Λξ + δ)(ξ’Λ’ + δ’)] • Σ = E(Λξξ’Λ’ + δξ’Λ’ + Λξδ’ + δδ’)
Formal specification • The factor equation: x = Λξ + δ • E(x)=E(ξ)=E(δ)=0 and E(ξδ’)=E(δξ’)=0 • Σ = E(Λξξ’Λ’ + δξ’Λ’ + Λξδ’ + δδ’) • Σ = E(Λξξ’Λ’) + E(δξ’Λ’) + E(Λξδ’) + E(δδ’) • Σ = ΛE(ξξ’)Λ’ + E(δξ’)Λ’ + ΛE(ξδ’) + E(δδ’) • Σ = ΛE(ξξ’)Λ’ + 0·Λ’ + Λ·0 + E(δδ’) • Σ = ΛΦΛ’ + Θδ • E(ξξ’) = Φ, the variance-covariance matrix of the factors • E(δδ’) = Θδ, the variance-covariance matrix of the unique components • This is the covariance equation: Σ = ΛΦΛ’ + Θδ • Now relax, and see the powerful possibilities of this equation.
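The covariance equation can be checked numerically. A minimal sketch, assuming a hypothetical standardized two-factor model with four indicators (all loading, correlation, and unique-variance values below are made up for illustration):

```python
import numpy as np

# Hypothetical two-factor model: x1, x2 load on factor 1; x3, x4 on factor 2.
Lambda = np.array([[0.7, 0.0],
                   [0.6, 0.0],
                   [0.0, 0.8],
                   [0.0, 0.5]])
Phi = np.array([[1.0, 0.3],     # factor variance-covariance matrix
                [0.3, 1.0]])
# Unique variances chosen so that each indicator has unit variance.
Theta = np.diag([0.51, 0.64, 0.36, 0.75])

# Covariance equation: Sigma = Lambda Phi Lambda' + Theta_delta
Sigma = Lambda @ Phi @ Lambda.T + Theta

print(np.round(Sigma, 3))
```

With unit diagonal, the off-diagonal elements of Sigma are the model-implied correlations, e.g. Sigma[0, 1] = 0.7 × 0.6 = 0.42.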
Exercise 2 • Formulate expressions for the variances of and the correlations between the x variables in terms of the parameters of the model. • Now via the formal way. • It is assumed that: E(xi)=E(ξi)=0, E(δiδj)=0 for i≠j, and E(δξ’)=0
Exercise 2 • The factor equation is: x = Λξ + δ • The covariance equation then is: Σ = ΛΦΛ’ + Θδ • Writing out the elements of Σ provides the required expressions.
Exercise 2 • Because both matrices are symmetric, we skip the upper triangle.
Exercise 2 • Let’s list the variances and covariances.
Exercise 2 • The variances of the x variables: e.g. var(x1) = E(x1²) = λ11²φ11 + θ11 • The covariances between the x variables: e.g. cov(x1, x2) = λ11λ21φ11 and cov(x1, x3) = λ11φ21λ32 • We already assumed that: E(xi)=E(ξi)=E(δi)=0, E(δiδj)=0 for i≠j, and E(δξ’)=0 • If we standardize the variables x and ξ so that: var(xi)=var(ξi)=1, • Then we can write the covariances as correlations, e.g. ρ12 = λ11λ21 and ρ13 = λ11φ21λ32
Results • Exercise 1 (intuitive approach) gave: ρ12 = λ11λ21, ρ13 = λ11φ21λ32, ρ14 = λ11φ21λ42, ρ23 = λ21φ21λ32, ρ24 = λ21φ21λ42, ρ34 = λ32λ42 • Exercise 2 (formal approach) gives the same result, but using a different notation: φii = var(ξi) and φij = cov(ξi, ξj), or when standardized cor(ξi, ξj)
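The path-tracing expressions from Exercise 1 and the covariance equation from Exercise 2 can be shown to coincide numerically; a sketch with hypothetical parameter values:

```python
import numpy as np

# Hypothetical standardized two-factor model: x1, x2 load on xi_1 (l11, l21);
# x3, x4 load on xi_2 (l32, l42); phi21 is the factor correlation.
l11, l21, l32, l42, phi21 = 0.7, 0.6, 0.8, 0.5, 0.3

Lambda = np.array([[l11, 0.0], [l21, 0.0], [0.0, l32], [0.0, l42]])
Phi = np.array([[1.0, phi21], [phi21, 1.0]])
Sigma = Lambda @ Phi @ Lambda.T  # off-diagonal elements are the rho_ij

# Compare with the path-tracing expressions from the slide.
assert np.isclose(Sigma[1, 0], l11 * l21)          # rho12 = l11*l21
assert np.isclose(Sigma[2, 0], l11 * phi21 * l32)  # rho13 = l11*phi21*l32
assert np.isclose(Sigma[3, 1], l21 * phi21 * l42)  # rho24 = l21*phi21*l42
print("decomposition rules match the covariance equation")
```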
Estimation • The model parameters can normally be estimated if the model is identified. • Let’s assume for the sake of simplicity that our variables are standardized, except for the unique components. • The decomposition rules hold only for the population correlations, not for the sample correlations. • Normally, we know only the sample correlations, which will not satisfy the decomposition exactly. • The resulting solution also differs from model to model. • So an efficient estimation procedure is needed.
Estimation • There are several general principles. • We will discuss: • the Unweighted Least Squares (ULS) procedure • the Weighted Least Squares (WLS) procedure • Both procedures are based on the residuals between the sample correlations (S) and the expected values of the correlations. • Thus estimation means minimizing the difference between S and Σ. • The expected values of the correlations are a function of the model parameters, which we found earlier: Σ = ΛΦΛ’ + Θδ
ULS Estimation • The ULS procedure looks for the parameter values that minimize the unweighted sum of squared residuals: FULS = Σi (si − σi)² • where the sum runs over the i unique elements of the correlation matrix, si is a sample correlation and σi the corresponding model-implied correlation. • Let’s see what this does for the example used earlier with the four indicators.
ULS Estimation • FULS = • (.42 − λ11λ21)² + (.56 − λ11λ31)² + (.35 − λ11λ41)² + • (.48 − λ21λ31)² + (.30 − λ21λ41)² + • (.40 − λ31λ41)² + • (1 − (λ11² + θ11))² + (1 − (λ21² + θ22))² + • (1 − (λ31² + θ33))² + (1 − (λ41² + θ44))² • The estimation procedure looks (iteratively) for the values of all the parameters that minimize the function FULS. • Advantages: • Consistent estimates without distributional assumptions on the x’s. • So for large samples ULS is approximately unbiased. • Disadvantages: • There is no statistical test associated with this procedure, only descriptive fit measures such as the RMR. • The estimators are scale dependent.
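The iterative search can be sketched with scipy.optimize (a minimal illustration, not the implementation used by SEM software; the starting values are arbitrary). Because the six sample correlations above happen to fit a one-factor model exactly, the minimizer recovers the loadings 0.7, 0.6, 0.8, 0.5:

```python
import numpy as np
from scipy.optimize import minimize

# Sample correlations from the four-indicator, one-factor example.
r = {(1, 2): .42, (1, 3): .56, (1, 4): .35,
     (2, 3): .48, (2, 4): .30, (3, 4): .40}

def f_uls(lam):
    # Unweighted sum of squared residuals over the unique off-diagonal
    # elements; with standardized x the diagonal is fitted exactly by
    # theta_ii = 1 - lambda_i^2, so those terms drop out of the search.
    return sum((r[(i + 1, j + 1)] - lam[i] * lam[j]) ** 2
               for i in range(4) for j in range(i + 1, 4))

res = minimize(f_uls, x0=np.full(4, 0.5))
print(np.round(res.x, 2))  # estimated loadings
```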
WLS Estimation • The WLS procedure looks for the parameter values that minimize the weighted sum of squared residuals: FWLS = Σi wi(si − σi)² • where the sum runs over the i unique elements of the correlation matrix. • The weights wi can be chosen in different ways.
Maximum Likelihood Estimation • The most commonly used procedure, the Maximum Likelihood (ML) estimator, can be specified as a special case of the WLS estimator. • The ML estimator provides standard errors for the parameters and a test statistic for the fit of the model, even for much smaller samples. • But this estimator is derived under the assumption that the observed variables have a multivariate normal distribution.
The χ²-test • Without a statistical test we don’t know whether our theory holds. • The test statistic t used is based on the value of the fitting function (FML) at its minimum: t = (N − 1)·FML • If the model is correct, t is χ²(df) distributed. • Normally the model is rejected if t > Cα • where Cα is the value for which: Pr(χ²df > Cα) = α • See the appendices in many statistics books. • But the χ², like any similar test statistic, should not always be trusted. • A more robust check is to look at: • the residuals, and • the expected parameter change (EPC).
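The decision rule above can be sketched as follows (FML at the minimum, N, df, and α are hypothetical values chosen for illustration):

```python
from scipy.stats import chi2

# Hypothetical values: minimized fit function, sample size, df, alpha.
f_min, N, df, alpha = 0.02, 500, 2, 0.05

t = (N - 1) * f_min                # approximately chi2(df) if the model holds
c_alpha = chi2.ppf(1 - alpha, df)  # critical value C_alpha
p_value = chi2.sf(t, df)           # Pr(chi2_df > t)

print(t > c_alpha)  # True: reject the model at level alpha
```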
General CFA approach • A model is specified with observed and latent variables. • Correlations (covariances) between the observed variables can be expressed in the parameters of the model (decomposition rules). • If the model is identified, the parameters can be estimated. • A test of the model can be performed if df > 0. • Possible misspecifications (an unacceptable χ²) can be detected. • Corrections in the model can be introduced: adjusting the theory.
[Diagram: Theory → Model; Reality → Data collection process → Data; comparing Model and Data leads to Model modification, which feeds back into the Model]