
Multivariate Statistics



Presentation Transcript


  1. Multivariate Statistics Confirmatory Factor Analysis I W. M. van der Veld University of Amsterdam

  2. Overview • Digression: The expectation • Formal specification • Exercise 2 • Estimation • ULS • WLS • ML • The χ²-test • General confirmatory factor analysis approach

  3. Digression: the expectation • If the variables are expressed in deviations from their mean, E(x)=E(y)=0, then: cov(x,y) = E(xy) - E(x)E(y) = E(xy) • If the variables are expressed as standard scores, E(x²)=E(y²)=1 (and E(x)=E(y)=0), then: cor(x,y) = E(xy)/√(E(x²)E(y²)) = E(xy)
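This digression is easy to check numerically. A minimal sketch with simulated data (the variables and coefficients are illustrative only):

```python
import numpy as np

# Simulated data, illustrative only: y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)

# In deviations from the mean, E(x) = E(y) = 0, so cov(x, y) = E(xy).
xc, yc = x - x.mean(), y - y.mean()
cov_as_expectation = (xc * yc).mean()      # equals the covariance

# As standard scores, E(x**2) = E(y**2) = 1, so cor(x, y) = E(xy).
xs, ys = xc / xc.std(), yc / yc.std()
cor_as_expectation = (xs * ys).mean()      # equals the correlation
```

Both quantities agree with `np.cov` and `np.corrcoef` up to floating-point error.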

  4. Formal specification • The full model in matrix notation: x = Λξ + δ • The variables are expressed in deviations from their means, so: E(x)=E(ξ)=E(δ)=0. • The latent ξ-variables are uncorrelated with the unique components (δ), so: E(ξδ’)=E(δξ’)=0 • On the left-hand side we need the covariance (or correlation) matrix. Hence: • E(xx’) = Σ = E[(Λξ + δ)(Λξ + δ)’] • Σ is the covariance matrix of the x variables. • Σ = E[(Λξ + δ)(ξ’Λ’ + δ’)] • Σ = E(Λξξ’Λ’ + δξ’Λ’ + Λξδ’ + δδ’)

  5. Formal specification • The factor equation: x = Λξ + δ • E(x)=E(ξ)=E(δ)=0 and E(ξδ’)=E(δξ’)=0 • Σ = E(Λξξ’Λ’ + δξ’Λ’ + Λξδ’ + δδ’) • Σ = E(Λξξ’Λ’) + E(δξ’Λ’) + E(Λξδ’) + E(δδ’) • Σ = ΛE(ξξ’)Λ’ + E(δξ’)Λ’ + ΛE(ξδ’) + E(δδ’) • Σ = ΛE(ξξ’)Λ’ + 0·Λ’ + Λ·0 + E(δδ’) • Σ = ΛΦΛ’ + Θδ • E(ξξ’) = Φ, the variance-covariance matrix of the factors • E(δδ’) = Θδ, the variance-covariance matrix of the unique components • This is the covariance equation: Σ = ΛΦΛ’ + Θδ • Now relax, and see the powerful possibilities of this equation.
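The covariance equation can be evaluated directly in matrix form. A minimal sketch with a hypothetical two-factor model (the loading and factor-correlation values are illustrative, not from the lecture):

```python
import numpy as np

# Hypothetical two-factor model: x1, x2 load on xi_1 and x3, x4 on xi_2.
Lambda = np.array([[0.7, 0.0],
                   [0.6, 0.0],
                   [0.0, 0.8],
                   [0.0, 0.5]])
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])   # standardized factors: phi_ii = 1

# Choose unique variances so that the implied variances equal 1.
Theta = np.diag(1.0 - np.sum((Lambda @ Phi) * Lambda, axis=1))

# The covariance equation: Sigma = Lambda Phi Lambda' + Theta_delta.
Sigma = Lambda @ Phi @ Lambda.T + Theta
```

With standardized variables the diagonal of Σ is 1 and, for example, σ₁₂ = λ₁₁λ₂₁ = 0.42, in line with the decomposition rules.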

  6. Exercise 2 • Formulate expressions for the variances of and the correlations between the x variables in terms of the parameters of the model. • Now via the formal way. • It is assumed that: E(xi)=E(ξi)=E(δi)=0, E(δiδj)=0 for i≠j, and E(ξiδj)=0

  7. Exercise 2 • The factor equation is: x = Λξ + δ • The covariance equation then is: Σ = ΛΦΛ’ + Θδ • This provides the required expression.

  8. Exercise 2

  9. Exercise 2

  10. Exercise 2 • Because both matrices are symmetric, we skip the upper triangle.

  11. Exercise 2 • Let’s list the variances and covariances.

  12. Exercise 2 • The variances of the x variables: var(xi) = ΣkΣl λikφklλil + θii • The covariances between the x variables: cov(xi,xj) = ΣkΣl λikφklλjl • We already assumed that: E(xi)=E(ξi)=E(δi)=0, E(δiδj)=0 for i≠j, and E(ξiδj)=0 • If we standardize the variables x and ξ so that var(xi)=var(ξi)=1, • then we can write the covariances as correlations: ρij = ΣkΣl λikφklλjl, with φkk = 1.

  13. Results • Exercise 1: • ρ12 = λ11λ21 • ρ13 = λ11φ21λ32 • ρ14 = λ11φ21λ42 • ρ23 = λ21φ21λ32 • ρ24 = λ21φ21λ42 • ρ34 = λ32λ42 • Exercise 2 gives the same result as the intuitive approach, but in a different notation: φii = var(ξi) and φij = cov(ξi,ξj), or, when standardized, cor(ξi,ξj).

  14. Estimation • The model parameters can normally be estimated if the model is identified. • Let’s assume, for the sake of simplicity, that our variables are standardized, except for the unique components. • The decomposition rules hold only for the population correlations, not for the sample correlations. • Normally, we know only the sample correlations. • It is easily shown that the solution is different for different models. • So an efficient estimation procedure is needed.

  15. Estimation • There are several general principles. • We will discuss: • - the Unweighted Least Squares (ULS) procedure • - the Weighted Least Squares (WLS) procedure. • Both procedures are based on the residuals between the sample correlations (S) and the expected values of the correlations. • Thus estimation means minimizing the difference between S and Σ. • The expected values of the correlations are a function of the model parameters, which we found earlier: Σ = ΛΦΛ’ + Θδ

  16. ULS Estimation • The ULS procedure suggests to look for the parameter values that minimize the unweighted sum of squared residuals: FULS = Σi (si - σi)² • where i runs over the unique elements of the correlation matrix, si is a sample correlation, and σi its model-implied counterpart. • Let’s see what this does for the example used earlier with the four indicators.

  17. ULS Estimation • FULS = • (.42 - λ11λ21)² + (.56 - λ11λ31)² + (.35 - λ11λ41)² + • (.48 - λ21λ31)² + (.30 - λ21λ41)² + • (.40 - λ31λ41)² + • (1 - (λ11² + var(δ1)))² + (1 - (λ21² + var(δ2)))² + • (1 - (λ31² + var(δ3)))² + (1 - (λ41² + var(δ4)))² • The estimation procedure looks (iteratively) for the values of all the parameters that minimize the function FULS. • Advantages: • Consistent estimates without distributional assumptions on the x’s. • So for large samples ULS is approximately unbiased. • Disadvantages: • There is no statistical test associated with this procedure (although the root mean squared residual, RMR, is available as a descriptive fit measure). • The estimators are scale dependent.
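The iterative search can be reproduced with a general-purpose optimizer. A minimal sketch, assuming the six sample correlations from the slide and a one-factor model; only the off-diagonal residuals are fitted, because the diagonal terms of FULS vanish once var(δi) = 1 - λi² (note the sign of the loadings is indeterminate):

```python
import numpy as np
from scipy.optimize import minimize

# Sample correlations for the four-indicator example (upper triangle).
r = {(0, 1): .42, (0, 2): .56, (0, 3): .35,
     (1, 2): .48, (1, 3): .30, (2, 3): .40}

def f_uls(lam):
    # One-factor model: the implied correlation is lambda_i * lambda_j.
    return sum((rij - lam[i] * lam[j]) ** 2 for (i, j), rij in r.items())

res = minimize(f_uls, x0=np.full(4, 0.5))
lam = res.x                 # roughly (.7, .6, .8, .5) for these data
theta = 1.0 - lam ** 2      # unique variances, from var(x_i) = 1
```

For these particular correlations the one-factor model fits exactly, so FULS reaches (numerically) zero.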

  18. WLS Estimation • The WLS procedure suggests to look for the parameter values that minimize the weighted sum of squared residuals: FWLS = Σi wi(si - σi)² • where i runs over the unique elements of the correlation matrix and wi is the weight attached to residual i. • These weights can be chosen in different ways.
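A sketch of the weighted variant, reusing the same one-factor example. The weight used here, the inverse of the large-sample variance of a correlation with a hypothetical sample size, is just one illustrative choice; WLS in general uses the full asymptotic covariance matrix of the sample correlations:

```python
import numpy as np
from scipy.optimize import minimize

n = 500                      # hypothetical sample size
r = {(0, 1): .42, (0, 2): .56, (0, 3): .35,
     (1, 2): .48, (1, 3): .30, (2, 3): .40}

# Illustrative weights: inverse of (1 - r**2)**2 / n, the large-sample
# variance of a single correlation coefficient.
w = {ij: n / (1.0 - rij ** 2) ** 2 for ij, rij in r.items()}

def f_wls(lam):
    return sum(w[i, j] * (rij - lam[i] * lam[j]) ** 2
               for (i, j), rij in r.items())

res = minimize(f_wls, x0=np.full(4, 0.5))
```

Because this model reproduces the correlations exactly, the weighting does not change the solution here; with a misfitting model, residuals with larger weights would be fitted more closely.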

  19. Maximum Likelihood Estimation • The most commonly used procedure, the Maximum Likelihood (ML) estimator, can be specified as a special case of the WLS estimator. • The ML estimator provides standard errors for the parameters and a test statistic for the fit of the model at much smaller sample sizes. • But this estimator is developed under the assumption that the observed variables have a multivariate normal distribution.

  20. The χ²-test • Without a statistical test we don’t know whether our theory holds. • The test statistic t used is the value of the fitting function (FML) at its minimum. • If the model is correct, t is χ²(df) distributed. • Normally the model is rejected if t > Cα, • where Cα is the value of the χ² for which: pr(χ²df > Cα) = α. • See the appendices in many statistics books. • But the χ² should not always be trusted, as with any other similar test statistic. • A more robust check is to look at: • The residuals, and • The expected parameter change (EPC).
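The decision rule can be sketched with `scipy.stats.chi2`; the test value t here is hypothetical, and the degrees of freedom follow from the four-indicator example (10 observed moments minus 8 free parameters):

```python
from scipy.stats import chi2

df = 2                       # 4*5/2 = 10 moments, 8 parameters
alpha = 0.05
t = 1.37                     # hypothetical value of the test statistic

C_alpha = chi2.ppf(1 - alpha, df)   # critical value, about 5.99
p_value = chi2.sf(t, df)            # pr(chi2_df > t)
reject = t > C_alpha                # here: the model is not rejected
```

This replaces the table lookup "in the appendices of many statistics books" with a direct computation of Cα and the p-value.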

  21. General CFA approach • A model is specified with observed and latent variables. • Correlations (covariances) between the observed variables can be expressed in the parameters of the model (decomposition rules). • If the model is identified, the parameters can be estimated. • A test of the model can be performed if df > 0. • Possible misspecifications (an unacceptable χ²) can be detected. • Corrections in the model can be introduced: adjusting the theory.

  22. [Flow diagram: Theory leads to a Model; Reality, through the data collection process, yields Data; comparing Model and Data may lead to Model modification.]
