Advanced Statistics Factor Analysis, II
Last lecture 1. What causes what: ξ → Xs, or Xs → ξ? 2. Do we explore the relation of the Xs to the ξs, or do we test (try to confirm) an a priori assumption about this relation? Ad 1. The difference between PCA (principal component analysis) and FA (factor analysis). Ad 2. The difference between EFA (exploratory factor analysis) and CFA (confirmatory factor analysis).
PCA and FA extraction Factor loadings (component loadings for PCA) are correlations between factors and variables. For PCA and FA they are extracted on the basis of eigenvectors and the eigenvalues associated with these vectors. Eigenvectors V are linear combinations of variables that account for the variance measured by the corresponding eigenvalues L (the variance refers to factors). The basic equation of extraction is the eigendecomposition of the correlation matrix R: R = VLV′ (equivalently, L = V′RV), from which the loadings follow as A = V L^(1/2).
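For concreteness, here is a minimal Python sketch (not from the lecture) that extracts PCA loadings from a small, hypothetical correlation matrix R via its eigendecomposition:

```python
import numpy as np

# Hypothetical 3x3 correlation matrix R (for illustration only)
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

# Eigendecomposition: R = V L V', where L holds the eigenvalues
eigenvalues, V = np.linalg.eigh(R)

# Sort from largest to smallest eigenvalue
order = np.argsort(eigenvalues)[::-1]
L = eigenvalues[order]
V = V[:, order]

# Loadings (correlations between components and variables): A = V L^(1/2)
A = V * np.sqrt(L)
print(A.round(3))
```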
Extraction methods • Principal components • Principal factors • Image factoring (rescaled; unique variances eliminated) • MLF – maximum likelihood factoring (has a significance test for factors)
T-F: Practical issues 1. Sample size and missing data N > 300 is recommended. Missing data: consider regression imputation. 2. Normality A normal distribution is: • (1) unimodal, • (2) symmetric (0 skewness), • (3) mesokurtic (not too tall, not too flat). Check the distribution of the xi as for regression and other analyses.
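A quick way to check points (1)-(3) numerically is to inspect skewness and excess kurtosis; a minimal sketch using scipy, where the variable x is simulated purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=500)   # simulated stand-in for an observed variable x_i

print("skewness:", stats.skew(x))             # ~0 for a symmetric distribution
print("excess kurtosis:", stats.kurtosis(x))  # ~0 for a mesokurtic distribution
```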
T-F: Practical issues 3. Linearity Scatterplots for pairs of xi. 4. Absence of outliers among cases 5. Absence of multicollinearity Computation of SMCs. SMC = squared multiple correlation of a variable when it serves as the DV and the rest of the variables in the analysis are IVs. SMC > .9 indicates multicollinearity; SMC = 1 is called singularity.
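The SMCs can be obtained without running separate regressions, using the standard identity SMC_i = 1 - 1/(R⁻¹)_ii. A minimal sketch with a hypothetical correlation matrix:

```python
import numpy as np

# Hypothetical correlation matrix of the variables in the analysis
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

# SMC of variable i regressed on all others: SMC_i = 1 - 1/(R^-1)_ii
R_inv = np.linalg.inv(R)
smc = 1 - 1 / np.diag(R_inv)
print(smc.round(3))   # values > .9 would signal multicollinearity
```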
T-F: Practical issues 6. Factorability of R Some r > .3. Recommendation: Kaiser's ratio: the sum of squared correlations divided by the sum of squared correlations plus the sum of squared partial correlations. Partial correlations should be small; if they are 0, then the K-ratio = 1. K-ratio > .6 is usually required for FA. 7. Outliers among variables Omit variables that have a low squared multiple correlation with all other variables initially considered for FA.
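Kaiser's ratio (often reported as the Kaiser-Meyer-Olkin, or KMO, measure) can be computed from R and the partial correlations derived from its inverse. A sketch, again with a hypothetical R:

```python
import numpy as np

# Hypothetical correlation matrix (for illustration)
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

R_inv = np.linalg.inv(R)
d = np.sqrt(np.diag(R_inv))

# Partial correlations: p_ij = -R_inv_ij / sqrt(R_inv_ii * R_inv_jj)
P = -R_inv / np.outer(d, d)

off = ~np.eye(R.shape[0], dtype=bool)   # off-diagonal elements only
k_ratio = np.sum(R[off]**2) / (np.sum(R[off]**2) + np.sum(P[off]**2))
print(k_ratio)   # > .6 is usually required for FA
```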
Credits This lecture is partially based on: Kohn, Melvin, and Kazimierz M. Slomczynski. Social Structure and Self-Direction. Blackwell, 1993; IFiS Publishers, 2007. Albright, Jeremy J., and Hun Myoung Park. 2009. Confirmatory Factor Analysis Using Amos, LISREL, Mplus, and SAS/STAT CALIS. Working Paper. University Information Technology Services (UITS) Center for Statistical and Mathematical Computing, Indiana University, Bloomington, IN.
[Path diagram with symbols ξ, x, λ, ϕ, δ] It is common to display confirmatory factor models as path diagrams in which squares represent observed variables and circles represent latent variables. E.g.: consider two latent variables ξ1 and ξ2 and six observed variables x1 through x6. Factor loadings are represented by λij. The covariance between ξ1 and ξ2 is ϕ. The δi incorporate all the variance in xi which is not captured by the common factors.
Equation for X Latent variables are mean-centered, i.e., measured as deviations from their means. Under this assumption the confirmatory factor model is summarized by the equation: X = Λξ + δ where X is the vector of observed variables; Λ (lambda) is the matrix of factor loadings connecting the ξi to the xi; ξ is the vector of common factors; and δ is the vector of errors. The error terms have a mean of zero, E(δ) = 0, and the common factors and errors are uncorrelated, E(ξδ′) = 0.
Specific equations for x1 to x6
x1 = λ11ξ1 + δ1
x2 = λ21ξ1 + δ2
x3 = λ31ξ1 + δ3
x4 = λ42ξ2 + δ4
x5 = λ52ξ2 + δ5
x6 = λ62ξ2 + δ6
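To make the model concrete, here is an illustrative simulation of these six equations; all parameter values (loadings, factor covariance, unique variances) are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical parameter values, chosen only for illustration
Lambda = np.array([[0.8, 0.0],   # lambda_11 ... lambda_31 load on xi_1
                   [0.7, 0.0],
                   [0.6, 0.0],
                   [0.0, 0.8],   # lambda_42 ... lambda_62 load on xi_2
                   [0.0, 0.7],
                   [0.0, 0.6]])
Phi = np.array([[1.0, 0.3],      # Var(xi_i) = 1; phi_21 = .3 (assumed)
                [0.3, 1.0]])
theta = 1 - np.sum(Lambda**2, axis=1)   # unique variances so Var(x_i) = 1

# x = Lambda xi + delta, with E(delta) = 0 and xi, delta uncorrelated
xi = rng.multivariate_normal(np.zeros(2), Phi, size=n)
delta = rng.normal(scale=np.sqrt(theta), size=(n, 6))
X = xi @ Lambda.T + delta

print(np.corrcoef(X, rowvar=False).round(2))
```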
Similarities with regression The equation for each xi is a linear function of one or more common factors plus an error term. There is no intercept, since the variables are mean-centered. The primary difference between these factor equations and regression analysis is that the ξi are unobserved in CFA. Consequently, estimation proceeds in a manner distinct from the conventional approach of regressing each xi on the ξi.
Identification One essential step in CFA is determining whether the specified model is identified. If the number of unknown parameters to be estimated is larger than the number of pieces of information provided, the model is underidentified. E.g.: 10 = 2x + 3y is not identified (two unknowns but only one piece of information - one equation); infinitely many pairs of values for x and y make the equation true: x = -10, y = 10; x = -25, y = 20; x = -40, y = 30, etc. To make the system just-identified, another independent equation must be provided; for example, adding 3 = x + y yields x = -1 and y = 4.
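A quick numeric check of the resulting just-identified system:

```python
import numpy as np

# 10 = 2x + 3y together with 3 = x + y: two unknowns, two equations
A = np.array([[2.0, 3.0],
              [1.0, 1.0]])
b = np.array([10.0, 3.0])

print(np.linalg.solve(A, b))   # [-1.  4.], i.e. x = -1, y = 4
```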
Identification: Input information In CFA, a model is identified if all of the unknown parameters can be rewritten in terms of the variances and covariances of the x variables. In our case, the variance/covariance matrix of the variables x1…x6 has the lower triangle:
σ11
σ21 σ22
σ31 σ32 σ33
σ41 σ42 σ43 σ44
σ51 σ52 σ53 σ54 σ55
σ61 σ62 σ63 σ64 σ65 σ66
The number of pieces of input information is 6(6+1)/2 = 21.
Degrees of freedom Generally the input information is computed as p(p+1)/2, where p is the number of observed variables. Unknowns: ϕ21, six λij, six δi (error variances), and the error covariance δ63, i.e., 14 in all. Degrees of freedom: 21 (knowns) - 14 (unknowns) = 7. The model is over-identified.
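The bookkeeping written out explicitly (treating δ63 as the single error covariance, as above):

```python
p = 6
knowns = p * (p + 1) // 2      # 21 variances and covariances
unknowns = 1 + 6 + 6 + 1       # phi_21, six lambdas, six error variances,
                               # and the error covariance delta_63
print(knowns - unknowns)       # 7 degrees of freedom
```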
Scale of latent variables Without introducing some constraints, no confirmatory factor model is identified. The problem lies in the fact that the latent variables are unobserved and hence their scales are unknown. To identify the model, it therefore becomes necessary to set the metric of the latent variables in some manner. The two most common constraints are to set either the variance of each latent variable or one of its factor loadings to one.
Basic estimation equation When the x variables are measured as deviations from their means, it is easy to show that the covariance matrix of x implied by the model, Σ (sigma), can be decomposed as follows: • Σ = ΛΦΛ′ + Θ where Φ (phi) represents the covariance matrix of the ξ factors and Θ (theta) represents the covariance matrix of the unique factors δ. Estimation proceeds by finding the parameters Λ, Φ, and Θ such that the implied covariance matrix Σ is as close to the sample covariance matrix S as possible.
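A sketch computing the implied matrix Σ = ΛΦΛ′ + Θ for the hypothetical parameter values used in the simulation above:

```python
import numpy as np

# Hypothetical parameters (same illustrative values as in the simulation above)
Lambda = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])
Theta = np.diag(1 - np.sum(Lambda**2, axis=1))

# Model-implied covariance matrix: Sigma = Lambda Phi Lambda' + Theta
Sigma = Lambda @ Phi @ Lambda.T + Theta
print(Sigma.round(2))
```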
Estimation Several different fitting functions exist for determining the closeness of the implied covariance matrix to the sample covariance matrix, of which maximum likelihood is the most common. A full discussion of the topic in the context of CFA is available in Bollen (1989, chapter 7), including some necessary and sufficient conditions for identification.
ML estimation Maximum Likelihood Method. The method of maximum likelihood (the term first used by Fisher, 1922a) is a general method of estimating parameters of a population by values that maximize the likelihood (L) of a sample.
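In CFA the ML fitting function is commonly written F_ML = log|Σ| + tr(SΣ⁻¹) - log|S| - p (see Bollen 1989). A minimal sketch, not from the lecture, evaluating this function:

```python
import numpy as np

def f_ml(S, Sigma):
    """ML fitting function: log|Sigma| + tr(S Sigma^-1) - log|S| - p."""
    p = S.shape[0]
    return (np.log(np.linalg.det(Sigma))
            + np.trace(S @ np.linalg.inv(Sigma))
            - np.log(np.linalg.det(S))
            - p)

# F_ML = 0 when the implied matrix reproduces S exactly
S = np.array([[1.0, 0.5],
              [0.5, 1.0]])
print(f_ml(S, S))   # 0.0
```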
Fit statistics Goodness-of-fit tests evaluate the model in terms of the fixed parameters used to specify it, with acceptance or rejection of the model based on the overidentifying conditions in the model. Basic assessment: The chi-square/degrees-of-freedom ratio tests the hypothesis that the model is consistent with the pattern of covariation among the observed variables; smaller rather than larger values indicate a good fit. The goodness-of-fit index (GFI) is a measure of the relative amount of variances and covariances jointly accounted for by the model; the closer the GFI is to 1.00, the better the fit of the model to the data.
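A hedged sketch putting the pieces together: the model chi-square is (N - 1)·F_ML at the minimum of the fitting function, and one common formulation of the GFI under ML estimation is GFI = 1 - tr[(Σ⁻¹S - I)²] / tr[(Σ⁻¹S)²]:

```python
import numpy as np
from scipy import stats

def fit_stats(S, Sigma, n_obs, df):
    """Chi-square test and GFI for a fitted CFA model (one common formulation)."""
    p = S.shape[0]
    Sigma_inv = np.linalg.inv(Sigma)
    f_ml = (np.log(np.linalg.det(Sigma)) + np.trace(S @ Sigma_inv)
            - np.log(np.linalg.det(S)) - p)
    chi2 = (n_obs - 1) * f_ml                  # model chi-square
    p_value = stats.chi2.sf(chi2, df)          # H0: the model fits
    M = Sigma_inv @ S - np.eye(p)
    gfi = 1 - np.trace(M @ M) / np.trace((Sigma_inv @ S) @ (Sigma_inv @ S))
    return chi2, p_value, gfi
```

If Σ reproduces S exactly, the function returns chi2 = 0 and GFI = 1, matching the interpretation above.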