Factor Analysis
• Purpose of Factor Analysis
• Maximum likelihood Factor Analysis
• Least-squares
• Factor rotation techniques
• R commands for factor analysis
• References
Purpose of Factor Analysis
Factor analysis is one of the techniques for reducing the dimension of the observed variables. Suppose that we have a p-dimensional continuous variable vector $x = (x_1, x_2, \dots, x_p)$. We can observe these variables, but they may not be the real independent underlying variables. Factor analysis seeks to find real underlying variables that are not observable. That is, we want to find an m-dimensional vector $y = (y_1, y_2, \dots, y_m)$, with $m < p$, of independent variables satisfying
$$x - \mu = \Lambda y + e$$
where $\Lambda$ is a p×m matrix of weights and $e$ is a normal random vector with zero mean and constant dispersion. It is assumed that the elements of $e$ are independent of each other and of $y$. Moreover, it is assumed that the elements of $y$ are independent of each other and are standard normal variables. We can write:
$$y \sim N(0, I_m), \qquad e \sim N(0, \Psi)$$
where $\Psi$ is a diagonal p×p matrix. The elements of this matrix are called specific or unique variances. The weights (elements of $\Lambda$) are the factor loadings. The elements of $y$ are called common variables and the elements of $e$ are called unique or specific variables. Without loss of generality we will assume that the mean of $x$ is 0, i.e. $\mu = 0$.
Note that in the case of PCA we wanted to find linear combinations of the observable variables. In the case of factor analysis we want to find independent variables whose linear combinations are the observable variables. As in many situations, the assumption of a normal distribution makes the treatment easy, although the results are applicable to a wider range of problems.
Factor analysis model
The model defined by the linear equation above cannot be solved directly. Instead we can use the relation between the covariance matrix, the factor loadings and the specific variances. It has the form:
$$\Sigma = \Lambda\Lambda^T + \Psi$$
The objective of factor analysis is to determine m (the length of the vector $y$), $\Lambda$ and $\Psi$, using the observed sample estimate S of the covariance matrix. It should be noted that if we have an m×m orthogonal matrix M ($M^TM = I$) then for $z = My$ we can write:
$$x = (\Lambda M^T) z + e, \qquad \Sigma = (\Lambda M^T)(\Lambda M^T)^T + \Psi = \Lambda\Lambda^T + \Psi$$
i.e. the solution to the problem is not unique. Solutions are indeterminate up to an orthogonal transformation. The only thing we can do is estimate the factor space. To be able to find a unique solution we need to add a new condition. This condition is:
$$\Lambda^T \Psi^{-1} \Lambda = D$$
where $\Psi$ and D are diagonal matrices. Once we have identified the factor space using this constraint, we can apply any rotation matrix and define other factors. Moreover, we can even use any non-singular matrix to redefine new factors. Under an orthogonal transformation independent variables go to independent variables. Under a non-orthogonal transformation independent variables may go to dependent variables.
Note that if any $\psi_{ii} = 0$ then $\Psi^{-1}$ does not exist and this condition cannot be used. This is called a Heywood case.
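The non-uniqueness under orthogonal transformations can be checked numerically. The following is a minimal R sketch, not part of the original slides: the loadings `Lambda`, the unique variances in `Psi`, the sample size and the rotation angle are all made-up illustration values. It simulates data from the factor model, compares the sample covariance with Lambda Lambda^T + Psi, and then shows that rotated loadings imply exactly the same covariance matrix.

```r
set.seed(1)
p <- 5; m <- 2; n <- 1e5

# Hypothetical loadings (p x m, filled column by column)
Lambda <- matrix(c(0.9, 0.8, 0.7, 0.0, 0.0,
                   0.0, 0.0, 0.3, 0.8, 0.7), nrow = p, ncol = m)
Psi <- diag(1 - rowSums(Lambda^2))     # unique variances, so that var(x_i) = 1
Sigma <- Lambda %*% t(Lambda) + Psi    # covariance implied by the model

# Simulate x = Lambda y + e with y ~ N(0, I) and e ~ N(0, Psi)
y <- matrix(rnorm(n * m), n, m)
e <- matrix(rnorm(n * p), n, p) %*% sqrt(Psi)
x <- y %*% t(Lambda) + e
sim_err <- max(abs(cov(x) - Sigma))    # small for large n

# Rotate the factors with an orthogonal matrix M (z = M y)
theta <- pi / 6
M <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
Lambda_rot <- Lambda %*% t(M)
Sigma_rot <- Lambda_rot %*% t(Lambda_rot) + Psi
rot_err <- max(abs(Sigma_rot - Sigma)) # zero up to rounding error
```

Both `Lambda` and `Lambda_rot` fit the same covariance matrix, which is why an extra constraint or a rotation criterion is needed to pick one solution.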
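Variance of variables and communalities
We can write the relations between the covariances of the original variables, the loadings and the unique variances:
$$\sigma_{ii} = \sum_{k=1}^{m} \lambda_{ik}^2 + \psi_{ii}, \qquad \sigma_{ij} = \sum_{k=1}^{m} \lambda_{ik}\lambda_{jk} \quad (i \ne j)$$
The term
$$h_i^2 = \sum_{k=1}^{m} \lambda_{ik}^2$$
is called the communality. It is the part of the variance of the original variable that is shared with the others via the common variables. $\psi_{ii}$ is the unique variance, which is a property of the variable $x_i$ only.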
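As a small worked illustration in R (the loadings and unique variances below are invented for the example), the variance of each variable splits into its communality, the row sum of squared loadings, plus its unique variance:

```r
# Hypothetical loadings for p = 5 variables and m = 2 factors
Lambda <- matrix(c(0.9, 0.8, 0.7, 0.0, 0.0,
                   0.0, 0.0, 0.3, 0.8, 0.7), nrow = 5, ncol = 2)
psi <- c(0.19, 0.36, 0.42, 0.36, 0.51)     # hypothetical unique variances

Sigma <- Lambda %*% t(Lambda) + diag(psi)  # implied covariance matrix
communality <- rowSums(Lambda^2)           # variance shared via the common factors

# Each diagonal element of Sigma is communality + unique variance
variance_check <- all.equal(diag(Sigma), communality + psi)

# Proportion of each variable's variance explained by the common factors
prop_common <- communality / diag(Sigma)
```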
Maximum number of factors
The number of distinct elements in the covariance matrix of p variables is ½p(p+1) (the elements of S). The number of loadings is pm and the number of specific variances is p, so we want to identify p(m+1) elements. The number of constraints is ½m(m-1). Taking the constraints into account, we want to identify p(m+1) - ½m(m-1) elements using ½p(p+1) elements. This requires p(m+1) - ½m(m-1) ≤ ½p(p+1), which gives the relation for the maximum number of identifiable elements:
$$(p - m)^2 \ge p + m$$
For example, if we have 6 original variables we cannot define more than 3 factor variables. If we have 15 original variables we cannot define more than 10 new variables. In practice it is hoped that one can find a much smaller number of factors describing the whole system.
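The counting argument reduces to the inequality (p - m)^2 >= p + m, so the bound is easy to tabulate. A short R sketch follows; the helper name `max_factors` is made up for this illustration.

```r
# Largest m for which (p - m)^2 >= p + m holds
max_factors <- function(p) {
  m <- 0:p
  max(m[(p - m)^2 >= p + m])
}

max_factors(6)    # the 6-variable example from the text
max_factors(15)   # the 15-variable example from the text

# Bound for a range of p
bounds <- sapply(3:20, max_factors)
names(bounds) <- 3:20
```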
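Factor Analysis using Maximum likelihood
If we assume that the n observations $x_i = (x_{i1}, \dots, x_{ip})$ are independent and normally distributed, then we can write the likelihood function (assuming that the mean of x is 0):
$$L = (2\pi)^{-np/2}\, |\Sigma|^{-n/2} \exp\left(-\frac{1}{2}\sum_{i=1}^{n} x_i^T \Sigma^{-1} x_i\right), \qquad \Sigma = \Lambda\Lambda^T + \Psi$$
With $S = \frac{1}{n}\sum_{i} x_i x_i^T$, the log-likelihood function is:
$$l = -\frac{n}{2}\left(p\log 2\pi + \log|\Sigma| + \mathrm{tr}(\Sigma^{-1}S)\right)$$
The derivatives with respect to the factor loadings and the specific variances become:
$$\frac{\partial l}{\partial \Lambda} = -n\,\Sigma^{-1}(\Sigma - S)\Sigma^{-1}\Lambda, \qquad \frac{\partial l}{\partial \Psi} = -\frac{n}{2}\,\mathrm{diag}\left(\Sigma^{-1}(\Sigma - S)\Sigma^{-1}\right)$$
Here we used the matrix notation for derivatives, some facts from matrix algebra and the fact that the covariance matrix is symmetric:
$$\frac{\partial \log|\Sigma|}{\partial \Sigma} = \Sigma^{-1}, \qquad \frac{\partial\, \mathrm{tr}(\Sigma^{-1}S)}{\partial \Sigma} = -\Sigma^{-1} S \Sigma^{-1}$$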
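The log-likelihood and its derivatives can be written down directly in R. This is only a sketch under the zero-mean normal model with Sigma = Lambda Lambda^T + diag(psi); the function names `fa_loglik` and `fa_grad` and all numeric values are invented for the illustration. The analytic gradient, -n G Lambda for the loadings and -(n/2) diag(G) for the unique variances with G = Sigma^{-1} (Sigma - S) Sigma^{-1}, is compared with a central finite difference.

```r
# Log-likelihood of the factor model for a sample covariance S (mean 0, n observations)
fa_loglik <- function(Lambda, psi, S, n) {
  p <- nrow(S)
  Sigma <- Lambda %*% t(Lambda) + diag(psi)
  logdet <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  -n / 2 * (p * log(2 * pi) + logdet + sum(diag(solve(Sigma, S))))
}

# Analytic derivatives with respect to the loadings and the unique variances
fa_grad <- function(Lambda, psi, S, n) {
  Sigma <- Lambda %*% t(Lambda) + diag(psi)
  Sinv <- solve(Sigma)
  G <- Sinv %*% (Sigma - S) %*% Sinv
  list(Lambda = -n * G %*% Lambda, psi = -n / 2 * diag(G))
}

# Hypothetical "true" parameters define S; the gradient is evaluated elsewhere
Lambda0 <- matrix(c(0.9, 0.8, 0.7, 0.0, 0.0,
                    0.0, 0.0, 0.3, 0.8, 0.7), 5, 2)
psi0 <- 1 - rowSums(Lambda0^2)
S <- Lambda0 %*% t(Lambda0) + diag(psi0)
n <- 100
Lambda1 <- Lambda0 + 0.1
psi1 <- psi0 + 0.05

# Central finite difference for the (1, 1) loading
h <- 1e-6
E11 <- matrix(0, 5, 2); E11[1, 1] <- h
fd <- (fa_loglik(Lambda1 + E11, psi1, S, n) -
       fa_loglik(Lambda1 - E11, psi1, S, n)) / (2 * h)
an <- fa_grad(Lambda1, psi1, S, n)$Lambda[1, 1]

# At parameters that reproduce S exactly the gradient vanishes
grad_at_truth <- fa_grad(Lambda0, psi0, S, n)
```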
Factor analysis using ML
The maximum likelihood equations are usually solved iteratively. Care should be taken in implementing these equations, as convergence can be slow and some of the specific variances can become negative. The equations are usually solved using the second-order Newton-Raphson (NR) method or the scoring method. The scoring method uses the Fisher information matrix instead of the second derivative matrix; it can be slower than NR, but it has the attractive property that the initial values of the parameters can be far from optimal. Numerical optimisation should also ensure that $\psi_{ii} > 0$, so the optimisation is usually done under these constraints. Maximum likelihood can be performed in the following way: find initial values for $\psi_{ii}$, then estimate $\Lambda$, and then find new values for $\psi_{ii}$.
One of the problems in factor analysis is a common problem in multivariate analysis: it is not guaranteed that all measurements are on the same scale. For that reason it is common to use correlation matrices instead of covariance matrices. If factor analysis is done using maximum likelihood, then the loadings for the correlation matrix can easily be derived. In general, maximum likelihood estimation is invariant under transformations with non-zero Jacobians. Since the transformation from the covariance matrix to the correlation matrix (and the corresponding transformation of loadings and unique variances) has a non-zero Jacobian, having found the parameters for one of them we can derive those for the other.
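The conversion between the correlation and covariance scales is a simple rescaling by the standard deviations. The R sketch below uses made-up loadings, unique variances and standard deviations (none of them come from the slides): if the correlation matrix has loadings `Lambda_c` and unique variances `psi_c`, then the covariance matrix has loadings D Lambda_c and unique variances D^2 psi_c, where D holds the standard deviations.

```r
# Hypothetical solution on the correlation scale (unit variances)
Lambda_c <- matrix(c(0.9, 0.8, 0.7, 0.0, 0.0,
                     0.0, 0.0, 0.3, 0.8, 0.7), 5, 2)
psi_c <- 1 - rowSums(Lambda_c^2)
R <- Lambda_c %*% t(Lambda_c) + diag(psi_c)   # correlation matrix, diagonal = 1

# Hypothetical standard deviations of the original variables
sd_x <- c(2, 0.5, 10, 1, 3)
D <- diag(sd_x)
Sigma <- D %*% R %*% D                        # covariance matrix

# Rescaled loadings and unique variances reproduce the covariance matrix
Lambda_s <- D %*% Lambda_c
psi_s <- sd_x^2 * psi_c
cov_check <- all.equal(Sigma, Lambda_s %*% t(Lambda_s) + diag(psi_s))

# Going back: cov2cor() recovers the correlation matrix
cor_check <- all.equal(cov2cor(Sigma), R)
```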
Least-squares for Factor analysis
Another widely used technique for factor analysis is least squares. Its simplicity makes it attractive. It is done by minimisation of:
$$\mathrm{tr}\left((S - \Sigma)^2\right) = \sum_{i,j} (s_{ij} - \sigma_{ij})^2$$
The covariance matrix has the same structure as before, $\Sigma = \Lambda\Lambda^T + \Psi$. If we take the derivatives and equate them to 0 we can derive the following equations:
$$(S - \Psi)\Lambda = \Lambda(\Lambda^T\Lambda), \qquad \Psi = \mathrm{diag}(S - \Lambda\Lambda^T)$$
First an initial value for $\Psi$ is taken and $\Lambda$ is found using the first equation. Eigenvalue analysis is used for this. Then $\Psi$ is updated using the second equation. This technique is called principal factor analysis. It should not be confused with principal component analysis. If the values of $\Psi$ are 0 then the first equation is very similar to principal component analysis. That is the reason why some statistical packages contain PCA as a special case of factor analysis.
Two points should be noted:
• Least squares is usually used to find initial estimates for ML.
• If the correlation matrix is used then the results derived using least squares will be different. Results obtained using covariance and correlation matrices cannot be converted into each other by simple scaling, as was the case for maximum likelihood estimation.
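The alternating scheme of principal factor analysis can be sketched in a few lines of R. This is an illustration under stated assumptions, not a reference implementation: the function name `principal_factor`, the fixed iteration count and the starting value for the unique variances (one over the diagonal of the inverse of S) are choices made here. Each pass takes the leading m eigenpairs of S - Psi to get the loadings, then resets Psi to the diagonal of S - Lambda Lambda^T.

```r
principal_factor <- function(S, m, n_iter = 200) {
  psi <- 1 / diag(solve(S))                    # starting unique variances (an assumption)
  for (it in seq_len(n_iter)) {
    eig <- eigen(S - diag(psi), symmetric = TRUE)
    d <- pmax(eig$values[1:m], 0)              # guard against negative eigenvalues
    Lambda <- eig$vectors[, 1:m, drop = FALSE] %*% diag(sqrt(d), m)
    psi <- diag(S - Lambda %*% t(Lambda))      # second equation: update unique variances
  }
  list(loadings = Lambda, uniquenesses = psi)
}

# Hypothetical one-factor covariance matrix with known loadings
lambda_true <- c(0.9, 0.8, 0.7, 0.6, 0.5)
S <- tcrossprod(lambda_true) + diag(1 - lambda_true^2)

fit <- principal_factor(S, m = 1)
fitted_S <- tcrossprod(fit$loadings) + diag(fit$uniquenesses)
resid_max <- max(abs(S - fitted_S))            # small if the iteration has converged
```

The sign of an eigenvector is arbitrary, so the recovered loadings may come out as the negative of `lambda_true`; the fitted covariance is unaffected.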
Significance test and model selection
If the normality assumptions hold then we can use the likelihood ratio test for a factor model of dimension m. If the null hypothesis is
$$H_0: \Sigma = \Lambda\Lambda^T + \Psi$$
and the alternative is that the covariance matrix is unconstrained (i.e. the null hypothesis is not true), then the likelihood ratio test reduces to:
$$-2\log\lambda = n\left(\mathrm{tr}(\hat\Sigma^{-1}S) - \log|\hat\Sigma^{-1}S| - p\right)$$
The distribution of this statistic is approximated by a chi-squared distribution with ½((p-m)² - (p+m)) degrees of freedom. This enables us to carry out a significance test of the null hypothesis. If the maximum number of identifiable parameters is reached, the number of degrees of freedom is zero and we can conclude that it is not straightforward to extract structure from the given data. Usually n is replaced by n' = n - 1 - (2p+5)/6 - 2m/3; the chi-squared approximation is then more accurate. This test is called a goodness-of-fit test.
The usual techniques for model selection are:
• First carry out principal component analysis, then select the number of factors using one of the recommended techniques (scree plot, proportion of variance etc.). Then do factor analysis starting from this value.
• The likelihood ratio test can be carried out to test the significance of the number of factors. It should be applied with care, because the likelihood ratio test makes no adjustment for sequential application of the test.
Determining the number of factors is a trade-off between the number of parameters (we want as few as possible) and goodness of fit.
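The test can be reproduced by hand from a `factanal` fit. The R sketch below uses the built-in `ability.cov` covariance list (6 variables) with 2 factors. It assumes that `fit$criteria["objective"]` holds the minimised discrepancy tr(Sigma^{-1} S) - log|Sigma^{-1} S| - p, which is how `factanal` output is understood here and is worth confirming in the help page. The helper name `fa_dof` is made up.

```r
# Degrees of freedom of the goodness-of-fit test
fa_dof <- function(p, m) ((p - m)^2 - (p + m)) / 2

p <- nrow(ability.cov$cov)   # number of variables
n <- ability.cov$n.obs       # number of observations
m <- 2                       # number of factors under the null hypothesis

fit <- factanal(factors = m, covmat = ability.cov)

# Corrected multiplier n' = n - 1 - (2p + 5)/6 - 2m/3
n_adj <- n - 1 - (2 * p + 5) / 6 - 2 * m / 3

# Test statistic and p-value computed by hand
stat <- n_adj * unname(fit$criteria["objective"])
pval <- pchisq(stat, df = fa_dof(p, m), lower.tail = FALSE)

# The same quantities as reported by factanal
c(by_hand = stat, factanal = unname(fit$STATISTIC))
c(by_hand = pval, factanal = unname(fit$PVAL))
```

A large p-value means the m-factor model is not rejected; it does not show that m is the right number of factors.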
Factor rotations
Factor analysis does not give a unique solution. As we noted above, using an orthonormal rotation we can derive factors that fit the model with exactly the same accuracy. It is usual to rotate the factors after the analysis. There are several techniques for doing that. They all attempt to minimise some loadings and maximise others so that interpretation of the results is easy. Two widely used techniques for deriving rotations are varimax and quartimax. Varimax maximises:
$$V = \sum_{k=1}^{m}\left[\sum_{i=1}^{p} \lambda_{ik}'^{4} - \frac{1}{p}\left(\sum_{i=1}^{p} \lambda_{ik}'^{2}\right)^{2}\right]$$
where $\lambda_{ik}'$ are the loadings after the rotation (often each row of loadings is first scaled by the square root of its communality). Quartimax maximises:
$$Q = \sum_{i=1}^{p}\sum_{k=1}^{m} \lambda_{ik}'^{4}$$
Many statistical packages can find rotation matrices using these techniques. Of the orthogonal rotations, R provides varimax, which factanal applies by default. Sometimes it is useful to find non-orthogonal rotation matrices. One such technique is promax, which is available in R.
One technique for factor rotation maximises the non-normality of the unobserved (common) variables. It is a separate technique and is called independent component analysis (ICA).
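A short R sketch of rotating a fitted solution follows. It fits an unrotated 2-factor model to the built-in `ability.cov` data (chosen only for convenience), applies `varimax` and `promax`, and evaluates the raw varimax criterion, the sum over factors of the variance-like term in the squared loadings, before and after rotation. The helper name `varimax_criterion` is invented here, and `normalize = FALSE` is used so that the rotation targets that raw criterion.

```r
# Unrotated maximum likelihood loadings
fit <- factanal(factors = 2, covmat = ability.cov, rotation = "none")
L <- unclass(loadings(fit))

# Raw varimax criterion: sum over factors of sum(l^4) - (sum(l^2))^2 / p
varimax_criterion <- function(L) {
  p <- nrow(L)
  sum(colSums(L^4) - colSums(L^2)^2 / p)
}

vm <- varimax(L, normalize = FALSE)   # orthogonal rotation
pm <- promax(L)                       # oblique (non-orthogonal) rotation
L_vm <- unclass(vm$loadings)

crit_before <- varimax_criterion(L)
crit_after <- varimax_criterion(L_vm)

# The varimax rotation matrix is orthogonal, so communalities do not change
orth_check <- all.equal(crossprod(vm$rotmat), diag(2))
comm_check <- all.equal(rowSums(L_vm^2), rowSums(L^2))
```

Because the orthogonal rotation leaves the communalities and the fitted covariance unchanged, the choice between rotations is purely about interpretability.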
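Factor scores
There are also techniques for finding factor scores. One technique, due to Bartlett, uses least squares and minimises:
$$(x - \Lambda y)^T \Psi^{-1} (x - \Lambda y)$$
If we take the derivative of this with respect to y and equate it to zero we get:
$$\hat y = (\Lambda^T \Psi^{-1} \Lambda)^{-1} \Lambda^T \Psi^{-1} x$$
Another technique, due to Thomson, uses the normality assumption and finds the conditional expected value of y given x. It turns out to be:
$$E(y \mid x) = \Lambda^T \Sigma^{-1} x = (I + \Lambda^T \Psi^{-1} \Lambda)^{-1} \Lambda^T \Psi^{-1} x$$
Here we assumed that the mean values of the x-s are 0. Both techniques give the scores as a linear combination of the initial variables, $\hat y = Ax$. A is sometimes called the factor score estimation matrix in computer package output.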
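Both score estimators are linear in x, so each reduces to a coefficient matrix. The R sketch below uses invented loadings, unique variances and one invented observation. It builds the Bartlett matrix (Lambda^T Psi^{-1} Lambda)^{-1} Lambda^T Psi^{-1} and the Thomson matrix Lambda^T Sigma^{-1}, and checks that the latter equals (I + Lambda^T Psi^{-1} Lambda)^{-1} Lambda^T Psi^{-1}.

```r
# Hypothetical loadings and unique variances (p = 5, m = 2)
Lambda <- matrix(c(0.9, 0.8, 0.7, 0.0, 0.0,
                   0.0, 0.0, 0.3, 0.8, 0.7), 5, 2)
psi <- 1 - rowSums(Lambda^2)
m <- ncol(Lambda)

Psi_inv <- diag(1 / psi)
Sigma <- Lambda %*% t(Lambda) + diag(psi)
LtPi <- t(Lambda) %*% Psi_inv                     # Lambda^T Psi^{-1}

# Bartlett (weighted least squares) score matrix
A_bartlett <- solve(LtPi %*% Lambda, LtPi)

# Thomson (conditional expectation) score matrix, in two equivalent forms
A_thomson <- t(Lambda) %*% solve(Sigma)
A_thomson2 <- solve(diag(m) + LtPi %*% Lambda, LtPi)

# Scores for one hypothetical centred observation
x <- c(1.2, 0.8, 0.5, -0.3, -0.6)
y_bartlett <- A_bartlett %*% x
y_thomson <- A_thomson %*% x
```

The Bartlett matrix satisfies A Lambda = I, whereas the Thomson scores are shrunk towards zero.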
R commands for factor analysis
First decide what data matrix we have and prepare it. The command for factor analysis, factanal, was originally in the package mva. In current versions of R it is in the package stats, which is loaded by default, so no library() call is needed.
Now we can analyse data using factor analysis:
data(swiss)
loads the data.
fan <- factanal(swiss, 2)
does the actual calculations. The second argument is the number of factors desired. Have a look at the help for this command; there are options for rotation and other things.
fan <- factanal(swiss, 2, scores = "Bartlett")
does the factor analysis and calculates the scores.
varimax(fan$loadings)
performs a varimax rotation.
promax(fan$loadings)
performs a promax rotation.
fan
prints the result of the factor analysis.
If the covariance matrix has been calculated by some other means then it can be used for factor analysis:
data(Harman23.cor)
fan <- factanal(covmat = Harman23.cor, factors = 3)
This does the factor analysis using the correlation matrix. Obviously scores cannot be calculated in this case.
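To see the difference between fitting from raw data and fitting from a covariance matrix, here is a self-contained R sketch on simulated data. The loadings, sample size, seed and variable names are invented for the illustration. Scores are returned only when the raw data matrix is supplied.

```r
set.seed(42)
n <- 500

# Simulate 6 variables driven by 2 independent factors (hypothetical loadings)
Lambda <- matrix(c(0.9, 0.8, 0.7, 0.0, 0.0, 0.0,
                   0.0, 0.0, 0.0, 0.8, 0.7, 0.6), 6, 2)
psi <- 1 - rowSums(Lambda^2)
y <- matrix(rnorm(n * 2), n, 2)
x <- y %*% t(Lambda) + matrix(rnorm(n * 6), n, 6) %*% diag(sqrt(psi))
colnames(x) <- paste0("x", 1:6)

# Fit from raw data: rotation and scores can be requested
fit <- factanal(x, factors = 2, scores = "Bartlett", rotation = "varimax")
print(fit$loadings, cutoff = 0.3)   # hide small loadings for readability
head(fit$scores)                    # one row of scores per observation

# Fit from a covariance matrix: n.obs enables the test, but no scores are available
fit_cov <- factanal(covmat = cov(x), factors = 2, n.obs = n)
is.null(fit_cov$scores)
```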
References
• Krzanowski WJ and Marriott FHC (1994) Multivariate analysis. Vol 2. Kendall's library of statistics
• Morrison DF (1990) Multivariate statistical methods
• Mardia KV, Kent JT and Bibby JM (2003) Multivariate analysis