Orthogonal Factor Analysis Subject to Direct Sparseness Constraint on Loadings. Kohei Adachi, Osaka University, Japan. Nickolay T. Trendafilov, The Open University, UK.
1. Introduction Starting with the FA model, we introduce Sparse Orthogonal FA (SOFA) as a procedure for overcoming the problem of confirmatory FA, in five slides. 1.1. FA Model 1.2. Problem of CFA 1.3. Automatic CFA by SOFA 1.4. Differences to Sparse PCA 1.5. Remaining Parts
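1.1. FA (Factor Analysis) model
The FA model with m factors can be written as
$$X \cong F\Lambda' + U\Psi$$
for a standardized n-observations × p-variables data matrix $X$ (n × p), where $F$ (n × m) contains the common factors, $\Lambda$ (p × m) the loadings, $U$ (n × p) the unique factors, and $\Psi$ (p × p) is diagonal, with $\Psi^2$ holding the unique variances.
The aim of FA is to estimate $\Lambda$, $\Psi$, and $\Phi$ (factor correlations).
FA is classified into EFA (exploratory FA), which places no constraint on $\Lambda$, and CFA (confirmatory FA), in which some loadings in $\Lambda$ are constrained to be zero.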
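As a small illustration of the model on this slide, the following sketch builds a data matrix that satisfies $X = F\Lambda' + U\Psi$ exactly. The loading values, the sizes `n`, `p`, `m`, and the helper name `orthonormal_scores` are hypothetical choices made for the example, not taken from the slides, and the scores are not column-centered.

```python
import numpy as np

def orthonormal_scores(n, k, rng):
    """Return an n x k matrix Z with Z'Z = n * I_k (hypothetical helper)."""
    G = rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(G)          # reduced QR: Q is n x k with orthonormal columns
    return np.sqrt(n) * Q

rng = np.random.default_rng(0)
n, p, m = 200, 5, 2

# Illustrative sparse loading matrix (values are made up for this example)
Lam = np.array([[0.8, 0.0],
                [0.0, 0.7],
                [0.6, 0.0],
                [0.5, 0.5],
                [0.0, 0.9]])
# Diagonal Psi chosen so that each variable has unit variance
Psi = np.diag(np.sqrt(1.0 - np.sum(Lam**2, axis=1)))

Z = orthonormal_scores(n, m + p, rng)   # Z = [F, U]
F, U = Z[:, :m], Z[:, m:]
X = F @ Lam.T + U @ Psi                 # data following the FA model exactly
S = X.T @ X / n                         # equals Lam Lam' + Psi^2 here
```

Because $F'F = nI_m$, $U'U = nI_p$, and $F'U = O$ hold by construction, `S` reproduces $\Lambda\Lambda' + \Psi^2$ with a unit diagonal.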
1.2. Problem of CFA
A CFA model is illustrated by a path diagram (figure: variables Var.1 to Var.5 linked to factors Fac.1 and Fac.2) corresponding to
$$\Lambda = \begin{bmatrix} \lambda_{11} & 0 \\ 0 & \lambda_{22} \\ \lambda_{31} & 0 \\ \lambda_{41} & \lambda_{42} \\ 0 & \lambda_{52} \end{bmatrix}$$
where the pairs of variables and factors with nonzero loadings are linked.
A problem of CFA is that its users must specify a priori the constraints on $\Lambda$, i.e., how variables are linked to factors. To deal with this problem, we propose a procedure for computationally identifying the optimal CFA model among all possible models with $\Phi = I$ (identity).
1.3. Automatic CFA by SOFA
We call our proposed procedure SOFA, abbreviating Sparse Orthogonal FA, as it seeks a sparse $\Lambda$ including zero loadings and $\Phi = I$ is assumed. Let $SP(\Lambda)$ denote the sparseness of $\Lambda$ (i.e., the number of zero loadings). Then SOFA is formulated as:
[A] $\min_{\Lambda,\Psi} f(\Lambda,\Psi)$ s.t. $SP(\Lambda) = q$ for a given integer $q$
[B] Perform [A] over $q = q_{\min}, \ldots, q_{\max}$ to select the best $q$
SOFA allows us to find the optimal orthogonal CFA model among all possible ones.
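1.4. Differences to Sparse PCA
First, SOFA is based on the FA model $X \cong F\Lambda' + U\Psi$, not on the PCA model $X \cong F\Lambda'$, which has no unique variances $\Psi^2$.
Second, in SOFA, sparseness is directly constrained as $\min_{\Lambda,\Psi} f(\Lambda,\Psi)$ s.t. $SP(\Lambda) = q$ for a given integer $q$, without using a penalty, in contrast to the existing sparse PCA, which is formulated as minimizing $f_{PCA}(\Lambda) + \mathrm{Penalty}(\Lambda)$ over $\Lambda$.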
1.5. Organization of Remaining Parts
SOFA: [A] $\min_{\Lambda,\Psi} f(\Lambda,\Psi)$ s.t. $SP(\Lambda) = q$ for a given integer $q$; [B] Perform [A] over $q = q_{\min}, \ldots, q_{\max}$ to select the best $q$
2 Loss Function: introduce $f(\Lambda,\Psi)$
3 Algorithm: describe [A]
4 Sparseness Selection: describe [B]
5 Simulation Study
6 Examples
7 Discussion
2. Loss Function We present the loss function to be minimized and formulate SOFA. 2.1. What Function is Selected? 2.2. Selected Function 2.3. Formulation of SOFA
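2.1. What Function is Selected?
FA can be formulated with several types of loss functions. Among them, we select a function that can be rewritten as
$$f(\Lambda,\Psi) = h(\Psi) + c\,\|\Lambda - A\|^2,$$
where $h(\Psi)$ is irrelevant to $\Lambda$, $c > 0$ is a constant, and $A = (a_{ij})$ is a given matrix. With $\Lambda = (\lambda_{ij})$, the minimization of this function over $\Lambda$ s.t. $SP(\Lambda) = q$ is easily attained by
$$\lambda_{ij} = \begin{cases} 0 & \text{if } a_{ij}^2 \text{ is among the } q \text{ smallest squared elements of } A, \\ a_{ij} & \text{otherwise.} \end{cases}$$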
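Minimizing $\|\Lambda - A\|^2$ subject to $SP(\Lambda) = q$ amounts to zeroing the $q$ entries of $A$ with the smallest magnitude and copying the rest. A minimal sketch, where the function name `sparsify` and the example matrix are hypothetical:

```python
import numpy as np

def sparsify(A, q):
    """Return the matrix closest to A (in squared Frobenius norm) with q entries set to zero."""
    A = np.asarray(A, dtype=float)
    Lam = A.copy()
    if q > 0:
        # Flat indices of the q smallest |a_ij|
        idx = np.argsort(np.abs(A), axis=None, kind="stable")[:q]
        Lam.flat[idx] = 0.0
    return Lam

A = np.array([[0.90, 0.10],
              [-0.20, 0.70],
              [0.05, -0.80]])
Lam = sparsify(A, q=3)   # zeros the entries 0.05, 0.10, and -0.20
```

Ties among the $|a_{ij}|$ are broken by position here, which is an arbitrary choice; any tie-breaking rule gives the same loss value.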
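2.2. Function Selected
As such a function, we select
$$f(F,U,\Lambda,\Psi) = \|X - (F\Lambda' + U\Psi)\|^2 \quad (1)$$
(de Leeuw, 2004; Unkel & Trendafilov, 2011), which can be written in the form
$$f(F,U,\Lambda,\Psi) = h(\Psi) + n\,\|\Lambda - A\|^2, \qquad h(\Psi) = \|X - (FA' + U\Psi)\|^2, \qquad A = n^{-1}X'F.$$
Though (1) is a function of $F$, $U$, $\Lambda$, and $\Psi$, we show later that (1) can be minimized by updating only $\Lambda$ and $\Psi$.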
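The rewriting of (1) can be checked in a few lines, assuming the constraints $F'F = nI_m$ and $F'U = O$ of the SOFA formulation. The shorthand $R = X - U\Psi$ is introduced only for this derivation.

```latex
\begin{aligned}
R'F &= X'F - \Psi U'F = X'F = nA, \\
\|X - (F\Lambda' + U\Psi)\|^2 &= \|R - F\Lambda'\|^2
   = \|R\|^2 - 2n\,\mathrm{tr}(A'\Lambda) + n\|\Lambda\|^2 \\
  &= \bigl(\|R\|^2 - n\|A\|^2\bigr) + n\|\Lambda - A\|^2, \\
\|X - (FA' + U\Psi)\|^2 &= \|R - FA'\|^2
   = \|R\|^2 - 2n\|A\|^2 + n\|A\|^2 = \|R\|^2 - n\|A\|^2 .
\end{aligned}
```

Combining the last two lines gives $\|X - (F\Lambda' + U\Psi)\|^2 = \|X - (FA' + U\Psi)\|^2 + n\|\Lambda - A\|^2$, where the first term does not involve $\Lambda$.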
2.3. Formulation of SOFA
So, our proposed SOFA is formulated as
$$\min f(F,U,\Lambda,\Psi) = \|X - (F\Lambda' + U\Psi)\|^2$$
subject to
$SP(\Lambda) = q$ (sparseness constraint)
$F'F = nI_m$ (orthogonal common factors)
$U'U = nI_p$ (orthogonal unique factors)
$F'U = O_{m \times p}$ (common factors orthogonal to unique factors)
3. Algorithm We detail the algorithm for SOFA. 3.1. Overview 3.2. Update of Λ and Ψ 3.3. Update of $n^{-1}X'Z$ (1) 3.4. Update of $n^{-1}X'Z$ (2) 3.5. Whole Algorithm 3.6. Multiple Starts
3.1. Overview
To minimize
$$f(F,U,\Lambda,\Psi) = \|X - (F\Lambda' + U\Psi)\|^2,$$
we consider an ALS algorithm in which $\Lambda$, $\Psi$, and $Z = [F, U]$ are alternately updated, with the common and unique factors combined in the n × (m+p) matrix $Z = [F, U]$.
However, starting in Slide 3.3, we show that there is no need to update $Z$ itself, and further no need for the data matrix $X$ if the covariance matrix $S = n^{-1}X'X$ is available.
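3.2. Update of $\Lambda$, $\Psi$
(1) The minimization of $\|X - (F\Lambda' + U\Psi)\|^2$ over $\Psi$ with $F$, $\Lambda$, $U$ fixed is attained by $\Psi = \mathrm{diag}(n^{-1}X'U)$.
(2) For the minimization of $\|X - (F\Lambda' + U\Psi)\|^2$ over $\Lambda$ s.t. $SP(\Lambda) = q$ with $F$, $\Psi$, $U$ fixed, remember that the loss is rewritten as $h(\Psi) + n\|\Lambda - A\|^2$, so $\Lambda$ is obtained from $A = n^{-1}X'F$.
Note: the updates (1) and (2) on this slide show that $\Psi$ and $A$ are obtained from $n^{-1}X'[F, U] = n^{-1}X'Z$.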
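3.3. Update of $n^{-1}X'Z$ (1)
We use two slides to show how $n^{-1}X'Z$ is updated. With $Z = [F, U]$ and $B = [\Lambda, \Psi]$, our task is
$$\min_Z \|X - (F\Lambda' + U\Psi)\|^2 = \|X - ZB'\|^2 \quad \text{s.t. } n^{-1}Z'Z = I_{m+p},$$
where the constraint summarizes $F'F = nI_m$, $U'U = nI_p$, and $F'U = O$. The minimum is attained using the SVD $n^{-1/2}XB = P_1D_1Q_1'$:
$$Z = n^{1/2}P_1Q_1' + n^{1/2}P_2Q_2',$$
where $P_2$ and $Q_2$ have orthonormal columns orthogonal to those of $P_1$ and $Q_1$, respectively. $P_2$ and $Q_2$ are not unique, so $Z$ is not unique, but $n^{-1}X'Z$ is unique, as shown next.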
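3.4. Update of $n^{-1}X'Z$ (2)
The two equations
$$n^{-1/2}X = n^{-1/2}XBB^{+} = P_1D_1Q_1'B^{+}, \qquad Z = n^{1/2}P_1Q_1' + n^{1/2}P_2Q_2'$$
imply that the matrix giving $\Psi$ and $A$ is rewritten as
$$n^{-1}X'Z = (P_1D_1Q_1'B^{+})'(P_1Q_1' + P_2Q_2') = B^{+\prime}Q_1D_1Q_1',$$
which can be obtained from the EVD $B'SB = Q_1D_1^2Q_1'$, derived from the SVD, where $S = n^{-1}X'X$ is the sample covariance matrix.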
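The identity $n^{-1}X'Z = B^{+\prime}Q_1D_1Q_1'$ can be checked numerically. The sketch below computes the right-hand side from $S$ and $B$ alone and compares it with the value obtained by building one valid $Z$ explicitly from the SVD. The function name `xz_from_cov` and the random test data are assumptions made for the illustration.

```python
import numpy as np

def xz_from_cov(S, B):
    """Compute n^{-1} X'Z = B^{+}' Q1 D1 Q1' from the EVD of B'SB (no X or Z needed)."""
    p = S.shape[0]
    evals, evecs = np.linalg.eigh(B.T @ S @ B)      # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:p]               # indices of the p largest
    Q1 = evecs[:, top]
    d1 = np.sqrt(np.clip(evals[top], 0.0, None))    # diagonal of D1
    return np.linalg.pinv(B).T @ (Q1 * d1) @ Q1.T   # p x (m+p)

rng = np.random.default_rng(1)
n, p, m = 50, 4, 2
X = rng.standard_normal((n, p))
B = np.hstack([rng.standard_normal((p, m)),          # Lambda part
               np.diag(rng.uniform(0.3, 1.0, p))])   # diagonal Psi part
S = X.T @ X / n

# One explicit solution Z = n^{1/2} (P1 Q1' + P2 Q2') from the full SVD
P, d, Qt = np.linalg.svd(X @ B / np.sqrt(n), full_matrices=True)
Z = np.sqrt(n) * P[:, :m + p] @ Qt

direct = X.T @ Z / n            # uses X and Z
via_cov = xz_from_cov(S, B)     # uses only S and B
```

Here `direct` and `via_cov` agree up to rounding error, even though `Z` itself depends on the arbitrary choice of $P_2$ and $Q_2$.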
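3.5. Whole Algorithm
The loss $\|X - (F\Lambda' + U\Psi)\|^2 = \|X - (FA' + U\Psi)\|^2 + n\|\Lambda - A\|^2$ monotonically decreases with the following algorithm:
1 Initialize $B = [\Lambda, \Psi]$ randomly
2 Perform the EVD $B'SB = Q_1D_1^2Q_1'$
3 Obtain $n^{-1}X'Z = B^{+\prime}Q_1D_1Q_1'$
4 Update $\Psi$
5 Obtain $A$ to update $\Lambda$
6 Finish, or go back to 2 with the updated $B = [\Lambda, \Psi]$
Here, we find that SOFA only needs $S = n^{-1}X'X$.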
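The six steps can be put together as in the following sketch. It is one possible reading of the slides, not the authors' code: the function name `sofa`, the random initialization ranges, the stopping tolerance, and the toy correlation matrix are all assumptions. The loss is tracked as $f/n = \mathrm{tr}(S) - 2\,\mathrm{tr}(B'C) + \mathrm{tr}(B'B)$ with $C = n^{-1}X'Z$, which follows from expanding $\|X - ZB'\|^2$ under $Z'Z = nI_{m+p}$.

```python
import numpy as np

def sofa(S, m, q, n_iter=1000, tol=1e-10, seed=0):
    """Sketch of the SOFA iteration; returns (Lam, Psi, losses) with losses = f/n per iteration."""
    rng = np.random.default_rng(seed)
    p = S.shape[0]
    # Step 1: random start (ranges are an arbitrary choice, not from the slides)
    Lam = rng.uniform(-1.0, 1.0, (p, m))
    Psi = np.diag(rng.uniform(0.3, 1.0, p))
    losses = []
    for _ in range(n_iter):
        B = np.hstack([Lam, Psi])                       # p x (m+p)
        # Step 2: EVD of B'SB, keeping the p largest eigenvalues
        evals, evecs = np.linalg.eigh(B.T @ S @ B)
        top = np.argsort(evals)[::-1][:p]
        Q1 = evecs[:, top]
        d1 = np.sqrt(np.clip(evals[top], 0.0, None))
        # Step 3: C = n^{-1} X'Z = B^{+}' Q1 D1 Q1'
        C = np.linalg.pinv(B).T @ (Q1 * d1) @ Q1.T
        # Step 4: Psi = diag(n^{-1} X'U), the diagonal of the last p columns of C
        Psi = np.diag(np.diag(C[:, m:]))
        # Step 5: A = n^{-1} X'F; Lambda is A with its q smallest-magnitude entries zeroed
        A = C[:, :m]
        Lam = A.copy()
        if q > 0:
            Lam.flat[np.argsort(np.abs(A), axis=None)[:q]] = 0.0
        # Loss divided by n, evaluated with the updated B and the current C
        B = np.hstack([Lam, Psi])
        losses.append(np.trace(S) - 2.0 * np.sum(B * C) + np.sum(B * B))
        # Step 6: finish when the decrease is negligible, otherwise iterate
        if len(losses) > 1 and losses[-2] - losses[-1] < tol:
            break
    return Lam, Psi, losses

# Toy correlation matrix implied by a made-up simple-structure loading matrix
Lam0 = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                 [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
S = Lam0 @ Lam0.T + np.diag(1.0 - np.sum(Lam0**2, axis=1))
Lam, Psi, losses = sofa(S, m=2, q=6)
```

A single run like this may stop at a local minimum, and the returned loadings are determined only up to column permutation and sign.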
3.6. Multiple Runs
SOFA is sensitive to local minima. So, we use the following multiple-runs procedure:
1 We run the algorithm 50 times with different starts and look for two equivalent solutions with the lowest loss function value.
2 If such solutions are found, we finish by selecting them as the optimal ones; otherwise, go to 3.
3 We further run the algorithm with different starts until two equivalent solutions with the lowest loss function value are found.
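The stopping rule of the multiple-runs procedure can be sketched generically. The slides do not define when two solutions count as equivalent, so this sketch assumes equivalence means equal loss values within a tolerance `tol`; the cap `max_runs`, the function names, and the one-dimensional toy problem standing in for a SOFA run are also hypothetical.

```python
import numpy as np

def multistart(run, n_initial=50, max_runs=500, tol=1e-6):
    """Call run(seed) -> (loss, solution) until the two lowest losses agree within tol."""
    results = []
    for seed in range(max_runs):
        results.append(run(seed))
        if len(results) >= n_initial:
            results.sort(key=lambda r: r[0])
            # Assumed equivalence criterion: the two best losses coincide
            if results[1][0] - results[0][0] <= tol:
                return results[0], len(results)
    raise RuntimeError("no two equivalent best solutions found")

def toy_run(seed):
    """Gradient descent on g(x) = x^4 - 3x^2 + x, which has two local minima."""
    x = np.random.default_rng(seed).uniform(-3.0, 3.0)
    for _ in range(5000):
        x -= 0.01 * (4 * x**3 - 6 * x + 1)
    return x**4 - 3 * x**2 + x, x

(best_loss, best_x), n_runs = multistart(toy_run)
```

In an actual application, `run` would wrap one randomly started run of the SOFA algorithm, and equivalence could also be judged by comparing the loading matrices.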
4. Sparseness Selection We present our sparseness selection procedure with just one slide. 4.1. Selection using BIC
4.1. Selection using BIC
SOFA: [A] $\min_{\Lambda,\Psi} f(\Lambda,\Psi)$ s.t. $SP(\Lambda) = q$ for a given integer $q$; [B] Perform [A] over $q = q_{\min}, \ldots, q_{\max}$ to select the best $q$
In the last section, we described [A]. For [B], we use BIC, which, up to a constant not depending on $q$, is expressed as
$$BIC(q) = -2 \times \text{log-likelihood} - q \log n.$$
That is, [B] is formulated as
$$\text{Best } q = \arg\min_{q_{\min} \le q \le q_{\max}} BIC(q).$$
We empirically found that SOFA solutions were almost equivalent to ML ones, which validates using the ML-based BIC for the LS-based SOFA solutions.
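A possible way to evaluate BIC for a given solution is sketched below. The slides do not spell out the likelihood, so the sketch assumes the usual multivariate normal log-likelihood with model covariance $\Lambda\Lambda' + \Psi^2$ and counts $pm - q + p$ free parameters, which differs from $-q\log n$ only by a constant in $q$. The function name `fa_bic` and the toy matrices are hypothetical.

```python
import numpy as np

def fa_bic(S, Lam, Psi, n):
    """BIC of an orthogonal FA solution under an assumed normal likelihood."""
    p, m = Lam.shape
    Sigma = Lam @ Lam.T + Psi @ Psi                   # model covariance
    _, logdet = np.linalg.slogdet(Sigma)
    loglik = -0.5 * n * (p * np.log(2.0 * np.pi) + logdet
                         + np.trace(np.linalg.solve(Sigma, S)))
    q = int(np.sum(Lam == 0.0))                       # sparseness SP(Lambda)
    n_par = p * m - q + p                             # nonzero loadings + unique variances
    return -2.0 * loglik + n_par * np.log(n)

# Toy comparison: the same fit with and without exact zeros
n = 200
Lam_sparse = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                       [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
Psi = np.diag(np.sqrt(1.0 - np.sum(Lam_sparse**2, axis=1)))
S = Lam_sparse @ Lam_sparse.T + Psi @ Psi
Lam_dense = np.where(Lam_sparse == 0.0, 1e-12, Lam_sparse)   # numerically the same fit, q = 0

bics = {6: fa_bic(S, Lam_sparse, Psi, n), 0: fa_bic(S, Lam_dense, Psi, n)}
best_q = min(bics, key=bics.get)
```

With an essentially identical fit, the solution with six exact zeros has the lower BIC, by $6 \log n$.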
5. Simulation Studies We briefly report a simulation study whose purpose is to assess how well the true sparseness and parameters are recovered by SOFA. 5.1. True Parameters 5.2. Results
5.1. True Parameters
We synthesized 40 true $\Lambda$, each having one of the five structures (figure: loading patterns, with panels including "Simple Structure" and "Bi-factor Structure"). A cell marked "?" was randomly set to zero or a nonzero value. The resulting $\Lambda$, $\Psi$ gave 200 (= 40 × 5) correlation matrices to be analyzed by SOFA.
5.2. Recovery
The resulting medians and worst 5 percentiles of index values among the 200 solutions are shown here (table: medians and worst 5% values for indices 1 to 5, including the rate of correctly identified zeros and the rate of correctly identified non-zeros).
1: True sparseness was selected well by BIC.
2, 3: True structures were recovered well.
4, 5: True parameter values were recovered well.
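The two rate indices named on this slide can be computed as in the following sketch. The exact definitions used in the study are not given here, so the sketch assumes the natural ones: the share of true zeros estimated as zero, and the share of true non-zeros estimated as non-zero. It also assumes the estimated columns are already matched to the true ones in order and sign. The function name and example matrices are hypothetical.

```python
import numpy as np

def recovery_rates(Lam_true, Lam_hat):
    """Return (rate of correctly identified zeros, rate of correctly identified non-zeros)."""
    true_zero = (Lam_true == 0.0)
    est_zero = (Lam_hat == 0.0)
    rate_zero = np.mean(est_zero[true_zero])         # true zeros estimated as zero
    rate_nonzero = np.mean(~est_zero[~true_zero])    # true non-zeros estimated as non-zero
    return rate_zero, rate_nonzero

Lam_true = np.array([[0.8, 0.0], [0.0, 0.7], [0.6, 0.0], [0.5, 0.5]])
Lam_hat = np.array([[0.79, 0.0], [0.10, 0.72], [0.61, 0.0], [0.48, 0.0]])
rate_zero, rate_nonzero = recovery_rates(Lam_true, Lam_hat)   # 2/3 and 4/5
```

Medians and worst 5 percentiles over many solutions could then be taken with `np.median` and `np.percentile`.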
6. Examples We illustrate SOFA with the two famous data sets which have often been used for testing FA procedures. 6.1. Box Problem Data 6.2. Twenty-four Psy Test Data
6.1. Box Problem Data The first example is the 3-factor solution for the 400 × 20 box data matrix generated following Thurstone (1940). BIC was the lowest for q = 27, and the corresponding solution is shown on the right, where we find the exact simple structure.
6.2. Twenty-four Psy Test Data The second example is the 4-factor solution for the 24 psychological test data. BIC was the lowest for q = 35, and the corresponding solution is shown on the right. The loadings showed a bi-factor structure matching the ones found in previous studies using EFA and CFA.
7. Discussions After summarizing SOFA, we discuss its advantages over the existing CFA and EFA. 7.1. Summary 7.2. SOFA vs CFA 7.3. SOFA vs EFA (Rotation)
7.1. Summary
We propose SOFA, formulated as
[A] $\min_{\Lambda,\Psi} f(\Lambda,\Psi)$ s.t. $SP(\Lambda) = q$ for a given integer $q$
[B] Perform [A] over $q = q_{\min}, \ldots, q_{\max}$ to select the best $q$
For [A], we developed the ALS algorithm for minimizing $\|X - (F\Lambda' + U\Psi)\|^2$ s.t. $SP(\Lambda) = q$ and $n^{-1}[F,U]'[F,U] = I_{m+p}$, which can be carried out even if only the sample covariances are available.
For [B], we propose to select the sparseness $q$ using BIC.
Numerical studies demonstrated that SOFA successfully selects $q$, obtains the sparse structure in $\Lambda$, and estimates $\Lambda$ and $\Psi$.
7.2. SOFA vs CFA SOFA overcomes the problem of CFA that the locations of zero loadings must be specified by users: SOFA computationally finds the optimal CFA model. However, SOFA solutions are restricted to orthogonal ones, so an oblique version of SOFA remains to be considered in future studies.
7.3. SOFA vs EFA (Rotation)
As compared to SOFA, two drawbacks are found in EFA, in which a loading matrix $\Lambda_0$ is rotated so that the resulting $\Lambda_0T$ has a quasi-sparse structure. The term "quasi-sparse" implies that $\Lambda_0T$ cannot include exact zero loadings.
[1] The users must resort to viewing some loadings as approximately zero, which is subjective and tandem.
[2] Rotation does not involve the original data, i.e., a function of only $\Lambda_0T$ is optimized, in contrast to SOFA, in which the FA model with the sparseness constraint is optimally fitted to the data to find the sparse structure underlying the data.