Factor Analysis, Part 1 BMTRY 726 4/1/14
Uses
Goal: Similar to PCA… describe the covariance of a large set of measured traits using a few linear combinations of underlying latent traits
Why: again, for reasons similar to PCA
(1) Dimension reduction (use k of p components)
(2) Remove redundancy/duplication from a set of correlated variables
(3) Represent correlated variables with a smaller set of “derived” variables
(4) Create “new” factor variables that are independent
For Example
Say we want to define “frailty” in a population of cancer patients
We have a concept of what “frailty” is but no direct way to measure it
We believe an individual’s frailty has to do with their weight, strength, speed, agility, balance, etc.
We therefore want to be able to define frailty as some composite measure of all of these factors…
Key Concepts
F_i is a latent underlying variable (i = 1, 2, …, m)
The X’s are observed variables related to what we think the F_i might be
ε_j is the measurement error for X_j, j = 1, 2, …, p
λ_ji are the factor “loadings” for X_j
Orthogonal Factor Model
Consider data with p observed variables:
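The model equation itself did not survive extraction; what follows is the standard orthogonal factor model written in the notation defined above.

```latex
X_j - \mu_j = \lambda_{j1}F_1 + \lambda_{j2}F_2 + \cdots + \lambda_{jm}F_m + \varepsilon_j,
\quad j = 1, 2, \dots, p
\qquad\Longleftrightarrow\qquad
\mathbf{X} - \boldsymbol{\mu} = \Lambda\,\mathbf{F} + \boldsymbol{\varepsilon}
```

Here Λ is the p × m matrix of loadings λ_ji, F is the m × 1 vector of common factors, and ε is the p × 1 vector of specific errors.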
Model Assumptions
We must make some serious assumptions…
Note, these are very strong assumptions, which implies only narrow application
These models also work best when p >> m
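The assumption list displayed on the slide is not recoverable; the standard assumptions of the orthogonal factor model are as follows.

```latex
E(\mathbf{F}) = \mathbf{0}, \qquad \operatorname{Cov}(\mathbf{F}) = I_m, \qquad
E(\boldsymbol{\varepsilon}) = \mathbf{0}, \qquad
\operatorname{Cov}(\boldsymbol{\varepsilon}) = \Psi = \operatorname{diag}(\psi_1, \dots, \psi_p), \qquad
\operatorname{Cov}(\mathbf{F}, \boldsymbol{\varepsilon}) = \mathbf{0}
```

In words: the factors are uncorrelated with unit variance, the specific errors are uncorrelated with one another, and factors and errors are uncorrelated with each other.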
Model Assumptions
Our assumptions can be related back to the variability of our original X’s
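Under those assumptions, the covariance structure of X follows directly (a reconstruction of the standard result the slide displays):

```latex
\operatorname{Cov}(\mathbf{X}) = \Sigma = \Lambda\Lambda' + \Psi,
\qquad
\operatorname{Cov}(\mathbf{X}, \mathbf{F}) = \Lambda
```

Elementwise, Var(X_j) = λ_j1² + … + λ_jm² + ψ_j and Cov(X_j, X_k) = λ_j1 λ_k1 + … + λ_jm λ_km.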
Model Terms
Decomposition of the variance of X_j:
The proportion of the variance of the jth measurement X_j contributed by the m factors F_1, F_2, …, F_m is called the jth communality
The remaining proportion of the variance of the jth measurement, associated with ε_j, is called the uniqueness or specific variance
Note, we are assuming that the variances and covariances of X can be reconstructed from our pm factor loadings λ_ji and the p specific variances ψ_j
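Written out, the decomposition these terms describe is:

```latex
\sigma_{jj}
= \underbrace{\lambda_{j1}^2 + \lambda_{j2}^2 + \cdots + \lambda_{jm}^2}_{\text{communality } h_j^2}
\;+\; \underbrace{\psi_j}_{\text{specific variance}}
```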
Potential Pitfall
The problem is, most covariance matrices cannot be factored in the manner we have defined for a factor model: Σ = ΛΛ' + Ψ
For example… consider data with 3 variables which we are trying to describe with 1 factor
Potential Pitfall
We can write our factor model as follows: X_j − μ_j = λ_j1 F_1 + ε_j, j = 1, 2, 3
Using our factor model representation, Σ = ΛΛ' + Ψ, we can define the following six equations
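With p = 3 and m = 1, matching the entries of Σ with those of ΛΛ' + Ψ gives the six equations:

```latex
\begin{aligned}
\sigma_{11} &= \lambda_{11}^2 + \psi_1, &\qquad \sigma_{12} &= \lambda_{11}\lambda_{21},\\
\sigma_{22} &= \lambda_{21}^2 + \psi_2, &\qquad \sigma_{13} &= \lambda_{11}\lambda_{31},\\
\sigma_{33} &= \lambda_{31}^2 + \psi_3, &\qquad \sigma_{23} &= \lambda_{21}\lambda_{31}
\end{aligned}
```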
Potential Pitfall
Use these equations to find the factor loadings and specific variances: the covariance equations give λ_11² = σ_12 σ_13 / σ_23 (and similarly for λ_21² and λ_31²), and then each ψ_j = σ_jj − λ_j1²
Potential Pitfall
However, this can produce solutions that are not valid as variances or correlations:
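A worked instance with hypothetical correlations (the slide’s own numbers are not recoverable; these are chosen to make the failure visible). For standardized variables, take ρ_12 = .90, ρ_13 = .70, ρ_23 = .40; then

```latex
\lambda_{11}^2 = \frac{\rho_{12}\,\rho_{13}}{\rho_{23}} = \frac{(.90)(.70)}{.40} = 1.575
\;\Rightarrow\; \lvert\lambda_{11}\rvert = 1.255 > 1,
\qquad
\psi_1 = 1 - \lambda_{11}^2 = -0.575 < 0
```

A loading with |λ_11| > 1 would be a correlation between X_1 and F_1 exceeding 1, and ψ_1 < 0 is a negative variance; both are impossible, so no valid 1-factor solution exists here.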
Limitations
• Linearity
-Assuming a linear relationship between the factors and our X’s
-Factors are unobserved, so we cannot verify this assumption
-If the relationship is non-linear, the linear combinations may provide a good approximation for only a small range of values
• The elements of Σ are described by the mp factor loadings in Λ and the p specific variances {ψ_j}
-Model most useful for small m, but often mp + p parameters are not sufficient and Σ is not close to ΛΛ' + Ψ
Limitations
• Even when m < p, we can find Λ such that Σ = ΛΛ' + Ψ… but Λ is not unique
-Suppose T is an orthogonal matrix (T' = T⁻¹ and TT' = I)
-We can use any orthogonal matrix T and get the same representation (see below)
-Thus we have infinitely many possible loadings
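Concretely, for any orthogonal T:

```latex
\mathbf{X} - \boldsymbol{\mu}
= \Lambda\mathbf{F} + \boldsymbol{\varepsilon}
= (\Lambda T)(T'\mathbf{F}) + \boldsymbol{\varepsilon}
= \Lambda^{*}\mathbf{F}^{*} + \boldsymbol{\varepsilon},
\qquad
\Lambda^{*}\Lambda^{*\prime} + \Psi = \Lambda T T' \Lambda' + \Psi = \Lambda\Lambda' + \Psi
```

Since Cov(F*) = T' Cov(F) T = I, the rotated pair (Λ*, F*) satisfies all the model assumptions and reproduces Σ exactly, so the loadings are only determined up to an orthogonal rotation.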
Methods of Estimation
We need to estimate the factor loadings Λ and the specific variances Ψ
We have a random sample of n subjects from a population
We measure p attributes for each of the n subjects
Methods of Estimation
We could also standardize our variables (i.e., work with the sample correlation matrix R rather than S)
Methods of estimation:
1. Principal Components method
2. Principal Factor method
3. Maximum Likelihood method
Principal Component Method
Given Σ (or S if we have a sample from the population)
Consider the spectral decomposition: Σ = λ_1 e_1 e_1' + λ_2 e_2 e_2' + … + λ_p e_p e_p', with eigenvalues λ_1 ≥ λ_2 ≥ … ≥ λ_p ≥ 0
Principal Component Method
The problem here is that this gives m = p factors, so we want to drop some factors
We drop the λ’s that are small (i.e. stop at λ_m)
Principal Component Method
Estimate Λ and Ψ by substituting the estimated eigenvalues/eigenvectors of S or R into the decomposition (reconstructed below)
To make the diagonal elements of the fitted ΛΛ' + Ψ equal the diagonal elements of S, we let ψ_j be the part of s_jj not explained by the m factors
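A reconstruction of the principal component estimates: with eigenvalue/eigenvector pairs (λ̂_1, ê_1), …, (λ̂_p, ê_p) of S (or R), ordered λ̂_1 ≥ … ≥ λ̂_p,

```latex
\tilde{\Lambda} = \Big[\, \sqrt{\hat\lambda_1}\,\hat{e}_1 \;\Big|\; \sqrt{\hat\lambda_2}\,\hat{e}_2 \;\Big|\; \cdots \;\Big|\; \sqrt{\hat\lambda_m}\,\hat{e}_m \,\Big],
\qquad
\tilde\psi_j = s_{jj} - \sum_{i=1}^{m} \tilde\lambda_{ji}^{\,2}
```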
Principal Component Method
The optimality of using the fitted ΛΛ' + Ψ to approximate S is due to the bound below
Note, the sum of squared elements of the residual matrix is an approximation of the sum of squared error
We can also estimate the proportion of the total sample variance due to the jth factor
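The standard bound behind the optimality claim, and the variance-proportion estimate:

```latex
\sum_{j=1}^{p}\sum_{k=1}^{p}\Big(S - (\tilde\Lambda\tilde\Lambda' + \tilde\Psi)\Big)_{jk}^{2}
\;\le\; \hat\lambda_{m+1}^{2} + \cdots + \hat\lambda_{p}^{2},
\qquad
\widehat{\text{prop. due to factor } j} =
\begin{cases}
\hat\lambda_j \big/ (s_{11} + \cdots + s_{pp}) & \text{factoring } S\\[6pt]
\hat\lambda_j \big/ p & \text{factoring } R
\end{cases}
```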
Example
Stock price data consist of n = 100 weekly rates of return on p = 5 variables
Data were standardized and factor analysis performed on the sample correlation matrix R
Example
Given the eigenvalues/eigenvectors of R, find the first two factors
Example
Given a 2-factor solution, we can find the communalities and specific variances based on our loadings and R.
Example
What is the cumulative proportion of variance accounted for by factor 1? What about both factors?
Example
How well does our model check out (i.e., how well does the fitted model reproduce R)?
Example
Two-factor (m = 2) solution: data standardized and factor analysis performed on the sample correlation matrix R
How might we interpret these factors?
Principal Component Method
Estimated loadings on factors do not change as the number of factors increases
Diagonal elements of S (or R) exactly equal the diagonal elements of the fitted ΛΛ' + Ψ, but the sample covariances may not be exactly reproduced
Select the number of factors m to make the off-diagonal elements of the residual matrix small
Contribution of the kth factor to the total variance is λ̂_k (a computational sketch of the whole method follows below)
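A minimal computational sketch of the principal component method as described above, assuming a sample correlation matrix R and a chosen number of factors m; the function name pc_factor is illustrative, not from the slides.

```python
import numpy as np

def pc_factor(R, m):
    """Principal component factor solution: loadings, specific variances,
    variance proportions, and the residual matrix for m factors."""
    vals, vecs = np.linalg.eigh(R)                # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]                # re-sort descending
    vals, vecs = vals[order], vecs[:, order]
    vals_m = np.clip(vals[:m], 0.0, None)         # guard tiny negative round-off
    L = vecs[:, :m] * np.sqrt(vals_m)             # loadings: sqrt(lambda_i) * e_i
    psi = np.diag(R) - np.sum(L**2, axis=1)       # specific variances
    residual = R - (L @ L.T + np.diag(psi))       # off-diagonal lack of fit
    prop = vals_m / R.shape[0]                    # variance proportions (R case)
    return L, psi, prop, residual
```

By construction the diagonal of the residual matrix is exactly zero, matching the diagonal-reproduction property noted above; only the off-diagonal entries show lack of fit.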
Principal Factor Method
Consider the model R = ΛΛ' + Ψ, i.e. R − Ψ = ΛΛ': replacing the diagonal of R with communality estimates gives the “reduced” correlation matrix R_r
Suppose initial estimates are available for the communalities or specific variances
Principal Factor Method
Apply the procedure iteratively (a sketch follows below):
1. Start with initial communality estimates on the diagonal of R_r
2. Compute factor loadings from the eigenvalues/eigenvectors of R_r
3. Compute new communality values from those loadings
4. Repeat steps 2 and 3 until the algorithm converges
Problems:
- some eigenvalues of R_r can be negative
- choice of m (if m is too large, some communalities > 1 and the iteration terminates)
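A minimal sketch of this iteration, assuming squared multiple correlations (SMCs) as the starting communalities (a common default; the slides’ own starting values are not recoverable). Names are illustrative.

```python
import numpy as np

def principal_factor(R, m, tol=1e-8, max_iter=1000):
    """Iterative principal factor method on a correlation matrix R."""
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))    # step 1: SMC initial communalities
    for _ in range(max_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)              # reduced correlation matrix R_r
        vals, vecs = np.linalg.eigh(Rr)
        order = np.argsort(vals)[::-1]
        vals, vecs = vals[order], vecs[:, order]
        vals_m = np.clip(vals[:m], 0.0, None) # R_r may have negative eigenvalues
        L = vecs[:, :m] * np.sqrt(vals_m)     # step 2: loadings from R_r
        h2_new = np.sum(L**2, axis=1)         # step 3: updated communalities
        if np.max(np.abs(h2_new - h2)) < tol: # step 4: iterate to convergence
            break
        h2 = h2_new
    return L, 1 - h2_new                      # loadings, specific variances
```

If m is too large, some updated communalities can exceed 1 (a Heywood case), yielding negative specific variances; this is the termination problem noted above.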
Example Principal factor method m = 2 factor solution:
Maximum Likelihood Method
The likelihood function is needed, and additional assumptions are made (the F’s and ε’s are jointly normal, so X ~ N_p(μ, ΛΛ' + Ψ))
An additional restriction specifies a unique solution (see below)
The MLE’s are the values of Λ and Ψ that maximize the likelihood subject to this restriction, with μ̂ = x̄
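The standard formulation behind these bullets (the slide’s own equations are not recoverable):

```latex
L(\boldsymbol\mu, \Lambda, \Psi)
= \prod_{j=1}^{n}
\frac{1}{(2\pi)^{p/2}\,\lvert \Lambda\Lambda' + \Psi \rvert^{1/2}}
\exp\!\Big(-\tfrac{1}{2}(\mathbf{x}_j - \boldsymbol\mu)'(\Lambda\Lambda' + \Psi)^{-1}(\mathbf{x}_j - \boldsymbol\mu)\Big),
\qquad
\Lambda'\Psi^{-1}\Lambda = \Delta \ \text{(diagonal)}
```

The diagonality restriction pins down one member of each family of rotation-equivalent loadings; the maximization over (Λ, Ψ) is carried out numerically.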
Maximum Likelihood Method
For m factors:
- estimated communalities: ĥ_j² = λ̂_j1² + … + λ̂_jm²
- proportion of the total sample variance due to the kth factor: (λ̂_1k² + … + λ̂_pk²)/p for standardized data
Example
Two-factor (m = 2) solution: data standardized and factor analysis performed on the sample correlation matrix R
Large Sample Test for Number of Factors
We want to be able to decide if the number of common factors m we’ve chosen is sufficient
So if n is large, we can do hypothesis testing: H_0: Σ = ΛΛ' + Ψ (m common factors suffice) vs. H_1: Σ is any other positive definite matrix
We can consider our estimates in our hypothesis statement…
Large Sample Test for Number of Factors
From this we develop a likelihood ratio test (reconstructed below):
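A reconstruction of the standard Bartlett-corrected likelihood ratio statistic for this test (standard large-sample form; the slide’s image is not recoverable):

```latex
\left(n - 1 - \frac{2p + 4m + 5}{6}\right)
\ln\frac{\big\lvert \hat\Lambda\hat\Lambda' + \hat\Psi \big\rvert}{\lvert S_n \rvert}
\;\sim\; \chi^{2}_{\,[(p-m)^2 - p - m]/2} \quad\text{(approximately, for large } n\text{)}
```

Here S_n is the maximum likelihood estimate of Σ; we reject H_0 for large values, and the test is only defined when the degrees of freedom are positive.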
Test Results
What does it mean if we reject the null hypothesis?
-We do not have an adequate number of factors
Problems with the test:
-If n is large and m is small compared to p, this test will very often reject the null
-The result is we tend to keep more factors
-This can defeat the purpose of factor analysis
-Exercise caution when using this test