
What is Factor Analysis?

It explains the inter-relationships among a large number of variables in terms of their common underlying factors. It is a data reduction technique. Use it when you want to obtain the underlying structure of the inter-relationships.

manelin

Presentation Transcript


  1. What is Factor Analysis? • It explains the inter-relationships among a large number of variables in terms of their common underlying factors. • It is a data reduction technique. • Use it when you want to obtain the underlying structure of the inter-relationships.

  2. Seven-Stage Procedure of Factor Analysis • Stage 1: Define the objectives: • Identify structural relationships among variables. • Obtain “representative” variables. • Create a “new” set of variables.

  3. Stage 2 • Designing the factor analysis • Build the input correlation matrix in one of two ways: • From the correlations among variables in the original data matrix [called R-type factor analysis]. • From the correlations among “sample units” [called Q-type factor analysis]. • The ratio [# sampled units / # variables] should be greater than 5 for factor analysis.
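A minimal sketch of the two input matrices and the ratio check from Stage 2, assuming NumPy and an n x p data matrix with sample units in rows and variables in columns. The function names and the random data are illustrative assumptions, not part of the slides.

```python
import numpy as np

def r_type_correlation(data):
    """Correlations among variables (columns): input for R-type factor analysis."""
    return np.corrcoef(np.asarray(data, dtype=float), rowvar=False)

def q_type_correlation(data):
    """Correlations among sample units (rows): input for Q-type factor analysis."""
    return np.corrcoef(np.asarray(data, dtype=float), rowvar=True)

def units_per_variable(data):
    """Ratio [# sampled units / # variables]; the slides ask for more than 5."""
    n_units, n_variables = np.asarray(data).shape
    return n_units / n_variables

# Hypothetical data: 60 sample units measured on 4 variables.
rng = np.random.default_rng(0)
data = rng.normal(size=(60, 4))

R = r_type_correlation(data)   # 4 x 4
Q = q_type_correlation(data)   # 60 x 60
ratio = units_per_variable(data)
print(R.shape, Q.shape, ratio, ratio > 5)
```

The only difference between the two designs is which axis of the data matrix is correlated.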

  4. Stage 3 • Check the validity of the assumptions: • Variables should be related to one another so that meaningful “common factors” exist. • Sample units must be homogeneous. • Data must be metric. Dummy variables are allowed. • “Multivariate normality” is required only for hypothesis testing. • A substantial number of the correlations among variables should be at least 0.3 in absolute value.
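The correlation check in Stage 3 can be sketched as the share of off-diagonal correlations that reach 0.3 in absolute value. The helper name and the example matrix below are assumptions made for illustration; the slides do not say how large this share must be.

```python
import numpy as np

def share_of_substantial_correlations(R, threshold=0.3):
    """Fraction of distinct variable pairs with |correlation| >= threshold."""
    R = np.asarray(R, dtype=float)
    upper = np.triu_indices_from(R, k=1)   # each pair counted once
    return float(np.mean(np.abs(R[upper]) >= threshold))

# Hypothetical 3-variable correlation matrix.
R_example = np.array([
    [1.0,  0.6,  0.1],
    [0.6,  1.0, -0.4],
    [0.1, -0.4,  1.0],
])
share = share_of_substantial_correlations(R_example)
print(share)   # two of the three pairs reach 0.3 in absolute value
```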

  5. Stage 4 • Select one of two extraction methods: • Principal component factor analysis [it uses the total variance and extracts components that explain as much of it as possible]. • Common factor analysis [it uses only the common variance and places communality estimates on the diagonal of the correlation matrix]. • Determine how many factors to consider using: • Latent root criterion, by specifying a threshold value for the eigenvalues (commonly eigenvalues greater than 1). • Percentage of variation to be explained. • Scree test criterion [it plots the eigenvalues against the number of factors]. The point where the plot becomes flat indicates the appropriate number of factors.
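A small sketch of the first two counting rules, assuming NumPy and a correlation matrix as input. The function names, the threshold of 1, the 80% target and the example matrix are illustrative assumptions. The sorted eigenvalues are also what a scree plot would display.

```python
import numpy as np

def eigenvalues_desc(R):
    """Eigenvalues of a symmetric correlation matrix, largest first."""
    return np.linalg.eigvalsh(np.asarray(R, dtype=float))[::-1]

def n_factors_latent_root(eigvals, threshold=1.0):
    """Latent root criterion: count eigenvalues above the threshold."""
    return int(np.sum(eigvals > threshold))

def n_factors_for_variance(eigvals, target=0.8):
    """Smallest number of factors whose cumulative share reaches the target."""
    cumulative = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(cumulative, target) + 1)

# Hypothetical correlation matrix with two blocks of related variables.
R_blocks = np.array([
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
])
eigvals = eigenvalues_desc(R_blocks)            # 1.8, 1.6, 0.4, 0.2
k_latent = n_factors_latent_root(eigvals)       # 2
k_variance = n_factors_for_variance(eigvals)    # 2 (cumulative share 0.85)
print(eigvals, k_latent, k_variance)
```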

  6. Stage 5 • “The rotation” redistributes the variance among the factors to obtain a simpler, more interpretable structure in the factor matrix. • There are two kinds of rotation methods: • Orthogonal rotations: quartimax [simplifies the rows of the factor matrix, so that each variable loads highly on as few factors as possible], varimax [simplifies the columns, so that each factor has a few high loadings and many near-zero loadings], equimax [a compromise between varimax and quartimax]. • Oblique rotations: • SAS offers “promax” • SPSS offers “Oblimin” • Factor loadings should be at least 0.30 in absolute value to be considered meaningful.
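As an illustration of an orthogonal rotation, here is a sketch of varimax using the commonly used SVD-based iteration. It is not taken from the slides and is not the SAS or SPSS implementation; the function name, the stopping rule and the example loadings are assumptions.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a p x k loading matrix with an SVD-based varimax iteration.

    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Gradient-like matrix of the varimax criterion at the current rotation.
        target = rotated**3 - rotated * (np.sum(rotated**2, axis=0) / p)
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt                 # nearest orthogonal matrix
        current = np.sum(s)
        if current < previous * (1 + tol):
            break                         # no meaningful improvement
        previous = current
    return L @ rotation, rotation

# Hypothetical unrotated loadings for 4 variables on 2 factors.
unrotated = np.array([
    [0.7,  0.5],
    [0.8,  0.4],
    [0.6, -0.5],
    [0.7, -0.6],
])
rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged; only the split of that variance across factors moves.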

  7. Stage 6 • Validating the results • Use confirmatory factor analysis to check whether the results replicate. • The factor structure should be stable. • Identify the impact of outliers by running the analysis with and without them.

  8. Stage 7 • Interpret the results • Use the knowledge in future studies • Develop “surrogate” variables as a simple summary of several correlated variables.

  9. Factor analysis details • Principal components [multivariate method]: an extension of regression ideas to determine the relationship among variables. • Factor analysis [multivariate method]: use when some variables are observed and others are latent.

  10. Some Facts of Principal Components • We can rotate the x and y axes so that the highest variability in the data occurs along the first principal axis and the next highest variability occurs along the second principal axis, which is orthogonal to the first. • The new data values x* and y* are linear combinations of the old x and y values. • Find the standard deviations of x* and y* for a selected angle of rotation. • The graph of this standard deviation against the angle helps to identify the optimal angle.
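The rotation idea can be tried numerically. The sketch below, assuming NumPy and simulated two-variable data, scans rotation angles, records the standard deviation of x*, and compares the best angle with the largest eigenvalue of the covariance matrix. All names and the data are illustrative.

```python
import numpy as np

def rotated_sd(x, y, angle):
    """Standard deviations of x* and y* after rotating the axes by `angle`."""
    x_star = x * np.cos(angle) + y * np.sin(angle)
    y_star = -x * np.sin(angle) + y * np.cos(angle)
    return x_star.std(ddof=1), y_star.std(ddof=1)

# Hypothetical correlated data.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.8 * x + 0.3 * rng.normal(size=500)

# Standard deviation curve of x* over a grid of angles in [0, pi].
angles = np.linspace(0.0, np.pi, 1801)
sd_x_star = np.array([rotated_sd(x, y, a)[0] for a in angles])
best_angle = angles[np.argmax(sd_x_star)]

# The maximal variance should match the largest eigenvalue of the covariance.
largest_eigenvalue = np.linalg.eigvalsh(np.cov(x, y))[-1]
print(np.degrees(best_angle), sd_x_star.max() ** 2, largest_eigenvalue)
```

The total variance of x* and y* stays the same for every angle; rotation only changes how it is divided between the two axes.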

  11. Basics • First principal component is

  12. Variability explained

  13. Variability of k-th Principal Component

  14. Percent of variability explained by the k-th Principal Component
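The formulas for slides 11 to 14 did not survive in the transcript. The standard textbook forms are sketched below under assumed notation: $X = (X_1, \dots, X_p)^\top$ has covariance matrix $\Sigma$ with eigenvalues $\lambda_1 \ge \dots \ge \lambda_p \ge 0$ and unit-length eigenvectors $e_1, \dots, e_p$. These symbols are assumptions, not necessarily those used on the slides.

```latex
\begin{aligned}
Y_1 &= e_1^{\top} X = e_{11} X_1 + e_{12} X_2 + \dots + e_{1p} X_p
  && \text{(first principal component)} \\
\operatorname{Var}(Y_k) &= e_k^{\top} \Sigma\, e_k = \lambda_k
  && \text{(variability of the $k$-th component)} \\
\sum_{i=1}^{p} \operatorname{Var}(X_i) &= \sum_{k=1}^{p} \lambda_k
  && \text{(total variability)} \\
\text{percent explained by } Y_k &= 100 \times \frac{\lambda_k}{\lambda_1 + \dots + \lambda_p}
\end{aligned}
```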

  15. Distributionality

  16. Confidence Interval

  17. Correlations

  18. Correlation matrix

  19. Rationale in examining correlation matrix above • By examining the correlations of the variables with a principal component, we can find the variable which contributes most to a principal component as its correlation is the highest. • By searching for the highest correlation among the correlations of a variable with the principal components, we know which variable causes high overall variability in the data.
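The correlation matrix on slide 18 is not in the transcript. A sketch of how such correlations are usually computed, assuming a covariance matrix S and the standard identity corr(X_i, Y_k) = e_ik * sqrt(lambda_k) / sqrt(s_ii), is given here; the function name and the example matrix are hypothetical.

```python
import numpy as np

def pc_variable_correlations(S):
    """Matrix whose [i, k] entry is the correlation of variable i with PC k."""
    S = np.asarray(S, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Column k is scaled by sqrt(lambda_k), row i by 1 / sqrt(s_ii).
    return eigvecs * np.sqrt(eigvals) / np.sqrt(np.diag(S))[:, None]

# Hypothetical covariance matrix of three variables.
S_example = np.array([
    [4.0, 2.0, 0.5],
    [2.0, 3.0, 0.3],
    [0.5, 0.3, 1.0],
])
C = pc_variable_correlations(S_example)

# Variable most strongly correlated with the first principal component.
top_variable_for_pc1 = int(np.argmax(np.abs(C[:, 0])))
# Component most strongly correlated with each variable.
top_pc_per_variable = np.argmax(np.abs(C), axis=1)
print(np.round(C, 3), top_variable_for_pc1, top_pc_per_variable)
```

Reading down a column answers the first question on the slide, and reading across a row answers the second.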

  20. Factor Analysis • This is also a regression technique of finding relations among observed and latent [non-observable] variables using their correlation structure. • The unobserved variables are called “factors”.

  21. Factor model

  22. Assumptions of the Factor Model • Factors are standardized to have zero mean and unit variance. • Factors are uncorrelated with each other and also with the noise. • The noise terms have zero mean, are uncorrelated with each other and with the unobserved factors, and may have different variances. • It is important that the number of factors, k, is less than the number of observed variables, p.
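The equation on the “Factor model” slide is missing from the transcript. Under the assumptions just listed, the usual orthogonal factor model can be written as follows. The symbols $\mu$, $\Lambda$, $F$, $\varepsilon$ and $\Psi$ are assumed notation, with $p$ and $k$ used as in the slides.

```latex
\begin{aligned}
X - \mu &= \Lambda F + \varepsilon,
  && X \in \mathbb{R}^{p},\ \Lambda \in \mathbb{R}^{p \times k},\ F \in \mathbb{R}^{k},\ k < p \\
\operatorname{E}(F) &= 0, \quad \operatorname{Cov}(F) = I_k \\
\operatorname{E}(\varepsilon) &= 0, \quad
  \operatorname{Cov}(\varepsilon) = \Psi = \operatorname{diag}(\psi_1, \dots, \psi_p), \quad
  \operatorname{Cov}(F, \varepsilon) = 0 \\
\Rightarrow\ \operatorname{Cov}(X) &= \Lambda \Lambda^{\top} + \Psi
\end{aligned}
```

The entries of $\Lambda$ are the factor loadings and the $\psi_i$ are the specific variances (specificities).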

  23. Assumptions of Noises

  24. Factor Loadings

  25. Assumptions of Observed

  26. Residual Correlation • It is the difference between observed correlation and fitted correlation in the factor analysis • A Rule: when the residual correlation is less than 0.1 in absolute value, the correlation has been well explained. • When the residual correlations are significantly large, consider including more factors.
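A minimal sketch of the residual check, assuming an observed correlation matrix and an estimated loading matrix are already available. The numbers are made up for illustration. The diagonal is excluded because the specific variances absorb it.

```python
import numpy as np

def residual_correlations(R, loadings):
    """Observed minus fitted correlations; the fitted values are Lambda Lambda^T."""
    R = np.asarray(R, dtype=float)
    L = np.asarray(loadings, dtype=float)
    fitted = L @ L.T
    np.fill_diagonal(fitted, 1.0)    # diagonal is reproduced exactly via specificities
    return R - fitted

# Hypothetical observed correlations and one-factor loadings.
R_observed = np.array([
    [1.00, 0.70, 0.65],
    [0.70, 1.00, 0.50],
    [0.65, 0.50, 1.00],
])
one_factor_loadings = np.array([[0.9], [0.8], [0.7]])

residuals = residual_correlations(R_observed, one_factor_loadings)
largest_residual = float(np.max(np.abs(residuals)))
well_explained = largest_residual < 0.1   # rule of thumb from the slide
print(np.round(residuals, 2), largest_residual, well_explained)
```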

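  27. An Interpretation • The sum of the squares of the loadings for a factor, divided by the total observed variance (in the data of X), is the proportion of that total variance explained by the factor. For standardized variables the total variance equals the number of variables.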
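A short worked example of this interpretation, assuming standardized variables so that the total variance equals the number of variables p. The loading matrix is hypothetical.

```python
import numpy as np

# Hypothetical loadings: 4 standardized variables on 2 factors.
loadings = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.7],
    [0.2, 0.6],
])
p = loadings.shape[0]

# Column sums of squared loadings: variance attributed to each factor.
ss_loadings = (loadings**2).sum(axis=0)           # 1.50 and 0.90
proportion_explained = ss_loadings / p            # 0.375 and 0.225

# Row sums of squared loadings: communality of each variable.
communalities = (loadings**2).sum(axis=1)         # 0.82, 0.68, 0.50, 0.40
print(ss_loadings, proportion_explained, communalities)
```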

  28. Estimation methods • Unknowns to be estimated are the factor loadings and the specificities. • Methods: • MLE [gives best fit] • Varimax method [a rotation applied to the estimated loadings; not to be used when general factors are present] • Quartimax method [a rotation applied to the estimated loadings; use when one factor has large loadings and there are not many factors of that type]

  29. Oblique factors • When two factors are NOT independent, they are called “oblique factors”.

  30. How many factors to be considered? • There is NO universal rule. • When the residual correlations are significant, consider more factors. • Do a test of significance as in the next two slides. • The number of factors to be considered should be no more than the number of eigenvalues greater than or equal to one. • Use the scree graph. The number of values above the base line in the graph is the number of factors to be considered.

  31. Hypothesis Testing method

  32. Idea of finding the number of factors • 1. The varimax method maximizes the sum, over factors, of the variances of the squared loadings. • 2. The quartimax method maximizes the sum of the fourth powers of the loadings, so that each variable loads as heavily as possible on as few factors as possible.
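In symbols, with $\ell_{ij}$ the loading of variable $i$ on factor $j$ for $p$ variables and $k$ factors (assumed notation), the raw forms of the two criteria are usually written as below. Normalized variants first divide each row by the square root of its communality; the slides do not say which form they use.

```latex
\begin{aligned}
V_{\text{varimax}} &= \sum_{j=1}^{k} \left[ \frac{1}{p} \sum_{i=1}^{p} \ell_{ij}^{4}
  - \left( \frac{1}{p} \sum_{i=1}^{p} \ell_{ij}^{2} \right)^{2} \right] \\
Q_{\text{quartimax}} &= \sum_{i=1}^{p} \sum_{j=1}^{k} \ell_{ij}^{4}
\end{aligned}
```

Each is maximized over orthogonal rotations of the loading matrix.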

  33. Some Notations

  34. # Factors in Varimax Method

  35. # factors to be considered in Quartimax Method

  36. An Example
