Education 795 Class Notes

Factor Analysis Note set 6

Today’s agenda • Announcements (ours and yours) • Q/A • Introducing factor analysis

History and general goals • Attempts to represent a set of measured variables in terms of a smaller number of hypothetical constructs • Spearman (1904) invented factor analysis as a way of studying correlations between mental test scores – the G-factor • Modern uses • Data reduction (tends toward EFA) • Identification of latent structure (tends toward CFA)

Practice of Factor Analysis • Differs a bit from other techniques, in that a lot of judgment calls are required • Rules of thumb abound • There are, however, general standards of practice • Iteration is usually required

Courtesy of Pedhazur • Gould (1981) characterized factor analysis as a real pain, “although we think a more apt description is that of a forest in which one can get lost in no time” (p. 590) • GIGO: garbage in, garbage out. The theoretical underpinnings of the latent variables are key.

Correlational Technique • We try to identify dimensions that underlie relations among a set of observed variables • Factor analysis is applied to the correlations among variables • Correlation matrix is a square matrix (equal number of rows and columns) • No matter how many subjects, correlation matrix has as many rows as variables
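To illustrate the last point, here is a minimal sketch in plain Python. The scores are made up for illustration (3 variables, 6 subjects), and the helper names `pearson` and `correlation_matrix` are not from the course materials. It shows that the size of the correlation matrix depends on the number of variables, not on the number of subjects.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Square matrix of pairwise correlations: one row and column per variable."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]

# Hypothetical data: each inner list holds one variable's scores for 6 subjects.
data = [
    [2, 4, 4, 5, 7, 8],   # variable 1
    [1, 3, 5, 4, 6, 9],   # variable 2
    [9, 7, 6, 6, 3, 2],   # variable 3
]

R = correlation_matrix(data)
print(len(R), "x", len(R[0]))   # 3 x 3, however many subjects there are
for row in R:
    print([round(v, 2) for v in row])
```

Adding more subjects lengthens each inner list but leaves `R` at 3 x 3; only adding a variable changes its size.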

Correlation Matrix • It is a good idea to do a thorough examination of the correlation matrix before beginning a FA • You can compute a FA using either raw data OR just the correlation matrix • Appropriate data types include anything for which a correlation can be properly computed (continuous, interval, ordinal; nominal only when dichotomous)

Bartlett’s Test of Sphericity • A necessary but not sufficient piece of evidence that the correlation matrix is appropriate for FA • Null hypothesis: The correlation matrix is an identity matrix (i.e., 1s on the diagonal and 0s everywhere else) • This test is affected by sample size: with large samples, the null hypothesis is almost always rejected. That is why this is necessary but not sufficient evidence.
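The sample-size sensitivity can be sketched with the usual form of Bartlett's statistic, chi-square = -[(n - 1) - (2p + 5)/6] ln|R| with p(p - 1)/2 degrees of freedom. The correlation matrix below is hypothetical (three variables, all correlated at only .10), and the helper names are illustrative, not SPSS output or course code.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def bartlett_sphericity(R, n):
    """Return (chi-square statistic, degrees of freedom) for matrix R and sample size n."""
    p = len(R)
    statistic = -(n - 1 - (2 * p + 5) / 6) * math.log(determinant(R))
    df = p * (p - 1) // 2
    return statistic, df

# Hypothetical matrix with very weak correlations (r = .10 everywhere).
R_weak = [[1.0, 0.1, 0.1],
          [0.1, 1.0, 0.1],
          [0.1, 0.1, 1.0]]

small_n_stat, df = bartlett_sphericity(R_weak, n=50)
large_n_stat, _ = bartlett_sphericity(R_weak, n=5000)
print(df, round(small_n_stat, 2), round(large_n_stat, 2))
```

With df = 3 the .05 critical value of chi-square is 7.815. The same weak correlations fall short of it at n = 50 and far exceed it at n = 5000, which is why rejecting the null says little about whether the matrix is actually worth factoring.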

PCA vs FA • Principal Components vs Factor Analysis • There is no agreement in the field of practice as to which one is better or more appropriate • They are distinct techniques with different goals • PCA: a data reduction method • Components are extracted successively, each explaining as much of the remaining variance as possible

PCA vs. FA • FA: a latent variable methodology. The “unobserved” factors predict the “observed” variables. • Aimed at explaining the common variance shared by the observed variables • Difference: PCA analyzes total variance (common variance AND unique/error variance), so it tends to inflate the apparent association between variables and factors.

PCA vs. FA • In reality, a PCA will yield similar results to a FA but PCA will have larger factor loadings (inflated estimated communalities) • General Rule: We want the first 2-3 factors to explain at least 50% of the total variance.
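A small worked case shows the inflation. It assumes a hypothetical set of three variables that all correlate .50 with one another, a case where both solutions have a closed form. Under a one-factor common factor model each correlation equals the product of two loadings, so every loading is sqrt(r). For PCA, the first eigenvalue of such a matrix is 1 + (p - 1)r, spread evenly over the p variables.

```python
import math

p = 3      # number of observed variables (hypothetical)
r = 0.5    # correlation between every pair of variables (hypothetical)

# Common factor model: r_ij = loading_i * loading_j, so each loading is sqrt(r).
fa_loading = math.sqrt(r)
fa_communality = fa_loading ** 2

# PCA: first eigenvalue is 1 + (p - 1) * r with eigenvector entries 1/sqrt(p),
# so each component loading is sqrt(eigenvalue) * (1 / sqrt(p)).
first_eigenvalue = 1 + (p - 1) * r
pca_loading = math.sqrt(first_eigenvalue / p)
pca_communality = pca_loading ** 2

print(round(fa_loading, 3), round(pca_loading, 3))          # 0.707 0.816
print(round(fa_communality, 3), round(pca_communality, 3))  # 0.5 0.667
```

The pattern of loadings is the same in both solutions, but the PCA values are larger because the 1s on the diagonal of the correlation matrix bring unique variance into the analysis.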

In Our World • We will use primarily Principal Axis Factoring (FA). • Once the Extraction Method is decided upon, we rotate the matrix. • Orthogonal: keeps factors uncorrelated, which is convenient when the factors will be used as predictors in regression • Oblique: allows the factors to correlate, which is OK if it makes conceptual sense.

The Necessary Steps • 1. Identify and gather data appropriate for factor analysis • 2. Decide upon extraction approach and selection criteria (PCA vs. PAF; eigenvalue ≥ 1; scree plot) • 3. Rotate extracted factors after deciding upon rotational approach (Varimax or Oblimin) • 4. Before naming factors, cycle through steps 2 and 3 until you have achieved a reasonable statistical and conceptual solution

Simple Example (SES) X1 = Family Income X2 = Father’s Education How is the correlation between X1 and X2 best represented?

The SES Equations X1 = X2 =
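The right-hand sides of these equations did not survive in the notes. As an assumption rather than a reconstruction of the slide, the standard single-common-factor form would treat SES as one latent factor F behind both measures, with loadings λ and unique parts e (these symbols are introduced here for illustration):

```latex
X_1 = \lambda_1 F + e_1, \qquad
X_2 = \lambda_2 F + e_2, \qquad
r_{X_1 X_2} = \lambda_1 \lambda_2
```

The last expression holds when X1, X2, and F are standardized and e1 and e2 are uncorrelated with F and with each other. Under those assumptions the correlation between family income and father's education is carried entirely by their shared dependence on SES.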

Factor Extraction • Assumes factors will be uncorrelated • How many factors? • Fewer than the number of variables being analyzed • Specific theorized number • Amount of variance explained (Eigenvalue) • Scree plot • Different approaches to extracting factors • Principal components • Principal axis factoring
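The eigenvalue criterion can be sketched as follows, using a hypothetical 4-variable correlation matrix in which variables 1-2 and variables 3-4 form two clusters. The eigenvalue routine is a plain cyclic Jacobi iteration written for this illustration, not the algorithm SPSS uses.

```python
import math

def symmetric_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                if a[p][p] == a[q][q]:
                    theta = math.copysign(math.pi / 4, a[p][q])
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):   # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):   # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

# Hypothetical correlations: two pairs of related variables.
R = [[1.0, 0.6, 0.1, 0.1],
     [0.6, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.6],
     [0.1, 0.1, 0.6, 1.0]]

eigenvalues = symmetric_eigenvalues(R)
retained = [ev for ev in eigenvalues if ev >= 1]       # eigenvalue >= 1 rule
explained = sum(retained) / len(R)                     # share of total variance

print([round(ev, 3) for ev in eigenvalues])
print(len(retained), round(explained, 2))
```

For this matrix the eigenvalues work out analytically to 1.8, 1.4, 0.4, and 0.4, so the rule retains two factors that together account for 80% of the total variance. Listing the eigenvalues in descending order is also what a scree plot displays; the drop from 1.4 to 0.4 is the "elbow".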

Rotating Extracted Factors • Unrotated factor matrix is only one of many possible ones; transformations can clarify meaning without changing the underlying relationship amongst the variables • Rotation is typically necessary to ease interpretation • Desire to approach “simple structure” • Orthogonal or oblique? Varimax or Oblimin?
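A minimal sketch of why rotation is allowed: an orthogonal rotation changes the individual loadings but leaves each variable's communality and every reproduced correlation untouched. The unrotated loadings are made up, and the 45-degree angle is hand-picked for illustration rather than found by the Varimax criterion.

```python
import math

def rotate(loadings, degrees):
    """Orthogonally rotate a two-factor loading matrix by the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [[c * f1 - s * f2, s * f1 + c * f2] for f1, f2 in loadings]

def communalities(loadings):
    """Sum of squared loadings for each variable."""
    return [sum(v * v for v in row) for row in loadings]

def reproduced_correlation(loadings, i, j):
    """Correlation between variables i and j implied by the loadings."""
    return sum(a * b for a, b in zip(loadings[i], loadings[j]))

# Hypothetical unrotated solution: every variable loads on both factors.
unrotated = [[0.7, -0.4],
             [0.6, -0.5],
             [0.6, 0.5],
             [0.7, 0.4]]

rotated = rotate(unrotated, 45)   # angle chosen by eye, not by Varimax

for before, after in zip(unrotated, rotated):
    print(before, [round(v, 2) for v in after])
print([round(h, 3) for h in communalities(unrotated)])
print([round(h, 3) for h in communalities(rotated)])
```

After rotation the first two variables load mainly on factor 1 and the last two mainly on factor 2, which is closer to simple structure, while the communalities printed on the last two lines are identical.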

Interpreting and Naming Rotated Factors • Appropriate after cycling through various solutions and identifying the one that makes both statistical and conceptual sense • Naming should capture the essence of the variables that are most closely associated with each factor • Should take the relative strength of loading into account in naming factors

Let’s Move on to a More Complex Example • Assume there is a latent structure in describing why people go to college. • Theoretically we can make an argument that there are intrinsic and extrinsic reasons students go to college. • Let’s say we decide to use PAF and we decide we want to try both rotations, orthogonal and oblique…

Reasons people go to college

SPSS Syntax PAF /SORT

Results

Results: Orthogonal

Results: Oblique

Discussion • Are the two rotations different in any way? • How do we decide which one to use? • Questions?

Next Week • Read Pedhazur Ch 4 p79-80 • Read Pedhazur Ch 5 p81-83 • Read Pedhazur Ch 22 p607-627 • Read Pedhazur Ch 23 p631-632
