
Determining the # Of PCs


joshua

Presentation Transcript


  1. Determining the # Of PCs
     • Remembering the process
     • Some cautionary comments
     • Statistical approaches
     • Mathematical approaches
     • “Nontrivial factors” approaches
     • Help that’s coming later

  2. How the process really works…
     Here’s the series of steps we talked about earlier.
     • # factors decision
     • rotating the factors
     • interpreting the factors
     • factor scores
     These “steps” aren’t carried out independently or strictly in this order!
     • Considering the interpretations of the factors can aid the # factors decision!
     • Considering how the factor scores (representing the factors) relate to each other and to variables external to the factoring can aid both the # factors decision and interpretation.

  3. Some cautionary comments
     Remember that the “# factors” decision ...
     • is influenced by the particular variables in the analysis
       • so, unless you are working with a “closed set” of variables, there probably isn’t a “real # factors”
       • the “whole story” includes how the # factors changes as variables are added and deleted
       • how do these changes alter your interpretation of the factors and variables?
     • isn’t independent of interpretability
       • a factor is only “real” if it’s meaningful
       • be cautious of both “making up” and “missing” meaning

  4. Some cautionary comments, cont.
     • agreement across decision rules is helpful
       • we’ll talk about several decision rules, each of which is flawed in known ways
     • replication is convincing
       • split-half and hold-out sampling can help
       • separate-sample replication is more convincing
     • convergence research is more convincing
       • not just replicating, but correctly anticipating the results of adding or deleting variables across samplings
     • Remember this is “Exploratory” factoring …
       • explore & consider alternative # factor solutions
     • Want to be really convincing? Use confirmatory factoring!!

  5. Statistical Procedures
     • PCs are extracted from a correlation matrix
     • PCs should only be extracted if there is “systematic covariation” in the correlation matrix
       • This is known as the “sphericity question”
       • Note: the test asks whether the next PC should be extracted
     • There are two different sphericity tests
       • Whether there is any systematic covariation in the original R
       • Whether there is any systematic covariation left in the partial R, after a given number of factors has been extracted
     • Both tests are called “Bartlett’s Sphericity Test”

  6. Statistical Procedures, cont.
     • Applying Bartlett’s Sphericity Tests
       • Retaining H0: means “don’t extract another factor”
       • Rejecting H0: means “extract the next factor”
     • Significance tests provide a p-value, and so a known probability that the next factor is “1 too many” (a Type I error)
     • Like all significance tests, these are influenced by “N”
       • larger N = more power = more likely to reject H0: = more likely to “keep the next factor” (& make a Type I error)
     • Quandary?!?
       • Samples large enough to have a stable R are likely to have “excessive power” and lead to “over factoring”
     • Be sure to consider % variance, replication & interpretability
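
A minimal sketch of the first of the two sphericity tests (systematic covariation in the original R), assuming the usual chi-square approximation χ² = -[(N - 1) - (2k + 5)/6]·ln|R| with k(k - 1)/2 degrees of freedom. The function names `log_determinant` and `bartlett_sphericity` are illustrative, not from the slides, and the second test (on the partial R after extraction) is not covered here.

```python
import math


def log_determinant(matrix):
    """Natural log of the determinant, by Gaussian elimination with pivoting.

    Assumes a positive-definite matrix (a non-singular correlation matrix),
    so the determinant is positive and its log is defined.
    """
    a = [row[:] for row in matrix]  # work on a copy
    k = len(a)
    log_det = 0.0
    for col in range(k):
        # Use the largest available pivot for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        log_det += math.log(abs(a[col][col]))
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= factor * a[col][c]
    return log_det


def bartlett_sphericity(r_matrix, n_cases):
    """Return (chi-square statistic, degrees of freedom) for H0: R is spherical."""
    k = len(r_matrix)
    statistic = -((n_cases - 1) - (2 * k + 5) / 6.0) * log_determinant(r_matrix)
    df = k * (k - 1) // 2
    return statistic, df


# Hypothetical 2-variable example: r = .50, N = 100.
statistic, df = bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], 100)
```

The statistic is compared with a chi-square distribution on `df` degrees of freedom to get the p-value. Note that N enters the statistic as a multiplier, which is why a larger N makes rejecting H0: more likely for the same R.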

  7. Mathematical Procedures
     • The most commonly applied decision rule (and the default in most stats packages -- chicken & egg?) is the λ > 1.00 rule … here’s the logic
     Part 1
     • Imagine a spherical R (of k variables)
       • each variable is independent and carries unique information
       • so, each variable has 1/kth of the information in R
     • For a “normal” R (of k variables)
       • each variable, on average, has 1/kth of the information in R

  8. Mathematical Procedure, cont.
     Part 2
     • The “trace” of a matrix is the sum of its diagonal
     • So, the trace of R (with 1’s in the diag) = k (# vars)
     • λ tells the amount of variance in R accounted for by each extracted PC
     • for a full PC solution Σλ = k (accounts for all variance)
     Part 3
     • PC is about data reduction and parsimony
     • “trading” many more-simple things (original variables) for fewer more-complex things (PCs - linear combinations of variables)
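
Written out, the Part 2 bookkeeping looks roughly like this. The index symbols i and j and the numbers in the worked line (k = 6, λ₁ = 3.0) are illustrative assumptions, not values from the slides.

```latex
\operatorname{tr}(R) = \sum_{j=1}^{k} r_{jj} = k,
\qquad
\sum_{i=1}^{k} \lambda_i = \operatorname{tr}(R) = k,
\qquad
\text{proportion of variance for PC}_i = \frac{\lambda_i}{k}

% Worked line (hypothetical numbers): k = 6 variables, \lambda_1 = 3.0
\frac{\lambda_1}{k} = \frac{3.0}{6} = 0.50
```

Since the λs of a full solution sum to k, the average λ is 1.00. That average is the benchmark the λ > 1.00 rule compares each PC against.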

  9. Mathematical Procedure, cont.
     Putting it all together (hold on tight!)
     • Any PC with λ > 1.00 accounts for more variance than the average variable in that R
       • That PC “has parsimony” -- the more complex composite has more information than the average variable
     • Any PC with λ < 1.00 accounts for less variance than the average variable in that R
       • That PC “doesn’t have parsimony” -- the more complex composite has no more information than the average variable
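
As a small sketch of the rule in action, the function below counts the PCs with λ > 1.00, after checking that the λs sum to k as they should for a correlation matrix. The function name `kaiser_retained` and the example eigenvalues are hypothetical.

```python
def kaiser_retained(eigenvalues, tolerance=1e-6):
    """Count the PCs whose eigenvalue exceeds 1.00 (the lambda > 1.00 rule)."""
    k = len(eigenvalues)
    # For a full PC solution of a correlation matrix, the lambdas sum to k.
    if abs(sum(eigenvalues) - k) > tolerance:
        raise ValueError("eigenvalues of a correlation matrix should sum to k")
    return sum(1 for value in eigenvalues if value > 1.0)


# Hypothetical eigenvalues for k = 6 variables (they sum to 6).
example_eigenvalues = [3.0, 1.9, 0.4, 0.3, 0.25, 0.15]
n_retained = kaiser_retained(example_eigenvalues)

# A perfectly spherical R has every lambda equal to 1.00, so nothing is retained.
n_retained_spherical = kaiser_retained([1.0, 1.0, 1.0])
```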

  10. Mathematical Procedure, cont.
      There have been examinations of the accuracy of this criterion
      • The usual procedure is to generate a set of variables from a known number of factors (vk = b1k*PC1 + … + bfk*PCf, etc.) -- while varying N, # factors, # PCs & communalities
      • Then factor those variables and see if λ > 1.00 leads to the correct number of factors
      Results -- the rule “works pretty well on the average”, which really means that it gets the # factors right sometimes, underestimates sometimes and overestimates sometimes
      • No one has generated an accurate rule for assessing when which of these occurs
      • But the rule is most accurate with k < 40, f between k/5 and k/3 and N > 300
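
A rough sketch of one run of such a study, using only the Python standard library. Everything specific here is an assumption made for illustration: the loadings (6 variables, 2 factors, .80 each), N = 300, the seed, and the small Jacobi eigenvalue routine, which stands in for whatever software the published studies used.

```python
import math
import random


def correlation_matrix(data):
    """Pearson correlation matrix for rows = cases, columns = variables."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1))
           for j in range(k)]
    z = [[(row[j] - means[j]) / sds[j] for j in range(k)] for row in data]
    return [[sum(case[a] * case[b] for case in z) / (n - 1) for b in range(k)]
            for a in range(k)]


def jacobi_eigenvalues(matrix, max_sweeps=50, tolerance=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    k = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
        if off < tolerance:
            break
        for p in range(k - 1):
            for q in range(p + 1, k):
                # Rotation angle that zeroes a[p][q], kept within +/- 45 degrees.
                num, den = 2.0 * a[p][q], a[q][q] - a[p][p]
                if den < 0:
                    num, den = -num, -den
                theta = 0.5 * math.atan2(num, den)
                c, s = math.cos(theta), math.sin(theta)
                for i in range(k):  # rotate columns p and q
                    aip, aiq = a[i][p], a[i][q]
                    a[i][p], a[i][q] = c * aip - s * aiq, s * aip + c * aiq
                for j in range(k):  # rotate rows p and q
                    apj, aqj = a[p][j], a[q][j]
                    a[p][j], a[q][j] = c * apj - s * aqj, s * apj + c * aqj
    return sorted((a[i][i] for i in range(k)), reverse=True)


def simulate_cases(n_cases, loadings, rng):
    """Generate v_k = b_1k*F_1 + ... + b_fk*F_f + unique part (unit variance)."""
    n_factors = len(loadings[0])
    cases = []
    for _ in range(n_cases):
        factors = [rng.gauss(0.0, 1.0) for _ in range(n_factors)]
        row = []
        for b in loadings:
            unique_sd = math.sqrt(1.0 - sum(w ** 2 for w in b))
            common = sum(w * f for w, f in zip(b, factors))
            row.append(common + unique_sd * rng.gauss(0.0, 1.0))
        cases.append(row)
    return cases


rng = random.Random(2024)                       # fixed seed, arbitrary choice
loadings = [[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3  # hypothetical: k = 6, f = 2
data = simulate_cases(300, loadings, rng)
eigenvalues = jacobi_eigenvalues(correlation_matrix(data))
n_retained = sum(1 for value in eigenvalues if value > 1.0)
```

A full study would repeat this over many samples while varying N, the number of factors and the communalities, and tally how often `n_retained` matches the number of factors that generated the data.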

  11. Nontrivial Factors Procedures
      These “common sense” approaches became increasingly common as…
      • the limitations of statistical and mathematical procedures became better known
      • the distinction between exploratory and confirmatory factoring developed and the crucial role of “successful exploring” became better known
      These procedures are more like “judgement calls” and require greater application of content knowledge and “persuasion”, but are often the basis of good factorings!!

  12. Nontrivial factors Procedures, cont.
      Scree -- the “junk” that piles up at the foot of a glacier
      A “diminishing returns” approach
      • plot the λ for each factor and look for the “elbow”
      • “Old rule” -- # factors = elbow (1966; 3 in the plot below)
      • “New rule” -- # factors = elbow - 1 (1967; 2 in the plot below)
      • Sometimes there isn’t a clear elbow -- try another “rule”
      • This approach seems to work best when combined with attention to interpretability!!
      [Figure: scree plot of λ (axis marked 0, 2, 4) against PC number (1 to 6)]
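
Finding the elbow is a judgement call made by eye, but one way to sketch it in code is to score each point by how sharply the curve bends there. The second-difference scoring, the function name `scree_elbow` and the example eigenvalues are all assumptions for illustration, not part of the 1966 or 1967 rules themselves.

```python
def scree_elbow(eigenvalues):
    """Return the 1-based position of the sharpest bend in the scree plot.

    The bend at position i is scored by the second difference
    lambda[i-1] - 2*lambda[i] + lambda[i+1]; this is one of several possible
    ways to formalise "look for the elbow".
    """
    if len(eigenvalues) < 3:
        raise ValueError("need at least three eigenvalues to locate an elbow")
    bends = {}
    for index in range(1, len(eigenvalues) - 1):
        second_difference = (eigenvalues[index - 1]
                             - 2 * eigenvalues[index]
                             + eigenvalues[index + 1])
        bends[index + 1] = second_difference  # store under the 1-based position
    return max(bends, key=bends.get)


# Hypothetical eigenvalues for 6 PCs, with the curve flattening out at PC 3.
example_eigenvalues = [3.0, 1.9, 0.4, 0.3, 0.25, 0.15]
elbow = scree_elbow(example_eigenvalues)
old_rule = elbow       # 1966 rule: # factors = elbow
new_rule = elbow - 1   # 1967 rule: # factors = elbow - 1
```

When several points bend about equally, no single score will be convincing, which is the “no clear elbow” case where another rule is worth trying.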

  13. An Example…
      A buddy in graduate school wanted to build a measure of “contemporary morality”. He started with the “10 Commandments” and the “7 Deadly Sins” and created a 56-item scale with 8 subscales. His scree plot looked like…
      [Figure: scree plot of λ (axis marked 0, 1, 10, 20) against factor number (axis marked 1, 8, 20, 40, 56)]
      How many factors?
      • 1? – big elbow at 2, so the ’67 rule suggests a single factor, which clearly accounts for the biggest portion of variance
      • 7? – smaller elbow at 8, so the ’67 rule suggests 7
      • 8? – smaller elbow at 8, so the ’66 rule gives the 8 he was looking for; also, the 8th has λ > 1.0 and the 9th has λ < 1.0
      Remember that these are subscales of a central construct, so..
      • items will have substantial correlations both within and between subscales
      • to maximize the variance accounted for, the first factor is likely to pull in all these inter-correlated variables, leading to a large λ for the first (general) factor and much smaller λs for subsequent factors
      • This is a common scree configuration when factoring items from a multi-subscale scale!

  14. Nontrivial factors Procedures, cont.
      % of variance accounted for
      • keep the factors necessary to account for “enough” variance -- 75% to 90% are common goals
      Interpretability -- meaningfulness of resulting PCs
      • Depends greatly upon content knowledge
      • Beware “factoring illusions”
      • We’re good at “finding patterns”, even when they’re not really there
      Rotational Survival -- akin to meaningfulness
      • Consider different # factors with different types of rotation -- see which factors “keep showing up”
      Replicability -- split, holdout, or independent samples
      • What PCs appear consistently across factorings?
      Jack-knifing
      • Re-sampling from a single dataset – looking for consistency of # factors
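
For the “% of variance accounted for” criterion, a minimal sketch is to accumulate λ/k until the target is reached. The function name `factors_for_variance` and the example eigenvalues are hypothetical; the 75% and 90% targets are the goals named above.

```python
def factors_for_variance(eigenvalues, target):
    """Smallest number of PCs whose cumulative share of variance reaches target."""
    total = sum(eigenvalues)  # equals k for a correlation matrix
    running = 0.0
    ordered = sorted(eigenvalues, reverse=True)
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return count
    return len(eigenvalues)  # rounding fallback: keep the full solution


# Hypothetical eigenvalues for k = 6 variables.
example_eigenvalues = [3.0, 1.9, 0.4, 0.3, 0.25, 0.15]
needed_for_75 = factors_for_variance(example_eigenvalues, 0.75)
needed_for_90 = factors_for_variance(example_eigenvalues, 0.90)
```

With these made-up values the two targets call for different numbers of factors, so the choice of what counts as “enough” matters and is worth checking against interpretability.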

  15. Help that’s coming later
      If you have a reasonably “clear” factor structure, all the different ways of deciding the # factors are likely to give the same result (except maybe the statistical tests, which are likely to over-factor with large N)
      Remember that “what the factors are” can be very important in deciding “how many factors there are”
      • Consider the different “interpretations” of the factors from the different #-of-factors solutions
      • we can also look at the correlations between the factors to help with these decisions
      Remember that “what the factors do” can be very important in deciding “how many factors there are”
      • you can look at how factors from the different #-of-factors solutions correlate with other variables that are not in the factor analysis
