Factor Analysis 2006. Lecturer: Timothy Bates (Tim.bates@ed.ac.uk). Lecture notes based on Austin 2005. Bring your handout to the tutorial. Please read prior to the tutorial.
FACTOR ANALYSIS • A statistical tool to account for variability in observed traits in terms of a smaller number of factors • Factor = "unobserved random variable" • Measured item = observed random variable • Values for an observation are recovered (with some error) from a linear combination of a (usually much smaller) set of extracted factors.
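The "linear combination" bullet can be written as an equation. This is a sketch in conventional notation; the symbols x, λ, f and e are assumptions and do not appear in the slides.

```latex
x_j = \sum_{k=1}^{m} \lambda_{jk} f_k + e_j, \qquad j = 1, \dots, p, \qquad m < p
```

Here x_j is the (standardised) score on observed item j, f_k is the score on factor k, λ_jk is the loading of item j on factor k, e_j is the error term, p is the number of items and m the number of extracted factors.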
FA as a Data reduction technique • Simplify complex multivariate datasets by finding “natural groupings” within the data • May correspond to underlying ‘dimensions’. • Subsets of variables that correlate strongly with each other and weakly with other variables in the dataset. • Natural groupings (factors) can assist the theoretical interpretation of complex datasets • Theoretical linkage of factors to underlying (latent) constructs, e.g. “extraversion”, liberal attitudes, interest in ideas, ability
EXAMPLE DATASET • 210 students produced self-ratings on a list of trait adjectives. • Correlations above 0.2 marked in bold • 1. ASSERTIVE, 2. TALKATIVE, 3. EXTRAVERTED, 4. BOLD • 5. ORGANIZED, 6. EFFICIENT, 7. THOROUGH, 8. SYSTEMATIC • 9. INSECURE, 10. SELF-PITYING, 11. NERVOUS, 12. IRRITABLE • Clear structure in this sorted matrix • How easy would this be to see in a larger matrix?
THE THREE FACTORS FROM THE EXAMPLE DATA • The numbers are factor loadings = correlation of each variable with the underlying factor. • (Loadings less than 0.1 omitted.) • Can construct a factor score (item scores multiplied by their factor loadings, then summed) • N = (0.75*Nervous) + (0.73*Irritable) + (0.73*Insecure) + (0.72*Self-pity) – (0.10*Extraverted) – (0.21*Assertive) • Main loadings are large and highly significant. • Smaller (cross-)loadings may be informative. • Factors are close to simple structure.
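The N score on this slide is a loading-weighted sum of item scores. A minimal sketch follows: the loadings are copied from the slide, while the student's ratings, the variable names and the assumption that ratings are standardised are invented for illustration.

```python
# Loadings on the N (Neuroticism) factor, copied from the slide.
N_LOADINGS = {
    "nervous": 0.75,
    "irritable": 0.73,
    "insecure": 0.73,
    "self_pity": 0.72,
    "extraverted": -0.10,
    "assertive": -0.21,
}


def weighted_score(ratings, loadings):
    """Sum of each item rating multiplied by its factor loading."""
    return sum(loadings[item] * ratings[item] for item in loadings)


# Hypothetical standardised self-ratings for one student (not real data).
student = {
    "nervous": 1.0,
    "irritable": 0.5,
    "insecure": 1.0,
    "self_pity": 0.0,
    "extraverted": -1.0,
    "assertive": -0.5,
}

n_score = weighted_score(student, N_LOADINGS)
print(round(n_score, 3))
```

The negative loadings mean that low Extraverted and Assertive ratings push the N score up slightly.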
OBJECTIVES AND OUTCOMES OF FACTOR ANALYSIS • Aim of factor analysis is to objectively detect natural groupings of variables (factors) • Can deal with large matrices, uses (reasonably) objective statistical criteria. • Can obtain quantitative information • e.g. factor scores. • Factors are (should be) of theoretical interest. • In the example the factors correspond to the personality traits of Extraversion, Neuroticism and Conscientiousness • Exploratory method, uncovering structure in data • Confirmatory factor analysis (model testing) is also possible.
SOME TECHNICAL REQUIREMENTS FOR A FACTOR ANALYSIS TO BE VALID AND USEFUL • Simple structure • Each item loads highly on one factor and close to zero on all others • Factors have a meaningful theoretical interpretation • Rotation • Factors retain most of the variance in the raw data • Parsimony compared to starting variables achieved without loss of explanatory power • Factors are Replicable
Assumptions • Large enough sample • So that the correlations are reliable • Somewhat normal variables, No outliers • No variables uncorrelated with any other • No variables correlated 1.0 with each other • Remove one of each problematic pair, or use sum if appropriate.
DATA QUALITY • Sample Size • Rough rule is that 300 is OK, smaller numbers may be OK. • Subjects/variables ratio • Much discussion (less agreement) • Values between 2:1 and 10:1 have been proposed as a minimum. • Simulations suggest that overall sample size is more important. • Well-defined factors (large loadings) will replicate in smaller samples than poorly-defined ones (small loadings)
STAGES OF ANALYSIS • Examine data for outliers and correlations • Choose number of factors • Scree plot • Rotate factors if necessary • Interpret factors • Obtain scores • Check reliability of scales defining factors • Further experiments to validate factors
Partitioning item variance • Variance of each item can be thought of in three partitions: • 1. Shared (common) variance: explained by the factors • Unique variance, not explained by the factors, is made up of: • 2. Specific variance • 3. Error variance • Communality • The proportion of a variable's variance that is common variance • Sum of squares of the item's factor loadings • Large communalities are required for a valid and useful factor solution
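The communality calculation is a one-liner. A small sketch, assuming a standardised item and orthogonal (uncorrelated) factors; the loadings are hypothetical.

```python
def communality(loadings):
    """Sum of squared loadings of one item across the extracted factors."""
    return sum(value ** 2 for value in loadings)


# Hypothetical loadings of a single item on three orthogonal factors.
item_loadings = [0.75, 0.10, -0.05]

h2 = communality(item_loadings)   # share of item variance that is common
uniqueness = 1.0 - h2             # specific + error variance (item variance = 1)
print(round(h2, 4), round(uniqueness, 4))
```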
Computing a Factor Analysis • Two main approaches • Differ in estimating communalities • Principal components • Simplest computationally • Assumes all variance is common variance (implausible) but gives similar results to more sophisticated methods. • SPSS default. • Principal factor analysis • Estimates communalities first
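The difference between the two approaches can be seen numerically. The sketch below uses NumPy and a hypothetical 4-item correlation matrix. The squared multiple correlation (SMC) used here as the initial communality estimate is one common choice and an assumption; the slides do not say which estimate is used.

```python
import numpy as np


def pc_loadings(R, n_factors):
    """Principal-component loadings: eigenvectors scaled by sqrt(eigenvalue)."""
    eigvals, eigvecs = np.linalg.eigh(R)            # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]   # indices of largest eigenvalues
    return eigvecs[:, order] * np.sqrt(eigvals[order])


def smc(R):
    """Squared multiple correlation of each item with all the other items."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))


# Hypothetical correlation matrix: items 1-2 and items 3-4 form two groups.
R = np.array([
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.5],
    [0.1, 0.1, 0.5, 1.0],
])

two = pc_loadings(R, 2)
full = pc_loadings(R, 4)

communalities_two = (two ** 2).sum(axis=1)
communalities_full = (full ** 2).sum(axis=1)  # all equal to 1: no unique variance
initial_estimates = smc(R)                    # below 1: leaves room for unique variance

print(communalities_two, communalities_full, initial_estimates)
```

With every component retained, principal components reproduces each item's full variance, which is the "all variance is common" assumption in numerical form.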
How many Factors? • Initially unknown • Needs to be specified by the investigator on the basis of preliminary analysis • No 100% foolproof statistical test for number of factors • Similar problems with other multivariate methods
How many factors? • There are potentially as many factors as items • We don’t want to retain factors which account for little variance. • Most commonly-used method to decide the number of factors is the “scree” plot of the “eigenvalues” • Eigenvalue = variance explained by each factor. • A point of inflection (kink) in the scree plot is a good place to make the cut-off
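The values plotted in a scree plot are the eigenvalues of the item correlation matrix, sorted from largest to smallest. A minimal NumPy sketch on simulated data; the two-factor structure, sample size and loadings are assumptions chosen for illustration, and the plotting step is left out.

```python
import numpy as np


def scree_values(data):
    """Eigenvalues of the item correlation matrix, largest first."""
    R = np.corrcoef(data, rowvar=False)   # columns are items
    return np.linalg.eigvalsh(R)[::-1]    # eigvalsh returns ascending order


# Simulated data: 300 subjects, 6 items, two uncorrelated factors.
rng = np.random.default_rng(0)
factors = rng.standard_normal((300, 2))
loadings = np.array([
    [0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
    [0.0, 0.8], [0.0, 0.8], [0.0, 0.8],
])
data = factors @ loadings.T + 0.6 * rng.standard_normal((300, 6))

eigvals = scree_values(data)
proportion = eigvals / eigvals.sum()   # share of total variance per factor

for i, (ev, prop) in enumerate(zip(eigvals, proportion), start=1):
    print(f"factor {i}: eigenvalue {ev:.2f}, {prop:.0%} of variance")
```

In data like these the eigenvalues drop sharply after the second factor, which is the kink one looks for.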
OTHER METHODS FOR FACTOR NUMBERS • Eigenvalues > 1 • Eigenvalues sum to the number of items, so an eigenvalue of >1 = more informative than a single average item • Not a useful guide in practice • Parallel Analysis • Repeatedly generate random data of the same size as the real dataset (or randomise the real data) and determine how large an eigenvalue appears by chance in many thousands of trials; retain factors whose eigenvalues exceed these chance values. • Excellent method • Theory-driven • Extract a number of factors based on theoretical considerations • Hard to justify
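Parallel analysis can be sketched in a few lines of NumPy. This version compares observed eigenvalues with the 95th percentile of eigenvalues from random normal data of the same size. The percentile, the use of normal noise, the small number of simulations (200, for speed) and the simulated dataset are all assumptions for illustration.

```python
import numpy as np


def sorted_eigenvalues(data):
    """Eigenvalues of the correlation matrix of `data`, largest first."""
    return np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]


def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Count leading factors whose eigenvalue beats the chance threshold."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = sorted_eigenvalues(data)
    random_eigs = np.empty((n_sims, p))
    for i in range(n_sims):
        random_eigs[i] = sorted_eigenvalues(rng.standard_normal((n, p)))
    threshold = np.percentile(random_eigs, percentile, axis=0)
    exceeds = observed > threshold
    # Stop at the first factor that fails to beat chance.
    n_factors = p if exceeds.all() else int(np.argmin(exceeds))
    return n_factors, observed, threshold


# Simulated data: 300 subjects, 6 items, two uncorrelated factors.
rng = np.random.default_rng(1)
factors = rng.standard_normal((300, 2))
loadings = np.array([
    [0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
    [0.0, 0.8], [0.0, 0.8], [0.0, 0.8],
])
data = factors @ loadings.T + 0.6 * rng.standard_normal((300, 6))

n_factors, observed, threshold = parallel_analysis(data)
kaiser_count = int((observed > 1.0).sum())   # the eigenvalue > 1 rule, for comparison
print(n_factors, kaiser_count)
```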
How to align the factors? • The initial solution is “un-rotated” • Two undesirable features make it hard to interpret: • Designed to maximise the loadings of all items on the first factor • Most items have large loadings on more than one factor • Hides groupings in the data
ROTATION – DETAIL (1) • Rotation shows up the groups of items in the data. • Orthogonal rotation • Factors remain independent • Oblique rotation • Factors allowed to correlate • Theoretical reasons to choose a type of rotation • (e.g. for intelligence test scores); • May explore both types • Choose oblique if there are large correlations between factors, orthogonal otherwise.
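Item loadings on the first 2 factors [Figure: items (X) plotted against factor axes N and C, each running from -1 to +1]
Lack of Simple Structure [Figure: items (X) plotted against factor axes N and C, each running from -1 to +1]
Rotation Defines New Axes Which Reveal the Item Groups [Figure: items (X) with rotated axes]
Oblique Rotation [Figure: items (X) with oblique axes]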
ROTATION – DETAIL (2) • Rotated and un-rotated solutions are mathematically equivalent • Rotation is performed for purposes of interpretation. • Most common types: • Oblique • Direct oblimin • Orthogonal • Varimax (maximises squared column variance) • Most common • Quartimax (maximises row variance) • Equamax simplifies rows and columns of a factor matrix
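Varimax can be sketched with a widely used SVD-based iteration. This is an illustrative implementation, not the routine SPSS uses, and it omits Kaiser normalisation. The loading matrix is hypothetical: a perfect simple structure deliberately rotated 30 degrees away so that varimax has something to recover.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation; returns rotated loadings and rotation matrix."""
    p, k = loadings.shape
    T = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        rotated = loadings @ T
        # Gradient of the varimax criterion with respect to the rotation.
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        T = u @ vt                      # nearest orthogonal matrix
        d_new = s.sum()
        if d_new < d * (1 + tol):       # criterion has stopped improving
            break
        d = d_new
    return loadings @ T, T


# Hypothetical simple structure, then rotated 30 degrees to hide it.
simple = np.array([
    [0.80, 0.0], [0.70, 0.0], [0.75, 0.0],
    [0.0, 0.80], [0.0, 0.70], [0.0, 0.75],
])
theta = np.deg2rad(30)
spin = np.array([[np.cos(theta), -np.sin(theta)],
                 [np.sin(theta), np.cos(theta)]])
unrotated = simple @ spin

rotated, T = varimax(unrotated)
print(np.round(rotated, 2))
```

Communalities (row sums of squared loadings) are the same before and after rotation, which is the sense in which the two solutions are mathematically equivalent.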
INTERPRETING FACTORS • Done on the basis of ‘large’ loadings • Often taken to be above 0.3. • Size of loading which should be considered substantive is sample-size dependent. • For large samples loadings of 0.1 or below may be significant but do not explain much variance. • A well-defined factor should have at least three high-loading variables • Existence of factors with only one or two large loadings indicates that too many factors were extracted, or multicollinearity problems. • Assigning meaning to factors.
FACTOR SCORES • Factor scores • Estimate of each subject’s score on the underlying latent variable • Calculated from the factor loadings of each item. • Simple scoring methods • A method often used for, e.g., personality questionnaires is to sum the individual item scores (reverse-keying where necessary). • This method is reasonable when all variables are measured on the same scale; • What if you have a mix of items measured on different scales? • (e.g. farmer’s extraversion score, farm annual profit, farm area).
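Simple sum scoring with reverse-keyed items looks like this. The 1 to 5 rating scale, the responses and the function names are hypothetical. The z-score helper shows one common way to put variables from different scales on a comparable footing before combining them; it is offered as an assumption, since the slide leaves the question open.

```python
from statistics import mean, stdev


def reverse_key(score, low=1, high=5):
    """Flip a rating on a low..high scale (1 becomes 5, 2 becomes 4, ...)."""
    return low + high - score


def sum_score(responses, reversed_items, low=1, high=5):
    """Scale score: add the item scores, reverse-keying the flagged positions."""
    return sum(
        reverse_key(r, low, high) if i in reversed_items else r
        for i, r in enumerate(responses)
    )


def z_scores(values):
    """Standardise values to mean 0 and standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical 1-5 ratings; items at positions 1 and 3 are reverse-keyed.
responses = [4, 2, 5, 1]
total = sum_score(responses, reversed_items={1, 3})

# Hypothetical farm areas on an arbitrary scale, standardised before combining.
farm_area_z = z_scores([10, 20, 30])
print(total, farm_area_z)
```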
EXAMPLE 1 – FACTOR STRUCTURE OF DIETARY BEHAVIOUR • Research question: Is there a dimension of healthy vs. unhealthy diet preferences? • (Mac Nicol et al., 2003) • 451 schoolchildren completed a 35-item questionnaire mainly on food items regularly consumed (also some general health behaviour items) • Subjects:variables 12.9:1. Population not representative for SES. • Scree suggested three factors, two diet related • F1: Unhealthy foods (chips, fizzy drinks etc) • F2: Healthy foods (fruit, veg etc) • Validation • Higher SES and better nutrition knowledge associated with healthier eating patterns. • Factor reliabilities low • Problem of yes/no items • Sample inhomogeneity.
EXAMPLE 2 – FACTOR STRUCTURE OF THE AQ (Austin, 2005) • Does the AQ have the factor structure that its original author thinks it has? • The AQ is a 50-item questionnaire designed to assess autistic traits in the general population and at the high-functioning end of the clinical range. • Designed to produce a general factor and to have subscales assessing well-known clinical characteristics of autism: • Poor social skills • Strong focus of attention • Attention to detail • Poor communication • Poor imagination/play • Completed by 201 undergraduates. • Subjects:variables 4:1. • Scree suggested a general factor + three sub-factors • Poor social skills, attention to detail and poor communication. • Reliabilities OK, some validation (males vs. females, arts vs. science)
EXAMPLE 3 – FACTOR STRUCTURE OF AN EI SCALE • How many factors in a published emotional intelligence scale, and can it be improved by adding more items? • (Saklofske et al., 2003; Austin et al., 2004). • 354 undergraduates completed a 33-item EI scale for which previous findings on the factor structure had given contradictory results. • Scree plot (and some confirmatory modelling) suggested four factors, one with poor reliability. • The factor structure has been replicated although other factor structures have been reported. • A longer 41-item version of the same scale was constructed with more reverse-keyed items than the original scale, and also with additional items targeted on the low-reliability factor (utilisation of emotions). • Completed by 500 students and was found to have a three-factor structure. • Reliability of utilisation subscale increased, but still below 0.7.
EXAMPLE 4 – ABNORMAL PERSONALITY • How does personality disorder relate to normal personality? • Deary et al. (1998). • Scale-level analysis of DSM-III-R personality disorders & EPQ-R • Sample = 400 students • Joint analysis gives four factors: • N+ Borderline, Self-defeating, Paranoid • P+ Antisocial, Passive-aggressive, Narcissistic • E+ avoidant(-), histrionic • P(-) Obsessive-compulsive, Narcissistic
EXAMPLE 5 – THE ATTITUDES TO CHOCOLATE QUESTIONNAIRE • 80 items on attitudes to chocolate were constructed using interviews and related literature. • Aspects assessed included • difficulty controlling consumption, positive attitudes, negative attitudes, craving. • Self-report chocolate consumption was obtained; participants also performed a bar-pressing task with chocolate button reinforcements delivered on a progressive ratio schedule. • Factor analysis gave three factors (eigenvalue 1 criterion) • accounting for 33.2%, 14.1% and 6.1% of the variance. • Third scale had low reliability • Probably over-factored. • Follow-up paper (Cramer & Hartleib, 2001) has confirmed the first two factors.
Factors Found • Craving • I like to indulge in chocolate • I often go into a shop for something else and end up buying chocolate • Guilt • I feel guilty after eating chocolate • Functional approach • I eat chocolate to keep my energy levels up when doing physical exercise. • High-craving individuals • Reported consuming more bars per month • Were prepared to work harder to get chocolate buttons
Example 6: Criterion-based FA (Kline, Easy Guide, Ch 9) • Two groups: long-term tranquilliser users and matched controls • Measured • Personality • Psychological distress • Life events • Health data • Visits to GP • Ratings by GP • etc. • What factor(s) predict group membership? • High loadings for the group membership variable • In this study the best factor loaded • Anxiety • Few friends • High GP contact • High repeat prescriptions • Some variables unrelated (life events, job satisfaction, church attendance…) • Alternative approaches • Regression • Cluster analysis
End of Lecture I • See you next week :-)
STATISTICAL TESTS FOR DATA QUALITY • Examine KMO statistic. • Kaiser-Meyer-Olkin test of sampling adequacy • Should be 0.5 or more. • Low values indicate diffuse correlations with no substantive groupings. • KMO statistics for each item • Item values below 0.5 indicate item does not belong to a group and may be removed • Bartlett’s test of sphericity. • Tests that the correlation matrix is significantly different from an identity matrix. • p-value should be significant • Tests that there are not duplicate items in the matrix
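Both statistics can be computed directly from the correlation matrix. The sketch below uses the standard textbook formulas for KMO (squared correlations relative to squared correlations plus squared partial correlations) and for Bartlett's chi-square statistic. The function names, the equicorrelated example matrix and the sample size of 210 are assumptions for illustration.

```python
import numpy as np


def kmo(R):
    """Overall and per-item Kaiser-Meyer-Olkin measure of sampling adequacy."""
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)              # partial (anti-image) correlations
    off = ~np.eye(R.shape[0], dtype=bool)      # mask for off-diagonal cells
    r2 = np.where(off, R ** 2, 0.0)
    a2 = np.where(off, partial ** 2, 0.0)
    per_item = r2.sum(axis=0) / (r2.sum(axis=0) + a2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + a2.sum())
    return overall, per_item


def bartlett_sphericity(R, n):
    """Chi-square statistic and degrees of freedom for Bartlett's test."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return chi2, df


# Hypothetical matrix: four items, every pair correlated 0.5.
R = np.full((4, 4), 0.5)
np.fill_diagonal(R, 1.0)

overall, per_item = kmo(R)
chi2, df = bartlett_sphericity(R, n=210)
print(round(overall, 3), np.round(per_item, 3), round(chi2, 1), df)
# A p-value needs the chi-square distribution, e.g. scipy.stats.chi2.sf(chi2, df).
```

An identity correlation matrix gives a Bartlett statistic of zero, since the log-determinant of the identity is zero.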
SPSS ASPECTS • Path to follow is analyse, data reduction, factor. • EXTRACTION • Select scree plot for initial run. • Choose number of factors. • ROTATION • Select rotation method • Increase number of iterations for rotation if necessary (default 25) • DESCRIPTIVES • KMO and Bartlett tests • Reproduced correlations and residuals • Anti-Image matrix • SCORES • Save as variables • Method
• OPTIONS • Sort coefficients by size • Suppress small loadings
SCORING ETC. • Factor scores constructed as above or by related methods can be used in further analyses • e.g. are there M/F differences in scores on N, E, C? • Do the factor scores correlate with other measures (exam anxiety, subjective reports of life quality, number of friends, exam success…)
OTHER ASPECTS OF FACTOR ANALYSIS • Discussion so far has been in terms of questionnaire items, but factor analysis is possible with any set of measures for which correlations can be calculated. • Hypothetical example: personality traits, socio-economic status, salary, life satisfaction, number of serious illnesses etc in the last five years • Datasets of this type raise issues of factor analysis vs. regression modelling. • Scale-level analysis can be very useful in the study of personality/individual differences. • Hierarchical factor structures. • Best-known example is intelligence test scores. • Scores on a diverse range of tests are usually all positively intercorrelated (positive manifold). • Can extract either • A general ability (g) factor (positive loadings from all tests) • or • Examine clustering of tests in more detail giving correlated (oblique) lower-level factors. • Choice of level of description; both descriptions are equally ‘correct’.
Nested Analysis [Figure: hierarchical diagram with labels g, gs, d, gf, gr, gc and, at the lowest level, specific tests]
USING FACTORS • Naming – use content of high-loading items as a guide • Assess internal reliability for each factor • Scores – ‘unit weighting’ best for comparison between samples • Validation – do factor scores correlate as expected with other variables? Issues of convergent/divergent validity with other tests if relevant.
Scale Reliability • Factor Derived Scales can be assessed as with any other scale • For instance using Cronbach’s Alpha • Check alpha if item deleted to identify poorly-functioning items • Adequate reliability is defined as 0.7 or above
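Cronbach's alpha and "alpha if item deleted" can be computed with the standard library alone. A minimal sketch using the usual formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total score); the item scores are invented for illustration.

```python
from statistics import variance


def cronbach_alpha(items):
    """Alpha for a list of item columns (each a list of scores, same subject order)."""
    k = len(items)
    totals = [sum(subject) for subject in zip(*items)]   # total score per subject
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_deleted(items):
    """Alpha recomputed with each item left out in turn (needs 3 or more items)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical scores from five subjects; the fourth item behaves differently.
items = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5],
    [3, 1, 4, 1, 5],
]

alpha = cronbach_alpha(items)
deleted = alpha_if_deleted(items)
print(round(alpha, 3), [round(a, 3) for a in deleted])
```

Here alpha rises when the fourth item is dropped, which is the pattern that flags a poorly-functioning item.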
CONFIRMATORY FACTOR ANALYSIS • Hypothesis testing • Test the “fit” of a pre-specified model • Compare different Models • Available in several packages • AMOS, Mx, Mplus • Not covered in this course
How to assess FA • Sample size • Two things matter: • Ratio of subjects to items • Total sample size • Ratio of subjects to items is important • Can get away with smaller numbers when communalities are high (i.e. factors well-defined) • Restriction of range (subjects too similar) • reduces correlations • Items per factor. • Need at least three per factor, four is better. Some published analyses discuss factors with only one item loading! • Use of eigenvalue-1. • Often seen in papers where factor number comes out implausibly high. • Rotation. • Orthogonal used when oblique should have been tried first. • Generally safest to assume by default that factors will correlate. • Scores. • SPSS and other packages give scores which are sample-dependent. • Use of unit weighting of items is better practice.