Multivariate Methods Nels Johnson and Matt Williams Laboratory for Interdisciplinary Statistical Analysis
Outline • Principal Component Analysis • Factor Analysis • Multivariate T Tests • MANOVA • Multidimensional Scaling • Correspondence Analysis
PCA – Motivating Examples • You have measured a number of variables concerning the size of aphids. You’d like to reduce the number of variables used for classification. • You have a bunch of football statistics for teams and would like to organize related teams based on these statistics.
What is it? • Based on an eigenvalue decomposition of the covariance matrix S (or correlation matrix R) of the variables. • Goal: find linear combinations of the variables that have maximum variance. • Obtained by transforming the variables so that the covariance matrix of the new variables is diagonal. • These new variables are called the principal components (PCs) and their covariance matrix contains the eigenvalues along the diagonal. • This transformation can be thought of as a rotation of the axes. • Note: No variables are designated as dependent.
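The description above can be sketched in a few lines of NumPy. This is a minimal illustration, not the presenters' code: the function name `pca`, the flag `use_correlation`, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def pca(X, use_correlation=False):
    """Return eigenvalues (descending), eigenvectors (columns), and PC scores."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each variable
    if use_correlation:
        Xc = Xc / X.std(axis=0, ddof=1)      # standardize, so S becomes R
    S = np.cov(Xc, rowvar=False)             # covariance (or correlation) matrix
    eigvals, eigvecs = np.linalg.eigh(S)     # eigh handles symmetric matrices
    order = np.argsort(eigvals)[::-1]        # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                    # rotate the data onto the new axes
    return eigvals, eigvecs, scores

# Simulated data (hypothetical): 50 observations on 3 correlated variables.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(50, 3)) @ mixing

eigvals, eigvecs, scores = pca(X)
score_cov = np.cov(scores, rowvar=False)     # should be diagonal
```

In this sketch the covariance matrix of the scores comes out diagonal, with the eigenvalues on the diagonal, which is the property the slide describes.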
What do we get out of it? • We can form an index measure (i.e. a score) or a weighted average of variables based on a subset of the PCs. • This reduces the number of variables we have to work with. • With some subject matter area knowledge we might be able to interpret the meaning of some of the PCs based on correlations.
How to reduce the number of PCs? • Pick the proportion of variation you want to explain ahead of time, then keep the smallest number of PCs whose eigenvalues, as a share of the sum of all eigenvalues, add up to at least that proportion. • Scree Plots • All PCs with eigenvalue > 1 (Kaiser’s Rule, used when the PCA is based on the correlation matrix R) • Broken stick method
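Three of these rules are simple enough to sketch directly. The function names and the eigenvalues below are hypothetical, and the broken stick rule is written in one common form (keep leading PCs while their share of variance exceeds the expected share of the k-th largest piece of a randomly broken stick); other variants exist.

```python
def n_components_by_proportion(eigvals, target):
    """Smallest k whose eigenvalues explain at least `target` of the total."""
    total = sum(eigvals)
    running = 0.0
    for k, ev in enumerate(eigvals, start=1):
        running += ev
        if running / total >= target:
            return k
    return len(eigvals)

def n_components_kaiser(eigvals):
    """Kaiser's rule: count eigenvalues above 1 (correlation-matrix PCA)."""
    return sum(1 for ev in eigvals if ev > 1.0)

def n_components_broken_stick(eigvals):
    """Keep leading PCs whose variance share beats the broken-stick share."""
    p = len(eigvals)
    total = sum(eigvals)
    keep = 0
    for k, ev in enumerate(eigvals, start=1):
        expected = sum(1.0 / i for i in range(k, p + 1)) / p
        if ev / total > expected:
            keep += 1
        else:
            break
    return keep

# Hypothetical eigenvalues, sorted from largest to smallest.
eigvals = [2.5, 1.2, 0.2, 0.1]
k_prop = n_components_by_proportion(eigvals, 0.90)
k_kaiser = n_components_kaiser(eigvals)
k_stick = n_components_broken_stick(eigvals)
```

The rules need not agree in general, so it is worth comparing them alongside a scree plot.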
What are some issues? • The scale on which the variables are measured matters. • Standardize variables so they are all on the same scale. • Variables with a high amount of variability (i.e. large variance) will naturally steer the decomposition. • Again, standardize the variables. • When separation occurs perpendicular to an axis (i.e. PC) it might not be picked up without looking at other axes. • Plot the pairwise scores for each PC. This may require looking at too many graphs to be feasible.
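The scale issue is easy to see numerically. The sketch below uses two simulated, made-up variables, one recorded in millimetres and one in metres, and compares the first PC from the covariance matrix with the first PC from the correlation matrix (which is what standardizing amounts to).

```python
import numpy as np

# Simulated, hypothetical measurements on very different numeric scales.
rng = np.random.default_rng(1)
length_mm = rng.normal(loc=2000.0, scale=300.0, size=100)
width_m = 0.0005 * length_mm + rng.normal(scale=0.05, size=100)
X = np.column_stack([length_mm, width_m])

def first_pc(matrix):
    """Eigenvector belonging to the largest eigenvalue of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    return eigvecs[:, np.argmax(eigvals)]

pc1_cov = first_pc(np.cov(X, rowvar=False))        # unstandardized
pc1_corr = first_pc(np.corrcoef(X, rowvar=False))  # standardized
```

On the covariance matrix the first PC is almost entirely the millimetre variable; after standardizing, both variables receive equal weight.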
Factor Analysis – Some Motivating Examples • You have the ratings people give to their family members on words such as Kind, Intelligent, Happy, etc. You want to associate family members with some sort of overall construct of these words. • You have conducted a survey and want to group questions by the topic they address.
What is it? • We assume the variables Y can be summarized by some underlying, unobserved, and reduced set of variables called factors (you must pick how many factors). • Goal is to estimate the factors. • After the factors are estimated, the next goal is to orthogonally rotate the solution to get simpler factors. • For Principal Factor Solution (more later): • Model: Y − μ = loadings*factors + error • var(Y − μ) or corr(Y − μ) = V = loadings*loadingsᵀ + Ψ • The diagonals of H = V − Ψ are called the communalities. They are R²-like numbers. • Ψ is called the specific variance.
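To make the decomposition V = loadings*loadingsᵀ + Ψ concrete, here is a small numerical sketch. The loadings are invented for illustration (4 variables, 2 factors), and Ψ is chosen so that V has unit diagonal, as a correlation matrix would.

```python
import numpy as np

# Hypothetical loadings: 4 variables (rows) on 2 factors (columns).
loadings = np.array([[0.9, 0.0],
                     [0.8, 0.1],
                     [0.1, 0.7],
                     [0.0, 0.6]])

# Communalities: variance of each variable explained by the common factors.
communalities = (loadings ** 2).sum(axis=1)

# Specific variances, chosen here so that V has ones on its diagonal.
psi = np.diag(1.0 - communalities)

V = loadings @ loadings.T + psi   # implied correlation matrix
H = V - psi                       # its diagonal holds the communalities
```

Each communality lies between 0 and 1 here, which is why they read like R² values: the share of a variable's variance accounted for by the factors.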
How to Estimate the Factors? • Three main ways: • Principal Component Solution (Not PCA!) • Focuses on the diagonal of V (the variance). • Does poorly on the off diagonal (the covariance). • Principal Factor Solution • Focuses on the off diagonals of V and pretty much ignores the diagonal. • Maximum Likelihood Method • Assume normality of error and estimate the factors and loadings using an iterative MLE method. • May give nonsensical answers (e.g. a Heywood case, where an estimated specific variance is negative). • Can adjust iterative method so this doesn’t happen. • Rotations are unique.
More On Rotations • If the rotation is orthogonal then • loadings*loadingsᵀ = loadings*rotation*rotationᵀ*loadingsᵀ = (loadings*rotation)*(loadings*rotation)ᵀ • So we can redistribute the total variance, and the variation explained by each variable, differently among the factors without actually changing them. • Lots of methods to pick rotations.
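The identity above can be checked numerically. This sketch uses invented loadings and an arbitrary 30-degree rotation; neither comes from the talk.

```python
import numpy as np

# Hypothetical loadings: 4 variables on 2 factors.
loadings = np.array([[0.9, 0.0],
                     [0.8, 0.1],
                     [0.1, 0.7],
                     [0.0, 0.6]])

# An arbitrary orthogonal rotation (30 degrees in the factor plane).
theta = np.deg2rad(30.0)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

rotated = loadings @ rotation

common_before = loadings @ loadings.T
common_after = rotated @ rotated.T
communalities_before = (loadings ** 2).sum(axis=1)
communalities_after = (rotated ** 2).sum(axis=1)
```

The individual loadings change, but the reproduced covariance and every communality stay the same, so the data alone cannot choose among rotations.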
Interpreting Analysis • Loadings represent the covariance (or correlation) between factors and variables. • So we look for high loadings to represent how underlying factors influence variables. • With some subject matter knowledge we can name factors based off of these loadings (when they make sense).
Some Issues • Results can change depending on model choices (This is a big deal)! • Number of factors • Estimation method • Rotation method • Heywood cases when using MLE. • Existence of actual factors is suspect.
Multivariate T Tests • Univariate t-test • Normal data, with unknown mean and variance • Hotelling’s T² Test • Multivariate normal data with unknown mean and covariance
One Sample Test • Assumptions • Observations are independent and multivariate normal • Testing • Null Hypothesis: μ = μ0 (vectors) • Alternative: μ ≠ μ0 (vectors)
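A minimal sketch of the one-sample statistic, using the standard definitions T² = n(ȳ − μ0)ᵀS⁻¹(ȳ − μ0) and F = (n − p)/(p(n − 1)) · T², which follows an F distribution with (p, n − p) degrees of freedom under the null hypothesis. The function name and the simulated data are assumptions for illustration.

```python
import numpy as np

def hotelling_one_sample(Y, mu0):
    """Return T^2, the equivalent F statistic, and its degrees of freedom."""
    Y = np.asarray(Y, dtype=float)
    n, p = Y.shape
    diff = Y.mean(axis=0) - np.asarray(mu0, dtype=float)
    S = np.cov(Y, rowvar=False)                  # sample covariance matrix
    t2 = n * diff @ np.linalg.solve(S, diff)     # n * d' S^{-1} d
    f_stat = (n - p) / (p * (n - 1)) * t2
    return t2, f_stat, (p, n - p)

# Simulated (not real) data: 10 observations on 3 variables.
rng = np.random.default_rng(2)
Y = rng.normal(loc=[16.0, 6.5, 3.0], scale=[2.0, 1.0, 0.5], size=(10, 3))
mu0 = np.array([15.0, 6.0, 2.85])

t2, f_stat, df = hotelling_one_sample(Y, mu0)
```

The F statistic can then be compared with an F table or with `scipy.stats.f.sf(f_stat, *df)` if SciPy is available.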
Example: One Sample Test • We are interested in 3 different types of calcium in the soil • We wish to test whether the true mean vector equals the hypothesized values (15, 6, 2.85)
Two Sample Test • Assumptions • Two groups of multivariate normal data • Observations are independent • Means may be different but covariance is the same for both groups • Testing • Null Hypothesis: μ1 = μ2 (vectors) • Alternative: μ1 ≠ μ2 (vectors)
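Under the equal-covariance assumption, the two-sample statistic uses a pooled covariance matrix: T² = (n1·n2/(n1 + n2)) · (ȳ1 − ȳ2)ᵀ S_pooled⁻¹ (ȳ1 − ȳ2), with F = (n1 + n2 − p − 1)/(p(n1 + n2 − 2)) · T² on (p, n1 + n2 − p − 1) degrees of freedom. The sketch below uses simulated data with hypothetical group names; the function name is also an assumption.

```python
import numpy as np

def hotelling_two_sample(Y1, Y2):
    """Return T^2, the equivalent F statistic, and its degrees of freedom."""
    Y1 = np.asarray(Y1, dtype=float)
    Y2 = np.asarray(Y2, dtype=float)
    n1, p = Y1.shape
    n2 = Y2.shape[0]
    diff = Y1.mean(axis=0) - Y2.mean(axis=0)
    # Pooled covariance: weighted average of the two sample covariances.
    S_pooled = ((n1 - 1) * np.cov(Y1, rowvar=False)
                + (n2 - 1) * np.cov(Y2, rowvar=False)) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(S_pooled, diff)
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat, (p, n1 + n2 - p - 1)

# Simulated (not real) scores: two groups of 32 on 4 variables.
rng = np.random.default_rng(3)
group_a = rng.normal(loc=[15.0, 15.0, 16.0, 14.0], scale=2.0, size=(32, 4))
group_b = rng.normal(loc=[13.0, 14.0, 17.0, 15.0], scale=2.0, size=(32, 4))

t2, f_stat, df = hotelling_two_sample(group_a, group_b)
```

Swapping the two groups leaves T² unchanged, since only the squared distance between the mean vectors enters the statistic.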
Example: Two Sample Test • Four psychological tests were given to 32 men and 32 women • We are interested in seeing if the mean vectors are the same
Other Tests • Two sample paired test • Use difference vector D = X1 – X2 • Partial Tests • Testing μi = μi0 in the presence of the other (p-1) means • What about more than 2 groups? • We had ANOVA instead of a t-test • Now we have MANOVA instead of a T² test
Multivariate Analysis of Variance (MANOVA) • Suppose we have data organized into several groups, with each observation giving a vector of responses • We would like to test the hypothesis that the mean vectors of all the groups are equal • We can do this in a manner very similar to the univariate Analysis of Variance (ANOVA)
MANOVA • In ANOVA • We compare Sums of Squares within groups to Sums of Squares between groups • Sums of Squares are the sums of the squared differences between the observed values and the means • In MANOVA • We compare Sums of Squares matrices from within the groups to those between the groups • E is the “within” Sums of Squares matrix • H is the “between” Sums of Squares matrix
Four Tests • There are four tests based on the eigenvalues of E⁻¹H: λ1 > λ2 > … > λs with s ≤ pd • Pillai: V(s) = Σ λi / (1 + λi) • Lawley-Hotelling: U(s) = Σ λi • Wilks’ Lambda: Λ = Π 1 / (1 + λi) • (reject for small values) • Roy’s Largest Root: θ = λ1 / (1 + λ1)
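As a rough sketch, the E and H matrices and the four statistics can be computed as follows for a one-way layout. The formulas are the standard textbook definitions (Pillai: Σ λi/(1 + λi); Lawley-Hotelling: Σ λi; Wilks: Π 1/(1 + λi); Roy: λ1/(1 + λ1)), and the code assumes s = min(p, k − 1) nonzero eigenvalues for k groups. The function name and simulated data are hypothetical.

```python
import numpy as np

def manova_statistics(groups):
    """One-way MANOVA: return E, H, and the four test statistics."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    grand_mean = np.vstack(groups).mean(axis=0)
    p = groups[0].shape[1]
    E = np.zeros((p, p))   # "within" sums of squares and cross products
    H = np.zeros((p, p))   # "between" sums of squares and cross products
    for g in groups:
        centered = g - g.mean(axis=0)
        E += centered.T @ centered
        d = (g.mean(axis=0) - grand_mean)[:, None]
        H += g.shape[0] * (d @ d.T)
    # Eigenvalues of E^{-1} H, largest first; tiny imaginary parts dropped.
    lam = np.sort(np.linalg.eigvals(np.linalg.solve(E, H)).real)[::-1]
    s = min(p, len(groups) - 1)             # assumed number of nonzero roots
    lam = np.clip(lam[:s], 0.0, None)
    stats = {
        "pillai": float(np.sum(lam / (1.0 + lam))),
        "lawley_hotelling": float(np.sum(lam)),
        "wilks": float(np.prod(1.0 / (1.0 + lam))),
        "roy": float(lam[0] / (1.0 + lam[0])),
    }
    return E, H, stats

# Simulated (not real) data: 6 groups of 8 observations on 4 variables.
rng = np.random.default_rng(4)
groups = [rng.normal(loc=0.3 * i, scale=1.0, size=(8, 4)) for i in range(6)]
E, H, stats = manova_statistics(groups)
```

Wilks' Lambda computed this way equals det(E)/det(E + H), which is a handy cross-check. Critical values still have to come from tables or from F approximations.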
Comparison of the Four Tests • In the collinear case • The groups have means that lie on a line in space (approximately) • θ ≥ U(s) ≥ Λ ≥ V(s) in terms of power • In the diffuse case • The group means are spread out in a higher dimensional space (not a line) • θ ≤ U(s) ≤ Λ ≤ V(s) in terms of power
Post-Test Analysis • Just like with ANOVA, after the test we can • Do pair-wise comparisons or contrasts • In MANOVA we can also • Do tests for the p individual variables • F tests to identify which variables are different
Example: Rootstock Data • We wish to compare apple trees of different rootstocks • We have 8 trees from each of 6 rootstocks • Our four measurements are • Trunk girth at 4 years (y1) • Extension growth at 4 years (y2) • Trunk girth at 15 years (y3) • Extension growth at 15 years (y4)
Rootstock Data • Test Results • Λ = .154 < Λ_{.05,4,5,40} = .455 • V(s) = 1.305 > V(s)_{.05} = .645 • U(s) = 2.921 > U(s)_{.05} • θ = .652 > θ_{.05} = .377 • Follow-up tests for individual variables • y1: F = 1.93, p = .1094 • y2: F = 2.91, p = .024 • y3: F = 11.97, p < .0001 • y4: F = 12.16, p < .0001
Extensions • Two-way MANOVA • Multivariate Contrasts • Mixed Models • Split plot designs • Profile Analysis • Different R²-like numbers
Multidimensional Scaling (MDS) • Data is a distance or similarity matrix • Many ways to generate • Goal is to reduce dimension and visualize • Often look at only 2 or 3 dimensions • Motivating Examples • Number of teeth for different species of mammals • Discriminating between colors (red vs. orange) • Distances between cities
Two Kinds of MDS • Metric scaling (principal coordinates analysis) • Distances (Euclidean) in the reduced dimension are close to those measured in the full dimension • Non-metric scaling • The rank order of the distances in the reduced dimension is close to the rank order in the full dimension
Types of Measures • There are MANY measures that can be used • Depends on type of data • Depends on interest in observations vs. variables • Properties • 1. Minimum of 0, D(x,y) = 0 if x = y • 2. Positive otherwise, D(x,y) > 0 if x ≠ y • 3. Symmetric, D(x,y) = D(y,x) • 4. Triangle Inequality, D(x,y) + D(y,z) ≥ D(x,z)
Types of Measures • Measures that satisfy 1-4 are called Metrics • Measures satisfying 1-3 are Semi-metrics • Some measures have negative values and are called Non-metrics • Certain measures can be plotted or visualized in a Euclidean space • Distances and relationships plotted are meaningful • This is a stronger property than the triangle inequality
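The four properties can be checked by brute force on a small set of points. The sketch below compares Manhattan (city block) distance, which is a metric, with squared Euclidean distance, which satisfies the first three properties but can violate the triangle inequality. The helper names and the points are hypothetical.

```python
from itertools import product

def manhattan(x, y):
    """City block distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def squared_euclidean(x, y):
    """Squared straight-line distance (not a metric in general)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def check_properties(points, dist, tol=1e-12):
    """Check the four distance properties on every pair and triple of points."""
    pairs = list(product(points, repeat=2))
    triples = list(product(points, repeat=3))
    return {
        "zero_on_identical": all(dist(x, x) == 0 for x in points),
        "positive_otherwise": all(dist(x, y) > 0 for x, y in pairs if x != y),
        "symmetric": all(abs(dist(x, y) - dist(y, x)) <= tol for x, y in pairs),
        "triangle": all(dist(x, z) <= dist(x, y) + dist(y, z) + tol
                        for x, y, z in triples),
    }

# Hypothetical points; the first three lie on a line.
points = [(0, 0), (1, 0), (2, 0), (1, 3)]
manhattan_report = check_properties(points, manhattan)
squared_report = check_properties(points, squared_euclidean)
```

Here squared Euclidean distance fails only the triangle inequality: from (0,0) to (2,0) it is 4, while going through (1,0) costs 1 + 1 = 2, so it behaves as a semi-metric.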
Measures for our Examples • Mammal teeth - counts of teeth types • Manhattan (city block) distance • Total teeth different between two species • Difference between colors (Ekman) • Similarity measure – converted to distance • How well people distinguish between colors • We use the Kruskal measure (non-metric) • Distances between cities • Euclidean distance • Miles between cities
Basic Procedure for MDS • Metric Scaling • Eigenvalue/eigenvector decomposition • Choose a reduced number of components that still preserves distances • Create new coordinates based on reduced components • Non-metric scaling • Reduce dimensions but preserve rank order • Done using Isotonic regression and iterative algorithms
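The metric scaling steps can be sketched as classical (principal coordinates) MDS: double-center the squared distance matrix, take its eigendecomposition, and keep the leading components. The function name and the test points are assumptions; non-metric scaling needs an iterative algorithm and is not shown.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS: coordinates in k dimensions from a distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)      # B is symmetric
    order = np.argsort(eigvals)[::-1][:k]     # keep the k largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    coords = eigvecs[:, order] * np.sqrt(lam)
    return coords, eigvals[order]

def pairwise_euclidean(P):
    """Matrix of Euclidean distances between the rows of P."""
    return np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)

# Hypothetical points in the plane and their distance matrix.
true_points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0],
                        [0.0, 4.0], [1.5, 2.0]])
D = pairwise_euclidean(true_points)

coords, top_eigvals = classical_mds(D, k=2)
D_recovered = pairwise_euclidean(coords)
```

Because these distances really are Euclidean in two dimensions, the recovered configuration reproduces them exactly, up to rotation and reflection. With other measures the reduced distances are only approximate.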
Example: Teeth Data • 32 mammals and 8 categories of teeth • We are interested in how “close” these mammals are based on their teeth counts • We use city block distance and want to reduce things to 2 dimensions (from 8)
Example: Ekman Color Study • 14 different wavelengths • 31 subjects asked to rate how well they could distinguish between different pairs • Ratings were averaged and scaled to get a similarity index between 0 and 1 • We use non-metric scaling and look at a reduction to 2 dimensions (from 14)
Example: Distances between cities • We have 10 U.S. cities and distances between all pairs • Can we reduce this distance matrix to a lower dimension, such as 2 (from 10)?
Comments on MDS • There are MANY measures we can use • Some make more sense than others • It depends on the data and what you are interested in • Different measures can lead to different results • How many dimensions should you use? • It’s easiest to explain 2 or 3 dimensions • There are different criteria or guidelines for metric and non-metric scaling
One More Example • Suppose we have data that can be organized into a two-way table of binary or count values. • For a small table we can do some contingency table analyses like tests for homogeneity or independence. • For large tables we might like to reduce or summarize the table • One method is called Correspondence Analysis
Correspondence Analysis • Our distance measure is the Pearson chi-square measure between the observed cell value and its expected value. • As before, we need to decide if we are interested in our subjects or our variables • Similar or analogous to PCA and MDS in terms of dimension reduction and interpretation. • Unfortunately, the terminology is a little different. So be careful.
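One common way to carry out correspondence analysis is a singular value decomposition of the standardized residuals (observed minus expected, divided by the square root of expected, on the proportion scale). The sketch below follows that approach; the function name and the table of counts are invented for illustration and are not the postal employee data.

```python
import numpy as np

def correspondence_analysis(table):
    """Return row/column principal coordinates, inertias, and chi-square."""
    N = np.asarray(table, dtype=float)
    n = N.sum()
    P = N / n                                  # correspondence matrix
    r = P.sum(axis=1)                          # row masses
    c = P.sum(axis=0)                          # column masses
    expected = np.outer(r, c)                  # expected under independence
    S = (P - expected) / np.sqrt(expected)     # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    chi2 = n * (S ** 2).sum()                  # Pearson chi-square statistic
    return row_coords, col_coords, sv ** 2, chi2

# Hypothetical counts: 6 row categories by 4 column categories.
table = np.array([[40,  5,  3,  2],
                  [35,  8,  2,  5],
                  [50,  4,  6,  3],
                  [20, 10,  1,  4],
                  [30,  3,  7,  2],
                  [45,  6,  2,  6]])

row_coords, col_coords, inertias, chi2 = correspondence_analysis(table)
```

The inertias play the role that eigenvalues play in PCA: they sum to the Pearson chi-square statistic divided by the table total, and the first two columns of the coordinates give the usual two-dimensional plot.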
Example: Postal Employees • Postal employees for 6 positions were drug tested • Results include negative, marijuana, cocaine, and other • We are interested in identifying any patterns or trends
Sources • We compiled the information for this talk from Methods of Multivariate Analysis, 2nd ed., by Alvin C. Rencher and from our notes from STAT 5504 compiled by Dr. Eric Smith, Dept. of Statistics. • Thanks! Any questions?