This text discusses self-organizing maps (SOM) as a constrained version of k-means clustering, the evaluation of clustering results, and dimension reduction using principal component analysis (PCA) and factor analysis. It also covers techniques such as Monte Carlo methods and cophenetic correlation coefficient index for evaluating clustering results.
Clustering (3) Self-organizing maps Evaluation of clustering results Figures and equations from Data Clustering by Gan et al.
Self-organizing maps: a constrained version of K-means clustering. The prototypes are encouraged to lie in a one- or two-dimensional manifold in the feature space: “a constrained topological map”.
Self-organizing maps: set up a two-dimensional rectangular grid of K prototypes $m_j \in \mathbb{R}^p$ (usually on the two-dimensional principal component plane). Loop over the observations $x_i$:
- find the closest prototype $m_j$ to $x_i$ in Euclidean distance
- for all neighbors $m_k$ of $m_j$ (within distance r in the 2D grid), move $m_k$ toward $x_i$ via the update $m_k \leftarrow m_k + \alpha (x_i - m_k)$
Once the model is fit, the observations are mapped down onto the two-dimensional grid.
Self-organizing maps 5 × 5 grid of prototypes
Self-organizing maps: SOM moves the prototypes closer to the data while maintaining a smooth two-dimensional spatial relationship between them, which makes it a constrained version of K-means clustering. If r is small enough that each neighborhood contains only the closest prototype, SOM reduces to K-means trained on one data point at a time. Both r and α decrease over the iterations.
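The SOM loop described on these slides can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the update rule m_k ← m_k + α(x_i − m_k), the linear decay schedules, the random initialisation, and the names `train_som` and `map_to_grid` are all choices made here, and the toy data are made up.

```python
import math
import random

def euclid(p, q):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def train_som(data, rows, cols, n_iter=2000, alpha0=0.5, r0=2.0, seed=0):
    """Online SOM on a rows x cols grid; returns {(row, col): prototype}."""
    rng = random.Random(seed)
    grid = [(i, j) for i in range(rows) for j in range(cols)]
    # Assumption: prototypes start at randomly chosen observations, not on
    # the principal component plane mentioned in the slides.
    protos = {g: list(rng.choice(data)) for g in grid}
    for t in range(n_iter):
        frac = t / n_iter
        alpha = alpha0 * (1.0 - frac)        # learning rate shrinks toward 0
        r = 1.0 + (r0 - 1.0) * (1.0 - frac)  # radius shrinks from r0 to 1
        x = rng.choice(data)
        # Closest prototype to x in feature space.
        winner = min(grid, key=lambda g: euclid(protos[g], x))
        # Move the winner and its grid neighbours (grid distance <= r) toward x.
        for g in grid:
            if euclid(g, winner) <= r:
                m = protos[g]
                for d in range(len(x)):
                    m[d] += alpha * (x[d] - m[d])
    return protos

def map_to_grid(x, protos):
    """Grid cell whose prototype is closest to observation x."""
    return min(protos, key=lambda g: euclid(protos[g], x))

# Toy data (made up): two Gaussian blobs in the plane.
rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
data += [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(50)]
protos = train_som(data, rows=3, cols=3)
cells = [map_to_grid(x, protos) for x in data]
```

Setting `r0` below 1 leaves only the winning prototype inside the neighbourhood, which is the K-means-like limit mentioned above.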
Self-organizing maps Is the constraint reasonable?
Evaluation: external criteria approach. Compare the clustering result (C) with a pre-specified partition (P). Every pair of samples is counted in exactly one of a, b, c or d, so the total number of pairs is M = a + b + c + d.
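A small sketch of the pair counting may help. It assumes the usual convention: a counts pairs placed together in both C and P, b pairs together in C only, c pairs together in P only, and d pairs separated in both. The Rand and Jaccard indices shown are two common summaries of these counts; whether the original slide used exactly these is an assumption. The label vectors are made up.

```python
from itertools import combinations

def pair_counts(c_labels, p_labels):
    """Count sample pairs by agreement between clustering C and partition P."""
    a = b = c = d = 0
    for i, j in combinations(range(len(c_labels)), 2):
        same_c = c_labels[i] == c_labels[j]
        same_p = p_labels[i] == p_labels[j]
        if same_c and same_p:
            a += 1      # together in both
        elif same_c:
            b += 1      # together in C only
        elif same_p:
            c += 1      # together in P only
        else:
            d += 1      # separated in both
    return a, b, c, d

def rand_index(a, b, c, d):
    """Fraction of pairs on which C and P agree."""
    return (a + d) / (a + b + c + d)

def jaccard_index(a, b, c, d):
    """Like the Rand index, but ignoring pairs separated in both."""
    return a / (a + b + c) if (a + b + c) else 0.0

# Made-up labels for six samples.
C = [0, 0, 0, 1, 1, 1]
P = [0, 0, 1, 1, 2, 2]
counts = pair_counts(C, P)
```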
Evaluation: Monte Carlo methods based on H0 (random generation), or the bootstrap, are needed to assess the significance of these indices.
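One way to picture the Monte Carlo step is a label-shuffling test. The sketch below is an assumption about how H0 could be simulated: it shuffles the cluster labels (keeping cluster sizes fixed) and recomputes the Rand index each time. The function names and the example labels are hypothetical.

```python
import random
from itertools import combinations

def rand_index(u, v):
    """Fraction of sample pairs on which two labelings agree."""
    pairs = list(combinations(range(len(u)), 2))
    agree = sum((u[i] == u[j]) == (v[i] == v[j]) for i, j in pairs)
    return agree / len(pairs)

def monte_carlo_pvalue(c_labels, p_labels, n_sim=999, seed=0):
    """Observed Rand index and its Monte Carlo p-value under random relabeling."""
    rng = random.Random(seed)
    observed = rand_index(c_labels, p_labels)
    shuffled = list(c_labels)
    exceed = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)  # H0: cluster labels unrelated to the partition
        if rand_index(shuffled, p_labels) >= observed:
            exceed += 1
    # The +1 terms count the observed value as one draw from the null.
    return observed, (1 + exceed) / (1 + n_sim)

# Made-up example: the clustering matches the partition except for two samples.
C = [0] * 10 + [1] * 10
P = [1] + [0] * 9 + [0] + [1] * 9
observed, p_value = monte_carlo_pvalue(C, P)
```

A bootstrap variant would resample the observations and re-cluster instead of shuffling labels; that is not shown here.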
Evaluation External criteria: An alternative is to compare the proximity matrix Q with the given partition P. Define matrix Y based on P:
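As an illustration of comparing a proximity matrix with a partition, the sketch below assumes the common definition Y_ij = 1 when x_i and x_j fall in different groups of P and 0 otherwise, and computes Hubert's Γ as the average of Q_ij · Y_ij over all pairs, together with its normalised (correlation) form. The exact definition on the original slide is not recoverable, so treat this as one plausible reading; the 1-D points are made up.

```python
import math
from itertools import combinations

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def gamma_statistics(points, labels):
    """Raw and normalised Hubert Gamma between distances Q and indicator Y."""
    pairs = list(combinations(range(len(points)), 2))
    q = [abs(points[i] - points[j]) for i, j in pairs]               # Q_ij
    y = [1.0 if labels[i] != labels[j] else 0.0 for i, j in pairs]   # Y_ij
    raw = sum(a * b for a, b in zip(q, y)) / len(pairs)
    return raw, pearson(q, y)

# Made-up 1-D data with two obvious groups.
points = [0, 1, 2, 10, 11, 12]
raw_good, norm_good = gamma_statistics(points, [0, 0, 0, 1, 1, 1])
raw_bad, norm_bad = gamma_statistics(points, [0, 1, 0, 1, 0, 1])
```

A partition that matches the distance structure gives a normalised value near 1, while a partition that cuts across it gives a value near or below 0.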
Evaluation: internal criteria. Evaluate the clustering structure using features of the dataset itself (mostly the proximity matrix of the data). Example for hierarchical clustering: $P_c$ is the cophenetic matrix, whose $(i,j)$th element is the proximity level at which the two data points $x_i$ and $x_j$ are first joined into the same cluster; $P$ is the proximity matrix.
Evaluation Cophenetic correlation coefficient index: CPCC is in [-1,1]. Higher value indicates better agreement.
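A minimal sketch of the CPCC, read here as the Pearson correlation between the upper-triangle entries of the proximity matrix and of the cophenetic matrix. To stay self-contained it assumes single linkage, for which the cophenetic distance between two points equals the smallest possible "largest step" on any path between them; other linkages need an actual dendrogram. The data are made up.

```python
import math
from itertools import combinations

def single_linkage_cophenetic(dist):
    """Cophenetic matrix for single linkage via minimax path distances."""
    n = len(dist)
    coph = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Best path i -> j may pass through k; its cost is its largest step.
                coph[i][j] = min(coph[i][j], max(coph[i][k], coph[k][j]))
    return coph

def cpcc(dist, coph):
    """Correlation between proximities and cophenetic levels over pairs i < j."""
    pairs = list(combinations(range(len(dist)), 2))
    p = [dist[i][j] for i, j in pairs]
    c = [coph[i][j] for i, j in pairs]
    mp, mc = sum(p) / len(p), sum(c) / len(c)
    cov = sum((a - mp) * (b - mc) for a, b in zip(p, c))
    sp = math.sqrt(sum((a - mp) ** 2 for a in p))
    sc = math.sqrt(sum((b - mc) ** 2 for b in c))
    return cov / (sp * sc)

# Made-up 1-D points; proximity = absolute difference.
xs = [0, 1, 3, 7, 8]
P = [[abs(a - b) for b in xs] for a in xs]
Pc = single_linkage_cophenetic(P)
index = cpcc(P, Pc)
```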
Evaluation: relative criteria. Choose the best result out of a set according to a predefined criterion. Example: the modified Hubert’s Γ statistic, where P is the proximity matrix of the data. A high value indicates compact clusters.
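The sketch below assumes the common form of the modified Hubert's Γ: the average over all pairs of P_ij · Q_ij, where Q_ij is the distance between the centroids of the clusters containing x_i and x_j. Using centroids as cluster representatives is an assumption, and the points are made up.

```python
import math
from itertools import combinations

def euclid(p, q):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroids(points, labels):
    """Mean vector of each cluster, keyed by label."""
    sums, counts = {}, {}
    for p, lab in zip(points, labels):
        acc = sums.setdefault(lab, [0.0] * len(p))
        for d, v in enumerate(p):
            acc[d] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def modified_hubert_gamma(points, labels):
    """Average of P_ij * Q_ij over all pairs i < j."""
    cent = centroids(points, labels)
    pairs = list(combinations(range(len(points)), 2))
    total = sum(
        euclid(points[i], points[j]) * euclid(cent[labels[i]], cent[labels[j]])
        for i, j in pairs
    )
    return total / len(pairs)

# Made-up points on a line, with a sensible and a poor two-cluster labeling.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
gamma_good = modified_hubert_gamma(points, [0, 0, 0, 1, 1, 1])
gamma_bad = modified_hubert_gamma(points, [0, 1, 0, 1, 0, 1])
```

Among candidate clusterings with the same number of clusters, the one with the larger value separates the data more cleanly.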
Dimension reduction (1) Overview PCA Factor Analysis
Overview
• The purpose of dimension reduction:
  • Data simplification
  • Data visualization
  • Noise reduction (if we can assume only the dominating dimensions are signals)
  • Variable selection for prediction
PCA
• Explain the variance-covariance structure among a set of random variables by a few linear combinations of the variables
• Does not require normality!
Reminder of some results for random vectors Proof of the first (and second) point of the previous slide.
PCA: the eigenvalues are the variance components. Proportion of total variance explained by the kth PC: $\lambda_k / (\lambda_1 + \lambda_2 + \cdots + \lambda_p)$
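A two-variable sketch can make the link between eigenvalues and explained variance concrete. It uses the closed-form eigen-decomposition of a symmetric 2 × 2 matrix so that no linear algebra library is needed; the helper names and the data are illustrative assumptions.

```python
import math

def covariance_2d(xs, ys):
    """Sample variances and covariance of two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy

def eigen_sym_2x2(sxx, sxy, syy):
    """Eigenvalues (descending) and unit eigenvectors of [[sxx, sxy], [sxy, syy]]."""
    half_trace = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # angle of the first principal axis
    e1 = (math.cos(theta), math.sin(theta))
    e2 = (-math.sin(theta), math.cos(theta))
    return (half_trace + radius, half_trace - radius), (e1, e2)

# Illustrative data: two strongly correlated variables.
xs = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
ys = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]
sxx, sxy, syy = covariance_2d(xs, ys)
(lam1, lam2), (e1, e2) = eigen_sym_2x2(sxx, sxy, syy)

# Scores on the first PC; their sample variance should equal lam1.
scores = [e1[0] * x + e1[1] * y for x, y in zip(xs, ys)]
mean_score = sum(scores) / len(scores)
var_scores = sum((s - mean_score) ** 2 for s in scores) / (len(scores) - 1)

proportion_pc1 = lam1 / (lam1 + lam2)
```

The eigenvalues sum to the total variance `sxx + syy`, which is why the ratio can be read as a proportion of variance explained.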
PCA The geometrical interpretation of PCA:
PCA using the correlation matrix instead of the covariance matrix? This is equivalent to first standardizing every X variable to unit variance.
PCA: using the correlation matrix prevents one X variable from dominating merely because of its scale (a change of units), for example when a length is recorded in inches instead of feet. Example:
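Since the original example is not recoverable, here is a small stand-in under stated assumptions: two made-up variables, a height recorded in feet and then in inches, and a second variable `other` on a fixed scale. The first-PC loadings from the covariance matrix change with the unit, while those from the correlation matrix do not.

```python
import math

def cov2(xs, ys):
    """Sample variances and covariance of two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy

def first_pc(sxx, sxy, syy):
    """Unit-length loadings of the first PC of [[sxx, sxy], [sxy, syy]]."""
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)

def correlation(xs, ys):
    sxx, sxy, syy = cov2(xs, ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data.
height_ft = [5.0, 5.5, 5.8, 6.0, 6.2, 5.3]
height_in = [12.0 * h for h in height_ft]
other = [1.0, 2.1, 2.9, 3.2, 4.1, 1.4]

# Covariance-based PCA: the loading on height depends on the unit.
load_ft = first_pc(*cov2(height_ft, other))
load_in = first_pc(*cov2(height_in, other))

# Correlation-based PCA: the matrix is [[1, r], [r, 1]] in either unit.
r_ft = correlation(height_ft, other)
r_in = correlation(height_in, other)
load_corr_ft = first_pc(1.0, r_ft, 1.0)
load_corr_in = first_pc(1.0, r_in, 1.0)
```

In feet the first PC is driven mostly by `other`; in inches it is driven almost entirely by height, even though the data are the same.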
PCA: selecting the number of components? Base the choice on the eigenvalues (% of variation explained). Assumption: the small amount of variation explained by the low-ranked (trailing) PCs is noise.
Sparse PCA: in high-dimensional data, the loadings of a single PC on 10,000 genes do not make much sense. Sparse loadings make the interpretation easier and the model more robust. One method: SCoTLASS
Sparse PCA Zou, Hastie, Tibshirani’s SPCA by regression (regression/reconstruction property of PCA):
Factor Analysis: if we take the first several PCs that explain most of the variation in the data, we have one form of factor model, $X - \mu = LF + \varepsilon$, where L is the loading matrix, F is an unobserved random vector (the latent variables), and ε is an unobserved random vector (noise).
Factor Analysis: the orthogonal factor model assumes no correlation between the factor RVs. The covariance matrix of the noise ε is a diagonal matrix.
Factor Analysis Rotations in the m-dimensional subspace defined by the factors make the solution non-unique: PCA is one unique solution, as the vectors are sequentially selected. Maximum likelihood estimator is another solution:
Factor Analysis: as noted, rotations within the m-dimensional subspace do not change the overall amount of variation explained. Rotate to make the results more interpretable:
Factor Analysis: varimax criterion. Find the orthogonal rotation T such that V is maximized, where V is proportional to the sum, over factors, of the variances of the squared loadings. Maximizing V spreads the squared loadings out as much as possible: some become very small and some very large.
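For two factors, an orthogonal rotation is a single angle, so the varimax idea can be sketched with a simple grid search. This is an illustration under assumptions, not the standard algorithm: it uses the raw criterion (no scaling of loadings by communalities), a 1-degree grid, and a made-up loading matrix; `varimax_2factor` is a hypothetical helper name.

```python
import math

def rotate(loadings, theta):
    """Right-multiply a p x 2 loading matrix by a rotation through theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]

def varimax_criterion(loadings):
    """Sum over the two factors of the variance of the squared loadings."""
    p = len(loadings)
    v = 0.0
    for j in range(2):
        sq = [row[j] ** 2 for row in loadings]
        mean_sq = sum(sq) / p
        v += sum((s - mean_sq) ** 2 for s in sq) / p
    return v

def varimax_2factor(loadings, steps=90):
    """Grid search over [0, 90) degrees; the criterion repeats every 90 degrees."""
    angles = [k * (math.pi / 2) / steps for k in range(steps)]
    best = max(angles, key=lambda t: varimax_criterion(rotate(loadings, t)))
    return best, rotate(loadings, best)

# Made-up loadings with a clear simple structure, then deliberately rotated.
simple = [(0.9, 0.0), (0.8, 0.0), (0.7, 0.1), (0.0, 0.9), (0.1, 0.8), (0.0, 0.7)]
unrotated = rotate(simple, math.radians(30))
theta, rotated = varimax_2factor(unrotated)

# Communalities (row sums of squared loadings) are unchanged by rotation.
comm_before = [a * a + b * b for a, b in unrotated]
comm_after = [a * a + b * b for a, b in rotated]
```

The rotation changes how the explained variation is divided between the two factors but not the communalities, which is why it can be chosen purely for interpretability.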
Factor Analysis Orthogonal simple factor rotation: Rotate the orthogonal factors around the origin until the system is maximally aligned with the separate clusters of variables. Oblique Simple Structure Rotation: Allow the factors to become correlated. Each factor is rotated individually to fit a cluster.