Cluster Analysis Dr. Michael R. Hyman
Introduction • Also called classification analysis and numerical taxonomy • Goal: assign objects to groups so that intra-group similarity and inter-group dissimilarity are maximized • No distinction between dependent and independent variables • Find naturally occurring groupings of objects
Uses in Studying Consumers • Benefit segmentation • Finding market niches • Finding homogeneous market segments for future study • Data reduction
Scatter Plot of Income and Education Data for PC Owners and Non-owners
Procedure #1: Divisive (tear down) • Start with profile data • Find the variable with the highest variance • Split objects into those above and those below the mean on this variable • Find the remaining variable with the highest variance and split along its mean
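The divisive steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_on_highest_variance` and the sample profiles are hypothetical, each subgroup is split at its own mean on the second pass, and objects exactly at the mean go to the lower group. The slide does not specify any of these details.

```python
from statistics import mean, pvariance

def split_on_highest_variance(rows, used=()):
    """Split rows at the mean of the highest-variance column not in `used`.

    Returns (column index, rows at or below the mean, rows above the mean).
    """
    candidates = [j for j in range(len(rows[0])) if j not in used]
    col = max(candidates, key=lambda j: pvariance([r[j] for r in rows]))
    cut = mean(r[col] for r in rows)
    low = [r for r in rows if r[col] <= cut]
    high = [r for r in rows if r[col] > cut]
    return col, low, high

# Hypothetical profile data: (income in $000s, years of education)
profiles = [[20, 10], [25, 16], [30, 11], [70, 17], [80, 12], [90, 18]]

# First pass: split on the variable with the highest variance.
col, low, high = split_on_highest_variance(profiles)

# Second pass: split each group on the remaining variable (assumption:
# each group uses its own mean as the cut point).
clusters = []
for group in (low, high):
    _, g_low, g_high = split_on_highest_variance(group, used=(col,))
    clusters.extend(g for g in (g_low, g_high) if g)

print(clusters)
```

Because raw variances depend on the unit of measurement, a real analysis would probably standardize the variables before comparing them.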
Procedure #2: Agglomerative (build up) • Select a similarity measure: distance (Euclidean, city block), correlation, or similarity • Search the similarity matrix for the most similar pair of clusters and merge that pair • Repeat iteratively until only one cluster remains
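A minimal sketch of the build-up procedure follows. It assumes each object starts as its own cluster, that the closest pair is merged at each step, and that the distance between two clusters is the distance between their nearest members (single linkage); the function names and sample points are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def agglomerate(points, dist=euclidean):
    """Return the merge history as a list of (cluster_a, cluster_b, distance).

    Clusters are tuples of point indices. Single linkage is assumed.
    """
    clusters = [(i,) for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        def gap(pair):
            a, b = pair
            return min(dist(points[i], points[j]) for i in a for j in b)
        # Most similar pair = smallest distance; ties go to the first pair found.
        a, b = min(combinations(clusters, 2), key=gap)
        history.append((a, b, gap((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return history

points = [(1, 1), (1, 2), (5, 5), (6, 5), (10, 1)]
for a, b, d in agglomerate(points):
    print(a, "+", b, "at distance", round(d, 2))
```

Passing `dist=city_block` swaps the distance measure without changing the rest of the procedure.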
Procedure #2: Agglomerative Stopping Rules • Theory and practice • Distance at which clusters combine • Within-group versus between-group variance • Relative sizes of clusters
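One way to apply the "distance at which clusters combine" rule is to stop just before the merge that requires the largest jump in distance. The sketch below assumes that reading of the rule; the function name and the merge distances are hypothetical, and at least two merges are required.

```python
def clusters_before_largest_jump(merge_distances):
    """Return the number of clusters that exist just before the largest jump.

    `merge_distances` lists the distance of each merge in the order the
    merges happened, so n objects produce n - 1 distances.
    """
    n_objects = len(merge_distances) + 1
    jumps = [later - earlier
             for earlier, later in zip(merge_distances, merge_distances[1:])]
    biggest = max(range(len(jumps)), key=jumps.__getitem__)
    # `biggest + 1` merges were completed before the jump occurred.
    return n_objects - (biggest + 1)

# Hypothetical merge distances for six objects (five merges)
distances = [1.0, 1.2, 1.5, 6.0, 6.4]
print(clusters_before_largest_jump(distances))
```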
Procedure #2: Agglomerative Linkage Methods • Single (nearest neighbor): makes long, thin clusters • Complete (farthest neighbor): sensitive to outliers • Average (average distance between objects) • Variance methods (minimize within-cluster variance) • Nodal (begin with the two least similar objects as nodes)
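The first three linkage methods differ only in how they summarize the pairwise distances between two clusters. The sketch below assumes Euclidean distance and uses hypothetical sample clusters; variance and nodal methods are not shown.

```python
import math

def pairwise(cluster_a, cluster_b):
    """All Euclidean distances between a member of one cluster and the other."""
    return [math.dist(p, q) for p in cluster_a for q in cluster_b]

def single_link(a, b):
    return min(pairwise(a, b))        # nearest neighbor

def complete_link(a, b):
    return max(pairwise(a, b))        # farthest neighbor

def average_link(a, b):
    d = pairwise(a, b)
    return sum(d) / len(d)            # average over all pairs

left = [(0, 0), (0, 1)]
right = [(3, 0), (4, 0)]
right_with_outlier = right + [(20, 0)]

for name, link in [("single", single_link),
                   ("complete", complete_link),
                   ("average", average_link)]:
    print(name, round(link(left, right), 2),
          round(link(left, right_with_outlier), 2))
```

Adding the single far-away point leaves the single-linkage distance unchanged but inflates the complete-linkage distance, which illustrates why complete linkage is sensitive to outliers.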
Procedure #2: Agglomerative Reliability and Validity Assessment • Use different distance measures • Use different clustering methods • Split the data, cluster each half, and compare the solutions • Shuffle the order of cases (objects) • Solve with a subset of the profile variables
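As an illustration of one of these checks, the sketch below reruns a clustering after shuffling the cases and tests whether the same groups come back. It covers only the shuffle check. The clustering routine (single linkage cut at `k` clusters), the sample points, and the random seed are all assumptions, and the points must be distinct because they are stored in sets.

```python
import math
import random
from itertools import combinations

def single_link_clusters(points, k):
    """Merge the closest pair of clusters (single linkage) until k remain.

    Returns a set of frozensets of points, so case order does not affect
    how two solutions compare.
    """
    clusters = [frozenset([p]) for p in points]
    while len(clusters) > k:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: min(math.dist(p, q)
                                 for p in pair[0] for q in pair[1]),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9), (15, 1), (16, 2)]
baseline = single_link_clusters(points, k=3)

# Shuffle the cases with a fixed seed and cluster again.
shuffled = points[:]
random.Random(0).shuffle(shuffled)
same_after_shuffle = single_link_clusters(shuffled, k=3) == baseline
print(same_after_shuffle)
```

The same comparison could be repeated with a different distance function or with only some of the variables to cover the other checks.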
General Problems • Early assignments are treated as permanent, which precludes later revision for improved fit • Number of clusters: more clusters mean greater intra-group homogeneity but less descriptive power • No good measure of cluster compactness • Lack of statistical properties makes inference difficult
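The trade-off in the number of clusters can be seen by computing the within-cluster sum of squares for the same data under coarser and finer groupings. The data and the partitions below are hypothetical and chosen by hand; they are not the output of any particular clustering method.

```python
from statistics import mean

def within_ss(clusters):
    """Total squared deviation of each value from its own cluster mean."""
    total = 0.0
    for cluster in clusters:
        m = mean(cluster)
        total += sum((x - m) ** 2 for x in cluster)
    return total

values = [1, 2, 3, 10, 11, 12, 20, 21]
partitions = {
    1: [values],
    2: [[1, 2, 3], [10, 11, 12, 20, 21]],
    3: [[1, 2, 3], [10, 11, 12], [20, 21]],
    8: [[v] for v in values],
}

for k, clusters in partitions.items():
    print(k, "clusters: within-cluster SS =", round(within_ss(clusters), 2))
```

With one cluster per object the groups are perfectly homogeneous, but the solution describes nothing beyond the raw data.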
General Problems (cont.) • Coping with inter-correlated profile variables • Must select profile variables that can discriminate among objects • Sensitive to units of measurement and outliers (fix: standardize the data and delete outliers) • Subjective interpretation of results (e.g., naming clusters)
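Standardizing usually means converting each profile variable to z-scores so that no variable dominates the distance calculation because of its units. The sketch below assumes population standard deviations and no constant columns; the function name and sample data are hypothetical, and outlier deletion is not shown.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Convert each column to z-scores: (value - column mean) / column SD."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]   # assumes no column is constant
    return [[(x - m) / s for x, m, s in zip(row, means, sds)]
            for row in rows]

# Hypothetical profiles: (income in dollars, years of education)
profiles = [[20000, 10], [40000, 12], [60000, 16], [80000, 18]]
z = standardize(profiles)

for raw, scaled in zip(profiles, z):
    print(raw, [round(v, 2) for v in scaled])
```

Before standardizing, income differences in the tens of thousands would swamp education differences of a few years in any Euclidean or city-block distance.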