Cluster Analysis

Cluster Analysis Dr. Michael R. Hyman

Introduction • Also called classification analysis and numerical taxonomy • Goal: assign objects to groups so that intra-group similarity and inter-group dissimilarity are maximized • No distinction between dependent and independent variables • Find naturally occurring groupings of objects

Uses in Studying Consumers • Benefit segmentation • Finding market niches • Finding homogeneous market segments for future study • Data reduction

Clusters Formed by Using Data on Two Characteristics

Scatter Plot of Income and Education Data for PC Owners and Non-owners

Procedure #1: Divisive (tear down) • Start with profile data • Find the variable with the highest variance • Split objects into those above and those below the mean on this variable • Find the remaining variable with the highest variance and split again at its mean
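
The divisive steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_on_highest_variance`, the `used` argument, and the sample profiles are hypothetical, and the choice to apply the second split within each subgroup is one reading of the slide, not something it states.

```python
from statistics import mean, pvariance

def split_on_highest_variance(objects, used=()):
    """Split objects at the mean of the highest-variance variable not yet used.

    objects: list of equal-length tuples (one profile per object).
    used: indices of variables already used for an earlier split.
    Returns (variable_index, at_or_below_mean, above_mean).
    """
    n_vars = len(objects[0])
    candidates = [j for j in range(n_vars) if j not in used]
    # Pick the remaining variable with the largest variance.
    j = max(candidates, key=lambda k: pvariance([o[k] for o in objects]))
    cut = mean(o[j] for o in objects)
    low = [o for o in objects if o[j] <= cut]
    high = [o for o in objects if o[j] > cut]
    return j, low, high

# Hypothetical profiles: (income in $000s, years of education)
profiles = [(20, 12), (25, 16), (80, 12), (90, 16)]
var, low, high = split_on_highest_variance(profiles)
# Second pass (assumption): split one subgroup on the next variable.
var2, low2, high2 = split_on_highest_variance(low, used=(var,))
```

Because raw variances depend on the unit of measurement, a sketch like this would normally be run on standardized data.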

Procedure #2: Agglomerative (build up) • Select a similarity measure: distance (Euclidean, city block), correlation, or similarity coefficient • Search the similarity matrix for the most similar pair of clusters and merge them • Repeat until only one cluster remains
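
A minimal sketch of the build-up loop, assuming Euclidean distance and single linkage (the slide does not fix either choice). The function name `agglomerate` and the sample points are hypothetical, and the pairwise search is deliberately naive rather than efficient.

```python
import math

def euclidean(a, b):
    return math.dist(a, b)

def agglomerate(points, distance=euclidean):
    """Naive agglomerative clustering with single linkage (an assumption).

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of indices into points.
    """
    clusters = [(i,) for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        best = None
        # Search every cluster pair for the most similar (closest) one.
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = min(distance(points[i], points[j])
                        for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        history.append((clusters[x], clusters[y], d))
        # Merge the pair and carry on until one cluster remains.
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return history

# Hypothetical two-variable profiles
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
history = agglomerate(points)
```

The merge history records the distance at each step, which is the raw material for deciding where to stop merging.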

Commonly Used Similarity Coefficients
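
The table from this slide is not reproduced here. As a rough illustration only, the three kinds of measure named under Procedure #2 (Euclidean distance, city-block distance, and correlation) can be written with their standard textbook definitions; the function names and sample profiles are hypothetical.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two profiles.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    # Sum of absolute differences across variables.
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson(a, b):
    # Correlation between two profiles (undefined if either is constant).
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

# Two hypothetical profiles with the same shape but different levels
a, b = (1, 2, 3), (4, 6, 8)
d_euclid = euclidean(a, b)
d_block = city_block(a, b)
r = pearson(a, b)
```

In this example the two profiles are perfectly correlated yet far apart in distance terms, which shows why the choice of measure can change which objects count as similar.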

Procedure #2: Agglomerative Stopping Rules • Theory and practice • Distance at which clusters combine • Within/between group variance • Relative sizes of clusters
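
One way to apply the "distance at which clusters combine" rule is to stop just before the merge that requires the largest jump in distance. The sketch below assumes that heuristic; the function name `clusters_before_largest_jump` and the merge distances are hypothetical.

```python
def clusters_before_largest_jump(merge_distances, n_objects):
    """Suggest a cluster count from successive merge distances.

    merge_distances: distance at which each merge happened, in merge order.
    Returns the number of clusters left just before the merge that
    required the largest increase in distance.
    """
    jumps = [later - earlier
             for earlier, later in zip(merge_distances, merge_distances[1:])]
    k = jumps.index(max(jumps))  # largest jump is between merge k and k + 1
    # After merge k (0-based), n_objects - (k + 1) clusters remain.
    return n_objects - (k + 1)

# Hypothetical merge distances for five objects (four merges)
merge_distances = [1.0, 1.0, 6.4, 20.5]
n_clusters = clusters_before_largest_jump(merge_distances, n_objects=5)
```

This is a heuristic only; the slide also lists theory, within/between variance, and relative cluster sizes as grounds for stopping.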

Procedure #2: Agglomerative Linkage Methods • Single (nearest neighbor): makes long, thin clusters • Complete (maximum distance, farthest neighbor): sensitive to outliers • Average (mean distance between objects in the two clusters) • Variance methods (minimize within-cluster variance) • Nodal (begin with the two least similar objects as nodes)
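
The linkage methods differ only in how the distance between two clusters is computed. The sketch below assumes Euclidean distance between objects and uses Ward's criterion as one example of a variance method (the slide does not name a specific one); all function names and data are hypothetical.

```python
import math
from itertools import product

def pair_distances(c1, c2):
    # All object-to-object distances between two clusters.
    return [math.dist(p, q) for p, q in product(c1, c2)]

def single_linkage(c1, c2):
    return min(pair_distances(c1, c2))   # nearest neighbor

def complete_linkage(c1, c2):
    return max(pair_distances(c1, c2))   # farthest neighbor

def average_linkage(c1, c2):
    d = pair_distances(c1, c2)
    return sum(d) / len(d)               # mean over all cross-cluster pairs

def ward_increase(c1, c2):
    # Increase in within-cluster sum of squares if the clusters are merged.
    centroid = lambda c: [sum(col) / len(c) for col in zip(*c)]
    n1, n2 = len(c1), len(c2)
    return n1 * n2 / (n1 + n2) * math.dist(centroid(c1), centroid(c2)) ** 2

# Two hypothetical clusters of two objects each
c1 = [(0, 0), (0, 2)]
c2 = [(3, 0), (3, 2)]
```

For these clusters single linkage gives 3, complete linkage gives the longer diagonal distance, and average linkage falls between the two.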

Procedure #2: Agglomerative Reliability and Validity Assessment • Use different distance measures • Use different clustering methods • Split data, run both halves, and compare • Shuffle cases (objects) • Solve with subset of profile variables
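
Each of these checks ends with comparing two cluster solutions for the same objects. One common way to quantify agreement, offered here as an assumption rather than as the method the slide intends, is the Rand index: the share of object pairs that both solutions treat the same way (together in both, or apart in both). The function name and label lists are hypothetical.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Share of object pairs on which two cluster solutions agree.

    labels_a, labels_b: cluster label per object, in the same object order.
    Label values themselves need not match between solutions.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical solutions from two clustering methods on five objects
solution_1 = [0, 0, 1, 1, 2]
solution_2 = [1, 1, 0, 0, 0]
agreement = rand_index(solution_1, solution_2)
```

A value near 1 suggests the solution is stable across methods or distance measures; lower values suggest the clusters depend heavily on the analyst's choices.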

General Problems • Early assignments are treated as permanent, which precludes later revision for improved fit • Number of clusters: more clusters mean greater intra-group homogeneity but less descriptive power • No good measure of cluster compactness • Lack of statistical properties makes inference difficult

General Problems (cont.) • Coping with inter-correlated profile variables • Must select profile variables that can discriminate among objects • Sensitive to unit of measurement and outliers (fix: standardize data and delete outliers) • Subjective interpretation of results (i.e., naming clusters)
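
The standardize-and-delete fix can be sketched as follows. The z-score cutoff is an assumption (the slide gives no rule for what counts as an outlier), and the function names and sample rows are hypothetical.

```python
from statistics import mean, pstdev

def standardize_columns(rows):
    """Convert each variable to z-scores (mean 0, standard deviation 1).

    Assumes no variable is constant; a constant column would divide by zero.
    """
    columns = list(zip(*rows))
    stats = [(mean(c), pstdev(c)) for c in columns]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats))
            for row in rows]

def without_outliers(rows, threshold=3.0):
    """Keep rows whose z-scores are all within the threshold (an assumed rule)."""
    z_rows = standardize_columns(rows)
    return [row for row, z in zip(rows, z_rows)
            if all(abs(v) <= threshold for v in z)]

# Hypothetical profiles: (income in $000s, years of education)
rows = [(30, 12), (32, 12), (34, 14), (36, 16), (38, 16), (200, 14)]
kept = without_outliers(rows, threshold=2.0)
z = standardize_columns(kept)  # re-standardize after dropping outliers
```

With very small samples the largest possible z-score is limited by the sample size, so a cutoff of 3 may never trigger; the example uses 2.0 for that reason.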

Steps for Conducting a Cluster Analysis: A Summary
