
Cluster Analysis – 2 Approaches


Presentation Transcript


  1. Cluster Analysis – 2 Approaches
     • K-Means (traditional)
     • Latent Class Analysis (new)
     by Jay Magidson, Statistical Innovations; based in part on a presentation by Wagner Kamakura at Statistical Modeling Week 2004

  2. K-Means Clustering
     A partitioning algorithm: it partitions a data set into a pre-determined number of groups (clusters) that are homogeneous in terms of selected continuous variables Y1, Y2, …
     How it works:
     • The user chooses the number of clusters K and selects the variables that define the clusters
     • Step 1: The algorithm randomly positions each of the K clusters at a point in the variable space
     • Step 2: Each case is assigned to the nearest of the K clusters using Euclidean distance
     • Step 3: Within-cluster means are computed and each cluster is re-positioned at its centroid
     • Steps 2 and 3 are iterated until convergence
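     A minimal sketch of these steps in NumPy (illustrative only: the function name k_means and the generated data are assumptions, and in practice a library routine such as sklearn.cluster.KMeans would normally be used):

     import numpy as np

     def k_means(Y, K, n_iter=100, seed=0):
         rng = np.random.default_rng(seed)
         # Step 1: randomly position the K cluster centers (here, at randomly chosen cases).
         centers = Y[rng.choice(len(Y), size=K, replace=False)]
         for _ in range(n_iter):
             # Step 2: assign each case to the nearest center by Euclidean distance.
             dist = np.linalg.norm(Y[:, None, :] - centers[None, :, :], axis=2)
             labels = dist.argmin(axis=1)
             # Step 3: re-position each center at the mean (centroid) of its assigned cases.
             new_centers = np.array([Y[labels == k].mean(axis=0) for k in range(K)])
             if np.allclose(new_centers, centers):  # convergence: centroids stopped moving
                 break
             centers = new_centers
         return labels, centers

     # Example: 200 cases measured on two continuous variables Y1 and Y2.
     rng = np.random.default_rng(1)
     Y = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
     labels, centers = k_means(Y, K=2)
     print("cluster sizes:", np.bincount(labels))
     print("centroids:", centers.round(2))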

  3. Illustration of K-Means Clustering – Step 1
     [Figure: scatterplot in the Y1–Y2 plane showing three randomly positioned cluster centers, labeled Cluster 1, Cluster 2 and Cluster 3]

  4. Step 2: Cases Assigned to Clusters
     [Figure: each case in the Y1–Y2 plane assigned to the nearest cluster center]

  5. Step 3: Clusters are Repositioned
     [Figure: cluster centers in the Y1–Y2 plane moved to the centroids of their assigned cases]

  6. Steps 2 and 3 are Repeated
     [Figure: cases re-assigned and cluster centers re-positioned in the Y1–Y2 plane]

  7. Cases are Assigned to Repositioned Clusters, and Algorithm Continues
     [Figure: assignment of cases to the repositioned cluster centers in the Y1–Y2 plane]

  8. Problems with the K-Means Approach
     [Figure: Gender (Female/Male) plotted against Marital status (Single/Married/Divorced), with the distance between categories marked "?"]
     • Needs a metric for similarity or distance between pairs of respondents
     • No statistical criterion for choosing the number of clusters
     • The solution is not unique – it depends on the random start
     • Use of Euclidean distance implies that, within each cluster, the variance of Y1 equals the variance of Y2
     • Can't classify respondents with missing data
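     A short sketch (assuming scikit-learn; the generated data and variable names are illustrative) of two of these problems: the lack of a statistical criterion for K and the inability to handle missing values.

     import numpy as np
     from sklearn.cluster import KMeans

     rng = np.random.default_rng(0)
     Y = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(5.0, 1.0, (100, 2))])

     # No statistical test for the number of clusters: we can only watch the
     # within-cluster sum of squares (inertia) drop as K grows and look for an "elbow".
     for k in (1, 2, 3, 4):
         inertia = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Y).inertia_
         print(f"K={k}: within-cluster sum of squares = {inertia:.1f}")

     # Missing data: a single missing value makes the fit fail outright.
     Y_missing = Y.copy()
     Y_missing[0, 1] = np.nan
     try:
         KMeans(n_clusters=2, n_init=10, random_state=0).fit(Y_missing)
     except ValueError as err:
         print("K-Means rejects missing data:", err)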

  9. Latent Class Cluster Models
     Differences from the traditional clustering approach:
     • Based on similarity in response patterns, rather than distance between respondents
     • Maximum likelihood and posterior mode parameter estimation use the EM algorithm, which corresponds to a probabilistic extension of the K-Means algorithm
     • Can be applied to variables of different scale types (discrete or continuous)
     • Statistical tests are available to compare different models
     • Random sets of starting values are used to avoid local solutions
     • Missing data is not a major problem
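     A minimal sketch of the probabilistic view (assuming scikit-learn; GaussianMixture is used only as an illustration, not the latent class software used by the authors): with continuous indicators, a latent class cluster model reduces to a finite Gaussian mixture fitted by EM, and an information criterion such as BIC can be used to compare models with different numbers of classes.

     import numpy as np
     from sklearn.mixture import GaussianMixture

     rng = np.random.default_rng(1)
     Y = np.vstack([rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 2.0]], 150),
                    rng.multivariate_normal([4, 3], [[2.0, -0.5], [-0.5, 1.0]], 150)])

     for k in (1, 2, 3, 4):
         # n_init > 1 refits from several random starting values to avoid local solutions;
         # the default covariance_type='full' lets each class have its own variances.
         gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(Y)
         print(f"{k} classes: BIC = {gm.bic(Y):.1f}")  # lower BIC = preferred model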

  10. Latent Class Cluster Analysis
     • Instead of using distances to classify cases into segments, it uses probabilities
     • Can handle nominal, ordinal and continuous variables, in any combination
     • Is not as sensitive to missing data as traditional cluster analysis techniques
     • It is easier to classify a respondent into a segment when some of the data are not available
     Reference: Magidson, J., and Vermunt, J. K. (2002), "Latent class models for clustering: A comparison with K-means", Canadian Journal of Marketing Research, Vol. 20, No. 1, pp. 36–43.
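     A short sketch of probability-based classification (again using a Gaussian mixture as a stand-in for a latent class model with continuous indicators; the respondent values are made up): the class posterior for a respondent with a missing value is obtained from the marginal density of the observed variable alone.

     import numpy as np
     from scipy.stats import norm
     from sklearn.mixture import GaussianMixture

     rng = np.random.default_rng(2)
     Y = np.vstack([rng.normal([0, 0], 1.0, (150, 2)), rng.normal([4, 3], 1.0, (150, 2))])
     gm = GaussianMixture(n_components=2, n_init=5, random_state=0).fit(Y)

     # Fully observed respondent: soft (posterior) class membership, not a hard label.
     print("P(class | Y1=3.5, Y2=2.5) =", gm.predict_proba([[3.5, 2.5]])[0].round(3))

     # Y2 missing: classify from the marginal distribution of Y1 within each class.
     y1 = 3.5
     dens = np.array([gm.weights_[k] * norm.pdf(y1, loc=gm.means_[k, 0],
                                                scale=np.sqrt(gm.covariances_[k, 0, 0]))
                      for k in range(2)])
     print("P(class | Y1=3.5, Y2 missing) =", (dens / dens.sum()).round(3))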
