
Seven (plus or minus two) Clusters, A Monte Carlo Study

Seven (plus or minus two) Clusters, A Monte Carlo Study. Larry Hoyle, Policy Research Institute, The University of Kansas. 1972 Kansas Statistical Abstract. 30 Years Ago. Shading by Overprinting. Shading by Line Spacing. 20 Years Ago. Line Shading Detail.


Presentation Transcript


  1. Seven (plus or minus two) Clusters, A Monte Carlo Study. Larry Hoyle, Policy Research Institute, The University of Kansas

  2. 1972 Kansas Statistical Abstract 30 Years Ago

  3. Shading by Overprinting

  4. Shading by Line Spacing 20 Years Ago

  5. Line Shading Detail

  6. What did they have in common? • Neither method is “continuous” • So both methods required grouping or classes • Overprinting: fixed number of combinations; characters on a fixed grid • Line spacing: integer number of lines in the polygon; lines are relatively coarse

  7. How to Group for Shading • Equal Intervals • Equal numbers (quantiles) • By clusters • Don’t group (unclassed)
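The first two grouping rules on this slide can be sketched in a few lines. This is only an illustration: the function names and the `density` values are made up for the example and are not the Kansas county data from the slides.

```python
from statistics import quantiles

def equal_interval_breaks(values, n_classes):
    """Upper class limits that split [min, max] into equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes + 1)]

def quantile_breaks(values, n_classes):
    """Upper class limits that put roughly equal numbers of values in each class."""
    return quantiles(values, n=n_classes, method="inclusive") + [max(values)]

def classify(value, breaks):
    """Index of the first class whose upper limit is >= value."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks) - 1

# Hypothetical skewed data: most values are small, a few are very large.
density = [1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 12, 15, 40, 90, 400]
eq = [classify(v, equal_interval_breaks(density, 3)) for v in density]
qt = [classify(v, quantile_breaks(density, 3)) for v in density]
print("equal intervals:", [eq.count(c) for c in range(3)])
print("equal numbers:  ", [qt.count(c) for c in range(3)])
```

With skewed values like these, equal intervals put 14 of the 15 values in the bottom class, while quantiles put 5 in each class. That is the same contrast the population density maps show.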

  8. Population Density – 7 Equal Intervals 100 counties fall into the bottom class

  9. Population Density - Equal Numbers 15 counties in each class - a very different picture

  10. Population Density - Cluster Means Group around the 7 values that “best” represent the data

  11. Population Density - Unclassed No classes, just shade in proportion to value

  12. Clustering • Tries for “Best” grouping • Each member of cluster can be represented by the mean of the group

  13. Proc Fastclus • You specify the number of clusters • Minimizes the within-cluster sum of squared distances (i.e., minimum within-cluster variance) • Inspired by: – k-means (MacQueen) – leader algorithm (Hartigan)
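The idea of minimizing the within-cluster sum of squares can be illustrated with a plain Lloyd-style k-means on a single variable. This is a minimal sketch, not PROC FASTCLUS itself: the seeding rule (evenly spaced order statistics), the function names, and the toy data are all assumptions made for the example.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd-style k-means on one variable; returns (centers, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed seeding: k evenly spaced order statistics of the data.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assign each value to its nearest center.
        new_labels = [min(range(k), key=lambda c: (x - centers[c]) ** 2)
                      for x in values]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, labels

def r_squared(values, centers, labels):
    """Share of total variance reproduced by the cluster means."""
    grand = sum(values) / len(values)
    ss_total = sum((x - grand) ** 2 for x in values)
    ss_within = sum((x - centers[lab]) ** 2 for x, lab in zip(values, labels))
    return 1 - ss_within / ss_total

# Toy data with three obvious groups (made up for illustration).
toy = [1, 2, 3, 11, 12, 13, 21, 22, 23]
toy_centers, toy_labels = kmeans_1d(toy, 3)
toy_r2 = r_squared(toy, toy_centers, toy_labels)
print(toy_centers, round(toy_r2, 4))
```

Replacing every value by its cluster mean loses only the within-cluster variation, which is why R-squared is a natural measure of how well a set of classes represents the data.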

  14. Example clustering - data

  15. 4 clusters [Plot of y against x, showing data and cluster values] R-squared = .9912

  16. 4 clusters data Correlation .9956 R-squared=.9912
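The two figures on this slide are consistent with each other: squaring the correlation gives the R-squared. The sketch below assumes the reported correlation is between each observation and the mean of its assigned cluster, which the slide does not state explicitly.

```latex
r^2 = 0.9956^2 \approx 0.9912 = R^2,
\qquad
R^2 = 1 - \frac{SS_{\text{within}}}{SS_{\text{total}}}
```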

  17. 3 clusters [Plot of y against x, showing data and cluster values] R-squared = .9609

  18. How many clusters is enough?

  19. Plot R-squared by number of clusters Sample of 300 observations, Uniform distribution, 11 cluster analyses

  20. What happens if there really aren’t any clusters? Let’s try 500 samples

  21. Uniform, 300 obs. per sample 500 samples, 11 clusterings each
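The experiment design on these slides (many samples from a distribution with no real clusters, several clusterings per sample, R-squared recorded each time) can be sketched as follows. This is a rough illustration under stated assumptions, not the author's SAS code: it uses a simple one-dimensional Lloyd-style k-means in place of PROC FASTCLUS, assumes the 11 clusterings are k = 2 through 12, and runs only 20 samples per distribution to keep it quick (the slides use 500).

```python
import random
from bisect import bisect_right

def kmeans_r2(values, k, max_iter=200):
    """R-squared (1 - SS_within / SS_total) from a Lloyd-style 1-D k-means."""
    data = sorted(values)
    n = len(data)
    # Assumed seeding: k evenly spaced order statistics.
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        # In one dimension, the nearest center is found from the
        # midpoints between neighboring (sorted) centers.
        cuts = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
        groups = [[] for _ in range(k)]
        for x in data:
            groups[bisect_right(cuts, x)].append(x)
        new_centers = [sum(g) / len(g) if g else c
                       for g, c in zip(groups, centers)]
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers
    grand = sum(data) / n
    ss_total = sum((x - grand) ** 2 for x in data)
    ss_within = sum((x - c) ** 2 for g, c in zip(groups, centers) for x in g)
    return 1 - ss_within / ss_total

def min_r2_by_k(draw, n_obs=300, n_samples=20, ks=range(2, 13), seed=1):
    """Worst (minimum) R-squared over all samples, for each cluster count."""
    rng = random.Random(seed)
    worst = {k: 1.0 for k in ks}
    for _ in range(n_samples):
        sample = [draw(rng) for _ in range(n_obs)]
        for k in ks:
            worst[k] = min(worst[k], kmeans_r2(sample, k))
    return worst

# The three distributions named in the slides; parameters are assumptions.
distributions = {
    "uniform": lambda rng: rng.random(),
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "exponential": lambda rng: rng.expovariate(1.0),
}
results = {name: min_r2_by_k(draw) for name, draw in distributions.items()}
for name, worst in results.items():
    print(name, {k: round(r2, 3) for k, r2 in worst.items()})
```

As a rough check on the uniform case, a continuous uniform population split into k equal-width classes has an R-squared of 1 - 1/k^2, which is about .98 for k = 7. Changing `n_obs` to 1000 or 72 and `n_samples` to 500 would follow the sample sizes used in the slides, at the cost of a longer run.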

  22. Uniform, 1000 obs. per sample 500 samples, 11 clusterings each

  23. Normal, 300 obs. per sample 500 samples, 11 clusterings each

  24. Normal, 1000 obs. per sample 500 samples, 11 clusterings each

  25. Exponential, 300 obs. per sample 500 samples, 11 clusterings each

  26. Distribution of worst sample

  27. Exponential, 1000 obs. per sample 500 samples, 11 clusterings each

  28. So What’s with 72?

  29. Uniform, 72 500 samples, 11 clusterings each

  30. Normal, 72 500 samples, 11 clusterings each

  31. Exponential, 72 500 samples, 11 clusterings each

  32. Minimum R squared by sample size and distribution At least 95% of the variance for all

  33. Histograms • Equal intervals • Number of observations in each interval

  34. Needle Plot of Cluster Means

  35. Bar chart needs more bars

  36. The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Information Processing. George Miller, The Psychological Review, 1956, vol. 63, pp. 81-97

  37. Limits on Categories for Absolute Judgments • Pitch 6 • Loudness 5 • Visual position 9 • Size of a square 5 • Hue 8 Name the colors in this slide

  38. “And finally, what about the magical number seven?” George A. Miller

  39. Miller – Quote 1 “What about the • seven wonders of the world • seven seas • seven deadly sins • seven daughters of Atlas in the Pleiades • seven ages of man • seven levels of hell • seven primary colors • seven notes of the musical scale • seven days of the week”

  40. Miller – Quote 2 “What about the • seven-point rating scale • seven categories for absolute judgment • seven objects in the span of attention • seven digits in the span of immediate memory”

  41. Miller – Quote 3 “…Perhaps there is something deep and profound behind all these sevens, something just calling out for us to discover it.”

  42. Miller - close “But I suspect that it is only a pernicious, Pythagorean coincidence.”

  43. Coincidence or Nature’s Parsimony? Does our capacity match what’s needed for 95% of the variance? 95%? Hmmmm……. • confidence intervals • an A • 19 fingers and toes • 970,000 web pages. Larry Hoyle, Policy Research Institute, University of Kansas, LarryHoyle@ku.edu
