Decisionistics: Statistical Insight… Better Decisions. Cluster Analysis: Business Application & Conceptual Issues
Introduction • A financial analyst at an investment firm is interested in identifying a group of mutual funds that are alike in a “true” sense, not simply based on the way Morningstar rates them. • A marketing manager is interested in identifying similar cities that can be used for a test marketing campaign in which a new product might be introduced. • The Director of Marketing at a telecom firm wants to understand the types of people that he already knows are candidates for the firm’s new long distance service. • A Golf Club General Manager wants to understand the “natural” segments of his members so that he can better utilize his club’s assets and understand how he might ideally want the club to look in the future.
Cluster Overview • Cluster Analysis: a technique for combining observations into groups or clusters such that: • Each group is homogeneous or compact with respect to certain characteristics. • Each group is different from the other groups with respect to the same characteristics. • Mathematically, we minimize the within-cluster sums of squares and maximize the between-cluster sums of squares.
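The within/between criterion can be made concrete with a small sketch. The data, cluster labels, and the function name `sums_of_squares` below are hypothetical and chosen only for illustration. The sketch handles a single variable and compares a compact grouping with a mixed one.

```python
def mean(values):
    return sum(values) / len(values)


def sums_of_squares(clusters):
    """Return (within, between) sums of squares for one variable.

    `clusters` maps a cluster label to the list of values assigned to it.
    """
    all_values = [v for members in clusters.values() for v in members]
    grand_mean = mean(all_values)
    within = 0.0
    between = 0.0
    for members in clusters.values():
        centre = mean(members)
        # Spread of the members around their own cluster mean.
        within += sum((v - centre) ** 2 for v in members)
        # Distance of the cluster mean from the overall mean, weighted by size.
        between += len(members) * (centre - grand_mean) ** 2
    return within, between


# Hypothetical data: the same six observations grouped two different ways.
good = {"A": [1.0, 2.0, 3.0], "B": [11.0, 12.0, 13.0]}
poor = {"A": [1.0, 2.0, 13.0], "B": [3.0, 11.0, 12.0]}

good_within, good_between = sums_of_squares(good)
poor_within, poor_between = sums_of_squares(poor)

print("good grouping:", good_within, good_between)
print("poor grouping:", poor_within, poor_between)
```

Because the total sum of squares is fixed for a given data set, any grouping that lowers the within part raises the between part by the same amount, which is why the two goals on the slide amount to one criterion.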
Cluster Overview • Cluster Analysis: it’s easy when: • You have a relatively small sample • You have nice, neat data • Your variables are continuous • Cluster Analysis: The Real World • Sometimes samples are small, but in business they’re usually large. • We’d like our data to be free from error and to contain no outliers, but that is rarely the case. • Variables are often a mix of continuous and categorical data.
Clustering: A Problematic Example • Take the following example: You are a firm trying to generate clusters for the Atlanta area with the objective of identifying the zip codes to which you want to “mass” market your products. • Many different races exist. How do you code them for clustering? Typically, it’s: • 1) White • 2) African American • 3) Asian • 4) Hispanic • 5) Native American • 6) Non-white other • What will clustering do with this variable as it groups people?
A Problematic Example cont’d • Can you cluster this simple example? • How will you interpret it (e.g., what’s a common way to look at the “answer” to see if you agree with the differentiation)?
A Problematic Example cont’d • Cluster Means: What do they tell us? • Assume we have three clusters, and along the “race” dimension, their means are as follows: • Cluster 1: Mean = 2 • Cluster 2: Mean = 4 • Cluster 3: Mean = 1 • How do you use these means to assign people to clusters?
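A minimal sketch of why a mean on this variable is hard to interpret. The cluster contents are hypothetical, and the helper `indicators` is an illustrative name rather than anything from the original analysis. It shows a cluster whose mean code is 2 even though nobody in it is coded 2, and one possible recoding into binary indicators.

```python
from collections import Counter

# Coding scheme from the slide.
RACE_CODES = {
    1: "White",
    2: "African American",
    3: "Asian",
    4: "Hispanic",
    5: "Native American",
    6: "Non-white other",
}

# Hypothetical cluster: half the members are coded 1, half are coded 3.
cluster_codes = [1, 1, 1, 3, 3, 3]

# Treating the codes as numbers gives a mean of 2.0.
mean_code = sum(cluster_codes) / len(cluster_codes)
label_at_mean = RACE_CODES[round(mean_code)]

# The actual composition of the cluster, as shares per category.
counts = Counter(cluster_codes)
shares = {RACE_CODES[c]: counts.get(c, 0) / len(cluster_codes) for c in RACE_CODES}


def indicators(code):
    """Recode one nominal value as six 0/1 indicator variables."""
    return [1 if code == c else 0 for c in sorted(RACE_CODES)]


print("mean code:", mean_code, "->", label_at_mean)
print("share coded African American:", shares["African American"])
print("indicators for code 3:", indicators(3))
```

The numeric mean points at a category that is absent from the cluster, while the shares computed from the indicator coding describe the cluster as it is.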
Application of Binary Clustering • A Golf Club General Manager wants to understand the “natural” segments of his members so that he can better utilize his club’s assets and understand how he might ideally want the club to look in the future. • How can cluster analysis help? • We took a look at the following: • Demographic Information • Usage Information • Cost Information • Some data was measured and some was survey data.
Binary Variables- A Closer Look • How will these cases cluster? • What can we do about it?
Jaccard Coefficient • Many different uses, but it works great for clustering (see SPSS). • The table below shows the lettering convention for counts when calculating the similarity between two objects:

                 OBJECT 1
                 +     -
OBJECT 2    +    a     b
            -    c     d

$S_J = \frac{a}{a + b + c}$

where a is the count of joint presences (+ +), and b and c are the counts of the mismatched present/absent combinations (+ - and - +). Values of d are not considered because they represent joint absences (- -), which say nothing about how alike the two objects are.
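The coefficient can be computed directly from two 0/1 vectors, as in the sketch below. The two member profiles are hypothetical, and the function names are illustrative. The simple matching coefficient, (a + d) / (a + b + c + d), is included only as a contrast to show what happens when joint absences are counted. Returning 0.0 when neither object has any attribute is an assumption made here, since the formula is undefined in that case.

```python
def match_counts(x, y):
    """Count the four cells (a, b, c, d) for two equal-length 0/1 vectors."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # both present
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)  # mismatch
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)  # mismatch
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)  # both absent
    return a, b, c, d


def jaccard(x, y):
    """Jaccard similarity a / (a + b + c); joint absences (d) are ignored."""
    a, b, c, _ = match_counts(x, y)
    if a + b + c == 0:
        return 0.0  # assumption: no attributes present in either object
    return a / (a + b + c)


def simple_matching(x, y):
    """Contrast measure that also counts joint absences as agreement."""
    a, b, c, d = match_counts(x, y)
    return (a + d) / (a + b + c + d)


# Hypothetical members described by eight yes/no attributes.
member_1 = [1, 1, 0, 0, 0, 0, 0, 0]
member_2 = [1, 0, 1, 0, 0, 0, 0, 0]

print("cells (a, b, c, d):", match_counts(member_1, member_2))
print("Jaccard:", jaccard(member_1, member_2))
print("Simple matching:", simple_matching(member_1, member_2))
```

The two members share only one of the three attributes that either of them has, so Jaccard stays low, while simple matching looks high mostly because both members lack the same five attributes.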
Another Application of Binary Clustering • A Security Company wants to understand the “natural” segments of those who have bought its service in the past and those who have not. • What methods do we use first? • How can binary cluster analysis help (using Jaccard)? • Allows us to use categorical data. • Gives us unique summary insight into the true percentages of each cluster along various dimensions. • Not tricked by the zero problem (attributes that both cases lack do not count as similarity).
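The point about percentages can be sketched as follows: once every variable is coded 0/1, the mean of a variable within a cluster is the share of members who have that attribute. The attribute names, rows, and cluster labels below are hypothetical, and the labels are assumed to come from a clustering step that is not shown.

```python
def cluster_profiles(rows, labels, names):
    """Percentage of members in each cluster that have each 0/1 attribute."""
    profiles = {}
    for label in sorted(set(labels)):
        members = [row for row, lab in zip(rows, labels) if lab == label]
        profiles[label] = {
            name: 100.0 * sum(row[i] for row in members) / len(members)
            for i, name in enumerate(names)
        }
    return profiles


# Hypothetical binary attributes for four customers.
names = ["bought_service", "owns_home", "has_children"]
rows = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
]
# Hypothetical cluster assignments from an earlier clustering step.
labels = [1, 1, 2, 2]

profiles = cluster_profiles(rows, labels, names)
for label, profile in profiles.items():
    print("cluster", label, profile)
```

Unlike a mean over nominal codes, each number here reads directly as "x% of this cluster has the attribute".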
Issues for Further Research • Logistic Regression generates clusters…How Many? • Generate model • Assess predictive power • Score against database. • We know “who” is important, but “how” do we reach them? • Cluster Analysis • Which variables are important in clustering? • How do you know? • Clustering followed by Rule Induction • Develop clusters • Use as inputs into algorithm (CHAID) • Take simple rules and use to assess cases across a database