Decisionistics

ayla
Presentation Transcript


  1. Decisionistics: Statistical Insight… Better Decisions • Cluster Analysis: Business Application & Conceptual Issues

  2. Introduction • A financial analyst at an investment firm is interested in identifying a group of mutual funds that look alike in a “true” context, not simply based on the way Morningstar rates them. • A marketing manager is interested in identifying similar cities that can be used for a test marketing campaign in which a new product might be introduced. • The Director of Marketing at a telecom firm wants to understand the types of people that he already knows are candidates for the firm’s new long-distance service. • A Golf Club General Manager wants to understand the “natural” segments of his members so that he can better utilize his club’s assets and understand how he might ideally want the club to look in the future.

  3. Cluster Overview • Cluster Analysis: a technique for combining observations into groups or clusters such that: • Each group is homogeneous, or compact, with respect to certain characteristics. • Each group is different from the other groups with respect to the same characteristics. • Mathematically, we minimize the within-cluster sum of squares and maximize the between-cluster sum of squares.
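
The within/between criterion can be illustrated with a minimal sketch. It assumes one-dimensional data and made-up values; the function name `sums_of_squares` is hypothetical and not from the presentation. It shows that a compact grouping has a small within-cluster sum of squares and a large between-cluster one, and that the two always add up to the total sum of squares.

```python
def mean(values):
    return sum(values) / len(values)


def sums_of_squares(points, labels):
    """Return (within, between) sums of squares for 1-D points and cluster labels."""
    grand_mean = mean(points)
    clusters = {}
    for x, label in zip(points, labels):
        clusters.setdefault(label, []).append(x)

    within = 0.0
    between = 0.0
    for members in clusters.values():
        centre = mean(members)
        # Spread of each member around its own cluster mean.
        within += sum((x - centre) ** 2 for x in members)
        # Spread of the cluster mean around the grand mean, weighted by cluster size.
        between += len(members) * (centre - grand_mean) ** 2
    return within, between


# Illustrative values only: two obvious groups on a line.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
compact = sums_of_squares(points, ["a", "a", "a", "b", "b", "b"])
mixed = sums_of_squares(points, ["a", "b", "a", "b", "a", "b"])

print("compact grouping (within, between):", compact)
print("mixed grouping   (within, between):", mixed)
```

Both groupings have the same total (within plus between), so lowering one term necessarily raises the other; the compact grouping is simply the one that puts most of the total into the between term.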

  4. Cluster Overview • Cluster Analysis: it’s easy when: • You have a relatively small sample • You have nice, neat data • Your variables are continuous • Cluster Analysis: The Real World • Sometimes samples are small, but in business they’re usually large. • We’d like our data to be free from error and to contain no outliers, but that is rarely the case. • Variables are often a mix of continuous and categorical data.

  5. Clustering: A Problematic Example • Take the following example: you are a firm trying to generate clusters for the Atlanta area, with the objective of understanding the zip codes to which you want to “mass” market your products. • Many different races exist. How do you cluster them? Typically, the variable is coded as: • 1) White • 2) African American • 3) Asian • 4) Hispanic • 5) Native American • 6) Non-white other • What will clustering do with this variable as it groups people?

  6. A Problematic Example cont’d • Can you cluster this simple example? • How will you interpret it (e.g., what’s a common way to look at the “answer” to see if you agree with the differentiation)?

  7. A Problematic Example cont’d • Cluster Means- What do they tell us? • Assume we have three clusters, and along the “race” dimension, they are as follows: • Cluster 1- Mean=2 • Cluster 2- Mean=4 • Cluster 3- Mean=1 • How do you use this data to assign people into clusters?
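
A minimal sketch of why these cluster means are hard to use, assuming the 1 to 6 race coding from the earlier slide and two made-up clusters. The codes are labels, not quantities, so very different groups can share the same mean.

```python
RACE_CODES = {
    1: "White",
    2: "African American",
    3: "Asian",
    4: "Hispanic",
    5: "Native American",
    6: "Non-white other",
}


def mean(values):
    return sum(values) / len(values)


# Illustrative clusters only (not data from the presentation).
cluster_a = [2, 2, 2, 2]  # every member is coded 2
cluster_b = [1, 3, 1, 3]  # no member is coded 2

mean_a = mean(cluster_a)
mean_b = mean(cluster_b)

print("cluster A mean:", mean_a, "->", RACE_CODES[round(mean_a)])
print("cluster B mean:", mean_b, "->", RACE_CODES[round(mean_b)])
```

Both clusters report a mean of 2, yet the second contains nobody coded 2, so the mean cannot be read back as a category.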

  8. Binary Variables- One Possible Solution?
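
The slide gives no detail here, so the following is a sketch under one assumption: that "binary variables" means replacing the single coded variable with one 0/1 indicator per category. The helper names `to_indicators` and `column_means` are hypothetical.

```python
CATEGORIES = [
    "White",
    "African American",
    "Asian",
    "Hispanic",
    "Native American",
    "Non-white other",
]


def to_indicators(code, n_categories=len(CATEGORIES)):
    """Turn a 1-based category code into a list of 0/1 indicators."""
    return [1 if code == k else 0 for k in range(1, n_categories + 1)]


def column_means(rows):
    """Mean of each indicator column; for 0/1 data this is a proportion."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


# Illustrative cluster only: two members coded 1 and two coded 3.
cluster_codes = [1, 3, 1, 3]
indicator_rows = [to_indicators(code) for code in cluster_codes]
proportions = column_means(indicator_rows)

for name, share in zip(CATEGORIES, proportions):
    print(f"{name}: {share:.0%}")
```

With indicators, a cluster mean is the share of members in each category, which is directly interpretable where the mean of the raw codes was not.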

  9. Application of Binary Clustering • A Golf Club General Manager wants to understand the “natural” segments of his members so that he can better utilize his club’s assets and understand how he might ideally want the club to look in the future. • How can cluster analysis help? • We looked at the following: • Demographic Information • Usage Information • Cost Information • Some of the data was measured directly and some came from surveys.

  10. Application of Binary Clustering

  11. Binary Variables- A Closer Look • How will these cases cluster? • What can we do about it?

  12. Jaccard Coefficient • Many different uses, but it works great for clustering (see SPSS). • The table below shows the lettering convention for the counts used when calculating the similarity between two objects:

                       OBJECT 1
                       +     -
      OBJECT 2    +    a     b
                  -    c     d

      $S_J = \frac{a}{a + b + c}$

      where a is the count of agreements on presence (+ +), and b and c are the counts of the absent/present combinations (i.e. + - and - +, respectively). Values of d are not considered because they represent joint absence (- -), which says little about whether two objects are alike.
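
A minimal sketch of the coefficient, using the a, b, c, d lettering above and made-up 0/1 vectors. The simple matching coefficient, (a + d) / (a + b + c + d), is included only as a contrast and is not part of the presentation; returning 0.0 when a + b + c = 0 is an assumed convention.

```python
def pair_counts(obj1, obj2):
    """Count the four presence/absence combinations for two 0/1 vectors."""
    a = sum(1 for x, y in zip(obj1, obj2) if x == 1 and y == 1)  # both present
    b = sum(1 for x, y in zip(obj1, obj2) if x == 0 and y == 1)  # only object 2
    c = sum(1 for x, y in zip(obj1, obj2) if x == 1 and y == 0)  # only object 1
    d = sum(1 for x, y in zip(obj1, obj2) if x == 0 and y == 0)  # both absent
    return a, b, c, d


def jaccard(obj1, obj2):
    """Jaccard similarity a / (a + b + c); joint absences (d) are ignored."""
    a, b, c, _ = pair_counts(obj1, obj2)
    denominator = a + b + c
    # Assumed convention: two all-zero vectors get similarity 0.0.
    return a / denominator if denominator else 0.0


def simple_matching(obj1, obj2):
    """Simple matching (a + d) / (a + b + c + d); joint absences count as agreement."""
    a, b, c, d = pair_counts(obj1, obj2)
    return (a + d) / (a + b + c + d)


# Illustrative members only: each has exactly one attribute, and not the same one.
member_1 = [1, 0, 0, 0, 0, 0, 0, 0]
member_2 = [0, 1, 0, 0, 0, 0, 0, 0]

print("Jaccard:        ", jaccard(member_1, member_2))
print("Simple matching:", simple_matching(member_1, member_2))
```

The two members share nothing they actually have, so Jaccard reports 0, while simple matching reports 0.75 purely because of the six shared zeros.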

  13. Another Application of Binary Clustering • A Security Company wants to understand the “natural” segments of those who have bought their service in the past, and those that have not. • What methods do we use first? • How can binary cluster analysis help (using Jaccard)? • Allows us to use categorical data. • Gives us unique summary insight into the true percentages of each cluster along various dimensions. • Not tricked by the zero problem.
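
The points above can be sketched end to end. Everything here is an assumption for illustration: the attribute names and 0/1 rows are invented, and the grouping rule (link any two cases whose Jaccard similarity reaches a threshold, then merge linked chains) is a simple single-linkage stand-in, not the SPSS procedure the presentation refers to.

```python
ATTRIBUTES = ["owns_home", "has_children", "prior_break_in", "bought_service"]  # hypothetical

# Illustrative 0/1 rows only, one per household.
rows = [
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]


def jaccard(x, y):
    """Shared presences divided by attributes present in at least one case."""
    both = sum(1 for xi, yi in zip(x, y) if xi and yi)
    either = sum(1 for xi, yi in zip(x, y) if xi or yi)
    return both / either if either else 0.0


def single_linkage_groups(cases, threshold):
    """Label cases so that any chain of pairs with similarity >= threshold shares a label."""
    labels = list(range(len(cases)))
    for i in range(len(cases)):
        for j in range(i + 1, len(cases)):
            if jaccard(cases[i], cases[j]) >= threshold:
                old, new = labels[j], labels[i]
                # Merge the whole group containing j into the group containing i.
                labels = [new if label == old else label for label in labels]
    return labels


def profiles(cases, labels):
    """Percentage of each cluster's members that have each attribute."""
    result = {}
    for label in sorted(set(labels)):
        members = [case for case, lab in zip(cases, labels) if lab == label]
        result[label] = [100.0 * sum(col) / len(members) for col in zip(*members)]
    return result


labels = single_linkage_groups(rows, threshold=0.5)
cluster_profiles = profiles(rows, labels)

for label, percentages in cluster_profiles.items():
    summary = ", ".join(f"{name}={pct:.0f}%" for name, pct in zip(ATTRIBUTES, percentages))
    print(f"cluster {label}: {summary}")
```

Because every variable is 0/1, each cluster profile reads directly as the percentage of members with that attribute, which is the kind of summary the slide describes.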

  14. Issues for Further Research • Logistic Regression generates clusters…How Many? • Generate model • Assess predictive power • Score against database. • We know “who” is important, but “how” do we reach them? • Cluster Analysis • Which variables are important in clustering? • How do you know? • Clustering followed by Rule Induction • Develop clusters • Use as inputs into algorithm (CHAID) • Take simple rules and use to assess cases across a database
