Segmentation and Profiling using SPSS for Windows Kate Grayson
Why Segmentation? • Used by e.g. retail and consumer product companies • Trying to learn about and describe their customers' buying habits, gender, age, income level, etc. • These companies tailor their marketing and product development strategies to each consumer group to increase sales and build brand loyalty. • A valuable approach in Market Research, and SPSS offers some useful tools to facilitate this commercial process
Segmentation in SPSS • Most of the techniques for segmentation and profiling are exploratory • There is no right or wrong answer, and the results are open to interpretation • Trying to make sense of the data or find patterns • Iterative techniques • If it does not make business sense then it is not a good model!
Segmentation in SPSS Techniques include: • Factor Analysis / Principal Components Analysis • Hierarchical Clustering • K-Means Cluster • Non-Linear Principal Components Analysis (PRINCALS/CATPCA) • The new Two-Step Cluster
Which Technique to Use? [Diagram: Exploratory techniques (Cluster Analysis, Categories, Factor Analysis) versus Confirmatory techniques (Discriminant Analysis, AnswerTree)]
Which Test to use? • Factor Analysis - to find patterns within variables • Categories - use if data doesn’t fit assumptions for Factor Analysis • Cluster Analysis - to find patterns between individuals • Two-Step Cluster – To use with both categorical and continuous variables • Discriminant Analysis - to look for differences between groups, try to predict target variable • AnswerTree - combinations of data, to predict target
Multivariate Analysis • These techniques are inter-related, but you don’t have to use all of them • A combination of these techniques can be used to segment the data
Main Considerations • Looking for patterns or trying to make predictions? • Levels of Measurement of the data (categorical or continuous) • Sample size • Missing values • Does data fulfil assumptions for test?
Handling Missing Data • Check before analysis for any patterns within missing data • Check before analysis that missing values are defined as missing - otherwise may compromise the model • Be aware that most segmentation techniques ignore any cases with missing values - so may have less usable data than you think!
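The effect of listwise deletion can be checked with a short sketch before running any model. This is an illustration in Python rather than SPSS syntax; the records, the variable names and the "no answer" code 99 are all made up for the example.

```python
# Hypothetical survey records: None is a blank answer, and 99 is an
# assumed "no answer" code that must be declared as missing.
MISSING_CODES = {99}

def is_missing(value):
    """True for blanks and for codes declared as user-missing."""
    return value is None or value in MISSING_CODES

def listwise_complete(cases, variables):
    """Keep only the cases with a valid value on every analysis variable."""
    return [c for c in cases
            if not any(is_missing(c.get(v)) for v in variables)]

cases = [
    {"id": 1, "q1": 4, "q2": 5, "q3": 2},
    {"id": 2, "q1": 99, "q2": 3, "q3": 1},    # user-missing code
    {"id": 3, "q1": 2, "q2": None, "q3": 5},  # blank answer
    {"id": 4, "q1": 1, "q2": 2, "q3": 3},
]
variables = ["q1", "q2", "q3"]

usable = listwise_complete(cases, variables)
print(f"{len(usable)} of {len(cases)} cases usable")
```

If the code 99 were not declared as missing, case 2 would stay in the analysis and its 99 would be treated as a genuine rating.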
Variable and Value Labels…. • It is worth checking the labels on your file • SPSS may truncate long variable and value labels in the output, making it difficult to interpret the output • Make sure all the useful information is at the beginning of the variable and value labels - so even if they are truncated, the output is still easy to read
Data Coding • Check the direction of the coding scheme, and maybe consider re-coding the data if the codes are counter-intuitive • e.g. if have a rating scale that ranges from high to low, rather than low to high… • ... it can be difficult to interpret output and factor scores etc. once the data has been through several transformations
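Reversing a rating scale is a one-line transformation: new value = lowest code + highest code - old value. The sketch below is in Python rather than SPSS syntax and assumes a 1 to 5 scale, which is not stated in the talk.

```python
def reverse_code(value, low=1, high=5):
    """Flip a rating so the scale runs the other way: 1<->5, 2<->4, 3 stays."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the {low}-{high} scale")
    return low + high - value

ratings = [1, 2, 3, 4, 5]
recoded = [reverse_code(r) for r in ratings]
print(recoded)  # [5, 4, 3, 2, 1]
```

Recoding once, before any other transformation, keeps the direction of later factor scores easy to read.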
Sample Data • Data = usage of underarm deodorants for men • Three brands tested: • ‘Rambo’: the current market leader • ‘Brad’ : second most popular • ‘Clint’ : recently launched product
Profiling the Customers.. ‘Clint’ isn’t selling as well as was hoped, so the research aims to find out: • Who is buying ‘Clint’? • What sort of characteristics do they share? • Who is buying the other deodorants tested? • How might the marketing campaign be changed to ensure that the correct market is targeted?
Data Collected • Ratings of a range of lifestyle attribute questions, e.g. ‘I tend to own the most up-to-date products’, ‘My family is most important thing in my life’, ‘I prefer to dress and entertain casually’ etc. (34 of these) • Demographics: age, type of work, exercise etc. • Brand of deodorant (D/O) usually used • How respondents see themselves in relation to others, e.g. ‘What makes you distinctive from your friends’
Segmentation – the steps • Run Principal Components Analysis on the ‘attribute rating’ questions, to see whether there are any underlying dimensions in the variables • Check using Discriminant Analysis to see whether these dimensions help predict brand used • Run Cluster Analysis to look for similarities between cases • Decide whether other variables need to be included, e.g. categorical demographics • Run Two-Step Cluster using all variables
Factor Analysis: what is it? • Looks for relationships between continuous variables (based on correlations), in this case ‘attribute rating’ questions • Derives underlying constructs or dimensions in the data • Tries to reduce a large number of variables to a small number of factors which explain most of the variance in the data • If can’t interpret the resulting solution then no good!
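The idea of reducing correlated variables to a component that explains most of the variance can be sketched as follows. This is a minimal illustration, not the SPSS procedure: it extracts only the first principal component of the correlation matrix by power iteration, applies no rotation, and uses three made-up rating variables (q1 and q2 move together, q3 does not).

```python
from statistics import mean, pstdev

def correlation_matrix(columns):
    """Pearson correlations between every pair of variables (one list per variable)."""
    n = len(columns[0])
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)  # assumes no variable is constant
        z.append([(x - m) / s for x in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def first_component(corr, iterations=200):
    """Power iteration: leading eigenvalue and eigenvector of a correlation matrix."""
    p = len(corr)
    vec = [1.0] * p
    for _ in range(iterations):
        new = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sum(v * v for v in new) ** 0.5
        vec = [v / norm for v in new]
    eigenvalue = sum(vec[i] * sum(corr[i][j] * vec[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, vec

# Hypothetical ratings from seven respondents.
q1 = [1, 2, 3, 4, 5, 2, 4]
q2 = [2, 2, 3, 5, 5, 1, 4]
q3 = [3, 1, 4, 2, 5, 5, 1]

corr = correlation_matrix([q1, q2, q3])
eigenvalue, loadings = first_component(corr)
print(f"first component explains {eigenvalue / len(corr):.0%} of the variance")
print("weights:", [round(w, 2) for w in loadings])
```

Because the eigenvalues of a correlation matrix sum to the number of variables, the eigenvalue divided by that number is the share of variance the component explains. Here q1 and q2 carry most of the weight, which is the kind of pattern that would then need a business interpretation.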
Factor Analysis Results The best solution produced 9 factors, interpreted below: • F1: High computer use • F2: Rules, need to conform • F3: Party animal • F4: Family man • F5: Likes new products, experiments • F6: Likes pampering, pays more for trusted brands • F7: Cautious, follower rather than leader for new products • F8: Relaxed, casual • F9: Home loving
Do these factors help? Run Discriminant Analysis to see whether the factors can predict which deodorant is used
Factor Analysis Results • The factors are good at predicting ‘Rambo’ usage, but not at differentiating between ‘Brad’ and ‘Clint’ • So try instead investigating relationships between cases – using Cluster Analysis • Options for clustering are: • Hierarchical Cluster • K-Means Cluster • Two-Step Cluster
Hierarchical Cluster • This is often thought of as the ‘proper cluster’ method • Looking for natural groupings within the data • Bases groupings upon the similarity or dissimilarity between cases, rather than variables • Very iterative technique – time consuming!
Clustering Data - Diagram [Diagram: each data point represents one case]
Decisions before Cluster: • Which variables to use? • Which distance measures between cases to use? • Which criteria for creating clusters to choose? NB The quality of the analysis will always depend upon the variables used Cluster Analysis will always find a solution! It is not possible to assess in the analysis itself how appropriate a variable is
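To make the two technical decisions concrete, here is a small agglomerative clustering sketch in Python. The choices in it are assumptions for illustration only: Euclidean distance as the distance measure, average linkage as the criterion for joining clusters, and five made-up two-dimensional cases.

```python
from itertools import combinations
from math import dist

def average_linkage(a, b, points):
    """Mean Euclidean distance between all pairs drawn from clusters a and b."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

def agglomerate(points, n_clusters):
    """Start with one cluster per case and merge the two closest clusters
    until only n_clusters remain. Clusters are lists of case indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], points))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)

# Hypothetical cases measured on two variables.
points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (9.8, 0.2)]
print(agglomerate(points, 3))  # [[0, 1], [2, 3], [4]]
```

The function returns a grouping for any number of clusters requested, whether or not the variables are meaningful, which is why the choice of variables matters so much.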
Stages of Hierarchical Cluster: • Select variables for analysis (carefully!) • Build and assess model • Save cluster membership • If required, create cluster matrix for K-Means • NB Because based on cases, need to make sure data is measured on same scale - if not, data should be standardized
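The reason for standardizing can be seen in a short sketch: without it, the variable with the largest numeric range dominates the distance between cases. The example uses z-scores and Euclidean distance in Python; the age and rating values are invented for illustration.

```python
from math import dist
from statistics import mean, pstdev

def standardize(column):
    """Convert a variable to z-scores (mean 0, standard deviation 1)."""
    m, s = mean(column), pstdev(column)
    return [(x - m) / s for x in column]

# Hypothetical data: age in years and a 1-5 rating for four cases.
age = [22, 35, 58, 41]
rating = [5, 4, 1, 2]

raw = list(zip(age, rating))
scaled = list(zip(standardize(age), standardize(rating)))

# On the raw data the distance is almost entirely the age difference.
print("raw distance, case 1 to case 2:   ", round(dist(raw[0], raw[1]), 2))
print("scaled distance, case 1 to case 2:", round(dist(scaled[0], scaled[1]), 2))
```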
Decision with D/O Data • I can’t get a very good (i.e. useful to the business) model from Hierarchical Cluster analysis • Also, I want to be able to include both categorical and continuous variables in the same model • So I decide to use Two-Step Cluster instead
Two-Step Cluster • The TwoStep Cluster Analysis procedure is an exploratory tool designed to reveal natural groupings (or clusters) within a data set that would otherwise not be apparent. • The algorithm employed by this procedure has several features that differentiate it from traditional clustering techniques: • The ability to create clusters based on both categorical and continuous variables. • Automatic selection of the number of clusters. • The ability to analyze large data files efficiently.
TwoStep Cluster • Uses scalable cluster analysis algorithm • This algorithm can handle both continuous and categorical variables or attributes and requires only one data pass in the procedure • The first step of the procedure pre-clusters the records into many small sub-clusters • Then it clusters the sub-clusters created in the pre-cluster step into the desired number of clusters • If the desired number of clusters is unknown, TwoStep Cluster analysis automatically finds the proper number of clusters
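The two stages can be illustrated with a deliberately simplified sketch. It is not the SPSS algorithm: it handles continuous variables only, uses plain Euclidean distance to centroids, and relies on a made-up distance threshold and a pre-specified number of clusters. It only shows the shape of the idea, one pass to form many small sub-clusters, then clustering of those sub-clusters.

```python
from math import dist

def centroid(members):
    """Mean position of a group of points."""
    return tuple(sum(axis) / len(members) for axis in zip(*members))

def pre_cluster(points, threshold):
    """Step 1: a single pass over the data. Each point joins the nearest
    sub-cluster if it lies within `threshold`, otherwise it starts a new one."""
    subs = []
    for p in points:
        nearest = min(subs, key=lambda s: dist(centroid(s), p), default=None)
        if nearest is not None and dist(centroid(nearest), p) <= threshold:
            nearest.append(p)
        else:
            subs.append([p])
    return subs

def merge_sub_clusters(subs, n_clusters):
    """Step 2: repeatedly merge the two sub-clusters with the closest centroids."""
    subs = [list(s) for s in subs]
    while len(subs) > n_clusters:
        pairs = [(i, j) for i in range(len(subs)) for j in range(i + 1, len(subs))]
        i, j = min(pairs, key=lambda ij: dist(centroid(subs[ij[0]]),
                                              centroid(subs[ij[1]])))
        subs[i].extend(subs.pop(j))
    return subs

# Hypothetical cases on two continuous variables.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (1.0, 1.1),
          (5.0, 5.0), (5.1, 4.8), (6.0, 6.1)]

subs = pre_cluster(points, threshold=0.5)
final = merge_sub_clusters(subs, n_clusters=2)
print(len(subs), "sub-clusters ->", [len(c) for c in final], "cases per cluster")
```

The second step works on a handful of sub-clusters rather than on every case, which is what makes this style of algorithm practical for large files.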
Two-Step Cluster • This is unlike other clustering methods in SPSS - if the desired number of clusters is unknown, TwoStep Cluster analysis automatically finds the proper number of clusters • Or you can pre-specify the number of clusters required, which gives extra flexibility
Run Two-Step Cluster Analysis on Saved Factor Variables and Categorical Variables
Link to more information • More useful information about Two-Step Cluster can be found at the following websites: • http://www.rrz.uni-hamburg.de/RRZ/Software/SPSS/Algorith.120/twostep_cluster.pdf • NB This was the handout for the talk, with algorithm etc. • Also useful: • http://www.spss.com/pdfs/S115AD8-1202A.pdf • http://www.norusis.com/pdf/SPC_v13.pdf
Some of the output produced by the Two-Step Cluster Analysis is reproduced in the next few slides
Brand Usually Used by Cluster ‘Clint’ spray seems to be associated with Cluster 6, with the roll-on version being associated with Clusters 4 and 2
Employment Status by Cluster Cluster 2 (‘Clint’ roll-on) is largely made up of part-time, retired and not working respondents, Cluster 4 also has a high number of retired respondents, while Cluster 6 (‘Clint’ spray) also has a high percentage of part-time and unemployed respondents.
Age Group by Cluster Cluster 2 (‘Clint’ roll-on) is largely made up of the younger and older age groups, Cluster 4 also has a high percentage of older respondents. Cluster 6 is drawn mainly from the age groups of 25 years and upwards
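Profiles of this kind (brand, employment status or age group by cluster) come from cross-tabulating saved cluster membership against a categorical variable and reading the percentages within each cluster. A minimal Python sketch follows; the (cluster, brand) pairs are toy values, not the survey data.

```python
from collections import Counter

def row_percentages(pairs):
    """Percentage of each category within each cluster, from (cluster, category) pairs."""
    counts = Counter(pairs)
    totals = Counter(cluster for cluster, _ in pairs)
    return {(c, cat): 100 * n / totals[c] for (c, cat), n in counts.items()}

# Toy (cluster, brand) pairs, invented for illustration only.
pairs = [(1, "Rambo"), (1, "Rambo"), (1, "Brad"), (1, "Rambo"),
         (2, "Clint"), (2, "Brad"), (2, "Clint"), (2, "Clint")]

table = row_percentages(pairs)
print(f"Cluster 2, Clint: {table[(2, 'Clint')]:.0f}%")  # 75%
```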
Cluster 4 (‘Clint’ roll-on) has below average computer use and need to conform, above average on ‘Home Loving’ & ‘Family Man’
Cluster 6 (‘Clint’ spray) has above average scores on ‘Relaxed, Casual’ but not much else – this is Mr Laid Back!
Summary of Findings • Profiling of this data suggests that ‘Clint’ is not targeting the expected market • ‘Clint’ is often not seen as sufficiently different from ‘Brad’, it has no perceived USP • ‘Clint’ is being used by a high percentage of older, retired, and part-time or not employed consumers, which may be a result of the aggressive product launch campaign with free samples, discounted prices etc. • ‘Clint’ marketing needs some more work!
Summary of Segmenting and Profiling this data using SPSS • Principal Components Analysis helped investigate relationships between the rated attribute variables • Hierarchical Cluster was used to try and find similarities between cases, using the factors derived from PCA • Two-Step Cluster was then used to enable clustering of both continuous and categorical variables in the same model • Useful conclusions were drawn about the market positioning of ‘Clint’ deodorant