Segmentation and Profiling using SPSS for Windows Kate Grayson

Why Segmentation? • Used by e.g. retail and consumer product companies • Trying to learn about and describe their customers' buying habits, gender, age, income level, etc. • These companies tailor their marketing and product development strategies to each consumer group to increase sales and build brand loyalty. • A valuable approach in Market Research, and SPSS offers some useful tools to facilitate this commercial process

Segmentation in SPSS • Most of the techniques for segmentation and profiling are exploratory • There is no right or wrong answer, and the results are open to interpretation • Trying to make sense of the data or find patterns • Iterative techniques • If it does not make business sense then it is not a good model!

Segmentation in SPSS Techniques include: • Factor Analysis / Principal Components Analysis • Hierarchical Clustering • K-Means Cluster • Non-Linear Principal Components Analysis (PRINCALS/CATPCA) • The new Two-Step Cluster

Which Technique to Use? [Diagram] Exploratory: Cluster Analysis, Categories, Factor Analysis • Confirmatory: Discriminant Analysis, AnswerTree

Which Test to use? • Factor Analysis - to find patterns within variables • Categories - use if data doesn’t fit assumptions for Factor Analysis • Cluster Analysis - to find patterns between individuals • Two-Step Cluster – To use with both categorical and continuous variables • Discriminant Analysis - to look for differences between groups, try to predict target variable • AnswerTree - combinations of data, to predict target

Multivariate Analysis • These techniques are inter-related, but don’t have to use all of them • Can use a combination of these techniques to segment the data

Main Considerations • Looking for patterns or trying to make predictions? • Levels of Measurement of the data (categorical or continuous) • Sample size • Missing values • Does data fulfil assumptions for test?

Before you start... check your data!

Handling Missing Data • Check before analysis for any patterns within missing data • Check before analysis that missing values are defined as missing - otherwise may compromise the model • Be aware that most segmentation techniques ignore any cases with missing values - so may have less usable data than you think!
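
The checks above can be sketched outside SPSS. The following is a minimal illustration in Python, not SPSS syntax: the variable names, the missing-value codes (9 and 99) and the five records are all invented for the example. It counts missing values per variable and shows how few cases survive when any case with a missing value is dropped.

```python
# Hypothetical survey records; 9 and 99 are assumed user-defined missing codes.
MISSING_CODES = {"rating_1": {9}, "rating_2": {9}, "age": {99}}

cases = [
    {"rating_1": 4, "rating_2": 5, "age": 34},
    {"rating_1": 9, "rating_2": 3, "age": 51},     # rating_1 coded as missing
    {"rating_1": 2, "rating_2": None, "age": 27},  # rating_2 left blank
    {"rating_1": 5, "rating_2": 4, "age": 99},     # age coded as missing
    {"rating_1": 3, "rating_2": 2, "age": 45},
]


def is_missing(variable, value):
    """True for blanks (None) and for codes declared as missing."""
    return value is None or value in MISSING_CODES.get(variable, set())


def missing_counts(cases):
    """Number of missing values per variable."""
    counts = {}
    for case in cases:
        for variable, value in case.items():
            counts[variable] = counts.get(variable, 0) + int(is_missing(variable, value))
    return counts


def complete_cases(cases):
    """Cases with no missing value on any variable (listwise deletion)."""
    return [c for c in cases if not any(is_missing(v, x) for v, x in c.items())]


per_variable = missing_counts(cases)
usable = complete_cases(cases)
print(per_variable)
print(f"{len(usable)} of {len(cases)} cases usable")
```

If the codes 9 and 99 were not declared as missing, they would be treated as genuine ratings and ages, which is the risk the slide warns about.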

Variable and Value Labels…. • It is worth checking the labels on your file • SPSS may truncate long variable and value labels in the output, making it difficult to interpret the output • Make sure all the useful information is at the beginning of the variable and value labels - so even if they are truncated, the output is still easy to read

Data Coding • Check the direction of the coding scheme, and consider re-coding the data if the codes are counter-intuitive • e.g. if a rating scale ranges from high to low, rather than low to high, it can be difficult to interpret output, factor scores etc. once the data has been through several transformations
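
Reversing the direction of a rating scale is simple arithmetic: new value = lowest code + highest code - old value. A small Python sketch, assuming a 1 to 5 scale (the scale limits and the function name are illustrative, not taken from the data file):

```python
def reverse_code(value, low=1, high=5):
    """Flip a rating so that the scale runs in the opposite direction."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the {low}-{high} scale")
    return low + high - value


original = [1, 2, 3, 4, 5]
recoded = [reverse_code(v) for v in original]
print(recoded)
```

Applying the function twice returns the original value, which is a quick way to check that a recode has not been applied by accident more than once.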

Sample Data • Data = usage of underarm deodorants for men • Three brands tested: • ‘Rambo’: the current market leader • ‘Brad’ : second most popular • ‘Clint’ : recently launched product

Profiling the Customers.. ‘Clint’ isn’t selling as well as was hoped, so the research aims to find out: • Who is buying ‘Clint’? • What sort of characteristics do they share? • Who is buying the other deodorants tested? • How might the marketing campaign be changed to ensure that the correct market is targeted?

Data Collected • Ratings of a range of lifestyle attribute questions, e.g. ‘I tend to own the most up-to-date products’, ‘My family is most important thing in my life’, ‘I prefer to dress and entertain casually’ etc. (34 of these) • Demographics: age, type of work, exercise etc. • Brand of deodorant (D/O) usually used • How you see yourself in relation to others, e.g. ‘What makes you distinctive from your friends’

Segmentation – the steps • Run Principal Components Analysis on the ‘attribute rating’ questions, to see if there are any underlying dimensions in the variables • Use Discriminant Analysis to check whether these dimensions help predict brand used • Run Cluster Analysis to see if similarities can be found between cases • Decide if other variables need to be included, e.g. categorical demographics • Run Two-Step Cluster using all variables

Factor Analysis

Factor Analysis: what is it? • Looks for relationships between continuous variables (based on correlations), in this case ‘attribute rating’ questions • Derives underlying constructs or dimensions in the data • Tries to reduce a large number of variables to a small number of factors which explain most of the variance in the data • If can’t interpret the resulting solution then no good!
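
To make the idea concrete, the sketch below builds a correlation matrix from three made-up rating variables and extracts the first principal component by power iteration. It is a Python illustration under stated assumptions, not the SPSS procedure: the data are invented, only one component is extracted, and no rotation is applied. The eigenvalue divided by the number of variables gives the proportion of variance that the component explains.

```python
import math


def correlation_matrix(columns):
    """Pearson correlations between variables (each column is one variable)."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(columns, means)]
    z = [[(x - m) / s for x in c] for c, m, s in zip(columns, means, sds)]
    p = len(columns)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]


def multiply(matrix, vector):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def first_component(matrix, iterations=200):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    p = len(matrix)
    vector = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        product = multiply(matrix, vector)
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    eigenvalue = sum(v * w for v, w in zip(vector, multiply(matrix, vector)))
    return eigenvalue, vector


# Invented ratings from six respondents on three attribute questions.
ratings = [
    [1, 2, 3, 4, 5, 4],  # attribute A
    [2, 2, 3, 5, 5, 4],  # attribute B, moves with A
    [5, 3, 4, 2, 1, 2],  # attribute C, moves against A and B
]

R = correlation_matrix(ratings)
eigenvalue, vector = first_component(R)
loadings = [v * math.sqrt(eigenvalue) for v in vector]
explained = eigenvalue / len(ratings)
print([round(x, 2) for x in loadings], round(explained, 2))
```

Here A and B load positively and C negatively on the same component, so the three questions could be summarised by one underlying dimension. Whether that dimension is useful still depends on whether it can be interpreted.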

Run Principal Components Analysis on 34 rated attributes

Factor Analysis Results The best solution produced 9 factors, interpreted below: • F1: High computer use • F2: Rules, need to conform • F3: Party animal • F4: Family man • F5: Likes new products, experiments • F6: Likes pampering, pays more for trusted brands • F7: Cautious, follower rather than leader for new products • F8: Relaxed, casual • F9: Home loving

Do these factors help? Run Discriminant Analysis to see if the factors can predict the deodorant (D/O) used
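
The question "can the factors predict the brand?" can be illustrated with a nearest-centroid classifier, used here as a simplified stand-in for Discriminant Analysis (the two coincide only under equal priors and equal, spherical within-group covariance). Everything in this Python sketch is an assumption for illustration: the factor scores are invented, only two factors are used, and cases are classified with the same data that produced the centroids.

```python
import math

# Invented scores on two factors, grouped by the brand each respondent uses.
scores_by_brand = {
    "Rambo": [(2.0, 1.8), (1.7, 2.2), (2.3, 2.0)],
    "Brad": [(-0.5, 0.1), (0.2, -0.3), (-0.1, 0.4)],
    "Clint": [(0.1, 0.0), (-0.4, 0.3), (0.3, -0.5)],
}


def centroid(points):
    """Mean score on each factor for one group."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def nearest_brand(point, centroids):
    """Brand whose centroid is closest to the respondent's factor scores."""
    return min(centroids, key=lambda brand: math.dist(point, centroids[brand]))


centroids = {brand: centroid(points) for brand, points in scores_by_brand.items()}

# Share of each brand's users that are assigned back to their own brand.
hit_rate = {
    brand: sum(nearest_brand(p, centroids) == brand for p in points) / len(points)
    for brand, points in scores_by_brand.items()
}
print(hit_rate)
```

In this toy data one group sits well apart and is classified perfectly, while the other two overlap and are confused with each other. That is the kind of pattern a classification table from Discriminant Analysis would reveal.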

Factor Analysis Results • The factors are good at predicting ‘Rambo’ usage, but not at differentiating between ‘Brad’ and ‘Clint’ • So try instead investigating relationships between cases – using Cluster Analysis • Options for clustering are: • Hierarchical Cluster • K-Means Cluster • Two-Step Cluster

Hierarchical Cluster • This is often thought of as the ‘proper cluster’ method • Looking for natural groupings within the data • Bases groupings upon the similarity or dissimilarity between cases, rather than variables • Very iterative technique – time consuming!
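
A minimal sketch of agglomerative (bottom-up) hierarchical clustering in Python: every case starts as its own cluster, and the two closest clusters are merged repeatedly. Euclidean distance and average linkage are assumptions chosen for the example, as are the seven data points; SPSS offers several other distance measures and linkage criteria.

```python
import math


def average_linkage(cluster_a, cluster_b, points):
    """Mean distance between all pairs of cases drawn from two clusters."""
    total = sum(math.dist(points[i], points[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # start: one cluster per case
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Invented cases measured on two variables.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (3.0, 3.0), (3.2, 2.9),
          (6.0, 0.0), (6.1, 0.2)]

clusters = agglomerate(points, n_clusters=3)
print(sorted(sorted(c) for c in clusters))
```

Every pair of clusters is compared at every step, which is why the method becomes slow on large files.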

Clustering Data - Diagram (each data point represents one case)

Decisions before Cluster: • Which variables to use? • Which distance measures between cases to use? • Which criteria for creating clusters to choose? NB: • The quality of the analysis will always depend upon the variables used • Cluster Analysis will always find a solution! • It is not possible to assess in the analysis itself how appropriate a variable is

Stages of Hierarchical Cluster: • Select variables for analysis (carefully!) • Build and assess model • Save cluster membership • If required, create cluster matrix for K-Means NB: Because clustering is based on cases, need to make sure data is measured on the same scale - if not, data should be standardized
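
The note about scale can be shown with a small example. In this Python sketch (invented data, with age in years and a 1 to 5 rating as assumed variables), distances between cases are dominated by age until both variables are converted to z-scores.

```python
import math


def standardize(values):
    """Convert values to z-scores (mean 0, sample standard deviation 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


# Invented cases: age in years and a 1-5 rating.
age = [25, 30, 60, 65]
rating = [1, 5, 1, 5]

raw_cases = list(zip(age, rating))
z_cases = list(zip(standardize(age), standardize(rating)))

# Case 0 vs case 1: similar age, opposite ratings.
# Case 0 vs case 2: same rating, very different age.
raw_ratio = math.dist(raw_cases[0], raw_cases[2]) / math.dist(raw_cases[0], raw_cases[1])
z_ratio = math.dist(z_cases[0], z_cases[2]) / math.dist(z_cases[0], z_cases[1])
print(round(raw_ratio, 2), round(z_ratio, 2))
```

On the raw data the age difference makes case 2 look several times further from case 0 than case 1 is. After standardizing, the two distances are about equal, so both variables contribute comparably to the clustering.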

Run Hierarchical Cluster Analysis on Saved Factor Variables

Decision with D/O Data • I can’t get a very good (i.e. useful to the business) model from Hierarchical Cluster analysis • Also, I want to be able to include both categorical and continuous variables in the same model • So I decide to use Two-Step Cluster instead

Two-Step Cluster

Two-Step Cluster • The TwoStep Cluster Analysis procedure is an exploratory tool designed to reveal natural groupings (or clusters) within a data set that would otherwise not be apparent. • The algorithm employed by this procedure has several features that differentiate it from traditional clustering techniques: • The ability to create clusters based on both categorical and continuous variables. • Automatic selection of the number of clusters. • The ability to analyze large data files efficiently.

TwoStep Cluster • Uses scalable cluster analysis algorithm • This algorithm can handle both continuous and categorical variables or attributes and requires only one data pass in the procedure • The first step of the procedure pre-clusters the records into many small sub-clusters • Then it clusters the sub-clusters created in the pre-cluster step into the desired number of clusters • If the desired number of clusters is unknown, TwoStep Cluster analysis automatically finds the proper number of clusters
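
The two steps can be sketched in Python as follows. This is a deliberately simplified illustration, not the SPSS algorithm: it handles continuous variables only, uses Euclidean distance and a fixed distance threshold for pre-clustering (both assumptions), and takes the number of final clusters as given rather than selecting it automatically. The function names and data are invented.

```python
import math


def centroid(sub):
    """Mean of the cases summarised by one sub-cluster."""
    return [s / sub["n"] for s in sub["sum"]]


def pre_cluster(points, threshold):
    """Step 1: one pass over the data. Each case joins the nearest
    sub-cluster within `threshold`, otherwise it starts a new one.
    Only a count and running sums are kept per sub-cluster."""
    subclusters = []
    for p in points:
        best, best_d = None, threshold
        for sub in subclusters:
            d = math.dist(p, centroid(sub))
            if d <= best_d:
                best, best_d = sub, d
        if best is None:
            subclusters.append({"n": 1, "sum": list(p)})
        else:
            best["n"] += 1
            best["sum"] = [s + x for s, x in zip(best["sum"], p)]
    return subclusters


def merge_subclusters(subclusters, n_clusters):
    """Step 2: merge the two sub-clusters with the closest centroids
    until the desired number of clusters remains."""
    clusters = [{"n": s["n"], "sum": list(s["sum"])} for s in subclusters]
    while len(clusters) > n_clusters:
        pairs = [(math.dist(centroid(clusters[a]), centroid(clusters[b])), a, b)
                 for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        _, a, b = min(pairs)
        clusters[a] = {"n": clusters[a]["n"] + clusters[b]["n"],
                       "sum": [x + y for x, y in zip(clusters[a]["sum"], clusters[b]["sum"])]}
        del clusters[b]
    return clusters


# Invented cases on two continuous variables.
points = [(0.0, 0.0), (0.3, 0.1), (1.0, 0.8), (1.2, 1.0),
          (5.0, 5.0), (5.2, 4.9), (4.1, 4.2), (4.0, 4.0)]

subclusters = pre_cluster(points, threshold=0.5)
final = merge_subclusters(subclusters, n_clusters=2)
print(len(subclusters), [c["n"] for c in final])
```

Because the second step works on a handful of sub-cluster summaries rather than on every case, the expensive pairwise comparisons stay cheap even when the data file is large.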

Two-Step Cluster • This is unlike other clustering methods in SPSS - if the desired number of clusters is unknown, TwoStep Cluster analysis automatically finds the proper number of clusters • Or you can pre-specify the number of clusters required - flexibility

Run Two-Step Cluster Analysis on Saved Factor Variables and Categorical Variables

Link to more information • More useful information about Two-Step Cluster can be found at the following websites: • http://www.rrz.uni-hamburg.de/RRZ/Software/SPSS/Algorith.120/twostep_cluster.pdf • NB This was the handout for the talk, with algorithm etc. • Also useful: • http://www.spss.com/pdfs/S115AD8-1202A.pdf • http://www.norusis.com/pdf/SPC_v13.pdf

Some of the output produced by the Two-Step Cluster Analysis is reproduced in the next few slides

Brand usually use by Cluster: ‘Clint’ spray seems to be associated with Cluster 6, with the roll-on version being associated with Clusters 4 and 2
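
A table like this is a cross-tabulation of saved cluster membership against brand. The Python sketch below shows the calculation on invented records; the cluster numbers, brand labels and counts are made up for illustration and are not the survey results. Percentages are taken within each cluster.

```python
from collections import Counter

# Invented (cluster, brand) pairs, one per respondent.
records = [
    (2, "Clint roll-on"), (2, "Clint roll-on"), (2, "Clint roll-on"), (2, "Brad"),
    (6, "Clint spray"), (6, "Clint spray"), (6, "Rambo"), (6, "Brad"),
]


def within_cluster_percent(records):
    """Percentage of each cluster's respondents who usually use each brand."""
    cell_counts = Counter(records)
    cluster_totals = Counter(cluster for cluster, _ in records)
    return {
        (cluster, brand): 100.0 * n / cluster_totals[cluster]
        for (cluster, brand), n in cell_counts.items()
    }


table = within_cluster_percent(records)
for (cluster, brand), pct in sorted(table.items()):
    print(f"Cluster {cluster}  {brand:<14} {pct:5.1f}%")
```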

Employment Status by Cluster: Cluster 2 (‘Clint’ roll-on) is largely made up of part-time, retired and not working respondents, Cluster 4 also has a high number of retired respondents, while Cluster 6 (‘Clint’ spray) also has a high percentage of part-time and unemployed respondents.

Age Group by Cluster: Cluster 2 (‘Clint’ roll-on) is largely made up of the younger and older age groups, and Cluster 4 also has a high percentage of older respondents. Cluster 6 is drawn more from those aged 25 and upwards.

Cluster 4 (‘Clint’ roll-on) has below average computer use and need to conform, above average on ‘Home Loving’ & ‘Family Man’

Cluster 6 (‘Clint’ spray) has above average scores on ‘Relaxed, Casual’ but not much else – this is Mr Laid Back!
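
Cluster profiles like these come from comparing each cluster's mean factor score with the overall average. The Python sketch below assumes the saved factor scores are standardised with a mean of zero across the whole sample, so a positive cluster mean reads as "above average". The scores, and the choice of two factors, are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Invented respondents: (cluster, factor scores).
respondents = [
    (4, {"Family man": 0.9, "High computer use": -0.7}),
    (4, {"Family man": 0.5, "High computer use": -0.3}),
    (6, {"Family man": -0.2, "High computer use": 0.1}),
    (6, {"Family man": 0.0, "High computer use": 0.3}),
]


def cluster_profiles(respondents):
    """Mean score on each factor within each cluster."""
    grouped = defaultdict(lambda: defaultdict(list))
    for cluster, scores in respondents:
        for factor, score in scores.items():
            grouped[cluster][factor].append(score)
    return {cluster: {factor: mean(values) for factor, values in factors.items()}
            for cluster, factors in grouped.items()}


profiles = cluster_profiles(respondents)
for cluster, factors in sorted(profiles.items()):
    for factor, value in factors.items():
        direction = "above" if value > 0 else "below"
        print(f"Cluster {cluster}: {factor} is {direction} average ({value:+.2f})")
```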

Summary of Findings • Profiling of this data suggests that ‘Clint’ is not targeting the expected market • ‘Clint’ is often not seen as sufficiently different from ‘Brad’; it has no perceived USP • ‘Clint’ is being used by a high percentage of older, retired, and part-time or not employed consumers, which may be a result of the aggressive product launch campaign with free samples, discounted prices etc. • ‘Clint’ marketing needs some more work!

Summary of Segmenting and Profiling this data using SPSS • Principal Components Analysis helped investigate relationships between the rated attribute variables • Hierarchical Cluster was used to try and find similarities between cases, using the factors derived from PCA • Two-Step Cluster was then used to enable clustering of both continuous and categorical variables in the same model • Useful conclusions were drawn about the market positioning of ‘Clint’ deodorant
