MKT 700 Business Intelligence and Decision Models. Week 8: Algorithms and Customer Profiling (1). Classification and Prediction. SPSS Direct Marketing. SPSS Analysis. Major Algorithms. Euclidean Distance. Euclidean Distance for Continuous Variables.
MKT 700 Business Intelligence and Decision Models • Week 8: Algorithms and Customer Profiling (1)
Euclidean Distance for Continuous Variables • Pythagorean distance: $d = \sqrt{a^2 + b^2}$ • Euclidean space: $d = \sqrt{a^2 + b^2 + c^2}$ • Euclidean distance: $d = \left[\sum_i d_i^2\right]^{1/2}$
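The distance formula above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `euclidean_distance` and the sample points are assumptions chosen for the example.

```python
import math

def euclidean_distance(x, y):
    """Distance between two points given as equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    # Square each coordinate difference d_i, sum them, take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Two-dimensional (Pythagorean) case with hypothetical points
print(euclidean_distance((0, 0), (3, 4)))
# The same function works unchanged in three or more dimensions
print(euclidean_distance((1, 2, 3), (4, 6, 3)))
```

Because the sum runs over however many coordinates are supplied, the two-dimensional and three-dimensional cases on the slide are both special cases of the same call.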
Statistical Inference • DF: (4 columns – 1) × (2 rows – 1) = 3 • Values from the slide's table: 3.032; 6.251 (p = .10); 7.815 (p = .05)
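As a hedged illustration of the inference step, the sketch below computes a chi-square statistic and its degrees of freedom for a 2-row by 4-column table and compares it with the 7.815 critical value shown on the slide. The counts are hypothetical (they are not from the course data), and `chi_square` is a helper name introduced only for this example.

```python
def chi_square(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical counts: rows = responded / did not respond, columns = 4 regions
observed = [[30, 20, 25, 25],
            [70, 80, 75, 75]]
stat, df = chi_square(observed)

CRITICAL_05_DF3 = 7.815  # critical value for DF = 3 at p = .05, from the slide
print("DF:", df)
print("chi-square:", round(stat, 3))
print("significant at .05:", stat > CRITICAL_05_DF3)
```

With a statistic below the critical value, the hypothesis that response is independent of region would not be rejected at the .05 level.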
Log Likelihood • Cluster distance on probability distributions • Applicable to both categorical and continuous variables
Chi-Square: p < 0.05; DF = 1; Critical value = 3.84
F-Statistics • For metric or continuous variables • Compare explained (in the model) and unexplained variances (errors)
ANOVA • Group Comparisons: Are errors (discrepancies between observations and the overall mean) explained by group membership or by some other (random) effect?
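The comparison of explained and unexplained variance can be sketched as a one-way ANOVA F-statistic. This is a minimal sketch under assumptions: the three groups of values are invented for illustration, and `one_way_anova_f` is a hypothetical helper name, not an SPSS function.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    # Explained: squared distance of each group mean from the overall mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Unexplained (error): squared distance of observations from their group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical spending figures for three customer groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova_f(groups))
```

A large F means group membership accounts for much more of the variation than random error does; the value would then be compared with a critical F for (k – 1, N – k) degrees of freedom.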
Variance • SS is the Sum of Squares • DF = N – 1 • VAR = SS/DF • SD = √VAR
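The chain from sum of squares to standard deviation can be followed step by step in code. The sketch below is illustrative only; the data values and the function name `variance_steps` are assumptions.

```python
import math

def variance_steps(values):
    """Return (SS, DF, VAR, SD) for a sample of numbers."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)  # SS: sum of squared deviations
    df = n - 1                                 # DF = N - 1
    var = ss / df                              # VAR = SS / DF
    sd = math.sqrt(var)                        # SD = square root of VAR
    return ss, df, var, sd

# Hypothetical sample of eight observations
ss, df, var, sd = variance_steps([2, 4, 4, 4, 5, 5, 7, 9])
print(ss, df, round(var, 3), round(sd, 3))
```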
Customer Profiling • Who is likely to buy or not respond? • Who is likely to buy what product or service? • Who is in danger of lapsing?
Profiling/Decision Tree • SPSS Direct Marketing → Customer Profiling • SPSS Analysis → Classification → Decision Tree • CHAID (Chi-Square Automatic Interaction Detector) • CART (Classification and Regression Tree)
Use of Decision Trees • Classify observations from a target binary or nominal variable → Segmentation • Predictive response analysis from a target numerical variable → Behaviour • Decision support rules → Processing
Example: dmdata.sav • Underlying theory: χ²
CHAID Algorithm: Selecting Variables • Example • Regions (4), Gender (3, including Missing), Age (6, including Missing) • For each variable, collapse categories to maximize the chi-square test of independence. Ex: Region (N, S, E, W, *) → (WSE, N*) • Select the most significant variable • Go to next branch … and next level • Stop growing if … estimated χ² < theoretical χ²
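The category-collapsing step can be sketched as follows. This is a simplified illustration, not SPSS's implementation: it assumes a binary target, compares pairs of categories with a 2×2 chi-square, and merges the least different pair while its statistic stays below the DF = 1 critical value of 3.84. Production CHAID works with p-values and Bonferroni adjustments, which are omitted here. The response counts and the names `chi_square_2x2` and `merge_categories` are hypothetical.

```python
from itertools import combinations

CRITICAL_05_DF1 = 3.84  # chi-square critical value for DF = 1, p < 0.05

def chi_square_2x2(a, b):
    """Chi-square statistic comparing two categories' (yes, no) counts.

    Assumes no row or column total is zero.
    """
    table = [a, b]
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

def merge_categories(counts):
    """Merge the most similar pair of categories until all pairs differ."""
    counts = dict(counts)
    while len(counts) > 1:
        a, b = min(combinations(counts, 2),
                   key=lambda p: chi_square_2x2(counts[p[0]], counts[p[1]]))
        if chi_square_2x2(counts[a], counts[b]) >= CRITICAL_05_DF1:
            break  # every remaining pair is significantly different
        yes_a, no_a = counts.pop(a)
        yes_b, no_b = counts.pop(b)
        counts[a + b] = (yes_a + yes_b, no_a + no_b)
    return counts

# Hypothetical (responders, non-responders) per region; "*" = missing
regions = {"N": (40, 60), "S": (20, 80), "E": (22, 78),
           "W": (18, 82), "*": (38, 62)}
print(merge_categories(regions))
```

With these invented counts the five region codes collapse into two groups, one combining N with the missing category and one combining W, S and E, which mirrors the (WSE, N*) grouping in the slide's example.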
CART (Nominal Target) • Nominal Targets: • GINI (Impurity Reduction or Entropy) • Squared probability of node membership • Gini = 0 when targets are perfectly classified • Gini Index = $1 - \sum_i p_i^2$ • Example • Prob: Bus = 0.4, Car = 0.3, Train = 0.3 • Gini = 1 – (0.4^2 + 0.3^2 + 0.3^2) = 0.660
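The Gini calculation in the slide's example can be checked with a short function. This is a minimal sketch; `gini_index` is a name chosen for illustration.

```python
def gini_index(probabilities):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p ** 2 for p in probabilities)

# Slide example: Bus = 0.4, Car = 0.3, Train = 0.3
print(round(gini_index([0.4, 0.3, 0.3]), 3))
# A perfectly classified node (one class only) has zero impurity
print(gini_index([1.0, 0.0, 0.0]))
```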
CART (Metric Target) • Continuous Variables: Variance Reduction (F-test)
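For a metric target, the idea of variance reduction can be sketched by searching for the split point on one predictor that most reduces the within-node sum of squares. This is an illustrative simplification: it measures the raw reduction rather than running an F-test, the data are invented, and the names `sum_of_squares` and `best_split` are hypothetical.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, reduction) giving the largest drop in sum of squares."""
    pairs = sorted(zip(xs, ys))
    total = sum_of_squares([y for _, y in pairs])
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        reduction = total - sum_of_squares(left) - sum_of_squares(right)
        if reduction > best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, reduction)
    return best

# Hypothetical data: customer age (predictor) and monthly spend (target)
ages = [20, 25, 30, 50, 55, 60]
spend = [10, 12, 11, 30, 32, 31]
print(best_split(ages, spend))
```

The chosen threshold is the midpoint between the two neighbouring predictor values where the split leaves the child nodes most homogeneous in the target.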
Comparative Advantages (From Wikipedia) • Simple to understand and interpret • Requires little data preparation • Able to handle both numerical and categorical data • Uses a white box model easily explained by Boolean logic • Possible to validate a model using statistical tests • Robust
Where to get help? http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp
Top line from Chapter 13 -1 • Analytics helps you to predict which recipients of your direct mail will buy your products, and which are not likely to buy. At $500 per thousand pieces, analytics can save you a lot of money. • Analytics is not as useful for e-mail marketing. The cost of appending data and the modeling often results in a loss, since the cost of mailing is only $6 per thousand. • Predictive models are based on previous promotions. You add demographic data (age, income, value of home, etc.) to a sample of your file and determine the differences between responders and non-responders. • Predictive modeling uses multiple regressions. It results in an algorithm—a mathematical formula that can be used to “score” any direct mailing file that has demographics appended, and predict, before you mail, which ones are going to respond. • Modeling does not always work. Sometimes what makes people buy is not based on demographics.
Top line from Chapter 13 -2 • Analytics can be used to reduce unsubscribes. If you have done LTV and know the value of your subscribers, you can calculate how much analytics would save you by not mailing unwanted material to some subscribers. • Very few e-mail marketers are doing any predictive modeling today, with good reason. • Direct mail gets higher response rates than e-mail partly because the shelf life of a direct mail piece or catalog can be weeks or months. An e-mail’s shelf life is one day or less. • Modeling can be useful for cross-sales—determining what other products your customers might buy. • Next-best product analytics and churn predictive analytics can be very profitable.
Top line from Chapter 13 -3 • CHAID is very useful for dividing your database into segments containing people with different interests and response rates. • Descriptive analytics is useful for advertising campaigns, but seldom useful for direct mail. • Clickstream data analysis can be very useful in planning the layout of a Web site or an e-mail. • Key performance indicators (KPIs) can help you determine the relative success of e-mail programs.