Support Vector Machines in Marketing. Georgi Nalbantov, MICC, Maastricht University
Contents • Purpose • Linear Support Vector Machines • Nonlinear Support Vector Machines • (Theoretical justifications of SVM) • Marketing Examples • Conclusion and Q & A • (some extensions)
Purpose • Task to be solved (The Classification Task): Classify cases (customers) into “type 1” or “type 2” on the basis of some known attributes (characteristics) • Chosen tool to solve this task: Support Vector Machines
The Classification Task • Given data on explanatory and explained variables, where the explained variable can take two values {±1}, find a function that gives the “best” separation between the “−1” cases and the “+1” cases: Given: (x1, y1), …, (xm, ym) ∈ ℝⁿ × {±1} Find: f : ℝⁿ → {±1} “best function” = one whose expected error on unseen data (xm+1, ym+1), …, (xm+k, ym+k) is minimal • Existing techniques to solve the classification task: • Linear and Quadratic Discriminant Analysis • Logit choice models (Logistic Regression) • Decision trees, Neural Networks, Least Squares SVM
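To make the setup concrete, here is a minimal sketch in Python, assuming scikit-learn and invented data (the names and values are illustrative, not from the slides):

```python
# Minimal sketch of the classification task (invented data; assumes scikit-learn).
import numpy as np
from sklearn.svm import SVC

# Given: m labeled cases (x_i, y_i), with x_i in R^n and y_i in {-1, +1}
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))                 # m = 100 cases, n = 2 attributes
y_train = np.sign(X_train[:, 0] + X_train[:, 1])    # toy labels in {-1, +1}

# Find: f : R^n -> {-1, +1} with small expected error on unseen cases
f = SVC(kernel="linear").fit(X_train, y_train)

X_unseen = rng.normal(size=(10, 2))                 # (x_{m+1}, ...), labels unknown
print(f.predict(X_unseen))                          # predicted labels in {-1, +1}
```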
Support Vector Machines: Definition • Support Vector Machines are a non-parametric tool for classification/regression • Support Vector Machines are used for prediction rather than description purposes • Support Vector Machines have been developed by Vapnik and co-workers
Linear Support Vector Machines • A direct marketing company wants to sell a new book: “The Art History of Florence” • Nissan Levin and Jacob Zahavi in Lattin, Carroll and Green (2003) • Problem: How to identify buyers and non-buyers using the two variables: • Months since last purchase • Number of art books purchased [Scatter plot: number of art books purchased vs. months since last purchase; ∆ buyers, ● non-buyers]
Linear SVM: Separable Case • Main idea of SVM: separate the groups by a line. • However: there are infinitely many lines that have zero training error… • …which line shall we choose? [Scatter plot: several candidate separating lines; ∆ buyers, ● non-buyers]
Linear SVM: Separable Case • SVMs use the idea of a margin around the separating line. • The thinner the margin, the more complex the model. • The best line is the one with the largest margin. [Scatter plot: separating line with its margin; ∆ buyers, ● non-buyers]
Linear SVM: Separable Case • The line having the largest margin is: w1x1 + w2x2 + b = 0 • where • x1 = months since last purchase • x2 = number of art books purchased • Note: • w1xi1 + w2xi2 + b ≥ +1 for each buyer i (∆) • w1xj1 + w2xj2 + b ≤ −1 for each non-buyer j (●) [Plot: the line w1x1 + w2x2 + b = 0 with margin boundaries w1x1 + w2x2 + b = ±1]
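As a sketch of how this line can be found in practice, assuming scikit-learn and invented buyer data (a large C approximates the separable, hard-margin case):

```python
# Sketch: fit a linear SVM to illustrative buyer data and read off the
# separating line w1*x1 + w2*x2 + b = 0. The data values are invented.
import numpy as np
from sklearn.svm import SVC

# x1 = months since last purchase, x2 = number of art books purchased
X = np.array([[2, 5], [3, 6], [4, 5], [5, 7], [6, 6], [7, 8],      # buyers (+1)
              [6, 1], [7, 2], [8, 1], [9, 3], [10, 2], [11, 1]])   # non-buyers (-1)
y = np.array([1] * 6 + [-1] * 6)

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ hard-margin (separable) case
w1, w2 = clf.coef_[0]
b = clf.intercept_[0]
print(f"separating line: {w1:.2f}*x1 + {w2:.2f}*x2 + {b:.2f} = 0")
```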
Linear SVM: Separable Case • The width of the margin is given by: 2 / ||w|| • Note: maximizing the margin ⇔ minimizing ||w|| ⇔ minimizing ½||w||² [Plot: the margin between the boundaries w1x1 + w2x2 + b = ±1]
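For reference, the width formula follows from standard SVM algebra; a short worked derivation in LaTeX (reconstructed, not copied from the slides):

```latex
% Take x^+ on the plane  w \cdot x + b = +1  and  x^- on  w \cdot x + b = -1.
% Subtracting the two plane equations gives  w \cdot (x^+ - x^-) = 2;
% projecting x^+ - x^- onto the unit normal w/\|w\| gives the distance between the planes:
\[
\text{width} \;=\; \frac{w}{\|w\|} \cdot \left(x^{+} - x^{-}\right) \;=\; \frac{2}{\|w\|}
\]
% Hence maximizing the margin is equivalent to minimizing \|w\|, or \tfrac{1}{2}\|w\|^{2}.
```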
Linear SVM: Separable Case • The optimization problem for SVM is: minimize ½||w||² • subject to: • w1xi1 + w2xi2 + b ≥ +1 for each buyer i (∆) • w1xj1 + w2xj2 + b ≤ −1 for each non-buyer j (●)
Linear SVM: Separable Case • “Support vectors” are those points that lie on the boundaries of the margin • The decision surface (line) is determined only by the support vectors. All other points are irrelevant [Plot: the support vectors highlighted on the margin boundaries]
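The fitted model from the sketch above exposes the support vectors directly; repeated standalone here with scikit-learn's attribute names:

```python
# Sketch: inspect the support vectors of the linear SVM (same invented data as above).
import numpy as np
from sklearn.svm import SVC

X = np.array([[2, 5], [3, 6], [4, 5], [5, 7], [6, 6], [7, 8],
              [6, 1], [7, 2], [8, 1], [9, 3], [10, 2], [11, 1]])
y = np.array([1] * 6 + [-1] * 6)
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print(clf.support_vectors_)   # the points lying on the margin boundaries
print(clf.n_support_)         # number of support vectors per class
# Dropping any non-support point and refitting leaves the line unchanged:
# the decision function depends only on the support vectors.
```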
Linear SVM: Nonseparable Case • Training set: 1000 targeted customers • Non-separable case: there is no line that separates the two groups without errors • Here, SVM minimizes L(w,C) = Complexity + Errors = ½||w||² + C Σ ξi • subject to: • w1xi1 + w2xi2 + b ≥ +1 − ξi for each buyer i (∆) • w1xj1 + w2xj2 + b ≤ −1 + ξj for each non-buyer j (●) • ξi, ξj ≥ 0 • The first term maximizes the margin; the second minimizes the training errors [Scatter plot: overlapping classes around the margin; ∆ buyers, ● non-buyers]
Linear SVM: The Role of C [Two plots: separating line and margin for C = 5 and C = 1; ∆ buyers, ● non-buyers] • Bigger C: increased complexity (thinner margin), smaller number of errors (better fit on the data) • Smaller C: decreased complexity (wider margin), bigger number of errors (worse fit on the data) • C varies both complexity and empirical error, by affecting the optimal w and the optimal number of training errors. A sketch of this trade-off follows below.
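A minimal sketch of the trade-off on invented, overlapping data (assumes scikit-learn): sweep C and report the margin width 2/||w|| together with the number of training errors.

```python
# Sketch of the role of C on non-separable toy data:
# smaller C -> wider margin, more training errors; bigger C -> thinner margin, fewer.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0.0, size=(50, 2)),
               rng.normal(loc=1.5, size=(50, 2))])   # two overlapping groups
y = np.array([1] * 50 + [-1] * 50)

for C in [0.01, 1, 5, 100]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    width = 2 / np.linalg.norm(clf.coef_[0])         # margin width = 2 / ||w||
    errors = int((clf.predict(X) != y).sum())
    print(f"C={C:>6}: margin width={width:.2f}, training errors={errors}")
```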
Nonlinear SVM: Nonseparable Case • Mapping the data into a higher-dimensional space: x ↦ Φ(x) • Optimization task: minimize L(w,C) = ½||w||² + C Σ ξi • subject to: • w · Φ(xi) + b ≥ +1 − ξi for each buyer i (∆) • w · Φ(xj) + b ≤ −1 + ξj for each non-buyer j (●) • ξi, ξj ≥ 0 [Scatter plot: data that no line separates well; ∆ buyers, ● non-buyers]
Nonlinear SVM: Nonseparable Case • Map the data into a higher-dimensional space: ℝ² → ℝ³ [Plot: the four corner points (−1,1), (1,1), (−1,−1), (1,−1) in the original (x1, x2) space, labeled ∆ and ● in an XOR pattern]
Nonlinear SVM: Nonseparable Case • Find the optimal hyperplane in the transformed space [Plot: the mapped corner points in ℝ³ with a separating hyperplane]
Nonlinear SVM: Nonseparable Case • Observe the decision surface in the original space (optional) [Plot: the nonlinear decision boundary induced in the original (x1, x2) space]
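The slides do not name the map explicitly; one common choice that reproduces this ℝ² → ℝ³ picture is Φ(x1, x2) = (x1², √2·x1·x2, x2²), sketched below (the ∆/● labeling of the four corners is assumed to follow the XOR pattern):

```python
# Sketch of an explicit degree-2 feature map R^2 -> R^3 (an assumption;
# the slides do not spell out the exact map).
import numpy as np

X = np.array([[-1, 1], [1, 1], [-1, -1], [1, -1]])   # the four corner points
y = X[:, 0] * X[:, 1]        # assumed XOR labels: +1 for (1,1), (-1,-1); -1 otherwise

def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

Z = np.apply_along_axis(phi, 1, X)
print(Z)
# The middle coordinate sqrt(2)*x1*x2 equals +sqrt(2) for one class and
# -sqrt(2) for the other, so a separating plane exists in the transformed space.
```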
Nonlinear SVM: Nonseparable Case • Dual formulation of the (primal) SVM minimization problem • Primal: minimize over w, b, ξ: ½||w||² + C Σi ξi, subject to yi (w · Φ(xi) + b) ≥ 1 − ξi and ξi ≥ 0 for all i • Dual: maximize over α: Σi αi − ½ Σi Σj αi αj yi yj (Φ(xi) · Φ(xj)), subject to 0 ≤ αi ≤ C for all i and Σi αi yi = 0
Nonlinear SVM: Nonseparable Case • Dual formulation of the (primal) SVM minimization problem • Dual: maximize over α: Σi αi − ½ Σi Σj αi αj yi yj K(xi, xj), subject to 0 ≤ αi ≤ C for all i and Σi αi yi = 0 • where K(xi, xj) = Φ(xi) · Φ(xj) is the kernel function: it returns the inner product in the transformed space directly from the original inputs, so Φ never has to be computed explicitly
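In practice this dual is handed to an off-the-shelf solver. A minimal sketch, assuming scikit-learn (whose SVC solves exactly this kind of dual internally) and invented toy data; dual_coef_ holds the products yi·αi for the support vectors:

```python
# Sketch: kernel matrix and dual coefficients via scikit-learn (invented data).
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.sign(X[:, 0] * X[:, 1])            # invented nonlinear labels in {-1, +1}

K = rbf_kernel(X, X, gamma=0.5)           # the kernel matrix K(x_i, x_j)
print(K[:3, :3])                          # inner products in the transformed space

clf = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)
print(clf.dual_coef_)                     # y_i * alpha_i for the support vectors
print(clf.support_)                       # indices of the support vectors
```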
Strengths and Weaknesses of SVM • Strengths of SVM: • Training is relatively easy • No local minima • It scales relatively well to high-dimensional data • Trade-off between classifier complexity and error can be controlled explicitly via C • Robustness of the results • The “curse of dimensionality” is avoided • Weaknesses of SVM: • What is the best trade-off parameter C? • Need a good transformation of the original space
The Ketchup Marketing Problem • Two types of ketchup: Heinz and Hunts • Seven Attributes • Feature Heinz • Feature Hunts • Display Heinz • Display Hunts • Feature&Display Heinz • Feature&Display Hunts • Log price difference between Heinz and Hunts • Training Data: 2498 cases (89.11% Heinz is chosen) • Test Data: 300 cases (88.33% Heinz is chosen)
The Ketchup Marketing Problem • Choose a kernel mapping: • Linear kernel • Polynomial kernel • RBF kernel • Do a (5-fold) cross-validation procedure to find the best combination of the manually adjustable parameters (here: C and σ) [Heatmap: cross-validation mean squared errors for the SVM with RBF kernel, over a grid of C and σ values]
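A sketch of that 5-fold grid search with scikit-learn (invented placeholder data stands in for the 2498 ketchup cases; note scikit-learn parameterizes the RBF kernel by gamma = 1/(2σ²) rather than by σ):

```python
# Sketch of a 5-fold cross-validation grid search over C and the RBF width.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Invented placeholder data, standing in for the ketchup training set
# (2498 cases, 7 attributes) described on the earlier slide.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 7))
y_train = rng.choice([-1, 1], size=200)

param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.01, 0.1, 1, 10],   # scikit-learn's stand-in for the sigma axis
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)   # 5-fold CV
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```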
Conclusion • Support Vector Machines (SVM) can be applied to binary and multi-class classification problems • SVM behave robustly in multivariate problems • Further research in various Marketing areas is needed to justify or refute the applicability of SVM • Support Vector Regression (SVR) can also be applied • http://www.kernel-machines.org • Email: nalbantov@few.eur.nl