Applying Data Mining Technique to Direct Marketing
Advisor: Dr. Hsu
Student: Sheng-Hsuan Wang
Department of Information Management, National Yunlin University of Science and Technology
Outline
• Motivation
• Objective
• Introduction
• Background
• The Generalized SOM
• Experiments
• Conclusions
Motivation
• Firms holding huge amounts of complex marketing data need to analyze the data further in order to increase profits.
• Clustering, a data mining technique, is especially suitable for segmenting data.
• However, a firm's database usually consists of mixed data (numeric and categorical).
Objective
• We use a new visualized clustering algorithm, the generalized self-organizing map (GSOM), to segment customer data for direct marketing.
• Unlike the conventional SOM, the GSOM can reasonably express the relative distances between categorical values.
• We then apply the GSOM to direct marketing, with the aim of generating more profit.
Introduction (1/5)
• Marketing practice has shifted from traditional mass marketing to customer-oriented marketing.
• Firms usually perform market segmentation and devise different marketing strategies for different segments.
Introduction (2/5)
• Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from large amounts of data.
• Cluster analysis can help marketers identify clusters of customers with similar characteristics.
Introduction (3/5)
• The self-organizing map (SOM) network, proposed by Kohonen, is a useful visualization tool in data mining.
  • Dimensionality reduction and information visualization
  • Preserves the original topological relationships
Introduction (4/5)
• How the SOM handles categorical data
  • It uses binary encoding, which transforms each categorical value into a set of binary values.
Introduction (5/5)
• In this paper, we propose an extended SOM, named the generalized SOM (GSOM), to overcome this drawback in handling categorical data.
• We construct a concept hierarchy for each categorical attribute.
Background (1/2)
• Self-organizing map (SOM)
• Find the winner, the best matching unit (BMU), by Eq. (1)
• Update the winner and its neighborhood by Eq. (2)
[Equations (1) to (3) appear as images on the original slide.]
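As a rough illustration of the two steps on this slide, the sketch below follows the textbook Kohonen formulation: the winner is the unit with the smallest Euclidean distance to the input, and every unit moves toward the input in proportion to a learning rate and a neighborhood weight. The Gaussian neighborhood, the function names, and the toy 1×3 map are assumptions made for illustration and may differ from Eqs. (1) to (3) on the slide.

```python
import math

def find_bmu(weights, x):
    """Index of the unit whose weight vector is closest to x (Euclidean)."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))

def som_update(weights, coords, x, lr, sigma):
    """One training step: pull the BMU and its map neighbors toward x."""
    bmu = find_bmu(weights, x)
    for i, w in enumerate(weights):
        # Squared distance on the map grid (not in input space).
        grid_d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[bmu]))
        # Gaussian neighborhood weight: an assumed, commonly used choice.
        h = math.exp(-grid_d2 / (2 * sigma ** 2))
        weights[i] = [wj + lr * h * (xj - wj) for wj, xj in zip(w, x)]
    return bmu

# Toy 1x3 map with 2-D weight vectors (hypothetical values).
coords = [(0, 0), (0, 1), (0, 2)]
weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
bmu = som_update(weights, coords, x=[0.9, 0.9], lr=0.5, sigma=1.0)
print(bmu, weights)
```

The BMU moves the most because its neighborhood weight is 1; units farther away on the grid move less.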
Background (2/2)
• Problem of the conventional SOM: with binary encoding, every pair of distinct categorical values is equally distant.
  D(Coke, Pepsi) = D(Coke, Mocca) = D(Pepsi, Mocca)
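The equal-distance problem can be checked with a small sketch. It assumes one-of-N (one-hot) binary encoding and Euclidean distance; the helper name `one_hot` is made up for this example.

```python
import math

drinks = ["Coke", "Pepsi", "Mocca"]

def one_hot(value, categories):
    """Binary encoding: 1 in the position of the value, 0 elsewhere."""
    return [1.0 if c == value else 0.0 for c in categories]

d_coke_pepsi = math.dist(one_hot("Coke", drinks), one_hot("Pepsi", drinks))
d_coke_mocca = math.dist(one_hot("Coke", drinks), one_hot("Mocca", drinks))
d_pepsi_mocca = math.dist(one_hot("Pepsi", drinks), one_hot("Mocca", drinks))

# All three distances are identical, so the encoding cannot express that
# Coke is more similar to Pepsi than to Mocca.
print(d_coke_pepsi, d_coke_mocca, d_pepsi_mocca)
```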
The Generalized SOM
• We use concept hierarchies to help calculate the distances between categorical values.
• An input pattern and a GSOM vector are each mapped to their associated concept hierarchies.
• The distance between the input pattern and the GSOM vector is the aggregated distance between their mapping points in the hierarchies.
[Figure: an input pattern x and a vector mq of the SOM network mapped onto the Drink concept hierarchy. Example data: ID 1 = Coke, ID 2 = Pepsi, ID 3 = Mocca. Hierarchy: Any → Juice (Orange, Apple), Coffee (Latte, Mocca), Carbonated (Coke, Pepsi).]
Concept hierarchies (1/3)
• D(Coke, Pepsi) < D(Coke, Mocca) = D(Pepsi, Mocca)
[Figure: concept hierarchy running from general concepts (root, level 0) to specific concepts (leaves, level 2); every link has weight 1.]
Concept hierarchies (2/3)
• A point X = (NX, dX)
  • NX: the anchor (leaf node) of point X
  • dX: a positive offset, the distance from the root to X
• Example: x = (Coke, 2.0); mq = (Pepsi, 1.7)
[Figure: the input pattern x and the SOM network vector mq = (Pepsi, 1.7) placed on the Drink hierarchy: Any → Juice (Orange, Apple), Coffee (Latte, Mocca), Carbonated (Coke, Pepsi).]
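Concept hierarchies (3/3)
• Example: x = (Coke, 2.0); mq = (Pepsi, 1.7)
• The distance is the sum of the two offsets minus twice the path segment that both points share from the root (here the link from Any to Carbonated, of length 1):
  |x − mq| = 2 + 1.7 − 2×1 = 1.7
[Figure: Drink hierarchy with levels 0, 1, 2; the red path marks dx, the blue path marks dmq, and the shared segment is labeled "duplication". Equations (4) and (5) appear as images on the original slide.]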
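The worked example can be reproduced with a short sketch. It assumes that the shared segment is the smaller of the two offsets and the depth of the deepest common ancestor of the two anchors; this rule is inferred from the example above, not taken from Eqs. (4) and (5), and the function names are hypothetical.

```python
# Parent links of the Drink hierarchy from the slides.
PARENT = {
    "Juice": "Any", "Coffee": "Any", "Carbonated": "Any",
    "Orange": "Juice", "Apple": "Juice",
    "Latte": "Coffee", "Mocca": "Coffee",
    "Coke": "Carbonated", "Pepsi": "Carbonated",
}
LINK_WEIGHT = 1.0  # every link weight is set to 1, as in the experiments

def path_from_root(node):
    """Nodes from the root down to the given node."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]

def common_ancestor_depth(a, b):
    """Weighted depth of the deepest node shared by both root paths."""
    shared = 0
    for u, v in zip(path_from_root(a), path_from_root(b)):
        if u != v:
            break
        shared += 1
    return (shared - 1) * LINK_WEIGHT

def gsom_distance(x, y):
    """Distance between points (anchor, offset) on the hierarchy."""
    (nx, dx), (ny, dy) = x, y
    # Shared segment from the root: assumed rule, see the lead-in.
    shared = min(dx, dy, common_ancestor_depth(nx, ny))
    return dx + dy - 2 * shared

d = gsom_distance(("Coke", 2.0), ("Pepsi", 1.7))  # approximately 1.7
print(d)
```

With both points at their leaves, the same function gives a distance of 2 for Coke and Pepsi and 4 for Coke and Mocca, consistent with D(Coke, Pepsi) < D(Coke, Mocca).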
Experiments
• Experiment datasets
  • Synthetic dataset: 6 groups of patterns described by two categorical attributes, Department and Drink.
  • Real dataset: Adult, from the UCI repository.
    • 48,842 patterns with 15 attributes.
    • 8 categorical attributes, 6 numerical attributes, and 1 class attribute, Salary.
    • 76% of the patterns have Salary ≤50K.
Experiments
• Parameters were set according to the suggestions in the software package SOM_PAK.
• Categorical values are transformed to binary values to train the SOM, whereas mixed data are used directly to train the GSOM.
• Each link weight in the concept hierarchies is set to 1.
Synthetic dataset (1/2)
[Figure: the Department and Drink attributes of the synthetic dataset.]
Synthetic dataset (2/2)
• An 8×8 SOM network is used for training. The maps of the SOM and the GSOM, trained for 900 iterations under the same parameters, are shown below.
[Figure panels: Binary SOM; GSOM.]
Real dataset (1/3)
• We randomly draw 10,000 patterns, of which 75.76% have Salary ≤50K, similar to the Salary distribution of the original Adult dataset.
• Three categorical attributes: Marital-status, Relationship, and Education.
• Four numeric attributes: Capital-gain, Capital-loss, Age, and Hours-per-week.
Real dataset (2/3)
• The concept hierarchies constructed for the categorical attributes are shown below.
[Figure: concept hierarchies for Relationship, Marital-status, and Education.]
Real dataset (3/3)
• A 15×15 SOM network is used for training. The maps of the SOM and the GSOM, trained for 50,000 iterations under the same parameters, are shown below.
[Figure panels: Binary SOM; GSOM.]
Application to Direct Marketing (1/2)
• After the GSOM has clustered the data, the segmented dataset can be applied to catalog marketing.
• Suppose that
  • mailing a catalog costs $2;
  • a customer whose salary is over 50K yields an average profit of $10;
  • any other customer yields an average profit of $1.
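The cost model above can be sketched as follows. The sketch assumes that the stated profits are before mailing cost, so the net profit of a mailing is the total profit minus $2 per catalog; the slides do not say how their totals were computed. The customer counts in the example are hypothetical and are not the experiment's data.

```python
MAIL_COST = 2.0     # cost of mailing one catalog
PROFIT_HIGH = 10.0  # average profit from a customer with salary over 50K
PROFIT_LOW = 1.0    # average profit from any other customer

def net_profit(n_high, n_low):
    """Net profit of mailing n_high high-salary and n_low other customers
    (assumes profits are stated before mailing cost)."""
    gross = n_high * PROFIT_HIGH + n_low * PROFIT_LOW
    return gross - (n_high + n_low) * MAIL_COST

def break_even_fraction():
    """Share of high-salary customers at which a mailing breaks even."""
    return (MAIL_COST - PROFIT_LOW) / (PROFIT_HIGH - PROFIT_LOW)

# Hypothetical segment: 1,000 customers, 300 of them with salary over 50K.
example = net_profit(300, 700)
print(example, break_even_fraction())
```

Under this assumption, mailing a segment pays off only when more than one in nine of its customers has a salary over 50K, which is why targeting segments rich in such customers matters.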
Application to Direct Marketing (2/2)
[Figure: profit comparison; the two values shown are $14,344 and $7,505.]
Conclusions
• In this paper, we propose a data clustering method, the GSOM.
• The GSOM extends the conventional SOM and overcomes its drawback in handling categorical data by using concept hierarchies.
• The experimental results confirm that the GSOM reveals the cluster structure of the data better than the conventional SOM does.
• Marketing based on the GSOM segmentation results yields more profit than marketing to customers drawn at random from the customer database.