Estimating Business Targets. Advisor: Dr. Hsu Graduate: Yung-Chu Lin Data Source: Datta et al., KDD01, pp. 420-425. Abstract . Propose a new solution to the classical econometric task of frontier analysis Combine nearest neighbor methods and classical statistical methods
Estimating Business Targets Advisor: Dr. Hsu Graduate: Yung-Chu Lin Data Source: Datta et al., KDD01, pp. 420-425. IDSL seminar
Abstract • Propose a new solution to the classical econometric task of frontier analysis • Combine nearest neighbor methods and classical statistical methods • Identify under-marketed customers • Benchmark regional directory divisions
Outline • Motivation • Objective • Historical approaches • Target estimation methodology • Case study • Conclusion • Personal opinion
Motivation • Setting targets is a critical task • Traditionally, the target for each entity is set to the average across the entities • Two challenges • The characteristics of the entities heavily influence the outcome • The problem is inherently unsupervised
Objective • Provide a methodology for estimating unsupervised maximal or minimal targets • Setting revenue target expectations for individual customers • Revenue target setting for regional yellow page directories
Historical Approaches • Mathematical programming • Economics
Mathematical Programming • [equation not preserved in the transcript; it defines the target for xi, a vector for the ith observation] • Sensitive to errors and outliers, since it assumes that all observed targets define the possible space
Economics • [equation not preserved in the transcript; it involves a function g and a non-negative error term] • Requires a model for both the error term and g
Target Estimation Methodology • Nearest neighbor vs. clustering • The neighborhoods • The distance function • Target estimation from the neighborhoods • A heuristic for comparing neighborhoods
Nearest Neighbor vs. Clustering • Time complexity • Clustering is better than nearest neighbor • Problem with clustering • Two similar entities can fall into different clusters • The effect becomes more serious as dimensionality increases • Nearest neighbor does not have this problem
The Neighborhoods • xi: the ith observation • yi: the variable containing its target value • ni: the neighborhood for xi, a set of observations {xi, xj, …}
The Distance Function • Continuous variables: standardize • e.g. Continuous: (2,1) and (3,4) • Nominal: (a,b) and (a,c)
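The slide's formula did not survive extraction, so the following is only a minimal sketch of one common way to build a distance over mixed data. It assumes squared differences for standardized continuous values and a 0/1 mismatch count for nominal values. The name `mixed_distance`, the way the two parts are combined, and the reading of the slide's example pairs as two observations are all assumptions, not the authors' definition.

```python
import math

def mixed_distance(cont_a, cont_b, nom_a, nom_b):
    """Hypothetical mixed distance (an assumption, not the paper's formula):
    squared differences of continuous values plus one unit per mismatching
    nominal value, all under a square root."""
    cont_part = sum((a - b) ** 2 for a, b in zip(cont_a, cont_b))
    nom_part = sum(1 for a, b in zip(nom_a, nom_b) if a != b)
    return math.sqrt(cont_part + nom_part)

# The slide's example values, treated here as already standardized.
d_cont = mixed_distance((2, 1), (3, 4), (), ())         # sqrt(1 + 9)
d_nom = mixed_distance((), (), ("a", "b"), ("a", "c"))  # one mismatch
print(d_cont, d_nom)
```

Under these assumptions the continuous pair differs by sqrt(10) and the nominal pair by exactly one mismatch.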
Target Estimation From the Neighborhoods • Let yi(1), yi(2), …, yi(k) be the order statistics of the target values in neighborhood ni, so that yi(1) is the largest
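The estimator itself is not preserved in the transcript. As a rough sketch of how the order statistics could be used, the code below sorts the target values of the k nearest observations from largest to smallest and picks one of them as the maximal target. The function names and the `rank` parameter are hypothetical; choosing a rank above 1 is just one plausible way to reduce sensitivity to a single outlier, not necessarily what the authors do.

```python
def neighborhood_targets(distances, targets, k):
    """Return the target values of the k nearest observations, largest first.

    distances[j] is the distance from x_i to x_j; x_i itself has distance 0,
    so it is part of its own neighborhood, as on the Neighborhoods slide.
    """
    nearest = sorted(range(len(targets)), key=lambda j: distances[j])[:k]
    return sorted((targets[j] for j in nearest), reverse=True)

def estimate_max_target(ordered, rank=1):
    """Pick the rank-th largest value; rank=1 is the neighborhood maximum."""
    return ordered[rank - 1]

# Made-up distances from x_0 and made-up target values.
distances = [0.0, 0.4, 0.9, 0.2, 1.5]
targets = [100, 250, 80, 180, 900]
ordered = neighborhood_targets(distances, targets, k=4)
print(ordered, estimate_max_target(ordered), estimate_max_target(ordered, rank=2))
```

In this toy data the observation with target 900 is the farthest away, so it falls outside the neighborhood and does not affect the estimate.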
A Heuristic for Comparing Neighborhoods • Maximal frontier: E(xi) ranges from 0 to 1 • Minimal frontier: E(xi) ≥ 1
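The definition of E(xi) is missing from the transcript. The stated ranges are consistent with a ratio of the observed value to the estimated target, so the sketch below assumes E = y / target. That ratio form and the function name `efficiency` are assumptions made for illustration.

```python
def efficiency(observed, target):
    """Assumed form of E(x): observed value divided by the estimated target."""
    if target <= 0:
        raise ValueError("target must be positive for the ratio to be meaningful")
    return observed / target

# Maximal frontier: the target is an upper bound, so the ratio is at most 1.
e_max = efficiency(100, 250)
# Minimal frontier: the target is a lower bound, so the ratio is at least 1.
e_min = efficiency(120, 80)
print(e_max, e_min)
```

With this reading, values far from 1 flag the entities that deviate most from what their neighborhood suggests is achievable.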
Case Study • Target revenues for directory book advertisers • Target revenue for regional directories
(1) Target Revenues for Directory Book Advertisers • Goal • Find businesses that have low spending relative to those with otherwise similar characteristics • Three categories of data available • Advertiser: e.g. number of employees • Directory: e.g. distribution size • Market: e.g. median household income
Calculating Nearest Neighbors • Standardize continuous data using the natural log • K=4 • Weight the variables equally, except for reduced weights on many of the directory and market variables
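The steps on this slide can be sketched as follows, assuming a weighted Euclidean distance on log-transformed values. The data, the column order, the specific weights, and the function names are all made up for illustration. The slides do not say whether K counts xi itself, so this sketch counts it.

```python
import math

def log_standardize(rows):
    """Apply the natural log to every (positive) continuous value."""
    return [[math.log(v) for v in row] for row in rows]

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance (an assumed form, not taken from the slides)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def k_nearest(rows, i, weights, k=4):
    """Indices of the k observations closest to row i, including i itself."""
    order = sorted(range(len(rows)),
                   key=lambda j: weighted_distance(rows[i], rows[j], weights))
    return order[:k]

# Hypothetical columns: employees, distribution size, median household income.
raw = [
    [10, 50000, 40000],
    [12, 52000, 41000],
    [200, 500000, 60000],
    [11, 48000, 39000],
    [9, 51000, 42000],
    [180, 450000, 58000],
]
# Advertiser variable at full weight; directory and market variables reduced.
weights = [1.0, 0.5, 0.5]
neighbors = k_nearest(log_standardize(raw), 0, weights, k=4)
print(neighbors)
```

The two large advertisers (rows 2 and 5) end up outside the neighborhood of row 0, which is the behavior the log transform and weighting are meant to support.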
Distribution for E(x) for Advertisers
A Decision Tree to Predict phi -xi
(2) Target Revenue for Regional Directories • Goal • Benchmark regional directory divisions • Separate the data into two sets • Training set: 80% • Test set: 20% • K=4
Book Type • System book • Covers an entire serving area • System-neighborhood book • Covers a smaller number of geographic areas in the franchise area • Neighborhood book • Covers areas outside of the telephone company's franchise area
Four Different Distributions labeled according to the legend
The x-axis shows log(distribution) and the y-axis shows E(x) • Legend: Neighborhood books, System books, Non-system books
Conclusion • Presents a general data mining methodology for estimating business targets by frontier analysis • First case • Increase sales focus on the under-marketed customers • Increase the potential revenue by several million • Second case • Estimate optimal revenue performance targets for directory divisions • The potential increase for directory books is at least several million dollars
Personal opinion • Combining several existing methodologies or disciplines can produce a powerful new one