Profit Mining: From Patterns to Action. Ke Wang, Senqiang Zhou, Jiawei Han, Simon Fraser University
Why Profit Mining? • A major obstacle in data mining applications is the gap between: • statistic-based pattern extraction and • value-based decision making • Profit mining: • value-based data mining
An Example • Suppose we want to maximize profit. The association rules [AIS93] {Perfume} -> Lipstick (occurs more often) and {Perfume} -> Diamond (yields more profit) do not suggest which items (and at which prices) to recommend to a customer who bought Perfume. • Similar problems arise with correlation, classification, etc.
The Problem • Given: a set of transactions of the form {<I,P,Q>,…, <I,P,Q> | <I,P,Q>}, where I is an Item, P a Promotion code, and Q a Quantity, and | separates non-target items from target items. • Example: {<FlakedChick., $3,2> | <Sunchip,$1,1>} • Goal: recommend a target <I,P> to customers who buy non-target items, so as to maximize profit.
Not a Prediction Problem • An example (cost: $0.5 per pack): • 100 customers each bought 1 pack at $1/pack. Profit = 100(1-0.5) = $50. • 100 customers each bought 4 packs at $3.2/4-pack. Profit = 100(3.2-2) = $120. • Prediction repeats history. • Profit mining learns from history, by recommending the “right items” at the “right prices”.
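The arithmetic on this slide can be checked with a short sketch. The function name `total_profit` and the variable names are illustrative assumptions; the $0.5 unit cost is the value implied by the slide's 100(1-0.5).

```python
def total_profit(customers, price, cost):
    """Total profit when every customer buys one offer at `price` whose cost is `cost`."""
    return customers * (price - cost)

unit_cost = 0.5  # cost per pack, implied by the slide's profit formula

# Offer A: one pack for $1.
single = total_profit(100, price=1.0, cost=unit_cost)

# Offer B: a 4-pack for $3.2, which costs 4 * $0.5 = $2 to supply.
four_pack = total_profit(100, price=3.2, cost=4 * unit_cost)

print(f"single pack: ${single:.2f}, 4-pack: ${four_pack:.2f}")
```

The two totals match the slide's $50 and $120. A predictor trained on the first history would keep recommending the single pack, even though the 4-pack promotion earns more.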
Challenge I - notion of profit • A purely statistical approach favors • {Perfume} -> Lipstick • A purely profit-based approach favors • {Perfume} -> Diamond. • Profit mining considers: • both statistical significance and profit significance.
Challenge II - customer intention • Mining On Availability (MOA): • Paying a higher price implies a willingness to pay a lower price. • {<FC,$3>} -> <Sunchip,$1> can be extracted from the transaction {<FC,$5> | <Sunchip,$1.5>} • Recognizing this behavior brings new sales opportunities (at lower prices).
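A minimal sketch of the MOA matching idea, assuming promotions differ only in price (the slides' promotion codes may also encode packaging, which this ignores). The names `Sale`, `moa_matches`, and `rule_applies` are hypothetical.

```python
from typing import NamedTuple

class Sale(NamedTuple):
    item: str
    price: float

def moa_matches(offered: Sale, bought: Sale) -> bool:
    """Under MOA, a customer who bought at `bought.price` is assumed
    willing to buy the same item at any price that is not higher."""
    return offered.item == bought.item and offered.price <= bought.price

def rule_applies(body, target, non_target_sales, target_sale):
    """True if the rule body -> target can be extracted from one transaction."""
    body_ok = all(any(moa_matches(g, s) for s in non_target_sales) for g in body)
    return body_ok and moa_matches(target, target_sale)

# Transaction from the slide: {<FC,$5> | <Sunchip,$1.5>}
non_target_sales = [Sale("FC", 5.0)]
target_sale = Sale("Sunchip", 1.5)

# Rule from the slide: {<FC,$3>} -> <Sunchip,$1>
extracted = rule_applies([Sale("FC", 3.0)], Sale("Sunchip", 1.0),
                         non_target_sales, target_sale)
print(extracted)
```

The same check rejects a rule that recommends Sunchip at $2, because the transaction gives no evidence that the customer would pay more than $1.5.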
Challenge III - search space • Thousands of items, and many more sales. Any combination can trigger a recommendation. • Searching over alternative concepts (food, meat, etc.) and prices makes it worse.
Step 1: generating rules • Association rules: • {Diaper} -> Beer, supp=10%, conf=80% • Recommendation rules: • {g1,…,gk} -> <I,P>, where each gi is an <Item,Price> pair, an Item, or a Concept. • {<FlakedChick., $3.8>} -> <Sunchip,$4.5> • {FlakedChick.} -> <Sunchip,$4.5> • {Meat} -> <Sunchip,$4.5>
Step 2: building the model • We rank rules by the “average profit” generated by a rule's recommendation. • {<FC,$3.5>} -> <Sunchip,$1> matches • t1: {<FC,$4.0> | <Sunchip,$2>} (a hit) • t2: {<FC,$4.5> | <Milk,$3.5>} (a miss) • If the cost of Sunchip is $0.7, the hit earns $1 - $0.7 = $0.3 and the miss earns $0, so the average profit is $0.3/2 = $0.15. • To recommend, we select the highest-ranked matching rule.
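The ranking step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a rule body is a dict of item to price, matching follows MOA on price alone, each hit sells exactly one unit, and the names `matches`, `average_profit`, and `recommend` are hypothetical.

```python
def matches(body, non_target):
    """A body matches if the customer bought each body item at that price or higher (MOA)."""
    return all(item in non_target and non_target[item] >= price
               for item, price in body.items())

def average_profit(rule, transactions, cost):
    """Profit from hits divided by the number of matched transactions."""
    body, (t_item, t_price) = rule
    matched = [t for t in transactions if matches(body, t["non_target"])]
    if not matched:
        return 0.0
    profit = 0.0
    for t in matched:
        item, paid = t["target"]
        if item == t_item and paid >= t_price:  # a hit
            profit += t_price - cost[t_item]    # assumes one unit sold per hit
    return profit / len(matched)

def recommend(rules, transactions, cost, basket):
    """Return the target of the highest-ranked rule whose body matches the basket."""
    ranked = sorted(rules, key=lambda r: average_profit(r, transactions, cost),
                    reverse=True)
    for body, target in ranked:
        if matches(body, basket):
            return target
    return None

# The two transactions and the rule from the slide.
transactions = [
    {"non_target": {"FC": 4.0}, "target": ("Sunchip", 2.0)},  # t1: a hit
    {"non_target": {"FC": 4.5}, "target": ("Milk", 3.5)},     # t2: a miss
]
cost = {"Sunchip": 0.7}
rule = ({"FC": 3.5}, ("Sunchip", 1.0))

avg = average_profit(rule, transactions, cost)
print(f"average profit: ${avg:.2f}")
```

With the slide's data the rule matches both transactions, hits once, and averages $0.15, the figure given above.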
Step 3: Pruning the model • The model favors rules with a “high average profit”. • Such rules may bring a large profit. • Such rules may also be random noise. • We cannot prune them based on statistical frequency alone.
Pruning the model • We prune rules to increase the estimated profit on the whole population. • We organize rules into a specificity tree: the parent of a rule is its highest-ranked more general rule. • We cut off subtrees to maximize the estimated profit.
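One way to picture the cut is a bottom-up pass over the specificity tree. The sketch below is a generic tree-pruning scheme, not necessarily the authors' procedure: it assumes each node already carries two estimated profits (how the slides' estimate is computed is not stated here), and all names and numbers are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    rule: str
    profit_as_leaf: float  # estimated profit if this rule covers its whole subtree
    profit_own: float      # estimated profit on what this rule covers when children stay
    children: list = field(default_factory=list)

def prune(node):
    """Return the best estimated profit of the subtree rooted at `node`,
    cutting off its children whenever keeping them does not add profit."""
    kept = node.profit_own + sum(prune(child) for child in node.children)
    if node.profit_as_leaf >= kept:
        node.children = []  # the more general rule takes over
        return node.profit_as_leaf
    return kept

# Hypothetical estimates for the three rules from Step 1.
specific = Node("{<FlakedChick.,$3.8>} -> <Sunchip,$4.5>", 1.0, 1.0)
middle = Node("{FlakedChick.} -> <Sunchip,$4.5>", 6.0, 4.0, [specific])
root = Node("{Meat} -> <Sunchip,$4.5>", 5.0, 2.0, [middle])

best = prune(root)
print(best, [child.rule for child in root.children])
```

With these made-up numbers the most specific rule is cut, because the item-level rule alone is estimated to earn more, while the item-level rule survives under the concept-level root.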
Evaluation • Synthetic datasets: generated with the IBM synthetic data generator, modified to include price and cost. • 1000 items and 1000K transactions • For non-target item i: • cost(i) = c/i • price_j(i) = (1 + j*10%) * cost(i), for j = 1, 2, 3, 4 • For target items: • Dataset I has 2 target items • Dataset II has 10 target items
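The cost and price scheme for non-target items can be written out as a short sketch. The constant c is not given on the slide, so the default c = 100.0 below is purely a placeholder assumption, as are the function names.

```python
def cost(i, c=100.0):
    """Cost of non-target item i (1-based): cost(i) = c / i. The value of c is assumed."""
    return c / i

def prices(i, c=100.0):
    """The four prices of item i: a 10%, 20%, 30%, and 40% markup over cost."""
    return [(1 + j * 0.10) * cost(i, c) for j in (1, 2, 3, 4)]

for i in (1, 10, 1000):
    print(i, round(cost(i), 3), [round(p, 3) for p in prices(i)])
```

Under this scheme, items with a higher index are cheaper, and each item's four prices carry profit margins of 10% to 40% of its cost.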
Conclusion • Proposed a new direction for data mining: mining for profit. • Directly factors business goals into data mining. • Related work: microeconomic view of data mining [KPR98]