Cost-Sensitive Learning for Large-Scale Hierarchical Classification of Commercial Products. Jianfu Chen, David S. Warren, Stony Brook University.
Classification is a fundamental problem in information management.
• Email content: Spam or Ham
• Product description: a node in the UNSPSC taxonomy
  Segment: Vehicles and their Accessories and Components (25); Office Equipment and Accessories and Supplies (44); Food Beverage and Tobacco Products (50)
  Family: Motor vehicles (10); Marine transport (11); Aerospace systems (20)
  Class: Passenger motor vehicles (15); Product and material transport vehicles (16); Safety and rescue vehicles (17)
  Commodity: Buses (02); Automobiles or cars (03); Limousines (06)
How should we design a classifier for a given real world task?
Method 1. No design
• Try off-the-shelf classifiers: SVM, logistic regression, decision tree, neural network, ...
• Fit f(x) on the training set, then evaluate on the test set
• Implicit assumption: we are trying to minimize error rate or, equivalently, maximize accuracy
Method 2. Optimize what we really care about What’s the use of the classifier? How do we evaluate the performance of a classifier according to our interests? Quantify what we really care about Optimize what we care about
Hierarchical classification of commercial products: a textual product description is assigned to a node of the UNSPSC taxonomy.
  Segment: Vehicles and their Accessories and Components (25); Office Equipment and Accessories and Supplies (44); Food Beverage and Tobacco Products (50)
  Family: Motor vehicles (10); Marine transport (11); Aerospace systems (20)
  Class: Passenger motor vehicles (15); Product and material transport vehicles (16); Safety and rescue vehicles (17)
  Commodity: Buses (02); Automobiles or cars (03); Limousines (06)
Product taxonomy helps customers to find desired products quickly.
• Facilitates exploring similar products
• Helps product recommendation
• Facilitates corporate spend analysis
Example: looking for gift ideas for a kid? Browse Toys & Games: dolls, puzzles, building toys, ...
We assume misclassification of products leads to revenue loss. Example: the textual product description of a mouse. Classified correctly under Product > ... > Desktop computer and accessories (alongside keyboard), the product realizes its expected annual revenue. Misclassified as pet, it loses part of the potential revenue.
What do we really care about? A vendor’s business goal is to maximize revenue, or equivalently, minimize revenue loss
Observation 1: the misclassification cost of a product depends on its potential revenue.
Observation 2: the misclassification cost of a product depends on how far apart the true class and the predicted class are in the taxonomy. Example: for the mouse description, predicting keyboard (a sibling under Desktop computer and accessories) is a nearer miss than predicting pet.
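One way to make "how far apart" concrete is sketched below. It assumes 8-digit UNSPSC codes (two digits each for segment, family, class, commodity) and takes the distance to be the number of levels between a leaf and the lowest common ancestor of the two codes. That definition, the function names, and any code digits not shown on the slides are illustrative assumptions, not the authors' stated choices.

```python
def unspsc_prefixes(code: str) -> list[str]:
    """Return the segment, family, class and commodity prefixes of an 8-digit code."""
    if len(code) != 8 or not code.isdigit():
        raise ValueError(f"expected an 8-digit UNSPSC code, got {code!r}")
    return [code[:2], code[:4], code[:6], code[:8]]


def hierarchical_distance(true_code: str, predicted_code: str) -> int:
    """Levels between the leaves and their lowest common ancestor (assumed definition).

    0 means the same commodity, 4 means the two codes are in different segments.
    """
    shared_levels = 0
    for t, p in zip(unspsc_prefixes(true_code), unspsc_prefixes(predicted_code)):
        if t != p:
            break
        shared_levels += 1
    return 4 - shared_levels


# 25101503 = Automobiles or cars, 25101502 = Buses (digits taken from the slides).
# The other codes use placeholder commodity digits, for illustration only.
cars, buses = "25101503", "25101502"
print(hierarchical_distance(cars, cars))        # same commodity
print(hierarchical_distance(cars, buses))       # same class
print(hierarchical_distance(cars, "25101701"))  # same family, different class
print(hierarchical_distance(cars, "50101501"))  # different segment
```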
The proposed performance evaluation metric: average revenue loss, the mean over all products x of the revenue loss of x, where revenue loss of product x = example weight × error function.
• example weight is the potential annual revenue of product x
• error function is the loss ratio: the percentage of the potential revenue a vendor will lose due to misclassification from class y to class y’
• the loss ratio is a monotonically non-decreasing function of the hierarchical distance between y and y’, f(d(y, y’))
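A minimal sketch of the metric follows, assuming the hierarchical distance d(y, y’) of each prediction has already been computed. The values in `loss_ratio` are a hypothetical choice of f, picked only because they are non-decreasing in the distance. The revenue figures are made up for the example.

```python
from typing import Callable, Sequence


def loss_ratio(distance: int) -> float:
    """Hypothetical f(d): fraction of potential revenue lost at distance d."""
    table = {0: 0.0, 1: 0.2, 2: 0.4, 3: 0.6, 4: 0.8}
    return table[distance]


def average_revenue_loss(
    revenues: Sequence[float],
    distances: Sequence[int],
    f: Callable[[int], float] = loss_ratio,
) -> float:
    """Mean over products of (potential annual revenue) x (loss ratio)."""
    if len(revenues) != len(distances):
        raise ValueError("revenues and distances must have the same length")
    if not revenues:
        raise ValueError("need at least one product")
    total = sum(v * f(d) for v, d in zip(revenues, distances))
    return total / len(revenues)


# Three products (revenue in K$): one classified correctly, one near miss, one far miss.
revenues = [100.0, 50.0, 200.0]
distances = [0, 1, 4]
print(average_revenue_loss(revenues, distances))
```

Setting every revenue to 1 and f to an indicator of d > 0 recovers the plain error rate, which is one way to see this metric as a weighted generalization of it.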
Learning: minimizing average revenue loss. The metric is not convex in the model parameters, so we minimize a convex upper bound of it instead.
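The formula on this slide did not survive extraction. The inequality below is the standard margin re-scaling bound, written as a sketch in assumed notation: $w_y$ is the weight vector of class $y$, $\hat{y}_i$ is the predicted class of example $x_i$, $v(x)$ is the potential annual revenue of $x$, and $\Delta(x, y, y') = v(x)\, f(d(y, y'))$ is its revenue loss. None of these symbols are confirmed by the slides.

```latex
\[
\Delta(x_i, y_i, \hat{y}_i)
\;\le\;
\max_{y'} \Big[ \Delta(x_i, y_i, y') + w_{y'}^{\top} x_i - w_{y_i}^{\top} x_i \Big],
\qquad
\hat{y}_i = \arg\max_{y} \, w_{y}^{\top} x_i .
\]
```

The bound holds because choosing $y' = \hat{y}_i$ inside the maximum adds the non-negative term $w_{\hat{y}_i}^{\top} x_i - w_{y_i}^{\top} x_i$ to the true loss. The right-hand side is a maximum of functions affine in $w$, hence convex. Averaging both sides over the training set bounds the average revenue loss.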
Multi-class SVM with margin re-scaling: the objective is a convex upper bound of the average loss, and any loss function can be plugged in.
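The margin re-scaled hinge loss for a linear multi-class model can be sketched as below. The names `class_scores`, `predict` and `margin_rescaled_hinge` are illustrative, and `costs[k]` stands for whatever loss is plugged in for predicting class k (zero for the true class). The regularizer and the optimizer are left out.

```python
from typing import Sequence


def class_scores(weights: Sequence[Sequence[float]], x: Sequence[float]) -> list[float]:
    """Linear score w_k . x for every class k."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]


def predict(weights: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Index of the highest-scoring class."""
    scores = class_scores(weights, x)
    return max(range(len(scores)), key=scores.__getitem__)


def margin_rescaled_hinge(
    weights: Sequence[Sequence[float]],
    x: Sequence[float],
    true_class: int,
    costs: Sequence[float],
) -> float:
    """max over k of costs[k] + score[k] - score[true_class].

    The true class contributes exactly 0 (costs[true_class] must be 0),
    so the result is never negative.
    """
    scores = class_scores(weights, x)
    return max(c + s - scores[true_class] for c, s in zip(costs, scores))


weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # one weight vector per class
x = [1.0, 2.0]
costs = [0.0, 3.0, 1.0]                          # true class is 0

predicted = predict(weights, x)
hinge = margin_rescaled_hinge(weights, x, true_class=0, costs=costs)
print(predicted, costs[predicted], hinge)        # the hinge value bounds the incurred cost
```

In this example the model predicts class 1 and incurs a cost of 3.0, while the hinge value is 4.0. Costlier mistakes demand a larger score margin, which is how the plugged-in loss steers training.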
Dataset
• UNSPSC (United Nations Standard Products and Services Code) dataset
• Product revenues are simulated
• revenue = price * sales
Experimental results. [Table: average revenue loss (in K$) of different algorithms]
What’s wrong? The revenue loss of a single product ranges from a few thousand dollars (K) to several million dollars (M).
Loss normalization
• Linearly scale the loss function to a fixed range, say [1, 10]
• The objective now upper bounds both the 0-1 loss and the average normalized loss.
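A sketch of the scaling step, under the assumption that only the losses of actual mistakes (positive values) are rescaled and that a correct prediction keeps a loss of zero. The function name, and the choice of taking the minimum and maximum over the positive losses, are illustrative rather than confirmed by the slides.

```python
from typing import Sequence


def normalize_losses(
    losses: Sequence[float], low: float = 1.0, high: float = 10.0
) -> list[float]:
    """Linearly map positive losses onto [low, high]; zero losses stay zero."""
    positive = [loss for loss in losses if loss > 0]
    if not positive:
        return [0.0 for _ in losses]
    smallest, largest = min(positive), max(positive)
    if largest == smallest:
        # All mistakes cost the same, so map them to the bottom of the range.
        return [low if loss > 0 else 0.0 for loss in losses]
    span = largest - smallest
    return [
        low + (loss - smallest) / span * (high - low) if loss > 0 else 0.0
        for loss in losses
    ]


# Raw revenue losses in dollars: a correct prediction, then a cheap,
# a very expensive, and a mid-range mistake.
raw = [0.0, 2_000.0, 5_000_000.0, 2_501_000.0]
print(normalize_losses(raw))
```

Because every mistake now costs at least 1, a hinge objective built on the normalized loss is also an upper bound on the 0-1 loss, which matches the claim on the slide.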
Final results: 7.88% reduction in average revenue loss! [Table: average revenue loss (in K$) of different algorithms]
Conclusion
• What do we really care about for this task? Minimize error rate? Minimize revenue loss? This fixes the performance evaluation metric: empirical risk, i.e. average misclassification cost.
• How do we approximate the performance evaluation metric to make it tractable? Model + tractable loss function: regularized empirical risk minimization.
• A general method: multi-class SVM with margin re-scaling and loss normalization.
• Optimization: find the best parameters.
Thank you! Questions?