Extreme Re-balancing for SVMs: a case study Advisor: Dr. Hsu Reporter: Wen-Hsiang Hu Authors: Bhavani Raskutti and Adam Kowalczyk SIGKDD Explorations
Outline • Motivation • Objective • Related Research • Support Vector Machines • Re-balancing of the Data • Sample Balancing • Weight Balancing • Experiments • Discussion • Conclusion • Personal Opinion
Motivation • A standard recipe for two-class discrimination is to take examples from both classes and generate a model for discriminating between them. However, there are many applications where obtaining examples of a second class is difficult, • e.g. classifying sites of “interest” to a web surfer • There are also situations where the data has a heavily unbalanced representation of the two classes of interest, • e.g. fraud detection and information filtering
Objective • Achieve better performance on heavily unbalanced data by using one-class learners
Related Research (1/2) • Many solutions have been proposed to address the imbalance problem, including sampling and weighting examples. • Typically, these methods focus on cases where the imbalance ratio of the minority to the majority class is around 10:90. • In this paper, we focus on extreme imbalance in very high dimensional input spaces, where at the learning stage the minority class constitutes around 1–3% of the data.
Related Research (2/2) • In both cases (image retrieval and document classification), • one-class models are much worse than the two-class models. • In this paper, we show that for certain problems, such as the gene knock-out experiments for understanding the AHR (aryl hydrocarbon receptor) signalling pathway, • minority one-class SVMs significantly outperform models learnt using examples from both classes.
Support Vector Machines (1/4) • Given a training sequence $(x_i, y_i)$, $i = 1, \dots, m$, of binary $n$-vectors $x_i \in \{0, 1\}^n$ and bipolar labels $y_i \in \{\pm 1\}$ • Our aim is to find a “good” discriminating function $f: \mathbb{R}^n \to \mathbb{R}$ • kernel machine: $f(x) = \sum_{i=1}^{m} \alpha_i y_i k(x_i, x) + b$
Support Vector Machines (4/4) • If the kernel $k$ satisfies the Mercer theorem assumptions [7; 24; 25], then for the minimiser of (2) we have $f(x) = \sum_{i=1}^{m} \alpha_i y_i k(x_i, x) + b$, where $\alpha_i \ge 0$ for all $i$ • We shall be using the popular polynomial kernel $k(x, x') = (1 + \langle x, x' \rangle)^d$
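A minimal sketch of a two-class SVM with a polynomial kernel, assuming scikit-learn; the data, degree, and regularization constant here are illustrative, not the paper's settings:

```python
# Sketch: two-class SVM with a polynomial kernel
# (illustrative data and settings, not the paper's configuration).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 50)).astype(float)  # binary n-vectors
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)            # bipolar labels

# scikit-learn's poly kernel is k(x, x') = (gamma * <x, x'> + coef0)^degree
clf = SVC(kernel="poly", degree=2, coef0=1.0, C=1.0)
clf.fit(X, y)
scores = clf.decision_function(X)  # f(x) values, used later for ranking/AROC
```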
Re-balancing of the Data – Sample Balancing • Re-balance the training set by sub-sampling one class, varying the minority:majority mixture ratio from the natural proportion through balanced, down to the extreme of a single class only (0:1)
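A minimal sketch of this kind of sub-sampling, assuming numpy; the function name and ratio convention are hypothetical, not from the paper:

```python
# Sketch: sub-sample the majority class to a target minority:majority ratio
# (hypothetical helper, illustrating the idea of sample balancing).
import numpy as np

def subsample_majority(X, y, ratio, minority_label=1, seed=0):
    """Keep every minority example; sub-sample the majority class so the
    minority:majority ratio is approximately `ratio`. ratio = 1 gives a
    balanced set; large ratios approach the one-class extreme."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    n_keep = min(len(majority), max(0, round(len(minority) / ratio)))
    keep = rng.choice(majority, size=n_keep, replace=False)
    idx = np.concatenate([minority, keep])
    return X[idx], y[idx]
```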
Re-balancing of the Data – Weight Balancing • Each training example is weighted according to its class, with positive examples scaled in proportion to (1 + B) and negative examples to (1 − B), where B ∈ [−1, 1] is a parameter called the balance factor • The case of “balanced proportions” is achieved for B = 0; B = +1 represents learning from positive examples only; similarly, learning from the negative class only is achieved for B = −1
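A minimal sketch of how the balance factor could be realized with per-class weights, assuming scikit-learn's class_weight mechanism; the (1 ± B)/2 mapping is an assumed analogue of Equation (4), not the paper's exact formula:

```python
# Sketch: map a balance factor B in [-1, 1] to per-class weights
# (assumed analogue of Equation (4), not the paper's exact formula).
from sklearn.svm import SVC

def balanced_svm(B, **svc_kwargs):
    eps = 1e-6  # keep weights strictly positive at the extremes B = +/-1
    weights = {+1: (1 + B) / 2 + eps, -1: (1 - B) / 2 + eps}
    return SVC(class_weight=weights, **svc_kwargs)

# B = 0: equal class weights; B -> +1: positives dominate (positive 1-class
# limit); B -> -1: negatives dominate (negative 1-class limit).
```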
Experiments – Real World Data Collections • AHR data set, used for task 2 of KDD Cup 2002 • the aryl hydrocarbon receptor data set, collected for cancer research • three classes: change, control, nc • Reuters data • 12,902 documents
Performance Measures • We have used AROC, the Area under the Receiver Operating Characteristic (ROC) curve, as our main performance measure: $AROC = \frac{1}{m_- m_+} \sum_{i=1}^{m_-} \sum_{j=1}^{m_+} [\![ f(x_j) > f(x_i) ]\!]$, where the $x_i$ are from the negative class and the $x_j$ from the positive class • The trivial uniform random predictor has an AROC of 0.5, while a perfect predictor has an AROC of 1
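A minimal sketch of computing AROC directly as the fraction of correctly ordered negative–positive pairs (the Wilcoxon–Mann–Whitney form), assuming numpy:

```python
# Sketch: AROC as the fraction of (negative, positive) pairs that the
# scoring function f orders correctly, counting ties as half.
import numpy as np

def aroc(scores_pos, scores_neg):
    sp = np.asarray(scores_pos)[:, None]  # f(x_j), x_j from the positive class
    sn = np.asarray(scores_neg)[None, :]  # f(x_i), x_i from the negative class
    wins = (sp > sn).mean()
    ties = (sp == sn).mean()
    return wins + 0.5 * ties

print(aroc([0.9, 0.8], [0.1, 0.7]))  # 1.0 would be a perfect predictor
```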
Experiments with Real World Data • The training:test split sizes were • 50%:50% for the Reuters data • 70%:30% for the AHR data
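A minimal sketch of producing these splits, assuming scikit-learn's train_test_split and data already loaded as X, y; stratification is an assumption made here to preserve the extreme class ratio:

```python
# Sketch: reproduce the stated split sizes (stratification is an assumption).
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)  # Reuters: 50%:50%
# For the AHR data, test_size=0.3 gives the 70%:30% split.
```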
Impact of Regularization Constant • [Figure: model performance as the regularization constant varies, for four models — positive 1-class (dashed), balanced 2-class (dash-dot), un-balanced 2-class (dotted), and negative 1-class.]
Impact of feature selection (1/2) • feature selection methods: • DocFreq (document frequency thresholding) • ChiSqua (χ²): measures the lack of independence between a feature and a class of interest • MutInfo (mutual information) • InfGain (information gain): a term goodness measure • We have used all of the minority cases and sampled the majority cases at different mixture ratios (MajorityOnly sample balancing).
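A minimal sketch of one of these methods (χ²) using scikit-learn's SelectKBest; the value of k is illustrative, and the other methods would plug in different score functions:

```python
# Sketch: chi-square feature selection on a non-negative term-count matrix
# (k is illustrative; mutual_info_classif would cover MutInfo similarly).
from sklearn.feature_selection import SelectKBest, chi2

selector = SelectKBest(score_func=chi2, k=1000)
X_train_sel = selector.fit_transform(X_train, y_train)  # fit on training data only
X_test_sel = selector.transform(X_test)                 # apply the same selection
```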
Experiments with Weight Balancing • To understand whether the impact of negative examples can be reduced using the balance factor B in Equation (4), we ran • Tests on AHR data • Tests on Reuters
Tests on AHR data • B = 0: balanced 2-class • B = +1: positive 1-class • B = −1: negative 1-class
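The B = +1 extreme corresponds to one-class learning from the positive class only; a minimal sketch using scikit-learn's OneClassSVM, with an illustrative nu and kernel:

```python
# Sketch: one-class learning from positive examples only (the B = +1 extreme).
from sklearn.svm import OneClassSVM

X_pos = X_train[y_train == 1]  # keep only the minority (positive) class
oc = OneClassSVM(kernel="poly", degree=2, coef0=1.0, nu=0.1)  # illustrative
oc.fit(X_pos)
scores = oc.decision_function(X_test)  # rank test examples, e.g. for AROC
```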
Tests on Reuters • [Figure: performance as a function of the balance factor B — balanced 2-class (dashed) vs. positive 1-class.]
Experiments with Synthetic Data • Three data sets, where n_inf is the number of informative features and n_noise the number of pure-noise features: • S1: n_inf = 1; n_noise = 999 • S2: n_inf = 10; n_noise = 990 • S3: n_inf = 1; n_noise = 19 • Two polynomial kernels are used: a linear kernel and a non-linear kernel
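A minimal sketch of generating such data, assuming numpy; the generating distribution and function name are assumptions, not the paper's exact construction:

```python
# Sketch: synthetic data with n_inf informative and n_noise noise features
# (the generating distribution is assumed, not the paper's exact recipe).
import numpy as np

def make_synthetic(n_samples, n_inf, n_noise, seed=0):
    rng = np.random.default_rng(seed)
    y = rng.choice([-1, 1], size=n_samples)
    informative = y[:, None] + rng.normal(size=(n_samples, n_inf))  # label-correlated
    noise = rng.normal(size=(n_samples, n_noise))                   # label-independent
    return np.hstack([informative, noise]), y

X, y = make_synthetic(200, n_inf=1, n_noise=999)  # the S1 configuration
```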
Conclusion • The Reuters data set • One-class learners provide quite good results, but using both classes always produces better results. • The AHR data set • The positive one-class learners perform significantly better than two-class learners. • One-class learning from positive class examples can be a very robust classification technique when dealing with very unbalanced data and a high dimensional noisy feature space.
Personal Opinion • Strength • many experiments • Weakness • the equations are not clearly presented • Application • SVM • document classification • image retrieval