Cost-sensitive boosting for classification of imbalanced data Advisor: Dr. Hsu Presenter: Hsin-Yi Huang Authors: Yanmin Sun, Mohamed S. Kamel, Andrew K.C. Wong, Yang Wang 2007.PR.21
Outline • Motivation • Objective • Methodology • AdaBoost • Cost-sensitive boosting algorithms • Experiment • Conclusion • Comments
Motivation • Standard classifiers are designed to generalize from training data and output the simplest hypothesis that best fits the data. • The simplest hypothesis pays less attention to rare cases in an imbalanced data set. • AdaBoost is an accuracy-oriented algorithm, so its learning strategy may be biased towards the prevalent class, which contributes more to the overall classification accuracy.
Objective • The AdaBoost algorithm is adapted to improve the classification of imbalanced data. • The authors propose three cost-sensitive boosting algorithms, which introduce cost items into the learning framework of AdaBoost.
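Methodology • [Figure: AdaBoost illustrated on a man/woman classification example. Weak classifiers h1, h2, h3, …, ht are trained on the sample distributions D1(i), D2(i), D3(i), …, Dt(i); their individual votes (woman, man, man, man) are weighted by α1, α2, α3, …, αt and combined into the final classifier H.]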
Methodology • AdaBoost algorithm
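The slide names the algorithm without reproducing it. As a rough illustration, standard discrete AdaBoost can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: labels are assumed to be in {-1, +1}, the weak learner is a simple threshold stump rather than the C4.5 or HPWR base classifiers used in the experiments, and the names `train_stump`, `stump_predict`, `adaboost`, and `predict` are hypothetical.

```python
import math

def train_stump(X, y, w):
    """Return (error, feature, threshold, sign) of the threshold stump
    with the lowest weighted error under the sample weights w."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                preds = [sign if x[j] <= thr else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, x):
    _, j, thr, sign = stump
    return sign if x[j] <= thr else -sign

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; y must contain labels in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n              # D_1(i): uniform sample weights
    ensemble = []                  # list of (alpha_t, h_t)
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the logarithm below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        if err >= 0.5:             # weak learner is no better than chance
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified samples, lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        z = sum(w)                 # normalization factor Z_t
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    """Final classifier H: sign of the alpha-weighted vote."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Toy usage on a separable 1-D data set (illustrative data only).
X = [[1], [2], [3], [4]]
y = [-1, -1, 1, 1]
model = adaboost(X, y, rounds=5)
print([predict(model, x) for x in X])
```

Every misclassified sample is reweighted in the same way regardless of its class, which is why the plain algorithm tends to favor the prevalent class.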
Methodology • Cost-sensitive boosting algorithms: • Cost setups:
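The formulas for the cost-sensitive variants are not reproduced on the slide. As a hedged illustration, one boosting round of AdaC2 (the variant the conclusion singles out) might look like the sketch below. The update rule is recalled from the published paper rather than taken from this deck, so it should be checked against the original: each sample's cost C_i multiplies its weight outside the exponent, and α is computed from the cost-weighted mass of correct versus incorrect samples. The function name `adac2_round` and the cost values in the usage example are assumptions made for illustration.

```python
import math

def adac2_round(w, cost, y, preds):
    """One AdaC2-style round (sketch, assuming labels in {-1, +1}).

    w     : current sample weights D_t(i), summing to 1
    cost  : per-sample cost items C_i (higher = more important)
    y     : true labels
    preds : weak-classifier outputs h_t(x_i)
    Returns (alpha_t, new normalized weights D_{t+1}(i)).
    """
    right = sum(c * wi for c, wi, yi, p in zip(cost, w, y, preds) if p == yi)
    wrong = sum(c * wi for c, wi, yi, p in zip(cost, w, y, preds) if p != yi)
    wrong = max(wrong, 1e-10)      # avoid division by zero
    alpha = 0.5 * math.log(right / wrong)
    # The cost item sits outside the exponent, so costly samples keep
    # more weight whether or not they were classified correctly.
    new_w = [c * wi * math.exp(-alpha * yi * p)
             for c, wi, yi, p in zip(cost, w, y, preds)]
    z = sum(new_w)
    return alpha, [wi / z for wi in new_w]

# Toy usage: two positive (rare-class) samples with cost 1.0 and two
# negative samples with cost 0.5; the second positive is misclassified.
w = [0.25, 0.25, 0.25, 0.25]
cost = [1.0, 1.0, 0.5, 0.5]
y = [1, 1, -1, -1]
preds = [1, -1, -1, -1]
alpha, new_w = adac2_round(w, cost, y, preds)
print(round(alpha, 4), [round(v, 3) for v in new_w])
# prints: 0.3466 [0.25, 0.5, 0.125, 0.125]
```

In this toy round the misclassified positive sample ends up with half of the total weight, while the correctly classified negatives shrink to 0.125 each, so the next weak classifier is pushed towards the rare class.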
Experiment • Dataset • The authors use four medical diagnosis data sets taken from the UCI Machine Learning Database. • These four data sets are: Breast cancer data (Cancer), Hepatitis data (Hepatitis), Pima Indian's diabetes database (Pima), and Sick-euthyroid data (Sick). • All data sets have two output labels: one denotes the disease category, which is treated as the positive class, and the other represents the normal category. • Base classifier • C4.5 • HPWR
Conclusion • The authors investigate cost-sensitive boosting algorithms for improving the classification of imbalanced data. • Experimental results indicate that AdaC2 is superior to its rivals. • Some research issues remain open for future investigation: • To determine cost factors using more efficient methods. • To explore the effectiveness of the algorithms in other specific domains. • To integrate cost values into the framework of RealBoost and develop the corresponding cost-sensitive boosting algorithms.
Comments • Advantage • … • Drawback • … • Application • Classification of imbalanced data