Cost-sensitive boosting for classification of imbalanced data

Cost-sensitive boosting for classification of imbalanced data. Advisor: Dr. Hsu. Presenter: Hsin-Yi Huang. Authors: Yanmin Sun, Mohamed S. Kamel, Andrew K.C. Wong, Yang Wang. Outline: Motivation, Objective, Methodology (AdaBoost, cost-sensitive boosting algorithms), Experiment, Conclusion.

Presentation Transcript


  1. Cost-sensitive boosting for classification of imbalanced data Advisor: Dr. Hsu Presenter: Hsin-Yi Huang Authors: Yanmin Sun, Mohamed S. Kamel, Andrew K.C. Wong, Yang Wang 2007.PR.21

  2. Outline • Motivation • Objective • Methodology • AdaBoost • Cost-sensitive boosting algorithms • Experiment • Conclusion • Comments

  3. Motivation • Standard classifiers are designed to generalize from training data and output the simplest hypothesis that best fits the data. • The simplest hypothesis pays less attention to rare cases in an imbalanced data set. • AdaBoost is an accuracy-oriented algorithm: its learning strategy may be biased towards the prevalent class, because that class contributes more to the overall classification accuracy.
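A small worked example of the accuracy bias described above. The class counts (50 rare positives, 950 prevalent negatives) are hypothetical and not taken from the paper's data sets: a classifier that always predicts the prevalent class already reaches high accuracy while missing every rare case.

```python
# Hypothetical class counts, chosen only to illustrate the accuracy bias;
# they are not taken from the paper's data sets.
n_positive = 50    # rare class (e.g. "disease")
n_negative = 950   # prevalent class (e.g. "normal")

# A trivial classifier that always predicts the prevalent (negative) class.
true_negatives = n_negative  # every negative is classified correctly
true_positives = 0           # every positive is missed

accuracy = (true_positives + true_negatives) / (n_positive + n_negative)
positive_recall = true_positives / n_positive

print(f"accuracy = {accuracy:.2f}")                 # accuracy = 0.95
print(f"positive recall = {positive_recall:.2f}")   # positive recall = 0.00
```

An accuracy-driven learner gains little by getting the 50 rare cases right, which is the bias the cost-sensitive variants aim to correct.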

  4. Objective • The AdaBoost algorithm is adapted to improve the classification of imbalanced data. • The authors propose three cost-sensitive boosting algorithms, which introduce cost items into the learning framework of AdaBoost.

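  5. Methodology [Figure: boosting illustrated with a "Man? Woman?" example. Weight distributions D1(i), D2(i), D3(i), …, Dt(i); weak hypotheses h1, h2, h3, …, ht with outputs woman, man, man, man; hypothesis weights α1, α2, α3, …, αt-1; combined classifier H.]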
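In standard discrete AdaBoost, the quantities in the diagram are linked as follows. This assumes labels encoded as $y_i \in \{-1, +1\}$ (e.g. man = +1, woman = -1, an arbitrary choice), $T$ boosting rounds, $\varepsilon_t$ the weighted error of $h_t$ under $D_t$, and $Z_t$ a normalization factor; these symbols are not on the slide.

```latex
\begin{aligned}
\varepsilon_t &= \sum_{i:\, h_t(x_i) \neq y_i} D_t(i),
\qquad
\alpha_t = \frac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t}, \\
D_{t+1}(i) &= \frac{D_t(i) \, \exp\!\left(-\alpha_t \, y_i \, h_t(x_i)\right)}{Z_t}, \\
H(x) &= \operatorname{sign}\!\left( \sum_{t=1}^{T} \alpha_t \, h_t(x) \right).
\end{aligned}
```

Misclassified samples ($y_i h_t(x_i) = -1$) have their weight multiplied by $e^{\alpha_t}$, so the next weak hypothesis concentrates on them.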

  6. Methodology • AdaBoost algorithm
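The AdaBoost pseudocode on this slide did not survive extraction. As a stand-in, here is a minimal sketch of standard discrete AdaBoost with labels in {-1, +1}. The weak learner is a one-level decision stump chosen purely for brevity (the paper's experiments use C4.5 and HPWR), and the names `train_stump`, `adaboost` and `predict` are hypothetical.

```python
import math


def train_stump(X, y, D):
    """Exhaustively pick the decision stump (feature, threshold, polarity)
    with the lowest weighted error under the distribution D."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] >= thr else -polarity for row in X]
                err = sum(d for d, p, t in zip(D, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] >= thr else -polarity


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; labels y must be -1 or +1."""
    n = len(X)
    D = [1.0 / n] * n  # D_1(i): uniform initial weights
    ensemble = []      # list of (alpha_t, h_t) pairs
    for _ in range(rounds):
        stump = train_stump(X, y, D)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        if err >= 0.5:  # weak learner no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # D_{t+1}(i) = D_t(i) * exp(-alpha_t * y_i * h_t(x_i)) / Z_t
        D = [d * math.exp(-alpha * yi * stump_predict(stump, xi))
             for d, xi, yi in zip(D, X, y)]
        Z = sum(D)
        D = [d / Z for d in D]
    return ensemble


def predict(ensemble, row):
    """H(x) = sign(sum_t alpha_t * h_t(x)), with ties mapped to +1."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy 1-D data that no single stump separates: +1, +1, -1, -1, +1, +1.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
print([predict(model, row) for row in X])  # [1, 1, -1, -1, 1, 1]
```

On this toy set the best single stump still misclassifies two of the six points, yet three reweighted stumps combined by their α values fit it exactly.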

  7. Methodology • Cost-sensitive boosting algorithms: cost setups:
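The cost setups on this slide are also missing. As the three algorithms are usually summarized, each sample i carries a cost C_i, and the variants differ in where C_i enters the weight update: inside the exponent (AdaC1), outside it (AdaC2), or in both places (AdaC3). The sketch below assumes the AdaC2 form, the variant the conclusion reports as strongest, with α_t = ½ ln(Σ_correct C_i D_t(i) / Σ_wrong C_i D_t(i)). The stump weak learner, the function names and the cost values are illustrative assumptions, so check the update against the paper before relying on it.

```python
import math


def train_stump(X, y, D):
    """Pick the decision stump (feature, threshold, polarity) with the
    lowest weighted error under the distribution D."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] >= thr else -polarity for row in X]
                err = sum(d for d, p, t in zip(D, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best[1:]  # (feature, threshold, polarity)


def stump_predict(stump, row):
    j, thr, polarity = stump
    return polarity if row[j] >= thr else -polarity


def adac2(X, y, costs, rounds=10):
    """AdaC2-style boosting (assumed form): cost C_i multiplies D_t(i)
    outside the exponent in the weight update."""
    n = len(X)
    D = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, D)
        h = [stump_predict(stump, xi) for xi in X]
        # Cost-weighted mass of correctly / incorrectly classified samples.
        correct = sum(c * d for c, d, hi, yi in zip(costs, D, h, y) if hi == yi)
        wrong = sum(c * d for c, d, hi, yi in zip(costs, D, h, y) if hi != yi)
        wrong = max(wrong, 1e-10)  # avoid division by zero for a perfect h_t
        if correct <= wrong:  # no better than chance under the costs: stop
            break
        alpha = 0.5 * math.log(correct / wrong)
        ensemble.append((alpha, stump))
        # D_{t+1}(i) = C_i * D_t(i) * exp(-alpha_t * y_i * h_t(x_i)) / Z_t
        D = [c * d * math.exp(-alpha * yi * hi)
             for c, d, hi, yi in zip(costs, D, h, y)]
        Z = sum(D)
        D = [d / Z for d in D]
    return ensemble


def predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical imbalanced toy data: six negatives, two positives.
X = [[float(v)] for v in range(1, 9)]
y = [-1, -1, -1, -1, -1, -1, 1, 1]
# Illustrative costs: misclassifying a rare positive costs more.
costs = [1.0 if label == 1 else 0.5 for label in y]
model = adac2(X, y, costs, rounds=3)
print([predict(model, row) for row in X])  # [-1, -1, -1, -1, -1, -1, 1, 1]
```

Because C_i multiplies the weight in every round, samples of the costlier class gain relative weight round after round, which is how this form shifts the weak learners' attention toward the rare class.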

  8. Experiment • Dataset • The authors use four medical diagnosis data sets taken from the UCI Machine Learning Repository. • These four data sets are: Breast cancer data (Cancer), Hepatitis data (Hepatitis), Pima Indians diabetes database (Pima), and Sick-euthyroid data (Sick). • All data sets have two output labels: one denotes the disease category, which is treated as the positive class, and the other represents the normal category. • Base classifier • C4.5 • HPWR

  9. Experiment

  10. Conclusion • The authors investigate cost-sensitive boosting algorithms for improving the classification of imbalanced data. • Experimental results indicate that AdaC2 is superior to its rivals. • Some research issues remain open for future investigation: • To set the cost factors using more efficient methods. • To explore the algorithms' effectiveness in other specific domains. • To integrate cost values into the framework of RealBoost and develop cost-sensitive boosting algorithms.

  11. Comments • Advantage • … • Drawback • … • Application • Classification of imbalanced data
