Cost-sensitive boosting for classification of imbalanced data

Cost-sensitive boosting for classification of imbalanced data Advisor: Dr. Hsu Presenter: Hsin-Yi Huang Authors: Yanmin Sun, Mohamed S. Kamel, Andrew K.C. Wong, Yang Wang 2007.PR.21

Outline • Motivation • Objective • Methodology • AdaBoost • Cost-sensitive boosting algorithms • Experiment • Conclusion • Comments

Motivation • Standard classifiers are designed to generalize from training data and output the simplest hypothesis that best fits the data. • The simplest hypothesis pays less attention to rare cases in an imbalanced data set. • AdaBoost is an accuracy-oriented algorithm, so its learning strategy may be biased towards the prevalent class, which contributes more to the overall classification accuracy.

Objective • The AdaBoost algorithm is adapted to improve the classification of imbalanced data. • The authors propose three cost-sensitive boosting algorithms that introduce cost items into the learning framework of AdaBoost.

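Methodology [Figure: boosting illustrated on a "Man? Woman?" classification task. Weak classifiers h1, h2, h3, …, ht are trained on the sample weight distributions D1(i), D2(i), D3(i), …, Dt(i); their individual outputs (woman, man, man, man) are weighted by α1, α2, α3, …, αt-1 and combined into the final classifier H.]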

Methodology • AdaBoost algorithm
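The algorithm listing on this slide did not survive extraction. As a rough stand-in, the following is a minimal sketch of standard discrete AdaBoost for labels in {-1, +1}. It assumes decision stumps as the weak learner, which is an illustrative choice and not the base classifier used in the paper; the function names (`fit_stump`, `stump_predict`, `adaboost`, `predict`) are hypothetical.

```python
import math

def stump_predict(stump, x):
    # A stump votes `polarity` when the feature value reaches the threshold.
    feature, threshold, polarity = stump
    return polarity if x[feature] >= threshold else -polarity

def fit_stump(X, y, D):
    # Exhaustively pick the stump with the lowest weighted error under D.
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(d for x, yi, d in zip(X, y, D)
                          if stump_predict(stump, x) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def adaboost(X, y, rounds=10):
    n = len(y)
    D = [1.0 / n] * n                      # D_1(i): uniform sample weights
    ensemble = []                          # list of (alpha_t, h_t)
    for _ in range(rounds):
        stump, err = fit_stump(X, y, D)
        if err >= 0.5:                     # weak learner no better than chance
            break
        err = max(err, 1e-12)              # avoid division by zero when err == 0
        alpha = 0.5 * math.log((1.0 - err) / err)
        # Raise the weight of misclassified samples, lower it for correct ones.
        D = [d * math.exp(-alpha * yi * stump_predict(stump, x))
             for x, yi, d in zip(X, y, D)]
        Z = sum(D)                         # normalization factor Z_t
        D = [d / Z for d in D]
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, x):
    # H(x) = sign(sum_t alpha_t * h_t(x)); ties are resolved as +1.
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

Because every sample starts with the same weight and alpha depends only on the overall weighted error, nothing in this loop favours the rare class, which is the bias the Motivation slide describes.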

Methodology • Cost-sensitive boosting algorithms: Cost setups:
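The update formulas and cost setups on this slide were lost in extraction. The sketch below shows one boosting round in the form usually given for AdaC2, under the assumption that each sample's cost item C_i multiplies its weight outside the exponent and also enters the computation of alpha. The function name `adac2_round` and the cost values in the usage lines are hypothetical, not taken from the slides.

```python
import math

def adac2_round(D, C, y, h):
    """One AdaC2-style round (assumed form): return (alpha, new_D).

    D: current sample weights, C: per-sample cost items,
    y: true labels in {-1, +1}, h: weak-classifier outputs in {-1, +1}.
    """
    # Cost-weighted mass of correctly and incorrectly classified samples.
    correct = sum(c * d for c, d, yi, hi in zip(C, D, y, h) if yi == hi)
    wrong = sum(c * d for c, d, yi, hi in zip(C, D, y, h) if yi != hi)
    if correct <= 0 or wrong <= 0:
        raise ValueError("alpha is undefined when one of the sums is zero")
    alpha = 0.5 * math.log(correct / wrong)
    # The cost item scales the weight outside the exponent.
    new_D = [c * d * math.exp(-alpha * yi * hi)
             for c, d, yi, hi in zip(C, D, y, h)]
    Z = sum(new_D)                      # normalization factor
    return alpha, [w / Z for w in new_D]

# Illustrative setup: positive (rare) samples cost 1.0, negatives cost 0.5.
D = [0.25, 0.25, 0.25, 0.25]
C = [1.0, 1.0, 0.5, 0.5]
y = [1, 1, -1, -1]
h = [1, -1, -1, -1]                     # the second positive is misclassified
alpha, new_D = adac2_round(D, C, y, h)
```

With these illustrative costs, the misclassified positive sample ends the round with the largest weight, so the next weak classifier concentrates on it. The other two variants are usually described as placing the cost item inside the exponent (AdaC1) or both inside and outside it (AdaC3); that description is a recollection, not something recoverable from this slide.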

Experiment • Dataset • The authors use four medical diagnosis data sets taken from the UCI Machine Learning Database. • These four data sets are: Breast cancer data (Cancer), Hepatitis data (Hepatitis), Pima Indians diabetes database (Pima), and Sick-euthyroid data (Sick). • All data sets have two output labels: one denotes the disease category, which is treated as the positive class, and the other represents the normal category. • Base classifier • C4.5 • HPWR

Experiment

Conclusion • The authors investigate cost-sensitive boosting algorithms for improving the classification of imbalanced data. • Experimental results indicate that AdaC2 is superior to its rivals. • Some research issues remain open for future investigation: • To fix cost factors using more efficient methods. • To explore the algorithms' effectiveness in other specific domains. • To integrate cost values into the framework of RealBoost and develop cost-sensitive boosting algorithms based on it.

Comments • Advantage • … • Drawback • … • Application • Classification of imbalanced data
