An Empirical Study of Learning from Imbalanced Data Using Random Forest. Presenter: Ai-Chen Liao. Authors: Taghi M. Khoshgoftaar, Moiz Golawala, and Jason Van Hulse. ICTAI 2007, pages 310-317.
Outline • Motivation • Objective • Method • Experiment • Experimental Result • Conclusion • Comments
Motivation [Figure labels: "A tree", "A forest"]
Motivation • RF is a relatively new learner, and previous work has reported only preliminary experimentation on constructing random forest classifiers in the context of imbalanced data. • What should be the recommended default number of trees in the ensemble? • What should the recommended value be for the number of attributes? • How does the RF learner perform on imbalanced data when compared with other commonly used learners (NB, SVM, KNN, C4.5, etc.)?
Objective • This work is the first to conduct comprehensive experimentation with the RF learner in Weka and to recommend empirically proven default values for the numTrees and numFeatures parameters.
Method ─ RF [Figure: samples are drawn from the dataset by sampling with replacement; example indices shown: 1 2 1 4 2 5 3 6 …]
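The sampling with replacement shown on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the Weka implementation used in the paper. The function names, the seed, and the toy sizes are assumptions made for the example. The per-node random feature subset is the standard second source of randomness in random forests, and it is the quantity that the numFeatures parameter controls.

```python
import random


def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement: some rows repeat, some are left out."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def random_feature_subset(n_features, k, rng):
    """Pick k distinct candidate feature indices, as a tree does at each split."""
    return sorted(rng.sample(range(n_features), k))


if __name__ == "__main__":
    rng = random.Random(0)                   # fixed seed, only for repeatability
    n_rows, n_features, n_trees = 6, 8, 3    # toy sizes (hypothetical)
    for t in range(n_trees):
        in_bag = bootstrap_sample(n_rows, rng)
        # Rows never drawn for this tree are its "out-of-bag" rows.
        out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
        candidates = random_feature_subset(n_features, 3, rng)
        print(f"tree {t}: in-bag={in_bag} out-of-bag={out_of_bag} features={candidates}")
```

Each tree sees a different resampled copy of the data, which is why repeated indices such as "1 2 1 4" appear in the figure.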
Method ─ Experimental Datasets • Metrics: the area under the ROC curve (AUC) and the Kolmogorov-Smirnov (KS) statistic
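Both metrics can be computed from the same threshold sweep over a classifier's scores. The sketch below assumes the usual definitions: AUC as the trapezoidal area under the ROC curve, and KS as the largest gap between the true positive rate and the false positive rate over all thresholds. It also assumes labels are 0/1 with 1 as the positive (minority) class. The function names and the toy scores are made up for illustration and are not taken from the paper.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points from sweeping the threshold from high to low scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Tied scores cross the threshold together, so consume them as one group.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def ks_statistic(points):
    """Largest absolute gap between TPR and FPR over all thresholds."""
    return max(abs(tpr - fpr) for fpr, tpr in points)


if __name__ == "__main__":
    labels = [1, 1, 0, 1, 0, 0, 0, 0]                    # toy data (hypothetical)
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    pts = roc_points(labels, scores)
    print(f"AUC={auc(pts):.4f}  KS={ks_statistic(pts):.4f}")
```

Neither metric depends on a single decision threshold, which makes both more informative than plain accuracy when one class is rare.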
Experimental Results Phase 1: Selecting an Appropriate RF Learner [Figures: results by numTrees and by numFeatures]
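Selecting a learner configuration of this kind amounts to a sweep over the two parameters. The sketch below shows only the general shape of such a sweep and is not the authors' procedure. The `evaluate` callback, the candidate values, and the dataset names are hypothetical. The `toy_evaluate` function returns arbitrary numbers so that the example runs, and it has no relation to the paper's results.

```python
from itertools import product
from statistics import mean


def select_rf_config(num_trees_options, num_features_options, datasets, evaluate):
    """Return the (numTrees, numFeatures) pair with the best mean score over datasets."""
    results = {}
    for n_trees, n_feats in product(num_trees_options, num_features_options):
        # evaluate() stands in for "train an RF with these settings and score it".
        scores = [evaluate(ds, n_trees, n_feats) for ds in datasets]
        results[(n_trees, n_feats)] = mean(scores)
    best = max(results, key=results.get)
    return best, results


def toy_evaluate(dataset, n_trees, n_feats):
    """Placeholder scorer with arbitrary values; NOT real experimental output."""
    return 0.5 + 0.4 * (1 - 1 / n_trees) - 0.01 * abs(n_feats - 3)


if __name__ == "__main__":
    best, results = select_rf_config([10, 100], [1, 3], ["ds_a", "ds_b"], toy_evaluate)
    for config, score in sorted(results.items()):
        print(config, round(score, 3))
    print("selected:", best)
```

In a real study, `evaluate` would run cross-validation on each dataset and return a metric such as AUC or KS.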
Experimental Results Phase 2: Comparison of RF-100 to Other Learners [Results omitted; presenter's annotations: "Good!"]
Conclusion • The contribution of this study is to provide an extensive empirical evaluation of RF learners built from imbalanced data. • The parameters for the RF learners were chosen to ensure good performance in many different circumstances and to be reasonable for the imbalanced datasets.
Comments • Advantage • Building many learners in these experiments gives me confidence in the reliability of the experimental results. • Drawback • Due to space restrictions, many experimental results are not included here. • Application • Handling imbalanced data