A fast ensemble pruning algorithm based on pattern mining process 17 July 2009, Springer Science+Business Media, LLC 2009 69821514 洪佳瑜 69821516 蔣克欽
Outline • Motivation • Introduction • Method • Experiment • Conclusion
Motivation • Most ensemble pruning methods in the literature require substantial pruning time and are mainly applied in domains where time can be sacrificed to improve accuracy. This makes them unsuitable for applications that require a fast learning process, such as on-line network intrusion detection.
Introduction • Pattern mining based ensemble pruning (PMEP) • The algorithm converts an ensemble pruning problem into a special pattern mining problem, which enables an FP-Tree to store the prediction results of all base classifiers; a new pattern mining method then selects the base classifiers. • The final output of our PMEP approach is the pruned ensemble with the highest correct value.
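As a minimal sketch of the result this pipeline produces, written in C (the language the experiments use): the struct and field names below are our illustration, not the authors' published code, but they mirror the S.set / S.correct notation used in the Method slides.

```c
/* Result record sketched from the slides (illustrative names, not the
 * authors' code): S.set holds the chosen base classifiers and S.correct
 * counts the validation examples a winning majority of them gets right. */
typedef struct {
    int set[64];   /* indices of the selected base classifiers (S.set) */
    int size;      /* how many classifiers have been selected so far   */
    int correct;   /* running total of covered examples (S.correct)    */
} PrunedEnsemble;
```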
Properties of PMEP (1/2) • Firstly, it uses a transaction database to represent the prediction results of all base classifiers. This representation enables an FP-Tree to compact the results, and the ensemble pruning process becomes a pattern mining problem. • Secondly, PMEP uses the majority voting principle to decide the candidate classifiers before the pattern mining process. For a given ensemble size k, PMEP only considers the paths of length ⌊k/2⌋ + 1 in the FP-Tree, i.e. the smallest winning majority; see the sketch after these two slides.
Properties of PMEP (2/2) • Thirdly, the pattern mining method greedily selects a whole set of classifiers in each iteration, instead of a single one, which further reduces pruning time.
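The sketch below illustrates the first two properties; the 0/1 table layout and the names t and majority_len are our assumptions, not the paper's code. Each row of the table is one transaction fed into the FP-Tree, and only classifier sets of size ⌊k/2⌋ + 1 are mined as candidate patterns.

```c
#include <stdio.h>

#define M 6   /* validation examples (toy value) */
#define L 7   /* base classifiers   (toy value)  */

/* t[i][j] = 1 iff base classifier j labels example i correctly; each
 * row of this table is one transaction stored in the FP-Tree. */
static int t[M][L];

/* Smallest vote count that wins a majority among k classifiers. */
static int majority_len(int k) { return k / 2 + 1; }

int main(void) {
    int k = 5;
    /* Only classifier sets of exactly this size are candidate patterns. */
    printf("candidate path length for k=%d: %d\n", k, majority_len(k));
    (void)t;
    return 0;
}
```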
Method (2/7) For any example i (1 ≤ i ≤ n), let Li be the number of base classifiers that classify it correctly. If Li = L or Li = 0, the example is classified correctly by all L classifiers or by none of them, so its row carries no discriminative information; we delete such rows from table T to reduce computational cost.
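A hedged sketch of this filtering step, assuming the 0/1 table layout from the previous sketch (filter_rows is an illustrative name, not from the paper):

```c
/* Drop every row i whose correct-count Li is 0 (all classifiers wrong)
 * or l (all classifiers right), since such rows cannot distinguish
 * between candidate sub-ensembles.  Returns the number of rows kept. */
static int filter_rows(int m, int l, int t[m][l]) {
    int kept = 0;
    for (int i = 0; i < m; i++) {
        int li = 0;
        for (int j = 0; j < l; j++)
            li += t[i][j];
        if (li == 0 || li == l)
            continue;                   /* uninformative row, skip it */
        for (int j = 0; j < l; j++)     /* compact the table in place */
            t[kept][j] = t[i][j];
        kept++;
    }
    return kept;
}
```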
Method (4/7) • Suppose k = 5. Then only paths of length ⌊5/2⌋ + 1 = 3 in the FP-Tree are candidate patterns; the Path-Table shown on the slide lists these classifier sets together with their count values.
Method (5/7) • The classifier set {h2, h5, h6} has the largest count value in the Path-Table. We add these three classifiers into S.set and set S.correct = 3. • S.set = {h2, h5, h6}, S.correct = 3. • Then we delete the rows of the examples already covered by the selected classifiers and update the Path-Table.
Method (6/7) • Now the first row has the maximum count value, so the base classifier h7 is selected. • S.set = {h2, h5, h6, h7}, S.correct = 7.
Method (7/7) • The classifier sets {h1} and {h4} have the same count value. Considering that the path of h1 was constructed earlier in the Path-Table than that of h4, we add h1 into S.set. • S.set = {h2, h5, h6, h7, h1}, and S.correct = 8.
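The walkthrough above reduces to a simple scan over the Path-Table. The sketch below is our reconstruction (Path and pick_best are illustrative names; the paper does not publish this code): it picks the entry with the largest count, breaking ties in favour of the entry constructed earlier.

```c
/* One Path-Table entry: a classifier set mined from the FP-Tree
 * together with its count value (how many remaining examples the set
 * classifies correctly by majority vote). */
typedef struct {
    int classifiers[8];   /* indices of the classifiers on this path */
    int len;              /* number of classifiers in the set        */
    int count;            /* examples this set classifies correctly  */
} Path;

/* Front-to-back scan with a strict '>' so that, on equal counts, the
 * entry constructed earlier in the Path-Table wins (h1 over h4 above). */
static int pick_best(const Path *table, int n) {
    int best = 0;
    for (int i = 1; i < n; i++)
        if (table[i].count > table[best].count)
            best = i;
    return best;
}
```

Each pick adds a whole classifier set to S.set and its count to S.correct, which is why S.correct jumps from 3 to 7 to 8 in the walkthrough.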
Advantages • Classifiers with a negative effect on the ensemble have a low probability of being selected, because of their low count values. • The selected classifiers come from multiple paths, which gives them low error correlation.
Experiment We compared the performance of our approach, PMEP, against Bagging (Breiman 1996), GASEN (Zhou et al. 2002), and Forward Selection (FS) (Caruana et al. 2004) in our empirical study. Test platform: AMD 4000+, 2 GB RAM, C programming language, Linux operating system.
All tests were performed on 15 data sets from the UCI machine learning repository.
Sizes of the pruned ensembles for each data set; the last row is the average over all 15 data sets. Avg: 20, 7.43, 3.77, 5.70 (one column per method, in the order compared above).
Conclusion • The experimental results show that the proposed PMEP achieves the highest prediction accuracy and costs much less pruning time than GASEN and Forward Selection. • The design of our PMEP algorithm is tailored to the majority voting method; how to extend the algorithm to other combination strategies is another direction of our future work.