Boosting Neural Networks

Boosting Neural Networks. By Holger Schwenk and Yoshua Bengio. Neural Computation, 12(8):1869-1887, 2000. Presented by Yong Li

Outline • Introduction • AdaBoost • Three versions of AdaBoost for neural networks • Results • Conclusions • Discussions

Introduction • Boosting: a general method for improving the performance of a learning method. • AdaBoost is a relatively recent boosting algorithm. • Many empirical studies of AdaBoost use decision trees as base classifiers (Breiman 1996; Drucker and Cortes 1996; et al.). • There is also work on understanding it theoretically (Schapire et al. 1997; Breiman 1998; Schapire 1999).

Introduction • But the applications had all been to decision trees; at that time there were no applications to multi-layer artificial neural networks. • The questions this paper tries to answer: • Does AdaBoost work as well for neural networks as for decision trees? • Does it behave in a similar way? • And more?

AdaBoost (Adaptive Boosting) • It is often possible to increase the accuracy of a classifier by averaging the decisions of an ensemble of classifiers. • Two popular ensemble methods: Bagging and Boosting. • Bagging improves generalization performance through a reduction in variance while maintaining or only slightly increasing bias. • AdaBoost constructs a composite classifier by sequentially training classifiers while putting more and more emphasis on certain patterns, namely those the earlier classifiers handle poorly.
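The "more and more emphasis" idea can be sketched with the simpler AdaBoost.M1 reweighting rule of Freund and Schapire. This is only an illustration: it is not the AdaBoost.M2 variant used in the paper's experiments, and the function name `adaboost_m1_round` and its arguments are assumptions made for this sketch.

```python
import math

def adaboost_m1_round(weights, correct):
    """One reweighting round of AdaBoost.M1 (illustrative, not the paper's M2).

    weights: current distribution over training examples (sums to 1).
    correct: list of bools, True where this round's classifier was right.
    Returns (new_weights, vote), where vote = log(1 / beta).
    """
    # Weighted error of the current classifier.
    eps = sum(w for w, c in zip(weights, correct) if not c)
    if not 0.0 < eps < 0.5:
        raise ValueError("M1 needs a weighted error strictly between 0 and 0.5")
    beta = eps / (1.0 - eps)
    # Shrink the weight of correctly classified examples, then renormalize,
    # so misclassified examples carry relatively more weight next round.
    scaled = [w * beta if c else w for w, c in zip(weights, correct)]
    total = sum(scaled)
    return [w / total for w in scaled], math.log(1.0 / beta)
```

With four equally weighted examples and one mistake, the misclassified example ends up holding half of the total weight in the next round, which is what forces the next classifier to attend to it.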

AdaBoost • AdaBoost.M2 is used in the experiments.
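A minimal sketch of one AdaBoost.M2 round, following the pseudo-loss formulation of Freund and Schapire, may help here. The data layout (a dict keyed by `(i, y)` mislabel pairs, a nested list `h` of confidences in [0, 1]) and the name `adaboost_m2_round` are assumptions made for illustration, not the presenter's or the authors' code.

```python
def adaboost_m2_round(D, h, labels):
    """One round of the AdaBoost.M2 pseudo-loss update (illustrative sketch).

    D: dict mapping (i, y), with y != labels[i], to the mislabel weight D_t(i, y).
    h: h[i][y] in [0, 1], the classifier's confidence that example i has label y.
    labels: labels[i] is the true label of example i.
    Returns (new_D, beta).
    """
    # Pseudo-loss: 0.5 * sum over mislabel pairs of D(i, y) * (1 - h(x_i, y_i) + h(x_i, y)).
    eps = 0.5 * sum(
        d * (1.0 - h[i][labels[i]] + h[i][y]) for (i, y), d in D.items()
    )
    if not 0.0 < eps < 1.0:
        # Guard added for this sketch to avoid dividing by zero below.
        raise ValueError("pseudo-loss must lie strictly between 0 and 1")
    beta = eps / (1.0 - eps)
    # Pairs the classifier separates well get a larger exponent and,
    # when beta < 1, a smaller weight in the next round.
    new_D = {
        (i, y): d * beta ** (0.5 * (1.0 + h[i][labels[i]] - h[i][y]))
        for (i, y), d in D.items()
    }
    z = sum(new_D.values())
    return {k: v / z for k, v in new_D.items()}, beta
```

In the combined classifier, each round's output for a class is weighted by log(1/beta) and the class with the largest weighted sum is chosen.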

Applying AdaBoost to neural networks • Three versions of AdaBoost are compared in this paper. • (R) Training the t-th classifier with a fixed training set, resampled once for that classifier. • (E) Training the t-th classifier using a different resampled training set at each epoch. • (W) Training the t-th classifier by directly weighting the cost function of the t-th neural network.
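The difference between the three versions can be sketched as follows, assuming each training example already has a boosting weight. The helper names (`resample_indices`, `weighted_cost`, `training_sets`) are hypothetical and the network training itself is left out; this only shows where the weights enter.

```python
import random

def resample_indices(weights, rng):
    """Draw len(weights) training indices with replacement, following the weights."""
    n = len(weights)
    return rng.choices(range(n), weights=weights, k=n)

def weighted_cost(per_example_losses, weights):
    """Version (W): each example's loss is scaled by its boosting weight."""
    return sum(w * loss for w, loss in zip(weights, per_example_losses))

def training_sets(version, weights, n_epochs, seed=0):
    """Yield the index set used at each epoch for versions (R) and (E)."""
    rng = random.Random(seed)
    fixed = resample_indices(weights, rng)
    for _ in range(n_epochs):
        if version == "R":
            yield fixed  # the same resampled set at every epoch
        elif version == "E":
            yield resample_indices(weights, rng)  # a fresh resample per epoch
        else:
            raise ValueError("version must be 'R' or 'E'")
```

Versions (R) and (E) leave the cost function untouched and change which examples the network sees, while (W) keeps the full training set and changes how much each example contributes to the cost.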

Results • Experiments are performed on three data sets. • The online data set collected at Paris 6 University • 22 attributes (in [-1, 1]^22), 10 classes • 1200 examples for learning and 830 examples for testing • UCI Letter • 16 attributes and 26 classes • 16000 examples for training and 4000 for testing • Satimage data set • 36 attributes and 6 classes • 4435 examples for training and 2000 for testing

Results of online data

Results of online data • Some conclusions • Boosting is better than Bagging • AdaBoost is less useful for very big networks. • (E) and (W) versions are better than (R)

Results of online data • The generalization error continues to decrease after the training error reaches zero.

Results of online data • The number of examples with a high margin increases as more classifiers are combined by boosting. • Note: there are conflicting results about the cumulative margin distribution.
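For reference, the margin of an example is usually defined, following Schapire et al. (1997), as the normalized vote for the correct class minus the largest normalized vote for any wrong class, so it lies in [-1, 1] and is positive exactly when the example is classified correctly. A small sketch under that definition, with hypothetical function names and a dict-based vote layout chosen for illustration:

```python
def margin(votes, true_label):
    """Margin of one example, given per-class votes normalized to sum to 1.

    votes: dict mapping each class label to its normalized weighted vote.
    """
    correct = votes.get(true_label, 0.0)
    best_wrong = max(
        (v for y, v in votes.items() if y != true_label), default=0.0
    )
    return correct - best_wrong

def cumulative_margin_distribution(margins, thresholds):
    """Fraction of examples whose margin is at most each threshold."""
    n = len(margins)
    return [sum(m <= t for m in margins) / n for t in thresholds]
```

A cumulative distribution curve that shifts to the right as classifiers are added corresponds to more examples having a high margin.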

Results of online data • Bagging has no significant influence on the margin distribution.

The results for the UCI Letter and Satimage data sets • Only the (E) and (W) versions are applied; they obtain the same results. • The same conclusions are drawn as for the online data. (Some results are omitted.)

Conclusion • AdaBoost can significantly improve neural classifiers. • Does AdaBoost work as well for neural networks as for decision trees? • Answer: yes • Does it behave in a similar way? • Answer: yes • Overfitting • Still there • Other questions • Short answers

Discussions • The paper shows empirically that AdaBoost works well for neural networks. • The algorithm description is misleading. • D_t(i), D_t(i, y)
