From Evolutionary Computation to Ensemble Learning Xin Yao CERCIA, School of Computer Science University of Birmingham UK
Overview • Introduction (Evolutionary Computation) • Multi-objective learning and ensembles • Online learning with concept drifts • Class imbalance learning • Concluding remarks
Why Evolution? • Learning and evolution are two fundamental forms of adaptation. It is interesting to study both, especially the integration of the two. • Simulated evolution makes few assumptions about what is being evolved. It can be introduced into an ANN at different levels, including weight training, architecture adaptation and learning rule adaptation. • X. Yao, "Evolving artificial neural networks," Proceedings of the IEEE, 87(9):1423-1447, September 1999.
What Is Evolutionary Computation? • It is the study of computational systems that use ideas and draw inspiration from natural evolution. • For example, one of the most often used inspirations is survival of the fittest. • Evolutionary computation (EC) can be used in optimisation, machine learning and creative design. • There has been significant growth in EC theories in recent years, especially in computational time complexity analysis.
Evolutionary Learning and Optimisation • Learning has often (always?) been formulated as an optimisation problem. • However, learning is not the same as optimisation: the objective optimised during training is only a proxy for generalisation to unseen data.
Populations as Ensembles • Keep every member in the population and form an ensemble output from them as the final solution. • Side-effect: We don't need to choose the "best" individual anymore. Everyone has a role to play in the population. • X. Yao, Y. Liu and P. Darwen, "How to make best use of evolutionary learning," Complexity International: An Electronic Journal of Complex Systems Research (ISSN 1320-0682), Vol. 3, July 1996.
An Early Work • Playing the two-player iterated prisoner’s dilemma (2IPD) game. • An evolutionary algorithm was used to evolve strategies for playing the 2IPD. • Implicit fitness sharing was used to form different species (specialists) in a population. • A gating algorithm was used to combine individuals in a population together. • P. J. Darwen and X. Yao, “Speciation as automatic categorical modularization,” IEEE Transactions on Evolutionary Computation, 1(2):101-108, 1997.
What Makes It Work? • It turns out that the population and diversity (induced by speciation) are essential, not evolution. • OK, so what?
Population + Diversity - Evolution = Negative Correlation Learning Y. Liu and X. Yao, “Ensemble learning via negative correlation,” Neural Networks, 12(10):1399-1404, December 1999.
Overview • Introduction (Evolutionary Computation) • Multi-objective learning and ensembles • Online learning with concept drifts • Class imbalance learning • Concluding remarks
Where Do Multi-objectives Come From? We want accurate and diverse ensembles, e.g., in negative correlation learning (sketched below): accuracy and diversity are two objectives! Why do we have to introduce a hyper-parameter to trade them off and then struggle to optimise it?
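As a reminder of what the slide alludes to, here is a sketch of the negative correlation learning objective (following the Liu & Yao paper cited earlier; $\lambda$ is the hyper-parameter in question). Each network $i$ minimises

$$
E_i \;=\; \frac{1}{N}\sum_{n=1}^{N}\Bigl[\tfrac{1}{2}\bigl(F_i(n)-d(n)\bigr)^2 \;+\; \lambda\, p_i(n)\Bigr],
\qquad
p_i(n) \;=\; \bigl(F_i(n)-\bar{F}(n)\bigr)\sum_{j\neq i}\bigl(F_j(n)-\bar{F}(n)\bigr),
$$

where $\bar{F}(n)$ is the ensemble output on example $n$ and $d(n)$ the target. The first term measures accuracy, the second (negative correlation) term encourages diversity, and $\lambda$ trades them off.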
Multi-objective Learning • Multi-objective learning treats accuracy and diversity as separate objectives in learning. • Multi-objective optimisation algorithms, such as multi-objective evolutionary algorithms (MOEAs), are used as learning algorithms. • The result from such an MOEA is a non-dominated set of solutions (i.e., learners), which ideally form the ensemble we are interested in. • A. Chandra and X. Yao, "Ensemble learning using multi-objective evolutionary algorithms," Journal of Mathematical Modelling and Algorithms, 5(4):417-445, December 2006.
Flexibility and Generality • Multi-objective learning offers a highly flexible and general framework for considering different requirements in learning. • For example, we can easily include a regularisation term as an additional objective in learning: • H. Chen and X. Yao, "Multiobjective Neural Network Ensembles based on Regularized Negative Correlation Learning," IEEE Transactions on Knowledge and Data Engineering, 22(12):1738-1751, December 2010. • Widely applicable: • L. L. Minku and X. Yao, "Software Effort Estimation as a Multi-objective Learning Problem," ACM Transactions on Software Engineering and Methodology, to appear. • Z. Wang, K. Tang and X. Yao, "Multi-objective Approaches to Optimal Testing Resource Allocation in Modular Software Systems," IEEE Transactions on Reliability, 59(3):563-575, September 2010.
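As a rough illustration of the kind of formulation this enables (the precise objectives are given in the Chen & Yao paper above), each ensemble member $f_i$ can be trained against a vector of objectives such as

$$
\min_{f_i}\ \Bigl(\ \underbrace{\textstyle\sum_n \bigl(f_i(n)-d(n)\bigr)^2}_{\text{training error}},\ \ \underbrace{\textstyle\sum_n p_i(n)}_{\text{correlation (diversity)}},\ \ \underbrace{\lVert \mathbf{w}_i\rVert^2}_{\text{regularisation}}\ \Bigr),
$$

with no hand-tuned coefficients combining the terms; a multi-objective optimiser returns a non-dominated set of trade-offs instead.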
Overview • Introduction (Evolutionary Computation) • Multi-objective learning and ensembles • Online learning with concept drifts • Class imbalance learning • Concluding remarks
Introduction to Online Learning • Online learning: process each training example once "on arrival", without the need for storage or reprocessing. • The underlying distribution of the problem can change with time (concept drift). • Related to incremental learning. • A growing number of applications operate in such a way that new data are available with time.
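For concreteness, here is a minimal sketch of the test-then-train (prequential) protocol commonly used to evaluate online learners; `model` and `stream` are placeholders, and `predict`/`partial_fit` are an assumed incremental-learning interface.

```python
def prequential_run(model, stream):
    """Test-then-train loop: each example is used once for evaluation,
    then once for an incremental update, and is never stored."""
    correct, t = 0, 0
    for t, (x, y) in enumerate(stream, start=1):
        y_hat = model.predict(x)      # test on the new example first
        correct += int(y_hat == y)
        model.partial_fit(x, y)       # then learn from it once
    return correct / max(t, 1)        # prequential accuracy
```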
Related Work • Traditional online learning approaches have difficulties in adapting quickly to changes. • E.g. Online Bagging: • N. C. Oza and S. Russell, "Experimental comparisons of online and batch versions of bagging and boosting," in Proc. of ACM SIGKDD, 2001. • Approaches that explicitly or implicitly detect concept drifts have been proposed. • E.g. Early Drift Detection Method (EDDM) • M. Baena-García, J. del Campo-Ávila, R. Fidalgo, and A. Bifet, "Early drift detection method," in Proc. of ECML PKDD (IWKDDS), 2006 • and Dynamic Weighted Majority (DWM) • J. Z. Kolter and M. A. Maloof, "Dynamic weighted majority: An ensemble method for drifting concepts," JMLR, 2007. • Ensembles of learning machines have been used to deal with drifts, e.g. DWM. • None of the existing work has analysed the role of diversity and its impact on ensemble learning.
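As background, a sketch of the online bagging idea from the Oza & Russell paper above: each incoming example is shown to each base learner k times, with k drawn from Poisson(1), approximating bootstrap sampling in the online setting. The base-learner interface (`partial_fit`/`predict`) is a placeholder assumption.

```python
import numpy as np

class OnlineBagging:
    """Sketch of online bagging: each example is presented to each base
    learner k ~ Poisson(1) times, mimicking bootstrap sampling online."""
    def __init__(self, base_learners, rng=None):
        self.learners = base_learners
        self.rng = rng or np.random.default_rng()

    def partial_fit(self, x, y):
        for learner in self.learners:
            k = self.rng.poisson(1.0)   # how many times this learner sees (x, y)
            for _ in range(k):
                learner.partial_fit(x, y)

    def predict(self, x):
        votes = [learner.predict(x) for learner in self.learners]
        return max(set(votes), key=votes.count)   # simple majority vote
```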
Diversity in Online Ensemble Learning • A recent study analysed the effect of different levels of diversity in ensembles before and after a drift. • L. L. Minku, A. White and X. Yao, "The Impact of Diversity on On-line Ensemble Learning in the Presence of Concept Drift," IEEE Transactions on Knowledge and Data Engineering, 22(5):730-742, May 2010. • Main findings: • Lower diversity provided very good accuracy before the drift. • Higher diversity had low accuracy before the drift, but helped the ensemble to be less affected by the drift. • However, higher diversity also made learning the new concept more difficult.
How Could Such Insight Be Used? • How to make use of different levels of diversity so as to improve performance when there is a concept drift? • Would that allow for better performance than a new ensemble created from scratch? • Would such an approach achieve competitive performance in the absence of drifts? L. L. Minku and X. Yao, "DDD: A New Ensemble Approach For Dealing With Concept Drift," IEEE Transactions on Knowledge and Data Engineering, 24(4):619-633, April 2012.
A Simple Idea • Maintain two ensembles with low and high diversity respectively. • Enforce low diversity in the old high diversity ensemble after a drift occurs.
DDD: Diversity for Dealing with Drifts • Before a drift: a low-diversity and a high-diversity ensemble are maintained. • After a drift: the old low-diversity and old high-diversity ensembles continue learning with low diversity, and new low- and high-diversity ensembles are created. • Dynamic weights, based on accuracy since the drift detection, determine which ensembles are used for prediction. • New high-diversity ensembles are not used for predictions.
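A very rough structural sketch of this idea follows (it is not the paper's exact algorithm: the diversity mechanism, weighting and drift detector are abstracted behind placeholder factories `make_low`, `make_high` and a `detector` with a `drift_detected()` method).

```python
class DDDSketch:
    """Structural sketch of the DDD idea: two ensembles before a drift,
    four after it, with predictions restricted as described above."""
    def __init__(self, make_low, make_high, detector):
        self.make_low, self.make_high = make_low, make_high
        self.low, self.high = make_low(), make_high()
        self.detector = detector
        self.after_drift = False
        self.old_low = self.old_high = None

    def partial_fit(self, x, y):
        if self.detector.drift_detected():
            # keep the old ensembles (now learning with low diversity in the
            # real algorithm) and start fresh low/high-diversity ensembles
            self.old_low, self.old_high = self.low, self.high
            self.low, self.high = self.make_low(), self.make_high()
            self.after_drift = True
        for ens in filter(None, (self.low, self.high, self.old_low, self.old_high)):
            ens.partial_fit(x, y)

    def predict(self, x):
        # before a drift only the low-diversity ensemble predicts; afterwards
        # the old ensembles and the new low-diversity ensemble vote (the paper
        # weights them by accuracy since the drift; weights omitted here)
        active = [self.low] if not self.after_drift else \
                 [self.low, self.old_low, self.old_high]
        votes = [e.predict(x) for e in active if e is not None]
        return max(set(votes), key=votes.count)
```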
Summary for Online Ensemble Learning • Different levels of diversity are good for different types of drift at different times. • DDD is successful in using different levels of diversity at different times, obtaining competitive performance under concept drifts as well as in non-drift periods (including false alarms).
Overview • Introduction (Evolutionary Computation) • Multi-objective learning and ensembles • Online learning with concept drifts • Class imbalance learning • Concluding remarks
Class Imbalance • Many real-world applications have very unbalanced distributions among classes: • E.g. fault diagnosis, software defect prediction, etc. • Minority class: rare cases, high misclassification cost. • Quantifying costs is hard in practice.
Class Imbalance Learning • Class imbalance learning refers to learning from imbalanced data sets, in which some classes of examples (minority) are highly under-represented compared to other classes (majority). • Learning difficulty: • poor generalisation on the minority class. • Learning objective: • obtaining a classifier that provides high accuracy for the minority class without severely jeopardising the accuracy of the majority class.
Some Related Work • Re-sampling techniques: change the number of training examples. • Over-sampling the minority class • Under-sampling the majority class • Cost-sensitive methods: increase the misclassification cost of the minority class. • Hard to quantify costs in practice • Classification ensembles: • combine multiple learners to improve performance; • Advantages: independent of base learning algorithms; improved generalisation.
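As a concrete example of the first family, here is a minimal sketch of random over-sampling for a two-class problem (function name and setup are illustrative, not taken from the papers cited in this talk).

```python
import numpy as np

def random_oversample(X, y, minority_label, rng=None):
    """Duplicate randomly chosen minority-class examples until both classes
    have the same number of training examples. X and y are NumPy arrays;
    the minority class is assumed to be the smaller one."""
    rng = rng or np.random.default_rng()
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    n_extra = max(len(majority) - len(minority), 0)
    extra = rng.choice(minority, size=n_extra, replace=True)
    idx = np.concatenate([majority, minority, extra])
    return X[idx], y[idx]
```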
Diversity in Class Imbalance Ensemble Learning • Strong correlations were found between diversity and generalisation performance measures: • Diversity showed a positive impact on the minority class, achieved by making the ensemble produce less over-fitted classification boundaries for the minority class; • Diversity was shown to be beneficial to both AUC and G-mean (overall performance). • S. Wang and X. Yao, "Relationships Between Diversity of Classification Ensembles and Single-Class Performance Measures," IEEE Transactions on Knowledge and Data Engineering, 25(1):206-219, January 2013.
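For reference, the G-mean used here as an overall measure is commonly defined as the geometric mean of the per-class recalls,

$$
\text{G-mean} \;=\; \Bigl(\prod_{k=1}^{c} \text{recall}_k\Bigr)^{1/c},
$$

which for two classes reduces to $\sqrt{\text{recall}_{\text{minority}} \cdot \text{recall}_{\text{majority}}}$; it is high only when every class is recognised reasonably well.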
Making Use of Diversity: AdaBoost.NC • Apply random over-sampling first to rectify the imbalanced distribution. • Encourage diversity: introduce diversity information (the ambiguity term amb) into the weights of training examples in the sequential training procedure of AdaBoost. • The weight-updating rule of AdaBoost is modified so that both high classification error and low diversity are penalised. • S. Wang, H. Chen and X. Yao, "Negative correlation learning for classification ensembles," Proc. of IJCNN'10, pp. 2893-2900, IEEE Press, 2010.
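One illustrative form of such a penalised update (a sketch consistent with the description above; the exact definitions of the ambiguity term, the penalty and $\alpha_t$ are given in the IJCNN'10 paper): with a per-example penalty $p_t(i) = 1 - |\mathrm{amb}_t(x_i)|$,

$$
D_{t+1}(i) \;\propto\; D_t(i)\; p_t(i)^{\lambda}\; e^{-\alpha_t\, y_i h_t(x_i)},
$$

so an example receives relatively more weight when it is misclassified and when the ensemble members built so far are insufficiently diverse on it; $\lambda$ controls the strength of the diversity penalty.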
AdaBoost.NC: Result Summary • Advantages of using AdaBoost.NC in class imbalance learning are: • no data information is lost; • no data generation method is needed in training; • it reduces the dependence of the algorithm on re-sampling techniques and training data. • Findings: • AdaBoost.NC integrated with random over-sampling is effective in classifying minority class examples correctly without losing overall performance, compared to other methods. • The over-sampling level and imbalance rate of the training data were shown not to be crucial factors influencing its effectiveness. • AdaBoost.NC produced very promising generalisation results in terms of both AUC and minority-class performance.
Multi-class Imbalance Learning • Multi-class imbalance: there are more than two classes with uneven class distributions. • E.g. in software defect prediction, there are different types of defects. • Most existing imbalance learning techniques are only designed for, and tested in, two-class scenarios. • Existing methods are ineffective, or even have a negative effect, when there is more than one minority/majority class. • S. Wang and X. Yao, "Multi-Class Imbalance Problems: Analysis and Potential Solutions," IEEE Transactions on Systems, Man and Cybernetics, Part B, 42(4):1119-1130, August 2012.
Existing Work • Use class decomposition: convert a multi-class problem into a set of two-class sub-problems, then use two-class imbalance techniques to handle each binary sub-task. • Class decomposition schemes include (given a c-class task, c > 2): • one-against-all (OAA): each of the c classes is trained against all other classes. It results in c binary classifiers and makes the data even more imbalanced. • one-against-one (OAO): each of the c classes is trained against every one of the other classes. It results in c(c-1)/2 binary classifiers; when c is large, the training time can be very long. • P-against-Q (PAQ): P of the c classes are trained against the other Q classes, and the training process is repeated several times with a different choice of P classes each time. • Little existing work has treated multi-class imbalance problems directly as multi-class problems.
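Purely for illustration (not the setup used in the cited papers), the two standard decomposition schemes are available off the shelf, e.g. in scikit-learn, wrapped around any base classifier.

```python
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.tree import DecisionTreeClassifier

# one-against-all (OAA): c binary classifiers, each class vs. the rest
oaa = OneVsRestClassifier(DecisionTreeClassifier())

# one-against-one (OAO): c(c-1)/2 binary classifiers, one per pair of classes
oao = OneVsOneClassifier(DecisionTreeClassifier())

# both expose the usual interface, e.g.:
# oaa.fit(X_train, y_train); y_pred = oaa.predict(X_test)
```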
New Challenges from Multi-class • Identify new challenges in multi-class imbalance by studying two types of multi-class problems: multi-minority and multi-majority. • Are there any differences between multiple minority and multiple majority classes? • Would these two types of problem pose the same or different challenges to a learning algorithm? • For such multi-class imbalance problems, which aspects of a problem would be affected most by the multi-class setting? Would it be a minority class, a majority class, or both? • Develop a simple and effective ensemble learning method without using class decomposition: • Can AdaBoost.NC be extended to tackle multi-class imbalance directly? • How does it perform compared to methods based on class decomposition? • Is class decomposition necessary for multi-class problems?
Main Findings • Both multi-minority and multi-majority negatively affect the overall and minority-class performance. In particular, the multi-majority case tends to be more harmful, in terms of F-measure and recall. • Neither over-sampling nor under-sampling is satisfactory: • random over-sampling suffers from over-fitting, as no new information is introduced into the minority class to facilitate classification; • the effect of random under-sampling is weakened when there are more minority classes. • S. Wang and X. Yao, "Multi-Class Imbalance Problems: Analysis and Potential Solutions," IEEE Transactions on Systems, Man and Cybernetics, Part B, 42(4):1119-1130, August 2012.
AdaBoost.NC for Multi-Class Imbalance Problems • AdaBoost.NC better recognises minority class examples and better balances the performance across multiple classes, with high G-mean. • Class decomposition is unnecessary for tackling multi-class imbalance problems. • The proposed combination method for OAA (with weights based on imbalance rates) improves the performance of the original OAA. • S. Wang and X. Yao, "Multi-Class Imbalance Problems: Analysis and Potential Solutions," IEEE Transactions on Systems, Man and Cybernetics, Part B, 42(4):1119-1130, August 2012.
Concluding Remarks • Ensembles are competitive learning methods for solving a wide range of problems. • Diversity is the key issue in ensemble learning. • We have offered the first studies of ensemble diversity for online learning and class imbalance learning, especially multi-class imbalance. • Insight into diversity's roles enables us to design better ensemble algorithms. • We need more theoretical analysis of these algorithms.