From Evolutionary Computation to Ensemble Learning Xin Yao CERCIA, School of Computer Science University of Birmingham UK
Overview • Introduction (Evolutionary Computation) • Multi-objective learning and ensembles • Online learning with concept drifts • Class imbalance learning • Concluding remarks
Why Evolution? • Learning and evolution are two fundamental forms of adaptation. It is interesting to study both, especially the integration of the two. • Simulated evolution makes few assumptions about what is being evolved. It can be introduced into an ANN at different levels, including weight training, architecture adaptation and learning rule adaptation. • X. Yao, "Evolving artificial neural networks," Proceedings of the IEEE, 87(9):1423-1447, September 1999.
What Is Evolutionary Computation? • It is the study of computational systems that use ideas and draw inspiration from natural evolution. • For example, one of the most often used inspirations is survival of the fittest. • Evolutionary computation (EC) can be used in optimisation, machine learning and creative design. • There has been significant growth in EC theories in recent years, especially in computational time complexity analysis.
Evolutionary Learning and Optimisation • Learning has often (always?) been formulated as an optimisation problem. • However, learning is not the same as optimisation: the objective optimised during training is only a proxy for generalisation to unseen data.
Populations as Ensembles • Keep every member in the population and form an ensemble output from them as the final solution. • Side-effect: We don't need to choose the "best" individual anymore. Everyone has a role to play in the population. • X. Yao, Y. Liu and P. Darwen, "How to make best use of evolutionary learning," Complexity International: An Electronic Journal of Complex Systems Research (ISSN 1320-0682), Vol. 3, July 1996.
An Early Work • Playing the two-player iterated prisoner’s dilemma (2IPD) game. • An evolutionary algorithm was used to evolve strategies for playing the 2IPD. • Implicit fitness sharing was used to form different species (specialists) in a population. • A gating algorithm was used to combine individuals in a population together. • P. J. Darwen and X. Yao, “Speciation as automatic categorical modularization,” IEEE Transactions on Evolutionary Computation, 1(2):101-108, 1997.
What Makes It Work? • It turns out that the population and diversity (induced by speciation) are essential, not evolution. • OK, so what?
Population + Diversity - Evolution = Negative Correlation Learning Y. Liu and X. Yao, “Ensemble learning via negative correlation,” Neural Networks, 12(10):1399-1404, December 1999.
Overview • Introduction (Evolutionary Computation) • Multi-objective learning and ensembles • Online learning with concept drifts • Class imbalance learning • Concluding remarks
Where Do Multi-objectives Come From? We want accurate and diverse ensembles, e.g., in negative correlation learning (sketched below): accuracy and diversity are two objectives! Why do we have to introduce a hyper-parameter to trade them off and then struggle to optimise it?
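As a reminder of what the slide alludes to, here is a sketch of the negative correlation learning objective (following the Liu & Yao paper cited earlier; $\lambda$ is the hyper-parameter in question). Each network $i$ minimises

$$
E_i \;=\; \frac{1}{N}\sum_{n=1}^{N}\Bigl[\tfrac{1}{2}\bigl(F_i(n)-d(n)\bigr)^2 \;+\; \lambda\, p_i(n)\Bigr],
\qquad
p_i(n) \;=\; \bigl(F_i(n)-\bar{F}(n)\bigr)\sum_{j\neq i}\bigl(F_j(n)-\bar{F}(n)\bigr),
$$

where $\bar{F}(n)$ is the ensemble output on example $n$ and $d(n)$ the target. The first term measures accuracy, the second (negative correlation) term encourages diversity, and $\lambda$ trades them off.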
Multi-objective Learning • Multi-objective learning treats accuracy and diversity as separate objectives in learning. • Multi-objective optimisation algorithms, such as multi-objective evolutionary algorithms (MOEAs), are used as learning algorithms. • The result from such an MOEA is a non-dominated set of solutions (i.e., learners), which ideally form the ensemble we are interested in. • A. Chandra and X. Yao, "Ensemble learning using multi-objective evolutionary algorithms," Journal of Mathematical Modelling and Algorithms, 5(4):417-445, December 2006.
Flexibility and Generality • Multi-objective learning offers a highly flexible and general framework for considering different requirements in learning. • For example, we can easily include a regularisation term as an additional objective in learning: • H. Chen and X. Yao, "Multiobjective Neural Network Ensembles based on Regularized Negative Correlation Learning," IEEE Transactions on Knowledge and Data Engineering, 22(12):1738-1751, December 2010. • Widely applicable: • L. L. Minku and X. Yao, "Software Effort Estimation as a Multi-objective Learning Problem," ACM Transactions on Software Engineering and Methodology, to appear. • Z. Wang, K. Tang and X. Yao, "Multi-objective Approaches to Optimal Testing Resource Allocation in Modular Software Systems," IEEE Transactions on Reliability, 59(3):563-575, September 2010.
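As a rough illustration of the kind of formulation this enables (the precise objectives are given in the Chen & Yao paper above), each ensemble member $f_i$ can be trained against a vector of objectives such as

$$
\min_{f_i}\ \Bigl(\ \underbrace{\textstyle\sum_n \bigl(f_i(n)-d(n)\bigr)^2}_{\text{training error}},\ \ \underbrace{\textstyle\sum_n p_i(n)}_{\text{correlation (diversity)}},\ \ \underbrace{\lVert \mathbf{w}_i\rVert^2}_{\text{regularisation}}\ \Bigr),
$$

with no hand-tuned coefficients combining the terms; a multi-objective optimiser returns a non-dominated set of trade-offs instead.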
Overview • Introduction (Evolutionary Computation) • Multi-objective learning and ensembles • Online learning with concept drifts • Class imbalance learning • Concluding remarks
Introduction to Online Learning • Online learning: process each training example once "on arrival", without the need for storage or reprocessing. • The underlying distribution of the problem can change with time (concept drift). • Related to incremental learning. • A growing number of applications operate in such a way that new data are available with time.
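For concreteness, here is a minimal sketch of the test-then-train (prequential) protocol commonly used to evaluate online learners; `model` and `stream` are placeholders, and `predict`/`partial_fit` are an assumed incremental-learning interface.

```python
def prequential_run(model, stream):
    """Test-then-train loop: each example is used once for evaluation,
    then once for an incremental update, and is never stored."""
    correct, t = 0, 0
    for t, (x, y) in enumerate(stream, start=1):
        y_hat = model.predict(x)      # test on the new example first
        correct += int(y_hat == y)
        model.partial_fit(x, y)       # then learn from it once
    return correct / max(t, 1)        # prequential accuracy
```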
Related Work • Traditional online learning approaches have difficulties in adapting quickly to changes. • E.g. Online Bagging: • N. C. Oza and S. Russell, "Experimental comparisons of online and batch versions of bagging and boosting," in Proc. of ACM SIGKDD, 2001. • Approaches that explicitly or implicitly detect concept drifts have been proposed. • E.g. Early Drift Detection Method (EDDM) • M. Baena-García, J. del Campo-Ávila, R. Fidalgo, and A. Bifet, "Early drift detection method," in Proc. of ECML PKDD (IWKDDS), 2006 • and Dynamic Weighted Majority (DWM) • J. Z. Kolter and M. A. Maloof, "Dynamic weighted majority: An ensemble method for drifting concepts," JMLR, 2007. • Ensembles of learning machines have been used to deal with drifts, e.g. DWM. • None of the existing work has analysed the role of diversity and its impact on ensemble learning.
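As background, a sketch of the online bagging idea from the Oza & Russell paper above: each incoming example is shown to each base learner k times, with k drawn from Poisson(1), approximating bootstrap sampling in the online setting. The base-learner interface (`partial_fit`/`predict`) is a placeholder assumption.

```python
import numpy as np

class OnlineBagging:
    """Sketch of online bagging: each example is presented to each base
    learner k ~ Poisson(1) times, mimicking bootstrap sampling online."""
    def __init__(self, base_learners, rng=None):
        self.learners = base_learners
        self.rng = rng or np.random.default_rng()

    def partial_fit(self, x, y):
        for learner in self.learners:
            k = self.rng.poisson(1.0)   # how many times this learner sees (x, y)
            for _ in range(k):
                learner.partial_fit(x, y)

    def predict(self, x):
        votes = [learner.predict(x) for learner in self.learners]
        return max(set(votes), key=votes.count)   # simple majority vote
```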
Diversity in Online Ensemble Learning • A recent study analysed the effect of different levels of diversity in ensembles before and after a drift. • L. L. Minku, A. White and X. Yao, "The Impact of Diversity on On-line Ensemble Learning in the Presence of Concept Drift," IEEE Transactions on Knowledge and Data Engineering, 22(5):730-742, May 2010. • Main findings: • Lower diversity provided very good accuracy before the drift. • Higher diversity had low accuracy before the drift, but helped the ensemble to be less affected by the drift. • However, higher diversity also made learning the new concept more difficult.
How Could Such Insight Be Used? • How to make use of different levels of diversity so as to improve performance when there is a concept drift? • Would that allow for better performance than a new ensemble created from scratch? • Would such an approach achieve competitive performance in the absence of drifts? L. L. Minku and X. Yao, "DDD: A New Ensemble Approach For Dealing With Concept Drift," IEEE Transactions on Knowledge and Data Engineering, 24(4):619-633, April 2012.
A Simple Idea • Maintain two ensembles with low and high diversity respectively. • Enforce low diversity in the old high diversity ensemble after a drift occurs.
DDD: Diversity for Dealing with Drifts • Before a drift: a low-diversity and a high-diversity ensemble are maintained. • After a drift: the old low-diversity and old high-diversity ensembles continue learning with low diversity, and new low- and high-diversity ensembles are created. • Dynamic weights, based on accuracy since the drift detection, determine which ensembles are used for prediction. • New high-diversity ensembles are not used for predictions.
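A very rough structural sketch of this idea follows (it is not the paper's exact algorithm: the diversity mechanism, weighting and drift detector are abstracted behind placeholder factories `make_low`, `make_high` and a `detector` with a `drift_detected()` method).

```python
class DDDSketch:
    """Structural sketch of the DDD idea: two ensembles before a drift,
    four after it, with predictions restricted as described above."""
    def __init__(self, make_low, make_high, detector):
        self.make_low, self.make_high = make_low, make_high
        self.low, self.high = make_low(), make_high()
        self.detector = detector
        self.after_drift = False
        self.old_low = self.old_high = None

    def partial_fit(self, x, y):
        if self.detector.drift_detected():
            # keep the old ensembles (now learning with low diversity in the
            # real algorithm) and start fresh low/high-diversity ensembles
            self.old_low, self.old_high = self.low, self.high
            self.low, self.high = self.make_low(), self.make_high()
            self.after_drift = True
        for ens in filter(None, (self.low, self.high, self.old_low, self.old_high)):
            ens.partial_fit(x, y)

    def predict(self, x):
        # before a drift only the low-diversity ensemble predicts; afterwards
        # the old ensembles and the new low-diversity ensemble vote (the paper
        # weights them by accuracy since the drift; weights omitted here)
        active = [self.low] if not self.after_drift else \
                 [self.low, self.old_low, self.old_high]
        votes = [e.predict(x) for e in active if e is not None]
        return max(set(votes), key=votes.count)
```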
Summary for Online Ensemble Learning • Different levels of diversity are good for different types of drift at different times. • DDD is successful in using different levels of diversity at different times, obtaining competitive performance under concept drifts as well as in non-drift periods (including false alarms).
Overview • Introduction (Evolutionary Computation) • Multi-objective learning and ensembles • Online learning with concept drifts • Class imbalance learning • Concluding remarks
Class Imbalance • Many real-world applications have very unbalanced distributions among classes: • E.g. fault diagnosis, software defect prediction, etc. • Minority class: rare cases, high misclassification cost. • Quantifying costs is hard in practice.
Class Imbalance Learning • Class imbalance learning refers to learning from imbalanced data sets, in which some classes of examples (minority) are highly under-represented compared to other classes (majority). • Learning difficulty: • poor generalisation on the minority class. • Learning objective: • obtaining a classifier that provides high accuracy for the minority class without severely jeopardising the accuracy of the majority class.
Some Related Work • Re-sampling techniques: change the number of training examples. • Over-sampling the minority class • Under-sampling the majority class • Cost-sensitive methods: increase the misclassification cost of the minority class. • Hard to quantify costs in practice • Classification ensembles: • combine multiple learners to improve performance; • Advantages: independent of base learning algorithms; improved generalisation.
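As a concrete example of the first family, here is a minimal sketch of random over-sampling for a two-class problem (function name and setup are illustrative, not taken from the papers cited in this talk).

```python
import numpy as np

def random_oversample(X, y, minority_label, rng=None):
    """Duplicate randomly chosen minority-class examples until both classes
    have the same number of training examples. X and y are NumPy arrays;
    the minority class is assumed to be the smaller one."""
    rng = rng or np.random.default_rng()
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    n_extra = max(len(majority) - len(minority), 0)
    extra = rng.choice(minority, size=n_extra, replace=True)
    idx = np.concatenate([majority, minority, extra])
    return X[idx], y[idx]
```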
Diversity in Class Imbalance Ensemble Learning • Strong correlations were found between diversity and generalisation performance measures: • Diversity showed a positive impact on the minority class, achieved by making the ensemble produce less over-fitted classification boundaries for the minority class; • Diversity was shown to be beneficial to both AUC and G-mean (overall performance). • S. Wang and X. Yao, "Relationships Between Diversity of Classification Ensembles and Single-Class Performance Measures," IEEE Transactions on Knowledge and Data Engineering, 25(1):206-219, January 2013.
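For reference, the G-mean used here as an overall measure is commonly defined as the geometric mean of the per-class recalls,

$$
\text{G-mean} \;=\; \Bigl(\prod_{k=1}^{c} \text{recall}_k\Bigr)^{1/c},
$$

which for two classes reduces to $\sqrt{\text{recall}_{\text{minority}} \cdot \text{recall}_{\text{majority}}}$; it is high only when every class is recognised reasonably well.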
Making Use of Diversity: AdaBoost.NC • Apply random over-sampling first to rectify the imbalanced distribution. • Encourage diversity: introduce diversity information (the ambiguity term amb) into the weights of training examples in the sequential training procedure of AdaBoost. • The weight-updating rule of AdaBoost is modified so that both high classification error and low diversity are penalised. • S. Wang, H. Chen and X. Yao, "Negative correlation learning for classification ensembles," Proc. of IJCNN'10, pp. 2893-2900, IEEE Press, 2010.
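One illustrative form of such a penalised update (a sketch consistent with the description above; the exact definitions of the ambiguity term, the penalty and $\alpha_t$ are given in the IJCNN'10 paper): with a per-example penalty $p_t(i) = 1 - |\mathrm{amb}_t(x_i)|$,

$$
D_{t+1}(i) \;\propto\; D_t(i)\; p_t(i)^{\lambda}\; e^{-\alpha_t\, y_i h_t(x_i)},
$$

so an example receives relatively more weight when it is misclassified and when the ensemble members built so far are insufficiently diverse on it; $\lambda$ controls the strength of the diversity penalty.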
AdaBoost.NC: Result Summary • Advantages of using AdaBoost.NC in class imbalance learning are: • no data information is lost; • no data generation method is needed in training; • it reduces the dependence of the algorithm on re-sampling techniques and training data. • Findings: • AdaBoost.NC integrated with random over-sampling is effective in classifying minority class examples correctly without losing overall performance, compared to other methods. • The over-sampling level and imbalance rate of the training data were shown not to be crucial factors influencing its effectiveness. • AdaBoost.NC produced very promising generalisation results in terms of both AUC and minority-class performance.
Multi-class Imbalance Learning • Multi-class imbalance: there are more than two classes with uneven class distributions. • E.g. in software defect prediction, there are different types of defects. • Most existing imbalance learning techniques are only designed for, and tested in, two-class scenarios. • Existing methods are ineffective, or even have a negative effect, when there is more than one minority/majority class. • S. Wang and X. Yao, "Multi-Class Imbalance Problems: Analysis and Potential Solutions," IEEE Transactions on Systems, Man and Cybernetics, Part B, 42(4):1119-1130, August 2012.
Existing Work • Use class decomposition: convert a multi-class problem into a set of two-class sub-problems, then use two-class imbalance techniques to handle each binary sub-task. • Class decomposition schemes include (given a c-class task, c > 2): • one-against-all (OAA): each of the c classes is trained against all other classes. It results in c binary classifiers and makes the data even more imbalanced. • one-against-one (OAO): each of the c classes is trained against every one of the other classes. It results in c(c-1)/2 binary classifiers; when c is large, the training time can be very long. • P-against-Q (PAQ): P of the c classes are trained against the other Q classes, and the training process is repeated several times with a different choice of P classes each time. • Little existing work has treated multi-class imbalance problems directly as multi-class problems.
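Purely for illustration (not the setup used in the cited papers), the two standard decomposition schemes are available off the shelf, e.g. in scikit-learn, wrapped around any base classifier.

```python
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.tree import DecisionTreeClassifier

# one-against-all (OAA): c binary classifiers, each class vs. the rest
oaa = OneVsRestClassifier(DecisionTreeClassifier())

# one-against-one (OAO): c(c-1)/2 binary classifiers, one per pair of classes
oao = OneVsOneClassifier(DecisionTreeClassifier())

# both expose the usual interface, e.g.:
# oaa.fit(X_train, y_train); y_pred = oaa.predict(X_test)
```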
New Challenges from Multi-class • Identify new challenges in multi-class imbalance by studying two types of multi-class problems: multi-minority and multi-majority. • Are there any differences between multiple minority and multiple majority classes? • Would these two types of problem pose the same or different challenges to a learning algorithm? • For such multi-class imbalance problems, which aspects of a problem would be affected most by the multi-class setting? Would it be a minority class, a majority class, or both? • Develop a simple and effective ensemble learning method without using class decomposition: • Can AdaBoost.NC be extended to tackle multi-class imbalance directly? • How does it perform compared to methods based on class decomposition? • Is class decomposition necessary for multi-class problems?
Main Findings • Both multi-minority and multi-majority negatively affect the overall and minority-class performance. In particular, the multi-majority case tends to be more harmful, in terms of F-measure and recall. • Neither over-sampling nor under-sampling is satisfactory: • random over-sampling suffers from over-fitting, as no new information is introduced into the minority class to facilitate classification; • the effect of random under-sampling is weakened when there are more minority classes. • S. Wang and X. Yao, "Multi-Class Imbalance Problems: Analysis and Potential Solutions," IEEE Transactions on Systems, Man and Cybernetics, Part B, 42(4):1119-1130, August 2012.
AdaBoost.NC for Multi-Class Imbalance Problems • AdaBoost.NC better recognises minority class examples and better balances the performance across multiple classes, with high G-mean. • Class decomposition is unnecessary for tackling multi-class imbalance problems. • The proposed combination method for OAA (with weights based on imbalance rates) improves the performance of the original OAA. • S. Wang and X. Yao, "Multi-Class Imbalance Problems: Analysis and Potential Solutions," IEEE Transactions on Systems, Man and Cybernetics, Part B, 42(4):1119-1130, August 2012.
Concluding Remarks • Ensembles are competitive learning methods for solving a wide range of problems. • Diversity is the key issue in ensemble learning. • We have offered the first studies of ensemble diversity for online learning and class imbalance learning, especially multi-class imbalance. • Insight into diversity's roles enables us to design better ensemble algorithms. • We need more theoretical analysis of these algorithms.