Are we still talking about diversity in classifier ensembles?

Ludmila I Kuncheva, School of Computer Science, Bangor University, UK

Completely irrelevant to your Workshop...

Let’s talk instead about: multi-view and classifier ensembles

A classifier ensemble
[Diagram: feature values (object description) → classifier, classifier, classifier → “combiner” → class label]
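The picture on this slide can be sketched in a few lines. This is only an illustration: the slide does not say which combiner is meant, so a plain majority vote is assumed here, and the three threshold classifiers and the names `majority_vote` and `ensemble_predict` are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def ensemble_predict(classifiers, x):
    """Give the same object description x to every classifier, then combine."""
    return majority_vote([clf(x) for clf in classifiers])


# Three hypothetical base classifiers on a two-feature object description.
classifiers = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.0),
]

# The individual votes are 1, 0 and 1, so the combiner outputs class 1.
label = ensemble_predict(classifiers, (0.9, 0.2))
```

A trainable combiner would replace `majority_vote` with a model fitted on the classifier outputs.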

A neural network: ensemble?
[Diagram: feature values (object description) → classifier, classifier → combiner → class label]

A fancy combiner: ensemble?
[Diagram: feature values (object description) → seven classifier boxes, labelled “a fancy combiner” → class label]

A fancy feature extractor: classifier?
[Diagram: feature values (object description) → classifier, classifier, classifier (a fancy feature extractor) → “combiner” (classifier?) → class label]

Why classifier ensembles then?
a. because we like to complicate entities beyond necessity (anti-Occam’s razor)
b. because we are lazy and stupid and can’t be bothered to design and train one single sophisticated classifier
c. because democracy is so important to our society, it must be important to classification

• combination of multiple classifiers [Lam95,Woods97,Xu92,Kittler98]
• classifier fusion [Cho95,Gader96,Grabisch92,Keller94,Bloch96]
• mixture of experts [Jacobs91,Jacobs95,Jordan95,Nowlan91]
• committees of neural networks [Bishop95,Drucker94]
• consensus aggregation [Benediktsson92,Ng92,Benediktsson97]
• voting pool of classifiers [Battiti94]
• dynamic classifier selection [Woods97]
• composite classifier systems [Dasarathy78]
• classifier ensembles [Drucker94,Filippi94,Sharkey99]
• bagging, boosting, arcing, wagging [Sharkey99]
• modular systems [Sharkey99]
• collective recognition [Rastrigin81,Barabash83]
• stacked generalization [Wolpert92]
• divide-and-conquer classifiers [Chiang94]
• pandemonium system of reflective agents [Smieja96]
• change-glasses approach to classifier selection [KunchevaPRL93]
• etc.
[Annotations on the slide: oldest; oldest; fanciest]

[The same list of names, annotated on the slide: Out of fashion; Subsumed]

Congratulations! The Netflix Prize sought to substantially improve the accuracy of predictions about how much someone is going to enjoy a movie based on their movie preferences. On September 21, 2009 we awarded the $1M Grand Prize to team “BellKor’s Pragmatic Chaos”. Read about their algorithm, checkout team scores on the Leaderboard, and join the discussions on the Forum. We applaud all the contributors to this quest, which improves our ability to connect people to the movies they love.
[Diagram: feature values (object description) → classifier, classifier, classifier → combiner → class label; the whole labelled “classifier ensemble”]

cited 7194 times by 28 July 2013 (Google Scholar)
[Diagram: feature values (object description) → classifier, classifier, classifier → combiner → class label; the whole labelled “classifier ensemble”]

Classifier combination? Hmmmm…..
David Hand: We are kidding ourselves; there is no real progress in spite of ensemble methods.
David J. Hand (2006) Classifier technology and the illusion of progress, Statist. Sci. 21(1), 1-14.
Saso Dzeroski: Chances are that the single best classifier will be better than the ensemble.
S. Dzeroski and B. Zenko (2004) Is combining classifiers better than selecting the best one? Machine Learning, 54, 255-273.

Quo Vadis?
"combining classifiers" OR "classifier combination" OR "classifier ensembles" OR "ensemble of classifiers" OR "combining multiple classifiers" OR "committee of classifiers" OR "classifier committee" OR "committees of neural networks" OR "consensus aggregation" OR "mixture of experts" OR "bagging predictors" OR adaboost OR (("random subspace" OR "random forest" OR "rotation forest" OR boosting) AND "machine learning")

Gartner’s Hype Cycle: a typical evolution pattern of a new technology
Where are we?...

(6) IEEE TPAMI = IEEE Transactions on Pattern Analysis and Machine Intelligence
IEEE TSMC = IEEE Transactions on Systems, Man and Cybernetics
JASA = Journal of the American Statistical Association
IJCV = International Journal of Computer Vision
JTB = Journal of Theoretical Biology
(2) PPL = Protein and Peptide Letters
JAE = Journal of Animal Ecology
PR = Pattern Recognition
(4) ML = Machine Learning
NN = Neural Networks
CC = Cerebral Cortex
[Annotations on the slide: top cited paper is from…; application paper]

International Workshop on Multiple Classifier Systems 2000 – 2013 - continuing

Levels of questions
• A Combination level (Combiner)
  • selection or fusion?
  • voting or another combination method?
  • trainable or non-trainable combiner?
• B Classifier level (Classifier 1, Classifier 2, …, Classifier L)
  • same or different classifiers?
  • decision trees, neural networks or other?
  • how many?
• C Feature level (Features)
  • all features or subsets of features?
  • random or selected subsets?
• D Data level (Data set)
  • independent/dependent bootstrap samples?
  • selected data sets?
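The feature-level and data-level questions can be made concrete with a small sketch. It assumes one particular set of answers (independent bootstrap samples and uniformly random feature subsets), and the helper names `bootstrap_indices` and `random_subspace` are hypothetical; the slide itself leaves all of these choices open.

```python
import random


def bootstrap_indices(n_objects, rng):
    """Data level: sample object indices with replacement."""
    return [rng.randrange(n_objects) for _ in range(n_objects)]


def random_subspace(n_features, subset_size, rng):
    """Feature level: pick a random subset of feature indices, no repeats."""
    return sorted(rng.sample(range(n_features), subset_size))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
L = 5                   # number of classifiers (assumed)

# One training "plan" per ensemble member: which objects and which features
# that member would be trained on.
plans = [
    {"rows": bootstrap_indices(10, rng), "features": random_subspace(8, 3, rng)}
    for _ in range(L)
]
```

Each classifier would then be trained on its own rows and features, and the combination-level choice is made afterwards.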

[Figure panels: 50 diverse linear classifiers; 50 non-diverse linear classifiers]
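One way to put a number on the difference between the two panels is the average pairwise disagreement between classifiers. The slide does not say which diversity measure, data, or classifiers were used, so everything below is an assumption: random 2-D points, linear classifiers through the origin defined by an angle, and the hypothetical helper names `mean_pairwise_disagreement` and `linear_votes`.

```python
import numpy as np


def mean_pairwise_disagreement(votes):
    """votes: (L, N) array of labels from L classifiers on N objects.

    Returns the fraction of objects on which two classifiers give
    different labels, averaged over all pairs of classifiers.
    """
    n_classifiers = votes.shape[0]
    total, pairs = 0.0, 0
    for i in range(n_classifiers):
        for j in range(i + 1, n_classifiers):
            total += np.mean(votes[i] != votes[j])
            pairs += 1
    return total / pairs


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))  # synthetic 2-D objects (assumed)


def linear_votes(angles):
    """Each angle defines a line through the origin with normal (cos a, sin a)."""
    normals = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (L, 2)
    return (normals @ X.T > 0).astype(int)                        # (L, N)


# 50 lines with widely spread orientations vs. 50 nearly identical lines.
diverse = linear_votes(rng.uniform(0.0, np.pi, size=50))
non_diverse = linear_votes(rng.uniform(0.0, 0.1, size=50))

d_diverse = mean_pairwise_disagreement(diverse)
d_non_diverse = mean_pairwise_disagreement(non_diverse)
```

Under these assumptions `d_diverse` comes out much larger than `d_non_diverse`; whether that translates into a better ensemble is a separate question.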

[Diagram: strength of classifiers against number of classifiers L, starting from L = 1]
The perfect classifier?
Large ensemble of nearly identical classifiers: REDUNDANCY
Small ensembles of weak classifiers: INSUFFICIENCY?
Must engineer diversity…
• 3-8 classifiers: heterogeneous; trained combiner (stacked generalisation)
• 30-50 classifiers (How about here?): same or different models? trained or non-trained combiner? selection or fusion? IS IT WORTH IT?
• 100+ classifiers: same model; non-trained combiner (bagging, boosting, etc.)


A classifier ensemble: one view
[Diagram: feature values (object description) → classifier, classifier, classifier → “combiner” → class label]

A classifier ensemble: multiple views
[Diagram: three sets of feature values (object descriptions) → classifier, classifier, classifier → “combiner” → class label]

1998

“distinct” is what you call “late fusion”
“shared” is what you call “early fusion”

EXPRESSION OF EMOTION - MODALITIES
• physiological
  • central nervous system: EEG, fMRI, fNIRS
  • peripheral nervous system: pulse rate, pulse variation, EMG, respiration, skin to, Galvanic skin response, blood pressure
• behavioural
  • facial expression
  • eye tracking
  • gesture
  • speech
  • posture
  • interaction with the computer: pressure on mouse, drag-click speed, dialogue with tutor

Data Classification Strategies
[Diagram: modality 1, modality 2, modality 3 → ensemble]
(1) Concatenate the features from all modalities: “early fusion”
(2) Feature extraction and concatenation: “mid-fusion”
(3) Straight ensemble classification: “late fusion”
And many combinations thereof...

Data Classification Strategies: the trade-off
“Early fusion” (concatenate the features from all modalities): we capture all dependencies but can’t handle the complexity.
“Late fusion” (straight ensemble classification): we lose the dependencies but can handle the complexity.
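The contrast between strategies (1) and (3) can be sketched as follows. This is a toy illustration under several assumptions that are not on the slide: three synthetic modalities, a nearest-mean classifier as a stand-in base model, and majority voting as the late-fusion combiner. All function names are hypothetical.

```python
import numpy as np


def fit_nearest_mean(X, y):
    """Store one mean vector per class."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, means


def predict_nearest_mean(model, X):
    """Assign each object to the class with the closest mean."""
    classes, means = model
    dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]


def early_fusion(train_views, y, test_views):
    """(1) Concatenate all modalities, train a single classifier."""
    model = fit_nearest_mean(np.hstack(train_views), y)
    return predict_nearest_mean(model, np.hstack(test_views))


def late_fusion(train_views, y, test_views):
    """(3) One classifier per modality, combined by majority vote."""
    votes = np.stack([
        predict_nearest_mean(fit_nearest_mean(tr, y), te)
        for tr, te in zip(train_views, test_views)
    ])  # shape (n_views, n_test)
    return np.array([np.bincount(col).argmax() for col in votes.T])


# Synthetic data: 100 objects, two classes, three modalities of 2, 3, 4 features.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
views = [y[:, None] * 2.0 + rng.normal(size=(100, d)) for d in (2, 3, 4)]

train, test = np.arange(0, 100, 2), np.arange(1, 100, 2)
y_test = y[test]
early = early_fusion([v[train] for v in views], y[train], [v[test] for v in views])
late = late_fusion([v[train] for v in views], y[train], [v[test] for v in views])
```

In the early-fusion path the single model sees all nine features jointly, so it could in principle exploit dependencies across modalities; in the late-fusion path each classifier only ever sees its own modality.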

Ensemble Feature Selection
[Taxonomy diagram]
By the ensemble (RANKERS): decision tree ensembles; bootstrap ensembles of rankers; ensembles of different rankers
For the ensemble: systematic approach; random approach
Methods shown: uniform (random subspace); incremental or iterative; non-uniform (GA); feature selection; greedy
Multiview labels: multiview late fusion; multiview early and mid-fusion

[Highlighted part of the taxonomy: uniform (random subspace); incremental or iterative; non-uniform (GA); feature selection; greedy. Label: multiview early and mid-fusion]

This is what I think:
Deciding which approach to take is more art than science.
This choice is, crucially, CONTEXT-SPECIFIC.

Where does diversity come into this? Hmm... Nowhere...

Are we still talking about diversity in classifier ensembles?