Data Dependence in Combining Classifiers
Mohamed Kamel
PAMI Lab, University of Waterloo
Outline
• Introduction
• Data Dependence
  • Implicit Dependence
  • Explicit Dependence
• Feature Based Architecture
• Training Algorithm
• Results
• Conclusions
Introduction
• Pattern Recognition Systems
  • Best possible classification rates
  • Increase efficiency and accuracy
• Multiple Classifier Systems
  • Empirical observation
  • Problem decomposed naturally from using various sensors
  • Avoid making commitments to arbitrary initial conditions or parameters
"Patterns mis-classified by different classifiers are not necessarily the same" [Kittler et al., 98]
Categorization of MCS
• Architecture
• Input/Output Mapping
• Representation
• Specialized classifiers
Categorization of MCS (cont'd): Architecture
• Parallel [Dasarathy, 94]
  [Figure: Input 1 ... Input N feed Classifier 1 ... Classifier N; the classifier outputs pass through a FUSION block that produces the output]
• Serial [Dasarathy, 94]
  [Figure: Classifier 1, Classifier 2, ..., Classifier N arranged in sequence, with Input 1 ... Input N, producing the output]
Categorization of MCS (cont'd): Input/Output Mapping
• Linear Mapping
  • Sum Rule
  • Weighted Average [Hashem, 97]
• Non-linear Mapping
  • Maximum
  • Majority
  • Hierarchical Mixture of Experts [Jordan and Jacobs, 94]
  • Stacked Generalization [Wolpert, 92]
Categorization of MCS (cont'd): Representation
• Similar representations
  • Classifiers need to be different
• Different representations
  • Use of different sensors
  • Different features extracted from the same data set
Categorization of MCS (cont'd): Specialized Classifiers
• Specialized classifiers
  • Encourage specialization in areas of the feature space
  • All classifiers must contribute to achieve a final decision
  • Hierarchical Mixture of Experts [Jordan and Jacobs, 94]
  • Co-operative Modular Neural Networks [Auda and Kamel, 98]
• Ensemble of classifiers
  • Set of redundant classifiers
Categorization of MCS (cont'd): Data Dependence
• Classifiers are inherently dependent on the data.
• Data dependence describes how the final aggregation uses the information present in the input pattern.
• It describes the relationship between the final output Q(x) and the pattern under classification, x.
Data Dependence
• Data Independent
• Implicitly Dependent
• Explicitly Dependent
Data Independence
• Relies solely on the output of the classifiers to determine the final classification output.
• Q(x) is the final class assigned to pattern x
• C_j is a vector composed of the outputs of the various classifiers in the ensemble, {c_1j, c_2j, ..., c_Nj}, for a given class y_j
• c_ij is the confidence classifier i has in pattern x belonging to class y_j
• The mapping F_j can be linear or non-linear
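One way to write the relationship these definitions describe is given below. It is a reconstruction from the bullet points above, not necessarily the slide's exact notation.

```latex
Q(x) = \arg\max_{j} \; F_j\bigl(C_j(x)\bigr),
\qquad
C_j(x) = \{\, c_{1j}, c_{2j}, \ldots, c_{Nj} \,\}
```

Here the final class depends on x only through the classifier outputs C_j(x), which is what makes the combiner data independent.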
Data Independence (cont'd): Example
• Average Vote
  • The aggregation result relies only on the output confidences of the classifiers
  • The operator F_j is the summation operation
  • The result is skewed if the individual confidences contain bias
  • The aggregation has no means of correcting this bias
Data Independence (cont'd)
• Simple voting techniques are data independent
  • Average
  • Maximum
  • Majority
These techniques are susceptible to incorrect estimates of the confidence.
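The three simple voting rules can be sketched as follows. This is a minimal illustration, assuming the confidences c_ij for one pattern are stored in an N-by-K NumPy array (N classifiers, K classes); the function name `combine_data_independent` is hypothetical.

```python
import numpy as np

def combine_data_independent(conf, rule="average"):
    """Return the winning class index for one pattern.

    conf: (N classifiers, K classes) array of confidences c_ij.
    Only the classifier outputs are used; the pattern x itself is never seen.
    """
    conf = np.asarray(conf, dtype=float)
    if rule == "average":
        scores = conf.mean(axis=0)           # F_j is a (scaled) summation
    elif rule == "maximum":
        scores = conf.max(axis=0)            # most confident classifier per class
    elif rule == "majority":
        votes = conf.argmax(axis=1)          # each classifier votes for one class
        scores = np.bincount(votes, minlength=conf.shape[1])
    else:
        raise ValueError(f"unknown rule: {rule}")
    return int(scores.argmax())

# Toy example: classifier 0 is overconfident in class 0.
conf = np.array([[0.60, 0.40, 0.00],
                 [0.10, 0.50, 0.40],
                 [0.20, 0.45, 0.35]])
avg_label = combine_data_independent(conf, "average")
max_label = combine_data_independent(conf, "maximum")
maj_label = combine_data_independent(conf, "majority")
```

In this toy case the average and majority rules pick class 1, while the maximum rule follows the single overconfident classifier and picks class 0. None of the rules can correct a biased confidence, because they never look at the input pattern.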
Implicit Data Dependence
• Train the combiner on the global performance over the data
• W(C(x)) is the weighting matrix composed of elements w_ij
• w_ij is the weight assigned to class j in classifier i
Implicit Data Dependence (cont'd): Example
• Weighted Average
  • The individual weights are assigned based on the error correlation matrix [weight equation missing in source]
  • The weights depend on the behavior of the classifiers amongst themselves
  • The weights can be represented as the function W(C_j(x))
Implicit Data Dependence (cont'd): Example
• Weighted Average
  • The mapping is the summation operator
  • Hence the weighted average fits in the representation
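As an illustration of weights derived from an error correlation matrix, the sketch below uses a common closed form for a sum-to-one linear combination, w = Ω⁻¹1 / (1ᵀΩ⁻¹1), where Ω is the error correlation matrix estimated on held-out data. This form, the use of one weight per classifier rather than per class, and the function names are assumptions; the slide's own expression may differ.

```python
import numpy as np

def error_correlation_weights(errors):
    """Sum-to-one weights from the error correlation matrix.

    errors: (P patterns, N classifiers) signed errors on held-out data.
    """
    errors = np.asarray(errors, dtype=float)
    omega = errors.T @ errors / errors.shape[0]   # error correlation matrix
    raw = np.linalg.solve(omega, np.ones(omega.shape[0]))
    return raw / raw.sum()

def weighted_average(conf, weights):
    """conf: (N, K) confidences; weights: (N,). Returns the winning class."""
    scores = (weights[:, None] * np.asarray(conf, dtype=float)).sum(axis=0)
    return int(scores.argmax())

# Synthetic errors: classifier 0 is the most accurate, classifier 2 the least.
rng = np.random.default_rng(0)
errors = rng.normal(scale=[0.1, 0.3, 0.5], size=(200, 3))
weights = error_correlation_weights(errors)

conf = np.array([[0.7, 0.3],
                 [0.4, 0.6],
                 [0.4, 0.6]])
label = weighted_average(conf, weights)
```

The weights are fixed once training is done: they reflect how the classifiers behave relative to each other over the whole data set, not how they behave near a particular input.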
Implicit Data Dependence (cont'd)
• Implicitly data dependent approaches include
  • Weighted average [Hashem, 97]
  • Fuzzy Measures [Gader et al., 96]
  • Belief theory [Xu et al., 92]
  • Behavior Knowledge Space (BKS) [Huang et al., 95]
  • Decision Templates [Kuncheva et al., 01]
  • Modular approaches [Auda and Kamel, 98]
  • Stacked Generalization [Wolpert, 92]
  • Boosting [Schapire, 90]
These approaches lack consideration for the local superiority of classifiers.
Explicit Data Dependence
• Classifier selection or combining is performed based on the sub-space to which the input pattern belongs.
• The final classification is dependent on the pattern being classified.
Explicit Data Dependence (cont'd): Example
• Dynamic Classifier Selection (DCS)
  • Estimates the accuracy of each classifier in local regions of the feature space
  • The estimate is determined by observing the input pattern
  • Once the superiority of a classifier is identified, its output is used as the final decision
  • i.e. binary weights are assigned based on the local superiority of the classifiers
  • Since the weights depend on the input feature space, they can be represented as W(x)
  • DCS can therefore be considered explicitly data dependent, with the mapping F_j being the maximum operator
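A simplified sketch of local-accuracy selection, in the spirit of DCS_LA [Woods et al., 97], is shown below. The neighbourhood size `k`, the Euclidean distance, and the function name are assumptions made for illustration.

```python
import numpy as np

def dcs_local_accuracy(x, val_X, val_correct, k=5):
    """Return binary weights W(x) that select the locally best classifier.

    x:           (d,) input pattern.
    val_X:       (P, d) validation patterns.
    val_correct: (P, N) booleans, True where classifier i classified
                 validation pattern p correctly.
    """
    dists = np.linalg.norm(val_X - x, axis=1)       # distance to every validation pattern
    neighbours = np.argsort(dists)[:k]              # k nearest validation patterns
    local_acc = val_correct[neighbours].mean(axis=0)
    weights = np.zeros(val_correct.shape[1])
    weights[local_acc.argmax()] = 1.0               # binary weight on the local winner
    return weights

# Toy 1-D example: classifier 0 is right for x < 5, classifier 1 for x >= 5.
val_X = np.arange(10, dtype=float).reshape(-1, 1)
val_correct = np.column_stack([val_X[:, 0] < 5, val_X[:, 0] >= 5])
w_low = dcs_local_accuracy(np.array([1.0]), val_X, val_correct, k=3)
w_high = dcs_local_accuracy(np.array([8.0]), val_X, val_correct, k=3)
```

Unlike the weighted average, the weights here change with x: a pattern near 1.0 selects classifier 0, while a pattern near 8.0 selects classifier 1.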
Explicit Data Dependence (cont'd)
• Explicitly data dependent approaches include
  • Dynamic Classifier Selection (DCS)
    • DCS with Local Accuracy (DCS_LA) [Woods et al., 97]
    • DCS based on Multiple Classifier Behavior (DCS_MCB) [Giacinto and Roli, 01]
  • Hierarchical Mixture of Experts [Jordan and Jacobs, 94]
  • Feature-based approach [Wanas et al., 99]
The weights demonstrate dependence on the input pattern. Intuitively, such approaches should perform better than the other methods.
Feature Based Architectures
• Methodology to incorporate multiple classifiers in a dynamically adapting system
• Aggregation adapts to the behavior of the ensemble
  • Detectors generate weights for each classifier that reflect the degree of confidence in each classifier for a given input
  • A trained aggregation learns to combine the different decisions
Feature Based Architectures (cont'd): Architecture I
Feature Based Architectures (cont'd)
• Classifiers
  • Each individual classifier, C_i, produces some output representing its interpretation of the input x
  • Sub-optimal classifiers are utilized.
  • The collection of classifier outputs for class y_j is represented as C_j(x)
• Detector
  • Detector D_l is a classifier that uses input features to extract useful information for aggregation
  • It does not aim to solve the classification problem.
  • Detector output d_lg(x) is the probability that the input pattern x is categorized to group g.
  • The output of all the detectors is represented by D(x)
Feature Based Architectures (cont'd)
• Aggregation
  • Fusion layer for all the classifiers
  • Trained to adapt to the behavior of the various modules
  • Explicitly data dependent
Weights depend on the input pattern being classified.
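The data flow of Architecture I can be sketched as follows. This is a deliberately simplified stand-in: the trained aggregation is reduced to a single linear map `A` from detector outputs to classifier weights followed by a softmax. `A`, `softmax`, and `aggregate_arch1` are hypothetical names, and the actual aggregation module in the slides is a trained component whose form is not given here.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def aggregate_arch1(detector_out, classifier_out, A):
    """Combine classifier outputs using weights driven by the detector.

    detector_out:   (G,) detector outputs d_g(x), one per group.
    classifier_out: (N, K) classifier confidences c_ij.
    A:              (G, N) aggregation parameters (assumed already trained).
    """
    weights = softmax(detector_out @ A)     # W(x): one weight per classifier
    scores = weights @ classifier_out       # weighted sum over classifiers
    return int(scores.argmax()), weights

# Toy setup: group 0 trusts classifier 0, group 1 trusts classifier 1.
A = np.array([[4.0, 0.0],
              [0.0, 4.0]])
classifier_out = np.array([[0.9, 0.1],
                           [0.2, 0.8]])
label_g0, w_g0 = aggregate_arch1(np.array([1.0, 0.0]), classifier_out, A)
label_g1, w_g1 = aggregate_arch1(np.array([0.0, 1.0]), classifier_out, A)
```

The same classifier outputs yield different decisions depending on which group the detector assigns the input to, which is the explicit dependence on x.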
Feature Based Architectures (cont'd): Architecture II
Feature Based Architectures (cont'd)
• Classifiers
  • Each individual classifier, C_i, produces some output representing its interpretation of the input x
  • Sub-optimal classifiers are utilized.
  • The collection of classifier outputs for class y_j is represented as C_j(x)
• Detector
  • Appends the input to the output of the classifier ensemble.
  • Produces a weighting factor, w_ij, for each class in a classifier output.
  • The dependence of the weights on both the classifier output and the input pattern is represented by W(x, C_j(x))
Feature Based Architectures (cont'd)
• Aggregation
  • Fusion layer for all the classifiers
  • Trained to adapt to the behavior of the various modules
  • Combines implicit and explicit data dependence
Weights depend on the input pattern and the performance of the classifiers.
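A minimal sketch of the Architecture II data flow is given below: the detector sees the input concatenated with the classifier outputs and emits one weight w_ij per classifier and class. The single sigmoid layer and its parameters `V` and `b` are placeholders assumed for illustration; the slides do not specify the detector's internal form.

```python
import numpy as np

def detector_weights(x, classifier_out, V, b):
    """Compute W(x, C(x)) from the input appended to the classifier outputs.

    x:              (d,) input pattern.
    classifier_out: (N, K) classifier confidences c_ij.
    V, b:           hypothetical detector parameters, shapes (N*K, d + N*K) and (N*K,).
    """
    z = np.concatenate([x, classifier_out.ravel()])   # append input to ensemble output
    w = 1.0 / (1.0 + np.exp(-(V @ z + b)))            # one weight per (classifier, class)
    return w.reshape(classifier_out.shape)

def aggregate_arch2(x, classifier_out, V, b):
    w = detector_weights(x, classifier_out, V, b)
    scores = (w * classifier_out).sum(axis=0)         # weighted sum over classifiers
    return int(scores.argmax()), w

# With all-zero parameters every weight is 0.5, so the rule reduces to a plain sum.
x = np.array([0.3, -1.2, 0.5])
classifier_out = np.array([[0.9, 0.1],
                           [0.2, 0.8]])
V = np.zeros((4, 7))
b = np.zeros(4)
label, w = aggregate_arch2(x, classifier_out, V, b)
```

Because the weights are a function of both x and the classifier outputs, this arrangement mixes the implicit and explicit forms of dependence.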
Results
• Five one-hidden-layer back-propagation (BP) classifiers
  • Training used partially disjoint data sets
  • No optimization is performed for the trained networks
  • The same network parameters are used for all the classifiers that are trained
• Three data sets
  • 20 Class Gaussian
  • Satimages
  • Clouds data
Results (cont'd)
Training
• Training each component independently
  • Optimizing individual components may not lead to overall improvement
  • Collinearity: high correlation between classifiers
  • Components may be under-trained or over-trained
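One way to check for the high correlation mentioned above is to correlate the classifiers' mistake indicators on a common data set. The sketch below is an assumed diagnostic, not a step taken from the slides; `error_correlation` is a hypothetical name.

```python
import numpy as np

def error_correlation(pred, truth):
    """Pairwise correlation of mistake indicators between classifiers.

    pred:  (N classifiers, P patterns) predicted labels.
    truth: (P,) true labels.
    Returns an (N, N) correlation matrix. A classifier that makes no
    mistakes (or only mistakes) has zero variance and yields NaN entries.
    """
    mistakes = (np.asarray(pred) != np.asarray(truth)).astype(float)
    return np.corrcoef(mistakes)   # rows are treated as variables

truth = np.array([0, 1, 1, 0, 1, 0])
pred = np.array([[0, 1, 0, 0, 1, 1],    # errs on patterns 2 and 5
                 [0, 1, 0, 0, 1, 1],    # identical mistakes to classifier 0
                 [1, 1, 1, 0, 0, 0]])   # errs on patterns 0 and 4
corr = error_correlation(pred, truth)
```

Classifiers 0 and 1 fail on exactly the same patterns (correlation 1), so combining them adds nothing; classifier 2 fails elsewhere (correlation -0.5 with classifier 0), which is the kind of diversity a combiner can exploit.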
Training (cont'd)
• Adaptive training
  • Selective: reduces correlation between components
  • Focused: re-training focuses on misclassified patterns
  • Efficient: determines the duration of training
Adaptive Training: Main Loop
• Increase diversity among the ensemble
• Incremental learning
• Evaluation of training to determine the re-training set
Adaptive Training: Training
• Save the classifier if it performs well on the evaluation set
• Determine when to terminate training for each module
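The two steps above resemble checkpointing with early stopping. A generic sketch follows, assuming a patience-based stopping rule; the slides do not state the actual termination criterion, and `train_module`, `train_step`, `evaluate`, and `patience` are hypothetical names.

```python
import copy

def train_module(model, train_step, evaluate, max_epochs=100, patience=5):
    """Train one module, keeping the best snapshot seen on the evaluation set.

    train_step(model) performs one training pass in place.
    evaluate(model) returns a score on the evaluation set (higher is better).
    Training stops after `patience` passes without improvement.
    """
    best_score = float("-inf")
    best_model = copy.deepcopy(model)
    stale = 0
    for _ in range(max_epochs):
        train_step(model)
        score = evaluate(model)
        if score > best_score:
            # Save the classifier: it performs better on the evaluation set.
            best_score, best_model, stale = score, copy.deepcopy(model), 0
        else:
            stale += 1
            if stale >= patience:
                break   # terminate training for this module
    return best_model, best_score

# Toy model whose evaluation score peaks after three passes.
toy = {"epochs": 0}

def toy_step(m):
    m["epochs"] += 1

def toy_eval(m):
    return -(m["epochs"] - 3) ** 2

best, best_score = train_module(toy, toy_step, toy_eval, patience=2)
```

In the toy run, training continues for two passes past the peak before stopping, and the snapshot from the third pass is the one returned.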
Adaptive Training: Evaluation
• Train aggregation modules
• Evaluate training sets for each classifier
• Compose new training data
Adaptive Training: Data Selection
• New training data are composed by concatenating
  • Error_i: misclassified entries of the training data for classifier i.
  • Correct_i: a random choice of R*(P*δ_i) correctly classified entries of the training data for classifier i.
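The composition rule can be sketched as below. The slide does not define R, P, or δ_i, so the sketch assumes P is the number of training entries, δ_i is a per-classifier factor in [0, 1], and R is a ratio in [0, 1]. These readings, and the function name `compose_retraining_set`, are assumptions.

```python
import numpy as np

def compose_retraining_set(correct_mask, R, delta_i, rng=None):
    """Return indices of the new training data for classifier i.

    correct_mask: (P,) booleans, True where classifier i is currently correct.
    R, delta_i:   assumed scaling factors; R * P * delta_i correct entries are kept.
    """
    rng = np.random.default_rng() if rng is None else rng
    correct_mask = np.asarray(correct_mask, dtype=bool)
    P = correct_mask.size
    error_idx = np.flatnonzero(~correct_mask)       # Error_i: all misclassified entries
    correct_idx = np.flatnonzero(correct_mask)
    n_keep = min(correct_idx.size, int(round(R * P * delta_i)))
    kept = rng.choice(correct_idx, size=n_keep, replace=False)   # Correct_i
    return np.concatenate([error_idx, kept])

# Ten training entries, three of them misclassified by classifier i.
mask = np.array([True, False, True, True, False, True, True, False, True, True])
idx = compose_retraining_set(mask, R=0.5, delta_i=0.4, rng=np.random.default_rng(0))
```

With these toy values, all three misclassified entries are kept and two correctly classified entries are sampled, so re-training concentrates on the errors without discarding everything the classifier already handles.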
Results
• Five one-hidden-layer back-propagation (BP) classifiers
  • Training used partially disjoint data sets
  • No optimization is performed for the trained networks
  • The same network parameters are used for all the classifiers that are trained
• Three data sets
  • 20 Class Gaussian
  • Satimages
  • Clouds data
Results (cont'd)
Conclusions
• Categorization of various combining approaches based on data dependence
  • Independent: vulnerable to incorrect confidence estimates
  • Implicitly dependent: does not take into account the local superiority of classifiers
  • Explicitly dependent: the literature focuses on selection, not combining
Conclusions (cont'd)
• Feature-based approach
  • Combines implicit and explicit data dependence
  • Uses an evolving training algorithm to enhance diversity amongst classifiers
    • Reduces harmful correlation
    • Determines the duration of training
  • Improved classification accuracy
References
[Kittler et al., 98] J. Kittler, M. Hatef, R. Duin, and J. Matas, "On Combining Classifiers", IEEE Trans. PAMI, 20(3), 226-239, 1998.
[Dasarathy, 94] B. Dasarathy, "Decision Fusion", IEEE Computer Soc. Press, 1994.
[Hashem, 97] S. Hashem, "Algorithms for Optimal Linear Combination of Neural Networks", Int. Conf. on Neural Networks, Vol. 1, 242-247, 1997.
[Jordan and Jacobs, 94] M. Jordan and R. Jacobs, "Hierarchical Mixtures of Experts and the EM Algorithm", Neural Computation, 181-214, 1994.
[Wolpert, 92] D. Wolpert, "Stacked Generalization", Neural Networks, Vol. 5, 241-259, 1992.
[Auda and Kamel, 98] G. Auda and M. Kamel, "Modular Neural Network Classifiers: A Comparative Study", J. Int. Rob. Sys., Vol. 21, 117-129, 1998.
[Gader et al., 96] P. Gader, M. Mohamed, and J. Keller, "Fusion of Handwritten Word Classifiers", Patt. Reco. Lett., 17(6), 577-584, 1996.
[Xu et al., 92] L. Xu, A. Krzyzak, and C. Suen, "Methods of Combining Multiple Classifiers and their Applications to Handwriting Recognition", IEEE Sys. Man and Cyb., 22(3), 418-435, 1992.
[Kuncheva et al., 01] L. Kuncheva, J. Bezdek, and R. Duin, "Decision Templates for Multiple Classifier Fusion: An Experimental Comparison", Patt. Reco., Vol. 34, 299-314, 2001.
[Huang et al., 95] Y. Huang, K. Liu, and C. Suen, "The Combination of Multiple Classifiers by a Neural Network Approach", J. Patt. Reco. and Art. Int., Vol. 9, 579-597, 1995.
[Schapire, 90] R. Schapire, "The Strength of Weak Learnability", Mach. Lear., Vol. 5, 197-227, 1990.
[Giacinto and Roli, 01] G. Giacinto and F. Roli, "Dynamic Classifier Selection based on Multiple Classifier Behavior", Patt. Reco., Vol. 34, 1879-1881, 2001.
[Wanas et al., 99] N. Wanas, M. Kamel, G. Auda, and F. Karray, "Feature Based Decision Aggregation in Modular Neural Network Classifiers", Patt. Reco. Lett., 20(11-13), 1353-1359, 1999.