Design of Hierarchical Classifiers for Efficient and Accurate Pattern Classification

Design of Hierarchical Classifiers for Efficient and Accurate Pattern Classification. M N S S K Pavan Kumar. Advisor : Dr. C. V. Jawahar. Pattern Classification. Given a sample x Find the label corresponding to it

Presentation Transcript


  1. Design of Hierarchical Classifiers for Efficient and Accurate Pattern Classification M N S S K Pavan Kumar Advisor : Dr. C. V. Jawahar

  2. Pattern Classification • Given a sample x • Find the label corresponding to it • A classifier is an algorithm that takes x and returns a label between 1 and N • Binary classification -- N = 2 • Multiclass classification -- N > 2 • Performance is usually measured as the probability of correct classification

  3. Multiclass Classification • Many standard approaches • Neural Networks, Decision Trees • Direct extensions • Combinations of component classifiers

  4. Decision Directed Acyclic Graph • Sample x from class 3 [Figure: DDAG over 5 classes evaluated on x. Node rows: (1,5); (2,5), (1,4); (3,5), (2,4), (1,3); (4,5), (3,4), (2,3), (1,2). Leaves: classes 1 to 5.]

  5. Decision Directed Acyclic Graph • Sample x from class 5 [Figure: the same 5-class DDAG.]

  6. Decision Directed Acyclic Graph • Sample x from class 4 [Figure: the same 5-class DDAG.]

  7. Decision Directed Acyclic Graph • There are multiple paths [Figure: the same 5-class DDAG.]

  8. Decision Directed Acyclic Graph • A DDAG can be improved by improving individual nodes [Figure: 5-class DDAG. Node rows: (1,5); (2,5), (1,4); (3,5), (2,4), (1,3); (4,5), (3,4), (2,3), (1,2). Leaves: classes 1 to 5.]

  9. Decision Directed Acyclic Graph • A DDAG can be improved by improving individual nodes • Architecture is fixed for a given sequence of classes [Figure: the same 5-class DDAG.]

  10. Decision Directed Acyclic Graph • A DDAG can be improved by improving individual nodes • A DDAG can be improved by changing class order • Class order changed [Figure: reordered 5-class DDAG. Node rows: (3,5); (2,5), (3,4); (1,5), (2,4), (3,1); (4,5), (1,4), (2,1), (3,2). Leaves: 5, 4, 1, 2, 3.]
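
The evaluation procedure behind these DDAG slides can be sketched as list elimination: the node for the first and last class in the current list rejects one of them, so a sample reaches a leaf after N - 1 pairwise decisions. This is a minimal illustration, not the author's implementation; the names `ddag_classify`, `PairwiseClassifier`, and the toy rule `nearest_label` are assumptions introduced here.

```python
from typing import Callable, Sequence

# Hypothetical signature: pairwise(i, j, x) returns whichever of i or j it prefers for x.
PairwiseClassifier = Callable[[int, int, object], int]


def ddag_classify(x, class_order: Sequence[int], pairwise: PairwiseClassifier) -> int:
    """Evaluate a DDAG by list elimination, using N - 1 pairwise decisions."""
    remaining = list(class_order)
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        winner = pairwise(first, last, x)
        # The node (first, last) rejects the losing class, which is dropped.
        if winner == first:
            remaining.pop()       # reject `last`
        else:
            remaining.pop(0)      # reject `first`
    return remaining[0]


# Toy pairwise rule (an assumption for illustration only): x is a number
# and each node prefers the class label that is numerically closer to x.
def nearest_label(i: int, j: int, x: float) -> int:
    return i if abs(x - i) <= abs(x - j) else j


print(ddag_classify(3.2, [1, 2, 3, 4, 5], nearest_label))  # root node is (1, 5)
print(ddag_classify(3.2, [3, 2, 1, 4, 5], nearest_label))  # root node is (3, 5)
```

Changing `class_order` changes which pairwise nodes a sample visits, which is the degree of freedom the reordering slides exploit.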

  11. Features at Each Node • Images as features • Large number of features in computer vision problems • Principal Component Analysis (PCA) • Project the data onto the axes that preserve maximum variance • PCA is good for representation but not for discrimination

  12. Features at Each Node • Pairwise Linear Discriminant Analysis (LDA) is more effective • Fisher Linear Discriminant, Optimal Discriminant Vectors • Large number of feature extractions • Large number of matrices to be stored • LDA performs better, but is computationally expensive

  13. Solution [Figure: DDAG over 4 classes. Node rows: (1,4); (2,4), (1,3); (3,4), (2,3), (1,2). Leaves: classes 1 to 4.]

  14. Solution [Figure: the same 4-class DDAG, with transformation M14 shown.]

  15. Solution [Figure: the same 4-class DDAG, with transformations M14 and M23 shown.]

  16. Solution [Figure: the same 4-class DDAG, with transformations M14, M34, M23, and M12 shown.]

  17. Solution • 4 classes • 6 classifiers • 6 dimensionality reductions • Total number of features extracted: (N-1) * reduced_dimension [Figure: the same 4-class DDAG, with transformations M12, M13, M14, M23, M24, and M34 shown.]

  18. Solution • Example: 400 classes, with 400 features reduced to 50 • Results in 3,990,000 projections overall (79,800 pairwise nodes × 50), and 19,950 (399 × 50) for a single DDAG evaluation [Figure: 4-class DDAG with transformations M12, M13, M14, M23, M24, and M34.]
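
The per-evaluation count in this example follows from the formula on the previous slide, (N-1) * reduced_dimension, together with the N(N-1)/2 pairwise nodes of a DDAG. A small sketch of that arithmetic, with the hypothetical helper name `ddag_feature_counts`:

```python
def ddag_feature_counts(num_classes: int, reduced_dim: int) -> dict:
    """Count pairwise nodes and the features extracted along one DDAG path."""
    pairwise_nodes = num_classes * (num_classes - 1) // 2   # one node per class pair
    nodes_per_evaluation = num_classes - 1                  # one path from root to leaf
    return {
        "pairwise_nodes": pairwise_nodes,
        "features_per_evaluation": nodes_per_evaluation * reduced_dim,
    }


print(ddag_feature_counts(4, 50))     # the 4-class DDAG drawn on the slides
print(ddag_feature_counts(400, 50))   # the 400-class example
```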

  19. Solution • LDA is effective, but highly complex in space and time [Figure: 4-class DDAG with transformations M12, M13, M14, M23, M24, and M34.]

  20. Solution [Figure: the transformations M12, M13, M14, M23, M24, and M34 for classes 1 to 4.]

  21. Solution • Stack all the transformations: M = [M12; M13; M14; M23; M24; M34]

  22. Solution • M = [M12; M13; M14; M23; M24; M34] • This matrix is rank deficient

  23. Solution • M = [M12; M13; M14; M23; M24; M34] • This matrix is rank deficient • Use a reduced representation

  24. Solution • M = [M12; M13; M14; M23; M24; M34] • This matrix is rank deficient • It has many similar rows • Clustering, SVD, etc. may be used
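
One way to read the stacking argument, offered as an illustrative sketch rather than the author's exact construction: assume each M_ij is a d × D projection (d the reduced and D the original dimension) and that a truncated SVD is the chosen reduction. The symbols d, D, k, U_k, Σ_k, V_k, z, and B_ij are introduced here for illustration and do not appear on the slides.

```latex
\[
M = \begin{bmatrix} M_{12} \\ M_{13} \\ M_{14} \\ M_{23} \\ M_{24} \\ M_{34} \end{bmatrix}
\in \mathbb{R}^{6d \times D},
\qquad
\operatorname{rank}(M) \le \min(6d,\, D)
\]
\[
M \approx U_k \Sigma_k V_k^{\top},
\qquad k \le \operatorname{rank}(M)
\]
\[
z = V_k^{\top} x \in \mathbb{R}^{k}
\quad \text{(extracted once per sample)},
\qquad
M_{ij}\, x \approx B_{ij}\, z,
\quad
B_{ij} = \text{the rows of } U_k \Sigma_k \text{ belonging to node } (i,j)
\]
```

Under this reading, the sample is projected once to the k-dimensional vector z, and each node applies only its small block B_ij.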

  25. Remarks • Feature extraction is performed only once • Results in a reduced LDA matrix that retains the discriminant capacity

  26. Motivating Example • Priors: {0.3, 0.1, 0.2, 0.4} • All classifiers are 90% correct • Expression shown: 0.3*(0.9)^3 + 0.1*(0.5)*(0.9)^2 + 0.2*(0.5)*(0.9)^2 + 0.4*(0.9)^3 • Accuracy: 80.28% • Accuracy after reordering: 88.92% • 43.8% reduction in error [Figure: two 4-class DDAGs with root (1,4), before and after reordering; the leaf orders shown are 2, 1, 4, 3 and 1, 4, 3, 2.]
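
The quoted error reduction can be checked directly from the two accuracies on this slide. The helper name `relative_error_reduction` is introduced here for illustration.

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Fraction of the original error removed when accuracy improves."""
    err_before = 1.0 - acc_before
    err_after = 1.0 - acc_after
    return (err_before - err_after) / err_before


# Accuracies taken from the slide: 80.28% and 88.92%.
reduction = relative_error_reduction(0.8028, 0.8892)
print(f"{100 * reduction:.1f}% reduction in error")
```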

  27. Formulation • Number of classes = N • Priors = P_i • Error = q (at each node) • Relevant path length = max(N − i, i − 1) • Number of relevant paths of length l to node r = N_rl • Prefer central positions in the list for high-prior classes • Maximize [expression missing] [Figure: 4-class DDAG. Node rows: (1,4); (2,4), (1,3); (3,4), (2,3), (1,2). Leaf order: 2, 1, 4, 3. The slide also carries the label “Optimal”.]

  28. Disadvantage of a DDAG • DDAG can provide only a class label • New DDAG classification protocol proposed • Previous formulation is insufficient

  29. Maximizing DDAG Accuracy [Figure: 4-class DDAG. Node rows: (1,4); (2,4), (1,3); (3,4), (2,3), (1,2), with indices i and j marked.]

  30. DDAG design is NP-Hard • Optimal Decision Tree is NP-Hard • DAG Design is reducible to Optimal Decision Tree • Approximate algorithms are the only resort

  31. Proposed Algorithms • Three greedy algorithms • Prefer high-prior classes to be at the center of the DDAG • Prefer high-performance classifiers to be the root nodes of the DDAG • Prefer high-error classes to be at the center of the DDAG • Empirical results show that the approximation error is close to half that of the optimal graph
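
The first heuristic, placing high-prior classes near the center of the class list, might look like the following. This is a sketch of the idea only and is not claimed to be the author's algorithm; the function name `center_out_order` and the tie-breaking rule are assumptions.

```python
from collections import deque


def center_out_order(priors: dict) -> list:
    """Order classes so that higher priors sit nearer the middle of the list."""
    # Highest prior first; ties broken by class label for determinism.
    ranked = sorted(priors, key=lambda c: (-priors[c], c))
    order = deque()
    for k, cls in enumerate(ranked):
        # Alternate sides so the list grows outward from the center.
        if k % 2 == 0:
            order.append(cls)
        else:
            order.appendleft(cls)
    return list(order)


# Priors from the motivating example: classes 1..4 with {0.3, 0.1, 0.2, 0.4}.
print(center_out_order({1: 0.3, 2: 0.1, 3: 0.2, 4: 0.4}))  # [2, 1, 4, 3]
```

With the priors of the motivating example, the two most probable classes (4 and 1) end up in the middle and the least probable (2 and 3) at the ends of the list.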

  32. Complexities of Classification

  33. Binary Hierarchical Classifiers [Figure: binary tree over 5 classes. Node labels: 1,4,5 vs 2,3 (root); 4 vs 1,5; 2 vs 3; 1 vs 5. Leaves: classes 1 to 5.]
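
Classification with such a tree can be sketched as a walk from the root to a leaf, with one binary decision per level. The tree below mirrors the partitions named on this slide; the `Node` type, `bhc_classify`, and the `oracle` stand-in for trained node classifiers are hypothetical and exist only to make the example runnable.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Node:
    goes_left: Callable[[object], bool]   # hypothetical binary classifier at this node
    left: Union["Node", int]              # subtree, or a class label at a leaf
    right: Union["Node", int]


def bhc_classify(x, node):
    """Walk from the root to a leaf; return (label, number of node evaluations)."""
    evaluations = 0
    while isinstance(node, Node):
        node = node.left if node.goes_left(x) else node.right
        evaluations += 1
    return node, evaluations


def oracle(left_classes):
    # Stand-in for a trained classifier: here the "sample" is its own true label.
    return lambda x: x in left_classes


# Partitions from the slide: {1,4,5} vs {2,3}; 4 vs {1,5}; 1 vs 5; 2 vs 3.
tree = Node(
    oracle({1, 4, 5}),
    left=Node(oracle({4}), left=4, right=Node(oracle({1}), left=1, right=5)),
    right=Node(oracle({2}), left=2, right=3),
)

print(bhc_classify(5, tree))
print(bhc_classify(2, tree))
```

A sample visits only the nodes on one root-to-leaf path, so the number of evaluations depends on the depth of its leaf rather than on the total number of class pairs.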

  34. Graph Partitioning • Root node candidates: 1,2,4,5 vs 3 and 1,4 vs 2,3,5 • None of the partitioning schemes are universally good for all problems (No Free Lunch Theorem) • We prefer linear cuts • We prefer linear cuts with large margin • Objective: maximize the cut • Objective: compact clusters [Figure: panels “Data” and “Similarity Graph” over classes 1 to 5.]

  35. Graph Partitioning • Simple workaround: use locally best partitions [Figure: panels “Graph” and “Data” over classes 1 to 5.]

  36. Margin Improvement • Remove class 2 • Don’t insist on mutually exclusive partitions • Let some classes be there on both sides [Figure: panels “Margin” and “Improved Margin” over classes 1 to 5.]

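  37. Trees with Overlapping Partitions [Figure: binary tree over 6 classes. Node labels: 1,2 – 3 – 4,5,6 (root); 1,2 – 3; 3,4 – 5 – 6; 1,2; 3,4 – 5; 5,6; 3,4. Leaves: 1, 2, 3, 4, 5, 6, with classes 3 and 5 appearing on more than one branch.]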

  38. Comments • The complexity remains O(log(N)) • Different criterion for removing bad classes

  39. Configurable Hybrid Classifiers • DDAG: high accuracy, large size • BHC: moderate accuracy, small size • Take advantage of both • If “classification” is easy, use a BHC; otherwise use a DDAG

  40. Results on OCR datasets

  41. Classifiability • Use expected error to select appropriate classifiers • How easy or difficult is it to classify a set of classes? • Computable from co-occurrence matrices • We proposed a pairwise classifiability measure • L_pairwise = 2/(N(N-1)) ∑_{i<j} L_ij
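
The pairwise measure is an average of the per-pair values L_ij over the N(N-1)/2 class pairs. A minimal sketch, assuming the values are supplied as a symmetric N × N table and that the sum runs over unordered pairs i < j; the function name and the example numbers are made up for illustration.

```python
def pairwise_classifiability(L):
    """Average L[i][j] over all unordered class pairs i < j."""
    n = len(L)
    if n < 2:
        raise ValueError("need at least two classes")
    total = sum(L[i][j] for i in range(n) for j in range(i + 1, n))
    return 2.0 * total / (n * (n - 1))


# Made-up per-pair values for three classes (the diagonal is unused).
L = [
    [0.0, 0.9, 0.8],
    [0.9, 0.0, 0.7],
    [0.8, 0.7, 0.0],
]
print(pairwise_classifiability(L))
```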

  42. Generalization Capacity of Proposed Algorithms • The probability of error that a classifier makes on unseen samples is called the generalization error • Large margin: better features in a DDAG; better partitions in a BHC • Use a classifier of the required complexity at each step (Occam’s Razor): efficient feature representations require less complex classifiers; simpler partitions in a BHC require less complex classifiers • Architecture-level generalization: hybrid classifiers use architectures of the required complexity at each node, thereby improving generalization • Empirically, we have demonstrated the generalization of the algorithms

  43. Conclusions • Formulation, Analysis and Algorithms are presented • to design DDAGs using robust feature representations • to design DDAGs using node-reordering • to design Hierarchical classifiers with better generalization • to design Hybrid hierarchical classifiers

  44. Future Work • Design based on simple algorithms may improve the current “high-performance” classifiers • Promising directions • Feature based partitioning vs Class based partitioning • Trees with overlapping partitions • Efficient DDAG design algorithms • Configurability in classifier design

  45. Thank You
