Explore the intersection of signal processing and machine learning, from computer vision to network security. Learn about multi-class classification methods and the design of hierarchical classifiers. Discover the power of Support Vector Machines.
When Signal Processing Meets Machine Learning Yu-Chiang Wang 王鈺強, PhD Candidate Electrical & Computer Engineering Carnegie Mellon University March 2009
Who Are They? • Signal Processing “Signal processing is the analysis, interpretation, and manipulation of signals…” - Wikipedia • Machine Learning / Pattern Recognition The computer automatically improves “TPE”… – a task T – according to a performance metric P – through experience E - Prof. Tom Mitchell @ CMU
Where Do They Meet? • IEEE - Signal Processing Society: ICASSP, ICIP, ICME, Trans. IP/SP, etc. - Computer Society: CVPR, ICCV, ICDM, Trans. PAMI, etc. - Computational Intelligence Society: IJCNN, Trans. NNs, etc. • IAPR: ICPR, ICIAP, ICDAR, ICB, MCS, etc. • ACM: SIGIR, SIGGRAPH, SIGMM, etc. • Int’l NN, ML, etc. societies: IJCNN, ICML, etc. • Journals: PR, PR Letters, etc.
Verification: Is This Mr. Einstein? No, He’s Not. Actually, my advisor looks more like Einstein…
Object Categorization Animals Arch Faces Ground etc.
Scene & Context Categorization Outdoor, Night, etc.
It’s not that complicated…is it?? • Down-sampled & grayscale 22 x 28 pixel image • It’s Bayesian in Machine Learning… Likelihood: P(image | face) or Posterior: P(face | image) • Without prior knowledge… Number of all possible 8-bit 22 x 28 pixel images = $2^{22 \times 28 \times 8}$ • Do not try this at home… $2^{22 \times 28 \times 8} \approx 3.0 \times 10^{1483}$ >>>> world population $6.6 \times 10^{9}$ Inspired by Prof. Tsuhan Chen @ Cornell
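The counting argument on this slide can be checked directly. The sketch below assumes 8 bits per pixel (the factor of 8 in the exponent) and relies on Python's exact integers; the variable names are illustrative only.

```python
# Count the distinct 22 x 28 grayscale images at 8 bits per pixel.
width, height, bits_per_pixel = 22, 28, 8

total_bits = width * height * bits_per_pixel  # bits needed to describe one image
num_images = 2 ** total_bits                  # exact integer, one image per bit pattern
num_digits = len(str(num_images))             # number of decimal digits in that count

world_population = 6_600_000_000              # figure quoted on the slide

print("bits per image:", total_bits)
print("decimal digits in the image count:", num_digits)
print("more images than people:", num_images > world_population)
```

The count has 1484 decimal digits, which is why estimating P(image | face) by brute-force enumeration is hopeless.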
The Chemistry between Signal Processing & Machine Learning • Signal Processing: Representation & Transforms, Coding, Transmission, Compression, Reconstruction • Machine Learning: Feature extraction/selection, Information retrieval (data mining), Sup. or unsup. learning/clustering, Detection/estimation/identification, Classification • Applications: Bioinformatics, Counter-Terrorism, Language Processing, Computer Vision, Biometrics, Product Inspection, Marketing Analysis, Internet Search, Network Security
Pattern Recognition Applications Biometrics
Pattern Recognition Applications Automated Target Recognition
Pattern Recognition Applications Cancer Diagnosis
Pattern Recognition Applications Analysis of 3D protein structure
Pattern Recognition Applications Clustering / analysis of microarray data
Three Main Issues by Prof. Fei-Fei Li @ Princeton • Representation - how to represent a pattern class or a dataset - feature extraction & selection • Learning - how to form a classification system (given training data) - classifier and its parameter selection • Decision - how to classify the given test data - verification/identification, no decision, rejection, etc. • I focus on the latter two…
It’s Much More Complicated than DEAL OR NO DEAL… Multi-class classification & Rejection
Multi-Class Classification • Need good accuracy + efficiency
Rejection • What to reject? - any unseen false classes - very difficult…why?
Standard Methods to Address Multi-class Classification Problems • One-vs-All Binary Classifiers - C binary classifiers (for 3 classes: 1 vs. 2 & 3, 2 vs. 1 & 3, 3 vs. 1 & 2) - decided by winner-take-all • Any problems? - possible ambiguous results - similarity not used - unbalanced data learning One Million Dollar Question by Prof. Thomas Huang @ UIUC
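A minimal sketch of one-vs-all with winner-take-all, assuming three hypothetical linear scorers (the weights and biases are made up for illustration and are not from the talk). It also shows the ambiguity problem: more than one binary classifier can fire for the same input.

```python
# One-vs-all over C = 3 classes with hypothetical linear scorers.
# Each entry is (weights, bias) for "this class vs. the rest"; values are illustrative.
scorers = {
    1: ([1.0, 0.0], 0.0),
    2: ([-1.0, 1.0], 0.0),
    3: ([0.0, -1.0], 0.0),
}

def score(weights, bias, x):
    """Linear decision value w^T x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def one_vs_all_predict(x):
    """Winner-take-all: return the class whose scorer gives the largest value."""
    scores = {label: score(w, b, x) for label, (w, b) in scorers.items()}
    winner = max(scores, key=scores.get)
    return winner, scores

def positive_labels(x):
    """Classes whose binary classifier says 'yes' under a hard threshold at 0."""
    return [label for label, (w, b) in scorers.items() if score(w, b, x) > 0]

label, scores = one_vs_all_predict([2.0, 0.5])
ambiguous = positive_labels([0.5, 2.0])   # two classifiers claim this point
print(label, scores, ambiguous)
```

Winner-take-all resolves the ambiguous case by comparing raw scores, which only makes sense if the C classifiers produce comparable outputs.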
Standard Methods to Address Multi-class Classification Problems (cont’d) • One-vs-One Binary Classifiers - $C(C-1)/2 \approx C^2/2$ binary classifiers (for 3 classes: 1 vs. 2, 1 vs. 3, 2 vs. 3) - decided by majority vote • Any problems? - possible ambiguous results - need lots of classifiers - cannot do rejection
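A minimal sketch of one-vs-one with majority voting. The pairwise classifiers here are stand-ins that pick the closer of two hypothetical 1-D class prototypes; any binary classifier could take their place.

```python
from collections import Counter
from itertools import combinations

# Hypothetical 1-D class centers, used only to build toy pairwise classifiers.
prototypes = {1: 0.0, 2: 5.0, 3: 10.0}

def pairwise_decide(a, b, x):
    """Toy binary classifier 'a vs. b': choose the class with the closer prototype."""
    return a if abs(x - prototypes[a]) <= abs(x - prototypes[b]) else b

def one_vs_one_predict(x):
    """Run all C(C-1)/2 pairwise classifiers and return the majority vote."""
    pairs = list(combinations(sorted(prototypes), 2))
    votes = Counter(pairwise_decide(a, b, x) for a, b in pairs)
    winner = votes.most_common(1)[0][0]
    return winner, votes, len(pairs)

label, votes, n_classifiers = one_vs_one_predict(6.0)
print(label, dict(votes), n_classifiers)
```

Every input receives votes for some class, so an input from an unseen class is never rejected, which is the rejection problem listed on the slide.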
Binary Hierarchical Classifier • Remarks - Divide-and-conquer strategy: C-class problem → C-1 binary sub-problems; only ~$\log_2 C$ classifiers required in testing - We use SVM-type classifiers at each node - How to design a hierarchy? any problems??
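To compare the three schemes, the sketch below tabulates how many binary classifiers are trained and how many are evaluated per test sample. It assumes a balanced hierarchy (hence the ceiling of log2 C); the function name is illustrative.

```python
import math

def classifier_counts(num_classes):
    """Return (classifiers trained, classifiers evaluated per test sample) per scheme."""
    c = num_classes
    pairs = c * (c - 1) // 2
    return {
        "one_vs_all": (c, c),
        "one_vs_one": (pairs, pairs),
        # C-1 internal nodes; one root-to-leaf path in a balanced tree at test time.
        "binary_hierarchy": (c - 1, math.ceil(math.log2(c))),
    }

counts = classifier_counts(8)
for scheme, (trained, evaluated) in counts.items():
    print(f"{scheme}: trained={trained}, evaluated per sample={evaluated}")
```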
Outline • Introduction • Methods to Address Multi-class Problems • SVM-type Classifiers • Our Design Method for Hierarchical Classifiers • Our Soft-Decision Hierarchical SVRDM Classifier • Experimental Results • Conclusions & Future Directions
Support Vector Machine • Binary classification problem
Support Vector Machine (cont’d) • Binary classification problem max. margin
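Support Vector Machine (cont’d) • How to find this optimal hyperplane? Margin = $2/\|h\|$; min. $\frac{1}{2}\|h\|^2$ s.t. $y_i(h^T x_i + b) \ge 1$; solution $h = \sum_i \alpha_i y_i x_i$, $x_i$ with nonzero $\alpha_i$: support vectors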
Support Vector Machine (cont’d) • If not separable, we can either - introduce slack variables $\xi_i$ and penalty term C: min. $\frac{1}{2}\|h\|^2 + C\sum_i \xi_i$ s.t. $y_i(h^T x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$
Support Vector Machine (cont’d) • If not separable, we can also - find a nonlinear solution - technically, it’s a linear solution in a higher-order space [Figure: data on the x axis mapped into the (x1, x2) plane]
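Support Vector Machine (cont’d) • What happens now? min. $\frac{1}{2}\|h\|^2 + C\sum_i \xi_i$ s.t. $y_i(h^T \Phi(x_i) + b) \ge 1 - \xi_i$, $\xi_i \ge 0$ • Kernel trick, the secret behind the scenes: $K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$, explicit form of Φ(x) not needed, e.g. Gaussian: $\exp(-\|x_i - x_j\|^2 / 2\sigma^2)$, polynomial: $(x_i^T x_j + 1)^d$, etc.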
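The kernel trick can be verified numerically for a small case. The sketch below assumes 2-D inputs and a degree-2 polynomial kernel, for which the explicit map Φ is short enough to write out; the Gaussian kernel is included because its Φ has no finite explicit form. All function names are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z, d=2):
    """Polynomial kernel (x^T z + 1)^d."""
    return (dot(x, z) + 1.0) ** d

def phi_poly2(x):
    """Explicit feature map for the degree-2 polynomial kernel on 2-D inputs."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel exp(-||x - z||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))

x, z = [1.0, 2.0], [0.5, -1.0]
via_kernel = poly_kernel(x, z)                # computed in the 2-D input space
via_phi = dot(phi_poly2(x), phi_poly2(z))     # computed in the 6-D feature space
print(via_kernel, via_phi, gaussian_kernel(x, x))
```

Both routes give the same inner product, but the kernel never builds the 6-D vectors. For the Gaussian kernel, K(x, x) = 1 for every x, so all mapped points have unit norm in the feature space.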
Extensions of SVMs • One-class SVM - aka SVDD (support vector domain description) - find the optimal hyperplane to include data from one class & thus reject any other classes (i.e. false classes) - modified optimization problem: min. … s.t. … - can add/apply ξ, C, kernel functions - solution vector h (nonlinear solution)
An Example of One-class SVM w/ Gaussian Kernel - projected data on a unit sphere in the transformed space since $K(x, x) = \Phi(x)^T \Phi(x) = 1$ - h in one-class SVM can be considered as the best single representative of the class of interest (class 1 in fig) - we use this h later in our new hierarchical design method [Figure: class 1 and class 2 on the unit sphere, with h and hSVM]
Extensions of SVMs (cont’d) • SVRDM (Support Vector Representation & Discrimination Machine) - classification (SVM) + rejection (one-class SVM) & Gaussian kernel - need 2 solution vectors h1 & h2 - we use SVRDMs at each node in the hierarchy [Figure: decision boundaries for -1 < p < 1; p = -1 corresponds to the SVM, p = 0.6 differs from the SVM]
Design Methods for Binary Hierarchical Classifiers • Binary Hierarchical Classifier Design [Figure: Classes 1-5, with the split into macro-classes still unknown (?)]
Prior Binary Hierarchical Design Methods - 1 • Exhaustive search? - Need to search all possible macro-class pairs (N classes split into macro-class A and macro-class B, e.g. N/2 classes each) - You cannot beat this method, BUT… on the order of $2^{N-1}$ possible macro-class pairs at each node!
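A short enumeration makes the cost of exhaustive search concrete. The sketch assumes a split is any unordered partition of the classes into two non-empty macro-classes of any size, which gives exactly 2^(N-1) - 1 candidates per node; the helper name is illustrative.

```python
from itertools import combinations

def macro_class_splits(classes):
    """Return every split of `classes` into two non-empty macro-classes."""
    classes = list(classes)
    first, rest = classes[0], classes[1:]
    splits = []
    # Keep the first class in macro-class A so each unordered split appears once.
    for size in range(len(rest) + 1):
        for chosen in combinations(rest, size):
            macro_a = [first, *chosen]
            macro_b = [c for c in rest if c not in chosen]
            if macro_b:  # skip the case where macro-class B would be empty
                splits.append((macro_a, macro_b))
    return splits

splits_5 = macro_class_splits(range(1, 6))
splits_10 = macro_class_splits(range(1, 11))
print(len(splits_5), len(splits_10))
```

The count roughly doubles with every added class, and each candidate split would need its own trained classifier to be evaluated.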
Prior Binary Hierarchical Design Methods - 2 • K-means clustering on class means (review) - iteratively min. $\sum_{k}\sum_{i \in \text{cluster } k} \|\mu_i - m_k\|^2$; μ: class mean, m: cluster mean - determines 2 macro-classes (k = 2) in the original data space [Figure: class means μ1-μ5 of Classes 1-5 grouped into cluster 1 and cluster 2 by k-means clustering (k = 2)]
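A minimal sketch of k-means with k = 2 applied to class means. The five 2-D class means and the choice of initial centers are hypothetical; the loop is the standard assign-then-update iteration.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def two_means(points, init_a, init_b, max_iter=100):
    """Standard k-means with k = 2; returns (cluster labels, cluster centers)."""
    centers = [list(init_a), list(init_b)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each class mean joins its nearest cluster center.
        new_labels = [min((0, 1), key=lambda k: sq_dist(p, centers[k])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for k in (0, 1):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical class means mu_1 .. mu_5 in a 2-D feature space.
class_means = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (6.0, 6.0), (7.0, 5.5)]
labels, centers = two_means(class_means, class_means[0], class_means[-1])
print(labels, centers)
```

The two resulting clusters are the two macro-classes for the node; the same procedure is then repeated inside each macro-class.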
Weighted Support Vector K-means Clustering - 1 • k-means clustering on solution vectors h from one-class SVMs - recall that h is the best representative of each class - iteratively min. the distance between each h and its cluster mean Φ(m) (h: solution in higher-order space, Φ(m): cluster mean in the same space) - determines 2 macro-classes (k = 2) in higher-order space [Figure: solution vectors h1-h5 of Classes 1-5 grouped into cluster 1 and cluster 2 by k-means clustering (k = 2)]
Weighted Support Vector K-means Clustering - 2 • Remarks - solution vector h is a weighted sum of support vectors of each class - can be easily calculated by kernel trick, as in SVMs, e.g. … - select macro-classes in higher-order space, as in SVMs - automated design (not an exhaustive search & no validation set needed) - visualize distances in higher-order space
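Because each h is a weighted sum of mapped support vectors, distances between solution vectors can be computed from kernel values alone. The sketch below assumes h = Σ α_i Φ(x_i) with a Gaussian kernel; the weights and support vectors are hypothetical, and the talk's own weighting scheme is not reproduced here.

```python
import math

def gaussian_kernel(x, z, sigma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))

def inner(h_a, h_b, sigma=1.0):
    """h_a^T h_b = sum_i sum_j alpha_i beta_j K(x_i, z_j); Phi is never formed."""
    alphas_a, svs_a = h_a
    alphas_b, svs_b = h_b
    return sum(ai * bj * gaussian_kernel(xi, zj, sigma)
               for ai, xi in zip(alphas_a, svs_a)
               for bj, zj in zip(alphas_b, svs_b))

def sq_distance(h_a, h_b, sigma=1.0):
    """||h_a - h_b||^2 expanded into three kernel-only inner products."""
    return inner(h_a, h_a, sigma) - 2.0 * inner(h_a, h_b, sigma) + inner(h_b, h_b, sigma)

# Each solution vector is (weights, support vectors); all values are hypothetical.
h1 = ([0.5, 0.5], [(0.0, 0.0), (1.0, 0.0)])
h2 = ([0.5, 0.5], [(0.0, 1.0), (1.0, 1.0)])   # class close to class 1
h3 = ([0.5, 0.5], [(8.0, 8.0), (9.0, 8.0)])   # class far from class 1

d12, d13 = sq_distance(h1, h2), sq_distance(h1, h3)
print(d12, d13)
```

These pairwise distances are the quantities a k-means step in the higher-order space needs in order to group classes into macro-classes.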
Problems in Binary Hierarchical Classifiers • Major Concerns - if misclassifications or misses occur at some internal nodes, we cannot recover from them if hard decisions are used - a soft-decision hierarchical classifier is needed! [Figure: hierarchy from C classes through macro-classes down to class ω, with a misclassification and a miss at internal nodes]
Soft-Decision Hierarchical Classifier - 1 • Idea: use of probabilities - the two-class classifier at each node outputs a probability for each of its two macro-classes (P1A, P1B at level 1; P2A, P2B at level 2; P3A, P3B at level 3) - multiply along the path from the root (C classes) down to class ω: P(ω|x) = P1B x P2A x P3B
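A minimal sketch of the soft-decision idea on a hypothetical 4-class hierarchy. Each internal node stores the probability of its "A" branch for one fixed test input; the tree layout and the numbers are made up to show a case where hard and soft decisions disagree.

```python
def leaf_posteriors(node, prob_so_far=1.0):
    """Multiply branch probabilities along every root-to-leaf path."""
    if "label" in node:
        return {node["label"]: prob_so_far}
    p_a = node["p_a"]
    posteriors = {}
    posteriors.update(leaf_posteriors(node["a"], prob_so_far * p_a))
    posteriors.update(leaf_posteriors(node["b"], prob_so_far * (1.0 - p_a)))
    return posteriors

def hard_decision(node):
    """Follow the more probable branch at each node, with no way back."""
    while "label" not in node:
        node = node["a"] if node["p_a"] >= 0.5 else node["b"]
    return node["label"]

# Hypothetical node outputs for one test input x.
tree = {
    "p_a": 0.55,
    "a": {"p_a": 0.50, "a": {"label": 1}, "b": {"label": 2}},
    "b": {"p_a": 0.90, "a": {"label": 3}, "b": {"label": 4}},
}

posteriors = leaf_posteriors(tree)
soft_label = max(posteriors, key=posteriors.get)
hard_label = hard_decision(tree)
print(posteriors, soft_label, hard_label)
```

The root narrowly prefers branch A, so the hard decision can never reach class 3, while the path products give class 3 the highest posterior.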
Soft-Decision Hierarchical Classifier - 2 • How to convert SVM classifier outputs to probabilities? - input x → SVM output $t = h^T\Phi(x) + b$ - Use sigmoid mapping function* to map output t to probability P: $P(y = 1|t) = 1/(1 + \exp(a t + b))$, $P(y = -1|t) = 1 - P(y = 1|t)$ (the sigmoid parameter b is separate from the SVM bias b) - Estimate parameters a & b by ML estimate *J. C. Platt, “Probabilities for SVMs,” in Adv. in Large Margin Classifiers, MIT Press, 1999
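A minimal sketch of fitting the sigmoid parameters by maximum likelihood. It assumes the form P(y = 1|t) = 1 / (1 + exp(a t + b)) and uses plain gradient descent on the negative log-likelihood, which is a simplification: Platt's procedure also regularizes the targets and uses a more careful optimizer. The SVM outputs and labels below are hypothetical.

```python
import math

def platt_probability(t, a, b):
    """P(y = 1 | t) = 1 / (1 + exp(a*t + b)), written to avoid overflow."""
    z = a * t + b
    if z >= 0:
        ez = math.exp(-z)
        return ez / (1.0 + ez)
    return 1.0 / (1.0 + math.exp(z))

def fit_sigmoid(outputs, labels, lr=0.1, n_iter=5000):
    """Gradient descent on the negative log-likelihood of the sigmoid model."""
    a, b = 0.0, 0.0
    targets = [1.0 if y == 1 else 0.0 for y in labels]
    n = len(outputs)
    for _ in range(n_iter):
        probs = [platt_probability(t, a, b) for t in outputs]
        # With z = a*t + b, the gradient of the NLL w.r.t. z is (target - P).
        grad_a = sum((y - p) * t for y, p, t in zip(targets, probs, outputs)) / n
        grad_b = sum(y - p for y, p in zip(targets, probs)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Hypothetical SVM outputs t with overlapping classes (labels in {-1, +1}).
svm_outputs = [-2.0, -1.5, -1.0, -0.5, -0.2, 0.3, 0.5, 1.0, 1.5, 2.0]
svm_labels = [-1, -1, -1, 1, -1, -1, 1, 1, 1, 1]

a, b = fit_sigmoid(svm_outputs, svm_labels)
p_pos = platt_probability(2.0, a, b)
p_neg = platt_probability(-2.0, a, b)
print(a, b, p_pos, p_neg)
```

With this sign convention the fitted a is negative, so larger SVM outputs map to larger probabilities; these are the per-node probabilities that a soft-decision hierarchy multiplies along each path.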