A Comparative Study of Kernel Methods for Classification Applications Yan Liu Sep 23, 2003
Introduction • Support Vector Machines • Text classification • Protein classification • Various kernels • Standard kernels • Linear kernels, polynomial kernels, RBF kernels • Other application-oriented kernels • Fisher kernels, string kernels, etc.
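For reference, the three standard kernels named above can be sketched in a few lines. This is a minimal illustration using the usual textbook definitions; the parameter names `degree`, `coef0`, and `gamma` and their default values are assumptions made here, not settings taken from the talk.

```python
import math

def linear_kernel(x, y):
    """K(x, y) = <x, y>"""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """K(x, y) = (<x, y> + coef0) ** degree (defaults are illustrative)."""
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2) (gamma is illustrative)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = [1.0, 2.0], [3.0, 0.5]
print(linear_kernel(x, y))      # 4.0
print(polynomial_kernel(x, y))  # 25.0
print(rbf_kernel(x, x))         # 1.0
```

The RBF kernel of a point with itself is always 1, which is one reason it behaves differently from the linear kernel when feature vectors have very different norms.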
Problem Definition • Few studies have examined how different kernels behave on: • Rare-class problem (unbalanced data) • Noisy data problem • Multi-label problem • These problems are common in real applications: • Text classification • Protein family classification
Text Classification • Kernel selection • Linear kernels • String kernels • Problem Focus • Rare-class problem • Multi-class problem • Dataset • Reuters-21578 dataset
Protein Family Classification • Kernel selection • Linear kernels • String kernels • Fisher kernels • Problem Focus • Rare-class problem • Noisy data problem • Dataset • GPCR classification dataset
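As an illustration of what a string kernel computes, here is a sketch of the k-spectrum kernel, one simple member of the string-kernel family. The slides do not say which string kernel is intended, so treating it as the spectrum kernel is an assumption, and the function names and the default `k=3` are hypothetical.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(s, t, k=3):
    """Inner product of the two k-mer count vectors.

    Counter returns 0 for k-mers that are absent, so only k-mers
    shared by both sequences contribute to the sum.
    """
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(count * ct[kmer] for kmer, count in cs.items())

print(spectrum_kernel("ABAB", "BABA", k=2))  # 4
print(spectrum_kernel("AAAA", "CCCC", k=2))  # 0
```

Because the kernel works directly on sequences, it needs no fixed-length feature vector, which is what makes this family attractive for both documents and protein sequences.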
Methodology and Schedule • Propose conjectures about the likely behaviors of each kernel, based on analysis • Sep 12th ~ Sep 28th • Test the hypotheses on synthetic datasets • Sep 28th ~ Oct 20th • Map from synthetic data to real application data • Oct 20th ~ Sep 18th
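The synthetic-data step could look something like the sketch below, which generates a two-class dataset with a controllable rare-class fraction and label-noise rate. The generator, its name `make_synthetic`, the Gaussian class shapes, and every default value are assumptions for illustration; the slides do not describe how the synthetic datasets were built.

```python
import random

def make_synthetic(n=1000, rare_fraction=0.05, noise_rate=0.1, seed=0):
    """Return a list of (features, observed_label, true_label) triples.

    rare_fraction: expected share of examples in the rare class (label 1).
    noise_rate: probability that an observed label is flipped.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        true_label = 1 if rng.random() < rare_fraction else 0
        # Each class is a 2-D Gaussian blob; the centers are arbitrary.
        center = 2.0 if true_label == 1 else -2.0
        x = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        flipped = rng.random() < noise_rate
        observed = 1 - true_label if flipped else true_label
        data.append((x, observed, true_label))
    return data

data = make_synthetic()
rare = sum(true for _, _, true in data)
noisy = sum(obs != true for _, obs, true in data)
print(len(data), rare, noisy)
```

Keeping the true label alongside the observed one makes it possible to measure how much each kernel degrades as `noise_rate` grows or `rare_fraction` shrinks.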
Mid-course Deliverables • Analysis of the dataset • Class distribution (rare-class and multi-class) • Noise level • Conjectures for possible behaviors • Results on synthetic datasets • Explanation and interesting observations from the results
Multi-label Problem for Text Classification • Related work • Binary classification (one-vs-all) (by Yang; Joachims) • Mixture Model by EM (by McCallum) • Rank-based approach • Boosting (by Schapire & Singer) • Rank-based kernels (by Elisseeff & Weston)
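The one-vs-all approach listed under related work reduces a multi-label task to one binary task per category. A minimal sketch of that decomposition follows; the helper name `one_vs_all_targets` and the toy documents are hypothetical, and the category names are used only as examples.

```python
def one_vs_all_targets(label_sets, classes):
    """Build one +1/-1 target vector per class from multi-label data.

    label_sets: one set of labels per document.
    classes: the categories to build binary problems for.
    """
    return {
        c: [1 if c in labels else -1 for labels in label_sets]
        for c in classes
    }

# Three toy documents: one with two labels, one with one, one with none.
docs = [{"grain", "wheat"}, {"earn"}, set()]
targets = one_vs_all_targets(docs, ["earn", "grain", "wheat"])
print(targets["grain"])  # [1, -1, -1]
print(targets["earn"])   # [-1, 1, -1]
```

Each target vector can then be used to train a separate binary SVM. Note that the decomposition tends to make every per-category problem unbalanced, which ties the multi-label problem back to the rare-class problem.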
Multi-label Problem for Text Classification • Possible Solutions • Combine the mixture model and the kernel-based approach using Fisher kernels • Similar in idea to using an HMM and an SVM together for protein classification
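For context, the standard definition of the Fisher kernel is given below. The slides do not spell out the construction, so reading the proposal this way is an assumption: here $P(x \mid \theta)$ would be the generative model (the mixture model for text, or the HMM in the protein case), and the resulting kernel would be passed to the SVM.

```latex
% Fisher score: gradient of the log-likelihood with respect to the model parameters
U_x = \nabla_{\theta} \log P(x \mid \theta)

% Fisher information matrix
I = \mathbb{E}_{x}\left[ U_x U_x^{\top} \right]

% Fisher kernel between two examples x and x'
K(x, x') = U_x^{\top} I^{-1} U_{x'}
```

In practice $I$ is often approximated by the identity matrix, which reduces the kernel to a plain inner product of Fisher scores.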