
A Comparative Study of Kernel Methods for Classification Applications

Yan Liu, Sep 23, 2003.


Presentation Transcript


  1. A Comparative Study of Kernel Methods for Classification Applications • Yan Liu • Sep 23, 2003

  2. Introduction • Support Vector Machines • Text classification • Protein classification • Various kernels • Standard kernels • Linear kernels, polynomial kernels, RBF kernels • Other application-oriented kernels • Fisher kernels, string kernels, etc.
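The three standard kernels named on this slide can be sketched in a few lines. This is a minimal illustration using their usual textbook definitions; the function names and the default values for `degree`, `coef0`, and `gamma` are assumptions, not settings taken from the talk.

```python
import math


def linear_kernel(x, z):
    """Linear kernel: K(x, z) = <x, z>."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel: K(x, z) = (<x, z> + coef0) ** degree."""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """RBF (Gaussian) kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)
```

Each function takes two equal-length sequences of numbers. The RBF kernel of a point with itself is always 1, which makes it a convenient sanity check.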

  3. Problem Definition • Few studies have examined how different kernels behave on: • Rare-class problem (unbalanced data) • Noisy data problem • Multi-label problem • These problems are common in real applications: • Text classification • Protein family classification

  4. Text Classification • Kernel selection • Linear kernels • String kernels • Problem Focus • Rare-class problem • Multi-class problem • Dataset • Reuters-21578 dataset

  5. Protein Family Classification • Kernel selection • Linear kernels • String kernels • Fisher kernels • Problem Focus • Rare-class problem • Noisy data problem • Dataset • GPCR classification dataset
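The slides do not say which string kernel is intended for the text and protein tasks. As one simple member of that family, the sketch below shows a k-spectrum kernel, which counts the length-k substrings two sequences share. The function names and the default `k=2` are illustrative assumptions.

```python
from collections import Counter


def spectrum_features(s, k):
    """Count every contiguous substring of length k in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(s, t, k=2):
    """k-spectrum kernel: dot product of the two k-mer count vectors."""
    fs = spectrum_features(s, k)
    ft = spectrum_features(t, k)
    # A Counter returns 0 for substrings that never occur in t.
    return sum(count * ft[sub] for sub, count in fs.items())
```

For example, `spectrum_kernel("ABAB", "BABA")` compares the 2-mer counts {AB: 2, BA: 1} and {BA: 2, AB: 1}. Sequences shorter than `k` have no k-mers and yield a kernel value of 0.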

  6. Methodology and Schedule • Propose conjectures about the likely kernel behaviors based on analysis • Sep 12th ~ Sep 28th • Test the hypotheses on synthetic datasets • Sep 28th ~ Oct 20th • Map from synthetic data to real application data • Oct 20th ~ Sep 18th
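The talk does not describe how its synthetic datasets are built. The sketch below is one hypothetical way to generate data with a controllable rare-class fraction and label-noise rate, the two properties the study focuses on. The function name, the two-Gaussian layout, and all default values are assumptions made for illustration.

```python
import random


def make_synthetic(n, rare_fraction=0.05, noise_rate=0.1, seed=0):
    """Return n tuples (x, observed_label, true_label).

    The rare class (+1) is drawn with probability rare_fraction, and each
    observed label is flipped with probability noise_rate.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        true_label = 1 if rng.random() < rare_fraction else -1
        # Each class is a 2-D Gaussian centred at (+1, +1) or (-1, -1).
        center = 1.0 if true_label == 1 else -1.0
        x = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        flipped = rng.random() < noise_rate
        observed = -true_label if flipped else true_label
        data.append((x, observed, true_label))
    return data
```

Keeping the true label next to the observed one makes it possible to measure how much of a classifier's error comes from the injected noise rather than from class overlap.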

  7. Mid-course Deliverables • Analysis of the dataset • Class distribution (rare-class and multi-class) • Noise level • Conjectures for possible behaviors • Results on synthetic datasets • Explanation and interesting observations from the results

  8. Multi-label Problem for Text Classification • Related work • Binary classification (one-vs-all) (by Yang; Joachims) • Mixture model trained with EM (by McCallum) • Rank-based approach • Boosting (by Schapire & Singer) • Rank-based kernels (by Elisseeff & Weston)
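The one-vs-all approach listed first reduces a multi-label task to one binary task per label. A minimal sketch of that reduction follows; the function name and the +1/-1 target encoding are assumptions, and training the per-label classifiers is left out.

```python
def one_vs_all_targets(label_sets):
    """Map each label to a +1/-1 target vector over the documents.

    label_sets holds one collection of labels per document.
    """
    labels = sorted(set().union(*label_sets))
    return {
        label: [1 if label in doc_labels else -1 for doc_labels in label_sets]
        for label in labels
    }
```

With three documents labelled `{"grain", "wheat"}`, `{"earn"}`, and `{"grain"}`, the label `"grain"` gets the targets `[1, -1, 1]`. One binary SVM would then be trained per label, and a document receives every label whose classifier fires.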

  9. Multi-label Problem for Text Classification • Possible Solutions • Combine the mixture model and the kernel-based approach using Fisher kernels • Similar to the idea of using an HMM and an SVM together for protein classification
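The slides give no formula for the Fisher kernel. For reference, its standard definition (due to Jaakkola and Haussler) is stated below. The symbols are assumptions: P(x | θ) is the generative model, which here would be the mixture model or the HMM, U_x is the Fisher score, and I is the Fisher information matrix.

```latex
U_x = \nabla_\theta \log P(x \mid \theta), \qquad
K(x, z) = U_x^{\top} I^{-1} U_z, \qquad
I = \mathbb{E}_{x \sim P(\cdot \mid \theta)}\left[ U_x U_x^{\top} \right]
```

In practice I is often replaced by the identity matrix, which reduces the kernel to a plain dot product of Fisher scores. That kernel can then be passed to an SVM, which is how a generative model and a discriminative classifier are combined.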
