Parallel Multicategory Support Vector Machines (PMC-SVM) for Classifying Microarray Data


Presentation Transcript

  1. Parallel Multicategory Support Vector Machines (PMC-SVM) for Classifying Microarray Data

  2. Outline • Introduction • SMO-SVM • Parallel Multicategory SVM • Parallel Implementation and Environment • Parallel Evaluation and Analysis • Classifying Microarray Data • Conclusions

  3. Introduction • Biologists want to separate data into multiple categories using a reliable cancer diagnostic model. • A comprehensive evaluation of several multicategory classification methods found that support vector machines (SVMs) are the most effective classifiers for accurate cancer diagnosis from gene expression data. • In this paper, we develop new parallel multicategory support vector machines (PMC-SVM) based on the sequential minimal optimization (SMO)-type decomposition method for support vector machines (SMO-SVM) implemented in LibSVM, which requires less memory.

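  4. SMO-SVM The basic idea behind SVM is to separate two point classes of a training set, $\{(x_i, y_i)\}_{i=1}^{l}$ with $x_i \in \mathbb{R}^n$ and $y_i \in \{+1, -1\}$ (1), by using a decision function obtained by solving a convex quadratic programming optimization problem of the form $\min_{\alpha} f(\alpha) = \frac{1}{2}\alpha^T Q \alpha - e^T \alpha$ (2) subject to $0 \le \alpha_i \le C$, $i = 1, \dots, l$, and $y^T \alpha = 0$.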

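  5. SMO-SVM where $y = [y_1, \dots, y_l]^T$ and $C > 0$ is a constant, $e$ is the vector of all ones, and $Q$ is an $l \times l$ symmetric positive semidefinite matrix whose entries are defined as $Q_{ij} \equiv y_i y_j K(x_i, x_j)$ (3), where $K$ denotes a kernel function, such as a polynomial kernel or a Gaussian kernel.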
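To make (2) and (3) concrete, the following minimal pure-Python sketch builds $Q$ and evaluates the dual objective and its constraints. It assumes a Gaussian kernel $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$ with an arbitrary $\gamma = 0.5$; the helper names `gaussian_kernel`, `build_q`, `dual_objective`, and `is_feasible` are hypothetical and do not come from the paper or from LibSVM.

```python
import math
from typing import List, Sequence


def gaussian_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 0.5) -> float:
    """K(x, z) = exp(-gamma * ||x - z||^2); gamma = 0.5 is an assumed value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def build_q(X: List[Sequence[float]], y: List[int], kernel=gaussian_kernel) -> List[List[float]]:
    """Entries Q_ij = y_i * y_j * K(x_i, x_j), as in (3)."""
    n = len(X)
    return [[y[i] * y[j] * kernel(X[i], X[j]) for j in range(n)] for i in range(n)]


def dual_objective(Q: List[List[float]], alpha: List[float]) -> float:
    """f(alpha) = 1/2 * alpha^T Q alpha - e^T alpha, the objective of (2)."""
    n = len(alpha)
    quad = sum(alpha[i] * Q[i][j] * alpha[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(alpha)


def is_feasible(alpha: List[float], y: List[int], C: float, tol: float = 1e-9) -> bool:
    """Check the constraints of (2): 0 <= alpha_i <= C and y^T alpha = 0."""
    in_box = all(-tol <= a <= C + tol for a in alpha)
    return in_box and abs(sum(a * yi for a, yi in zip(alpha, y))) <= tol


# Two one-dimensional points from opposite classes.
X = [[0.0], [1.0]]
y = [1, -1]
Q = build_q(X, y)
print(dual_objective(Q, [0.5, 0.5]))      # approximately -0.9016
print(is_feasible([0.5, 0.5], y, C=1.0))  # True
```

Storing the full $l \times l$ matrix $Q$, as this toy example does, is exactly what decomposition methods such as SMO avoid for large $l$.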

  6. SMO-SVM • Decomposition methods update only a subset of the variables $\alpha$ at each iteration; this subset, denoted $B$, is called the working set. • If $B$ is restricted to exactly two elements, this special type of decomposition method is Sequential Minimal Optimization (SMO).

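  7. There are four steps to implement SMO: Step 1: Find $\alpha^1$ as the initial feasible solution. Set $k = 1$. Step 2: If $\alpha^k$ is a stationary point of (2), stop. Otherwise, find a two-element working set $B = \{i, j\} \subset \{1, \dots, l\}$. Define $N \equiv \{1, \dots, l\} \setminus B$, and $\alpha_B^k$ and $\alpha_N^k$ as the subvectors of $\alpha^k$ corresponding to $B$ and $N$, respectively.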

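  8. Step 3: If $a_{ij} \equiv K_{ii} + K_{jj} - 2K_{ij} > 0$, solve the following sub-problem with the variable $\alpha_B$:
$$\min_{\alpha_i, \alpha_j} \frac{1}{2} \begin{bmatrix} \alpha_i & \alpha_j \end{bmatrix} \begin{bmatrix} Q_{ii} & Q_{ij} \\ Q_{ij} & Q_{jj} \end{bmatrix} \begin{bmatrix} \alpha_i \\ \alpha_j \end{bmatrix} + (-e_B + Q_{BN}\alpha_N^k)^T \begin{bmatrix} \alpha_i \\ \alpha_j \end{bmatrix} \qquad (4)$$
subject to $0 \le \alpha_i, \alpha_j \le C$ and $y_i\alpha_i + y_j\alpha_j = -y_N^T\alpha_N^k$; else solve
$$\min_{\alpha_i, \alpha_j} \frac{1}{2} \begin{bmatrix} \alpha_i & \alpha_j \end{bmatrix} \begin{bmatrix} Q_{ii} & Q_{ij} \\ Q_{ij} & Q_{jj} \end{bmatrix} \begin{bmatrix} \alpha_i \\ \alpha_j \end{bmatrix} + (-e_B + Q_{BN}\alpha_N^k)^T \begin{bmatrix} \alpha_i \\ \alpha_j \end{bmatrix} + \frac{\tau - a_{ij}}{4}\left((\alpha_i - \alpha_i^k)^2 + (\alpha_j - \alpha_j^k)^2\right) \qquad (5)$$
subject to the constraints of (4), where $\tau$ is a small positive constant. Step 4: Set $\alpha_B^{k+1}$ to be the optimal solution of (4) (or of (5)) and $\alpha_N^{k+1} \equiv \alpha_N^k$. Set $k \leftarrow k + 1$ and go to Step 2.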
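The four SMO steps can be sketched in pure Python as follows. This is a hedged illustration, not the PMC-SVM or LibSVM code: it assumes a Gaussian kernel, picks the two-element working set with the maximal violating pair rule (one common choice; the slides do not say which rule PMC-SVM uses), and solves (4) or (5) in closed form by moving along the only feasible direction that keeps $y^T\alpha$ fixed. The names `smo_train`, `decision`, and `rbf`, and the defaults for `C`, `eps`, and `tau`, are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def rbf(x: Vector, z: Vector, gamma: float = 0.5) -> float:
    """Gaussian kernel exp(-gamma * ||x - z||^2); gamma = 0.5 is an assumed default."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def smo_train(X: List[Vector], y: List[int], C: float = 1.0,
              kernel: Callable[[Vector, Vector], float] = rbf,
              eps: float = 1e-3, tau: float = 1e-12,
              max_iter: int = 100_000) -> Tuple[List[float], float]:
    """Approximately solve (2) by SMO; labels must be +1/-1. Returns (alpha, b)."""
    n = len(X)
    K = [[kernel(X[r], X[c]) for c in range(n)] for r in range(n)]
    Q = [[y[r] * y[c] * K[r][c] for c in range(n)] for r in range(n)]  # entries (3)
    alpha = [0.0] * n  # Step 1: alpha^1 = 0 is feasible
    G = [-1.0] * n     # gradient Q alpha - e, evaluated at alpha = 0

    def select_pair() -> Tuple[int, int, float, float]:
        """Maximal violating pair: i maximizes -y_t G_t over I_up, j minimizes it over I_low."""
        i, j, m, M = -1, -1, -math.inf, math.inf
        for t in range(n):
            v = -y[t] * G[t]
            if ((y[t] == 1 and alpha[t] < C) or (y[t] == -1 and alpha[t] > 0)) and v > m:
                i, m = t, v
            if ((y[t] == -1 and alpha[t] < C) or (y[t] == 1 and alpha[t] > 0)) and v < M:
                j, M = t, v
        return i, j, m, M

    for _ in range(max_iter):
        i, j, m, M = select_pair()
        if m - M < eps:  # Step 2: alpha is an (approximate) stationary point
            break
        # Step 3: move along alpha_i += y_i*s, alpha_j -= y_j*s, which keeps y^T alpha fixed.
        a_ij = K[i][i] + K[j][j] - 2.0 * K[i][j]
        curvature = a_ij if a_ij > 0 else tau  # sub-problem (5) has curvature tau on this line
        bound_i = C - alpha[i] if y[i] == 1 else alpha[i]
        bound_j = alpha[j] if y[j] == 1 else C - alpha[j]
        s = min((m - M) / curvature, bound_i, bound_j)
        new_i = alpha[i] + y[i] * s
        new_j = alpha[j] - y[j] * s
        if s == bound_i:  # snap exactly onto the box to avoid round-off drift
            new_i = C if y[i] == 1 else 0.0
        if s == bound_j:
            new_j = 0.0 if y[j] == 1 else C
        d_i, d_j = new_i - alpha[i], new_j - alpha[j]
        # Step 4: update alpha_B, keep alpha_N, and refresh the gradient.
        alpha[i], alpha[j] = new_i, new_j
        for t in range(n):
            G[t] += Q[t][i] * d_i + Q[t][j] * d_j

    # Bias b: average of -y_t G_t over free variables, else the midpoint of m and M.
    free = [-y[t] * G[t] for t in range(n) if 0.0 < alpha[t] < C]
    if free:
        b = sum(free) / len(free)
    else:
        _, _, m, M = select_pair()
        b = (m + M) / 2.0
    return alpha, b


def decision(X: List[Vector], y: List[int], alpha: List[float], b: float,
             x: Vector, kernel: Callable[[Vector, Vector], float] = rbf) -> float:
    """Decision value sum_i alpha_i y_i K(x_i, x) + b; its sign is the predicted class."""
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X) if a > 0) + b
```

Along the direction $\alpha_i \mathrel{+}= y_i s$, $\alpha_j \mathrel{-}= y_j s$, the curvature of (4) is exactly $a_{ij}$ and that of (5) is $\tau$, which is why the sketch simply replaces a non-positive $a_{ij}$ by $\tau$ before clipping the step to the box $[0, C]$.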

  9. Parallel Multicategory SVM (PMC-SVM) • In multicategory classification with support vector machines, the algorithm generates $k(k-1)/2$ sub-models for $k$ categories, one for each pair of categories. • Generating these models is the most time-consuming task in the algorithm, so it is desirable to distribute the sub-models across multiple processors, with each processor performing a subtask, to improve performance.
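The number of sub-models, $k(k-1)/2$ for $k$ categories, corresponds to one binary model per pair of categories (the one-against-one strategy that LibSVM uses for multiclass problems). Each sub-model therefore needs only the training samples of its two categories and can be built independently of the others. A minimal sketch of how those independent tasks could be enumerated, using a hypothetical helper `pairwise_subproblems` and toy labels:

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pairwise_subproblems(labels: Sequence[str]) -> Dict[Tuple[str, str], List[Tuple[int, int]]]:
    """Map each class pair (a, b) to the samples of its binary sub-model.

    Each sample is recorded as (index into the training set, +1 if it belongs
    to class a, -1 if it belongs to class b); all other classes are left out.
    """
    classes = sorted(set(labels))
    return {
        (a, b): [(idx, 1 if lab == a else -1)
                 for idx, lab in enumerate(labels) if lab in (a, b)]
        for a, b in combinations(classes, 2)
    }


tasks = pairwise_subproblems(["A", "B", "C", "A"])
print(len(tasks))             # 3 sub-models for k = 3
print(tasks[("A", "B")])      # [(0, 1), (1, -1), (3, 1)]
```

Because no task reads another task's output, the entries can be handed to different processors without communication during training.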

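  10. Example: With $p = 4$ processors and $k = 16$ categories, we have to generate $k(k-1)/2 = 120$ models in total. Each processor generates $k(k-1)/(2p)$ models, where $p$ is the total number of processors and $k$ is the number of categories; here, each processor generates $120/4 = 30$ models.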
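The slides do not say exactly how the models are assigned to processors. One simple possibility, sketched here under that assumption, is to split the list of class pairs into $p$ contiguous blocks whose sizes differ by at most one; the function name `assign_models` is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def assign_models(k: int, p: int) -> List[List[Tuple[int, int]]]:
    """Split the k(k-1)/2 class pairs into p contiguous blocks, one per processor.

    Block sizes differ by at most one, so each processor trains about k(k-1)/(2p) models.
    """
    pairs = list(combinations(range(k), 2))
    base, extra = divmod(len(pairs), p)
    blocks: List[List[Tuple[int, int]]] = []
    start = 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)  # the first `extra` ranks take one more
        blocks.append(pairs[start:start + size])
        start += size
    return blocks


print([len(block) for block in assign_models(16, 4)])  # [30, 30, 30, 30]
```

Equal model counts do not guarantee equal work: a sub-model's training time depends on how many samples its two categories contain, so a real scheduler might balance by estimated cost instead.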

  11. Parallel Implementation and Environment • Two computing platforms are used. One is the shared-memory SGI Origin 2800 supercomputer (sweetgum), equipped with 128 CPUs, 64 GB of memory, and 1.6 TB of Fibre Channel disk. • The other is a distributed-memory Linux cluster (mimosa) with 192 nodes.

  12. Parallel Evaluation and Analysis • PMC-SVM is tested on both the sweetgum and mimosa platforms using the following two datasets: • Dataset 1: Letter_scale (classes: 26; training size: 16,000; features: 16) • Dataset 2: Mnist_scale (classes: 10; training size: 21,000; features: 780)
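For these two datasets, the number of sub-models follows directly from $k(k-1)/2$: 325 for Letter_scale ($k = 26$) and 45 for Mnist_scale ($k = 10$). If, as a simplifying assumption, every sub-model took the same time to train, then with $n$ models and $p$ processors the busiest processor would still train $\lceil n/p \rceil$ of them, capping the speedup at $n / \lceil n/p \rceil$. A small sketch of that idealized bound (the function names are hypothetical):

```python
import math


def pairwise_models(k: int) -> int:
    """Number of one-against-one sub-models for k categories."""
    return k * (k - 1) // 2


def equal_cost_speedup_bound(k: int, p: int) -> float:
    """Best possible speedup on p processors if all sub-models cost the same (assumption)."""
    n = pairwise_models(k)
    return n / math.ceil(n / p)


for name, k in [("Letter_scale", 26), ("Mnist_scale", 10)]:
    bounds = [round(equal_cost_speedup_bound(k, p), 2) for p in (8, 16, 32, 64)]
    print(name, pairwise_models(k), bounds)
# Letter_scale 325 [7.93, 15.48, 29.55, 54.17]
# Mnist_scale 45 [7.5, 15.0, 22.5, 45.0]
```

Under this model, Mnist_scale cannot benefit from more than 45 processors, because there are only 45 sub-models to distribute.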

  13. Figure 2. The speedup of PMC-SVM on sweetgum with Dataset 1 (Letter_scale) Figure 3. The speedup of PMC-SVM on mimosa with Dataset 1 (Letter_scale)

  14. Figure 4. The speedup of PMC-SVM on sweetgum with Dataset 2 (Mnist_scale) Figure 5. The speedup of PMC-SVM on mimosa with Dataset 2 (Mnist_scale)
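The captions do not define speedup. Assuming the conventional definition, with $T_1$ the run time on one processor and $T_p$ the run time on $p$ processors, the plotted quantity and the corresponding parallel efficiency would be:

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}
```

Linear speedup corresponds to $S_p = p$, that is, $E_p = 1$.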

  15. Classifying Microarray Data In this work, two microarray datasets were used to demonstrate the performance of PMC-SVM, as listed below: • Dataset 3: 14_Tumors (40 MB) (human tumor types: 14; normal tissue types: 12) • Dataset 4: 11_Tumors (18 MB) (human tumor types: 11)

  16. Table 6: Performance on sweetgum (Dataset 3) Table 7: Performance on sweetgum (Dataset 4)

  17. Conclusions • PMC-SVM has been developed for classifying large datasets based on the SMO-type decomposition method. • The experimental results show that high-performance computing techniques and the parallel implementation can achieve a significant speedup.

  18. Thanks for your attendance!
