
Classifying and clustering using Support Vector Machine



  1. Classifying and clustering using Support Vector Machine 2nd PhD report PhD title: Data mining in unstructured data Daniel I. MORARIU, MSc PhD Supervisor: Lucian N. VINŢAN Sibiu, 2005

  2. Contents • Classification (clustering) steps • Reuters Database processing • Feature extraction and selection • Information Gain • Support Vector Machine • Support Vector Machine • Binary classification • Multiclass classification • Clustering • Sequential Minimal Optimization (SMO) • Probabilistic outputs • Experiments & results • Binary classification. Aspects and results. • Feature subset selection. A comparative approach. • Multiclass classification. Quantitative aspects. • Clustering. Quantitative aspects. • Conclusions and further work

  3. Classifying (clustering) steps • Text mining – feature extraction • Feature selection • Classifying or clustering • Testing results

  4. Reuters Database Processing • 806791 total documents, 126 topics, 366 regions, 870 industry codes • Industry category selection – “system software” • 7083 documents • 4722 training samples • 2361 testing samples • 19038 attributes (features) • 68 classes (topics) • Binary classification • Topic “c152” (only 2096 of the 7083 documents)

  5. Feature extraction • Frequency vector • Term frequency • Stopwords • Stemming • Threshold • Large frequency vector (see the pipeline sketch below)
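A minimal sketch of the kind of extraction pipeline this slide lists (tokenization, stopword removal, stemming, frequency counting, thresholding). The stopword list and the crude suffix-stripping stemmer are illustrative placeholders, not the preprocessing actually used in the thesis:

```python
from collections import Counter
import re

# A tiny illustrative stopword list; a real pipeline would use a full one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}

def crude_stem(token):
    # Placeholder for a proper stemmer (e.g. Porter): strip a few common suffixes.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def extract_features(text, min_count=2):
    # Tokenize, drop stopwords, stem, and count term frequencies.
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [crude_stem(t) for t in tokens if t not in STOPWORDS]
    counts = Counter(stems)
    # Threshold: keep only terms occurring at least min_count times,
    # which trims the otherwise very large frequency vector.
    return {term: n for term, n in counts.items() if n >= min_count}

print(extract_features("The routers were routing packets; routing tables grew and grew."))
```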

  6. Feature selection • Information Gain • SVM feature selection • Linear kernel – weight vector (an Information Gain sketch follows below)
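A small illustration of how Information Gain can rank terms for selection: IG(t) = H(C) − H(C | t), the reduction in class entropy obtained by knowing whether term t occurs in a document. This is a toy sketch, not the thesis implementation:

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(term_present, labels):
    # term_present[i] is True if the term occurs in document i; labels[i] is its class.
    base = entropy(labels)
    with_t = [c for p, c in zip(term_present, labels) if p]
    without_t = [c for p, c in zip(term_present, labels) if not p]
    cond = 0.0
    for subset in (with_t, without_t):
        if subset:
            cond += len(subset) / len(labels) * entropy(subset)
    return base - cond

# Toy example: a term that appears only in class "pos" documents is highly informative.
labels = ["pos", "pos", "neg", "neg"]
print(information_gain([True, True, False, False], labels))   # 1.0
print(information_gain([True, False, True, False], labels))   # 0.0
```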

  7. Contents • Classification (clustering) steps • Reuters Database processing • Feature extraction and selection • Information Gain • Support Vector Machine • Support Vector Machine • Binary classification • Multiclass classification • Clustering • Sequential Minimal Optimization (SMO) • Probabilistic outputs • Experiments & results • Binary classification. Aspects and results. • Feature subset selection. A comparative approach. • Multiclass classification. Quantitative aspects. • Clustering. Quantitative aspects. • Conclusions and further work

  8. Support Vector Machine • Binary classification • Optimal hyperplane • Higher-dimensional feature space • Primal optimization problem • Dual optimization problem - Lagrange multipliers • Karush-Kuhn-Tucker conditions • Support Vectors • Kernel trick • Decision function

  9. Optimal Hyperplane • [Figure: the separating hyperplane {x | ⟨w,x⟩ + b = 0} with margin boundaries {x | ⟨w,x⟩ + b = +1} and {x | ⟨w,x⟩ + b = −1}, classes yi = +1 and yi = −1, normal vector w, and margin γ]

  10. Higher-dimensional feature space

  11. Primal optimization problem • Dual optimization problem • Lagrange formulation • Maximize the dual objective subject to its constraints (the standard forms are reconstructed below)
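The formulas on this slide were shown as images and are missing from the transcript; for reference, the standard hard-margin formulation reads:

```latex
% Standard (hard-margin) SVM formulation, reconstructed for reference;
% the slide's own equations were not captured in the transcript.
\begin{align*}
\text{Primal:}\quad & \min_{\mathbf{w},b}\ \tfrac{1}{2}\lVert \mathbf{w}\rVert^2
  \quad \text{s.t.}\quad y_i\bigl(\langle \mathbf{w},\mathbf{x}_i\rangle + b\bigr) \ge 1,\ \ i=1,\dots,n \\[4pt]
\text{Lagrangian:}\quad & L(\mathbf{w},b,\boldsymbol{\alpha}) =
  \tfrac{1}{2}\lVert \mathbf{w}\rVert^2
  - \sum_{i=1}^{n}\alpha_i\bigl[y_i(\langle \mathbf{w},\mathbf{x}_i\rangle + b) - 1\bigr] \\[4pt]
\text{Dual:}\quad & \max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{n}\alpha_i
  - \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j\, y_i y_j\, \langle \mathbf{x}_i,\mathbf{x}_j\rangle
  \quad \text{s.t.}\quad \alpha_i \ge 0,\ \ \sum_{i=1}^{n}\alpha_i y_i = 0
\end{align*}
```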

  12. SVM - characteristics • Karush-Kuhn-Tucker (KKT) conditions • only the Lagrange multipliers that are non-zero at the saddle point • Support Vectors • the patterns xi for which αi > 0 • Kernel trick • Positive definite kernel • Decision function (standard forms below)
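For reference, the kernel trick and the resulting decision function in their standard form (the slide's own notation was not captured):

```latex
% Kernel trick: the inner product in feature space is replaced by a positive
% definite kernel; the decision function is a kernel expansion over the
% support vectors (standard textbook form).
\begin{align*}
k(\mathbf{x},\mathbf{z}) &= \langle \Phi(\mathbf{x}), \Phi(\mathbf{z})\rangle \\[4pt]
f(\mathbf{x}) &= \operatorname{sign}\!\Bigl(\sum_{i \in SV} \alpha_i y_i\, k(\mathbf{x}_i,\mathbf{x}) + b\Bigr)
\end{align*}
```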

  13. Multi-class classification • Separate one class versus the rest (one-versus-rest; see the sketch below)
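A minimal sketch of the one-versus-rest scheme named above, using scikit-learn's LinearSVC as a stand-in for the thesis' own SVM implementation:

```python
import numpy as np
from sklearn.svm import LinearSVC  # stand-in for the thesis' own SVM code

def train_one_vs_rest(X, y):
    # One binary SVM per class: that class versus all the others.
    models = {}
    for cls in np.unique(y):
        clf = LinearSVC()
        clf.fit(X, (y == cls).astype(int))
        models[cls] = clf
    return models

def predict_one_vs_rest(models, X):
    # Pick the class whose classifier gives the largest decision value.
    classes = list(models)
    scores = np.column_stack([models[c].decision_function(X) for c in classes])
    return np.array(classes)[scores.argmax(axis=1)]

# Tiny synthetic example with three classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, size=(20, 2)) for m in (0.0, 2.0, 4.0)])
y = np.repeat(np.array(["a", "b", "c"]), 20)
models = train_one_vs_rest(X, y)
print(predict_one_vs_rest(models, np.array([[0.1, 0.0], [3.9, 4.1]])))
```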

  14. Clustering • Characteristics • data mapped into a higher-dimensional space • search for the minimal enclosing sphere • Primal optimization problem • Dual optimization problem • Karush-Kuhn-Tucker conditions (standard formulation reconstructed below)
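The slide's equations were not captured; the standard minimal-enclosing-sphere formulation (support vector clustering / SVDD), which matches the bullets above, is:

```latex
% Minimal enclosing sphere in feature space with centre a and radius R
% (SVDD / support vector clustering form, reconstructed for reference).
\begin{align*}
\text{Primal:}\quad & \min_{R,\mathbf{a},\boldsymbol{\xi}}\ R^2 + C\sum_j \xi_j
  \quad \text{s.t.}\quad \lVert \Phi(\mathbf{x}_j) - \mathbf{a}\rVert^2 \le R^2 + \xi_j,\ \ \xi_j \ge 0 \\[4pt]
\text{Dual:}\quad & \max_{\boldsymbol{\beta}}\ \sum_j \beta_j\, k(\mathbf{x}_j,\mathbf{x}_j)
  - \sum_{i,j}\beta_i\beta_j\, k(\mathbf{x}_i,\mathbf{x}_j)
  \quad \text{s.t.}\quad 0 \le \beta_j \le C,\ \ \sum_j \beta_j = 1
\end{align*}
```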

  15. Contents • Classification (clustering) steps • Reuters Database processing • Feature extraction and selection • Information Gain • Support Vector Machine • Support Vector Machine • Binary classification • Multiclass classification • Clustering • Sequential Minimal Optimization (SMO) • Probabilistic outputs • Experiments & results • Binary classification. Aspects and results. • Feature subset selection. A comparative approach. • Multiclass classification. Quantitative aspects. • Clustering. Quantitative aspects. • Conclusions and further work

  16. SMO characteristics • Only two parameters are updated at a time (minimal size of update). • Benefits: • doesn’t need any extra matrix storage • doesn’t need a numerical QP optimization step • needs more iterations to converge, but only a few operations at each step, which leads to an overall speed-up • Components: • Analytic method to solve the problem for two Lagrange multipliers • Heuristics for choosing the points

  17. SMO - components • Analytic method • Heuristics for choosing the points • Choice of 1st point (x1/a1): • find KKT violations • Choice of 2nd point (x2/a2): • update the a1, a2 that cause a large change, which in turn results in a large increase of the dual objective • maximize the quantity |E1 - E2| (see the update sketch below)
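A sketch of the analytic step for one pair of multipliers, following the textbook SMO update with a soft-margin bound C (an assumption; the thesis' own implementation details are not in the transcript). The heuristic above picks the pair so that |E1 − E2| is large:

```python
def smo_two_alpha_update(a1, a2, y1, y2, E1, E2, K11, K12, K22, C):
    """Analytic solution for one pair of Lagrange multipliers (Platt-style SMO step).

    E1, E2 are the current prediction errors f(x_i) - y_i; K11, K12, K22 are
    kernel values. A sketch of the textbook update, not the thesis code.
    """
    # Bounds L, H keep the pair on the constraint line sum_i(alpha_i * y_i) = const.
    if y1 != y2:
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    eta = K11 + K22 - 2.0 * K12          # second derivative of the objective
    if eta <= 0 or L == H:
        return a1, a2                    # degenerate case: skip this pair
    a2_new = a2 + y2 * (E1 - E2) / eta   # unconstrained optimum for alpha_2
    a2_new = min(H, max(L, a2_new))      # clip to the feasible segment
    a1_new = a1 + y1 * y2 * (a2 - a2_new)
    return a1_new, a2_new
```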

  18. Probabilistic outputs
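Only the slide title survives in the transcript. Probabilistic outputs for an SVM are usually obtained with Platt's sigmoid fit on the decision values, which is presumably what this slide covered:

```latex
% Platt's sigmoid mapping from SVM decision values f(x) to posterior
% probabilities; A and B are fitted by maximum likelihood on held-out data.
P(y = 1 \mid \mathbf{x}) = \frac{1}{1 + \exp\bigl(A\, f(\mathbf{x}) + B\bigr)}
```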

  19. Feature selection using SVM • Linear kernel • Primal optimization form • Keep only the features whose weight in the learned w vector is greater than a threshold (see the sketch below)
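A minimal sketch of this weight-threshold selection, with scikit-learn's LinearSVC standing in for the thesis' own linear-kernel SVM; the threshold value is illustrative:

```python
import numpy as np
from sklearn.svm import LinearSVC  # stand-in for the thesis' linear-kernel SVM

def select_by_svm_weight(X, y, threshold=0.1):
    # Train a linear SVM, then keep the features whose absolute weight in the
    # learned vector w exceeds the threshold.
    clf = LinearSVC().fit(X, y)
    w = clf.coef_.ravel()
    keep = np.flatnonzero(np.abs(w) > threshold)
    return keep, X[:, keep]

# Toy example: only the first two columns carry the signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
keep, X_reduced = select_by_svm_weight(X, y, threshold=0.2)
print(keep)  # expected to contain mostly features 0 and 1
```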

  20. Contents • Classification (clustering) steps • Reuters Database processing • Feature extraction and selection • Information Gain • Support Vector Machine • Support Vector Machine • Binary classification • Multiclass classification • Clustering • Sequential Minimal Optimization (SMO) • Probabilistic outputs • Experiments & results • Binary classification. Aspects and results. • Feature subset selection. A comparative approach. • Multiclass classification. Quantitative aspects. • Clustering. Quantitative aspects. • Conclusions and further work

  21. Kernels used • Polynomial kernel • Gaussian kernel
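The standard forms of the two kernels, for reference; the slide's own parameterization (e.g. the bias term and the kernel width) was not captured in the transcript:

```latex
% Standard polynomial and Gaussian (RBF) kernels with degree d and width sigma.
\begin{align*}
k_{\text{poly}}(\mathbf{x},\mathbf{z}) &= \bigl(\langle \mathbf{x},\mathbf{z}\rangle + 1\bigr)^{d} \\[4pt]
k_{\text{gauss}}(\mathbf{x},\mathbf{z}) &= \exp\!\Bigl(-\frac{\lVert \mathbf{x}-\mathbf{z}\rVert^{2}}{2\sigma^{2}}\Bigr)
\end{align*}
```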

  22. Data representation • Binary • using values ”0” and “1” • Nominal • Connell SMART
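The exact weighting formulas were not captured; the forms below are a hedged reconstruction of the three schemes commonly used under these names, with n(d,t) the count of term t in document d:

```latex
% Hedged reconstruction of the three term-weighting schemes named on the slide.
\begin{align*}
\text{binary:} \quad & w(d,t) = \begin{cases} 1 & n(d,t) > 0 \\ 0 & \text{otherwise} \end{cases} \\[4pt]
\text{nominal:} \quad & w(d,t) = \frac{n(d,t)}{\max_{\tau} n(d,\tau)} \\[4pt]
\text{Connell SMART:} \quad & w(d,t) = \begin{cases} 0 & n(d,t) = 0 \\ 1 + \log\bigl(1 + \log n(d,t)\bigr) & \text{otherwise} \end{cases}
\end{align*}
```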

  23. Binary classification - 63

  24. Binary classification - 7999

  25. Influence of vector size • Polynomial kernel

  26. Influence of vector size • Gaussian kernel

  27. IG versus SVM – 427 features • Polynomial kernel

  28. IG versus SVM – 427 features • Gaussian kernel

  29. LibSvm versus UseSvm - 2493 • Polynomial kernel

  30. LibSvm versus UseSvm - 2493 • Gaussian kernel

  31. Multiclass classification • Polynomial kernel - 2488 features

  32. Multiclass classification • Gaussian kernel 2488 features

  33. Clustering using SVM

  34. Conclusions – best results • Polynomial kernel and nominal representation (degree 5 and 6) • Gaussian kernel and Connell SMART (C=2.7) • Reduced # of support vectors for the polynomial kernel in comparison with the Gaussian kernel (24.41% versus 37.78%) • # of features between 6% (1309) and 10% (2488) • Multiclass classification follows the binary classification results • Clustering has a smaller # of support vectors • Clustering follows binary classification

  35. Further work • Feature extraction and selection • Association rules between words (Mutual Information) • Synonymy and polysemy problems • Better implementation of SVM with linear kernel • Using families of words (WordNet) • SVM with kernel degree greater than 1 • Classification and clustering • Using classification and clustering together
