Classifying and clustering using Support Vector Machine
2nd PhD report
PhD title: Data mining in unstructured data
Daniel I. MORARIU, MSc
PhD Supervisor: Lucian N. VINŢAN
Sibiu, 2005
Contents
• Classification (clustering) steps
• Reuters Database processing
• Feature extraction and selection
  • Information Gain
  • Support Vector Machine
• Support Vector Machine
  • Binary classification
  • Multiclass classification
  • Clustering
  • Sequential Minimal Optimization (SMO)
  • Probabilistic outputs
• Experiments & results
  • Binary classification. Aspects and results.
  • Feature subset selection. A comparative approach.
  • Multiclass classification. Quantitative aspects.
  • Clustering. Quantitative aspects.
• Conclusions and further work
Classifying (clustering) steps
• Text mining – feature extraction
• Feature selection
• Classification or clustering
• Testing the results
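To make these four steps concrete, here is a minimal end-to-end sketch using scikit-learn as a stand-in for the report's own implementation; the loader `load_reuters_subset`, the mutual-information selector (a stand-in for Information Gain), and all parameter values are illustrative assumptions:

```python
# Hypothetical pipeline: extraction -> selection -> SVM classification -> testing.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

docs, labels = load_reuters_subset()           # hypothetical loader, not a real API
X_train, X_test, y_train, y_test = train_test_split(
    docs, labels, test_size=1/3)               # ~4722 train / 2361 test, as in the report

pipeline = make_pipeline(
    CountVectorizer(stop_words="english"),     # step 1: feature extraction
    SelectKBest(mutual_info_classif, k=1309),  # step 2: feature selection (MI as IG stand-in)
    SVC(kernel="poly", degree=2),              # step 3: classification
)
pipeline.fit(X_train, y_train)
print("accuracy:", pipeline.score(X_test, y_test))  # step 4: testing the results
```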
Reuters Database processing
• 806791 total documents, 126 topics, 366 regions, 870 industry codes
• Selected industry category: "system software"
  • 7083 documents
  • 4722 training samples
  • 2361 testing samples
  • 19038 attributes (features)
  • 68 classes (topics)
• Binary classification
  • topic "c152" (only 2096 of the 7083 documents)
Feature extraction
• Frequency vector
• Term frequency
• Stopwords
• Stemming
• Threshold (the raw frequency vector is otherwise very large)
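A minimal sketch of these extraction steps (lower-casing, stopword removal, Porter stemming, and a frequency threshold to shrink the vector); the threshold value is an illustrative assumption:

```python
import re
from collections import Counter
from nltk.corpus import stopwords      # assumes the NLTK stopword list is installed
from nltk.stem import PorterStemmer

STOP = set(stopwords.words("english"))
stem = PorterStemmer().stem

def frequency_vector(text, min_count=2):
    """Term-frequency vector of a document after stopword removal and stemming."""
    terms = re.findall(r"[a-z]+", text.lower())
    counts = Counter(stem(t) for t in terms if t not in STOP)
    # threshold: drop rare terms to keep the frequency vector manageable
    return {t: c for t, c in counts.items() if c >= min_count}
```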
Feature selection
• Information Gain (see the formula below)
• SVM feature selection
  • linear kernel – weight vector
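For reference, the standard Information Gain score of a term t with respect to the class variable C (the usual text-categorization form; the report may use an equivalent variant):

```latex
\mathrm{IG}(t) = H(C) - P(t)\,H(C \mid t) - P(\bar{t})\,H(C \mid \bar{t}),
\qquad
H(C) = -\sum_{c} P(c)\,\log_2 P(c)
```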
Support Vector Machine
• Binary classification
• Optimal hyperplane
• Higher-dimensional feature space
• Primal optimization problem
• Dual optimization problem – Lagrange multipliers
• Karush-Kuhn-Tucker conditions
• Support vectors
• Kernel trick
• Decision function
Optimal Hyperplane
[Figure: the separating hyperplane {x | ‹w,x› + b = 0} with its margin boundaries {x | ‹w,x› + b = +1} and {x | ‹w,x› + b = −1}; points x1 with yi = +1 and x2 with yi = −1 on either side, normal vector w, margin γ]
Primal optimization problem
• minimize $\tfrac{1}{2}\|w\|^2$ subject to $y_i\,(\langle w, x_i \rangle + b) \ge 1$, $i = 1, \dots, m$
Dual optimization problem – Lagrange formulation
• Maximize: $W(\alpha) = \sum_{i=1}^{m} \alpha_i - \tfrac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j\, y_i y_j\, \langle x_i, x_j \rangle$
• subject to: $\alpha_i \ge 0$ and $\sum_{i=1}^{m} \alpha_i y_i = 0$
SVM – characteristics
• Karush-Kuhn-Tucker (KKT) conditions
  • only the Lagrange multipliers that are non-zero at the saddle point enter the solution
• Support vectors
  • the patterns $x_i$ for which $\alpha_i > 0$
• Kernel trick
  • a positive definite kernel $k(x, x')$ replaces the inner product $\langle x, x' \rangle$
• Decision function
  • $f(x) = \mathrm{sgn}\bigl(\sum_{i=1}^{m} \alpha_i y_i\, k(x, x_i) + b\bigr)$
Multi-class classification
• One versus the rest: train a separate binary classifier for each class, separating it from all the other classes (a sketch follows below)
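A minimal one-versus-the-rest sketch: one binary SVM per class, predicting the class whose classifier returns the largest decision value; the use of scikit-learn's SVC is an illustrative assumption rather than the report's own code:

```python
import numpy as np
from sklearn.svm import SVC

def train_one_vs_rest(X, y, classes, **svm_params):
    """Train one binary SVM per class: that class versus all the others."""
    models = {}
    for c in classes:
        binary_y = np.where(y == c, 1, -1)   # +1 for class c, -1 for the rest
        models[c] = SVC(**svm_params).fit(X, binary_y)
    return models

def predict_one_vs_rest(models, X):
    """Predict the class whose classifier gives the largest decision value."""
    classes = list(models)
    scores = np.stack([models[c].decision_function(X) for c in classes])
    return np.array(classes)[np.argmax(scores, axis=0)]
```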
Clustering
• Characteristics
  • map the data into a higher-dimensional feature space
  • search for the minimal enclosing sphere of the mapped data
• Primal optimization problem
• Dual optimization problem
• Karush-Kuhn-Tucker conditions
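For reference, the standard minimal-enclosing-sphere formulation (as in support vector clustering, Ben-Hur et al., 2001); the report's notation may differ:

```latex
% Primal: smallest sphere (center a, radius R) containing the mapped points,
% softened with slack variables \xi_j:
\min_{R,\,a,\,\xi}\; R^2 + C \sum_j \xi_j
\quad\text{s.t.}\quad \|\Phi(x_j) - a\|^2 \le R^2 + \xi_j,\;\; \xi_j \ge 0

% Dual, with kernel k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle:
\max_{\beta}\; \sum_j \beta_j\, k(x_j, x_j) - \sum_{i,j} \beta_i \beta_j\, k(x_i, x_j)
\quad\text{s.t.}\quad \sum_j \beta_j = 1,\;\; 0 \le \beta_j \le C
```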
SMO characteristics
• Only two Lagrange multipliers are updated at each step (the minimal possible working set).
• Benefits:
  • needs no extra matrix storage
  • needs no numerical QP optimization step
  • needs more iterations to converge, but only a few operations per step, which leads to an overall speed-up
• Components:
  • an analytic method to solve the problem for two Lagrange multipliers
  • heuristics for choosing the two points
SMO – components
• Analytic method (see the sketch below)
• Heuristics for choosing the two points
  • Choice of the 1st point (x1/α1):
    • find KKT violations
  • Choice of the 2nd point (x2/α2):
    • update the pair α1, α2 that causes a large change, which in turn yields a large increase of the dual objective
    • maximize the quantity |E1 − E2|
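A minimal sketch of SMO's analytic two-multiplier update (following Platt, 1998), where E_i is the prediction error f(x_i) − y_i and k11, k12, k22 are kernel values for the chosen pair; Platt's edge case eta <= 0 is omitted for brevity:

```python
def smo_pair_update(a1, a2, y1, y2, E1, E2, k11, k12, k22, C):
    """One analytic SMO step: jointly optimize the multiplier pair (a1, a2)."""
    # Feasible segment [L, H] keeping 0 <= a <= C and the equality constraint.
    if y1 != y2:
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    eta = k11 + k22 - 2.0 * k12             # curvature along the constraint line
    a2_new = a2 + y2 * (E1 - E2) / eta      # unconstrained optimum for a2
    a2_new = min(H, max(L, a2_new))         # clip to the feasible segment
    a1_new = a1 + y1 * y2 * (a2 - a2_new)   # restore the equality constraint
    return a1_new, a2_new
```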
Feature selection using SVM
• Linear kernel
• Primal optimization form
• Keep only the features whose weight in the learned w vector is greater than a threshold (a sketch follows below)
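A minimal sketch of this weight-based selection, assuming a scikit-learn linear SVM and an illustrative threshold value; the report's own threshold and implementation may differ:

```python
import numpy as np
from sklearn.svm import LinearSVC

def select_features_by_weight(X, y, threshold=0.1):
    """Keep only the features whose learned weight magnitude exceeds a threshold."""
    svm = LinearSVC().fit(X, y)
    w = np.abs(svm.coef_).max(axis=0)   # per-feature weight magnitude
    keep = np.where(w > threshold)[0]   # indices of the retained features
    return X[:, keep], keep
```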
Kernels used
• Polynomial kernel
• Gaussian kernel
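For reference, the standard forms of these two kernels; the report's exact parameterization (e.g. the bias term b or how the Gaussian width is set) may differ:

```latex
k_{\mathrm{poly}}(x, x') = \bigl(\langle x, x' \rangle + b\bigr)^{d},
\qquad
k_{\mathrm{gauss}}(x, x') = \exp\!\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)
```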
Data representation
• Binary
  • using the values "0" and "1"
• Nominal
• Cornell SMART
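Typical definitions of these three weighting schemes, with n(d, t) the number of occurrences of term t in document d; these follow common Cornell SMART conventions, and the report's exact formulas are an assumption here:

```latex
\mathrm{bin}(d,t) = \begin{cases} 1 & n(d,t) > 0 \\ 0 & \text{otherwise} \end{cases}
\qquad
\mathrm{nom}(d,t) = \frac{n(d,t)}{\max_{\tau} n(d,\tau)}
\qquad
\mathrm{smart}(d,t) = \begin{cases} 0 & n(d,t) = 0 \\ 1 + \log\bigl(1 + \log n(d,t)\bigr) & \text{otherwise} \end{cases}
```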
Influence of vector size
• Polynomial kernel
Influence of vector size
• Gaussian kernel
IG versus SVM – 427 features
• Polynomial kernel
IG versus SVM – 427 features
• Gaussian kernel
LibSvm versus UseSvm – 2493 features
• Polynomial kernel
LibSvm versus UseSvm – 2493 features
• Gaussian kernel
Multiclass classification
• Polynomial kernel – 2488 features
Multiclass classification
• Gaussian kernel – 2488 features
Conclusions – best results
• Polynomial kernel with the nominal representation (degrees 5 and 6)
• Gaussian kernel with Cornell SMART (C = 2.7)
• Fewer support vectors for the polynomial kernel than for the Gaussian kernel (24.41% versus 37.78%)
• Number of features between 6% (1309) and 10% (2488) of the full set
• Multiclass results follow the binary classification results
• Clustering uses a smaller number of support vectors
• Clustering results follow the binary classification results
Further work
• Feature extraction and selection
  • association rules between words (Mutual Information)
  • the synonymy and polysemy problem
  • a better implementation of SVM with a linear kernel
  • using families of words (WordNet)
  • SVM with kernel degree greater than 1
• Classification and clustering
  • using classification and clustering together