Classifying and clustering using Support Vector Machine 2nd PhD report PhD title: Data mining in unstructured data Daniel I. MORARIU, MSc PhD Supervisor: Lucian N. VINŢAN Sibiu, 2005
Contents • Classification (clustering) steps • Reuters Database processing • Feature extraction and selection • Information Gain • Support Vector Machine • Support Vector Machine • Binary classification • Multiclass classification • Clustering • Sequential Minimal Optimizations (SMO) • Probabilistic outputs • Experiments & results • Binary classification. Aspects and results. • Feature subset selection. A comparative approach. • Multiclass classification. Quantitative aspects. • Clustering. Quantitative aspects. • Conclusions and further work
Classifying (clustering) steps • Text mining – feature extraction • Feature selection • Classifying or clustering • Testing results
Reuters Database Processing • 806791 total documents, 126 topics, 366 regions, 870 industry codes • Industry category selection – “system software” • 7083 documents • 4722 training samples • 2361 testing samples • 19038 attributes (features) • 68 classes (topics) • Binary classification • Topic “c152” (only 2096 of the 7083 documents)
Feature extraction • Frequency vector • Term frequency • Stopwords • Stemming • Threshold • Large frequency vector
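The extraction steps listed on this slide can be sketched as follows. This is a minimal illustration, not the report's implementation: the stopword list, the `crude_stem` suffix stripper (a stand-in for a real stemmer), and the `min_count` threshold are all assumptions chosen to keep the example short.

```python
from collections import Counter

# Hypothetical, tiny stopword list; a real system would use a full one.
STOPWORDS = {"the", "a", "of", "and", "is", "in", "to"}

def crude_stem(word):
    """Rough suffix stripping, standing in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def term_frequencies(text):
    """Lowercase, drop stopwords, stem, and count the remaining terms."""
    tokens = [t for t in text.lower().split() if t.isalpha()]
    return Counter(crude_stem(t) for t in tokens if t not in STOPWORDS)

def build_vocabulary(docs, min_count=2):
    """Keep only terms whose total count over all documents reaches the threshold."""
    total = Counter()
    for doc in docs:
        total.update(term_frequencies(doc))
    return sorted(term for term, count in total.items() if count >= min_count)

def to_vector(text, vocabulary):
    """Frequency vector of one document over the fixed vocabulary."""
    tf = term_frequencies(text)
    return [tf.get(term, 0) for term in vocabulary]

docs = ["The system is running", "Systems and software", "software of the system"]
vocab = build_vocabulary(docs)
vectors = [to_vector(d, vocab) for d in docs]
```

Without the threshold every stemmed term becomes a dimension, which is how a corpus of this size ends up with a very large frequency vector.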
Feature selection • Information Gain • SVM feature selection • Linear kernel – weight vector
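As an illustration of Information Gain, the sketch below scores a term by how much knowing whether it occurs in a document reduces the entropy of the class labels. It assumes a presence/absence split of each term; the slides do not say which variant the report uses, and the sample labels are made up.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(present, labels):
    """Label entropy minus the entropy left after splitting on term presence."""
    n = len(labels)
    with_term = [y for p, y in zip(present, labels) if p]
    without_term = [y for p, y in zip(present, labels) if not p]
    remaining = ((len(with_term) / n) * entropy(with_term)
                 + (len(without_term) / n) * entropy(without_term))
    return entropy(labels) - remaining

# Made-up example: four documents, two of them in topic "c152".
labels = ["c152", "c152", "other", "other"]
perfect_term = [True, True, False, False]   # occurs exactly in the "c152" documents
useless_term = [True, False, True, False]   # occurs equally often in both classes

gain_perfect = information_gain(perfect_term, labels)
gain_useless = information_gain(useless_term, labels)
```

Features are then ranked by gain and only the top-scoring ones are kept.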
Support Vector Machine • Binary classification • Optimal hyperplane • Higher-dimensional feature space • Primal optimization problem • Dual optimization problem - Lagrange multipliers • Karush-Kuhn-Tucker conditions • Support Vectors • Kernel trick • Decision function
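Optimal Hyperplane • Figure: separating hyperplane {x | ‹w,x› + b = 0} with normal vector w; margin hyperplanes {x | ‹w,x› + b = +1} and {x | ‹w,x› + b = -1}; points x1 with yi = +1 and x2 with yi = -1; margin γ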
Primal optimization problem Dual optimization problem Lagrange formulation • Maximize: • subject to:
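For reference, the textbook hard-margin forms of the two problems, consistent with the margin hyperplanes ‹w,x› + b = ±1 of the optimal-hyperplane slide, are given below. The number of training samples m is notation introduced here, and the report may use a soft-margin variant instead.

```latex
% Primal problem (hard margin)
\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad\text{subject to}\quad
y_i\bigl(\langle w, x_i\rangle + b\bigr) \ge 1,\qquad i = 1,\dots,m

% Dual problem (Lagrange multipliers \alpha_i)
\max_{\alpha}\ \sum_{i=1}^{m}\alpha_i
 - \tfrac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j\,y_i y_j\,\langle x_i, x_j\rangle
\quad\text{subject to}\quad
\alpha_i \ge 0,\qquad \sum_{i=1}^{m}\alpha_i y_i = 0
```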
SVM - characteristics • Karush-Kuhn-Tucker (KKT) conditions • only the Lagrange multipliers that are non-zero at the saddle point contribute to the solution • Support Vectors • the patterns xi for which αi > 0 • Kernel trick • Positive definite kernel • Decision function
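A minimal sketch of the kernelized decision function, assuming the usual form f(x) = sgn(Σ αi yi k(xi, x) + b) summed over the support vectors only. The two-point training set and its multipliers are a hand-worked toy example, not data from the report.

```python
def linear_kernel(x, z):
    """Plain dot product; any positive definite kernel can be substituted."""
    return sum(a * b for a, b in zip(x, z))

def decision_function(x, support_vectors, alphas, labels, b, kernel):
    """Sign of the kernel expansion over the support vectors plus the bias b."""
    s = sum(a * y * kernel(sv, x)
            for sv, a, y in zip(support_vectors, alphas, labels))
    return 1 if s + b >= 0 else -1

# Toy problem: one point per class on the x-axis.
# Its exact solution is alpha = (0.5, 0.5), b = 0, i.e. w = (1, 0).
support_vectors = [(1.0, 0.0), (-1.0, 0.0)]
alphas = [0.5, 0.5]
labels = [1, -1]

pred_right = decision_function((2.0, 0.0), support_vectors, alphas, labels, 0.0, linear_kernel)
pred_left = decision_function((-3.0, 1.0), support_vectors, alphas, labels, 0.0, linear_kernel)
```

Because only patterns with a non-zero multiplier appear in the sum, prediction cost grows with the number of support vectors rather than with the size of the training set.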
Multi-class classification • Separate one class versus the rest
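The one-versus-the-rest scheme can be sketched as below: one binary problem per topic, then the topic whose classifier returns the largest output wins. The topic names and the three linear scorers are hypothetical stand-ins for trained SVM outputs, and picking the maximum output is an assumed tie-breaking rule.

```python
def relabel(topics, positive_topic):
    """Turn multi-class labels into +1 / -1 labels for one 'topic versus rest' problem."""
    return [1 if t == positive_topic else -1 for t in topics]

def one_vs_rest_predict(x, scorers):
    """Return the topic whose binary classifier gives the largest real-valued output."""
    return max(scorers, key=lambda topic: scorers[topic](x))

# Hypothetical scorers standing in for trained SVM outputs <w, x> + b.
scorers = {
    "topic_a": lambda x: x[0] - 0.5,
    "topic_b": lambda x: x[1] - 0.5,
    "topic_c": lambda x: -x[0] - x[1] + 0.5,
}

binary_labels = relabel(["topic_a", "topic_b", "topic_a"], "topic_a")
winner_1 = one_vs_rest_predict((0.9, 0.1), scorers)
winner_2 = one_vs_rest_predict((0.0, 0.0), scorers)
```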
Clustering • Characteristics • data are mapped into a higher dimensional space • search for the minimal enclosing sphere • Primal optimisation problem • Dual optimisation problem • Karush-Kuhn-Tucker conditions
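For reference, the standard support vector clustering formulation of the minimal enclosing sphere is sketched below. It assumes a feature map Φ with kernel k, sphere centre a and radius R, slack variables ξj with penalty C, and multipliers βj and μj; these symbols are introduced here and the report's own notation may differ.

```latex
% Primal problem: smallest sphere enclosing the mapped data, with slack
\min_{R,\,a,\,\xi}\ R^{2} + C\sum_{j}\xi_j
\quad\text{subject to}\quad
\lVert \Phi(x_j) - a\rVert^{2} \le R^{2} + \xi_j,\qquad \xi_j \ge 0

% Dual problem
\max_{\beta}\ \sum_{j}\beta_j\,k(x_j, x_j) - \sum_{i}\sum_{j}\beta_i\beta_j\,k(x_i, x_j)
\quad\text{subject to}\quad
0 \le \beta_j \le C,\qquad \sum_{j}\beta_j = 1

% Karush-Kuhn-Tucker conditions
\xi_j\,\mu_j = 0,\qquad
\bigl(R^{2} + \xi_j - \lVert \Phi(x_j) - a\rVert^{2}\bigr)\,\beta_j = 0
```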
SMO characteristics • Only two parameters are updated at each step (minimal size of updates). • Benefits: • doesn’t need any extra matrix storage • doesn’t need a numerical QP optimization step • needs more iterations to converge, but only a few operations at each step, which leads to an overall speed-up • Components: • Analytic method to solve the problem for two Lagrange multipliers • Heuristics for choosing the points
SMO - components • Analytic method • Heuristics for choosing the points • Choice of 1st point (x1/α1): • Find KKT violations • Choice of 2nd point (x2/α2): • pick the pair α1, α2 whose update causes a large change and, in turn, a large increase of the dual objective • maximize the quantity |E1-E2|
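The analytic step for two Lagrange multipliers can be sketched as follows, using the standard SMO update. Here E1 and E2 are the prediction errors f(xi) - yi, the box constraint C belongs to the soft-margin formulation and is an assumption, and the function name and argument order are made up for this example.

```python
def smo_pair_update(a1, a2, y1, y2, e1, e2, k11, k12, k22, c):
    """Jointly optimize two multipliers while keeping sum(alpha_i * y_i) fixed."""
    # Bounds that keep both multipliers inside the box [0, c].
    if y1 != y2:
        low, high = max(0.0, a2 - a1), min(c, c + a2 - a1)
    else:
        low, high = max(0.0, a1 + a2 - c), min(c, a1 + a2)
    eta = k11 + k22 - 2.0 * k12  # curvature of the objective along the constraint line
    if eta <= 0.0 or low >= high:
        return a1, a2  # degenerate pair: leave unchanged in this sketch
    a2_new = a2 + y2 * (e1 - e2) / eta       # unconstrained optimum
    a2_new = min(high, max(low, a2_new))     # clip to the feasible segment
    a1_new = a1 + y1 * y2 * (a2 - a2_new)    # restore the equality constraint
    return a1_new, a2_new

# Toy problem: x1 = (1, 0) with y1 = +1, x2 = (-1, 0) with y2 = -1, linear kernel.
# Starting from alpha = 0 the output is f(x) = 0, so E1 = -1 and E2 = +1.
new_a1, new_a2 = smo_pair_update(0.0, 0.0, 1, -1, -1.0, 1.0, 1.0, -1.0, 1.0, 10.0)
```

The step size is proportional to E1 - E2, which is why choosing the second point to maximize |E1-E2| tends to produce a large change.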
Feature selection using SVM • Linear kernel • Primal optimisation form • Keep only the features whose weight in the learned w vector is greater than a threshold
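A minimal sketch of weight-based selection, assuming the linear-kernel weight vector is recovered as w = Σ αi yi xi and that the comparison uses the magnitude |wj|. The slide does not say whether signed or absolute weights are compared, and the numbers are invented.

```python
def linear_weight_vector(support_vectors, alphas, labels):
    """w = sum_i alpha_i * y_i * x_i, valid for a linear kernel."""
    dim = len(support_vectors[0])
    w = [0.0] * dim
    for x, a, y in zip(support_vectors, alphas, labels):
        for j in range(dim):
            w[j] += a * y * x[j]
    return w

def select_features(w, threshold):
    """Indices of the features whose weight magnitude exceeds the threshold."""
    return [j for j, wj in enumerate(w) if abs(wj) > threshold]

# Invented example with three features.
w = linear_weight_vector([(1.0, 0.0, 0.2), (-1.0, 0.0, 0.1)], [0.5, 0.5], [1, -1])
kept = select_features(w, threshold=0.1)
```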
Kernels used • Polynomial kernel • Gaussian kernel
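The two kernel families can be sketched in their common textbook forms. The parameter names `degree`, `coef0`, and `sigma` are assumptions; the slides do not give the exact parametrization used in the experiments.

```python
import math

def dot(x, z):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """k(x, z) = (<x, z> + coef0) ** degree"""
    return (dot(x, z) + coef0) ** degree

def gaussian_kernel(x, z, sigma=1.0):
    """k(x, z) = exp(-||x - z||^2 / (2 * sigma^2))"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

poly_value = polynomial_kernel((1.0, 2.0), (3.0, 4.0))   # (11 + 1) ** 2
gauss_same = gaussian_kernel((1.0, 2.0), (1.0, 2.0))     # zero distance
gauss_unit = gaussian_kernel((0.0, 0.0), (1.0, 0.0))     # squared distance 1
```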
Data representation • Binary • using values “0” and “1” • Nominal • Cornell SMART
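The three representations can be sketched from raw term counts as below. The definitions are assumptions based on the usual ones: binary marks presence, nominal divides each count by the largest count in the document, and Cornell SMART uses 1 + log(1 + log(n)) for a non-zero count n. The slides do not spell out the formulas.

```python
import math

def binary_rep(counts):
    """1 if the term occurs in the document, 0 otherwise."""
    return [1 if n > 0 else 0 for n in counts]

def nominal_rep(counts):
    """Counts divided by the largest count in the document."""
    top = max(counts, default=0)
    return [n / top if top > 0 else 0.0 for n in counts]

def cornell_smart_rep(counts):
    """0 for absent terms, 1 + log(1 + log(n)) for a term occurring n times."""
    return [1.0 + math.log(1.0 + math.log(n)) if n > 0 else 0.0 for n in counts]

counts = [0, 1, 4]  # invented term counts for one document
binary = binary_rep(counts)
nominal = nominal_rep(counts)
smart = cornell_smart_rep(counts)
```

The double logarithm damps large counts strongly, so a term that occurs very often does not dominate the vector.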
Influence of vector size • Polynomial kernel
Influence of vector size • Gaussian kernel
IG versus SVM – 427 features • Polynomial kernel
IG versus SVM – 427 features • Gaussian kernel
LibSvm versus UseSvm - 2493 • Polynomial kernel
LibSvm versus UseSvm - 2493 • Gaussian kernel
Multiclass classification • Polynomial kernel - 2488 features
Multiclass classification • Gaussian kernel - 2488 features
Conclusions – best results • Polynomial kernel and nominal representation (degree 5 and 6) • Gaussian kernel and Cornell SMART (C=2.7) • Reduced # of support vectors for polynomial kernel in comparison with Gaussian kernel (24.41% versus 37.78%) • # features between 6% (1309) and 10% (2488) • Multiclass follows the binary classification • Clustering has a smaller # of support vectors • Clustering follows binary classification
Further work • Feature extraction and selection • Association rules between words (Mutual Information) • Synonymy and polysemy problem • Better implementation of SVM with linear kernel • Using families of words (WordNet) • SVM with kernel degree greater than 1 • Classification and clustering • Using classification and clustering together