An Introduction to Libsvm-2.6
quietsea@bbs.hit.edu.cn
Libsvm-2.6 Features
• Supports multi-class classification
• Several different SVM formulations
• Cross-validation for model selection
• Probability estimates
• Weighted SVM for unbalanced data
• Both C++ and Java sources
• Version 2.8 released on April Fools' Day, 2005
Libsvm-2.6 Program Structure
• Kernel class
• Solver class: solves the quadratic programming problem with a generalized SMO / SVMLight algorithm
• Multi-class classification uses the one-against-one strategy: for k classes, k(k-1)/2 binary classifiers are trained and a voting scheme picks the final label
Format of training and testing data file
<label> <index1>:<value1> <index2>:<value2> ...
+1 1:0.708333 2:1 3:1 4:-0.320755 5:-0.105023 6:-1 7:1
-1 1:0.583333 2:-1 3:0.333333 4:-0.603774 5:1 6:-1 7:1
+1 1:0.166667 2:1 3:-1 4:-0.433962 5:-0.383562 6:-1 7:-1
-1 1:0.458333 2:1 3:1 4:-0.358491 5:-0.374429 6:-1 7:-1
Data scaling
• Avoids attributes in larger numeric ranges dominating those in smaller ranges.
• Usually scale each attribute to [0,1] or [-1,+1].
• svmscale -l -1 -u 1 -s range train.1 > train.1.scale
• svmscale -r range test.1 > test.1.scale
• The -s option saves the scaling parameters to the file range; -r restores them so that the test set is scaled in exactly the same way as the training set.
Svmtrain
• One-class: a hyperplane is placed such that it separates the data set from the origin with maximal margin. The regularization parameter nu ∈ (0,1) is a user-defined parameter indicating the fraction of the data that should be accepted by the description.
• nu-SVR: nu regression. Introduces a parameter nu from which epsilon can be computed automatically. If q is the number of erroneous samples, then nu >= q/l, i.e., nu is an upper bound on the fraction of errors among all l samples; if p is the number of support vectors, then nu <= p/l, i.e., nu is a lower bound on the fraction of support vectors. One first chooses nu and C, then solves the optimization problem.
• Shrinking: whether to use shrinking during optimization. For bounded support vectors, BSVs (SVs with ai = C), ai does not change during the iterations; if these points are found and fixed at C, the size of the QP problem can be reduced.
• Probability estimate: whether to train the SVC/SVR model to produce probability outputs.
• -wi: weight parameter for unbalanced data (sets the penalty C of class i to wi*C). Example command lines follow below.
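For illustration, a few svmtrain invocations exercising the options above (a sketch, not from the original slides; train.1.scale is the scaled file from the scaling slide, and the option letters follow the svmtrain usage message):
• svmtrain -s 2 -n 0.5 train.1.scale (one-class SVM with nu = 0.5)
• svmtrain -s 4 -n 0.4 train.1.scale (nu-SVR with nu = 0.4)
• svmtrain -h 0 train.1.scale (disable shrinking)
• svmtrain -b 1 train.1.scale (train with probability estimates)
• svmtrain -w1 5 train.1.scale (penalize errors on class +1 with 5 times the default C)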
Output of training C-SVM
optimization finished, #iter = 219
nu = 0.431030 : nu-SVM is a somewhat equivalent form of C-SVM where C is replaced by nu.
obj = -100.877286 : optimal objective value of the dual problem.
rho = 0.424632 : bias term of the decision function.
nSV = 132, nBSV = 107 : number of support vectors and bounded support vectors.
Total nSV = 132
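Output of this shape is printed by an ordinary training run, for example (an assumption for illustration: heart_scale is the sample data set shipped with Libsvm, trained here with all defaults, so exact numbers may differ):
• svmtrain heart_scale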
Model file
• svm_type c_svc
• kernel_type rbf
• gamma 0.0769231
• nr_class 2 : number of classes. For regression and one-class models this number is 2.
• total_sv 132
• rho 0.424632
• label 1 -1
• nr_sv 64 68 : number of support vectors for each class.
• SV
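The saved model is then consumed by svmpredict. A minimal sketch (file names are placeholders; the .model name is the default chosen by svmtrain):
• svmpredict test.1.scale train.1.scale.model out.txt
• svmpredict -b 1 test.1.scale train.1.scale.model out.txt (probability output; the model must have been trained with -b 1)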
Two tools for Model Selection
• easy.py: does everything automatically, from data scaling to parameter selection
• grid.py: uses grid search to find the best model parameters
Output files of grid.py:
• -out: the search trace, i.e., each parameter value tried and the accuracy obtained with it
• -png: a contour plot of the search
Typical invocations of both scripts are sketched below.
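A sketch of typical invocations (an assumption: the log2 ranges shown are the grid.py defaults, searching C and gamma on a log2 scale with 5-fold cross-validation; file names are placeholders):
• python grid.py -log2c -5,15,2 -log2g 3,-15,-2 -v 5 train.1.scale
• python easy.py train.1 test.1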
Proposed procedure
• Transform the data to the format of Libsvm.
• Conduct simple scaling on the data.
• Consider the RBF kernel.
• Use cross-validation to find the best model parameters.
• Use the best parameters to train on the whole training set.
• Test.
(The procedure is put together as a command sequence below.)
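The whole procedure as one command sequence (a sketch; train.1/test.1 are placeholder file names, and the -c/-g values stand for whatever the grid search reports as best):
• svmscale -l -1 -u 1 -s range train.1 > train.1.scale
• svmscale -r range test.1 > test.1.scale
• python grid.py train.1.scale (prints the best C and gamma)
• svmtrain -c 2 -g 2 train.1.scale (retrain on the whole training set with the best parameters)
• svmpredict test.1.scale train.1.scale.model test.1.predict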
Experiments
• Original sets with default parameters: Accuracy = 9.7561%
• Scaled sets with default parameters: Accuracy = 87.8049%
• Scaled sets with parameter selection: Accuracy = 95.123%
• Using the automatic script: Accuracy = 95.122%
Remark
• Python 2.3 is recommended.
• Gnuplot version 3.7.3 is recommended; version 3.7.1 has a bug.
References
• A Practical Guide to Support Vector Classification
• LIBSVM: A Library for Support Vector Machines
• FAQ and README in Libsvm-2.6
• http://www.csie.ntu.edu.tw/~cjlin/