Towards Scalable Support Vector Machines Using Squashing • Author: Dmitry Pavlov, Darya Chudova, Padhraic Smyth • Information and Computer Science, University of California • Advisor: Dr. Hsu • Reporter: Hung Ching-Wen
Outline • 1. Motivation • 2. Objective • 3. Introduction • 4. SVM • 5. Squashing for SVM • 6. Experiments • 7. Conclusion
Motivation • SVMs provide classification models with a strong theoretical foundation and excellent empirical performance. • However, the major drawback of SVMs is the need to solve a large-scale quadratic programming problem.
Objective • This paper combines likelihood-based squashing with a probabilistic formulation of SVMs, enabling fast training on squashed data sets.
Introduction • The applicability of SVMs to large datasets is limited because of their high computational cost. • Speed-up training algorithms: • Chunking, Osuna's decomposition method, SMO • These can accelerate training, but do not scale well with the size of the training data.
Introduction • Reducing the computational cost: • Sampling • Boosting • Squashing (DuMouchel et al., Madigan et al.) • The authors propose Squashing-SMO to address the high computational cost of SVMs.
SVM • Training data: D = {(xi, yi) : i = 1, …, N} • xi is a feature vector, yi ∈ {+1, −1} • In a linear SVM, the separating classifier is y = sign(⟨w, x⟩ + b) • w is the normal vector of the hyperplane • b is the intercept of the hyperplane
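As a minimal illustration of this decision rule (the weight vector w, intercept b, and data below are placeholders, not values from the paper):

```python
import numpy as np

def linear_svm_predict(X, w, b):
    """Predict +1/-1 labels with the linear rule y = sign(<w, x> + b)."""
    scores = X @ w + b                 # signed score relative to the hyperplane
    return np.where(scores >= 0, 1, -1)

# Toy usage with placeholder parameters (not from the paper).
w = np.array([0.5, -1.0])
b = 0.1
X = np.array([[1.0, 0.2], [-0.3, 0.8]])
print(linear_svm_predict(X, w, b))     # -> [ 1 -1]
```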
Squashing for SVM • (1) Select a probabilistic model P((X, Y) | θ) • (2) Our objective is to find the maximum-likelihood estimate θML
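Written out, the maximum-likelihood objective is the standard one (stated here under an i.i.d. assumption, not copied from the slide):

```latex
\theta_{ML}
  = \arg\max_{\theta} \; \log P(D \mid \theta)
  = \arg\max_{\theta} \; \sum_{i=1}^{N} \log P\big((x_i, y_i) \mid \theta\big)
```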
Squashing for SVM • (3) The training data D = {(xi, yi) : i = 1, …, N} can be grouped into Nc clusters • (Xc, Yc)sq: the squashed data point placed at cluster c • βc: the weight of cluster c
Squashing for SVM • If we take the prior of w to be • P(w) ∝ exp(−∥w∥²)
Squashing for SVM • (4) The optimization model for the squashed data:
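The slide's equation is not reproduced in these notes; a plausible reconstruction, assuming the standard weighted soft-margin SVM that arises as a MAP estimate under the prior on w above, is:

```latex
\min_{w,\, b,\, \xi}\;\; \frac{1}{2}\,\|w\|^{2} \;+\; C \sum_{c=1}^{N_c} \beta_c\, \xi_c
\qquad \text{s.t.}\qquad
y_c\big(\langle w, x_c\rangle + b\big) \ge 1 - \xi_c,\;\; \xi_c \ge 0,\;\; c = 1,\dots,N_c
```

Here (xc, yc) is the squashed point for cluster c and βc its weight, so each cluster's slack is penalized in proportion to how much of the original data it represents.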
Squashing for SVM • Important design issues for the squashing algorithm (a code sketch follows): • (1) the choice of the number and location of the squashing points • (2) how to sample values of w from the prior p(w) • (3) how b can be obtained from the optimization model • (4) with w and b fixed, we evaluate the likelihood of each training point, and repeat the selection procedure L times
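A minimal sketch of one way to implement this likelihood-based squashing step, assuming a hinge-loss likelihood surrogate, a unit Gaussian prior for sampling w, and scikit-learn's KMeans for the grouping; all function names and parameter choices here are illustrative rather than taken from the paper:

```python
import numpy as np
from sklearn.cluster import KMeans

def squash_likelihood_profiles(X, y, n_clusters=50, L=10, C=1.0, seed=0):
    """Group (X, y) into weighted 'squashed' points via likelihood profiles.

    For L parameter samples (w, b) drawn from a simple Gaussian prior, each
    training point gets a profile of per-sample log-likelihood values (here a
    hinge-loss surrogate). Points with similar profiles are clustered, and
    each cluster is summarized by its mean point and a weight = cluster size.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Likelihood profile: one column per sampled (w, b).
    profiles = np.empty((n, L))
    for l in range(L):
        w = rng.normal(size=d)          # sample w from a unit Gaussian prior
        b = rng.normal()
        margins = y * (X @ w + b)
        profiles[:, l] = -C * np.maximum(0.0, 1.0 - margins)  # hinge-loss surrogate

    # Cluster points with similar profiles, separately within each class.
    X_sq, y_sq, beta = [], [], []
    for label in np.unique(y):
        mask = y == label
        k = min(n_clusters, int(mask.sum()))
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(profiles[mask])
        for c in range(k):
            members = np.where(mask)[0][km.labels_ == c]
            X_sq.append(X[members].mean(axis=0))   # squashed point at the cluster centroid
            y_sq.append(label)
            beta.append(len(members))              # weight = number of points represented
    return np.array(X_sq), np.array(y_sq), np.array(beta)
```

The resulting (X_sq, y_sq, beta) can then be passed to a weighted SVM/SMO solver, for example sklearn.svm.SVC().fit(X_sq, y_sq, sample_weight=beta).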
Experiments • Experiment datasets: • Synthetic data • UCI Machine Learning repository • UCI KDD repository
Experiments • Methods evaluated: • Full-SMO, srs-SMO (simple random sampling), squash-SMO, boost-SMO • Runs: averaged over 100 runs • Performance measures: • Misclassification rate, learning time, memory requirements
Experiments (Results on Synthetic Data) • (wf, bf): estimated by full-SMO • (ws, bs): estimated from the squashed or sampled data
Conclusion • 1. The paper describes how the use of squashing makes SVM training applicable to large datasets. • 2. Comparison with full-SMO shows that squash-SMO and boost-SMO achieve near-optimal performance with much lower time and memory requirements. • 3. srs-SMO has a higher misclassification rate. • 4. squash-SMO and boost-SMO allow parameter tuning via cross-validation, which is impractical for full-SMO.
Conclusion • 5. Although the performance of squash-SMO and boost-SMO is similar on the benchmark problems, • 6. squash-SMO can offer better model interpretability and can be expected to run faster on datasets that do not reside in memory.
Opinion • It is a good idea that the authors describe how the use of squashing makes SVM training applicable to large datasets. • We could change the prior distribution of w according to the nature of the data, e.g., an exponential or log-normal distribution, or use a nonparametric approach.