A distributed PSO – SVM hybrid system with feature selection and parameter optimization

A distributed PSO–SVM hybrid system with feature selection and parameter optimization • Cheng-Lung Huang & Jian-Fan Dun • Soft Computing, 2008

Introduction • Hybridizes particle swarm optimization (PSO) with support vector machines (SVM) to improve classification accuracy using a small, appropriate feature subset. • Combines the discrete PSO with the continuous-valued PSO. • Implemented on a distributed architecture using web service technology to reduce computation time.

Introduction • The continuous-valued version optimizes the SVM model parameters. • The discrete version searches for the optimal feature subset. • PSO can easily be adapted for parallel processing on a distributed system.

Support Vector Machine • Kernel function: RBF (parameters C and gamma) • Multi-class strategies: one-against-one (adopted in this study) and one-against-all

Particle swarm optimization • rnd( ) is a random function in the range [0, 1]. • The positive constants c1 and c2 are the personal and social learning factors. • w is the inertia weight, which balances global exploration and local exploitation. • Pi,d denotes the best previous position encountered by the ith particle. • Pg,d denotes the global best position found so far. • t denotes the iteration counter.
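The velocity-update equation that these symbols belong to did not survive in the transcript. The standard inertia-weight form of PSO that matches the definitions above is given here as an assumption; the velocity $v_{i,d}^{t}$ and position $x_{i,d}^{t}$ of particle $i$ in dimension $d$ at iteration $t$ are names introduced for illustration, not taken from the slide:

```latex
v_{i,d}^{t+1} = w\,v_{i,d}^{t}
  + c_1\,\mathrm{rnd}()\,\bigl(p_{i,d} - x_{i,d}^{t}\bigr)
  + c_2\,\mathrm{rnd}()\,\bigl(p_{g,d} - x_{i,d}^{t}\bigr)
```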

Particle swarm optimization • The new position of a particle is calculated using the following formula:
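The slide's own formula was not preserved. As a minimal sketch, assuming the standard inertia-weight velocity rule and the usual position rule (new position = old position + new velocity), one update step for a single particle might look like this in Python. The function name `pso_step`, the default values of `w`, `c1`, `c2`, and the optional `v_max` clamp are illustrative assumptions, not settings from the study.

```python
import random


def pso_step(x, v, p_best, g_best, w=0.7, c1=2.0, c2=2.0, v_max=None, rng=random):
    """Return (new_x, new_v) after one continuous PSO update of one particle.

    x, v, p_best and g_best are equal-length lists of floats.
    The defaults for w, c1 and c2 are illustrative only.
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        # Inertia term + personal (cognitive) term + social term.
        vel = (w * v[d]
               + c1 * rng.random() * (p_best[d] - x[d])
               + c2 * rng.random() * (g_best[d] - x[d]))
        # Optional velocity clamp (an assumption, common in PSO practice).
        if v_max is not None:
            vel = max(-v_max, min(v_max, vel))
        new_v.append(vel)
        # Position rule: move by the freshly computed velocity.
        new_x.append(x[d] + vel)
    return new_x, new_v
```

When a particle already sits on both its personal best and the global best, the two attraction terms vanish and only the inertia term moves it.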

Binary PSO • The function S(v) is a sigmoid limiting transformation and rnd( ) is a random number selected from a uniform distribution in [0, 1].
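The binary update described above can be sketched as follows, assuming the usual logistic sigmoid S(v) = 1 / (1 + e^(-v)) and the rule "set the bit to 1 when rnd( ) < S(v), otherwise 0". The equation itself is missing from the transcript, so this form and the function names are assumptions.

```python
import math
import random


def sigmoid(v):
    """Logistic sigmoid, written to avoid overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def binary_position(velocity, rng=random):
    """Sample a 0/1 position vector from a real-valued velocity vector.

    Each bit becomes 1 with probability S(v_d), otherwise 0.
    """
    return [1 if rng.random() < sigmoid(vd) else 0 for vd in velocity]
```

A large positive velocity makes a bit almost certainly 1, a large negative velocity makes it almost certainly 0, and a velocity of zero gives an even chance.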

Particle representation • Feature mask (discrete-valued) • C (continuous-valued) • Gamma (continuous-valued)
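One way to hold this mixed representation in code is a small record with a 0/1 mask plus the two real-valued SVM parameters. This is only a sketch; the class name `Particle` and the helper `apply_mask` are hypothetical and do not come from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Particle:
    mask: List[int]   # discrete part: 1 = feature selected, 0 = not selected
    c: float          # continuous part: SVM penalty parameter C
    gamma: float      # continuous part: RBF kernel parameter gamma

    def selected_features(self) -> List[int]:
        """Zero-based indices of the features switched on in the mask."""
        return [j for j, bit in enumerate(self.mask) if bit == 1]


def apply_mask(row, mask):
    """Keep only the values of one data row whose mask bit is 1."""
    return [value for value, bit in zip(row, mask) if bit == 1]
```

The mask part would be updated with the discrete PSO and `c` and `gamma` with the continuous-valued PSO.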

Fitness definition • WA: weight for the SVM classification accuracy • acci: SVM classification accuracy • WF: weight for the number of features • fj: the value of the feature mask; "1" means that feature j is selected and "0" means that feature j is not selected. • nF: the total number of features.
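The fitness equation itself is missing from the transcript. One weighted-sum form consistent with the symbols above, assumed here rather than confirmed by the slide, is fitness = WA × acc + WF × (1 − Σ fj / nF), which rewards high accuracy and small feature subsets. A minimal Python sketch under that assumption:

```python
def fitness(acc, mask, w_a, w_f):
    """Assumed fitness: w_a * acc + w_f * (1 - selected / total).

    acc  : classification accuracy in [0, 1]
    mask : list of 0/1 feature-mask bits (the f_j values)
    w_a  : weight on accuracy (W_A)
    w_f  : weight on the feature-count term (W_F)
    """
    n_f = len(mask)
    if n_f == 0:
        raise ValueError("feature mask must not be empty")
    return w_a * acc + w_f * (1.0 - sum(mask) / n_f)
```

Under this form, two particles with the same accuracy are ranked by how few features they select. The weight values are left as required arguments because the transcript does not preserve the ones used in the study.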

Strategies for setting the inertia weight

Data descriptions • The data set has eight target classes. • The data set has 30 features, of which only five (f5, f10, f15, f20, and f25) are relevant to the eight classes.

Experimental procedures • Randomly split the data into ten groups using stratified 10-fold cross-validation. • Each group contains a training set, a validation set, and a test set. • The training set is used to build the SVM model. • The validation set is used to determine the proper training iteration to avoid overtraining. • The test set is used to evaluate the model’s classification accuracy.
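The split above can be sketched in plain Python. The transcript does not say how the validation set is carved out of each group, so this sketch assumes that for group k the test set is fold k, the validation set is the next fold, and the training set is the remaining eight folds. The function names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Deal sample indices into n_folds folds, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Round-robin dealing keeps class proportions similar across folds.
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


def train_val_test_groups(labels, n_folds=10, seed=0):
    """Build one (train, validation, test) index triple per fold.

    Assumption: test = fold k, validation = fold k+1 (wrapping around),
    training = all remaining folds.
    """
    folds = stratified_folds(labels, n_folds, seed)
    groups = []
    for k in range(n_folds):
        v = (k + 1) % n_folds
        train = [i for j, fold in enumerate(folds) if j not in (k, v) for i in fold]
        groups.append((train, folds[v], folds[k]))
    return groups
```

Every sample lands in exactly one test set across the ten groups, and each group's three sets are disjoint.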

Setting of the system parameters

Experimental procedures

Experimental results

Experimental results • HITF: the number of hits on correct features. • COVERF: the number of times the selected feature subset covered the correct features. • RATIOF: the ratio of correct features for the ten experiments (10-fold CV).

Experimental results • f: denotes the feature subset selected by the PSO. • F: denotes the correct discriminating features (f5, f10, f15, f20, and f25 in this experiment).
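Given f and F as sets, the hit and cover counts can be sketched as below. This assumes a "hit" is a selected feature that belongs to F (the size of the intersection of f and F) and that a subset "covers" F when it contains every correct feature; the exact formulas, including the one for the ratio measure, are not preserved in the transcript, so the ratio is left out. Function names are hypothetical.

```python
# Correct features in this experiment: f5, f10, f15, f20, f25.
CORRECT = frozenset({5, 10, 15, 20, 25})


def hits(selected, correct=CORRECT):
    """Number of selected features that are correct (assumed: |f & F|)."""
    return len(set(selected) & correct)


def covers(selected, correct=CORRECT):
    """True when the selected subset contains every correct feature."""
    return correct <= set(selected)


def summarize(selections, correct=CORRECT):
    """Per-fold hit counts and the number of folds that cover F."""
    return {
        "hits_per_fold": [hits(s, correct) for s in selections],
        "cover_count": sum(1 for s in selections if covers(s, correct)),
    }
```

For example, a fold that selects features 5, 10 and 11 scores two hits and does not cover F, while a fold that selects 1, 5, 10, 15, 20 and 25 scores five hits and does cover it.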

Experimental results

Fitness

Distributed architectures
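The architecture diagram for this slide is not preserved in the transcript. The underlying idea, that each particle's fitness can be evaluated independently and therefore in parallel, can be sketched with a local thread pool standing in for remote web-service calls. This is an illustrative assumption about the structure, not the authors' implementation; `evaluate_swarm` and `best_particle` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_swarm(particles, fitness_fn, max_workers=4):
    """Evaluate fitness_fn on every particle concurrently.

    Results come back in the same order as the particles. In a
    distributed setup, fitness_fn would wrap a call to a remote worker
    that trains and scores one SVM.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fitness_fn, particles))


def best_particle(particles, scores):
    """Return the (particle, score) pair with the highest fitness."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return particles[best], scores[best]
```

Threads suit this sketch because waiting on a remote service is I/O-bound; only the gathering of scores and the update of the global best need to happen in one place.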

CPU Time

Conclusions • Feature subset selection and kernel parameter setting are crucial problems. • This study proposed a new hybrid PSO–SVM system to solve both problems. • To overcome the long training time on large-scale data sets, the PSO–SVM can be implemented on a distributed parallel architecture.

Thank You
