Shih-Wei Lin, Tsung-Yuan Tseng, Shuo-Yan Chou, Shih-Chieh Chen

A Simulated-annealing-based Approach for Simultaneous Parameter Optimization and Feature Selection of Back-Propagation Networks (BPN) Shih-Wei Lin, Tsung-Yuan Tseng, Shuo-Yan Chou, Shih-Chieh Chen National Taiwan University of Science and Technology Expert Systems with Applications 2008

Introduction • The back-propagation network (BPN) can be used in various fields, for example evaluating consumer loans and diagnosing heart disease. • Different problems may require different parameter settings for the network architecture. • Rules of thumb or trial-and-error methods are usually used to determine them.

Introduction • Not all features are beneficial for classification in BPN. • Selecting the beneficial subset of features results in better classification. • A simulated-annealing (SA)-based approach is used to obtain the optimal parameter settings for the network architecture of BPN.

BPN • Before applying BPN to solve a problem, the parameter settings for the network architecture must be determined: (1) number of hidden layers (2) learning rate (3) momentum term (4) number of hidden neurons (5) learning cycle

Feature Selection • The main benefits of feature selection are as follows: (1) Reducing computational cost and storage requirements (2) Dealing with the degradation of classification efficiency due to the finiteness of training sample sets (3) Reducing training and prediction time (4) Facilitating data understanding and visualization

Problems • While using BPN, we confront two problems: • How to set the best parameters for BPN? • How to choose the input attributes for BPN? • The SA-based approach not only provides the best parameter settings for the network architecture of BPN, but also finds the beneficial subset of features for each problem.

BPN • BPN is a common neural network model whose architecture is the multilayer perceptron (MLP).

Learning rate of BPN • Learning rate: 1. Too high a learning rate will cause the network to oscillate and make convergence hard. 2. Too low a learning rate will cause slow convergence, and the network may become trapped in a local optimum.

Momentum term of BPN • Momentum term: 1. Too small a momentum term has no obvious effect and cannot increase the classification accuracy rate. 2. Too big a momentum term can excessively affect the learning effect and cause extreme weight modifications.
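The roles of the learning rate and the momentum term can be illustrated with the standard momentum form of the gradient-descent weight update, where each weight change is the negative gradient scaled by the learning rate plus the previous change scaled by the momentum term. The sketch below is only an illustration on a toy one-weight error function E(w) = w**2; the function names and the toy problem are assumptions and are not taken from the slides.

```python
def momentum_step(weight, velocity, gradient, learning_rate, momentum):
    """One update: new change = momentum * previous change - learning_rate * gradient."""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity


def minimize_quadratic(learning_rate, momentum, steps=50):
    """Minimize the toy error E(w) = w**2 (gradient 2*w), starting from w = 1.0."""
    weight, velocity = 1.0, 0.0
    for _ in range(steps):
        gradient = 2.0 * weight
        weight, velocity = momentum_step(weight, velocity, gradient,
                                         learning_rate, momentum)
    return weight


if __name__ == "__main__":
    # A moderate learning rate shrinks the weight towards the minimum at 0.
    print(minimize_quadratic(learning_rate=0.1, momentum=0.0))
    # A learning rate that is too high makes the weight oscillate and diverge.
    print(minimize_quadratic(learning_rate=1.5, momentum=0.0))
```

On this toy function, a learning rate of 0.1 multiplies the weight by 0.8 per step, while a learning rate of 1.5 multiplies it by -2 per step, which is the oscillation described above.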

Number of hidden neurons of BPN • Number of hidden neurons: 1. Too few hidden neurons tend to cause a bigger error. 2. Increasing the number of hidden neurons slows convergence and computation while giving almost no help in reducing errors.

Learning cycle of BPN • Learning cycle: 1. Too high a learning cycle will result in over-fitting. 2. Too low a learning cycle leads to too little training and a worse classification accuracy rate on the testing data.

Some Solutions • Search for the optimal weights after training • Search for the optimal parameter settings of BPN • Neural network pruning

Simulated Annealing • Proposed by Kirkpatrick (1985) • Pick a random assignment • Make a small change • Accept the change if the cost is decreased, or if other criteria are met • First used by Kakuno et al.

Simulated-annealing

Simulated-annealing • [Flowchart: initial random assignment → make a small change → accept? → update current solution → temperature dropping? → temperature dropped → termination? → optimized]
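The flow above can be sketched as a short loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the slides only say a change is accepted "if cost is decreased; or other criteria", so the Metropolis rule used here for worse moves, the geometric cooling schedule, and all parameter names and default values are assumptions chosen for illustration.

```python
import math
import random


def simulated_annealing(cost, neighbor, initial, temperature=1.0,
                        cooling=0.95, steps_per_temperature=20,
                        min_temperature=1e-3, rng=None):
    """Generic SA loop; returns the best solution seen and its cost."""
    rng = rng or random.Random(0)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    while temperature > min_temperature:          # termination test
        for _ in range(steps_per_temperature):
            candidate = neighbor(current, rng)    # make a small change
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Assumed acceptance rule (Metropolis): always accept improvements,
            # accept worse moves with probability exp(-delta / temperature).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        temperature *= cooling                    # temperature dropping
    return best, best_cost


def toy_cost(x):
    """Hypothetical cost with its minimum at x = 3."""
    return (x - 3.0) ** 2


def toy_neighbor(x, rng):
    """Hypothetical small change: shift x by at most 0.5."""
    return x + rng.uniform(-0.5, 0.5)


if __name__ == "__main__":
    print(simulated_annealing(toy_cost, toy_neighbor, initial=0.0))
```

In the SA + BPN setting, the cost would presumably be derived from the classification accuracy of a BPN trained with the candidate parameters and feature subset; the toy cost here only stands in for that evaluation.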

Solution representation • The first variable is the learning rate • The second is the momentum term • The third is the number of hidden neurons • The remaining variables represent the feature selection
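One way to picture this representation is as a flat vector that is decoded before each BPN evaluation. The sketch below is hypothetical: the slides do not state how the feature-selection variables are encoded, so the 0/1 flag per feature, the class name `BpnSolution`, and the helper names are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BpnSolution:
    learning_rate: float
    momentum: float
    hidden_neurons: int
    feature_mask: List[bool]  # assumed encoding: one keep/drop flag per feature


def decode(vector: Sequence[float]) -> BpnSolution:
    """Split a flat solution vector into BPN parameters and a feature mask."""
    return BpnSolution(
        learning_rate=float(vector[0]),
        momentum=float(vector[1]),
        hidden_neurons=int(vector[2]),
        feature_mask=[bool(flag) for flag in vector[3:]],
    )


def select_features(row: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Keep only the input attributes whose flag is set."""
    return [value for value, keep in zip(row, mask) if keep]


if __name__ == "__main__":
    solution = decode([0.3, 0.8, 12, 1, 0, 1])
    print(solution)
    print(select_features([5.1, 3.5, 1.4], solution.feature_mask))
```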

Parameters range • SA was set to 300 to find the optimal BPN parameter settings • The learning rate ranged from 0 to 0.45 • The momentum term ranged from 0.4 to 0.9 • The learning cycle of BPN was set as 500

Platform • Using the C language • Windows XP operating system • Pentium IV 3.0 GHz CPU • 512 MB of RAM.

Cross-Validation • To guarantee that the present results are valid and can be generalized for making predictions regarding new data, k-fold cross-validation is used. • This study used k = 10, meaning that all of the data are divided into ten parts, each of which takes a turn as the testing data set.
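The ten-part split described above can be sketched as follows. This is a generic k-fold index generator, not the code used in the study; the function name, the shuffling step, and the fixed seed are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs; each fold is the test set once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(25, k=10):
        print(len(train), len(test))
```

The reported accuracy would then typically be averaged over the k test folds.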

Datasets

System architecture

10-fold classification result of Breast Cancer dataset

The comparison results of approaches without feature selection

SA + BPN approach with feature selection and other approaches

Experimental results summary of with/without feature selection on datasets

Conclusion • We proposed an SA-based strategy to select the feature subset and to set the parameters for BPN classification. • Compared with previous studies, the proposed SA + BPN approach achieves better classification accuracy rates than the other approaches.

Thank You • Q & A
