Feature Selection for Regression Problems M. Karagiannopoulos, D. Anyfantis, S. B. Kotsiantis, P. E. Pintelas Educational Software Development Laboratory and Computers and Applications Laboratory Department of Mathematics, University of Patras, Greece
Scope • To investigate the most suitable wrapper feature selection technique (if any) for some well-known regression algorithms.
Contents • Introduction • Feature selection techniques • Wrapper algorithms • Experiments • Conclusions
Introduction • What is the feature subset selection problem? • It occurs prior to the learning (induction) algorithm • It is the selection of the relevant features (variables) that influence the predictions of the learning algorithm
Why is feature selection important? • It may improve the performance of the learning algorithm • The learning algorithm may not scale up to the full feature set, either in sample size or in running time • It allows us to better understand the domain • It is cheaper to collect a reduced set of features
Characterising features • Generally, features are characterized as: • Relevant: features which have an influence on the output and whose role cannot be assumed by the rest • Irrelevant: features not having any influence on the output, and whose values are generated at random for each example • Redundant: a redundancy exists whenever a feature can take the role of another (perhaps the simplest way to model redundancy)
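To make the three categories concrete, here is a small synthetic sketch (illustrative only, not from the study; the variable names and constants are made up):

```python
# Illustrative only: a tiny synthetic regression set with one relevant,
# one irrelevant, and one redundant feature.
import numpy as np

rng = np.random.default_rng(0)
n = 200
relevant = rng.normal(size=n)                            # drives the target
irrelevant = rng.normal(size=n)                          # pure noise, no influence on y
redundant = 2.0 * relevant + 0.01 * rng.normal(size=n)   # near-copy of the relevant feature
y = 3.0 * relevant + rng.normal(scale=0.1, size=n)

X = np.column_stack([relevant, irrelevant, redundant])
print(np.corrcoef(X.T, y)[-1, :-1])   # correlation of each feature with the target
```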
Typical Feature Selection – First step (Generation) • Generates a subset of features for evaluation • Can start with: no features, all features, or a random subset of features • [Flow diagram: Original Feature Set → (1) Generation → (2) Subset Evaluation → (3) Stopping Criterion (No: generate again; Yes: proceed) → (4) Validation]
Typical Feature Selection – Second step (Subset Evaluation) • Measures the goodness of the subset • Compares it with the previous best subset; if found better, it replaces the previous best subset
Typical Feature Selection – Third step (Stopping Criterion) • Based on the generation procedure: a pre-defined number of features, or a pre-defined number of iterations • Based on the evaluation function: whether addition or deletion of a feature no longer produces a better subset, or whether an optimal subset according to some evaluation function is achieved
Typical Feature Selection – Fourth step (Validation) • Basically not part of the feature selection process itself: compare results with already established results or with results from competing feature selection methods
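The four steps above can be sketched as one generic search loop. This is a minimal illustration assuming scikit-learn-style estimators; `generate_candidates` is a hypothetical, caller-supplied generation procedure, not something defined in the original work:

```python
# Minimal sketch of the generation -> evaluation -> stopping loop described above.
from sklearn.model_selection import cross_val_score

def wrapper_select(X, y, estimator, generate_candidates, max_rounds=20):
    best_subset, best_score = tuple(), float("-inf")
    for _ in range(max_rounds):                              # stopping: iteration budget
        improved = False
        for subset in generate_candidates(best_subset, X.shape[1]):
            score = cross_val_score(estimator, X[:, list(subset)], y, cv=10,
                                    scoring="neg_mean_absolute_error").mean()
            if score > best_score:                           # evaluation: keep the best subset
                best_subset, best_score, improved = subset, score, True
        if not improved:                                     # stopping: no better subset found
            break
    return best_subset, best_score                           # validation is done separately, on held-out data
```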
Categorization of feature selection techniques • Feature selection methods are grouped into two broad groups: • Filter methods, which take the set of features, trim some of them independently of the learner, and hand the reduced set to the learning algorithm • Wrapper methods, which use the accuracy of the learning algorithm itself as the evaluation measure
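As a rough illustration of the two groups, using scikit-learn utilities as stand-ins (not the tools of the study): a filter scores features with a statistic computed independently of the learner, while a wrapper scores candidate subsets with the learner itself.

```python
# Filter vs. wrapper, illustrated with off-the-shelf selectors.
from sklearn.feature_selection import SelectKBest, f_regression, SequentialFeatureSelector
from sklearn.tree import DecisionTreeRegressor

filter_fs = SelectKBest(score_func=f_regression, k=5)            # filter: univariate statistic
wrapper_fs = SequentialFeatureSelector(DecisionTreeRegressor(),  # wrapper: learner's own CV score
                                       n_features_to_select=5, cv=10)
```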
Argument for wrapper methods • The estimated accuracy of the learning algorithm is the best available heuristic for measuring the value of features. • Different learning algorithms may perform better with different feature sets, even if they use the same training set.
Wrapper selection algorithms (1) • The simplest method is forward selection (FS). It starts with the empty set and greedily adds features one at a time (without backtracking). • Backward stepwise selection (BS) starts with all features in the feature set and greedily removes them one at a time (without backtracking).
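A sketch of greedy forward selection, assuming a scikit-learn-style regressor; backward selection is the mirror image, starting from all features and greedily removing one at a time:

```python
# Greedy forward selection (FS): add the single feature that most improves
# the cross-validated score, stop when no addition helps (no backtracking).
from sklearn.model_selection import cross_val_score

def forward_selection(X, y, estimator, cv=10):
    selected, remaining = [], list(range(X.shape[1]))
    best_score = float("-inf")
    while remaining:
        # Score every single-feature expansion of the current subset.
        scores = {f: cross_val_score(estimator, X[:, selected + [f]], y, cv=cv,
                                     scoring="neg_mean_absolute_error").mean()
                  for f in remaining}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_score:     # no expansion improves: stop
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best_score = scores[f_best]
    return selected
```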
Wrapper selection algorithms (2) • Best First search starts with an empty set of features and generates all possible single-feature expansions. The subset with the highest evaluation is chosen and is expanded in the same manner by adding single features (with backtracking). The search can proceed forward from the empty set (BFFS) or backward from the full set (BFBS). • Genetic algorithm selection (GS). A solution is typically a fixed-length binary string representing a feature subset; the value of each position in the string represents the presence or absence of a particular feature. The algorithm is an iterative process in which each successive generation is produced by applying genetic operators such as crossover and mutation to the members of the current generation.
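A simplified sketch of the genetic encoding described above; the fitness, crossover, and mutation routines below are illustrative, not the exact operators used in the study:

```python
# Each candidate subset is a fixed-length 0/1 string, scored by the learner's CV error.
import random
from sklearn.model_selection import cross_val_score

def fitness(bits, X, y, estimator):
    cols = [i for i, b in enumerate(bits) if b]     # decode the bit string into column indices
    if not cols:
        return float("-inf")                        # treat the empty subset as invalid
    return cross_val_score(estimator, X[:, cols], y, cv=10,
                           scoring="neg_mean_absolute_error").mean()

def crossover(a, b):
    point = random.randrange(1, len(a))             # one-point crossover
    return a[:point] + b[point:]

def mutate(bits, rate=0.05):
    return [1 - b if random.random() < rate else b for b in bits]   # flip bits at random
```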
Experiments • For the purpose of the present study, we used 4 well-known learning algorithms (RepTree, M5rules, K*, SMOreg), the presented feature selection algorithms, and 12 datasets from the UCI repository.
Methodology of experiments • The whole training set was divided into ten mutually exclusive and equal-sized subsets, and for each subset the learner was trained on the union of all the other subsets. • The best features are selected according to the feature selection algorithm, and the performance of the subset is measured by how well it predicts the values of the test instances. • This cross-validation procedure was run 10 times for each algorithm, and the average over the 10 cross-validation runs was calculated.
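A sketch of this evaluation protocol using scikit-learn utilities; the decision tree regressor, the sequential (forward) wrapper, and the synthetic data are stand-ins for the Weka learners, selection methods, and UCI datasets actually used in the study:

```python
# 10-fold cross-validation repeated 10 times, with wrapper selection applied
# inside each training fold so the test instances remain unseen.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=15, n_informative=6, random_state=0)

model = make_pipeline(
    SequentialFeatureSelector(DecisionTreeRegressor(random_state=0),
                              direction="forward", cv=10),
    DecisionTreeRegressor(random_state=0),
)
cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=0)   # 10 runs of 10-fold CV
scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
print(scores.mean())   # average over all 10 x 10 folds
```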
Experiment with regression tree - RepTree • BS is a slightly better feature selection method (on average) than the others for RepTree.
Experiment with rule learner - M5rules • BS, BFBS and GS are the best feature selection methods (on average) for the M5rules learner.
Experiment with instance based learner - K* • BS and BFBS are the best feature selection methods (on average) for the K* algorithm.
Experiment with SMOreg • Similar results were obtained with all feature selection methods.
Conclusions • None of the described feature selection algorithms is superior to the others on all data sets for a specific learning algorithm. • None of them is superior on all data sets across learning algorithms either. • Backward selection strategies are very inefficient for large-scale data sets, which may have hundreds of original features. • Forward selection wrapper methods are less able to improve the performance of a given learner, but they are less expensive in computational effort and use fewer features for the induction. • Genetic selection typically requires a large number of subset evaluations to reach a minimum.
Future Work • We will use a light filter feature selection procedure as a preprocessing step in order to reduce the computational cost of the wrapping procedure without harming accuracy.