Feature Selection

Feature Selection
Benjamin Biesinger - Manuel Maly - Patrick Zwickl

Agenda
• Introduction: What is feature selection? What is our contribution?
• Phases: What is the sequence of actions in our solution?
• Solution: How does it work in detail?
• Results: What is returned?
• Analysis: What can be done with the results? What can we conclude from them?

Introduction
• Not all features of a data set are useful for classification
• A large number of attributes increases the computation time
• Only the most essential features should be used for classification
• Feature selection is one approach to finding them
• Different search strategies and evaluation methods are available, but which one is the best?
• Automatic feature selection: several algorithms are run, compared and analyzed for trends → implemented by us

Phases
• Main phases: (I) meta-classification (feature selection) and (II) classification
• Before: file loading and preparation
• Afterwards: comparison and output generation

Solution
• Java command-line application built on the WEKA toolkit
• Command-line arguments:
  • Filename of the dataset
  • Name of the classifier algorithm
  • Split: percentage dividing the data between feature selection and classification
• Example: "winequality-red.csv M5Rules 20"
• Results are computed and printed to the console (standard output)
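The argument handling described above can be sketched as follows. This is a minimal, hypothetical sketch: the record name `CommandLineArgs` and the rule that the split must lie between 1 and 99 are assumptions, not taken from the original tool.

```java
/**
 * Hypothetical holder for the three command-line arguments:
 * dataset filename, classifier name and split percentage.
 */
record CommandLineArgs(String datasetFile, String classifierName, int splitPercent) {

    /** Parses arguments such as {"winequality-red.csv", "M5Rules", "20"}. */
    static CommandLineArgs parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                "Usage: <dataset file> <classifier name> <split percentage>");
        }
        int split;
        try {
            split = Integer.parseInt(args[2].trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Split must be an integer: " + args[2], e);
        }
        // Assumption: both parts must be non-empty, so 0 and 100 are rejected.
        if (split <= 0 || split >= 100) {
            throw new IllegalArgumentException("Split must be between 1 and 99: " + split);
        }
        return new CommandLineArgs(args[0], args[1], split);
    }

    public static void main(String[] args) {
        CommandLineArgs parsed = parse(new String[] {"winequality-red.csv", "M5Rules", "20"});
        System.out.println(parsed);
    }
}
```

Validating early means a typo in the percentage fails immediately instead of after the (possibly slow) dataset parsing.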

Solution (Flow 1)
1. Parse the dataset and create a WEKA-specific "Instances" object.
2. Split the Instances object into two parts according to the percentage entered by the user.
3. Combine all evaluation and search algorithms listed in the properties files, apply each combination to the first Instances object, and store the results in dedicated objects (SData).
4. For every combination from step 3, classify the second Instances object with the classifier entered by the user, using the selected features; again store the results in SData objects.
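Step 2 (the percentage split) could look roughly like this. It is a sketch on a plain `List` rather than on WEKA's `Instances`, and it assumes that the percentage refers to the feature-selection part and that rows are shuffled with a fixed seed first; the class name `PercentageSplit` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch of step 2: split the rows into a feature-selection part
 * and a classification part according to a user-given percentage.
 */
final class PercentageSplit {

    /** Holds the two parts of the split. */
    record Parts<T>(List<T> selectionPart, List<T> classificationPart) {}

    static <T> Parts<T> split(List<T> rows, int selectionPercent, long seed) {
        if (selectionPercent < 0 || selectionPercent > 100) {
            throw new IllegalArgumentException("Percentage must be in [0, 100]");
        }
        // Shuffle a copy so that the order of the input file does not bias the split
        // (an assumption: the original tool may split without shuffling).
        List<T> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed));
        // Math.round on a double returns a long, so cast back to int.
        int cut = (int) Math.round(shuffled.size() * selectionPercent / 100.0);
        return new Parts<>(
            new ArrayList<>(shuffled.subList(0, cut)),
            new ArrayList<>(shuffled.subList(cut, shuffled.size())));
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) rows.add(i);
        Parts<Integer> parts = split(rows, 20, 42L);
        System.out.println(parts.selectionPart().size() + " / " + parts.classificationPart().size());
    }
}
```

Using a fixed seed keeps runs reproducible, which matters when several feature selection techniques are compared on the same split.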

Solution (Flow 2)
• Aggregate information on all results by iterating over the SData objects.
• Print the trend analysis and information on each combination of evaluation and search algorithm, plus the corresponding classification results (time and mean absolute error).
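A minimal sketch of the aggregation step, assuming a simplified `SData` with hypothetical field names that mirror the values printed in the output excerpt (number of selected features, selection time, classification time, MAE). It picks the combination with the lowest error and the one with the lowest total runtime; the time units are not stated in the original and are left open here.

```java
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical stand-in for the SData result objects; field names are
 * assumptions chosen to match the values in the output excerpt.
 */
record SData(String evaluation, String search, int selectedFeatures,
             long selectionTime, long classificationTime, double meanAbsError) {

    long totalTime() {
        return selectionTime + classificationTime;
    }
}

final class ResultAggregation {

    /** Combination with the lowest mean absolute error. */
    static SData lowestError(List<SData> results) {
        return results.stream()
            .min(Comparator.comparingDouble(SData::meanAbsError))
            .orElseThrow();
    }

    /** Combination with the lowest total (selection + classification) time. */
    static SData lowestRuntime(List<SData> results) {
        return results.stream()
            .min(Comparator.comparingLong(SData::totalTime))
            .orElseThrow();
    }

    public static void main(String[] args) {
        // Values taken from the output excerpt; MAE stored as a fraction.
        List<SData> results = List.of(
            new SData("ConsistencySubsetEval", "GreedyStepwise", 1, 34, 36, 0.4707),
            new SData("ConsistencySubsetEval", "GreedyStepwise", 2, 35, 34, 0.4316),
            new SData("ConsistencySubsetEval", "RandomSearch", 5, 74, 118, 0.4446));
        System.out.println("Lowest error: " + lowestError(results));
        System.out.println("Lowest runtime: " + lowestRuntime(results));
    }
}
```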

Solution (Output Excerpt)
@TREND of selected features
Attribute: bottom-right-square has Count: 8
…
=============== Evaluation: ConsistencySubsetEval ===============
--- Search: GreedyStepwise ---
# of selected features: 1, selection time: 34, classification time: 36, mean abs. error:47,07%
# of selected features: 2, selection time: 35, classification time: 34, mean abs. error:43,16%
…
--- Search: RandomSearch ---
Automatic feature number (no influence by user): 5, selection time: 74, classification time: 118, mean abs. error:44,46%

Results
• Tested on 3 different datasets:
  • Tic Tac Toe
  • Wine Quality (red)
  • Balance Scale
• 2 comparisons were made per dataset:
  • For each feature selection technique individually
  • Between different feature selection techniques
• Is there a trend in which features are selected by most techniques?

1st Comparison
• Influence of the number of selected features on:
  • Runtime
  • Classification accuracy (measured as MAE)
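For reference, the mean absolute error over n predictions is the average of |actual - predicted|; a lower MAE means a more accurate model. The sketch below computes it directly; the class name is hypothetical and the input values are illustrative only.

```java
/** Mean absolute error: MAE = (1/n) * sum over i of |actual_i - predicted_i|. */
final class MeanAbsoluteError {

    static double mae(double[] actual, double[] predicted) {
        if (actual.length != predicted.length || actual.length == 0) {
            throw new IllegalArgumentException("Arrays must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);
        }
        return sum / actual.length;
    }

    public static void main(String[] args) {
        double[] actual = {5, 6, 7, 5};
        double[] predicted = {5.5, 6, 6, 5};
        System.out.println(mae(actual, predicted)); // prints 0.375
    }
}
```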

1st Comparison Result
• Only search algorithms that implement the RankedOutputSearch interface were used
  • Only these allow the number of features to select to be set
• The number of selected features is inversely related to the MAE and directly related to the runtime: more features give a lower error but a longer runtime
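Because a ranked output orders all attributes, the number k of features to keep can be swept from 1 to n, which is what the 1st comparison varies. In WEKA's `RankedOutputSearch` interface this corresponds to `setNumToSelect(int)`; the standalone sketch below only mimics the idea with made-up scores for a few Wine Quality attribute names, and the class `TopKSweep` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sketch: rank attributes by score, then keep the top k for k = 1..n. */
final class TopKSweep {

    /** Returns attribute names sorted by descending score. */
    static List<String> rank(String[] names, double[] scores) {
        Integer[] order = new Integer[names.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        List<String> ranked = new ArrayList<>();
        for (int i : order) ranked.add(names[i]);
        return ranked;
    }

    /** The first k entries of the ranking, i.e. the k best-scored attributes. */
    static List<String> topK(List<String> ranking, int k) {
        return List.copyOf(ranking.subList(0, k));
    }

    public static void main(String[] args) {
        // Illustrative scores, not real evaluation results.
        String[] names = {"alcohol", "pH", "sulphates", "density"};
        double[] scores = {0.9, 0.1, 0.5, 0.3};
        List<String> ranking = rank(names, scores);
        for (int k = 1; k <= ranking.size(); k++) {
            // In the real tool, each subset would be classified and timed here.
            System.out.println(k + ": " + topK(ranking, k));
        }
    }
}
```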

2nd Comparison
• A feature selection technique consists of:
  • a search algorithm
  • an evaluation algorithm
• Not all combinations are possible!
• Different feature selection techniques were compared to each other with respect to:
  • Runtime
  • Performance (measured as MAE)
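One way to enumerate techniques is to read the evaluator and search names from a properties file and keep only compatible pairs. The sketch below assumes the hypothetical keys `evaluators` and `searches` and a simplified compatibility rule (WEKA's `Ranker` needs a single-attribute evaluator such as `InfoGainAttributeEval`, while the other search methods listed need a subset evaluator). `Ranker` and `InfoGainAttributeEval` are WEKA classes not mentioned in the slides, added only for illustration; the class `TechniqueCombinations` is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.Set;

/** Sketch: build all compatible (evaluator, search) pairs from a properties file. */
final class TechniqueCombinations {

    // Simplified, assumed rule: ranking searches need attribute evaluators,
    // all other searches need subset evaluators.
    private static final Set<String> ATTRIBUTE_EVALUATORS = Set.of("InfoGainAttributeEval");
    private static final Set<String> RANKING_SEARCHES = Set.of("Ranker");

    static boolean compatible(String evaluator, String search) {
        return ATTRIBUTE_EVALUATORS.contains(evaluator) == RANKING_SEARCHES.contains(search);
    }

    static List<String[]> combinations(Properties props) {
        String[] evaluators = props.getProperty("evaluators", "").trim().split("\\s*,\\s*");
        String[] searches = props.getProperty("searches", "").trim().split("\\s*,\\s*");
        List<String[]> pairs = new ArrayList<>();
        for (String e : evaluators) {
            for (String s : searches) {
                if (!e.isEmpty() && !s.isEmpty() && compatible(e, s)) {
                    pairs.add(new String[] {e, s});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
            "evaluators = ConsistencySubsetEval, InfoGainAttributeEval\n"
            + "searches = GreedyStepwise, RandomSearch, Ranker\n"));
        for (String[] pair : combinations(props)) {
            System.out.println(pair[0] + " + " + pair[1]);
        }
    }
}
```

Filtering up front avoids running combinations that the toolkit would reject anyway.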

2nd Comparison Result
• Different techniques select different numbers of attributes
  • and, to some extent, different attributes
• Some techniques are slower than others
  • Huge runtime differences between search algorithms
• Some techniques select an insufficient set of attributes to give acceptable results

Trend
• In all tested datasets there was a trend in which features were selected
• A higher selection count suggests a bigger influence on the output
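The trend count shown in the output excerpt ("Attribute: ... has Count: ...") can be sketched as a frequency count over the attribute sets chosen by each technique. The selections in `main` are invented for illustration, using Tic Tac Toe attribute names; the class `FeatureTrend` is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: count how often each attribute was selected, most frequent first. */
final class FeatureTrend {

    static List<Map.Entry<String, Integer>> countSelections(List<List<String>> selectedSets) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> selected : selectedSets) {
            for (String attribute : selected) {
                counts.merge(attribute, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // Highest count first; ties broken alphabetically for stable output.
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
            .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical selections made by three techniques.
        List<List<String>> selections = List.of(
            List.of("middle-middle-square", "bottom-right-square"),
            List.of("bottom-right-square", "top-left-square"),
            List.of("bottom-right-square"));
        for (Map.Entry<String, Integer> e : countSelections(selections)) {
            System.out.println("Attribute: " + e.getKey() + " has Count: " + e.getValue());
        }
    }
}
```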

Analysis
• Different feature selection techniques have different characteristics
• The combination ClassifierSubsetEval / RaceSearch gives very good classification results
• Fewer attributes lead to faster classification
• Algorithms that select fewer features are faster
  • e.g. GeneticSearch

[Figure: Lowest error rate]

[Figure: Lowest runtime]

[Figure: Trend]

Feature Selection, huh?
Benjamin Biesinger - Manuel Maly - Patrick Zwickl
Anything missed? Any questions?
The essential features ;)
Thanks!
