Feature transformation through rule induction: A case study with the k-NN classifier Antal van den Bosch Tilburg University, The Netherlands http://ilk.uvt.nl - Antal.vdnBosch@uvt.nl
Outline
• General idea
• Feature space transform for k-NN
• k-NN classification over rules
• An implementation using RIPPER
• Intermezzo: parameter optimization
• Experiments on UCI data
• Conclusions
Feature transformation

Original instances:
  f1 f2 f3 | c
  A  C  F  | Z
  B  B  D  | Y
  B  C  C  | X
  B  B  C  | ?

Induced rules:
  r1: If f1=A then c=Z
  r2: If f1=B and f2=B then c=Y
  r3: If f2=C then c=X
  r4: If f3=C then c=X

Transformed instances (one binary feature per rule; 1 = rule matches):
  r1 r2 r3 r4 | c
  1  0  1  0  | Z
  0  1  0  0  | Y
  0  0  1  1  | X
  0  1  1  1  | ?
k-NN over rules
• Different classification:
  • not "class of first rule that matches"
  • instead, produce the majority class of the nearest neighbors that share the most matching rules with the new instance (weighted, …)
• Different outcomes possible
  • the rule's class is not considered, only the nearest neighbors' classes
• Rules become features with weights; they can outweigh and outnumber the others
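A minimal sketch of this classification step, assuming instances have already been recoded as binary rule-match vectors. Plain unweighted overlap stands in for TiMBL's richer similarity metrics and feature weights:

```python
from collections import Counter

def knn_over_rules(query, train, k=3):
    """train: list of (rule_vector, class_label) pairs.
    Neighbors are ranked by how many rule features they share with the
    query; the majority class among the k nearest wins. The classes the
    rules themselves predict play no role here."""
    overlap = lambda a, b: sum(x == y for x, y in zip(a, b))
    ranked = sorted(train, key=lambda tc: overlap(query, tc[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Recoded training instances from the example slide
train = [([1, 0, 1, 0], "Z"), ([0, 1, 0, 0], "Y"), ([0, 0, 1, 1], "X")]
print(knn_over_rules([0, 1, 1, 1], train, k=1))  # X
```

With k=1 the query is closest to the X instance (three shared rule features), even though it also matches the rule predicting Y — illustrating that the rules' own classes are ignored.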
Related work
• Sébag and Schoenauer (1994)
  • same transformation, but for local regression; interesting dimension reduction
• Generalizing instances to rules in k-NN
  • Salzberg (1990), NGE (hyperrectangles)
  • Domingos (1996), RISE (merging with wildcards)
  • Van den Bosch (1999), FAMBL (merging by disjuncting values)
• Van den Bosch (2000)
  • earlier version, only on natural language processing tasks
Implementation
• RIPPER (Cohen, 1995)
  • sequential covering, MDL-driven
  • induces sets of rules per class
  • uses partitioning to validate and select rules
  • many heuristics, many parameters, fast
• Procedure:
  • apply RIPPER to the training set
  • recode the training and test sets using the RIPPER rules
  • train and test k-NN (IB1 in TiMBL 5.0)
Variants
• Transformed IB1 (T-IB1): the new features replace the original ones
• IB1 plus new features (IB1+T): the new features are added to the original ones
• Compared against RIPPER and IB1
  • 10-fold CV
  • unpaired one-tailed t-tests
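The two variants differ only in how the recoded vectors are combined with the original features. A sketch, where `rule_matches` is a hypothetical recoder returning one binary value per induced rule:

```python
def t_ib1_features(instance, rule_matches):
    # T-IB1: the binary rule-match features replace the original features
    return rule_matches(instance)

def ib1_plus_t_features(instance, rule_matches):
    # IB1+T: the rule-match features are appended to the original features
    return list(instance) + rule_matches(instance)

# Toy two-rule recoder for illustration only
matches = lambda inst: [int(inst[0] == "A"), int(inst[1] == "C")]

print(t_ib1_features(["A", "C", "F"], matches))      # [1, 1]
print(ib1_plus_t_features(["A", "C", "F"], matches)) # ['A', 'C', 'F', 1, 1]
```

Under IB1+T the rule features compete with the originals during feature weighting, which is how they can "outweigh and outnumber" them.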
UCI data sets
• Artificial data sets
  • fully known underlying concept
  • known conditional dependencies
• Natural data sets
  • partly understood underlying problem
  • unknown conditional dependencies
Intermezzo
• Parameter settings matter, but
  • a good setting is unpredictable
  • parameters interact
  • exhaustive wrapping is not an option
• Both k-NN (TiMBL) and RIPPER have many parameters
• Heuristic: wrapped progressive sampling (Van den Bosch, 2004)
WPS parameter spaces
• RIPPER:
  • F (min. # instances per rule)
  • a (class order)
  • n (negation)
  • S (simplify)
  • O (# optimization passes)
  • L (loss ratio)
  • 648 combinations
• IB1 (TiMBL):
  • k (k-NN)
  • w (feature weighting)
  • m (similarity metric)
  • d (distance weighting)
  • L (metric back-off)
  • 925 combinations
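Wrapped progressive sampling can be sketched roughly as follows: score all candidate settings on a small training sample, keep the better-scoring half, and repeat on progressively larger samples. This is a simplified reading of Van den Bosch (2004); the scoring function, halving schedule, and sample sizes below are placeholders:

```python
def wrapped_progressive_sampling(settings, sample_sizes, score):
    """settings: candidate parameter combinations; score(setting, n) gives
    validation accuracy when training on a sample of n instances.
    Each round halves the pool, so only the surviving settings are ever
    evaluated on the expensive larger samples."""
    pool = list(settings)
    for n in sample_sizes:
        if len(pool) == 1:
            break
        ranked = sorted(pool, key=lambda s: score(s, n), reverse=True)
        pool = ranked[:max(1, len(pool) // 2)]  # keep the top half
    return pool[0]

# Toy usage: pretend accuracy peaks at k=5, regardless of sample size
best = wrapped_progressive_sampling(
    settings=[1, 3, 5, 7, 9, 11, 15, 25],
    sample_sizes=[100, 500, 2000],
    score=lambda k, n: -abs(k - 5))
print(best)  # 5
```

The point of the heuristic is that the 648 RIPPER and 925 TiMBL combinations are only exhaustively scored on cheap small samples, never on the full training set.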
Discussion
• T-IB1 ≈ RIPPER
  • RIPPER classification can be interchanged with k-NN classification
• IB1+T outperforms IB1 and RIPPER
  • the extra features add useful new views on the task
• Effects mainly on artificial data
  • complex "game" rules help k-NN find the correct nearest neighbors
One example: tic-tac-toe
• Tic-tac-toe
  • donated by David Aha to the UCI repository
  • 958 possible endings of the 3x3 board game
  • class: whether the board constitutes a win for X
  • yes or no (no can be a win for O, or a draw)
• Typical 100% correct eight-rule set:
  • check the 2 diagonals, 3 horizontals, and 3 verticals for three consecutive X's
• RIPPER usually finds these eight, but sometimes induces other rules
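The eight-rule concept described above is easy to state directly. A sketch, with the board as a 9-cell list in row-major order (cell values follow the UCI encoding: 'x', 'o', or 'b' for blank):

```python
def x_wins(board):
    """board: list of 9 cells ('x', 'o', 'b'), row-major.
    The eight rules: three horizontals, three verticals, and two
    diagonals of consecutive x's."""
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # horizontals
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # verticals
             (0, 4, 8), (2, 4, 6)]              # diagonals
    return any(all(board[i] == "x" for i in line) for line in lines)

# x o o
# o x x   -> win for X on the main diagonal
# x o x
print(x_wins(["x", "o", "o", "o", "x", "x", "x", "o", "x"]))  # True
```

Each of the eight line checks corresponds to one rule in the typical RIPPER rule set, so a board's rule-match vector simply records which winning lines X occupies.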
Tic-tac-toe freak rule
[Figure: two example boards matched by the same freak rule]
• Tests on O in three locations that are not three in a row
• X may win
• But it may also mean a draw!
IB1+T saves the day
[Figure: the misclassified board and its nearest neighbor]
• Finds a nearest neighbor that
  • mismatches in two positions
  • matches on the rule feature
  • represents a draw
Future work
• A more relaxed and redundant rule inducer (more rules per instance)
• Bigger context: plug k-NN onto RIPPER, maxent, SVM, Winnow, …
• AUC instead of accuracy