Filter methods
• T statistic
• Information
• Distance
• Correlation
• Separability …
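As a concrete illustration of the first of these criteria, here is a minimal sketch of a t-statistic filter, assuming a two-class problem, a NumPy feature matrix X, a 0/1 label vector y, and a top_k parameter of our own choosing:

```python
import numpy as np

def t_statistic_filter(X, y, top_k=10):
    """Rank features by the two-sample t statistic: distance between the
    class-conditional means relative to the pooled standard error."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    se = np.sqrt(X0.var(axis=0, ddof=1) / len(X0)
                 + X1.var(axis=0, ddof=1) / len(X1))
    t = np.abs(m0 - m1) / se
    return np.argsort(t)[::-1][:top_k]  # indices of the top_k features
```

Being a filter, the ranking never consults a learning algorithm; the other listed criteria slot into the same rank-and-cut scheme.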
FSS Algorithms
• Exponential
  • Exhaustive search
  • Branch & Bound
  • Beam search
• Sequential
  • SFS and SBS
  • Plus-l / Minus-r
  • Bidirectional
  • Floating (exponential in worst case)
• Randomized
  • Sequential + randomness
  • Genetic
SFS performs best when the optimal subset has a small number of features. When the search is near the empty set, a large number of states can potentially be evaluated; towards the full set, the region examined by SFS is narrower, since most of the features have already been selected.
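A minimal SFS sketch, assuming a user-supplied criterion J that maps a feature subset to a score (higher is better), e.g. a filter measure or a wrapper's cross-validated accuracy:

```python
def sfs(features, J, k):
    """Sequential Forward Selection: grow a subset greedily to size k."""
    selected, remaining = set(), set(features)
    while len(selected) < k and remaining:
        # add the single feature that most improves the criterion J
        best = max(remaining, key=lambda f: J(selected | {f}))
        selected.add(best)
        remaining.remove(best)
    return selected
```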
Example: the optimal feature subset turns out to be {x1, x4}, because x4 provides the only information that x1 needs: discrimination between classes ω4 and ω5.
SBS works best when the optimal feature subset has a large number of features, since it spends most of its time visiting large subsets. The main limitation of SBS is its inability to re-evaluate the usefulness of a feature after it has been discarded.
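The mirror-image SBS sketch under the same assumed J; note that once a feature is dropped it is never reconsidered, which is exactly the limitation above:

```python
def sbs(features, J, k):
    """Sequential Backward Selection: shrink the full set greedily to size k."""
    selected = set(features)
    while len(selected) > k:
        # drop the feature whose removal hurts the criterion J the least
        worst = max(selected, key=lambda f: J(selected - {f}))
        selected.remove(worst)
    return selected
```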
Bidirectional search: SFS is performed from the empty set while SBS is performed from the full set. Features selected by SFS are not removed by SBS, and features removed by SBS are not selected by SFS; these constraints guarantee that the two searches converge to the same subset.
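One way the two constraints can interlock, as a sketch with the same assumed J: SFS only adds features that SBS still holds, and SBS only removes features that SFS has not yet fixed, so both searches necessarily meet on the same subset:

```python
def bidirectional(features, J):
    """Run SFS (growing `forward`) and SBS (shrinking `backward`) in lockstep."""
    forward, backward = set(), set(features)
    while forward != backward:
        # SFS step: add the best feature not yet removed by SBS
        best = max(backward - forward, key=lambda f: J(forward | {f}))
        forward.add(best)
        if forward == backward:
            break
        # SBS step: remove the least useful feature not yet selected by SFS
        worst = max(backward - forward, key=lambda f: J(backward - {f}))
        backward.remove(worst)
    return forward
```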
Plus-l / Minus-r offers some backtracking ability. Its main limitation is that there is no theoretical way of predicting the optimal values of l and r.
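A sketch with l > r so the search makes net forward progress; the defaults l=3, r=2 are arbitrary, underlining that nothing dictates these values:

```python
def plus_l_minus_r(features, J, k, l=3, r=2):
    """Alternate l forward (SFS) steps with r backward (SBS) steps."""
    assert l > r, "net-forward variant; use r > l to shrink from the full set"
    selected, remaining = set(), set(features)
    while len(selected) < k and remaining:
        for _ in range(min(l, len(remaining))):      # Plus-l
            best = max(remaining, key=lambda f: J(selected | {f}))
            selected.add(best)
            remaining.remove(best)
        if len(selected) >= k or not remaining:
            break
        for _ in range(min(r, len(selected))):       # Minus-r: backtrack
            worst = max(selected, key=lambda f: J(selected - {f}))
            selected.remove(worst)
            remaining.add(worst)
    while len(selected) > k:                         # trim any overshoot
        worst = max(selected, key=lambda f: J(selected - {f}))
        selected.remove(worst)
    return selected
```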
Monotonicity in J: for nested feature subsets X1 ⊂ X2 ⊂ … ⊂ Xn, the criterion satisfies J(X1) ≤ J(X2) ≤ … ≤ J(Xn); that is, removing features never increases J.
Branch & Bound
• The bound A is initialised with the value of a first leaf, e.g. A = J({1,2})
• The value of A is updated whenever a greater one is found in a leaf
• A node with value B is purged, together with its whole subtree, whenever A >= B: by monotonicity no leaf below it can beat the bound
• Stop whenever every leaf has been purged or evaluated
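A sketch of the purging logic, assuming a monotone criterion J and a target subset size k; best_value plays the role of the bound A from the slide:

```python
def branch_and_bound(features, J, k):
    """Exact search for the J-maximising subset of size k under monotonicity."""
    order = sorted(features)
    best_subset, best_value = None, float("-inf")  # best_value is the bound A

    def expand(subset, start):
        nonlocal best_subset, best_value
        if J(subset) <= best_value:
            return                       # purge: no leaf below can beat A
        if len(subset) == k:
            best_subset, best_value = set(subset), J(subset)  # update A
            return
        # branch by removing each remaining feature, in index order so
        # every size-k subset is generated exactly once
        for i in range(start, len(order)):
            if order[i] in subset:
                expand(subset - {order[i]}, i + 1)

    expand(frozenset(order), 0)
    return best_subset, best_value
```

Unlike the sequential heuristics, this returns the global optimum, at exponential worst-case cost.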
With a proper queue size, Beam Search can avoid getting trapped in local optima by preserving solutions from varying regions of the search space. In the example, however, the optimal subset 2-3-4 (J=9) is never explored.
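A forward beam-search sketch with the same assumed J; beam_width is the queue size referred to above, and width 1 degenerates to plain SFS:

```python
import heapq

def beam_search(features, J, k, beam_width=3):
    """Keep the beam_width best partial subsets at each level, not just one."""
    beam = [frozenset()]
    for _ in range(k):
        # expand every subset in the beam by one unused feature
        children = {s | {f} for s in beam for f in features if f not in s}
        beam = heapq.nlargest(beam_width, children, key=J)  # prune the queue
    return max(beam, key=J)
```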
The RELIEF algorithm
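A minimal sketch of classic two-class RELIEF (Kira & Rendell), assuming NumPy inputs with features scaled to [0, 1]; the variable names and the L1 neighbour distance are our choices:

```python
import numpy as np

def relief(X, y, n_iters=100, seed=None):
    """Weight each feature by how well it separates a random instance
    from its nearest miss (other class) versus its nearest hit (same class)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)
        dists = np.abs(X - X[i]).sum(axis=1)  # L1 distance to instance i
        dists[i] = np.inf                     # never match the instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dists, np.inf))
        miss = np.argmin(np.where(same, np.inf, dists))
        # reward disagreement with the miss, penalise disagreement with the hit
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / n_iters
    return w
```

Features with large positive weights discriminate well; thresholding w yields the selected subset.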
Conclusions
• Difficult and pervasive problem!
• Lack of accepted and useful definitions for relevance, redundancy and irrelevance
• Nesting problem
• Abundance of algorithms and filters
• Lack of proper comparative benchmarks
• An obligatory step, usually well worth the pain