
Feature selection methods


Presentation Transcript


  1. Feature selection methods Isabelle Guyon isabelle@clopinet.com IPAM summer school on Mathematics in Brain Imaging. July 2008

  2. Feature Selection • Thousands to millions of low-level features: select the most relevant ones to build better, faster, and easier to understand learning machines. [Figure: the m × n data matrix X is reduced to its n’ most relevant columns.]

  3. Application examples [Figure: application domains placed on a log scale of the number of variables/features, from 10 to 10^6: customer knowledge, quality control, market analysis, OCR, HWR, machine vision, text categorization, system diagnosis, bioinformatics.]

  4. Nomenclature • Univariate method: considers one variable (feature) at a time. • Multivariate method: considers subsets of variables (features) together. • Filter method: ranks features or feature subsets independently of the predictor (classifier). • Wrapper method: uses a classifier to assess features or feature subsets.

  5. Univariate Filter Methods

  6. Univariate feature ranking [Figure: class-conditional densities P(Xi|Y=1) and P(Xi|Y=-1), with means μ+ and μ- and standard deviations σ+ and σ-.] • Normally distributed classes, equal variance σ² unknown; estimated from the data as σ²within. • Null hypothesis H0: μ+ = μ- • T statistic: if H0 is true, t = (μ+ - μ-) / (σwithin √(1/m+ + 1/m-)) follows a Student distribution with m+ + m- - 2 degrees of freedom, where m+ and m- are the numbers of examples in each class.
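
A minimal MATLAB sketch of this ranking criterion (the data matrix X of size m × n, the label vector Y with values in {-1,+1}, and all variable names are assumptions for illustration, not part of the slides):

    pos = (Y == 1);   neg = (Y == -1);
    mpos = sum(pos);  mneg = sum(neg);               % class sizes m+ and m-
    mu_pos = mean(X(pos,:), 1);  mu_neg = mean(X(neg,:), 1);
    % pooled "within-class" standard deviation, one value per feature
    s_within = sqrt(((mpos-1)*var(X(pos,:),0,1) + (mneg-1)*var(X(neg,:),0,1)) ...
                    / (mpos + mneg - 2));
    t = (mu_pos - mu_neg) ./ (s_within * sqrt(1/mpos + 1/mneg));
    [~, ranking] = sort(abs(t), 'descend');          % most relevant features first

P-values can then be read off a Student table with m+ + m- - 2 degrees of freedom (e.g. with tcdf from the Statistics Toolbox).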

  7. Statistical tests (chap. 2) [Figure: null distribution of the relevance index r; the p-value is the tail area beyond the observed value r0.] • H0: X and Y are independent. • Relevance index → test statistic. • P-value → false positive rate FPR = nfp / nirr. • Multiple testing problem: use the Bonferroni correction pval → n · pval. • False discovery rate: FDR = nfp / nsc ≈ FPR · n / nsc. • Probe method: FPR ≈ nsp / np.
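
A minimal MATLAB sketch of the probe method (the inputs X and Y, the correlation-based relevance index, and the threshold theta are all assumptions made for illustration): random probes are irrelevant by construction, so the fraction of probes passing the threshold estimates the false positive rate.

    [m, n] = size(X);
    np = n;                                   % use as many probes as real features
    Xp = [X, randn(m, np)];                   % append random probe features
    Xc = Xp - repmat(mean(Xp, 1), m, 1);      % a simple relevance index:
    Yc = Y - mean(Y);                         % |Pearson correlation with the target|
    r  = abs(Xc' * Yc) ./ (sqrt(sum(Xc.^2, 1))' * sqrt(sum(Yc.^2)));
    theta = 0.2;                              % selection threshold (assumed)
    nsp = sum(r(n+1:end) >= theta);           % probes selected: false positives
    nsc = sum(r(1:n)     >= theta);           % real features selected
    FPR_hat = nsp / np;                       % probe estimate of the FPR
    FDR_hat = min(1, (nsp * n/np) / max(nsc, 1));   % rough FDR estimate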

  8. Univariate Dependence • Independence: P(X, Y) = P(X) P(Y) • Measure of dependence (mutual information): MI(X, Y) = ∫ P(X,Y) log [ P(X,Y) / (P(X) P(Y)) ] dX dY = KL( P(X,Y) || P(X) P(Y) )
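
For two discrete variables the integral becomes a sum over the empirical joint distribution. A minimal MATLAB sketch (the vectors x and y are assumed inputs; continuous variables would need to be binned or handled with a density estimator):

    vx = unique(x);  vy = unique(y);
    Pxy = zeros(numel(vx), numel(vy));
    for i = 1:numel(vx)
        for j = 1:numel(vy)
            Pxy(i,j) = sum(x == vx(i) & y == vy(j));   % joint counts
        end
    end
    Pxy = Pxy / sum(Pxy(:));                 % joint distribution P(X,Y)
    Px  = sum(Pxy, 2);  Py = sum(Pxy, 1);    % marginals
    PxPy = Px * Py;                          % product of marginals
    nz   = Pxy > 0;                          % avoid log(0)
    MI   = sum(Pxy(nz) .* log(Pxy(nz) ./ PxPy(nz)));   % in nats; use log2 for bits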

  9. Other criteria (chap. 3) • The choice of a feature ranking method depends on the nature of: • the variables and the target (binary, categorical, continuous); • the problem (dependencies between variables, linear/non-linear relationships between variables and target); • the available data (number of examples, number of variables, noise in the data); • the available tabulated statistics.

  10. Multivariate Methods

  11. Univariate selection may fail [Figure: two-dimensional examples in which variables that are individually irrelevant, or only weakly relevant, become relevant when taken jointly.] Guyon-Elisseeff, JMLR 2004; Springer 2006

  12. Filters, Wrappers, and Embedded methods [Figure: three block diagrams. Filter: all features → filter → feature subset → predictor. Wrapper: all features → multiple feature subsets, each assessed by the predictor → chosen feature subset → predictor. Embedded method: all features → predictor that selects a feature subset as part of its own training.]

  13. Relief (Kira and Rendell, 1992) • Relief = ⟨Dmiss / Dhit⟩, averaged over examples, where Dhit is the distance of an example to its nearest hit (closest example of the same class) and Dmiss its distance to its nearest miss (closest example of the opposite class).
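
A minimal MATLAB sketch of a Relief-style score following this slide (X of size m × n and labels Y in {-1,+1} are assumed; nearest hits and misses are found with Euclidean distance in the full feature space, and per-feature distances are accumulated):

    [m, n] = size(X);
    Dmiss = zeros(1, n);  Dhit = zeros(1, n);
    for i = 1:m
        d = sum((X - repmat(X(i,:), m, 1)).^2, 2);        % squared distances to example i
        d(i) = inf;                                        % exclude the example itself
        same = (Y == Y(i));
        dh = d;  dh(~same) = inf;  [~, hit]  = min(dh);    % nearest hit
        dm = d;  dm(same)  = inf;  [~, miss] = min(dm);    % nearest miss
        Dhit  = Dhit  + abs(X(i,:) - X(hit,:));
        Dmiss = Dmiss + abs(X(i,:) - X(miss,:));
    end
    relief_score = Dmiss ./ (Dhit + eps);                  % larger = more relevant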

  14. Wrappers for feature selection (Kohavi-John, 1997) • N features ⇒ 2^N possible feature subsets!

  15. Search Strategies (chap. 4) • Exhaustive search. • Simulated annealing, genetic algorithms. • Beam search: keep the k best paths at each step. • Greedy search: forward selection or backward elimination. • PTA(l,r): plus l, take away r – at each step, run SFS l times, then SBS r times. • Floating search (SFFS and SBFS): one step of SFS (resp. SBS), then SBS (resp. SFS) as long as we find better subsets than those of the same size obtained so far. At any time, if a better subset of the same size was already found, switch abruptly.

  16. Forward Selection (wrapper) • Also referred to as SFS: Sequential Forward Selection. [Figure: starting from the empty set, one feature is added at a time; the number of candidates shrinks from n to n-1, n-2, …, 1.]
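
A minimal MATLAB sketch of SFS as a wrapper. The helper assess(X, Y) is hypothetical: it stands for "train your chosen predictor on the candidate subset and return a validation score (higher is better)", whatever predictor and evaluation protocol you use.

    n = size(X, 2);
    selected = [];  remaining = 1:n;
    best_score = -inf;
    while ~isempty(remaining)
        scores = zeros(size(remaining));
        for k = 1:numel(remaining)
            scores(k) = assess(X(:, [selected, remaining(k)]), Y);  % try adding each candidate
        end
        [s, k] = max(scores);
        if s <= best_score, break; end       % stop when no candidate improves
        best_score = s;
        selected   = [selected, remaining(k)];
        remaining(k) = [];
    end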

  17. Forward Selection (embedded) • Guided search: we do not consider alternative paths. • Typical examples: Gram-Schmidt orthogonalization and tree classifiers. [Figure: the same nested search as SFS, but following a single path.]

  18. Backward Elimination (wrapper) • Also referred to as SBS: Sequential Backward Selection. [Figure: starting from all n features, one feature is removed at a time: n, n-1, n-2, …, 1.]

  19. Backward Elimination (embedded) • Guided search: we do not consider alternative paths. • Typical example: “recursive feature elimination”, RFE-SVM. [Figure: the same elimination path as SBS, following a single path.]
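
A minimal MATLAB sketch of the RFE loop. To keep it self-contained, a ridge-regression model (with an arbitrary lambda) stands in for the linear SVM of RFE-SVM; only the elimination logic is the point here.

    lambda = 0.1;
    surviving = 1:size(X, 2);
    ranking = [];                              % eliminated last = most relevant
    while ~isempty(surviving)
        Xs = X(:, surviving);
        w  = (Xs'*Xs + lambda*eye(numel(surviving))) \ (Xs'*Y);  % linear weights
        [~, worst] = min(abs(w));              % smallest |w_i| = least useful
        ranking  = [surviving(worst), ranking];
        surviving(worst) = [];
    end
    % 'ranking' now lists the features from most to least relevant.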

  20. Scaling Factors • Idea: transform a discrete space into a continuous space, e.g. s = [s1, s2, s3, s4]. • Discrete indicators of feature presence: si ∈ {0, 1}. • Continuous scaling factors: si ∈ ℝ. • Now we can do gradient descent!
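
A minimal MATLAB sketch of gradient descent on scaling factors (the train/validation data Xtr, Ytr, Xva, Yva, the ridge model, and the step sizes are all assumptions for illustration, not the slide's method): each feature is multiplied by s(i), the model is refit for any s, and s is improved with a numerical gradient of the validation error.

    lambda = 0.1;  eta = 0.1;  delta = 1e-4;
    scale   = @(Z, s) Z .* repmat(s, size(Z, 1), 1);
    w_fit   = @(s) (scale(Xtr,s)'*scale(Xtr,s) + lambda*eye(numel(s))) \ (scale(Xtr,s)'*Ytr);
    val_err = @(s) mean((scale(Xva,s) * w_fit(s) - Yva).^2);
    s = ones(1, size(Xtr, 2));                 % start with every feature "on"
    for iter = 1:100
        g = zeros(size(s));
        for i = 1:numel(s)                     % finite-difference gradient in s
            sp = s;  sp(i) = sp(i) + delta;
            g(i) = (val_err(sp) - val_err(s)) / delta;
        end
        s = s - eta * g;                       % gradient step on the scaling factors
    end
    % Features whose scaling factor is driven toward 0 are effectively removed.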

  21. Formalism (chap. 5) • Many learning algorithms are cast as the minimization of a regularized functional: an empirical error term plus a regularization (capacity control) term. • This is the justification of RFE and many other embedded methods. • Next few slides: André Elisseeff.
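
The functional itself appeared as an equation image; a generic form consistent with the slide's labels (the loss L, regularizer Ω, trade-off λ and parameters α are assumed notation) is:

    \min_{\alpha}\;
    \underbrace{\sum_{k=1}^{m} L\big(f(\mathbf{x}_k;\alpha),\, y_k\big)}_{\text{empirical error}}
    \;+\;
    \underbrace{\lambda\,\Omega(\alpha)}_{\text{regularization (capacity control)}}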

  22. Embedded method • Embedded methods are a good inspiration for designing new feature selection techniques for your own algorithms: • Find a functional that represents your prior knowledge about what a good model is. • Add the scaling factors s into the functional and make sure it is either differentiable or that a sensitivity analysis can be performed efficiently. • Optimize alternately over the model parameters α and the scaling factors s. • Use early stopping (validation set) or your own stopping criterion to stop and select the subset of features. • Embedded methods are therefore not too far from wrapper techniques and can be extended to multiclass problems, regression, etc.

  23. The l1 SVM • A version of the SVM where the regularizer Ω(w) = ||w||² is replaced by the l1 norm Ω(w) = Σi |wi|. • Can be considered an embedded feature selection method: some weights are driven to zero, which tends to remove redundant features. • This differs from the regular SVM, where redundant features are kept. • Bi et al., 2003; Zhu et al., 2003.
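
In formulas, one common way to write the two objectives side by side (hinge loss, bias term omitted, trade-off λ assumed; this is my rendering, not the slide's):

    \text{standard SVM:}\quad \min_{\mathbf{w}} \sum_k \big[1 - y_k\,\mathbf{w}\cdot\mathbf{x}_k\big]_+ \;+\; \lambda \lVert \mathbf{w}\rVert_2^2
    \qquad
    \ell_1\ \text{SVM:}\quad \min_{\mathbf{w}} \sum_k \big[1 - y_k\,\mathbf{w}\cdot\mathbf{x}_k\big]_+ \;+\; \lambda \sum_i |w_i|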

  24. The l0 SVM • Replace the regularizer ||w||² by the l0 norm ||w||0 (the number of non-zero weights). • Further replace it by the smooth approximation Σi log(ε + |wi|). • This boils down to a multiplicative update algorithm (shown as a figure on the slide). • Weston et al., 2003.
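
The exact update was shown as a figure on the slide. Purely as an illustration of a multiplicative scheme in this spirit (a ridge model stands in for the SVM, and lambda and the number of iterations are assumed), one could write:

    z = ones(1, size(X, 2));  lambda = 0.1;                % per-feature scaling factors
    for iter = 1:20
        Xz = X .* repmat(z, size(X,1), 1);                 % rescale the features
        w  = (Xz'*Xz + lambda*eye(size(X,2))) \ (Xz'*Y);   % linear weights
        z  = z .* abs(w');                                 % multiplicative update
    end
    % Features whose z is driven to (near) zero are effectively eliminated.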

  25. Causality

  26. What can go wrong? Guyon-Aliferis-Elisseeff, 2007

  27. What can go wrong?

  28. What can go wrong? [Figure: two configurations of the variables X1, X2 and the target Y.] Guyon-Aliferis-Elisseeff, 2007

  29. http://clopinet.com/causality

  30. Wrapping up

  31. Bilevel optimization [Figure: the M × N data matrix (M samples, N variables/features) split into three sets of m1, m2 and m3 samples.] • Split the data into 3 sets: training, validation, and test. 1) For each feature subset, train the predictor on the training data. 2) Select the feature subset that performs best on the validation data (repeat and average if you want to reduce variance: cross-validation). 3) Test on the test data.
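
A minimal MATLAB sketch of the three steps (the list of candidate subsets, the helpers train_predictor and evaluate, and the splits Xtr/Xva/Xte are assumptions standing for your own predictor and protocol):

    best_score = -inf;
    for k = 1:numel(subsets)                       % e.g. the subsets visited by forward selection
        f = subsets{k};
        model = train_predictor(Xtr(:, f), Ytr);   % 1) train on training data
        score = evaluate(model, Xva(:, f), Yva);   % 2) compare on validation data
        if score > best_score
            best_score = score;  best_f = f;
        end
    end
    final_model = train_predictor(Xtr(:, best_f), Ytr);
    test_error  = evaluate(final_model, Xte(:, best_f), Yte);   % 3) used once, at the end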

  32. Complexity of Feature Selection • With high probability: Generalization_error ≤ Validation_error + ε(C/m2), where m2 is the number of validation examples, N the total number of features, n the feature subset size, and C reflects the complexity of the feature selection search. • Try to keep C of the order of m2. [Figure: error as a function of the feature subset size n.]

  33. CLOP http://clopinet.com/CLOP/ • CLOP = Challenge Learning Object Package. • Based on the Matlab® Spider package developed at the Max Planck Institute. • Two basic abstractions: • Data object • Model object • Typical script: • D = data(X,Y); % Data constructor • M = kridge; % Model constructor • [R, Mt] = train(M, D); % Train model => Mt • Dt = data(Xt, Yt); % Test data constructor • Rt = test(Mt, Dt); % Test the model
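
The same pattern extends to feature selection by chaining a filter with the classifier. The object names below (chain, s2n and the 'f_max' hyperparameter string) follow my recollection of the CLOP challenge models and should be treated as assumptions to check against the package documentation:

    D  = data(X, Y);                         % data constructor
    M  = chain({s2n('f_max=100'), kridge});  % keep the 100 top-ranked features, then kernel ridge
    [R, Mt] = train(M, D);                   % train the compound model => Mt
    Rt = test(Mt, data(Xt, Yt));             % test it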

  34. NIPS 2003 FS challenge http://clopinet.com/isabelle/Projects/ETH/Feature_Selection_w_CLOP.html

  35. Conclusion • Feature selection focuses on uncovering subsets of variables X1, X2, … that are predictive of the target Y. • Multivariate feature selection is in principle more powerful than univariate feature selection, but not always in practice. • Taking a closer look at the type of dependencies, in terms of causal relationships, may help refine the notion of variable relevance.

  36. Acknowledgements and references • 1) Feature Extraction: Foundations and Applications. I. Guyon et al., Eds. Springer, 2006. http://clopinet.com/fextract-book • 2) Causal feature selection. I. Guyon, C. Aliferis, A. Elisseeff. To appear in “Computational Methods of Feature Selection”, Huan Liu and Hiroshi Motoda, Eds., Chapman and Hall/CRC Press, 2007. http://clopinet.com/causality
