On the Stability of Feature Selection in the Presence of Correlations

Slides at: tinyurl.com/ecml2019stability
Konstantinos Sechidis, Konstantinos Papangelou, Sarah Nogueira, James Weatherall and Gavin Brown

Your Data Science Pipeline: Predictions, Conclusions, Investments. How much do you trust your feature choices? How reproducible is your research result?

How much do you trust your choices? x1, x2, x3, x4, x5, x6, x7, …, x499, x500 -> Your Data Science Pipeline: Feature Selection (any method) -> x1, x3, x5, x6, x493

How much do you trust your choices? x1, x2, x3, x4, x5, x6, x7, …, x499, x500 -> Your Data Science Pipeline: Feature Selection, Cross-validate -> x1, x3, x5, x6, x493

How much do you trust your choices? x1, x2, x3, x4, x5, x6, x7, …, x499, x500 -> Your Data Science Pipeline: Feature Selection, Cross-validate -> x1, x3, x5, x6, x493 (x491, x2). Drop a random 1% of examples. Will not make a difference… or will it? “Stability”

Stability of Feature Selection: “the change in the selected feature subset caused by tiny changes in the training data”. Nogueira et al, “On The Stability of Feature Selection”, JMLR 2018; ECML 2016 version at tinyurl.com/ecml2016stability (a few differences ECML -> JMLR).
“Selection vector”: over features z1 z2 z3 z4 z5 z6 z7 … z493 …, the subset [ x1, x3, x5, x6, x493 ] is coded as
[ 1 0 1 0 1 1 0 0 … 1 … ]

Stability: set intersection (i.e. number of features in common), e.g. Kalousis 2005, Kuncheva 2007, Lustgarden 2008, etc.
Repeat M times: I perturb my data, then select features. Rows s1, s2, s3, s4, …, sM over columns z1 z2 z3 z4 z5 z6 z7 … z493 …:
[ 1 0 1 0 1 1 0 0 … 1 … ]
[ 0 1 0 1 1 1 0 0 … 1 … ]
[ 1 0 0 0 0 0 0 0 … 1 … ]
[ 1 0 1 0 1 1 1 1 … 1 … ]
[ 0 1 0 1 0 0 0 0 … 1 … ]
[ 1 0 1 0 1 1 0 0 … 1 … ]
[ 0 1 0 0 0 1 0 0 … 1 … ]

Stability: set intersection (i.e. number of features in common), e.g. Kalousis 2005, Kuncheva 2007, Lustgarden 2008, etc.
Repeat M times: I perturb my data, then select features. Rows s1, s2, s3, s4, …, sM over columns z1 z2 z3 z4 z5 z6 z7 … z493 …:
[ 1 0 1 0 1 1 0 0 … 1 … ]
[ 0 1 0 1 1 1 0 0 … 1 … ]
[ 1 0 0 0 0 0 0 0 … 0 … ]
[ 1 0 1 0 1 1 1 1 … 1 … ]
[ 0 1 0 1 0 0 0 0 … 1 … ]
[ 1 0 1 0 1 1 0 0 … 1 … ]
[ 0 1 0 0 0 1 0 0 … 1 … ]
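The "repeat M times: perturb, then select" loop can be sketched in a few lines. This is only an illustration under stated assumptions: `select_top_k` is a hypothetical stand-in selector (ranking by absolute Pearson correlation with the target), and `selection_matrix`, `drop_frac` and `seed` are names invented here; the slides allow any feature selection method.

```python
import numpy as np

def select_top_k(X, y, k):
    """Hypothetical stand-in selector: top-k features by |Pearson correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom = np.where(denom == 0, 1.0, denom)  # constant columns get score 0
    score = np.abs(Xc.T @ yc) / denom
    return np.argsort(score)[-k:]

def selection_matrix(X, y, k, M=50, drop_frac=0.01, seed=0):
    """Build the M x d binary matrix Z: one selection vector per perturbed dataset."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_keep = n - max(1, int(round(drop_frac * n)))  # drop about 1% of examples
    Z = np.zeros((M, d), dtype=int)
    for i in range(M):
        keep = rng.choice(n, size=n_keep, replace=False)
        Z[i, select_top_k(X[keep], y[keep], k)] = 1
    return Z
```

Each row of the returned matrix plays the role of one of the vectors s1, …, sM above.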

Stability (Nogueira et al, “On The Stability of Feature Selection”, Journal of Machine Learning Research 2018). The measure is built from the probability of selecting feature f, the average number of features selected, and the total number of features.
Repeat M times: I perturb my data, then select features. Rows s1, s2, s3, s4, …, sM over columns z1 z2 z3 z4 z5 z6 z7 … z493 …:
[ 1 0 1 0 1 1 0 0 … 1 … ]
[ 0 1 0 1 1 1 0 0 … 1 … ]
[ 1 0 1 0 0 0 0 0 … 1 … ]
[ 1 0 1 0 1 1 1 1 … 1 … ]
[ 0 1 0 1 0 0 0 0 … 1 … ]
[ 1 0 1 0 1 1 0 0 … 1 … ]
[ 0 1 0 0 0 1 0 0 … 1 … ]
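The slide's formula did not survive extraction, so the following is a sketch of the estimator as published in the cited JMLR 2018 paper: one minus the average unbiased per-feature selection variance, normalised by the variance expected when the same average number of features is picked at random. The function name `stability` and the variable names are choices made here, not the authors' code.

```python
import numpy as np

def stability(Z):
    """Stability of an M x d binary selection matrix Z (rows = repeats)."""
    Z = np.asarray(Z, dtype=float)
    M, d = Z.shape
    p_hat = Z.mean(axis=0)             # probability of selecting each feature f
    k_bar = Z.sum(axis=1).mean()       # average number of features selected
    s2 = M / (M - 1) * p_hat * (1 - p_hat)   # unbiased variance per feature
    # Undefined (division by zero) if k_bar is 0 or d, i.e. nothing or everything selected
    return 1.0 - s2.mean() / ((k_bar / d) * (1 - k_bar / d))
```

A process that returns the same subset every time has zero variance in every column and therefore scores exactly 1; the value can drop below 0 when the selections disagree more than random choice would.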

Stability (Nogueira et al, “On The Stability of Feature Selection”, Journal of Machine Learning Research 2018). Columns z1 z2 z3 z4 z5 z6 z7 … z493 …:
[ 1 0 1 0 1 1 0 0 … 1 … ]
[ 1 0 1 0 1 1 0 0 … 1 … ]
[ 1 0 1 0 1 1 0 0 … 1 … ]
[ 1 0 1 0 1 1 0 0 … 1 … ]
[ 1 0 1 0 1 1 0 0 … 1 … ]
[ 1 0 1 0 1 1 0 0 … 1 … ]
[ 1 0 1 0 1 1 0 0 … 1 … ]
Stability = 1.0 … assuming large sample size M … usually M=50 is sufficient. Constant processes give stability ONE.

Stability (Nogueira et al, “On The Stability of Feature Selection”, Journal of Machine Learning Research 2018). Columns z1 z2 z3 z4 z5 z6 z7 … z493 …:
[ 0 0 1 0 1 1 0 0 … 1 … ]
[ 1 0 0 1 0 0 0 1 … 0 … ]
[ 1 0 1 0 1 1 0 0 … 1 … ]
[ 0 0 0 1 0 0 0 0 … 1 … ]
[ 1 0 1 0 0 1 0 1 … 0 … ]
[ 1 0 0 0 0 0 0 0 … 1 … ]
[ 1 0 1 0 1 1 0 0 … 0 … ]
Stability = 0.0 … assuming large sample size M … usually M=50 is sufficient. Random processes give stability ZERO.

Stability (Nogueira et al, “On The Stability of Feature Selection”, Journal of Machine Learning Research 2018). Columns z1 z2 z3 z4 z5 z6 z7 … z493 …:
[ 1 0 1 0 1 1 0 0 … 1 … ]
[ 0 1 0 1 1 1 0 0 … 1 … ]
[ 1 0 1 0 1 1 0 0 … 1 … ]
[ 1 0 1 0 1 1 0 0 … 1 … ]
[ 0 1 0 1 1 1 0 0 … 1 … ]
[ 1 0 1 0 1 1 0 0 … 1 … ]
[ 0 1 0 1 1 1 0 0 … 1 … ]
It’s alternating… Stability = 0.26. Not very stable? But what if I tell you…
- feature x1 is highly correlated with x2?
- feature x3 is measuring the same thing as x4?
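The intuition behind the two questions can be checked on the first eight columns shown on the slide. The sketch below is a simplified illustration, not the paper's estimator: it assumes x1 and x2 form one equivalence group and x3 and x4 another, and `collapse` and `groups` are names introduced here.

```python
import numpy as np

# The two alternating selections, restricted to the eight visible columns
a = [1, 0, 1, 0, 1, 1, 0, 0]
b = [0, 1, 0, 1, 1, 1, 0, 0]
Z = np.array([a, b, a, a, b, a, b])

# Assumed domain knowledge: x1 ~ x2 and x3 ~ x4 (0-based column indices)
groups = [[0, 1], [2, 3], [4], [5], [6], [7]]

def collapse(Z, groups):
    """A group counts as selected if any of its member features is selected."""
    return np.stack([Z[:, g].max(axis=1) for g in groups], axis=1)

G = collapse(Z, groups)
n_distinct_before = len(np.unique(Z, axis=0))  # distinct selections per feature
n_distinct_after = len(np.unique(G, axis=0))   # distinct selections per group
```

Once equivalent features are treated as one, the alternating process selects the same groups every time, which is the sense in which the raw score is too pessimistic.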

This paper proves:
• stability measures (any) will systematically underestimate stability in the presence of correlations, leading to an overly pessimistic estimate.
This paper provides:
• a correction that incorporates domain knowledge on feature equivalences and/or correlations, giving the effective stability.


Intersection vs. effective intersection. Columns z1 z2 z3 z4 z5 z6 z7 … z493 …:
s1 = [ 1 0 0 0 1 1 0 0 … 1 … ]
s2 = [ 0 1 0 0 1 1 0 0 … 1 … ]
Domain knowledge: features x1 and x2 are the same thing. This is coded as a binary matrix that marks which features are to be treated as the same, or as partial correlation, e.g. …
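The slide's own formula for the effective intersection is not recoverable from the transcript. One natural reading, assumed here, is a bilinear form: the plain intersection is the dot product of the two selection vectors, and the effective intersection inserts the domain-knowledge matrix between them. The matrix name `C` and the variable names are illustrative.

```python
import numpy as np

# The two selections from the slide, restricted to the eight visible columns
s1 = np.array([1, 0, 0, 0, 1, 1, 0, 0])
s2 = np.array([0, 1, 0, 0, 1, 1, 0, 0])

# Assumed coding: identity, plus a 1 for each pair treated as the same feature
d = len(s1)
C = np.eye(d, dtype=int)
C[0, 1] = C[1, 0] = 1  # x1 and x2 are the same thing

intersection = int(s1 @ s2)                # features literally in common
effective_intersection = int(s1 @ C @ s2)  # also credits x1 matched with x2
```

With `C` equal to the identity the two quantities coincide, so the plain intersection is the special case of no domain knowledge.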

Proposed by LASSO: [ 1 0 1 0 1 1 0 0 … 1 … ]
Proposed by Mut Info: [ 0 0 1 1 0 1 0 0 … 0 … ]
[Figure: accuracy (3-nn) against stability, both on a 0.0 to 1.0 scale, with points for LASSO and MIM.]
Accuracy/Stability is a trade-off. Victor Hamer, Pierre Dupont, “Explicit Control of Feature Relevance and Selection Stability Through Pareto Optimality”, IAL workshop, ECML 2019.

[Figure: two panels plotting accuracy (3-nn) against stability and against effective stability, each on a 0.0 to 1.0 scale, with points for LASSO and MIM.]

Accuracy vs Stability: Pareto-optimality. [Figure panels: Stability; Effective Stability.] Effective stability identifies a solution as more stable than expected.

Accuracy vs Stability: Pareto-optimality. Effective stability alters the ‘optimal’ choice of feature set in 7/10 datasets.
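How a change of stability measure can change the ‘optimal’ choice is easy to see with a small Pareto-front computation. The sketch below is generic: the method name "method C" and all accuracy and stability numbers are made up for illustration and are not results from the paper.

```python
def pareto_front(points):
    """Names of points not dominated in (accuracy, stability); higher is better on both."""
    front = []
    for name, acc, stab in points:
        dominated = any(
            a >= acc and s >= stab and (a > acc or s > stab)
            for other, a, s in points
            if other != name
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical (name, accuracy, stability) triples
by_stability = [("LASSO", 0.90, 0.30), ("MIM", 0.85, 0.60), ("method C", 0.80, 0.50)]
# Same accuracies, with hypothetical effective-stability scores
by_effective = [("LASSO", 0.90, 0.75), ("MIM", 0.85, 0.65), ("method C", 0.80, 0.55)]

front_before = pareto_front(by_stability)
front_after = pareto_front(by_effective)
```

In this toy case the front shrinks from two methods to one once a method previously judged unstable is credited for selecting equivalent features.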

Empirical Study: Stability of Biomarker Selection. Efficacy of gefitinib vs chemotherapy for lung cancer.

All EGFR gene mutations (known to play a role in NSCLC). Measure within-group stability to see what’s happening… Changes our view of the “best” algorithm to invest in.

Conclusions: A simple closed-form estimator for the effective stability, incorporating domain knowledge on feature correlations and equivalences. Empirically demonstrated on biomarker identification tasks; allows measurement of trust in data science pipelines.

Your Data Science Pipeline: Predictions, Conclusions, Investments. How much do you trust your data science pipeline? How reproducible / defendable are your decisions?

