
Correlation Aware Feature Selection



Presentation Transcript


  1. Correlation Aware Feature Selection Annalisa Barla, Cesare Furlanello, Giuseppe Jurman, Stefano Merler, Silvano Paoli http://mpa.itc.it Berlin – 8/10/2005

  2. Overview • On Feature Selection • Correlation Aware Ranking • Synthetic Example

  3. Feature Selection Step-wise variable selection: one feature vs. N features; n* < N effective variables modeling the classification function. [Diagram: the N features are reduced step by step, Step 1 … Step N, over N steps.]

  4. Feature Selection Step-wise selection of the features. [Diagram: at each step, the features are split into ranked features and discarded features.]

  5. Ranking • Classifier-independent filters (ignoring labelling): prefiltering is risky, since you might discard features that turn out to be important. • Ranking induced by a classifier.

  6. Support Vector Machines Classification function: f(x) = sign(w · x + b), the optimal separating hyperplane.

  7. The classification/ranking machine • The RFE idea: given N features (genes) • Train an SVM • Compute a cost function J from the weight coefficients of the SVM • Rank features in terms of their contribution to J • Discard the feature contributing least to J • Reapply the procedure on the remaining N-1 features. This is called Recursive Feature Elimination (RFE). Features are ranked according to their contribution to the classification, given the training data. Time- and data-consuming, and at risk of selection bias. Guyon et al. 2002
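A minimal sketch of the RFE loop described in this slide, assuming a linear-kernel SVM (here scikit-learn's SVC) and the squared weight coefficients as the per-feature contribution to J; the function and variable names are illustrative, not taken from the original software.

```python
# Minimal sketch of SVM-RFE (Guyon et al. 2002), assuming scikit-learn.
# Names (X, y, n_keep) are illustrative, not from the original slides.
import numpy as np
from sklearn.svm import SVC

def svm_rfe(X, y, n_keep=1):
    """Return surviving feature indices and the elimination order."""
    remaining = list(range(X.shape[1]))
    eliminated = []                        # features in order of elimination
    while len(remaining) > n_keep:
        svm = SVC(kernel="linear").fit(X[:, remaining], y)
        w = svm.coef_.ravel()              # weights of the separating hyperplane
        J = w ** 2                         # per-feature contribution to the cost
        worst = int(np.argmin(J))          # feature contributing least to J
        eliminated.append(remaining.pop(worst))
        # reapply the procedure on the remaining N-1 features
    return remaining, eliminated
```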

  8. RFE-based Methods Eliminating chunks of features at a time: • Parametric • Sqrt(N)-RFE • Bisection-RFE • Non-parametric • E-RFE (adapting to the weight distribution): thresholding weights to a value w*
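A sketch of one E-RFE elimination step, where every feature whose weight falls below a threshold w* is dropped in a single chunk; how w* adapts to the weight distribution is not specified here, so the mean-based rule below is only an illustrative assumption.

```python
# Sketch of one E-RFE elimination step: all features below w* are dropped together.
# Setting w* as a fraction of the mean absolute weight is an illustrative assumption.
import numpy as np

def erfe_step(weights, remaining, frac=0.5):
    """Split the surviving features into those kept and those dropped at this step."""
    w_abs = np.abs(np.asarray(weights))
    w_star = frac * w_abs.mean()                     # adaptive threshold w*
    keep = [f for f, w in zip(remaining, w_abs) if w >= w_star]
    drop = [f for f, w in zip(remaining, w_abs) if w < w_star]
    return keep, drop
```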

  9. Variable Elimination Correlated genes. Given a family F = {x1, x2, …, xH} of pairwise correlated features (correlation above a given threshold T). Each single weight is negligible, w(x1) ≈ w(x2) ≈ … ≈ ε < w*, BUT w(x1) + w(x2) + … >> w*.
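A sketch of the kind of correction this observation suggests: before discarding a low-weight feature, check whether it belongs to a family F of correlated features whose summed weight exceeds w*. The correlation threshold T and the rescue policy below are assumptions for illustration, not the exact algorithm of the slides.

```python
# Sketch of a correlation-aware check at one elimination step. The threshold T
# and the "rescue the whole family" policy are illustrative assumptions.
import numpy as np

def correlation_correction(X, weights, below_threshold, w_star, T=0.9):
    """Return indices in `below_threshold` rescued because their family weight exceeds w*."""
    corr = np.abs(np.corrcoef(X, rowvar=False))      # feature-feature correlation matrix
    rescued = []
    for i in below_threshold:
        family = np.where(corr[i] > T)[0]            # F = {x1, ..., xH}, pairwise corr > T
        if weights[family].sum() > w_star:           # each weight ~ eps, but the sum is large
            rescued.append(i)
    return rescued
```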

  10. Correlated Genes (1)

  11. Correlated Genes (2)

  12. Synthetic Data Binary problem: 100 (50 + 50) samples of 1000 genes:
  • genes 1–50: randomly extracted from N(1,1) and N(-1,1) for the two classes, respectively
  • genes 51–100: randomly extracted from N(1,1) and N(-1,1) respectively (1 feature repeated 50 times)
  • genes 101–1000: extracted from Unif(-4,4)
  Class 1: 50 samples, Class 2: 50 samples; 51 significant features.
  [Diagram: gene layout 1–1000 showing the N(1,1)/N(-1,1) block, the single repeated feature, and the Unif(-4,4) block.]
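A sketch reproducing the synthetic dataset as described above, assuming the two classes differ only in the mean of the informative genes and that genes 51–100 are copies of a single informative draw; the random seed and variable names are arbitrary.

```python
# Sketch of the synthetic dataset: 100 samples x 1000 genes, two balanced classes.
import numpy as np

rng = np.random.default_rng(0)
n_per_class, n_genes = 50, 1000
y = np.r_[np.ones(n_per_class), -np.ones(n_per_class)]            # class labels +1 / -1

X = rng.uniform(-4, 4, size=(2 * n_per_class, n_genes))           # genes 101-1000: Unif(-4,4) noise
X[:, :50] = rng.normal(loc=y[:, None], scale=1, size=(100, 50))   # genes 1-50: N(+1,1) / N(-1,1) by class
repeated = rng.normal(loc=y, scale=1)                              # one informative feature...
X[:, 50:100] = np.tile(repeated[:, None], 50)                      # ...repeated 50 times as genes 51-100
```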

  13. Our algorithm step j

  14. Methodology • Implemented within the BioDCV system (50 replicates) • Realized through R - C code interaction

  15. Synthetic Data [Plot: feature ranking over 50 steps, genes 1–1000.] Gene 100 is consistently ranked 2nd.

  16. Work in Progress • Preservation of highly correlated genes with low initial weights on microarray datasets • Robust correlation measures • Different techniques to detect the F_l families (clustering, gene functions)

  17. Synthetic Data

  18. Synthetic Data Features discarded at step 9 by the E-RFE procedure: 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 227 559 864 470 363 735. Correlation Correction: saves feature 100.

  19. Challenges for predictive profiling • INFRASTRUCTURE • MPACluster -> available for batch jobs • Connecting with IFOM -> 2005 • Running at IFOM -> 2005/2006 • Production on GRID resources (spring 2005) • ALGORITHMS II • Gene list fusion: suite of algebraic/statistical methods • Prediction over multi-platform gene expression datasets (sarcoma, breast cancer): large-scale semi-supervised analysis • New SVM kernels for prediction on spectrometry data within complete validation

  20. A few issues in feature selection, with a particular interest in classification of genomic data.
  WHY?
  • To enhance information: highlight (and rank) the most important features and improve the knowledge of the underlying process.
  • To ease the computational burden: discard the (apparently) less significant features and train in a simplified space, alleviating the curse of dimensionality.
  HOW?
  • As a pre-processing step: employ a statistical filter (t-test, S2N) — see the filter sketch below.
  • As a learning step: link the feature ranking to the classification task (wrapper methods, …).
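As referenced above, a sketch of a classifier-independent pre-processing filter based on the signal-to-noise score; keeping a fixed top_k is an illustrative assumption, not a recommendation from the slides.

```python
# Sketch of a simple pre-processing filter (signal-to-noise score per feature).
import numpy as np

def s2n_scores(X, y):
    """Signal-to-noise ratio per feature for a binary labelling y in {+1, -1}."""
    pos, neg = X[y == 1], X[y == -1]
    return (pos.mean(0) - neg.mean(0)) / (pos.std(0) + neg.std(0))

def prefilter(X, y, top_k=100):
    """Keep the top_k features by absolute S2N score (classifier-independent, ignores the model)."""
    scores = np.abs(s2n_scores(X, y))
    return np.argsort(scores)[::-1][:top_k]
```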

  21. Prefiltering is risky: you might discard features that turn out to be important. Nevertheless, wrapper methods are quite costly. Moreover, in gene expression data you also have to deal with particular situations such as clones or highly correlated features, which may represent a pitfall for several selection methods. A classic alternative is to map into linear combinations of features, and then select: • Principal Component Analysis • Metagenes (a simplified model for pathways, but biological suggestions require caution) • eigen-craters for unexploded-bomb risk maps. But then we are no longer working with the original features.
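A sketch of the "map into linear combinations, then select" alternative, using PCA and ranking components by absolute correlation with the labels; both the number of components and the ranking criterion are illustrative assumptions.

```python
# Sketch: project onto principal components and rank the components instead of
# the original features (so the selected variables are no longer original genes).
import numpy as np
from sklearn.decomposition import PCA

def pca_then_rank(X, y, n_components=20):
    """Rank principal components by absolute correlation with the binary labels."""
    Z = PCA(n_components=n_components).fit_transform(X)        # linear combinations of features
    corr = [abs(np.corrcoef(Z[:, j], y)[0, 1]) for j in range(Z.shape[1])]
    return np.argsort(corr)[::-1]                               # most label-correlated components first
```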

  22. Feature Selection within Complete Validation Experimental Setups Complete validation is needed to decouple model tuning from (ensemble) model accuracy estimation: otherwise, selection bias effects … Accumulating relative importance from Random Forest models for the identification of sensory drivers (with P. Granitto, IASMA).
