Feature Selection using Mutual Information
SYDE 676 Course Project
Eric Hui
November 28, 2002
Outline
• Introduction … prostate cancer project
• Definition of ROI and Features
• Estimation of PDFs … using Parzen Density Estimation
• Feature Selection … using MI Based Feature Selection
• Evaluation of Selection … using Generalized Divergence
• Conclusions
Features as Mapping Functions
• Mapping from image space to feature space…
Parzen Density Estimation
• Histogram bins: bad estimation with the limited data available!
• Parzen density estimation: a reasonable approximation with limited data.
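The contrast on this slide can be made concrete with a short sketch. The following is a minimal illustration, not the project's code: a Parzen estimate with a Gaussian window, where the function name, the window width h = 0.5, and the small sample size are all assumptions chosen for demonstration.

```python
import numpy as np

def parzen_density(samples, x_grid, h=0.5):
    """Parzen (kernel) density estimate with a Gaussian window:
    p_hat(x) = (1 / (N * h)) * sum_n K((x - x_n) / h),
    where K is the standard normal kernel."""
    samples = np.asarray(samples, dtype=float)
    diffs = (x_grid[:, None] - samples[None, :]) / h
    kernel = np.exp(-0.5 * diffs**2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (len(samples) * h)

# Compare against a histogram built from the same limited data.
rng = np.random.default_rng(0)
data = rng.normal(size=30)                      # small sample, as on the slide
grid = np.linspace(-4.0, 4.0, 200)
smooth_pdf = parzen_density(data, grid, h=0.5)  # smooth, well-behaved estimate
hist_pdf, edges = np.histogram(data, bins=10, density=True)  # blocky estimate
```

A smaller window width h follows the data more closely but becomes noisy with few samples; the point of the slide is that the smooth kernel estimate degrades more gracefully than a coarse histogram when data are limited.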
Features
• Gray-Level Difference Matrix (GLDM): Contrast, Mean, Entropy, Inverse Difference Moment (IDM), Angular Second Moment (ASM)
• Fractal Dimension (FD)
• Linearized Power Spectrum: Slope, Y-Intercept
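To show how the GLDM features listed above can be computed, here is a hedged sketch using the common textbook formulas; the function name, the displacement (dx, dy) = (1, 0), and the number of gray levels are assumptions, since the project's exact parameters are not stated on the slides.

```python
import numpy as np

def gldm_features(img, dx=1, dy=0, levels=256):
    """Gray-Level Difference Matrix features for one displacement (dx, dy).

    Builds the normalized histogram p(i) of absolute gray-level
    differences |I(r, c) - I(r + dy, c + dx)|, then derives the five
    texture features named on the slide."""
    img = np.asarray(img, dtype=int)
    h, w = img.shape
    a = img[0:h - dy, 0:w - dx]          # reference pixels
    b = img[dy:h, dx:w]                  # displaced pixels
    diff = np.abs(a - b).ravel()
    p = np.bincount(diff, minlength=levels).astype(float)
    p /= p.sum()                         # normalize to a probability vector
    i = np.arange(levels)
    nz = p > 0                           # avoid log(0) in the entropy term
    return {
        "contrast": float(np.sum(i**2 * p)),
        "mean": float(np.sum(i * p)),
        "entropy": float(-np.sum(p[nz] * np.log2(p[nz]))),
        "idm": float(np.sum(p / (1.0 + i**2))),
        "asm": float(np.sum(p**2)),
    }
```

The fractal dimension and the slope/intercept of the linearized power spectrum would be computed separately (e.g., from a log-log fit of the radially averaged spectrum); they are omitted here to keep the sketch focused.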
Entropy and Mutual Information
• Mutual information I(C;X) measures the degree of interdependence between X and C.
• Entropy H(C) measures the degree of uncertainty of C.
• I(X;C) = H(C) – H(C|X).
• I(X;C) ≤ H(C), i.e., H(C) is an upper bound on the mutual information.
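To make the identity I(X;C) = H(C) – H(C|X) concrete, here is a minimal sketch that estimates both quantities from samples by discretizing a continuous feature; the function names, the 16-bin discretization, and base-2 logarithms are illustrative assumptions, not the project's code.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mi_feature_class(x, c, bins=16):
    """Estimate I(X;C) = H(C) - H(C|X) by binning the feature x.
    `c` must hold integer class labels 0..K-1."""
    x = np.asarray(x, dtype=float)
    c = np.asarray(c)
    edges = np.histogram_bin_edges(x, bins=bins)
    xb = np.digitize(x, edges[1:-1])           # bin index in 0..bins-1
    joint = np.zeros((bins, c.max() + 1))
    np.add.at(joint, (xb, c), 1.0)
    joint /= joint.sum()                        # joint P(X, C)
    p_x = joint.sum(axis=1)                     # marginal P(X)
    p_c = joint.sum(axis=0)                     # marginal P(C)
    h_c = entropy(p_c)
    # H(C|X) = sum_x P(x) * H(C | X = x)
    h_c_given_x = sum(p_x[i] * entropy(joint[i] / p_x[i])
                      for i in range(bins) if p_x[i] > 0)
    return h_c - h_c_given_x                    # always <= h_c, per the bound
```

Because H(C|X) is non-negative, the returned value can never exceed H(C), which is exactly the upper bound stated above.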
Interdependence between Features
• Expensive to compute all features.
• Some features might be similar to each other.
• Thus, we also need to measure the interdependence between features: I(Xi; Xj).
Mutual Information Based Feature Selection (MIFS)
• Select the first feature with the highest I(C;X).
• Select each subsequent feature X with the highest I(C;X) – β Σs∈S I(X;Xs), where S is the set of features already selected.
• Repeat until the desired number of features is selected.
Mutual Information Based Feature Selection (MIFS)
• This method takes into account both:
• the interdependence between the class and the features, and
• the interdependence among the selected features.
• The parameter β controls how strongly the interdependence between selected features is penalized.
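A greedy implementation of this selection loop might look like the following sketch (a Battiti-style MIFS; the helper names and the 16-bin discretization of the features are assumptions for illustration):

```python
import numpy as np

def _discretize(x, bins=16):
    """Map a continuous feature to integer bin indices 0..bins-1."""
    edges = np.histogram_bin_edges(x, bins=bins)
    return np.digitize(x, edges[1:-1])

def _mi(a, b):
    """Mutual information (bits) between two discrete integer sequences."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])))

def mifs(X, c, n_select, beta=0.5, bins=16):
    """Greedy MIFS: X is (n_samples, n_features); c holds integer labels.
    Picks argmax I(C;X) first, then repeatedly picks the feature that
    maximizes I(C;X) - beta * sum_{s in S} I(X; Xs)."""
    c = np.asarray(c)
    d = [_discretize(X[:, j], bins) for j in range(X.shape[1])]
    relevance = [_mi(dj, c) for dj in d]
    selected = [int(np.argmax(relevance))]
    while len(selected) < n_select:
        scores = {j: relevance[j] - beta * sum(_mi(d[j], d[s]) for s in selected)
                  for j in range(X.shape[1]) if j not in selected}
        selected.append(max(scores, key=scores.get))
    return selected   # e.g., mifs(X, labels, n_select=2, beta=0.5)
```

With β = 0 the redundancy term vanishes and the loop reduces to ranking features by I(C;X) alone; larger β increasingly favours features that are independent of those already selected, which is the behaviour illustrated on the next slide.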
Varying β in MIFS
[Figure: MIFS applied to the candidate features {X1, X2, X3, …, X8} with β = 0, 0.5, and 1. Each value of β yields a different two-feature subset ({X2, X7}, {X2, X4}, or {X2, X3}); the first selected feature, X2, is the same in every case.]
Generalized Divergence J
• If the features are "biased" towards a class, J is large.
• A good set of features should therefore have a small J.
Results: J with respect to β
• First feature selected: GLDM ASM
• Second feature selected: …
Conclusions
• Mutual Information Based Feature Selection (MIFS): selects features that maximize the interdependence I(C;X) with the class C while minimizing the interdependence I(Xi;Xj) among the selected features, with β trading off the two goals.
• Generalized Divergence: evaluates the selected feature set; a good set of features yields a small J.