Feature Selection using Mutual Information. SYDE 676 Course Project Eric Hui November 28, 2002. Outline. Introduction … prostate cancer project Definition of ROI and Features Estimation of PDFs … using Parzen Density Estimation Feature Selection … using MI Based Feature Selection
Feature Selection using Mutual Information SYDE 676 Course Project Eric Hui November 28, 2002
Outline • Introduction … prostate cancer project • Definition of ROI and Features • Estimation of PDFs … using Parzen Density Estimation • Feature Selection … using MI Based Feature Selection • Evaluation of Selection … using Generalized Divergence • Conclusions
Features as Mapping Functions • Mapping from image space to feature space…
Parzen Density Estimation • Histogram bins: bad estimation with limited data available. • Parzen density estimation: reasonable approximation with limited data.
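A minimal sketch of a one-dimensional Parzen estimate, to make the idea concrete. The Gaussian kernel, the window width `h`, and the sample values are assumptions chosen for illustration; the slides do not state which kernel or width the project used.

```python
import math

def parzen_pdf(x, samples, h):
    """Parzen-window estimate of p(x) using a Gaussian kernel of width h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

# Hypothetical feature values measured on a handful of regions.
samples = [0.9, 1.1, 1.3, 2.8, 3.0]
h = 0.3  # assumed window width

# The estimate is a smooth density: it integrates to about 1 ...
step = 0.01
grid = [i * step for i in range(-500, 1001)]
area = sum(parzen_pdf(x, samples, h) for x in grid) * step

# ... and it is higher near clusters of samples than in the gap between them.
dense = parzen_pdf(1.0, samples, h)
sparse = parzen_pdf(2.0, samples, h)
```

Each sample contributes one smooth bump, so the estimate stays usable even with few samples, whereas a histogram with the same data would leave most bins empty.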
Features • Gray-Level Difference Matrix (GLDM): Contrast, Mean, Entropy, Inverse Difference Moment (IDM), Angular Second Moment (ASM) • Fractal Dimension: FD • Linearized Power Spectrum: Slope, Y-Intercept
Entropy and Mutual Information • Mutual Information I(C;X) measures the degree of interdependence between X and C. • Entropy H(C) measures the degree of uncertainty of C. • I(C;X) = H(C) – H(C|X). • I(C;X) ≤ H(C), so H(C) is an upper bound on the mutual information.
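The identities above can be checked numerically. The sketch below assumes a discrete class C and a discretized feature X described by a joint probability table; the table values are hypothetical, and the function names are introduced here for illustration only.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution p."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def mutual_information(joint):
    """I(C;X) in bits, where joint[c][x] = P(C=c, X=x)."""
    p_c = [sum(row) for row in joint]
    p_x = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for c, row in enumerate(joint):
        for x, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (p_c[c] * p_x[x]))
    return mi

def conditional_entropy(joint):
    """H(C|X) = sum over x of P(x) * H(C | X=x)."""
    p_x = [sum(col) for col in zip(*joint)]
    h = 0.0
    for x, px in enumerate(p_x):
        if px > 0:
            h += px * entropy([row[x] / px for row in joint])
    return h

# Hypothetical joint distribution of two classes and a two-level feature.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

h_c = entropy([sum(row) for row in joint])  # H(C)
h_c_given_x = conditional_entropy(joint)    # H(C|X)
mi = mutual_information(joint)              # I(C;X)

# An independent table carries no information about the class.
mi_independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])
```

Here `mi` equals `h_c - h_c_given_x` and never exceeds `h_c`, which is the upper bound stated on the slide.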
Interdependence between Features • Expensive to compute all features. • Some features might be similar to each other. • Thus, we need to measure the interdependence between features: I(Xi; Xj)
Mutual Information Based Feature Selection (MIFS) • Select first feature with highest I(C;X). • Select next feature Xi with highest: I(C;Xi) − β Σ I(Xi;Xs), summed over the already-selected features Xs in S. • Repeat until a desired number of features are selected.
Mutual Information Based Feature Selection (MIFS) • This method takes into account both: • the interdependence between class and features, and • the interdependence between selected features. • The parameter β controls how strongly interdependence with already-selected features is penalized: β = 0 ignores it, and larger β favours features that differ from those already chosen.
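A minimal sketch of the greedy selection loop, assuming the standard MIFS score I(C;Xi) − β Σ I(Xi;Xs) over already-selected features Xs. The mutual information values are taken as precomputed inputs, and the numbers below are hypothetical, not results from the project.

```python
def mifs(relevance, redundancy, beta, k):
    """Greedy MIFS sketch.

    relevance[i]     : I(C; Xi), class-feature mutual information
    redundancy[i][j] : I(Xi; Xj), feature-feature mutual information
    beta             : penalty weight on redundancy with selected features
    k                : number of features to select
    Returns the indices of the selected features, in selection order.
    """
    selected = []
    remaining = list(range(len(relevance)))
    while remaining and len(selected) < k:
        def score(i):
            penalty = sum(redundancy[i][s] for s in selected)
            return relevance[i] - beta * penalty
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical values: features 0 and 1 are both informative but nearly
# redundant with each other; feature 2 is less informative but distinct.
relevance = [0.50, 0.45, 0.30]
redundancy = [[0.00, 0.40, 0.05],
              [0.40, 0.00, 0.05],
              [0.05, 0.05, 0.00]]

picked_beta0 = mifs(relevance, redundancy, beta=0.0, k=2)
picked_beta1 = mifs(relevance, redundancy, beta=1.0, k=2)
```

With β = 0 the second pick is simply the next most relevant feature; with β = 1 the redundant feature is penalized and the distinct one is chosen instead. The first pick is the same in both cases because no penalty applies while S is empty.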
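Varying β in MIFS • [Figure: from the candidate set {X1, X2, X3, …, X8}, MIFS selects a different two-feature set S depending on β. Values shown: β = 1, β = 0, β = 0.5. Sets shown: S = {X2, X7}, S = {X2, X4}, S = {X2, X3}.]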
Generalized Divergence J • If the features are “biased” towards a class, J is large. • A good set of features should have small J.
Results: J with respect to β • First feature selected: GLDM ASM • Second feature selected: …
Conclusions • Mutual Info. Based Feature Selection (MIFS): maximize the interdependence between the class C and the features X1, X2, …, XN; minimize the interdependence between selected features. • Generalized Divergence: