A Combinatorial Fusion Method for Feature Mining
Ye Tian, Gary Weiss, D. Frank Hsu, Qiang Ma
Fordham University
Presented by Gary Weiss
Introduction
• Feature construction/engineering is often a critical step in the data mining process
• It can be very time-consuming and may require a lot of manual effort
• Our approach uses a combinatorial method to automatically construct new features
• We refer to this as "feature fusion"
• It is geared toward helping to predict rare classes
• For now it is restricted to numerical features, but it can be extended to other feature types
How does this relate to MMIS?
• One MMIS category is local pattern analysis
  – How to efficiently identify quality knowledge from a single data source
  – It lists data preparation and selection as subtopics, and also mentions fusion
• We acknowledge that this work is probably not what most people think of as MMIS
How can we view this work as MMIS?
• Think of each feature as a piece of information
  – Our fusion approach integrates these pieces
• Fusion itself is a proper topic for MMIS, since it can also be used with multiple information sources
  – The fusion method we employ does not really care whether the information (i.e., the features) comes from a single source
• As the complexity of the constructed features increases, each can be viewed as a classifier
  – Each fused feature is then an information source
  – This view is bolstered by other work on data fusion that uses ensembles to combine the fused features
Description of the Method
1. A data set is a collection of records in which each feature has a score
   • We assume numerical features
2. Replace the scores by ranks
   • The ordering of the ranks is determined by whether larger or smaller scores better predict the minority class
3. Compute the performance of each feature
4. Compute the performance of feature combinations
5. Decide which combinations to evaluate/use
Steps 1–2 are sketched below.
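A minimal sketch of steps 1 and 2 (replacing scores with ranks), assuming the data sits in a pandas DataFrame; the helper name, column names, and the per-feature direction map are illustrative assumptions, not code from the paper:

```python
import pandas as pd

def scores_to_ranks(df, features, higher_is_better):
    """Steps 1-2: replace each feature's scores with ranks, where
    rank 1 is the value that best predicts the minority class."""
    ranked = df.copy()
    for f in features:
        # If larger scores predict the minority class, the largest
        # score gets rank 1; otherwise the smallest does.
        ranked[f] = df[f].rank(ascending=not higher_is_better[f],
                               method="first")
    return ranked

# Toy data: F1 predicts the minority class with larger scores,
# F2 with smaller scores.
data = pd.DataFrame({"F1": [0.9, 0.2, 0.5, 0.7],
                     "F2": [3.1, 1.0, 2.2, 0.4],
                     "class": [1, 0, 1, 0]})
ranks = scores_to_ranks(data, ["F1", "F2"],
                        {"F1": True, "F2": False})
```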
Step 3: Compute Feature Performance
• Performance measures how well a feature predicts the minority class
• We sort the rows by feature rank and measure performance on the top n%, where n% is the fraction of records belonging to the minority class
• In this example we evaluate the top 3 rows; since 2 of the 3 are minority (class = 1), performance = 2/3 ≈ .67
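Continuing the sketch above, step 3 might look like this (the `class` column name and the `feature_performance` helper are assumptions):

```python
def feature_performance(df, feature, minority_label=1):
    """Step 3: fraction of minority-class records among the top-ranked
    rows, where the cutoff equals the number of minority records."""
    n = int((df["class"] == minority_label).sum())
    top = df.sort_values(feature).head(n)  # rank 1 sorts first
    return (top["class"] == minority_label).mean()
```

On the slide's example, the top 3 rows contain 2 minority records, so the performance is 2/3.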
Step 4: Compute Performance of Feature Combinations
• Let F6 be the fusion of F1, F2, F3, F4, and F5
• The rank combination function is the average of the ranks
• Compute the rank of F6 for each record
• Compute the performance of F6 as in step 3
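A hedged sketch of step 4: the fused feature's rank is derived from the average of its components' ranks, then scored exactly as in step 3 (`fuse` is an illustrative name):

```python
def fuse(df, features, name):
    """Step 4: combine features by averaging their ranks, then re-rank
    the averages so the fused feature is itself a rank column."""
    fused = df.copy()
    avg = df[list(features)].mean(axis=1)   # rank combination: average
    fused[name] = avg.rank(method="first")  # back to ranks
    return fused

# F6 = fusion of F1..F5, scored as in step 3 (assuming the frame
# `ranks` holds five rank columns F1..F5):
# data6 = fuse(ranks, ["F1", "F2", "F3", "F4", "F5"], "F6")
# perf6 = feature_performance(data6, "F6")
```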
Step 5: What Combinations to Use?
• Given n features there are 2^n − 1 possible combinations
  – C(n,1) + C(n,2) + … + C(n,n)
• This "fully exhaustive" fusion strategy is only practical for small values of n
• We try other strategies for when it is not feasible
  – The k-exhaustive strategy selects the k best features and tries all of their combinations
  – The k-fusion strategy uses all n features but fuses at most k features at a time
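Both reduced strategies are straightforward to enumerate with itertools (the function names are illustrative):

```python
from itertools import combinations

def k_exhaustive_candidates(best_k_features):
    """k-exhaustive: all 2^k - 1 non-empty combinations of the
    k best-performing features."""
    k = len(best_k_features)
    return [c for r in range(1, k + 1)
            for c in combinations(best_k_features, r)]

def k_fusion_candidates(features, k):
    """k-fusion: all n features, but at most k fused at a time."""
    return [c for r in range(2, k + 1)
            for c in combinations(features, r)]
```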
Combinatorial Fusion Algorithm
• The combinatorial strategy generates the candidate fused features
• The performance metric determines which candidates are best
  – It is used to determine which k features to use for k-fusion
  – It is also used to determine the order in which features are added
• We add a feature only if it leads to a statistically significant improvement (p ≤ .10)
  – As measured on validation data
  – This limits the number of features, but requires a lot of computation
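The selection loop might be organized as below. This is only a sketch: `build_and_score` (train a classifier and return its validation AUC) and `p_value` (a significance test comparing two AUC estimates) are hypothetical stand-ins for the paper's actual procedure.

```python
def select_features(base_features, candidates, build_and_score, p_value):
    """Greedily add fused features, best-first, keeping each one only
    if it gives a statistically significant gain (p <= .10) on the
    validation data."""
    selected = list(base_features)
    best_auc = build_and_score(selected)
    for fused in candidates:  # candidates ordered by fusion performance
        auc = build_and_score(selected + [fused])
        if auc > best_auc and p_value(auc, best_auc) <= 0.10:
            selected.append(fused)
            best_auc = auc
    return selected
```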
Description of Experiments
• We use Weka's decision tree (DT), 1-NN, and Naïve Bayes methods
• We analyze performance on 10 data sets, with and without the fused features
• We focus on AUC as the main metric
  – It is more appropriate than accuracy, especially with skewed data
• We use 3 combinatorial fusion strategies: 2-fusion, 3-fusion, and 6-exhaustive
Results Summary
[Tables omitted: results over all 10 data sets, and results over the 4 most skewed data sets (< 10% minority)]
Discussion of Results
• None of the 3 fusion schemes is clearly best
• The methods seem to help, but the biggest improvement is clearly with the DT method
  – This may be explained by traditional DT methods having limited expressive power, since they can only consider 1 feature at a time
  – They can never perfectly learn simple concepts like F1 + F2 > 10, but they can with feature fusion
• The improvement is bigger for highly skewed data sets
  – Identifying rare cases is difficult and may require looking at many features in parallel
Future Work
• More comprehensive experiments
  – More data sets, more skewed data sets, more combinatorial fusion strategies
• Use of heuristics to choose fused features more intelligently
  – The performance measure is currently used only to order the candidates
  – Use of diversity measures
• Avoid building a classifier to determine which fused features to add
• Handle non-numerical features
Conclusion
• We showed how a method from information fusion can be applied to feature construction
• The results are encouraging, but more study is needed
• Extending the method should lead to further improvements