Adaptive Cluster Ensemble Selection Javad Azimi, Xiaoli Fern {azimi, xfern}@eecs.oregonstate.edu Oregon State University Presenter: Javad Azimi.
Cluster Ensembles [Pipeline] Data Set → set up different clustering methods (Clustering 1, Clustering 2, …, Clustering n) → generate different results (Result 1, Result 2, …, Result n) → Consensus Function combines them to obtain the final clusters.
Cluster Ensembles: Challenge • One can easily generate hundreds or thousands of clustering results. • Is it good to always include all clustering results in the ensemble? • We may want to be selective. • Which subset is the best?
What makes a good ensemble? • Diversity • Members should be different from each other • Measured by Normalized Mutual Information (NMI) • Select a subset of ensemble members based on diversity: • Hadjitodorov et al. 2005: Ensemble with median diversity usually works better. • Fern and Lin 2008: Cluster ensemble members into distinct groups and then choose one from each group.
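Below is a minimal sketch of how diversity could be quantified with NMI, assuming scikit-learn is available; the helper name and the choice of averaging NMI over all member pairs are illustrative, not the exact measure used in the cited work.

```python
# Minimal sketch: ensemble diversity as the average pairwise NMI between member
# clusterings (higher NMI = members are more similar, i.e. less diverse).
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def average_pairwise_nmi(labelings):
    """labelings: list of 1-D arrays of cluster ids over the same data points."""
    n = len(labelings)
    scores = [normalized_mutual_info_score(labelings[i], labelings[j])
              for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(scores))
```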
Diversity in Cluster Ensembles: Drawback • Existing approaches design selection heuristics without considering the characteristics of the data sets and ensembles. • Our goal: select adaptively based on the behavior of the data set and the ensemble itself.
Our Approach • We empirically examined the behavior of the ensembles and the clustering performance on 4 different data sets. • We used these four training sets to learn an adaptive strategy. • We evaluated the learned strategy on separate test data sets. • The 4 training data sets: Iris, Soybean, Wine, Thyroid.
An Empirical Investigation • Generate a large ensemble • 100 independent runs of two different algorithms (K-means and MSF) • Analyze the diversity of the generated ensemble • Generate a final result P* based on all ensemble members • Compute the NMI between ensemble members and P* • Examine the distribution of the diversity • Consider different potential subsets selected based on diversity and evaluate their clustering performance
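An illustrative sketch of this investigation loop, shown for the K-means runs only (the MSF-based members are not sketched here); the function names are placeholders, and `p_star` can be any consensus partition over the full ensemble.

```python
# Sketch: generate a K-means ensemble and profile each member's NMI with P*.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def build_kmeans_ensemble(X, k, n_runs=100, seed=0):
    rng = np.random.RandomState(seed)
    return [KMeans(n_clusters=k, n_init=1,
                   random_state=rng.randint(1 << 30)).fit_predict(X)
            for _ in range(n_runs)]

def nmi_profile(members, p_star):
    # Diversity profile: NMI of every ensemble member against the consensus P*.
    return np.array([normalized_mutual_info_score(m, p_star) for m in members])
```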
Observation #1 • There are two distinct types of ensembles: • Stable: most ensemble members are similar to P*. • Unstable: most ensemble members are different from P*. [Histogram: x-axis = NMI with P*, y-axis = # of ensembles, showing the stable and unstable cases.]
Consider Different Subsets • Compute the NMI between each member and P*. • Sort the NMI values. • Consider 4 different subsets. [Members sorted by NMI with P*: low-diversity subset (L), medium-diversity subset (M), high-diversity subset (H).]
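One way this subset construction could look in code; the equal-thirds split is an assumption made for illustration, since the slides only state that members are sorted by their NMI with P*.

```python
# Sketch of the subset construction over members sorted by NMI with P*.
import numpy as np

def diversity_subsets(members, nmi_to_pstar):
    order = np.argsort(nmi_to_pstar)[::-1]          # decreasing NMI with P*
    third = len(members) // 3
    L = [members[i] for i in order[:third]]         # low diversity: most similar to P*
    M = [members[i] for i in order[third:2*third]]  # medium diversity
    H = [members[i] for i in order[2*third:]]       # high diversity: least similar to P*
    return L, M, H
```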
Observation #2 • Different subsets work best for stable and unstable data: • Stable: subsets F (the full ensemble) and L worked well. • Unstable: subset H worked well.
Our Final Strategy • Generate a large ensemble Π (200 solutions). • Obtain the consensus partition P*. • Compute the NMI between each ensemble member and P* and sort the members in decreasing order. • If the average NMI > 0.5, classify the ensemble as stable and output P* as the final partition. • Otherwise, classify the ensemble as unstable, select the H (high-diversity) subset, and output its consensus clustering.
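Putting the steps together, a hedged end-to-end sketch of the strategy; `consensus(members, k)` stands in for the co-association + average-link HAC step described on the next slide, and the size of the H subset is again an assumed detail.

```python
# End-to-end sketch of the adaptive selection strategy described above.
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def adaptive_select(members, consensus, k, threshold=0.5):
    p_star = consensus(members, k)                        # consensus over the full ensemble
    nmi = np.array([normalized_mutual_info_score(m, p_star) for m in members])
    if nmi.mean() > threshold:                            # stable ensemble: keep P*
        return p_star
    order = np.argsort(nmi)                               # increasing NMI: most diverse first
    H = [members[i] for i in order[:len(members) // 3]]   # high-diversity subset (size assumed)
    return consensus(H, k)                                # consensus over H only
```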
Experimental Setup • 100 independent runs of k-means and MSF are used to generate the ensemble members. • Consensus function: average link HAC on the co-association matrix
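A sketch of such a consensus function using SciPy: build the co-association matrix from the ensemble members, convert it to a dissimilarity, and cut an average-link dendrogram at k clusters. The function name and the use of `fcluster` with the `maxclust` criterion are illustrative choices.

```python
# Sketch of the consensus function: co-association matrix + average-link HAC cut at k.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def coassociation_consensus(members, k):
    n = len(members[0])
    co = np.zeros((n, n))
    for labels in members:
        labels = np.asarray(labels)
        co += (labels[:, None] == labels[None, :])   # 1 where two points share a cluster
    co /= len(members)
    dist = 1.0 - co                                  # co-association -> dissimilarity
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=k, criterion="maxclust")
```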
Experimental Results: Selecting a Method vs. Selecting the Best Ensemble Members • Which members are selected for the final clustering? [Plot of NMI with P* for MSF and K-means members: in some cases only MSF members are selected; in others both MSF and K-means members are selected.]
Experimental Results: How Accurate Are the Selected Ensemble Members? • x-axis: members in decreasing order of NMI with P* (most similar to P* on the left, most dissimilar on the right). • y-axis: their corresponding NMI values with the ground-truth labels. [Plot: the selected ensemble members tend to be the more accurate ones.]
Conclusion • We empirically learned a simple ensemble selection strategy: • First classify a given ensemble as stable or unstable. • Then select a subset according to the classification result. • On separate test data sets, we achieve excellent results: • Sometimes significantly better than the best ensemble member. • Outperforms an existing selection method.