Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps. Kadim Taşdemir and Erzsébet Merényi, TNN, Vol. 20, No. 4, 2009, pp. 549-562. Presenter: Wei-Shen Tai, 2009/5/21
Outline
• Introduction
• Previous work on visualization of SOM knowledge
• Topology visualization through connectivity matrix of SOM prototypes
• Clustering through CONNvis
• Discussions and conclusion
• Comments
Motivation
• Exploit an underutilized component of the SOM’s knowledge: data topology.
• Including data topology in the SOM visualization provides more sophisticated clues to cluster structure than existing SOM visualization approaches.
Objective
• Integrate data topology into the visualization of the SOM.
• Improve cluster extraction from the SOM via a “connectivity matrix” and its specific rendering over the SOM grid.
Visualization for SOM
• SOM is a topology-preserving mapping
  • Ideally, prototypes (neurons) that are neighbors in the SOM map are also neighbors (centroids of neighboring Voronoi polyhedra) in data space, and vice versa.
• Growing SOM
  • It appears less robust than the Kohonen SOM because of the large number of parameters needing adjustment.
• ViSOM
  • It requires a relatively large number of prototypes even for small data sets.
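Topology visualization through connectivity matrix of SOM prototypes
• Induced Delaunay triangulation
  • It can be determined from the best matching units (BMUs) and second BMUs of the data vectors: two prototypes are adjacent if they are the BMU and second BMU of at least one data vector.
• CONN
  • It is a weighted analog of A, the binary adjacency matrix of the induced Delaunay triangulation, where the weights indicate the density distribution of the input data among the prototypes adjacent in the data manifold M.
  • CONN(i, j) = |RF_ij| + |RF_ji|, where RF_ij is the set of data vectors for which w_i is the BMU and w_j is the second BMU.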
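The definition of CONN can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `conn_matrix` and the use of Euclidean distance to pick the BMU and second BMU are assumptions, and the prototypes are taken as already trained.

```python
import numpy as np

def conn_matrix(data, prototypes):
    """Return the symmetric CONN matrix for data (n x d) and prototypes (k x d).

    Hypothetical helper: assumes Euclidean distance and trained prototypes.
    """
    # Squared Euclidean distance from every data vector to every prototype.
    d2 = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    # The two closest prototypes per data vector: BMU and second BMU.
    order = np.argsort(d2, axis=1)
    bmu, second = order[:, 0], order[:, 1]
    k = prototypes.shape[0]
    rf = np.zeros((k, k), dtype=int)
    # rf[i, j] counts the vectors whose BMU is w_i and second BMU is w_j.
    np.add.at(rf, (bmu, second), 1)
    # CONN(i, j) = |RF_ij| + |RF_ji|
    return rf + rf.T
```

A nonzero entry CONN(i, j) also marks w_i and w_j as adjacent in the induced Delaunay triangulation, so the binary matrix A is simply `conn > 0`.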
CONNvis: visualization of the connectivity matrix
• Line width
  • Proportional to the strength of the connection; reflects the density distribution among the connected units.
• Line colors
  • Encode a ranking of the connectivity strengths of w_i.
  • Reveal the most-to-least dense regions local to w_i in data space.
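The two rendering attributes can be derived from a CONN matrix as follows. This is a sketch under assumptions: `connection_ranks` and `line_widths` are hypothetical names, and the linear width scaling with `max_width` is chosen for illustration only, since the slide does not state how strengths are mapped to widths.

```python
import numpy as np

def connection_ranks(conn):
    """ranks[i, j] = 1 for the strongest connection of prototype i, 2 for the
    next strongest, and so on; 0 where i and j are not connected."""
    k = conn.shape[0]
    ranks = np.zeros((k, k), dtype=int)
    for i in range(k):
        neighbors = np.flatnonzero(conn[i])
        # Order the neighbors of i by descending connection strength.
        ordered = neighbors[np.argsort(-conn[i, neighbors], kind="stable")]
        ranks[i, ordered] = np.arange(1, len(ordered) + 1)
    return ranks

def line_widths(conn, max_width=5.0):
    """Scale strengths linearly so the strongest connection gets max_width
    (an assumed scaling, not taken from the paper)."""
    peak = conn.max()
    if peak == 0:
        return np.zeros(conn.shape, dtype=float)
    return conn * (max_width / peak)
```

Note that the ranking is local to each prototype, so `ranks` is generally not symmetric: a connection can be the strongest one for w_i and only the second strongest for w_j.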
Assessment of topology preservation with CONNvis
• Topology violations
  • Connected neural units that are not immediate neighbors in the map (forward topology violations).
  • Unconnected neural units that are immediate neighbors in the map (backward topology violations).
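Both kinds of violation can be listed directly from a CONN matrix and the lattice coordinates of the units. In this sketch, `topology_violations` is a hypothetical helper, and "immediate neighbors" is assumed to mean the 8-neighborhood of a rectangular lattice (Chebyshev distance 1); a hexagonal lattice would need a different adjacency test.

```python
import numpy as np

def topology_violations(conn, grid):
    """Return (forward, backward) lists of unit pairs (i, j) with i < j.

    conn: (k x k) connectivity matrix; grid: (k x 2) integer lattice coordinates.
    """
    k = conn.shape[0]
    forward, backward = [], []
    for i in range(k):
        for j in range(i + 1, k):
            # Assumed neighborhood: Chebyshev distance 1 on a rectangular lattice.
            adjacent = np.abs(grid[i] - grid[j]).max() == 1
            connected = conn[i, j] > 0
            if connected and not adjacent:
                forward.append((i, j))   # connected, but not map neighbors
            elif adjacent and not connected:
                backward.append((i, j))  # map neighbors, but not connected
    return forward, backward
```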
Clustering through CONNvis
• Remove weak connections that link any two coarse clusters X and Y at their boundary
  • Step 1) Remove all weak connections to cluster X if the number of weak connections to X is less than the number of weak connections to the other cluster Y.
  • Step 2) Remove the weakest connection if the connections of the prototype to the two clusters have different widths.
  • Step 3) Remove the lowest ranking connection if the number of weak connections to both clusters is the same and all connections at the boundary of these clusters are weak.
  • Step 4) Repeat Steps 1)–3) until this prototype has been disconnected from one of the clusters.
  • Step 5) Repeat Steps 1)–4) for all prototypes at this boundary.
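The steps above are applied interactively, prototype by prototype, at each cluster boundary. The sketch below does not implement them; it is a much cruder stand-in that shows only the underlying idea of cutting weak connections and reading clusters off what remains connected. The name `threshold_clusters` and the single global cutoff `min_strength` are assumptions for illustration.

```python
import numpy as np

def threshold_clusters(conn, min_strength):
    """Label prototypes by connected component after dropping every connection
    weaker than min_strength (a global cutoff, NOT Steps 1-5 above).

    min_strength should be positive; otherwise all prototypes stay linked.
    """
    k = conn.shape[0]
    strong = conn >= min_strength
    labels = -np.ones(k, dtype=int)
    current = 0
    for start in range(k):
        if labels[start] != -1:
            continue
        # Depth-first search over the surviving connections.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(strong[i]):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

A global cutoff ignores the local ranking and the counts of weak connections that Steps 1) to 3) rely on, so it can split or merge clusters that the boundary-wise procedure would handle differently.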
A Real-Data Application • A real remote sensing spectral image of Ocean City
Discussion and conclusions
• CONNvis
  • Integrates data distribution into the customary Delaunay triangulation.
  • Shows both forward and backward topology violations on the SOM grid.
  • Makes cluster extraction more efficient.
Comments
• Advantage
  • The proposed method improves SOM visualization by combining the induced Delaunay triangulation with connection strength.
  • It adopts the training process of the conventional SOM, but renders the resulting map through the connections between neurons after removing weak connections and boundary neurons.
• Drawback
  • Much of the paper's terminology, such as "data vectors", differs from the terms generally used in the SOM literature.
  • If a connection between two neurons in the same cluster crosses over an unrelated neuron (which is not a boundary neuron of this cluster, so the proposed method does not remove it), the relation among these three neurons may confuse the user.
• Application
  • Data clustering.