A novel clustering algorithm based on weighted support and its application

Authors: Xiang-Rong Yang, Jun-Yi Shen, Qiang Liu • Graduate: Chien-Ming Hsiao

Outline • Motivation • Objective • Introduction • Description of some Terms • Algorithm and Analysis • Experimental results • Conclusions • Personal opinion

Motivation • Many efficient clustering algorithms have been proposed but most of these works focus on numerical data.

Objective • To present a novel and efficient algorithm WeiSC for clustering categorical data

Introduction • Clustering is an important KDD problem. • Objective: to group data into sets so that • Intra-cluster similarity is maximized • Inter-cluster similarity is minimized • Most existing clustering algorithms focus on numerical data, whose inherent geometric properties can be exploited naturally to define distance functions between data points.

Introduction • The basic idea of WeiSC • It reads tuples from the dataset one by one • When the first tuple arrives, it forms a cluster on its own • Each subsequent tuple is either put into an existing cluster or, if rejected by all existing clusters, forms a new cluster; the decision is made by a given similarity function defined between a tuple and a cluster. • Makes only one scan over the dataset

Description of some Terms

Description of some Terms • DEFINITION 1 • DEFINITION 2 • DEFINITION 3

Description of some Terms • DEFINITION 4 • DEFINITION 5

Algorithm and Analysis • Overview • Initially, the first tuple in the database is read and a cluster is constructed from it. • The subsequent tuples are then read one at a time. • The similarity between the new tuple and each existing cluster is computed according to the similarity function defined between a tuple and a cluster • For the tuple to join a cluster, the similarity must be above the threshold, denoted σ • The similarity is computed from the cluster summaries instead of the clusters themselves, since the summaries contain all the information needed
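The overview above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the definitions of the WeiSC similarity function and of the cluster summary are not reproduced in this transcript, so `ClusterSummary`, `similarity` (here the mean support of the tuple's values within the cluster) and the rule of joining the most similar cluster are hypothetical stand-ins.

```python
from collections import Counter


class ClusterSummary:
    """Hypothetical summary: per-attribute value counts plus the cluster size."""

    def __init__(self, num_attributes):
        self.size = 0
        self.counts = [Counter() for _ in range(num_attributes)]

    def add(self, tuple_):
        # Update the summary in place; the tuple itself is not stored.
        self.size += 1
        for counter, value in zip(self.counts, tuple_):
            counter[value] += 1


def similarity(tuple_, summary):
    # ASSUMPTION: placeholder for the WeiSC similarity, which is not given here.
    # It averages, over attributes, the fraction of cluster members that share
    # the tuple's value for that attribute.
    supports = [summary.counts[i][v] / summary.size for i, v in enumerate(tuple_)]
    return sum(supports) / len(supports)


def one_pass_cluster(tuples, sigma):
    """Single scan: each tuple joins an existing cluster or starts a new one."""
    summaries = []
    assignments = []
    for t in tuples:
        best_idx, best_sim = None, 0.0
        for idx, summary in enumerate(summaries):
            sim = similarity(t, summary)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        # The tuple is rejected by every cluster unless similarity exceeds sigma.
        if best_idx is None or best_sim <= sigma:
            summaries.append(ClusterSummary(len(t)))
            best_idx = len(summaries) - 1
        summaries[best_idx].add(t)
        assignments.append(best_idx)
    return assignments, summaries
```

Because each tuple is compared only against the summaries, the inner loop costs one lookup per attribute per cluster, which matches the shape of the O(|D| * m * f(σ)) time bound stated in the slides.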

Computational complexities • The time and space complexities of the WeiSC algorithm depend on • The size of the dataset, |D| • The number of attributes, m • The number of clusters p, denoted f(σ) • The size of each cluster, g(σ) • Time complexity: O(|D| * m * f(σ)) • Space complexity: O(|D| + m * f(σ) * g(σ))

Experimental results • Experimental results on the performance of WeiSC • Comparison of its clustering results with ROCK's on the same dataset

Quality of clustering results with real-life datasets • Mushroom dataset (real-life) • Obtained from the UCI Machine Learning Repository • Corresponds to 23 species of gilled mushrooms • Each species is identified as definitely edible or definitely poisonous • Has 21 attributes and 8124 tuples • The number of edible tuples is 4208 • The number of poisonous tuples is 3916

The effect of σ • The parameter σ • Is the only parameter needed by the WeiSC algorithm • Affects both the clustering results and the speed of the algorithm • Its effect can be measured by the percentage of misclassified tuples • This is possible because each tuple is already labeled "edible" or "poisonous"
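The misclassification measure mentioned above could be computed roughly as follows. The slides do not say how a tuple is judged misclassified, so this sketch assumes a majority-label rule: each cluster is given the label held by most of its members, and every member with a different label counts as misclassified. The function name `misclassified_percentage` is hypothetical.

```python
from collections import Counter, defaultdict


def misclassified_percentage(assignments, labels):
    """Percentage of tuples whose label differs from their cluster's majority label.

    ASSUMPTION: the majority-label rule is used only for illustration;
    the original slides do not define the measure.
    """
    by_cluster = defaultdict(Counter)
    for cluster_id, label in zip(assignments, labels):
        by_cluster[cluster_id][label] += 1
    # In each cluster, everything outside the majority label is misclassified.
    misclassified = sum(
        sum(counts.values()) - max(counts.values())
        for counts in by_cluster.values()
    )
    return 100.0 * misclassified / len(labels)
```

Running this for several values of σ on a labeled dataset would give one misclassification percentage per threshold, which is the kind of comparison this slide describes.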

Conclusions • The WeiSC algorithm is robust and efficient • Supported by both analysis and experiments • Reads the dataset only once • Applied in intrusion detection systems (IDS) • Is fast and achieves good efficiency

Personal Opinion • We can compare the WeiSC algorithm with our algorithm.
