A genetic clustering algorithm for data with non-spherical-shape clusters
Outline • Motivation • Objective • Introduction • The basic concept of genetic strategy • The genetic clustering algorithm • Experiments • Concluding remarks and Summary • Personal opinions • Review
Motivation • Open problems in clustering: • How many clusters are there? • How should the threshold distance d be chosen in neighborhood clustering? • How can non-spherical-shape clusters be handled?
Objective • To solve these problems of traditional clustering algorithms. • To propose a genetic clustering algorithm that: • handles non-spherical-shape clusters; • groups objects according to their similarities and automatically finds the proper number of clusters k.
Introduction • Clustering methods can broadly be classified into two categories: • Hierarchical (agglomerative or divisive) • Non-hierarchical (e.g., k-means)
Introduction • Problems shared by most of these clustering algorithms: • How many clusters are there? • How can non-spherical-shape clusters be found? • What distance threshold should be used for merging? • GA clustering algorithm • A GA is a search method, and clustering can be treated as a search problem.
Basic concept of Classical Genetic Algorithm • [Flowchart: Encoding schemas → Fitness evaluation → Testing the end of the algorithm (YES: Halt; NO: Parent selection → Crossover operators → Mutation operators)]
The genetic clustering algorithm • The algorithm CLUSTERING consists of two stages: • First stage (nearest neighbor): the n objects O1, O2, …, On are grouped into clusters C1, C2, …, Cm. • Second stage (GA clustering): the clusters C1, C2, …, Cm are merged.
First Stage Step 1: find the nearest neighbor of each object Oi. Step 2: compute dav, the average of the nearest-neighbor distances. The mean of u ?
First Stage Step 3: compute the n×n adjacency matrix A. Step 4: find the connected components, denoted by C1, C2, …, Cm.
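The four steps of the first stage can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the slides do not say how the adjacency matrix is thresholded, so the sketch assumes two objects are adjacent when their distance is at most d = u * dav (the parameter u appears in the experiments). The function name `first_stage` and the use of Euclidean distance are likewise assumptions.

```python
import math

def first_stage(points, u=1.2):
    """Group points into initial clusters by nearest-neighbor linking (sketch)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Step 1: nearest-neighbor distance of each object O_i.
    nn = [min(dist[i][j] for j in range(n) if j != i) for i in range(n)]

    # Step 2: d_av, the average of the nearest-neighbor distances.
    d_av = sum(nn) / n

    # Assumption (not stated on the slides): threshold d = u * d_av.
    d = u * d_av

    # Step 3: n x n adjacency matrix A.
    adj = [[1 if i != j and dist[i][j] <= d else 0 for j in range(n)]
           for i in range(n)]

    # Step 4: connected components C_1, ..., C_m via depth-first search.
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, comp = [start], []
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if adj[i][j] and not seen[j]:
                    seen[j] = True
                    stack.append(j)
        components.append(sorted(comp))
    return d_av, components

d_av, clusters = first_stage([(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)])
print(d_av, clusters)
```

Each component is returned as a sorted list of object indices; a larger u links more objects and therefore yields fewer, larger initial clusters.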
Second Stage • The initialization step • Population • Coding • Dinter and Dintra • The three phases of GA • Reproduction phase • Crossover phase • Mutation phase
Second Stage • Compute the m×m distance matrix D, which holds the distance between each pair of clusters Ci and Cj.
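A small sketch of building the cluster distance matrix is given below. The slides do not define the distance between two clusters, so this example assumes the minimum pairwise Euclidean distance between their members; the function name `cluster_distance_matrix` is hypothetical.

```python
import math

def cluster_distance_matrix(points, clusters):
    """Return the m x m matrix of distances between clusters (sketch).

    Assumption: the distance between C_a and C_b is the smallest
    distance between any member of C_a and any member of C_b.
    """
    m = len(clusters)
    D = [[0.0] * m for _ in range(m)]
    for a in range(m):
        for b in range(a + 1, m):
            d = min(math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            D[a][b] = D[b][a] = d  # the matrix is symmetric
    return D

pts = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
print(cluster_distance_matrix(pts, [[0, 1, 2], [3, 4]]))
```

Other definitions, such as centroid distance, would drop into the same loop by replacing the `min(...)` expression.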
Second Stage • The initialization step • Population: 50 strings. • Each string has length m, one bit per cluster in {C1, C2, …, Cm}. • Example strings (m = 5): R1 = 11100, R2 = 10110. • For each string Ri, two sets Ui and U’i are defined: U1={C1, C2, C3} ; U’1={C4, C5} ; U2={C1, C3, C4} ; U’2={C2, C5}
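The coding can be illustrated with a short decoding sketch. It reads the slide's two example strings as R1 = 11100 and R2 = 10110 and assumes that a 1 in position j places cluster Cj in Ui and a 0 places it in U’i, which matches the sets listed on the slide. The function name `decode` is hypothetical.

```python
def decode(bits):
    """Split cluster labels into (U, U') according to a binary string."""
    chosen = [f"C{j + 1}" for j, b in enumerate(bits) if b == "1"]
    rest = [f"C{j + 1}" for j, b in enumerate(bits) if b == "0"]
    return chosen, rest

print(decode("11100"))  # (['C1', 'C2', 'C3'], ['C4', 'C5'])
print(decode("10110"))  # (['C1', 'C3', 'C4'], ['C2', 'C5'])
```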
Second Stage • Intra-distance Dintra and the inter-distance Dinter U1={C1, C2, C3} ; U’1={C4, C5, C7}
Second Stage • Reproduction phase • Fitness function: SCORE(Ri) = Dinter(Ri) * w – Dintra(Ri), with w in [1, 3]. • Reproduction probability • Crossover phase • Crossover probability pc = 0.8. • Mutation phase • Mutation probability pm = 0.1.
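One generation of the three phases might look like the sketch below. Only the fitness formula and the values pc = 0.8 and pm = 0.1 come from the slides. The reproduction-probability formula is not shown there, so roulette-wheel selection on shifted scores is an assumption, as are single-point crossover, per-bit mutation, and the names `score` and `next_generation`. The fitness is passed in as a function because the definitions of Dinter and Dintra are not given in the text.

```python
import random

def score(d_inter, d_intra, w):
    """SCORE(R_i) = D_inter(R_i) * w - D_intra(R_i), with w in [1, 3]."""
    return d_inter * w - d_intra

def next_generation(population, fitness, pc=0.8, pm=0.1, rng=random):
    """Produce the next population of binary strings (sketch)."""
    # Reproduction phase (assumed roulette wheel). Scores are shifted so
    # that every selection weight is positive.
    scores = [fitness(s) for s in population]
    low = min(scores)
    weights = [s - low + 1e-9 for s in scores]
    parents = rng.choices(population, weights=weights, k=len(population))

    # Crossover phase (assumed single-point), applied with probability pc.
    children = []
    for a, b in zip(parents[0::2], parents[1::2]):
        if len(a) > 1 and rng.random() < pc:
            cut = rng.randrange(1, len(a))
            a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
        children += [a, b]
    if len(parents) % 2:  # odd population: carry the last parent over
        children.append(parents[-1])

    # Mutation phase (assumed per-bit flip), applied with probability pm.
    def mutate(s):
        return "".join(
            ("1" if c == "0" else "0") if rng.random() < pm else c for c in s
        )

    return [mutate(c) for c in children]

pop = ["11100", "10110", "00011", "01010"]
# Toy fitness for demonstration only: number of 1 bits.
print(next_generation(pop, lambda s: s.count("1"), rng=random.Random(0)))
```

In the algorithm itself, `fitness` would evaluate `score(Dinter(Ri), Dintra(Ri), w)` for each string; larger w rewards separation between the two sets more strongly.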
Merge_Sets_Finding Algorithm • Step 1: Sort the strings by fitness. • Step 2: Choose Ri (merge). • Step 3: Choose the smallest l > i such that […]. IF no such l exists THEN go to Step 4 (the remaining strings are discarded) ELSE set i = l and go to Step 2. • Step 4: End. • Example: R1={C1, C2, C3} ; R2={C3, C4, C6} ; R3={C4, C5}
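The selection loop can be sketched as a greedy pass over the sorted strings. The condition in Step 3 is missing from the slide text, so this sketch assumes that a set is chosen only if it shares no cluster with any set chosen earlier; it also assumes the sort is by descending fitness. Both are guesses that happen to fit the slide's example, not a confirmed reading. The name `merge_sets_finding` and the score values are hypothetical.

```python
def merge_sets_finding(scored_sets):
    """Greedily pick sets of clusters to merge (sketch).

    scored_sets: list of (set_of_cluster_labels, fitness) pairs.
    Assumptions: best fitness first; a set qualifies only if it is
    disjoint from every previously chosen set.
    """
    ranked = sorted(scored_sets, key=lambda pair: pair[1], reverse=True)
    chosen, covered = [], set()
    for members, _ in ranked:
        if covered.isdisjoint(members):
            chosen.append(members)   # merge this set
            covered |= members
        # otherwise the set is discarded
    return chosen

R1, R2, R3 = {"C1", "C2", "C3"}, {"C3", "C4", "C6"}, {"C4", "C5"}
print(merge_sets_finding([(R1, 3.0), (R2, 2.0), (R3, 1.0)]))
```

Under these assumptions R1 is merged, R2 is discarded because it shares C3 with R1, and R3 is merged.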
Experiments - 1 • [Figure panels: original data; noise: distance > 2dav]
Experiments - 1 • [Figure panels: 7 clusters; u=1.2, 8 clusters]
Experiments - 1 • [Figure panels: u=1.5 or 2, 5 clusters; 6 clusters]
Experiments - 1 • [Figure panels: u=1.2, w=2, 4 clusters (best); 3 clusters]
Experiments - 1 • [Figure panels: 4 clusters (direct GA); 2 clusters]
Experiments - 1 • [Figure panel: 4 clusters (k-means)]
Experiments - 2 • [Figure panels: original data; 4 clusters; 2 clusters; 3 clusters]
Experiments - 3 • [Figure panels: original data; 4 clusters]
Concluding remarks and Summary • A genetic clustering algorithm, CLUSTERING • Handles non-spherical-shape clusters. • Clusters automatically. • Uses binary search to find the proper interval for w.
Personal Opinions • The proper number of clusters is decided by the value of w.
Review • Using the genetic clustering algorithm (GCA) for automatic clustering. • Split: nearest neighbor (NN). • Merge: Merge_Sets_Finding algorithm.