Introducing FLOSCAN, a density-based algorithm that can be used for clustering and as a pre-processing step in data mining. Experimental results and comparisons provided.
FLOSCAN: An Artificial Life Based Data Mining Algorithm • A. Bellaachia, Computer Science Department, School of Engineering and Applied Sciences, George Washington University, Washington, DC 20052 • E-mail: bell@gwu.edu
Outline • Introduction • Artificial Life • Flocking Behavior • Flocking Parameters • FLOSCAN • Experimental Results • Conclusion & Future Work
Artificial Life Behaviors
Biologically inspired
Flocking Behavior • Introduced by Reynolds in 1987 for computer graphics applications. • Based on the flocking behavior of birds. • Each simulated bird (boid) is defined by its direction, speed, and position; its motion depends on the positions and velocities of its neighbors. • Simple rules applied to individual boids yield a very interesting global behavior: • No leader, no fixed shape, no global constraint.
Flocking Behavior • Basic flocking rules: • Separation: steer to avoid crowding local flock-mates or colliding with neighboring boids. • Alignment: steer toward the average heading of local flock-mates. • Cohesion: steer toward the average position of local flock-mates.
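The three rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Reynolds' original code: the `Boid` class, the `steer` function, and the `min_dist` threshold are hypothetical names, boids live in 2-D, and the neighbor list is supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float


def steer(boid, neighbors, min_dist=1.0):
    """Return (separation, alignment, cohesion) steering vectors for one boid."""
    if not neighbors:
        zero = (0.0, 0.0)
        return zero, zero, zero
    n = len(neighbors)

    # Separation: push away from every neighbor closer than min_dist.
    sep_x = sep_y = 0.0
    for nb in neighbors:
        dx, dy = boid.x - nb.x, boid.y - nb.y
        if (dx * dx + dy * dy) ** 0.5 < min_dist:
            sep_x += dx
            sep_y += dy

    # Alignment: average neighbor velocity minus the boid's own velocity.
    ali = (sum(nb.vx for nb in neighbors) / n - boid.vx,
           sum(nb.vy for nb in neighbors) / n - boid.vy)

    # Cohesion: vector from the boid to the average neighbor position.
    coh = (sum(nb.x for nb in neighbors) / n - boid.x,
           sum(nb.y for nb in neighbors) / n - boid.y)

    return (sep_x, sep_y), ali, coh
```

A full simulation would weight and sum the three vectors to update each boid's velocity; the weights are a tuning choice and are left out here.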
Flocking Behavior (Reynolds' Model) • Boid parameters: • Velocity: the combination of heading and speed • Position • Angle: the boid's field of vision • Minimum distance • Maximum distance
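Flocking Model • [Figure: vector diagram in the x-y plane showing the boid velocity Vi, the neighbor velocity Vj (or the average velocity of the neighborhood boids), the alignment vector VAL, the attraction vector VAT, the new-direction vector VND, and the new position]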
Attraction: [equation not preserved; one of its symbols denotes the maximum distance between two boids]
FLOSCAN: Flocking Model • Boid movement • Velocity = current position (the velocity vector Vi of boid i is its current position vector) • VAL (alignment of Vi) = average of the velocity of boid i and the velocity Vj of the neighboring boid(s): VAL = (Vi + Vj)/2 • New direction: VND = KND * (VAL + VAT) • No separation • [Figure: vector diagram in the x-y plane showing Vi, Vj, VAL = (Vi + Vj)/2, VAT, VND, and the new position]
FLOSCAN: Flocking Model • Similar to the previous model except for VAL • Velocity = current position • Ignore Vi • VAL (alignment of Vi) = KAL * Vj • New direction: VND = KND * (VAL + VAT) • No separation • [Figure: vector diagram in the x-y plane showing Vi, Vj, VAL = KAL * Vj, VAT, VND, and the new position]
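The two alignment variants and the new-direction formula from the slides above can be written out directly. The sketch below is illustrative only: the function names are hypothetical, vectors are plain Python lists, and the attraction vector VAT is passed in by the caller because its formula is not reproduced here.

```python
def align_average(v_i, v_j):
    """First variant: VAL = (Vi + Vj) / 2."""
    return [(a + b) / 2 for a, b in zip(v_i, v_j)]


def align_neighbor_only(v_j, k_al):
    """Second variant: ignore Vi, VAL = KAL * Vj."""
    return [k_al * b for b in v_j]


def new_direction(v_al, v_at, k_nd):
    """VND = KND * (VAL + VAT); v_at is supplied by the caller."""
    return [k_nd * (a + t) for a, t in zip(v_al, v_at)]
```

In both variants `v_j` may be a single neighbor's vector or the average over the neighborhood, as the slides note.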
FLOSCAN: Attraction • Alignment: [equation not preserved], where [symbol not preserved] is the maximum distance between two boids and K1 is a constant • Note that Vj can also be the average velocity of the neighborhood boids. • Attraction: [equation not preserved]
FLOSCAN: New Direction & Position • New direction: first get the speed vector: [equation not preserved], where KND is a constant • New position: [equation not preserved]
FLOSCAN: New Position • The new position vector of boid i is calculated as follows: [equation not preserved]
FLOSCAN: Example
FLOSCAN Objectives • FLOSCAN is a density-based algorithm. • Objectives: • FLOSCAN as a pre-clustering step for a clustering algorithm: data points that share common features move closer to each other. This sharpens the shape of potential clusters and therefore improves the efficiency of the clustering algorithm. • FLOSCAN can also be used as either a clustering algorithm or a classification algorithm: it is capable of discovering clusters of different shapes and detecting noise points.
FLOSCAN • Pseudocode:
  Initialize the parameters above.
  Calculate the distance between each pair of documents.
  For each iteration do
    For each document di do
      Find the neighbors, N, of di using the minimum and maximum distances.
      For each neighbor dj in N do
        Calculate the align vector and attract vector of di using dj.
        Calculate the new direction vector for di.
      End do
      Calculate the new position vector of di.
    End do
  End for
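The loop structure of the pseudocode can be made concrete with a deliberately simplified Python sketch. It is not the slides' exact update: the align and attract vectors are replaced by a single pull toward the mean position of the neighbors, and the names `neighbors`, `floscan_sketch`, and the step factor `k_nd` are assumptions introduced for illustration. Euclidean distance is also an assumption.

```python
import math


def neighbors(points, i, d_min, d_max):
    """Indices j != i whose distance to point i lies in [d_min, d_max]."""
    return [j for j, p in enumerate(points)
            if j != i and d_min <= math.dist(points[i], p) <= d_max]


def floscan_sketch(points, d_min, d_max, iterations, k_nd=0.5):
    """Simplified stand-in for the FLOSCAN loop (see assumptions above)."""
    points = [list(p) for p in points]
    for _ in range(iterations):
        updated = []
        for i, p_i in enumerate(points):
            nbrs = neighbors(points, i, d_min, d_max)
            if not nbrs:
                # Points with no neighbors (possible noise) stay where they are.
                updated.append(p_i)
                continue
            # Mean position of the neighbors, one coordinate at a time.
            mean = [sum(points[j][k] for j in nbrs) / len(nbrs)
                    for k in range(len(p_i))]
            # ASSUMPTION: move a fraction k_nd of the way toward that mean,
            # standing in for the align/attract/new-direction computation.
            updated.append([a + k_nd * (m - a) for a, m in zip(p_i, mean)])
        # All points move at once, using the positions from the previous pass.
        points = updated
    return points
```

After a few passes, points that start within `d_max` of each other drift together while isolated points stay put, which is the effect the pre-clustering objective describes.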
Experimental results • To measure the performance of FLOSCAN, we use the LDC TDT Corpus. • We have randomly chosen about 1,200 stories: • about half collected from Reuters newswire and • half from CNN broadcast news transcripts. • Twenty-five topics were defined in the original release.
Document Representation • Use the vector model to represent documents in the collection. • Remove stopwords. • Assign a weight to each term in each document: augmented weight L(tji) = 0.5 + 0.5 * (tf(tji) / tf(max)), where tf(max) = max{ tf(t1i), tf(t2i), ... , tf(tmi) } and m is the number of terms in the collection.
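As an illustration of the augmented weight above, the following sketch computes it for one tokenized document. The function name `augmented_weights` and the `stopwords` argument are hypothetical; tokenization is assumed to have been done already.

```python
from collections import Counter


def augmented_weights(tokens, stopwords=frozenset()):
    """Map each non-stopword term to 0.5 + 0.5 * tf / tf_max for one document."""
    counts = Counter(t for t in tokens if t not in stopwords)
    if not counts:
        return {}
    tf_max = max(counts.values())
    return {term: 0.5 + 0.5 * tf / tf_max for term, tf in counts.items()}
```

Every term that occurs in the document gets a weight between 0.5 and 1.0, with the most frequent term at exactly 1.0.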
Evaluations • F-Measure: • A combination of IR precision and recall.
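For reference, the commonly used balanced F-measure is the harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below applies it to one cluster and one topic, each given as a set of document ids. Whether the slides use this balanced form or a weighted variant is not stated here, so treat the function `f_measure` as an assumption.

```python
def f_measure(cluster, topic):
    """Balanced F-measure of a cluster against a reference topic."""
    cluster, topic = set(cluster), set(topic)
    overlap = len(cluster & topic)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)  # share of the cluster that is on-topic
    recall = overlap / len(topic)       # share of the topic that was captured
    return 2 * precision * recall / (precision + recall)
```

A perfect match scores 1.0 and a cluster with no documents from the topic scores 0.0.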
Evaluations • Centroid Similarity (CS): • It computes the similarity between the centroids of all clusters. Given a set of k clusters, CS is defined as follows: [equation not preserved]
Experimental Results
Conclusion & Future Work • FLOSCAN: a new flocking-based algorithm that can be used for: • Clustering • A pre-processing step in a data mining algorithm. • Experimental results and comparison to DBSCAN. • Future work includes: • Further experiments with large datasets and other artificial-life algorithms such as the Ant algorithm. • Analyzing the scalability of FLOSCAN. • Using FLOSCAN as a classification algorithm. • Theoretical analysis of the initial parameters required by FLOSCAN, namely maximum distance, number of iterations, and speed value.
Questions? Thank you.