
FLOSCAN: An Artificial Life Based Data Mining Algorithm

Introduces FLOSCAN, a density-based algorithm that can be used for clustering and as a pre-processing step in data mining. Experimental results and a comparison to DBSCAN are provided.


Presentation Transcript


  1. FLOSCAN: An Artificial Life Based Data Mining Algorithm. A. Bellaachia, Computer Science Department, School of Engineering and Applied Sciences, George Washington University, Washington, DC 20052. E-mail: bell@gwu.edu

  2. Outline • Introduction • Artificial Life • Flocking Behavior • Flocking Parameters • FLOSCAN • Experimental Results • Conclusion & Future Work

  3. Artificial Life Behaviors

  4. Biologically inspired

  5. Flocking Behavior • Introduced by Reynolds in 1987 for computer graphics applications. • Based on the flocking behavior of birds. • Each boid is defined by its direction and speed; the position of each bird (boid) in the flock is related to the positions and velocities of its neighbors. • Simple rules applied to individual boids yield interesting global behavior: • No leader, no predefined shape, no global constraint.

  6. Flocking Behavior • Basic flocking rules: • Separation: steer to avoid crowding local flock-mates or colliding with neighboring boids. • Alignment: steer towards the average heading of local flock-mates. • Cohesion: steer to move toward the average position of local flock-mates.
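The three rules above can be sketched for a single boid in 2D. This is a minimal illustration, not the presentation's code: the function name `steering`, the tuple representation, the `min_dist` threshold for separation, and the choice to return the three vectors unweighted are all assumptions.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def mean(vectors: List[Vec]) -> Vec:
    """Component-wise average of a non-empty list of 2D vectors."""
    n = len(vectors)
    return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)


def steering(pos: Vec, neighbor_pos: List[Vec], neighbor_vel: List[Vec],
             min_dist: float = 1.0) -> Tuple[Vec, Vec, Vec]:
    """Return (separation, alignment, cohesion) vectors for one boid.

    Assumes at least one neighbor; min_dist is a hypothetical threshold.
    """
    # Separation: sum of vectors pointing away from neighbors that are too close.
    sep_x = sep_y = 0.0
    for q in neighbor_pos:
        dx, dy = pos[0] - q[0], pos[1] - q[1]
        if (dx * dx + dy * dy) ** 0.5 < min_dist:
            sep_x += dx
            sep_y += dy
    # Alignment: the average heading (velocity) of the neighbors.
    alignment = mean(neighbor_vel)
    # Cohesion: vector from the boid toward the neighbors' average position.
    cx, cy = mean(neighbor_pos)
    cohesion = (cx - pos[0], cy - pos[1])
    return (sep_x, sep_y), alignment, cohesion
```

A full simulation would combine the three vectors with weights and add the result to the boid's velocity; those weights are left out here because the slides do not give them.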

  7. Flocking Behavior (Reynolds' Model) • Boid parameters: • Velocity: the combination of heading and speed • Position • Angle: the boid's field of vision • Minimum distance • Maximum distance

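  8. Flocking Model [Figure: vector diagram in the x-y plane showing the boid velocity Vi, the neighbor velocity Vj (or the average velocity of the neighborhood boids), the alignment vector VAL, the attraction vector VAT, the new direction VND, and the new position]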

  9. Attraction: [equation not preserved in transcript], where [symbol not preserved] is the maximum distance between two boids.

  10. FLOSCAN: Flocking Model • Boid movement: • Velocity = current position • VAL (alignment of Vi) = average of the velocity of boid i and the velocity of the neighboring boid(s) Vj: VAL = (Vi + Vj)/2 • New direction: VND = KND * (VAL + VAT) • No separation [Figure: vector diagram in the x-y plane showing Vi, Vj, VAL, VAT, VND, and the new position]

  11. FLOSCAN: Flocking Model • Similar to the previous model except for VAL • Velocity = current position • Ignore Vi • VAL (alignment of Vi) = KAL * Vj • New direction: VND = KND * (VAL + VAT) • No separation [Figure: vector diagram in the x-y plane showing Vi, Vj, VAL = KAL * Vj, VAT, VND, and the new position]
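The two flocking-model variants above differ only in how VAL is formed, which a short sketch makes concrete. The function name `new_direction` and its argument names are hypothetical, and the attraction vector VAT is taken as an input because its formula is not recoverable from the transcript.

```python
from typing import Optional, Tuple

Vec = Tuple[float, ...]


def new_direction(v_i: Vec, v_j: Vec, v_at: Vec,
                  k_nd: float = 1.0, k_al: Optional[float] = None) -> Vec:
    """Compute VND = KND * (VAL + VAT) for one boid.

    k_al is None  -> first variant:  VAL = (Vi + Vj) / 2
    k_al is given -> second variant: VAL = KAL * Vj (Vi is ignored)
    v_at is supplied by the caller (assumption: its formula is not shown here).
    """
    if k_al is None:
        v_al = tuple((a + b) / 2 for a, b in zip(v_i, v_j))
    else:
        v_al = tuple(k_al * b for b in v_j)
    return tuple(k_nd * (a + t) for a, t in zip(v_al, v_at))
```

For example, with Vi = (2, 0), Vj = (0, 2), VAT = (1, 1) and KND = 0.5, the first variant gives VAL = (1, 1) and VND = (1, 1).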

  12. FLOSCAN: Attraction • Alignment: [equation not preserved in transcript], where K1 is a constant. • Note that Vj can also be the average velocity of the neighborhood boids. • Attraction: [equation not preserved in transcript], where [symbol not preserved] is the maximum distance between two boids.

  13. FLOSCAN: New Direction & Position • New direction: first get the speed vector [equation not preserved in transcript], where KND is a constant. • New position: [equation not preserved in transcript]

  14. FLOSCAN: New Position • The new position vector of boid i is calculated as follows: [equation not preserved in transcript]

  15. FLOSCAN: Example

  16. FLOSCAN Objectives • FLOSCAN: • It is a density-based algorithm. • Objectives: • FLOSCAN as a pre-clustering step to a clustering algorithm: data points sharing some common features will move closer to each other. This will enhance the shape of potential clusters and therefore improve the efficiency of a clustering algorithm. • FLOSCAN can also be used as either a clustering algorithm or a classification algorithm: it is capable of discovering clusters of different shapes and detecting noise points.

  17. FLOSCAN • Pseudo code:
      Initialize the above parameters.
      Calculate the distance between each pair of documents.
      For each iteration do
        For each document di do
          Find the neighbors, N, of di using the minimum and maximum distances.
          For each neighbor dj in N do
            Calculate the align vector and attract vector of di using dj.
            Calculate the new direction vector for di.
          End do
          Calculate the new direction vector of di.
        End do
      End do
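The distance and neighbor-finding steps of the pseudo code can be sketched as follows. This covers only those two steps and rests on assumptions: Euclidean distance stands in for whatever document distance the author used, the bounds are treated as inclusive, and the names `pairwise_distances` and `neighbors` are hypothetical.

```python
import math
from typing import List, Sequence


def pairwise_distances(docs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix of Euclidean distances between document vectors.

    Euclidean distance is an assumption; the slides do not name the measure.
    """
    n = len(docs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(docs[i], docs[j])
            dist[i][j] = dist[j][i] = d
    return dist


def neighbors(dist: List[List[float]], i: int,
              d_min: float, d_max: float) -> List[int]:
    """Indices j != i whose distance to document i lies in [d_min, d_max]."""
    return [j for j, d in enumerate(dist[i])
            if j != i and d_min <= d <= d_max]
```

Computing the matrix once up front matches the pseudo code's second step; in a full run the distances would need refreshing whenever the documents move.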

  18. Experimental results • To measure the performance of FLOSCAN, we use the LDC TDT Corpus. • We have randomly chosen about 1,200 stories: • about half collected from Reuters newswire and • half from CNN broadcast news transcripts. • Twenty-five topics were defined in the original release.

  19. Document Representation • Use the vector model to represent documents in the collection. • Remove stopwords. • Assign a weight to each term in each document: augmented weight L(tji) = 0.5 + 0.5 * (tf(tji)/tf(max)), where tf(max) = max{ tf(t1i), tf(t2i), ... , tf(tmi) } and m is the maximum number of terms in the collection.
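The augmented weight formula above can be checked with a small sketch. The function name `augmented_weights` and the token-list input are assumptions, and only terms that occur in the document are weighted here.

```python
from collections import Counter
from typing import Dict, List


def augmented_weights(tokens: List[str]) -> Dict[str, float]:
    """Augmented term-frequency weight for each term present in one document.

    L(t) = 0.5 + 0.5 * tf(t) / tf(max), with tf(max) the highest term
    frequency in the same document. Stopwords are assumed already removed.
    """
    tf = Counter(tokens)
    if not tf:
        return {}
    tf_max = max(tf.values())
    return {term: 0.5 + 0.5 * count / tf_max for term, count in tf.items()}
```

For a document with tokens flock (3 times), boid (2 times) and bird (once), the most frequent term gets weight 1.0 and the others fall between 0.5 and 1.0.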

  20. Evaluations • F-Measure: • A combination of IR precision and recall.
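As an illustration of combining precision and recall, the sketch below computes the standard balanced F-measure (the harmonic mean of the two). Whether the presentation used exactly this form, or a per-cluster variant, is not stated in the transcript, so treat it as an assumption; the function name `f_measure` is hypothetical.

```python
def f_measure(true_positives: int, false_positives: int,
              false_negatives: int) -> float:
    """Balanced F-measure: 2 * P * R / (P + R).

    Returns 0.0 when precision or recall is undefined or both are zero.
    """
    retrieved = true_positives + false_positives
    relevant = true_positives + false_negatives
    if retrieved == 0 or relevant == 0:
        return 0.0
    precision = true_positives / retrieved
    recall = true_positives / relevant
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With 8 true positives, 2 false positives and 8 false negatives, precision is 0.8, recall is 0.5, and the F-measure is 8/13.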

  21. Evaluations • Centroid Similarity (CS): • It computes the similarity between the centroids of all clusters. Given a set of k clusters, CS is defined as follows: [equation not preserved in transcript]

  22. Experimental Results

  23. Experimental Results

  24. Experimental Results

  25. Experimental Results

  26. Experimental Results

  27. Conclusion & Future Work • FLOSCAN: introduced a new flocking-based algorithm that can be used for: • Clustering • Pre-processing step in a data mining algorithm. • Experimental results and comparison to DBSCAN. • Future work includes: • Other experiments with large datasets and other artificial-life algorithms such as the ant algorithm. • Analyze the scalability of FLOSCAN. • Use FLOSCAN as a classification algorithm. • Theoretical analysis of the initial parameters required by FLOSCAN, namely maximum distance, number of iterations, and speed value.

  28. Questions .. Thank you…
