FLOSCAN: An Artificial Life Based Data Mining Algorithm

A. Bellaachia
Computer Science Department
School of Engineering and Applied Sciences
George Washington University
Washington, DC 20052
E-mail: bell@gwu.edu

Outline
• Introduction
• Artificial Life
• Flocking Behavior
• Flocking Parameters
• FLOSCAN
• Experimental Results
• Conclusion & Future Work

Artificial Life Behaviors

Biologically inspired

Flocking Behavior
• Introduced by Reynolds in 1987 for computer graphics applications.
• Based on the flocking behavior of birds.
• Each bird (boid) is defined by its direction, speed, and position; the position of each boid in the set is related to the positions and velocities of its neighbors.
• Simple rules on individual boids yield a very interesting global behavior:
  • No leader, no shape, no global constraint.

Flocking Behavior
• Basic flocking rules:
  • Separation: steer to avoid crowding local flock-mates or colliding with neighboring boids.
  • Alignment: steer towards the average heading of local flock-mates.
  • Cohesion: steer to move toward the average position of local flock-mates.
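The three rules above can be illustrated with a minimal 2D sketch. This is not Reynolds' original implementation: the function name `steering`, the `min_dist` threshold, and the choice to express each rule as a simple difference vector are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, float]


def mean(vectors: List[Vec]) -> Vec:
    """Component-wise average of a non-empty list of 2D vectors."""
    n = len(vectors)
    return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)


def steering(pos: Vec, vel: Vec, nbr_pos: List[Vec], nbr_vel: List[Vec],
             min_dist: float = 1.0) -> Dict[str, Vec]:
    """Return one steering vector per rule for a single boid.

    ASSUMPTION: each rule is reduced to a plain difference vector;
    real boid simulations usually weight and clamp these terms.
    """
    if not nbr_pos:
        zero = (0.0, 0.0)
        return {"separation": zero, "alignment": zero, "cohesion": zero}

    # Separation: push away from every neighbor closer than min_dist.
    sep_x = sep_y = 0.0
    for x, y in nbr_pos:
        dx, dy = pos[0] - x, pos[1] - y
        if (dx * dx + dy * dy) ** 0.5 < min_dist:
            sep_x += dx
            sep_y += dy

    avg_vel = mean(nbr_vel)  # average heading of local flock-mates
    avg_pos = mean(nbr_pos)  # average position of local flock-mates
    return {
        "separation": (sep_x, sep_y),
        # Alignment: steer toward the average heading.
        "alignment": (avg_vel[0] - vel[0], avg_vel[1] - vel[1]),
        # Cohesion: steer toward the average position.
        "cohesion": (avg_pos[0] - pos[0], avg_pos[1] - pos[1]),
    }
```

A full simulation would combine the three vectors with tunable weights and add the result to the boid's velocity at every step.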

Flocking Behavior (Reynolds' Model)
• Boid parameters:
  • Velocity: the combination of heading and speed.
  • Position
  • Angle: the boid's field of vision
  • Minimum distance
  • Maximum distance

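Flocking Model
[Figure: x-y diagram showing the velocity Vi of boid i, the velocity Vj of a neighbor (or the average velocity of the neighborhood boids), the alignment vector VAL, the attraction vector VAT, the new direction vector VND, and the new position.]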

Attraction: [equation not preserved in the transcript; it is defined in terms of the maximum distance between two boids]

FLOSCAN: Flocking Model
• Boids movement:
  • Velocity = current position: the velocity Vi of boid i is its current position vector.
  • Alignment: VAL = (Vi + Vj)/2, the average of the velocity of boid i and the velocity Vj of the neighboring boid(s).
  • New direction: VND = KND * (VAL + VAT)
  • No separation
[Figure: x-y diagram showing Vi (velocity = current position), Vj, VAL = (Vi + Vj)/2, VAT, VND, and the new position.]

FLOSCAN: Flocking Model
• Similar to the previous model except for VAL:
  • Velocity = current position
  • Ignore Vi
  • Alignment: VAL = KAL * Vj
  • New direction: VND = KND * (VAL + VAT)
  • No separation
[Figure: x-y diagram showing Vi (velocity = current position), Vj, VAL = KAL * Vj, VAT, VND, and the new position.]
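The two alignment variants and the new-direction rule stated on these slides can be written out directly. The sketch below is only illustrative: the function names are hypothetical, and the attraction vector VAT is taken as an input because its formula is not preserved in this transcript.

```python
from typing import List, Sequence


def align_average(v_i: Sequence[float], v_j: Sequence[float]) -> List[float]:
    """First variant: VAL = (Vi + Vj) / 2."""
    return [(a + b) / 2.0 for a, b in zip(v_i, v_j)]


def align_scaled(v_j: Sequence[float], k_al: float) -> List[float]:
    """Second variant: VAL = KAL * Vj (Vi is ignored)."""
    return [k_al * b for b in v_j]


def new_direction(v_al: Sequence[float], v_at: Sequence[float],
                  k_nd: float) -> List[float]:
    """VND = KND * (VAL + VAT); v_at is supplied by the caller."""
    return [k_nd * (a + t) for a, t in zip(v_al, v_at)]
```

For instance, with Vi = [1, 3] and Vj = [3, 5], the first variant gives VAL = [2, 4], while the second with KAL = 0.5 gives VAL = [1.5, 2.5].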

FLOSCAN: Attraction
• Alignment: [equation not preserved in the transcript], where the missing symbol is the maximum distance between two boids and K1 is a constant.
• Note that Vj can also be the average velocity of the neighborhood boids.
• Attraction: [equation not preserved in the transcript]

FLOSCAN: New Direction & Position
• New direction:
  • First get the speed vector: [equation not preserved in the transcript], where KND is a constant.
• New position: [equation not preserved in the transcript]

FLOSCAN: New Position
• The new position vector of boid i is calculated as follows: [equation not preserved in the transcript]

FLOSCAN: Example

FLOSCAN Objectives
• FLOSCAN:
  • It is a density-based algorithm.
• Objectives:
  • FLOSCAN as a pre-clustering step to a clustering algorithm: data points that share common features move closer to each other. This enhances the shape of potential clusters and therefore improves the efficiency of a clustering algorithm.
  • FLOSCAN can also be used as either a clustering algorithm or a classification algorithm: it is capable of discovering clusters of different shapes and detecting noise points.

FLOSCAN
• Pseudo code:
  Initialize the above parameters.
  Calculate the distance between each document di and every other document.
  For each iteration do
    For each document di do
      Find the neighbors, N, of di using the minimum and maximum distances.
      For each neighbor dj in N do
        Calculate the align vector and attract vector of di using dj.
        Calculate the new direction vector for di.
      End do
      Calculate the new direction vector of di.
    End do
  End for
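The loop structure of the pseudo code can be sketched as follows. This is not the author's implementation; several pieces are assumptions because the slides' equations are not preserved here: the Euclidean distance, the `hypothetical_attract` function and its 0.25 weight, averaging the per-neighbor direction vectors, and using the resulting direction as the new position (motivated only by the "velocity = current position" convention). The alignment step uses the VAL = (Vi + Vj)/2 variant from the earlier slide.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # ASSUMPTION: the slides do not name the distance measure.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hypothetical_attract(v_i: Vector, v_j: Vector, dist: float,
                         max_dist: float) -> Vector:
    # ASSUMPTION: a stand-in for the attraction equation, pulling
    # v_i toward v_j with a weight that fades as dist nears max_dist.
    w = 0.25 * (1.0 - dist / max_dist)
    return [w * (b - a) for a, b in zip(v_i, v_j)]


def floscan_sketch(docs: Sequence[Sequence[float]], min_dist: float,
                   max_dist: float, iterations: int, k_nd: float = 1.0,
                   attract: Callable[..., Vector] = hypothetical_attract
                   ) -> List[Vector]:
    points = [list(d) for d in docs]
    for _ in range(iterations):
        updated = []
        for i, v_i in enumerate(points):
            directions = []
            for j, v_j in enumerate(points):
                if i == j:
                    continue
                dist = euclidean(v_i, v_j)
                # Neighbors lie between the minimum and maximum distances.
                if not (min_dist <= dist <= max_dist):
                    continue
                v_al = [(a + b) / 2.0 for a, b in zip(v_i, v_j)]
                v_at = attract(v_i, v_j, dist, max_dist)
                # VND = KND * (VAL + VAT), as stated on the slides.
                directions.append([k_nd * (a + t) for a, t in zip(v_al, v_at)])
            if directions:
                n = len(directions)
                # ASSUMPTION: average the per-neighbor directions and
                # take the result as the new position of the document.
                updated.append([sum(d[k] for d in directions) / n
                                for k in range(len(v_i))])
            else:
                updated.append(list(v_i))  # no neighbors: stay in place
        points = updated
    return points
```

With three points `[[0, 0], [1, 0], [10, 10]]`, `min_dist=0.0`, and `max_dist=2.0`, one iteration of this sketch draws the first two points closer together and leaves the isolated third point where it is, which mirrors the intended pre-clustering effect.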

Experimental Results
• To measure the performance of FLOSCAN, we use the LDC TDT Corpus.
• We have randomly chosen about 1,200 stories:
  • about half collected from Reuters newswire and
  • half from CNN broadcast news transcripts.
• Twenty-five topics were defined in the original release.

Document Representation
• Use the vector model to represent documents in the collection.
• Remove stopwords.
• Assign a weight to each term in each document:
  • Augmented weight: L(tji) = 0.5 + 0.5 * (tf(tji) / tf(max))
  • where tf(max) = max{ tf(t1i), tf(t2i), ... , tf(tmi) } and m is the number of terms in the collection.
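The augmented weight can be computed per document as in the short sketch below. The function name, the whitespace-style token list, and the choice to assign weights only to terms that occur in the document are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable


def augmented_weights(tokens: Iterable[str],
                      stopwords: FrozenSet[str] = frozenset()
                      ) -> Dict[str, float]:
    """L(t) = 0.5 + 0.5 * tf(t) / tf(max) for one document."""
    # Remove stopwords before counting term frequencies.
    counts = Counter(t for t in tokens if t not in stopwords)
    if not counts:
        return {}
    tf_max = max(counts.values())  # highest term frequency in this document
    return {term: 0.5 + 0.5 * tf / tf_max for term, tf in counts.items()}
```

For the tokens `the flock the boid flock flock` with `the` as a stopword, `flock` (tf = 3) gets weight 1.0 and `boid` (tf = 1) gets 0.5 + 0.5/3.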

Evaluations
• F-Measure:
  • A combination of IR precision and recall.
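The exact formula used on this slide is not preserved in the transcript. As a hedged illustration, the sketch below uses the common balanced form F = 2PR / (P + R), computed for one cluster against one topic; the function name and the set-based inputs are assumptions.

```python
from typing import Set


def f_measure(cluster: Set[int], topic: Set[int]) -> float:
    """Balanced F-measure of one cluster with respect to one topic.

    ASSUMPTION: the harmonic-mean form 2PR / (P + R); the slide may
    use a different weighting or aggregate over all clusters.
    """
    overlap = len(cluster & topic)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)  # fraction of the cluster on topic
    recall = overlap / len(topic)       # fraction of the topic recovered
    return 2 * precision * recall / (precision + recall)
```

For a cluster {1, 2, 3, 4} and a topic {3, 4, 5}, precision is 1/2, recall is 2/3, and the F-measure is 4/7.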

Evaluations
• Centroid Similarity (CS):
  • It computes the similarity between the centroids of all clusters. Given a set of k clusters, CS is defined as follows: [equation not preserved in the transcript]

Experimental Results

Conclusion & Future Work
• FLOSCAN: a new flocking-based algorithm that can be used for:
  • Clustering
  • A preprocessing step in a data mining algorithm
• Experimental results and comparison to DBSCAN.
• Future work includes:
  • Other experiments with large datasets and other artificial-life algorithms such as the Ant algorithm.
  • Analyze the scalability of FLOSCAN.
  • Use FLOSCAN as a classification algorithm.
  • Theoretical analysis of the initial parameters required by FLOSCAN, namely the maximum distance, the number of iterations, and the speed value.

Questions? Thank you.
