Running Clustering Algorithm in Weka

Presented by Rachsuda Jiamthapthaksin
Computer Science Department
University of Houston

What is Weka?
• Data mining software in Java
• Supervised learning (classification)
• Unsupervised learning (clustering)
• Tools
• Exploration
• Visualization
• Experiment
• Statistical summary

Download Weka
• http://www.cs.waikato.ac.nz/ml/weka/
• Windows (weka-3-5-6jre.exe)
• Linux

Getting Started

Memory Limitation in Weka
• Run the GUI Chooser from a DOS prompt to increase the memory available to Weka (-Xmx128m sets the maximum Java heap to 128 MB)
• C:\> java -Xmx128m -classpath .;/progra~1/weka-3-5/weka.jar weka.gui.GUIChooser
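To confirm that a heap setting such as -Xmx128m actually took effect, one option is to ask the JVM for its maximum heap. The sketch below uses only the standard `Runtime` API; the class name `HeapCheck` is made up for illustration and is not part of Weka.

```java
public class HeapCheck {
    // Maximum heap the JVM will attempt to use, in MiB.
    // This is the limit controlled by the -Xmx option.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it with the same -Xmx value used to launch Weka (for example `java -Xmx128m HeapCheck`) should report a figure close to that value; the exact number can vary slightly between JVMs.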

Weka GUI

Explorer

Open Files (.csv, .arff)

Dataset’s Description Dataset’s statistics Attributes

Remove Class Attribute Non-class attributes

Select A Clustering Algorithm

Parameters’ Setting

Run A Clustering Algorithm

DBSCAN Results
=== Run information ===
Scheme:       weka.clusterers.DBScan -E 0.9 -M 6 -I weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase -D weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Relation:     iris-weka.filters.unsupervised.attribute.Remove-R5
Instances:    150
Attributes:   4
              sepallength
              sepalwidth
              petallength
              petalwidth
Test mode:    evaluate on training data
=== Model and evaluation on training set ===
DBScan clustering results
========================================================================================
Clustered DataObjects: 150
Number of attributes: 4
Epsilon: 0.9; minPoints: 6
Index: weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase
Distance-type: weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Number of generated clusters: 1
Elapsed time: .06
(  0.) 5.1,3.5,1.4,0.2 --> 0
(  1.) 4.9,3,1.4,0.2 --> 0
(  2.) 4.7,3.2,1.3,0.2 --> 0
(  3.) 4.6,3.1,1.5,0.2 --> 0
(  4.) 5,3.6,1.4,0.2 --> 0
…
(146.) 6.3,2.5,5,1.9 --> 0
(147.) 6.5,3,5.2,2 --> 0
(148.) 6.2,3.4,5.4,2.3 --> 0
(149.) 5.9,3,5.1,1.8 --> 0
Clustered Instances
0      150 (100%)

Simplify A Tested Dataset

DBSCAN Clustering Results
=== Run information ===
Scheme:       weka.clusterers.DBScan -E 0.3 -M 50 -I weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase -D weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Relation:     iris-weka.filters.unsupervised.attribute.Remove-R1-2,5
Instances:    150
Attributes:   2
              petallength
              petalwidth
Test mode:    evaluate on training data
=== Model and evaluation on training set ===
DBScan clustering results
========================================================================================
Clustered DataObjects: 150
Number of attributes: 2
Epsilon: 0.3; minPoints: 50
Index: weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase
Distance-type: weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Number of generated clusters: 2
Elapsed time: .03
(  0.) 1.4,0.2 --> 0
(  1.) 1.4,0.2 --> 0
(  2.) 1.3,0.2 --> 0
(  3.) 1.5,0.2 --> 0
…
(146.) 5,1.9 --> 1
(147.) 5.2,2 --> 1
(148.) 5.4,2.3 --> 1
(149.) 5.1,1.8 --> 1
Clustered Instances
0       50 ( 33%)
1      100 ( 67%)
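The two DBSCAN runs differ in their epsilon (-E) and minPoints (-M) settings, and those two parameters decide which points count as dense enough to seed a cluster. The following is a minimal, self-contained sketch of the textbook DBSCAN procedure in plain Java. It is not Weka's implementation, the class and method names are hypothetical, and it uses raw Euclidean distance on made-up toy points, so its output is not expected to match Weka's numbers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class DbscanSketch {
    static final int UNVISITED = -2;
    static final int NOISE = -1;

    // Plain Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Indices of all points within eps of points[p], including p itself.
    static List<Integer> neighbors(double[][] points, int p, double eps) {
        List<Integer> result = new ArrayList<>();
        for (int q = 0; q < points.length; q++) {
            if (distance(points[p], points[q]) <= eps) {
                result.add(q);
            }
        }
        return result;
    }

    // Returns one label per point: 0, 1, 2, ... for clusters, NOISE (-1) otherwise.
    static int[] cluster(double[][] points, double eps, int minPoints) {
        int[] labels = new int[points.length];
        Arrays.fill(labels, UNVISITED);
        int clusterId = 0;
        for (int p = 0; p < points.length; p++) {
            if (labels[p] != UNVISITED) continue;
            List<Integer> seeds = neighbors(points, p, eps);
            if (seeds.size() < minPoints) {
                labels[p] = NOISE; // may later be claimed as a border point
                continue;
            }
            labels[p] = clusterId;
            Deque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int q = queue.poll();
                if (labels[q] == NOISE) labels[q] = clusterId; // border point
                if (labels[q] != UNVISITED) continue;
                labels[q] = clusterId;
                List<Integer> reachable = neighbors(points, q, eps);
                if (reachable.size() >= minPoints) {
                    queue.addAll(reachable); // q is a core point, keep expanding
                }
            }
            clusterId++;
        }
        return labels;
    }

    public static void main(String[] args) {
        // Toy data (invented for illustration): two tight groups and one outlier.
        double[][] pts = {
            {0.0, 0.0}, {0.0, 0.1}, {0.1, 0.0},
            {5.0, 5.0}, {5.0, 5.1}, {5.1, 5.0},
            {10.0, 10.0}
        };
        System.out.println(Arrays.toString(cluster(pts, 0.5, 3)));
    }
}
```

Raising minPoints or shrinking epsilon makes it harder for a point to qualify as a core point, which is one reason the same dataset can yield a different number of clusters under different settings.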

Run k-Means in Weka

k-Means Clustering Results
=== Run information ===
Scheme:       weka.clusterers.SimpleKMeans -N 2 -S 10
Relation:     iris-weka.filters.unsupervised.attribute.Remove-R1-2,5
Instances:    150
Attributes:   2
              petallength
              petalwidth
Test mode:    evaluate on training data
=== Model and evaluation on training set ===
kMeans
======
Number of iterations: 6
Within cluster sum of squared errors: 5.179687509974782
Cluster centroids:
Cluster 0
    Mean/Mode:  4.906  1.676
    Std Devs:   0.8256 0.4248
Cluster 1
    Mean/Mode:  1.464  0.244
    Std Devs:   0.1735 0.1072
Clustered Instances
0      100 ( 67%)
1       50 ( 33%)
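The k-means output reports an iteration count, centroids, and a within-cluster sum of squared errors. As a rough illustration of where those quantities come from, here is a minimal sketch of Lloyd's k-means iteration in plain Java. It is not Weka's SimpleKMeans: the class name is hypothetical, the initial centroids are passed in explicitly instead of being drawn from a seed (-S), and distances are computed on raw values, so the numbers may differ from Weka's. The sample points are a handful of petallength/petalwidth rows copied from the output listings above.

```java
import java.util.Arrays;

public class KMeansSketch {
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to the given point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Alternates assignment and centroid update until assignments stop changing
    // or maxIterations is reached. Updates centroids in place, returns assignments.
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        int dims = points[0].length;
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int c = nearest(points[i], centroids);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
            }
            if (!changed) break;
            double[][] sums = new double[centroids.length][dims];
            int[] counts = new int[centroids.length];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < centroids.length; c++) {
                if (counts[c] == 0) continue; // keep the old centroid for an empty cluster
                for (int d = 0; d < dims; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    // Sum over all points of the squared distance to their assigned centroid.
    static double withinClusterSse(double[][] points, double[][] centroids, int[] assignment) {
        double sse = 0.0;
        for (int i = 0; i < points.length; i++) {
            sse += squaredDistance(points[i], centroids[assignment[i]]);
        }
        return sse;
    }

    public static void main(String[] args) {
        double[][] pts = {
            {1.4, 0.2}, {1.3, 0.2}, {1.5, 0.2},
            {5.0, 1.9}, {5.2, 2.0}, {5.4, 2.3}
        };
        // Assumption: start from two of the data points as initial centroids.
        double[][] centroids = { pts[0].clone(), pts[3].clone() };
        int[] assignment = cluster(pts, centroids, 100);
        System.out.println(Arrays.toString(assignment));
        System.out.println(Arrays.deepToString(centroids));
        System.out.println(withinClusterSse(pts, centroids, assignment));
    }
}
```

Because k-means depends on its starting centroids, different seeds can produce different cluster numbering or different partitions on less well-separated data.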

ArffViewer: Convert Dataset’s Extension

Open A Dataset’s file

Select A Dataset’s File

View the Dataset

Manipulate the Dataset (Optional)

Save As .Arff File
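For readers who want to see what the saved file looks like, an ARFF file is plain text: an @relation line, one @attribute line per column, then @data followed by comma-separated rows. The sketch below builds that text from a small CSV string in plain Java. It is a simplified illustration, not what ArffViewer does internally: the class name is hypothetical, and it assumes a header row, numeric columns only, and no quoted or missing values.

```java
public class ArffSketch {
    // Converts CSV text (first line = column names) into ARFF text.
    // Assumption: every column is numeric and no value needs quoting.
    static String toArff(String relation, String csv) {
        String[] lines = csv.trim().split("\\R");
        String[] names = lines[0].split(",");
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String name : names) {
            sb.append("@attribute ").append(name.trim()).append(" numeric\n");
        }
        sb.append("\n@data\n");
        for (int i = 1; i < lines.length; i++) {
            sb.append(lines[i].trim()).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String csv = "petallength,petalwidth\n1.4,0.2\n5.1,1.8\n";
        System.out.print(toArff("iris", csv));
    }
}
```

Nominal attributes such as a class label would need a different declaration, listing the allowed values in braces, which this sketch deliberately leaves out.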

Weka Documentation
