Running Clustering Algorithm in Weka
Presented by Rachsuda Jiamthapthaksin
Computer Science Department, University of Houston
What is Weka?
• Data mining software in Java
• Supervised learning (classification)
• Unsupervised learning (clustering)
• Tools
• Exploration
• Visualization
• Experiment
• Statistical summary
Download Weka
• http://www.cs.waikato.ac.nz/ml/weka/
• Windows (weka-3-5-6jre.exe)
• Linux
Memory Limitation in Weka
• Run the GUI Chooser from the DOS prompt to raise the maximum Java heap size (-Xmx128m sets it to 128 MB)
• C:\> java -Xmx128m -classpath .;/progra~1/weka-3-5/weka.jar weka.gui.GUIChooser
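One way to confirm that an -Xmx setting took effect is to ask the JVM for its maximum heap size. The following is a minimal sketch; the class name HeapCheck is hypothetical and is not part of Weka.

```java
// Hypothetical helper, not part of Weka: reports the JVM's maximum heap size.
class HeapCheck {
    // Maximum heap the JVM will attempt to use, in whole megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it as java -Xmx128m HeapCheck should report a value near 128; the exact figure can be somewhat lower depending on the JVM.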
Dataset’s Description • Dataset’s statistics • Attributes
Remove Class Attribute • Non-class attributes
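Removing the class attribute amounts to dropping one column from every instance before clustering. The sketch below illustrates the idea on plain arrays. It is not Weka's Remove filter; the class DropColumn and the sample class labels are assumptions made for illustration. The index is 1-based to mirror the "Remove-R5" suffix that appears in the relation name of the results.

```java
import java.util.Arrays;

// Hypothetical illustration, not Weka's Remove filter: drops one column from each row.
class DropColumn {
    // Returns a copy of rows without the column at the given 1-based index.
    // Assumes the index is valid for every row.
    static String[][] drop(String[][] rows, int oneBasedIndex) {
        int skip = oneBasedIndex - 1;
        String[][] out = new String[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            String[] row = rows[r];
            String[] kept = new String[row.length - 1];
            for (int c = 0, k = 0; c < row.length; c++) {
                if (c != skip) kept[k++] = row[c];
            }
            out[r] = kept;
        }
        return out;
    }

    public static void main(String[] args) {
        // Two sample rows; the class labels in column 5 are illustrative.
        String[][] rows = {
            {"5.1", "3.5", "1.4", "0.2", "Iris-setosa"},
            {"5.9", "3.0", "5.1", "1.8", "Iris-virginica"}
        };
        for (String[] row : drop(rows, 5)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```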
DBSCAN Results
=== Run information ===
Scheme: weka.clusterers.DBScan -E 0.9 -M 6 -I weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase -D weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Relation: iris-weka.filters.unsupervised.attribute.Remove-R5
Instances: 150
Attributes: 4
    sepallength
    sepalwidth
    petallength
    petalwidth
Test mode: evaluate on training data
=== Model and evaluation on training set ===
DBScan clustering results
========================================================================================
Clustered DataObjects: 150
Number of attributes: 4
Epsilon: 0.9; minPoints: 6
Index: weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase
Distance-type: weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Number of generated clusters: 1
Elapsed time: .06
( 0.) 5.1,3.5,1.4,0.2 --> 0
( 1.) 4.9,3,1.4,0.2 --> 0
( 2.) 4.7,3.2,1.3,0.2 --> 0
( 3.) 4.6,3.1,1.5,0.2 --> 0
( 4.) 5,3.6,1.4,0.2 --> 0
…
(146.) 6.3,2.5,5,1.9 --> 0
(147.) 6.5,3,5.2,2 --> 0
(148.) 6.2,3.4,5.4,2.3 --> 0
(149.) 5.9,3,5.1,1.8 --> 0
Clustered Instances
0 150 (100%)
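With epsilon 0.9 and minPoints 6, every instance ended up in a single cluster. The sketch below shows the general DBSCAN procedure and how a large epsilon can merge separate groups. It is a simplified illustration, not Weka's DBScan class: the name DbscanSketch and the toy data are assumptions, and it uses plain Euclidean distance with no attribute scaling, which may differ from how Weka's EuclidianDataObject measures distance.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Simplified DBSCAN sketch; NOT Weka's DBScan implementation.
class DbscanSketch {
    static final int UNVISITED = -2;
    static final int NOISE = -1;

    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }

    // Indices of all points within eps of point p (p itself included).
    static List<Integer> neighbors(double[][] data, int p, double eps) {
        List<Integer> result = new ArrayList<>();
        for (int q = 0; q < data.length; q++) {
            if (distance(data[p], data[q]) <= eps) result.add(q);
        }
        return result;
    }

    // Returns one label per point: 0, 1, ... for clusters, NOISE for noise.
    static int[] cluster(double[][] data, double eps, int minPoints) {
        int[] labels = new int[data.length];
        Arrays.fill(labels, UNVISITED);
        int next = 0;
        for (int p = 0; p < data.length; p++) {
            if (labels[p] != UNVISITED) continue;
            List<Integer> seeds = neighbors(data, p, eps);
            if (seeds.size() < minPoints) { labels[p] = NOISE; continue; }
            labels[p] = next;                       // p is a core point
            Deque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int q = queue.poll();
                if (labels[q] == NOISE) labels[q] = next;  // border point joins the cluster
                if (labels[q] != UNVISITED) continue;
                labels[q] = next;
                List<Integer> reachable = neighbors(data, q, eps);
                if (reachable.size() >= minPoints) queue.addAll(reachable); // q is also core
            }
            next++;
        }
        return labels;
    }

    public static void main(String[] args) {
        // Toy data: two tight groups of four points, far apart.
        double[][] pts = {{0, 0}, {0, 1}, {1, 0}, {1, 1}, {10, 10}, {10, 11}, {11, 10}, {11, 11}};
        System.out.println(Arrays.toString(cluster(pts, 1.5, 3)));
        System.out.println(Arrays.toString(cluster(pts, 20.0, 3)));
    }
}
```

On this toy data, the small epsilon separates the two groups, while the large one puts all eight points in cluster 0, which is the same effect seen in the run above.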
DBSCAN Clustering Results
=== Run information ===
Scheme: weka.clusterers.DBScan -E 0.3 -M 50 -I weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase -D weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Relation: iris-weka.filters.unsupervised.attribute.Remove-R1-2,5
Instances: 150
Attributes: 2
    petallength
    petalwidth
Test mode: evaluate on training data
=== Model and evaluation on training set ===
DBScan clustering results
========================================================================================
Clustered DataObjects: 150
Number of attributes: 2
Epsilon: 0.3; minPoints: 50
Index: weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase
Distance-type: weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
Number of generated clusters: 2
Elapsed time: .03
( 0.) 1.4,0.2 --> 0
( 1.) 1.4,0.2 --> 0
( 2.) 1.3,0.2 --> 0
( 3.) 1.5,0.2 --> 0
…
(146.) 5,1.9 --> 1
(147.) 5.2,2 --> 1
(148.) 5.4,2.3 --> 1
(149.) 5.1,1.8 --> 1
Clustered Instances
0 50 ( 33%)
1 100 ( 67%)
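The "Clustered Instances" summary is a count of instances per cluster label, shown with a rounded percentage of the total. A minimal sketch of that tally follows. The class ClusterTally is hypothetical. The label array assumes, purely for illustration, that the first 50 instances fall in cluster 0 and the remaining 100 in cluster 1; this matches the reported counts, but the elided rows are not shown in the output.

```java
// Hypothetical helper: summarises cluster labels as counts and rounded percentages.
class ClusterTally {
    // counts[c] = number of instances assigned to cluster c (labels must be in 0..k-1).
    static int[] counts(int[] labels, int k) {
        int[] counts = new int[k];
        for (int label : labels) counts[label]++;
        return counts;
    }

    // Share of the total, rounded to the nearest whole percent.
    static long percent(int count, int total) {
        return Math.round(100.0 * count / total);
    }

    public static void main(String[] args) {
        int[] labels = new int[150];                     // all zeros: cluster 0
        for (int i = 50; i < 150; i++) labels[i] = 1;    // assumed split, see lead-in
        int[] c = counts(labels, 2);
        for (int i = 0; i < c.length; i++) {
            System.out.println(i + " " + c[i] + " (" + percent(c[i], labels.length) + "%)");
        }
    }
}
```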
k-Means Clustering Results
=== Run information ===
Scheme: weka.clusterers.SimpleKMeans -N 2 -S 10
Relation: iris-weka.filters.unsupervised.attribute.Remove-R1-2,5
Instances: 150
Attributes: 2
    petallength
    petalwidth
Test mode: evaluate on training data
=== Model and evaluation on training set ===
kMeans
======
Number of iterations: 6
Within cluster sum of squared errors: 5.179687509974782
Cluster centroids:
Cluster 0
    Mean/Mode: 4.906 1.676
    Std Devs: 0.8256 0.4248
Cluster 1
    Mean/Mode: 1.464 0.244
    Std Devs: 0.1735 0.1072
Clustered Instances
0 100 ( 67%)
1 50 ( 33%)
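The output above reports iterations, centroids, and a within-cluster sum of squared errors. The sketch below shows how those quantities arise in a basic k-means loop (Lloyd's algorithm). It is a simplified illustration and not Weka's SimpleKMeans: the class KMeansSketch is hypothetical, the caller supplies the initial centroids explicitly instead of seeding a random choice, and no attribute scaling is applied, so its numbers will not necessarily match Weka's.

```java
import java.util.Arrays;

// Simplified k-means (Lloyd's algorithm) sketch; NOT Weka's SimpleKMeans.
class KMeansSketch {
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return sum;
    }

    // Index of the centroid closest to the point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Alternates assignment and update steps until assignments stop changing.
    // The centroids array is updated in place; the returned array holds one label per point.
    static int[] cluster(double[][] data, double[][] centroids, int maxIterations) {
        int[] labels = new int[data.length];
        Arrays.fill(labels, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            for (int i = 0; i < data.length; i++) {          // assignment step
                int c = nearest(data[i], centroids);
                if (c != labels[i]) { labels[i] = c; changed = true; }
            }
            if (!changed) break;
            for (int c = 0; c < centroids.length; c++) {     // update step
                double[] sum = new double[data[0].length];
                int n = 0;
                for (int i = 0; i < data.length; i++) {
                    if (labels[i] != c) continue;
                    for (int d = 0; d < sum.length; d++) sum[d] += data[i][d];
                    n++;
                }
                if (n == 0) continue;                        // empty cluster: keep its centroid
                for (int d = 0; d < sum.length; d++) centroids[c][d] = sum[d] / n;
            }
        }
        return labels;
    }

    // Sum of squared distances from each point to its assigned centroid.
    static double withinClusterSse(double[][] data, int[] labels, double[][] centroids) {
        double sse = 0;
        for (int i = 0; i < data.length; i++) {
            sse += squaredDistance(data[i], centroids[labels[i]]);
        }
        return sse;
    }
}
```

For example, calling cluster on the four toy points (1,1), (1,2), (5,5), (5,6) with initial centroids (1,1) and (5,5) assigns the first two points to cluster 0 and the last two to cluster 1, and withinClusterSse then evaluates to 1.0.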