Running Clustering Algorithm in Weka



  1. Running Clustering Algorithm in Weka Presented by Rachsuda Jiamthapthaksin Computer Science Department University of Houston

  2. What is Weka?
     • Data mining software in Java
     • Supervised learning (classification)
     • Unsupervised learning (clustering)
     • Tools
     • Exploration
     • Visualization
     • Experiment
     • Statistical summary

  3. Download Weka
     • http://www.cs.waikato.ac.nz/ml/weka/
     • Windows (weka-3-5-6jre.exe)
     • Linux

  4. Getting Started

  5. Memory Limitation in Weka
     • Run the GUI Chooser from a DOS prompt to increase the memory available to Weka
     • C:\> java -Xmx128m -classpath .;/progra~1/weka-3-5/weka.jar weka.gui.GUIChooser
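The command on this slide can be assembled programmatically, which makes the heap size and the platform-specific classpath separator explicit. The following is a minimal sketch, not part of the original slides: the helper name `weka_chooser_command` and its parameters are hypothetical, and only the `-Xmx` heap flag, the `-classpath` option, and the `weka.gui.GUIChooser` class come from the slide.

```python
import os


def weka_chooser_command(weka_jar, max_heap_mb=128, path_sep=os.pathsep):
    """Build the argument list that launches the Weka GUI Chooser.

    weka_jar    -- path to weka.jar (depends on where Weka is installed)
    max_heap_mb -- value passed to the JVM's -Xmx flag, in megabytes
    path_sep    -- classpath separator: ';' on Windows, ':' on Linux
    """
    classpath = path_sep.join([".", weka_jar])
    return [
        "java",
        f"-Xmx{max_heap_mb}m",   # maximum Java heap size
        "-classpath", classpath,
        "weka.gui.GUIChooser",   # main class shown on the slide
    ]


if __name__ == "__main__":
    # Reproduces the Windows command from the slide as a single string.
    cmd = weka_chooser_command("/progra~1/weka-3-5/weka.jar", 128, ";")
    print(" ".join(cmd))
```

The returned list could be passed to `subprocess.run` on a machine where Java and Weka are installed; raising `max_heap_mb` is the only change needed to give Weka more memory.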

  6. Weka GUI

  7. Explorer

  8. Open Files (.csv, .arff)
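For readers unfamiliar with the .arff format, the sketch below shows its basic layout by reading it: a `@relation` line, one `@attribute` line per column, then comma-separated rows after `@data`. This is an illustrative reader under simplifying assumptions (no quoted names, no sparse rows, no missing-value handling), not Weka's own loader; the function name `read_arff` and the two-row `SAMPLE` are made up for the example.

```python
# Hypothetical two-row sample in ARFF layout, for illustration only.
SAMPLE = """\
% lines starting with % are comments
@relation iris
@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa,Iris-versicolor}
@data
1.4,0.2,Iris-setosa
4.7,1.4,Iris-versicolor
"""


def read_arff(text):
    """Return (relation name, [(attribute name, type)], data rows)."""
    relation = None
    attributes = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
            continue
        keyword = line.lower()
        if keyword.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif keyword.startswith("@attribute"):
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif keyword.startswith("@data"):
            in_data = True  # everything after this line is a data row
    return relation, attributes, rows


if __name__ == "__main__":
    relation, attributes, rows = read_arff(SAMPLE)
    print(relation, attributes, rows)
```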

  9. Dataset’s Description
     • Dataset’s statistics
     • Attributes

  10. Remove Class Attribute
     • Non-class attributes
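Removing the class attribute leaves only the columns the clusterer should see. The relation names in the run output elsewhere in this deck record the step as `Remove-R5` and `Remove-R1-2,5`, that is, a list of 1-based attribute indices and ranges. The sketch below mimics that effect on an in-memory table; it is an assumption-laden illustration rather than Weka's filter, the helper names are hypothetical, and keywords such as `first` or `last` are not handled.

```python
def parse_range(spec):
    """Turn a spec such as '1-2,5' into the set of 1-based indices {1, 2, 5}."""
    indices = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            indices.update(range(int(low), int(high) + 1))
        else:
            indices.add(int(part))
    return indices


def remove_attributes(names, rows, spec):
    """Drop the attributes whose 1-based index appears in spec."""
    drop = parse_range(spec)
    keep = [i for i in range(len(names)) if i + 1 not in drop]
    kept_names = [names[i] for i in keep]
    kept_rows = [[row[i] for i in keep] for row in rows]
    return kept_names, kept_rows


if __name__ == "__main__":
    names = ["sepallength", "sepalwidth", "petallength", "petalwidth", "class"]
    rows = [[5.1, 3.5, 1.4, 0.2, "Iris-setosa"]]
    # Drop only the class attribute (index 5).
    print(remove_attributes(names, rows, "5"))
    # Keep only the two petal measurements.
    print(remove_attributes(names, rows, "1-2,5"))
```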

  11. Select A Clustering Algorithm

  12. Select A Clustering Algorithm

  13. Select A Clustering Algorithm

  14. Parameters’ Setting

  15. Run A Clustering Algorithm

  16. DBSCAN Results

      === Run information ===

      Scheme:       weka.clusterers.DBScan -E 0.9 -M 6 -I weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase -D weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
      Relation:     iris-weka.filters.unsupervised.attribute.Remove-R5
      Instances:    150
      Attributes:   4
                    sepallength
                    sepalwidth
                    petallength
                    petalwidth
      Test mode:    evaluate on training data

      === Model and evaluation on training set ===

      DBScan clustering results
      ========================================================================================

      Clustered DataObjects: 150
      Number of attributes: 4
      Epsilon: 0.9; minPoints: 6
      Index: weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase
      Distance-type: weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
      Number of generated clusters: 1
      Elapsed time: .06

      (  0.) 5.1,3.5,1.4,0.2  -->  0
      (  1.) 4.9,3,1.4,0.2  -->  0
      (  2.) 4.7,3.2,1.3,0.2  -->  0
      (  3.) 4.6,3.1,1.5,0.2  -->  0
      (  4.) 5,3.6,1.4,0.2  -->  0
      …
      (146.) 6.3,2.5,5,1.9  -->  0
      (147.) 6.5,3,5.2,2  -->  0
      (148.) 6.2,3.4,5.4,2.3  -->  0
      (149.) 5.9,3,5.1,1.8  -->  0

      Clustered Instances

      0      150 (100%)

  17. Simplify A Tested Dataset

  18. Simplify A Tested Dataset

  19. Parameters’ Setting

  20. DBSCAN Clustering Results

      === Run information ===

      Scheme:       weka.clusterers.DBScan -E 0.3 -M 50 -I weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase -D weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
      Relation:     iris-weka.filters.unsupervised.attribute.Remove-R1-2,5
      Instances:    150
      Attributes:   2
                    petallength
                    petalwidth
      Test mode:    evaluate on training data

      === Model and evaluation on training set ===

      DBScan clustering results
      ========================================================================================

      Clustered DataObjects: 150
      Number of attributes: 2
      Epsilon: 0.3; minPoints: 50
      Index: weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase
      Distance-type: weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
      Number of generated clusters: 2
      Elapsed time: .03

      (  0.) 1.4,0.2  -->  0
      (  1.) 1.4,0.2  -->  0
      (  2.) 1.3,0.2  -->  0
      (  3.) 1.5,0.2  -->  0
      …
      (146.) 5,1.9  -->  1
      (147.) 5.2,2  -->  1
      (148.) 5.4,2.3  -->  1
      (149.) 5.1,1.8  -->  1

      Clustered Instances

      0       50 ( 33%)
      1      100 ( 67%)
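The two DBSCAN runs in this deck differ only in the attributes kept and in the two parameters reported as Epsilon (`-E`) and minPoints (`-M`), yet one produces a single cluster and the other produces two. The sketch below is a plain textbook DBSCAN in Python, offered to show how those two parameters drive the result. It is not Weka's implementation, the toy points are made up, and it is not expected to reproduce the numbers in the output above (Weka's distance computation may differ, for example in how attribute values are scaled).

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not a core point (may become a border point later)
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep growing
    return labels


# Toy data (hypothetical): two tight groups and one isolated point.
POINTS = [
    (1.0, 0.2), (1.1, 0.2), (1.2, 0.3), (1.0, 0.3),
    (5.0, 1.8), (5.1, 1.9), (5.2, 2.0), (5.0, 2.0),
    (3.0, 1.0),
]

if __name__ == "__main__":
    print(dbscan(POINTS, eps=0.3, min_pts=3))  # small radius: groups stay separate
    print(dbscan(POINTS, eps=5.0, min_pts=3))  # large radius: everything merges
```

With a small radius the two groups form separate clusters and the isolated point is labeled noise; with a radius large enough to span the whole dataset, every point falls into one cluster, which is the same qualitative effect seen when Epsilon is set too high.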

  21. Run k-Means in Weka

  22. Parameters’ Setting

  23. k-Means Clustering Results

      === Run information ===

      Scheme:       weka.clusterers.SimpleKMeans -N 2 -S 10
      Relation:     iris-weka.filters.unsupervised.attribute.Remove-R1-2,5
      Instances:    150
      Attributes:   2
                    petallength
                    petalwidth
      Test mode:    evaluate on training data

      === Model and evaluation on training set ===

      kMeans
      ======

      Number of iterations: 6
      Within cluster sum of squared errors: 5.179687509974782

      Cluster centroids:

      Cluster 0
          Mean/Mode: 4.906 1.676
          Std Devs:  0.8256 0.4248
      Cluster 1
          Mean/Mode: 1.464 0.244
          Std Devs:  0.1735 0.1072

      Clustered Instances

      0      100 ( 67%)
      1       50 ( 33%)
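The k-means run above is configured with `-N 2 -S 10`, the number of clusters and the random seed, and reports centroids plus a within-cluster sum of squared errors. The sketch below is a minimal Lloyd-style k-means in Python that returns the same three kinds of result. It is an illustration under stated assumptions, not Weka's SimpleKMeans: the toy points are made up, initial centroids are simply sampled from the data, and no attribute scaling is applied, so it is not expected to reproduce the figures above.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, max_iter=100):
    """Return (assignment per point, centroids, within-cluster SSE)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct data points to start
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    sse = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignment))
    return assignment, centroids, sse


# Toy data (hypothetical): two well-separated groups of four points each.
POINTS = [
    (1.0, 0.2), (1.1, 0.2), (1.2, 0.3), (1.0, 0.3),
    (5.0, 1.8), (5.1, 1.9), (5.2, 2.0), (5.0, 2.0),
]

if __name__ == "__main__":
    assignment, centroids, sse = kmeans(POINTS, k=2, seed=10)
    print(assignment)
    print(centroids)
    print(sse)
```

Changing the seed can swap which group is called cluster 0 and which is cluster 1, which is why the cluster numbering in k-means output carries no meaning on its own.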

  24. ArffViewer: Convert Dataset’s Extension

  25. Open A Dataset’s file

  26. Select A Dataset’s File

  27. View the Dataset

  28. Manipulate the Dataset (Optional)

  29. Save As .Arff File
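The ArffViewer steps above convert a dataset from one file extension to another through the GUI. For comparison, the sketch below performs a comparable .csv to .arff conversion in a few lines of Python. It is a simplified illustration, not what ArffViewer does internally: the function name `csv_to_arff` is hypothetical, a column is declared `numeric` only if every value parses as a number (otherwise it becomes a nominal attribute listing its distinct values), and quoting, dates, and missing values are not handled.

```python
import csv
import io


def is_number(text):
    """True if text parses as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    for col, name in enumerate(header):
        values = [row[col] for row in data]
        if all(is_number(v) for v in values):
            lines.append(f"@attribute {name} numeric")
        else:
            # Nominal attribute: list the distinct values in sorted order.
            labels = ",".join(sorted(set(values)))
            lines.append(f"@attribute {name} {{{labels}}}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-row input, for illustration only.
    sample = (
        "petallength,petalwidth,class\n"
        "1.4,0.2,Iris-setosa\n"
        "4.7,1.4,Iris-versicolor\n"
    )
    print(csv_to_arff(sample, "iris"))
```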

  30. Weka Documentation
