
Weka Tutorial


tom

Presentation Transcript


  1. Weka Tutorial

  2. WEKA:: Introduction
     • A collection of open-source ML algorithms for pre-processing, classification, clustering and association rules
     • Created by researchers at the University of Waikato in New Zealand
     • Java-based

  3. WEKA:: Installation
     • Download the software from http://www.cs.waikato.ac.nz/ml/weka/
     • If you are interested in modifying or extending Weka, there is a developer version that includes the source code
     • Set the Weka environment variables for Java (csh/tcsh syntax):
         setenv WEKAHOME /usr/local/weka/weka-3-6-1
         setenv CLASSPATH $WEKAHOME/weka.jar:$CLASSPATH
     • Download some ML data from http://mlearn.ics.uci.edu/MLRepository.html
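After setting CLASSPATH, one quick way to confirm that the JVM can actually see weka.jar is to try loading a Weka class by name. The sketch below is a hypothetical helper (the name `WekaClasspathCheck` is ours, not part of Weka) that probes for `weka.core.Instances`, one of Weka's core classes; it uses only the JDK's `Class.forName` and calls no Weka API.

```java
// Hypothetical helper: checks whether a class can be loaded from the current classpath.
// Only JDK calls are used; weka.core.Instances is probed because weka.jar ships that class.
final class WekaClasspathCheck {

    /** Returns true if the named class is loadable from the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: locate the class without running its static initializers
            Class.forName(className, false, WekaClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String probe = "weka.core.Instances";
        if (isOnClasspath(probe)) {
            System.out.println(probe + " found: weka.jar is on the classpath");
        } else {
            System.out.println(probe + " not found: check WEKAHOME and CLASSPATH");
        }
    }
}
```

Run it from the same shell in which the setenv commands were issued (with JDK 11 or later, for example, as `java WekaClasspathCheck.java`) so that it inherits the CLASSPATH value.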

  4. WEKA:: Introduction (contd.)
     • Routines are implemented as classes and logically arranged in packages
     • Comes with an extensive graphical user interface
     • Weka routines can also be used standalone from the command line, e.g.
         java weka.classifiers.trees.J48 -t $WEKAHOME/data/iris.arff

  5. WEKA:: Interface

  6. WEKA:: Data format
     • Uses flat text files to describe the data
     • Can work with a wide variety of data files, including its own ".arff" format and the C4.5 file format
     • Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
     • Data can also be read from a URL or from an SQL database (using JDBC)

  7. WEKA:: ARFF file format
     • Numeric attributes are declared with the type numeric; nominal attributes list their allowed values in braces.
     • Example (abridged header: each data row holds 39 values, i.e. the 38 anneal attributes plus the class, but only a few declarations are shown):
         @relation anneal
         @attribute carbon numeric
         @attribute hardness numeric
         @attribute 'enamelability' {'?','1','2','3','4','5'}
         @attribute cholesterol numeric
         @attribute shape {COIL, SHEET}
         @attribute class {'1','2','3','4','5','U'}
         @data
         '?','C','A',0,60,'T','?','?',0,'?','?','G','?','?','?','?','M','?','?','?','?','?','?','?','?','?','?','?','?','?','?','COIL',2.801,385.1,0,'?','0','?','3'
         '?','C','A',0,60,'T','?','?',0,'?','?','G','?','?','?','?','B','Y','?','?','?','Y','?','?','?','?','?','?','?','?','?','SHEET',0.801,255,269,'?','0','?','3'
         '?','C','A',0,45,'?','S','?',0,'?','?','D','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','COIL',1.6,610,0,'?','0','?','3'
         ...
     • A more thorough description is available here: http://www.cs.waikato.ac.nz/~ml/weka/arff.html
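To make the header structure concrete, the plain-Java sketch below reads `@attribute` lines into a name, a type, and (for nominal attributes) the list of allowed values. It is not Weka's ARFF parser (Weka ships its own loader, `weka.core.converters.ArffLoader`); the class name is hypothetical, and it assumes one declaration per line, attribute names without spaces, and only numeric-style or `{...}` nominal types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Minimal sketch of reading ARFF @attribute declarations; not Weka's parser.
// Assumptions: one declaration per line, attribute names without spaces,
// and only numeric-style types or {nominal, values} lists.
final class ArffHeaderSketch {

    /** One declared attribute; nominalValues is empty for non-nominal types. */
    record Attribute(String name, String type, List<String> nominalValues) {}

    static List<Attribute> parseHeader(List<String> lines) {
        List<Attribute> attributes = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("@data")) {
                break;                                   // the header ends at @data
            }
            if (!lower.startsWith("@attribute")) {
                continue;                                // skip @relation, comments, blanks
            }
            // parts[0] = "@attribute", parts[1] = name, parts[2] = type specification
            String[] parts = line.split("\\s+", 3);
            if (parts.length < 3) {
                throw new IllegalArgumentException("Attribute without a type: " + line);
            }
            String name = unquote(parts[1]);
            String spec = parts[2].trim();
            if (spec.startsWith("{")) {
                int close = spec.lastIndexOf('}');
                if (close < 0) {
                    throw new IllegalArgumentException("Unclosed nominal list: " + line);
                }
                List<String> values = new ArrayList<>();
                for (String v : spec.substring(1, close).split(",")) {
                    values.add(unquote(v.trim()));
                }
                attributes.add(new Attribute(name, "nominal", values));
            } else {
                attributes.add(new Attribute(name, spec.toLowerCase(Locale.ROOT), List.of()));
            }
        }
        return attributes;
    }

    /** Removes one pair of surrounding single or double quotes, if present. */
    static String unquote(String s) {
        boolean single = s.length() >= 2 && s.startsWith("'") && s.endsWith("'");
        boolean dbl = s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"");
        return (single || dbl) ? s.substring(1, s.length() - 1) : s;
    }
}
```

Note that a declaration with no type (for example a bare `@attribute carbon`) is rejected, because ARFF requires every attribute to have a type.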

  8. WEKA:: Explorer: Preprocessing
     • Pre-processing tools in WEKA are called "filters"
     • WEKA contains filters for discretization, normalization, resampling, attribute selection, transforming and combining attributes, etc.
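Normalization is one of the filters listed above. Weka's unsupervised `Normalize` filter rescales numeric attributes into the range [0, 1] by default. The plain-Java sketch below shows that min-max idea on a single column; it is not Weka's code, the class name is hypothetical, and representing missing values as `Double.NaN` is an assumption of the sketch.

```java
// Sketch of min-max normalization of one numeric column into [0, 1].
// Not Weka's implementation; Double.NaN stands in for a missing value (an assumption).
final class MinMaxSketch {

    static double[] normalize(double[] column) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : column) {
            if (!Double.isNaN(v)) {
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }
        double range = max - min;
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            double v = column[i];
            if (Double.isNaN(v)) {
                out[i] = Double.NaN;                     // missing values stay missing
            } else {
                // a constant column maps to 0 here (a choice made for this sketch)
                out[i] = range == 0 ? 0.0 : (v - min) / range;
            }
        }
        return out;
    }
}
```

For the values 60, 60 and 45 (the fifth field of the three example data rows of the ARFF slide), `normalize` returns 1.0, 1.0 and 0.0.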

  9. Annealing dataset: Description
     • The annealing dataset comes from the UCI repository of datasets. It describes steel being annealed and its various properties.
     • It has 38 attributes: 6 continuous, 3 integer-valued and 29 nominal.
     • It contains missing values and has 798 records in total, spread across 6 classes.
     • The notion of classes will be explained later, during classification.

  10. Data Cleaning: Removing missing values
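The transcript does not record how the missing values were handled on this slide. Dropping incomplete records is one option; another, offered by Weka's unsupervised `ReplaceMissingValues` filter, is to replace missing values with the mean (numeric attributes) or the mode (nominal attributes) of the data. The plain-Java sketch below illustrates that replacement idea; it is not Weka's implementation, the class name is hypothetical, and the encodings (`Double.NaN` and the string "?" for missing) are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of mean/mode imputation; not Weka's ReplaceMissingValues code.
// Assumed encodings: Double.NaN marks a missing numeric value,
// and the string "?" (ARFF's missing-value marker) a missing nominal value.
final class ImputeSketch {

    /** Replaces NaN entries with the mean of the non-missing values. */
    static double[] fillWithMean(double[] column) {
        double sum = 0;
        int count = 0;
        for (double v : column) {
            if (!Double.isNaN(v)) {
                sum += v;
                count++;
            }
        }
        double mean = count == 0 ? Double.NaN : sum / count;  // all missing: nothing to fill with
        double[] out = column.clone();
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) {
                out[i] = mean;
            }
        }
        return out;
    }

    /** Replaces "?" entries with the most frequent value; ties go to the value seen first. */
    static List<String> fillWithMode(List<String> column) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String v : column) {
            if (!"?".equals(v)) {
                counts.merge(v, 1, Integer::sum);
            }
        }
        String mode = null;
        int best = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > best) {
                mode = e.getKey();
                best = e.getValue();
            }
        }
        List<String> out = new ArrayList<>(column.size());
        for (String v : column) {
            out.add("?".equals(v) && mode != null ? mode : v);
        }
        return out;
    }
}
```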

  11. Data Cleaning: Removing useless attributes
     • The number of attributes drops from 38 to 32
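Weka's unsupervised `RemoveUseless` filter is one way to do this: it deletes attributes that do not vary at all and, for nominal attributes, those whose values vary too much. The transcript does not say which filter produced the drop from 38 to 32. The plain-Java sketch below covers only the first rule (constant columns); the class name is hypothetical, and treating an all-missing column as constant is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: find attributes whose non-missing values never change.
// columns.get(j) holds attribute j's values as strings; "?" marks a missing value.
// Covers only the "constant attribute" rule, not RemoveUseless's variance threshold.
final class ConstantAttributeSketch {

    /** Returns the indices of columns with at most one distinct non-missing value. */
    static List<Integer> constantColumns(List<List<String>> columns) {
        List<Integer> result = new ArrayList<>();
        for (int j = 0; j < columns.size(); j++) {
            Set<String> distinct = new HashSet<>();
            for (String v : columns.get(j)) {
                if (!"?".equals(v)) {
                    distinct.add(v);
                }
            }
            if (distinct.size() <= 1) {      // constant, or entirely missing
                result.add(j);
            }
        }
        return result;
    }
}
```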

  12. Data transformation: Discretizing the attributes
     • bins = 15: the range of each attribute is divided into 15 intervals
     • attributeIndices = first-last: the filter is applied to all attributes
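By default, Weka's unsupervised `Discretize` filter uses equal-width binning: the range of each numeric attribute is split into `bins` intervals of the same width. The plain-Java sketch below applies that rule to one column with a configurable bin count (15 on this slide); it is an illustration, not Weka's implementation, the class name is hypothetical, and mapping missing values (`Double.NaN`) to -1 is an assumption of the sketch.

```java
// Sketch of equal-width discretization for one numeric column.
final class EqualWidthSketch {

    /** Returns a bin index in [0, bins - 1] for each value; NaN (missing) maps to -1. */
    static int[] discretize(double[] column, int bins) {
        if (bins < 1) {
            throw new IllegalArgumentException("bins must be at least 1");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : column) {
            if (!Double.isNaN(v)) {
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }
        double width = (max - min) / bins;
        int[] out = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            double v = column[i];
            if (Double.isNaN(v)) {
                out[i] = -1;
            } else if (width == 0) {
                out[i] = 0;                          // constant column: everything in bin 0
            } else {
                int bin = (int) ((v - min) / width);
                out[i] = Math.min(bin, bins - 1);    // the maximum itself belongs to the last bin
            }
        }
        return out;
    }
}
```

With `bins = 15`, the values 60, 60 and 45 (the fifth field of the example data rows) land in bins 14, 14 and 0.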

  13. Data reduction: Supervised attribute selection
     • Reduces the number of attributes from 32 to 10
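The transcript does not say which evaluator selected the 10 attributes. As one illustration, the plain-Java sketch below ranks nominal attributes by information gain with respect to the class and keeps the top k, similar in spirit to combining Weka's `InfoGainAttributeEval` evaluator with the `Ranker` search method. It is not Weka's implementation, the class name is hypothetical, and treating "?" as an ordinary value is a simplification of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: rank nominal attributes by information gain and keep the best k.
// attributes.get(j) holds attribute j's values (as strings) in the same row order as classes.
final class InfoGainSketch {

    /** Shannon entropy, in bits, of a list of labels. */
    static double entropy(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double h = 0;
        for (int c : counts.values()) {
            double p = (double) c / labels.size();
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    /** H(class) minus the expected class entropy after splitting on the attribute. */
    static double infoGain(List<String> attribute, List<String> classes) {
        Map<String, List<String>> classesByValue = new HashMap<>();
        for (int i = 0; i < classes.size(); i++) {
            classesByValue.computeIfAbsent(attribute.get(i), k -> new ArrayList<>()).add(classes.get(i));
        }
        double remainder = 0;
        for (List<String> subset : classesByValue.values()) {
            remainder += (double) subset.size() / classes.size() * entropy(subset);
        }
        return entropy(classes) - remainder;
    }

    /** Indices of the k attributes with the highest information gain, best first. */
    static List<Integer> topK(List<List<String>> attributes, List<String> classes, int k) {
        double[] gains = new double[attributes.size()];
        List<Integer> order = new ArrayList<>();
        for (int j = 0; j < attributes.size(); j++) {
            gains[j] = infoGain(attributes.get(j), classes);
            order.add(j);
        }
        order.sort((a, b) -> Double.compare(gains[b], gains[a]));   // stable: ties keep index order
        return new ArrayList<>(order.subList(0, Math.min(k, order.size())));
    }
}
```

Information gain is a filter-style score computed one attribute at a time, so it ignores redundancy between attributes; subset evaluators in Weka take a different approach.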

  14. Viewing and understanding the transformed data
     • This can be done using the ARFF viewer in Weka.
     • The viewer can also save files in other formats such as CSV; converters exist in both directions (ARFF to CSV and CSV to ARFF).
     • After conversion, such files can easily be imported into MySQL and other databases.
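Weka performs this conversion through its GUI and its converter classes (for example `weka.core.converters.CSVSaver` and `CSVLoader`). As a rough illustration of what ARFF-to-CSV involves, the plain-Java sketch below turns the attribute names into a CSV header row and copies each data row, stripping single quotes. The class name is hypothetical, and it assumes dense (non-sparse) data, attribute names without spaces, and no commas inside quoted values; Weka's own converters handle much more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Rough sketch of ARFF-to-CSV conversion for simple, dense ARFF files.
// Assumptions: attribute names contain no spaces, and quoted values contain no commas.
final class ArffToCsvSketch {

    static List<String> convert(List<String> arffLines) {
        List<String> names = new ArrayList<>();
        List<String> csv = new ArrayList<>();
        boolean inData = false;
        for (String raw : arffLines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue;                                  // blank lines and ARFF comments
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (!inData) {
                if (lower.startsWith("@attribute")) {
                    names.add(unquote(line.split("\\s+")[1]));  // second token is the name
                } else if (lower.startsWith("@data")) {
                    inData = true;
                    csv.add(String.join(",", names));      // CSV header row
                }
            } else {
                List<String> cells = new ArrayList<>();
                for (String value : line.split(",", -1)) {
                    cells.add(unquote(value.trim()));
                }
                csv.add(String.join(",", cells));
            }
        }
        return csv;
    }

    /** Removes one pair of surrounding single quotes, if present. */
    static String unquote(String s) {
        return s.length() >= 2 && s.startsWith("'") && s.endsWith("'")
                ? s.substring(1, s.length() - 1) : s;
    }
}
```

For instance, a data row beginning `'?','C','A',0,60` comes out as `?,C,A,0,60`.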

  15. Data is now ready for data mining!
