Weka Tutorial
WEKA:: Introduction • A collection of open-source ML algorithms for: • pre-processing • classification • clustering • association rules • Created by researchers at the University of Waikato in New Zealand • Java-based
WEKA:: Installation • Download the software from http://www.cs.waikato.ac.nz/ml/weka/ • If you are interested in modifying or extending WEKA, there is a developer version that includes the source code • Set the WEKA environment variables for Java (csh syntax): • setenv WEKAHOME /usr/local/weka/weka-3-6-1 • setenv CLASSPATH $WEKAHOME/weka.jar:$CLASSPATH • Download some ML data from http://mlearn.ics.uci.edu/MLRepository.html
WEKA:: Introduction (contd.) • Routines are implemented as classes and logically arranged in packages • Comes with an extensive GUI • WEKA routines can also be used standalone from the command line • E.g. java weka.classifiers.trees.J48 -t $WEKAHOME/data/iris.arff (in WEKA 3.6, the version installed above, J48 lives in the weka.classifiers.trees package)
WEKA:: Data format • Uses flat text files to describe the data • Can work with a wide variety of data files including its own “.arff” format and C4.5 file formats • Data can be imported from a file in various formats: • ARFF, CSV, C4.5, binary • Data can also be read from a URL or from an SQL database (using JDBC)
WEKA:: ARFF file format
(Slide callouts: numeric attribute, nominal attribute)
@relation anneal
@attribute carbon
@attribute hardness
@attribute 'enamelability' {'?','1','2','3','4','5'}
@attribute cholesterol numeric
@attribute shape { COIL, SHEET}
@attribute class {'1','2','3','4','5','U'}
@data
'?','C','A',0,60,'T','?','?',0,'?','?','G','?','?','?','?','M','?','?','?','?','?','?','?','?','?','?','?','?','?','?','COIL',2.801,385.1,0,'?','0','?','3'
'?','C','A',0,60,'T','?','?',0,'?','?','G','?','?','?','?','B','Y','?','?','?','Y','?','?','?','?','?','?','?','?','?','SHEET',0.801,255,269,'?','0','?','3'
'?','C','A',0,45,'?','S','?',0,'?','?','D','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','COIL',1.6,610,0,'?','0','?','3'
...
A more thorough description is available at http://www.cs.waikato.ac.nz/~ml/weka/arff.html
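To make the layout of an ARFF file concrete, here is a minimal sketch of a reader for the header and data sections, written in plain Python rather than with WEKA's own loader. The function name `parse_arff` and the small `SAMPLE` text (including its `thick` attribute) are hypothetical, chosen only for illustration, and the parser deliberately ignores details such as quoted commas and sparse rows.

```python
import re

def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows).

    Simplified sketch: does not handle commas inside quoted values,
    sparse data rows, or date/string attribute details.
    """
    relation = None
    attributes = []  # (name, "numeric") or (name, [nominal values])
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            # Values are kept as strings; '?' marks a missing value.
            rows.append([v.strip().strip("'") for v in line.split(",")])
            continue
        lower = line.lower()
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1].strip("'")
        elif lower.startswith("@attribute"):
            m = re.match(r"@attribute\s+('[^']*'|\S+)\s*(.*)", line, re.IGNORECASE)
            name, spec = m.group(1).strip("'"), m.group(2).strip()
            if spec.startswith("{"):
                # Nominal attribute: the allowed values are listed in braces.
                values = [v.strip().strip("'") for v in spec.strip("{}").split(",")]
                attributes.append((name, values))
            else:
                attributes.append((name, spec.lower()))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows

# Hypothetical three-attribute sample in the style of the anneal header.
SAMPLE = """@relation anneal
@attribute thick numeric
@attribute shape {COIL, SHEET}
@attribute class {'1','2','3','4','5','U'}
@data
2.801,COIL,'3'
0.801,SHEET,'3'
"""

relation, attributes, rows = parse_arff(SAMPLE)
```

Each `@attribute` line contributes one column, and every line after `@data` is one comma-separated record in that column order.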
WEKA:: Explorer: Preprocessing • Pre-processing tools in WEKA are called “filters” • WEKA contains filters for: • Discretization, normalization, resampling, attribute selection, transforming, combining attributes, etc
Annealing dataset: Description • The Annealing dataset comes from the UCI repository of datasets. It contains information about the material being annealed and its various properties. • The dataset has 38 attributes, of which 6 are continuous, 3 are integer-valued and the remaining 29 are nominal. • The dataset contains missing values and has 798 records in total, divided among 6 major classes. • The notion of classes will be explained later, during classification.
Data cleaning: Removing useless attributes • The number of attributes drops from 38 to 32
Data transformation: Discretizing the attributes • A bins setting of 15 means each attribute is split into 15 bins • An attribute range of first-last means all attributes
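As an illustration of what discretizing into 15 bins means, the sketch below assumes equal-width binning, which is one common strategy; the slide does not say which binning method was used, and this is not WEKA's implementation. The function name `equal_width_bins` is hypothetical.

```python
def equal_width_bins(values, bins=15):
    """Map each numeric value to a bin index in the range 0..bins-1.

    Assumes equal-width binning over [min(values), max(values)].
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls in a single bin.
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would land at index `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]

binned = equal_width_bins([0.0, 1.0, 7.5, 14.99, 15.0], bins=15)
```

After this step a numeric column becomes a nominal one whose values are bin labels, which is what allows nominal-only methods to be applied afterwards.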
Data reduction: Supervised attribute selection • Reduces the number of attributes from 32 to 10
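Supervised attribute selection scores attributes by how much they help predict the class. The slide does not name the evaluator that was used, so the sketch below assumes a simple information-gain ranking over nominal attributes and keeps the top k. All function names and the toy table are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(column, labels):
    """Reduction in class entropy after splitting on a nominal column."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def top_k_attributes(table, labels, k):
    """Return the names of the k attributes with the highest information gain."""
    gains = {name: information_gain(col, labels) for name, col in table.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]

# Toy data: "shape" determines the class, "noise" carries no information.
table = {
    "shape": ["COIL", "COIL", "SHEET", "SHEET"],
    "noise": ["a", "b", "a", "b"],
}
labels = ["1", "1", "2", "2"]
selected = top_k_attributes(table, labels, k=1)
```

With k set to 10 on a 32-attribute table, the same ranking idea would produce a reduction like the one described on the slide.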
Viewing and understanding the transformed data • This can be done using the ARFF viewer option in WEKA. • The viewer can also save files in other formats, such as CSV. Converters from ARFF to CSV and from CSV to ARFF are available as well. • After conversion, such files can easily be imported into MySQL and other databases.
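The ARFF-to-CSV conversion mentioned above can be sketched in a few lines of plain Python. This is an illustration of the idea, not WEKA's converter; the function name `arff_to_csv` and the `demo` input are hypothetical, and commas inside quoted values are not handled.

```python
import csv
import io

def arff_to_csv(arff_text):
    """Convert a simple ARFF document to CSV text with a header row."""
    names, rows, in_data = [], [], False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            # Missing values ('?') are passed through unchanged.
            rows.append([v.strip().strip("'") for v in line.split(",")])
        elif line.lower().startswith("@attribute"):
            # The attribute name is the second token; its type is not needed for CSV.
            names.append(line.split()[1].strip("'"))
        elif line.lower().startswith("@data"):
            in_data = True
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    writer.writerows(rows)
    return out.getvalue()

demo = "@relation demo\n@attribute shape {COIL, SHEET}\n@attribute class {'1','2'}\n@data\nCOIL,'1'\nSHEET,'2'\n"
csv_text = arff_to_csv(demo)
```

The resulting text has one header line with the attribute names, followed by one line per record, which is the shape most database import tools expect.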