W E K A: Waikato Environment for Knowledge Analysis
Goals of the workshop
• Acquisition of functional knowledge about the WEKA platform
• Ability to process (own) data in WEKA
The workshop follows the typical data-mining workflow: identifying a problem, transforming it into data, choosing an appropriate DM technique, applying it to the data, evaluating & interpreting the results, and writing the seminar work.
What is WEKA? Some basic facts about WEKA:
• WEKA (1) = a flightless bird with an inquisitive nature (found only on the islands of New Zealand)
• WEKA (2) = a software 'workbench' incorporating several standard ML/DM techniques
• Authors = Ian H. Witten, Eibe Frank (et al.)
• Programming language = Java
• Origin = The University of Waikato, New Zealand
• Literature = Ian H. Witten, Eibe Frank: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999
• Homepage = http://www.cs.waikato.ac.nz/~ml/weka
Objectives of WEKA
• make ML/DM techniques generally available
• apply them to practical problems (in agriculture)
• develop new ML/DM algorithms
• contribute to the theoretical framework of the field (ML/DM)
Versions of WEKA
There are several versions of WEKA:
• WEKA 3.0: "book version", compatible with the description in the data mining book
• WEKA 3.2: "GUI version", adds graphical user interfaces (the book version is command-line only)
• WEKA 3.4: "development version", with many improvements
This workshop is based on WEKA 3.4(.3).
The input to WEKA: the ARFF format ("flat" files)
Example: the play-tennis (weather) domain:

% this is an example of a knowledge
% domain in ARFF format
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
. . .

Conversion to the ARFF format? Example: converting from MS-EXCEL to ARFF (see the sketch below).
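One way to do the Excel-to-ARFF conversion is to export the spreadsheet as CSV and let WEKA's own converter classes do the rest. A minimal sketch, assuming weka.jar is on the CLASSPATH and the sheet has been saved as weather.csv with attribute names in the first row (both file names are placeholders):

import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

public class Csv2Arff {
    public static void main(String[] args) throws Exception {
        // read the CSV file exported from Excel (first row = attribute names)
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("weather.csv"));   // assumed file name
        Instances data = loader.getDataSet();

        // write the same data set back out in ARFF format
        ArffSaver saver = new ArffSaver();
        saver.setInstances(data);
        saver.setFile(new File("weather.arff"));     // assumed file name
        saver.writeBatch();
    }
}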
A quick tour of the "explorer": the Preprocess panel
[Screenshot of the Preprocess panel, with callouts for: the Filters panel, the Domain info panel, the Attribute info panel, the Attributes panel, the Attribute visualization panel, the Status bar, and the Log file]
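The filtering done in the Filters panel can also be done programmatically. A minimal sketch, assuming the weather data is stored in weather.arff and using WEKA's Remove attribute filter (the file name and the choice of filter are assumptions for illustration):

import java.io.FileReader;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class FilterExample {
    public static void main(String[] args) throws Exception {
        // load the data set (file name is a placeholder)
        Instances data = new Instances(new FileReader("weather.arff"));

        // configure a filter that drops the first attribute ("outlook")
        Remove remove = new Remove();
        remove.setAttributeIndices("1");
        remove.setInputFormat(data);        // must be called before useFilter()

        Instances filtered = Filter.useFilter(data, remove);
        System.out.println(filtered.toSummaryString());
    }
}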
A quick tour of the "explorer": the Classify panel
[Screenshot of the Classify panel, with callouts for: the Classifier panel, the Test options panel, the Class attribute selector, the Result panel, and the Output panel]
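What the Classify panel does with the "Cross-validation" test option can be reproduced in a few lines of Java. A minimal sketch, assuming the weather data is in weather.arff (file name is a placeholder) and using J48 with default parameters:

import java.io.FileReader;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class CrossValidateJ48 {
    public static void main(String[] args) throws Exception {
        // load the data and mark the last attribute ("play") as the class
        Instances data = new Instances(new FileReader("weather.arff"));
        data.setClassIndex(data.numAttributes() - 1);

        // 10-fold cross-validation of a (pruned) J48 decision tree
        J48 tree = new J48();
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));

        System.out.println(eval.toSummaryString());
    }
}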
A quick tour of the "explorer": the Visualize panel
The command line. Example: running J48 without arguments prints the usage message:

C:\Temp>java weka.classifiers.trees.J48
Weka exception: No training file and no object input file given.

General options:
-t <name of training file>   Sets training file.
-T <name of test file>   Sets test file. If missing, a cross-validation will be performed on the training data.
-c <class index>   Sets index of class attribute (default: last).
-x <number of folds>   Sets number of folds for cross-validation (default: 10).
-s <random number seed>   Sets random number seed for cross-validation (default: 1).
-m <name of file with cost matrix>   Sets file with cost matrix.
-l <name of input file>   Sets model input file.
-d <name of output file>   Sets model output file.
-v   Outputs no statistics for training data.
-o   Outputs statistics only, not the classifier.
-i   Outputs detailed information-retrieval statistics for each class.
-k   Outputs information-theoretic statistics.
-p   Only outputs predictions for test instances.
-r   Only outputs cumulative margin distribution.
-z <class name>   Only outputs the source representation of the classifier, giving it the supplied name.
-g   Only outputs the graph representation of the classifier.

Options specific to weka.classifiers.j48.J48:
-U   Use unpruned tree.
-C <pruning confidence>   Set confidence threshold for pruning. (default 0.25)
-M <minimum number of instances>   Set minimum number of instances per leaf. (default 2)
-R   Use reduced error pruning.
-N <number of folds>   Set number of folds for reduced error pruning. One fold is used as pruning set. (default 3)
-B   Use binary splits only.
-S   Don't perform subtree raising.
-L   Do not clean up after the tree has been built.
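For an actual run, using only the options listed above and assuming weka.jar is on the CLASSPATH (as in the slide) and the weather data has been saved as weather.arff, a 10-fold cross-validation of a pruned J48 tree could be started with:

C:\Temp>java weka.classifiers.trees.J48 -t weather.arff -x 10 -C 0.25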
GUI vs. command line
Command line (+):
• full functionality ('saving the model', batch processing)
Command line (-):
• only textual visualisation of models
• awkward to use
GUI (+):
• visualisation of data and (some) models
GUI (-):
• not all the parameters can be set (reduced functionality)
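To illustrate the 'saving the model' and batch-processing points with only the -t, -d, -l, -T and -p options from the command-line slide (all file names are assumptions), a model could be trained once and then reused on new data:

C:\Temp>java weka.classifiers.trees.J48 -t weather.arff -d j48.model
C:\Temp>java weka.classifiers.trees.J48 -l j48.model -T new.arff -p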
PROs & CONs of WEKA PROs: • open source (GNU licence) • platform-independent (JAVA) • easy to use • (relatively) easy to modify • CONs: • relatively slow (JAVA) • ‘incomplete’documentation • (some GUI features could • be explained better) • some features available • only from command line