CS4705 – Natural Language Processing
Thursday, September 28
Introduction to Weka
What is Weka?
• Java-based machine learning tool
• 3 modes of operation
  • GUI
  • Command line
  • API (not discussed here)
• To run:
  • java -Xmx1024M -jar ~cs4705/bin/weka.jar &
Weka homepage
• http://www.cs.waikato.ac.nz/ml/weka/
.arff file format
• http://www.cs.waikato.ac.nz/~ml/weka/arff.html
    @relation name
    @attribute attrName {numeric, string, <nominal>, date}
    ...
    @data
    a,b,c,d,e
• <nominal> := {class1,class2,...,classN}
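To make the layout above concrete, here is a minimal Python sketch that reads the header of a small ARFF file. The `weather` relation, its attributes, and the function name `parse_arff_header` are made up for illustration; the sketch ignores quoted names and other details of the full format.

```python
def parse_arff_header(text):
    """Return (relation_name, [(attr_name, attr_type), ...]) from ARFF text."""
    relation = None
    attributes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # '%' starts a comment line in ARFF
            continue
        lower = line.lower()
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            # Assumes attribute names without spaces (no quoting support).
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif lower.startswith("@data"):
            break  # everything after @data is instances, not header
    return relation, attributes


# Hypothetical toy dataset: one numeric and two nominal attributes.
SAMPLE = """\
@relation weather
@attribute temperature numeric
@attribute outlook {sunny,overcast,rainy}
@attribute play {yes,no}
@data
85,sunny,no
64,overcast,yes
"""

relation, attributes = parse_arff_header(SAMPLE)
print(relation)
for name, attr_type in attributes:
    print(name, attr_type)
```

Each data row lists one value per declared attribute, in declaration order.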
Example ARFF files
• http://sourceforge.net/projects/weka
  • iris.arff
  • cmc.arff
To classify with the Weka GUI
• Run the Weka GUI
• Click 'Explorer'
• 'Open file...'
• Select the 'Classify' tab
• 'Choose' a classifier
• Confirm options
• Click 'Start'
• Wait...
• Right-click on the Result list entry
  • 'Save result buffer'
  • 'Save model'
Classify
• Some classifiers to start with:
  • NaiveBayes
  • JRip
  • J48
  • SMO
• Find references by selecting a classifier
• Use cross-validation!
Analyzing Results
• Important tools for Homework 2
  • Accuracy
    • “Correctly classified instances”
  • Confusion matrix
  • Save model
  • Visualization
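As a rough illustration of the two headline numbers above, the following Python sketch computes accuracy and a confusion matrix from paired lists of actual and predicted labels. The label lists are invented; rows are the actual class and columns the predicted class, which matches how Weka lays out its matrix.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of instances whose predicted label equals the actual label."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def confusion_matrix(actual, predicted):
    """Return (labels, matrix) where matrix[i][j] counts actual=i, predicted=j."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


# Hypothetical labels for five test instances.
actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "no", "yes"]

labels, matrix = confusion_matrix(actual, predicted)
print("accuracy:", accuracy(actual, predicted))
print("labels:", labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

The off-diagonal cells show which classes are being confused with which, something the single accuracy figure hides.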
Running Weka from the Command Line
• Running an N-fold cross-validation experiment
  • java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -x N
• Using a predefined test set
  • java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -T testingdata.arff
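For intuition about what an N-fold experiment does, here is a small Python sketch of the train/test partitioning only. It is not Weka's implementation: Weka typically also shuffles the data and, for nominal classes, stratifies the folds, both of which are omitted here. The function name `k_fold_indices` is made up.

```python
def k_fold_indices(n_instances, n_folds):
    """Yield (train_indices, test_indices) once per fold.

    Instance i is assigned to fold i % n_folds, so every instance is
    held out exactly once. No shuffling or stratification is done.
    """
    folds = [list(range(i, n_instances, n_folds)) for i in range(n_folds)]
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# 10 instances, 5 folds: each fold trains on 8 and tests on 2.
splits = list(k_fold_indices(10, 5))
for train, test in splits:
    print("train:", sorted(train), "test:", test)
```

The reported cross-validation accuracy is then aggregated over the held-out predictions from all N rounds.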
Saving the model
  • java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -d output.model
Classifying a test set
  • java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff
Analyzing results
• Get predictions from test data
  • java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff -p range
• Then DIY with scripts
  • awk and sed will be your friends
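The "DIY with scripts" step could look something like the Python sketch below (awk would do just as well). The exact column layout of the `-p` output differs between Weka versions, so the sample text and the column positions here are assumptions, not real Weka output; check your own output and adjust `actual_col` and `predicted_col`.

```python
def read_predictions(lines, actual_col, predicted_col):
    """Extract (actual, predicted) pairs from whitespace-separated lines.

    Lines that do not start with an instance number (headers, blanks)
    or that have too few columns are skipped.
    """
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) <= max(actual_col, predicted_col):
            continue
        if not fields[0].isdigit():
            continue  # header or other non-data line
        pairs.append((fields[actual_col], fields[predicted_col]))
    return pairs


# Hypothetical prediction listing: instance number, actual, predicted.
SAMPLE_OUTPUT = """\
inst# actual predicted
1 yes yes
2 no yes
3 no no
"""

pairs = read_predictions(SAMPLE_OUTPUT.splitlines(), actual_col=1, predicted_col=2)
errors = [(a, p) for a, p in pairs if a != p]
print("instances:", len(pairs))
print("misclassified:", errors)
```

In practice the lines would come from a file holding the saved command output rather than from an inline string.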
Getting predictions from cross-validation
• “Output Predictions” doesn't cut it.
• export CLASSPATH=~cs4705/bin/:~cs4705/bin/weka.jar
• java callClassifier weka.classifiers.bayes.NaiveBayes -t trainingdata.arff