Natural Language Processing Thursday, September 25th A Short Introduction to Weka
What is weka? • Java-based Machine Learning Tool • Implements numerous classifiers • 3 modes of operation • GUI • Command Line • Java API (not discussed here) • Google: ‘weka java’
weka Homepage • http://www.cs.waikato.ac.nz/ml/weka/ • To run: • java -Xmx1024M -jar ~cs4705/bin/weka.jar &
.arff file format • http://www.cs.waikato.ac.nz/~ml/weka/arff.html
    % 1. Title: Iris Plants Database
    %
    @RELATION iris
    @ATTRIBUTE sepallength NUMERIC
    @ATTRIBUTE sepalwidth NUMERIC
    @ATTRIBUTE petallength NUMERIC
    @ATTRIBUTE petalwidth NUMERIC
    @ATTRIBUTE class {Iris-setosa,Iris-versicolor, Iris-virginica}
    @DATA
    5.1,3.5,1.4,0.2,Iris-setosa
    4.9,3.0,1.4,0.2,Iris-setosa
    4.7,3.2,1.3,0.2,Iris-setosa
    …
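The header/data layout above is simple enough to read by hand. The following is a minimal Java sketch (not Weka's own parser) that reads the declarations and data rows of a file like the Iris example. It assumes dense rows, no quoted values, no missing values (`?`), and one declaration per line; the class name `ArffSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Minimal ARFF reader sketch (not Weka's parser). Assumptions: dense rows only,
 * no quoted values, no '?' missing values, one declaration per line.
 */
public class ArffSketch {
    public String relation;
    public final List<String> attributeNames = new ArrayList<>();
    public final List<String> attributeTypes = new ArrayList<>();
    public final List<String[]> rows = new ArrayList<>();

    public static ArffSketch parse(String text) {
        ArffSketch arff = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            // Skip blank lines and % comment lines
            if (line.isEmpty() || line.startsWith("%")) continue;
            // Declarations are case-insensitive (@RELATION == @relation)
            String lower = line.toLowerCase(Locale.ROOT);
            if (inData) {
                // Each data line holds one comma-separated value per attribute
                String[] values = line.split(",");
                for (int i = 0; i < values.length; i++) values[i] = values[i].trim();
                arff.rows.add(values);
            } else if (lower.startsWith("@relation")) {
                arff.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // "@attribute <name> <type>": first token is the name, the rest is the type
                String[] parts = line.substring("@attribute".length()).trim().split("\\s+", 2);
                arff.attributeNames.add(parts[0]);
                arff.attributeTypes.add(parts.length > 1 ? parts[1] : "");
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
        }
        return arff;
    }

    public static void main(String[] args) {
        String iris = "% 1. Title: Iris Plants Database\n"
                + "%\n"
                + "@RELATION iris\n"
                + "@ATTRIBUTE sepallength NUMERIC\n"
                + "@ATTRIBUTE sepalwidth NUMERIC\n"
                + "@ATTRIBUTE petallength NUMERIC\n"
                + "@ATTRIBUTE petalwidth NUMERIC\n"
                + "@ATTRIBUTE class {Iris-setosa,Iris-versicolor, Iris-virginica}\n"
                + "@DATA\n"
                + "5.1,3.5,1.4,0.2,Iris-setosa\n"
                + "4.9,3.0,1.4,0.2,Iris-setosa\n";
        ArffSketch arff = parse(iris);
        System.out.println(arff.relation + ": " + arff.attributeNames + ", " + arff.rows.size() + " rows");
    }
}
```

A real ARFF reader must also handle quoted names and values, `?` for missing values, and the sparse `{index value, ...}` row format; for homework data, let Weka do the parsing.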
.arff file format • Attribute declaration: @attribute <attrName> <type>, where <type> is numeric, nominal, string, or date • numeric: a number • nominal: a finite set of strings, declared by listing them in braces, e.g. {Iris-setosa,Iris-versicolor, Iris-virginica} • string: an arbitrary string • date: default format is ISO-8601, yyyy-MM-dd'T'HH:mm:ss
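ARFF date formats use the same pattern syntax as Java's `SimpleDateFormat`, so the default pattern can be checked directly in Java. This sketch parses and re-prints a value in the default pattern; interpreting times in UTC and rejecting partial matches are assumptions of the sketch, and the class name `ArffDateSketch` is hypothetical.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class ArffDateSketch {
    // Default ARFF date pattern, written in java.text.SimpleDateFormat syntax
    public static final String DEFAULT_PATTERN = "yyyy-MM-dd'T'HH:mm:ss";

    private static SimpleDateFormat newFormat() {
        SimpleDateFormat fmt = new SimpleDateFormat(DEFAULT_PATTERN, Locale.ROOT);
        fmt.setLenient(false);                        // reject impossible values such as month 13
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: UTC, for reproducibility
        return fmt;
    }

    /** Returns the parsed date, or null if the value does not match the whole pattern. */
    public static Date parseArffDate(String value) {
        ParsePosition pos = new ParsePosition(0);
        Date d = newFormat().parse(value, pos);
        // Also reject values with trailing characters after a valid prefix
        return (d != null && pos.getIndex() == value.length()) ? d : null;
    }

    public static String formatArffDate(Date d) {
        return newFormat().format(d);
    }

    public static void main(String[] args) {
        Date d = parseArffDate("2008-09-25T14:30:00");
        System.out.println(formatArffDate(d));                    // prints 2008-09-25T14:30:00
        System.out.println(parseArffDate("2008-09-25 14:30:00")); // prints null (space instead of 'T')
    }
}
```

Note the single quotes around `T`: they mark a literal character in the pattern, which is why a value with a space in that position is rejected.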
Example Arff Files • ~cs4705/bin/weka-3-4-11/data/ • iris.arff • soybean.arff • weather.arff
To Classify with weka GUI • Run weka GUI • Click 'Explorer' • 'Open file...' • Select 'Classify' tab • 'Choose' a classifier • Confirm options • Click 'Start' • Wait... • Right-click on the Result list entry, then choose 'Save result buffer' or 'Save model'
Classify • Some classifiers to start with: • NaiveBayes • JRip • J48 • SMO • Find references by selecting a classifier • Use cross-validation!
Analyzing Results • Important tools for Homework 2 • Accuracy • “Correctly classified instances” • F-measure • Confusion matrix • Save model • Visualization
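Accuracy and F-measure can both be read off the confusion matrix. A minimal Java sketch, assuming Weka's layout (rows are actual classes, columns are predicted classes, as in its "<-- classified as" header) and hypothetical counts; the class name `ConfusionMatrixMetrics` is also hypothetical.

```java
import java.util.Locale;

/**
 * Sketch of summary statistics computed from a confusion matrix laid out
 * with rows = actual class, columns = predicted class.
 */
public class ConfusionMatrixMetrics {
    /** Correctly classified instances (the diagonal) divided by all instances. */
    public static double accuracy(int[][] cm) {
        long correct = 0, total = 0;
        for (int i = 0; i < cm.length; i++) {
            for (int j = 0; j < cm[i].length; j++) {
                total += cm[i][j];
                if (i == j) correct += cm[i][j];
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    /** Of the instances predicted as class c (column c), the fraction that really are c. */
    public static double precision(int[][] cm, int c) {
        long predicted = 0;
        for (int[] row : cm) predicted += row[c];
        return predicted == 0 ? 0.0 : (double) cm[c][c] / predicted;
    }

    /** Of the instances that really are class c (row c), the fraction predicted as c. */
    public static double recall(int[][] cm, int c) {
        long actual = 0;
        for (int v : cm[c]) actual += v;
        return actual == 0 ? 0.0 : (double) cm[c][c] / actual;
    }

    /** F-measure: harmonic mean of precision and recall. */
    public static double fMeasure(int[][] cm, int c) {
        double p = precision(cm, c), r = recall(cm, c);
        return (p + r) == 0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Hypothetical counts: 50 a->a, 10 a->b, 5 b->a, 35 b->b
        int[][] cm = {{50, 10}, {5, 35}};
        System.out.printf(Locale.ROOT, "accuracy = %.3f%n", accuracy(cm));
        for (int c = 0; c < cm.length; c++) {
            System.out.printf(Locale.ROOT, "class %d: P=%.3f R=%.3f F=%.3f%n",
                    c, precision(cm, c), recall(cm, c), fMeasure(cm, c));
        }
    }
}
```

With these counts, accuracy is 85/100 = 0.85, while the F-measure differs per class (100/115 for the first, 70/85 for the second), which is why it is worth looking at per-class numbers and not only at accuracy.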
Running weka from the Command Line • Running an N-fold cross-validation experiment • java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -x N -i • -x N sets the number of folds; -i adds per-class statistics such as precision, recall, and F-measure • Using a predefined test set • java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -T testingdata.arff
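To make `-x N` concrete, here is a Java sketch of how N-fold cross-validation partitions the data: each instance lands in exactly one test fold, and each fold's model is trained on the remaining N-1 folds. This is an illustration, not Weka's implementation; it does not stratify folds by class, and the class name, seed, and sizes are arbitrary choices for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch of N-fold cross-validation splitting (no stratification by class). */
public class CrossValidationSketch {
    /** Assigns each of n instances to one of k folds, after a seeded shuffle. */
    public static int[] assignFolds(int n, int k, long seed) {
        if (k < 2 || k > n) throw new IllegalArgumentException("need 2 <= k <= n");
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) order.add(i);
        Collections.shuffle(order, new Random(seed));
        int[] fold = new int[n];
        // Deal shuffled instances round-robin, so fold sizes differ by at most one
        for (int pos = 0; pos < n; pos++) fold[order.get(pos)] = pos % k;
        return fold;
    }

    /** Instances held out for testing in fold f. */
    public static List<Integer> testIndices(int[] fold, int f) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < fold.length; i++) if (fold[i] == f) out.add(i);
        return out;
    }

    /** Instances used for training when fold f is held out. */
    public static List<Integer> trainIndices(int[] fold, int f) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < fold.length; i++) if (fold[i] != f) out.add(i);
        return out;
    }

    public static void main(String[] args) {
        int k = 3;
        int[] fold = assignFolds(10, k, 1L);
        for (int f = 0; f < k; f++) {
            System.out.println("fold " + f + ": train=" + trainIndices(fold, f)
                    + " test=" + testIndices(fold, f));
        }
    }
}
```

Because every instance is tested exactly once, the N per-fold results can be combined into a single estimate over the whole training file, without needing a separate test set.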
Saving the model • java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -d output.model • Classifying a test set • java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff • Getting help • java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -?
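The `-d` option writes the trained classifier to disk and `-l` reads it back, so a model can be trained once and reused on new test sets. As far as we know, Weka model files are serialized Java objects; the sketch below shows that kind of save/load round trip with a hypothetical toy `MajorityClassModel` standing in for a real classifier. It is not Weka's code or file format.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class ModelSaveSketch {
    /** Hypothetical toy classifier: always predicts the most frequent training label. */
    public static class MajorityClassModel implements Serializable {
        private static final long serialVersionUID = 1L;
        private String majority;

        public void train(String[] labels) {
            Map<String, Integer> counts = new HashMap<>();
            String best = null;
            for (String label : labels) {
                int c = counts.merge(label, 1, Integer::sum);
                if (best == null || c > counts.get(best)) best = label;
            }
            majority = best;
        }

        public String predict() { return majority; }
    }

    /** Writes the model as a serialized Java object (analogous to -d). */
    public static void save(Serializable model, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    /** Reads a serialized model back (analogous to -l). */
    public static Object load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        MajorityClassModel model = new MajorityClassModel();
        model.train(new String[] {"yes", "no", "yes"});
        File file = File.createTempFile("output", ".model");
        file.deleteOnExit();
        save(model, file);
        MajorityClassModel restored = (MajorityClassModel) load(file);
        System.out.println(restored.predict()); // prints yes
    }
}
```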
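Homework 2 Weka Workflow [Diagram • Preprocessing (you): Your Feature Extractor turns input files (T1 … TN, S1 … SN) into a .arff file and a Test .arff file • Experimentation (you): Weka produces results and a best model • Grading (us): the best model is run in Weka on the Test .arff to produce results]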
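In this workflow, the feature extractor's job is to turn raw input into .arff files that Weka can read. A minimal Java sketch; the relation name `hw2_sketch`, the feature names `numTokens` and `hasExclamation`, the `pos`/`neg` labels, and the `Doc` class are all hypothetical placeholders for whatever features and classes your task uses.

```java
import java.util.Arrays;
import java.util.List;

public class FeatureExtractorSketch {
    /** Hypothetical labeled input document. */
    public static class Doc {
        final String text;
        final String label;

        public Doc(String text, String label) {
            this.text = text;
            this.label = label;
        }
    }

    /** Renders documents as ARFF: a fixed header, then one comma-separated row per document. */
    public static String toArff(List<Doc> docs) {
        StringBuilder sb = new StringBuilder();
        sb.append("@RELATION hw2_sketch\n\n");
        sb.append("@ATTRIBUTE numTokens NUMERIC\n");       // hypothetical feature
        sb.append("@ATTRIBUTE hasExclamation {yes,no}\n"); // hypothetical feature
        sb.append("@ATTRIBUTE class {pos,neg}\n\n");       // hypothetical class labels
        sb.append("@DATA\n");
        for (Doc d : docs) {
            if (!d.label.equals("pos") && !d.label.equals("neg")) {
                throw new IllegalArgumentException("label not declared in header: " + d.label);
            }
            String t = d.text.trim();
            int numTokens = t.isEmpty() ? 0 : t.split("\\s+").length; // whitespace tokens
            String hasExclamation = d.text.contains("!") ? "yes" : "no";
            sb.append(numTokens).append(',').append(hasExclamation).append(',')
              .append(d.label).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc("What a great movie!", "pos"),
                new Doc("Too long and dull", "neg"));
        System.out.print(toArff(docs));
    }
}
```

Generate the training and test files with the same code so their headers (attribute names, order, and nominal values) match; Weka will not evaluate a model on a test set whose attributes differ from the training set.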
Tips for Homework Success • Start early • Read instructions carefully • Start simply • Your system should always work • 80/20 Rule • Add features incrementally • This way, you always have something you can turn in.