A Short Introduction to Weka

Presentation Transcript


  1. Natural Language Processing, Thursday, September 25th
     A Short Introduction to Weka

  2. What is weka?
     • Java-based machine learning tool
     • Implements numerous classifiers
     • 3 modes of operation:
         • GUI
         • Command line
         • Java API (not discussed here)
     • Google: ‘weka java’

  3. weka Homepage
     • http://www.cs.waikato.ac.nz/ml/weka/
     • To run:
         java -Xmx1024M -jar ~cs4705/bin/weka.jar &
       (-Xmx1024M lets the Java VM use up to 1024 MB of heap; the trailing & runs weka in the background)

  4. .arff file format
     • http://www.cs.waikato.ac.nz/~ml/weka/arff.html

         % 1. Title: Iris Plants Database
         %
         @RELATION iris

         @ATTRIBUTE sepallength NUMERIC
         @ATTRIBUTE sepalwidth NUMERIC
         @ATTRIBUTE petallength NUMERIC
         @ATTRIBUTE petalwidth NUMERIC
         @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

         @DATA
         5.1,3.5,1.4,0.2,Iris-setosa
         4.9,3.0,1.4,0.2,Iris-setosa
         4.7,3.2,1.3,0.2,Iris-setosa
         …
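The next example makes the layout concrete. It is a minimal sketch of how a program might read a dense .arff file like the iris example. It is not Weka's own reader. The class name `ArffSketch` is hypothetical, and the sketch assumes no quoted values, no sparse rows, and no commas inside values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical, minimal ARFF reader: dense rows only, no quoting, no sparse format. */
public class ArffSketch {
    public String relation;
    public final List<String> attributeNames = new ArrayList<>();
    public final List<String[]> rows = new ArrayList<>();

    public static ArffSketch parse(List<String> lines) {
        ArffSketch arff = new ArffSketch();
        boolean inData = false;
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and '%' comment lines.
            if (line.isEmpty() || line.startsWith("%")) continue;
            if (inData) {
                // Each data row lists one value per attribute, in declaration order.
                String[] values = line.split(",");
                for (int i = 0; i < values.length; i++) values[i] = values[i].trim();
                arff.rows.add(values);
                continue;
            }
            // ARFF keywords are case-insensitive (@RELATION, @relation, ...).
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("@relation")) {
                arff.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // The attribute name is the first token after the keyword.
                String rest = line.substring("@attribute".length()).trim();
                arff.attributeNames.add(rest.split("\\s+")[0]);
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
        }
        return arff;
    }

    public static void main(String[] args) {
        List<String> iris = List.of(
            "% 1. Title: Iris Plants Database",
            "%",
            "@RELATION iris",
            "@ATTRIBUTE sepallength NUMERIC",
            "@ATTRIBUTE sepalwidth NUMERIC",
            "@ATTRIBUTE petallength NUMERIC",
            "@ATTRIBUTE petalwidth NUMERIC",
            "@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}",
            "@DATA",
            "5.1,3.5,1.4,0.2,Iris-setosa",
            "4.9,3.0,1.4,0.2,Iris-setosa");
        ArffSketch arff = parse(iris);
        System.out.println(arff.relation + " " + arff.attributeNames.size()
            + " attributes, " + arff.rows.size() + " rows");
    }
}
```

Running `main` on the first two iris rows prints `iris 5 attributes, 2 rows`.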

  5. .arff file format
     • @attribute <attrName> <type>, where <type> is numeric, string, date, or a nominal specification:
         • numeric: a number
         • nominal: a finite set of strings, e.g. {Iris-setosa,Iris-versicolor,Iris-virginica}
         • string: arbitrary strings
         • date: a date and time (default ISO-8601 format yyyy-MM-dd'T'HH:mm:ss)
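The type part of an `@attribute` declaration alone identifies which of the four types it is. The sketch below uses a hypothetical class, `AttributeTypeSketch`. It classifies a type specification and parses a value in the default date format. It uses `java.time` as a stand-in, not whatever Weka uses internally. It also accepts `real` and `integer`, which ARFF treats as numeric.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper that classifies the <type> part of an @attribute line. */
public class AttributeTypeSketch {
    enum Kind { NUMERIC, NOMINAL, STRING, DATE }

    // Default ARFF date format (ISO-8601); 'T' is a quoted literal.
    static final DateTimeFormatter DEFAULT_DATE =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss");

    /** Returns the kind of a type spec such as "NUMERIC", "{a,b,c}", "string" or "date". */
    static Kind kindOf(String typeSpec) {
        String t = typeSpec.trim();
        // A nominal type is written as the brace-enclosed list of its values.
        if (t.startsWith("{")) return Kind.NOMINAL;
        String lower = t.toLowerCase(Locale.ROOT);
        if (lower.equals("numeric") || lower.equals("real") || lower.equals("integer")) {
            return Kind.NUMERIC;
        }
        if (lower.equals("string")) return Kind.STRING;
        // A date may be followed by an explicit format string.
        if (lower.startsWith("date")) return Kind.DATE;
        throw new IllegalArgumentException("Unknown attribute type: " + typeSpec);
    }

    /** Splits a nominal spec like "{Iris-setosa, Iris-versicolor}" into its values. */
    static List<String> nominalValues(String typeSpec) {
        String t = typeSpec.trim();
        String inner = t.substring(1, t.length() - 1); // drop the braces
        List<String> values = new ArrayList<>();
        for (String v : inner.split(",")) values.add(v.trim());
        return values;
    }

    public static void main(String[] args) {
        System.out.println(kindOf("NUMERIC"));
        System.out.println(nominalValues("{Iris-setosa,Iris-versicolor, Iris-virginica}"));
        System.out.println(LocalDateTime.parse("2024-01-31T09:15:00", DEFAULT_DATE));
    }
}
```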

  6. Example Arff Files
     • ~cs4705/bin/weka-3-4-11/data/
         • iris.arff
         • soybean.arff
         • weather.arff

  7. To Classify with the weka GUI
     • Run the weka GUI
     • Click 'Explorer'
     • 'Open file...'
     • Select the 'Classify' tab
     • 'Choose' a classifier
     • Confirm options
     • Click 'Start'
     • Wait...
     • Right-click on the Result list entry:
         • 'Save result buffer'
         • 'Save model'

  8. Classify
     • Some classifiers to start with:
         • NaiveBayes
         • JRip
         • J48
         • SMO
     • Find references by selecting a classifier
     • Use cross-validation!
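Cross-validation splits the data into N folds. Each fold serves once as the test set while the remaining folds train the classifier. The sketch below shows only the index bookkeeping, using a hypothetical class, `KFoldSketch`. Weka's built-in cross-validation also randomizes the data and, for a nominal class, stratifies the folds. This sketch omits both for clarity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical k-fold split over instance indices 0..n-1 (no shuffling, no stratification). */
public class KFoldSketch {
    /** Returns, for each fold, the indices held out as that fold's test set. */
    static List<List<Integer>> testFolds(int n, int k) {
        if (k < 2 || k > n) throw new IllegalArgumentException("need 2 <= k <= n");
        List<List<Integer>> folds = new ArrayList<>();
        int start = 0;
        for (int f = 0; f < k; f++) {
            // Spread the remainder n % k over the first folds so sizes differ by at most 1.
            int size = n / k + (f < n % k ? 1 : 0);
            List<Integer> fold = new ArrayList<>();
            for (int i = start; i < start + size; i++) fold.add(i);
            folds.add(fold);
            start += size;
        }
        return folds;
    }

    /** Training indices for one fold: every index not in that fold's test set. */
    static List<Integer> trainIndices(int n, List<Integer> testFold) {
        List<Integer> train = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (!testFold.contains(i)) train.add(i);
        }
        return train;
    }

    public static void main(String[] args) {
        int n = 10, k = 3;
        for (List<Integer> test : testFolds(n, k)) {
            System.out.println("test=" + test + " train=" + trainIndices(n, test));
        }
    }
}
```

Every instance lands in exactly one test fold. The reported accuracy is therefore based on predictions for all of the data, each made by a model that never saw that instance during training.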

  9. Analyzing Results
     • Important tools for Homework 2:
         • Accuracy ("Correctly Classified Instances")
         • F-measure
         • Confusion matrix
         • Save model
         • Visualization
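Weka computes these figures for you. The sketch below only shows how accuracy, precision, recall, and F-measure follow from the confusion matrix. It assumes Weka's layout, where rows are the actual classes and columns are what each instance was "classified as". It uses a made-up 2×2 matrix, and the class name `MetricsSketch` is hypothetical.

```java
/** Hypothetical metrics from a confusion matrix: rows = actual class, columns = predicted class. */
public class MetricsSketch {
    /** Fraction of instances on the diagonal ("Correctly Classified Instances"). */
    static double accuracy(int[][] m) {
        int correct = 0, total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m.length; j++) {
                total += m[i][j];
                if (i == j) correct += m[i][j];
            }
        }
        return (double) correct / total;
    }

    /** Of the instances classified as c, the fraction that really are c (column sum). */
    static double precision(int[][] m, int c) {
        int predicted = 0;
        for (int[] row : m) predicted += row[c];
        return predicted == 0 ? 0.0 : (double) m[c][c] / predicted;
    }

    /** Of the instances that really are c, the fraction classified as c (row sum). */
    static double recall(int[][] m, int c) {
        int actual = 0;
        for (int v : m[c]) actual += v;
        return actual == 0 ? 0.0 : (double) m[c][c] / actual;
    }

    /** Harmonic mean of precision and recall for class c. */
    static double fMeasure(int[][] m, int c) {
        double p = precision(m, c), r = recall(m, c);
        return p + r == 0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Made-up confusion matrix for two classes.
        int[][] m = { {8, 2}, {1, 9} };
        System.out.printf("accuracy = %.3f%n", accuracy(m));
        for (int c = 0; c < m.length; c++) {
            System.out.printf("class %d: P = %.3f, R = %.3f, F = %.3f%n",
                c, precision(m, c), recall(m, c), fMeasure(m, c));
        }
    }
}
```

With this made-up matrix, accuracy is 17/20 = 0.85. Class 0 has precision 8/9, recall 8/10, and F-measure 16/19 ≈ 0.84.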

  10. Running weka from the Command Line
     • Running an N-fold cross-validation experiment:
         java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -x N -i
     • Using a predefined test set:
         java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -T testingdata.arff
     (-t names the training file, -T the test file, and -x the number of folds; -i adds per-class precision, recall, and F-measure)

  11. Saving the model
     • java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -d output.model
  Classifying a test set
     • java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff
  Getting help
     • java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -?
     (-d writes the trained model to a file; -l loads a previously saved model)
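A program can launch the command lines from slides 10 and 11 itself, which is useful when you run many experiments. The sketch below assembles them as argument lists for `java.lang.ProcessBuilder`. It uses only the flags shown on the slides. The class `WekaCommands` and the jar path are hypothetical. `ProcessBuilder` does not go through a shell, so `~cs4705` would not be expanded and you need to pass a full path.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds the weka command lines from the slides as argument lists. */
public class WekaCommands {
    // java -cp <jar> <classifier> <options...>
    static List<String> command(String jar, String classifier, String... options) {
        List<String> cmd = new ArrayList<>(Arrays.asList("java", "-cp", jar, classifier));
        cmd.addAll(Arrays.asList(options));
        return cmd;
    }

    // N-fold cross-validation on the training file, with per-class statistics (-i).
    static List<String> crossValidate(String jar, String classifier, String train, int folds) {
        return command(jar, classifier, "-t", train, "-x", Integer.toString(folds), "-i");
    }

    // Train on one file, evaluate on a predefined test file.
    static List<String> evaluateOnTestSet(String jar, String classifier, String train, String test) {
        return command(jar, classifier, "-t", train, "-T", test);
    }

    // Train and save the model to a file (-d).
    static List<String> trainAndSave(String jar, String classifier, String train, String model) {
        return command(jar, classifier, "-t", train, "-d", model);
    }

    // Load a saved model (-l) and classify a test set.
    static List<String> classifyWithModel(String jar, String classifier, String model, String test) {
        return command(jar, classifier, "-l", model, "-T", test);
    }

    // Runs a command and waits; weka's output goes to this program's console.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        return new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }

    public static void main(String[] args) {
        String jar = "/path/to/weka.jar"; // hypothetical absolute path
        String nb = "weka.classifiers.bayes.NaiveBayes";
        System.out.println(String.join(" ", crossValidate(jar, nb, "trainingdata.arff", 10)));
        System.out.println(String.join(" ", trainAndSave(jar, nb, "trainingdata.arff", "output.model")));
        System.out.println(String.join(" ", classifyWithModel(jar, nb, "input.model", "testingdata.arff")));
    }
}
```

`main` only prints the commands. Pass a list to `run` once the jar path points at a real weka.jar.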

  12. Homework 2 Weka Workflow
     [Diagram: training files T1 … TN and test files S1 … SN pass through your feature extractor into .arff files (training .arff, test .arff); Weka produces results and a best model. Stages: Preprocessing (you), Experimentation (you), Grading (us).]
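In this workflow, your feature extractor's job ends with writing an .arff file. The sketch below shows one way to turn extracted numeric features and class labels into ARFF text. The class `ArffWriterSketch`, its method, and the feature names are hypothetical. It assumes feature names and labels contain no spaces or commas, since otherwise they would need quoting. The class attribute goes last, which is where Weka's command-line tools look for it by default.

```java
import java.util.List;

/** Hypothetical writer: numeric features plus a nominal class label -> ARFF text. */
public class ArffWriterSketch {
    static String toArff(String relation, List<String> featureNames, List<String> classValues,
                         List<double[]> features, List<String> labels) {
        if (features.size() != labels.size()) {
            throw new IllegalArgumentException("need exactly one label per instance");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("@RELATION ").append(relation).append("\n\n");
        // One NUMERIC attribute per extracted feature, in a fixed order.
        for (String name : featureNames) {
            sb.append("@ATTRIBUTE ").append(name).append(" NUMERIC\n");
        }
        // The class is declared last, as a nominal attribute listing every possible label.
        sb.append("@ATTRIBUTE class {").append(String.join(",", classValues)).append("}\n\n");
        sb.append("@DATA\n");
        for (int i = 0; i < features.size(); i++) {
            double[] row = features.get(i);
            if (row.length != featureNames.size()) {
                throw new IllegalArgumentException("instance " + i + " has the wrong number of features");
            }
            for (double v : row) {
                // Double.toString is locale-independent, so the decimal point is always '.'.
                sb.append(Double.toString(v)).append(',');
            }
            sb.append(labels.get(i)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up features for two instances; a real extractor would compute these.
        String arff = toArff("demo", List.of("length", "capitals"), List.of("pos", "neg"),
            List.of(new double[] {3.0, 1.0}, new double[] {5.5, 0.0}), List.of("pos", "neg"));
        System.out.print(arff);
    }
}
```

Writing the training and test files with the same helper keeps their attribute declarations identical. A saved model can only be applied to test data whose header matches the training data.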

  13. Tips for Homework Success
     • Start early
     • Read instructions carefully
     • Start simply
     • Your system should always work
     • 80/20 Rule
     • Add features incrementally
         • This way, you always have something you can turn in.
