
Weka Just do it

Weka Just do it. Free and Open Source ML Suite Ian Witten & Eibe Frank University of Waikato New Zealand. Overview. Classifiers, Regressors, and clusterers Multiple evaluation schemes Bagging and Boosting Feature Selection: right features and data key to successful learning

gbrucker

Presentation Transcript


  1. Weka: Just do it. Free and Open Source ML Suite. Ian Witten & Eibe Frank, University of Waikato, New Zealand

  2. Overview
     • Classifiers, regressors, and clusterers
     • Multiple evaluation schemes
     • Bagging and boosting
     • Feature selection: the right features and data are key to successful learning
     • Experimenter
     • Visualizer
     • Text not up to date.
     • They welcome additions.

  3. Learning Tasks
     • Classification: given examples labelled from a finite domain, generate a procedure for labelling unseen examples.
     • Regression: given examples labelled with a real value, generate a procedure for labelling unseen examples.
     • Clustering: given a set of examples, partition them into “interesting” groups. What scientists want.

  4. Data Format: IRIS
     @RELATION iris
     @ATTRIBUTE sepallength REAL
     @ATTRIBUTE sepalwidth REAL
     @ATTRIBUTE petallength REAL
     @ATTRIBUTE petalwidth REAL
     @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
     @DATA
     5.1,3.5,1.4,0.2,Iris-setosa
     4.9,3.0,1.4,0.2,Iris-setosa
     4.7,3.2,1.3,0.2,Iris-setosa
     Etc.
     General form: @ATTRIBUTE attribute-name, followed by REAL or a list of values
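As a rough illustration of the format on this slide, the Python sketch below reads a minimal ARFF text into attribute declarations and data rows. It assumes dense, comma-separated data with only REAL and nominal `{...}` attributes and no quoting; the name `parse_arff` is hypothetical, and this is not Weka's own loader.

```python
def parse_arff(text):
    """Parse a minimal ARFF string (assumes dense rows, no quoting, REAL or nominal only)."""
    relation = None
    attributes = []   # list of (name, kind); kind is "REAL" or a list of nominal values
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blank lines and ARFF comments
            continue
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (name, kind), value in zip(attributes, values):
                row.append(float(value) if kind == "REAL" else value)
            rows.append(row)
            continue
        keyword = line.split(None, 1)[0].upper()
        if keyword == "@RELATION":
            relation = line.split(None, 1)[1]
        elif keyword == "@ATTRIBUTE":
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = "REAL"   # simplification: every non-nominal type is read as a number
            attributes.append((name, kind))
        elif keyword == "@DATA":
            in_data = True
    return relation, attributes, rows


IRIS_SAMPLE = """\
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
"""

relation, attributes, rows = parse_arff(IRIS_SAMPLE)
print(relation, len(attributes), len(rows))
```

The header lines name the relation and declare one attribute each; everything after `@DATA` is one example per line, with values in the same order as the declarations.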

  5. J48 = Decision Tree
     petalwidth <= 0.6: Iris-setosa (50.0)
     petalwidth > 0.6
     |   petalwidth <= 1.7
     |   |   petallength <= 4.9: Iris-versicolor (48.0/1.0)
     |   |   petallength > 4.9
     |   |   |   petalwidth <= 1.5: Iris-virginica (3.0)
     |   |   |   petalwidth > 1.5: Iris-versicolor (3.0/1.0)
     |   petalwidth > 1.7: Iris-virginica (46.0/1.0)
     Leaf annotations: the first number is the # of instances under the node; the number after the slash, if any, is the number wrong.
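One way to read the printed tree is as nested if-statements. The sketch below is a hand translation of the tree on this slide into Python; the function name `classify_iris` is hypothetical and the code is not produced by Weka.

```python
def classify_iris(petallength, petalwidth):
    """Hand translation of the J48 tree on the slide; each test mirrors one branch."""
    if petalwidth <= 0.6:
        return "Iris-setosa"
    # petalwidth > 0.6
    if petalwidth <= 1.7:
        if petallength <= 4.9:
            return "Iris-versicolor"
        # petallength > 4.9
        if petalwidth <= 1.5:
            return "Iris-virginica"
        return "Iris-versicolor"
    # petalwidth > 1.7
    return "Iris-virginica"


# First data row of the IRIS slide: petallength 1.4, petalwidth 0.2
print(classify_iris(1.4, 0.2))
```

Note that the tree uses only two of the four attributes; sepal length and width never appear in a test.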

  6. Cross-validation
     • Correctly Classified Instances: 143 (95.33%)
     • Incorrectly Classified Instances: 7 (4.67%)
     • Default is 10-fold cross-validation, i.e.
       • Split the data into 10 equal-sized pieces
       • Train on 9 pieces and test on the remaining one
       • Do this for all 10 choices of test piece and average
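The three steps of k-fold cross-validation can be sketched in a few lines of Python. This is a plain version that shuffles and splits by index without balancing classes across folds; the names `k_fold_indices` and `cross_validate` and the `train_and_score` callback are hypothetical and do not mirror Weka's API.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over n examples."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k pieces of near-equal size
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def cross_validate(n, train_and_score, k=10, seed=0):
    """Average train_and_score(train, test) over all k choices of test piece."""
    scores = [train_and_score(train, test)
              for train, test in k_fold_indices(n, k, seed)]
    return sum(scores) / len(scores)


# With the 150 iris examples, each fold tests on 15 and trains on 135.
splits = list(k_fold_indices(150, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

In practice `train_and_score` would fit a classifier on the `train` indices and return its accuracy on the `test` indices; every example is used for testing exactly once.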

  7. J48 Confusion Matrix
     Old data set from statistics: 50 of each class
       a   b   c   <-- classified as
      49   1   0 | a = Iris-setosa
       0  47   3 | b = Iris-versicolor
       0   3  47 | c = Iris-virginica

  8. Precision, Recall, and Accuracy
     • Precision: probability of being correct, given your decision.
       • Precision of Iris-setosa is 49/49 = 100%
       • Called positive predictive value in the medical literature (specificity is a different quantity, the true-negative rate)
     • Recall: probability of correctly identifying a member of the class.
       • Recall for Iris-setosa is 49/50 = 98%
       • Called sensitivity in the medical literature
     • Accuracy: # right / total = 143/150 ≈ 95%
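The figures on this slide follow directly from the J48 confusion matrix on the previous slide. The sketch below recomputes them in Python; the helper names are hypothetical, and rows are taken to be actual classes with columns as predicted classes, as in the "classified as" layout.

```python
LABELS = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

# Rows are actual classes, columns are predicted classes.
CONFUSION = [
    [49, 1, 0],
    [0, 47, 3],
    [0, 3, 47],
]


def precision(matrix, c):
    """Of everything predicted as class c (column c), the fraction that really is c."""
    predicted_as_c = sum(row[c] for row in matrix)
    return matrix[c][c] / predicted_as_c


def recall(matrix, c):
    """Of everything that really is class c (row c), the fraction predicted as c."""
    return matrix[c][c] / sum(matrix[c])


def accuracy(matrix):
    """Diagonal (correct) count divided by the total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


for c, label in enumerate(LABELS):
    print(f"{label}: precision={precision(CONFUSION, c):.3f} recall={recall(CONFUSION, c):.3f}")
print(f"accuracy={accuracy(CONFUSION):.3f}")
```

Precision reads down a column and recall reads across a row, which is why Iris-setosa can have perfect precision (nothing else was labelled setosa) but imperfect recall (one setosa was labelled versicolor).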

  9. Other Evaluation Schemes
     • Leave-one-out cross-validation
       • n-fold cross-validation, where n = number of training instances
     • Specific train and test set
       • Allows for exact replication
       • OK if the train/test sets are large, e.g. in the 10,000 range.

  10. Bootstrap sampling
     • Randomly select n examples with replacement from the n available
     • Expect about 2/3 of the distinct examples to be chosen for training
       • Prob. that a given example is not chosen = (1 - 1/n)^n ≈ 1/e ≈ 0.37
     • Test on the remainder
     • Repeat about 30 times and average.
     • Avoids partition bias
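The "about 2/3" claim can be checked both from the formula and by simulation. The Python sketch below does both; the names `prob_not_chosen` and `bootstrap_split` are hypothetical, and n = 150 is chosen only to match the iris data set.

```python
import math
import random


def prob_not_chosen(n):
    """Probability that one particular example is missed by all n draws."""
    return (1 - 1 / n) ** n


def bootstrap_split(n, rng):
    """Draw n training indices with replacement; test on the indices never drawn."""
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    test = [i for i in range(n) if i not in chosen]
    return train, test


rng = random.Random(0)
n = 150
fractions = []
for _ in range(30):                      # repeat about 30 times and average
    train, test = bootstrap_split(n, rng)
    fractions.append(len(set(train)) / n)
mean_fraction = sum(fractions) / len(fractions)

print(f"(1 - 1/n)^n for n={n}: {prob_not_chosen(n):.4f}   1/e: {1 / math.e:.4f}")
print(f"mean fraction of distinct examples in training: {mean_fraction:.3f}")
```

The training list always has n entries, but some examples appear more than once, so the number of distinct training examples is roughly n(1 - 1/e), and the rest form the test set.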
