CSC 8520: Artificial Intelligence
Weka Lab
Paula Matuszek
Spring, 2013
CSC 8520 Spring 2013. Paula Matuszek
Weka is
• Waikato Environment for Knowledge Analysis
• Machine Learning Software Suite from the University of Waikato
• Been under development for 20 years
• Well-developed, maintained, supported
• Open source
• Windows, Mac and Unix versions
• http://www.cs.waikato.ac.nz/ml/weka/index.html
• Lots of help available at the wiki:
    • http://weka.wikispaces.com/
ROC Curve
• {Receiver|Relative} Operating Characteristic Curve
• Name derives from signal detection theory
• Plots sensitivity (true positive rate) on the Y axis against 1 - specificity (false positive rate) on the X axis
• Ideal would be (0,1). Random guessing falls on the diagonal, e.g. (0.5, 0.5)
• Useful for
    • evaluating a classifier
    • comparing classifiers
    • setting cutoffs for class membership
ROC space illustration: http://en.wikipedia.org/wiki/File:ROC_space-2.png
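To make the ROC idea concrete, here is a minimal sketch in plain Python (not Weka code) that sweeps a cutoff over classifier scores and records one (false positive rate, true positive rate) point per cutoff. The function names `roc_points` and `auc` and the toy labels and scores are invented for illustration, and the sketch assumes both classes are present in the data.

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr), lowering the cutoff one score at a time.

    labels: 1 for the positive class, 0 for the negative class.
    scores: higher means "more likely positive".
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]  # cutoff above every score: nothing called positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        cutoff = ranked[i][0]
        # Instances with the same score cross the cutoff together.
        while i < len(ranked) and ranked[i][0] == cutoff:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data, made up for this example.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
print(points)
print(round(auc(points), 3))
```

Each point corresponds to one possible cutoff for class membership, so picking a cutoff amounts to picking a point on the curve. The closer the curve gets to (0,1), the larger the area under it.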
More Weka
• Last week: cross-validated decision tree.
• Go through section 4.2 of the tutorial.
• What data set did you use?
• Which classifier did better based on the confusion matrix?
• What about the ROC curve?
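When comparing two classifiers by their confusion matrices, it can help to reduce each matrix to a few rates. The following is a minimal sketch in plain Python, not Weka output. The helper names and the toy predictions are invented for illustration, and it assumes a binary problem with at least one instance of each class.

```python
def confusion_matrix(actual, predicted, positive):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def summarize(cm):
    """Derive accuracy, sensitivity and specificity from the counts."""
    total = sum(cm.values())
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total,
        "sensitivity": cm["tp"] / (cm["tp"] + cm["fn"]),
        "specificity": cm["tn"] / (cm["tn"] + cm["fp"]),
    }


# Made-up predictions from two hypothetical classifiers on the same 8 instances.
actual = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
pred_a = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
pred_b = ["yes", "yes", "no", "no", "no", "no", "no", "no"]

summary_a = summarize(confusion_matrix(actual, pred_a, positive="yes"))
summary_b = summarize(confusion_matrix(actual, pred_b, positive="yes"))
print(summary_a)
print(summary_b)
```

In this toy case both classifiers have the same accuracy but make different kinds of errors, which is why "which did better" depends on which error type matters more for your data.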
Trying a Support Vector Classifier
• SMO is a support vector classifier
• http://weka.sourceforge.net/doc/weka/classifiers/functions/SMO.html
• libSVM is a faster SVM implementation, but it is not installed with Weka; Weka includes only a wrapper for it.
Decision Tree vs SMO
• Repeat section 4.2, replacing the RandomForest classifier with SMO
• What were the results for your data source?
Moving on to the Weka Explorer
• Explore some of the data sets included with Weka.
• Restart Weka, using the Explorer instead of the KnowledgeFlow.
• Make sure the Preprocess step is highlighted
• Use the Open File option to look at some of the data sets
• Choose one which is binary and looks interesting
    • usually there is a feature just labeled class
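If you would rather scan a data set's header than click through the Explorer, the sketch below lists the nominal attributes in an ARFF header that have exactly two values. It is a rough illustration in plain Python, not a full ARFF parser: it assumes one `@attribute` declaration per line and does not handle commas inside quoted values. The function name `binary_attributes` and the sample header are made up for this example.

```python
import re

# Matches lines like: @attribute play {yes, no}
NOMINAL = re.compile(
    r"@attribute\s+('[^']*'|\"[^\"]*\"|\S+)\s+\{(.*)\}", re.IGNORECASE
)


def binary_attributes(arff_text):
    """Return {attribute name: [value, value]} for two-valued nominal attributes."""
    found = {}
    for line in arff_text.splitlines():
        line = line.strip()
        if line.lower().startswith("@data"):
            break  # the header ends where the data section starts
        match = NOMINAL.match(line)
        if match:
            name = match.group(1).strip("'\"")
            values = [v.strip().strip("'\"") for v in match.group(2).split(",")]
            if len(values) == 2:
                found[name] = values
    return found


# Illustrative header only; real files may differ.
sample = """@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,FALSE,no
"""

print(binary_attributes(sample))
```

Any attribute it reports is a candidate binary class; which one is actually the class is still your call.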
Exploring with Weka
• We will go through a different tutorial, which uses the Explorer interface
• The tutorial is at http://www.ibm.com/developerworks/opensource/library/os-weka2/index.html
• It uses data which can be downloaded from the Download section, about 2/3 of the way down the page.
Decision Tree Again
• The first part of the tutorial creates a decision tree using J48, as in the KnowledgeFlow tutorial.
• This should give exactly the same results as the KnowledgeFlow approach; it’s just a different interface.
• Which did you find easier?
• Try it on the data set you chose earlier. How well did it do?
Clustering
• The second part of the tutorial uses the SimpleKMeans clustering algorithm.
• Try it on the sample data they provide.
• Do the results for their data make sense?
• Set the number of clusters to 2 and try it on the data set you chose.
• Do the results make sense?
• Do the two clusters match the two classes in your data?
• Try it again removing the “class” feature. Do you still get reasonable results?
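The question of whether two clusters match two classes can be sketched in a few lines of plain Python. This is a bare-bones k-means for illustration only, not Weka's SimpleKMeans, which differs in details such as how the initial centroids are chosen. Here the first k points are used as starting centroids (an assumption made to keep the example deterministic), the class labels are kept out of the clustering, and the toy points are invented.

```python
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Minimal k-means: returns (cluster index per point, centroids)."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points
    assignment = None
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the clustering has converged
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Toy data: features only; the class labels are used just for the comparison.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.8, 1.1), (4.8, 5.1)]
classes = ["a", "b", "a", "b", "a", "b"]

assignment, centroids = kmeans(points, k=2)
table = Counter(zip(assignment, classes))  # (cluster, class) -> count
print(assignment)
print(dict(table))
```

If each cluster lines up with a single class, the cross-tabulation has only one nonzero entry per cluster. Mixed rows mean the clusters cut across the classes.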
Explore!
• Go ahead and try some of the other capabilities in Weka.