Data Mining
CSCI 307, Spring 2019
Lecture 7
Output: Trees
WEKA intro
Can Use Trees for Numeric Prediction Too
• Regression: the process of computing an expression that predicts a numeric quantity
• Regression tree: a “decision tree” where each leaf predicts a numeric quantity
  • The predicted value is the average value of the training instances that reach the leaf
• Model tree: a “regression tree” with linear regression models at the leaf nodes
  • Linear patches approximate a continuous function
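The difference between the two leaf types can be sketched in a few lines of Python. This is a minimal illustration, not WEKA's implementation: it assumes one numeric attribute `x`, a single hand-picked split threshold, made-up training data, and at least one training instance per leaf; the helper names (`regression_tree`, `model_tree`, `fit_line`, `split`) are hypothetical.

```python
# Minimal sketch (hypothetical helpers, toy data): one split on a single
# numeric attribute x, comparing regression-tree and model-tree leaves.


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # flat line if all x values are equal
    return my - b * mx, b


def split(data, threshold):
    """Send each (x, y) pair to the left leaf (x <= threshold) or the right leaf."""
    left = [(x, y) for x, y in data if x <= threshold]
    right = [(x, y) for x, y in data if x > threshold]
    return {"left": left, "right": right}


def regression_tree(data, threshold):
    """Each leaf predicts the average target of the training instances reaching it."""
    leaves = split(data, threshold)
    leaf_value = {side: mean([y for _, y in part]) for side, part in leaves.items()}
    return lambda x: leaf_value["left" if x <= threshold else "right"]


def model_tree(data, threshold):
    """Each leaf holds a linear model fitted to the training instances reaching it."""
    leaves = split(data, threshold)
    # zip(*part) turns [(x, y), ...] into (xs, ys) for the least-squares fit.
    leaf_model = {side: fit_line(*zip(*part)) for side, part in leaves.items()}

    def predict(x):
        a, b = leaf_model["left" if x <= threshold else "right"]
        return a + b * x

    return predict


# Toy data: y rises with x below 5 and falls with x above 5.
data = [(1, 2), (2, 4), (3, 6), (4, 8), (6, 24), (7, 23), (8, 22), (9, 21)]
reg = regression_tree(data, threshold=5)
mod = model_tree(data, threshold=5)
print(reg(2), reg(7))  # 5.0 22.5  (leaf averages)
print(mod(2), mod(7))  # 4.0 23.0  (per-leaf linear fits)
```

Both trees send an instance to the same leaf; they differ only in what the leaf returns. The regression tree returns a constant (the leaf average), while the model tree evaluates a per-leaf linear fit, so its prediction can follow the slope inside each region.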
Linear Regression for the CPU Data
PRP = -56.1 + 0.049 MYCT + 0.015 MMIN + 0.006 MMAX + 0.630 CACH - 0.270 CHMIN + 1.46 CHMAX
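Applying this equation to one machine is just the intercept plus a weighted sum of its attribute values. The sketch below assumes the attributes are passed as a Python dict keyed by the names used in the equation; the function name `predict_prp` and the example attribute values are illustrative assumptions, not part of the lecture.

```python
# Coefficients transcribed from the CPU-data regression equation above.
INTERCEPT = -56.1
WEIGHTS = {
    "MYCT": 0.049,
    "MMIN": 0.015,
    "MMAX": 0.006,
    "CACH": 0.630,
    "CHMIN": -0.270,
    "CHMAX": 1.46,
}


def predict_prp(instance):
    """Linear-regression estimate of PRP: intercept plus weighted attribute sum."""
    return INTERCEPT + sum(w * instance[name] for name, w in WEIGHTS.items())


# Illustrative attribute values (assumed for this example).
example = {"MYCT": 125, "MMIN": 256, "MMAX": 6000,
           "CACH": 256, "CHMIN": 16, "CHMAX": 128}
print(round(predict_prp(example), 1))  # 333.7
```

The same weights apply to every instance, so this is a single global linear function of the six attributes, in contrast to a model tree, which fits a separate linear model in each leaf.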
Model Tree for the CPU Data
LM1: PRP = 8.29 + 0.004 MMAX + 2.77 CHMIN
LM2: PRP = 20.3 + 0.004 MMIN - 3.99 CHMIN + 0.946 CHMAX
LM3: PRP = 38.1 + 0.012 MMIN
LM4: PRP = 19.5 + 0.002 MMAX + 0.698 CACH + 0.969 CHMAX
LM5: PRP = 285 - 1.46 MYCT + 1.02 CACH - 9.39 CHMIN
LM6: PRP = -65.8 + 0.003 MMIN - 2.94 CHMIN + 4.98 CHMAX
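A model tree predicts in two steps: route the instance to a leaf, then evaluate that leaf's linear model. The split tests that decide which leaf an instance reaches are not given in this text, so the sketch below (hypothetical names `LEAF_MODELS` and `predict_at_leaf`) takes the leaf label as an input and shows only the second step, using the LM1 to LM6 coefficients above. The example attribute values and the choice of LM4 are assumptions for illustration.

```python
# Leaf models LM1..LM6 transcribed from the model tree above:
# each entry is (intercept, {attribute: coefficient}).
LEAF_MODELS = {
    "LM1": (8.29, {"MMAX": 0.004, "CHMIN": 2.77}),
    "LM2": (20.3, {"MMIN": 0.004, "CHMIN": -3.99, "CHMAX": 0.946}),
    "LM3": (38.1, {"MMIN": 0.012}),
    "LM4": (19.5, {"MMAX": 0.002, "CACH": 0.698, "CHMAX": 0.969}),
    "LM5": (285.0, {"MYCT": -1.46, "CACH": 1.02, "CHMIN": -9.39}),
    "LM6": (-65.8, {"MMIN": 0.003, "CHMIN": -2.94, "CHMAX": 4.98}),
}


def predict_at_leaf(leaf, instance):
    """Evaluate the linear model stored at the given leaf.

    The leaf label is supplied by the caller; routing an instance through
    the tree's split tests is not modelled here (assumed already done).
    """
    intercept, weights = LEAF_MODELS[leaf]
    return intercept + sum(w * instance[name] for name, w in weights.items())


# Illustrative attribute values (assumed); suppose the tree routed them to LM4.
example = {"MYCT": 125, "MMIN": 256, "MMAX": 6000,
           "CACH": 256, "CHMIN": 16, "CHMAX": 128}
print(round(predict_at_leaf("LM4", example), 2))  # 334.22
```

Evaluating the same instance at a different leaf gives a different answer (LM3, for instance, depends only on MMIN), which is how the linear patches of a model tree let the prediction change from region to region.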
WEKA: Waikato Environment for Knowledge Analysis
On Radius, do this once (make a WEKA folder, copy all the .arff files, copy the WEKA jar file):
    cd
    mkdir WEKAfiles
    cd WEKAfiles
    cp /usr/local/weka-3-8-1/data/* .
    cp /usr/local/weka-3-8-1/weka.jar weka.jar
To run the WEKA application (cd WEKAfiles, if not there already):
    java -Xmx1000M -jar weka.jar
The -Xmx1000M option lets the Java virtual machine use up to 1000 MB of heap memory.
To download onto a Windows or Mac computer, visit: https://www.cs.waikato.ac.nz/ml/weka/
WEKA Introduction
• An open-source collection of many data mining and machine learning algorithms, including:
  • pre-processing of data
  • classification
  • clustering
  • association rule extraction
• Created by researchers at the University of Waikato in New Zealand
• Java-based (also open source)
WEKA Main Features
• ∼49 data preprocessing tools
• ∼76 classification/regression algorithms
• ∼8 clustering algorithms
• ∼15 attribute/subset evaluators + 10 search algorithms for feature selection
• ∼3 algorithms for finding association rules
• 3 graphical user interfaces
  • “The Explorer” (exploratory data analysis)
  • “The Experimenter” (experimental environment)
  • “The Knowledge Flow” (a newer, process-model-inspired interface)
WEKA Application Interface
• Explorer
  • preprocessing, attribute selection, learning, visualization
• Experimenter
  • testing and evaluating machine learning algorithms
• Knowledge Flow
  • visual design of the KDD (Knowledge Discovery from Data / in Databases / with Data Mining) process
• Simple Command-line
  • a simple interface for typing commands
WEKA Functions and Tools
• Preprocessing filters
• Attribute selection
• Classification/Regression
• Clustering
• Association discovery
• Visualization
WEKA: Pros and Cons
• Pros
  • Open source
  • Free
  • Extensible
  • Can be integrated into other Java packages
  • GUIs (Graphical User Interfaces)
  • Relatively easy to use
  • Features
    • Run an individual experiment, or
    • Build KDD phases
• Cons
  • Lack of proper and adequate documentation
  • Systems are updated constantly (Kitchen Sink Syndrome)