Data Mining, CSCI 307, Spring 2019, Lecture 7: Output: Trees; WEKA Intro
Can Use Trees for Numeric Prediction Too • Regression: the process of computing an expression that predicts a numeric quantity • Regression tree: “decision tree” where each leaf predicts a numeric quantity • Predicted value is average value of training instances that reach the leaf • Model tree: “regression tree” with linear regression models at the leaf nodes • Linear patches approximate continuous function
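The difference between the two kinds of leaves can be sketched in a few lines. This is a minimal illustration with made-up numbers, not WEKA's implementation: a regression-tree leaf stores the average of the training targets that reach it, while a model-tree leaf stores a linear model fit over the attributes.

```python
def regression_leaf(targets):
    """Regression-tree leaf: predict the mean of the training targets that reached it."""
    return sum(targets) / len(targets)

def model_leaf(intercept, coeffs):
    """Model-tree leaf: return a linear prediction function over the attributes."""
    def predict(x):
        return intercept + sum(c * v for c, v in zip(coeffs, x))
    return predict

# Hypothetical training targets that happened to reach one leaf:
leaf_targets = [20.0, 30.0, 40.0]
print(regression_leaf(leaf_targets))   # 30.0 -- the leaf's average

# Hypothetical model-tree leaf with intercept 10 and two coefficients:
lm = model_leaf(10.0, [0.5, 2.0])
print(lm([4.0, 3.0]))                  # 10 + 0.5*4 + 2*3 = 18.0
```

The model tree's linear patches are what let it approximate a continuous function more smoothly than the piecewise-constant regression tree.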
Linear Regression for the CPU Data PRP = -56.1 + 0.049 MYCT + 0.015 MMIN + 0.006 MMAX + 0.630 CACH - 0.270 CHMIN + 1.46 CHMAX
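The equation above can be written directly as a function, using exactly the coefficients printed on the slide (attribute values below are illustrative, not from the CPU dataset):

```python
def predict_prp(myct, mmin, mmax, cach, chmin, chmax):
    """Linear-regression prediction for the CPU data, per the slide's equation."""
    return (-56.1 + 0.049 * myct + 0.015 * mmin + 0.006 * mmax
            + 0.630 * cach - 0.270 * chmin + 1.46 * chmax)

# With every attribute at zero, the prediction is just the intercept:
print(predict_prp(0, 0, 0, 0, 0, 0))   # -56.1
```

Note the single equation applies to every instance; the model tree on the next slide instead fits a separate equation per region of the attribute space.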
Model Tree for the CPU Data
LM1: PRP = 8.29 + 0.004 MMAX + 2.77 CHMIN
LM2: PRP = 20.3 + 0.004 MMIN - 3.99 CHMIN + 0.946 CHMAX
LM3: PRP = 38.1 + 0.012 MMIN
LM4: PRP = 19.5 + 0.002 MMAX + 0.698 CACH + 0.969 CHMAX
LM5: PRP = 285 - 1.46 MYCT + 1.02 CACH - 9.39 CHMIN
LM6: PRP = -65.8 + 0.003 MMIN - 2.94 CHMIN + 4.98 CHMAX
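The six leaf models can be encoded directly from the slide. The split tests that route an instance to a particular leaf are in the tree diagram (not reproduced in this text), so the choice of leaf below is illustrative only:

```python
# Leaf models of the CPU model tree, as printed on the slide.
leaf_models = {
    "LM1": lambda a: 8.29 + 0.004*a["MMAX"] + 2.77*a["CHMIN"],
    "LM2": lambda a: 20.3 + 0.004*a["MMIN"] - 3.99*a["CHMIN"] + 0.946*a["CHMAX"],
    "LM3": lambda a: 38.1 + 0.012*a["MMIN"],
    "LM4": lambda a: 19.5 + 0.002*a["MMAX"] + 0.698*a["CACH"] + 0.969*a["CHMAX"],
    "LM5": lambda a: 285 - 1.46*a["MYCT"] + 1.02*a["CACH"] - 9.39*a["CHMIN"],
    "LM6": lambda a: -65.8 + 0.003*a["MMIN"] - 2.94*a["CHMIN"] + 4.98*a["CHMAX"],
}

# Hypothetical instance; assume the tree's splits routed it to LM3:
x = {"MYCT": 0, "MMIN": 1000, "MMAX": 0, "CACH": 0, "CHMIN": 0, "CHMAX": 0}
print(leaf_models["LM3"](x))   # 38.1 + 0.012*1000 = 50.1
```

Each leaf's equation uses only a subset of the attributes, which is typical of model trees: within a region, only a few attributes still matter.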
WEKA: Waikato Environment for Knowledge Analysis
On Radius, do this once (make a WEKA folder, copy all the .arff files, copy the weka jar file):
cd
mkdir WEKAfiles
cd WEKAfiles
cp /usr/local/weka-3-8-1/data/* .
cp /usr/local/weka-3-8-1/weka.jar weka.jar
To run the WEKA application (cd WEKAfiles, if not there already):
java -Xmx1000M -jar weka.jar
To download onto a Windows or Mac computer, visit: https://www.cs.waikato.ac.nz/ml/weka/
WEKA Introduction • An open-source collection of many data mining and machine learning algorithms, including • data preprocessing • classification • clustering • association rule extraction • Created by researchers at the University of Waikato in New Zealand. • Java based (also open source).
WEKA Main Features • ∼ 49 data preprocessing tools • ∼ 76 classification/regression algorithms • ∼ 8 clustering algorithms • ∼ 15 attribute/subset evaluators + 10 search algorithms for feature selection • ∼ 3 algorithms for finding association rules • 3 graphical user interfaces • “The Explorer” (exploratory data analysis) • “The Experimenter” (experimental environment) • “The Knowledge Flow” (new process-model-inspired interface)
WEKA Application Interface • Explorer • preprocessing, attribute selection, learning, visualization • Experimenter • testing and evaluating machine learning algorithms • Knowledge Flow • visual design of the KDD (Knowledge Discovery in Databases) process • Simple Command-line • a simple interface for typing commands
WEKA Functions and Tools • Preprocessing Filters • Attribute selection • Classification/Regression • Clustering • Association discovery • Visualization
WEKA: Pros and Cons • Pros • Open source and free • Extensible • Can be integrated into other Java packages • GUIs (graphical user interfaces) • Relatively easy to use • Features: run an individual experiment, or build KDD phases • Cons • Lack of proper and adequate documentation • Systems are updated constantly (“kitchen sink” syndrome)