WEKA 3.5.5 (source: Machine Learning with WEKA)
What is WEKA? • Weka is a collection of machine learning algorithms for data mining tasks. • Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. • It is also well-suited for developing new machine learning schemes.
Dataset • A dataset is roughly equivalent to a two-dimensional spreadsheet or database table. • A dataset is a collection of examples. • The external representation of an Instances class is an ARFF file, which consists of a header describing the attribute types, followed by the data as a comma-separated list.
Dataset - ARFF • The ARFF Header Section: the header section of the file contains the relation declaration and the attribute declarations. • The @relation Declaration: the relation name is defined on the first line. • The @attribute Declarations: each attribute in the data set has its own @attribute statement, which uniquely defines the attribute's name and its data type. The order in which the attributes are declared determines their column position in the data section of the file.
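The header rules above can be sketched in a few lines of Python. This is only an illustrative reader, not WEKA's own parser: the function name `parse_header` and the sample header text are made up for this example, and attribute names that are quoted or contain spaces are not handled.

```python
# Hypothetical sample header, written for this illustration only.
SAMPLE_HEADER = """\
% lines starting with % are ARFF comments
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
"""


def parse_header(text):
    """Return (relation_name, [(attribute_name, type_spec), ...]).

    Simplified sketch: assumes attribute names contain no spaces.
    """
    relation = None
    attributes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()  # match declaration keywords case-insensitively
        rest = parts[1] if len(parts) > 1 else ""
        if keyword == "@relation":
            relation = rest.strip()
        elif keyword == "@attribute":
            name_and_type = rest.split(None, 1)
            name = name_and_type[0]
            type_spec = name_and_type[1].strip() if len(name_and_type) > 1 else ""
            attributes.append((name, type_spec))
        elif keyword == "@data":
            break  # the header ends where the data section begins
    return relation, attributes


relation, attributes = parse_header(SAMPLE_HEADER)

# Declaration order gives each attribute its column position in the data section.
column_of = {name: index for index, (name, _) in enumerate(attributes)}
```

Here `column_of` maps each attribute name to its zero-based column, which mirrors the rule that declaration order fixes the column position.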
ARFF - Data Types • The <datatype> can be any of the following types: • Numeric: real or integer numbers (the keywords integer and real are both treated as numeric) • Nominal • String • Date • The keywords numeric, real, integer, string and date are case insensitive.
ARFF - Data Types Example • @ATTRIBUTE sepallength NUMERIC • @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica} • @ATTRIBUTE LCC string • @attribute <name> date [<date-format>], where the default format is yyyy-MM-dd'T'HH:mm:ss
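As a rough sketch of how these type declarations might be interpreted, the Python below classifies the type part of an @attribute line. The helper names `attribute_kind` and `parse_default_date` are hypothetical, and the `strptime` pattern is an assumed Python equivalent of the default format yyyy-MM-dd'T'HH:mm:ss. Custom date formats and quoted nominal values are not handled.

```python
from datetime import datetime

# Assumed Python equivalent of the default ARFF date format yyyy-MM-dd'T'HH:mm:ss.
DEFAULT_DATE_PATTERN = "%Y-%m-%dT%H:%M:%S"


def attribute_kind(type_spec):
    """Classify the type part of an @attribute declaration.

    Returns (kind, nominal_values); nominal_values is None unless kind == "nominal".
    """
    spec = type_spec.strip()
    if spec.startswith("{") and spec.endswith("}"):
        # Nominal: an explicit list of allowed values (case sensitive).
        values = [value.strip() for value in spec[1:-1].split(",")]
        return "nominal", values
    parts = spec.split(None, 1)
    keyword = parts[0].lower() if parts else ""  # type keywords are case insensitive
    if keyword in ("numeric", "real", "integer"):
        return "numeric", None  # integer and real are treated as numeric
    if keyword in ("string", "date"):
        return keyword, None
    raise ValueError("unrecognised ARFF datatype: %r" % type_spec)


def parse_default_date(value):
    """Parse a date value written in the default format."""
    return datetime.strptime(value, DEFAULT_DATE_PATTERN)
```

For instance, `attribute_kind("NUMERIC")` and `attribute_kind("Real")` both report a numeric attribute, while the brace-delimited class declaration is reported as nominal together with its three allowed values.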
ARFF - Data Section • The ARFF Data section of the file contains the data declaration line and the actual instance lines. • The @data Declaration: the @data declaration is a single line denoting the start of the data segment in the file. • The instance data • Each instance is on a single line • Attribute values are delimited by commas • Values appear in the same order as the attribute declarations in the header section • Missing values are represented by a single question mark • Values of string and nominal attributes are case sensitive, and any that contain a space must be quoted
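The data-section rules can be illustrated with a short Python sketch built on the standard `csv` module. The function name `parse_data` and the sample rows are invented for this example (three attributes assumed: numeric, nominal, string). One simplification to be aware of: a quoted `'?'` is also treated as missing here, which a full ARFF reader would distinguish.

```python
import csv

# Hypothetical data section for three attributes: numeric, nominal, string.
SAMPLE_DATA = """\
@DATA
5.1,Iris-setosa,'AG 5'
?,Iris-virginica,?
"""


def parse_data(text):
    """Return the instance lines after @data as lists of values.

    Missing values (a single question mark) become None.
    """
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if not in_data:
            # Everything before the @data line belongs to the header.
            in_data = line.lower() == "@data"
            continue
        # One instance per line, comma-delimited; quoted values may contain spaces.
        fields = next(csv.reader([line], quotechar="'", skipinitialspace=True))
        rows.append([None if field == "?" else field for field in fields])
    return rows


rows = parse_data(SAMPLE_DATA)
```

All values are returned as strings; converting them according to the declared attribute types is left out to keep the sketch short.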
Program • LogWindow: opens a log window that captures everything printed to stdout or stderr. Useful for environments like MS Windows, where WEKA is not started from a terminal. • Exit: closes WEKA.
Applications • Explorer: for exploring data with WEKA. • Experimenter: for performing experiments and conducting statistical tests between learning schemes. • KnowledgeFlow: supports essentially the same functions as the Explorer but with a drag-and-drop interface. One advantage is that it supports incremental learning. • SimpleCLI: provides a simple command-line interface that allows direct execution of WEKA commands on operating systems that do not provide their own command-line interface.
Tools • ArffViewer: an MDI application for viewing ARFF files in spreadsheet format. • SqlViewer: an SQL worksheet for querying databases via JDBC. • EnsembleLibrary: an interface for generating setups for Ensemble Selection.
Visualization • Plot: for plotting a 2D plot of a dataset. • ROC: displays a previously saved ROC curve. • TreeVisualizer: for displaying directed graphs, e.g., a decision tree. • GraphVisualizer: visualizes XML BIF or DOT format graphs, e.g., for Bayesian networks. • BoundaryVisualizer: allows the visualization of classifier decision boundaries in two dimensions.
Windows • Minimize: minimizes all current windows. • Restore: restores all minimized windows.