Contents
• Data set
  • Iris (Fisher's data)
  • Ripley's data set
  • Hand-written numerals
• Classifier (MATLAB code)
  • Bayesian
  • SVM
  • K-nearest neighbor
Fisher's Iris Plants Database
• The iris data published by Fisher (1936) have been widely used for examples in discriminant analysis and cluster analysis.
• The sepal length, sepal width, petal length, and petal width are measured in centimeters on fifty iris specimens from each of three species: Iris setosa, I. versicolor, and I. virginica.
• Download the package from http://chien.csie.ncku.edu.tw/web/course/iris_svm.rar
Attribute Information:
1. sepal length
2. sepal width
3. petal length
4. petal width
5. class:
   -- Iris Setosa = 1
   -- Iris Versicolour = 2
   -- Iris Virginica = 3
Example: SVM (MATLAB Tool)

% Load the sample data, which includes Fisher's iris data:
% 4 measurements on a sample of 150 irises.
load fisheriris

% Create data, a two-column matrix containing the sepal length
% and sepal width measurements for the 150 irises.
data = [meas(:,1), meas(:,2)];

% From the species vector, create a new logical column vector, groups,
% that classifies the data into two groups: setosa and non-setosa.
groups = ismember(species,'setosa');
% Randomly select training and test sets.
[train, test] = crossvalind('holdOut',groups);
cp = classperf(groups);

% Train an SVM classifier using a linear kernel function
% and plot the grouped data.
svmStruct = svmtrain(data(train,:),groups(train),'showplot',true);
title(sprintf('Kernel Function: %s',...
    func2str(svmStruct.KernelFunction)),'interpreter','none');
% Use the svmclassify function to classify the test set.
classes = svmclassify(svmStruct,data(test,:),'showplot',true);

% Evaluate the performance of the classifier.
classperf(cp,classes,test);
cp.CorrectRate
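The random holdout split performed by crossvalind('holdOut') above can be sketched in plain Python for illustration (this is an assumed re-implementation of the idea, not the MATLAB code):

```python
import random

def holdout_split(n, test_frac=0.5, seed=0):
    """Randomly split the indices 0..n-1 into disjoint train and test sets."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)                              # random assignment
    cut = int(n * test_frac)
    return sorted(idx[cut:]), sorted(idx[:cut])   # (train, test)

train, test = holdout_split(150)   # 150 irises -> 75 train, 75 test
```

Training on one half and scoring on the other gives the correct rate that cp.CorrectRate reports.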
Ripley's data set
• The well-known Ripley dataset consists of two classes, where the data for each class have been generated by a mixture of two Gaussian distributions.
• Each sample has two real-valued coordinates (xs and ys) and a class label (yc) that is 0 or 1.
• riply.tra: the 250 rows of the training set
• riply.tes: the 1000 rows of the test set
• Download the package from http://chien.csie.ncku.edu.tw/web/course/stprtool.rar
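The "mixture of two Gaussians per class" construction can be sketched in plain Python; the component centers and width below are illustrative choices, not Ripley's actual mixture parameters:

```python
import random

def make_two_gaussian_mixture(n_per_class, centers0, centers1, sigma=0.25, seed=0):
    """Draw 2-D points for two classes; each class is an equal-weight
    mixture of two Gaussians (centers/sigma are illustrative)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centers in ((0, centers0), (1, centers1)):
        for _ in range(n_per_class):
            cx, cy = rng.choice(centers)   # pick one mixture component
            X.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
            y.append(label)
    return X, y

# 125 points per class, like the 250-row training file
X, y = make_two_gaussian_mixture(125,
                                 centers0=[(-0.7, 0.3), (0.3, 0.3)],
                                 centers1=[(-0.3, 0.7), (0.4, 0.7)])
```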
Example: Bayesian classifier

% Load the input training data.
trn = load('riply_trn');
inx1 = find(trn.y==1);
inx2 = find(trn.y==2);

% Estimate the class-conditional distributions by EM
% (a two-component Gaussian mixture per class).
bayes_model.Pclass{1} = emgmm(trn.X(:,inx1),struct('ncomp',2));
bayes_model.Pclass{2} = emgmm(trn.X(:,inx2),struct('ncomp',2));

% Estimate the priors.
n1 = length(inx1);
n2 = length(inx2);
bayes_model.Prior = [n1 n2]/(n1+n2);

% Evaluate on the testing data.
tst = load('riply_tst');
ypred = bayescls(tst.X,bayes_model);
cerror(ypred,tst.y)
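The Bayesian decision rule itself (pick the class maximizing log p(x|c) + log P(c)) can be sketched in plain Python. This simplified sketch fits a single diagonal Gaussian per class rather than the EM mixture used above, and the toy data are invented for illustration:

```python
import math

def fit_gaussian(points):
    """Per-dimension mean and variance (diagonal covariance)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    var = [sum((p[i] - mean[i]) ** 2 for p in points) / n + 1e-9 for i in range(d)]
    return mean, var

def log_gauss(x, mean, var):
    """Log density of a diagonal Gaussian at x."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, mean, var))

def bayes_train(X, y):
    model = {}
    for c in set(y):
        pts = [xi for xi, yi in zip(X, y) if yi == c]
        mean, var = fit_gaussian(pts)
        model[c] = (mean, var, math.log(len(pts) / len(X)))   # log prior
    return model

def bayes_predict(model, x):
    # Maximize log p(x|c) + log P(c) over the classes.
    return max(model, key=lambda c: log_gauss(x, model[c][0], model[c][1]) + model[c][2])

model = bayes_train([(0, 0), (1, 1), (5, 5), (6, 6)], [1, 1, 2, 2])
print(bayes_predict(model, (0.2, 0.5)))   # → 1
```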
Example: Binary SVM

trn = load('riply_trn');        % load training data

options.ker = 'rbf';            % use an RBF kernel
options.arg = 1;                % kernel argument
options.C = 10;                 % regularization constant

% Train the SVM classifier.
model = smo(trn,options);

% Visualization.
figure;
ppatterns(trn);
psvm(model);

tst = load('riply_tst');        % load testing data
ypred = svmclass(tst.X,model);  % classify the data
cerror(ypred,tst.y)             % compute the error
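For intuition, the RBF kernel evaluated inside the SVM can be sketched in plain Python. Treating options.arg as the kernel width sigma is an assumption about the toolbox's convention:

```python
import math

def rbf_kernel(a, b, sigma=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 * sigma^2)).
    Interpreting the kernel argument as the width sigma is an assumption."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2 * sigma ** 2))

print(rbf_kernel((0, 0), (0, 0)))   # → 1.0
```

The kernel equals 1 for identical points and decays toward 0 as points move apart, which is what produces the smooth nonlinear decision boundary plotted by psvm.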
Example: K-nearest neighbor classifier

% Load training data and set up the 8-NN rule.
trn = load('riply_trn');
model = knnrule(trn,8);

% Visualize the decision boundary and training data.
figure;
ppatterns(trn);
pboundary(model);

% Evaluate the classifier.
tst = load('riply_tst');
ypred = knnclass(tst.X,model);
cerror(ypred,tst.y)
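The k-NN rule is simple enough to sketch in plain Python (an illustrative re-implementation, not the toolbox code; the toy points are invented):

```python
def knn_classify(train_X, train_y, x, k=8):
    """Classify x by majority vote among its k nearest training points
    (squared Euclidean distance)."""
    dist2 = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(zip(train_X, train_y), key=lambda p: dist2(p[0], x))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [1, 1, 1, 2, 2, 2]
print(knn_classify(train_X, train_y, (0.5, 0.5), k=3))   # → 1
```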
Hand-written numerals
• Pen-Based Recognition of Handwritten Digits.
• Examples of numerals were collected from 44 different writers.
• The samples written by 30 writers are used for training, cross-validation, and writer-dependent testing.
• The digits written by the other 14 writers are used for writer-independent testing.
• Each writer provided 250 digit samples, covering the numerals '0' to '9'.
Number of Instances
• pendigits.txt: 10992 (all)
• pendigits.tra: 7494 (training)
• pendigits.tes: 3498 (testing)
Number of Attributes
• 16 input attributes + 1 class attribute
For Each Attribute:
• All input attributes are integers in the range 0..100
• The last attribute is the class code 0..9
Example (16 feature values followed by the class label):

 47,100, 27, 81, 57, 37, 26,  0,  0, 23, 56, 53,100, 90, 40, 98, 8
  0, 89, 27,100, 42, 75, 29, 45, 15, 15, 37,  0, 69,  2,100,  6, 2
  0, 57, 31, 68, 72, 90,100,100, 76, 75, 50, 51, 28, 25, 16,  0, 1
  0,100,  7, 92,  5, 68, 19, 45, 86, 34,100, 45, 74, 23, 67,  0, 4
  0, 67, 49, 83,100,100, 81, 80, 60, 60, 40, 40, 33, 20, 47,  0, 1
100,100, 88, 99, 49, 74, 17, 47,  0, 16, 37,  0, 73, 16, 20, 20, 6
  0,100,  3, 72, 26, 35, 85, 35,100, 71, 73, 97, 65, 49, 66,  0, 4
  0, 39,  2, 62, 11,  5, 63,  0,100, 43, 89, 99, 36,100,  0, 57, 0
 13, 89, 12, 50, 72, 38, 56,  0,  4, 17,  0, 61, 32, 94,100,100, 5
 57,100, 22, 72,  0, 31, 25,  0, 75, 13,100, 50, 75, 87, 26, 85, 0
 74, 87, 31,100,  0, 69, 62, 64,100, 79,100, 38, 84,  0, 18,  1, 9
 48, 96, 62, 65, 88, 27, 21,  0, 21, 33, 79, 67,100,100,  0, 85, 8
100,100, 72, 99, 36, 78, 34, 54, 79, 47, 64, 13, 19,  0,  0,  2, 5
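Rows like the above can be parsed in plain Python; the checks mirror the attribute description (16 integers in 0..100, then a class code 0..9):

```python
def parse_pendigits_line(line):
    """Split one pendigits row into 16 integer features and a class label."""
    values = [int(v) for v in line.split(',')]
    assert len(values) == 17                       # 16 inputs + 1 class code
    features, label = values[:16], values[16]
    assert all(0 <= f <= 100 for f in features) and 0 <= label <= 9
    return features, label

line = "47,100, 27, 81, 57, 37, 26, 0, 0, 23, 56, 53,100, 90, 40, 98, 8"
features, label = parse_pendigits_line(line)
print(label)   # → 8
```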
Installation for the MATLAB code
• Install MATLAB on your machine.
• Download the package from http://chien.csie.ncku.edu.tw/web/course/MATLABArsenal.rar
• Unzip the files into an arbitrary directory, say $MATLABArsenalRoot.
• Add $MATLABArsenalRoot and its subfolders to the MATLAB path, using the addpath command or the menu File -> Set Path.
How to use the classifiers

test_classify('classify -t input_file [general_option] ...
    [-- EvaluationMethod [evaluation_options]] ...
    [-- ClassifierWrapper [param]] -- BaseClassifier [param]');

Example 1
• Classify pendigits.txt
• Shuffle the data before classification ('-sf 1')
• 50%-50% train-test split (the default)
• Linear-kernel support vector machine (LibSVM)

test_classify('classify -t pendigits.txt -sf 1 -- LibSVM -Kernel 0 -CostFactor 3');

Prec:0.979803, Rec:0.979803, Err:0.020197
566   0  10   0   1   0   0   2   0   1
  0 547   0   0   0   1   0   0  22   0
 10   0 565   1   0   0   0   1   0   0
  2   0   0 534   0   4   0   0   0   1
  0   0   0   1 557   0   0   0   0   0
  0   0   0   1   0 514   1   0  12   3
  0   0   0   0   0   0 543   0   1   0
  4   0   0   1   1   0   0 562   0   2
  0  10   0   0   0   5   0   0 484   1
  0   2   0   1   0   8   0   1   0 513
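The Err figure reported above is just one minus the trace of the confusion matrix divided by the total count. A minimal Python sketch, with an invented 2-class matrix for illustration:

```python
def error_rate(confusion):
    """Overall error from a confusion matrix (rows = true class,
    columns = predicted class): 1 - trace / total."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 1 - correct / total

# A small 2-class example: 95 of 100 samples on the diagonal.
cm = [[48, 2],
      [3, 47]]
print(error_rate(cm))
```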
Example 2
• Classify pendigits: train a model on pendigits.tra, then test on pendigits.tes
• Linear-kernel support vector machine (LibSVM)

% Train the model using pendigits.tra and save it.
test_classify(strcat('classify -t pendigits.tra -- Train_Only -m pendigits.libSVM.model -- LibSVM -Kernel 0 -CostFactor 3'));
Error = 0.009608

% Test the new data in pendigits.tes using pendigits.libSVM.model.
test_classify(strcat('classify -t pendigits.tes -- Test_Only -m pendigits.libSVM.model -- LibSVM -Kernel 0 -CostFactor 3'));
Error = 0.069754
Example 3
• Classify pendigits.txt
• Do not shuffle the data ('-sf 0'); use the first 7494 rows for training, the rest for testing
• Apply a multi-class classification wrapper
• RBF-kernel SVM_LIGHT support vector machine

test_classify('classify -t pendigits.txt -sf 0 -- train_test_validate -t 7494 -- train_test_multiple_class -- SVM_LIGHT -Kernel 2 -KernelParam 0.01 -CostFactor 3');
Error = 0.047170