R for Classification
Jennifer Broughton
Shimadzu Research Laboratory, Manchester, UK
jennifer.broughton@srlab.co.uk
2nd May 2013
Classification?

Object Type   Feature1   Feature2   Feature3   …….   Feature n
Label 1       val[1,1]   val[1,2]   val[1,3]   …….   val[1,n]
Label 2       val[2,1]   val[2,2]   val[2,3]   …….   val[2,n]
……            …….        …….        …….        …….   ………
Label m       val[m,1]   val[m,2]   val[m,3]   …….   val[m,n]

Automatic identification of the type (class) of an object from measured variables (features).
2 of 17
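As a minimal sketch of that layout in R: one row per object, one column per feature, and a factor holding the class label. The built-in iris data set is used here as a stand-in; it is an assumption for illustration and not the data from the talk.

```r
# Stand-in data (assumption): the built-in iris data set, not the talk's data.
data(iris)

# One row per object, one column per measured feature, plus the class label.
features <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]
label    <- iris$Species   # factor with one level per class

dim(features)    # m objects by n features
levels(label)    # the classes to be identified
table(label)     # number of objects per class
```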
Example Data 3 of 17
Data Preparation & Investigation
EDA Technique: Box Plots, PCA, Decision Trees, Clustering
• Best features to distinguish between classes
• Relationships between features
• Feature reduction
Training Set
4 of 17
Box Plots
PCA & Multivariate Analysis: ade4, FactoMineR
5 of 17
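A rough sketch of both techniques, using only base R so that it runs without extra packages. Note the swap: prcomp from the stats package stands in for the ade4 and FactoMineR functions named on the slide, and iris is an assumed example data set.

```r
data(iris)

# Box plot of one feature split by class: well-separated boxes suggest
# the feature is useful for telling the classes apart.
boxplot(Petal.Length ~ Species, data = iris,
        main = "Petal.Length by class", ylab = "Petal.Length")

# PCA on the centred and scaled numeric features
# (base R prcomp, swapped in for ade4/FactoMineR).
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
summary(pca)   # proportion of variance explained by each component

# Scores on the first two components, coloured by class.
plot(pca$x[, 1:2], col = as.integer(iris$Species), pch = 19,
     main = "PCA scores")
legend("topright", legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)), pch = 19)
```

If the first two components already separate the classes, that hints that a reduced feature set may be enough for the classifier.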
Example Classifier 6 of 17
Classification Algorithms in R
Rattle: the R Analytical Tool To Learn Easily (Rattle: A Data Mining GUI for R, Graham J Williams, The R Journal, 1(2):45-55)
7 of 17
SVM 8 of 17
Ensemble Algorithm 9 of 17
Training and Testing
Training Set (labelled) -> Classification Algorithm -> Trained Classifier
Classification Algorithm: Neural Network, Support Vector Machine, Random Forest
Test Set (unlabelled) -> Trained Classifier -> Prediction Results
Prediction Results + Labels -> Assess Predictions -> Classification Results
Assess Predictions: Confusion Matrix, ROC Curve (2 categories), ….
10 of 17
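The split into a labelled training set and a held-back test set might look like the sketch below. The 70/30 ratio, the seed, and the iris data are all assumptions chosen for illustration.

```r
data(iris)
set.seed(42)   # reproducible split

# Hold out roughly 30% of the rows as a test set (the ratio is an assumption).
n         <- nrow(iris)
train_idx <- sample(n, size = round(0.7 * n))
train_set <- iris[train_idx, ]    # labelled, used to build the classifier
test_set  <- iris[-train_idx, ]   # held back for assessment

# What the trained classifier sees at prediction time: features only.
# The labels are kept aside to assess the predictions afterwards.
test_features <- test_set[, names(test_set) != "Species"]
test_labels   <- test_set$Species

nrow(train_set)
nrow(test_set)
```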
Using Classifiers in R
Select Training Data
Build Classifier: classifier <- algorithm(formula, data, options) (boosting and nnet)
Run Classifier: classifier.pred <- predict(classifier, newdata, options)
11 of 17
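The build-then-predict pattern can be sketched with two of R's recommended packages, rpart (decision tree) and nnet (neural network). The choice of algorithms, the option values (size = 3, 100 training rows), and the iris data are assumptions for illustration, not the settings used in the talk.

```r
library(rpart)   # recommended package shipped with R
library(nnet)    # recommended package shipped with R

data(iris)
set.seed(1)
train_idx <- sample(nrow(iris), 100)
train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

# Build: classifier <- algorithm(formula, data, options)
tree_clf <- rpart(Species ~ ., data = train_set, method = "class")
nnet_clf <- nnet(Species ~ ., data = train_set, size = 3, trace = FALSE)

# Run: classifier.pred <- predict(classifier, newdata, options)
tree_pred <- predict(tree_clf, newdata = test_set, type = "class")
nnet_pred <- predict(nnet_clf, newdata = test_set, type = "class")

head(tree_pred)
head(nnet_pred)
```

Both calls share the same formula interface, which is what makes it cheap to swap one algorithm for another.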
SVM & Neural Net Tuning 12 of 17
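One possible way to tune an SVM is a grid search with tune.svm from the e1071 package, sketched below. The package choice, the parameter grid, and the iris data are assumptions; the slide does not say which tuning tool was used. The same package also offers tune.nnet for neural networks.

```r
library(e1071)   # assumption: e1071 is installed

data(iris)
set.seed(1)

# Grid search over gamma and cost, scored by cross-validation.
tuned <- tune.svm(Species ~ ., data = iris,
                  gamma = 10^(-3:-1), cost = 10^(0:2))

summary(tuned)                 # error for every gamma/cost combination
tuned$best.parameters          # combination with the lowest error
best_svm <- tuned$best.model   # SVM refitted with those parameters
```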
Classifier Feedback
print(classifier)
plot(classifier)
High Gini coefficient = high dispersion
13 of 17
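As an illustration of this kind of feedback, the sketch below fits a random forest and inspects it. It assumes the randomForest package is installed and uses iris as example data. Whether the slide's Gini remark refers to the MeanDecreaseGini importance shown here is also an assumption.

```r
library(randomForest)   # assumption: randomForest is installed

data(iris)
set.seed(1)
rf <- randomForest(Species ~ ., data = iris, ntree = 200)

print(rf)   # out-of-bag error estimate and confusion matrix
plot(rf)    # error rate against number of trees

# Mean decrease in Gini impurity per feature: larger values mark features
# that contribute more to separating the classes in this forest.
importance(rf)
varImpPlot(rf)
```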
Classifier Prediction Results
predict(type = "class")
predict(type = "prob")
confusion matrix
14 of 17
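A short sketch of the two prediction types and a confusion matrix, using an rpart decision tree. The classifier, the split size, and the iris data are assumptions for illustration.

```r
library(rpart)   # recommended package shipped with R

data(iris)
set.seed(1)
train_idx <- sample(nrow(iris), 100)
train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

fit <- rpart(Species ~ ., data = train_set, method = "class")

pred_class <- predict(fit, newdata = test_set, type = "class")  # one label per row
pred_prob  <- predict(fit, newdata = test_set, type = "prob")   # one column per class

# Confusion matrix: rows are predicted classes, columns are true classes.
conf_mat <- table(Predicted = pred_class, Actual = test_set$Species)
conf_mat

# Overall accuracy is the share of counts on the diagonal.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```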
Binary Classification Results

                       Class Present?
                       N                 Y
Class Detected?   Y    False Positive    True Positive
                  N    True Negative     False Negative

15 of 17
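The four cells can be counted with table() and turned into the usual rates, as in this sketch. The eight detected/present values are made up purely for illustration.

```r
# Hypothetical detections and ground truth for eight objects (made-up values).
detected <- factor(c("Y", "Y", "N", "Y", "N", "N", "Y", "N"), levels = c("Y", "N"))
present  <- factor(c("Y", "N", "N", "Y", "Y", "N", "Y", "N"), levels = c("N", "Y"))

# Rows: class detected? Columns: class present?
cm <- table(Detected = detected, Present = present)
cm

TP <- cm["Y", "Y"]   # detected and present
FP <- cm["Y", "N"]   # detected but not present
FN <- cm["N", "Y"]   # present but not detected
TN <- cm["N", "N"]   # neither detected nor present

sensitivity <- TP / (TP + FN)   # true positive rate
specificity <- TN / (TN + FP)   # true negative rate
precision   <- TP / (TP + FP)   # share of detections that are correct
```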
ROC Curves in R ROCR package 16 of 17
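A minimal ROCR sketch, assuming the package is installed. The data set (iris reduced to two classes), the logistic regression used to produce scores, and the choice of features are stand-ins for illustration and do not come from the talk.

```r
library(ROCR)   # assumption: ROCR is installed

data(iris)
# Reduce to two classes, since ROC curves apply to binary problems.
two_class <- droplevels(subset(iris, Species != "setosa"))
set.seed(1)
train_idx <- sample(nrow(two_class), 70)
train_set <- two_class[train_idx, ]
test_set  <- two_class[-train_idx, ]

# Logistic regression as a simple scoring classifier (stand-in, not from the talk).
fit    <- glm(Species ~ Sepal.Length + Sepal.Width,
              data = train_set, family = binomial)
scores <- predict(fit, newdata = test_set, type = "response")  # P(virginica)

pred <- prediction(scores, test_set$Species)   # scores plus true labels
perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(perf, main = "ROC curve")
abline(0, 1, lty = 2)                          # chance line

auc <- performance(pred, measure = "auc")@y.values[[1]]
auc
```

The curve traces the true positive rate against the false positive rate as the decision threshold on the scores is varied; the area under it summarises the classifier in a single number.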
Example Results 17 of 17