Classification of Affective States - GP Semi-Supervised Learning, SVM and kNN
MAS 622J Course Project
Hyungil Ahn (hiahn@media.mit.edu)
Objective & Dataset
• Recognize the affective states of a child solving a puzzle
• Affective dataset
  - 1024 features from Face, Posture, Game
  - 3 affective states, labels annotated by teachers: High interest (61), Low interest (59), Refreshing (16)
Task & Approaches
• Binary classification: High interest (61 samples) vs. Low interest or Refreshing (75 samples)
• Approaches
  - Semi-supervised learning: Gaussian Process (GP)
  - Support Vector Machine (SVM)
  - k-Nearest Neighbor (kNN, k = 1)
GP Semi-Supervised Learning
• Given labeled data (X_L, y_L) and unlabeled data X_U, predict the labels of the unlabeled points
• Assume the data and the data generation process: X: inputs, y: vector of labels, t: vector of hidden soft labels
• Each label y_i = sign(t_i) ∈ {-1, +1} (binary classification)
• Final classifier: y = sign[t], with t the posterior mean, i.e. y = sign[E(t | X, y_L)]
• Define a similarity function k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2σ^2)) and infer t given the labeled data
GP Semi-Supervised Learning
• Infer p(t | X, y_L) given the labeled data
• Bayesian model: p(t | X, y_L) ∝ p(t | X) p(y_L | t)
  - p(t | X): prior of the classifier
  - p(y_L | t): likelihood of the classifier given the labeled data
GP Semi-Supervised Learning
• How to model the prior & the likelihood?
• The prior: using a GP, p(t | X) = N(0, K), with the covariance K built from the similarity function (soft labels vary smoothly across the data manifold!)
• The likelihood: a flipping-noise model with labeling error rate ε, p(y_i | t_i) = 1 - ε if y_i = sign(t_i), and ε otherwise (see the sketch below)
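A minimal numerical sketch of this model (not the project's code): it assumes an RBF similarity for the prior covariance and the flipping-noise likelihood read off the "labeling error rate ε" above; all function names are illustrative.

```python
# Minimal sketch of the model above (not the project's code). Assumes an RBF
# similarity for the prior covariance and a flipping-noise likelihood;
# all function names are illustrative.
import numpy as np

def rbf_similarity(X, sigma):
    """k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def log_prior(t, K, jitter=1e-6):
    """GP prior p(t | X) = N(0, K): soft labels vary smoothly over the data.
    Returned up to an additive constant."""
    Kj = K + jitter * np.eye(len(t))
    return -0.5 * t @ np.linalg.solve(Kj, t)

def log_likelihood(t_labeled, y_labeled, eps):
    """Flipping noise: p(y_i | t_i) = 1 - eps if y_i = sign(t_i), else eps."""
    agree = (y_labeled * t_labeled) > 0
    return np.sum(np.where(agree, np.log(1.0 - eps), np.log(eps)))
```

The posterior over t combines these two terms; the next slide approximates it with EP.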
GP Semi-Supervised Learning
• EP (Expectation Propagation): approximate the posterior as a Gaussian
• Select the hyperparameters {kernel width σ, labeling error rate ε} that maximize the evidence!
• Advantage of using EP: we get the evidence as a side product
• EP estimates the leave-one-out predictive performance without performing any expensive cross-validation
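A full EP implementation is beyond slide scope, so the sketch below stands in for it with a Monte-Carlo estimate of the same evidence p(y | σ, ε); the data and grid values are toy assumptions, and in the actual method the EP approximation would supply this quantity instead of `mc_log_evidence`.

```python
# Sketch of evidence-based hyperparameter selection. mc_log_evidence is a
# Monte-Carlo stand-in for the EP evidence estimate; in the actual method
# EP supplies this quantity as a side product.
import numpy as np

rng = np.random.default_rng(0)

def rbf_similarity(X, sigma):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def mc_log_evidence(K, y, eps, n=4000):
    """log p(y | sigma, eps) = log E_{t ~ N(0, K)} [ prod_i p(y_i | t_i) ]."""
    L = np.linalg.cholesky(K + 1e-6 * np.eye(len(y)))
    t = L @ rng.standard_normal((len(y), n))        # samples from the prior
    lik = np.where((y[:, None] * t) > 0, 1.0 - eps, eps).prod(axis=0)
    return np.log(lik.mean())

# Toy stand-in for the labeled affective data.
X = rng.standard_normal((20, 2))
y = np.sign(X[:, 0])

# Select {kernel width sigma, labeling error rate eps} maximizing the evidence.
grid = [(s, e) for s in (0.5, 1.0, 2.0, 4.0) for e in (0.05, 0.1, 0.2)]
sigma, eps = max(grid, key=lambda p: mc_log_evidence(rbf_similarity(X, p[0]), y, p[1]))
print("selected sigma =", sigma, ", eps =", eps)
```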
Support Vector Machine
• OSU SVM toolbox
• RBF kernel: k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2σ^2))
• Hyperparameter {C, σ} selection: use leave-one-out validation!
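The project used the OSU SVM toolbox (MATLAB); the sketch below is a scikit-learn stand-in for the same procedure on toy data, with an illustrative grid. Note that sklearn's `gamma` corresponds to 1/(2σ^2) in the RBF kernel above.

```python
# scikit-learn stand-in for the OSU SVM toolbox procedure: RBF-kernel SVM
# with leave-one-out validation over an illustrative grid of {C, gamma},
# where gamma = 1 / (2 sigma^2).
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))   # toy stand-in for the 1024-dim features
y = np.sign(X[:, 0])

param_grid = {"C": np.logspace(-2, 3, 6), "gamma": np.logspace(-3, 1, 5)}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=LeaveOneOut())
search.fit(X, y)
print(search.best_params_, search.best_score_)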
kNN (k = 1)
• The label of a test point follows that of its nearest training point
• Simple to implement; its accuracy can serve as a baseline
• However, it sometimes gives good results!
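A minimal sketch of this baseline (illustrative names, Euclidean distance assumed):

```python
# Minimal 1-NN sketch (illustrative; Euclidean distance assumed).
import numpy as np

def knn1_predict(X_train, y_train, X_test):
    """Each test point takes the label of its nearest training point."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return y_train[np.argmin(d2, axis=1)]
```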
Split of the Dataset & Experiment
• GP semi-supervised learning
  - Randomly select labeled data (p% of the overall data), use the remaining data as unlabeled data, and predict the labels of the unlabeled data (in this setting, unlabeled data == test data)
  - 50 tries for each p (p = 10, 20, 30, 40, 50)
  - Each time, select the hyperparameters that maximize the evidence from EP
• SVM and kNN
  - Randomly select train data (p% of the overall data), use the remaining data as test data, and predict the labels of the test data (a protocol sketch follows this list)
  - 50 tries for each p (p = 10, 20, 30, 40, 50)
  - For the SVM, leave-one-out validation for hyperparameter selection was performed on the train data
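A hedged sketch of this protocol; `fit_predict` is a placeholder whose signature matches, e.g., the 1-NN sketch above, and per-class stratification is omitted for brevity.

```python
# Sketch of the evaluation protocol: for each fraction p, draw 50 random
# splits, train on p% of the data, and test on the rest. fit_predict is a
# placeholder for any of the three classifiers.
import numpy as np

def evaluate(X, y, fit_predict, fractions=(0.1, 0.2, 0.3, 0.4, 0.5),
             tries=50, seed=0):
    rng = np.random.default_rng(seed)
    results = {}
    for p in fractions:
        accs = []
        for _ in range(tries):
            idx = rng.permutation(len(y))
            n_train = max(1, int(p * len(y)))
            tr, te = idx[:n_train], idx[n_train:]
            accs.append(np.mean(fit_predict(X[tr], y[tr], X[te]) == y[te]))
        results[p] = float(np.mean(accs))
    return results

# e.g. evaluate(X, y, knn1_predict)
```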
GP – Evidence & Accuracy
[Figure: log evidence and recognition accuracy vs. hyperparameters, percentage of train points per class = 50% (average over 10 tries); an offset was added to the log evidence to plot all curves in the same figure]
• Max of recognition accuracy ≈ max of log evidence
• So we can find the optimal hyperparameter by using the evidence from EP
SVM – Hyperparameter Selection
[Figure: leave-one-out validation accuracy plotted over the grid of log(C) × log(1/σ)]
• Select the hyperparameters {C, σ} that maximize the evidence from leave-one-out validation!
Classification Accuracy
• As expected, kNN is bad with a small # of train pts and better with a large # of train pts
• SVM has good accuracy even when the # of train pts is small. Why?
• GP has bad accuracy when the # of train pts is small. Why?
Analysis - SVM
Why does SVM give a good test accuracy even when the number of train points is small?
The best explanations I can offer:
1. {# support vectors} / {# train points} is high in this task, in particular when the percentage of train points is low. The support vectors determine the decision boundary, but a high SV ratio is not guaranteed to translate into high test accuracy; what is known is the bound {leave-one-out CV error} ≤ {# support vectors} / {# train points} (checked empirically below).
2. The CV accuracy rate is high even when the # of train pts is small, and CV accuracy is closely related to test accuracy.
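The cited bound can be checked empirically; a scikit-learn sketch on toy data (standing in for the toolbox used in the project):

```python
# Empirical check of the cited bound on toy data:
# {leave-one-out CV error} <= {# support vectors} / {# train points}.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))
y = np.sign(X[:, 0])

clf = SVC(kernel="rbf", C=1.0, gamma=0.1).fit(X, y)
sv_ratio = len(clf.support_) / len(y)
loo_err = 1.0 - cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
print(f"LOO error {loo_err:.3f} <= SV ratio {sv_ratio:.3f}")
```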
Analysis - GP
Why does GP give a bad test accuracy when the number of train points is small?
• Percentage of train points per class = 50%: max of recognition accuracy ≈ max of log evidence
• Percentage of train points per class = 10%: the log evidence curve is flat, so evidence maximization fails to find the optimal σ!
Conclusion
• GP: small number of train points → bad accuracy; large number of train points → good accuracy
• SVM: good accuracy regardless of the number of train points
• kNN (k = 1): small number of train points → bad accuracy; large number of train points → good accuracy