Classification of Affective States - GP Semi-Supervised Learning, SVM and kNN
MAS 622J Course Project
Hyungil Ahn (hiahn@media.mit.edu)
Objective & Dataset
• Recognize the affective states of a child solving a puzzle
• Affective dataset
  - 1024 features from Face, Posture, Game
  - 3 affective states, labels annotated by teachers: High interest (61), Low interest (59), Refreshing (16)
Task & Approaches
• Binary classification: High interest (61 samples) vs. Low interest or Refreshing (75 samples)
• Approaches
  - Semi-supervised learning: Gaussian Process (GP)
  - Support Vector Machine (SVM)
  - k-Nearest Neighbor (kNN, k = 1)
GP Semi-Supervised Learning
• Given labeled data (X_L, y_L) and unlabeled data X_U, predict the labels of the unlabeled points
• Assume the data and the data generation process: X: inputs, y: vector of labels, t: vector of hidden soft labels
• Each label y_i = sign(t_i) ∈ {-1, +1} (binary classification)
• Final classifier: y = sign[t], with t the posterior mean, i.e. y = sign[E(t | X, y_L)]
• Define a similarity function k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2σ^2)) and infer t given the labeled data
GP Semi-Supervised Learning
• Infer p(t | X, y_L) given the labeled data
• Bayesian model: p(t | X, y_L) ∝ p(t | X) p(y_L | t)
  - p(t | X): prior of the classifier
  - p(y_L | t): likelihood of the classifier given the labeled data
GP Semi-Supervised Learning
• How to model the prior & the likelihood?
• The prior: using a GP, p(t | X) = N(0, K), with the covariance K built from the similarity function (soft labels vary smoothly across the data manifold!)
• The likelihood: a flipping-noise model with labeling error rate ε, p(y_i | t_i) = 1 - ε if y_i = sign(t_i), and ε otherwise (see the sketch below)
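A minimal numerical sketch of this model (not the project's code): it assumes an RBF similarity for the prior covariance and the flipping-noise likelihood read off the "labeling error rate ε" above; all function names are illustrative.

```python
# Minimal sketch of the model above (not the project's code). Assumes an RBF
# similarity for the prior covariance and a flipping-noise likelihood;
# all function names are illustrative.
import numpy as np

def rbf_similarity(X, sigma):
    """k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def log_prior(t, K, jitter=1e-6):
    """GP prior p(t | X) = N(0, K): soft labels vary smoothly over the data.
    Returned up to an additive constant."""
    Kj = K + jitter * np.eye(len(t))
    return -0.5 * t @ np.linalg.solve(Kj, t)

def log_likelihood(t_labeled, y_labeled, eps):
    """Flipping noise: p(y_i | t_i) = 1 - eps if y_i = sign(t_i), else eps."""
    agree = (y_labeled * t_labeled) > 0
    return np.sum(np.where(agree, np.log(1.0 - eps), np.log(eps)))
```

The posterior over t combines these two terms; the next slide approximates it with EP.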
GP Semi-Supervised Learning
• EP (Expectation Propagation): approximate the posterior as a Gaussian
• Select the hyperparameters {kernel width σ, labeling error rate ε} that maximize the evidence!
• Advantage of using EP: we get the evidence as a side product
• EP estimates the leave-one-out predictive performance without performing any expensive cross-validation
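A full EP implementation is beyond slide scope, so the sketch below stands in for it with a Monte-Carlo estimate of the same evidence p(y | σ, ε); the data and grid values are toy assumptions, and in the actual method the EP approximation would supply this quantity instead of `mc_log_evidence`.

```python
# Sketch of evidence-based hyperparameter selection. mc_log_evidence is a
# Monte-Carlo stand-in for the EP evidence estimate; in the actual method
# EP supplies this quantity as a side product.
import numpy as np

rng = np.random.default_rng(0)

def rbf_similarity(X, sigma):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def mc_log_evidence(K, y, eps, n=4000):
    """log p(y | sigma, eps) = log E_{t ~ N(0, K)} [ prod_i p(y_i | t_i) ]."""
    L = np.linalg.cholesky(K + 1e-6 * np.eye(len(y)))
    t = L @ rng.standard_normal((len(y), n))        # samples from the prior
    lik = np.where((y[:, None] * t) > 0, 1.0 - eps, eps).prod(axis=0)
    return np.log(lik.mean())

# Toy stand-in for the labeled affective data.
X = rng.standard_normal((20, 2))
y = np.sign(X[:, 0])

# Select {kernel width sigma, labeling error rate eps} maximizing the evidence.
grid = [(s, e) for s in (0.5, 1.0, 2.0, 4.0) for e in (0.05, 0.1, 0.2)]
sigma, eps = max(grid, key=lambda p: mc_log_evidence(rbf_similarity(X, p[0]), y, p[1]))
print("selected sigma =", sigma, ", eps =", eps)
```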
Support Vector Machine
• OSU SVM toolbox
• RBF kernel: k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2σ^2))
• Hyperparameter {C, σ} selection: use leave-one-out validation!
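The project used the OSU SVM toolbox (MATLAB); the sketch below is a scikit-learn stand-in for the same procedure on toy data, with an illustrative grid. Note that sklearn's `gamma` corresponds to 1/(2σ^2) in the RBF kernel above.

```python
# scikit-learn stand-in for the OSU SVM toolbox procedure: RBF-kernel SVM
# with leave-one-out validation over an illustrative grid of {C, gamma},
# where gamma = 1 / (2 sigma^2).
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))   # toy stand-in for the 1024-dim features
y = np.sign(X[:, 0])

param_grid = {"C": np.logspace(-2, 3, 6), "gamma": np.logspace(-3, 1, 5)}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=LeaveOneOut())
search.fit(X, y)
print(search.best_params_, search.best_score_)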
kNN (k = 1)
• The label of a test point follows that of its nearest training point
• Simple to implement; its accuracy can serve as a baseline
• However, it sometimes gives good results!
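A minimal sketch of this baseline (illustrative names, Euclidean distance assumed):

```python
# Minimal 1-NN sketch (illustrative; Euclidean distance assumed).
import numpy as np

def knn1_predict(X_train, y_train, X_test):
    """Each test point takes the label of its nearest training point."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return y_train[np.argmin(d2, axis=1)]
```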
Split of the Dataset & Experiment
• GP semi-supervised learning
  - Randomly select labeled data (p% of the overall data), use the remaining data as unlabeled data, and predict the labels of the unlabeled data (in this setting, unlabeled data == test data)
  - 50 tries for each p (p = 10, 20, 30, 40, 50)
  - Each time, select the hyperparameters that maximize the evidence from EP
• SVM and kNN
  - Randomly select train data (p% of the overall data), use the remaining data as test data, and predict the labels of the test data (a protocol sketch follows this list)
  - 50 tries for each p (p = 10, 20, 30, 40, 50)
  - For the SVM, leave-one-out validation for hyperparameter selection was performed on the train data
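A hedged sketch of this protocol; `fit_predict` is a placeholder whose signature matches, e.g., the 1-NN sketch above, and per-class stratification is omitted for brevity.

```python
# Sketch of the evaluation protocol: for each fraction p, draw 50 random
# splits, train on p% of the data, and test on the rest. fit_predict is a
# placeholder for any of the three classifiers.
import numpy as np

def evaluate(X, y, fit_predict, fractions=(0.1, 0.2, 0.3, 0.4, 0.5),
             tries=50, seed=0):
    rng = np.random.default_rng(seed)
    results = {}
    for p in fractions:
        accs = []
        for _ in range(tries):
            idx = rng.permutation(len(y))
            n_train = max(1, int(p * len(y)))
            tr, te = idx[:n_train], idx[n_train:]
            accs.append(np.mean(fit_predict(X[tr], y[tr], X[te]) == y[te]))
        results[p] = float(np.mean(accs))
    return results

# e.g. evaluate(X, y, knn1_predict)
```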
GP – Evidence & Accuracy
[Figure: log evidence and recognition accuracy vs. hyperparameters, percentage of train points per class = 50% (average over 10 tries); an offset was added to the log evidence to plot all curves in the same figure]
• Max of recognition accuracy ≈ max of log evidence
• So we can find the optimal hyperparameter by using the evidence from EP
SVM – Hyperparameter Selection
[Figure: leave-one-out validation accuracy plotted over the grid of log(C) × log(1/σ)]
• Select the hyperparameters {C, σ} that maximize the evidence from leave-one-out validation!
Classification Accuracy
• As expected, kNN is bad with a small # of train pts and better with a large # of train pts
• SVM has good accuracy even when the # of train pts is small. Why?
• GP has bad accuracy when the # of train pts is small. Why?
Analysis - SVM
Why does SVM give a good test accuracy even when the number of train points is small?
The best explanations I can offer:
1. {# support vectors} / {# train points} is high in this task, in particular when the percentage of train points is low. The support vectors determine the decision boundary, but a high SV ratio is not guaranteed to translate into high test accuracy; what is known is the bound {leave-one-out CV error} ≤ {# support vectors} / {# train points} (checked empirically below).
2. The CV accuracy rate is high even when the # of train pts is small, and CV accuracy is closely related to test accuracy.
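The cited bound can be checked empirically; a scikit-learn sketch on toy data (standing in for the toolbox used in the project):

```python
# Empirical check of the cited bound on toy data:
# {leave-one-out CV error} <= {# support vectors} / {# train points}.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))
y = np.sign(X[:, 0])

clf = SVC(kernel="rbf", C=1.0, gamma=0.1).fit(X, y)
sv_ratio = len(clf.support_) / len(y)
loo_err = 1.0 - cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
print(f"LOO error {loo_err:.3f} <= SV ratio {sv_ratio:.3f}")
```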
Analysis - GP
Why does GP give a bad test accuracy when the number of train points is small?
• Percentage of train points per class = 50%: max of recognition accuracy ≈ max of log evidence
• Percentage of train points per class = 10%: the log evidence curve is flat, so evidence maximization fails to find the optimal σ!
Conclusion
• GP: small number of train points → bad accuracy; large number of train points → good accuracy
• SVM: good accuracy regardless of the number of train points
• kNN (k = 1): small number of train points → bad accuracy; large number of train points → good accuracy