Learn about Active Learning algorithms that actively select informative data samples, enhancing learning performance. Explore Membership Queries, Stream-based and Pool-based Sampling Strategies, and Uncertainty Sampling for better data selection. Understand Conformal Prediction to complement machine learning predictions with reliability measures.
Active learning • The learning algorithm must have some control over the data from which it learns • It must be able to query an oracle, requesting labels for the data samples that seem most informative for the learning process • Proper selection of samples yields better performance with fewer labeled data
Scenarios • Learning with membership queries • Stream-based sampling • Pool-based sampling
Strategies • Uncertainty sampling • Query-by-committee • Density-weighted…
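The slides do not spell out the uncertainty criterion, so here is a minimal sketch of three standard variants (least confidence, margin, and entropy) computed from a classifier's predicted class probabilities; these are illustrative assumptions, not necessarily the exact criterion used in this deck.

```python
# Sketch: three common uncertainty-sampling criteria for one pool sample,
# given its vector p of predicted class probabilities.
import numpy as np

def least_confidence(p):
    return 1.0 - p.max()                    # low top probability => uncertain

def margin(p):
    top2 = np.sort(p)[-2:]                  # two largest probabilities
    return top2[1] - top2[0]                # small margin => uncertain

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))   # high entropy => uncertain
```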
Conformal prediction • Complements the predictions made by machine learning algorithms with measures of reliability • The label predicted for a new object should make it similar to the previously seen objects • The degree of similarity is used to estimate the confidence in the prediction
Conformal prediction algorithm • Inputs: • Training sample and a test sample • Consider all possible values for the label; • Compute nonconformity scores and p-values for each possible classification; • Predict the label corresponding to the largest p-value; • Output one minus the second largest p-value as the confidence of the prediction; • Output the largest p-value as the credibility of the prediction.
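A minimal sketch of the loop just described; the nonconformity function `score` is a placeholder assumption (any function assigning larger values to stranger examples), and the transductive re-scoring used here is the textbook version rather than the deck's exact implementation.

```python
# Sketch of the conformal prediction algorithm above. `train` is a list of
# (x, y) pairs; `labels` is the set of candidate labels; `score(bag, ex)`
# is an assumed nonconformity function.
import numpy as np

def conformal_predict(train, test_x, labels, score):
    p_values = {}
    for y in labels:                        # consider every possible label
        bag = train + [(test_x, y)]         # tentatively label the test sample
        alphas = np.array([score(bag, ex) for ex in bag])
        # p-value: fraction of examples at least as nonconforming as the test one
        p_values[y] = np.mean(alphas >= alphas[-1])
    ranked = sorted(p_values.items(), key=lambda kv: kv[1], reverse=True)
    prediction = ranked[0][0]               # label with the largest p-value
    credibility = ranked[0][1]              # largest p-value
    confidence = 1.0 - ranked[1][1]         # one minus second largest p-value
    return prediction, confidence, credibility
```

Confidence measures how clearly the best label beats the runner-up, while credibility flags test samples that fit no candidate label well.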
Nonconformity scores and p-values • The Lagrange multipliers computed during SVM training are used as nonconformity scores • Extended to a multiclass framework with a one-vs-rest approach • P-values: p(y) = #{ i = 1, …, n+1 : α_i ≥ α_(n+1) } / (n + 1), i.e. the fraction of nonconformity scores at least as large as that of the test sample tentatively labeled y
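As a sketch of how those multipliers can be obtained in practice (assuming scikit-learn's SVC, which the deck does not name): `dual_coef_` stores y_i·α_i for the support vectors only, so the absolute values, scattered back over all training indices, give one nonconformity score per training example.

```python
# Sketch (assumption: scikit-learn SVC): one alpha per training sample,
# reused as a nonconformity score; alpha is 0 for non-support vectors.
import numpy as np
from sklearn.svm import SVC

def alphas_one_vs_rest(X, y, target_class):
    y_bin = (y == target_class).astype(int)       # one-vs-rest relabeling
    clf = SVC(kernel="rbf", C=1.0).fit(X, y_bin)
    alphas = np.zeros(len(X))
    # dual_coef_ holds y_i * alpha_i for the support vectors only
    alphas[clf.support_] = np.abs(clf.dual_coef_[0])
    return alphas
```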
Active learning algorithm • Inputs • Initial training set T, calibration set C, pool of candidate samples U • Selection threshold τ, batch size β • Train an initial classifier on T • While a stopping criterion is not reached • Apply the current classifier to the pool of samples • Rank the samples in the pool using the uncertainty criterion • Select the top β examples whose certainty level falls below the selection threshold τ • Ask the teacher to label the selected examples and add them to the training set • Train a new classifier on the expanded training set
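A sketch of this loop under stated assumptions: a scikit-learn-style classifier with `predict_proba`, an `oracle` callable standing in for the teacher, and a maximum training-set size as the stopping criterion; `tau`, `beta`, and the helper names are placeholders taken from the slide, not a specific implementation.

```python
# Sketch of the pool-based active learning loop described above.
import numpy as np

def active_learning(clf, X_train, y_train, X_pool, oracle, tau, beta, max_size):
    clf.fit(X_train, y_train)
    while len(X_train) < max_size and len(X_pool) > 0:
        proba = clf.predict_proba(X_pool)
        certainty = proba.max(axis=1)        # confidence of the top prediction
        order = np.argsort(certainty)        # most uncertain samples first
        # top beta examples whose certainty falls below the threshold tau
        picked = [i for i in order[:beta] if certainty[i] < tau]
        if not picked:
            break                            # nothing informative left
        X_new = X_pool[picked]
        y_new = oracle(X_new)                # ask the teacher for labels
        X_train = np.vstack([X_train, X_new])
        y_train = np.concatenate([y_train, y_new])
        X_pool = np.delete(X_pool, picked, axis=0)
        clf.fit(X_train, y_train)            # retrain on the expanded set
    return clf, X_train, y_train
```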
Stopping criteria • Pre-specified size for the training set • Exhaustion of the pool of candidate samples • Early stopping • Implemented using the calibration set • Active selection stops if no improvement is obtained when newly trained classifiers are applied to the calibration set
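The early-stop test can be as simple as comparing calibration performance before and after retraining; a hedged sketch, assuming accuracy as the calibration metric (the deck leaves the metric unspecified):

```python
# Sketch: stop active selection when the newly trained classifier no longer
# improves on the held-out calibration set (accuracy is an assumption).
def should_stop(clf_new, clf_old, X_cal, y_cal):
    return clf_new.score(X_cal, y_cal) <= clf_old.score(X_cal, y_cal)
```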