Learn about Active Learning algorithms that actively select informative data samples, enhancing learning performance. Explore Membership Queries, Stream-based and Pool-based Sampling Strategies, and Uncertainty Sampling for better data selection. Understand Conformal Prediction to complement machine learning predictions with reliability measures.
Active learning • The learning algorithm must have some control over the data from which it learns • It must be able to query an oracle, requesting labels for the data samples that seem most informative for the learning process • Proper selection of samples yields better performance with fewer labeled data
Scenarios • Learning with membership queries • Stream-based sampling • Pool-based sampling
Strategies • Uncertainty sampling • Query-by-committee • Density-weighted…
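The slides do not spell out the uncertainty criterion, so here is a minimal sketch of three standard variants (least confidence, margin, and entropy) computed from a classifier's predicted class probabilities; these are illustrative assumptions, not necessarily the exact criterion used in this deck.

```python
# Sketch: three common uncertainty-sampling criteria for one pool sample,
# given its vector p of predicted class probabilities.
import numpy as np

def least_confidence(p):
    return 1.0 - p.max()                    # low top probability => uncertain

def margin(p):
    top2 = np.sort(p)[-2:]                  # two largest probabilities
    return top2[1] - top2[0]                # small margin => uncertain

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))   # high entropy => uncertain
```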
Conformal prediction • Complements the predictions made by machine learning algorithms with measures of reliability • The label predicted for a new object should make it similar to the previously seen objects • The degree of similarity is used to estimate the confidence in the prediction
Conformal prediction algorithm • Inputs: • Training sample and a test sample • Consider all possible values for the label; • Compute nonconformity scores and p-values for each possible classification; • Predict the label corresponding to the largest p-value; • Output one minus the second largest p-value as the confidence of the prediction; • Output the largest p-value as the credibility of the prediction.
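A minimal sketch of the loop just described; the nonconformity function `score` is a placeholder assumption (any function assigning larger values to stranger examples), and the transductive re-scoring used here is the textbook version rather than the deck's exact implementation.

```python
# Sketch of the conformal prediction algorithm above. `train` is a list of
# (x, y) pairs; `labels` is the set of candidate labels; `score(bag, ex)`
# is an assumed nonconformity function.
import numpy as np

def conformal_predict(train, test_x, labels, score):
    p_values = {}
    for y in labels:                        # consider every possible label
        bag = train + [(test_x, y)]         # tentatively label the test sample
        alphas = np.array([score(bag, ex) for ex in bag])
        # p-value: fraction of examples at least as nonconforming as the test one
        p_values[y] = np.mean(alphas >= alphas[-1])
    ranked = sorted(p_values.items(), key=lambda kv: kv[1], reverse=True)
    prediction = ranked[0][0]               # label with the largest p-value
    credibility = ranked[0][1]              # largest p-value
    confidence = 1.0 - ranked[1][1]         # one minus second largest p-value
    return prediction, confidence, credibility
```

Confidence measures how clearly the best label beats the runner-up, while credibility flags test samples that fit no candidate label well.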
Nonconformity scores and p-values • The Lagrange multipliers computed during SVM training are used as nonconformity scores • Extended to a multiclass framework with a one-vs-rest approach • P-values: p(y) = #{ i = 1, …, n+1 : α_i ≥ α_(n+1) } / (n + 1), i.e. the fraction of nonconformity scores at least as large as that of the test sample tentatively labeled y
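As a sketch of how those multipliers can be obtained in practice (assuming scikit-learn's SVC, which the deck does not name): `dual_coef_` stores y_i·α_i for the support vectors only, so the absolute values, scattered back over all training indices, give one nonconformity score per training example.

```python
# Sketch (assumption: scikit-learn SVC): one alpha per training sample,
# reused as a nonconformity score; alpha is 0 for non-support vectors.
import numpy as np
from sklearn.svm import SVC

def alphas_one_vs_rest(X, y, target_class):
    y_bin = (y == target_class).astype(int)       # one-vs-rest relabeling
    clf = SVC(kernel="rbf", C=1.0).fit(X, y_bin)
    alphas = np.zeros(len(X))
    # dual_coef_ holds y_i * alpha_i for the support vectors only
    alphas[clf.support_] = np.abs(clf.dual_coef_[0])
    return alphas
```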
Active learning algorithm • Inputs • Initial training set T, calibration set C, pool of candidate samples U • Selection threshold τ, batch size β • Train an initial classifier on T • While a stopping criterion is not reached • Apply the current classifier to the pool of samples • Rank the samples in the pool using the uncertainty criterion • Select the top β examples whose certainty level falls below the selection threshold τ • Ask the teacher to label the selected examples and add them to the training set • Train a new classifier on the expanded training set
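A sketch of this loop under stated assumptions: a scikit-learn-style classifier with `predict_proba`, an `oracle` callable standing in for the teacher, and a maximum training-set size as the stopping criterion; `tau`, `beta`, and the helper names are placeholders taken from the slide, not a specific implementation.

```python
# Sketch of the pool-based active learning loop described above.
import numpy as np

def active_learning(clf, X_train, y_train, X_pool, oracle, tau, beta, max_size):
    clf.fit(X_train, y_train)
    while len(X_train) < max_size and len(X_pool) > 0:
        proba = clf.predict_proba(X_pool)
        certainty = proba.max(axis=1)        # confidence of the top prediction
        order = np.argsort(certainty)        # most uncertain samples first
        # top beta examples whose certainty falls below the threshold tau
        picked = [i for i in order[:beta] if certainty[i] < tau]
        if not picked:
            break                            # nothing informative left
        X_new = X_pool[picked]
        y_new = oracle(X_new)                # ask the teacher for labels
        X_train = np.vstack([X_train, X_new])
        y_train = np.concatenate([y_train, y_new])
        X_pool = np.delete(X_pool, picked, axis=0)
        clf.fit(X_train, y_train)            # retrain on the expanded set
    return clf, X_train, y_train
```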
Stopping criteria • Pre-specified size for the training set • Exhaustion of the pool of candidate samples • Early stopping • Implemented using the calibration set • Active selection stops if no improvement is obtained when newly trained classifiers are applied to the calibration set
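The early-stop test can be as simple as comparing calibration performance before and after retraining; a hedged sketch, assuming accuracy as the calibration metric (the deck leaves the metric unspecified):

```python
# Sketch: stop active selection when the newly trained classifier no longer
# improves on the held-out calibration set (accuracy is an assumption).
def should_stop(clf_new, clf_old, X_cal, y_cal):
    return clf_new.score(X_cal, y_cal) <= clf_old.score(X_cal, y_cal)
```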