Active Learning: Query Strategies (September 27, 2010)
Outline • Previous lecture: • Uncertainty Sampling • Density Weighted Methods • This lecture: • Query-By-Committee • Expected Model Change • Expected Error Reduction • Variance Reduction
Query-By-Committee • Idea: limit the size of the version space • Maintain a committee of models (e.g., a set of classifiers) trained on the same labeled dataset • Each model represents a different region of the version space • Each member of the committee casts a vote on every query instance • Pick the instance on which the members most disagree
Query-By-Committee (figure: version space with the region of disagreement highlighted) • Pick unlabeled instances within the disagreement region
Query-By-Committee • How to measure the disagreement: • Vote entropy: $x^*_{VE} = \mathrm{argmax}_x \, -\sum_y \frac{V(y)}{C} \log \frac{V(y)}{C}$, where $V(y)$ is the number of votes that label $y$ receives and $C$ is the committee size • KL divergence: $x^*_{KL} = \mathrm{argmax}_x \, \frac{1}{C} \sum_{c=1}^{C} D\big(P_{\theta^{(c)}} \,\|\, P_{\mathcal{C}}\big)$, where $P_{\mathcal{C}}(y \mid x) = \frac{1}{C} \sum_{c=1}^{C} P_{\theta^{(c)}}(y \mid x)$ is the consensus probability that label $y$ is correct
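A minimal sketch of QBC with vote entropy, assuming scikit-learn is available. The committee is built here by bagging (bootstrap resampling), which is only one of several ways to obtain members spanning the version space [1]; the function name and committee size are illustrative:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

def qbc_vote_entropy(X_labeled, y_labeled, X_pool, n_members=5, seed=0):
    """Return the index of the pool instance the committee most disagrees on."""
    rng = np.random.RandomState(seed)
    votes = []
    for _ in range(n_members):
        # Each member trains on a bootstrap sample (stratified so both classes
        # appear), so each occupies a different region of the version space.
        Xb, yb = resample(X_labeled, y_labeled, stratify=y_labeled, random_state=rng)
        votes.append(LogisticRegression().fit(Xb, yb).predict(X_pool))
    votes = np.stack(votes)                          # (n_members, n_pool)
    labels = np.unique(y_labeled)
    # V(y)/C: fraction of committee votes each label receives, per instance
    p = np.stack([(votes == y).mean(axis=0) for y in labels])
    entropy = -np.sum(p * np.log(np.clip(p, 1e-12, 1.0)), axis=0)
    return int(np.argmax(entropy))                   # query the most-disputed instance

The KL-divergence variant would replace the hard votes with each member's predict_proba output and compare it against the consensus distribution.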
Expected Model Change • Pick the instance that would change the current model the most if its label were known • How to measure change in the model? • Expected Gradient Length (EGL): the length of the gradient of the objective function that the new instance would induce • Assuming training converged in the previous iteration, the gradient over the labeled set $\mathcal{L}$ is approximately zero, so the gradient of the candidate alone approximates the change: $x^*_{EGL} = \mathrm{argmax}_x \sum_y P_\theta(y \mid x) \, \big\| \nabla \ell(\langle x, y \rangle; \theta) \big\|$
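A minimal EGL sketch, assuming binary logistic regression as the model class (the slide does not fix one). The per-instance log-loss gradient is $(p - y)\,x$, so the gradient length has a closed form here; the names are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def egl_scores(w, X_pool):
    """Expected gradient length of each pool instance under current weights w."""
    p1 = sigmoid(X_pool @ w)              # P(y = 1 | x) under the current model
    norms = np.linalg.norm(X_pool, axis=1)
    # ||(p - y) x|| = |p - y| * ||x|| for each possible label y
    grad_len_y1 = (1.0 - p1) * norms      # gradient length if the label is 1
    grad_len_y0 = p1 * norms              # gradient length if the label is 0
    # Expectation over the unknown label, weighted by the current model:
    return p1 * grad_len_y1 + (1.0 - p1) * grad_len_y0

# Query rule: x_star = int(np.argmax(egl_scores(w, X_pool)))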
Expected Error Reduction • Pick the instance that reduces the expected generalization error • Minimize the expectation of the loss function, estimated over the unlabeled pool • 0/1 loss • Log loss
Expected Error Reduction • 0/1 loss: $x^*_{0/1} = \mathrm{argmin}_x \sum_y P_\theta(y \mid x) \sum_{u=1}^{U} \big(1 - P_{\theta^{+\langle x,y \rangle}}(\hat{y} \mid x^{(u)})\big)$ • Log loss: $x^*_{log} = \mathrm{argmin}_x \sum_y P_\theta(y \mid x) \Big(-\sum_{u=1}^{U} \sum_{y'} P_{\theta^{+\langle x,y \rangle}}(y' \mid x^{(u)}) \log P_{\theta^{+\langle x,y \rangle}}(y' \mid x^{(u)})\Big)$ • Here $\theta^{+\langle x,y \rangle}$ denotes the new parameters after retraining with $\langle x, y \rangle$ added, and $x^{(u)}$ ranges over the unlabeled pool of size $U$
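A brute-force sketch of expected error reduction with log loss, assuming scikit-learn and binary labels. Every candidate requires one retraining per possible label, which is exactly the cost the discussion slide below points at; the names are illustrative:

import numpy as np
from sklearn.linear_model import LogisticRegression

def expected_error_reduction(X_labeled, y_labeled, X_pool, labels=(0, 1)):
    """Pick the pool index minimizing expected pool entropy after retraining."""
    model = LogisticRegression().fit(X_labeled, y_labeled)
    probs = model.predict_proba(X_pool)   # columns follow sorted labels (0, 1)
    scores = []
    for i in range(len(X_pool)):
        expected_entropy = 0.0
        for j, y in enumerate(labels):
            # theta^{+<x,y>}: retrain with the candidate added under label y
            Xp = np.vstack([X_labeled, X_pool[i:i + 1]])
            yp = np.append(y_labeled, y)
            P = LogisticRegression().fit(Xp, yp).predict_proba(X_pool)
            entropy = -np.sum(P * np.log(np.clip(P, 1e-12, None)))
            expected_entropy += probs[i, j] * entropy
        scores.append(expected_entropy)   # O(U) retrainings per candidate
    return int(np.argmin(scores))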
Discussion • Task: gene prediction using CRFs • What makes expected error reduction hard here? • The feature space is large • The set of possible labelings grows exponentially with sequence length • The unlabeled pool is large
Variance Reduction • In expected error reduction, a closed form of the expected loss is available for some models, such as Gaussian random fields • What if it is not available? • Minimize the risk indirectly, by minimizing the output variance: Risk = (noise) + (model bias) + (output variance) • Noise is model-independent; bias is invariant once the model class is fixed; so minimize the variance term
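Written out for the squared-loss regression case (the decomposition as presented in Settles' survey [1]):

E_{\mathcal{L}}\big[(\hat{y} - y)^2 \mid x\big]
  = \underbrace{E\big[(y - E[y \mid x])^2\big]}_{\text{noise: model-independent}}
  + \underbrace{\big(E_{\mathcal{L}}[\hat{y}] - E[y \mid x]\big)^2}_{\text{bias: fixed for a fixed model class}}
  + \underbrace{E_{\mathcal{L}}\big[(\hat{y} - E_{\mathcal{L}}[\hat{y}])^2\big]}_{\text{output variance: minimize this}}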
Variance Reduction • Estimate the output variance as $\tilde{\sigma}^2_{\hat{y}} \approx \nabla \hat{y}^{\top} F^{-1} \nabla \hat{y}$ • $\nabla \hat{y}$: gradient of the predicted output with respect to the model parameters • $F^{-1}$: inverse of the Fisher information matrix
Variance Reduction • The Fisher information matrix is built from the partial derivatives of the log-likelihood with respect to the model parameters (the expected outer product of the score) • It measures how strongly the model parameters affect the objective function • Maximizing the Fisher information picks the instances whose labels would change the model the most • This is equivalent to minimizing its inverse, which in turn minimizes the output variance
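A sketch for binary logistic regression (again an assumed model class): $F$ is accumulated from the labeled data, and the candidate whose label would most shrink the average output variance over an evaluation set is queried; all names are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fisher_information(w, X_labeled):
    """F = sum_i p_i (1 - p_i) x_i x_i^T for logistic regression, plus a small
    ridge so F stays invertible when few points are labeled."""
    p = sigmoid(X_labeled @ w)
    F = (X_labeled * (p * (1 - p))[:, None]).T @ X_labeled
    return F + 1e-6 * np.eye(X_labeled.shape[1])

def output_variance(w, F, x):
    """Approximate variance of the predicted output at x: grad^T F^{-1} grad."""
    p = sigmoid(x @ w)
    grad = p * (1 - p) * x                  # gradient of y_hat wrt the parameters
    return grad @ np.linalg.solve(F, grad)  # avoids forming F^{-1} explicitly

def variance_after_query(w, F, x_cand, X_eval):
    """Average output variance over X_eval if x_cand joined the training set."""
    p = sigmoid(x_cand @ w)
    F_new = F + p * (1 - p) * np.outer(x_cand, x_cand)
    return np.mean([output_variance(w, F_new, x) for x in X_eval])

# Query rule: pick the candidate with the smallest variance_after_query(...).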
Discussion: Variance Reduction • What if the parameter space is large? ($F$ is $|\theta| \times |\theta|$, and it must be inverted) • What if the unlabeled pool is large? • What if the unlabeled pool is unbalanced?
Discussion • Which point would you pick, A or B? (figure: two candidate points, A and B) • Which of the following sampling strategies would you pick? • Expected Model Change • Expected Error Reduction • Variance Reduction • How about QBC and uncertainty sampling?
References [1] B. Settles. Active Learning Literature Survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison, 2009.