Active Learning: Query Strategies. September 27, 2010
Outline • Previous lecture: • Uncertainty Sampling • Density-Weighted Methods • This lecture: • Query-By-Committee • Expected Model Change • Expected Error Reduction • Variance Reduction
Query-By-Committee • Idea: limit the size of the version space • Maintain a committee of models (e.g., a set of classifiers) trained on the same labeled dataset • Each model represents a different region of the version space • Each member of the committee casts a vote on every query instance • Pick the instance on which the members most disagree
Query-By-Committee [Figure: committee hypotheses in the version space, with their region of disagreement highlighted] • Pick unlabeled instances within the disagreement region
Query-By-Committee • How to measure the disagreement? • Vote entropy: $x^*_{VE} = \arg\max_x \left( -\sum_i \frac{V(y_i)}{C} \log \frac{V(y_i)}{C} \right)$, where $V(y_i)$ is the number of votes that label $y_i$ receives from the committee of size $C$ • KL divergence: $x^*_{KL} = \arg\max_x \frac{1}{C} \sum_{c=1}^{C} D\big(P_{\theta^{(c)}} \,\|\, P_{\mathcal{C}}\big)$, where $P_{\mathcal{C}}(y_i \mid x) = \frac{1}{C} \sum_{c} P_{\theta^{(c)}}(y_i \mid x)$ is the consensus probability that label $y_i$ is correct (notation as in Settles [1])
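A minimal sketch of both disagreement scores in Python, assuming each committee member outputs a posterior over labels for a query instance; the `committee_probs` layout (members × labels) is an illustrative assumption, not something from the slides:

```python
import numpy as np

def vote_entropy(committee_probs):
    """committee_probs: array of shape (C, n_labels), one posterior
    per committee member for a single query instance."""
    C, n_labels = committee_probs.shape
    # Hard votes: each member votes for its most probable label.
    votes = np.bincount(committee_probs.argmax(axis=1), minlength=n_labels)
    p = votes / C                       # V(y_i) / C
    nz = p > 0                          # avoid log(0)
    return -np.sum(p[nz] * np.log(p[nz]))

def kl_disagreement(committee_probs):
    """Mean KL divergence of each member from the consensus P_C(y|x)."""
    consensus = committee_probs.mean(axis=0)
    eps = 1e-12                         # numerical safety
    return np.mean(np.sum(
        committee_probs * np.log((committee_probs + eps) / (consensus + eps)),
        axis=1))

# Query selection over an unlabeled pool of posteriors,
# probs_pool with shape (n_unlabeled, C, n_labels):
# x_star = np.argmax([vote_entropy(p) for p in probs_pool])
```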
Expected Model Change • Pick the instance that would change the current model the most if its label were known • How to measure change in the model? Expected Gradient Length (EGL): $x^*_{EGL} = \arg\max_x \sum_i P_\theta(y_i \mid x)\, \big\| \nabla \ell_\theta(\mathcal{L} \cup \langle x, y_i \rangle) \big\|$, the expected norm of the gradient of the objective function • Assuming training converged in the previous iteration, $\nabla \ell_\theta(\mathcal{L}) \approx 0$, so the augmented gradient is approximately the gradient contributed by the candidate instance alone
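A minimal EGL sketch, using binary logistic regression purely as a stand-in model (the strategy itself is model-agnostic); `egl_score` is a hypothetical helper built on the convergence shortcut above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def egl_score(x, theta):
    """Expected gradient length for one candidate instance x.
    Since the gradient over the labeled set is ~0 at convergence,
    only the candidate's own gradient contribution remains."""
    p1 = sigmoid(x @ theta)             # P(y=1 | x) under current model
    # Log-likelihood gradient of a single example: (y - p1) * x
    grad_y1 = (1.0 - p1) * x            # if the label turned out to be 1
    grad_y0 = (0.0 - p1) * x            # if the label turned out to be 0
    return p1 * np.linalg.norm(grad_y1) + (1.0 - p1) * np.linalg.norm(grad_y0)

# Query: pick the pool instance with the largest expected gradient length.
# x_star = max(pool, key=lambda x: egl_score(x, theta))
```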
Expected Error Reduction • Pick the instance that reduces the expected generalization error • Minimize the expectation of the loss function over the unlabeled pool • 0/1 loss • Log loss
Expected Error Reduction • 0/1 loss: $x^*_{0/1} = \arg\min_x \sum_i P_\theta(y_i \mid x) \sum_{u=1}^{U} \big( 1 - P_{\theta^{+\langle x, y_i \rangle}}(\hat{y} \mid x^{(u)}) \big)$ • Log loss: $x^*_{\log} = \arg\min_x \sum_i P_\theta(y_i \mid x) \left( -\sum_{u=1}^{U} \sum_j P_{\theta^{+\langle x, y_i \rangle}}(y_j \mid x^{(u)}) \log P_{\theta^{+\langle x, y_i \rangle}}(y_j \mid x^{(u)}) \right)$ • Here $\theta^{+\langle x, y_i \rangle}$ denotes the new parameters after retraining with $\langle x, y_i \rangle$ added, and $x^{(u)}$ ranges over the unlabeled pool of size $U$
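A rough sketch of the 0/1-loss variant with a scikit-learn-style classifier; `expected_error` is a hypothetical helper, and the inner retraining loop makes the per-candidate cost explicit:

```python
import numpy as np
from sklearn.base import clone

def expected_error(model, X_lab, y_lab, x, X_pool):
    """Expected 0/1 loss over the pool after (hypothetically) adding x."""
    probs = model.predict_proba([x])[0]
    total = 0.0
    for label, p in zip(model.classes_, probs):
        # Retrain with the guessed label appended to the labeled set.
        m = clone(model).fit(np.vstack([X_lab, [x]]),
                             np.append(y_lab, label))
        # 0/1-loss proxy: 1 - max posterior, summed over the pool.
        total += p * np.sum(1.0 - m.predict_proba(X_pool).max(axis=1))
    return total

# Query: x_star = min(pool, key=lambda x: expected_error(model, X, y, x, pool))
```

Note the cost: one retraining per (candidate, label) pair, plus a full pass over the pool for each, which is why this strategy is usually run on a subsampled pool.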
Discussion • Task: gene prediction using CRFs • How large is the feature space? • How large is the set of possible labelings? • The unlabeled pool is large
Variance Reduction • In expected error reduction, a closed form of the expected loss is available only for certain models (e.g., Gaussian random fields) • What if it is not available? • Minimize the risk by minimizing the output variance, using the decomposition $E_T[(\hat{y} - y)^2 \mid x] = \underbrace{E[(y - E[y \mid x])^2]}_{\text{noise: model independent}} + \underbrace{(E_{\mathcal{L}}[\hat{y}] - E[y \mid x])^2}_{\text{bias: invariant if the model class is fixed}} + \underbrace{E_{\mathcal{L}}[(\hat{y} - E_{\mathcal{L}}[\hat{y}])^2]}_{\text{output variance: minimize this term}}$
Variance Reduction • Approximate the output variance as $\tilde{\sigma}^2_{\hat{y}} \approx \left( \frac{\partial \hat{y}}{\partial \theta} \right)^{\!\top} F^{-1} \left( \frac{\partial \hat{y}}{\partial \theta} \right)$ • $\frac{\partial \hat{y}}{\partial \theta}$ is the gradient of the predicted output with respect to the model parameters; $F^{-1}$ is the inverse of the Fisher information matrix
Variance Reduction • The Fisher information matrix is the expected outer product of the gradient (equivalently, the negative expected second derivative) of the log-likelihood with respect to the model parameters • It measures how strongly the model parameters influence the objective function • Picking points that maximize the Fisher information constrains the parameters the most; this is equivalent to minimizing its inverse, which in turn minimizes the output variance
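A minimal variance-reduction sketch, again with binary logistic regression as an assumed stand-in model; `fisher_information`, `output_variance`, and `variance_after_adding` are illustrative names, not from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fisher_information(X, theta, ridge=1e-6):
    """F = sum_i p_i (1 - p_i) x_i x_i^T for logistic regression,
    with a small ridge term so F stays invertible."""
    p = sigmoid(X @ theta)
    W = p * (1.0 - p)
    F = (X * W[:, None]).T @ X
    return F + ridge * np.eye(X.shape[1])

def output_variance(x, theta, F_inv):
    """Delta-method variance of yhat = sigmoid(x @ theta):
    g^T F^{-1} g with g = d(yhat)/d(theta)."""
    p = sigmoid(x @ theta)
    g = p * (1.0 - p) * x
    return g @ F_inv @ g

def variance_after_adding(x_cand, X_pool, theta, F):
    """Average output variance over the pool if x_cand were labeled."""
    p = sigmoid(x_cand @ theta)
    F_new = F + p * (1.0 - p) * np.outer(x_cand, x_cand)
    F_inv = np.linalg.inv(F_new)
    return np.mean([output_variance(x, theta, F_inv) for x in X_pool])

# Query: pick the candidate whose inclusion shrinks pool variance the most.
# x_star = min(pool, key=lambda x: variance_after_adding(x, pool, theta, F))
```

The $O(d^3)$ matrix inversion per candidate hints at the scaling question raised on the next slide.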
Discussion: Variance Reduction • What if the parameter space is large? • What if the unlabeled pool is large? • What if the unlabeled pool is unbalanced?
Discussion [Figure: two candidate query points, A and B] • Which point would you pick, A or B? • Which of the following sampling strategies would you pick? • Expected Model Change • Expected Error Reduction • Variance Reduction • How about QBC and uncertainty sampling?
References 1. B. Settles. Active Learning Literature Survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison, 2009.