Active Learning: Class Questions Meeting 15 — Mar 5, 2013 CSCE 6933 Rodney Nielsen
Your Questions • Is the query strategy framework determined by the algorithm we apply to a problem?
Your Questions • Regarding Membership Query Synthesis, how does it deal with the influence of the data distribution? For example, if we applied it to an SVM and used slack variables in a soft margin, the distribution would have a great influence on the results.
Membership Query Synthesis • Dynamically construct query instances based on expected informativeness • Applications • Character recognition • Robot scientist: find the optimal growth medium for a yeast • 3× cost decrease vs. the next-cheapest strategy • 100× cost decrease vs. random selection
Stream-based Selective Sampling • Informativeness measure • Region of uncertainty / Version space • Applications • POS tagging • Sensor scheduling • IR (information retrieval) ranking • WSD (word sense disambiguation)
Pool-based Active Learning • Informativeness measure • Applications • Cancer diagnosis • Text classification • Information extraction (IE) • Image classification & retrieval • Video classification & retrieval • Speech recognition
Your Questions • Which of the three settings is the most effective for active learning? What factors would decide that?
Questions • ???
Paper Selection • Based on reading the paper abstracts, find the single paper you would most like to read on Active Learning • A paper of any length that looks really interesting • Email it to me by Friday
Uncertainty Sampling • Uncertainty sampling • Select examples based on confidence in prediction • Least confident • Margin sampling • Entropy-based models
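The three uncertainty measures on the slide can be sketched in a few lines. This is a minimal illustration, assuming `probs` is the model's predicted label distribution for a single instance; the function names are hypothetical:

```python
import math

def least_confident(probs):
    # 1 minus the probability of the most likely label; higher = more uncertain
    return 1.0 - max(probs)

def margin(probs):
    # Gap between the two most probable labels; smaller = more uncertain
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]

def entropy(probs):
    # Shannon entropy of the predicted distribution; higher = more uncertain
    return -sum(p * math.log(p) for p in probs if p > 0)

# A peaked prediction is "certain" under all three measures;
# a uniform prediction is maximally uncertain.
peaked = [0.9, 0.05, 0.05]
uniform = [1/3, 1/3, 1/3]
```

Note that least-confident uses only the top probability, margin uses the top two, and entropy uses the whole distribution; the three can rank candidate queries differently when there are more than two labels.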
Query by Committee • Train a committee of hypotheses • Representing different regions of the version space • Obtain some measure of (dis)agreement on the instances in the dataset (e.g., vote entropy) • Assume the most informative instance is the one on which the committee has the most disagreement • Goal: minimize the version space • No agreement on size of committee, but even 2-3 provides good results
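The vote-entropy disagreement measure mentioned above can be sketched as follows; `votes` is assumed to hold the label each committee member predicts for one instance:

```python
import math
from collections import Counter

def vote_entropy(votes):
    # Entropy of the committee's vote distribution for one instance;
    # higher entropy = more disagreement = more informative query.
    counts = Counter(votes)
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```

A unanimous committee gives entropy 0; an evenly split committee gives the maximum, log of the committee size.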
Expected Model Change • Query the instance that would result in the largest expected change in the hypothesis h, based on the current model and its expectations over the possible labels • E.g., the instance whose labeling would produce the largest gradient step in the model parameters • Prefer the instance x that leads to the most significant change in the model
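One concrete instantiation is the expected gradient length (EGL). The sketch below assumes binary logistic regression, where the gradient of the log-loss for (x, y) with respect to the weights is (p − y)·x, so its norm is |p − y|·‖x‖; the expectation is taken over the model's own label distribution:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def expected_gradient_length(w, x):
    # Binary logistic regression: grad of the log-loss for (x, y) is (p - y) * x,
    # so its norm is |p - y| * ||x||.  Take the expectation over y ~ p(y | x):
    # p * |p - 1| * ||x|| + (1 - p) * |p - 0| * ||x|| = 2 p (1 - p) ||x||
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    x_norm = math.sqrt(sum(xi * xi for xi in x))
    return 2.0 * p * (1.0 - p) * x_norm
```

For this model, EGL peaks at p = 0.5, so (for instances of equal norm) it prefers points near the decision boundary, much like uncertainty sampling.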
Variance Reduction • Regression problems • E[error²] = noise + bias² + variance • The learner can't change the noise or the bias, so minimize the variance • The Fisher Information Ratio is used for classification
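The decomposition on the slide can be written out explicitly. A standard statement, assuming y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², where ĥ is the learned model and the expectation is over training sets and noise:

```latex
\mathbb{E}\big[(y - \hat{h}(x))^2\big]
  = \underbrace{\sigma^2}_{\text{noise}}
  + \underbrace{\big(\mathbb{E}[\hat{h}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{h}(x) - \mathbb{E}[\hat{h}(x)])^2\big]}_{\text{variance}}
```

The noise term is irreducible and the bias term is fixed by the hypothesis class, which is why the variance term is the one active learning can target.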
Estimated Error Reduction • Other strategies approximate the goal of minimizing future error by minimizing a proxy (e.g., uncertainty, variance, …) • Estimated Error Reduction attempts to minimize E[error] directly
Density Weighted Methods • Uncertainty sampling and Query by Committee might be hindered by querying many outliers • Density weighted methods overcome this potential problem by also considering whether the example is representative of the input distribution • Tends to work better than the base query strategies on their own
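The idea can be sketched as a reweighting of any base informativeness score. This is an illustrative sketch, not a specific published formulation: the Gaussian similarity and the `beta` exponent are assumed choices, and `base_score` stands in for, e.g., entropy or vote entropy:

```python
import math

def similarity(x, u):
    # Gaussian similarity between two feature vectors (assumed kernel choice).
    d2 = sum((a - b) ** 2 for a, b in zip(x, u))
    return math.exp(-d2)

def density_weighted_score(base_score, x, pool, beta=1.0):
    # Scale the base informativeness by the instance's average similarity
    # to the unlabeled pool, raised to the power beta.  Outliers have low
    # average similarity, so their scores are pushed down.
    density = sum(similarity(x, u) for u in pool) / len(pool)
    return base_score * density ** beta
```

An outlier far from the pool keeps a near-zero density factor even when its base uncertainty is high, which is exactly the corrective behavior described above.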
Your Questions • Density-weighted methods can help avoid the negative influence of noisy points, but how do they deal with outliers? Is that part simply ignored?
Your Questions • Can the density-weighted approach be applied to all applications? • If it can, why is most of the research done using uncertainty sampling (esp. entropy)?
Questions • ???
Your Questions • How do we deal with overfitting when applying selective sampling?
Diversity • Naïve selection by the earlier methods tends to select examples that are very similar to one another • Must factor this in and seek diversity among the queries
Questions • ???
Your Questions • There exists a huge amount of unlabeled data. Is there a chance that all the data gets labeled? • If not, how long does the learner keep querying? • Is there a stage where the learner has sufficient labels and stops querying?
Your Questions • How do we stop the iteration? Stop when the accuracy changes by less than a certain threshold?
Early Stopping • A theoretically sound method to stop training is when the examples in the margin are exhausted. • To check if there are still unseen training instances in the margin, the distance of the new selected instance is compared to the support vectors of the current model. • If the new selected instance by active learning (closest to the hyperplane) is not closer than any of the support vectors, we conclude that the margin is exhausted. • A practical implementation of this idea is to count the number of support vectors during the active learning training process. • If the number of the support vectors stabilizes, it implies that all possible support vectors have been selected by the active learning method.
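The practical implementation described above, counting support vectors and stopping when the count stabilizes, can be sketched as a simple check over the per-round counts; the function name and the `window` parameter are assumptions for illustration:

```python
def support_vectors_stabilized(sv_counts, window=5):
    # sv_counts: number of support vectors after each active-learning round.
    # Return True when the count has not changed over the last `window`
    # rounds, suggesting the margin is exhausted and querying can stop.
    if len(sv_counts) < window:
        return False
    recent = sv_counts[-window:]
    return max(recent) == min(recent)
```

The window size trades off stopping early against stopping prematurely on a temporary plateau; a few rounds of unchanged counts is usually taken as evidence rather than proof that the margin is exhausted.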
Questions • ???
Your Questions • What is log-loss? • What is loss? • Related terms: • Risk • Regret • Common loss functions: • Absolute loss: |err| • Squared loss: err² • Pro: differentiable • Con: can be skewed by a few large values • Log-loss: −log p(y | x), the negative log-probability the model assigns to the true label
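The loss functions on the slide are one-liners; here they are for a single prediction, with log-loss written for a binary label and a predicted probability (function names are for illustration):

```python
import math

def absolute_loss(y_true, y_pred):
    # |err|: robust to outliers, but not differentiable at zero.
    return abs(y_true - y_pred)

def squared_loss(y_true, y_pred):
    # err^2: differentiable everywhere, but skewed by a few large errors.
    return (y_true - y_pred) ** 2

def log_loss(y_true, p):
    # Negative log-likelihood for a binary label y_true in {0, 1},
    # where p = P(y = 1 | x) is the model's predicted probability.
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))
```

Log-loss goes to zero as the model assigns probability 1 to the true label, and blows up as it assigns probability 0, which is why confident wrong predictions are penalized so heavily.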
Your Questions • What is the relationship between entropy and log-loss?
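One way to state the relationship: entropy is the expected log-loss under the true label distribution, and cross-entropy is the expected log-loss of predicting with a model distribution q when labels actually follow p:

```latex
H(p) = \mathbb{E}_{y \sim p}\big[-\log p(y)\big], \qquad
H(p, q) = \mathbb{E}_{y \sim p}\big[-\log q(y)\big] \;\ge\; H(p)
```

with equality iff q = p, so minimizing log-loss on data drawn from p drives the model toward the true distribution, and H(p) is the floor it cannot beat.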
Your Questions • What is the difference between information-theoretic learning and decision-theoretic learning?
Questions • ???