Active Learning: Sampling Method Meeting 6 — Jan 31, 2013 CSCE 6933 Rodney Nielsen
Uncertainty Sampling • Uncertainty sampling • Query the instances whose labels the current model is least certain about • Least confident • Margin sampling • Entropy-based sampling
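One common way to write the three criteria is shown below. The notation is an assumption, not taken from the slide: $P_\theta(y \mid x)$ is the current model's posterior, $\hat{y}_1$ and $\hat{y}_2$ are its most and second most probable labels for $x$, and $U$ is the unlabeled pool.

```latex
\begin{aligned}
x^*_{LC} &= \arg\max_{x \in U} \; 1 - P_\theta(\hat{y}_1 \mid x) \\
x^*_{M}  &= \arg\min_{x \in U} \; P_\theta(\hat{y}_1 \mid x) - P_\theta(\hat{y}_2 \mid x) \\
x^*_{H}  &= \arg\max_{x \in U} \; -\sum_{y} P_\theta(y \mid x) \log P_\theta(y \mid x)
\end{aligned}
```

Least confident looks only at the top label, margin at the top two, and entropy at the whole distribution.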
If |Y| = 2, the three uncertainty measures rank instances identically • If |Y| = 3, they can disagree; consider the following posterior distributions • 0.34, 0.33, 0.33 • 0.50, 0.50, 0.00 • 0.50, 0.49, 0.01 • 0.40, 0.30, 0.30 • 0.41, 0.40, 0.19
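A minimal sketch that scores the five distributions listed above with each measure and reports which one each strategy would query first. Entropy is in nats, and ties are broken by taking the first example; both are arbitrary choices made for this illustration.

```python
import math

# The five posterior distributions over |Y| = 3 labels listed on the slide.
EXAMPLES = [
    (0.34, 0.33, 0.33),
    (0.50, 0.50, 0.00),
    (0.50, 0.49, 0.01),
    (0.40, 0.30, 0.30),
    (0.41, 0.40, 0.19),
]


def least_confident(p):
    """Higher means more uncertain: 1 minus the top label's probability."""
    return 1.0 - max(p)


def margin(p):
    """Lower means more uncertain: gap between the two most probable labels."""
    top1, top2 = sorted(p, reverse=True)[:2]
    return top1 - top2


def entropy(p):
    """Higher means more uncertain: Shannon entropy in nats, with 0 log 0 = 0."""
    return -sum(q * math.log(q) for q in p if q > 0)


def query_index(scores, higher_is_more_uncertain):
    """Index of the example a strategy would query first (first one wins ties)."""
    choose = max if higher_is_more_uncertain else min
    return choose(range(len(scores)), key=scores.__getitem__)


lc = [least_confident(p) for p in EXAMPLES]
mg = [margin(p) for p in EXAMPLES]
ent = [entropy(p) for p in EXAMPLES]

print("least confident queries example", query_index(lc, True))
print("margin queries example", query_index(mg, False))
print("entropy queries example", query_index(ent, True))
```

With these numbers, least confident and entropy both pick the first distribution (0.34, 0.33, 0.33), while margin sampling picks the second (0.50, 0.50, 0.00), which entropy ranks as the least uncertain of the five.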
Query by Committee • Train a committee of hypotheses • Representing different regions of the version space • Obtain some measure of (dis)agreement on the instances in the dataset (e.g., vote entropy) • Assume the most informative instance is the one on which the committee disagrees most • Goal: minimize the version space • There is no consensus on committee size, but even 2-3 members can give good results
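A toy sketch of query by committee with vote entropy. It assumes a hypothetical hypothesis class of 1-D threshold classifiers, so the version space is simply the interval of thresholds consistent with the labels. As a simplification, committee members are spread evenly across that interval instead of being sampled or trained (e.g., by bagging).

```python
import math
from collections import Counter


def version_space(labeled):
    """Interval (lo, hi) of thresholds consistent with the labels (toy 1-D class)."""
    lo = max(x for x, y in labeled if y == 0)
    hi = min(x for x, y in labeled if y == 1)
    if lo >= hi:
        raise ValueError("labeled data is not separable by a single threshold")
    return lo, hi


def make_committee(lo, hi, size):
    """Spread `size` thresholds evenly inside the version space (a simplification)."""
    return [lo + (hi - lo) * (k + 1) / (size + 1) for k in range(size)]


def predict(threshold, x):
    return 1 if x >= threshold else 0


def vote_entropy(committee, x):
    """Entropy of the committee's vote distribution on x (0 = full agreement)."""
    votes = Counter(predict(t, x) for t in committee)
    c = len(committee)
    return -sum((v / c) * math.log(v / c) for v in votes.values())


def qbc_query(labeled, pool, size=4):
    """Query the pool instance on which the committee disagrees most."""
    committee = make_committee(*version_space(labeled), size)
    return max(pool, key=lambda x: vote_entropy(committee, x))


labeled = [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)]
pool = [0.05, 0.3, 0.4, 0.5, 0.6, 0.95]
print(qbc_query(labeled, pool))
```

Here the committee splits 2-2 on 0.5, so that point is queried; whichever label it receives cuts the version space (0.2, 0.8) roughly in half, which matches the slide's goal of minimizing the version space.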
Expected Model Change • Query the instance that would cause the largest expected change in the current hypothesis h, taking the expectation over its possible labels under the current model • E.g., the instance whose labeling would produce the largest gradient in the model parameters (expected gradient length) • Prefer the instance x that leads to the most significant change in the model
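A minimal sketch of expected gradient length for binary logistic regression. Assumptions: no bias term, hypothetical weights and pool, and the common approximation that the gradient over the existing labeled set is roughly zero at convergence, so only the candidate's own log-loss gradient, $(p - y)\,x$, is counted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def grad_log_loss(w, x, y):
    """Gradient of the logistic log-loss for a single labeled example (x, y)."""
    p = sigmoid(dot(w, x))
    return [(p - y) * xi for xi in x]


def expected_gradient_length(w, x):
    """Sum over labels y of P(y | x) times the gradient norm if x were labeled y."""
    p1 = sigmoid(dot(w, x))
    return sum(
        prob * norm(grad_log_loss(w, x, y)) for y, prob in ((0, 1.0 - p1), (1, p1))
    )


def egl_query(w, pool):
    return max(pool, key=lambda x: expected_gradient_length(w, x))


w = [1.0, -1.0]  # hypothetical current weights
pool = [[2.0, 0.0], [0.5, 0.5], [3.0, 3.0]]
print(egl_query(w, pool))
```

Both [0.5, 0.5] and [3.0, 3.0] have P(y = 1 | x) = 0.5, yet [3.0, 3.0] is queried because the gradient scales with the length of x. This is one reason the method can be led astray by features that are not properly scaled.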
Expected Model Change • What learning algorithms does this work for? • What are the issues? • Can be computationally expensive for large datasets and feature spaces • Can be led astray if features aren’t properly scaled • How do you properly scale the features?
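One common answer to the scaling question is z-score standardization. The sketch below is an illustration, not the lecture's prescribed method. It computes each feature's mean and population standard deviation from a hypothetical pool; which data the statistics should come from (pool, labeled set, or both) is itself a design choice. Min-max scaling is another common option.

```python
import statistics


def standardize(rows):
    """Z-score each feature column: subtract its mean, divide by its population std. dev."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has zero spread; leave it centered rather than divide by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def norm(v):
    return sum(x * x for x in v) ** 0.5


# Hypothetical pool: feature 0 is on a small scale, feature 1 on a large one.
pool = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0]]
scaled = standardize(pool)

print([round(norm(x), 1) for x in pool])    # dominated by feature 1
print([round(norm(x), 3) for x in scaled])  # both features now contribute
```

Because a gradient-based change measure grows with the magnitude of x, a large-scale feature can dominate the ranking before standardization but not after.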
Admin • IR / Thursday’s meeting time
ML Publication Venues • ML Journals • Machine Learning • Journal of Machine Learning Research • ML Conferences • NIPS – Neural Information Processing Systems • ICML – International Conference on ML • ECML – European Conference on ML • IROS – Intl Conf on Intelligent Robots and Systems • ICPR – Intl Conference on Pattern Recognition • ISNN – Intl Symposium on Neural Networks • COLT – Computational Learning Theory • UAI – Uncertainty in Artificial Intelligence (AI) • AAAI – Association for the Advancement of AI • IJCAI – International Joint Conference on AI • FLAIRS – Florida AI Research Society Conference
NLP Publication Venues • NLP Journals • Computational Linguistics • JNLE – Journal of Natural Language Engineering • Language Resources and Evaluation • NLP Conferences • ACL / NAACL / EACL / PAACL • ICASSP • COLING • HLT • LREC • EMNLP • Interspeech
Projects • Set up a meeting with me next week to discuss possible projects • Come prepared to discuss the concept you are most interested in pursuing (the high-level description, not the implementation details) • If you don’t have a specific goal yet, email me a description of your general interests
Reading Responses • No reading response is due this coming Monday/Tuesday
Estimated Error Reduction • Other methods approximate the goal of minimizing future error by minimizing a proxy (e.g., uncertainty, variance, …) • Estimated Error Reduction attempts to directly minimize the expected future error, E[error]
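A toy sketch of estimated error reduction with 0/1 loss. For every candidate x and every possible label y, it "retrains" on L plus (x, y), estimates the resulting error over the pool as the sum of 1 - max_y P(y | u), and averages over y using the current posterior. The kernel-smoothed classifier, its bandwidth and prior, and the data are hypothetical stand-ins chosen because retraining is just appending a point.

```python
import math


def posterior(labeled, x, bandwidth=0.2, prior=0.1):
    """P(y=1 | x) from a toy kernel-smoothed classifier (a hypothetical stand-in model)."""
    weight = {0: prior, 1: prior}
    for xi, yi in labeled:
        weight[yi] += math.exp(-(((x - xi) / bandwidth) ** 2))
    return weight[1] / (weight[0] + weight[1])


def expected_error(labeled, pool):
    """Estimated 0/1 error on the pool: sum over u of 1 - max_y P(y | u)."""
    total = 0.0
    for u in pool:
        p1 = posterior(labeled, u)
        total += 1.0 - max(p1, 1.0 - p1)
    return total


def eer_query(labeled, pool):
    """Choose x minimizing E_y[estimated pool error after retraining on L + (x, y)]."""
    def score(x):
        p1 = posterior(labeled, x)
        # "Retraining" this toy model just means adding the hypothetical label.
        return sum(
            prob * expected_error(labeled + [(x, y)], pool)
            for y, prob in ((0, 1.0 - p1), (1, p1))
        )
    return min(pool, key=score)


labeled = [(0.0, 0), (1.0, 1)]
pool = [0.02, 0.5, 0.98]
print(eer_query(labeled, pool))
```

The nested loops (candidates × labels × pool) show where the cost comes from: with a model that needs iterative training, every inner "retrain" becomes a full optimization run.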
Estimated Error Reduction • Often computationally prohibitive • Binary logistic regression would be $O(|U|\,|L|\,G)$ just to choose one query • Where G is the number of gradient descent iterations to convergence • Conditional Random Fields would be $O(T\,|Y|^{T+2}\,|U|\,|L|\,G)$ • Where T is the length of the input sequence
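To make the gap concrete, here is the arithmetic for purely hypothetical sizes ($|U| = 10^4$, $|L| = 10^2$, $G = 10^2$, $T = 10$, $|Y| = 3$); these numbers are illustrative and do not come from the lecture.

```latex
\begin{aligned}
\text{logistic regression: } & |U|\,|L|\,G = 10^4 \cdot 10^2 \cdot 10^2 = 10^8 \\
\text{CRF: } & T\,|Y|^{T+2}\,|U|\,|L|\,G = 10 \cdot 3^{12} \cdot 10^8 = 10 \cdot 531441 \cdot 10^8 \approx 5.3 \times 10^{14}
\end{aligned}
```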
Variance Reduction • Regression problems • $E[(\hat{y} - y)^2] = \text{noise} + \text{bias}^2 + \text{variance}$ • For a fixed model class, the learner can’t change the noise or bias, so it queries to minimize variance • For classification, the Fisher Information Ratio is used instead
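A minimal sketch of variance reduction for 1-D linear least squares with an intercept, in the spirit of classic optimal experimental design. It assumes the linear model is correct and the noise variance is constant. Under these assumptions, the predictive variance at u is proportional to $\phi(u)^\top A^{-1} \phi(u)$ with $A = \sum_i \phi(x_i)\phi(x_i)^\top$, so the labels are not needed to evaluate a candidate. The data are hypothetical.

```python
def features(x):
    return (1.0, x)  # intercept and slope terms


def information_matrix(xs):
    """A = sum_i phi(x_i) phi(x_i)^T for 1-D linear regression with an intercept."""
    a = [[0.0, 0.0], [0.0, 0.0]]
    for x in xs:
        f = features(x)
        for i in range(2):
            for j in range(2):
                a[i][j] += f[i] * f[j]
    return a


def inverse_2x2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def predictive_variance(a_inv, u):
    """phi(u)^T A^{-1} phi(u): predictive variance at u in units of the noise variance."""
    f = features(u)
    return sum(f[i] * a_inv[i][j] * f[j] for i in range(2) for j in range(2))


def variance_reduction_query(labeled_xs, pool):
    """Pick the candidate whose addition minimizes average predictive variance on the pool."""
    def avg_var_after(x):
        a_inv = inverse_2x2(information_matrix(labeled_xs + [x]))
        return sum(predictive_variance(a_inv, u) for u in pool) / len(pool)
    return min(pool, key=avg_var_after)


labeled_xs = [0.0, 0.1]
pool = [0.2, 0.5, 1.0]
print(variance_reduction_query(labeled_xs, pool))
```

With both labeled points bunched near 0, the far point 1.0 is chosen because it pins down the slope.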
Outlier Phenomenon • Uncertainty sampling and Query by Committee might be hindered by querying many outliers
Density Weighted Methods • Uncertainty sampling and Query by Committee might be hindered by querying many outliers • Density weighted methods overcome this potential problem by also considering whether the example is representative of the input distribution • Tends to work better than the base query strategies (e.g., uncertainty sampling or QBC) on their own
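A minimal sketch of density weighting in the information-density style: the base uncertainty (entropy here) is multiplied by the candidate's average similarity to the pool, raised to a power beta. The current model, the RBF similarity, beta = 1, and the 2-D pool are all hypothetical choices made for illustration.

```python
import math


def p_positive(x):
    """Hypothetical current model: logistic in the first feature only."""
    return 1.0 / (1.0 + math.exp(-3.0 * x[0]))


def entropy(x):
    p = p_positive(x)
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0)


def similarity(a, b):
    """Gaussian (RBF) similarity; the kernel choice is an assumption."""
    return math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def information_density(x, pool, beta=1.0):
    """Base uncertainty weighted by average similarity to the pool, raised to beta."""
    density = sum(similarity(x, u) for u in pool) / len(pool)
    return entropy(x) * density ** beta


# One isolated point on the decision boundary plus a dense, fairly uncertain cluster.
pool = [(0.0, 5.0), (0.3, 0.0), (0.3, 0.1), (0.3, -0.1), (0.35, 0.05)]
print(max(pool, key=entropy))
print(max(pool, key=lambda x: information_density(x, pool)))
```

Plain entropy queries the outlier (0.0, 5.0), whose prediction is a coin flip; the density-weighted score prefers (0.3, 0.0), which is nearly as uncertain but representative of the cluster.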
Diversity • Naïve selection by the earlier methods (e.g., taking the top-scoring instances as a batch) tends to pick examples that are very similar to each other • Query selection must factor this in and seek diversity among the queries
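A minimal sketch contrasting naive top-k selection with a simple greedy diversity filter. The filter walks candidates in score order and skips any candidate within a minimum distance of one already picked. The filter, the `min_gap` parameter, and the 1-D data are hypothetical illustrations, not a method from the lecture.

```python
def naive_top_k(candidates, scores, k):
    """Take the k highest-scoring candidates, ignoring how similar they are."""
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order[:k]]


def diverse_top_k(candidates, scores, k, min_gap):
    """Greedy: walk candidates in score order, skipping any within min_gap of a pick."""
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    picked = []
    for i in order:
        if all(abs(candidates[i] - candidates[j]) >= min_gap for j in picked):
            picked.append(i)
            if len(picked) == k:
                break
    return [candidates[i] for i in picked]


# Hypothetical 1-D pool with informativeness scores from some earlier criterion.
candidates = [0.50, 0.51, 0.52, 0.80, 0.20]
scores = [0.99, 0.98, 0.97, 0.90, 0.85]
print(naive_top_k(candidates, scores, 3))
print(diverse_top_k(candidates, scores, 3, min_gap=0.1))
```

The naive batch is three near-duplicates around 0.5, while the filtered batch spreads its queries across the pool at the cost of slightly lower individual scores.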
Active Learning Empirical Results • Appears to work well, barring publication bias • From Settles, 2009
Labeling Costs • Are all labels created equal? • Generating labels by experiments • Some instances are easier to label (e.g., shorter sentences) • Can pre-label data for small savings • Experimental problems • Value of information (VOI) • Considers labeling & estimated misclassification costs • Critical to the goal of Active Learning • Divide informativeness by cost?
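A minimal sketch of the "divide informativeness by cost" idea raised on the slide, compared with ranking by informativeness alone. The candidates, their scores, and the use of sentence length as a labeling-cost proxy are hypothetical. Note that this ratio ignores the misclassification costs a full value-of-information approach would also weigh.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    informativeness: float  # from any of the earlier query criteria
    cost: float             # hypothetical labeling cost, e.g. sentence length in tokens


def most_informative(cands):
    return max(cands, key=lambda c: c.informativeness).name


def best_per_unit_cost(cands):
    """Informativeness gained per unit of labeling cost."""
    return max(cands, key=lambda c: c.informativeness / c.cost).name


pool = [
    Candidate("long sentence", 0.9, 30.0),
    Candidate("short sentence", 0.6, 8.0),
    Candidate("medium sentence", 0.7, 15.0),
]
print(most_informative(pool))
print(best_per_unit_cost(pool))
```

Ranking by informativeness alone picks the long sentence; dividing by cost favors the short one, which is cheaper to label per unit of information.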
Questions • ???