Active Learning in Text Retrieval
Introduction • Passive Learning vs. Active Learning • Active Learning: intelligently choose informative questions so as to reach high performance using as few labeled examples as possible
Learning conjunctions • Protocol I: teacher proposes questions to learner • Protocol II: learner randomly chooses questions • Protocol III: learner proposes questions to teacher
Active Learning in HARD Track • Treat text retrieval as a classification problem • The HARD Track permits participants to ask several questions (through a Clarification Form) • Research problem: what kind of questions to ask?
Baseline: Random Sampling • Randomly choose unlabeled examples • Incorporate the newly labeled examples and retrain the classifier • Not efficient! There are already clues for choosing which unlabeled example(s) to ask about (see the sketch below)
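A minimal sketch of one baseline round, assuming dense feature arrays and an oracle labeling function; the names (`X_pool`, `label_fn`) are illustrative, not from the slides:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def random_sampling_round(X_lab, y_lab, X_pool, label_fn, b=5, seed=0):
    """One baseline round: label b randomly chosen pool examples, retrain."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(X_pool), size=b, replace=False)
    # Ask the teacher/oracle for labels of the randomly chosen examples
    y_new = np.array([label_fn(x) for x in X_pool[picks]])
    X_lab = np.vstack([X_lab, X_pool[picks]])
    y_lab = np.concatenate([y_lab, y_new])
    clf = LogisticRegression().fit(X_lab, y_lab)   # retrain from scratch
    X_pool = np.delete(X_pool, picks, axis=0)      # drop newly labeled examples
    return clf, X_lab, y_lab, X_pool
```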
Relevance Feedback • A kind of active learning • Let the user label the top-ranked retrieved results • Is it optimal?
Uncertainty Sampling [SIGIR94] • Create an initial classifier • While the teacher is willing to label examples: • Apply the current classifier to each unlabeled example • Find the b examples for which the classifier is least certain of class membership • Have the teacher label the subsample of b examples • Train a new classifier on all labeled examples (selection step sketched below)
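A sketch of the selection step in the loop above, assuming a classifier with scikit-learn's `predict_proba` interface; least confidence (lowest maximum class probability) is one common uncertainty measure:

```python
import numpy as np

def most_uncertain(clf, X_pool, b=5):
    """Select the b pool examples whose class membership is least certain."""
    probs = clf.predict_proba(X_pool)   # shape (n_pool, n_classes)
    confidence = probs.max(axis=1)      # probability of the predicted class
    return np.argsort(confidence)[:b]   # least confident first
```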
A Probabilistic Text Classifier • Logistic regression to estimate P(C|x) (form shown below)
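For a document with feature vector x and class C, the model has the standard logistic form (weight vector w and bias b are the learned parameters; this notation is assumed, not from the slides):

```latex
P(C \mid x) = \sigma(w^{\top} x + b) = \frac{1}{1 + e^{-(w^{\top} x + b)}}
```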
Comment • Choosing the most uncertain unlabeled examples (reducing the version space?) vs. choosing examples that minimize expected future error • Selecting several samples at a time vs. one sample at a time • Incremental training: a computation issue • Sequential process vs. two-pass process (HARD)
Query By Committee (QBC) • QBC [COLT 1992, NIPS 1992] • Generate a committee of classifiers; the next query is chosen by the principle of maximal disagreement among these classifiers [COLT 1992] • The effect of training on a set of examples can be achieved at the cost of drawing the corresponding unlabeled examples and labeling only a logarithmic fraction of them (a sketch follows below)
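One common instantiation of the disagreement principle, sketched assuming a committee of classifiers trained on, say, bootstrap resamples of the labeled data; vote entropy is a standard disagreement measure, not necessarily the original papers' exact choice:

```python
import numpy as np

def qbc_select(committee, X_pool, b=5):
    """Query by Committee: pick the b pool examples the committee disagrees
    on most, scored here by vote entropy over the members' predictions."""
    votes = np.stack([clf.predict(X_pool) for clf in committee])  # (k, n_pool)
    disagreement = np.zeros(votes.shape[1])
    for c in np.unique(votes):
        frac = (votes == c).mean(axis=0)        # fraction of members voting c
        nz = frac > 0
        disagreement[nz] += -frac[nz] * np.log(frac[nz])  # vote-entropy term
    return np.argsort(-disagreement)[:b]        # most disagreement first
```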
Active Learning with Statistical Models • Cohn et al. [JAIR 1996] • Provides a statistically optimal solution: select the training example that, once labeled and added to the training data, is expected to result in the lowest error on future test examples (criterion shown below) • This optimal solution cannot be found efficiently
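In symbols, the criterion is roughly the following (notation assumed for this summary): from the unlabeled pool U, choose the example whose expected error after retraining, with the unknown label marginalized out under the current model, is lowest:

```latex
x^{*} = \operatorname*{arg\,min}_{x \in U}\; \mathbb{E}_{y \sim P(y \mid x)}\!\left[\, \mathrm{Err}\!\left(D \cup \{(x, y)\}\right) \right]
```

Here D is the current labeled set and Err(·) denotes the expected error on future test examples of the classifier retrained on the augmented data.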
Sampling Estimation • Roy and McCallum [ICML 2001] • Use sampling-based estimation of error reduction to approximate optimal active learning (sketched below)
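A sketch of the sampling-estimation idea, under two stated assumptions: the labeled set already contains every class (so retraining succeeds), and average uncertainty over the pool stands in as the error proxy. It is deliberately restricted to a small candidate subset, since each candidate requires retraining once per possible label:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_by_error_reduction(clf, X_lab, y_lab, X_pool, candidates):
    """For each candidate, simulate each possible label, retrain, and
    estimate expected future error on the pool via 1 - max P(c|x)."""
    probs = clf.predict_proba(X_pool[candidates])
    best_idx, best_err = None, np.inf
    for i, idx in enumerate(candidates):
        exp_err = 0.0
        for p_y, y_sim in zip(probs[i], clf.classes_):  # marginalize labels
            clf_sim = LogisticRegression().fit(
                np.vstack([X_lab, X_pool[idx:idx + 1]]),
                np.append(y_lab, y_sim))
            pool_p = clf_sim.predict_proba(X_pool)
            exp_err += p_y * (1.0 - pool_p.max(axis=1)).mean()
        if exp_err < best_err:                          # keep lowest expected error
            best_idx, best_err = idx, exp_err
    return best_idx
```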
Active Learning Framework for CBIR • Zhang [IEEE Transactions on Multimedia 2002] • CBIR: a weighted sum of semantic distance and low-level feature distance, where the semantic features are annotated attributes • The examples to be annotated are those the system is most uncertain about • Biased kernel regression, with entropy as the uncertainty measure (see the sketch below)
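The entropy criterion can be sketched generically (this is the standard entropy of the predicted class distribution, not Zhang's exact biased-kernel-regression formulation):

```python
import numpy as np

def prediction_entropy(probs):
    """Entropy of each predicted class distribution; higher means more
    uncertain, so high-entropy examples are shown to the user to annotate.
    `probs` has shape (n_examples, n_classes) with rows summing to 1."""
    p = np.clip(probs, 1e-12, 1.0)          # guard against log(0)
    return -(p * np.log(p)).sum(axis=1)
```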