Paired Sampling in Density-Sensitive Active Learning • Pinar Donmez, joint work with Jaime G. Carbonell • Language Technologies Institute, School of Computer Science, Carnegie Mellon University
Outline • Problem setting • Motivation • Our approach • Experiments • Conclusion
Setting • X: feature space, label set Y = {-1, +1} • Data D ~ X × Y • D = T ∪ U • T: labeled training set, U: unlabeled pool • T is small initially, U is large • Active Learning: • choose the most informative samples to label • Goal: high performance with the fewest labeling requests (a minimal setup sketch follows)
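A minimal sketch of this pool-based setup on a toy synthetic pool; all names and sizes below are illustrative, not from the paper:

```python
# Toy pool-based active learning setup: D = T ∪ U with binary labels.
import numpy as np

rng = np.random.default_rng(0)

# D ~ X × Y: n points with labels in {-1, +1}
n, d = 200, 2
X = rng.normal(size=(n, d))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

# D = T ∪ U: T starts tiny (one point per class), U is the rest
pos, neg = np.flatnonzero(y == 1)[0], np.flatnonzero(y == -1)[0]
labeled = [pos, neg]                                   # indices of T
unlabeled = [i for i in range(n) if i not in labeled]  # indices of U
```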
Motivation • Optimize the decision boundary placement • Sampling disproportionately on one side may not be optimal • Maximize likelihood of straddling the boundary with paired samples • Three factors affect sampling • Local density • Conditional entropy maximization • Utility score
Illustrative Example • Left figure (paired sampling): • significant shift in the current hypothesis • large reduction in version space • Right figure (single-point sampling): • small shift in the current hypothesis • small reduction in version space
Density-Sensitive Distance • Cluster Hypothesis: • decision boundary should NOT cut clusters • squeeze distances in high-density regions • increase distances in low-density regions • Solution: Density-Sensitive Distance • between each pair of points, find the path in a graph G whose weakest link (longest edge) is smallest • a better way to avoid outliers (i.e. a very short edge in a long path) (Chapelle & Zien, 2005)
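A minimal sketch of the weakest-link idea, assuming the bottleneck (minimax-path) reading of the slide; Chapelle & Zien's actual construction is a smoothed (ρ-parameterized) version of this limiting case:

```python
# Bottleneck (minimax-path) distance: the cost of a path is its longest
# edge, and the distance between two points is the cheapest such path.
# Dense clusters get short bottleneck distances; gaps between clusters
# (and paths through isolated outliers) get long ones.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def bottleneck_distances(X):
    D = squareform(pdist(X))    # edge weights = Euclidean distances
    B = D.copy()
    # Floyd-Warshall variant: relax every pair through intermediate node k
    for k in range(len(X)):
        B = np.minimum(B, np.maximum(B[:, k][:, None], B[k, :][None, :]))
    return B
```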
Density-Sensitive Distance • Apply MDS (Multi-Dimensional Scaling) to the density-sensitive distance matrix to obtain a Euclidean embedding • Find the eigenvalues and eigenvectors of the doubly centered Gram matrix K = -1/2 H D² H, where H = I - (1/n) 1 1ᵀ • Pick the first p eigenvectors (largest positive eigenvalues) s.t. they account for most of the variance
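The embedding step is standard classical MDS. A sketch; the variance-ratio cutoff for choosing p is an assumption, since the slide's exact criterion is elided:

```python
# Classical MDS: embed a (density-sensitive) distance matrix in R^p.
import numpy as np

def classical_mds(D, var_ratio=0.95):
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    K = -0.5 * H @ (D ** 2) @ H                # doubly centered Gram matrix
    vals, vecs = np.linalg.eigh(K)
    order = np.argsort(vals)[::-1]             # eigenvalues, descending
    vals, vecs = vals[order], vecs[:, order]
    vecs, vals = vecs[:, vals > 0], vals[vals > 0]   # keep positive part
    ratio = np.cumsum(vals) / vals.sum()
    p = int(np.searchsorted(ratio, var_ratio)) + 1   # assumed cutoff rule
    return vecs[:, :p] * np.sqrt(vals[:p])
```

Chaining the two sketches, the transformed space would be obtained as `Z = classical_mds(bottleneck_distances(X))`.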
Active Sampling Procedure • Given a training set T in the MDS space: • 1. Train a logistic regression classifier on T • 2. For all pairs (x_i, x_j) in the unlabeled set U, compute the pairwise score S(x_i, x_j) • 3. Choose the pair with the maximum score, request its labels, and add it to T • Repeat steps 1–3 (a sketch of this loop follows)
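A sketch of the loop above, reusing the earlier setup and deferring to a `pair_score` stand-in for S (sketched on the next slides):

```python
# Paired active sampling loop: retrain, score all candidate pairs, query
# the best pair, repeat. `pair_score` is defined in a later sketch.
import itertools
from sklearn.linear_model import LogisticRegression

def active_loop(Z, y, labeled, pair_score, n_iters=20):
    labeled = list(labeled)
    for _ in range(n_iters):
        clf = LogisticRegression().fit(Z[labeled], y[labeled])   # step 1
        pool = [i for i in range(len(Z)) if i not in labeled]
        best = max(itertools.combinations(pool, 2),              # steps 2-3
                   key=lambda ij: pair_score(Z, clf, *ij))
        labeled.extend(best)            # oracle labels y[best] are revealed
    return clf, labeled
```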
Details of the Scoring Function S • Two components of S • Likelihood of a pair having opposite labels (straddling the decision boundary) • Utility of the pair • By the cluster assumption: • the decision boundary should not cut clusters => points in different clusters are likely to have different labels • In the transformed space, points in different clusters have low similarity (large distance) • Thus, we can estimate the likelihood of opposite labels, P(y_i ≠ y_j), from the pairwise distance in the transformed space
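A hedged sketch of S as the product of the two components. The exponential map from distance to straddle likelihood and the multiplicative combination are illustrative assumptions consistent with these bullets, not the paper's published formula; `pair_utility` is sketched after the utility slides:

```python
import numpy as np

def pair_score(Z, clf, i, j, scale=1.0):
    """S(x_i, x_j) ~ P(y_i != y_j) * utility of the pair."""
    dist = np.linalg.norm(Z[i] - Z[j])         # distance in the MDS space
    p_straddle = 1.0 - np.exp(-dist / scale)   # assumed monotone map:
    return p_straddle * pair_utility(Z, clf, i, j)  # far pairs straddle more
```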
An Analysis Justifying our Claim • Pairwise distances are divided into bins • Pairs are assigned to bins according to their distances • For each bin, the relative frequency of pairs with opposite class labels is computed • The resulting graph (empirically) shows that the likelihood of two points having opposite labels increases monotonically with the pairwise distance between them • * This graph is plotted on the g50c dataset.
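This analysis is easy to reproduce on any labeled sample; a sketch (the bin count is arbitrary):

```python
# Relative frequency of opposite-label pairs per pairwise-distance bin.
import numpy as np
from scipy.spatial.distance import pdist

def opposite_label_frequency(X, y, n_bins=10):
    dists = pdist(X)                             # condensed pair distances
    iu, ju = np.triu_indices(len(y), k=1)        # same pair order as pdist
    opposite = (y[iu] != y[ju]).astype(float)
    edges = np.histogram_bin_edges(dists, bins=n_bins)
    bins = np.digitize(dists, edges[1:-1])       # bin index for each pair
    return np.array([opposite[bins == b].mean() for b in np.unique(bins)])
```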
Utility Function • Two components • Local density, which depends on • the number of close neighbors • their proximity • Conditional entropy • For binary problems: H(y|x) = -P(+1|x) log P(+1|x) - P(-1|x) log P(-1|x)
Uncertainty-Weighted Density • captures • the density of a given point • the information content of its neighbors • novelty: • each neighbor's contribution is weighted by its uncertainty • reduces the effect of highly certain neighbors • dense points with highly uncertain neighbors become important (see the sketch below)
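A hedged sketch of such a density: the Gaussian similarity kernel, the neighborhood size, and the exact weighting are assumptions; the slide only specifies that each neighbor's proximity contribution is weighted by its uncertainty:

```python
import numpy as np

def binary_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def uncertainty_weighted_density(Z, clf, i, k=10, bandwidth=1.0):
    dists = np.linalg.norm(Z - Z[i], axis=1)
    nbrs = np.argsort(dists)[1:k + 1]                 # k nearest neighbors
    sim = np.exp(-(dists[nbrs] / bandwidth) ** 2)     # proximity weight
    H = binary_entropy(clf.predict_proba(Z[nbrs])[:, 1])  # neighbor entropy
    return float(np.sum(sim * H))   # highly certain neighbors count less
```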
Utility Function • the utility of a pair combines, with a regularization trade-off: • the information content (entropy) of the pair itself • the proximity-weighted information content of its neighbors (sketched below)
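Completing the earlier `pair_score` sketch: an additive form with a β trade-off standing in for the "regularization" above; the exact combination in the paper may differ. This reuses `binary_entropy` and `uncertainty_weighted_density` from the previous sketch:

```python
def pair_utility(Z, clf, i, j, beta=0.5):
    # entropy of the pair itself + regularized proximity-weighted
    # entropy of its neighborhoods
    own = binary_entropy(clf.predict_proba(Z[[i, j]])[:, 1]).sum()
    nbrs = (uncertainty_weighted_density(Z, clf, i)
            + uncertainty_weighted_density(Z, clf, j))
    return own + beta * nbrs
```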
Experimental Data • Six binary datasets
Experiment Setting • For each data set • start with 2 labeled data points (one positive, one negative) • run each method for 20 iterations • results averaged over 10 runs • Baselines • Uncertainty Sampling • Density-only Sampling • Representative Sampling (Xu et al., 2003) • Random Sampling
Conclusion • Our contributions: • combine uncertainty, density, and dissimilarity across the decision boundary • proximity-weighted conditional entropy selection is effective for active learning • Results show that our method significantly outperforms the baselines: • larger error reduction • fewer labeling requests to reach the same performance