Efficiently Learning the Accuracy of Labeling Sources for Selective Sampling
by Pinar Donmez, Jaime Carbonell, Jeff Schneider
School of Computer Science, Carnegie Mellon University
KDD ’09, June 30th 2009, Paris, France
Problem Illustration
[Figure: a set of oracles labeling a set of instances; values shown in the figure: 0.69, 0.9, 0.58, 0.55, 0.67, 0.83, 0.8, 0.74]
Interval Estimate Threshold (IEThresh)
• Goal: find the labeler(s) with the highest expected accuracy
• Our work builds upon Interval Estimation [L. P. Kaelbling]
1. Estimate the reward of each labeler (more on next slide)
2. Compute the upper confidence interval for each labeler
3. Select the labelers whose upper interval is higher than a threshold
4. Observe the output of the chosen oracles to estimate their reward
5. Return to step 1
• Filters out unreliable labelers
• Reduces labeling cost
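The selection steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the slide does not say how the upper interval or the threshold is computed, so the normal-approximation interval, the rule "keep oracles within a fraction `epsilon` of the best upper interval", the optimistic default for oracles with fewer than two observations, and the names `upper_interval` and `select_oracles` are all illustrative choices.

```python
import math
from statistics import NormalDist, mean, stdev


def upper_interval(rewards, alpha=0.05):
    """Upper end of a confidence interval on the mean reward.

    Assumption: a normal approximation is used for the interval.
    """
    n = len(rewards)
    if n < 2:
        # Assumption: with too little evidence, stay optimistic.
        # Rewards lie in [0, 1], so 1.0 is the largest possible mean.
        return 1.0
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return mean(rewards) + z * stdev(rewards) / math.sqrt(n)


def select_oracles(history, epsilon=0.8, alpha=0.05):
    """Return the ids of oracles whose upper interval passes the threshold.

    Assumption: the threshold is epsilon times the largest upper interval.
    `history` maps an oracle id to the list of 0/1 rewards observed so far.
    """
    ui = {k: upper_interval(r, alpha) for k, r in history.items()}
    cutoff = epsilon * max(ui.values())
    return sorted(k for k, u in ui.items() if u >= cutoff)


if __name__ == "__main__":
    history = {"a": [1] * 6, "b": [0] * 9 + [1], "c": []}
    print(select_oracles(history))
```

Under these assumptions an oracle that has rarely been queried keeps a wide interval and therefore keeps being selected, while an oracle with many low rewards drops below the cutoff and is filtered out.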
Reward of the labelers
• The reward of each labeler is unknown => it needs to be estimated
• Reward of a labeler: eliciting the true label
• The true label is also unknown => it is estimated by the majority vote
• We propose the reward function below:
  reward = 1 if the labeler agrees with the majority label
  reward = 0 otherwise
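The reward function above can be written as a short Python sketch. The function names `majority_label` and `rewards`, the dictionary representation of the collected labels, and the tie-breaking rule are illustrative assumptions; the slide does not specify how ties in the vote are resolved.

```python
from collections import Counter


def majority_label(labels):
    """Most common label among the queried oracles.

    Assumption: ties are broken in favor of the label seen first.
    `labels` maps an oracle id to the label that oracle returned.
    """
    return Counter(labels.values()).most_common(1)[0][0]


def rewards(labels):
    """Reward 1 for each oracle that agrees with the majority label, else 0."""
    majority = majority_label(labels)
    return {k: 1 if y == majority else 0 for k, y in labels.items()}


if __name__ == "__main__":
    print(rewards({"o1": "pos", "o2": "neg", "o3": "pos"}))
```

Because the majority label stands in for the unknown true label, the reward measures agreement with the other queried oracles rather than correctness itself.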
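IEThresh at the Beginning
[Figure: expected reward for each oracle]
IEThresh Oracle Selection
[Figure: expected reward for each oracle with the selection threshold; oracles shown in the order 2, 3, 1, 4, 5]
IE Learning Snapshot II
[Figure: expected reward for each oracle with the selection threshold; oracles shown in the order 4, 2, 3, 5, 1]
IEThresh Instance Selection
[Figure: oracles shown in the order 1, 2, 5, 4, 3]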
Uniform Expert Accuracy ∈ (0.5, 1]
[Figure: classification error]
• Repeated Labeling [Sheng et al., 2008]: querying all experts for labeling
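The Repeated Labeling baseline in this setting can be simulated with a small Python sketch. It assumes binary labels in {0, 1}, independent expert errors, and a simple majority vote over all experts; the helper names `sample_accuracies` and `repeated_labeling` are hypothetical and are not taken from the slides or from [Sheng et al., 2008].

```python
import random


def sample_accuracies(k, rng):
    """Draw k expert accuracies uniformly from (0.5, 1].

    rng.random() lies in [0, 1), so 1 - 0.5 * rng.random() lies in (0.5, 1].
    """
    return [1.0 - 0.5 * rng.random() for _ in range(k)]


def repeated_labeling(true_label, accuracies, rng):
    """Query every expert for a binary label and return the majority vote.

    Assumption: each expert is correct independently with its own accuracy.
    A tie (possible with an even number of experts) resolves to label 0.
    """
    votes = [true_label if rng.random() < a else 1 - true_label for a in accuracies]
    return int(2 * sum(votes) > len(votes))


if __name__ == "__main__":
    rng = random.Random(0)
    accuracies = sample_accuracies(5, rng)
    print(accuracies, repeated_labeling(1, accuracies, rng))
```

Every instance costs one query per expert under this baseline, which is the cost that selective querying of a subset of experts aims to reduce.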
Number of Oracle Queries vs. Accuracy
[Figure legend: first 10 iterations; next 40 iterations; next 100 iterations]
Number of Oracle Queries to Reach a Target Accuracy
[Figure annotations: "better"; "skew increases"]
Results on AMT Data with Human Annotators
• IEThresh reaches the best performance with similar effort to Repeated Labeling
• The Repeated Labeling baseline needs 840 queries in total to reach 0.95 accuracy
[Figure panels: 5 annotators; 6 annotators]
Dataset at http://nlpannotations.googlepages.com/ made available by [Snow et al., 2008]
Conclusions and Future Work
• Conclusions
  • IEThresh is effective in balancing the exploration vs. exploitation tradeoff
  • Early filtering of unreliable labelers boosts performance
  • Utilizing labeler accuracy estimates is more effective than querying all labelers or querying labelers at random
• Future Work
  • from consistent to time-variant labeler quality
  • label noise conditioned on the data instance
  • correlated labeling errors