Actively Transfer Domain Knowledge: Transfer when you can, otherwise ask and don’t stretch it. Xiaoxiao Shi†, Wei Fan‡, Jiangtao Ren†. †Sun Yat-sen University, ‡IBM T. J. Watson Research Center
Standard Supervised Learning: a classifier is trained on labeled training data and applied to unlabeled test data from the same domain (New York Times articles for both), achieving 85.5% accuracy.
In Reality: the labeled training data are insufficient, and accuracy on the New York Times test data drops to 47.3%. How can we improve the performance?
Solution I: Active Learning. A domain expert labels selected in-domain training examples, at a labeling cost, and the resulting classifier reaches 83.4% on the New York Times test data.
Solution II: Transfer Learning. A transfer classifier is trained on labeled out-of-domain data (Reuters) and applied to the unlabeled in-domain test data (New York Times). Because of significant domain differences there is no guarantee that transfer learning helps: accuracy might reach 82.6%, but it can also drop to 43.5%.
Motivation: both approaches have disadvantages, so which should we choose? • Active Learning: labeling cost • Transfer Learning: domain-difference risk
Proposed Solution (AcTraK). The active learner chooses an unlabeled in-domain training instance and passes it to the transfer classifier, which is built from the labeled out-of-domain data (Reuters) together with the labeled in-domain examples. A decision function judges the transfer classifier's prediction: if it is reliable, the instance is labeled by the classifier; if it is unreliable, the domain expert is asked for the label. The newly labeled instances are added to the labeled training data, from which the final classifier for the in-domain test data is trained.
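Read as an algorithm, the diagram above corresponds roughly to the loop sketched below. This is a minimal illustration under stated assumptions, not the authors' implementation; the helper names transfer_classify, query_probability and expert_label are placeholders for the components described on the next slides.

```python
import random

def actrak_loop(unlabeled_pool, out_domain_model, in_domain_labeled,
                transfer_classify, query_probability, expert_label):
    """Illustrative AcTraK-style loop (hypothetical helper names).

    For each unlabeled in-domain instance, the transfer classifier proposes
    a label; the decision function (here a query probability compared with
    a random draw) decides whether to keep that label or ask the expert.
    """
    for x in unlabeled_pool:
        p_pos = transfer_classify(x, out_domain_model, in_domain_labeled)
        y_transfer = +1 if p_pos >= 0.5 else -1
        if random.random() < query_probability(x, p_pos, in_domain_labeled):
            y = expert_label(x)            # unreliable: ask the domain expert
        else:
            y = y_transfer                 # reliable: label by the transfer classifier
        in_domain_labeled.append((x, y))   # grow the labeled in-domain training data
    return in_domain_labeled               # used to train the final in-domain classifier
```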
Transfer Classifier (built through mapping classifiers). The labeled out-of-domain dataset trains an out-of-domain classifier Mo; its labels L+ and L- need not agree with the true in-domain labels, i.e., the true in-domain label of an instance predicted 'L+' may be either '+' or '-'. The few labeled in-domain examples are split by Mo's prediction: the set L+ = { (x, y = +/-) | Mo(x) = 'L+' } trains the mapping classifier ML+, and the set L- = { (x, y = +/-) | Mo(x) = 'L-' } trains ML-. To label an in-domain unlabeled instance X: • Classify X with the out-of-domain classifier Mo to obtain P(L+|X, Mo) and P(L-|X, Mo). • Classify X with the mapping classifiers ML+ and ML- to obtain P(+|X, ML+) and P(+|X, ML-). • The probability that X is '+' is then T(X) = P(+|X) = P(L+|X, Mo) × P(+|X, ML+) + P(L-|X, Mo) × P(+|X, ML-).
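As a rough sketch of this construction (not the authors' code), assuming scikit-learn-style classifiers with fit/predict_proba, NumPy arrays for the in-domain data, out-of-domain labels written 'L+'/'L-' and in-domain labels written +1/-1; all helper names here are illustrative:

```python
def prob_of(model, x, label):
    """P(label | x, model) for a classifier exposing predict_proba/classes_."""
    proba = model.predict_proba([x])[0]
    return proba[list(model.classes_).index(label)]

def build_mapping_classifiers(Mo, X_in, y_in, make_model):
    """Split the few labeled in-domain examples by Mo's prediction and
    train one mapping classifier per out-of-domain label.
    (Assumes each split contains both in-domain classes.)"""
    pred = Mo.predict(X_in)                      # 'L+' / 'L-' from the out-of-domain model
    ML_pos = make_model().fit(X_in[pred == 'L+'], y_in[pred == 'L+'])
    ML_neg = make_model().fit(X_in[pred == 'L-'], y_in[pred == 'L-'])
    return ML_pos, ML_neg

def transfer_probability(x, Mo, ML_pos, ML_neg):
    """T(X) = P(L+|X,Mo) * P(+|X,ML+) + P(L-|X,Mo) * P(+|X,ML-)."""
    return (prob_of(Mo, x, 'L+') * prob_of(ML_pos, x, +1)
            + prob_of(Mo, x, 'L-') * prob_of(ML_neg, x, +1))
```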
Decision Function. When the transfer classifier's prediction is unreliable, ask the domain expert to label the instance instead of using the transfer classifier. The prediction is considered unreliable when there is: a) a conflict between classifiers, b) low confidence, or c) few labeled in-domain examples.
Decision Function. Let T(x) be the prediction of the transfer classifier and ML(x) the prediction of the in-domain classifier. AcTraK asks the domain expert to label the instance with a probability that grows with a) the conflict between T(x) and ML(x), b) the lack of confidence in the prediction, and c) the scarcity of labeled in-domain examples. A random number R in [0, 1] is drawn: if R falls below this probability the instance is labeled by the domain expert, otherwise it is labeled by the transfer classifier.
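The closed-form query probability is not reproduced in this slide text, so the sketch below only illustrates how the three factors and the random draw R interact; query_probability is a placeholder, not the paper's formula.

```python
import random

def decide_label(x, T, ML, n_labeled, query_probability, expert_label):
    """Illustrative decision step (placeholder query_probability).

    T(x): P(+) from the transfer classifier; ML(x): P(+) from the
    in-domain classifier; n_labeled: number of labeled in-domain
    examples. The paper's query probability grows with a) the conflict
    between T(x) and ML(x), b) low confidence, c) small n_labeled.
    """
    p_query = query_probability(T(x), ML(x), n_labeled)
    R = random.random()                       # R: random number in [0, 1]
    if R < p_query:
        return expert_label(x)                # label by the domain expert
    return +1 if T(x) >= 0.5 else -1          # label by the transfer classifier
```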
Properties • It reduces the domain-difference risk: according to Theorem 2, the expected error of the transfer classifier is bounded. • It reduces the labeling cost: according to Theorem 3, the query probability is bounded.
Theorems: a bound on the expected error of the transfer classifier (Theorem 2) and a bound on the maximum size of the query set (Theorem 3); the formulas are not reproduced here.
Experiment Setup
• Data Sets
• Synthetic data sets
• Remote sensing: training data collected from regions with a specific ground surface condition; test data collected from a new region
• Text classification: same top-level classification problems, with different sub-fields in the training and test sets (20 Newsgroups)
• Comparable Models
• Inductive learning: AdaBoost, SVM
• Transfer learning: TrAdaBoost (ICML'07)
• Active learning: ERS (ICML'01)
Experiments on Synthetic Datasets. In-domain: 2 labeled training examples plus the unlabeled test data; out-of-domain: 4 labeled training examples.
Experiments on Real-World Datasets • Evaluation metrics: • compared with transfer learning on accuracy; • compared with active learning on IEA (Integral Evaluation on Accuracy).
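The slide only names IEA; assuming it denotes the area under the accuracy-versus-number-of-queries curve (a reading of "Integral Evaluation on Accuracy", not confirmed by the slide text), it could be computed as a simple trapezoidal integral:

```python
def integral_evaluation_on_accuracy(num_queries, accuracies):
    """Area under the accuracy-vs-#queries curve (assumed reading of IEA).

    num_queries: increasing query counts, e.g. [0, 10, 20, ...]
    accuracies:  accuracy measured after each query count
    """
    area = 0.0
    for i in range(1, len(num_queries)):
        width = num_queries[i] - num_queries[i - 1]
        area += width * (accuracies[i] + accuracies[i - 1]) / 2.0
    return area
```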
20 Newsgroups results: 1. comparison with the transfer learner (TrAdaBoost); 2. comparison with the active learner (ERS).
Conclusions • Actively Transfer Domain Knowledge • Reduce domain difference risk: transfer useful knowledge (Theorem 2) • Reduce labeling cost: query domain experts only when necessary (Theorem 3)