Knowledge Transfer via Multiple Model Local Structure Mapping
Jing Gao, Wei Fan, Jing Jiang, Jiawei Han

Observations
• Each base model may be effective on a subset of the test domain.
• It is hard to select the optimal model since class labels in the test domain are unknown.
[Figure: A Synthetic Example illustrating the observations above.]

Motivation: Transfer Learning
• Ideal setting: training and test data come from the same domain (e.g., a classifier trained on labeled Reuters documents and tested on Reuters reaches 85.5% accuracy).
• Realistic setting: the source domain (e.g., New York Times) and the target domain (e.g., Newsgroup) only partially overlap and may have conflicting concepts, and the target domain is completely unlabeled; performance degrades (64.1% in the illustrated example).

Transfer from Multiple Domains
• Goal: to unify the knowledge that is consistent with the test domain from multiple source domains.

Goal
• To design learning methods that are aware of the training and test domain difference.
Examples
• Spam filtering: public email collection → personal inboxes
• Intrusion detection: existing types of intrusions → unknown types of intrusions
• Sentiment analysis: expert review articles → blog review articles
Related work
• Sample selection bias correction: reweight training examples or transform the representation
• Transfer learning: adapt the classifier to the new domain
• Multi-task learning: share learning among different tasks
New problem
• Learn from multiple source domains and transfer the knowledge to a target domain. Importantly, the target domain does not have any labeled examples (different from some previously proposed methods).

Graph-based Heuristic
• Map the structures of a model onto the structures of the test domain.
• Weight each model locally according to its consistency with the neighborhood structure around the test example.
Assumption
• Test examples that are closer in the feature space are more likely to share the same class label.

Framework: Locally Weighted Ensemble (LWE)
• Each of the k training sets yields a base classifier C1, …, Ck; for every test example x, the classifiers' predictions are combined with locally determined weights.

Determine Weights
• If the true label function f were known, the optimal per-model weights could be obtained by solving a regression problem; but the groundtruth f is unknown.
• Heuristic: the weight of a model at x is proportional to the similarity between its neighborhood graph and the clustering structure around x; a model whose graph agrees with the local clustering structure receives a higher weight.

Local Structure Based Adjustment
• What if no model is similar to the clustering structure at x? This simply means that the training information conflicts with the true target distribution at x.
• Solution: ignore the training information and propagate the labels of neighbors in the test set to x (a sketch combining both steps follows below).
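The sketch below illustrates how the two pieces above can fit together: graph-similarity-based local weighting plus the local-structure-based adjustment. It is a minimal, illustrative sketch, not the authors' implementation: the function names (local_weight, lwe_predict, knn_indices), the agreement-fraction similarity, and the threshold tau are simplifications introduced here; the base models are assumed to be scikit-learn-style classifiers exposing predict/predict_proba, and cluster_labels can come from any clustering of the unlabeled test set (e.g., k-means).

```python
import numpy as np

def knn_indices(X, x, k):
    """Indices of the k nearest rows of X to the point x (Euclidean distance)."""
    d = np.linalg.norm(X - x, axis=1)
    return np.argsort(d)[1:k + 1]          # position 0 is x itself when x is a row of X

def local_weight(model, X_test, cluster_labels, x_idx, k=10):
    """Crude proxy for the graph-similarity heuristic: how often the model's
    predicted neighborhood structure around x agrees with the clustering
    structure of the unlabeled test set."""
    nbrs = knn_indices(X_test, X_test[x_idx], k)
    preds = model.predict(X_test[np.append(nbrs, x_idx)])
    same_model = preds[:-1] == preds[-1]                    # neighbors the model groups with x
    same_cluster = cluster_labels[nbrs] == cluster_labels[x_idx]  # neighbors clustered with x
    return float(np.mean(same_model == same_cluster))

def lwe_predict(models, X_test, cluster_labels, x_idx, k=10, tau=0.5):
    """Locally weighted ensemble prediction for one test example, with a
    fallback to the neighbors when no model fits the local structure."""
    weights = np.array([local_weight(m, X_test, cluster_labels, x_idx, k)
                        for m in models])
    if weights.max() <= tau:
        # Training information conflicts with the target distribution at x:
        # ignore the per-model weights and use the neighbors' averaged soft
        # labels instead (a stand-in for the label-propagation step above).
        nbrs = knn_indices(X_test, X_test[x_idx], k)
        nbr_probs = np.mean([m.predict_proba(X_test[nbrs]).mean(axis=0)
                             for m in models], axis=0)
        return int(np.argmax(nbr_probs))
    weights = weights / weights.sum()
    probs = sum(w * m.predict_proba(X_test[x_idx:x_idx + 1])[0]
                for w, m in zip(weights, models))
    return int(np.argmax(probs))
```

In this sketch the weights are recomputed per test example, which is what "locally weighted" refers to: a base model only influences the prediction at x in proportion to how well its decision structure matches the unsupervised structure of the test data around x.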
[Figure: weighting example. Around test example x, the neighborhood graph of classifier C1 agrees with the clustering (groundtruth-like) structure more closely than that of C2, so C1 receives the higher local weight.]

Experiments

Data Sets
• Synthetic data sets
• Spam filtering: public email collection → personal inboxes (u01, u02, u03) (ECML/PKDD 2006)
• Text classification: same top-level classification problems with different sub-fields in the training and test sets (Newsgroup, Reuters)
• Intrusion detection: two types of intrusions → a different type of intrusion (KDD Cup'99 data)

Baseline Methods
• Single models: Winnow (WNN), Logistic Regression (LRR), Support Vector Machine (SVM)
• Simple model averaging ensemble (SMA)
• Semi-supervised learning model: Transductive SVM (TSVM)

[Figures: experiments on synthetic data, text data, and intrusion data; parameter sensitivity.]

Take-away Messages
• The locally weighted ensemble framework transfers useful knowledge from multiple source domains, and the graph-based heuristic makes the framework practical and effective.
• LWE beats the baselines in terms of prediction accuracy.

Codes and datasets available at http://ews.uiuc.edu/~jinggao3/kdd08transfer.htm