160 likes | 247 Views
Knowledge Transfer via Multiple Model Local Structure Mapping. Jing Gao† Wei Fan‡ Jing Jiang†Jiawei Han† †University of Illinois at Urbana-Champaign ‡IBM T. J. Watson Research Center. Standard Supervised Learning. training (labeled). test (unlabeled). Classifier. 85.5%. New York Times.
E N D
Knowledge Transfer via Multiple Model Local Structure Mapping Jing Gao† Wei Fan‡ Jing Jiang†Jiawei Han† †University of Illinois at Urbana-Champaign ‡IBM T. J. Watson Research Center
Standard Supervised Learning training (labeled) test (unlabeled) Classifier 85.5% New York Times New York Times
In Reality…… training (labeled) test (unlabeled) Classifier 64.1% Labeled data not available! Reuters New York Times New York Times
Domain Difference Performance Drop train test ideal setting Classifier NYT NYT 85.5% New York Times New York Times realistic setting Classifier NYT Reuters 64.1% Reuters New York Times
Other Examples • Spam filtering • Public email collection personal inboxes • Intrusion detection • Existing types of intrusions unknown types of intrusions • Sentiment analysis • Expert review articles blog review articles • The aim • To design learning methods that are aware of the training and test domain difference • Transfer learning • Adapt the classifiers learnt from the source domain to the new domain
All Sources of Labeled Information test (completely unlabeled) training (labeled) Reuters Classifier ? …… New York Times Newsgroup
A Synthetic Example Training (have conflicting concepts) Test Partially overlapping
Goal Source Domain Source Domain Target Domain Source Domain • To unify knowledge that are consistent with the test domain from multiple source domains
Summary of Contributions • Transfer from multiple source domains • Target domain has no labeled examples • Do not need to re-train • Rely on base models trained from each domain • The base models are not necessarily developed for transfer learning applications
Locally Weighted Ensemble Training set 1 C1 X-feature value y-class label Training set 2 C2 Test example x …… …… Training set k Ck
Optimal Local Weights Higher Weight 0.9 0.1 C1 Test example x 0.8 0.2 0.4 0.6 C2 • Optimal weights • Solution to a regression problem • Impossible to get since f is unknown!
Graph-based Heuristics Higher Weight • Graph-based weights approximation • Map the structures of a model onto the structures of the test domain • Weight of a model is proportional to the similarity between its neighborhood graph and the clustering structure around x.
Experiments Setup • Data Sets • Synthetic data sets • Spam filtering: public email collection personal inboxes (u01, u02, u03) (ECML/PKDD 2006) • Text classification: same top-level classification problems with different sub-fields in the training and test sets (Newsgroup, Reuters) • Intrusion detection data: different types of intrusions in training and test sets. • Baseline Methods • One source domain: single models (WNN, LR, SVM) • Multiple source domains: SVM on each of the domains • Merge all source domains into one: ALL • Simple averaging ensemble: SMA • Locally weighted ensemble: LWE
Conclusions • Locally weighted ensemble framework • transfer useful knowledge from multiple source domains • Graph-based heuristics to compute weights • Make the framework practical and effective