
Relaxed Transfer of Different Classes via Spectral Partition


Presentation Transcript


  1. Relaxed Transfer of Different Classes via Spectral Partition • Unsupervised • Can use data with different classes to help. How so? Xiaoxiao Shi¹, Wei Fan², Qiang Yang³, Jiangtao Ren⁴. ¹University of Illinois at Chicago; ²IBM T. J. Watson Research Center; ³Hong Kong University of Science and Technology; ⁴Sun Yat-sen University

  2. What is Transfer Learning? Standard supervised learning: a classifier is trained on labeled New York Times articles and tested on unlabeled New York Times articles, reaching 85.5% accuracy.

  3. What is Transfer Learning? In reality, labeled data are insufficient: with too few labeled training examples, accuracy on the unlabeled New York Times test set drops to 47.3%. How can we improve the performance?

  4. What is Transfer Learning? Transfer learning: labeled training data from a source domain (e.g., Reuters) help classify an unlabeled target domain (e.g., the New York Times), raising accuracy to 82.6%. The source is not necessarily from the same domain as the target and need not follow the same distribution.

  5. Transfer across Different Class Labels Source domain: training (labeled); target domain: test (unlabeled). Since they are from different domains, they may have different class labels! For example, one corpus is labeled World, U.S., Fashion & Style, Travel, … while the other is labeled Markets, Politics, Entertainment, Blogs, … How to transfer when the class labels are different in both number and meaning?

  6. Two Main Categories of Transfer Learning • Unsupervised transfer learning • No labeled data from the target domain. • Use the source domain to help learning. • Question: is it better than clustering? • Supervised transfer learning • A limited number of labeled examples from the target domain. • Question: is it better than not using any source data at all?

  7. Transfer across Different Class Labels • Two sub-problems: • (1) What and how to transfer, since we cannot explicitly use P(x|y) or P(y|x) to build the similarity among tasks (the class labels y have different meanings)? • (2) How to avoid negative transfer, since the tasks may come from very different domains? Negative transfer: when the tasks are too different, transfer learning may hurt learning accuracy.

  8. The proposed solution • (1) What and how to transfer? • Transfer the eigenspace. When a dataset exhibits complex cluster shapes, k-means performs very poorly in the input space due to its bias toward dense spherical clusters. In the eigenspace (the space spanned by a set of eigenvectors), the clusters are trivial to separate; this is the insight behind spectral clustering, sketched in code below.
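A minimal sketch of the contrast the slide draws, using scikit-learn and the two-moons toy data as an illustrative stand-in (not the paper's datasets; function and parameter choices here are ours):

```python
# Why clustering in the eigenspace helps: k-means vs. spectral clustering.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, SpectralClustering

X, y = make_moons(n_samples=300, noise=0.05, random_state=0)

# K-means in the input space: biased toward dense spherical clusters,
# so it cuts each moon in half.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Spectral clustering: embeds the data into the space spanned by the top
# eigenvectors of the graph Laplacian, where the two moons separate easily.
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0).fit_predict(X)

def agreement(pred, truth):
    # Cluster labels are arbitrary, so score the better of the two matchings.
    acc = np.mean(pred == truth)
    return max(acc, 1 - acc)

print("k-means agreement:  %.2f" % agreement(km, y))   # typically ~0.75
print("spectral agreement: %.2f" % agreement(sc, y))   # typically ~1.00
```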

  9. The proposed solution • (2) How to avoid negative transfer? • A new clustering-based KL divergence reflects the distribution differences. • If the distributions are too different (the KL divergence is large), automatically decrease the effect of the source domain. The traditional KL divergence, KL(P||Q) = Σ_x P(x) log(P(x)/Q(x)), needs P(x) and Q(x) for every x, which is normally difficult to obtain. • To get the clustering-based KL divergence instead: • Perform clustering on the combined dataset. • Calculate the KL divergence from basic statistical properties of the clusters (see the example on the next slide).

  10. An Example The combined dataset has 15 examples: 8 from P (so E(P) = 8/15) and 7 from Q (E(Q) = 7/15). Clustering splits it into C1 (3 examples from P, 3 from Q) and C2 (5 from P, 4 from Q). The per-cluster portions of the combined data are P'(C1) = 3/15, Q'(C1) = 3/15, P'(C2) = 5/15, Q'(C2) = 4/15. S(P', C) denotes the portion of cluster C's examples that come from P, so S(P', C1) = 0.5, S(Q', C1) = 0.5, S(P', C2) = 5/9, S(Q', C2) = 4/9. Plugging these cluster statistics into the divergence estimate gives KL = 0.0309.
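The slide's bookkeeping is easy to follow in code. Below is a minimal sketch assuming the simplest per-cluster estimator, KL ≈ Σ_c p_c log(p_c/q_c), where p_c is the fraction of P's examples falling in cluster c; the paper's exact per-cluster statistic is not recoverable from this transcript, so this illustrative estimator will not necessarily reproduce the slide's 0.0309 (function name and weighting are ours):

```python
import numpy as np

def clustering_based_kl(cluster_ids, origin):
    """Estimate KL(P || Q) from a shared clustering of the combined data.

    cluster_ids: cluster label of each example in the combined dataset.
    origin:      "P" or "Q" for each example.
    Uses p_c = |P in c| / |P| and q_c = |Q in c| / |Q| per cluster c.
    Assumption: this is one plausible reading, not the paper's formula.
    """
    n_p = sum(1 for o in origin if o == "P")
    n_q = len(origin) - n_p
    kl = 0.0
    for c in set(cluster_ids):
        members = [o for cid, o in zip(cluster_ids, origin) if cid == c]
        p_in_c = sum(1 for o in members if o == "P")
        p_c = p_in_c / n_p
        q_c = (len(members) - p_in_c) / n_q
        if p_c > 0 and q_c > 0:
            kl += p_c * np.log(p_c / q_c)
    return kl

# The slide's clusters: C1 = {3 from P, 3 from Q}, C2 = {5 from P, 4 from Q}.
ids = [1] * 6 + [2] * 9
src = ["P"] * 3 + ["Q"] * 3 + ["P"] * 5 + ["Q"] * 4
print(round(clustering_based_kl(ids, src), 4))  # ~0.0059 nats with this estimator
```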

  11. Objective Function • Objective: find an eigenspace that well separates the target data. • Intuition: if the source data is similar to the target data, make good use of the source eigenspace; otherwise, keep the original structure of the target data. The objective combines a traditional normalized-cut term with a penalty term, balanced by R(L; U): the more similar the two distributions, the smaller R(L; U) is, and the more the objective relies on the source eigenspace; when they differ, it prefers the target's original structure. A schematic of this trade-off is sketched below.
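The slide's formula did not survive extraction; the following LaTeX is a schematic only, with illustrative symbols of our choosing (v for the target partition vector, V_src for the source eigenspace, λ a trade-off constant), not the paper's exact notation:

```latex
% Schematic of the trade-off: a normalized-cut term on the target graph,
% plus a penalty that pulls the partition vector v toward the source
% eigenspace V_src. The penalty weight shrinks as the divergence R(L;U)
% grows, so dissimilar source data contributes little (symbols assumed).
\min_{v}\; \underbrace{\mathrm{NCut}(v)}_{\text{target structure}}
  \;+\; \frac{\lambda}{R(L;U)}\,
  \underbrace{\bigl\| v - V_{\mathrm{src}} V_{\mathrm{src}}^{\top} v \bigr\|^{2}}_{\text{distance from source eigenspace}}
```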

  12. How to construct the constraints TL and TU? • Principle: • TL is derived directly from the must-link constraint: examples with the same label should be together. • TU: (1) perform standard spectral clustering (e.g., Ncut) on U; (2) examples in the same cluster should be together. For instance, the labels might say that examples 1, 2, 4 belong together (blue) and 3, 5, 6 belong together (red), while clustering U might say that 1, 2, 3 belong together and 4, 5, 6 belong together.

  13. How to construct the constraints TL and TU? • Construct the constraint matrix M = [m1, m2, …, mr]^T, one row per must-link pair. For example, the pair (1, 2) gives the row (1, -1, 0, 0, 0, 0), the pair (1, 4) gives (1, 0, 0, -1, 0, 0), and the pair (3, 5) gives (0, 0, 1, 0, -1, 0), and so on. A small sketch of this construction follows.
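A minimal sketch of the row construction shown on the slide (the helper name is ours; indices are 1-based to match the slide's example):

```python
import numpy as np

def constraint_matrix(pairs, n):
    """Build M = [m_1, ..., m_r]^T from must-link pairs: each row has +1 at
    one example of the pair and -1 at the other, so requiring M f = 0 forces
    linked examples onto the same side of the partition."""
    M = np.zeros((len(pairs), n))
    for row, (i, j) in enumerate(pairs):
        M[row, i - 1] = 1.0
        M[row, j - 1] = -1.0
    return M

# The slide's rows: pairs (1, 2), (1, 4), (3, 5) over 6 examples.
print(constraint_matrix([(1, 2), (1, 4), (3, 5)], n=6))
```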

  14. Experiment: data sets (summary table in the original slides)

  15. Experiment: data sets, continued (second summary table in the original slides)

  16. Text Classification • Target task comp1 vs rec1, with source data: (1) comp2 vs rec2; (2) 4 classes (Graphics, etc.); (3) 3 classes (crypt, etc.). • Target task org1 vs people1, with source data: (1) org2 vs people2; (2) 3 classes (Places, etc.); (3) 3 classes (crypt, etc.).

  17. Image Classification • Target task Homer vs Real Bear, with source data: (1) Superman vs Teddy; (2) 3 classes (cartman, etc.); (3) 4 classes (laptop, etc.). • Target task Cartman vs Fern, with source data: (1) Superman vs Bonsai; (2) 3 classes (homer, etc.); (3) 4 classes (laptop, etc.).

  18. Parameter Sensitivity (plots in the original slides)

  19. Conclusions • Problem: transfer across tasks with different class labels. • Two sub-problems: • (1) What and how to transfer? • Transfer the eigenspace. • (2) How to avoid negative transfer? • Propose an effective clustering-based KL divergence; if the KL divergence is large, i.e., the distributions are too different, decrease the effect of the source domain.

  20. Thanks! Datasets and code: http://www.cs.columbia.edu/~wfan/software.htm

  21. How many clusters? Condition for Lemma 1 to be valid: in each cluster, the expected values of the target and source data are about the same, i.e., their difference stays below a small threshold close to 0. Adaptively control the number of clusters to guarantee Lemma 1 stays valid: stop bisecting clustering when a cluster contains only target or only source data, or when the difference between the expected values within the cluster falls below that threshold. A sketch follows below.
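A minimal sketch of this adaptive bisecting procedure, assuming 2-means as the bisection step and a Euclidean distance between per-cluster means (the function name, the eps threshold, and these choices are ours, not the paper's):

```python
import numpy as np
from sklearn.cluster import KMeans

def bisect(X, origin, eps=0.05):
    """Adaptively control the number of clusters: keep bisecting a cluster
    (via 2-means) until it contains only source or only target examples,
    or the source/target means inside it nearly agree (gap below eps).
    origin: array with 0 for source examples and 1 for target examples.
    Returns a list of index arrays, one per final cluster."""
    stack, done = [np.arange(len(X))], []
    while stack:
        idx = stack.pop()
        src, tgt = idx[origin[idx] == 0], idx[origin[idx] == 1]
        # Stop: the cluster is pure, or the expected values already agree.
        if len(src) == 0 or len(tgt) == 0 or \
           np.linalg.norm(X[src].mean(0) - X[tgt].mean(0)) <= eps:
            done.append(idx)
            continue
        halves = KMeans(n_clusters=2, n_init=10).fit_predict(X[idx])
        if halves.min() == halves.max():   # degenerate split; stop here
            done.append(idx)
            continue
        stack.append(idx[halves == 0])
        stack.append(idx[halves == 1])
    return done

# Usage: X is the combined data; origin flags 0 = source, 1 = target.
# clusters = bisect(X, np.asarray(origin_flags), eps=0.05)
```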

  22. Optimization (the slide's substitution "Let …, then …" and the algorithm flow are equations and a figure not recoverable from this transcript; see the original slides and paper)
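Since the derivation itself is lost, here is a generic sketch of the standard normalized-cut relaxation that spectral partition methods solve, offered as a stand-in for the missing equations rather than the paper's exact algorithm (function name and sign-based split are our choices):

```python
import numpy as np
from scipy.linalg import eigh

def ncut_partition(W):
    """Generic spectral-partition step: solve the generalized eigenproblem
    (D - W) v = lambda * D v and split on the sign of the second-smallest
    eigenvector (the Fiedler vector). Assumes W is a symmetric affinity
    matrix and every node has positive degree, so D is invertible."""
    D = np.diag(W.sum(axis=1))
    L = D - W                      # unnormalized graph Laplacian
    vals, vecs = eigh(L, D)        # generalized eigenproblem, ascending order
    fiedler = vecs[:, 1]           # second-smallest eigenvector
    return fiedler >= 0            # boolean two-way partition of the nodes
```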
