
Semi-supervised Learning

Semi-supervised Learning. Rong Jin. Semi-supervised learning. Label propagation Transductive learning Co-training Active learning. Label Propagation. A toy problem Each node in the graph is an example Two examples are labeled Most examples are unlabeled


Presentation Transcript


  1. Semi-supervised Learning Rong Jin

  2. Semi-supervised learning • Label propagation • Transductive learning • Co-training • Active learning

  3. Label Propagation • A toy problem • Each node in the graph is an example • Two examples are labeled • Most examples are unlabeled • Compute the similarity $S_{ij}$ between examples • Connect each example to its most similar examples • How can labels be predicted for the unlabeled nodes using this graph? [Figure: graph with two labeled examples, unlabeled examples, and edge weights $w_{ij}$]

  4. Label Propagation • Forward propagation

  5. Label Propagation • Forward propagation • Forward propagation

  6. Label Propagation • Forward propagation • Forward propagation • Forward propagation • How to resolve conflicting cases: what label should be given to this node?

  7. Label Propagation • Let $S$ be the similarity matrix, $S = [S_{i,j}]_{n \times n}$ • Let $D$ be a diagonal matrix with $D_{i,i} = \sum_{j \ne i} S_{i,j}$ • Compute the normalized similarity matrix $S' = D^{-1/2} S D^{-1/2}$ • Let $Y$ be the initial assignment of class labels • $Y_i = 1$ when the i-th node is assigned to the positive class • $Y_i = -1$ when the i-th node is assigned to the negative class • $Y_i = 0$ when the i-th node is not initially labeled • Let $F$ be the predicted class labels • The i-th node is assigned to the positive class if $F_i > 0$ • The i-th node is assigned to the negative class if $F_i < 0$
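
The normalization step on this slide can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `normalize_similarity`, the 4-node chain graph, and the choice to zero the diagonal and to give isolated nodes a zero row are assumptions, not part of the slides.

```python
import numpy as np

def normalize_similarity(S):
    """Return S' = D^{-1/2} S D^{-1/2}, with D_ii = sum over j != i of S_ij."""
    S = np.asarray(S, dtype=float).copy()
    np.fill_diagonal(S, 0.0)                 # assumption: drop self-similarity (the j != i sum)
    d = S.sum(axis=1)                        # diagonal entries of D
    inv_sqrt = np.zeros_like(d)
    nonzero = d > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(d[nonzero])   # isolated nodes get 0 instead of inf
    return S * inv_sqrt[:, None] * inv_sqrt[None, :]

# Hypothetical toy graph: a chain 0 - 1 - 2 - 3 with unit similarities.
S = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
S_norm = normalize_similarity(S)
```

For a symmetric, non-negative $S$, the eigenvalues of the resulting matrix lie in $[-1, 1]$, which is what makes the propagation series well behaved for a weight smaller than 1.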

  9. Label Propagation • One iteration • $F = Y + \alpha S' Y = (I + \alpha S') Y$ • $\alpha$ weights the propagation values • Two iterations • $F = Y + \alpha S' Y + \alpha^2 S'^2 Y = (I + \alpha S' + \alpha^2 S'^2) Y$ • What about infinitely many iterations? $F = \left(\sum_{n=0}^{\infty} \alpha^n S'^n\right) Y = (I - \alpha S')^{-1} Y$ • Any problems with such an approach?
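
The closed form for infinitely many iterations can be tried on a small graph. The sketch below is a minimal illustration, assuming a symmetric similarity matrix with zero diagonal and no isolated nodes, and a propagation weight between 0 and 1 so that the series converges; the function name `propagate_labels` and the 6-node chain are made up for the example.

```python
import numpy as np

def propagate_labels(S, Y, alpha=0.5):
    """Closed-form label propagation: F = (I - alpha * S')^{-1} Y."""
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)                        # assumes zero diagonal and no isolated nodes
    S_norm = S / np.sqrt(np.outer(d, d))     # S' = D^{-1/2} S D^{-1/2}
    n = S.shape[0]
    # Solve the linear system rather than forming the inverse explicitly.
    return np.linalg.solve(np.eye(n) - alpha * S_norm, np.asarray(Y, dtype=float))

# Hypothetical toy graph: a 6-node chain; node 0 is labeled +1, node 5 is labeled -1.
S = np.zeros((6, 6))
for i in range(5):
    S[i, i + 1] = S[i + 1, i] = 1.0
Y = np.array([1, 0, 0, 0, 0, -1])

F = propagate_labels(S, Y, alpha=0.5)
labels = np.sign(F)                          # positive class if F_i > 0, negative if F_i < 0
```

On this chain the three nodes nearest the positive example receive positive scores and the other three negative scores, with magnitudes that shrink with distance from the labeled node.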

  10. Label Consistency Problem • Predicted vector F may not be consistent with the initially assigned class labels Y

  11. Energy Minimization • Using the same notation • $S_{i,j}$: similarity between the i-th node and the j-th node • $Y$: initially assigned class labels • $F$: predicted class labels • Energy: $E(F) = \sum_{i,j} S_{i,j} (F_i - F_j)^2$ • Goal: find a label assignment $F$ that is consistent with the labeled examples $Y$ and at the same time minimizes the energy function $E(F)$
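
To make the energy concrete, the following sketch evaluates it for two labelings of a small graph. The chain graph and the two label vectors are hypothetical, and the sum is taken over all ordered pairs $(i, j)$, which is an assumption about how the slide's double sum is meant.

```python
import numpy as np

def energy(S, F):
    """E(F) = sum over all ordered pairs (i, j) of S_ij * (F_i - F_j)^2."""
    S = np.asarray(S, dtype=float)
    F = np.asarray(F, dtype=float)
    diff = F[:, None] - F[None, :]           # matrix of pairwise differences F_i - F_j
    return float(np.sum(S * diff ** 2))

# Hypothetical toy graph: a chain 0 - 1 - 2 - 3 with unit similarities.
S = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

smooth = np.array([1, 1, -1, -1])            # labels change across one edge only
rough = np.array([1, -1, 1, -1])             # labels change across every edge

e_smooth = energy(S, smooth)
e_rough = energy(S, rough)
```

The labeling that changes sign across a single edge has lower energy than the one that alternates along the chain, which is the sense in which low energy means neighboring nodes agree.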

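  12. Harmonic Function • $E(F) = \sum_{i,j} S_{i,j} (F_i - F_j)^2 \propto F^\top (D - S) F$ (for symmetric $S$ the sum over ordered pairs equals $2 F^\top (D - S) F$; the constant does not change the minimizer) • Thus, the minimizer of $E(F)$ should satisfy $(D - S) F = 0$ while remaining consistent with $Y$ • $F^\top = (F_l^\top, F_u^\top)$, $Y^\top = (Y_l^\top, Y_u^\top)$ • $F_l = Y_l$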
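
One common way to combine the two requirements is to clamp $F_l = Y_l$ and impose the rows of $(D - S)F = 0$ only for the unlabeled nodes, which gives a linear system for $F_u$. The sketch below follows that reading; it is an assumption about how the slide continues, and the function name `harmonic_solution` and the 5-node chain are hypothetical.

```python
import numpy as np

def harmonic_solution(S, labeled_idx, y_labeled):
    """Clamp F_l = Y_l and solve the unlabeled rows of (D - S) F = 0 for F_u."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    y_labeled = np.asarray(y_labeled, dtype=float)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)

    L = np.diag(S.sum(axis=1)) - S                     # graph Laplacian D - S
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]

    F = np.zeros(n)
    F[labeled_idx] = y_labeled                         # consistency with Y on labeled nodes
    # Unlabeled rows: L_uu F_u + L_ul Y_l = 0.
    # L_uu is invertible when every connected component contains a labeled node.
    F[unlabeled_idx] = np.linalg.solve(L_uu, -L_ul @ y_labeled)
    return F

# Hypothetical toy graph: a 5-node chain; node 0 is labeled +1, node 4 is labeled -1.
S = np.zeros((5, 5))
for i in range(4):
    S[i, i + 1] = S[i + 1, i] = 1.0

F = harmonic_solution(S, labeled_idx=[0, 4], y_labeled=[1.0, -1.0])
```

On the chain, each unlabeled value ends up equal to the average of its neighbors, so the solution interpolates linearly between the two labeled endpoints.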

  13. Optical Character Recognition • Given an image of a digit, determine its value • Create a graph over the digit images [Figure: sample digit images "2" and "1"]

  14. Optical Character Recognition • #Labeled_Examples + #Unlabeled_Examples = 4000 • CMN: label propagation • 1NN: for each unlabeled example, use the label of its closest labeled neighbor

  15. Spectral Graph Transducer • Problem with harmonic function • Why could this happen? • The condition $(D - S) F = 0$ does not hold for constrained cases

  17. Spectral Graph Transducer • $\min_F \; F^\top L F + c\,(F - Y)^\top C (F - Y)$ s.t. $F^\top F = n$, $F^\top e = 0$ • $C$ is the diagonal cost matrix, $C_{i,i} = 1$ if the i-th node is initially labeled, zero otherwise • Parameter $c$ controls the balance between the consistency requirement and the requirement of energy minimization • Can be solved efficiently by computing eigenvectors

  18. Empirical Studies

  19. Green’s Function • The problem of minimizing the energy while staying consistent with the initially assigned class labels can be formulated as a Green’s function problem • Minimizing $E(F) = F^\top L F$ $\Rightarrow$ $LF = 0$ • It turns out that $L$ can be viewed as the Laplacian operator in the discrete case • $LF = 0 \Leftrightarrow \nabla^2 F = 0$ • Thus, our problem is to find a solution $F$ of $\nabla^2 F = 0$ s.t. $F = Y$ for labeled examples • We can treat the constraint $F = Y$ for labeled examples as a boundary condition (a Dirichlet boundary condition, since it fixes the values of $F$ on the boundary) • A standard Green’s function problem

  20. Why Energy Minimization? Final classification results

  21. Cluster Assumption • Cluster assumption • The decision boundary should pass through low-density areas • Unlabeled data provide a more accurate estimate of local density

  22. Cluster Assumption vs. Maximum Margin • Maximum margin classifier (e.g. SVM): $w \cdot x + b$ • Maximum margin → low density around the decision boundary → cluster assumption • Any thoughts about utilizing the unlabeled data in a support vector machine? [Figure legend: one marker denotes +1, the other denotes -1]

  23. Transductive SVM • Decision boundary given a small number of labeled examples

  24. Transductive SVM • Decision boundary given a small number of labeled examples • How will the decision boundary change given both labeled and unlabeled examples?

  25. Transductive SVM • Decision boundary given a small number of labeled examples • Move the decision boundary to a place with low local density

  26. Transductive SVM • Decision boundary given a small number of labeled examples • Move the decision boundary to a place with low local density • Classification results • How to formulate this idea?

  27. Transductive SVM: Formulation • Labeled data L: • Unlabeled data D: • Maximum margin principle for mixture of labeled and unlabeled data • For each label assignment of unlabeled data, compute its maximum margin • Find the label assignment whose maximum margin is maximized

  28. Transductive SVM • Different label assignments for unlabeled data → different maximum margins

  29. Transductive SVM: Formulation • Original SVM • Transductive SVM • Constraints for unlabeled data • A binary variable for the label of each example
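
For reference, a commonly used soft-margin formulation of the transductive SVM is written out below. It is offered as an assumption about what the slide shows, not a transcription of it. Here $(x_i, y_i)$, $i = 1, \dots, n$, are the labeled examples, $x_{n+1}, \dots, x_{n+m}$ are the unlabeled examples, and $C$, $C^{*}$ are hypothetical trade-off parameters.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,y_{n+1},\dots,y_{n+m}} \quad
  & \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i + C^{*} \sum_{j=n+1}^{n+m} \xi_j \\
\text{s.t.} \quad
  & y_i\,(w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1,\dots,n
    && \text{(original SVM)} \\
  & y_j\,(w \cdot x_j + b) \ge 1 - \xi_j, \quad \xi_j \ge 0, \quad j = n+1,\dots,n+m
    && \text{(constraints for unlabeled data)} \\
  & y_j \in \{-1, +1\}, \quad j = n+1,\dots,n+m
    && \text{(binary label variables)}
\end{aligned}
```

Dropping the last two lines and the $C^{*}$ term recovers the standard soft-margin SVM; the binary variables $y_j$ are what make the joint problem combinatorial.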

  30. Computational Issue • No longer a convex optimization problem (why?) • How to optimize the transductive SVM? • Alternating optimization

  31. Alternating Optimization • Step 1: fix $y_{n+1}, \ldots, y_{n+m}$, learn weights $w$ • Step 2: fix weights $w$, try to predict $y_{n+1}, \ldots, y_{n+m}$ (How?)
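
The two steps can be sketched as a simple loop. Everything in this sketch is an assumption made for illustration: the linear SVM is trained by plain subgradient descent on the hinge loss as a stand-in for a real solver, step 2 uses the sign of $w \cdot x + b$ (the label that minimizes the hinge loss once $w$ is fixed), and the helper names and toy data are hypothetical. Plain alternation like this can stall in a local optimum close to its initial labeling.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Subgradient descent on 0.5*||w||^2 + C * sum of hinge losses (stand-in solver)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1                     # examples inside the margin
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(w, b, X):
    return np.where(X @ w + b >= 0, 1.0, -1.0)

def alternating_tsvm(X_lab, y_lab, X_unl, rounds=10):
    w, b = train_linear_svm(X_lab, y_lab)              # initialise from labeled data only
    y_unl = predict(w, b, X_unl)
    for _ in range(rounds):
        # Step 1: fix the labels of the unlabeled examples, learn w.
        X_all = np.vstack([X_lab, X_unl])
        y_all = np.concatenate([y_lab, y_unl])
        w, b = train_linear_svm(X_all, y_all)
        # Step 2: fix w, re-predict the labels of the unlabeled examples.
        new_y = predict(w, b, X_unl)
        if np.array_equal(new_y, y_unl):               # labels stopped changing
            break
        y_unl = new_y
    return w, b, y_unl

# Hypothetical toy data: two well-separated clusters, one labeled example per class.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(10, 2))
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(10, 2))
X_unl = np.vstack([X_pos, X_neg])
y_true = np.concatenate([np.ones(10), -np.ones(10)])
X_lab = np.array([[2.0, 2.0], [-2.0, -2.0]])
y_lab = np.array([1.0, -1.0])

w, b, y_unl = alternating_tsvm(X_lab, y_lab, X_unl)
```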

  32. Empirical Study with Transductive SVM • 10 categories from the Reuters collection • 3299 test documents • 1000 informative words selected using the mutual information (MI) criterion

  33. Co-training for Semi-supervised Learning • Consider the task of classifying web pages into two categories: category for students and category for professors • Two aspects of web pages should be considered • Content of web pages • “I am currently the second year Ph.D. student …” • Hyperlinks • “My advisor is …” • “Students: …”

  34. Co-training for Semi-Supervised Learning

  35. Co-training for Semi-Supervised Learning • It is easier to classify this web page using hyperlinks • It is easy to classify the type of this web page based on its content

  36. Co-training • Two representations for each web page • Content representation: (doctoral, student, computer, university, …) • Hyperlink representation: Inlinks: Prof. Cheng; Outlinks: Prof. Cheng

  37. Co-training: Classification Scheme • Train a content-based classifier using labeled web pages • Apply the content-based classifier to classify unlabeled web pages • Label the web pages that have been confidently classified • Train a hyperlink-based classifier using the web pages that were initially labeled together with those labeled by the content-based classifier • Apply the hyperlink-based classifier to classify the unlabeled web pages • Label the web pages that have been confidently classified
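
The scheme above can be sketched as a short loop over the two views. This is a minimal sketch under stated assumptions: a nearest-centroid classifier stands in for both the content-based and the hyperlink-based classifier, confidence is the gap between the distances to the two centroids, and the function names, the threshold, and the six toy pages are all hypothetical.

```python
import numpy as np

def fit_centroids(X, y):
    """Stand-in classifier: one centroid per class."""
    return X[y == 1].mean(axis=0), X[y == -1].mean(axis=0)

def predict_with_confidence(centroids, X):
    c_pos, c_neg = centroids
    d_pos = np.linalg.norm(X - c_pos, axis=1)
    d_neg = np.linalg.norm(X - c_neg, axis=1)
    score = d_neg - d_pos                    # > 0 means closer to the positive centroid
    return np.where(score >= 0, 1, -1), np.abs(score)

def co_train(views, y, labeled_mask, threshold=1.0, rounds=5):
    """views = [X_content, X_link]; y is only trusted where labeled_mask is True."""
    y = y.copy()
    labeled = labeled_mask.copy()
    for _ in range(rounds):
        for X in views:                      # content view first, then hyperlink view
            if labeled.all():
                return y, labeled
            model = fit_centroids(X[labeled], y[labeled])
            pred, conf = predict_with_confidence(model, X)
            confident = (~labeled) & (conf >= threshold)
            y[confident] = pred[confident]   # label only confidently classified pages
            labeled |= confident
    return y, labeled

# Hypothetical toy pages (+1 = student, -1 = professor), one feature per view.
# Pages 2 and 5 are ambiguous by content; pages 1 and 4 are ambiguous by hyperlinks.
X_content = np.array([[2.0], [2.0], [0.2], [-2.0], [-2.0], [-0.2]])
X_link = np.array([[2.0], [0.1], [2.0], [-2.0], [-0.1], [-2.0]])
y_init = np.array([1, 0, 0, -1, 0, 0])
labeled_init = np.array([True, False, False, True, False, False])

y_out, labeled_out = co_train([X_content, X_link], y_init, labeled_init)
```

In this toy run the content view confidently labels pages 1 and 4, and the hyperlink view, trained on the enlarged labeled set, then labels pages 2 and 5.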

  38. Co-training • Train a content-based classifier

  39. Co-training • Train a content-based classifier using labeled examples • Label the unlabeled examples that are confidently classified

  40. Co-training • Train a content-based classifier using labeled examples • Label the unlabeled examples that are confidently classified • Train a hyperlink-based classifier • Prof. : outlinks to students

  41. Co-training • Train a content-based classifier using labeled examples • Label the unlabeled examples that are confidently classified • Train a hyperlink-based classifier • Prof. : outlinks to students • Label the unlabeled examples that are confidently classified

  42. Co-training • Train a content-based classifier using labeled examples • Label the unlabeled examples that are confidently classified • Train a hyperlink-based classifier • Prof. : outlinks to • Label the unlabeled examples that are confidently classified
