
Clustering networked data based on link and similarity in Active learning



  1. Clustering networked data based on link and similarity in Active learning • Advisor: Sing Ling Lee • Student: Yi Ming Chang • Speaker: Yi Ming Chang

  2. Outline • Introduction • Active Learning • Networked data • Related Work • Newman’s Modularity • Collective Classification (ICA) • ALFNET • CLAL • Experimental Results • Conclusion

  3. Passive Learning • [Figure] A classifier is trained on a randomly labeled training set of + and - instances and then classifies the testing data; in this example 5 test instances are classified wrongly.

  4. Active Learning • [Figure] The learner trains a classifier on the initially labeled nodes, queries the labels of selected unlabeled nodes (query batch number = 3), retrains, and classifies the testing data; in this example only 2 instances are classified wrongly.
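
Slides 3-4 contrast passive learning with active learning. As an illustration only (not the networked-data setup used later), a minimal pool-based active-learning loop with a query batch of 3 might look like the sketch below; the classifier, the synthetic data, and the budget are placeholder assumptions.

  # Minimal pool-based active learning sketch (illustrative only: the
  # classifier, synthetic data, batch size, and budget are all assumptions).
  import numpy as np
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(0)
  X = rng.normal(size=(200, 5))                  # synthetic feature vectors
  y = (X[:, 0] + X[:, 1] > 0).astype(int)        # synthetic binary labels

  labeled = list(range(10))                      # small initial labeled set
  pool = [i for i in range(len(X)) if i not in labeled]
  budget, batch = 40, 3                          # query batch number = 3, as on slide 4

  while len(labeled) < budget:
      clf = LogisticRegression().fit(X[labeled], y[labeled])
      proba = clf.predict_proba(X[pool])
      uncertainty = np.abs(proba[:, 1] - 0.5)    # smallest = closest to the decision boundary
      picks = np.argsort(uncertainty)[:batch]    # query the `batch` most uncertain instances
      for p in sorted(picks, reverse=True):
          labeled.append(pool.pop(p))            # the oracle reveals the true label

  clf = LogisticRegression().fit(X[labeled], y[labeled])   # final model on all queried labels
  print("accuracy:", clf.score(X, y))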

  5. Network data • [Figure] Training and testing nodes are linked together in one network; the classifier is trained on the labeled nodes and then classifies the unlabeled nodes.

  6. Outline • Introduction • Active Learning • Networked data • Related Work • Newman’s Modularity • Collective Classification (ICA) • ALFNET • CLAL • Experimental Results • Conclusion

  7. Newman’s Modularity for clustering • Notation: m = 5 (number of edges), A_ij = 1 if a real edge exists between nodes i and j, k_i = degree of node i, c_i = group of node i. • Q = (1/2m) * Σ_ij ( A_ij - k_i k_j / 2m ) δ(c_i, c_j). • Per-pair terms for the example graph (2m = 10): (1 - 2*2/10), (0 - 2*2/10), (1 - 2*3/10), (0 - 2*1/10).

  8. Newman’s Modularity for clustering • Example (2m = 16), terms A_ij - k_i k_j / 2m for nodes 1, 2, 3 with degrees 5, 2, 3: • pair (1,2): (1 - 5*2/16) = 0.375 • pair (1,3): (0 - 5*3/16) = -0.9375 • pair (2,1): (1 - 2*5/16) = 0.375 • pair (2,3): (1 - 2*3/16) = 0.625 • pair (3,1): (0 - 3*5/16) = -0.9375 • pair (3,2): (1 - 3*2/16) = 0.625 • Since 0.625 + 0.625 > 0.375 + 0.375, putting nodes 2 and 3 in the same group contributes more to Q than putting nodes 1 and 2 together.
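
To make the per-pair terms on slides 7-8 concrete, here is a small sketch that evaluates Q for a given partition; the toy graph is made up so that it has m = 5 edges (2m = 10), as on slide 7.

  # Sketch: Newman's modularity of a partition (the example graph is made up).
  import numpy as np

  def modularity(adj, communities):
      """Q = (1/2m) * sum_ij (A_ij - k_i*k_j/(2m)) * delta(c_i, c_j)."""
      A = np.asarray(adj, dtype=float)
      k = A.sum(axis=1)                          # node degrees
      two_m = A.sum()                            # 2m = sum of all degrees
      c = np.asarray(communities)
      same = (c[:, None] == c[None, :])          # delta(c_i, c_j)
      return ((A - np.outer(k, k) / two_m) * same).sum() / two_m

  # Toy graph: a 5-cycle, so m = 5 and each term uses k_i*k_j / 10.
  A = np.array([[0, 1, 0, 0, 1],
                [1, 0, 1, 0, 0],
                [0, 1, 0, 1, 0],
                [0, 0, 1, 0, 1],
                [1, 0, 0, 1, 0]])
  print(modularity(A, [0, 0, 0, 1, 1]))          # partition {1,2,3} vs {4,5}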

  9. Newman’s Modularity for clustering • Maximizing Q: in Newman’s spectral method each node takes the sign of its entry in the leading eigenvector of the modularity matrix (e.g., eigenvector entries 0.3, 0.1, -0.5 give group assignments 1, 1, -1).
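
If slide 9 indeed refers to Newman's spectral method (my reading of the 1 / 1 / -1 and 0.3 / 0.1 / -0.5 values), the two-way split can be computed from the leading eigenvector of the modularity matrix, as sketched below.

  # Sketch of a spectral two-way split: B = A - k k^T / 2m, take the leading
  # eigenvector of B and assign each node the sign of its entry.
  import numpy as np

  def spectral_split(adj):
      A = np.asarray(adj, dtype=float)
      k = A.sum(axis=1)
      two_m = A.sum()
      B = A - np.outer(k, k) / two_m             # modularity matrix
      vals, vecs = np.linalg.eigh(B)             # eigen-decomposition (B is symmetric)
      leading = vecs[:, np.argmax(vals)]         # eigenvector of the largest eigenvalue
      return np.where(leading > 0, 1, -1)        # group label = sign of the entry

  A = np.array([[0, 1, 0, 0, 1],
                [1, 0, 1, 0, 0],
                [0, 1, 0, 1, 0],
                [0, 0, 1, 0, 1],
                [1, 0, 0, 1, 0]])                # same toy 5-cycle as above
  print(spectral_split(A))                       # a vector of +1 / -1 group labels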

  10. Collective Classification (ICA) • Iterative Classification Algorithm (ICA): • Train the Content-Only (CO) learner on the local features of the labeled nodes (e.g., a word vector 1 0 0 1 0 … 1). • Iteration 1: use the CO predictions to compute each node’s neighbor feature (the fraction of neighbors predicted in each class, e.g., 3/5 2/5) and train the Collective (CC) learner on local + neighbor features. • Iterations 2, 3, …: recompute the neighbor features from the CC predictions and re-classify. • Stop when the predictions are stable or the threshold number of iterations has elapsed.
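
The ICA loop can also be written down directly. The sketch below is a generic version, assuming logistic-regression base learners and neighbor features given by class fractions; it is not necessarily the exact configuration used in this work.

  # Generic ICA sketch: the CO learner bootstraps the labels, then the CC
  # learner iterates with neighbor features recomputed from current predictions.
  import numpy as np
  from sklearn.linear_model import LogisticRegression

  def neighbor_feature(labels, graph, n_classes):
      """Fraction of each node's neighbors currently assigned to each class."""
      nf = np.zeros((len(labels), n_classes))
      for i, nbrs in graph.items():              # graph: {node index: [neighbor indices]}
          for j in nbrs:
              nf[i, labels[j]] += 1
          if nbrs:
              nf[i] /= len(nbrs)
      return nf

  def ica(X, y, labeled, graph, n_classes, max_iter=10):
      X, y = np.asarray(X), np.asarray(y)
      co = LogisticRegression().fit(X[labeled], y[labeled])      # Content-Only learner
      pred = co.predict(X)                                       # bootstrap all labels
      pred[labeled] = y[labeled]
      for _ in range(max_iter):                                  # iteration threshold
          nf = neighbor_feature(pred, graph, n_classes)
          cc = LogisticRegression().fit(
              np.hstack([X[labeled], nf[labeled]]), y[labeled])  # Collective learner
          new = cc.predict(np.hstack([X, nf]))
          new[labeled] = y[labeled]
          if np.array_equal(new, pred):                          # stable: stop early
              break
          pred = new
      return pred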

  11. CC problem • How to set the threshold? • [Figure] The neighbor features inferred for the three unlabeled nodes keep changing across iterations (e.g., node 1 goes from 2/5 3/5 at iteration 1 to 1/1 0/1, 0/1 1/1, 2/5 3/5, and 4/5 1/5 in iterations 2-5), so the result depends on where the iteration stops.

  12. ALFNET • 1. Cluster the data into at least k clusters. • 2. Pick k clusters based on size and initialize the Content-Only (CO) classifier (an SVM).

  13. ALFNET • 3. While (number of labeled nodes < budget): • 3.1 Re-train the CO and CC classifiers. • 3.2 Pick k clusters based on their score.

  14. 3.2 Pick an item from each of the k clusters based on its score (the disagreement measure on the next slide); the queried labels are added to the training set, and the CO and CC classifiers are trained on it.

  15. ALFNET • The CO, CC, and Main classifiers each predict a category (Class A, B, C, D, …) for a candidate node, and the score is the entropy of the proportions of the three predicted labels (entropy(p) = -p ln p): • all three predictions differ: entropy(1/3) + entropy(1/3) + entropy(1/3) = 0.3662 * 3 • two predictions agree: entropy(2/3) + entropy(1/3) = 0.2703 + 0.3662 • all three agree: entropy(3/3) = 0
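
The score on slide 15 is the entropy of the proportions of the three predicted labels; the 0.3662 and 0.2703 values imply natural-log entropy. A small self-contained sketch of that computation:

  # Disagreement score sketch: entropy of the proportions of the labels
  # predicted by the three classifiers (natural log, matching the slide).
  from collections import Counter
  from math import log

  def disagreement(predictions):
      counts = Counter(predictions)              # e.g. {"A": 2, "B": 1}
      n = len(predictions)
      return -sum((c / n) * log(c / n) for c in counts.values())

  print(disagreement(["A", "B", "C"]))   # all three differ -> 0.3662 * 3 = 1.0986
  print(disagreement(["A", "A", "B"]))   # two agree        -> 0.2703 + 0.3662 = 0.6365
  print(disagreement(["A", "A", "A"]))   # all agree        -> 0.0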

  16. Outline • Introduction • Active Learning • Networked data • Related Work • Newman’s Modularity • Collective Classification (ICA) • ALFNET • CLAL • Experimental Results • Conclusion

  17. Modularity and Similarity • Example feature vectors: • Node 1: 1 1 0 0 • Node 3: 1 1 0 0 • Node 2: 1 0 0 0 • Node 4: 0 0 1 1 • Nodes 1 and 3 have identical features (highest similarity), while node 4 shares no features with node 1, so feature similarity gives clustering information that the links alone may not.

  18. Maximum Q • [Formula] The clustering is obtained by maximizing Q, the objective that combines the link-based modularity term with the feature-similarity term.
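
Slides 17-18 combine the link-based modularity with feature similarity into one objective Q to be maximized. The slides do not spell out the combination, so the sketch below simply adds an average within-cluster cosine-similarity term with an assumed weight alpha; it illustrates the idea rather than the exact formula of this work.

  # Sketch of a combined objective: link modularity plus average within-cluster
  # cosine similarity of features. The additive form and the weight `alpha`
  # are assumptions for illustration.
  import numpy as np

  def combined_q(adj, feats, communities, alpha=0.5):
      A = np.asarray(adj, dtype=float)
      F = np.asarray(feats, dtype=float)
      k = A.sum(axis=1)
      two_m = A.sum()
      c = np.asarray(communities)
      same = (c[:, None] == c[None, :]).astype(float)   # delta(c_i, c_j)
      modularity = ((A - np.outer(k, k) / two_m) * same).sum() / two_m
      off = same.copy()
      np.fill_diagonal(off, 0.0)                        # similarity over distinct pairs only
      norms = np.linalg.norm(F, axis=1, keepdims=True)
      cosine = (F @ F.T) / (norms * norms.T + 1e-12)
      similarity = (cosine * off).sum() / max(off.sum(), 1.0)
      return modularity + alpha * similarity

  # Feature vectors from slide 17 (nodes 1 and 3 identical, node 4 disjoint);
  # the links here are made up for the example.
  feats = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
  adj = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
  print(combined_q(adj, feats, [0, 0, 0, 1]))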

  19. CLAL • [Figure] Each cluster trains its own Content-Only (CO) classifier on its labeled nodes and then queries and classifies its unlabeled nodes; this repeats until the number of labeled nodes exceeds the budget.
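
Slide 19 gives only the high-level loop of CLAL. Reading the figure as one CO classifier per cluster, querying and classifying until the budget is spent, a rough sketch could look like the following; the uncertainty-based choice of the query node is my assumption, since the slide does not say how it is picked.

  # Rough sketch of the per-cluster query loop on slide 19. One CO classifier
  # per cluster is taken from the figure; the least-confident pick is assumed.
  import numpy as np
  from sklearn.linear_model import LogisticRegression

  def clal_loop(X, y, clusters, labeled, budget):
      X, y = np.asarray(X), np.asarray(y)
      labeled = set(labeled)
      while len(labeled) < budget:
          queried = False
          for members in clusters:                       # one CO classifier per cluster
              train = [i for i in members if i in labeled]
              pool = [i for i in members if i not in labeled]
              if not pool or len(set(y[train])) < 2:
                  continue                               # nothing to query / cannot train yet
              co = LogisticRegression().fit(X[train], y[train])
              proba = co.predict_proba(X[pool])
              pick = pool[int(np.argmin(proba.max(axis=1)))]   # least confident node
              labeled.add(pick)                          # query its true label from the oracle
              queried = True
              if len(labeled) >= budget:
                  break
          if not queried:
              break                                      # no cluster can be queried any more
      return labeled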

  20. Tuning and greedy mechanism • Moving priority: out-link count minus in-link count (e.g., 3 -> 2 -> 1 -> 1). • A node whose out-links (edges to other clusters) exceed its in-links (edges inside its own cluster) is moved to the neighboring cluster. • After each move, the CO classifiers are retrained and the assignment whose CO accuracy is greater is reserved (kept). • Clustering priority: tune the clusters with low accuracy before those with high accuracy.
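
The move criterion on slide 20 compares each node's out-links (edges to other clusters) with its in-links (edges inside its own cluster). A small helper, assuming the moving priority is simply the out-link minus in-link difference, might rank the candidates like this:

  # Sketch: rank nodes by (out-links - in-links); a node whose out-links exceed
  # its in-links is a candidate to move to a neighboring cluster.
  def move_candidates(graph, assignment):
      """graph: {node: [neighbors]}, assignment: {node: cluster id}."""
      scored = []
      for node, nbrs in graph.items():
          inside = sum(1 for j in nbrs if assignment[j] == assignment[node])
          outside = len(nbrs) - inside
          if outside > inside:                   # move condition from the slide
              scored.append((outside - inside, node))
      scored.sort(reverse=True)                  # moving priority: largest difference first
      return [node for _, node in scored]

  graph = {1: [2, 3], 2: [1, 4, 5], 3: [1], 4: [2, 5], 5: [2, 4]}
  assignment = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B"}
  print(move_candidates(graph, assignment))      # [2]: node 2 has 2 out-links vs 1 in-link

After each candidate move, the CO classifiers would be retrained and the assignment with the greater accuracy kept, as described on the slide.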

  21. Outline • Introduction • Active Learning • Networked data • Related Work • Newman’s Modularity • Collective Classification (ICA) • ALFNET • CLAL • Experimental Results • Conclusion

  22. Background • Networked data • Social network: each node is a person; its attribute is a feature vector (name, profile features, …) and its links are friend relations. • Citation network: each node is a paper; its attribute is the set of words it contains and its links are citations.

  23. Outline • Introduction • Active Learning • Networked data • Related Work • Newman’s Modularity • Collective Classification (ICA) • ALFNET • CLAL • Experimental Results • Conclusion

  24. Appendix

  25. SVM • [Figure] The training data are separated by a hyperplane; the SVM chooses the hyperplane that maximizes the margin between the + and - classes.
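
As a companion to this appendix slide, a minimal linear SVM example on made-up data (scikit-learn's SVC with a linear kernel; the data and parameters are placeholders):

  # Minimal linear SVM example (made-up data); the fitted hyperplane w.x + b = 0
  # maximizes the margin between the two classes.
  import numpy as np
  from sklearn.svm import SVC

  rng = np.random.default_rng(1)
  X_pos = rng.normal(loc=+2.0, size=(20, 2))     # "+" class
  X_neg = rng.normal(loc=-2.0, size=(20, 2))     # "-" class
  X = np.vstack([X_pos, X_neg])
  y = np.array([1] * 20 + [-1] * 20)

  svm = SVC(kernel="linear", C=1.0).fit(X, y)
  print("w =", svm.coef_[0], "b =", svm.intercept_[0])
  print("margin width =", 2.0 / np.linalg.norm(svm.coef_[0]))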

  26. Challenge • Query efficiency from discriminative features • Word counts per class:

  Paper name         word   word   word  …
  Sum of 2 classes    510    400    250
  Class 1             250    220    100
  Class 2             260    180    150
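
One way to read slide 26 (my interpretation, not stated on the slide) is that a word whose counts differ strongly between the two classes is more useful for querying and classification than a word that is merely frequent overall. A small sketch ranking words by the absolute difference of their class counts:

  # Sketch: rank words by how differently they occur in the two classes.
  # The counts come from slide 26; the word names and the absolute-difference
  # criterion are assumptions for illustration.
  counts_class1 = {"word1": 250, "word2": 220, "word3": 100}
  counts_class2 = {"word1": 260, "word2": 180, "word3": 150}

  def rank_by_discriminativeness(c1, c2):
      words = set(c1) | set(c2)
      return sorted(words, key=lambda w: abs(c1.get(w, 0) - c2.get(w, 0)), reverse=True)

  print(rank_by_discriminativeness(counts_class1, counts_class2))
  # -> ['word3', 'word2', 'word1']: word1 totals 510 but hardly separates the classes.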

  27. CC problem: how to set the terminal condition? • Different iterations obtain diverse results. • [Figure] The CC classifier combines each node’s local features (F1, F2, …) with neighbor features (NF_A, NF_B) inferred from its neighbors’ current labels; these inferred features (e.g., 3/5 2/5, 2/3 1/3, 4/5 1/5) change between iteration 1 and iteration 2, so the final prediction depends on when the iteration stops.

  28. ALFNET • [Flowchart] Query labels and train the CO classifier, compute the scores, then query labels and train the classifiers and compute again; when the iteration count exceeds its limit, check whether the number of labeled nodes exceeds the budget: if so, output the result, otherwise keep querying and training.

  29. Representation and Challenge • In a citation network each paper is a node linked to the papers it cites; the challenge is how to use this link information.
