
Heterogeneous Cross Domain Ranking in Latent Space


Presentation Transcript


  1. Heterogeneous Cross Domain Ranking in Latent Space Bo Wang Joint work with Jie Tang, Wei Fan and Songcan Chen

  2. Framework of Learning to Rank

  3. Example: Academic Network

  4. Ranking over Web 2.0 • Traditional Web: standard (long) documents, where relevance measures such as BM25 and the PageRank score play a key role • Web 2.0: shorter, non-standard documents, where users' click-through data and users' comments may matter much more
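For reference, the BM25 score named on this slide can be computed as below. This is a minimal sketch of the classical Okapi variant with a smoothed IDF; the hyperparameters k1 and b are conventional defaults, not values from the talk:

```python
import math
from collections import Counter

def bm25(query, doc, corpus, k1=1.5, b=0.75):
    """Okapi BM25 score of one tokenized document against a tokenized query.

    `corpus` is a list of tokenized documents, used for the IDF statistics
    and the average document length.
    """
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)         # document frequency
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))  # smoothed IDF
        denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf[term] * (k1 + 1) / denom
    return score
```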

  5. Heterogeneous transfer ranking • If there is not sufficient supervision in the domain of interest, how can one borrow labeled information from a related but heterogeneous domain to build an accurate model? • Differences from transfer learning • What to transfer • Instance type • What we care about • Feature extraction

  6. Main Challenges • How to formalize the problem in a unified framework, given that both the feature distributions and the object types may differ between the source and target domains? • How to transfer the knowledge of heterogeneous objects across domains? • How to preserve the preference relationships between instances across heterogeneous data sources?

  7. Outline • Motivation • Problem Formulation • Transfer Ranking • Basic Idea • The proposed algorithm • Generalization bound • Experiment • Ranking on Homogeneous data • Ranking on Heterogeneous data • Conclusion

  8. Problem Formulation • Source domain: instance space $\mathcal{X}^s$ with labeled instances $\{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ • Rank level set: $\mathcal{Y} = \{r_1, \ldots, r_q\}$, where $r_q \succ r_{q-1} \succ \cdots \succ r_1$ • Target domain: instance space $\mathcal{X}^t$ and a (typically much smaller) labeled set $\{(x_i^t, y_i^t)\}_{i=1}^{n_t}$ • The two domains are heterogeneous but related • Problem definition: given the labeled source data and the labeled target data, the goal is to learn a ranking function for predicting the rank levels of the target-domain test set

  9. Outline • Motivation • Problem Formulation • Transfer Ranking • Basic Idea • The proposed algorithm • Generalization bound • Experiment • Ranking on Homogeneous data • Ranking on Heterogeneous data • Conclusion

  10. Basic Idea • Because the feature distributions or even the object types may differ across domains, we resort to finding a common latent space in which the preference relationships of both the source and target domains are preserved • We can directly use a ranking loss function to evaluate how well the preferences are preserved in that latent space • We optimize the two ranking loss functions simultaneously in order to find the best latent space (one plausible formalization follows)
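The slide's formula did not survive transcription. One plausible way to write the joint objective, consistent with the d×d matrix D and d×2 matrix W = [w_s, w_t] mentioned on the scalability slide and with regularized multi-task feature learning, is the sketch below; the trade-off parameters λ and γ are assumptions:

```latex
\min_{w_s,\; w_t,\; D}\;
  L_s(w_s) \;+\; \lambda\, L_t(w_t)
  \;+\; \gamma\, \operatorname{tr}\!\bigl(W^{\top} D^{-1} W\bigr)
\qquad \text{s.t.}\quad D \succeq 0,\ \operatorname{tr}(D) \le 1
```

where $L_s$ and $L_t$ are the pairwise ranking losses in the two domains and $W = [w_s, w_t]$.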

  11. The Proposed Algorithm Given the labeled data $\{(x_i^s, y_i^s)\}$ in the source domain, we aim to learn a ranking function $f(x) = \langle w, x \rangle$ which satisfies $f(x_i) > f(x_j)$ whenever $x_i \succ x_j$. The ranking loss function can be defined as the hinge loss over preference pairs, $\sum_{(i,j):\, x_i \succ x_j} [1 - \langle w, x_i - x_j \rangle]_+$. The latent space can be described by a shared positive semi-definite matrix $D$. The framework: minimize the two domains' ranking losses plus the coupling regularizer $\operatorname{tr}(W^{\top} D^{-1} W)$, as in the sketch below.
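A minimal sketch of one plausible instantiation of this framework, alternating between a Ranking SVM in each domain under the shared metric D and a closed-form D update from the stacked weight matrix. The function names, the `solve_ranksvm` callback, and the D update are assumptions in the style of multi-task feature learning, not the authors' exact code:

```python
import numpy as np
from scipy.linalg import sqrtm

def update_latent_metric(W, eps=1e-8):
    """D = (W W^T)^{1/2} / tr((W W^T)^{1/2}): the multi-task feature
    learning update for the shared metric (an assumed instantiation)."""
    M = sqrtm(W @ W.T).real + eps * np.eye(W.shape[0])
    return M / np.trace(M)

def tr2svm(pairs_src, pairs_tgt, d, solve_ranksvm, T=10, lam=1.0):
    """Alternate T times between two Ranking SVMs and a D update.

    `solve_ranksvm(pairs, D, lam)` is a hypothetical solver returning the
    weight vector minimizing the pairwise hinge loss plus lam * w^T D^{-1} w;
    its 2T invocations account for the O((2T + 1)(n1 + n2)^3) training time
    quoted on the scalability slide.
    """
    D = np.eye(d) / d                           # isotropic latent metric to start
    for _ in range(T):
        w_s = solve_ranksvm(pairs_src, D, lam)  # source-domain ranking loss
        w_t = solve_ranksvm(pairs_tgt, D, lam)  # target-domain ranking loss
        W = np.column_stack([w_s, w_t])         # d x 2, as on the scalability slide
        D = update_latent_metric(W)             # refine the shared latent space
    return w_s, w_t, D
```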

  12. Ranking SVM
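This slide's content was an image that was lost in transcription. For reference, Ranking SVM reduces ranking to binary classification on pairwise difference vectors; a minimal sketch using scikit-learn (the pairing scheme is the standard one, and LinearSVC is a stand-in for whichever SVM solver the talk used):

```python
import numpy as np
from sklearn.svm import LinearSVC

def to_pairwise(X, y, qid):
    """Build difference vectors x_i - x_j labeled sign(y_i - y_j),
    pairing only documents that belong to the same query."""
    Xp, yp = [], []
    for q in np.unique(qid):
        idx = np.where(qid == q)[0]
        for i in idx:
            for j in idx:
                if y[i] > y[j]:  # one pair per preferred/less-preferred ordering
                    Xp.append(X[i] - X[j]); yp.append(1)
                    Xp.append(X[j] - X[i]); yp.append(-1)
    return np.array(Xp), np.array(yp)

# Usage: fit a linear SVM on the pairs; w then scores single documents.
# Xp, yp = to_pairwise(X, y, qid)
# w = LinearSVC(fit_intercept=False).fit(Xp, yp).coef_.ravel()
# scores = X @ w
```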

  13. Generalization Bound

  14. Scalability Let d be the total number of distinct features across the two domains; then matrix D is d×d and W is d×2, so the algorithm can be applied to very large-scale data as long as the number of features stays moderate • Complexity Ranking SVM training takes O((n1 + n2)^3) time and O((n1 + n2)^2) space; in our algorithm Tr2SVM, with T the maximum number of iterations, training takes O((2T + 1)(n1 + n2)^3) time and O((n1 + n2)^2) space

  15. Outline • Motivation • Problem Formulation • Transfer Ranking • Basic Idea • The proposed algorithm • Generalization bound • Experiment • Ranking on Homogeneous data • Ranking on Heterogeneous data • Conclusion

  16. Data Set • LETOR 2.0 • three sub-datasets: TREC2003, TREC2004, and OHSUMED • a collection of query-document pairs • TREC data: a topic distillation task, which aims to find good entry points principally devoted to a given topic • OHSUMED data: a collection of records from medical journals • LETOR_TR • three sub-datasets: TREC2003_TR, TREC2004_TR, and OHSUMED_TR

  17. Data Set (Cont’d)

  18. Data Set (Cont’d)

  19. Experiment Setting • Baselines: • Measures: MAP (mean average precision) and NDCG (normalized discounted cumulative gain) • Three transfer ranking tasks: • From S1 to T1 • From S2 to T2 • From S3 to T3
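Both measures are standard; a minimal per-query sketch (binary relevance for AP, graded relevance for NDCG; the exponential gain and log2 discount follow the common LETOR convention):

```python
import numpy as np

def average_precision(rels):
    """AP for one query; `rels` is 0/1 relevance in ranked order."""
    rels = np.asarray(rels)
    hits = np.cumsum(rels)
    precs = hits / (np.arange(len(rels)) + 1)   # precision at each rank
    return (precs * rels).sum() / max(rels.sum(), 1)

def ndcg_at_k(rels, k):
    """NDCG@k for one query; `rels` is graded relevance in ranked order."""
    rels = np.asarray(rels, dtype=float)

    def dcg(r):
        r = r[:k]
        return ((2 ** r - 1) / np.log2(np.arange(len(r)) + 2)).sum()

    ideal = dcg(np.sort(rels)[::-1])            # DCG of the ideal ordering
    return dcg(rels) / ideal if ideal > 0 else 0.0
```

MAP and mean NDCG@k are then the averages of these per-query values over all queries.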

  20. Why effective? • Why is transfer ranking effective on the LETOR_TR dataset? Because the features used for ranking already encode relevance information between queries and documents.

  21. Outline • Motivation • Problem Formulation • Transfer Ranking • Basic Idea • The proposed algorithm • Generalization bound • Experiment • Ranking on Homogeneous data • Ranking on Heterogeneous data • Conclusion

  22. Data Set • A subset of ArnetMiner: 14,134 authors, 10,716 papers, and 1,434 conferences • 8 most frequent queries from the log file: • 'information extraction', 'machine learning', 'semantic web', 'natural language processing', 'support vector machine', 'planning', 'intelligent agents' and 'ontology alignment' • Author collection: • For each query, we gathered authors from Libra, Rexa and ArnetMiner • Conference collection: • For each query, we gathered conferences from Libra and ArnetMiner • Evaluation • One faculty member and two graduate students judged the relevance between queries and authors/conferences

  23. Feature Definition • All the features are defined between queries and virtual documents (see the sketch below) • Conference • Use all the titles of papers published at a conference to form the conference's "document" • Author • Use all the titles of papers authored by an expert as the expert's "document"
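A minimal sketch of building such virtual documents; the dict keys 'title', 'venue', and 'authors' are illustrative, not ArnetMiner's actual schema:

```python
from collections import defaultdict

def build_virtual_documents(papers):
    """Concatenate paper titles into one 'document' per conference and
    one per author; `papers` is a list of dicts with illustrative keys
    'title', 'venue', and 'authors'."""
    conf_titles, author_titles = defaultdict(list), defaultdict(list)
    for p in papers:
        conf_titles[p["venue"]].append(p["title"])
        for a in p["authors"]:
            author_titles[a].append(p["title"])
    as_doc = lambda groups: {k: " ".join(v) for k, v in groups.items()}
    return as_doc(conf_titles), as_doc(author_titles)
```

Query-object features (for instance, the BM25 score from slide 4) can then be computed between a query and these virtual documents.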

  24. Feature Definition (Cont’d)

  25. Experimental Results

  26. Why effective? • Why can our approach be effective on the heterogeneous network? Because of the latent dependencies between objects: common features can still be extracted from these dependencies.

  27. Conclusion

  28. Conclusion (Cont’d) • We formally define the transfer ranking problem and propose a general framework • We provide a preferred solution under the regularized framework by simultaneously minimizing two ranking loss functions in the two domains, and we derive a generalization bound • The experimental results on LETOR and a heterogeneous academic network verify the effectiveness of the proposed algorithm

  29. Future Work • Develop new algorithms under the framework • Reduce the time complexity for online usage • Handle negative transfer • Measure similarity between queries • Actively select similar queries

  30. Thanks! Your Question. Our Passion.
