Heterogeneous Cross Domain Ranking in Latent Space
Bo Wang, joint work with Jie Tang, Wei Fan and Songcan Chen
Ranking over Web 2.0
• Traditional Web: standard (long) documents
  • Relevance measures such as the BM25 and PageRank scores may play a key role
• Web 2.0: shorter, non-standard documents
  • Users' click-through data and comments might be much more important
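Since this slide contrasts BM25-style relevance with Web 2.0 signals, here is a minimal sketch of the standard Okapi BM25 score for one query-document pair. The toy corpus and the k1, b defaults are illustrative assumptions, not part of the original deck.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a query (k1, b are common defaults)."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)          # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)   # smoothed IDF
        f = tf[term]
        denom = f + k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * f * (k1 + 1) / denom
    return score

# toy corpus: a list of tokenized "documents"
corpus = [["ranking", "svm", "pairwise"],
          ["web", "click", "data"],
          ["latent", "space", "ranking"]]
print(bm25_score(["ranking", "latent"], corpus[2], corpus))
```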
Heterogeneous Transfer Ranking
• If there is not sufficient supervision in the domain of interest, how can one borrow labeled information from a related but heterogeneous domain to build an accurate model?
• Differences from transfer learning:
  • What to transfer: the instance type
  • What we care about: feature extraction
Main Challenges
• How to formalize the problem in a unified framework, when both the feature distributions and the object types may differ between the source and target domains?
• How to transfer knowledge about heterogeneous objects across domains?
• How to preserve the preference relationships between instances across heterogeneous data sources?
Outline
• Motivation
• Problem Formulation
• Transfer Ranking
  • Basic Idea
  • The proposed algorithm
  • Generalization bound
• Experiment
  • Ranking on homogeneous data
  • Ranking on heterogeneous data
• Conclusion
Problem Formulation
• Source domain:
  • Instance space: $X^s = \{x_i^s\}_{i=1}^{n_1}$
  • Rank level set: $Y^s = \{r_1, \ldots, r_K\}$, where $r_K \succ r_{K-1} \succ \cdots \succ r_1$
• Target domain: $X^t = \{x_i^t\}_{i=1}^{n_2}$ and $Y^t$, defined analogously
• The two domains are heterogeneous but related
• Problem definition: given $(X^s, Y^s)$ and $(X^t, Y^t)$, the goal is to learn a ranking function $f$ for predicting the rank levels of the target-domain test set
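As a concrete reading of this formulation, the sketch below represents each domain as feature vectors plus ordinal rank levels. The names, sizes, and the integer encoding of rank levels are illustrative assumptions.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class RankingDomain:
    """Labeled instances of one domain: features plus ordinal rank levels."""
    X: np.ndarray  # (n, d) features; d and the object type may differ across domains
    y: np.ndarray  # (n,) rank levels r_1 ... r_K, encoded here as integers 0..K-1

# heterogeneous but related: different feature dimensionalities (sizes are illustrative)
rng = np.random.default_rng(0)
source = RankingDomain(X=rng.normal(size=(100, 20)), y=rng.integers(0, 3, size=100))
target = RankingDomain(X=rng.normal(size=(30, 15)), y=rng.integers(0, 3, size=30))
```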
Outline
• Motivation
• Problem Formulation
• Transfer Ranking
  • Basic Idea
  • The proposed algorithm
  • Generalization bound
• Experiment
  • Ranking on homogeneous data
  • Ranking on heterogeneous data
• Conclusion
Basic Idea
• Because the feature distributions, or even the object types, may differ across domains, we look for a common latent space in which the preference relationships of both the source and target domains are preserved
• A ranking loss function can directly evaluate how well the preferences are preserved in that latent space
• The two ranking loss functions, one per domain, are optimized simultaneously to find the best latent space
The Proposed Algorithm
• Given the labeled data in the source domain, $\{(x_i^s, y_i^s)\}_{i=1}^{n_1}$, we aim to learn a ranking function $f(x) = \langle w, x \rangle$ which satisfies $f(x_i) > f(x_j)$ whenever $y_i \succ y_j$
• The ranking loss function can be defined as the pairwise hinge loss $L(w) = \sum_{y_i \succ y_j} [\,1 - \langle w, x_i - x_j \rangle\,]_+$
• The latent space can be described by a shared matrix $D \in \mathbb{R}^{d \times d}$ coupling the domain-specific ranking vectors, collected as the columns of $W = [\,w_s, w_t\,]$
• The framework: $\min_{W, D}\; L_s(w_s) + \lambda L_t(w_t) + \mu\,\mathrm{tr}(W^\top D^{-1} W)$, solved by alternating between fitting the two rankers and updating $D$ (a runnable sketch follows)
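Since the slide's original formulas did not survive extraction, here is a minimal runnable sketch of the alternating scheme reconstructed above: each round fits one ranking vector per domain under the current latent matrix D (using a plain subgradient solver rather than the authors' Ranking SVM inner solver), then updates D in closed form from the stacked W. Function names, step sizes, iteration counts, and the omitted domain trade-off λ are illustrative assumptions, not the exact Tr2SVM implementation.

```python
import numpy as np

def pref_pairs(y):
    """Index pairs (i, j) with y[i] > y[j], i.e. i should be ranked above j."""
    return [(i, j) for i in range(len(y)) for j in range(len(y)) if y[i] > y[j]]

def fit_domain_ranker(X, pairs, D_inv, mu, lr=1e-3, steps=200):
    """Subgradient descent on the pairwise hinge loss plus mu * w^T D^{-1} w."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        g = 2 * mu * (D_inv @ w)          # gradient of the quadratic regularizer
        for i, j in pairs:
            d = X[i] - X[j]
            if 1.0 - w @ d > 0:           # violated preference pair
                g -= d
        w -= lr * g
    return w

def transfer_rank(Xs, ys, Xt, yt, mu=0.1, T=10, eps=1e-6):
    """Alternate between domain rankers w_s, w_t and the shared latent matrix D."""
    d = Xs.shape[1]                       # features already mapped to a common d-dim space
    D = np.eye(d) / d                     # start from the uninformative latent space
    ps, pt = pref_pairs(ys), pref_pairs(yt)
    for _ in range(T):
        D_inv = np.linalg.inv(D + eps * np.eye(d))
        ws = fit_domain_ranker(Xs, ps, D_inv, mu)
        wt = fit_domain_ranker(Xt, pt, D_inv, mu)
        W = np.stack([ws, wt], axis=1)    # d x 2, as on the next slide
        # closed-form update: D = (W W^T)^{1/2} / tr((W W^T)^{1/2})
        vals, vecs = np.linalg.eigh(W @ W.T)
        root = (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.T
        D = root / max(np.trace(root), eps)
    return ws, wt, D
```

A target instance x is then scored as wt @ x. Each of the T rounds solves two ranking problems, which is consistent with the O((2T + 1)(n1 + n2)^3) training cost quoted on the next slide.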
Scalability and Complexity
• Scalability: let d be the total number of distinct features in the two domains; then the matrix D is d×d and W is d×2, so the algorithm can be applied to very large-scale data as long as the number of features is not too large
• Complexity: Ranking SVM training takes O((n1 + n2)^3) time and O((n1 + n2)^2) space; in our algorithm Tr2SVM, with T the maximum number of iterations, training takes O((2T + 1)(n1 + n2)^3) time and O((n1 + n2)^2) space
Outline
• Motivation
• Problem Formulation
• Transfer Ranking
  • Basic Idea
  • The proposed algorithm
  • Generalization bound
• Experiment
  • Ranking on homogeneous data
  • Ranking on heterogeneous data
• Conclusion
Data Set
• LETOR 2.0
  • Three sub-datasets: TREC2003, TREC2004, and OHSUMED
  • Collections of query-document pairs
  • TREC data: a topic distillation task, which aims to find good entry points principally devoted to a given topic
  • OHSUMED data: a collection of records from medical journals
• LETOR_TR
  • Three sub-datasets: TREC2003_TR, TREC2004_TR, and OHSUMED_TR
Experiment Setting
• Baselines:
• Measures: MAP (mean average precision) and NDCG (normalized discounted cumulative gain), sketched below
• Three transfer ranking tasks:
  • From S1 to T1
  • From S2 to T2
  • From S3 to T3
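For reference, here is a minimal per-query sketch of the two measures named above, following their standard definitions: binary relevance for MAP, and graded relevance with the 2^rel - 1 gain commonly used with LETOR for NDCG. It assumes the ranked list contains all judged documents for the query.

```python
import numpy as np

def average_precision(ranked_rels):
    """AP for one query; ranked_rels: binary relevance in predicted rank order."""
    hits, total = 0, 0.0
    for pos, rel in enumerate(ranked_rels, start=1):
        if rel:
            hits += 1
            total += hits / pos           # precision at each relevant position
    return total / max(hits, 1)

def ndcg_at_k(ranked_gains, k):
    """NDCG@k for one query; ranked_gains: graded relevance in predicted rank order."""
    def dcg(gains):
        return sum((2**g - 1) / np.log2(pos + 1)
                   for pos, g in enumerate(gains[:k], start=1))
    ideal = dcg(sorted(ranked_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0

print(average_precision([1, 0, 1, 0]))    # 0.5 * (1/1 + 2/3)
print(ndcg_at_k([2, 0, 1], k=3))
```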
Why Effective?
• Why is transfer ranking effective on the LETOR_TR dataset? Because the features used for ranking already contain relevance information between queries and documents.
Outline
• Motivation
• Problem Formulation
• Transfer Ranking
  • Basic Idea
  • The proposed algorithm
  • Generalization bound
• Experiment
  • Ranking on homogeneous data
  • Ranking on heterogeneous data
• Conclusion
Data Set
• A subset of ArnetMiner: 14,134 authors, 10,716 papers, and 1,434 conferences
• The 8 most frequent queries from the log file:
  • 'information extraction', 'machine learning', 'semantic web', 'natural language processing', 'support vector machine', 'planning', 'intelligent agents' and 'ontology alignment'
• Author collection: for each query, we gathered authors from Libra, Rexa and ArnetMiner
• Conference collection: for each query, we gathered conferences from Libra and ArnetMiner
• Evaluation: one faculty member and two graduate students judged the relevance between queries and authors/conferences
Feature Definition
• All features are defined between queries and virtual documents (see the sketch below)
• Conference: use all the paper titles published at a conference to form the conference's "document"
• Author: use all the paper titles authored by an expert as the expert's "document"
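A small sketch of the virtual-document construction described above; the paper records and names are hypothetical stand-ins for the ArnetMiner data.

```python
from collections import defaultdict

# hypothetical records: (title, authors, conference)
papers = [
    ("Learning to Rank with SVMs", ["J. Smith", "A. Lee"], "SIGIR"),
    ("Latent Space Transfer Ranking", ["A. Lee"], "CIKM"),
]

def virtual_documents(papers, key):
    """Concatenate paper titles into one 'document' per author or per conference."""
    docs = defaultdict(list)
    for title, authors, venue in papers:
        if key == "author":
            for a in authors:
                docs[a].append(title)
        else:  # key == "conference"
            docs[venue].append(title)
    return {name: " ".join(titles) for name, titles in docs.items()}

print(virtual_documents(papers, "author")["A. Lee"])
# "Learning to Rank with SVMs Latent Space Transfer Ranking"
```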
Why Effective?
• Why can our approach be effective on the heterogeneous network? Because of the latent dependencies between objects, some common features can still be extracted across domains.
Conclusion
• We formally define the transfer ranking problem and propose a general framework for it
• We provide a preferred solution under the regularized framework by simultaneously minimizing two ranking loss functions in the two domains, and derive a generalization bound
• Experimental results on LETOR and a heterogeneous academic network verify the effectiveness of the proposed algorithm
Future Work
• Develop new algorithms under the framework
• Reduce the time complexity for online usage
• Negative transfer
  • Similarity between queries
  • Actively select similar queries
Thanks! Your Question. Our Passion.