TransRank: A Novel Algorithm for Transfer of Rank Learning Depin Chen, Jun Yan, Gang Wang et al. University of Science and Technology of China, USTC Machine Learning Group, MSRA depin.chen@mail.ustc.edu.cn
Content • Ranking for IR • Paper motivation • The algorithm: TransRank • Results & future work
Ranking in IR • Ranking is crucial in information retrieval. It aims to move good results up and bad results down. • A well-known example: web search engines
Learning to rank • Ranking + Machine learning = Learning to rank • An early work: Ranking SVM, “Support Vector Learning for Ordinal Regression”, Herbrich et al. [ICANN 99].
Existing approaches • Early ones: Ranking SVM, RankBoost, … • Recent ones: IRSVM, AdaRank, ListNet, ... • Tie-Yan Liu’s team at MSRA
Content • Learning to rank in IR • Paper motivation • The algorithm: TransRank • Results & future work
Training data shortage • Learning to rank relies on an ample supply of labeled training data. • In real-world practice …
Transfer learning • Transfer learning definition: transfer knowledge learned from different but related problems to solve the current problem effectively, with less training data and less time [Yang, 2008]. • Learning to walk can help you learn to run • Learning to program in C++ can help you learn to program in Java • … • We follow the spirit of transfer learning in this paper.
Content • Learning to rank in IR • Paper motivation • The algorithm: TransRank • Results & future work
Problem formulation • S_t: training data in the target domain • S_s: auxiliary training data from a source domain • Note that, • What do we want? A ranking function for the target domain
TransRank • Three steps of TransRank: • Step 1: K-best query selection • Step 2: Feature augmentation • Step 3: Ranking SVM
Step 1: K-best query selection • Query’s ranking direction • [Figure panels: query 11 in OHSUMED; query 41 in OHSUMED]
The goal of Step 1: select the source-domain queries whose ranking directions are most similar to those of the target-domain data. • These queries are treated as the ones most like the target-domain training data.
Utility function (1) • Preprocess S_s: select the k best queries and discard the rest. • A “best” query is one whose ranking direction is confidently similar to that of the queries in S_t. • The utility function combines two parts: confidence and similarity.
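The preprocessing of S_s described above can be sketched as follows. This is a minimal sketch that treats the utility function as a black box, sorts the source queries by it, and keeps the top k. The data layout (a dict from query id to a list of (feature vector, label) pairs) and the placeholder utility `len` are assumptions chosen for illustration. Neither is the paper's utility.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed layout: one query's data is a list of (feature_vector, relevance_label) pairs.
QueryData = List[Tuple[Sequence[float], int]]


def select_k_best(
    source: Dict[str, QueryData],
    utility: Callable[[QueryData], float],
    k: int,
) -> Dict[str, QueryData]:
    """Keep the k source queries with the highest utility and discard the rest."""
    ranked = sorted(source, key=lambda qid: utility(source[qid]), reverse=True)
    return {qid: source[qid] for qid in ranked[:k]}


if __name__ == "__main__":
    # Toy source domain (hypothetical data).
    s_s = {
        "q1": [([1.0, 0.0], 1), ([0.0, 1.0], 0)],
        "q2": [([0.5, 0.5], 1)],
        "q3": [([0.2, 0.9], 0), ([0.9, 0.1], 1), ([0.4, 0.4], 0)],
    }
    # Placeholder utility for illustration only: the number of documents per query.
    kept = select_k_best(s_s, utility=len, k=2)
    print(sorted(kept))  # ['q1', 'q3']
```

In TransRank the utility would combine confidence and similarity, as described on the next two slides. The selection step itself does not depend on how the utility is defined.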
Utility function (2) • Confidence is measured by a separation value: the better the instances of different relevance classes are separated, the more confident the ranking direction.
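The slide does not give the formula for the separation value. The sketch below is therefore a hypothetical stand-in that captures the stated intuition. It assumes a query's ranking direction is given as a weight vector and projects each document onto it. It then uses a Fisher-style ratio: the gap between the mean scores of relevant and non-relevant documents, divided by their pooled spread. Treating labels > 0 as relevant is also an assumption.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def separation_value(
    docs: List[Tuple[Sequence[float], int]],
    direction: Sequence[float],
    eps: float = 1e-12,
) -> float:
    """Hypothetical separation value of one query along a ranking direction.

    Assumptions (not from the paper): labels > 0 count as relevant, and the
    separation is a Fisher-style ratio of the gap between the class means of
    the projected scores to their pooled spread.
    """
    relevant, non_relevant = [], []
    for x, label in docs:
        score = sum(xi * wi for xi, wi in zip(x, direction))  # project onto the direction
        (relevant if label > 0 else non_relevant).append(score)
    if not relevant or not non_relevant:
        return 0.0  # only one class present: nothing to separate
    gap = statistics.mean(relevant) - statistics.mean(non_relevant)
    spread = math.sqrt(statistics.pvariance(relevant) + statistics.pvariance(non_relevant))
    return gap / (spread + eps)  # eps avoids division by zero when both classes are constant


if __name__ == "__main__":
    docs = [([2.0, 0.0], 1), ([3.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 0)]
    print(round(separation_value(docs, [1.0, 0.0]), 3))   # 2.828
    print(round(separation_value(docs, [-1.0, 0.0]), 3))  # -2.828
```

A direction that ranks relevant documents above non-relevant ones gets a large positive value. The reversed direction gets a negative one.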
Utility function (3) • Similarity: the cosine similarity between ranking directions.
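Cosine similarity itself is standard: cos(u, v) = u·v / (‖u‖ ‖v‖). How the slides combine it with confidence is not stated. In the sketch below, the utility is *assumed* to be the product of the confidence and the mean cosine similarity between the source query's direction and each target query's direction. The function name `query_utility` is hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """cos(u, v) = u·v / (||u|| ||v||); returns 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms > 0 else 0.0


def query_utility(
    confidence: float,
    source_direction: Sequence[float],
    target_directions: List[Sequence[float]],
) -> float:
    """Hypothetical utility: confidence times the mean cosine similarity to S_t's directions.

    The product form is an assumption; it presumes a non-negative confidence.
    """
    similarity = sum(
        cosine_similarity(source_direction, t) for t in target_directions
    ) / len(target_directions)
    return confidence * similarity


if __name__ == "__main__":
    print(round(cosine_similarity([1.0, 0.0], [1.0, 1.0]), 4))  # 0.7071
    print(query_utility(2.0, [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # 1.0
```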
Step 2: Feature augmentation • Daumé performed cross-domain classification in NLP with a method called “feature augmentation” [ACL 07]. • Each vector is expanded into a shared copy, a source-only copy and a target-only copy. • Source-domain document vector: (1, 2, 3) → (1, 2, 3, 1, 2, 3, 0, 0, 0) • Target-domain document vector: (1, 2, 3) → (1, 2, 3, 0, 0, 0, 1, 2, 3)
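The mapping on this slide can be written as a small function. This is a minimal sketch: `augment` and its `domain` argument are illustrative names, and the output reproduces the two examples above.

```python
from typing import List, Sequence


def augment(x: Sequence[float], domain: str) -> List[float]:
    """Feature augmentation: (shared copy, source-only copy, target-only copy)."""
    zeros = [0] * len(x)
    if domain == "source":
        return list(x) + list(x) + zeros  # source copy filled, target copy zeroed
    if domain == "target":
        return list(x) + zeros + list(x)  # source copy zeroed, target copy filled
    raise ValueError(f"unknown domain: {domain!r}")


if __name__ == "__main__":
    print(augment([1, 2, 3], "source"))  # [1, 2, 3, 1, 2, 3, 0, 0, 0]
    print(augment([1, 2, 3], "target"))  # [1, 2, 3, 0, 0, 0, 1, 2, 3]
```

A linear model trained on augmented vectors learns three weight blocks: one shared by both domains and one specific to each domain. This lets the source data inform the shared weights without forcing identical behavior on the target domain.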
Step 3: Ranking SVM • Ranking SVM is a state-of-the-art learning-to-rank algorithm, proposed by Herbrich et al. [ICANN 99].
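Ranking SVM turns ranking into classification of document pairs. For two documents of the same query where x_i is more relevant than x_j, it asks a linear model w to satisfy w·(x_i − x_j) ≥ 1, with a hinge penalty when this fails. The sketch below forms pairs only within each query, as is common when Ranking SVM is used for IR. It minimizes the L2-regularized hinge loss by stochastic subgradient descent. This is a pure-Python stand-in, not SVMlight, which the experiments actually used. The hyperparameter names and defaults are assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple

QueryData = List[Tuple[Sequence[float], int]]  # (feature_vector, relevance_label) pairs


def pairwise_differences(queries: Dict[str, QueryData]) -> List[List[float]]:
    """Build x_i - x_j for every pair in the same query with label_i > label_j."""
    diffs = []
    for docs in queries.values():
        for xi, yi in docs:
            for xj, yj in docs:
                if yi > yj:
                    diffs.append([a - b for a, b in zip(xi, xj)])
    return diffs


def train_ranking_svm(
    queries: Dict[str, QueryData],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 100,
    seed: int = 0,
) -> List[float]:
    """Minimize lam/2 ||w||^2 + sum of max(0, 1 - w·d) over difference vectors d
    by stochastic subgradient descent (a stand-in for SVMlight, not SVMlight itself)."""
    diffs = pairwise_differences(queries)
    if not diffs:
        raise ValueError("no preference pairs: every query has a single relevance level")
    w = [0.0] * len(diffs[0])
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wk * dk for wk, dk in zip(w, d))
            w = [wk * (1.0 - lr * lam) for wk in w]  # gradient step on the L2 term
            if margin < 1.0:  # pair violates the margin: hinge subgradient is -d
                w = [wk + lr * dk for wk, dk in zip(w, d)]
    return w


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Ranking score of a document: a linear function of its features."""
    return sum(wk * xk for wk, xk in zip(w, x))


if __name__ == "__main__":
    s_t = {  # hypothetical target-domain training data
        "q1": [([3.0, 0.0], 2), ([2.0, 1.0], 1), ([0.0, 2.0], 0)],
        "q2": [([1.0, 0.5], 1), ([0.5, 1.0], 0)],
    }
    w = train_ranking_svm(s_t)
    ranked = sorted(s_t["q1"], key=lambda doc: score(w, doc[0]), reverse=True)
    print([label for _, label in ranked])  # [2, 1, 0]
```

In TransRank, the input to this step would be the k selected source queries plus S_t, all passed through feature augmentation.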
Content • Learning to rank in IR • Paper motivation • The heuristic algorithm: TransRank • Results & future work
Experimental settings • Datasets: OHSUMED (the LETOR version), WSJ, AP • Features: the feature set defined in OHSUMED; the same features are extracted from WSJ and AP • Evaluation measures: NDCG@n, MAP • For Ranking SVM, we use SVMlight by Joachims. • Two groups of experiments (source → target): WSJ → OHSUMED and AP → OHSUMED
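The two evaluation measures can be sketched as below. This is an illustration, not the LETOR evaluation script. It uses one common NDCG@n variant: gain 2^r − 1 and discount log2(1 + position). The official tooling may differ in details such as the discount at the top positions. For MAP, labels > 0 are assumed to count as relevant. Each input list holds a query's relevance labels in ranked order.

```python
import math
from typing import List, Sequence


def dcg_at_n(labels: Sequence[int], n: int) -> float:
    """DCG@n with gain 2^r - 1 and discount log2(1 + position), positions from 1."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(labels[:n]))


def ndcg_at_n(labels: Sequence[int], n: int) -> float:
    """DCG@n normalized by the DCG@n of the ideal (label-sorted) ordering."""
    ideal = dcg_at_n(sorted(labels, reverse=True), n)
    return dcg_at_n(labels, n) / ideal if ideal > 0 else 0.0


def average_precision(labels: Sequence[int]) -> float:
    """Mean of precision@i over the positions i of relevant documents (label > 0)."""
    hits, total = 0, 0.0
    for i, r in enumerate(labels, start=1):
        if r > 0:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def mean_average_precision(per_query_labels: List[Sequence[int]]) -> float:
    return sum(average_precision(q) for q in per_query_labels) / len(per_query_labels)


if __name__ == "__main__":
    print(ndcg_at_n([2, 1, 0], 3))  # 1.0
    print(round(mean_average_precision([[1, 0, 1], [0, 1]]), 4))  # 0.6667
```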
Compared algorithms • Baseline: run Ranking SVM on S_t • TransRank • Directly Mix: Step 1 + Step 3 (no feature augmentation)
Performance comparison • 40% of the target labeled data, k = 10 • [Figure panels: source domain WSJ; source domain AP]
Impact of target labeled data • Target labeled data varied from 5% to 100%, k = 10 • [Figure panels: source domain WSJ; source domain AP]
Impact of k • 40% of the target labeled data
Future work • Web-scale experiments, i.e., on data from search engines • A more integrated algorithm using machine learning techniques • Theoretical study of transfer of rank learning