A general approximation framework for direct optimization of information retrieval measures
Tao Qin, Tie-Yan Liu, Hang Li
Microsoft Research Asia, Beijing, China
Presenter: Shih-Hsiang Lin (林士翔)
• References:
  • Joachims, T. (2002). Optimizing search engines using clickthrough data. In KDD ’02.
  • Freund, Y., et al. (2003). An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4, 933–969.
  • Burges, C., et al. (2005). Learning to rank using gradient descent. In ICML ’05.
  • Cao, Z., et al. (2007). Learning to rank: From pairwise approach to listwise approach. In ICML ’07.
  • Xu, J., & Li, H. (2007). AdaRank: A boosting algorithm for information retrieval. In SIGIR ’07.
  • He, Y., et al. (2008). Are algorithms directly optimizing IR measures really direct? Technical Report MSR-TR-2008-154, Microsoft Corporation.
  • Xia, F., et al. (2008). Listwise approach to learning to rank: Theory and algorithm. In ICML ’08.
  • Xu, J., et al. (2008). Directly optimizing evaluation measures in learning to rank. In SIGIR ’08.

INTRODUCTION
• Direct optimization of information retrieval (IR) measures has recently become a trend in learning to rank
  • IR measures are explicitly considered in the direct optimization approach
• Existing direct optimization methods can generally be grouped into two categories
  • methods that introduce upper bounds of the IR measures
  • methods that approximate the IR measures using smooth functions
• Open problems
  • The relationships between the surrogate functions and the corresponding IR measures have not been sufficiently studied
  • Some of the proposed surrogate functions are not easy to optimize

INTRODUCTION
• The main contributions of this work include
  • They set up a general framework for direct optimization
    • it is applicable to any position-based IR measure
  • They take AP and NDCG as two examples to show how position-based IR measures are approximated by surrogate functions in the framework
  • They provide a theoretical justification for the direct optimization approach

REVIEW ON IR MEASURES (1/3)
• Precision@k
  • Evaluates the top k positions of a ranked list using two levels (relevant and irrelevant) of relevance judgment
• Average Precision (AP)
  • e.g. relevant docs ranked at 1, 5, 10: the precisions are 1/1, 2/5, 3/10, so AP = (1/1 + 2/5 + 3/10)/3 ≈ 0.57
• MAP is defined as the mean of AP over a set of queries
• Notation
  • k denotes the truncation position
  • r_j equals one if the doc in the j-th position is relevant and zero otherwise
  • |D+| denotes the number of relevant documents w.r.t. the query
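The two measures on this slide can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard definitions Precision@k = (1/k) Σ_{j≤k} r_j and AP = (1/|D+|) Σ_j r_j · Precision@j; the function names and the convention of returning 0 for a query without relevant documents are choices made here, not taken from the slides.

```python
def precision_at_k(rels, k):
    """Precision@k = (1/k) * sum_{j=1..k} r_j, where rels[j-1] = r_j in {0, 1}."""
    return sum(rels[:k]) / k


def average_precision(rels):
    """AP = (1/|D+|) * sum_j r_j * Precision@j over all positions j."""
    num_relevant = sum(rels)
    if num_relevant == 0:
        return 0.0  # assumed convention for queries with no relevant document
    total = sum(precision_at_k(rels, j)
                for j, r in enumerate(rels, start=1) if r)
    return total / num_relevant


# Slide example: relevant documents at positions 1, 5 and 10 of a 10-document list.
rels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
ap = average_precision(rels)
print(round(ap, 4))  # value of (1/1 + 2/5 + 3/10) / 3
```

MAP would then be the plain mean of `average_precision` over the relevance lists of all queries.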

REVIEW ON IR MEASURES (2/3)
• Normalized Discounted Cumulated Gain (NDCG)
  • It is designed for multiple levels of relevance judgments
  • Uses graded relevance as a measure of the usefulness, or gain, from examining a document
  • Discounted Cumulative Gain (DCG) is the total gain accumulated at a particular rank k
• e.g. 10 ranked documents judged on a 0-3 relevance scale (values for 7 positions are shown)

  rank j                  1     2      3      4      5      6      7
  r_j                     3     3      2      2      1      1      1
  gain 2^(r_j) - 1        7     7      3      3      1      1      1
  discount 1/log2(1+j)    1     0.63   0.5    0.43   0.39   0.36   0.33
  DCG                     7     11.41  12.91  14.2   14.59  14.95  15.28
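The DCG table can be reproduced with a short script. It is a sketch that assumes the gain 2^(r_j) − 1 and discount 1/log2(1 + j) named on the slide; the function name `dcg_at_k` is introduced here for illustration. Computed without intermediate rounding, the running DCG values agree with the slide to within 0.01.

```python
import math


def dcg_at_k(rels, k):
    """DCG@k = sum_{j=1..k} (2^{r_j} - 1) / log2(1 + j)."""
    return sum((2 ** r - 1) / math.log2(1 + j)
               for j, r in enumerate(rels[:k], start=1))


rels = [3, 3, 2, 2, 1, 1, 1]  # graded judgments from the slide example
gains = [2 ** r - 1 for r in rels]
discounts = [1 / math.log2(1 + j) for j in range(1, len(rels) + 1)]
dcgs = [dcg_at_k(rels, k) for k in range(1, len(rels) + 1)]

for j, (g, d, c) in enumerate(zip(gains, discounts, dcgs), start=1):
    print(j, g, round(d, 2), round(c, 2))
```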

REVIEW ON IR MEASURES (3/3)
• NDCG@k is defined as the DCG at truncation position k divided by a normalization constant N_k
• N_k is a constant depending on the query, chosen so that the maximum value of NDCG@k for the query is 1
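In symbols, and using the gain and discount from the DCG slide, the definition would read as follows. This is a reconstruction from the surrounding definitions, since the slide's own equation is not available in this transcript.

```latex
\mathrm{NDCG@}k \;=\; N_k^{-1} \sum_{j=1}^{k} \frac{2^{r_j} - 1}{\log_2(1 + j)}
```

Here N_k equals the DCG@k of an ideal ordering of the query's documents, which is what makes the maximum value 1.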

A GENERAL APPROXIMATION FRAMEWORK
• The framework consists of four steps:
  • Reformulating an IR measure from ‘indexed by positions’ to ‘indexed by documents’
  • Approximating the position function with a logistic function of ranking scores of documents
  • Approximating the truncation with a logistic function of positions of documents
  • Applying a global optimization technique to optimize the approximated measure (surrogate function)

STEP1: Measure Reformulation (1/2)
• Most IR measures, for example Precision@k, AP and NDCG, are position based
  • The summations in the definitions of the IR measures are taken over positions
  • The position of a document may change during the training process, which makes the optimization of the IR measures difficult
• When indexed by documents, Precision@k can be rewritten as a sum over the documents in X instead of a sum over positions
• Notation
  • X is the set of documents
  • r(x) equals one for a relevant document and zero otherwise
  • π(x) denotes the position of x in the ranked list π
  • 1{·} is the truncation (indicator) function
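A small check makes the reformulation concrete. The sketch below assumes the document-indexed form Precision@k = (1/k) Σ_{x∈X} r(x) · 1{π(x) ≤ k}, which follows from the notation on this slide; the document ids, labels and positions are hypothetical.

```python
def precision_by_positions(ranked_rels, k):
    """Position-indexed form: (1/k) * sum_{j=1..k} r_j."""
    return sum(ranked_rels[:k]) / k


def precision_by_documents(r, pi, k):
    """Document-indexed form: (1/k) * sum_{x in X} r(x) * 1{pi(x) <= k}."""
    return sum(r[x] * (1 if pi[x] <= k else 0) for x in r) / k


# Hypothetical query with four documents.
r = {"a": 1, "b": 0, "c": 1, "d": 0}   # relevance labels r(x)
pi = {"a": 3, "b": 1, "c": 2, "d": 4}  # positions pi(x) in the ranked list

# Relevance labels read off in ranked order, for the position-indexed form.
ranked_rels = [r[x] for x in sorted(pi, key=pi.get)]

for k in range(1, len(r) + 1):
    print(k, precision_by_positions(ranked_rels, k), precision_by_documents(r, pi, k))
```

Both forms give the same value for every k; the difference is that the second one no longer needs the documents to be sorted, only their positions π(x).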
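
STEP1: Measure Reformulation (2/2)
• With documents as indexes, AP can be rewritten as an average of Precision@π(y) over the relevant documents y
• Combining this with the document-indexed form of Precision@k yields AP as a double sum over documents
• So far, these measures are still non-continuous and non-differentiable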
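The equations of this slide are not available in the transcript. A reconstruction consistent with the notation of Step 1 (X, r(x), π(x), 1{·}, |D+|) would be the following; the last line uses r(y)² = r(y) and the fact that positions are distinct, so the x = y term contributes the leading 1.

```latex
\begin{align}
\mathrm{AP} &= \frac{1}{|D_+|} \sum_{y \in X} r(y)\, \mathrm{Precision@}\pi(y), \\
\mathrm{Precision@}\pi(y) &= \frac{1}{\pi(y)} \sum_{x \in X} r(x)\, \mathbf{1}\{\pi(x) \le \pi(y)\}, \\
\mathrm{AP} &= \frac{1}{|D_+|} \sum_{y \in X} \frac{r(y)}{\pi(y)}
  \Bigl( 1 + \sum_{x \in X,\, x \ne y} r(x)\, \mathbf{1}\{\pi(x) < \pi(y)\} \Bigr).
\end{align}
```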

STEP 2: Position Function Approximation (1/2)
• The position function can be represented as a function of ranking scores
• Because of the indicator function in it, the position function is still non-continuous and non-differentiable
• They propose approximating the indicator function with a logistic function, where α is a scaling constant and α > 0
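Written out, the step would look as follows. This is a reconstruction under stated assumptions: s_x denotes the ranking score of document x, s_{x,y} = s_x − s_y, documents are sorted by descending score, and there are no ties.

```latex
\begin{align}
\pi(x) &= 1 + \sum_{y \in X,\, y \ne x} \mathbf{1}\{ s_{x,y} < 0 \}, \\
\mathbf{1}\{ s_{x,y} < 0 \} &\approx \frac{\exp(-\alpha s_{x,y})}{1 + \exp(-\alpha s_{x,y})}, \\
\hat{\pi}(x) &= 1 + \sum_{y \in X,\, y \ne x} \frac{\exp(-\alpha s_{x,y})}{1 + \exp(-\alpha s_{x,y})}.
\end{align}
```

The larger α is, the closer the logistic term is to the 0/1 indicator, at the price of a steeper and harder to optimize surface.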

STEP 2: Position Function Approximation (2/2)
• Examples of position approximation
• The approximation is very accurate in this case
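The slide's own example table is not available here, so the following sketch uses hypothetical scores to show the same effect. It assumes the approximation π̂(x) = 1 + Σ_{y≠x} exp(−α s_{x,y}) / (1 + exp(−α s_{x,y})) with s_{x,y} = s_x − s_y; the function names are introduced for illustration.

```python
import math


def logistic(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def true_positions(scores):
    """pi(x) = 1 + number of documents y with s_x - s_y < 0."""
    return [1 + sum(1 for j, sy in enumerate(scores) if j != i and sx - sy < 0)
            for i, sx in enumerate(scores)]


def approx_positions(scores, alpha):
    """Smooth positions; logistic(-alpha * s_xy) equals exp(-a*s) / (1 + exp(-a*s))."""
    return [1.0 + sum(logistic(-alpha * (sx - sy))
                      for j, sy in enumerate(scores) if j != i)
            for i, sx in enumerate(scores)]


scores = [4.2, 1.0, 3.1, 2.5, 0.3]  # hypothetical ranking scores
exact = true_positions(scores)


def max_error(alpha):
    approx = approx_positions(scores, alpha)
    return max(abs(a - e) for a, e in zip(approx, exact))


for alpha in (1, 10, 100):
    print(alpha, max_error(alpha))
```

With well-separated scores and a large α the approximated positions are almost identical to the true ones, while a small α blurs them noticeably.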

STEP3: Truncation Function Approximation
• Some measures have truncation functions in their definitions, such as Precision@k, AP, and NDCG@k
• These measures need a further approximation of the truncation functions
• To approximate the truncation function, a simple way is to use the logistic function once again, where β is a scaling constant and β > 0
• This gives the approximation of AP (ApproxAP)
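Putting Steps 1 to 3 together, an approximated AP can be sketched as below. The code assumes that the truncation 1{π(x) < π(y)} is replaced by a logistic function of β(π̂(y) − π̂(x)) and that positions are approximated as in Step 2; scores and labels are hypothetical, and the function names are not from the slides.

```python
import math


def logistic(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def approx_positions(scores, alpha):
    """Smooth position of each document (Step 2)."""
    return [1.0 + sum(logistic(-alpha * (sx - sy))
                      for j, sy in enumerate(scores) if j != i)
            for i, sx in enumerate(scores)]


def exact_ap(scores, rels):
    """Ordinary AP of the ranking induced by descending scores."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for pos, i in enumerate(order, start=1):
        if rels[i]:
            hits += 1
            total += hits / pos
    return total / sum(rels)


def approx_ap(scores, rels, alpha, beta):
    """Smooth AP: positions and truncation both replaced by logistic functions."""
    pos = approx_positions(scores, alpha)
    total = 0.0
    for y, ry in enumerate(rels):
        if not ry:
            continue
        # 1 + sum over other relevant x of the smoothed 1{pi(x) < pi(y)}
        inner = 1.0 + sum(rels[x] * logistic(beta * (pos[y] - pos[x]))
                          for x in range(len(rels)) if x != y)
        total += inner / pos[y]
    return total / sum(rels)


scores = [4.2, 1.0, 3.1, 2.5, 0.3]  # hypothetical ranking scores
rels = [1, 0, 0, 1, 1]              # hypothetical binary labels
print(exact_ap(scores, rels))
print(approx_ap(scores, rels, alpha=1, beta=1))
print(approx_ap(scores, rels, alpha=100, beta=100))
```

The surrogate depends on the scores only through smooth functions, so it can be differentiated with respect to the parameters of the ranking model.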

STEP4: Surrogate Function Optimization (1/3)
• With the aforementioned approximation techniques, the surrogate objective functions become continuous and differentiable with respect to the parameters of the ranking model
• However, since the original IR measures contain many local optima, their approximations will also contain local optima
• It is therefore better to choose global optimization methods, such as random restart and simulated annealing, to avoid being trapped in local optima
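Random restart can be illustrated on a toy problem. The sketch below is not the optimizer used in the paper: it runs plain gradient ascent with a finite-difference gradient from several random starting points and keeps the best result. The one-dimensional objective is a hypothetical multimodal stand-in for a surrogate such as ApproxAP, and all names and step sizes are assumptions.

```python
import math
import random


def toy_objective(x):
    """Hypothetical multimodal stand-in for a surrogate measure (to be maximized)."""
    return math.sin(3 * x) - 0.1 * (x - 2.5) ** 2


def local_ascent(f, x0, step=0.01, iters=500, h=1e-5):
    """Gradient ascent with a central finite-difference gradient; finds a local optimum."""
    x = x0
    for _ in range(iters):
        grad = (f(x + h) - f(x - h)) / (2 * h)
        x += step * grad
    return x


def random_restart(f, low, high, restarts, seed=0):
    """Run local_ascent from `restarts` random starting points and keep the best one."""
    rng = random.Random(seed)
    best_x = None
    for _ in range(restarts):
        x = local_ascent(f, rng.uniform(low, high))
        if best_x is None or f(x) > f(best_x):
            best_x = x
    return best_x


single = random_restart(toy_objective, 0.0, 5.0, restarts=1)
multi = random_restart(toy_objective, 0.0, 5.0, restarts=20)
print(toy_objective(single), toy_objective(multi))
```

With the same seed, the 20-restart run starts from the same first point as the single run, so its result can never be worse; additional restarts only give more chances to land in the basin of a better optimum.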

STEP4: Surrogate Function Optimization (2/3)
• Gradient of ApproxAP
  • The gradient with respect to the model parameters is obtained by the chain rule
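The full expression from the slide is not available here. The structure of the chain rule can still be sketched, assuming model parameters θ (a symbol introduced here), scores s_x that depend on θ, and the logistic position approximation of Step 2; the second line is simply the derivative of that logistic term.

```latex
\begin{align}
\frac{\partial \widehat{\mathrm{AP}}}{\partial \theta}
  &= \sum_{x \in X} \frac{\partial \widehat{\mathrm{AP}}}{\partial \hat{\pi}(x)}\,
     \frac{\partial \hat{\pi}(x)}{\partial \theta}, \\
\frac{\partial \hat{\pi}(x)}{\partial \theta}
  &= \sum_{y \in X,\, y \ne x}
     \frac{-\alpha \exp(-\alpha s_{x,y})}{\bigl(1 + \exp(-\alpha s_{x,y})\bigr)^{2}}
     \left( \frac{\partial s_x}{\partial \theta} - \frac{\partial s_y}{\partial \theta} \right).
\end{align}
```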

Comparisons with other directly optimizing techniques
• In general, we would like to create a ranking model that maximizes the accuracy in terms of an IR measure on training data, or equivalently, minimizes a loss function defined on that measure
• Directly optimizing techniques try to minimize this loss function
• Notation
  • π_i is the permutation selected for query q_i
  • E(π_i, y_i) is the evaluation of π_i w.r.t. y_i for q_i
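The loss function itself is not available in this transcript. A form consistent with the notation above, and with the "1 − x" bound used on the following slides, would be the one below. It assumes m training queries, a measure E normalized to [0, 1], and a perfect permutation π_i^* with E(π_i^*, y_i) = 1; m and π_i^* are symbols introduced here.

```latex
R(F) \;=\; \sum_{i=1}^{m} \bigl( E(\pi_i^{*}, y_i) - E(\pi_i, y_i) \bigr)
     \;=\; \sum_{i=1}^{m} \bigl( 1 - E(\pi_i, y_i) \bigr)
```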

Comparisons with other directly optimizing techniques (cont.)
• From the viewpoint of loss function optimization, these methods fall into three categories
  • One can minimize upper bounds of the basic loss function defined on the IR measures: AdaRank, SVMmap
  • One can approximate the IR measures with functions that are easy to handle: this paper, SoftRank
  • One can use specially designed technologies for optimizing the non-smooth IR measures

Comparisons with other directly optimizing techniques (cont.)
• Minimize upper bounds of the basic loss function
• Type one bound
  • the logistic function
  • the exponential function, which is an upper bound since e^(-x) ≥ 1 - x
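The bound property can be checked numerically. The sketch assumes a per-query basic loss of 1 − E with E in [0, 1], an exponential bound exp(−E), and a logistic bound log2(1 + exp(−E)); the exact form of the logistic bound is an assumption, since the slide only names it.

```python
import math


def basic_loss(e):
    """Per-query basic loss for a measure value e in [0, 1]."""
    return 1.0 - e


def exponential_bound(e):
    """exp(-e) >= 1 - e for every real e."""
    return math.exp(-e)


def logistic_bound(e):
    """log2(1 + exp(-e)); equals 1 at e = 0 and stays above 1 - e for e >= 0."""
    return math.log2(1.0 + math.exp(-e))


grid = [i / 100 for i in range(101)]  # measure values from 0.00 to 1.00
gaps_exp = [exponential_bound(e) - basic_loss(e) for e in grid]
gaps_log = [logistic_bound(e) - basic_loss(e) for e in grid]
print(max(gaps_exp), max(gaps_log))
```

Both gaps are zero at E = 0 and grow with E, which illustrates why the tightness of such bounds is a concern when the measure value is high.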

Comparisons with other directly optimizing techniques (cont.)
• Type two bound
  • The loss function measures the loss when the worst prediction is made
  • [[·]] is one if the condition is satisfied and zero otherwise

EXPERIMENTAL SETUP
• Datasets
  • LETOR 3.0 datasets
    • a benchmark collection for research on learning to rank for information retrieval
  • TD2003, TD2004 and OHSUMED
• Retrieval method
  • A linear ranking model is used for ApproxAP and ApproxNDCG in the experiments

EXPERIMENTAL RESULTS (1/3)
• On the approximation of IR measures
  • Accuracy is assessed through the approximation error between each measure and its surrogate
  • The approximation accuracy is very high, and it becomes more accurate as α or β increases

EXPERIMENTAL RESULTS (2/3)
• On the performance of ApproxAP
  • Five-fold cross validation, as suggested in LETOR, for both the TD2003 and TD2004 datasets
  • α ∈ {50, 100, 150, 200, 250, 300}, β ∈ {1, 10, 20, 50, 100}, δ = 0.001, η = 0.01, K = 10
  • The result clearly shows the advantage of using the proposed method for direct optimization

EXPERIMENTAL RESULTS (3/3)
• It can also be seen that AdaRank.MAP and SVMmap are not as good as Ranking SVM and ListNet
  • AdaRank.MAP and SVMmap optimize an upper bound of AP, and it is not clear whether the bound is tight
  • If the bound is very loose, optimizing the bound does not always lead to optimizing AP, so they may not perform well on some datasets

CONCLUSIONS AND FUTURE WORK (1/2)
• In this paper, they have set up a general framework to approximate position-based IR measures
  • The key part of the framework is to approximate the positions of documents by logistic functions of their scores
• There are several advantages of this framework
  • The way of approximating position-based measures is simple yet general
  • Many existing techniques can be directly applied to the optimization, and the optimization process itself is measure independent
  • It is easy to analyze the accuracy of the approach, and high approximation accuracy can be achieved by setting appropriate parameters

CONCLUSIONS AND FUTURE WORK (2/2)
• There are still some issues that need further study
  • The approximated measures are not convex, and there may be many local optima in training
  • Experiments should be conducted to test the algorithms with other function classes