Multi-Task Learning and Web Search Ranking Gordon Sun (孙国政) Yahoo! Inc March 2007
Outline: • Brief Review: Machine Learning in web search ranking and Multi-Task learning. • MLR with Adaptive Target Value Transformation – each query is a task. • MLR for Multi-Languages – each language is a task. • MLR for Multi-query classes – each type of queries is a task. • Future work and Challenges.
MLR (Machine Learning Ranking) • General Function Estimation and Risk Minimization: • Input: x = {x1, x2, …, xn} • Output: y • Training set: {yi, xi}, i = 1, …, n • Goal: Estimate mapping function y = F(x) • In MLR work: • x = x(q, d) = {x1, x2, …, xn} (ranking features) • y = judgment labeling: e.g. {P E G F B} mapped to {0, 1, 2, 3, 4}. • Loss Function: L(y, F(x)) = (y – F(x))² • Algorithm: MLR with regression.
Rank features construction • Query features: • query language, query word types (Latin, Kanji, …), … • Document features: • page_quality, page_spam, page_rank,… • Query-Document dependent features: • Text match scores in body, title, anchor text (TF/IDF, proximity), ... • Evaluation metric – DCG (Discounted Cumulative Gain) • where grades Gi = grade values for {P, E, G, F, B} (NDCG – 2n) DCG5 -- (n=5), DCG10 -- (n=10)
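The DCG formula itself did not survive on this slide. As a minimal sketch, assuming the widely used form in which the gain at rank i is discounted by log2(i + 1), DCG at cutoff n can be computed as below. The function name `dcg` and the example gain values are illustrative; the slide does not give the grade values Gi.

```python
import math

def dcg(gains, n):
    """Discounted cumulative gain over the top-n results.

    gains: grade values G_i of the ranked documents, best-ranked first.
    Assumes the common log2(i + 1) discount; the slide's exact
    formula is not preserved, so treat this as one standard variant.
    """
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:n], start=1))

# DCG5 and DCG10 are the same function with n = 5 and n = 10.
example_gains = [3, 2, 0, 1, 0, 0, 2]  # hypothetical grade values
dcg5 = dcg(example_gains, 5)
dcg10 = dcg(example_gains, 10)
```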
Multi-Task Learning • Single-Task Learning (STL) • One prediction task (classification/regression): • to estimate a function based on one training/testing set: • T = {yi, xi}, i = 1, …, n • Multi-Task Learning (MTL) • Multiple prediction tasks, each with its own training/testing set: • Tk = {yki, xki}, k = 1, …, m, i = 1, …, nk • Goal is to solve multiple tasks together: • - Tasks share the same input space (at least partially) • - Tasks are related (say, MLR: tasks share one mapping function)
Multi-Task Learning: Intuition and Benefits • Empirical Intuition • Data from “related” tasks could help • Equivalent to increasing the effective sample size • Goal: Share data and knowledge from task to task (Transfer Learning). • Benefits • - when the number of training examples per task is limited • - when the number of tasks is large and cannot be handled by a separate MLR for each task • - when it is difficult/expensive to obtain examples for some tasks • - possible to obtain meta-level knowledge
Multi-Task Learning: “Relatedness” approaches • Probabilistic modeling for task generation: [Baxter ’00], [Heskes ’00], [Teh, Seeger, Jordan ’05], [Zhang, Ghahramani, Yang ’05] • Latent variable correlations – Noise correlations [Greene ’02] – Latent variable modeling [Zhang ’06] • Hidden common data structure and latent variables – Implicit structure (common kernels) [Evgeniou, Micchelli, Pontil ’05] – Explicit structure (PCA) [Ando, Zhang ’04] • Transformation relatedness [Shai ’05]
Multi-Task Learning for MLR • Different levels of relatedness: • Grouping data by query: each query could be one task. • Grouping data by query language: each language is a task. • Grouping data by query class: each query class is a task.
Outline: • Brief Review: Machine Learning in web search ranking and Multi-Task learning. • MLR with Adaptive Target Value Transformation – each query is a task. • MLR for Multi-Languages – each language is a task. • MLR for Multi-query classes – each type of queries is a task. • Future work and Challenges.
Adaptive Target Value Transformation • Intuition: • Rank features vary a lot from query to query. • Rank features vary a lot from sample to sample with the same label. • MLR is a ranking problem, but regression minimizes prediction error. • Solution: Adaptively adjust training target values: • where a linear (monotonic) transformation is required • (a nonlinear g() may not preserve the order of E(y|x))
Adaptive Target Value Transformation • Implementation: Empirical Risk Minimization • where the linear transformation weights are regularized, • λα and λβ are regularization parameters and ‖·‖p is the p-norm. • The solution will be
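The formulas on this slide are not preserved. One objective consistent with the surrounding text (a per-query linear transformation of the target, squared loss, and p-norm regularization with parameters λα and λβ) might look as follows. This is a sketch and not the author's exact formula: in particular, penalizing the distance of α from 1 and of β from 0, and the indices q, i, n_q, are assumptions.

```latex
\min_{F,\,\alpha,\,\beta}\;
\sum_{q=1}^{Q}\sum_{i=1}^{n_q}
\bigl(\alpha_q\, y_{qi} + \beta_q - F(x_{qi})\bigr)^2
\;+\; \lambda_\alpha \,\lVert \alpha - \mathbf{1} \rVert_p^p
\;+\; \lambda_\beta \,\lVert \beta \rVert_p^p
```

Under this reading, α_q = 1 and β_q = 0 for every query recovers plain regression on the original labels.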
Adaptive Target Value Transformation • Norm p=2 solution: for each (λα, λβ): • 1. For the initial (α, β), find F(x) by solving: • 2. For the given F(x), solve for each (αq, βq), q = 1, 2, …, Q. • 3. Repeat from step 1 until • Norm p=1 solution: solve a conditional quadratic programming problem [Lasso/LARS] • Convergence Analysis: Assuming
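The alternating procedure for p = 2 can be sketched as follows. Several pieces are assumptions, since the slide's equations are not preserved: the objective is taken to be the sum over samples of (αq·y + βq − F(x))² plus λα·Σ(αq − 1)² plus λβ·Σβq², F is a plain linear least-squares model standing in for whatever regressor was actually used, and a fixed iteration count replaces the missing stopping rule. The name `fit_atvt` is hypothetical.

```python
import numpy as np

def fit_atvt(X, y, qid, lam_alpha=1.0, lam_beta=1.0, n_iter=20):
    """Alternate between fitting F and the per-query (alpha_q, beta_q)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    queries, inv = np.unique(qid, return_inverse=True)
    alpha = np.ones(len(queries))    # initial transformation: identity
    beta = np.zeros(len(queries))
    Xb = np.hstack([X, np.ones((len(X), 1))])  # linear F with intercept
    for _ in range(n_iter):
        # Step 1: fit F to the transformed targets alpha_q * y + beta_q.
        t = alpha[inv] * y + beta[inv]
        w, *_ = np.linalg.lstsq(Xb, t, rcond=None)
        f = Xb @ w
        # Step 2: with F fixed, each query has a 2x2 ridge problem
        # in (alpha_q, beta_q) with a closed-form solution.
        for k in range(len(queries)):
            m = inv == k
            yq, fq = y[m], f[m]
            A = np.array([[yq @ yq + lam_alpha, yq.sum()],
                          [yq.sum(), m.sum() + lam_beta]])
            rhs = np.array([yq @ fq + lam_alpha, fq.sum()])
            alpha[k], beta[k] = np.linalg.solve(A, rhs)
    return w, alpha, beta
```

Under this assumed objective, setting both regularization parameters to zero admits a degenerate optimum (every αq = 0 with a constant F), which is one way to see why the transformation weights have to be regularized.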
Adaptive Target Value Transformation Experiments data:
Adaptive Target Value Transformation: Evaluation of aTVT on US and CN data
Adaptive Target Value Transformation Observations: 1. Relevance gain (DCG5 ~ 2%) is visible. 2. Regularization is needed. 3. Different query types gain differently from aTVT.
Outline: • Brief Review: Machine Learning in web search ranking and Multi-Task learning. • MLR with Adaptive Target Value Transformation – each query is a task. • MLR for Multi-Languages – each language is a task. • MLR for Multi-query classes – each type of queries is a task. • Future work and Challenges.
Multi-Language MLR Objective: • Make MLR globally scalable: >100 languages, >50 regions. • Improve MLR for small regions/languages using data from other languages. • Build a Universal MLR for all regions that do not have data and editorial support.
Multi-Language MLR Part 1 • Feature Differences between Languages • MLR function differences between Languages.
Multi-Language MLR: Distribution of Text Score [Figure: text score distributions for Perfect+Excellent URLs and for Bad URLs; legend: JP, CN, DE, UK, KR]
Multi-Language MLR: Distribution of Spam Score [Figure: spam score distributions for Perfect+Excellent URLs and for Bad URLs; legend: JP, CN, DE, UK, KR] • JP and KR are similar. • DE and UK are similar.
Multi-Language MLR: Training and Testing on Different Languages [Table: train language vs. test language, % DCG improvement over base function]
Multi-Language MLR: Language Differences: Observations • Feature differences across languages are visible but not huge. • An MLR trained for one language does not work well for other languages.
Multi-Language MLR Part 2 Transfer Learning with Region features
Multi-Language MLR: Query Region Feature • New feature: query region: • Multiple binary-valued features: • Feature vector: qr = (CN, JP, UK, DE, KR) • CN queries: (1, 0, 0, 0, 0) • JP queries: (0, 1, 0, 0, 0) • UK queries: (0, 0, 1, 0, 0) • … • To test the trained Universal MLR on new languages, e.g. FR: • Feature vector: qr = (0, 0, 0, 0, 0)
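The encoding above can be sketched in a few lines. The region order follows the slide's feature vector qr = (CN, JP, UK, DE, KR); the function name `query_region_feature` is illustrative.

```python
REGIONS = ("CN", "JP", "UK", "DE", "KR")  # order of the qr vector on the slide

def query_region_feature(region):
    """One binary indicator per training region.

    A region that was not seen in training (e.g. FR) matches no
    indicator and therefore maps to the all-zero vector.
    """
    return tuple(1 if region == r else 0 for r in REGIONS)

cn = query_region_feature("CN")  # (1, 0, 0, 0, 0)
fr = query_region_feature("FR")  # (0, 0, 0, 0, 0)
```

The all-zero vector lets the universal model score a new language without retraining, since no region indicator fires for it.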
Multi-Language MLR: Query Region Feature: Experiment results (% DCG5 improvement over base function)
Multi-Language MLR: Query Region Feature: Experiment results for CJK and UK, DE models. All models include the query region feature.
Multi-Language MLR: Query Region Feature: Observations • The query region feature seems to improve combined model performance in every case, although the gain is not always statistically significant. • It helped more when we had less data (KR). • It helped more when introducing “near-language” models (CJK, EU). • It would not help for languages with large training data (JP, CN).
Multi-Language MLR: Experiments: Overweighting Target Language • This method deals with the common case where only a small amount of data is available for a language. • Use all available data, but change the weight of the data from the target language. • When weight = 1: “Universal Language Model”. • As weight → ∞: becomes the Single Language Model.
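A minimal sketch of the overweighting idea, assuming a squared-error objective and using weighted linear least squares as a stand-in for the actual ranking model (which the slides do not specify). The helper names `language_weights` and `weighted_least_squares` are hypothetical.

```python
import numpy as np

def language_weights(langs, target, weight):
    """Per-sample weights: `weight` for the target language, 1 otherwise."""
    return np.where(np.asarray(langs) == target, float(weight), 1.0)

def weighted_least_squares(X, y, w):
    """Minimize sum_i w_i * (y_i - x_i . coef)^2.

    Scaling each row by sqrt(w_i) turns the weighted problem into an
    ordinary least-squares problem.
    """
    sw = np.sqrt(np.asarray(w, float))
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef
```

With weight = 1 every sample counts equally, which corresponds to the universal model. As the weight grows, the other languages contribute less and less to the loss and the fit approaches the one obtained from the target language alone.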
Multi-Language MLR: Overweighting Target Language: Observations • It helps on certain languages with a small amount of data (KR, DE). • It does not help on some languages (CN, JP). • For languages with enough data, it will not help. • A weight of 10 seems better than 1 or 100 on average.
Multi-Language MLR Part 3 Transfer Learning with Language Neutral Data and Regression Diff
Multi-Language MLR: Selection of Language Neutral Queries • For each of (CN, JP, KR, DE, UK), train an MLR on that language's own data. • Test the queries of each language with the MLRs of all languages. • Select the queries that show the best DCG across the different language MLRs. • Treat these queries as language neutral; they can be shared in the MLR development of all languages.
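The selection step might be sketched as below. The slide does not define "best DCG across different language MLRs", so the criterion used here (a query is kept when even its worst DCG over all language models reaches a threshold) is an assumption, as are the function name and the input layout.

```python
def select_language_neutral(dcg_by_query, min_dcg):
    """Pick queries that every language's MLR ranks well.

    dcg_by_query: {query: {model_language: DCG of that query when
                           ranked by the MLR trained on model_language}}
    min_dcg: hypothetical threshold; a query is kept only if its
             lowest DCG across all models is at least this value.
    """
    return sorted(
        q for q, scores in dcg_by_query.items()
        if min(scores.values()) >= min_dcg
    )

# Hypothetical scores for two queries under three language models.
scores = {
    "q1": {"CN": 7.1, "JP": 6.8, "DE": 6.9},
    "q2": {"CN": 7.5, "JP": 3.2, "DE": 4.0},
}
neutral = select_language_neutral(scores, min_dcg=6.0)
```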
Multi-Language MLR Evaluation of Language Neutral Queries on CN-simplified dataset (2,753 queries).
Outline: • Brief Review: Machine Learning in web search ranking and Multi-Task learning. • MLR with Adaptive Target Value Transformation – each query is a task. • MLR for Multi-Languages – each language is a task. • MLR for Multi-query classes – each type of queries is a task. • Future work and Challenges.
Multi-Query Class MLR Intuitions: • Different types of queries behave differently: • They require different ranking features (time-sensitive queries: page_time_stamps). • They expect different results (navigational queries: one official page at the top). • Also, different types of queries could share the same ranking features. • Multi-class learning could be done in a unified MLR by: • Introducing query classification and using the query class as an input ranking feature. • Adding page-level features for the corresponding classes.
Multi-Query Class MLR Time Recency experiments: • Feature implementation: • Binary query feature: time sensitive (0, 1) • Binary page feature: discovered within the last three months. • Data: • 300 time-sensitive queries (editorial). • ~2000 ordinary queries. • Overweight time-sensitive queries by a factor of 3. • 10-fold cross validation on MLR training/testing.
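The two binary features and the overweighting can be sketched as follows. Approximating "three months" as 90 days is an assumption, and the function names are illustrative.

```python
from datetime import date, timedelta

def recency_features(is_time_sensitive, discovered, today, window_days=90):
    """Return (query feature, page feature) as 0/1 values.

    window_days=90 approximates the slide's three-month window
    (an assumption; the exact cutoff is not given).
    """
    age = today - discovered
    page_recent = int(timedelta(0) <= age <= timedelta(days=window_days))
    return int(is_time_sensitive), page_recent

def sample_weight(is_time_sensitive, boost=3.0):
    """Training weight: time-sensitive queries count `boost` times."""
    return boost if is_time_sensitive else 1.0

features = recency_features(True, date(2007, 1, 15), date(2007, 3, 1))
```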
Multi-Query Class MLR Time Recency experiment results: comparison of MLR with and without the page_time feature.
Multi-Query Class MLR Name Entity queries: • Feature implementation: • Binary query feature: name entity query (0, 1) • 11 new page features implemented, including: • Path length • Host length • Number of host components (URL depth) • Path contains “index” • Path contains “cgi”, “asp”, “jsp”, or “php” • Path contains “search” or “srch”, … • Data: • 142 place name entity queries. • ~2000 ordinary queries. • 10-fold cross validation on MLR training/testing.
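The six page features named on the slide can be sketched with the standard library's URL parser. The remaining features are elided in the source and are not guessed here. Counting host components as dot-separated labels and matching substrings case-insensitively are assumptions, as are the feature names.

```python
from urllib.parse import urlsplit

SCRIPT_MARKERS = ("cgi", "asp", "jsp", "php")
SEARCH_MARKERS = ("search", "srch")

def url_page_features(url):
    """URL-shape features for a page (only those named on the slide)."""
    parts = urlsplit(url)
    host = parts.hostname or ""   # hostname is lowercased, None if absent
    path = parts.path.lower()
    return {
        "path_length": len(path),
        "host_length": len(host),
        # assumed reading: number of dot-separated labels in the host
        "host_components": len(host.split(".")) if host else 0,
        "path_has_index": int("index" in path),
        "path_has_script": int(any(m in path for m in SCRIPT_MARKERS)),
        "path_has_search": int(any(m in path for m in SEARCH_MARKERS)),
    }

feats = url_page_features("http://www.example.com/cgi-bin/search.php?q=x")
```

A short host with a short path and none of the script or search markers is the kind of shape an official home page tends to have, which is presumably why such features were tried for place-name queries.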
Multi-Query Class MLR Name Entity query experiments result: Compared MLR with base model without name entity features.
Multi-Query Class MLR Observations: • Query class combined with page level features could help MLR relevance. • More research is needed on query classification and page level feature optimization.
Outline: • Brief Review: Machine Learning in web search ranking and Multi-Task learning. • MLR with Adaptive Target Value Transformation – each query is a task. • MLR for Multi-Languages – each language is a task. • MLR for Multi-query classes – each type of queries is a task. • Future work and Challenges.
Future Work and Challenges • Multi-task learning extended to different types of training data: • Editorial judgment data. • User click-through data • Multi-task learning extended to different types of relevance judgments: • Absolute relevance judgment. • Relative relevance judgment • Multi-task learning extended to use both • Labeled data. • Unlabeled data. • Multi-task learning extended to different types of search user intentions.