An Interactive Learning Approach to Optimizing Information Retrieval Systems CMU ML Lunch September 27th, 2010 Yisong Yue Carnegie Mellon University
Information Systems
Interactive Learning Setting • Find the best ranking function (out of 1000) • Show results, evaluate using click logs • Clicks biased (users only click on what they see) • Explore / exploit problem
Interactive Learning Setting • Find the best ranking function (out of 1000) • Show results, evaluate using click logs • Clicks biased (users only click on what they see) • Explore / exploit problem • Technical issues • How to interpret clicks? • What is the utility function? • What results to show users?
Team-Game Interleaving (u=thorsten, q=“svm”)
Ranking r1 = f1(u,q):
 1. Kernel Machines http://svm.first.gmd.de/
 2. SVM-Light Support Vector Machine http://ais.gmd.de/~thorsten/svm_light/
 3. Support Vector Machine and Kernel ... References http://svm.research.bell-labs.com/SVMrefs.html
 4. Lucent Technologies: SVM demo applet http://svm.research.bell-labs.com/SVT/SVMsvt.html
 5. Royal Holloway Support Vector Machine http://svm.dcs.rhbnc.ac.uk
Ranking r2 = f2(u,q):
 1. Kernel Machines http://svm.first.gmd.de/
 2. Support Vector Machine http://jbolivar.freeservers.com/
 3. An Introduction to Support Vector Machines http://www.support-vector.net/
 4. Archives of SUPPORT-VECTOR-MACHINES ... http://www.jiscmail.ac.uk/lists/SUPPORT...
 5. SVM-Light Support Vector Machine http://ais.gmd.de/~thorsten/svm_light/
Interleaving(r1, r2):
 1. Kernel Machines (T2) http://svm.first.gmd.de/
 2. Support Vector Machine (T1) http://jbolivar.freeservers.com/
 3. SVM-Light Support Vector Machine (T2) http://ais.gmd.de/~thorsten/svm_light/
 4. An Introduction to Support Vector Machines (T1) http://www.support-vector.net/
 5. Support Vector Machine and Kernel ... References (T2) http://svm.research.bell-labs.com/SVMrefs.html
 6. Archives of SUPPORT-VECTOR-MACHINES ... (T1) http://www.jiscmail.ac.uk/lists/SUPPORT...
 7. Lucent Technologies: SVM demo applet (T2) http://svm.research.bell-labs.com/SVT/SVMsvt.html
• Invariant: for all k, in expectation the same number of team members appear in the top k from each team.
• Interpretation: (r1 ≻ r2) ↔ clicks(T1) > clicks(T2) [Radlinski, Kurup, Joachims; CIKM 2008]
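A minimal Python sketch of this style of interleaving and click crediting, under the invariant stated above. The function names, the coin-flip tie-breaking, and the simplified duplicate handling are illustrative assumptions, not the exact CIKM 2008 algorithm.

```python
import random

def team_game_interleave(r1, r2, k=10):
    """Build one result list by letting the two 'teams' (rankings r1 and r2)
    alternately contribute their next result; remember which team placed what."""
    interleaved, team_of = [], {}
    picks1 = picks2 = 0
    i = j = 0
    while len(interleaved) < k and (i < len(r1) or j < len(r2)):
        # The team with fewer picks so far goes next; flip a coin on ties.
        prefer_first = picks1 < picks2 or (picks1 == picks2 and random.random() < 0.5)
        take_first = (prefer_first and i < len(r1)) or j >= len(r2)
        if take_first:
            doc, i, team = r1[i], i + 1, 1
            picks1 += 1
        else:
            doc, j, team = r2[j], j + 1, 2
            picks2 += 1
        if doc not in team_of:          # simplified duplicate handling
            interleaved.append(doc)
            team_of[doc] = team
    return interleaved, team_of

def credit_clicks(team_of, clicked_docs):
    """Interpretation from the slide: r1 is preferred iff its team gets more clicks."""
    c1 = sum(1 for d in clicked_docs if team_of.get(d) == 1)
    c2 = sum(1 for d in clicked_docs if team_of.get(d) == 2)
    return "r1" if c1 > c2 else ("r2" if c2 > c1 else "tie")
```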
Setting • Find the best ranking function (out of 1000) • Evaluate using click logs • Clicks biased (users only click on what they see) • Explore / exploit problem • Technical issues • How to interpret clicks? • What is the utility function? • What results to show users?
Dueling Bandits Problem • Given K bandits b1, …, bK • Each iteration: compare (duel) two bandits • E.g., interleaving two retrieval functions [Yue & Joachims, ICML 2009] [Yue, Broder, Kleinberg, Joachims, COLT 2009]
Dueling Bandits Problem • Given K bandits b1, …, bK • Each iteration: compare (duel) two bandits • E.g., interleaving two retrieval functions • Cost function (regret): • (bt, bt’) are the two bandits chosen • b* is the overall best one • (% users who prefer best bandit over chosen ones) [Yue & Joachims, ICML 2009] [Yue, Broder, Kleinberg, Joachims, COLT 2009]
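The regret formula on the original slide did not survive extraction. The following reconstruction is consistent with the bullets above and with the worked examples on the next slide (e.g., Example 1: 0.4 + 0.3 = 0.7):

```latex
\[
  R_T \;=\; \sum_{t=1}^{T} \Big( \epsilon(b^*, b_t) + \epsilon(b^*, b_t') \Big),
  \qquad
  \epsilon(b^*, b) \;=\; P(b^* \succ b) - \tfrac{1}{2},
\]
% where (b_t, b_t') are the two bandits dueled at iteration t and b^* is the best bandit.
```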
Example 1 • P(f* > f) = 0.9 • P(f* > f’) = 0.8 • Incurred Regret = 0.7 • Example 2 • P(f* > f) = 0.7 • P(f* > f’) = 0.6 • Incurred Regret = 0.3 • Example 3 • P(f* > f) = 0.51 • P(f* > f’) = 0.55 • Incurred Regret = 0.06
Assumptions • P(bi > bj) = ½ + εij (distinguishability) • Strong Stochastic Transitivity • For three bandits bi > bj > bk: a monotonicity property (formalized below) • Stochastic Triangle Inequality • For three bandits bi > bj > bk: a diminishing-returns property (formalized below) • Satisfied by many standard models • E.g., Logistic / Bradley-Terry
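The two properties were shown as formulas on the original slide. In the ε-notation of the first bullet, and as I read the COLT 2009 paper, they are:

```latex
% For any three bandits with b_i \succ b_j \succ b_k:
\[
  \text{Strong stochastic transitivity (monotonicity):}\qquad
  \epsilon_{ik} \;\ge\; \max\big(\epsilon_{ij},\, \epsilon_{jk}\big)
\]
\[
  \text{Stochastic triangle inequality (diminishing returns):}\qquad
  \epsilon_{ik} \;\le\; \epsilon_{ij} + \epsilon_{jk}
\]
```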
Explore then Exploit • First explore • Try to gather as much information as possible • Accumulates regret based on which bandits we decide to compare • Then exploit • We have a (good) guess as to which bandit is best • Repeatedly compare that bandit with itself • (i.e., interleave that ranking with itself)
Naïve Approach • In deterministic case, O(K) comparisons to find max • Extend to noisy case: • Repeatedly compare until confident one is better • Problem: comparing two awful (but similar) bandits • Example: • P(A > B) = 0.85 • P(A > C) = 0.85 • P(B > C) = 0.51 • Comparing B and C requires many comparisons!
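A sketch of this naïve reduction, with a hypothetical helper noisy_better that keeps dueling a pair until one side clears a Hoeffding-style confidence interval; it is exactly this helper that blows up on near-ties such as B vs C above (roughly 1/ε² duels when P(B > C) = ½ + ε).

```python
import math

def noisy_better(duel, a, b, delta=0.05):
    """Duel a vs b until one is confidently better (anytime confidence interval).
    For near-ties this can take a very long time."""
    wins = n = 0
    while True:
        wins += 1 if duel(a, b) else 0
        n += 1
        radius = math.sqrt(math.log(2 * n * (n + 1) / delta) / (2 * n))
        if wins / n - radius > 0.5:
            return a            # a confidently beats b
        if wins / n + radius < 0.5:
            return b            # b confidently beats a

def naive_max(bandits, duel, delta=0.05):
    """O(K) noisy comparisons overall, but each comparison may be arbitrarily slow."""
    best = bandits[0]
    for b in bandits[1:]:
        best = noisy_better(duel, best, b, delta)
    return best
```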
Interleaved Filter • Choose candidate bandit at random [Yue, Broder, Kleinberg, Joachims, COLT 2009]
Interleaved Filter • Choose candidate bandit at random • Make noisy comparisons (Bernoulli trial) against all other bandits simultaneously • Maintain mean and confidence interval for each pair of bandits being compared [Yue, Broder, Kleinberg, Joachims, COLT 2009]
Interleaved Filter • Choose candidate bandit at random • Make noisy comparisons (Bernoulli trial) against all other bandits simultaneously • Maintain mean and confidence interval for each pair of bandits being compared • …until another bandit is better • With confidence 1 – δ [Yue, Broder, Kleinberg, Joachims, COLT 2009]
Interleaved Filter • Choose candidate bandit at random • Make noisy comparisons (Bernoulli trial) against all other bandits simultaneously • Maintain mean and confidence interval for each pair of bandits being compared • …until another bandit is better • With confidence 1 – δ • Repeat process with new candidate • (Remove all empirically worse bandits) [Yue, Broder, Kleinberg, Joachims, COLT 2009]
Interleaved Filter • Choose candidate bandit at random • Make noisy comparisons (Bernoulli trial) against all other bandits simultaneously • Maintain mean and confidence interval for each pair of bandits being compared • …until another bandit is better • With confidence 1 – δ • Repeat process with new candidate • (Remove all empirically worse bandits) • Continue until 1 candidate left (see the sketch below) [Yue, Broder, Kleinberg, Joachims, COLT 2009]
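A compact sketch of the loop described across the last few bullets; the duel() oracle, the confidence radius, and the pruning rule are simplified assumptions rather than the exact COLT 2009 pseudocode (there, δ is chosen as a function of T and K, roughly 1/(TK²)).

```python
import math
import random

def interleaved_filter(bandits, duel, delta=0.05):
    """bandits: list of arms; duel(a, b) -> True if a wins the comparison.
    Returns the single surviving candidate."""
    remaining = list(bandits)
    candidate = random.choice(remaining)
    remaining.remove(candidate)

    while remaining:
        wins = {b: 0 for b in remaining}        # candidate's wins against b
        n = {b: 0 for b in remaining}
        defeated_by = None
        while defeated_by is None and remaining:
            for b in list(remaining):
                wins[b] += 1 if duel(candidate, b) else 0
                n[b] += 1
                p_hat = wins[b] / n[b]
                radius = math.sqrt(math.log(1.0 / delta) / n[b])
                if p_hat - radius > 0.5:
                    remaining.remove(b)          # candidate confidently beats b: prune b
                elif p_hat + radius < 0.5:
                    defeated_by = b              # b confidently beats the candidate
                    break
        if defeated_by is not None:
            # remove bandits that are empirically worse than the old candidate,
            # then continue with the winner as the new candidate
            remaining = [b for b in remaining
                         if b is not defeated_by and wins[b] / n[b] <= 0.5]
            candidate = defeated_by
    return candidate
```

Here duel(a, b) would be implemented by interleaving the two rankings for a live query and crediting the click winner, as on the earlier interleaving slide.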
Intuition • Simulate comparing candidate with all remaining bandits simultaneously [Yue, Broder, Kleinberg, Joachims, COLT 2009]
Intuition • Simulate comparing candidate with all remaining bandits simultaneously • Example: • P(A > B) = 0.85 • P(A > C) = 0.85 • P(B > C) = 0.51 • B is candidate • # comparisons between B vs C bounded by comparisons between B vs A! [Yue, Broder, Kleinberg, Joachims, COLT 2009]
Regret Analysis • Can model the sequence of candidate bandits as a random walk • Which bandit will be the next candidate? • O(log K) rounds • [diagram: random walk over candidates with transition probabilities 1/3, 1/3, 1/3] [Yue, Broder, Kleinberg, Joachims, COLT 2009]
Regret Analysis • After each round, we remove a constant fraction of the remaining bandits • O(K) total matches • [diagram: matches per round, e.g., 4 matches, then 2 matches, then 0 matches] [Yue, Broder, Kleinberg, Joachims, COLT 2009]
Regret Analysis • T – time horizon • K – # bandits / retrieval functions • ε – best vs 2nd best • Average regret RT / T → 0 • Information-theoretically optimal • Also need to prove correctness (see paper) [Yue, Broder, Kleinberg, Joachims, COLT 2009]
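The bound itself appeared as a formula on the original slide; as I recall the COLT 2009 result (constants omitted), it has the following form, and the matching lower bound is what justifies “information-theoretically optimal”:

```latex
\[
  \mathbb{E}[R_T] \;=\; O\!\left(\frac{K}{\epsilon}\,\log T\right),
  \qquad
  \text{with a matching } \Omega\!\left(\frac{K}{\epsilon}\,\log T\right) \text{ lower bound},
  \qquad\text{so } \frac{R_T}{T} \to 0 .
\]
```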
Summary • Provably efficient online algorithm • (In a regret sense) • Also results for continuous (convex) setting
Summary • Provably efficient online algorithm • (In a regret sense) • Also results for continuous (convex) setting • Requires comparison oracle • Reflects user preferences • Independence / Unbiased • Strong transitivity • Triangle Inequality
Directions to Explore • Relaxing assumptions • E.g., strong transitivity & triangle inequality • Integrating context • Dealing with large K • Assume additional structure on the retrieval functions? • Other cost models • PAC setting (fixed budget, find the best possible) • Dynamic or changing user interests / environment
Improving Comparison Oracle • Dueling Bandits Problem • Interactive learning framework • Provably minimizes regret • Assumes idealized comparison oracle • Can we improve the comparison oracle? • Can we improve how we interpret results?
Determining Statistical Significance • For each query q, interleave A(q) and B(q) and log clicks • t-Test • For each q, score: % of clicks on A(q) • E.g., 3/4 = 0.75 • Sample mean score (e.g., 0.6) • Compute confidence (p value) • E.g., want p = 0.05 (i.e., 95% confidence) • More data → more confidence
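A minimal sketch of this test in Python, assuming one score per query session (fraction of clicks on A's results) and SciPy's one-sample t-test against the “no preference” value 0.5; the scores below are toy numbers, not data from the talk.

```python
import numpy as np
from scipy import stats

# One score per query session: fraction of that session's clicks that landed
# on results contributed by A (e.g., 3 of 4 clicks on A -> 0.75).
scores = np.array([0.75, 0.5, 1.0, 0.25, 0.6, 0.8])      # toy data

mean = scores.mean()                                      # sample mean score
t_stat, p_value = stats.ttest_1samp(scores, 0.5)          # null: no preference
print(f"mean = {mean:.2f}, t = {t_stat:.2f}, p = {p_value:.3f}")
# A small p-value (e.g., below 0.05) supports concluding A is better with ~95%
# confidence; more query sessions shrink the standard error and hence the p-value.
```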
Limitation • Example: query session with 2 clicks • One click at rank 1 (from A) • Later click at rank 4 (from B) • Normally would count this query session as a tie
Limitation • Example: query session with 2 clicks • One click at rank 1 (from A) • Later click at rank 4 (from B) • Normally would count this query session as a tie • But second click is probably more informative… • …so B should get more credit for this query
Linear Model • Feature vector φ(q,c) describes each click c in query session q (e.g., whether it is the last click, its rank) • Weight of click is wᵀφ(q,c) [Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]
Example • wᵀφ(q,c) differentiates last clicks and other clicks [Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]
Example • wᵀφ(q,c) differentiates last clicks and other clicks • Interleave A vs B • 3 clicks per session • Last click 60% on result from A • Other 2 clicks random [Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]
Example • wᵀφ(q,c) differentiates last clicks and other clicks • Interleave A vs B • 3 clicks per session • Last click 60% on result from A • Other 2 clicks random • Conventional w = (1,1) – has significant variance • Only count last click w = (1,0) – minimizes variance [Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]
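A small simulation of the example above, under the stated session model (three clicks; the last click goes to A with probability 0.6, the other two are random). It illustrates why down-weighting the uninformative clicks sharpens the statistic; the model itself is only an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sessions = 10_000

last = rng.random(n_sessions) < 0.6                # True -> last click on A (p = 0.6)
others = rng.random((n_sessions, 2)) < 0.5         # True -> other click on A (p = 0.5)

def session_score(w_last, w_other):
    # +1 per click on A, -1 per click on B, weighted by click type.
    s = w_last * np.where(last, 1, -1)
    s = s + w_other * np.where(others, 1, -1).sum(axis=1)
    return s

for name, (wl, wo) in {"w = (1,1), all clicks": (1, 1),
                       "w = (1,0), last click only": (1, 0)}.items():
    s = session_score(wl, wo)
    z = s.mean() / (s.std(ddof=1) / np.sqrt(n_sessions))
    print(f"{name}: mean = {s.mean():.3f}, std = {s.std(ddof=1):.3f}, z = {z:.1f}")
# Both weightings have the same expected score (0.2, favoring A), but the
# last-click-only statistic has a much smaller variance and a larger z-score,
# i.e., it reaches significance with far fewer query sessions.
```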
Scoring Query Sessions • Feature representation for query session: [Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]
Scoring Query Sessions • Feature representation for query session: • Weighted score for query: • Positive score favors A, negative favors B [Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]
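The two formulas on this slide were images; the following reconstruction assumes (as I understand the SIGIR 2010 paper) that clicks on A's results contribute positively and clicks on B's results negatively:

```latex
\[
  \phi(q) \;=\; \sum_{c \,\in\, \text{clicks on A's results}} \phi(q,c)
           \;-\; \sum_{c \,\in\, \text{clicks on B's results}} \phi(q,c),
  \qquad
  \text{score}(q) \;=\; w^{\top}\phi(q).
\]
```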
Supervised Learning • Will optimize the weights w for the z-Test: the “Inverse z-Test” • Approximately equal to the t-Test for large samples • z-score = mean / standard deviation (assumes A > B) [Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]
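A sketch of how the weights could be learned by maximizing the z-score of the per-session scores. The closed-form maximizer of mean/std over w is proportional to the inverse covariance times the mean vector (which I take to be the “inverse” in the name); the ridge term and the normalization are stabilizing assumptions, and this is my reading of the approach rather than the authors' code.

```python
import numpy as np

def learn_weights(Phi, ridge=1e-3):
    """Phi: (n_sessions x n_features) matrix of session vectors phi(q), oriented so
    that positive scores favor A (known to be better on the training pairs).
    Maximizing z(w) = (w^T mu) / sqrt(w^T Sigma w) gives w proportional to Sigma^{-1} mu."""
    mu = Phi.mean(axis=0)
    Sigma = np.cov(Phi, rowvar=False) + ridge * np.eye(Phi.shape[1])
    w = np.linalg.solve(Sigma, mu)
    return w / np.linalg.norm(w)        # scale does not affect the z-score

def z_score(Phi, w):
    s = Phi @ w                          # per-session weighted scores
    return s.mean() / (s.std(ddof=1) / np.sqrt(len(s)))
```

The weights would be fit on the training pool (where the better ranker is known) and then evaluated on the held-out interleaving pairs.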
Experiment Setup • Data collection • Pool of retrieval functions • Hash users into partitions • Run interleaving of different pairs in parallel • Collected on arXiv.org • 2 pools of retrieval functions • Training pool (6 pairs): pairs where we know A > B • New pool (12 pairs)
Experimental Results • Inverse z-Test works well • Beats the baseline on most of the new interleaving pairs • Directions of the tests are all in agreement • In 6/12 pairs, reduces the sample size needed for p = 0.1 by 10% • In 4/12 pairs, achieves p = 0.05 where the baseline does not • 400 to 650 queries per interleaving experiment • Weights are hard to interpret (features are correlated) • Largest weight: “1 if single click & rank > 1”
Interactive Learning • Dueling Bandits Problem • System learns “on-the-fly”. • Maximize total user utility over time • Exploration / exploitation tradeoff
Interactive Learning • Dueling Bandits Problem • System learns “on-the-fly”. • Maximize total user utility over time • Exploration / exploitation tradeoff • Interpreting Implicit Feedback • Supervised learning to learn better interpretation • How do we “close the loop”? • Simple yet practical model • Efficient & compatible with existing approaches
Thank You! Slides, papers & software available at www.yisongyue.com