Fast Top-k Retrieval for Model Based Recommendation Deepak Agarwal (Yahoo! Research) Maxim Gurevich (Google) Presented by Guang LING
Outline • Motivation • Problem definition • The approach • Binary classification • L2-regression of scores • Experiments • Conclusion
Motivation • Suppose that we • Are a search engine company (Google, say) • Want to display ads given a query • Have an ML score for each ad given a query • Given a query • How do we select the top-k ads to display • In a very short amount of time
Motivation • [Figure: a request with a user profile is matched against content items: pages, news, ads] • Challenges: • Many users/requests • Many content items • Strict latency constraints • Increasingly complex matching logic
Traditional IR solutions • Exploit a content-overlap matching function (tf-idf/cosine similarity) • Queries and documents “live” in the same high-dimensional space • Allows the query result space to be reduced effectively • Highly optimized inverted index architecture • Joins the inverted lists of the query terms • Returns a shortlist of result candidates • Only a few candidates undergo complex re-ranking
Inverted index architecture • Bag of words representation of documents • Now given a query “canon camera”
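To make the inverted index slide concrete, here is a minimal sketch of a bag-of-words inverted index answering the query “canon camera”. The toy corpus, the function names, and the choice of a conjunctive (AND) join of the posting lists are assumptions for illustration; production indices use compressed, docId-sorted posting lists.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def query_and(index, query):
    """Join the posting lists of all query terms (conjunctive match)."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

# Hypothetical toy corpus, not taken from the slides.
docs = {
    1: "canon digital camera",
    2: "nikon camera lens",
    3: "canon printer ink",
}
index = build_inverted_index(docs)
candidates = query_and(index, "canon camera")
print(candidates)
```

Only document 1 contains both terms, so it is the single candidate passed on to re-ranking.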
Index based pre-filtering • [Figure: two pipelines. Without an index (expensive): the ML model scores every pair (query, ad1) … (query, adn) and the top-2 ads are selected. With an inverted index: the query retrieves K candidate ads with approximate scores score', the ML model scores only those candidates, and the top-2 ads are selected.]
Problem definition • Terminology: queries and documents • scr(d,q) – the (black-box) ML score of d on q • Goal: given q, find k items from D with highest scr(d,q) • Reduce to an inverted index query • Leverage extensive work on efficient inverted indexing • Challenges • How to construct the index • How to query it
Prior work • Learning to rank • A different problem: second-stage reranking of the few documents retrieved by the first stage • We are building the first stage given the second stage • S. Goel, J. Langford, and A. Strehl. Predictive indexing for fast search [NIPS08] • A heuristic for building the index given an ML function and a query log • Fast and simple index building and retrieval • Not the standard dot product scoring • Does not support the standard docId-sorted indices, so it is harder to integrate into existing systems • Lower accuracy
The approach • Let ascr(d,q) be an approximate score from a class of functions amenable to indexing: the vector dot product, ascr(d,q) = q'd • q is the original (sparse) query vector • d is a document vector that is not directly known • For each document: find d such that q'd ≈ scr(d,q) • Index the vectors d • Given q, query the index and retrieve the top-K candidates according to ascr • Compute the true ML scores of the candidates and return the top-k
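The two-stage flow on this slide can be sketched as follows: stage 1 accumulates the dot product q'd through an inverted index over the learned document vectors and keeps the top-K candidates; stage 2 applies the true (black-box) score only to those candidates. The document vectors, the `true_scores` table standing in for the ML model, and all function names are hypothetical.

```python
import heapq

def build_index(doc_vectors):
    """Inverted index: term -> list of (doc_id, weight) for nonzero weights."""
    index = {}
    for doc_id, d in doc_vectors.items():
        for term, weight in d.items():
            index.setdefault(term, []).append((doc_id, weight))
    return index

def retrieve_top_k(q, index, scr, K, k):
    """Return the k best documents among the K best by approximate score."""
    # Stage 1: accumulate ascr(d, q) = q'd, touching only the posting
    # lists of the query terms.
    approx = {}
    for term, q_weight in q.items():
        for doc_id, d_weight in index.get(term, []):
            approx[doc_id] = approx.get(doc_id, 0.0) + q_weight * d_weight
    candidates = heapq.nlargest(K, approx, key=approx.get)
    # Stage 2: run the expensive true score on the K candidates only.
    return heapq.nlargest(k, candidates, key=lambda doc_id: scr(doc_id, q))

# Hypothetical learned document vectors (sparse: term -> weight).
doc_vectors = {
    "a": {"canon": 0.9, "camera": 0.8},
    "b": {"canon": 0.2},
    "c": {"camera": 0.7, "lens": 0.5},
    "d": {"printer": 1.0},
}
# Hypothetical stand-in for the black-box ML model.
true_scores = {"a": 0.6, "b": 0.1, "c": 0.9, "d": 0.0}
scr = lambda doc_id, q: true_scores[doc_id]

index = build_index(doc_vectors)
query = {"canon": 1.0, "camera": 1.0}
result = retrieve_top_k(query, index, scr, K=2, k=1)
print(result)
```

In this toy run the approximate scores shortlist "a" and "c", and the true score then prefers "c", so the expensive model is evaluated twice instead of four times.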
Constructing the index: an optimization problem • Objective: find D={d1, d2,…, dn} minimizing the score loss on a representative query load Q • Sparsification • The vectors d are high-dimensional • Dense vectors d would result in a prohibitive index size • Add an index size constraint: bound the total number of nonzero entries (the L0 norm) of the vectors d
Relaxing the problem • We do not know how to optimize the original problem directly • Relax the L0 index size constraint to L1 • Relax the objective function to one of: • Binary classification of being in the top-k • L2 regression of ML scores
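The slides' own formulas did not survive in this transcript, so the following is only one plausible way to write the constrained problem and its L1 relaxation. The loss symbol \ell, the size budget B, and the regularization weight \lambda are assumed notation, not taken from the original.

```latex
% Original problem: a loss on approximate vs. true scores under an L0 index size budget B
\min_{d_1,\dots,d_n} \; \sum_{q \in Q} \sum_{i=1}^{n}
    \ell\bigl(q^{\top} d_i,\ \mathrm{scr}(d_i, q)\bigr)
\quad \text{s.t.} \quad \sum_{i=1}^{n} \lVert d_i \rVert_0 \le B

% L1 relaxation in penalized form: the sum separates into one problem per document
\min_{d_i} \; \sum_{q \in Q} \ell\bigl(q^{\top} d_i,\ \mathrm{scr}(d_i, q)\bigr)
    + \lambda \lVert d_i \rVert_1, \qquad i = 1, \dots, n
```

Choosing \ell as a classification loss on the top-k indicator or as the squared error on the scores gives the two relaxed objectives listed above.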
Binary classification • For each document d • Learn a vector d that predicts whether d is among the top-k on q ∈ Q • Predict by a simple thresholding operator applied to q'd • Let y(q,d) be an indicator in {-1, +1} of whether d is among the top-k on q • Efficiently solvable [LIBLINEAR]
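As a rough illustration of the per-document classifier, the sketch below fits an L1-regularized logistic model with proximal gradient descent in plain Python. This is a stand-in, not the LIBLINEAR solver named on the slide; the logistic loss, the threshold of 0, the hyperparameters, and the toy training set are all assumptions.

```python
import math

def train_topk_classifier(queries, labels, n_terms, lam=0.01, lr=0.5, n_iters=200):
    """Fit d so that sign(q'd) predicts the top-k indicator y in {-1, +1}.

    Minimizes mean logistic loss + lam * ||d||_1 by proximal gradient steps.
    queries: list of sparse query vectors (dict term_index -> weight).
    """
    d = [0.0] * n_terms
    n = len(queries)
    for _ in range(n_iters):
        grad = [0.0] * n_terms
        for q, y in zip(queries, labels):
            margin = y * sum(w * d[t] for t, w in q.items())
            # Derivative of log(1 + exp(-margin)) with respect to q'd.
            coeff = -y / (1.0 + math.exp(margin))
            for t, w in q.items():
                grad[t] += coeff * w / n
        for t in range(n_terms):
            step = d[t] - lr * grad[t]
            # Soft-thresholding keeps the vector sparse (L1 proximal step).
            d[t] = math.copysign(max(abs(step) - lr * lam, 0.0), step)
    return d

def predict(q, d, threshold=0.0):
    """Thresholding operator on the dot product q'd."""
    return 1 if sum(w * d[t] for t, w in q.items()) > threshold else -1

# Hypothetical training set for one document: term 0 signals "in top-k",
# term 1 signals "not in top-k", term 2 is uninformative.
queries = [{0: 1.0}, {0: 1.0, 2: 1.0}, {1: 1.0}, {1: 1.0, 2: 1.0}]
labels = [1, 1, -1, -1]
d = train_topk_classifier(queries, labels, n_terms=3)
predictions = [predict(q, d) for q in queries]
print(d, predictions)
```

The uninformative term receives no weight, which is the sparsifying effect the L1 relaxation is meant to deliver.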
L2 regression of scores • For all pairs (q,d): minimize the squared discrepancy between the true score scr(d,q) and the approximate score q'd • Again, decomposable by documents • Efficiently solvable by a coordinate descent algorithm
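A minimal sketch of coordinate descent for one document's regression problem, assuming the L1-penalized form 0.5 * sum_q (q'd - scr(d,q))^2 + lam * ||d||_1. The penalty weight, the function names, and the toy data are hypothetical; the slides do not specify the exact algorithm variant.

```python
def soft_threshold(x, lam):
    """Shrink x toward zero by lam (closed-form L1 proximal operator)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def fit_document_vector(queries, scores, n_terms, lam, n_iters=100):
    """Coordinate descent for 0.5 * sum_q (q'd - scr)^2 + lam * ||d||_1.

    queries: list of sparse query vectors (dict term_index -> weight).
    scores:  true ML scores scr(d, q) of this document, one per query.
    """
    d = [0.0] * n_terms
    residual = list(scores)  # residual[q] = scr - q'd, and d starts at zero
    # Column view: for each term, the queries that contain it.
    columns = [[] for _ in range(n_terms)]
    for qi, q in enumerate(queries):
        for t, w in q.items():
            columns[t].append((qi, w))
    for _ in range(n_iters):
        for t, col in enumerate(columns):
            norm_sq = sum(w * w for _, w in col)
            if norm_sq == 0.0:
                continue  # term never occurs in the query load
            # Correlation of term t with the residual that excludes term t.
            rho = sum(w * (residual[qi] + w * d[t]) for qi, w in col)
            new_value = soft_threshold(rho, lam) / norm_sq
            delta = new_value - d[t]
            if delta != 0.0:
                for qi, w in col:
                    residual[qi] -= w * delta
                d[t] = new_value
    return d

# Hypothetical query load and true scores for a single document.
queries = [{0: 1.0}, {1: 1.0}, {0: 1.0, 1: 1.0}]
scores = [2.0, 0.0, 2.0]
d = fit_document_vector(queries, scores, n_terms=3, lam=0.1)
print(d)
```

Because each document is fitted independently, the same routine can be run in parallel across documents or applied to a newly added document without touching the others.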
Practical issues • Vectors d contain negative values • Less efficient retrieval • Independent solution for each document • Easy to parallelize • Easy to add new documents
Experiments • Experiment setup • Synthetic model – simple • 10K documents, 10K terms (words), 12K queries • For each term, generate a random permutation of the 10K documents and assign weight (1-1/100)^i to the term for the document at position i • Queries consist of 3 terms drawn from a power-law distribution • The final score is the sum of the individual term scores
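The simple synthetic model can be sketched as below, scaled down so it runs quickly. Several details are assumptions because the slide does not state them: positions are counted from 0, the power-law exponent is 1, query terms are distinct, and all names are hypothetical.

```python
import random

def make_simple_model(n_docs, n_terms, decay=1 - 1 / 100, seed=0):
    """weights[t][doc] = decay ** (position of doc in term t's permutation)."""
    rng = random.Random(seed)
    weights = []
    for _ in range(n_terms):
        perm = list(range(n_docs))
        rng.shuffle(perm)
        term_weights = [0.0] * n_docs
        for i, doc in enumerate(perm):
            term_weights[doc] = decay ** i
        weights.append(term_weights)
    return weights

def sample_query(n_terms, length, rng, alpha=1.0):
    """Draw `length` distinct terms with P(term r) proportional to (r+1)^-alpha."""
    probs = [1.0 / (r + 1) ** alpha for r in range(n_terms)]
    terms = set()
    while len(terms) < length:
        terms.add(rng.choices(range(n_terms), weights=probs, k=1)[0])
    return sorted(terms)

def scr(doc, query, weights):
    """True score: sum of the per-term weights of the document."""
    return sum(weights[t][doc] for t in query)

# Scaled-down sizes (the slides use 10K documents and 10K terms).
weights = make_simple_model(n_docs=100, n_terms=50)
query = sample_query(n_terms=50, length=3, rng=random.Random(1))
best_doc = max(range(100), key=lambda doc: scr(doc, query, weights))
print(query, best_doc, scr(best_doc, query, weights))
```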
Experiments • Experiment setup • Synthetic model – complex • 10K documents, 10K terms (words), 12K queries • For each term, generate a random permutation of the 10K documents and assign weight (1-1/100)^i to the term for the document at position i • Queries consist of 5 terms drawn from a power-law distribution • In addition, each pair and each triplet of terms is associated with a random permutation of documents and the scores it induces • The final score is the sum of the individual scores for each term, pair, and triplet
Experiments • Experiment setup • CTR model • Computational advertising dataset • Logistic regression model • 50K documents (ads), sampled 50K queries • Trained on a day’s live traffic
Experiments • Datasets • Two synthetic models: simple and complex • |D|=10K, |Q|=10K, 2K test queries • CTR model • |D|=50K, |Q|=50K, 50K test queries from a following day • Baselines • Random: k random documents • Static: fixed set of k documents with highest average scores • Predictive: Predictive indexing [Goel et al.]
Evaluation metrics • Recall: exact retrieval of true top-k • Overly conservative • Score loss: average loss in the score of retrieved docs • Captures application specific utility, e.g., CTR
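The two metrics can be sketched as follows. The exact score-loss definition is not preserved in the transcript, so the version here (mean true score of the true top-k minus mean true score of the retrieved set) is an assumption, as are the function names and the toy scores.

```python
def recall_at_k(retrieved, true_top_k):
    """Fraction of the true top-k documents that were retrieved."""
    return len(set(retrieved) & set(true_top_k)) / len(true_top_k)

def score_loss(retrieved, true_top_k, true_score):
    """Average true score given up by serving `retrieved` instead of the true top-k."""
    best = sum(true_score[d] for d in true_top_k) / len(true_top_k)
    got = sum(true_score[d] for d in retrieved) / len(retrieved)
    return best - got

# Hypothetical true ML scores for four documents on one query.
true_score = {"a": 0.90, "b": 0.80, "c": 0.79, "d": 0.10}
true_top_2 = ["a", "b"]

near_miss = ["a", "c"]   # swaps "b" for an almost equally good document
bad_miss = ["a", "d"]    # swaps "b" for a poor document

print(recall_at_k(near_miss, true_top_2), score_loss(near_miss, true_top_2, true_score))
print(recall_at_k(bad_miss, true_top_2), score_loss(bad_miss, true_top_2, true_score))
```

Both retrieved sets have the same recall of 0.5, yet their score losses differ by a wide margin, which illustrates why recall alone is described as overly conservative.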
Retrieval latency: CTR model • Disclaimer: prototype implementation • Brute-force (scoring all 50K ads): 4s per impression • Scoring top-100 candidates: 9ms • Top-100 retrieval • Baselines: ~0 (negligible) • Our approach: ~15ms
Index construction • ~1min per document (prototype implementation) • Trivially parallelizable • Easy to add new documents
Conclusions • A practical method for indexing black-box ML models • Integrates with existing indexing systems • Scales well to large itemsets • Tunable space-speed-accuracy tradeoff