Large-scale Recommendations in a Dynamic Marketplace
Jay Katukuri, Rajyashree Mukherjee, Tolga Konik, Chu-Cheng Hsieh
LSRS 2013
Meet John Doe
John is interested in an item: “iPhone 5 64gb white”. Should we recommend
• “iPhone 5 case”, or
• “iPhone 5s gold”?
Recommendation on e-marketplace
• Recommendation “before” purchase: Similar Item Recommendation (SIR)
  • iPhone 5S gold
• Recommendation “after” purchase: Related Item Recommendation (RIR)
  • iPhone 5 case
SIR Example 1
SIR Example 2
Related Item Recommendation
Recommendations for Xbox 360 4GB on the checkout page
Main Idea
• Similar Item Clustering (SIC), based on
  • Titles
  • Attributes (price, etc.)
  • Images
• Recommendation
  • SIR: items from the same cluster
  • RIR: items from neighbor clusters
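The main idea can be sketched as two lookups over a cluster model. This is a minimal illustration, not the deployed system: the dictionaries `item_to_cluster`, `cluster_items`, and `related_clusters`, the function names, and the sample listings are all hypothetical.

```python
from typing import Dict, List

# Hypothetical toy model; every name and value here is illustrative only.
item_to_cluster: Dict[str, str] = {
    "iphone 5 64gb white": "iphone 5",
    "iphone 5 32gb black": "iphone 5",
    "iphone 5 case red": "iphone 5 case",
}
cluster_items: Dict[str, List[str]] = {
    "iphone 5": ["iphone 5 64gb white", "iphone 5 32gb black"],
    "iphone 5 case": ["iphone 5 case red"],
}
related_clusters: Dict[str, List[str]] = {"iphone 5": ["iphone 5 case"]}


def similar_to(item: str) -> List[str]:
    """SIR: other items from the seed item's own cluster."""
    cluster = item_to_cluster.get(item)
    if cluster is None:
        return []
    return [i for i in cluster_items.get(cluster, []) if i != item]


def related_to(item: str) -> List[str]:
    """RIR: items from clusters linked to the seed item's cluster."""
    cluster = item_to_cluster.get(item)
    if cluster is None:
        return []
    results: List[str] = []
    for neighbor in related_clusters.get(cluster, []):
        results.extend(cluster_items.get(neighbor, []))
    return results
```

Keeping both lookups keyed on the cluster rather than on the item means a newly listed item only needs a cluster assignment to become recommendable.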
Models
• Item clusters: each cluster is represented by meaningful keywords
  • “clarks women shoe pumps classics”
  • “authentic handmade amish quilt”
• Cluster-cluster relations
  • “samsung galaxy s4” – “samsung galaxy s4 screen protector”
  • “wolfgang puck electric pressure cooker” – “kitchenaid food processor”
System Architecture - Overview
[Architecture diagram with three parts]
• Offline Model Generation: Clusters Model Generation, Related Clusters Model Generation
• The Data Store: Inventory, Clickstream, Transactions, Conceptual Knowledgebase, Clusters, Cluster-Cluster Relations
• Real-time Performance System
  • Similar Items Recommender (SIR): ?similarTo(item), Lost Item → Similar Items
  • Related Items Recommender (RIR): ?relatedTo(item), Bought Item → Related Items
Cluster Generation (offline)
Data on eBay
• Item-item co-occurrences in transaction logs
• Large data
  • Much bigger data set, in both users and inventory, than other e-commerce sites
• Scale
  • More than 300M listings
  • More than 10M new items every day
Challenges
• Global clustering is not feasible
• Size bias across categories
• Performance
Model Generation - Clusters
• Select a few keywords to represent “big notions”, e.g. iPhone, Handbags, etc.
  • How to select?
• Clustering by K-means
  • How to set K?
Model Generation - Clusters
• Problem: global clustering is not feasible
• Solution: partition the input data by user queries
  • Parallel distributed K-Means in Hadoop MapReduce
  • Dedupe and merge overlapping clusters (100X reduction in size over inventory, with over 90% coverage)
[Diagram: Clusters Model Generation. Components: Query-Recall Generation, Cluster Generation. Data Store: Inventory, Conceptual Knowledgebase, Clickstream, Clusters. Edge labels: concepts, categories, user queries, items, query-to-items, new clusters.]
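The partition-then-cluster idea can be sketched in a few lines. This is a single-process illustration under stated assumptions, not the Hadoop MapReduce job from the slide: items are assumed to be already converted to numeric feature vectors, the first k points are used as seeds only to keep the example deterministic, and `kmeans` and `cluster_by_query` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> Vector:
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def kmeans(points: List[Vector], k: int, iterations: int = 10) -> List[int]:
    """Plain Lloyd's K-means; returns a cluster index for each point."""
    k = min(k, len(points))
    centroids = list(points[:k])  # deterministic seeding, for illustration only
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


def cluster_by_query(
    query_to_items: Dict[str, List[Vector]], k: int
) -> Dict[str, List[int]]:
    """Run K-means separately inside each query's recall set."""
    return {q: kmeans(items, k) for q, items in query_to_items.items() if items}
```

Because each query's recall set is clustered independently, the per-query calls have no shared state, which is what makes the work easy to distribute.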
Base Cluster Generation
• Base cluster ≡ query
• Find merge candidates based on query term overlap
  • E.g.: “nike air max tennis shoes” -> “nike air max”
• Score candidates using cosine similarity
  • Term weight: TF-IDF in the query space (document = query)
  • TF: query demand
  • IDF: number of queries
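A rough sketch of the candidate scoring, under explicit assumptions. The slide does not spell out how query demand enters the term weight, so this example uses a binary TF and only the IDF part, with a smoothed IDF formula chosen for illustration. The function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def idf_weights(queries: List[str]) -> Dict[str, float]:
    """IDF per term, treating each query as one document (smoothing is an assumption)."""
    n = len(queries)
    doc_freq: Dict[str, int] = {}
    for q in queries:
        for term in set(q.split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {t: math.log(1 + n / df) for t, df in doc_freq.items()}


def query_vector(query: str, idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse vector with binary TF; query demand is NOT modeled here."""
    return {t: idf.get(t, 0.0) for t in set(query.split())}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_candidates(
    query: str, queries: List[str], idf: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Queries sharing at least one term with `query`, best cosine score first."""
    terms = set(query.split())
    base = query_vector(query, idf)
    scored = [
        (other, cosine(base, query_vector(other, idf)))
        for other in queries
        if other != query and terms & set(other.split())
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A merge threshold on the cosine score would then decide which candidates are folded into the same base cluster; the slide does not give that threshold.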
Step 1: base cluster candidates
• Method for choosing the “base clusters” (initial states):
  • Minimum frequency
  • Supply threshold (enough inventory)
  • Min and max token constraint (length of queries)
  • Heuristic constraints
    • Queries that have only numbers are not allowed: “10 5”
    • …
• Merge similar clusters into one
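The selection rules above can be expressed as a simple filter. This is a sketch only: the `QueryStats` record, the function name, and every threshold value are hypothetical, since the slide names the constraints but not their settings.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    text: str
    frequency: int  # how often the query was issued
    supply: int     # number of matching listings in inventory


def is_base_cluster_candidate(
    q: QueryStats,
    min_frequency: int = 100,  # hypothetical threshold
    min_supply: int = 50,      # hypothetical threshold
    min_tokens: int = 2,       # hypothetical threshold
    max_tokens: int = 6,       # hypothetical threshold
) -> bool:
    tokens = q.text.split()
    if q.frequency < min_frequency:
        return False  # minimum frequency
    if q.supply < min_supply:
        return False  # supply threshold (enough inventory)
    if not (min_tokens <= len(tokens) <= max_tokens):
        return False  # min and max token constraint
    if all(tok.isdigit() for tok in tokens):
        return False  # heuristic: numbers-only queries such as "10 5"
    return True
```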
Candidates merge
• 4.34M base clusters merged into 1.95M
• Example:
  • phrase(hand,made) phrase(king,s) queen quilt
  • phrase(hand,made) phrase(pink,s) quilt
  • phrase(hand,made) phrase(prae,owned) queen quilt
  • phrase(hand,made) queen quilt
  • phrase(hand,made) phrase(prae,owned) quilt
  • phrase(hand,made) quilt size twin
  • phrase(hand,made) quilt silk
  • phrase(hand,made) quilt twin
  • phrase(hand,made) phrase(patch,work) quilt
  • phrase(hand,made) quilt white
  • phrase(hand,made) phrase(king,size) quilt
  • phrase(hand,made) phrase(yo,yo,s) quilt
  • phrase(hand,made) quilt sale
  • phrase(hand,made) quilt red
  • phrase(hand,made) quilt
Step 2: K-Means Clustering
[Pipeline diagram. Inputs: Query to Items Data, Transaction Logs, Inventory Logs. Stages: Base Cluster Generation, Generate Item Features, Scoring Models, K-Means Clustering of Base Clusters, Split Clusters.]
Clusters on Item Signature
Cluster:
• apple ipod touch 4g clear film protector screen
• clarks women shoe pumps classics
Recommendation (online)
Performance System
[Architecture diagram. Data Store: Inventory, Clusters, Cluster-Cluster Relations, Conceptual Knowledgebase.]
• SIR path, ?similarTo(item), Lost Item → Similar Items. Stages shown: Cluster Assignment, SIR Query Formation, Item Search, Item Selection, SIR Ranking
• RIR path, ?relatedTo(item), Bought Item → Related Items. Stages shown: Cluster Assignment, related clusters, RIR Query Formation, Item Search, Item Selection, RIR Ranking
Items in the same cluster
Similar Item Recommendations
Experimental Results
• A/B tests comparing against legacy systems
  • SIR legacy system
    • Completely online
    • Naïve approach of using the seed item title as a search query
  • RIR legacy system
    • Chen, Y. and J.F. Canny, Recommending ephemeral items at web scale, ACM SIGIR 2011
    • Collaborative filtering on stable representations of items
• Significant improvements at the 90% confidence level
  • SIR resulted in 38.18% higher user engagement (CTR)
  • RIR resulted in 10.5% higher CTR
• Statistically significant improvement in site-wide business metrics from both SIR & RIR
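For readers unfamiliar with how such numbers are usually derived, here is a small sketch of a relative CTR lift and a significance check. The slides do not say which statistical test was used; the pooled two-proportion z-test below is an assumption, and the function names are hypothetical.

```python
import math


def relative_lift(ctr_control: float, ctr_treatment: float) -> float:
    """Relative change in CTR, e.g. 0.3818 means 38.18% higher."""
    return (ctr_treatment - ctr_control) / ctr_control


def two_proportion_z(
    clicks_a: int, views_a: int, clicks_b: int, views_b: int
) -> float:
    """z statistic for the CTR difference (B minus A) with a pooled variance."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    return (p_b - p_a) / se


def is_significant(z: float, confidence: float = 0.90) -> bool:
    """Two-sided test against the standard normal distribution."""
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_value < 1 - confidence
```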
Conclusion
• Balance between similarity and quality is crucial in driving user engagement and conversion
• Clusters of similar items in the inventory
  • Local clustering in the coverage set of user queries
• Offline models built using MapReduce
  • Huge input datasets including inventory, clickstream and transactional data
• Efficient real-time performance system
  • Currently deployed on ebay.com
Acknowledgments
• Current & past team members
  • Kranthi Chalasani
  • Santanu Kolay
  • Riyaaz Shaik
  • Venkat Sundaranatha
Chu-Cheng Hsieh
chsieh@ebay.com
We’re hiring