Linear Submodular Bandits and their Application to Diversified Retrieval
Yisong Yue (CMU) & Carlos Guestrin (CMU)

Optimizing Recommender Systems
• Every day, users come to a news portal.
• For each user:
  • the portal recommends L articles intended to cover the user's interests;
  • the user provides feedback (clicks, ratings, "likes");
  • the system integrates that feedback for future use.

Challenge 1: Making Diversified Recommendations
• The system should recommend optimally diversified sets of articles. Compare a redundant set:
  • "Israel implements unilateral Gaza cease-fire :: WRAL.com"
  • "Israel unilaterally halts fire, rockets persist"
  • "Gaza truce, Israeli pullout begin | Latest News"
  • "Hamas announces ceasefire after Israel declares truce - …"
  • "Hamas fighters seek to restore order in Gaza Strip - World - Wire …"
• with a diversified set:
  • "Israel implements unilateral Gaza cease-fire :: WRAL.com"
  • "Obama vows to fight for middle class"
  • "Citigroup plans to cut 4500 jobs"
  • "Google Android market tops 10 billion downloads"
  • "UC astronomers discover two largest black holes ever found"
• Goal: recommend a set of articles that optimally covers the topics that interest the user.

Challenge 2: Personalization
• Different users have different interests.
• The system can only learn a user's interests by recommending articles and receiving feedback: the exploration-versus-exploitation dilemma.
• We model this as a bandit problem, addressing two challenges: diversified recommendations, and exploration for personalization.

Modeling Diversity via Submodular Utility Functions
• We assume a set of D concepts or topics; users are modeled by how interested they are in each topic.
• Let F_i(A) denote how well a set of articles A covers topic i (a "topic coverage function").
• We model user utility as F(A|w) = w^T [F_1(A), …, F_D(A)].
• Each topic coverage function F_i(A) is monotone submodular. A function F is submodular if F(A ∪ {a}) − F(A) ≥ F(B ∪ {a}) − F(B) whenever A ⊆ B, i.e., the benefit of recommending a second (redundant) article is smaller than the benefit of adding the first.

Example: Probabilistic Coverage
• Each article a has probability P(i|a) of covering topic i.
• Define the topic coverage function for a set A as F_i(A) = 1 − ∏_{a∈A} (1 − P(i|a)).
• It is straightforward to show that each F_i is monotone submodular [El-Arini et al., '09]. (A code sketch of this model appears at the end of this section.)

Properties of Submodular Functions
• Sums of submodular functions are submodular, so F(A|w) is submodular.
• Exact inference is NP-hard, but the greedy algorithm yields a (1 − 1/e) approximation bound.
• Incremental gains are locally linear: the gain of adding article a to set A is w^T Δ(a|A), where Δ(a|A) = [F_1(A ∪ {a}) − F_1(A), …, F_D(A ∪ {a}) − F_D(A)].
• Both properties are exploited by our online learning algorithm (see the greedy sketch at the end of this section).

Linear Submodular Bandits Problem
• At each iteration t:
  • a set of articles A_t is available, each article represented using D submodular basis functions;
  • the algorithm selects a set of L articles from A_t, recommends it to the user, and receives feedback.
• Assumption: Pr(like | a, A) = w^T Δ(a|A) (conditional submodular independence).
• Regret: (1 − 1/e)·OPT minus the sum of rewards collected.

LSBGreedy
• Maintain a mean estimate and a confidence interval of the user's interest in each topic.
• Greedily recommend the articles with the highest upper-confidence utility (mean estimate by topic + uncertainty of estimate); in the illustrated example, LSBGreedy chooses the article about the economy. (A code sketch appears at the end of this section.)
• Theorem: with probability 1 − δ, the average regret shrinks to zero as the number of rounds grows.

News Recommender User Study
• 10 days, 10 articles per day. Compared against:
  • Multiplicative Weighting (MW), which performs no exploration [El-Arini et al., '09];
  • Ranked Bandits + LinUCB, a reduction approach that does not directly model diversity [Radlinski et al., '08; Li et al., '10].
• Comparing the learned weights for two sessions (LSBGreedy vs. MW):
  • in the 1st session, MW overfits to the "world" topic;
  • in the 2nd session, the user liked few articles, and MW did not learn anything.
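Code Sketches (illustrative)
To make the probabilistic coverage model concrete, here is a minimal sketch in Python/NumPy. Representing P(i|a) as a matrix, and the names topic_coverage and utility, are assumptions made for illustration, not from the original poster.

```python
import numpy as np

def topic_coverage(A, P):
    """Probabilistic coverage: F_i(A) = 1 - prod_{a in A} (1 - P(i|a)).

    A is a list of article indices; P is a (num_articles x D) matrix with
    P[a, i] = probability that article a covers topic i. Returns the
    length-D vector [F_1(A), ..., F_D(A)].
    """
    if len(A) == 0:
        return np.zeros(P.shape[1])
    # Topic i stays uncovered only if every article in A fails to cover it.
    return 1.0 - np.prod(1.0 - P[A, :], axis=0)

def utility(A, P, w):
    """Linear submodular utility F(A|w) = w^T [F_1(A), ..., F_D(A)]."""
    return w @ topic_coverage(A, P)

# Tiny example (hypothetical numbers): 3 articles, D = 2 topics.
P = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
w = np.array([0.5, 0.5])
print(utility([0, 1], P, w))  # redundant pair: both mostly cover topic 0
print(utility([0, 2], P, w))  # diversified pair: noticeably higher utility
```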
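The (1 − 1/e) greedy selection then needs only the marginal gains w^T Δ(a|A). A minimal sketch, reusing topic_coverage and numpy from the previous block; greedy_select and its argument names are illustrative.

```python
def greedy_select(candidates, P, w, L):
    """Greedily build a set of L articles, at each step adding the article
    with the largest marginal gain w^T Delta(a|A). By monotone submodularity
    this achieves a (1 - 1/e) approximation to the best size-L set.
    """
    A = []
    for _ in range(L):
        covered = topic_coverage(A, P)
        best, best_gain = None, -np.inf
        for a in candidates:
            if a in A:
                continue
            # Locally linear incremental gain Delta(a|A), a vector in R^D.
            delta = topic_coverage(A + [a], P) - covered
            gain = w @ delta
            if gain > best_gain:
                best, best_gain = a, gain
        A.append(best)
    return A
```

Because each step only needs the D-dimensional incremental gain Δ(a|A), the cost per step is linear in the number of candidate articles.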
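Finally, a sketch of one round of LSBGreedy under LinUCB-style ridge estimates. The regularizer lam, the exploration weight alpha, and the exact confidence term are assumptions chosen to match the poster's "mean estimate + uncertainty" description, not a verbatim transcription of the paper's algorithm.

```python
class LSBGreedy:
    """Bandit version of greedy selection: maintain a mean estimate of the
    user's topic-interest vector w plus a confidence ellipsoid, and pick
    articles by upper-confidence utility."""

    def __init__(self, D, lam=1.0, alpha=1.0):
        self.M = lam * np.eye(D)   # regularized scatter matrix of observed gains
        self.b = np.zeros(D)       # accumulated feedback-weighted gains
        self.alpha = alpha         # exploration weight (assumed constant here)

    def recommend(self, candidates, P, L):
        w_hat = np.linalg.solve(self.M, self.b)  # mean estimate of w
        M_inv = np.linalg.inv(self.M)
        A, deltas = [], []
        for _ in range(L):
            covered = topic_coverage(A, P)
            best, best_ucb, best_delta = None, -np.inf, None
            for a in candidates:
                if a in A:
                    continue
                delta = topic_coverage(A + [a], P) - covered
                # Upper confidence bound on the marginal utility of article a.
                ucb = w_hat @ delta + self.alpha * np.sqrt(delta @ M_inv @ delta)
                if ucb > best_ucb:
                    best, best_ucb, best_delta = a, ucb, delta
            A.append(best)
            deltas.append(best_delta)
        return A, deltas

    def update(self, deltas, rewards):
        """Fold in per-article feedback (e.g., clicks in {0, 1})."""
        for delta, r in zip(deltas, rewards):
            self.M += np.outer(delta, delta)
            self.b += r * delta
```

The deltas returned by recommend are exactly the locally linear incremental gains Δ(a|A) noted above; that local linearity is what lets a linear-bandit-style estimator track w even though the reward of a whole set is nonlinear.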