
Hierarchical Exploration for Accelerating Contextual Bandits

Yisong Yue, Sue Ann Hong and Carlos Guestrin





Presentation Transcript


  1. Hierarchical Exploration for Accelerating Contextual Bandits
Yisong Yue, Sue Ann Hong and Carlos Guestrin

Personalized Recommender Systems
• Every day, a user visits a news portal
• We wish to personalize the portal to her preferences
• We can only learn from feedback, e.g., the user clicks on or "likes" an article
• This leads to an exploration vs. exploitation dilemma: the goal is to satisfy the user, but we must make exploratory recommendations to learn her preferences
• Formalized as a contextual bandit problem

CoFineUCB: Coarse-to-Fine Hierarchical Exploration
Two-tiered exploration: first in the subspace, then in the full space.

Linear Stochastic Bandit Problem
At each iteration t:
• Set of available actions $X_t = \{x_{t,1}, \ldots, x_{t,n}\}$ (available articles)
• Algorithm chooses action $x_t$ from $X_t$ (recommends an article)
• User provides feedback $\hat{y}_t$ (user clicks on or "likes" the article)
• Algorithm incorporates the feedback
Assumptions: $E[\hat{y}_t] = w^{*\top} x_t$ ($w^*$ is unknown to the system)
Regret: $R_T = \sum_{t=1}^{T} \left( w^{*\top} x_t^* - w^{*\top} x_t \right)$, where $x_t^* = \arg\max_{x \in X_t} w^{*\top} x$
• Theorem: with probability $1 - \delta$, the average regret $R_T / T$ is bounded by [formula not preserved in transcript]

Balancing Exploration vs. Exploitation
At each iteration, select the article with the highest "Upper Confidence Bound":
$x_t = \arg\max_{x \in X_t} \left[ w_t^\top x + c_t(x) \right]$, where $w_t^\top x$ is the Estimated Gain and $c_t(x)$ is the Uncertainty
In the example figure, this rule selects an article on economy.
[Figure panels: Mean Estimate by Topic; Uncertainty of Estimate; compared for Naïve LinUCB, Reshaped Full Space, and + Feature Hierarchies]

Feature Hierarchies: Subspace
Suppose "stereotypical users" span a K-dimensional space, e.g., "European vs. Asian news"
• Let U be a D × K matrix
• Define the projection of articles into the subspace: $\tilde{x} = U^\top x$
• Define the representation of a user profile: $w = U \tilde{w}$
• Thus: $w^\top x = \tilde{w}^\top U^\top x = \tilde{w}^\top \tilde{x}$

Coarse-to-Fine Approach
• If $w^*$ lies mostly within the subspace spanned by U, then it suffices to learn primarily in the subspace
• The K-dimensional space is much more efficient to explore
• Explore the full space as needed
[Figure panels: "Atypical Users"; "All Users"]

Constructing Feature Hierarchies Using Prior Knowledge
• Given an empirical sample of learned user profiles W, learn the subspace U from W
• The same procedure can also be used to reshape the full space (use LearnU(W, D))

News Recommender Simulations & User Study
Leave-one-out simulation validation
• Compared against hierarchy-free baselines
• CoFineUCB combines the efficiency of subspace learning with the flexibility of full-space learning
Live User Study
• Showed real users real articles
• 10 articles/day, 10 days
• Counted #likes
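The bandit protocol, the regret definition and the "Upper Confidence Bound" rule can be illustrated with a minimal LinUCB-style sketch in pure Python. It assumes Gaussian feedback noise, randomly generated "articles", a ridge-regression estimate of w*, and the uncertainty term alpha * sqrt(x^T M^{-1} x). The class `LinUCB`, the helper `simulate` and all parameter values are hypothetical illustrations, not the authors' code.

```python
import math
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def mat_vec(M, v):
    return [dot(row, v) for row in M]


class LinUCB:
    """Hypothetical minimal LinUCB-style learner (sketch, not the paper's code).

    Keeps a ridge-regression estimate w_t = M^{-1} b, where
    M = lam * I + sum_t x_t x_t^T and b = sum_t y_t x_t, and scores an
    action as estimated gain + alpha * uncertainty.
    """

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha
        # M^{-1} starts as (1 / lam) * I
        self.M_inv = [[1.0 / lam if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def score(self, x):
        w = mat_vec(self.M_inv, self.b)                    # current estimate w_t
        gain = dot(w, x)                                   # estimated gain w_t^T x
        width = math.sqrt(dot(x, mat_vec(self.M_inv, x)))  # uncertainty ||x||_{M^{-1}}
        return gain + self.alpha * width

    def choose(self, actions):
        return max(range(len(actions)), key=lambda i: self.score(actions[i]))

    def update(self, x, y):
        # Sherman-Morrison: (M + x x^T)^{-1} = M^{-1} - (M^{-1}x)(M^{-1}x)^T / (1 + x^T M^{-1} x)
        Mx = mat_vec(self.M_inv, x)
        denom = 1.0 + dot(x, Mx)
        for i in range(len(x)):
            for j in range(len(x)):
                self.M_inv[i][j] -= Mx[i] * Mx[j] / denom
            self.b[i] += y * x[i]


def simulate(w_star, T=500, n=10, noise=0.1, seed=0, policy="ucb"):
    """Run the linear stochastic bandit protocol; return cumulative regret R_T."""
    env = random.Random(seed)       # draws actions and feedback noise
    pick = random.Random(seed + 1)  # only used by the uniform-random baseline
    d = len(w_star)
    learner = LinUCB(d)
    regret = 0.0
    for _ in range(T):
        # X_t = {x_{t,1}, ..., x_{t,n}}: hypothetical random article features
        actions = [[env.gauss(0, 1) / math.sqrt(d) for _ in range(d)] for _ in range(n)]
        i = learner.choose(actions) if policy == "ucb" else pick.randrange(n)
        x = actions[i]
        y = dot(w_star, x) + env.gauss(0, noise)  # feedback with E[y] = w*^T x
        learner.update(x, y)
        regret += max(dot(w_star, a) for a in actions) - dot(w_star, x)
    return regret
```

For example, `simulate([0.6, -0.4, 0.5, 0.2, -0.4])` can be compared with the same call using `policy="random"`; the UCB learner should accumulate far less regret. Sherman-Morrison keeps each update at O(d^2) without re-inverting M.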
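One plausible reading of LearnU and of the subspace identity is sketched below. It assumes LearnU returns the top-K left singular vectors of the D × N profile matrix W, computed here by power iteration with Gram-Schmidt deflation so the example needs only the standard library. The functions `learn_u`, `project` and `lift` are hypothetical names; the paper's LearnU may scale or otherwise post-process these vectors.

```python
import math
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def learn_u(W, K, iters=200, seed=0):
    """Hypothetical LearnU sketch: top-K left singular vectors of W.

    W is a list of N learned profiles, each a length-D list (the columns of
    the D x N matrix). Uses power iteration on C = W W^T with deflation.
    Returns U as a list of K orthonormal length-D columns.
    """
    D = len(W[0])
    C = [[sum(w[i] * w[j] for w in W) for j in range(D)] for i in range(D)]
    rng = random.Random(seed)
    U = []
    for _ in range(K):
        v = [rng.gauss(0, 1) for _ in range(D)]
        for _ in range(iters):
            v = [dot(row, v) for row in C]        # v <- C v
            for u in U:                           # remove already-found directions
                p = dot(u, v)
                v = [a - p * b for a, b in zip(v, u)]
            norm = math.sqrt(dot(v, v))
            v = [a / norm for a in v]
        U.append(v)
    return U


def project(U, x):
    """Projection of an article into the subspace: x_tilde = U^T x."""
    return [dot(u, x) for u in U]


def lift(U, w_tilde):
    """User profile represented in the subspace: w = U w_tilde."""
    D = len(U[0])
    return [sum(u[i] * c for u, c in zip(U, w_tilde)) for i in range(D)]
```

Usage, with toy profiles that all lie in the first two coordinates:

```python
W = [[1, 0, 0, 0], [0, 2, 0, 0], [1, 1, 0, 0], [3, -1, 0, 0]]
U = learn_u(W, K=2)
x, w_tilde = [0.3, -0.2, 0.9, 0.1], [0.5, -1.0]
print(dot(lift(U, w_tilde), x), dot(w_tilde, project(U, x)))  # equal up to rounding
```

The two printed values agree because $w^\top x = \tilde{w}^\top U^\top x$ holds for any U; orthonormal columns additionally make $U U^\top$ the projector onto the subspace.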
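The coarse-to-fine idea can be approximated by a simplified two-tier UCB, sketched below under stated assumptions: a ridge estimate in the K-dimensional subspace (features $U^\top x$), a full-space ridge estimate regularized toward $U\tilde{w}$, and an uncertainty term that adds a subspace width to a smaller, down-weighted full-space width, so the full space is explored only as needed. This is not claimed to be the exact CoFineUCB update or its confidence bounds; `Ridge`, `CoarseToFineUCB`, `simulate`, `alpha_sub` and `alpha_full` are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def mat_vec(M, v):
    return [dot(row, v) for row in M]


class Ridge:
    """Ridge-regression state: M^{-1} (Sherman-Morrison updated) and b = sum y x."""

    def __init__(self, dim, lam=1.0):
        self.lam = lam
        self.M_inv = [[1.0 / lam if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def width(self, x):
        return math.sqrt(dot(x, mat_vec(self.M_inv, x)))  # ||x||_{M^{-1}}

    def update(self, x, y):
        Mx = mat_vec(self.M_inv, x)
        denom = 1.0 + dot(x, Mx)
        for i in range(len(x)):
            for j in range(len(x)):
                self.M_inv[i][j] -= Mx[i] * Mx[j] / denom
            self.b[i] += y * x[i]


class CoarseToFineUCB:
    """Hypothetical two-tier UCB in the spirit of CoFineUCB (simplified sketch).

    U: list of K orthonormal length-D columns.
    Coarse tier: ridge estimate w_tilde on features U^T x.
    Fine tier: argmin_w sum (w^T x - y)^2 + lam * ||w - U w_tilde||^2,
    whose closed form is M^{-1} (b + lam * U w_tilde).
    """

    def __init__(self, U, alpha_sub=1.0, alpha_full=0.2, lam=1.0):
        self.U = U
        self.alpha_sub, self.alpha_full = alpha_sub, alpha_full
        self.sub = Ridge(len(U), lam)
        self.full = Ridge(len(U[0]), lam)

    def project(self, x):
        return [dot(u, x) for u in self.U]

    def estimate(self):
        w_tilde = mat_vec(self.sub.M_inv, self.sub.b)
        prior = [sum(u[i] * c for u, c in zip(self.U, w_tilde)) for i in range(len(self.U[0]))]
        rhs = [bi + self.full.lam * pi for bi, pi in zip(self.full.b, prior)]
        return mat_vec(self.full.M_inv, rhs)

    def choose(self, actions):
        w = self.estimate()

        def score(x):
            # estimated gain + subspace uncertainty + down-weighted full-space uncertainty
            return (dot(w, x) + self.alpha_sub * self.sub.width(self.project(x))
                    + self.alpha_full * self.full.width(x))

        return max(range(len(actions)), key=lambda i: score(actions[i]))

    def update(self, x, y):
        self.sub.update(self.project(x), y)
        self.full.update(x, y)


def simulate(U, w_star, T=300, n=10, noise=0.1, seed=0, policy="cofine"):
    """Return (cumulative regret, learner) on hypothetical random articles."""
    env, pick = random.Random(seed), random.Random(seed + 1)
    D = len(w_star)
    learner = CoarseToFineUCB(U)
    regret = 0.0
    for _ in range(T):
        actions = [[env.gauss(0, 1) / math.sqrt(D) for _ in range(D)] for _ in range(n)]
        i = learner.choose(actions) if policy == "cofine" else pick.randrange(n)
        x = actions[i]
        learner.update(x, dot(w_star, x) + env.gauss(0, noise))
        regret += max(dot(w_star, a) for a in actions) - dot(w_star, x)
    return regret, learner
```

Shrinking the full-space estimate toward $U\tilde{w}$ lets a "typical" user be learned mostly in K dimensions, while the full-space term still lets an "atypical" user's profile drift outside the subspace.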
