Hybrid-ε-greedy for Mobile Context-Aware Recommender System Djallel Bouneffouf, Amel Bouzeghoub & Alda Lopes Gançarski Institut Télécom, Télécom SudParis, France
Outline • Introduction • State of the art • Proposition • Experimental evaluation • Conclusion
Nomalys: a software editor providing access to and navigation through corporate data (www.nomalys.com)
MOBILE INFORMATION SYSTEMS
• Context: to reduce search and navigation time
• Context-based Recommender System: to assist users in finding information
PROBLEMS IN CONTEXT-BASED RECOMMENDER SYSTEMS
• How to recommend information to users while taking into account their surrounding environment (location, time, nearby people)?
• How to follow the evolution of the user's interest?
Item inventory: articles, web pages, documents, …
Context: location, time, …
Contextual recommender system algorithm (sketched below):
• Selects item(s) to show to the user
• Gets feedback (click, time spent, …)
• Refines the model
• Repeats (a large number of times), optimizing metrics of interest (total number of clicks, total reward, …)
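Concretely, the select/feedback/refine loop above can be sketched as follows (a minimal illustration only; the function names select_item, observe_feedback, and update_model are placeholders, not from the slides):

    def recommendation_loop(rounds, select_item, observe_feedback, update_model):
        # Skeleton of the select -> feedback -> refine -> repeat loop.
        total_reward = 0
        for _ in range(rounds):
            item = select_item()              # pick item(s) to show
            reward = observe_feedback(item)   # click, time spent, ...
            update_model(item, reward)        # refine the model
            total_reward += reward            # metric being optimized
        return total_reward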
Outline • Introduction • State of the art • Proposition • Experimental evaluation • Conclusion
USER OR EXPERT SPECIFICATION • Advantage • Context management • Constraints • Laborious • Not a dynamic system • Not a personalized system
Content-Based and Collaborative filtering
• Advantages
• Context management
• Automatic process
• Constraints
• Cold-start problem
• Slow training
[Figure: example training dataset with situations (Meeting, Drive, Home), actions, social group (Office), and rewards]
A minimal content-based step is sketched below.
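Since the hybrid strategy later in the deck reuses a content-based step, here is one minimal sketch of it, assuming documents are plain strings scored by term-overlap cosine similarity; the helper names cosine and cbf_similar are illustrative, not from the paper:

    from collections import Counter
    from math import sqrt

    def cosine(a, b):
        # Cosine similarity between two term-frequency Counters.
        dot = sum(a[t] * b[t] for t in a if t in b)
        na = sqrt(sum(v * v for v in a.values()))
        nb = sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def cbf_similar(doc, corpus, k=1):
        # Return the k corpus documents most similar to `doc`.
        vec = Counter(doc.split())
        ranked = sorted(corpus, key=lambda d: cosine(vec, Counter(d.split())),
                        reverse=True)
        return ranked[:k]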
Machine learning: Reinforcement learning
• Advantages
• Solves the cold-start problem
• Follows the evolution of the user's interest
• Constraints
• No context management
• Slow training
[Figure: two arms with sampled rewards, observed mean = 0.48 vs mean = 0.79, illustrating exploitation vs exploration]
• The greedy strategy performs exploitation only.
• The ε-greedy strategy adds some random (exploratory) actions.
Outline • Introduction • State of the art • Proposition • Experimental evaluation • Conclusion
Multi-armed bandits (MAB)
• Recommender-system reading: arms are documents, rewards are clicks
• A (basic) MAB problem has:
• A set D of possibilities (documents)
• An expected reward CTR(d) ∈ [0,1] for each d ∈ D
• In each round, the algorithm picks a document d ∈ D based on past history
• Reward: an independent sample in [0,1] with expectation CTR(d)
• A classical setting that models the exploration/exploitation trade-off (see the sketch below)
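As a concrete illustration, a minimal ε-greedy bandit step can look like this (a sketch only; the incremental-mean update is standard bandit bookkeeping, not taken from the slides):

    import random

    def eps_greedy_pick(ctr_estimates, epsilon=0.1):
        # Exploit the best empirical CTR with prob. 1 - epsilon, else explore.
        docs = list(ctr_estimates)
        if random.random() < epsilon:
            return random.choice(docs)            # exploration
        return max(docs, key=ctr_estimates.get)   # exploitation

    def update_estimate(ctr_estimates, pulls, d, reward):
        # Incremental mean: update after observing a click (1) or no click (0).
        pulls[d] = pulls.get(d, 0) + 1
        ctr_estimates[d] += (reward - ctr_estimates[d]) / pulls[d]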
Contextual Bandits
• Context-based recommender-system reading: contexts are user situations, arms are documents, rewards are clicks
• X is a set of situations, D is a set of arms
• CTR: X × D → [0,1] gives the expected rewards
• In each round:
• A situation x ∈ X occurs
• The algorithm picks an arm d ∈ D
• Reward: an independent sample in [0,1] with expectation CTR(x, d)
[Figure: situations x1, x2, x3 (e.g. Meeting, Home, Drive, Office), each paired with its own arms and rewards]
A per-situation variant of the previous sketch is shown below.
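Extending the earlier sketch to the contextual case mainly means indexing the CTR estimates by (situation, document) pairs; again illustrative only, reusing the random import above:

    def contextual_eps_greedy(ctr_estimates, situation, docs, epsilon=0.1):
        # Same ε-greedy rule, but estimates are kept per (situation, doc) pair.
        if random.random() < epsilon:
            return random.choice(docs)
        return max(docs, key=lambda d: ctr_estimates.get((situation, d), 0.0))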
Get situation from Context: Sensing
• Time: Mon Oct 3 12:10:00 2011
• GPS: "38.868143, 2.3484122"
• Social: NATIXIS
Get situation from Context: Thinking (abstraction)
The raw sensed data (Mon Oct 3 12:10:00 2011; GPS "38.868143, 2.3484122"; NATIXIS) is abstracted into higher-level concepts using:
• a Time Ontology
• a Location Ontology
• a Social Ontology
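As a toy illustration of this abstraction step (the rules, thresholds, and concept names below are assumptions made for the sketch, not the paper's ontologies):

    def abstract_context(hour, gps, company):
        # Map raw readings to coarse concepts; illustrative rules only.
        # `gps` would be resolved against the Location Ontology; omitted here.
        time_concept = "work_hours" if 9 <= hour <= 18 else "off_hours"
        place_concept = "client_site" if company else "unknown_place"
        social_concept = company if company else "alone"
        return (time_concept, place_concept, social_concept)

    # e.g. abstract_context(12, "38.868143, 2.3484122", "NATIXIS")
    #      -> ("work_hours", "client_site", "NATIXIS")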
Get situation from Context: Retrieving the relevant situation (RetrieveSituation)
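The slides only name the RetrieveSituation step; one plausible reading, consistent with the per-situation bandit above, is nearest-neighbour retrieval over previously seen situations (the overlap measure below is an assumption; the paper's actual ontology-based similarity is not detailed on these slides):

    def retrieve_situation(current, past_situations):
        # Return the past situation most similar to the current one, scored
        # by naive concept overlap across the (time, place, social) triple.
        def overlap(s):
            return sum(1 for a, b in zip(current, s) if a == b)
        return max(past_situations, key=overlap)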
Select Documents
ε-greedy:
    d_t = argmax_d CTR(d)   with probability 1 - ε
    d_t = Random(D)         with probability ε
where ε is the probability of exploration.
Hybrid-ε-greedy adds Content-Based Filtering (CBF), where CBF(d) returns documents similar to document d:
    d_t = argmax_d CTR(d)   with probability 1 - ε
    d_t = CBF(d)            with probability z
    d_t = Random(D)         with probability k
with ε = z + k.
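Putting the pieces together, a compact sketch of this hybrid selection rule (reusing the cbf_similar helper, the estimate table, and the random import sketched earlier; the values of z and k are placeholders):

    def hybrid_eps_greedy(ctr_estimates, docs, z=0.05, k=0.05):
        # Hybrid-ε-greedy with ε = z + k: exploit w.p. 1-ε, explore via
        # content-based similarity w.p. z, explore uniformly w.p. k.
        best = max(docs, key=lambda d: ctr_estimates.get(d, 0.0))
        r = random.random()
        if r < 1 - (z + k):
            return best                                      # exploitation
        if r < 1 - k:
            others = [d for d in docs if d != best]
            if others:
                return cbf_similar(best, others, k=1)[0]     # CBF exploration
        return random.choice(docs)                           # random exploration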
Outline • Introduction • State of the art • Proposition • Experimental evaluation • Conclusion
Experimental Datasets
• Data from Nomalys:
• 16,286 diary situation entries
• 342,725 diary navigation entries
Recommend documents: ε variation
• Effect of varying ε on the CTR, both during learning and at deployment
[Figure: CTR as a function of ε for the learning phase; CTR as a function of ε for the deployment phase]
Recall: d_t = argmax_d CTR(d) with probability 1 - ε, and Random(D) with probability ε, where ε is the probability of exploration.
Recommend documents: data size variation
• Effect of the data size on the CTR, both during learning and at deployment
[Figure: CTR as a function of data size for the learning phase; CTR as a function of data size for the deployment phase]
Conclusion
• Our experiments lead to the conclusion that considering the user's context in the exploration/exploitation strategy significantly improves the performance of the recommender system.
In the future:
• We plan to investigate methods that automatically learn the optimal exploration/exploitation trade-off.