RecStore: An Extensible and Adaptive Framework for Online Recommender Queries inside the Database Engine
Authors
• Microsoft Research: Justin J. Levandoski
• University of Minnesota: Mohamed Sarwat, Mohamed F. Mokbel, Michael D. Ekstrand
Recommender Systems – Basic Idea
• Users provide opinions on items they have consumed, watched, or listened to.
• The system provides the user with suggestions for new items.
Recommender Systems – Basic Idea
• Analyze user behavior to recommend personalized and interesting things for users to do, read, or see.
[Figure: users rate movies; the movie ratings are used to build a recommendation model (similar users, similar items) in an offline step; a recommendation query such as “Recommend user A five movies” is answered online.]
Things Have Changed!
• We live in an increasingly social and “real-time” world.
• The number of things to recommend is growing exponentially.
• Users are expressing opinions faster than ever.
• Recommendations change second-to-second.
[Figure: examples of fast-changing content and feedback: Facebook posts, blog/news items, the NY Times “Recommend” button, the “Like” button.]
The “offline” step can no longer be tolerated.
Existing Recommender Systems
• No work has explored recommender system performance.
• Performance has always been synonymous with “quality”.
“We have chosen not to discuss computation performance of recommender algorithms. Such performance is certainly important, and in the future we expect there to be work on the quality of time-limited and memory-limited recommendations.”
  Herlocker et al., “Evaluating Collaborative Filtering Recommender Systems”, ACM TOIS 2004
“[Our] solution is based on a huge amount of models and predictors which would not be practical as part of a commercial recommender system. However, this result is a direct consequence of the nature and goal of the competition: obtain the highest possible accuracy at any cost, disregarding completely the complexity of the solution and the execution performance.”
  Team BellKor’s Pragmatic Chaos, winner of the 2009 Netflix Prize
Recommender Systems in DBMS
• Incoming stream of rating data: (user, item, rating)
• Ratings are used to build a recommendation model:
  • Item-based collaborative filtering: (item, item, similarity)
  • User-based collaborative filtering: (user, user, similarity)
• Recommendation query:
  • Item-based collaborative filtering: given a user u, find the top-k items that are most similar to the items that u has liked before.
  • User-based collaborative filtering: given a user u, find the top-k items that the users who are similar to u have liked.
“Online” recommendation environments have all the pieces of a data management problem.
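The item-based recommendation query described above can be sketched in a few lines of Python. This is a minimal illustration, not RecStore's implementation: the function name `recommend_item_based`, the `like_threshold` parameter, the in-memory dictionaries, and the choice to score a candidate by summing its similarities to the user's liked items are all assumptions made for the example.

```python
import heapq
from collections import defaultdict


def recommend_item_based(user_ratings, model, k, like_threshold=4.0):
    """Return up to k unrated items most similar to the items the user liked.

    user_ratings: dict mapping item -> rating given by the user.
    model: dict mapping item -> list of (related_item, similarity) pairs,
           i.e. the (item, item, similarity) model grouped by its first column.
    like_threshold: hypothetical cut-off for "liked" (an assumption).
    """
    scores = defaultdict(float)
    for item, rating in user_ratings.items():
        if rating < like_threshold:
            continue  # only items the user liked contribute
        for related, sim in model.get(item, []):
            if related not in user_ratings:  # do not recommend already-rated items
                scores[related] += sim  # assumed scoring rule: sum of similarities
    top = heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])
    return [item for item, _ in top]
```

For example, with a model `{"a": [("b", 0.9), ("c", 0.2)], "d": [("c", 0.5), ("b", 0.05)]}` and a user who rated `a` as 5 and `d` as 4, item `b` accumulates a higher score than `c` and is returned first.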
Talk Outline
• RecStore Main Idea
• RecStore System Architecture
• RecStore System Features
• RecStore Experimental Results
• Conclusion
RecStore – Main Idea
Let's not try to find a new way of doing recommendation.*
RecStore pushes recommender systems inside the database engine to provide online support and scale up the computations of existing recommender methods.
* The ACM RecSys community is already doing an excellent job on this frontier. Let's start from there.
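RecStore – System Architecture
[Figure: architecture diagram. Rating updates and recommendation queries reach three stores through access methods (index, scan): the rating data, the intermediate store (behind the intermediate filter), and the model table (behind the model filter). The paths are labeled FAST, MEDIUM, and SLOW, and the steps are numbered 1 to 3.]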
RecStore – System Features
• Adaptivity: RecStore is adaptive to different system workloads (query-intensive vs. update-intensive).
• Extensibility: RecStore is extensible to support many recommendation methods (e.g., item-based CF, user-based CF).
RecStore – Adaptivity (1/6)
Materialize-All (α = β = M)
• Low-latency recommendation queries.
• High storage and maintenance cost.
[Figure: RecStore architecture diagram annotated with α at the intermediate filter and β at the model filter.]
RecStore – Adaptivity (2/6)
Materialize-None (α = β = 0)
• High-latency recommendation queries.
• Low storage and maintenance cost.
[Figure: RecStore architecture diagram annotated with α at the intermediate filter and β at the model filter.]
RecStore – Adaptivity (3/6)
Intermediate Store Only (α = M, β = 0)
• Middle ground between Materialize-All and Materialize-None.
[Figure: RecStore architecture diagram annotated with α at the intermediate filter and β at the model filter.]
RecStore – Adaptivity (4/6)
Full Intermediate Store / Partial Model Store (α = M, β = N)
• Middle ground between Materialize-All and Intermediate Store Only.
[Figure: RecStore architecture diagram annotated with α at the intermediate filter and β = N at the model filter.]
RecStore – Adaptivity (5/6)
Partial Intermediate Store / Partial Model Store (α = K, β = N)
• Lies between Partial Model Store and Intermediate Store Only.
[Figure: RecStore architecture diagram annotated with α = K at the intermediate filter and β = N at the model filter.]
RecStore – Adaptivity (6/6)
Summary of the materialization strategies:
• Materialize-All (α = β = M): low-latency recommendation queries; high storage and maintenance cost.
• Materialize-None (α = β = 0): high-latency recommendation queries; low storage and maintenance cost.
• Intermediate Store Only (α = M, β = 0): middle ground between Materialize-All and Materialize-None.
• Full Intermediate Store / Partial Model Store (α = M, β = N): middle ground between Materialize-All and Intermediate Store Only.
• Partial Intermediate Store / Partial Model Store (α = K, β = N): lies between Partial Model Store and Intermediate Store Only.
[Figure: rating data, intermediate store, and model table, with α at the intermediate filter and β at the model filter.]
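The five settings can be read as a small decision table over α and β. The sketch below encodes that table under stated assumptions: M is taken to be the total number of items, while K and N are taken to be values strictly between 0 and M (the slides do not define them explicitly). The function name `materialization_strategy` is hypothetical and not part of RecStore.

```python
def materialization_strategy(alpha, beta, m):
    """Name the strategy from the slides that matches (alpha, beta).

    Assumptions (not stated on the slides): m is the total number of items,
    and a "partial" store is any value strictly between 0 and m.
    """
    if alpha == m and beta == m:
        return "Materialize-All"
    if alpha == 0 and beta == 0:
        return "Materialize-None"
    if alpha == m and beta == 0:
        return "Intermediate Store Only"
    if alpha == m and 0 < beta < m:
        return "Full Intermediate Store / Partial Model Store"
    if 0 < alpha < m and 0 < beta < m:
        return "Partial Intermediate Store / Partial Model Store"
    # Any other combination is not one of the five settings named on the slides.
    raise ValueError(f"unnamed setting: alpha={alpha}, beta={beta}, m={m}")
```

Calling `materialization_strategy(100, 20, 100)` returns the full-intermediate / partial-model strategy; combinations the slides do not name raise `ValueError` rather than being guessed.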
RecStore – Extensibility
• RecStore is extensible to support various recommendation methods.
• The application developer can define a new recommendation method using SQL code.
• The recommendation method is registered using the SQL clause DEFINE RECSTORE MODEL.
[Figure: RecStore inside the DBMS with registered methods: Item-based CF (Pearson), Item-based CF (Probabilistic), Item-based CF (Cosine), User-based CF, and MyRec.]
RecStore – Extensibility
    DEFINE RECSTORE MODEL ItemItemCosine
    FROM Ratings R1, Ratings R2
    WHERE R1.itemId <> R2.itemId AND R1.userId = R2.userId
    WITH INTERMEDIATE STORE:
      (R1.itemId AS item, R2.itemId AS rel_itm,
       vector_lenp, vector_lenq, dot_prod, co_rate)
    WITH INTERMEDIATE FILTER:
      ALLOW UPDATE WITH My_IntFilterLogic(),
      UPDATE vector_lenp AS vector_lenp + R1.rating * R1.rating,
      UPDATE vector_lenq AS vector_lenq + R2.rating * R2.rating,
      UPDATE dot_prod AS dot_prod + R1.rating * R2.rating,
      UPDATE co_rate AS co_rate + 1
    WITH MODEL STORE:
      (R1.itemId AS item, R2.itemId AS rel_itm, COMPUTED sim)
    WITH MODEL FILTER:
      ALLOW UPDATE WITH My_ModFilterLogic(),
      UPDATE sim AS
        if (co_rate < 50)
          co_rate * dot_prod / (50 * sqrt(vector_lenp) * sqrt(vector_lenq));
        else
          dot_prod / (sqrt(vector_lenp) * sqrt(vector_lenq));
[Figure: inside the DBMS, RecStore holds the intermediate statistics and the model store for the registered Item-based CF (Cosine) method.]
Simple SQL to plug in a new recommendation method.
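To make the roles of the intermediate store and the model store concrete, the following Python sketch maintains the four per-pair statistics and derives the similarity from them. It is an illustration only, under these assumptions: `co_rate` counts the users who rated both items, the intended similarity is the cosine of the co-rated vectors damped by `co_rate / 50` when fewer than 50 users co-rated the pair, and ratings are non-zero. The function names `build_intermediate` and `similarity` and the in-memory dictionaries are hypothetical and do not come from RecStore.

```python
import math
from collections import defaultdict

SHRINK = 50  # co-rating threshold taken from the model filter on the slide


def new_row():
    # One intermediate-store row for an ordered item pair (item, rel_itm).
    return {"vector_lenp": 0.0, "vector_lenq": 0.0, "dot_prod": 0.0, "co_rate": 0}


def build_intermediate(ratings):
    """Build the intermediate store from (user, item, rating) triples.

    Mirrors the self-join in the definition: same user, different items.
    """
    by_user = defaultdict(list)
    for user, item, rating in ratings:
        by_user[user].append((item, rating))

    store = defaultdict(new_row)
    for rated in by_user.values():
        for p, rp in rated:
            for q, rq in rated:
                if p == q:
                    continue
                row = store[(p, q)]
                row["vector_lenp"] += rp * rp
                row["vector_lenq"] += rq * rq
                row["dot_prod"] += rp * rq
                row["co_rate"] += 1
    return store


def similarity(row):
    """Model-store value: cosine similarity, damped for rarely co-rated pairs."""
    cosine = row["dot_prod"] / (
        math.sqrt(row["vector_lenp"]) * math.sqrt(row["vector_lenq"])
    )
    if row["co_rate"] < SHRINK:
        return row["co_rate"] * cosine / SHRINK
    return cosine
```

Because each statistic is a running sum, a new rating only touches the rows for pairs that involve the rated item, which is what makes the intermediate store cheap to keep current compared with recomputing similarities from the raw ratings.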
RecStore – Experimental Evaluation (1/3)
• Machine
  • Intel Core2 8400 at 3 GHz with 4 GB of RAM, running Ubuntu Linux 8.04
• MovieLens data
  • 10 million ratings
  • 10k items, 70k users
• Techniques
  • matall: materialize all (α = β = M)
  • ionly: intermediate store only (α = M and β = 0)
  • pm-m: partial model store (α = M and β = 20% of all movies)
  • pm-mi: partial model / partial intermediate (α = 40% and β = 20% of all movies)
  • viewreg: regular PostgreSQL view
  • viewmat: simulated materialized view in PostgreSQL 8.4
RecStore – Experimental Evaluation (2/3)
Item-Based Cosine Similarity
RecStore adapts to a spectrum of workloads, ranging from query-intensive to update-intensive.
RecStore – Experimental Evaluation (3/3)
Item-Based Cosine Similarity
• Real workload trace: continuous arrival of both rating updates and recommender queries against the MovieLens system.
Conclusion: Take-Away Message
• Recommender systems have all the ingredients of a data management problem.
• RecStore is a step toward incorporating recommender systems in the database engine.
• RecStore is adaptive to different system workloads (queries vs. updates).
• RecStore is extensible to support new recommendation methods.