Privacy-Preserving Eigentaste-based Collaborative Filtering Ibrahim Yakut and Huseyin Polat {iyakut,polath}@anadolu.edu.tr Department of Computer Engineering Anadolu University, Turkey
Collaborative Filtering (CF)
• Problem: information overload
• Solution: collaborative filtering
IWSEC'07
Collaborative Filtering
• Recent technique for filtering and recommendation
• Applications: e-commerce, search engines, direct recommendations
Collaborative Filtering Process
[Figure: user-item matrix with users u1, u2, ..., ua (active user), ..., un and items i1, i2, ..., iq, ..., im; iq is the item for which a prediction is sought.]
Paq = prediction on item q for the active user a
Eigentaste
• Proposed by Goldberg et al. in 2001
• Main feature: online computation in constant time
• Second feature: flexible use of several clustering algorithms
• Based on Principal Component Analysis (PCA)
• Application in Jester, an online joke recommender: http://eigentaste.berkeley.edu/
Eigentaste Algorithm
• D: n × m user-item matrix (n users, m items)
• A: n × k matrix of ratings on the k gauge items
• C: correlation matrix of A
Step 1: Find the correlation matrix C of A.
Step 2: Find the eigenvectors (E) and eigenvalues of C.
Eigentaste Algorithm (cont'd)
Step 3: Take the first m = 2 eigenvectors and project A: x = A E_m^T = A E_2^T (here m is the number of retained eigenvectors, not the number of items).
Step 4: Cluster the projected data using Recursive Rectangular Clustering (RRC).
Step 5: Construct a lookup table holding the mean of the non-gauge item ratings for each cluster.
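Steps 1 to 5 above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes the gauge matrix A is already normalized column-wise (so that A^T A / (n-1) is a correlation matrix), marks unrated non-gauge cells with NaN, and replaces RRC with a crude quadrant-based clustering as a stand-in. The function name `eigentaste_offline` and the synthetic data are hypothetical.

```python
import numpy as np

def eigentaste_offline(A, D_nongauge):
    """A: n x k normalized gauge ratings; D_nongauge: n x (m-k), NaN = unrated."""
    n = A.shape[0]
    C = (A.T @ A) / (n - 1)                    # Step 1: k x k correlation matrix
    eigvals, eigvecs = np.linalg.eigh(C)       # Step 2: eigh sorts eigenvalues ascending
    order = np.argsort(eigvals)[::-1]          # indices of eigenvalues, largest first
    E2 = eigvecs[:, order[:2]].T               # Step 3: top-2 eigenvectors as rows (2 x k)
    X = A @ E2.T                               # n x 2 projection of every user
    # Step 4 (stand-in for RRC, an assumption): cluster id = quadrant of the plane
    labels = 2 * (X[:, 0] >= 0) + (X[:, 1] >= 0)
    # Step 5: per-cluster mean rating of each non-gauge item, ignoring NaN
    table = {int(c): np.nanmean(D_nongauge[labels == c], axis=0)
             for c in np.unique(labels)}
    return E2, labels, table

# Synthetic example: 200 users, 5 gauge items, 3 non-gauge items, half unrated
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
D = rng.uniform(-10, 10, size=(200, 3))
D[rng.random(D.shape) < 0.5] = np.nan
E2, labels, table = eigentaste_offline(A, D)
```

All of the expensive work (eigendecomposition, clustering, table construction) happens offline, which is what makes a constant-time online step possible.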
Eigentaste: Online Phase
• When an active user a enters:
• a rates the items in the gauge set.
• a is projected using the principal components of the gauge data.
• The representative cluster of a is found.
• Items are recommended based on the preconstructed lookup table.
[Figure labels: Disapprove, Approve]
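The online phase can be illustrated with a few lines of plain Python. This is a sketch under assumptions: clusters are represented by centroids and the active user is assigned to the nearest one (the original algorithm locates the RRC cell instead), and the names `project`, `nearest_cluster`, `recommend` and all numbers are made up for illustration.

```python
def project(gauge_ratings, E2):
    """Project a normalized gauge vector onto the two principal components."""
    return [sum(g * e for g, e in zip(gauge_ratings, pc)) for pc in E2]

def nearest_cluster(point, centroids):
    """Index of the centroid closest (squared Euclidean distance) to the point."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))

def recommend(gauge_ratings, E2, centroids, lookup, top_n=2):
    """Return the cluster id and the indices of its top_n non-gauge items."""
    cluster = nearest_cluster(project(gauge_ratings, E2), centroids)
    means = lookup[cluster]                       # precomputed per-cluster means
    ranked = sorted(range(len(means)), key=lambda j: means[j], reverse=True)
    return cluster, ranked[:top_n]

# Toy setup: 3 gauge items, 2 clusters, 3 non-gauge items (all values hypothetical)
E2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
centroids = [[-1.0, -1.0], [1.0, 1.0]]
lookup = {0: [0.2, -0.5, 0.9], 1: [-0.3, 0.8, 0.1]}
cluster, items = recommend([0.9, 1.2, -0.4], E2, centroids, lookup)
print(cluster, items)  # 1 [1, 2]
```

The cost per active user depends only on the number of gauge items and clusters, not on the number of users in the database.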
Motivation
• The algorithm described above is successful.
• But due to privacy risks, collecting truthful and trustworthy data is a challenge.
• Therefore, how can users give data for CF purposes without jeopardizing their privacy?
• Is it possible to use perturbed data in Eigentaste-based algorithms?
Modifications to the Original Algorithm
• Normalization: the user mean and standard deviation are used instead of the item mean and standard deviation.
• Clustering: k-means clustering is used instead of RRC.
• Prediction: instead of using the lookup table value directly, the value is denormalized and then used as the prediction.
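The user-based normalization and the denormalization step can be sketched as below. The slide does not say whether the population or sample standard deviation is used; the population form (`pstdev`) is an assumption here, as are the function names. A user whose ratings are all identical has zero standard deviation and would need separate handling.

```python
from statistics import mean, pstdev

def to_z_scores(ratings):
    """Normalize one user's ratings with that user's own mean and std."""
    mu, sigma = mean(ratings), pstdev(ratings)   # assumption: population std
    return [(r - mu) / sigma for r in ratings], mu, sigma

def denormalize(z, mu, sigma):
    """Map a z-score (e.g. a lookup-table value) back to the user's rating scale."""
    return mu + sigma * z

# Hypothetical user with three ratings
z, mu, sigma = to_z_scores([2.0, 4.0, 6.0])
prediction = denormalize(0.5, mu, sigma)   # a table value of 0.5 on this user's scale
```

Because the lookup table stores z-scores, the same table entry yields different predictions for users with different rating means and spreads.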
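Masking Data
• Randomized Perturbation Technique (RPT), Agrawal & Srikant, 2000
[Figure: User 1, User 2, ..., User n-1, User n each add random noise (+R1, +R2, ..., +Rn-1, +Rn) to their data before sending it to the central database used by the CF process.]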
Masking Process
• Users and the server agree on γ, θ, and δ.
• Each user u computes the z-scores of their ratings.
• u selects σu uniformly at random over [0, γ] and uses it as the standard deviation of the masking data.
• u selects ru over [0, 1]; if ru ≤ θ, the uniform distribution is used, otherwise the Gaussian distribution.
• u selects xer over [0, δ]; xer% of the unfilled cells are to be filled with noise.
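Masking Process (cont'd)
• u creates mu random numbers, where mu = number of rated cells + number of empty cells to be filled (xer% of the empty cells).
• The random numbers have mean μ = 0 and standard deviation σu; depending on ru, they are drawn from a Gaussian distribution or from a uniform distribution over [-√3σu, √3σu].
• u masks their private data by adding this noise. The empty cells to be filled are selected randomly.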
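The masking steps on the two slides above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: unrated cells are represented by `None`, xer is treated as a real-valued percentage drawn uniformly from [0, δ], and the function name `mask_ratings` is hypothetical.

```python
import math
import random

def mask_ratings(z, gamma, theta, delta, rng):
    """z: one user's z-scores, None for unrated cells. Returns the masked vector."""
    sigma_u = rng.uniform(0, gamma)          # std of the masking noise
    use_uniform = rng.random() <= theta      # r_u <= theta selects the uniform distribution
    xer = rng.uniform(0, delta)              # percentage of empty cells to fill (assumed real)
    empty = [j for j, v in enumerate(z) if v is None]
    fill = set(rng.sample(empty, round(len(empty) * xer / 100)))

    def noise():
        if use_uniform:
            a = math.sqrt(3) * sigma_u       # U[-a, a] has std a / sqrt(3) = sigma_u
            return rng.uniform(-a, a)
        return rng.gauss(0, sigma_u)

    masked = []
    for j, v in enumerate(z):
        if v is not None:
            masked.append(v + noise())       # rated cell: z-score plus noise
        elif j in fill:
            masked.append(noise())           # selected empty cell: noise only
        else:
            masked.append(None)              # remaining empty cells stay empty
    return masked

rng = random.Random(42)
z = [0.5, None, -1.2, None, None, 0.7, None]
masked = mask_ratings(z, gamma=1.0, theta=0.5, delta=50, rng=rng)
```

Filling some empty cells hides which items the user actually rated, at the cost of injecting pure noise into the matrix.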
Eigentaste-based CF with Privacy
• The server now holds the disguised user-item matrix D' and the disguised user-gauge matrix A'.
• In some steps, the effects of perturbation must be considered and handled:
• Correlation matrix construction
• Projection
• Active user's entry of the gauge set
Correlation Matrix Construction
• Case f ≠ g (off-diagonal entries of C'): the three terms involving the random noise have expected value 0, since μ = 0.
[Equations not preserved in the transcript.]
Correlation Matrix Construction (cont'd)
• Case f = g (diagonal entries of C'): the expected value of the cross term is 0, since μ = 0; the result is then obtained assuming n ≈ n-1.
[Equations not preserved in the transcript.]
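The equations for the two cases did not survive in this transcript. The following is a plausible reconstruction, not the authors' exact formulas. It assumes the masked entries are a'_{uf} = a_{uf} + r_{uf}, that the noise r_{uf} has mean 0 and is independent of the ratings and across cells, and that C = A^T A / (n-1). The symbol \bar{\sigma}^2 (average noise variance) is introduced here for illustration.

```latex
% Expansion of a masked correlation entry
\begin{align*}
C'_{fg} &= \frac{1}{n-1}\sum_{u=1}^{n} (a_{uf} + r_{uf})(a_{ug} + r_{ug}) \\
        &= C_{fg} + \frac{1}{n-1}\Big(\sum_{u} a_{uf}\,r_{ug}
           + \sum_{u} r_{uf}\,a_{ug} + \sum_{u} r_{uf}\,r_{ug}\Big)
\end{align*}

% Case f \neq g: all three noise sums have expected value 0 because \mu = 0
\[
C_{fg} \approx C'_{fg}
\]

% Case f = g: the cross term vanishes in expectation, the squared term does not
\begin{align*}
C'_{ff} &= C_{ff} + \frac{2}{n-1}\sum_{u} a_{uf}\,r_{uf}
           + \frac{1}{n-1}\sum_{u} r_{uf}^{2} \\
\frac{1}{n-1}\sum_{u} r_{uf}^{2} &\approx \frac{1}{n}\sum_{u}\sigma_u^{2}
           =: \bar{\sigma}^{2} \qquad (n \approx n-1) \\
C_{ff} &\approx C'_{ff} - \bar{\sigma}^{2}
\end{align*}
```

If the server knows only γ, one possible estimate (an assumption, since σu is uniform over [0, γ]) is \bar{\sigma}^2 ≈ E[σu^2] = γ^2 / 3.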
Projection
• Similarly, the expected values of the noise terms are 0, so an approximation of the projected matrix is obtained.
[Equations not preserved in the transcript.]
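The projection step can be written out in the same way. This is again a hedged reconstruction under the assumptions of zero-mean, independent noise; a'_u, a_u and r_u denote the masked gauge row, true gauge row and noise row of user u, and e_j is the j-th retained eigenvector (notation introduced here).

```latex
\begin{align*}
x'_u &= a'_u E_2^{T} = (a_u + r_u)\,E_2^{T} = a_u E_2^{T} + r_u E_2^{T} \\
E\big[r_u E_2^{T}\big] &= 0
  \quad\Rightarrow\quad x_u \approx x'_u \text{ in expectation} \\
\operatorname{Var}\Big(\sum_{f} r_{uf}\,e_{jf}\Big)
  &= \sigma_u^{2}\sum_{f} e_{jf}^{2} = \sigma_u^{2}
  \qquad (\lVert e_j \rVert = 1)
\end{align*}
```

The last line shows that the approximation holds on average: each projected coordinate of an individual user still carries noise with variance σu^2.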
Remaining Parts
• Clusters are determined from the estimated data.
• Z-score means of non-gauge items are stored in the lookup table.
• When the active user enters disguised gauge ratings, the effect of randomization is removed in the same way.
• The representative cluster is determined, the corresponding value from the table is denormalized, and the prediction is obtained.
Experiments
• Data set
• Jester, a web-based joke data set
• 17,988 users, 100 jokes
• Continuous ratings in the range (-10, +10)
• 50% of all ratings are present
• Evaluation metrics [formulas not preserved in the transcript], where p: predicted value, r: original value, d: size of the test set, rmax: maximum rating, rmin: minimum rating
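The metric formulas are missing from the transcript, but the listed symbols (p, r, d, rmax, rmin) match the usual definitions of mean absolute error (MAE) and normalized MAE. Treating that as an assumption, they can be computed as below; the function names and sample values are illustrative only.

```python
def mae(predicted, original):
    """Mean absolute error over a test set of size d."""
    d = len(predicted)
    return sum(abs(p - r) for p, r in zip(predicted, original)) / d

def nmae(predicted, original, r_max=10.0, r_min=-10.0):
    """MAE divided by the rating range; defaults follow Jester's (-10, +10) scale."""
    return mae(predicted, original) / (r_max - r_min)

# Made-up predictions and true ratings for three test items
p = [2.0, -4.0, 7.5]
r = [3.0, -6.0, 7.0]
mae_value = mae(p, r)     # (1 + 2 + 0.5) / 3
nmae_value = nmae(p, r)   # the same error as a fraction of the 20-point range
```

Normalizing by the rating range makes errors comparable across data sets with different rating scales.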
Eigentaste vs. Modified
• 9000 training users, 5000 test users (10 test items)
Protecting Active Users' Privacy
M1: No disguise, but requires additional cost
M2: Considering only the gauge mean and standard deviation
M3: Considering the overall mean and standard deviation
Accuracy vs. Varying Numbers of Users
Fixed: 5000 users and 10 random test items
• As the number of users increases, accuracy improves, since the aggregate effect of the random numbers converges to zero.
• For n ≥ 2000, the results are satisfactory.
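The claim that the noise averages out as n grows can be checked with a small simulation. The sketch below is a toy setup, not the authors' experiment: two synthetic z-score columns with correlation about 0.6, a single fixed noise standard deviation for all users (instead of a per-user σu), and 20 repetitions per n. It measures how far an off-diagonal correlation entry computed from masked data is from the unmasked one.

```python
import random

def offdiag_error(n, sigma, rng):
    """|C'_fg - C_fg| for two synthetic, correlated z-score columns of length n."""
    a_f = [rng.gauss(0, 1) for _ in range(n)]
    a_g = [0.6 * x + 0.8 * rng.gauss(0, 1) for x in a_f]   # correlation about 0.6
    m_f = [x + rng.gauss(0, sigma) for x in a_f]           # masked column f
    m_g = [x + rng.gauss(0, sigma) for x in a_g]           # masked column g
    c = sum(x * y for x, y in zip(a_f, a_g)) / (n - 1)
    c_masked = sum(x * y for x, y in zip(m_f, m_g)) / (n - 1)
    return abs(c_masked - c)

rng = random.Random(7)
trials = 20
errors = {}
for n in (100, 2000, 5000):
    errors[n] = sum(offdiag_error(n, 1.0, rng) for _ in range(trials)) / trials
    print(n, round(errors[n], 4))
```

In this setup the average error shrinks roughly in proportion to 1/√n, which is consistent with accuracy improving as more users contribute data.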
Accuracy with Varying δ Values
• Accuracy improves slightly with decreasing δ values.
Conclusion
• We showed how to achieve privacy-preserving CF using Eigentaste-based algorithms.
• We will study:
• whether other clustering algorithms can be employed
• how to improve recommendation quality by using correlation-based CF algorithms
Thanks for your interest! Questions?