Collaborative filtering with privacy
Wim Verhaegh, Aukje van Duijnhoven, Jan Korst, Pim Tuyls
IPA herfstdagen, 23 November 2004

Privacy issue
• Personalization is key in Ambient Intelligence
  • requires user profiles
• Privacy risks of services
  • untrusted server
  • server gets hacked
  • server goes bankrupt
• Perform personalization on encrypted data
  • collaborative filtering

Overview
• Collaborative filtering system
• Privacy requirements
• CF method
  • calculation scheme (formulas & example)
• Encryption basics
• Encrypted CF method
• Item-based CF
• Conclusion

Collaborative filtering system
• System to recommend new content
  • recommend content that ‘similar users’ like
[Figure: system diagram split into server side and user side, with labels: database with ratings, calculate similarities, similarity values, predict missing ratings, recommend, user, music player]

How to perform collaborative filtering?
Security requirements
• Nobody may know users’ ratings
  • not even anonymously
• Nobody may know who rated what
  • not even anonymously
• Nobody may know who resembles whom

Collaborative filtering methods
[Figure: sparse ratings matrix of users and items; x marks a given rating]
• Memory based
  • computes similarities and interpolates
  • user based
  • item based
• Model based
  • first uses rating database to build a model (e.g. extract basic rating profiles)
  • uses model for prediction
• Most approaches can be encrypted

Memory-based CF with user similarities
• Two steps
  • determine similarities between users
  • predict missing ratings
• Step 1: Pearson correlation
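
A minimal sketch of step 1, assuming each user's ratings are stored as a dict from item to rating, that the sums run over the items both users rated, and that each user's average is taken over all of that user's ratings. The function name `pearson` and the sample ratings are illustrative and do not come from the talk.

```python
from math import sqrt

def pearson(ratings_a, ratings_b):
    """Pearson correlation between two users over their co-rated items."""
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return 0.0
    # Assumption: each average is taken over all ratings of that user.
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    den_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if den_a == 0 or den_b == 0:
        return 0.0  # a user with constant ratings carries no correlation signal
    return num / sqrt(den_a * den_b)

# Hypothetical ratings, not the data shown in the talk.
alice = {"T1": 5, "T2": 3, "C1": 1}
bob = {"T1": 4, "T2": 3, "C1": 2}
carol = {"T1": 1, "T2": 3, "C1": 5}
print(pearson(alice, bob), pearson(alice, carol))
```

Users with the same taste score close to 1, users with opposite taste close to -1.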

Step 2: prediction
• E.g. weighted deviations from the average
  • similarities are weights
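
One common way to read "weighted deviations from the average" is sketched below: the active user's own average plus the similarity-weighted deviations of the other users, normalized by the sum of absolute similarities. The exact formula on the slide is not preserved, so this form, the function name `predict`, and the sample numbers are assumptions.

```python
def predict(active_mean, neighbours):
    """Predict a missing rating for the active user.

    neighbours: list of (similarity, rating_for_item, neighbour_mean),
    one entry per other user who rated the item.
    """
    num = sum(s * (r - m) for s, r, m in neighbours)
    den = sum(abs(s) for s, _, _ in neighbours)
    if den == 0:
        return active_mean  # no usable neighbours: fall back to the average
    return active_mean + num / den

# Hypothetical values: one similar user who liked the item,
# one dissimilar user who disliked it.
print(predict(3.0, [(1.0, 5, 3.0), (-0.5, 2, 3.0)]))
```

A negative similarity flips the sign of a deviation, so a dislike by a dissimilar user also pushes the prediction up.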

Example
• Tea and coffee flavors
• 4 users
• 9 items (flavors)

Example
• Subtract averages
• Compute similarities

Example
• Use similarities to predict missing ratings
• Prediction for Aukje, tea T3

Public key encryption scheme: Paillier
• Generate keys
  • choose large random primes p, q (private)
  • calculate n = pq and a ‘generator’ g (public)
• Encrypt message m by E(m) = g^m · r^n mod n^2, with r random
• Homomorphism properties
  • E(m1) · E(m2) = E(m1 + m2)
  • E(m)^k = E(k · m)
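
The scheme can be tried out with a toy implementation. This is a sketch under stated assumptions: the primes are tiny and fixed (real use needs large random primes), the generator is chosen as g = n + 1 (a common valid choice that simplifies decryption), and the names `keygen`, `encrypt`, and `decrypt` are illustrative.

```python
import math
import random

def keygen(p, q):
    """Build a toy key pair from two primes p and q."""
    n = p * q
    g = n + 1                                         # assumed generator choice
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1), private
    mu = pow(lam, -1, n)                              # valid because g = n + 1
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """E(m) = g^m * r^n mod n^2 with a fresh random r coprime to n."""
    n, g = pub
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

pub, priv = keygen(1009, 1013)   # toy primes only
n2 = pub[0] ** 2
c1, c2 = encrypt(pub, 20), encrypt(pub, 22)
print(decrypt(pub, priv, (c1 * c2) % n2))   # product of ciphertexts -> sum
print(decrypt(pub, priv, pow(c1, 3, n2)))   # ciphertext to a power -> multiple
```

Because r is random, encrypting the same message twice gives different ciphertexts, which is what keeps equal ratings from being recognizable.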

Encrypted inner product
• User a and user b each hold a vector
• User a encrypts the entries of its vector and sends them to b
• User b computes the encrypted inner product and sends it back to a
• User a decrypts it to get the inner product
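
The steps above can be sketched with a toy Paillier key held by user a. Assumptions: tiny fixed primes, generator g = n + 1, and small non-negative integer vectors; all names are illustrative. User b only ever sees ciphertexts and uses the two homomorphism properties to combine them.

```python
import math
import random

# Toy Paillier key of user a (primes far too small for real use).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def encrypt(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

def user_b_inner_product(enc_x, y):
    """User b: prod_i E(x_i)^(y_i) is an encryption of sum_i x_i * y_i."""
    acc = encrypt(0)  # start from a fresh encryption of zero
    for c, y_i in zip(enc_x, y):
        acc = (acc * pow(c, y_i, N2)) % N2
    return acc

x = [3, 0, 5, 2]                      # user a's private vector
y = [4, 1, 0, 7]                      # user b's private vector
enc_x = [encrypt(v) for v in x]       # a -> b
enc_dot = user_b_inner_product(enc_x, y)  # b -> a
print(decrypt(enc_dot))               # a learns only the inner product
```

User a learns the inner product but not b's vector, and user b learns nothing about a's vector.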

Encrypted CF: correlation step
• Rewrite correlation as three inner products
• Zeros avoid contributions from unrated items in the sums
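
One way to write this decomposition is given below. The slide's own notation is not preserved, so the symbols are assumptions: $r_{ui}$ is the rating of user $u$ for item $i$, $\bar r_u$ is the average rating of user $u$, $q_{ui}$ is the deviation padded with zeros, and $\delta_{ui}$ indicates whether user $u$ rated item $i$.

```latex
q_{ui} =
\begin{cases}
  r_{ui} - \bar r_u & \text{if user } u \text{ rated item } i,\\
  0 & \text{otherwise,}
\end{cases}
\qquad
\delta_{ui} =
\begin{cases}
  1 & \text{if user } u \text{ rated item } i,\\
  0 & \text{otherwise,}
\end{cases}

s(a,b) =
\frac{\sum_i q_{ai}\, q_{bi}}
     {\sqrt{\left(\sum_i q_{ai}^2\, \delta_{bi}\right)
            \left(\sum_i \delta_{ai}\, q_{bi}^2\right)}}
```

Each of the three sums runs over all items and pairs a vector known only to user $a$ with a vector known only to user $b$, so each can be computed with the encrypted inner product. The zero padding makes every term vanish unless both users rated the item.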

Encrypted CF: correlation step
• Protocol
• Active user knows correlation values, but not to whom
• Server knows between whom, but not the correlation values
[Figure: protocol diagram with active user, server, and other users; label: copy]

Encrypted CF: prediction step
• Rewrite the prediction formula
• Protocol
  • each user b adds a random factor
[Figure: protocol diagram with active user, server, and other users; label: split]
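
A possible reading of this protocol is sketched below; the details on the slide are not preserved, so several points are assumptions. The prediction is taken as the active user's average plus (sum over b of s(a,b) · q_b) / (sum over b of |s(a,b)| · delta_b), where q_b is user b's deviation for the item and delta_b is 1 if b rated it. The "random factor" is interpreted as multiplying by a fresh encryption of zero (re-randomization). Ratings are scaled to integers with a hypothetical `SCALE`, and negative numbers are encoded modulo n.

```python
import math
import random

# Toy Paillier key of the active user (primes far too small for real use).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)
SCALE = 100  # fixed-point scaling, since Paillier works on integers

def encrypt(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(N + 1, m % N, N2) * pow(r, N, N2)) % N2

def decrypt_signed(c):
    m = ((pow(c, LAM, N2) - 1) // N * MU) % N
    return m if m <= N // 2 else m - N  # large residues stand for negatives

def user_b_terms(enc_s, enc_abs_s, deviation, rated):
    """User b: apply own values to the encrypted weights, then re-randomize."""
    q = round(deviation * SCALE) if rated else 0
    delta = 1 if rated else 0
    num = (pow(enc_s, q % N, N2) * encrypt(0)) % N2
    den = (pow(enc_abs_s, delta, N2) * encrypt(0)) % N2
    return num, den

def predict(mean_a, sims, others):
    # Active user: encrypt one (s, |s|) pair per other user.
    enc = [(encrypt(round(s * SCALE)), encrypt(round(abs(s) * SCALE))) for s in sims]
    # Server: give each user b its own pair, multiply the replies together.
    num_c, den_c = 1, 1
    for (enc_s, enc_abs_s), (deviation, rated) in zip(enc, others):
        num_b, den_b = user_b_terms(enc_s, enc_abs_s, deviation, rated)
        num_c = (num_c * num_b) % N2
        den_c = (den_c * den_b) % N2
    # Active user: decrypt the two sums and divide.
    num, den = decrypt_signed(num_c), decrypt_signed(den_c)
    return mean_a if den == 0 else mean_a + num / (SCALE * den)

pred = predict(3.0, [1.0, -0.5], [(2.0, True), (-1.0, True)])
print(round(pred, 3))
```

The server only multiplies ciphertexts, and the active user only sees the two aggregated sums, not the individual contributions.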

Memory-based CF with item similarities
[Figure: sparse ratings matrix of users and items; x marks a given rating]
• Similarities computed between items
  • compare rows in the matrix
  • similar formulas

Memory-based CF with item similarities
• Similarities
• Predictions
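
Since the item-based variant compares rows instead of columns, one way to sketch it is to regroup the ratings by item and reuse a Pearson-style correlation. The slide's exact formulas are not preserved; using item averages here is an assumption, and the names and data are illustrative.

```python
from math import sqrt

def by_item(ratings_by_user):
    """Regroup {user: {item: rating}} into {item: {user: rating}}."""
    items = {}
    for user, ratings in ratings_by_user.items():
        for item, rating in ratings.items():
            items.setdefault(item, {})[user] = rating
    return items

def item_similarity(row_i, row_j):
    """Pearson correlation between two items over users who rated both."""
    common = row_i.keys() & row_j.keys()
    if not common:
        return 0.0
    # Assumption: deviations are taken from each item's own average.
    mean_i = sum(row_i.values()) / len(row_i)
    mean_j = sum(row_j.values()) / len(row_j)
    num = sum((row_i[u] - mean_i) * (row_j[u] - mean_j) for u in common)
    den_i = sum((row_i[u] - mean_i) ** 2 for u in common)
    den_j = sum((row_j[u] - mean_j) ** 2 for u in common)
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / sqrt(den_i * den_j)

# Hypothetical ratings, not the data shown in the talk.
ratings = {
    "u1": {"A": 5, "B": 4},
    "u2": {"A": 3, "B": 3},
    "u3": {"A": 1, "B": 2},
}
rows = by_item(ratings)
print(item_similarity(rows["A"], rows["B"]))
```

The privacy picture changes with this regrouping: a row of the matrix holds ratings from many users, so no single user can compute an item similarity alone.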

Threshold Paillier
• Calculation of sums: use threshold encryption
  • key is shared among k users
  • decryption needs > t shares
[Figure: protocol diagram with server and users; label: > t]
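
The threshold idea (k shares, more than t needed) can be illustrated with Shamir secret sharing. This is a swapped-in building block shown only for intuition, not threshold Paillier itself: in a threshold cryptosystem the users produce partial decryptions and the key is never reassembled. The field prime and function names are assumptions.

```python
import random

PRIME = 2**61 - 1  # a Mersenne prime used as the field size

def make_shares(secret, t, k):
    """Split secret into k shares so that any t + 1 of them reconstruct it."""
    # Random polynomial of degree t with the secret as constant term.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(t)]
    shares = []
    for x in range(1, k + 1):
        y = 0
        for c in reversed(coeffs):      # Horner evaluation at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = (num * -xm) % PRIME
                den = (den * (xj - xm)) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = make_shares(123456, t=2, k=5)
print(reconstruct(shares[:3]))   # any 3 = t + 1 shares suffice
```

With t or fewer shares the interpolation yields an unrelated value, which is why a small group of colluding users cannot decrypt on their own.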

Encrypted item-correlation step
• Rewrite correlation
• Protocol
[Figure: protocol diagram with server and users; label: > t]

Encrypted item-based prediction
• Rewrite prediction formula
  • item average: two sums
  • prediction: four inner products (server & user a)
  • protocols as before
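
The "two sums" for an item average can be sketched as an encrypted sum of ratings and an encrypted count of raters, both aggregated by the server. Assumptions: a single toy Paillier key stands in for the threshold key, so `decrypt` here replaces the joint decryption by more than t users; all names and ratings are illustrative.

```python
import math
import random

# Toy Paillier key standing in for the shared threshold key.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def encrypt(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

def encrypted_item_sums(ratings):
    """ratings: one entry per user for a fixed item, None if not rated."""
    sum_c, count_c = encrypt(0), encrypt(0)       # server-side accumulators
    for r in ratings:
        # User side: encrypt the rating (0 if unrated) and a rated flag.
        enc_r = encrypt(r if r is not None else 0)
        enc_flag = encrypt(1 if r is not None else 0)
        # Server side: multiplying ciphertexts adds the plaintexts.
        sum_c = (sum_c * enc_r) % N2
        count_c = (count_c * enc_flag) % N2
    return sum_c, count_c

sum_c, count_c = encrypted_item_sums([5, None, 3, 4])
total, count = decrypt(sum_c), decrypt(count_c)
print(total / count)   # item average from the two decrypted sums
```

The server never sees an individual rating or who rated the item; only the two totals are decrypted.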

Conclusion
• Collaborative filtering can be encrypted
  • various correlation and prediction formulas
  • various CF approaches
• More computations to be done at users’ sites
  • encryption and decryption
  • users have to be online
• Future work
  • protection against more complicated attacks
  • peer-to-peer solution