
“You might also like: …”

“You might also like: …”. Privacy risks of collaborative filtering. Yuval Madar, June 2012. Based on a paper by J.A. Calandrino, A. Kilzer, A. Narayanan, E. W. Felten & V. Shmatikov. The setting: Recommendation systems.


Presentation Transcript


  1. “You might also like: …” Privacy risks of collaborative filtering Yuval Madar, June 2012 Based on a paper by J.A. Calandrino, A. Kilzer, A. Narayanan, E. W. Felten & V. Shmatikov

  2. The setting: Recommendation systems • Suggest items, and other users, that match a user’s tastes by inferring them from previous purchases. • Numerous examples • Amazon, iTunes, CNN, Last.fm, Pandora, Netflix, YouTube, Hunch, Hulu, LibraryThing, IMDb and many others.

  3. Recommendation types • User to item – “You might also like…” • Item to item – Similar items list • User to user – Another customer with common interests

  4. Recommendation methods • Content-based filtering • Based on a priori similarity between items; recommendations are derived from a user’s own history. • Doesn’t pose a privacy threat. • Collaborative filtering • Based on correlations between other users’ purchases. • Our attacks target this type of system. • Hybrid • A system employing both filtering techniques
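
To make the collaborative case concrete, here is a minimal sketch of an item-to-item related-items list, assuming similarity is nothing more than a co-purchase count. The function name `related_items` and the toy purchase data are illustrative assumptions, not taken from the paper or from any real site.

```python
from collections import Counter


def related_items(purchases, item, top_n=3):
    """Rank other items by how many users bought them together with `item`.

    `purchases` maps a user name to the set of items that user bought.
    Plain co-purchase counting is an assumption made for illustration;
    real systems use more elaborate similarity scores.
    """
    co_counts = Counter()
    for basket in purchases.values():
        if item in basket:
            for other in basket:
                if other != item:
                    co_counts[other] += 1
    # Sort by count (descending), then by name, so the output is deterministic.
    ranked = sorted(co_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical toy data.
purchases = {
    "alice": {"book_a", "book_b", "album_x"},
    "bob": {"book_a", "book_b"},
    "carol": {"book_a", "album_x", "film_y"},
}
print(related_items(purchases, "book_a"))
```

Because the list for `book_a` is computed from other users' baskets, a single new purchase can change it, which is the property the attacks rely on.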

  5. Attack Model • The data the recommendation system uses is modeled as a matrix, where the rows correspond to users and the columns correspond to items. • Some auxiliary information on the target user is available (a subset of the target user’s transaction history). • An attack is successful if it allows the attacker to learn transactions that are not part of the auxiliary information.
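
The model above can be sketched in a few lines, assuming a binary matrix in which 1 marks a transaction. The helper names `build_matrix` and `attack_succeeded` and the toy data are hypothetical and only illustrate the definitions on the slide.

```python
def build_matrix(transactions, users, items):
    """Rows are users, columns are items; 1 marks a transaction (assumed binary)."""
    return [[1 if (u, i) in transactions else 0 for i in items] for u in users]


def attack_succeeded(inferred, auxiliary, true_history):
    """True if the attacker learned at least one real transaction
    that was not already part of the auxiliary information."""
    return any(
        item in true_history and item not in auxiliary for item in inferred
    )


# Hypothetical toy data.
users = ["target", "other"]
items = ["a1", "a2", "secret"]
transactions = {("target", "a1"), ("target", "a2"), ("target", "secret"), ("other", "a1")}

matrix = build_matrix(transactions, users, items)
target_history = {i for (u, i) in transactions if u == "target"}
auxiliary = {"a1", "a2"}

print(matrix)
print(attack_succeeded({"secret"}, auxiliary, target_history))
print(attack_succeeded({"a1"}, auxiliary, target_history))
```

Inferring `a1` does not count as a success here, because the attacker already knew it.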

  6. Sources for auxiliary information • Users’ public ratings and comments on products • Shared transactions (via Facebook or other media) • Discussions on third-party sites • Favorite books in a Facebook profile • Non-online interactions (with friends, neighbors, coworkers, etc.) • Other sources…

  7. Attack 1 - Related-items list inference • Input: • a set of target items T and a set of auxiliary items A • Observe the related-items lists of the items in A until an item in T appears or moves up. • If a target item appears in enough related-items lists at the same time, the attacker may infer that it was bought by the target user. • Note 1 – Scoring may be far more complex, since different items in A are correlated (books that belong to a single series, bundle discounts, etc.). • Note 2 – It is preferable that A consist of obscure and uncommon items, which increases the effect of the target user’s choices on their related-items lists.
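
The observation step can be sketched as a comparison of two snapshots of the related-items lists. This is a simplified illustration, assuming each auxiliary item contributes one point when a target item newly appears or moves up, and that a fixed `threshold` decides what counts as "enough" lists. The function name, the scoring rule and the data are assumptions; as Note 1 says, real scoring may be far more complex.

```python
def infer_from_related_lists(before, after, targets, threshold):
    """Return target items that appeared or moved up in at least
    `threshold` related-items lists between two snapshots.

    `before` and `after` map each auxiliary item to its ordered
    related-items list (index 0 is the top of the list).
    """
    scores = {t: 0 for t in targets}
    for aux_item, old_list in before.items():
        new_list = after.get(aux_item, [])
        for t in targets:
            if t not in new_list:
                continue
            new_rank = new_list.index(t)
            old_rank = old_list.index(t) if t in old_list else None
            # Count a newly appearing item, or one that moved up the list.
            if old_rank is None or new_rank < old_rank:
                scores[t] += 1
    return {t for t, s in scores.items() if s >= threshold}


# Hypothetical snapshots for three auxiliary items a1..a3.
before = {"a1": ["x", "y"], "a2": ["y", "z"], "a3": ["t", "x"]}
after = {"a1": ["x", "t"], "a2": ["t", "y"], "a3": ["t", "x"]}

print(infer_from_related_lists(before, after, {"t", "z"}, threshold=2))
```

Here `t` newly appears in the lists of `a1` and `a2`, so it meets the threshold of 2; its unchanged position in the list of `a3` does not count.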

  8. Attack 2 - Covariance Matrix inference • On some sites, the covariance matrix describing the correlation between items is exposed to users (Hunch is one such website). • As in Attack 1, the attacker watches for increases in the correlation between the auxiliary items and the target items. • Note 1 – Different matrix cells are updated asynchronously. • Note 2 – Inference probability improves if the auxiliary items are user-unique (no other user bought all of the auxiliary items). This is more likely if some of them are unpopular, or if there are enough of them.
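
A small worked example shows why one new transaction is visible in the covariance matrix. The sketch assumes a binary user-item matrix and the population covariance (dividing by the number of users); how a real site computes its matrix is not stated on the slide, and the helper names are hypothetical.

```python
def covariance(matrix, i, j):
    """Population covariance between item columns i and j (assumed definition)."""
    n = len(matrix)
    mean_i = sum(row[i] for row in matrix) / n
    mean_j = sum(row[j] for row in matrix) / n
    return sum((row[i] - mean_i) * (row[j] - mean_j) for row in matrix) / n


def increased_pairs(before, after, aux_cols, target_col):
    """Auxiliary columns whose covariance with the target column went up."""
    return [
        a for a in aux_cols
        if covariance(after, a, target_col) > covariance(before, a, target_col)
    ]


# Columns 0 and 1 are auxiliary items, column 2 is the target item.
# Row 0 is the target user, who owns both auxiliary items.
before = [
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 0, 0],
]
# Same matrix after the target user acquires the target item.
after = [row[:] for row in before]
after[0][2] = 1

print(increased_pairs(before, after, aux_cols=[0, 1], target_col=2))
print(covariance(before, 1, 2), covariance(after, 1, 2))
```

In this toy matrix the covariance between item 1 and the target item rises from -0.0625 to 0.125 after the single new transaction. Item 1 is owned only by the target user, which matches Note 2: user-unique auxiliary items give the clearest signal.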

  9. Attack 3 - kNN recommender systems inference • System model • For each user, the system finds the k most similar users and ranks the items they purchased by total number of sales. • Active Attack • Create k dummy users, each buying all known auxiliary items. • With high probability, the k dummy users and the target user will be clustered together (given an auxiliary-items list whose size is logarithmic in the total number of users; in practice, 8 items were found to be enough for most sites). • In that case, the recommendations to the dummy users will consist of transactions of the target user previously unknown to the attacker. • Note – The attack is more feasible in a system where user interactions with items do not involve spending money.
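
The active attack can be sketched against a toy kNN recommender. Several details are assumptions made for illustration: similarity is the Jaccard index, items are ranked by how many of the k neighbors bought them, and the function names and data are hypothetical. The slide does not fix these choices.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two item sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_recommend(histories, user, k):
    """Recommend items bought by the k users most similar to `user`,
    ranked by how many of those neighbors bought each item."""
    mine = histories[user]
    others = [u for u in histories if u != user]
    # Most similar first; ties broken by name for deterministic output.
    neighbors = sorted(others, key=lambda u: (-jaccard(mine, histories[u]), u))[:k]
    counts = Counter(
        item for u in neighbors for item in histories[u] if item not in mine
    )
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked]


# Hypothetical system state; the attacker knows only `aux` about the target.
histories = {
    "target": {"a1", "a2", "a3", "secret_1", "secret_2"},
    "u1": {"a1", "p1", "p2"},
    "u2": {"p1", "p3"},
}
aux = {"a1", "a2", "a3"}
k = 2

# The attacker creates k dummy users, each holding exactly the auxiliary items.
for n in range(k):
    histories[f"dummy_{n}"] = set(aux)

print(knn_recommend(histories, "dummy_0", k))
```

The neighbors of `dummy_0` are the other dummy and the target user, so everything recommended to it comes from the target's history outside the auxiliary set.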

  10. Attack Metrics • The main parameters for evaluation of an inference attack are: • Yield – How many inferences are produced. • Accuracy – How likely each inference is to be correct. • Yield-accuracy tradeoff – Stricter algorithms reject less probable inferences, raising accuracy at the cost of yield.
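
The tradeoff can be illustrated with a small sketch, assuming each inference carries a confidence score and a threshold decides which inferences are kept. The function name `yield_and_accuracy`, the scores and the ground truth are made up for illustration.

```python
def yield_and_accuracy(scored_inferences, truth, threshold):
    """Return (yield, accuracy) for the inferences scoring at least `threshold`.

    Yield is the number of inferences kept; accuracy is the fraction of
    kept inferences that are real transactions (None if nothing is kept).
    """
    kept = [item for item, score in scored_inferences if score >= threshold]
    correct = sum(1 for item in kept if item in truth)
    accuracy = correct / len(kept) if kept else None
    return len(kept), accuracy


# Hypothetical scored inferences and ground truth.
scored = [("a", 0.9), ("b", 0.8), ("c", 0.4), ("d", 0.3)]
truth = {"a", "b", "c"}

print(yield_and_accuracy(scored, truth, threshold=0.3))
print(yield_and_accuracy(scored, truth, threshold=0.8))
```

With the loose threshold all four inferences are kept and three are correct; with the strict threshold only two are kept, and both are correct.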

  11. Actual attacks • The paper further discusses specific attacks performed against: • Hunch • LibraryThing • Last.fm • Amazon • It measures the accuracy and yield of these attacks, arriving in some instances at impressive tradeoff figures (such as 70% accuracy for 100% yield in Hunch).

  12. Differential privacy • Not discussed in the paper. • Achieved in other papers for static recommendation databases. • Remains an open problem for dynamic systems (which all real-world examples are).

  13. Decreasing the privacy risk • Limited-length related items list – The first elements of such lists have low sensitivity to single purchases. • Factoring item popularity into update frequency – less popular items are more sensitive to single purchases. Batching their purchases together will decrease the information leak.
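
One way to read the batching idea is sketched below, assuming the system publishes an item's purchase count only after a minimum number of new purchases has accumulated. The class `BatchedCounts` and its parameter `min_batch` are hypothetical and do not describe how any of the sites named here work.

```python
class BatchedCounts:
    """Publish purchase counts only in batches of at least `min_batch`.

    Popular items fill a batch quickly and so update often; unpopular
    items update rarely, which hides the timing of a single purchase.
    """

    def __init__(self, min_batch):
        self.min_batch = min_batch
        self.published = {}  # counts visible to the recommender output
        self.pending = {}    # purchases not yet reflected publicly

    def record(self, item):
        self.pending[item] = self.pending.get(item, 0) + 1
        if self.pending[item] >= self.min_batch:
            # Release the whole batch at once.
            self.published[item] = self.published.get(item, 0) + self.pending.pop(item)

    def visible_count(self, item):
        return self.published.get(item, 0)


counts = BatchedCounts(min_batch=3)
counts.record("rare_book")
counts.record("rare_book")
print(counts.visible_count("rare_book"))  # still hidden
counts.record("rare_book")
print(counts.visible_count("rare_book"))  # batch released
```

An observer sees the count for `rare_book` change only once, after the third purchase, and cannot tell which of the three purchases happened when.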

  14. Decreasing the privacy risk • Limit data access rate – Prevents large-scale privacy attacks, though it lowers utility and may be circumvented using a botnet. • User opt-out – A privacy-conscious user may decide to opt out of recommender systems entirely (at a clear cost in utility).

  15. Conclusion • A passive attack on recommender systems uses auxiliary information on a certain user’s purchases to let the attacker infer undisclosed private transactions. • Increased user awareness is required. • Several methods were suggested to decrease the information leaked by these systems.

  16. Questions?
