563.10: Bloom Cookies Web Search Personalization without User Tracking

Presentation Transcript


  1. 563.10: Bloom Cookies: Web Search Personalization without User Tracking. Presented by Ben Ujcich, CS563/ECE524 Advanced Computer Security, University of Illinois

  2. Background • There is a trade-off between privacy and personalization, depending on what we reveal to search engines when we search • If I often search for UIUC-related websites, would I want Google to show UIUC pages when I simply type “university”? • What do I lose when I make my searches more private (e.g., by browsing through Tor)?

  3. A Compromise • Profile obfuscation masks the exact profile of a user’s previous searches and URLs visited • Provides some degree of privacy while allowing personalization (How can this be quantified?) • Implemented client-side or through a personalization proxy • Downsides? • Costly in bandwidth • Need for a trusted third party

  4. Profile Obfuscation Techniques • Generalization: make specifics coarser • Noise addition: add fake information
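The two techniques can be illustrated with a minimal Python sketch. The URL-to-interest mapping `INTEREST_OF` and the helpers `generalize` and `add_noise` are hypothetical names invented for this example; a real system would presumably use a topic classifier and a much larger noise dictionary.

```python
import random
from urllib.parse import urlparse

# Hypothetical URL-to-interest mapping (an assumption for this sketch);
# a real system would rely on a topic classifier or web directory.
INTEREST_OF = {
    "nytimes.com": "News",
    "wsj.com": "News",
    "espn.com": "Sports",
    "illinois.edu": "Education",
}


def generalize(urls):
    """Generalization: replace each exact URL with its coarser interest."""
    interests = set()
    for url in urls:
        host = urlparse(url).netloc
        if host.startswith("www."):
            host = host[4:]
        interests.add(INTEREST_OF.get(host, "Other"))
    return interests


def add_noise(profile, dictionary, num_fake, rng):
    """Noise addition: mix fake entries drawn from a dictionary into the profile."""
    candidates = sorted(set(dictionary) - set(profile))
    fake = rng.sample(candidates, min(num_fake, len(candidates)))
    return set(profile) | set(fake)


visited = [
    "https://www.nytimes.com/section/world",
    "https://www.wsj.com/markets",
    "https://www.espn.com/nba",
]
print(sorted(generalize(visited)))  # ['News', 'Sports']
noisy = add_noise({"nytimes.com"}, list(INTEREST_OF), 2, random.Random(0))
print(len(noisy))  # 3: the real URL plus two fake ones
```

Generalization loses detail (three URLs collapse into two interests), while noise addition keeps the real entries but hides them among fake ones, which is why noise needs a dictionary and extra bandwidth.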

  5. Research Challenges • “What obfuscation technique is more suitable for privacy-preserving personalization of web search?” • “How big a dictionary and how much noise are required to achieve reasonable unlinkability?” • “Is it possible to receive the advantages of noisy profiles without incurring the aforementioned costs (i.e., noise dictionary and large communication overhead)?”

  6. Citation • Bloom Cookies: Web Search Personalization without User Tracking • Nitesh Mor (UC Berkeley), Oriana Riva (Microsoft), Suman Nath (Microsoft), John Kubiatowicz (UC Berkeley) • NDSS ’15

  7. Overview • Providing personalization while preserving privacy in web searches can be done through profile obfuscation, but it is often costly or impractical. • The authors quantify and evaluate whether generalization or noise addition is better for the privacy-personalization trade-off. • The authors propose the Bloom cookie, based on the properties of a Bloom filter’s false positives, as a cost-efficient mechanism for adding noise and preserving configurable amounts of privacy.

  8. Threat Model • Server is not trusted by the client (user) • Techniques for hiding IP addresses are not assumed (the goal is “unlinkability” of a user’s activity across IP addresses) • IP addresses change frequently • The browser prevents online services from tracking the user (though the browser itself keeps track of previous activity) • Large population size • No collusion with other services

  9. Evaluation Techniques • Personalization (measured by average rank) • URL-based: URLs the user visits most often • Interest-based: preferred interests based on prior searches • Privacy (measured by unlinkability) • RAND: add random noise from a dictionary containing URLs and their associated interests • HYBRID: add random noise only from dictionary entries that correspond to interests the user has already looked at in the past
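The difference between RAND and HYBRID lies only in which dictionary entries are eligible as noise. Below is a minimal sketch, assuming a tiny hypothetical dictionary `NOISE_DICTIONARY`; the paper's actual dictionary and sampling details may differ.

```python
import random

# Hypothetical noise dictionary: URL -> associated interest (assumed data).
NOISE_DICTIONARY = {
    "cnn.com": "News",
    "bbc.co.uk": "News",
    "espn.com": "Sports",
    "nba.com": "Sports",
    "allrecipes.com": "Cooking",
    "coursera.org": "Education",
}


def rand_noise(profile, num_fake, rng):
    """RAND: draw fake URLs uniformly from the whole dictionary."""
    pool = sorted(url for url in NOISE_DICTIONARY if url not in profile)
    return set(rng.sample(pool, min(num_fake, len(pool))))


def hybrid_noise(profile, user_interests, num_fake, rng):
    """HYBRID: draw fake URLs only from entries whose interest the user
    has already looked at in the past."""
    pool = sorted(
        url
        for url, interest in NOISE_DICTIONARY.items()
        if interest in user_interests and url not in profile
    )
    return set(rng.sample(pool, min(num_fake, len(pool))))


rng = random.Random(42)
profile = {"nytimes.com", "wsj.com"}  # the user's real URLs (all "News")
print(sorted(hybrid_noise(profile, {"News"}, 2, rng)))  # ['bbc.co.uk', 'cnn.com']
print(len(rand_noise(profile, 3, rng)))  # 3 URLs drawn from any interest
```

HYBRID noise blends in with the user's real interests, whereas RAND noise may come from topics the user never showed, which is one plausible reason the two behave differently in the results.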

  10. Results: Generalization • Higher unlinkability (44.1% linkable users) than using exact URLs (98.7% linkable users) • Is this reasonable?

  12. Results: Noise Addition • Better unlinkability (20% linkable) than generalization (44%), but large cost to send noise • HYBRID makes personalization worse than with equivalent parameters in RAND

  13. Review: Bloom Filters • Space- and time-efficient probabilistic data structure for set membership • May have false positives, but no false negatives • Stored as a bit array with parameters: m = size of the array; k = number of hash functions used for inserting/querying elements; n = number of inserted elements • [Figure: an uninitialized Bloom filter with m = 12]
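For reference, the standard textbook estimate of the false positive rate (not given on the slides) follows from these parameters, assuming the k hash functions are independent and uniformly distributed over the m bits:

```latex
\Pr[\text{a given bit is still } 0] = \left(1 - \tfrac{1}{m}\right)^{kn} \approx e^{-kn/m},
\qquad
p_{\mathrm{FP}} \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\mathrm{opt}} = \tfrac{m}{n}\ln 2 .
```

For the walkthrough parameters (m = 12, k = 3, n = 2), kn/m = 0.5, so p_FP ≈ (1 - e^{-0.5})^3 ≈ 0.06. Setting extra bits to 1 pushes this rate up, which is the effect Bloom cookies exploit.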

  14. Review: Bloom Filters • Adding an element (m = 12, k = 3, n = 1): inserting “Hello” • hash1(“Hello”) = 1, hash2(“Hello”) = 5, hash3(“Hello”) = 10 • Set the corresponding bit locations to 1

  15. Review: Bloom Filters • Adding an element (m = 12, k = 3, n = 2): inserting “World” • hash1(“World”) = 3, hash2(“World”) = 5, hash3(“World”) = 9 • Set the corresponding bit locations to 1 (bit 5 is already set by “Hello”)

  16. Review: Bloom Filters • Querying for an element (m = 12, k = 3, n = 2): membership query, is “Hello” in the set? • hash1(“Hello”) = 1, hash2(“Hello”) = 5, hash3(“Hello”) = 10 • Check that all corresponding bit locations are 1: ✓ ✓ ✓ • Answer: Possibly (it could be a false positive)

  17. Review: Bloom Filters • Querying for an element (m = 12, k = 3, n = 2): membership query, is “Goodbye” in the set? • hash1(“Goodbye”) = 1, hash2(“Goodbye”) = 5, hash3(“Goodbye”) = 7 • Check that all corresponding bit locations are 1: ✓ ✓ ✗ • Answer: No (guaranteed, since bit 7 is 0)
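The insert/query walkthrough of slides 14 to 17 can be replayed with a small Python sketch. The SHA-256-based `positions` method is an assumed hash construction, not one prescribed by the slides; the subclass `SlideExampleFilter` (a hypothetical name) hard-codes the hash outputs shown on the slides so the example is reproduced exactly.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array plus k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m

    def positions(self, item):
        # Derive k bit positions by salting SHA-256 with the hash index.
        # This construction is an assumption; any k independent hashes work.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big")
            % self.m
            for i in range(self.k)
        ]

    def add(self, item):
        # Insertion: set every hashed bit location to 1.
        for pos in self.positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "possibly present" (may be a false positive);
        # False means "definitely absent" (no false negatives).
        return all(self.bits[pos] for pos in self.positions(item))


class SlideExampleFilter(BloomFilter):
    """Replays the walkthrough with the slides' hash outputs hard-coded."""

    TABLE = {"Hello": [1, 5, 10], "World": [3, 5, 9], "Goodbye": [1, 5, 7]}

    def positions(self, item):
        return self.TABLE[item]


bf = SlideExampleFilter(m=12, k=3)
bf.add("Hello")
bf.add("World")
print([i for i, b in enumerate(bf.bits) if b])  # [1, 3, 5, 9, 10]
print(bf.might_contain("Hello"))    # True: possibly in the set
print(bf.might_contain("Goodbye"))  # False: bit 7 is 0, so definitely not
```

Only five distinct bits are set after two insertions because “Hello” and “World” share bit 5; such overlaps are exactly what produces false positives as the filter fills up.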

  18. Bloom Cookies • Add the exact profile of the user’s previously visited URLs as elements into the Bloom filter, e.g., [“nytimes.com”, “wsj.com”, “google.com”] • Then add noise by setting random fake bits to 1 until at least a proportion l of the bits are 1 • In effect, this increases the false positive rate
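The two construction steps on this slide can be sketched as follows. This follows the slide's description only; the hash construction, `k = 3`, and `l = 0.2` are assumptions chosen for illustration, and `m = 2000` simply matches the constant cookie size reported on the results slide. The paper's exact noise procedure may differ.

```python
import hashlib
import random


def positions(item, m, k):
    # Hypothetical hash construction: k salted SHA-256 digests reduced mod m.
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]


def make_bloom_cookie(profile, m, k, l, rng):
    """Sketch of a Bloom cookie: insert the exact profile, then set random
    extra bits until at least a fraction l of the m bits are 1."""
    bits = [0] * m
    for url in profile:  # step 1: the real profile
        for pos in positions(url, m, k):
            bits[pos] = 1
    ones = sum(bits)
    zeros = [i for i, b in enumerate(bits) if b == 0]
    rng.shuffle(zeros)  # step 2: pick fake bits in random order
    while ones < l * m and zeros:
        bits[zeros.pop()] = 1
        ones += 1
    return bits


def might_contain(bits, item, k):
    # The server can only ask "possibly visited?" for a candidate URL.
    return all(bits[pos] for pos in positions(item, len(bits), k))


profile = ["nytimes.com", "wsj.com", "google.com"]
cookie = make_bloom_cookie(profile, m=2000, k=3, l=0.2, rng=random.Random(7))
print(len(cookie), sum(cookie))  # 2000 400
print(all(might_contain(cookie, url, 3) for url in profile))  # True
```

The cookie is always m bits long regardless of how many URLs the profile holds, and the real URLs still test positive, so the server can personalize while the fake bits add deniability.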

  19. Bloom Cookie Properties • Efficiency: more compact, since the filter size is fixed • Noise by design: false positives are an advantage • Non-deterministic noise: the noise changes as the filter changes • Dictionary-free: no noise dictionary is required • Expensive dictionary attacks: an adversary must query the Bloom filter for the membership of each candidate URL rather than already having the membership list
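A rough back-of-the-envelope calculation suggests why a dictionary attack is both expensive and noisy. Assuming each of the k hash positions of an unvisited URL lands on a uniformly random bit, and a fraction f of the cookie's bits are 1, each such URL tests positive with probability about f^k. The candidate-list size and the values of f and k below are hypothetical.

```python
def expected_false_positives(num_candidates, fraction_ones, k):
    """Expected number of unvisited candidate URLs that a membership query
    wrongly reports as 'possibly visited', assuming each of the k hash
    positions lands on a uniformly random bit of the cookie."""
    return num_candidates * fraction_ones ** k


# Hypothetical attack: query 100,000 candidate URLs (one query each)
# against cookies in which a fraction l of the bits are 1.
for l in (0.1, 0.2, 0.3):
    print(l, round(expected_false_positives(100_000, l, k=3)))
# 0.1 100
# 0.2 800
# 0.3 2700
```

Every false hit looks exactly like a real visit, so the adversary pays one query per candidate and still ends up with a profile padded by hundreds or thousands of fake URLs as l grows.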

  20. Bloom Cookie System Design

  21. Results: Bloom Cookies • Cost to send is constant (2000 bits) • Linkability decreases with higher l value • No dependency on a noise dictionary

  22. Pros and Cons • Cons: • Assumes the user’s browser does not send tracking information to services • Assumes no collusion • Choice of 1,000 users (to smooth outliers) is not justified • Single data set • Design is described late in the paper • Study period is too short • Pros: • Use of real search logs • Bloom cookie design is described well • Turns a “negative” of Bloom filters (false positives) into a positive • No need for a trusted third party • Limitations section • Clear and well written • Useful diagrams
