Explore the balance between privacy and personalization in search engines, focusing on profile obfuscation techniques like generalization and noise addition. Learn about the effective use of Bloom Cookies for privacy-conscious personalization.
563.10: Bloom Cookies: Web Search Personalization without User Tracking • Presented by Ben Ujcich • CS563/ECE524 Advanced Computer Security • University of Illinois
Background • The information we give search engines when we search creates a trade-off between privacy and personalization • If I search for UIUC-related websites often, would I want Google to show UIUC pages when I simply type “university”? • What do I lose when I make my searches more private (e.g., browsing through Tor)?
A Compromise • Profile obfuscation masks the exact profile of a user’s previous searches and URLs visited • Provides some degree of privacy while allowing personalization (How can this be quantified?) • Implemented client-side or through a personalization proxy • Downsides? • Costly in bandwidth • Need for a trusted third party
Profile Obfuscation Techniques • Generalization: make specifics coarser • Noise addition: add fake information
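A minimal sketch of generalization, assuming a small hand-written domain-to-interest table. The `INTEREST_OF` mapping, the category names, and the function names are hypothetical and only illustrate how exact URLs become coarser labels; they are not taken from the paper.

```python
from urllib.parse import urlsplit

# Hypothetical domain-to-interest table (illustrative entries only).
INTEREST_OF = {
    "nytimes.com": "News",
    "wsj.com": "News",
    "espn.com": "Sports",
}

def generalize(url: str) -> str:
    """Map a URL to a coarse interest label, or 'Other' if the domain is unknown."""
    host = urlsplit(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return INTEREST_OF.get(host, "Other")

def generalize_profile(urls):
    """Return the set of interests covered by a list of visited URLs."""
    return {generalize(u) for u in urls}

profile = [
    "https://www.nytimes.com/section/world",
    "https://wsj.com/markets",
    "https://www.espn.com/nba",
]
print(generalize_profile(profile))
```

The server sees only the interest labels, so two users with different URL histories but similar interests can look alike.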
Research Challenges • “What obfuscation technique is more suitable for privacy-preserving personalization of web search?” • “How big a dictionary and how much noise are required to achieve reasonable unlinkability?” • “Is it possible to receive the advantages of noisy profiles without incurring the aforementioned costs (i.e., noise dictionary and large communication overhead)?”
Citation • Bloom Cookies: Web Search Personalization without User Tracking • Nitesh Mor (UC Berkeley), Oriana Riva (Microsoft), Suman Nath (Microsoft), John Kubiatowicz (UC Berkeley) • NDSS ’15
Overview • Providing personalization while preserving privacy in web searches can be done through profile obfuscation, but it is often costly or impractical. • The authors quantify and evaluate whether generalization or noise addition is better for the privacy-personalization trade-off. • The authors propose the Bloom cookie, based on the properties of a Bloom filter’s false positives, as a cost-efficient mechanism for adding noise and preserving configurable amounts of privacy.
Threat Model • Server not trusted by client (user) • Techniques for hiding IP addresses are not assumed (“unlinkability” across IP addresses) • IP addresses change frequently • Browsers prevent online services from tracking (though browsers themselves keep track of previous activity) • Large population size • No collusion with other services
Evaluation Techniques • Personalization (measured by average rank) • URL-based: URLs users visit most often • Interest-based: preferred interest based on prior searches • Privacy (measured by unlinkability) • RAND: add random noise from a dictionary containing URLs and their associated interests • HYBRID: add random noise only from dictionary entries that correspond to interests the user has already looked at in the past
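The difference between RAND and HYBRID can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `NOISE_DICT` contents, the `add_noise` function, and its parameters are hypothetical, and a user's interests are simply looked up in the same dictionary.

```python
import random

# Hypothetical noise dictionary: URL -> associated interest.
NOISE_DICT = {
    "nytimes.com": "News",
    "wsj.com": "News",
    "espn.com": "Sports",
    "nba.com": "Sports",
    "allrecipes.com": "Food",
    "epicurious.com": "Food",
}

def add_noise(profile, n_fake, mode="RAND", rng=None):
    """Return the real profile mixed with up to n_fake fake URLs."""
    rng = rng or random.Random()
    real = set(profile)
    if mode == "RAND":
        # Any dictionary entry may be used as noise.
        candidates = [u for u in NOISE_DICT if u not in real]
    elif mode == "HYBRID":
        # Only entries whose interest matches one the user already has.
        interests = {NOISE_DICT[u] for u in real if u in NOISE_DICT}
        candidates = [u for u, cat in NOISE_DICT.items()
                      if cat in interests and u not in real]
    else:
        raise ValueError(f"unknown mode: {mode}")
    fake = rng.sample(candidates, min(n_fake, len(candidates)))
    noisy = list(real) + fake
    rng.shuffle(noisy)  # hide which entries are real
    return noisy

print(add_noise(["nytimes.com"], n_fake=2, mode="RAND"))
print(add_noise(["nytimes.com"], n_fake=2, mode="HYBRID"))
```

In both modes the client needs the dictionary and sends every fake entry along with the real ones, so the communication cost grows with the amount of noise.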
Results: Generalization • Higher unlinkability (44.1% linkable users) than using exact URLs (98.7% linkable users) • Is this reasonable?
Results: Noise Addition • Better unlinkability (20% linkable) than generalization (44%), but large cost to send noise • HYBRID makes personalization worse than with equivalent parameters in RAND
Review: Bloom Filters • Space and time efficient probabilistic membership data structure • May have false positives; no false negatives • Stored as a bit array • m = size of array • k = number of hashes to use for inserting/querying elements • n = number of inserted elements • Figure: an uninitialized Bloom filter with m = 12
Review: Bloom Filters • Adding an element (m = 12, k = 3, n = 1) • Inserting element: “Hello” • hash1(“Hello”) = 1, hash2(“Hello”) = 5, hash3(“Hello”) = 10 • Set the corresponding bit locations to 1
Review: Bloom Filters • Adding an element (m = 12, k = 3, n = 2) • Inserting element: “World” • hash1(“World”) = 3, hash2(“World”) = 5, hash3(“World”) = 9 • Set the corresponding bit locations to 1
Review: Bloom Filters • Querying for an element (m = 12, k = 3, n = 2) • Membership query: Is “Hello” in the list? • hash1(“Hello”) = 1, hash2(“Hello”) = 5, hash3(“Hello”) = 10 • Check that all corresponding bit locations are 1: ✓ ✓ ✓ • Answer: Possibly (with some probability)
Review: Bloom Filters • Querying for an element (m = 12, k = 3, n = 2) • Membership query: Is “Goodbye” in the list? • hash1(“Goodbye”) = 1, hash2(“Goodbye”) = 5, hash3(“Goodbye”) = 7 • Check that all corresponding bit locations are 1: ✓ ✓ ✗ • Answer: No (guaranteed)
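The insert and query steps from the review slides can be sketched in a few lines. This is a minimal illustration, not a production filter: the `BloomFilter` class and the choice of salted SHA-256 to derive the k positions are assumptions, so the bit positions will differ from the hash values shown on the slides.

```python
import hashlib

class BloomFilter:
    """Bit array of size m, with k hash positions per element."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, item: str):
        # Assumption: derive k hash values by salting SHA-256 with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Set every corresponding bit location to 1.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter(m=12, k=3)
bf.add("Hello")
bf.add("World")
print(bf.might_contain("Hello"))    # True: inserted elements are never missed
print(bf.might_contain("Goodbye"))  # False unless all three bits collide
```

With only 12 bits, a false positive for “Goodbye” is quite possible, which is why the second result carries no guarantee in this tiny example.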
Bloom Cookies • Add exact profile of user’s previously visited URLs as elements into a Bloom filter, e.g. [“nytimes.com”, “wsj.com”, “google.com”] • Then, add noise by setting random fake bits to 1 until at least a proportion l of the bits are 1 • In effect, the false positive rate increases.
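A minimal sketch of the two steps above, assuming salted SHA-256 for the hash positions and uniform random selection of the fake bits. The function names and the defaults k = 3 and l = 0.5 are hypothetical; m = 2000 matches the cookie size reported in the results.

```python
import hashlib
import math
import random

def positions(item: str, m: int, k: int):
    """k bit positions for an item (assumed hashing scheme: salted SHA-256)."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()[:8], "big") % m
        for i in range(k)
    ]

def make_bloom_cookie(urls, m=2000, k=3, l=0.5, rng=None):
    """Insert the real profile, then set random zero bits until at least l*m bits are 1."""
    rng = rng or random.Random()
    bits = [0] * m
    # Step 1: insert the exact profile.
    for url in urls:
        for pos in positions(url, m, k):
            bits[pos] = 1
    # Step 2: add noise by flipping randomly chosen zero bits to 1.
    target = math.ceil(l * m)
    missing = max(0, target - sum(bits))
    zeros = [i for i, b in enumerate(bits) if b == 0]
    for pos in rng.sample(zeros, min(missing, len(zeros))):
        bits[pos] = 1
    return bits

cookie = make_bloom_cookie(["nytimes.com", "wsj.com", "google.com"])
print(sum(cookie) / len(cookie))  # fraction of set bits, at least l
```

The cookie stays m bits long however much noise is added, and the server cannot tell which set bits came from real URLs.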
Bloom Cookie Properties • Efficiency: more compact since filter size is fixed • Noise by design: false positives are advantages • Non-deterministic noise: noise changes as filter changes • Dictionary-free: no noise dictionary required • Expensive dictionary attacks: adversary would need to query for membership from the Bloom filter rather than already having the membership list
Results: Bloom Cookies • Cost to send is constant (2000 bits) • Linkability decreases with higher l value • No dependency on a noise dictionary
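One way to see why a higher l adds more noise is through the false positive probability. The approximations below are the standard ones for Bloom filters and assume uniform, independent hash functions; the value l = 0.5 in the example is an assumption chosen only for the arithmetic, with k = 3 as in the review slides.

```latex
% Standard approximation for a Bloom filter holding n elements, without added noise:
P_{\mathrm{fp}} \approx \left(1 - e^{-kn/m}\right)^{k}

% If noise raises the fraction of set bits to l, a non-member passes all k bit checks with probability about
P_{\mathrm{fp}} \approx l^{k}

% Example with k = 3 and an assumed l = 0.5:
P_{\mathrm{fp}} \approx 0.5^{3} = 0.125
```

A larger l therefore makes more URLs that were never visited appear to be in the profile.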
Pros and Cons • Cons: • Assumption that user has browser not sending tracking info to services • No collusion assumption • No justification for using 1,000 users to smooth outliers • Single data set • Design is described late in the paper • Study period too short • Pros: • Use of real search logs • Bloom cookie design described well • Using a “negative” of Bloom filters as a positive • No need for a third party • Limitations section • Clear and well written • Useful diagrams