563.10: Bloom Cookies: Web Search Personalization without User Tracking • Presented by Ben Ujcich • CS563/ECE524 Advanced Computer Security • University of Illinois
Background • What we give search engines when we search creates a trade-off between privacy and personalization • If I often search for UIUC-related websites, would I want Google to show UIUC pages when I simply type “university”? • What do I lose when I make my searches more private (e.g., by browsing through Tor)?
A Compromise • Profile obfuscation masks the exact profile of a user’s previous searches and URLs visited • Provides some degree of privacy while allowing personalization (How can this be quantified?) • Implemented client-side or through a personalization proxy • Downsides? • Costly in bandwidth • Need for a trusted third party
Profile Obfuscation Techniques • Generalization: make specifics coarser • Noise addition: add fake information
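To make the first technique concrete, here is a minimal sketch of generalization, assuming a profile is a list of visited URLs and that a lookup table from URL to interest category is available. The function name `generalize`, the table `url_to_interest`, and the category labels are hypothetical and are not taken from the paper.

```python
def generalize(urls, url_to_interest):
    """Generalization: replace each specific URL with a coarser interest category.

    `url_to_interest` is a hypothetical lookup table (URL -> category);
    URLs missing from the table fall back to the label "unknown".
    """
    categories = {url_to_interest.get(url, "unknown") for url in urls}
    # Sort so the generalized profile does not leak visit order.
    return sorted(categories)


# Hypothetical example data for illustration only.
mapping = {
    "illinois.edu": "education",
    "nytimes.com": "news",
    "wsj.com": "news",
}
profile = ["illinois.edu", "nytimes.com", "wsj.com"]
generalized_profile = generalize(profile, mapping)
```

The server then sees only the coarse categories, which is why generalization can hurt URL-based personalization more than it hurts interest-based personalization.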
Research Challenges • “What obfuscation technique is more suitable for privacy-preserving personalization of web search?” • “How big a dictionary and how much noise are required to achieve reasonable unlinkability?” • “Is it possible to receive the advantages of noisy profiles without incurring the aforementioned costs (i.e., noise dictionary and large communication overhead)?”
Citation • Bloom Cookies: Web Search Personalization without User Tracking • Nitesh Mor (UC Berkeley), Oriana Riva (Microsoft), Suman Nath (Microsoft), John Kubiatowicz (UC Berkeley) • NDSS ‘15
Overview • Providing personalization while preserving privacy in web searches can be done through profile obfuscation, but it is often costly or impractical. • The authors quantify and evaluate whether generalization or noise addition is better for the privacy-personalization trade-off. • The authors propose the Bloom cookie, based on the properties of a Bloom filter’s false positives, as a cost-efficient mechanism for adding noise and preserving configurable amounts of privacy.
Threat Model • Server not trusted by client (user) • Techniques for hiding IP addresses are not assumed (“unlinkability” across IP addresses) • IP addresses change frequently • Browsers prevent online services from tracking (though browsers themselves keep track of previous activity) • Large population size • No collusion with other services
Evaluation Techniques • Personalization (measured by average rank) • URL-based: URLs the user visits most often • Interest-based: preferred interests based on prior searches • Privacy (measured by unlinkability) • RAND: add random noise from a dictionary containing URLs and their associated interests • HYBRID: add random noise only from dictionary entries that correspond to interests the user has already looked at in the past
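The difference between RAND and HYBRID is only in which dictionary entries are eligible as fake URLs. The sketch below illustrates that difference under stated assumptions: the noise dictionary is modeled as a plain Python `dict` from URL to interest, and the function names, the parameter `n_fake`, and the sample entries are hypothetical rather than the authors' implementation.

```python
import random


def rand_noise(profile, dictionary, n_fake, rng):
    """RAND: draw fake URLs uniformly from the whole noise dictionary."""
    candidates = [url for url in dictionary if url not in profile]
    return rng.sample(candidates, min(n_fake, len(candidates)))


def hybrid_noise(profile, dictionary, n_fake, rng):
    """HYBRID: draw fake URLs only from interests the user already has."""
    user_interests = {dictionary[url] for url in profile if url in dictionary}
    candidates = [
        url
        for url, interest in dictionary.items()
        if interest in user_interests and url not in profile
    ]
    return rng.sample(candidates, min(n_fake, len(candidates)))


# Hypothetical dictionary: URL -> associated interest.
noise_dictionary = {
    "nytimes.com": "news",
    "wsj.com": "news",
    "bbc.com": "news",
    "espn.com": "sports",
    "nba.com": "sports",
}
```

With a profile of `["nytimes.com"]`, `hybrid_noise` can only ever return `wsj.com` and `bbc.com`, while `rand_noise` may also return the sports sites.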
Results: Generalization • Higher unlinkability (44.1% linkable users) than using exact URLs (98.7% linkable users) • Is this reasonable?
Results: Noise Addition • Better unlinkability (20% linkable) than generalization (44%), but at a large cost to send the noise • HYBRID makes personalization worse than RAND with equivalent parameters
Review: Bloom Filters • Space- and time-efficient probabilistic data structure for membership queries • May have false positives; no false negatives • Stored as a bit array • m = size of the array • k = number of hash functions used for inserting/querying elements • n = number of inserted elements • (Figure: an uninitialized Bloom filter with m = 12)
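For reference, the false positive rate can be estimated from m, k, and n. The expression below is the standard textbook approximation, not something stated on the slides, and it assumes the k hash functions are independent and uniform over the m bit positions.

```latex
\[
  \Pr[\text{a given bit is still } 0] = \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m},
  \qquad
  p_{\mathrm{fp}} \approx \left(1 - e^{-kn/m}\right)^{k}
\]
```

For the small example used in this review (m = 12, k = 3, n = 2), this gives a false positive rate of roughly 0.06, although the approximation is rough for such a small m.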
Review: Bloom Filters • Adding an element (m = 12, k = 3, n = 1) • Inserting element: “Hello” • “Hello” hashes to: hash1(“Hello”) = 1, hash2(“Hello”) = 5, hash3(“Hello”) = 10 • Set the corresponding bit locations to 1
Review: Bloom Filters • Adding an element (m = 12, k = 3, n = 2) • Inserting element: “World” • “World” hashes to: hash1(“World”) = 3, hash2(“World”) = 5, hash3(“World”) = 9 • Set the corresponding bit locations to 1
Review: Bloom Filters • Querying for an element (m = 12, k = 3, n = 2) • Membership query: Is “Hello” in the list? • “Hello” hashes to: hash1(“Hello”) = 1, hash2(“Hello”) = 5, hash3(“Hello”) = 10 • Check that all corresponding bit locations are 1: ✓ ✓ ✓ • Answer: Possibly (with some probability)
Review: Bloom Filters • Querying for an element (m = 12, k = 3, n = 2) • Membership query: Is “Goodbye” in the list? • “Goodbye” hashes to: hash1(“Goodbye”) = 1, hash2(“Goodbye”) = 5, hash3(“Goodbye”) = 7 • Check that all corresponding bit locations are 1: ✓ ✓ ✗ • Answer: No (guaranteed)
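The insert and query steps in this review can be sketched in a few lines of Python. This is a minimal illustration, not a production filter: deriving the k positions by salting SHA-256 with the hash index is an assumption made here for simplicity, so the bit positions will not match the 1/5/10 and 3/5/9 values used on the slides.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: m bits, k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m  # uninitialized filter: all bits are 0

    def _positions(self, item):
        # Assumption: simulate k hash functions by salting SHA-256
        # with the hash index i, then reducing modulo m.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # Inserting: set every hashed bit position to 1.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # Querying: False means "definitely not inserted";
        # True means "possibly inserted" (false positives can occur).
        return all(self.bits[pos] for pos in self._positions(item))


bloom = BloomFilter(m=12, k=3)
bloom.add("Hello")
bloom.add("World")
```

Because bits are only ever set and never cleared, an inserted element always passes the query, which is where the "no false negatives" guarantee comes from.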
Bloom Cookies • Add the exact profile of the user’s previously visited URLs as elements into a Bloom filter, e.g., [“nytimes.com”, “wsj.com”, “google.com”] • Then add noise by setting random fake bits to 1 until at least a proportion l of the bits are 1 • In effect, the false positive rate increases.
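The two steps above can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names, the salted SHA-256 hashing, and the choice k = 3 are assumptions, while m = 2000 bits follows the cookie size quoted in the results, and l = 0.2 is just an example value.

```python
import hashlib
import math
import random


def bit_positions(item, m, k):
    # Assumption: k hash functions simulated by salting SHA-256 with an index.
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()[:8], "big") % m
        for i in range(k)
    ]


def make_bloom_cookie(urls, m, k, l, rng):
    """Insert the exact profile, then set random bits until a fraction l are 1."""
    bits = [0] * m
    for url in urls:
        for pos in bit_positions(url, m, k):
            bits[pos] = 1

    target_ones = math.ceil(l * m)
    ones = sum(bits)
    zero_positions = [i for i, bit in enumerate(bits) if bit == 0]
    rng.shuffle(zero_positions)
    # Noise: flip randomly chosen 0 bits to 1 until the target is reached.
    while ones < target_ones and zero_positions:
        bits[zero_positions.pop()] = 1
        ones += 1
    return bits


def might_contain(bits, item, k):
    return all(bits[pos] for pos in bit_positions(item, len(bits), k))


cookie = make_bloom_cookie(
    ["nytimes.com", "wsj.com", "google.com"], m=2000, k=3, l=0.2, rng=random.Random(0)
)
```

The real URLs still test as present, but so do many URLs the user never visited, and the server cannot tell the two groups apart from the bits alone.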
Bloom Cookie Properties • Efficiency: more compact, since the filter size is fixed • Noise by design: false positives are an advantage • Non-deterministic noise: noise changes as the filter changes • Dictionary-free: no noise dictionary required • Expensive dictionary attacks: an adversary would need to query the Bloom filter for membership rather than already having the membership list
Results: Bloom Cookies • Cost to send is constant (2000 bits) • Linkability decreases with higher l value • No dependency on a noise dictionary
Pros and Cons • Cons: • Assumes the user’s browser does not send tracking info to services • No-collusion assumption • Choice of 1,000 users to smooth outliers is not justified • Single data set • Design is described late in the paper • Study period too short • Pros: • Use of real search logs • Bloom cookie design described well • Using a “negative” of Bloom filters as a positive • No need for a third party • Limitations section • Clear and well written • Useful diagrams