
Non-tracking Web Analytics

Istemi Ekin Akkus¹, Ruichuan Chen¹, Michaela Hardt², Paul Francis¹, Johannes Gehrke³. ¹Max Planck Institute for Software Systems, ²Twitter Inc., ³Cornell University. Web analytics: statistics about users visiting a publisher website.


Presentation Transcript


  1. Non-tracking Web Analytics

    Istemi Ekin Akkus¹, Ruichuan Chen¹, Michaela Hardt², Paul Francis¹, Johannes Gehrke³. ¹Max Planck Institute for Software Systems, ²Twitter Inc., ³Cornell University
  2. Web Analytics

    Statistics about users visiting a publisher website.
  3. Analytics by Data Aggregators

    Diagram labels: Publisher; Data Aggregator; Aggregate Extended Analytics. Data aggregators collect analytics for many publishers from many clients. They infer extended analytics: age, gender, education level, other sites visited, … They provide aggregate information to publishers & advertisers.
  4. Analytics Today

    Diagram labels: Publisher; Data Aggregator; Client.
  5. Tracking

    Data aggregators criticized: collection of individual information. Criticisms led to reactions: Do-Not-Track proposal, EU cookie law; voluntary opt-out mechanisms by aggregators; client-side tools to blacklist aggregators. Fewer tracked users → less data for inference → worse extended analytics for publishers.
  6. Goal

    Replicate the functionality of today's systems without tracking.
  7. Specific Goals

    Privacy: no individual information collected by publishers & aggregators. Functionality: aggregate information for publishers & aggregators. No new organizational components. Practical and efficient.
  8. Outline

    Motivation & Goals; Components & Assumptions; Non-tracking Analytics; Implementation & Evaluation; Conclusion.
  9. Components

    Client: locally stores information about the user. Publisher: serves webpages to clients. Aggregator: provides aggregation service.
  10. Assumptions

    Potentially malicious client: may try to distort results. Potentially malicious publisher: may try to violate individual user privacy. Honest-but-curious data aggregator: follows the protocol; doesn't collude with publishers.
  11. Outline

    Motivation & Goals; Components & Assumptions; Non-tracking Analytics (Publisher as Proxy, Noise, Yes-No Queries, Auditing); Implementation & Evaluation; Conclusion.
  12. Today

    Not anonymous; need a proxy… but don't want a new component. The publisher already interacts with clients!
  13. Publisher as Anonymizing Proxy

    Message flow: (1) Queries, (2) Encrypted Answers, (3) Encrypted Answers, (4) Results. Publisher distributes queries to be executed. Publisher collects encrypted answers. Publisher forwards answers to the aggregator. Aggregator counts anonymous answers and returns results. Clients are never exposed to the data aggregator.
  14. Identifiers in Responses

    Rare attributes. Example: a client with Job: CEO of ACME sends Enc(CEO of ACME) through example.com. Conclusions shown: "CEO of ACME visits my site!" and "CEO of ACME visits example.com".
  15. Noise

    Message flow: (2) Encrypted Answers, (3) Add Noise_Publisher, (4) Noisy Encrypted Answers, (5) Add Noise_Aggregator, (6) Double-noisy Result, (7) Remove Noise_Publisher. Diagram labels: Result with Noise_Aggregator; Result with Noise_Publisher. Both entities obtain noisy results.
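
The arithmetic behind slide 15's noise flow can be traced with a minimal sketch. It treats the answers as a single count and takes both noise values as caller-supplied numbers; the function name `double_noisy_round` and its parameters are illustrative assumptions, not part of the presented system, and encryption is left out entirely.

```python
def double_noisy_round(true_count, noise_publisher, noise_aggregator):
    """Follow one count through steps 3 to 7 of the noise flow.

    All three arguments are plain numbers chosen by the caller; how each
    party samples its noise is outside this sketch.
    """
    # Steps 3-4: the publisher mixes its own noise into what it forwards.
    seen_by_aggregator = true_count + noise_publisher
    # Steps 5-6: the aggregator adds its noise and returns the result.
    double_noisy = seen_by_aggregator + noise_aggregator
    # Step 7: the publisher subtracts the noise it added itself.
    seen_by_publisher = double_noisy - noise_publisher
    return seen_by_aggregator, seen_by_publisher


aggregator_view, publisher_view = double_noisy_round(100, 7, -3)
print(aggregator_view, publisher_view)  # prints: 107 97
```

With a true count of 100, the aggregator only sees the count plus Noise_Publisher and the publisher only sees the count plus Noise_Aggregator, so neither party obtains the exact value.
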
  16. Differentially-private Noise

    Hides the existence of an individual answer (CEO: real or noise?). Requires numerical values (marked "?" on the slide).
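
The transcript does not say which noise distribution is used. As an assumption, the sketch below uses the Laplace mechanism, a common way to obtain differentially private counts; `laplace_noise`, `noisy_count`, `epsilon`, and `sensitivity` are names introduced here for illustration only.

```python
import random


def laplace_noise(epsilon, sensitivity=1.0):
    """Draw one Laplace(0, sensitivity / epsilon) sample.

    The difference of two independent exponential variables with the same
    rate follows a Laplace distribution centred on zero.
    """
    rate = epsilon / sensitivity  # rate = 1 / scale
    return random.expovariate(rate) - random.expovariate(rate)


def noisy_count(true_count, epsilon):
    # Adding or removing one answer changes a count by at most 1,
    # so a sensitivity of 1 is assumed here.
    return true_count + laplace_noise(epsilon, sensitivity=1.0)


print(noisy_count(42, epsilon=0.5))
```

Smaller `epsilon` values give a larger noise scale, which hides an individual answer better at the cost of accuracy. The sketch produces real-valued noise on a number, which is why the slide points out that this style of noise needs numerical values.
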
  17. Yes-No Questions

    Convert queries to binary & count answers: "What is your job?" → "Is your job 'CEO'?" Noise as additional answers: Enc('Yes'), Enc('No'). Bonus: limits a malicious client to either +1 or 0. Many possible values → many questions (Job: 'CEO', 'Student', 'Gardener', ...).
  18. Buckets

    Multiple yes-no questions with one query. Enumerate possible answer values: Job: {'CEO', 'Student', 'Gardener', 'Teacher', ...}. A fixed number of 'Yes' answers: Job: 1. Clients choose 'Yes' for the matching bucket: Enc('CEO = Yes'). Publisher generates additional answers: Enc('CEO = Yes'), Enc('Student = Yes'), ...
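
The bucket idea from slides 17 and 18 can be sketched as a simple tally. Encryption is omitted, the bucket list is the slide's own example, and the behaviour for a job that matches no bucket (returning `None`) is an assumption of this sketch; the function names are hypothetical.

```python
from collections import Counter

# Bucket names taken from the slide's example; a real query would list more.
BUCKETS = ("CEO", "Student", "Gardener", "Teacher")


def client_answer(job):
    """Return the single bucket a client marks 'Yes' (None if none match)."""
    return job if job in BUCKETS else None


def tally(answers):
    """Count 'Yes' answers per bucket, as the aggregator would after decryption."""
    counts = Counter(a for a in answers if a in BUCKETS)
    return {bucket: counts[bucket] for bucket in BUCKETS}


real = [client_answer(job) for job in ["CEO", "Student", "Student", "Pilot"]]
publisher_noise = ["CEO", "Student"]  # extra answers generated by the publisher
print(tally(real + publisher_noise))
# prints: {'CEO': 2, 'Student': 3, 'Gardener': 0, 'Teacher': 0}
```

Because each client contributes at most one 'Yes' per query, a malicious client can shift any bucket by at most 1, and the publisher's extra answers are indistinguishable from real ones once they are mixed into the same list.
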
  19. Impracticalities of Differential Privacy

    Requires a privacy budget: stop answering when the budget expires; no answers from clients → low-utility results. Assumes a static database; our setting is dynamic: the user population of a publisher changes; certain user data may change. → Clients keep answering queries.
  20. Malicious Publishers

    Isolation attacks: isolate a user's response. Repeat the same query: cancel out noise. Specific query conditions or buckets. Monitoring and approval by the data aggregator. Selectively dropping client responses.
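
Why repeating the same query cancels out noise can be seen in a small simulation. It assumes zero-mean Laplace noise with a hypothetical scale of 2; the transcript does not give the actual distribution or parameters, and the function names are illustrative.

```python
import random


def noisy_answer(true_count, scale=2.0):
    # Zero-mean Laplace noise, built as a difference of two exponentials.
    rate = 1 / scale
    return true_count + random.expovariate(rate) - random.expovariate(rate)


def average_of_repeats(true_count, repeats):
    """Average the noisy results of asking the same query `repeats` times."""
    return sum(noisy_answer(true_count) for _ in range(repeats)) / repeats


for repeats in (1, 100, 10_000):
    print(repeats, average_of_repeats(1, repeats))
```

As `repeats` grows, the average settles near the true count of 1 because independent zero-mean noise averages out. This is the reason repeated queries need to be noticed and limited, which the slide assigns to monitoring and approval by the data aggregator.
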
  21. Isolation via Dropping Responses

    Diagram labels (in transcript order): Enc(Student); result at example.com of Mechanic: 1 + noise, Driver: 2 + noise, CEO: 1 + noise; Enc(CEO); Enc(CEO); Enc(Student); Enc(Gardener); Enc(Driver); Enc(Mechanic); Enc(Driver); Enc(Gardener). Conclusion shown: "User in the middle is a CEO!"
  22. Auditing

    Diagram labels (in transcript order): Enc(Student); example.com; "nonce?"; example.com; Enc(CEO); Enc(CEO); Enc(Student); Enc(nonce); Enc(Driver); Enc(Mechanic); Enc(nonce); Enc(Driver); Enc(example.com, nonce); Enc(example.com, nonce).
  23. Outline

    Motivation & Goals; Components & Assumptions; Non-tracking Analytics (Publisher as Proxy, Noise, Yes-No Answer, Auditing); Implementation & Evaluation; Conclusion.
  24. Implementation

    2000 lines of code in total. Client: Firefox extension. Publisher software: Piwik plugin. Aggregator software: simple server. Deployed and tested with over 200 users. RSA public key cryptosystem.
  25. Evaluation – Decryption Overhead

    Aggregator: 2.4 GHz CPU, 2048-bit key. Publisher: 50K users, 2 sets of queries/week. Information currently provided (demographics, other sites): 3.6 CPU hours/week. Information available through our system (# pages browsed, search engines, visit frequency to other sites): 3 CPU hours/week.
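
To put slide 25's weekly totals in per-user terms, the figures can be divided by the stated 50K users. This is plain arithmetic on the slide's numbers; the function name is illustrative, and the result says nothing about how many answers each user sends.

```python
USERS = 50_000  # publisher size stated on the slide
SECONDS_PER_HOUR = 3600


def cpu_seconds_per_user(cpu_hours_per_week, users=USERS):
    """Aggregator decryption time attributable to one user per week."""
    return cpu_hours_per_week * SECONDS_PER_HOUR / users


print(round(cpu_seconds_per_user(3.6), 3))  # information currently provided
print(round(cpu_seconds_per_user(3.0), 3))  # information available through the system
```

Both cases come out at roughly a quarter of a CPU-second per user per week (about 0.26 s and 0.22 s).
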
  26. Evaluation – Client Overhead

    Bandwidth overhead: <100KB/week to download 11 queries; 8KB/week for all query responses. CPU overhead for encryption: Google Chrome: 380 enc/sec; Firefox: 20 enc/sec.
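
The encryption rates on slide 26 translate into client-side time as follows. The rates are the slide's; the count of 100 answers is a hypothetical workload chosen only for illustration, since the transcript does not state how many encryptions a client performs.

```python
ENC_PER_SEC = {"Google Chrome": 380, "Firefox": 20}  # rates from the slide


def seconds_to_encrypt(n_answers, encryptions_per_second):
    """Time a client spends encrypting `n_answers` answers at a given rate."""
    return n_answers / encryptions_per_second


N_ANSWERS = 100  # hypothetical workload, not a figure from the presentation
for browser, rate in ENC_PER_SEC.items():
    print(browser, round(seconds_to_encrypt(N_ANSWERS, rate), 2), "s")
```

At these rates a single encryption takes under 3 ms in Chrome and 50 ms in Firefox, so the assumed batch of 100 answers would take about a quarter of a second and 5 seconds respectively.
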
  27. Summary

    Extended analytics without tracking. Differential privacy guarantees for users. Aggregate information for publishers & aggregators. No new organizational component. Practical & feasible to deploy.