What’s the Gist? Privacy-Preserving Aggregation of User Profiles

Igor Bilogrevic (Google), Julien Freudiger (PARC), Emiliano De Cristofaro (UCL), Ersin Uzun (PARC).

Presentation Transcript


  1. What’s the Gist? Privacy-Preserving Aggregation of User Profiles
     Igor Bilogrevic (Google), Julien Freudiger (PARC), Emiliano De Cristofaro (UCL), Ersin Uzun (PARC)
     Scott Kildall – Data Crystals

  2. Data is the Crux of Internet Economy
     • Corporations seek personal data for better targeting
     • More data and more sensitive data
       • Credit card transactions
       • Interests
       • Political party
       • Apps usage
       • Browsing history
       • Mobility patterns
       • …
     [Diagram: Third Parties, Data Brokers, Users]

  3. Issues with Current Approach
     • Privacy
       • What personal data is collected?
       • How much and how good is it?
     • Transparency
       • Who knows what about me? [1]
       • Where does this data come from?
     • Remuneration
       • Users value their data
       • Users don’t get money for it
     Data Brokers: A Call for Transparency and Accountability (FTC, May 2014)
     [1] aboutthedata.com

  4. “This question calls for Acxiom to provide information that would reveal business practices that are of a highly competitive nature. Acxiom cannot provide a list of each entity that has provided data from, or about, consumers to us.” ACXIOM

  5. Julian Oliver - 2013

  6. An Emerging Model
     [Diagram: Third Parties, Data Brokers, Users, Participatory Data Brokers]
     • Benefits
       • Users retain control over who accesses what about them
       • Users decide what data can be monetized
       • Users get some revenue

  7. “What if Facebook paid you? Several startups envision an era in which we are all the brokers, and beneficiaries, of our own personal data.”
     David Zax, Is personal data the new currency? MIT Tech Review

  8. Our Contribution
     • What’s the Gist?
       • Method for monetization of user personal data with privacy
       • Users choose what to share
       • Brokers are not required to be trustworthy
     • Idea
       • Rather than selling data as-is, monetize a model of the data
     User data (age):
       User1  22
       User2  56
       User3  43
       User4  33
       …
     [Figure: Aggregate (age), pdf plotted over Age, axis from 20 to 60]

  9. System Architecture
     [Diagram: Third Party, Aggregator, Users, with numbered steps]
       1. Query
       2. Select users
       3. Queries
       4. Extract features
       5. Noisy encrypted answers
       6. Aggregate, decrypt, sample, and monetize
       7. Answer
     • Interactive mode
       • Customer queries for certain desired aggregates
     • Batch mode
       • Aggregator prepares certain aggregates

  10. Users – Profile Computation
      • Each user $i$ has a profile $p_i$ with $K$ attributes $\{a_{i,j}\}$
      • Each element $a_{i,j}$ is an integer representing a value or a preference
      $p_i = [a_{i,1}, a_{i,2}, a_{i,3}, \ldots, a_{i,K}]$
      Example profile for user $i$:
        Age            28
        # of friends   223
        Action movies  5
        Drama movies   6
        …
        Rock music     2
        History books  3

  11. Users – Feature Computation
      • Features depend on chosen probability model
      • For Gaussian model, each user $i$ computes
        • $f_i = \{[a_{i,1}, a_{i,1}^2], \ldots, [a_{i,K}, a_{i,K}^2]\}$
      Example features for user $i$:
        Age            [28], [28^2]
        # of friends   [223], [223^2]
        Action movies  [5], [5^2]
        Drama movies   [6], [6^2]
        …
        Rock music     [2], [2^2]
        History books  [3], [3^2]
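The feature step on slide 11 can be illustrated with a few lines of Python. This is only a sketch of the Gaussian case: the function name `gaussian_features` and the list-of-tuples layout are assumptions made for illustration, not taken from the authors' Java implementation.

```python
from typing import List, Tuple


def gaussian_features(profile: List[int]) -> List[Tuple[int, int]]:
    """Map each attribute a to the pair (a, a^2), as in the Gaussian model."""
    return [(a, a * a) for a in profile]


# Example profile from slide 10: age, # of friends, action movies,
# drama movies, rock music, history books.
example_profile = [28, 223, 5, 6, 2, 3]
example_features = gaussian_features(example_profile)
```

Sending both the value and its square is what later lets the aggregator recover a mean and a variance from plain sums.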

  12. Private Aggregation
      • Assume
      • Privacy
        • Differentially private noise $r_i$ prevents the aggregator from deducing user data [1]
      • Security
        • Aggregator can only decrypt the sum
        • No shared secret, no pairwise distributed computations
      [Diagram: Aggregator, User 1, …, User i, …, User n, with labels “Knows” and “Computes”]
      [1] E. Shi et al. Privacy-Preserving Aggregation of Time-Series Data. NDSS, 2011
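The "aggregator can only decrypt the sum" property on slide 12 can be illustrated with a toy additive-masking sketch. This is not the scheme of [1] and it is not secure as written: it assumes a one-time trusted setup that hands out keys summing to zero modulo a hypothetical `MODULUS`, and the function names, the modulus, and the caller-supplied integer noise are all illustrative assumptions.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # hypothetical modulus, far larger than any expected sum


def setup_keys(n_users: int) -> List[int]:
    """Return [s_0, s_1, ..., s_n] with s_0 + ... + s_n = 0 (mod MODULUS).

    s_0 is the aggregator's key; s_1..s_n go to the users.
    """
    user_keys = [secrets.randbelow(MODULUS) for _ in range(n_users)]
    aggregator_key = (-sum(user_keys)) % MODULUS
    return [aggregator_key] + user_keys


def user_answer(value: int, key: int, noise: int = 0) -> int:
    """Mask one value (plus the user's own noise term) with the user's key."""
    return (value + noise + key) % MODULUS


def aggregate(answers: List[int], aggregator_key: int) -> int:
    """Cancel all masks at once; only the (noisy) sum is recovered."""
    total = (sum(answers) + aggregator_key) % MODULUS
    # Decode into a signed range so a slightly negative noisy sum still works.
    return total - MODULUS if total > MODULUS // 2 else total


ages = [22, 56, 43, 33]  # example ages from slide 8
keys = setup_keys(len(ages))
answers = [user_answer(a, k) for a, k in zip(ages, keys[1:])]
recovered_sum = aggregate(answers, keys[0])
```

Each masked answer on its own is uniformly distributed modulo `MODULUS`, so the aggregator learns nothing from a single contribution; the masks cancel only when all answers are combined with its own key.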

  13. Aggregator – Gaussian Approximation
      • Entities contribute
        • $\mathrm{Enc}[a_1], \mathrm{Enc}[a_1^2], \ldots, \mathrm{Enc}[a_i], \mathrm{Enc}[a_i^2]$
      • Broker aggregates to compute mean $\mu$ and variance $\sigma^2$
      • Obtains Gaussian approximation $N(\mu, \sigma^2)$ for each attribute
      [Figure: pdf of $N(\mu, \sigma^2)$ plotted over age]
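Once the aggregator holds the decrypted sums of the values and of their squares, the Gaussian parameters on slide 13 follow from the standard moment identities. The sketch below assumes the population variance $\sigma^2 = \frac{1}{n}\sum_i a_i^2 - \mu^2$ (the slides do not say whether a population or sample variance is used), ignores the added noise, and uses a hypothetical function name.

```python
from typing import Tuple


def gaussian_from_sums(sum_values: float, sum_squares: float, n: int) -> Tuple[float, float]:
    """Return (mean, variance) from the aggregated first and second moments."""
    mean = sum_values / n
    variance = sum_squares / n - mean * mean  # population variance (assumption)
    return mean, variance


# Example ages from slide 8: 22, 56, 43, 33.
ages = [22, 56, 43, 33]
mu, sigma_sq = gaussian_from_sums(sum(ages), sum(a * a for a in ages), len(ages))
```

For these four ages the sums are 154 and 6558, which gives a mean of 38.5 and a variance of 157.25.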

  14. Aggregator - Attribute Ranking
      • Assumption
        • Attributes with uniform distribution reveal less information about individual entities
      • Measure divergence
        • Distance between two probability distributions
        • Jensen-Shannon (JS) divergence
        • Small JS distance means low value
      [Figure: attribute pdf compared with a uniform distribution]
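The ranking idea on slide 14 can be sketched by computing the Jensen-Shannon divergence between an attribute's distribution and a uniform one. Discretizing the attribute into histogram bins, using base-2 logarithms, and the function names are assumptions made for this illustration; the slides do not specify how the divergence is evaluated.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: average KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_from_uniform(histogram: Sequence[float]) -> float:
    """Normalize a histogram and compare it with the uniform distribution."""
    total = sum(histogram)
    p = [h / total for h in histogram]
    uniform = [1 / len(histogram)] * len(histogram)
    return js_divergence(p, uniform)


flat_score = js_from_uniform([5, 5, 5, 5])  # already uniform
skewed_score = js_from_uniform([10, 0])     # all mass in one bin
```

With base-2 logarithms the score lies between 0 and 1, so attributes can be ranked directly by it: a flat histogram scores 0 and more concentrated histograms score higher.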

  15. Performance
      • Dataset and implementation
        • 100,000 real users from U.S. Census [data.gov, July 2013]
        • 3 types of attributes (income, education, age)
        • Java, measurements on Core i5 2.53 GHz, 8 GB RAM
      • Metrics
        • Accuracy of Gaussian approximation
        • Information leakage for each attribute
        • Revenue
        • Overhead

  16. [Figure: panels for Income, Education, and Age with 100 users, 1,000 users, and 100,000 users]

  17. Gaussian Approximations
      • Accuracy improves quickly with number of users (100 is good)
      • Fit for income and age is 3x better than for education

  18. Information Leakage vs Uniform
      • Maximum information leakage achieved at about 1,000 users
      • Information leakage not necessarily increasing with number of users (stable after a while)
      • Larger user samples do not necessarily provide better discriminating features

  19. Revenue Model
      • Value of user information: from $0.0005 [2] to $33 [1]
      • Where w = 0.1 is the commission.
      [1] J. P. Carrascal, C. Riederer, V. Erramilli, M. Cherubini, and R. de Oliveira. Your browsing behavior for a big mac: Economics of personal information online. WWW, 2013
      [2] L. Olejnik, T. Minh-Dung, and C. Castelluccia. Selling off privacy at auction. NDSS, 2014

  20. Revenue per Attribute
      • Three privacy sensitivity distributions
      • User revenue is small and does not increase with the number of participants
      • Revenue similar to Amazon Mechanical Turk
      • Broker incentivized to collect as many users as possible ($0.07 → $2897)
      • Third parties incentivized to select demographic group of size 100

  21. Overhead
      • User
        • 1 ms total
        • Independent of number of users
      • Aggregator
        • 1.5 min for 100 users
        • 27.7 h for 100,000 users
        • Can and should be parallelized

  22. Related Work
      • Privacy-preserving aggregation
        • Modified version of the Paillier encryption scheme [1,2]
          • But P2P communications between participants
        • Homomorphic encryption and differential privacy [3,4]
          • But differential privacy by third party and contributions linkable to users before aggregation
      [1] Z. Erkin and G. Tsudik. Private computation of spatial and temporal power consumption with smart meter. ACNS, 2012
      [2] E. Shi, R. Zhang, Y. Liu, and Y. Zhang. Prisense: privacy-preserving data aggregation in people-centric urban sensing systems. INFOCOM, 2010
      [3] R. Chen, I. E. Akkus, and P. Francis. Splitx: high-performance private analytics. SIGCOMM, 2013
      [4] R. Chen, A. Reznichenko, P. Francis, and J. Gehrke. Towards statistical queries over distributed private user data. NSDI, 2012

  23. Related Work
      • Privacy-preserving monetization
        • Local user profile generation, categorization, and ad selection [1,2]
        • Anonymizing proxies to shield users’ behavioral data from third parties [3]
      [1] V. Toubiana, A. Narayanan, D. Boneh, H. Nissenbaum, and S. Barocas. Adnostic: Privacy preserving targeted advertising. NDSS, 2010
      [2] S. Guha, B. Cheng, and P. Francis. Privad: practical privacy in online advertising. NSDI, 2011
      [3] C. Riederer, V. Erramilli, A. Chaintreau, B. Krishnamurthy, and P. Rodriguez. For sale: your data: by: you. HotNets, 2011

  24. Conclusion
      • Designed method to monetize sensitive data with privacy
        • If data is the new currency, we are creating a marketplace
        • Evaluation shows practical performance, good accuracy with as few as 100 users, and good incentives for the parties involved
      • Future work
        • Enhance security features (range checks to thwart pollution attacks, fault-tolerance, efficient key establishment)
        • Enable targeting of users after aggregation
        • Enable subsequent collection of more than model (i.e., black swan)
