Privacy in Today’s World: Solutions and Challenges Rebecca Wright Stevens Institute of Technology 26 June, 2003
Talk Outline • Overview of privacy landscape • Privacy-preserving data mining • Privacy-protecting statistical analysis of large databases • Selective private function evaluation • Conclusions
Erosion of Privacy “You have zero privacy. Get over it.” - Scott McNealy, 1999 • Changes in technology are making privacy harder. • reduced cost for data storage • increased ability to process lots of data • Increased need for security may make privacy seem less critical.
Historical Changes • Small towns, little movement: • very little privacy, social mechanisms helped prevent abuse • Large cities, increased movement: • lost social mechanisms, but gained privacy through anonymity • Now: • advancing technology is reducing privacy, social mechanisms not replaced.
What Can We Do? • Use technology, policy, and education to • maintain/increase privacy • provide new social mechanisms • create new mathematical models for better understanding Problem: Using old models and old modes of thought in dealing with situations arising from new technology.
What is Privacy? • May mean different things to different people • seclusion: the desire to be left alone • property: the desire to be paid for one’s data • autonomy: the ability to act freely • Generally: the ability to control the dissemination and use of one’s personal information.
Privacy of Data • Stored data • encryption, computer security, intrusion detection, etc. • Data in transit • encryption, network security, etc. • Release of data • current privacy-oriented work: P3P, AT&T Privacy Bird, EPA, Internet Explorer 6.0, etc.
Internet Explorer v6 • fairly simple • deals only with cookies • provides limited information
Different Types of Data • Transaction data • created by interaction between stakeholder and enterprise • current privacy-oriented solutions useful • Authored data • created by stakeholder • digital rights management (DRM) useful • Sensor data • stakeholders not clear at time of creation • presents a real and growing privacy threat
Product Design as Policy Decision • product decisions by large companies or public organizations become de facto policy decisions • often such decisions are made without conscious thought to privacy impacts, and without public discussion • this is particularly true in the United States, where there is not much relevant legislation
Example: Metro Cards • Washington, DC • no record kept of per-card transactions • damaged card can be replaced if printed value still visible • New York City • transactions recorded by card ID • damaged card can be replaced if card ID still readable • records have helped find suspects, corroborate alibis
Privacy Tradeoffs? • Privacy vs. security: maybe, but giving up one does not guarantee the other (who is this person? is this a dangerous person?) • Privacy vs. usability: reasonable defaults, easy and extensive customization, visualization tools • Apparent tradeoffs are often against cost or power, rather than an inherent conflict with privacy.
Surveillance and Data Mining • Analyze large amounts of data from diverse sources. • Law enforcement and homeland security: • detect and thwart possible incidents before they occur • identify and prosecute criminals after incidents occur • Companies like to do this, too. • Marketing, personalized customer service
Privacy-Preserving Data Mining Allow multiple data holders to collaborate to compute important (e.g. security-related) information while protecting the privacy of other information. Particularly relevant now, with increasing focus on security even at the expense of privacy (e.g. TIA).
Advantages of privacy protection • protection of personal information • protection of proprietary or sensitive information • fosters collaboration between different agencies (since they may be more willing to collaborate if they need not reveal their information)
Cryptographic Approach • Using cryptography, provably does not reveal anything except output of computation. • Privacy-preserving computation of decision trees [LP00] • Secure computation of approximate Hamming distance of two large data sets [FIMNSW01] • Privacy-protecting statistical analysis [CIKRRW01] • Privacy-preserving association rule mining [KC02]
Randomization Approach • Randomizes data before computation (which can then either be distributed or centralized). • Induces a tradeoff between privacy and computation error. • Distribution reconstruction algorithm from randomized data [AS00] • Association rule mining [ESAG02]
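As a toy illustration of the randomization idea (not the [AS00] reconstruction algorithm itself), each stakeholder can perturb its value with zero-mean noise before release; an aggregate such as the mean can still be estimated from the perturbed data, while individual records are masked. All names and parameters below are illustrative.

```python
import random

def randomize(value, noise_width):
    """Perturb a single value with uniform zero-mean noise."""
    return value + random.uniform(-noise_width, noise_width)

def estimate_mean(randomized_values):
    """Zero-mean noise cancels in expectation, so the sample mean
    of the randomized data estimates the true mean."""
    return sum(randomized_values) / len(randomized_values)

random.seed(0)
true_data = [random.randint(20, 70) for _ in range(10_000)]  # e.g. ages
noisy = [randomize(v, noise_width=25) for v in true_data]

true_mean = sum(true_data) / len(true_data)
est = estimate_mean(noisy)
# The estimate tracks the true mean closely even though every
# individual record was perturbed by up to +/- 25.
print(round(true_mean, 1), round(est, 1))
```

The `noise_width` parameter makes the privacy/accuracy tradeoff explicit: wider noise hides individual values better but slows the convergence of the aggregate estimate.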
Comparison of Approaches • cryptographic approach: strong privacy and exact results, at the cost of inefficiency • randomization approach: efficient, at the cost of privacy loss and inaccuracy
Privacy-Protecting Statistics [CIKRRW01] • Client: wishes to compute statistics of the servers’ data • Servers: each holds a large database • Parties communicate using cryptographic protocols designed so that: • Client learns desired statistics, but learns nothing else about the data (including individual values or partial computations for each database) • Servers do not learn which fields are queried, or any information about other servers’ data • Computation and communication are very efficient
Non-Private and Inefficient Solutions • Database sends client entire database (violates database privacy) • For sample size m, use SPIR to learn m values (violates database privacy) • Client sends selections to database, database does computation (violates client privacy) • General secure multiparty computation (not efficient for large databases)
Secure Multiparty Computation • Allows k players P1, …, Pk to privately compute a function f of their inputs. • Overhead is polynomial in the size of the inputs and the complexity of f [Yao, GMW, BGW, CCD, ...]
Symmetric Private Information Retrieval • Allows a client with input i to interact with a database server with input x = x_1, …, x_n so that the client learns (only) x_i, and the server learns nothing about i. • Overhead is polylogarithmic in the size of the database x [CMS, GIKM]
Homomorphic Encryption • Certain computations on encrypted messages correspond to other computations on the cleartext messages. • For additive homomorphic encryption: • E(m1) • E(m2) = E(m1 + m2) • which also implies E(m)^x = E(m·x) • Paillier encryption is an example.
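The additive property can be checked directly with a toy Paillier implementation (tiny hard-coded primes, so this is a demonstration only and in no way secure):

```python
import math
import random

# Toy Paillier keypair with small hard-coded primes (insecure; demo only).
p, q = 10007, 10009
n = p * q
n2 = n * n
g = n + 1                          # standard choice of generator
lam = math.lcm(p - 1, q - 1)       # Carmichael function of n

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # modular inverse, part of the secret key

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:       # r must be a unit mod n
        r = random.randrange(1, n)
    return pow(g, m, n2) * pow(r, n, n2) % n2

def decrypt(c):
    return L(pow(c, lam, n2)) * mu % n

m1, m2 = 123, 456
# Multiplying ciphertexts adds plaintexts: E(m1) * E(m2) = E(m1 + m2).
print(decrypt(encrypt(m1) * encrypt(m2) % n2))  # 579
# Exponentiating a ciphertext scales the plaintext: E(m)^x = E(m * x).
print(decrypt(pow(encrypt(m1), 5, n2)))         # 615
```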
Privacy-Protecting Statistics Protocol • To learn mean and variance: enough to learn the sum and the sum of squares. • Server stores the values x_i and their squares x_i², and responds to sum queries on both: • an efficient protocol for sums yields an efficient protocol for mean and variance
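The reduction from mean and variance to the two sums is elementary: for sample size m, mean = S1/m and variance = S2/m − (S1/m)², where S1 = Σ x_i and S2 = Σ x_i². A quick check:

```python
def mean_and_variance(sum_x, sum_x2, m):
    """Recover mean and (population) variance from the two sums."""
    mean = sum_x / m
    variance = sum_x2 / m - mean * mean
    return mean, variance

data = [4.0, 8.0, 6.0, 2.0]
s1 = sum(data)                     # S1 = sum of the values
s2 = sum(x * x for x in data)      # S2 = sum of the squares
print(mean_and_variance(s1, s2, len(data)))  # (5.0, 5.0)
```

This is why the server only needs to answer sum queries over the values and over their squares: the client can finish the statistics locally.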
Weighted Sum • Client wants to compute a selected linear combination of m items: Σ c_i·x_i • Using homomorphic encryption (E, D): • client sends E(c_i) for each i — the selected weight if i is in the selection, E(0) otherwise • server computes v = Π E(c_i)^x_i = E(Σ c_i·x_i) • client decrypts v to obtain Σ c_i·x_i
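The message flow above can be sketched end to end with the same toy Paillier scheme (hard-coded small primes, illustrative variable names, entirely insecure): the client encrypts a weight for every index, the server combines them with its data homomorphically, and the client decrypts a single ciphertext containing only the weighted sum.

```python
import math
import random

# Toy Paillier (insecure demo parameters), as in the homomorphic-encryption sketch.
p, q = 10007, 10009
n, n2, g = p * q, (p * q) ** 2, p * q + 1
lam = math.lcm(p - 1, q - 1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m, n2) * pow(r, n, n2) % n2

def decrypt(c):
    return (pow(c, lam, n2) - 1) // n * mu % n

# Client side: encrypt a weight for EVERY index (0 for unselected ones),
# so the server cannot tell which fields are being queried.
database = [17, 42, 8, 99, 5]    # server's data x_1..x_n
weights  = [0,  1,  0,  2,  0]   # client selects x_2 and x_4, weights 1 and 2
enc_weights = [encrypt(c) for c in weights]

# Server side: homomorphically compute v = E(sum_i c_i * x_i).
v = 1
for e_c, x in zip(enc_weights, database):
    v = v * pow(e_c, x, n2) % n2   # E(c)^x = E(c*x); products add plaintexts

# Client side: decrypt to obtain only the weighted sum.
print(decrypt(v))  # 42*1 + 99*2 = 240
```

Note that communication here is linear in n, matching the efficiency discussion that follows.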
Efficiency • Linear communication and computation (feasible in many cases) • If n is large and m is small, would like to do better
Selective Private Function Evaluation • Allows a client to privately compute a function f over m of the server’s n inputs • client learns only f(x_i1, …, x_im) • server does not learn which items were selected • Unlike general secure multiparty computation, we want communication complexity to depend on m, not n (more precisely: polynomial in m, polylogarithmic in n).
Security Properties • Correctness: If client and server follow the protocol, client’s output is correct. • Client privacy: malicious server does not learn client’s input selection. • Database privacy: • weak: malicious client learns no more than output of some m-input function g • strong: malicious client learns no more than output of specified function f
Solutions based on MPC • Input selection phase: • server obtains a blinded version of each selected item • Function evaluation phase: • client and server use MPC to compute f on the m blinded items
Input Selection Phase • Client and server set up homomorphic encryption (E, D), with the client holding the decryption key • Server picks random blinds r_i and computes the encrypted, blinded database E(x_1 + r_1), …, E(x_n + r_n) • Client retrieves the m entries it selected using SPIR(m, n) • Client decrypts the received values to obtain the blinded items x_ij + r_ij
Function Evaluation Phase • Client has the blinded values y_j = x_ij + r_ij • Server has the corresponding blinds r_ij • Use MPC to compute: f(y_1 − r_1, …, y_m − r_m) • Total communication cost polylogarithmic in n, polynomial in m and |f|
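A minimal sketch of the two-phase structure, showing the blinding arithmetic only: the SPIR transfer and the actual MPC are replaced by plain function calls (the hypothetical `mpc_evaluate` stands in for a real secure computation), so this illustrates the data flow, not the security.

```python
import random

M = 2**32  # blinding modulus (illustrative choice)

def f(values):
    """The public function to evaluate; a sum, for concreteness."""
    return sum(values)

# Server's database and the client's (private) selection of indices.
database = [17, 42, 8, 99, 5]
selection = [1, 3]

# Input selection phase: the server blinds each selected item with a
# random r_j; in the real protocol the client obtains y_j via SPIR,
# without the server learning which items were selected.
blinds = [random.randrange(M) for _ in selection]
blinded = [(database[i] + r) % M for i, r in zip(selection, blinds)]

# Function evaluation phase: client holds y_j, server holds r_j.
# Stand-in for the MPC: unblind and apply f, revealing only f's output.
def mpc_evaluate(client_inputs, server_inputs):
    unblinded = [(y - r) % M for y, r in zip(client_inputs, server_inputs)]
    return f(unblinded)

print(mpc_evaluate(blinded, blinds))  # 42 + 99 = 141
```

Because the client only ever sees blinded values and the server only ever sees its own blinds, neither party learns more than the final output of f.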
Distributed Databases • Same approach works to compute function over a distributed database. • Input selection phase done in parallel with each database server • Function evaluation phase done as single MPC • Database privacy means only final outcome is revealed to client.
Performance • Current experiments aim to determine whether these methods are efficient in real-world settings.
Initial Experimental Results • Initial implementation of the linear communication and computation solution [H. Subramaniam & Z. Yang] • implemented in Java and C++ • uses Paillier encryption • uses synthetic data, with client and server as separate processes on the same machine (a two-year-old Toshiba laptop)
Conclusions • Privacy is in danger, but some important progress has been made. • Important challenges ahead: • Usable privacy solutions (efficiency and user interface) • Sensor data • Better use of hybrid approach: decide what can safely be disclosed, what needs moderate protection, and use cryptographic protocols to protect most critical information. • Mathematical/formal models to understand and compare different solutions.
Research Directions • Investigate integration of cryptographic approach and randomization approach: • seek to maintain strong privacy and accuracy of cryptographic approach, ... • while benefitting from improved efficiency of randomization approach • Understand mathematically what the resulting privacy and/or accuracy compromises are. • Technology, policy, and education must work together.