Secure Multiparty Computation and Privacy Yehuda Lindell Bar-Ilan University
Motivation • Huge databases exist in society today • Medical data • Consumer purchase data • Census data • Communication and media-related data • Data gathered by government agencies • Can this data be utilized? • For medical research • For improving customer service • For homeland security
Motivation • Data sharing is necessary for full utilization: • Pooling medical data can improve the quality of medical research • Pooling of information from different government agencies can provide a wider picture • What is the health status of citizens that are supported by social welfare? • Are there citizens that receive simultaneous support from different agencies? • Data gathered by the government (e.g., census data) should be publicly available
The Problem • The huge amount of data available means that it is possible to learn a lot of information about individuals from public data • Purchasing patterns • Family history • Medical data • And much much more
Privacy Breaches • Latanya Sweeney – testimony before the Privacy and Integrity Advisory Committee of the Department of Homeland Security: • One problem is that people don’t understand what makes data unique or identifiable. For example, in 1997 I was able to show how medical information that had all explicit identifiers, such as name, address and Social Security number, removed could be re-identified using publicly available population registers (e.g., a voter list). In this particular example, I was able to show how the medical record of William Weld, the governor of Massachusetts at the time, could be re-identified using only his date of birth, gender and ZIP. In fact, 87% of the population of the United States is uniquely identified by date of birth (e.g., month, day and year), gender, and their 5-digit ZIP codes. The point is that data that may look anonymous is not necessarily anonymous.
Motivation – Homeland Security • Many different security agencies coexist • These agencies are hesitant to share information • Sometimes this is legitimate (if all agencies share all information, a single mole can compromise all agencies) • Typically, they only share conclusions • The problem: more patterns could be found if data, and not just conclusions, are shared
Motivation – Concrete Example • Investigation at Stillwater State Correctional Facility, Minnesota • Data mining software was applied to phone records from the prison • A pattern linking calls between prisoners and a recent parolee was discovered • The calling data was then mined again together with records of prisoners’ financial accounts • The result: a large drug smuggling ring was discovered
Data Utilization or Privacy • In many cases, due to public outcry, important data mining projects have been halted • The Canadian “big brother” database • The Total Information Awareness project • It may be good that these projects were scrapped in their current form, but it would be better if we could have our cake and eat it too…
Privacy • The human definition: • Privacy and autonomy: information about us that we feel is personal, confidential or private should not be unnecessarily distributed or publicly known • Privacy and control: Our personal or private information should not be misused (whatever that means) • How can we mathematically formulate this? • The same information is classified differently by different people • Legitimate use is interpreted differently by different people
Turing’s Concern • No attempt has yet been made to show that the “computable” numbers include all numbers which would naturally be regarded as computable. All arguments which can be given are bound to be, fundamentally, appeals to intuition, and for this reason rather unsatisfactory mathematically. The real question at issue is “What are the possible processes which can be carried out in computing a number?” • The arguments which I shall use are of three kinds. • A direct appeal to intuition. • A proof of the equivalence of two definitions (in case the new definition has a greater intuitive appeal). • Giving examples of large classes of numbers which are computable. • Once it is granted that computable numbers are all “computable” several other propositions of the same character follow. In particular, it follows that, if there is a general process for determining whether a formula of the Hilbert function calculus is provable, then the determination can be carried out by a machine.
Secure Computation and Privacy • Secure computation: • Assume that there is a function that all parties wish to compute • Secure computation shows how to compute that function in the safest way possible • In particular, it guarantees minimal information leakage (the output only) • Privacy: • Does the function output itself reveal “sensitive information”? • Should the parties agree to compute this function at all?
This Talk • We focus on the problem of secure multiparty computation • Definitional paradigms (the focus) • Feasibility results • Efficiency concerns • We will also discuss the limitations of the approach for solving problems of privacy
Secure Multiparty Computation • A set of parties with private inputs • Parties wish to jointly compute a function of their inputs so that certain security properties (like privacy and correctness) are preserved • E.g., secure elections, auctions… • Properties must be ensured even if some of the parties maliciously attack the protocol
Secure Computation Tasks • Examples: • Authentication protocols • Online payments • Auctions • Elections • Privacy preserving data mining • Essentially any task…
Security Requirements • Consider a secure auction (with secret bids): • An adversary may wish to learn the bids of all parties – to prevent this, require privacy • An adversary may wish to win with a lower bid than the highest – to prevent this, require correctness • But, the adversary may also wish to ensure that it always gives the highest bid – to prevent this, require independence of inputs
Defining Security • Option 1: analyze security concerns for each specific problem • Auctions: as in previous slide • Elections: privacy and correctness only • But then, an adversary may be able to condition its vote for a party on whether that party will reach the minimum number of votes • Problems: • How do we know that all concerns are covered? • Definitions are application dependent and need to be redefined from scratch for each task
Defining Security – Option 2 • The real/ideal model paradigm for defining security [GMW,GL,Be,MR,Ca]: • Ideal model: parties send inputs to a trusted party, who computes the function for them • Real model: parties run a real protocol with no trusted help • A protocol is secure if any attack on a real protocol can be carried out in the ideal model • Since no attacks can be carried out in the ideal model, security is implied
The Real Model • [Figure] The two parties, holding inputs x and y, interact directly by running the protocol, and each computes its protocol output
The Ideal Model • [Figure] The parties send their inputs x and y to the trusted party, who returns f1(x,y) to the first party and f2(x,y) to the second
The Security Definition • [Figure] For every real adversary A attacking the protocol interaction (REAL), there exists an adversary S interacting only with the trusted party (IDEAL) that achieves an indistinguishable outcome
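Written out, a sketch of the standard stand-alone formulation (the exact quantifiers and distribution ensembles vary between the definitions of [GMW,GL,Be,MR,Ca]) for a two-party protocol π computing f = (f1,f2) is:

```latex
% Simulation-based security, two-party case (sketch).
% x, y: the parties' inputs; z: the adversary's auxiliary input;
% n: the security parameter. \stackrel{c}{\equiv} denotes computational
% indistinguishability of the joint output distributions.
\[
\forall A \;\exists S : \quad
\{\mathrm{IDEAL}_{f,S(z)}(x,y,n)\}_{x,y,z,n}
\;\stackrel{c}{\equiv}\;
\{\mathrm{REAL}_{\pi,A(z)}(x,y,n)\}_{x,y,z,n}
\]
```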
Properties of the Definition • Privacy: • The ideal-model adversary cannot learn more about the honest party’s input than what is revealed by the function output • Thus, the same is true of the real-model adversary • Correctness: • In the ideal model, the function is always computed correctly • Thus, the same is true in the real model • Others: • For example, independence of inputs
Why This Approach? • General – it captures all applications • The specifics of an application are defined by its functionality, security is defined as above • The security guarantees achieved are easily understood (because the ideal model is easily understood) • We can be confident that we did not “miss” any security requirements
More Details on the Definition • The Adversary • Computational power: • Probabilistic polynomial-time versus all-powerful • Adversarial deviation • Semi-honest: follows protocol instructions • Malicious: arbitrary actions • Corruption behaviour • Static: set of corrupted parties fixed at onset • Adaptive: can choose to corrupt parties at any time during computation • Number of corruptions • Honest majority versus unlimited corruptions
More Details on the Definition • The Network • Scheduling • Synchronous: messages sent in rounds • Asynchronous: messages can be delayed arbitrarily (adversary controls message delivery) • Semi-synchronous: adversary controls scheduling, but only to a limit (can be achieved using local clocks) • Communication channels • Authenticated: adversary can see, but cannot modify messages sent between the parties
The Ideal Model – More Details • The Trusted Party: • Defined by any probabilistic polynomial-time Turing machine – this machine defines the functionality • Trusted party linked to all participants via perfectly private and authenticated channels • Upon receiving an input from a party, the trusted party runs the machine • If there is an output, it sends it to the designated party • Continue as above • This is more general than secure function evaluation
Examples • Elections: • Trusted party receives votes. After all have been received (or after a certain time), it computes the election results and sends them out to all • Poker: • The trusted party deals the cards randomly (sending 5 cards to each party), and remembers which cards were dealt • Parties decide which cards to throw and send these cards to the trusted party • The trusted party checks that the cards are correct and re-deals the correct number of new cards • The trusted party publicizes the cards of a party who opens his hand
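To make the “trusted party as a machine” view concrete, here is a minimal Python sketch of the elections functionality; the names (IdealElections, cast_vote, tally) are illustrative and not from the talk, and the sketch only describes what the trusted party does, ignoring all cryptographic detail.

```python
from collections import Counter

class IdealElections:
    """Trusted party for an election: collects one vote per eligible voter
    over private, authenticated channels, then sends everyone the same tally."""

    def __init__(self, voter_ids, candidates):
        self.voter_ids = set(voter_ids)
        self.candidates = set(candidates)
        self.votes = {}  # voter_id -> candidate, never revealed individually

    def cast_vote(self, voter_id, candidate):
        # An input message from a party: accept at most one valid vote each.
        if voter_id not in self.voter_ids or candidate not in self.candidates:
            return "rejected"
        self.votes.setdefault(voter_id, candidate)
        return "accepted"

    def tally(self):
        # After all votes arrive (or a deadline passes), compute the result
        # and send the same output to all parties.
        counts = Counter(self.votes.values())
        return {c: counts.get(c, 0) for c in self.candidates}
```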
The Ideal Model – More Details • The definition we gave suffices in the case of an honest majority • When there is no honest majority • Guaranteed output delivery cannot be achieved • Fairness cannot be achieved • Changes to ideal model: • Corrupted parties receive output first • Adversary decides if honest parties receive their outputs as well • This is called security with abort
Adding Defects to the Ideal Model • In the case of no honest majority, fairness and guaranteed output delivery cannot be achieved • This “defect” is included into the ideal model • This approach can be used to also model partial information leakage: • The parties wish to compute a function f, but more information is leaked by the protocol • This can be modeled by having the trusted party explicitly leak this information • Helps for efficiency considerations • Advantage: explicit defect!
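The change to the ideal model can itself be written down explicitly. Below is a rough Python sketch of the “security with abort” interaction; the party indexing, the adversary_decides callback, and the "abort" symbol are illustrative choices, not notation from the talk.

```python
def ideal_with_abort(f, inputs, corrupted, adversary_decides):
    """Ideal-model interaction without an honest majority.
    f: maps the list of inputs to a list of outputs (one per party).
    corrupted: indices of the corrupted parties.
    adversary_decides: given the corrupted parties' outputs, returns True
    to deliver outputs to the honest parties as well, False to abort."""
    outputs = f(inputs)

    # Corrupted parties receive their outputs first...
    corrupted_view = {i: outputs[i] for i in corrupted}

    # ...and the adversary then decides whether the honest parties do too.
    if adversary_decides(corrupted_view):
        return outputs
    return [outputs[i] if i in corrupted else "abort"
            for i in range(len(inputs))]

# Example: both parties should learn x + y; party 1 is corrupted and aborts.
# ideal_with_abort(lambda xs: [sum(xs)] * len(xs), [3, 4], {1}, lambda v: False)
# -> ["abort", 7]
```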
More on Definitions • There are numerous ways to define the real model, regarding both the adversary and network • The main thing is to realistically (and conservatively) model the real world scenario and adversarial threats • An overly conservative approach may be detrimental in that it precludes efficient protocols
Feasibility – A Fundamental Theorem • Any multiparty functionality can be securely computed • For any number of corrupted parties: security with abort is achieved, assuming enhanced trapdoor permutations [Yao,GMW] • With an honest majority: full security is achieved, assuming private channels only [BGW,CCD] • These results hold in the stand-alone model • When composition is considered, things are much more difficult
Application to Private Data Mining • The setting: • Data is distributed at different sites • These sites may be third parties (e.g., hospitals, government bodies) or may be the individual him or herself • The aim: • Compute the data mining algorithm on the data so that nothing but the output is learned • That is, carry out a secure computation • Conclusion: private data mining is solved in principle
Privacy ≠ Security • As we have mentioned, secure computation only deals with the process of computing the function • It does not ask whether or not the function should be computed
Privacy and Secure Computation • Secure computation can be used to solve any distributed data-mining problem • A two-stage process: • Decide that the function/algorithm should be computed – an issue of privacy • Apply secure computation techniques to compute it securely – security • But, not every privacy problem can be cast as a distributed computation
Census Bureau and Privacy • Case study – the census bureau: • The census bureau releases its results so that they can be studied • Question: • How does it make sure that the results released do not compromise privacy? • Answer: • The tables are manipulated to hopefully protect the privacy of individuals • Methods are highly involved (statistics), but are not “cryptographic”
Census Bureau and Privacy • Suggestion: • If someone wants to study the results (ask a statistical query), let them run a secure computation with the census bureau • Secure computation is necessary so that parties don’t have to reveal to the census bureau what they are studying
Census Bureau and Privacy • Naïve objection: • This would be far too expensive and is not realistic in practice • Better objection: • This contradicts government transparency • The census bureau could fudge results based on political interests without any possibility of being detected (regular census tables can be compared against other sources to check for accuracy)
Census Bureau and Privacy • An even better objection: • Who says that privacy is preserved in this way? • If the census bureau doesn’t know the query (because it’s a secure computation), then it is easy to ask queries that overlap and reveal confidential information about an individual
Conclusion – Census Bureau • Secure multiparty computation doesn’t solve this problem • Because sometimes in reality, actual data must be revealed, and not just the results of a function evaluation • In other words, this problem cannot be cast as a distributed computation • Casting it as a distributed computation introduces other problems and can even harm privacy
Secure Computation and Privacy • The census bureau case is different – it is not a distributed computation problem • What about secure computation for distributed problems?
Privacy Difficulties • Crucial point: • In order to analyze the privacy of a solution, it is crucial to understand why we want privacy to start with • We demonstrate this with two examples: • Personalized newspapers • Personalized shopping catalogs
Personalized Newspapers • The aim: • Present a newspaper with a layout that is personalized to a reader’s interest • The problem: • We don’t want the newspaper to necessarily know what we are interested in • Our political opinions may be private • Our interest in certain stocks can be confidential • Or just because our interests are our own
Personalized Newspapers • The non-private solution: • Input: • User input: answers to an “interest questionnaire” and possible ratings for articles read • Automated input: the newspaper gathers information about which articles were read by the user, for how long and so on (appropriate for online newspapers) • The computation: data mining algorithms are run to determine what is of interest to the user and in which layout to present it
Personalized Newspapers • The solution – secure computation • The reader inputs his or her personal data • The newspaper inputs the articles and the “rules” to define the layout based on the reader’s data • The reader receives the personalized newspaper, and the newspaper learns nothing • Caveat: • Successful data mining here would consider all readers together, which is not practical
Personalized Newspapers • Privacy is clearly preserved • The newspaper learns nothing about the reader’s data (interests, preferences, reading history etc.) • There is no dilemma here even regarding computing the function • The newspaper learns nothing so the function can clearly be computed without compromising privacy
Personalized Newspaper Danger • Why do we want privacy regarding our interests and why is this an important question? • Typical answer to why we want privacy • A personal feeling of discomfort • A more concrete answer • A danger to our autonomy: if the newspaper knows our interests, political leanings etc., it could feed us slanted information • Our interests could be used against us
Personalized Newspaper Danger • The solution based on secure computation does not solve the problem at all • The newspaper can set rules that say: if the reader has political view X, then present this article… • The secure-computation solution provides full privacy of information • But, the danger to autonomy comes from being able to act upon private information, and this was not prevented
Personalized Newspaper Danger • The reason that we want privacy is crucial • The secure computation solution is trivially private, but doesn’t solve the problem at all • It can be fixed – but we need to have a good understanding of what the problem is before we can design a solution
Personalized Newspaper Solution • The rules used for defining the layout etc. should be either: • Defined by the newspaper but be public • Defined by the users themselves • Conclusions: • This is obvious – but only once you understand why you actually want privacy • Good, rigorous definitions can be formed, but they take time and care
Personalized Shopping Catalogs • The same aim: • Present customers with catalogs containing products that they are interested in • Give certain customers “deals” • The same problem: • We don’t want the store to know all of our shopping history • The same solution: secure computation
Personalized Shopping Catalogs • A new problem: price discrimination • The store can charge much higher prices to customers who are known to not “shop around” • The secure computation solution once again does not solve the problem