Private Analysis of Data Sets • Benny Pinkas, HP Labs, Princeton
A story • “We’re experiencing a lot of fraud lately…” • “Here too.” • “I can’t find a pattern to recognize fraud in advance.” • “Neither can I.” • “Maybe we should share information.” • “But what about patients’ privacy? Business secrets?” • “Have you heard of ‘secure function evaluation’?” • “This is all ‘theory’. It can’t be efficient.”
New Opportunities for Interaction Between • Enterprises and government agencies holding sensitive data • P2P users • Mobile wireless crowds (PDAs, cell phones) • What about privacy? • A bidirectional approach: • Finding what is actually needed • Designing useful and efficient cryptographic tools
Cryptographic Protocols for Privacy Preserving Computation • Input: one party holds x, the other holds y • Output: F(x,y) and nothing else • As if… a trusted party received x and y and returned F(x,y) to both parties
Does the trusted party scenario make sense? • [Diagram: the parties send x and y to a trusted party, which returns F(x,y) to each] • We cannot hope for more privacy • Does the trusted party scenario make sense? • Are the parties motivated to submit their true inputs? • Can they tolerate the disclosure of F(x,y)? • If so, we can implement the scenario without a trusted party.
Secure Function Evaluation [Yao,GMW,BGW] • Input: x and y, one held by each party • Output: one party learns C(x,y) and nothing else; the other learns nothing • F(x,y): a public function • Represented as a Boolean circuit C(x,y) • Implementation: • O(|X|) “oblivious transfers” • O(|C|) communication • Pretty efficient for small circuits! (But what about larger circuits?)
An equality circuit • Output: 1 if x = y, 0 otherwise • [Diagram: one “=” gate per bit pair (x1, y1), (x2, y2), …, (xn, yn), with all gate outputs feeding a single AND gate]
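The circuit on this slide can be mimicked in a few lines of plain Python. This is only an illustrative sketch of the gate structure, evaluated in the clear with no privacy at all; the helper names `xnor`, `to_bits`, and `equality_circuit` are made up for this example.

```python
from functools import reduce


def xnor(a: int, b: int) -> int:
    """Single-bit equality gate: returns 1 when the two bits match."""
    return 1 - (a ^ b)


def to_bits(value: int, width: int) -> list:
    """Little-endian bit decomposition of a non-negative integer."""
    return [(value >> i) & 1 for i in range(width)]


def equality_circuit(x_bits, y_bits) -> int:
    """One '=' gate per bit pair, then a single AND over all gate outputs."""
    if len(x_bits) != len(y_bits):
        raise ValueError("inputs must have the same number of bits")
    gate_outputs = [xnor(a, b) for a, b in zip(x_bits, y_bits)]
    return reduce(lambda acc, bit: acc & bit, gate_outputs, 1)
```

The gate count grows linearly with the bit length n, which is why an equality test counts as a "small circuit" for secure function evaluation.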
Cryptographic methods vs. randomization methods • [Chart: trade-off between overhead, inaccuracy, and lack of privacy; plotted points: cryptographic methods, randomization methods [statistical disclosure, AS], and “Our goal…”]
Examples of Simple Privacy Preserving Primitives (with reasonable solutions) • Is X = Y? Is X > Y? • What is X ∩ Y? What is the median of X ∪ Y? • Auctions (negotiations): many parties, private bids. Compute the winning bidder and the sale price, but nothing else. [NPS] • Voting • Add privacy to data mining algorithms (ID3, [LP])
Private Set Intersection • with Mike Freedman (NYU) and Kobbi Nissim (MSR)
Applications of Set Intersection • Two government agencies, A and B: one holds a list of people on welfare, the other a list of expensive-car buyers • Compute the intersection and nothing else
Computing the Intersection • Private Equality Test (PET) • Alice: x. Bob: y. • Output: 1 iff x = y • Privacy preserving solutions: • Cannot use hash functions alone • Yao, [FNW], [NP] • Generalization: list intersection • X = x1, …, xn and Y = y1, …, yn
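One common reason hashing alone does not give a private equality test is that inputs often come from a small domain, so whoever receives the hash can simply try every candidate. The sketch below illustrates that with a hypothetical 4-digit secret; the names `digest`, `brute_force`, and `alice_secret` are invented for the example, and the slide itself does not say which weakness it has in mind.

```python
import hashlib


def digest(value: str) -> str:
    """Hash a value the way a naive 'compare the hashes' protocol might."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def brute_force(target_digest: str, candidates):
    """Try every candidate input until one hashes to the target digest."""
    for candidate in candidates:
        if digest(candidate) == target_digest:
            return candidate
    return None


# Hypothetical scenario: Alice's input is a 4-digit code and she sends only its hash.
alice_secret = "4271"
published = digest(alice_secret)

# Bob enumerates the whole (small) input domain and recovers the secret.
all_codes = (f"{i:04d}" for i in range(10_000))
recovered = brute_force(published, all_codes)
```

The attack costs at most 10,000 hash evaluations here, regardless of how strong the hash function is.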
The basic tool: Homomorphic Encryption • Semantically secure public key encryption • Given Enc(M1), Enc(M2), can compute (without knowing the decryption key) • Enc(M1+M2) • Enc(c·M1) for any constant c • I.e. Enc(a0) + Enc(a1)·x + … + Enc(an)·x^n = Enc(P(x)), where P(x) = a0 + a1·x + … + an·x^n • Examples: El Gamal, Paillier, DJ.
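To make the two homomorphic operations concrete, here is a toy Paillier implementation. It is a sketch for illustration only: the primes are tiny and insecure, and the class and method names (`ToyPaillier`, `add`, `scale`, `eval_encrypted_poly`) are assumptions of this example rather than anything from the talk. In Paillier, "adding" plaintexts corresponds to multiplying ciphertexts, and multiplying a plaintext by a constant corresponds to raising the ciphertext to that constant.

```python
import math
import secrets


class ToyPaillier:
    """Textbook Paillier with g = n + 1. Toy parameters, NOT secure."""

    def __init__(self, p: int = 1009, q: int = 1013):
        self.n = p * q                      # public key
        self.n2 = self.n * self.n
        self._lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # private
        self._mu = pow(self._lam, -1, self.n)                    # private

    def encrypt(self, m: int) -> int:
        while True:
            r = secrets.randbelow(self.n - 1) + 1
            if math.gcd(r, self.n) == 1:
                break
        return pow(1 + self.n, m % self.n, self.n2) * pow(r, self.n, self.n2) % self.n2

    def decrypt(self, c: int) -> int:
        u = pow(c, self._lam, self.n2)
        return (u - 1) // self.n * self._mu % self.n

    def add(self, c1: int, c2: int) -> int:
        """Enc(M1), Enc(M2) -> Enc(M1 + M2), without the private key."""
        return c1 * c2 % self.n2

    def scale(self, c: int, k: int) -> int:
        """Enc(M), constant k -> Enc(k * M), without the private key."""
        return pow(c, k % self.n, self.n2)


def eval_encrypted_poly(pk: ToyPaillier, enc_coeffs, x: int) -> int:
    """Given enc_coeffs[i] = Enc(a_i), return Enc(a_0 + a_1*x + ... + a_n*x^n)."""
    acc = pk.encrypt(0)
    for i, enc_a in enumerate(enc_coeffs):
        acc = pk.add(acc, pk.scale(enc_a, pow(x, i, pk.n)))
    return acc
```

All arithmetic on plaintexts is modulo n, so values and results must stay below n for decryption to return the expected integer.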
The Scenario • Client: X = x1, …, xn • Server: Y = y1, …, yn • Output: • Client learns X ∩ Y. • Server learns nothing.
The Protocol • Client defines a polynomial of degree n whose roots are x1, …, xn • P(y) = (x1−y)·(x2−y)·…·(xn−y) = an·y^n + … + a1·y + a0 • Sends to the server homomorphic encryptions of the coefficients • Enc(an), …, Enc(a0) • (only the client can decrypt)
…The Protocol • For every y ∈ Y, the server uses the homomorphic properties to compute Enc(r·P(y) + y) (r is random, chosen anew for each y) • If y ∈ X ∩ Y, the result is Enc(r·0+y) = Enc(y); otherwise the result is Enc(random). • Server sends the (permuted) results to the client C. • C decrypts and compares the values to its list.
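The two protocol slides can be traced end to end with a small simulation. This is a sketch under stated assumptions: it runs both roles in one process, uses a toy Paillier instance built from two Mersenne primes (far too small to be secure), and all function names (`client_encrypt_polynomial`, `server_evaluate`, `client_decode`) are hypothetical. Items are assumed to be non-negative integers smaller than the modulus.

```python
import math
import secrets

# Toy Paillier key (insecure sizes, illustration only).
_P, _Q = 2**31 - 1, 2**61 - 1
N = _P * _Q                      # public
N2 = N * N
_LAM = (_P - 1) * (_Q - 1) // math.gcd(_P - 1, _Q - 1)  # client's secret
_MU = pow(_LAM, -1, N)                                   # client's secret


def enc(m: int) -> int:
    r = secrets.randbelow(N - 1) + 1
    return pow(1 + N, m % N, N2) * pow(r, N, N2) % N2


def dec(c: int) -> int:
    return (pow(c, _LAM, N2) - 1) // N * _MU % N


def h_add(c1: int, c2: int) -> int:
    return c1 * c2 % N2          # Enc(a), Enc(b) -> Enc(a + b)


def h_scale(c: int, k: int) -> int:
    return pow(c, k % N, N2)     # Enc(a), k -> Enc(k * a)


def client_encrypt_polynomial(xs):
    """Coefficients of P(y) = (x1 - y)...(xn - y), lowest degree first, encrypted."""
    coeffs = [1]
    for x in xs:
        nxt = [0] * (len(coeffs) + 1)
        for j, a in enumerate(coeffs):
            nxt[j] = (nxt[j] + a * x) % N
            nxt[j + 1] = (nxt[j + 1] - a) % N
        coeffs = nxt
    return [enc(a) for a in coeffs]


def server_evaluate(enc_coeffs, ys):
    """For each y, compute Enc(r * P(y) + y) by Horner's rule on ciphertexts."""
    replies = []
    for y in ys:
        acc = enc_coeffs[-1]
        for c in reversed(enc_coeffs[:-1]):
            acc = h_add(h_scale(acc, y), c)
        r = secrets.randbelow(N - 1) + 1          # fresh randomness per item
        replies.append(h_add(h_scale(acc, r), enc(y)))
    secrets.SystemRandom().shuffle(replies)       # permute before sending
    return replies


def client_decode(replies, xs):
    """Decrypt every reply and keep the values that appear in the client's list."""
    return {dec(c) for c in replies} & set(xs)


if __name__ == "__main__":
    client_items, server_items = [3, 17, 42, 99], [5, 17, 99, 1000]
    msg = client_encrypt_polynomial(client_items)
    print(client_decode(server_evaluate(msg, server_items), client_items))
```

The server-side loop performs one ciphertext exponentiation per coefficient for every y, which is the source of the quadratic exponentiation count in the basic protocol.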
Security • Bad server? The server only sees semantically secure encryptions. Learning about C’s input = breaking the encryption. • Bad client? The client can, given only the output X ∩ Y, simulate her “view” in the protocol. (I.e. she generates encryptions of the items in X ∩ Y, and of random items.)
Efficiency • Client encrypts and decrypts n values • Communication is O(n) • Server: • For each input y, computes Enc(r·P(y)+y), i.e. n exponentiations • Total: O(n^2) exponentiations • Can use hashing to reduce the overhead to O(n ln ln n).
Is Approximation easier? • Can we approximate the size of the intersection (i.e. scalar product) with sublinear overhead? • Lower bound: • Approximating |X ∩ Y| within a 1 ± ε factor requires Ω(n) communication (for constant ε). • True even for randomized algorithms. • Proof: reduction to Razborov’s lower bound for Disjointness. • Upper bound: protocols with matching overhead.
Secure Computation of the Kth-ranked element • with Gagan Aggarwal (Stanford) and Nina Mishra (HPL)
Secure Computation of the Kth-ranked element • Inputs: • A: SA. B: SB. • Large sets of unique items (from a domain D). • There’s also the multi-party scenario • Output: x ∈ SA ∪ SB s.t. |{y | y < x, y ∈ SA ∪ SB}| = k−1 • Median: k = (|SA| + |SB|) / 2
Motivation • Basic statistical analysis of distributed data • E.g. a histogram of salaries in competing businesses in the same area • Sometimes the parties might want to hide the size of their inputs
Some information is always revealed • The Kth-ranked element reveals some information • Suppose SA = x1, …, x1000 • Median of SA ∪ SB = x400 • Party A now learns that SB contains at least 200 elements smaller than x400 • But she shouldn’t learn more
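The "at least 200" figure follows from a short count. As a worked check, assume (these are assumptions of the example, not stated on the slide) that x1 < … < x1000, that SA and SB are disjoint, and that |SA| + |SB| is even; write b_< and b_> for the number of elements of SB below and above x400.

```latex
\begin{align*}
k &= \frac{|S_A| + |S_B|}{2} = \frac{1000 + b_{<} + b_{>}}{2}
    && \text{(rank of the median)} \\
k &= 399 + b_{<} + 1 = 400 + b_{<}
    && \text{(rank of } x_{400} \text{ in } S_A \cup S_B\text{)} \\
800 + 2\,b_{<} &= 1000 + b_{<} + b_{>} \\
b_{<} &= 200 + b_{>} \;\ge\; 200
\end{align*}
```

So the output alone tells party A that SB has at least 200 elements below x400, and exactly 200 more below than above.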
Results, and previous work • Previous work: generic constructions – overhead at least linear in k. • New results: • Two-party: log k secure comparisons of log D bit numbers. • Multi-party: log D simple computations with log D bit numbers.
An (insecure) two-party median protocol • [Diagram: SA split into LA, mA, RA; SB split into LB, mB, RB; case mA < mB] • LA lies below the median, RB lies above the median. • The new median is the same as the original median. • Recursion • Need log n rounds (suppose each set contains 2^i items)
Secure two-party median protocol • A finds the median of SA, call it mA. B finds the median of SB, call it mB. • Secure comparison (e.g. a small circuit): is mA < mB? • YES: A deletes x ∈ SA s.t. x < mA. B deletes x ∈ SB s.t. x > mB. • NO: A deletes x ∈ SA s.t. x > mA. B deletes x ∈ SB s.t. x < mB.
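The halving loop can be simulated in plain Python to see how few comparisons it needs. This is a sketch, not the secure protocol: `secure_less_than` is a stand-in that compares in the clear, both sets are assumed to have the same power-of-two size with no item shared between them, and the median of the union is taken to be the element ranked (|SA| + |SB|) / 2. One detail is an assumption of this sketch rather than something the slide states: the party that drops its lower half also drops its own median, so that each set halves exactly.

```python
def secure_less_than(a, b) -> bool:
    """Stand-in for the secure comparison; a real run would reveal only this bit."""
    return a < b


def two_party_median(set_a, set_b):
    """Return (median of the union, number of comparisons used)."""
    a, b = sorted(set_a), sorted(set_b)
    size = len(a)
    if size == 0 or size != len(b) or size & (size - 1):
        raise ValueError("both sets need the same power-of-two size")
    comparisons = 0
    while len(a) > 1:
        half = len(a) // 2
        m_a, m_b = a[half - 1], b[half - 1]   # each party's own (lower) median
        comparisons += 1
        if secure_less_than(m_a, m_b):
            a, b = a[half:], b[:half]         # A drops its low half, B its high half
        else:
            a, b = a[:half], b[half:]         # A drops its high half, B its low half
    comparisons += 1                          # last comparison picks the smaller item
    result = a[0] if secure_less_than(a[0], b[0]) else b[0]
    return result, comparisons
```

For two sets of 2^i items the loop runs i times, so the comparison count is logarithmic in the input size, in line with the "log n rounds" noted for the insecure version.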
Proof of security • Simulation: given the protocol’s output, each party can simulate the execution of the protocol • [Diagram: position of the median within SA; first comparison: mA < mB; second comparison: mA > mB]
Arbitrary inputs, arbitrary k • [Diagram: SA and SB adjusted to size k, then to size 2^i, using “+” and “−” padding entries] • Now, compute the median of two sets of size k • Size should be a power of 2 • Median of the new inputs = kth element of the original inputs
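The slide only hints at how the padding works, so the following is one way to realize the reduction, stated as an assumption rather than as the authors' exact construction: each party keeps its k smallest items and pads with +∞ up to k items, then one party prepends −∞ values and the other appends +∞ values until both sets have a power-of-two size. The names `pad_inputs` and `plain_median` are hypothetical, and the median is computed in the clear just to check the claim "median of new inputs = kth element of original inputs".

```python
import math


def pad_inputs(set_a, set_b, k: int):
    """Reduce 'kth-ranked element' to 'median of two equal power-of-two-sized sets'."""
    if not 1 <= k <= len(set_a) + len(set_b):
        raise ValueError("k must be between 1 and the total number of items")

    def k_smallest(items):
        kept = sorted(items)[:k]                       # larger items cannot be the answer
        return kept + [math.inf] * (k - len(kept))     # pad short inputs with +inf

    a, b = k_smallest(set_a), k_smallest(set_b)
    size = 1
    while size < k:                                    # smallest power of two >= k
        size *= 2
    extra = size - k
    # Assumed padding rule: -inf entries on one side, +inf on the other,
    # so the target keeps the middle rank of the padded union.
    return [-math.inf] * extra + a, b + [math.inf] * extra


def plain_median(a, b):
    """Element ranked (|a| + |b|) / 2 in the union, computed without any privacy."""
    merged = sorted(a + b)
    return merged[len(merged) // 2 - 1]
```

For example, with set_a = [10, 20, 30, 40, 50], set_b = [15, 25, 35] and k = 3, the padded inputs have four entries each, and their median is the third-smallest item of the original union.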
Conclusions • Efficient privacy preserving primitives for basic tasks • Open problems • Intersection: approximate matching? • Median: clustering? • Theory and applications can and should interact • Tools from the theory of cryptography (e.g. SFE) can be used in applications • Applications can benefit from rigorous analysis • There’s a lot more to be done…