Privacy Preserving Data Mining
Lecture 2: Cryptographic Solutions
Benny Pinkas, HP Labs, Israel
10th Estonian Winter School in Computer Science
Secure two-party computation: definition
• Input: one party holds x, the other holds y
• Output: F(x,y) and nothing else
• As if the parties had handed x and y to a trusted third party that returned F(x,y) to each of them
Secure Function Evaluation
• A major topic of cryptographic research
• How to let n parties, $P_1,\dots,P_n$, compute a function $F(x_1,\dots,x_n)$
  • where input $x_i$ is known to party $P_i$
  • the parties learn the final output and nothing else
• Caveat: cryptographic definitions of secure computation are both too strong and too weak:
  • Too strong: they do not allow leakage of harmless information; the price of this extra security is paid in efficiency.
  • Too weak: they do not address leakage or misuse caused by the function itself (e.g., information implied by the outputs, or misbehavior in choosing an input).
Secure Function Evaluation
• Major Result [Yao]: “Any function that can be evaluated using polynomial resources can be securely evaluated using polynomial resources” (under some cryptographic assumption)
SFE Building Block: 1-out-of-2 Oblivious Transfer
• Bob's input: $Y_0, Y_1$. Alice's input: $j \in \{0,1\}$
• Alice learns $Y_j$; Bob learns nothing
• 1-out-of-2 OT can be based on most public key systems
• There are implementations with two communication rounds
General Two-Party Computation
Two-party protocol
• Input:
  • Sender: function F (some representation); the sender's input y is already embedded in F
  • Receiver: $x \in \{0,1\}^n$
• Output:
  • Receiver: F(x) and nothing else about F
  • Sender: nothing about x
Representations of F
• Boolean circuits [Yao, GMW, …]
• Algebraic circuits [BGW, …]
• Low-degree polynomials [BFKR]
• Matrix products over a large field [FKN, IK]
• Randomizing polynomials [IK]
• Communication complexity protocols [NN]
Secure two-party computation of general functions [Yao]
• First, represent the function F as a Boolean circuit C
  • It's always possible
  • Sometimes it's easy (additions, comparisons)
  • Sometimes the result is inefficient (e.g. for indirect addressing such as A[x])
• Then, “garble” the circuit
• Finally, evaluate the garbled circuit
Garbling the circuit
• Bob constructs the circuit, and then garbles it.
• [Figure: gate G with input wires i and j, carrying $(w_i^0, w_i^1)$ and $(w_j^0, w_j^1)$, and output wire k, carrying $(w_k^0, w_k^1)$]
• The w values will serve as cryptographic keys:
  • $w_k^0$ ≡ 0 on wire k
  • $w_k^1$ ≡ 1 on wire k
• (Alice will learn one string per wire, but not which bit it corresponds to.)
Gate tables
• For every gate, every combination of input values is used as a key for encrypting the corresponding output
• Assume G = AND. Bob constructs a table:
  • Encryption of $w_k^0$ using keys $w_i^0, w_j^0$ (AND(0,0)=0)
  • Encryption of $w_k^0$ using keys $w_i^0, w_j^1$ (AND(0,1)=0)
  • Encryption of $w_k^0$ using keys $w_i^1, w_j^0$ (AND(1,0)=0)
  • Encryption of $w_k^1$ using keys $w_i^1, w_j^1$ (AND(1,1)=1)
• Result: given $w_i^x, w_j^y$, one can compute $w_k^{G(x,y)}$
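The gate table above can be sketched in a few lines of Python. This is a minimal illustration, not the construction used in any particular system: SHA-256 of the two input labels stands in for the encryption scheme, the labels are assumed to be 16 bytes, and an all-zero tag is appended so that the evaluator can recognize the one row that decrypts correctly. All function names are hypothetical.

```python
import hashlib
import random
import secrets

LABEL_LEN = 16        # bytes per wire label (assumed size)
TAG = b"\x00" * 8     # redundancy that marks a correct decryption (assumption)

def new_wire():
    """Return the pair (w^0, w^1) of random labels for one wire."""
    return secrets.token_bytes(LABEL_LEN), secrets.token_bytes(LABEL_LEN)

def pad(label_i, label_j):
    """Derive a one-time pad from two input labels (hash as a stand-in cipher)."""
    return hashlib.sha256(label_i + label_j).digest()[:LABEL_LEN + len(TAG)]

def xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))

def garble_gate(gate, wire_i, wire_j, wire_k):
    """Bob: encrypt w_k^{G(x,y)} under (w_i^x, w_j^y) for all four (x, y)."""
    table = [xor(pad(wire_i[x], wire_j[y]), wire_k[gate(x, y)] + TAG)
             for x in (0, 1) for y in (0, 1)]
    random.shuffle(table)  # permuted order; a real system needs a secure shuffle
    return table

def evaluate_gate(table, label_i, label_j):
    """Alice: try every row; only the row for her two labels ends with TAG."""
    for row in table:
        plain = xor(pad(label_i, label_j), row)
        if plain.endswith(TAG):
            return plain[:LABEL_LEN]
    raise ValueError("no row decrypted correctly")

def AND(x, y):
    return x & y

w_i, w_j, w_k = new_wire(), new_wire(), new_wire()
table = garble_gate(AND, w_i, w_j, w_k)
```

Calling `evaluate_gate(table, w_i[1], w_j[1])` returns `w_k[1]`, and any other pair of input labels returns `w_k[0]`; the evaluator sees only opaque strings and never learns which bit a label encodes.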
Secure computation
• Bob sends the table of gate G to Alice
• Given, e.g., $w_i^0, w_j^1$, Alice computes $w_k^0$ by decrypting the corresponding entry in the table, but she does not know the actual values of the wires.
• The table entries are sent in permuted order:
  • Encryption of $w_k^0$ using keys $w_i^0, w_j^0$
  • Encryption of $w_k^0$ using keys $w_i^0, w_j^1$
  • Encryption of $w_k^1$ using keys $w_i^1, w_j^1$
  • Encryption of $w_k^0$ using keys $w_i^1, w_j^0$
Secure computation
• Bob sends to Alice:
  • Tables encoding each circuit gate.
  • Garbled values (w's) of his input values.
  • Translation from garbled values of output wires to actual 0/1 values.
• If Alice gets garbled values (w's) of her input values, she can compute the output of the circuit, and nothing else.
Alice's input
• For every wire i of Alice's input:
  • The parties run an OT protocol
  • Alice's input to the OT is her input bit s.
  • Bob's input to the OT is $w_i^0, w_i^1$
  • Alice learns $w_i^s$
• The OTs for all input wires can be run in parallel.
• Afterwards Alice can compute the circuit by herself.
Secure computation: the big picture
• Represent the function as a circuit C
• Bob sends to Alice 4|C| encryptions (e.g. 64|C| bytes), 4 encryptions for every gate.
• Alice performs an OT for every input bit. (Can do, e.g., 100-1000 OTs per second.)
• Roughly one round of communication.
• Efficient for medium-size circuits!
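The figures on this slide translate into a quick back-of-the-envelope estimate. The sketch below assumes 16 bytes per encryption (which is what 64|C| bytes for 4|C| encryptions implies) and takes the OT rate as a parameter; the function name and defaults are illustrative only.

```python
def yao_cost(num_gates, alice_input_bits, bytes_per_encryption=16, ots_per_second=100):
    """Estimate garbled-table size in bytes and OT time in seconds."""
    table_bytes = 4 * num_gates * bytes_per_encryption  # 4 encryptions per gate
    ot_seconds = alice_input_bits / ots_per_second      # one OT per input bit of Alice
    return table_bytes, ot_seconds
```

For example, a circuit with 1,000 gates and 20 input bits for Alice would need 64,000 bytes of gate tables and, at the slower rate of 100 OTs per second, about 0.2 seconds of OT time.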
Example
• The Millionaires' problem: comparing two N-bit numbers
• What's the overhead?
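One way to get a feel for the overhead is to write the comparison as a Boolean circuit and count gates. The sketch below is an unoptimized ripple comparator that scans from the least significant bit upward and uses four two-input gates per bit, so the circuit size grows linearly in N. The helper names are hypothetical, and real compilers may produce smaller circuits.

```python
def to_bits(value, n):
    """Little-endian list of the n low-order bits of value."""
    return [(value >> i) & 1 for i in range(n)]

def greater_than(a_bits, b_bits):
    """Return (a > b as 0/1, number of two-input gates used)."""
    gt, gates = 0, 0
    for a, b in zip(a_bits, b_bits):   # least significant bit first
        differ = a ^ b                 # gate 1: XOR
        a_wins = a & (1 - b)           # gate 2: a AND (NOT b)
        keep = (1 - differ) & gt       # gate 3: (NOT differ) AND previous result
        gt = a_wins | keep             # gate 4: OR
        gates += 4
    return gt, gates
```

With N = 20, as in the SFDL example in this lecture, this gives 80 gates, i.e. 320 table entries under the 4-encryptions-per-gate count, plus one OT per bit of Alice's input.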
Applications
• Two parties. Two large data sets.
• Max?
• Mean?
• Median?
• Intersection?
• Decision tree learning? ID3?
Fairplay: a secure two-party computation system
Malkhi, Nisan, Pinkas, Sella
• A full-fledged secure two-party computation system, implementing Yao's “garbled circuit” protocol.
• Goals:
  • Investigate whether two-party SFE is practical
  • Actual measurements of overall computation
  • Breakdown of computation into parts
  • Computation versus communication?
  • Test-bed for various optimizations
Fairplay
• The compilation paradigm
  • Programs are written in SFDL, a high-level programming language
    • Allows humans to state definitions and requirements clearly, formally, and understandably
  • SHDL: a low-level language describing Boolean circuits
  • SFDL → SHDL compiler and optimizer
  • SHDL → Java programs implementing Yao's protocol
Fairplay: SFDL example
    program Millionaires {
      type int = Int<20>; // 20-bit integer
      type AliceInput = int;
      type BobInput = int;
      type AliceOutput = Boolean;
      type BobOutput = Boolean;
      type Output = struct {AliceOutput alice, BobOutput bob};
      type Input = struct {AliceInput alice, BobInput bob};
      function Output output(Input input) {
        output.alice = input.alice > input.bob;
        output.bob = input.bob > input.alice;
      }
    }
SFDL properties
• Conventional syntax (C/Pascal-like)
• Type system: Boolean, integer, enumerated
• Program structure
  • Declarations: global constants, types
  • Sequence of functions (no nesting [C], no recursion)
  • Function name is its return value [Pascal]
• Conditional execution and loops
  • if-then and if-then-else statements, for-loops (loop boundaries should be known at compile time)
• Assignments and expressions
  • constants, variables, array entries, structure items, function calls, operators (+, -, logical, comparison), parentheses
SHDL example
    0 input //output$input.bob$0
    1 input //output$input.bob$1
    2 input //output$input.bob$2
    3 input //output$input.bob$3
    4 input //output$input.alice$0
    5 input //output$input.alice$1
    6 input //output$input.alice$2
    7 input //output$input.alice$3
    8 gate arity 2 table [ 1 0 0 0 ] inputs [ 4 5 ]
    9 gate arity 2 table [ 0 1 1 0 ] inputs [ 4 5 ]
kth-ranked element (e.g. median)
• Inputs:
  • Alice: $S_A$. Bob: $S_B$
  • Large sets of unique items ($\in D$).
• Output:
  • $x \in S_A \cup S_B$ s.t. x has k-1 elements smaller than it.
• The rank k
  • Could depend on the size of the input datasets.
  • Median: $k = (|S_A| + |S_B|) / 2$
• Motivation:
  • Basic statistical analysis of distributed data.
  • E.g. histogram of salaries in CS departments
• The problem: generic constructions using circuits [Yao …] yield an overhead which is at least linear in k.
An (insecure) two-party median protocol
• [Figure: $S_A$ split around its median $m_A$ into $L_A$ and $R_A$; $S_B$ split around its median $m_B$ into $L_B$ and $R_B$]
• If $m_A < m_B$: $L_A$ lies below the median, $R_B$ lies above the median.
• The new median is the same as the original median.
• Recursion: needs log n rounds (assume each set contains $n = 2^i$ items)
A secure two-party median protocol
• A finds its median $m_A$; B finds its median $m_B$
• Secure comparison (e.g. a small circuit): is $m_A < m_B$?
  • YES: A deletes elements ≤ $m_A$. B deletes elements > $m_B$.
  • NO: A deletes elements > $m_A$. B deletes elements ≤ $m_B$.
• Repeat on the remaining elements
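The round structure above can be sketched as follows. This is a plain-Python illustration under several assumptions: both sets have the same power-of-two size and contain distinct items, "median" means the lower median, the secure comparison is replaced by an ordinary `<` in the hypothetical `secure_less_than`, and the final step (taking the smaller of the two surviving items) is also done in the clear, although in a real protocol it would be a secure computation as well.

```python
def secure_less_than(m_a, m_b):
    """Stand-in for the secure comparison; a real run would use a small circuit."""
    return m_a < m_b

def lower_median(sorted_items):
    return sorted_items[len(sorted_items) // 2 - 1]

def two_party_median(set_a, set_b):
    """Return the element of rank n among the 2n items of two size-n sets."""
    a, b = sorted(set_a), sorted(set_b)
    assert len(a) == len(b) > 0 and len(a) & (len(a) - 1) == 0
    while len(a) > 1:
        half = len(a) // 2
        if secure_less_than(lower_median(a), lower_median(b)):
            a = a[half:]   # A deletes elements <= m_A
            b = b[:half]   # B deletes elements > m_B
        else:
            a = a[:half]   # A deletes elements > m_A
            b = b[half:]   # B deletes elements <= m_B
    return min(a[0], b[0])
```

Each round halves both sets and discards equally many items below and above the median, so the median of what remains is unchanged; with n items per party the loop runs log n times.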
An example
• [Figure: a sample run of the protocol on the sets of A and B, with items ranging from 1 to 16; each round is labeled with the outcome of the comparison ($m_A > m_B$ or $m_A < m_B$) until the median is found.]
Proof of security
• [Figure: a tree of possible comparison outcomes ($m_A < m_B$ or $m_A > m_B$) for A and B in successive rounds, with the median at the end of the path.]
Arbitrary input size, arbitrary k
• [Figure: $S_A$ and $S_B$ are each brought to size k, then padded with $+\infty$ and $-\infty$ values up to size $2^i$]
• Now, compute the median of two sets of size k.
• The size should be a power of 2.
• Median of new inputs = kth element of original inputs
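One way to realize this reduction is sketched below. It is an illustration under assumptions the slide does not spell out: each party keeps its k smallest items and pads with $+\infty$ up to size k, then A pads further with $+\infty$ and B with $-\infty$ up to the next power of two. The median step is a plain, insecure stand-in (`median_stand_in`), and the padding values are not distinct, which a real protocol run would have to handle.

```python
INF = float("inf")

def prepare(items, k, pad_value):
    """Keep the k smallest items, pad to size k with +inf, then to a power of 2."""
    smallest = sorted(items)[:k]
    smallest += [INF] * (k - len(smallest))
    size = 1 << (k - 1).bit_length()       # smallest power of 2 that is >= k
    return smallest + [pad_value] * (size - k)

def median_stand_in(a, b):
    """Insecure placeholder for the two-party median protocol (rank n of 2n)."""
    return sorted(a + b)[len(a) - 1]

def kth_ranked(set_a, set_b, k):
    a = prepare(set_a, k, INF)     # A's extra padding sits above everything
    b = prepare(set_b, k, -INF)    # B's extra padding sits below everything
    return median_stand_in(a, b)
```

The extra padding adds equally many items below and above the real data, so the kth element of the original inputs lands exactly at the median position of the padded inputs.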
Hiding size of inputs
• Can search for the kth element without revealing the size of the input sets.
• However, k = n/2 (median) reveals the input size.
• Solution: let $S = 2^i$ be a bound on the input size.
• [Figure: $S_A$ and $S_B$, of sizes $|S_A|$ and $|S_B|$, are each padded up to size S with $-\infty$ and $+\infty$ values]
• The median of the new datasets is the same as the median of the original datasets.
Privacy preserving data mining
• [Figure: P1 holds a huge confidential database $D_1$; P2 holds a huge confidential database $D_2$]
• Wish to “mine” $D_1 \cup D_2$ without revealing more info
• Examples:
  • Medical databases protected by law
  • Competing businesses
  • Government agencies (privacy, “need to know”)
The classification problem
• Goal: based on available data, design an algorithm to classify new data
Classification using Decision Trees
• ID3: choose the attribute A that minimizes the conditional entropy of the class attribute
• [Figure: example decision tree. The root tests “Time insured” ([0,9] years, [10,19] years, > 20 years); lower nodes test “Age > 30” and “Claim > $500”; the leaves are labeled Yes or No.]
Privacy Preserving ID3
• Scenario: the inputs are private information of P1 and P2
• Main technical problem: comparing entropies while preserving privacy (the entropy is a sum of terms of the form x log x)
• Efficiency:
  • Most computation is done independently by the parties.
  • The overhead of cryptographic operations depends only on the size of the decision tree (not on the input size).
• Basic task: compute x log x, where $x = x_1 + x_2$ is, e.g., the total number of customers with (age > 30) and (fraud = yes)
Privacy Preserving ID3
• Computing x log x:
  • $x = x_1 + x_2$, where $x_1$ and $x_2$ are known to P1 and P2 respectively (independently computed from their databases).
  • Might as well compute x ln x, or ln x.
  • First run a protocol to compute random shares $y_1 + y_2 = \ln x$
  • ln x is a real number, but cryptography works over finite fields, so some numerical analysis is needed.
Cryptographic Tools
• Secure Function Evaluation (SFE) [Yao]
• Oblivious Polynomial Evaluation [NP]
  • Input: one party holds a polynomial Q(·); the other party holds x
  • Output: the party holding x learns Q(x) and nothing else; the party holding Q learns nothing
  • Implementation: two passes, O(degree) (or O(log |F|)) exponentiations.
Computing random shares of $\ln x = \ln(x_1 + x_2)$
Use a Taylor approximation for ln x
• $x = x_1 + x_2 = 2^n (1+\varepsilon)$, with $-\tfrac12 < \varepsilon < \tfrac12$
• $\ln x = \ln\bigl(2^n (1+\varepsilon)\bigr) = \ln 2^n + \ln(1+\varepsilon) \approx \ln 2^n + \sum_{i=1}^{k} (-1)^{i-1} \varepsilon^i / i = \ln 2^n + T(\varepsilon)$
• $T(\varepsilon)$ is a polynomial of degree k. The error is exponentially small in k.
• We only know how to work over finite fields
  • Compute $c \cdot \ln x$, where c compensates for fractions.
  • Work in F, where |F| is sufficiently large.
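The approximation can be checked numerically in the clear. The sketch below works over floating-point numbers rather than a finite field, picks the power of two closest to x, and compares $\ln 2^n + T(\varepsilon)$ against `math.log`; the function names are illustrative.

```python
import math

def nearest_power_exponent(x):
    """Return n such that 2**n is the power of two closest to the integer x >= 1."""
    n = x.bit_length() - 1
    return n + 1 if x - (1 << n) >= (1 << (n + 1)) - x else n

def taylor_ln1p(eps, k):
    """T(eps) = sum_{i=1..k} (-1)^(i-1) * eps^i / i."""
    return sum((-1) ** (i - 1) * eps ** i / i for i in range(1, k + 1))

def approx_ln(x, k):
    n = nearest_power_exponent(x)
    eps = x / (1 << n) - 1            # x = 2^n * (1 + eps), with |eps| <= 1/2
    return n * math.log(2) + taylor_ln1p(eps, k)
```

For x = 1000 the closest power of two is 1024, so $\varepsilon \approx -0.023$ and even a low-degree T is very accurate; since $|\varepsilon| \le \tfrac12$, each additional term shrinks the error by at least a factor of two.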
$\ln(x_1 + x_2)$ Protocol
• Step 1 of the protocol: find n and $\varepsilon$
• Apply Yao's protocol to the following small circuit
  • Input: $x_1$ and $x_2$
  • Output (random shares):
    • random $a_1$ and $a_2$ s.t. $a_1 + a_2 = x - 2^n = \varepsilon \cdot 2^n$
    • random $b_1$ and $b_2$ s.t. $b_1 + b_2 = \ln 2^n$
  • Operation: the protocol finds the $2^n$ closest to $x_1 + x_2$ and computes $\varepsilon \cdot 2^n = x_1 + x_2 - 2^n$.
• $x = x_1 + x_2 = 2^n + \varepsilon \cdot 2^n$
• $\ln x = \ln\bigl(2^n (1+\varepsilon)\bigr) = \ln 2^n + \ln(1+\varepsilon)$
$\ln(x_1 + x_2)$ Protocol (Cont.)
Step 2 of the protocol
• Compute random shares of $T(\varepsilon)$ (the Taylor approximation)
• P1 chooses a random $w_1 \in F$ and defines a polynomial Q(x) s.t. $w_1 + Q(a_2) = T(\varepsilon)$ (recall $a_1 + a_2 = \varepsilon \cdot 2^n$)
  • Namely, $Q(x) = T\bigl((a_1 + x)/2^n\bigr) - w_1$.
• Run an oblivious polynomial evaluation in which P2 computes
  • $w_2 = Q(a_2) = T(\varepsilon) - w_1$.
• Now the parties have random $w_1$ and $w_2$ s.t.
  • $w_1 + w_2 = T(\varepsilon) \approx \ln(1+\varepsilon)$
  • $(b_1 + w_1) + (b_2 + w_2) \approx \ln 2^n + \ln(1+\varepsilon) = \ln x$
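The algebra of the two steps can be traced in the clear. The sketch below is a simplification and not the protocol itself: it uses exact rationals (`fractions.Fraction`) instead of a finite field, draws shares from a bounded integer range (which does not hide values the way uniform field elements do), and evaluates Q directly where the protocol would use oblivious polynomial evaluation. The inputs 600 and 400, the degree K, and all names are hypothetical.

```python
from fractions import Fraction
import math
import random

K = 8  # degree of the Taylor polynomial (assumed value)

def taylor(eps, k=K):
    """T(eps) = sum_{i=1..k} (-1)^(i-1) * eps^i / i, computed exactly."""
    return sum(Fraction((-1) ** (i - 1), i) * eps ** i for i in range(1, k + 1))

def additive_shares(value, bound=10**9):
    """Split an integer into two summands, one of them random."""
    r = random.randrange(-bound, bound)
    return r, value - r

def nearest_power_exponent(x):
    n = x.bit_length() - 1
    return n + 1 if x - (1 << n) >= (1 << (n + 1)) - x else n

# Step 1 (done by a small circuit in the protocol): shares of eps*2^n and ln 2^n.
x1, x2 = 600, 400
x = x1 + x2
n = nearest_power_exponent(x)
a1, a2 = additive_shares(x - 2 ** n)
b1 = random.uniform(-100, 100)
b2 = n * math.log(2) - b1

# Step 2: P1 picks w1 and defines Q; P2 obtains w2 = Q(a2).
w1 = Fraction(random.randrange(-10**9, 10**9))

def Q(z):
    return taylor(Fraction(a1 + z, 2 ** n)) - w1

w2 = Q(a2)
eps = Fraction(x - 2 ** n, 2 ** n)
```

Here `w1 + w2` equals `taylor(eps)` exactly, and adding `b1 + b2` gives a value that agrees with `math.log(x)` up to the Taylor truncation error.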
The rest of the work
• The parties compute shares of ln x
• Then they compute shares of x ln x
• Each party computes a share of the entropy by summing shares of x ln x (the entropy is a sum of x ln x terms)
• A small circuit finds the attribute giving the minimal conditional entropy
• The attribute is assigned to the node
• The databases are divided according to the value of this attribute
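To see why x ln x is the basic building block, note that for an attribute A with value counts $n_a$ and joint attribute/class counts $n_{a,c}$ over N records, the conditional entropy of the class can be written as $\frac{1}{N}\bigl(\sum_a n_a \ln n_a - \sum_{a,c} n_{a,c} \ln n_{a,c}\bigr)$, where every count is a sum $x_1 + x_2$ of the two parties' local counts. The sketch below computes this in the clear on the pooled records; the attribute names and toy records are hypothetical.

```python
import math
from collections import Counter

def xlnx(x):
    return x * math.log(x) if x > 0 else 0.0

def conditional_entropy(rows, attribute, label="class"):
    """H(class | attribute) in nats, written as a sum of x ln x terms over counts."""
    n_a = Counter(row[attribute] for row in rows)
    n_ac = Counter((row[attribute], row[label]) for row in rows)
    return (sum(xlnx(c) for c in n_a.values())
            - sum(xlnx(c) for c in n_ac.values())) / len(rows)

def best_attribute(rows, attributes, label="class"):
    """ID3 choice: the attribute with minimal conditional entropy."""
    return min(attributes, key=lambda a: conditional_entropy(rows, a, label))

# Toy records held by the two parties (hypothetical data).
rows_p1 = [{"age_over_30": "yes", "claim_over_500": "yes", "class": "yes"},
           {"age_over_30": "yes", "claim_over_500": "no", "class": "yes"}]
rows_p2 = [{"age_over_30": "no", "claim_over_500": "yes", "class": "no"},
           {"age_over_30": "no", "claim_over_500": "no", "class": "no"}]
pooled = rows_p1 + rows_p2
```

On this toy data `age_over_30` determines the class completely (conditional entropy 0), while `claim_over_500` carries no information (conditional entropy ln 2), so `best_attribute` selects `age_over_30`. In the private protocol the pooled counts are never formed; only shares of the x ln x terms are.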
Efficiency
• ln x protocol:
  • secure computation of a small circuit
  • one oblivious polynomial evaluation
• ID3 for a database with:
  • 1,000,000 transactions
  • 15 attributes
  • 10 values per attribute
  • 4 class values
• Communication per node takes seconds (on a T1 line)
• Computation per node takes minutes (on a P3 processor)
Contributions
• Cryptographic protocols where the bulk of the operations is done independently.
• Data mining
  • Rigorous model for secure data mining.
  • Efficient, secure protocols for specific problems (median, ID3).
• Cryptography
  • Sub-linear complexity: secure computation for large data sets.
  • Efficient protocols for complex known algorithms.
  • Secure computation of logarithms (a real function, requiring numerical analysis).
• Drawbacks:
  • Privacy preserving solutions are less efficient
  • It's hard to find efficient private solutions for all interesting functions
  • Security against malicious parties
References
• Lecture notes and overview papers:
  • B. Pinkas, Cryptographic Techniques for Privacy-Preserving Data Mining, SIGKDD Explorations, January 2003. http://www.pinkas.net/PAPERS/sigkdd.pdf
  • R. Cramer, Introduction to Secure Computation, 2000. http://homepages.cwi.nl/~cramer/papers/CRAMER_revised.ps
  • I. Damgård, Theory and Practice of Multiparty Computation, 8th EWSCS. http://www.cs.ioc.ee/yik/schools/win2003/damgard.php
• Research papers:
  • G. Aggarwal, N. Mishra and B. Pinkas, Secure Computation of the K'th-ranked Element, Eurocrypt 2004. http://www.pinkas.net/PAPERS/ANP04.pdf
  • Y. Lindell and B. Pinkas, Privacy Preserving Data Mining, Journal of Cryptology, Vol. 15, No. 3, 2002. http://www.pinkas.net/PAPERS/id3-final.pdf