260 likes | 440 Views
Practical Private Computation and Zero-Knowledge Tools for Privacy-Preserving Distributed Data Mining. Yitao Duan and John Canny http://www.cs.berkeley.edu/~duan Berkeley Institute of Design Computer Science Division University of California, Berkeley. Goal.
E N D
Practical Private Computation and Zero-Knowledge Tools for Privacy-PreservingDistributed Data Mining Yitao Duan and John Canny http://www.cs.berkeley.edu/~duan Berkeley Institute of Design Computer Science Division University of California, Berkeley
Goal • To provide practical solutions with provable privacy and adequate efficiency in a realistic adversary model at reasonably large scale
Goal • To provide practical solutions with provable privacy and adequate efficiency in a realistic adversary model at reasonably large scale
The Scenario • Two data miners mine data from n users • The data miners are semi-honest: follow the protocol but try to get more info • Some fraction of users can be malicious: they may input bogus data to disrupt the computation • A more realistic adversary model than most existing privacy-preserving data mining schemes
Challenge: standard cryptographic tools not feasible at large scale u1 u2 un-1 un d1 d2 dn-1 dn Must be obfuscated Model f di in Zφm φ: < 32 or 64-bit ……
A Practical Solution • Provable privacy: Cryptography • Efficiency: • VSS over small field. • Minimize the number of expensive primitives and rely on probabilistic guarantee • Realistic adversary model: An extremely efficient zero-knowledge proof to bound the L2-norm of a user’s vector. An effective way to limit the influence malicious users could have on the computation
No leakage beyond final result for many algorithms or differential privacy [Dwork06] u1 u2 un-1 un d1 d2 dn-1 dn Cryptographic privacy Basic Approach Σ f = ……
The Power of Addition • A large number of popular algorithms can be run with addition-only steps • Linear algorithms: voting and summation, nonlinear algorithm: regression, SVD, PCA, k-means, ID3, EM etc • All algorithms in the statistical query model [Kearns 93] • Many other gradient-based numerical algorithms • A trick used a lot for parallelization in distributed computing [Chu 06, Das 07] • Addition-only framework has very efficient private implementation in cryptography and admits efficient ZKPs
Private Addition • The computation: secret sharing over small field • Malicious users: efficient zero-knowledge proof to bound the L2-norm of the user vector
Big Integers vs. Small Ones • Most applications work with “regular-sized” integers (e.g. 32- or 64-bit). Arithmetic operations are very fast when each operand fits into a single memory cell (~10-9 sec) • Public-key operations (e.g. used in encryption and verification) must use keys with sufficient length (e.g. 1024-bit) for security. Existing private computation solutions must work with large integers extensively (~10-3 sec) • A 6 orders of magnitude difference!
Private Addition ui vi di: user i’sprivate vector. ui,,vi anddi are all in a small integer field ui + vi = di
Private Addition μ = Σui ν = Σvi ui + vi = di
Private Addition μ ν μ = Σui ν = Σvi ui + vi = di
Private Addition μ + ν
Private Addition • Provable privacy • Computation on each server is over small field: same cost as non-private implementation – O(m) small field operations • So the cost for privacy is only due to verification • For that we have a solution that involves only O(log m) large field operations
Bush 100,000Gore -100,000 The Need for Verification • Private computation obfuscates user data. A malicious user could input anything. • Think of a voting scheme: “Please place your vote 0 or 1 in the envelope”
Zero Knowledge Proofs • I can prove that I know X without disclosing what X is. • I can prove that an encrypted number is a ZERO OR ONE, i.e. a bit. (6 extra numbers needed) • I can prove that an encrypted number is a k-bit integer. I need 6k extra numbers to do this (!!!)
Bounding the L2-Norm • A natural and effective way to restrict a cheating user’s malicious influence • You must have a big vector to produce large influence on the sum • Perturbation theory bounds system change with norms: |σi(A) - σi(B)| ≤ ||A-B||2 [Weyl] • Can be the basis for other checks • Setting L = 1 forces each user to have only 1 vote
An Efficient ZKP of Boundedness • Luckily, we don’t need to prove that every number in a user’s vector is small, only that the vector is small. • The server asks for some random projections of the user’s vector, and expects the user to prove that the square sum of them is small. • O(log m) public key crypto operations (instead of O(m)) to prove that the L-2 norm of an m-dim vector is smaller than L. • Running time reduced from hours to seconds.
Random Projection-basedL2-Norm ZKP • Server generates N random m-vectors in {-1, 0, +1}m with i.i.d. probability {¼, ½, ¼} • User projects his data to the N directions. provides ZKP that the square sum of the projections < NL2/2 • Expensive public key operations are only on the projections and the square sum
Acceptance/rejection Probabilities (a) Linear and (b) log plots of probability of user input acceptance as a function of |d|/L for N = 50. (b) also includes probability of rejection. In each case, the steepest (jagged curve) is the single-value vector (case 3), the middle curve is Zipf vector (case 2) and the shallow curve is uniform vector (case 1)
Performance Evaluation • Verifier and (b) prover times in seconds for the validation protocol where (from top to bottom) L (the required bound) has 40, 20, or 10 bits. The x-axis is the vector length. • Standard technique takes 6 to 10 hours at m = 106
Current Status • The protocols (the L2-norm ZKP and the private vector addition) have been implemented • Adding more mid-tier components • In Java using native code for big integer • Runs on Linux platform • Made an open-source toolkit for building privacy-preserving real-world applications
More info • duan@cs.berkeley.edu • http://www.cs.berkeley.edu/~duan/research/p4p.html • Thank You!