
Practical Private Computation and Zero-Knowledge Tools for Privacy-Preserving Distributed Data Mining

Yitao Duan and John Canny, Berkeley Institute of Design, Computer Science Division, University of California, Berkeley. http://www.cs.berkeley.edu/~duan


Presentation Transcript


  1. Practical Private Computation and Zero-Knowledge Tools for Privacy-Preserving Distributed Data Mining • Yitao Duan and John Canny • http://www.cs.berkeley.edu/~duan • Berkeley Institute of Design, Computer Science Division, University of California, Berkeley

  2. Goal • To provide practical solutions with provable privacy and adequate efficiency in a realistic adversary model at reasonably large scale

  4. The Scenario • Two data miners mine data from n users • The data miners are semi-honest: follow the protocol but try to get more info • Some fraction of users can be malicious: they may input bogus data to disrupt the computation • A more realistic adversary model than most existing privacy-preserving data mining schemes

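  5. Challenge: standard cryptographic tools not feasible at large scale • [Figure: users u_1, u_2, ..., u_n hold private vectors d_1, d_2, ..., d_n, which must be obfuscated before they are used to compute the model f] • Each d_i is in Z_φ^m, where φ fits in 32 or 64 bits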

  6. A Practical Solution • Provable privacy: cryptography • Efficiency: verifiable secret sharing (VSS) over a small field; minimize the number of expensive primitives and rely on probabilistic guarantees • Realistic adversary model: an extremely efficient zero-knowledge proof that bounds the L2-norm of a user’s vector, an effective way to limit the influence malicious users can have on the computation

  7. Basic Approach • [Figure: users u_1, u_2, ..., u_n hold private vectors d_1, d_2, ..., d_n; the vectors are summed (Σ) under cryptographic privacy to compute f] • No leakage beyond the final result for many algorithms, or differential privacy [Dwork06]

  8. The Power of Addition • A large number of popular algorithms can be run with addition-only steps • Linear algorithms: voting and summation; nonlinear algorithms: regression, SVD, PCA, k-means, ID3, EM, etc. • All algorithms in the statistical query model [Kearns 93] • Many other gradient-based numerical algorithms • A trick widely used for parallelization in distributed computing [Chu 06, Das 07] • The addition-only framework has a very efficient private implementation in cryptography and admits efficient ZKPs
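
To illustrate what an "addition-only step" can look like, here is a minimal sketch of one k-means iteration in which the only operation across users is an element-wise vector sum. The function names and the layout of the per-user vector are assumptions made for this example, not taken from the slides, and floats are used for readability where a real deployment would work with integers in a small field.

```python
def local_contribution(point, centroids):
    """Per-user vector: the point plus a count of 1, placed in the slot
    of the nearest centroid; every other slot is zero."""
    dim = len(point)
    # Index of the nearest centroid by squared Euclidean distance.
    nearest = min(
        range(len(centroids)),
        key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
    )
    vec = [0.0] * (len(centroids) * (dim + 1))
    base = nearest * (dim + 1)
    vec[base:base + dim] = point
    vec[base + dim] = 1.0
    return vec


def update_centroids(total, centroids):
    """Recover the new centroids from the summed contributions alone."""
    dim = len(centroids[0])
    new = []
    for c, old in enumerate(centroids):
        base = c * (dim + 1)
        count = total[base + dim]
        if count == 0:
            new.append(list(old))  # empty cluster: keep the old centroid
        else:
            new.append([s / count for s in total[base:base + dim]])
    return new


def kmeans_step(points, centroids):
    """One k-means iteration driven by a single vector sum."""
    contributions = [local_contribution(p, centroids) for p in points]
    # The only cross-user operation is this element-wise sum.
    total = [sum(col) for col in zip(*contributions)]
    return update_centroids(total, centroids)
```

Because the aggregator needs only `total`, the sum is the single step that has to be carried out privately; everything else runs either locally at each user or on the public result.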

  9. Private Addition • The computation: secret sharing over small field • Malicious users: efficient zero-knowledge proof to bound the L2-norm of the user vector

  10. Big Integers vs. Small Ones • Most applications work with “regular-sized” integers (e.g. 32- or 64-bit). Arithmetic operations are very fast when each operand fits into a single memory cell (~10^-9 sec) • Public-key operations (e.g. those used in encryption and verification) must use keys of sufficient length (e.g. 1024-bit) for security. Existing private computation solutions must work with large integers extensively (~10^-3 sec) • A difference of 6 orders of magnitude!

  11. Private Addition • d_i: user i’s private vector • u_i, v_i and d_i are all in a small integer field • u_i + v_i = d_i

  12. Private Addition • μ = Σ u_i • ν = Σ v_i • u_i + v_i = d_i

  13. Private Addition • [Figure: μ and ν are exchanged] • μ = Σ u_i • ν = Σ v_i • u_i + v_i = d_i

  14. Private Addition • μ + ν

  15. Private Addition • Provable privacy • Computation on each server is over small field: same cost as non-private implementation – O(m) small field operations • So the cost for privacy is only due to verification • For that we have a solution that involves only O(log m) large field operations
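
The private addition slides can be sketched as two-party additive secret sharing. This is a minimal illustration rather than the authors' implementation: the modulus `PHI`, the function names, and the use of Python's `secrets` module are all assumptions, and the verification step (the ZKP) is left out entirely.

```python
import secrets

# Assumed small modulus for illustration: a Mersenne prime that fits in 32 bits.
PHI = 2**31 - 1


def share(d):
    """Split vector d into additive shares u, v with u + v = d (mod PHI)."""
    u = [secrets.randbelow(PHI) for _ in d]
    v = [(x - r) % PHI for x, r in zip(d, u)]
    return u, v


def vector_sum(vectors):
    """Element-wise sum mod PHI: the only work each server performs."""
    return [sum(col) % PHI for col in zip(*vectors)]


def private_sum(user_vectors):
    """Simulate both servers in one process and return the sum of all d_i."""
    shares = [share(d) for d in user_vectors]
    mu = vector_sum([u for u, _ in shares])  # server 1 sees only the u_i
    nu = vector_sum([v for _, v in shares])  # server 2 sees only the v_i
    # The servers exchange mu and nu; mu + nu equals the sum of the d_i.
    return [(a + b) % PHI for a, b in zip(mu, nu)]
```

For example, `private_sum([[1, 2, 3], [4, 5, 6]])` returns `[5, 7, 9]`. Each share on its own is uniformly random, and each server performs m small-field additions per user, the same order of work as a non-private sum.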

  16. The Need for Verification • Private computation obfuscates user data. A malicious user could input anything. • Think of a voting scheme: “Please place your vote 0 or 1 in the envelope” • [Figure: a ballot reading “Bush 100,000, Gore -100,000”]

  17. Zero Knowledge Proofs • I can prove that I know X without disclosing what X is. • I can prove that an encrypted number is a ZERO OR ONE, i.e. a bit. (6 extra numbers needed) • I can prove that an encrypted number is a k-bit integer. I need 6k extra numbers to do this (!!!)

  18. Bounding the L2-Norm • A natural and effective way to restrict a cheating user’s malicious influence • You must have a big vector to produce large influence on the sum • Perturbation theory bounds system change with norms: |σ_i(A) - σ_i(B)| ≤ ||A - B||_2 [Weyl] • Can be the basis for other checks • Setting L = 1 forces each user to have only 1 vote

  19. An Efficient ZKP of Boundedness • Luckily, we don’t need to prove that every number in a user’s vector is small, only that the vector as a whole is small. • The server asks for some random projections of the user’s vector and expects the user to prove that the sum of their squares is small. • O(log m) public-key crypto operations (instead of O(m)) to prove that the L2-norm of an m-dimensional vector is smaller than L. • Running time reduced from hours to seconds.

  20. Random Projection-based L2-Norm ZKP • The server generates N random m-vectors in {-1, 0, +1}^m, with each entry drawn i.i.d. with probabilities {¼, ½, ¼} • The user projects his data onto the N directions and provides a ZKP that the sum of the squared projections is < N L^2 / 2 • Expensive public-key operations are needed only on the projections and the square sum
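
The threshold N L^2 / 2 follows from the challenge distribution: each entry c_j has mean 0 and E[c_j^2] = 1/2, so for a fixed vector d a single projection satisfies E[(c · d)^2] = |d|^2 / 2, and the expected square sum over N projections is N |d|^2 / 2. The sketch below runs the same test in the clear, with no cryptography or zero-knowledge proof involved; the function names are assumptions made for illustration.

```python
import random


def random_challenges(n_proj, m, rng):
    """N challenge vectors with i.i.d. entries: -1 (1/4), 0 (1/2), +1 (1/4)."""
    return [rng.choices((-1, 0, 1), weights=(1, 2, 1), k=m) for _ in range(n_proj)]


def passes_norm_check(d, challenges, bound):
    """True if the sum of squared projections of d is below N * L^2 / 2."""
    square_sum = sum(
        sum(c_j * d_j for c_j, d_j in zip(c, d)) ** 2 for c in challenges
    )
    # Compare 2 * sum against N * L^2 to stay in integer arithmetic.
    return 2 * square_sum < len(challenges) * bound ** 2


# Usage: a vector with |d| = 5 tested against the bound L = 10 with N = 50.
rng = random.Random(0)
challenges = random_challenges(50, 4, rng)
accepted = passes_norm_check([3, 4, 0, 0], challenges, 10)
```

The check is probabilistic: a vector whose norm is close to L may be accepted or rejected depending on the challenges drawn, which is the behaviour that acceptance-probability plots describe. In the actual protocol the user would prove the inequality in zero knowledge rather than reveal the projections.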

  21. Effectiveness

  22. Acceptance/rejection Probabilities • (a) Linear and (b) log plots of the probability of user input acceptance as a function of |d|/L for N = 50. (b) also includes the probability of rejection. In each case, the steepest (jagged) curve is the single-value vector (case 3), the middle curve is the Zipf vector (case 2) and the shallowest curve is the uniform vector (case 1)

  23. Performance Evaluation • (a) Verifier and (b) prover times in seconds for the validation protocol where (from top to bottom) L (the required bound) has 40, 20, or 10 bits. The x-axis is the vector length. • The standard technique takes 6 to 10 hours at m = 10^6

  24. Current Status • The protocols (the L2-norm ZKP and the private vector addition) have been implemented • Adding more mid-tier components • In Java using native code for big integer • Runs on Linux platform • Made an open-source toolkit for building privacy-preserving real-world applications

  25. More info • duan@cs.berkeley.edu • http://www.cs.berkeley.edu/~duan/research/p4p.html • Thank You!
