
Privacy-Preserving Clustering


lbotkin

Presentation Transcript


  1. Privacy-Preserving Clustering

  2. Outline • Introduction • Related Work • Secure Multi-Party Computation • Data Sanitization • Preliminaries • Yao’s Millionaires’ Problem • Homomorphic Encryption • Privacy-Preserving K-Means Clustering • Conclusion

  3. Introduction • Why is privacy preservation needed? • Data sharing in today's globally networked systems poses a threat to individual privacy and organizational confidentiality. • The privacy problem is not data mining itself, but the way data mining is done. • So, privacy and data mining can coexist. • An important data mining problem: clustering.

  4. Related Work • Privacy-preserving clustering: • Secure multi-party computation. • High computation and communication costs. • Data sanitization. • Loss of accuracy. • Dimensionality reduction. • Model-based solutions.

  5. Yao’s Millionaires’ Problem • Millionaires’ problem: • Two millionaires wish to know who is richer; however, they do not want to find out any additional information about each other’s wealth.

  6. Solutions • Suppose • Alice has i millions. • Bob has j millions. • 1 < i, j < 10.

  7. Solutions • Suppose • Alice: i = 5, Bob: j = 3. • (B) x = 7, Ea(x) = 4 = k. • (B) k - j + 1 = 2.

  8. Solutions • (A)

  9. Solutions • (A) 5. (B) Check whether z3 = x. If yes, then i ≥ j. If no, then i < j.
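The steps on the Solutions slides can be sketched in Python as follows. This is a minimal illustration, not the presenter's code: the toy RSA key (N = 3233, e = 17, d = 2753), the prime search range, and all function names are assumptions, and the numbers differ from the slide's example (x = 7, Ea(x) = 4). Both parties run inside one function here, so the sketch shows the message flow only and provides no actual privacy.

```python
import random

# Toy RSA key pair for Alice (assumed textbook parameters, N = 61 * 53).
# Far too small for real use.
N, E, D = 3233, 17, 2753

def encrypt(m):
    """Ea: encryption under Alice's public key."""
    return pow(m, E, N)

def decrypt(c):
    """Da: decryption with Alice's private key."""
    return pow(c, D, N)

def is_prime(p):
    return p > 1 and all(p % q for q in range(2, int(p ** 0.5) + 1))

def well_spread(zs, p):
    """True if all values differ by at least 2 modulo p."""
    return all((a - b) % p not in (0, 1, p - 1)
               for idx, a in enumerate(zs) for b in zs[idx + 1:])

def alice_is_at_least_as_rich(i, j, max_wealth=10, rng=random):
    """Return True iff i >= j, where Alice holds i and Bob holds j."""
    while True:
        # (B) Bob picks a random x, computes k = Ea(x), and sends k - j + 1.
        x = rng.randrange(2, N)
        k = encrypt(x)
        m = k - j + 1
        # (A) Alice decrypts m, m + 1, ..., m + max_wealth - 1.
        #     The j-th of these values is x, but Alice cannot tell which.
        ys = [decrypt((m + u) % N) for u in range(max_wealth)]
        # (A) Alice reduces the values modulo a prime p that keeps them
        #     at least 2 apart.
        for p in (q for q in range(101, 1500) if is_prime(q)):
            zs = [y % p for y in ys]
            if well_spread(zs, p):
                break
        else:
            continue  # no suitable prime found: start over with a new x
        # (A) Alice sends the first i values unchanged and adds 1 to the rest.
        sent = [z if u < i else (z + 1) % p for u, z in enumerate(zs)]
        # (B) Bob checks whether the j-th value equals x mod p.
        return sent[j - 1] == x % p
```

With the slide's values, `alice_is_at_least_as_rich(5, 3)` returns `True`: Bob's position 3 lies among the first five values that Alice leaves unchanged, so it still matches x mod p.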

  10. Homomorphic Encryption • Homomorphic encryption: • An encryption scheme H is homomorphic if there is an operation ⊙ on ciphertexts that computes H(x⊕y) from H(x) and H(y) without revealing x or y. • H(x⊕y) = H(x) ⊙ H(y) • RSA (multiplicative), … • Additively homomorphic: • H(x+y) = H(x) * H(y) • Paillier, …
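Both properties on this slide can be checked numerically. The sketch below uses textbook RSA and a toy Paillier instance over the same tiny primes (61 and 53); the parameters, the choice g = n + 1 for Paillier, and the function names are assumptions for illustration, and keys this small offer no security. It needs Python 3.8 or later for `pow(x, -1, n)`.

```python
import math
import random

P, Q = 61, 53                 # assumed toy primes
N = P * Q
N2 = N * N

# Textbook RSA, multiplicatively homomorphic: H(x * y) = H(x) * H(y) mod N.
RSA_E = 17

def rsa_encrypt(m):
    return pow(m, RSA_E, N)

# Paillier with g = N + 1, additively homomorphic:
# H(x + y) = H(x) * H(y) mod N^2.
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)                                 # inverse of LAM mod N

def paillier_encrypt(m, rng=random):
    # Pick a random r that is invertible modulo N.
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def paillier_decrypt(c):
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

# Multiplying RSA ciphertexts multiplies the plaintexts.
assert rsa_encrypt(6) * rsa_encrypt(7) % N == rsa_encrypt(42)

# Multiplying Paillier ciphertexts adds the plaintexts.
c = paillier_encrypt(20) * paillier_encrypt(22) % N2
assert paillier_decrypt(c) == 42
```

Paillier encryption is randomized, so two encryptions of the same value generally differ, yet the product of ciphertexts still decrypts to the sum of the plaintexts.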

  11. Homomorphic Encryption

  12. Privacy-Preserving K-Means Clustering Over Vertically Partitioned Data SIGKDD, 2003

  13. Problem Definition • Goal: • Cluster the known set of common entities without revealing any value that the clustering is based on. • Input: • Each user provides one attribute of all items. • Output: • Assignment of entities to clusters. • Cluster centers themselves.

  14. K-Means Clustering

  15. K-Means Clustering • (Figure: distance matrix, cluster decision, new center computation)

  16. Vertically Partitioned Data • (Figure: User 1, User 2)

  17. Terminology • r: # of users, each having different attributes for the same set of items. • n: # of the common items. • k: # of clusters required. • ui: each cluster mean, i = 1, …, k. • uij: projection of the mean of cluster i on user j. • Final result for user j: • The final value / position of uij, i = 1, …, k. • Cluster assignments: clusti for all points i = 1, …, n.
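To make the notation concrete, the sketch below shows how the distance from one item to each cluster mean splits into per-user components when each user holds one attribute. It is a non-private baseline under stated assumptions: squared Euclidean distance is assumed (the slides do not name the metric), and the function names and sample numbers are hypothetical.

```python
def local_components(value, local_means):
    """User j: contribution of one item's attribute to the distance to
    each of the k clusters, using the projections u_1j, ..., u_kj."""
    return [(value - mean) ** 2 for mean in local_means]

def closest_cluster(components_per_user):
    """Non-private baseline: sum the r users' components for each cluster
    and return the index of the smallest total."""
    k = len(components_per_user[0])
    totals = [sum(user[i] for user in components_per_user) for i in range(k)]
    return min(range(k), key=totals.__getitem__)

# r = 2 users, k = 2 clusters, one item.
user1 = local_components(1.0, [0.0, 4.0])   # [1.0, 9.0]
user2 = local_components(5.0, [5.0, 1.0])   # [0.0, 16.0]
assert closest_cluster([user1, user2]) == 0  # totals are [1.0, 25.0]
```

In this baseline every user's components are visible to whoever sums them, which is exactly what the secure protocol on the following slides is meant to avoid.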

  18. Privacy-Preserving K-Means Clustering

  19. Securely Finding the Closest Cluster

  20. Securely Finding the Closest Cluster • The security of the algorithm is based on three key ideas. • Disguise the site components of the distance with random values that cancel out when combined. • Permute the order of clusters so the real meaning of the comparison results is unknown. • Compare distances so only the comparison result is learned; no party knows the distances being compared.
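The first two ideas can be illustrated with a simplified Python sketch. It is not the protocol from the SIGKDD 2003 paper: here a single function generates the masks and the permutation, the distance components are assumed to be non-negative integers whose totals stay below the modulus M, at least two users are assumed, and the final comparison runs in the clear as a placeholder for the secure comparison in the third idea. All names are hypothetical.

```python
import random

M = 2 ** 32  # assumed modulus; true distance totals must stay below it

def zero_sum_masks(r, k, rng):
    """One random length-k vector per user (r >= 2); for each cluster the
    r mask entries sum to 0 modulo M, so they cancel when combined."""
    masks = [[rng.randrange(M) for _ in range(k)] for _ in range(r - 1)]
    last = [(-sum(column)) % M for column in zip(*masks)]
    return masks + [last]

def disguise(components, mask, perm):
    """Add the mask to a user's components, then reorder the clusters."""
    masked = [(c + v) % M for c, v in zip(components, mask)]
    return [masked[p] for p in perm]

def closest_cluster_masked(components_per_user, rng):
    r, k = len(components_per_user), len(components_per_user[0])
    masks = zero_sum_masks(r, k, rng)
    perm = list(range(k))
    rng.shuffle(perm)
    shares = [disguise(c, m, perm)
              for c, m in zip(components_per_user, masks)]
    # Combining the shares cancels the masks and leaves the permuted totals.
    totals = [sum(column) % M for column in zip(*shares)]
    # Placeholder for the secure comparison: a real protocol would reveal
    # only which position is smallest, not the totals themselves.
    winner = min(range(k), key=totals.__getitem__)
    return perm[winner]  # undo the permutation

rng = random.Random(0)
assert closest_cluster_masked([[1, 9, 4], [0, 16, 2]], rng) == 0
```

Each disguised share on its own looks uniformly random modulo M, and the party that sees the winning position cannot map it to a cluster without knowing the permutation.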

  21. Securely Finding the Closest Cluster

  22. Securely Finding the Closest Cluster

  23. Securely Finding the Closest Cluster

  24. (Figure: j, m, Check Threshold)

  25. Conclusion • Horizontally partitioned data: (Figure: User 1, User 2)
