
Overview: Study an important privacy-preserving method, namely k-anonymity.






Presentation Transcript


The Institute for Information Assurance

On Multidimensional k-Anonymity with Local Recoding Generalization
Presented by: Yang Du
College of Computer and Information Science, Northeastern University, Boston, MA 02115
duy@ccs.neu.edu

• Overview
  • Study an important privacy-preserving method, namely k-anonymity.
  • Show that the problem is provably hard, even when only a good enough approximate answer is required.
  • Develop three algorithms with different tradeoffs between approximation ratio and time complexity.

• Introduction
  • The motivation is privacy preservation: publish sensitive data so that it supports accurate analysis without revealing private information.
  • Simply removing the ID column is not enough: attackers can use other attributes, called quasi-identifiers, to restore the identities.
  • Generalization is therefore necessary: the quasi-identifiers are replaced by values in more general forms.
  • k-anonymity is often a requirement: the quasi-identifiers of each tuple must be indistinguishable from those of at least k-1 other tuples.

The original payroll table

  Name        Age   Start-year   Salary
  Alice       25    2001         7k
  Bob         30    2004         1k
  Christina   35    1990         2k
  Daniel      40    1995         3k
  Emily       45    2000         6k
  William     55    1985         3k

A 3-anonymous generalization

  Age        Start-year     Salary
  [25, 45]   [2000, 2004]   7k
  [25, 45]   [2000, 2004]   1k
  [35, 55]   [1985, 1995]   2k
  [35, 55]   [1985, 1995]   3k
  [25, 45]   [2000, 2004]   6k
  [35, 55]   [1985, 1995]   3k

[Figure: 2D representation of the original and generalized table]

• Problem Mapping
  • Given a table R containing d quasi-identifier attributes:
    • Map each quasi-identifier attribute to one dimension.
    • Map each tuple in the table to a point in d-dimensional space.
  • Map the k-anonymous generalization problem to a partition problem:
    • Partition a set of d-dimensional points into groups.
    • Each point belongs to exactly one group.
    • Each group contains at least k points.
    • Each point is generalized to the minimum bounding rectangle (MBR) of its group.

• Quality Measure
  • The smaller the MBRs, the more accurate the analysis results.
  • The size of each MBR is measured by its perimeter.

• Objective
  • Find the optimal partition, i.e. the one that minimizes the maximum MBR perimeter over all groups.

• Hardness of the Problem
  • Finding the optimal partition is NP-hard, so no polynomial-time algorithm for it is known (and none exists unless P = NP).
  • Finding a partition with approximation ratio less than 5/4, i.e. one whose maximum perimeter is less than 5/4 times the maximum perimeter of the optimal partition, is also NP-hard.

• Approximation Algorithms
  • The Divide-and-Group (DAG) Algorithm
    • Divide the space into square cells of a suitable size.
    • Find a set of non-overlapping tiles of 2 x 2 cells covering the points, such that each tile covers at least k points.
    • Assign the remaining (uncovered) points to the nearest tile.
  • The Min-MBR-Group (MMG) Algorithm
    • For each point p, find the smallest MBR that covers at least k points, including p.
    • Find a set of non-overlapping MBRs among the results of the previous step.
    • Assign the points to the nearest MBR.
  • The Nearest-MBR-Group (NNG) Algorithm
    • For each point p, find the MBR that covers p and its k-1 nearest neighbors.
    • Find a set of non-overlapping MBRs among the results of the previous step.
    • Assign the points to the nearest MBR.

Complexity and Approximation Ratio (d: dimensionality; n: size of the dataset)

  Algorithm   Time complexity            Approximation ratio
  DAG         $O(3^d \, d \, n \log^2 n)$   $8d$
  MMG         $O(d \, n^{2d+1})$            $2d+1$
  NNG         $O(d \, n^2)$                 $6d$

For more information:
• http://www.ccs.neu.edu/research/dblab
• Prof. Donghui Zhang: donghui@ccs.neu.edu
• Yang Du: duy@ccs.neu.edu
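As a small illustration of the k-anonymity requirement in the Introduction, the following Python sketch checks whether every combination of quasi-identifier values occurs in at least k rows. It assumes each row is reduced to a tuple of its quasi-identifiers (Age, Start-year), with generalized values written as (low, high) intervals; `is_k_anonymous` is a hypothetical helper, not something defined in the poster.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def is_k_anonymous(qi_rows: Sequence[Tuple[Hashable, ...]], k: int) -> bool:
    """True if every quasi-identifier combination occurs in at least k rows."""
    counts = Counter(qi_rows)
    return all(c >= k for c in counts.values())


# Quasi-identifiers (Age, Start-year) of the original payroll table.
ORIGINAL = [(25, 2001), (30, 2004), (35, 1990), (40, 1995), (45, 2000), (55, 1985)]

# The same rows after the 3-anonymous generalization; each value is a (low, high) interval.
GENERALIZED = [
    ((25, 45), (2000, 2004)),  # Alice
    ((25, 45), (2000, 2004)),  # Bob
    ((35, 55), (1985, 1995)),  # Christina
    ((35, 55), (1985, 1995)),  # Daniel
    ((25, 45), (2000, 2004)),  # Emily
    ((35, 55), (1985, 1995)),  # William
]

print(is_k_anonymous(ORIGINAL, 2))     # False: every original row is unique
print(is_k_anonymous(GENERALIZED, 3))  # True
```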
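The problem mapping, the perimeter-based quality measure, and the objective can be sketched on the payroll example as follows. This is a minimal illustration under two assumptions: raw attribute values are used as coordinates without normalization, and the "perimeter" of a d-dimensional box is taken as twice the sum of its side lengths (the ordinary perimeter when d = 2). The helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Box = List[Tuple[float, float]]

# Each tuple becomes a point whose coordinates are its quasi-identifiers (Age, Start-year).
PAYROLL: Dict[str, Point] = {
    "Alice": (25, 2001), "Bob": (30, 2004), "Christina": (35, 1990),
    "Daniel": (40, 1995), "Emily": (45, 2000), "William": (55, 1985),
}


def mbr(points: Sequence[Point]) -> Box:
    """Minimum bounding rectangle: one (low, high) interval per dimension."""
    return [(min(axis), max(axis)) for axis in zip(*points)]


def perimeter(box: Box) -> float:
    """Twice the sum of side lengths (assumed generalization of perimeter to d > 2)."""
    return 2 * sum(hi - lo for lo, hi in box)


def is_valid_partition(groups: List[List[str]], names: Sequence[str], k: int) -> bool:
    """Every point lies in exactly one group, and every group has at least k points."""
    members = [name for group in groups for name in group]
    return sorted(members) == sorted(names) and all(len(g) >= k for g in groups)


def cost(groups: List[List[str]], table: Dict[str, Point]) -> float:
    """Objective to minimize: the largest MBR perimeter over all groups."""
    return max(perimeter(mbr([table[name] for name in g])) for g in groups)


groups = [["Alice", "Bob", "Emily"], ["Christina", "Daniel", "William"]]
print(is_valid_partition(groups, list(PAYROLL), 3))  # True
for g in groups:
    print(g, mbr([PAYROLL[name] for name in g]))
print(cost(groups, PAYROLL))  # 60
```

The two MBRs reproduce the intervals of the 3-anonymous generalization table; under this perimeter definition they measure 48 and 60.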
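A rough 2-D reading of the DAG steps is sketched below. The poster does not say how the cell size is chosen or how tiles are selected, so the cell-size parameter `s`, the greedy fullest-tile-first selection, and returning `None` when no tile reaches k points are all assumptions of this sketch. It is not the authors' implementation and makes no attempt to achieve the 8d guarantee.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]
Tile = Tuple[int, int]  # lower-left cell of a 2 x 2 block of cells


def cell_of(p: Point, s: float) -> Cell:
    """Index of the s-by-s square cell containing p."""
    return (math.floor(p[0] / s), math.floor(p[1] / s))


def in_tile(c: Cell, t: Tile) -> bool:
    return t[0] <= c[0] <= t[0] + 1 and t[1] <= c[1] <= t[1] + 1


def tiles_overlap(a: Tile, b: Tile) -> bool:
    """Two 2 x 2 tiles share at least one cell."""
    return abs(a[0] - b[0]) <= 1 and abs(a[1] - b[1]) <= 1


def tile_distance(p: Point, t: Tile, s: float) -> float:
    """Euclidean distance from p to the tile's square (0 if p is on or inside it)."""
    dx = max(t[0] * s - p[0], 0.0, p[0] - (t[0] + 2) * s)
    dy = max(t[1] * s - p[1], 0.0, p[1] - (t[1] + 2) * s)
    return math.hypot(dx, dy)


def dag_2d(points: Sequence[Point], k: int, s: float) -> Optional[List[List[int]]]:
    cells = [cell_of(p, s) for p in points]
    # Step 1: candidate tiles are all 2 x 2 blocks containing at least one occupied cell.
    candidates = {(cx - a, cy - b) for cx, cy in cells for a in (0, 1) for b in (0, 1)}
    counts = {t: sum(in_tile(c, t) for c in cells) for t in candidates}
    # Step 2 (greedy, an assumption): keep non-overlapping tiles holding >= k points,
    # fullest first, ties broken by tile index so the result is deterministic.
    chosen: List[Tile] = []
    for t in sorted(candidates, key=lambda t: (-counts[t], t)):
        if counts[t] >= k and not any(tiles_overlap(t, u) for u in chosen):
            chosen.append(t)
    if not chosen:
        return None  # no tile holds k points at this cell size
    # Step 3: a point inside a chosen tile joins that tile; every other point joins the nearest tile.
    groups: List[List[int]] = [[] for _ in chosen]
    for i, (p, c) in enumerate(zip(points, cells)):
        inside = [g for g, t in enumerate(chosen) if in_tile(c, t)]
        if inside:
            groups[inside[0]].append(i)
        else:
            nearest = min(range(len(chosen)), key=lambda g: tile_distance(p, chosen[g], s))
            groups[nearest].append(i)
    return groups


clusters = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
print(dag_2d(clusters, k=3, s=2.0))  # [[0, 1, 2], [3, 4, 5]]
```

Because chosen tiles share no cells, each point lies in at most one of them, so every group keeps the at-least-k points that qualified its tile.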
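In the poster, MMG differs from NNG only in its first step. The brute-force sketch below computes that step for a single point p: the smallest-perimeter box that contains p and at least k points in total. It enumerates every box whose sides lie on data coordinates, which is correct but slow. `smallest_mbr` is a hypothetical name, perimeter is again assumed to be twice the sum of side lengths, and the sketch does not reproduce the $O(d \, n^{2d+1})$ bound from the complexity table.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Tuple[float, float], ...]


def perimeter(box: Box) -> float:
    """Twice the sum of side lengths (the usual perimeter when d = 2)."""
    return 2 * sum(hi - lo for lo, hi in box)


def contains(box: Box, q: Point) -> bool:
    return all(lo <= x <= hi for x, (lo, hi) in zip(q, box))


def smallest_mbr(points: Sequence[Point], i: int, k: int) -> Optional[Box]:
    """Smallest-perimeter box containing points[i] and at least k points in total.

    An optimal box can be shrunk until each side touches a data coordinate without
    losing any point, so only intervals with endpoints taken from the data are tried.
    Returns None if fewer than k points exist.
    """
    p = points[i]
    per_dim: List[List[Tuple[float, float]]] = []
    for j, pj in enumerate(p):
        coords = sorted({q[j] for q in points})
        # Interval [lo, hi] in dimension j must still contain p.
        per_dim.append([(lo, hi) for lo in coords if lo <= pj for hi in coords if hi >= pj])
    best: Optional[Box] = None
    for box in product(*per_dim):
        if sum(contains(box, q) for q in points) >= k:
            if best is None or perimeter(box) < perimeter(best):
                best = box
    return best


payroll = [(25, 2001), (30, 2004), (35, 1990), (40, 1995), (45, 2000), (55, 1985)]
print(smallest_mbr(payroll, 2, 3))  # ((35, 45), (1990, 2000)): Christina, Daniel, Emily
```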
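The three NNG steps can be sketched in Python as below. The following are assumptions, not taken from the poster: Euclidean distance on raw attribute values, a greedy pass that keeps candidate MBRs in increasing order of perimeter, and treating boxes that merely touch as overlapping (which keeps each kept MBR's own k points in its group). The sketch illustrates the steps only and is not claimed to meet the 6d approximation ratio.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = List[Tuple[float, float]]


def mbr(points: Sequence[Point]) -> Box:
    return [(min(axis), max(axis)) for axis in zip(*points)]


def perimeter(box: Box) -> float:
    return 2 * sum(hi - lo for lo, hi in box)


def overlaps(a: Box, b: Box) -> bool:
    """Closed boxes intersect (touching boundaries count as overlapping)."""
    return all(alo <= bhi and blo <= ahi for (alo, ahi), (blo, bhi) in zip(a, b))


def box_distance(p: Point, box: Box) -> float:
    """Euclidean distance from p to the closest point of box (0 if p is inside)."""
    return math.sqrt(sum(max(lo - x, 0, x - hi) ** 2 for x, (lo, hi) in zip(p, box)))


def nng(points: Sequence[Point], k: int) -> List[List[int]]:
    n = len(points)
    if n < k:
        raise ValueError("need at least k points")
    # Step 1: for each point p, the MBR of p and its k-1 nearest neighbors.
    candidates: List[Box] = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(n) if j != i), key=lambda j: math.dist(p, points[j]))
        candidates.append(mbr([p] + [points[j] for j in others[: k - 1]]))
    # Step 2 (greedy, an assumption): keep MBRs that overlap no kept MBR, smallest perimeter first.
    chosen: List[Box] = []
    for box in sorted(candidates, key=perimeter):
        if not any(overlaps(box, c) for c in chosen):
            chosen.append(box)
    # Step 3: assign every point to its nearest kept MBR. The k points that defined a kept
    # MBR lie inside it (distance 0) and strictly outside all others, so each group has >= k points.
    groups: List[List[int]] = [[] for _ in chosen]
    for i, p in enumerate(points):
        nearest = min(range(len(chosen)), key=lambda g: box_distance(p, chosen[g]))
        groups[nearest].append(i)
    return groups


payroll = [(25, 2001), (30, 2004), (35, 1990), (40, 1995), (45, 2000), (55, 1985)]
print(nng(payroll, 3))
```

On the payroll points with k = 3, every other candidate MBR overlaps the smallest one, so this greedy pass keeps a single MBR and puts all six tuples in one group: a valid 3-anonymous partition, but a coarser one than the two-group generalization shown in the poster.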

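For reference, the quality measure, the approximation ratios, and the hardness statement can be written as below. The symbols $P^*$ (an optimal partition), $P_A$ (the partition returned by algorithm $A$), $\mathrm{cost}$ and $\rho_A$ are our notation rather than the poster's. The worked numbers use the payroll example with perimeter taken as twice the sum of side lengths.

```latex
\mathrm{cost}(P) = \max_{G \in P} \operatorname{perim}\bigl(\mathrm{MBR}(G)\bigr),
\qquad
\rho_A = \sup_{\text{inputs}} \frac{\mathrm{cost}(P_A)}{\mathrm{cost}(P^{*})}

% Guarantees from the complexity table (d = dimensionality):
\rho_{\mathrm{DAG}} \le 8d, \quad \rho_{\mathrm{MMG}} \le 2d+1, \quad \rho_{\mathrm{NNG}} \le 6d
% e.g. for d = 2: 16, 5 and 12 respectively

% Hardness: computing P^* is NP-hard, and so is always guaranteeing
\frac{\mathrm{cost}(P_A)}{\mathrm{cost}(P^{*})} < \tfrac{5}{4}

% Payroll example (3-anonymous generalization):
\mathrm{cost} = \max\{\,2(20+4),\ 2(20+10)\,\} = \max\{48,\ 60\} = 60
```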